OSD #105

ricardomga · 2018-02-27T16:38:47Z

Hello,
Is it possible to use psm 0 to get the OSD information? I am getting an error doing this.

The code

pytesseract.image_to_string(
            img,
            lang='por',
            config='--tessdata-dir "./tessdata/" -psm 0',
            output_type='dict'
)

The error

pytesseract.pytesseract.TesseractError: (1, "read_params_file: Can't open 0 read_params_file: Can't open txt Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica Warning. Invalid resolution 0 dpi. Using 70 instead. Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.")


bozhodimitrov · 2018-02-27T16:47:58Z

Did you first try the equivalent CLI command in a terminal/shell?
As far as I can tell, the documentation specifies the --psm NUM format instead of -psm.
Can you try it?
If this is the case, I will update the documented line in the README file.

PS: Also for the output_type, try to use the pytesseract.Output class attributes instead of hard-coding it, because the notation can change in the future and this can break your code. :)
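
As a minimal sketch of the two suggestions above (the double-dash `--psm` flag and `pytesseract.Output` instead of a hard-coded string), the call could be assembled roughly as follows. The helper names `build_config` and `ocr_to_dict` are hypothetical and not part of pytesseract; pytesseract is imported lazily so that the config helper can be checked on a machine without Tesseract installed.

```python
def build_config(psm, tessdata_dir=None):
    """Assemble a Tesseract config string using the double-dash --psm flag."""
    parts = []
    if tessdata_dir is not None:
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    parts.append(f"--psm {int(psm)}")
    return " ".join(parts)


def ocr_to_dict(img, lang="por", psm=3, tessdata_dir="./tessdata/"):
    """Run OCR and return a dict, selecting the output type via pytesseract.Output."""
    import pytesseract  # imported here so build_config works without it

    return pytesseract.image_to_string(
        img,
        lang=lang,
        config=build_config(psm, tessdata_dir),
        output_type=pytesseract.Output.DICT,
    )


print(build_config(0, "./tessdata/"))
```

Note that with `--psm 0` Tesseract writes OSD data rather than text, so a wrapper like `ocr_to_dict` is only meaningful for text-producing modes.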

ricardomga · 2018-02-27T17:19:16Z

Thank you for the help in advance.
You were right. Now it gives me the following error:

line 116, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (3221225477, '')

I think it is because --psm 0 creates an .osd file instead of a .txt file.

bozhodimitrov · 2018-02-27T17:40:09Z

That's correct, but what is the raw output message from the tesseract command itself?
Can you run it and let us know what the result is?

I guess that we need additional function(s) (or functionality) for the different PSM modes, since the output format is not txt.

ricardomga · 2018-02-27T17:45:12Z

The tesseract command:

tesseract img.jpg out --psm 0

Output:

Tesseract Open Source OCR Engine v4.0.0-alpha.20180109 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 346
Warning. Invalid resolution 0 dpi. Using 70 instead.

File output(out.osd):

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 13.95
Script: Latin
Script confidence: 6.67
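
For illustration, the `out.osd` content above is a list of `Key: value` lines, so it can be read into a dictionary with a few lines of standard-library Python. This is only a sketch under that assumption; `parse_osd` is a hypothetical helper, not a pytesseract function.

```python
def parse_osd(text):
    """Parse 'Key: value' lines from an .osd file into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        # Convert numeric fields; leave everything else as a string.
        try:
            result[key] = int(value)
        except ValueError:
            try:
                result[key] = float(value)
            except ValueError:
                result[key] = value
    return result


# The sample is the out.osd content quoted in this thread.
sample = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 13.95
Script: Latin
Script confidence: 6.67
"""
osd = parse_osd(sample)
print(osd["Orientation in degrees"], osd["Script"])
```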

bozhodimitrov · 2018-02-27T17:57:09Z

The warning is familiar to me. The problem is that tesseract doesn't return exit code 0 in that case, which is not nice :D

But the warning itself means that the image is missing resolution (DPI) metadata. Converting the image beforehand may help.

At the moment we should adjust the pytesseract logic in two places in order to be able to read and return the content of the .osd file.

Can you report the exit code of the tesseract command?
You can check that with the following command after execution of the tesseract command:

echo $?

Also try to convert the image in order to work around the tesseract warning.
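
The same exit-code check can be done from Python, which is closer to how a wrapper library would invoke the binary. The following is a sketch only; `run_and_report` is a hypothetical helper, and the demonstration runs the Python interpreter instead of tesseract so that it works on any machine.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command; return (exit code, stdout, stderr) without raising."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Demonstrated with the Python interpreter so the sketch runs anywhere;
# for the case in this thread the command would be
# ["tesseract", "img.jpg", "out", "--psm", "0"].
code, out, err = run_and_report(
    [sys.executable, "-c", "import sys; sys.stderr.write('warn'); sys.exit(3)"]
)
print(code, repr(err))
```

Capturing stderr separately makes it possible to tell a warning printed by a successful run apart from a non-zero exit code.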

ricardomga · 2018-02-27T18:35:52Z

The exit code when run in the terminal is 0, but when run through pytesseract it is not. Do you have any idea why?
Is there any way of knowing what the exit code 3221225477 means?
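
One way to approach the question is to look at the code in hexadecimal: 3221225477 is 0xC0000005, which on Windows is the NTSTATUS value STATUS_ACCESS_VIOLATION. The sketch below assumes the process ran on Windows (the thread does not state the original reporter's platform); `describe_exit_code` and the small lookup table are hypothetical and deliberately incomplete.

```python
# A few well-known Windows NTSTATUS values (not an exhaustive list).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_code(code):
    """Return the hex form of an exit code and a name if it is a known NTSTATUS."""
    unsigned = code & 0xFFFFFFFF  # normalise negative (signed 32-bit) values
    return f"0x{unsigned:08X}", KNOWN_NTSTATUS.get(unsigned, "unknown")


print(describe_exit_code(3221225477))
```

If that reading applies, the value would suggest that the tesseract process crashed rather than reported an OCR error, which would also be consistent with the empty error string in the traceback.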

bozhodimitrov · 2018-02-27T19:12:09Z

You can patch the pytesseract library temporarily on line 133 and you can print the command with:

print(' '.join(command))

PS: We have a new function image_to_osd. You can try your example images with it.
Feel free to reopen if you have any other comments/questions.
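
A minimal sketch of calling `image_to_osd` might look like the following. It assumes that pytesseract and a Tesseract installation with OSD data are available, and that the dict output uses the key names `rotate` and `script`; `read_osd` and `describe` are hypothetical helpers introduced only for this example.

```python
def read_osd(image):
    """Return orientation and script detection results as a dict."""
    import pytesseract  # imported lazily; needs a working Tesseract install

    return pytesseract.image_to_osd(image, output_type=pytesseract.Output.DICT)


def describe(osd):
    """Format the fields most callers care about; the key names are assumed."""
    return f"rotate by {osd['rotate']} degrees, script {osd['script']}"


# Formatting demonstrated on a hand-written dict so no OCR run is needed.
print(describe({"rotate": 0, "script": "Latin"}))
```

With an image on disk, `describe(read_osd("img.jpg"))` would run the detection and format the result in one step.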

me-suzy · 2022-10-08T07:39:51Z

> You can patch the pytesseract library temporarily on line 133 and you can print the command with:
> print(' '.join(command))
> PS: We have a new function image_to_osd. You can try your example images with it. Feel free to reopen if you have any other comments/questions.

your pytesseract.py doesn't exist anymore. Please upload again.

stefan6419846 · 2022-10-08T09:08:26Z

> your pytesseract.py doesn't exist anymore. Please upload again.

The comment is from 2018, so things might have changed.

The file still exists, although the directory structure has been migrated and this file is available at https://github.com/madmaze/pytesseract/blob/master/pytesseract/pytesseract.py now. At the moment, you will have to add the print statement to this line:

pytesseract/pytesseract/pytesseract.py

Line 253 in 32454d2

me-suzy · 2022-10-08T09:15:09Z

OK, thanks, I downloaded and replaced the file.

Now I have another problem with this pytesseract.py:

    import pytesseract
  File "C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\pytesseract\__init__.py", line 70
    <title>pytesseract/__init__.py at master · madmaze/pytesseract</title>
                                             ^
SyntaxError: invalid character '·' (U+00B7)

stefan6419846 · 2022-10-08T09:16:33Z

You did not download the actual (raw) file, but the rendered HTML code from GitHub.
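
To illustrate the difference: a `github.com/.../blob/...` address serves an HTML page, while the file itself is served from `raw.githubusercontent.com`. The sketch below converts one form to the other; `github_blob_to_raw` is a hypothetical helper, and it assumes a branch or commit name without slashes.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    raw_path = "/" + "/".join(segments[:2] + segments[3:])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


raw_url = github_blob_to_raw(
    "https://github.com/madmaze/pytesseract/blob/master/pytesseract/pytesseract.py"
)
print(raw_url)
```

In most cases it is simpler to restore the package with `pip install --force-reinstall pytesseract` than to replace installed files by hand.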

me-suzy · 2022-10-08T10:32:06Z

> > your pytesseract.py doesn't exist anymore. Please upload again.
>
> The comment is from 2018, so things might have changed.
>
> The file still exists, although the directory structure has been migrated and this file is available at https://github.com/madmaze/pytesseract/blob/master/pytesseract/pytesseract.py now. At the moment, you will have to add the print statement to this line:
>
> pytesseract/pytesseract/pytesseract.py
>
> Line 253 in 32454d2

OK, I downloaded the file and changed the line, but I get the same error.

Can you please attach the file after editing it with one of these two commands?

    print(' '.join(command))
    print(' '.join(cmd_args))

I really don't understand where to change the file. I have changed it many times and it didn't work. Please edit it and attach the new version here.

stefan6419846 · 2022-10-08T10:46:36Z

I still do not understand what you want to achieve: what is your intent in commenting on this old issue and trying to make changes there? If it is related to #455, please answer the actual questions there. You will (usually) never be able to fix an issue just by printing something to the terminal. The print(' '.join(cmd_args)) call only lets you see the arguments pytesseract passes to Tesseract, so that you can debug possible call issues further.

me-suzy · 2022-10-08T11:01:32Z

OK, this is my problem, and I don't know what to do. My Python code (which converts a PDF to a text file with OCR) is very good, but it cannot succeed because of this error.

Please tell me how to fix it.

stefan6419846 · 2022-10-09T13:30:02Z

As mentioned previously, please keep these issues separate. The issue from your last comment is already discussed in #455, and asking twice will not change anything about the support.

bozhodimitrov closed this as completed Apr 15, 2018

Helyux mentioned this issue Apr 26, 2018

Trying to OCR a jpeg but getting [Error 3221225477]? openpaperwork/pyocr#97

Closed

stefan6419846 mentioned this issue Oct 8, 2022

raise TesseractError(proc.returncode, get_errors(error_string)) pytesseract.pytesseract.TesseractError: (3221225477, '') #455

Closed
