Tesseract OCR doesn't work when Python script is converted to exe without console #5601

Closed
ugurcanakyuz opened this issue Mar 4, 2021 · 12 comments

Comments

@ugurcanakyuz

ugurcanakyuz commented Mar 4, 2021

I have an ML solution that uses Pytesseract, and I need to create an executable from it with PyInstaller. To build an executable that can call another exe (tesseract.exe), I followed https://stackoverflow.com/a/60679256/13080899. When I build the exe with a console, tesseract.exe is called and returns output, but when I build it without a console, Tesseract doesn't work. I couldn't find a solution. How can I solve this problem?

Here is my .spec file:

# -*- mode: python ; coding: utf-8 -*-
import sys
sys.setrecursionlimit(5000)

block_cipher = None


a = Analysis(['Cam_Choice.py'],
             pathex=['D:\\Project\\XXX'],
             binaries=[('config\\tesseract\\tesseract.exe', 'config\\tesseract')],
             datas=[],
             hiddenimports=['boto3'],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
a.datas += [('logo.ico', 'D:\\Project\\img\\logo.ico', "DATA")]

pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)
exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          [],
          name='XXX',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          upx_exclude=[],
          runtime_tmpdir=None,
          console=False,
          icon='D:\\Project\\img\\logo.ico')

P.S.: Because of the non-console mode, I can't debug the exe.

@bwoodsend
Member

Does your program or any of its dependencies use subprocess anywhere? And can you verify that your program works if you run using pythonw Cam_Choice.py?

@ugurcanakyuz
Author

pythonw Cam_Choice.py worked correctly. The code also works correctly when I run it in PyCharm, and when I build the exe with a console. However, when I switch to non-console mode, it doesn't work. I traced the code and found that the following lines are the ones that fail:
config_ = "--psm 13 --oem 1 -c tessedit_char_whitelist=0123456789."
rec = pytesseract.image_to_data(processed, output_type='data.frame', config= config_)

I use multithreading for other tasks, and those are not related to Tesseract. Tesseract runs in the main thread. I don't use subprocess anywhere and, to be honest, I don't know it well.

@bwoodsend
Member

Yep that's subprocess rearing its ugly head. See here where they do not set stdout to NULL and let it default to inheriting an invalid stream from its parent.

If you monkey patch your version of pytesseract to add an else to that if include_stdout: so it becomes:

    if include_stdout:
        kwargs['stdout'] = subprocess.PIPE
    else:
        kwargs['stdout'] = subprocess.DEVNULL

Then rerun your pyinstaller command but add the --clean flag:

pyinstaller --clean Cam_choice.spec

That should fix it?
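
To illustrate the general idea behind this suggestion, here is a minimal sketch (not pytesseract's actual code) of starting a child process with all three standard streams set explicitly, so that nothing is inherited from a parent that may have no valid console handles. The helper name `run_with_explicit_streams` and the child command are hypothetical.

```python
import subprocess
import sys


def run_with_explicit_streams(cmd):
    """Run cmd without inheriting any standard stream from the parent."""
    completed = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # the child reads nothing from the parent
        stdout=subprocess.PIPE,    # capture output instead of inheriting it
        stderr=subprocess.PIPE,
        check=True,                # raise CalledProcessError on non-zero exit
    )
    return completed.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical child command: the current interpreter printing one line.
    print(run_with_explicit_streams([sys.executable, "-c", "print('ok')"]))
```

Any stream left at its default of None is inherited from the parent, which is where a windowed (no-console) build can run into trouble.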

@ugurcanakyuz
Author

ugurcanakyuz commented Mar 5, 2021

I added the code above to the pytesseract sources in both my virtual environment and my default environment library paths. They now look like the following:

def subprocess_args(include_stdout=True):
    # See https://github.com/pyinstaller/pyinstaller/wiki/Recipe-subprocess
    # for reference and comments.

    kwargs = {
        'stdin': subprocess.PIPE,
        'stderr': subprocess.PIPE,
        'startupinfo': None,
        'env': environ,
    }

    if hasattr(subprocess, 'STARTUPINFO'):
        kwargs['startupinfo'] = subprocess.STARTUPINFO()
        kwargs['startupinfo'].dwFlags |= subprocess.STARTF_USESHOWWINDOW
        kwargs['startupinfo'].wShowWindow = subprocess.SW_HIDE

    if include_stdout:
        kwargs['stdout'] = subprocess.PIPE
    else:
        kwargs['stdout'] = subprocess.DEVNULL

    return kwargs

I recreated the exe with --clean flag. However, it didn't solve the problem. Why didn't it work?

@rokm
Member

rokm commented Mar 5, 2021

Just to clarify - "Tesseract not working" means that in non-console mode, it throws an exception and you get the "Failed to execute script " dialog? (As opposed to not returning any data or returning different data than in console mode).

If that's the case, let's try to find out what the error actually is: in your spec file under EXE, change the option debug=False to debug=True and rebuild your application. Then, after the "Failed to execute script" dialog, you should get two more dialogs with exception information (the first will show the exception type/name, the second will contain the traceback).

Alternatively, you can wrap your

config_ = "--psm 13 --oem 1 -c tessedit_char_whitelist=0123456789."
rec = pytesseract.image_to_data(processed, output_type='data.frame', config= config_)

calls in a try/except block and write the exception and its traceback into a file.

@ugurcanakyuz
Author

@rokm I already tried changing the debug option, but it didn't help at all. As far as I understand, to see debug output I also have to enable console mode, but that is exactly my problem: with console mode enabled and the command line displayed, my program works, but in non-console mode it doesn't work and gives no output.

    try:
        rec = pytesseract.image_to_data(processed, output_type='data.frame', config= config_)
    except:
        with open("error.txt", "w") as err:
            err.write(sys.exc_info()[1])

This gave me an empty text file.

@rokm
Member

rokm commented Mar 5, 2021

> @rokm I already tried changing the debug option, but it didn't help at all. As far as I understand, to see debug output I also have to enable console mode, but that is exactly my problem.

What version of PyInstaller are you using? If you are using PyInstaller 4.2, then the debug version of the bootloader should give you three error dialogs instead of just one: first the standard Failed to execute script <name>, then Error: <ExceptionType>, and then Traceback: <Traceback>. Is that not happening?

    try:
        rec = pytesseract.image_to_data(processed, output_type='data.frame', config= config_)
    except:
        with open("error.txt", "w") as err:
            err.write(sys.exc_info()[1])

If that gives you an error dialog again, it means that you're generating another exception in your exception handler. With PyInstaller 4.2 and debug=True, the second error dialog should reveal the error: write() argument must be str, not Exception.

You'll need to convert or format the Exception into a string yourself. Or, use

import traceback
traceback.print_exc(file=err)

to get the traceback printed into your file.
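
Put together, a minimal sketch of this logging approach might look as follows. The log path, the helper `log_current_exception`, and the failing function `risky_call` are all hypothetical stand-ins; in the real program the `try` body would be the `pytesseract.image_to_data(...)` call.

```python
import os
import tempfile
import traceback

# Assumption: an absolute path is used because the working directory of a
# windowed app may not be where you expect. The location is hypothetical.
LOG_PATH = os.path.join(tempfile.gettempdir(), "ocr_error.txt")


def log_current_exception(path=LOG_PATH):
    """Append the traceback of the exception being handled to a text file."""
    with open(path, "a", encoding="utf-8") as log_file:
        traceback.print_exc(file=log_file)


def risky_call():
    # Hypothetical stand-in for the failing OCR call.
    raise RuntimeError("simulated OCR failure")


try:
    risky_call()
except Exception:
    log_current_exception()
```

Because `traceback.print_exc` formats the exception itself, it avoids the "write() argument must be str" problem that passing an exception object to `write()` causes.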

@ugurcanakyuz
Author

I never suspected the PyInstaller version. I have used it for a long time without a problem. I just checked, and the version is 3.6. I will update it and try again.

@ugurcanakyuz
Author

I logged the error as @rokm recommended. Debug mode is now set to True. The tesseract.exe is already in the right directory.

Here is the error:

DEBUG:Core:Traceback (most recent call last):
  File "pytesseract\pytesseract.py", line 354, in get_tesseract_version
  File "subprocess.py", line 336, in check_output
  File "subprocess.py", line 403, in run
  File "subprocess.py", line 667, in __init__
  File "subprocess.py", line 890, in _get_handles
OSError: [WinError 6] The handle is invalid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Core.py", line 198, in __convert2text
  File "pytesseract\pytesseract.py", line 458, in image_to_data
  File "pytesseract\pytesseract.py", line 142, in wrapper
  File "pytesseract\pytesseract.py", line 361, in get_tesseract_version
pytesseract.pytesseract.TesseractNotFoundError: D:\Project\dist\config\tesseract\tesseract.exe is not installed or it's not in your PATH. See README file for more information.

DEBUG:botocore.hooks:Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
DEBUG:botocore.hooks:Changing event name from before-call.apigateway to before-call.api-gateway
DEBUG:botocore.hooks:Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
DEBUG:botocore.hooks:Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
DEBUG:botocore.hooks:Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
DEBUG:botocore.hooks:Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
DEBUG:botocore.hooks:Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
DEBUG:botocore.hooks:Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
DEBUG:botocore.hooks:Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
DEBUG:botocore.hooks:Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
DEBUG:botocore.hooks:Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
DEBUG:botocore.loaders:Loading JSON file: C:\Users\UGURCA~1\AppData\Local\Temp\_MEI103842\boto3\data\s3\2006-03-01\resources-1.json
DEBUG:botocore.loaders:Loading JSON file: C:\Users\UGURCA~1\AppData\Local\Temp\_MEI103842\botocore\data\endpoints.json
DEBUG:botocore.hooks:Event choose-service-name: calling handler <function handle_service_name_alias at 0x000001F71F7C19D8>
DEBUG:botocore.loaders:Loading JSON file: C:\Users\UGURCA~1\AppData\Local\Temp\_MEI103842\botocore\data\s3\2006-03-01\service-2.json
DEBUG:botocore.hooks:Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x000001F71F7609D8>
DEBUG:botocore.hooks:Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x000001F71F84DEA0>
DEBUG:botocore.hooks:Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x000001F71F7607B8>
DEBUG:botocore.endpoint:Setting s3 timeout as (60, 60)
DEBUG:botocore.loaders:Loading JSON file: C:\Users\UGURCA~1\AppData\Local\Temp\_MEI103842\botocore\data\_retry.json
DEBUG:botocore.client:Registering retry handlers for service: s3
DEBUG:boto3.resources.factory:Loading s3:s3
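
The traceback at the top of this log is a chained one: an OSError raised inside subprocess is caught and replaced by TesseractNotFoundError, so the final "not installed or not in your PATH" message hides the real cause. As a minimal sketch of how Python records this, the snippet below re-creates the pattern with hypothetical names (`ToolNotFoundError`, `get_version`) and reads the original error back from `__context__`.

```python
class ToolNotFoundError(Exception):
    """Hypothetical stand-in for pytesseract's TesseractNotFoundError."""


def get_version():
    try:
        # Simulated failure; in the log above the real error came from subprocess.
        raise OSError("The handle is invalid")
    except OSError:
        # Raising inside an except block stores the OSError on __context__.
        raise ToolNotFoundError("tool is not installed or it's not in your PATH")


try:
    get_version()
except ToolNotFoundError as exc:
    original = exc.__context__
    print(type(original).__name__, original)
```

When a "not found" error looks implausible, checking the first traceback in the chain (or `__context__` in code) usually points at the actual failure.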

@rokm
Member

rokm commented Mar 5, 2021

DEBUG:Core:Traceback (most recent call last):
  File "pytesseract\pytesseract.py", line 354, in get_tesseract_version
  File "subprocess.py", line 336, in check_output
  File "subprocess.py", line 403, in run
  File "subprocess.py", line 667, in __init__
  File "subprocess.py", line 890, in _get_handles
OSError: [WinError 6] The handle is invalid

It is indeed a subprocess problem, but not in the place where you patched it before.

This subprocess.check_output probably needs to have its stdin set to subprocess.DEVNULL.

@ugurcanakyuz
Author

ugurcanakyuz commented Mar 5, 2021

Finally, it looks like it is working now. Once I have tested it fully, I will close the issue. Thank you very much for your help.

@run_once
def get_tesseract_version():
    """
    Returns LooseVersion object of the Tesseract version
    """
    try:
        return LooseVersion(
            subprocess.check_output(
                [tesseract_cmd, '--version'],
                stderr=subprocess.STDOUT,
                env=environ,
                stdin=subprocess.DEVNULL,
            )
            .decode('utf-8')
            .split()[1]
            .lstrip(string.printable[10:]),
        )
    except OSError:
        raise TesseractNotFoundError()
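
For cases where editing the installed library is not practical, the same default can be applied from the outside. The following is only a sketch under that assumption, not something pytesseract or PyInstaller provides: a hypothetical decorator `with_devnull_stdin` wraps `subprocess.check_output` so that stdin defaults to DEVNULL unless the caller says otherwise.

```python
import functools
import subprocess
import sys


def with_devnull_stdin(func):
    """Wrap a subprocess-style callable so stdin defaults to DEVNULL."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # subprocess rejects `input` combined with `stdin`, so leave
        # calls that pass `input` untouched.
        if "input" not in kwargs:
            kwargs.setdefault("stdin", subprocess.DEVNULL)
        return func(*args, **kwargs)
    return wrapper


# Hypothetical name for the wrapped function.
check_output_no_stdin = with_devnull_stdin(subprocess.check_output)

if __name__ == "__main__":
    print(check_output_no_stdin([sys.executable, "--version"]))
```

Assigning the wrapper back to `subprocess.check_output` at program start would affect every caller in the process, so fixing the call site, as done above, is the cleaner option when it is available.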

@rokm
Member

rokm commented Mar 9, 2021

I see you have submitted the fix for this issue to pytesseract. Thanks!

@rokm rokm closed this as completed Mar 9, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022