Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move to python3 #158

Open
marijani101 opened this issue Nov 19, 2019 · 7 comments
Open

move to python3 #158

marijani101 opened this issue Nov 19, 2019 · 7 comments
Milestone

Comments

@marijani101
Copy link

As Python 2 is coming to an end, wouldn't it be better to migrate to Python 3?

@zuphilip
Copy link
Collaborator

The hocr-tools are working with Python 3 already. We support Python 2 and 3 together.

@stweil
Copy link
Collaborator

stweil commented Nov 20, 2019

Maybe some compatibility constructs can be removed from the code as soon as Python 2 is gone, but for the moment I think there is nothing to be done. @marijani101, did you notice problems with Python 3 which would require an action now? I only found that the README could be updated to mention Python 3 as well. Maybe you want to send a pull request for that?

@stweil
Copy link
Collaborator

stweil commented Jan 10, 2020

The Python 2 package names in the README.md should be replaced by Python 3 package names. @marijani101, can you send a pull request?

@FriedrichFroebel
Copy link

It seems like while the setup file still advertises Python 2, 269d63a basically drops this support in the most recent time. This contradicts with the following code, as f-strings have not been available before Python 3.6:

hocr-tools/setup.py

Lines 24 to 26 in 0ad95b3

'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',

@stweil
Copy link
Collaborator

stweil commented Dec 16, 2022

Right, thanks for reporting this. Do you want to send a pull request which removes all old entries? All Python versions before 3.7 are unsupported.

@FriedrichFroebel
Copy link

FriedrichFroebel commented Dec 16, 2022

I just did some more tests regarding version support and stumbled upon some more stuff which probably needs some attention (and is more or less related to Python 3 support):

  • CI fails due to beautifulsoup4 never being installed, but apparently being called by the tests (according to the code, the package is optional for regular installations):

    # ocrx_word argument
    not ok 17 - Failed: hocr-extract-images -U -p word-%03d.png -b ../testdata -e ocrx_word ../testdata/tess.hocr
    ---
    diag: |
    Traceback (most recent call last):
    File "/opt/hostedtoolcache/Python/3.7.15/x64/bin/hocr-extract-images", line 79, in <module>
    from bs4 import UnicodeDammit
    ModuleNotFoundError: No module named 'bs4'
    ...
  • Current lxml versions do not work with hocr-extract-images (fixed by using doc = html.document_fromstring(content.encode('utf-8'), parser=parser) instead:

    # ocrx_word argument
    not ok 17 - Failed: hocr-extract-images -U -p word-%03d.png -b ../testdata -e ocrx_word ../testdata/tess.hocr
    ---
    diag: |
    Traceback (most recent call last):
    File "/opt/hostedtoolcache/Python/3.7.15/x64/bin/hocr-extract-images", line 83, in <module>
    doc = html.document_fromstring(content, parser=parser)
    File "/opt/hostedtoolcache/Python/3.7.15/x64/lib/python3.7/site-packages/lxml/html/__init__.py", line 759, in     document_fromstring
    value = etree.fromstring(html, parser, **kw)
    File "src/lxml/etree.pyx", line 3257, in lxml.etree.fromstring
    File "src/lxml/parser.pxi", line 1911, in lxml.etree._parseMemoryDocument
    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
    ...
  • We might want to migrate CI from Travis to GitHub Actions as well.

  • We might want to improve the overall package structure to declare the current scripts as console scripts entry points and move the actual implementations to some common package (making code re-usage easier). This would allow us to use the standardized stdlib unittest module for testing as well (due to the package being importable) which enables easier integration with other tools like coverage.py to actually ensure that the code is properly tested if dropping old Python 2 compatibility code for example.

@stweil stweil added this to the v1.3.1 milestone Dec 19, 2022
@mrghosti3
Copy link

Any news on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants