tests/pytesseract_test.py::test_image_to_data_common_output[dict] FAILED #406

Closed
mandree opened this issue Jan 26, 2022 · 4 comments

mandree commented Jan 26, 2022

Hello namesake,

The self-test suite fails on FreeBSD for pytesseract 0.3.8 and 0.3.9 with various Python 3.x versions:

>           assert 0 <= confidence_values[-1] <= 100
E           TypeError: '<=' not supported between instances of 'int' and 'str'
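The error comes from Python 3 refusing to order an `int` against a `str`. In the chained comparison `0 <= x <= 100`, the left operand `0` is an `int`, so a confidence value that is still a string raises `TypeError` instead of returning `False`. A minimal sketch with builtins only; `in_confidence_range` is a hypothetical helper that mirrors the test's assertion:

```python
def in_confidence_range(value):
    """Hypothetical helper mirroring the test's check: 0 <= value <= 100."""
    return 0 <= value <= 100


print(in_confidence_range(92))  # True

# A confidence that is still a str cannot be ordered against the int 0,
# so Python 3 raises TypeError instead of returning False.
try:
    in_confidence_range('92.865524')
except TypeError as exc:
    print(exc)  # '<=' not supported between instances of 'int' and 'str'
```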

Full story:

GLOB sdist-make: /usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/setup.py
py39 create: /usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/.tox/py39
py39 installdeps: numpy, pandas, -r/usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/requirements-dev.txt
py39 inst: /usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/.tox/.tmp/package/1/pytesseract-0.3.9.zip
py39 installed: acme==1.22.0,affine==2.3.0,alabaster==0.7.12,appdirs==1.4.4,asn1crypto==1.4.0,astroid==2.9.0,atomicwrites==1.4.0,attrs==21.3.0,Babel==2.9.1,black==21.12b0,blinker==1.4,boto==2.49.0,Bottleneck==1.3.2,breathe==4.31.0,certbot==1.22.0,certifi==2021.10.8,cffi==1.15.0,cfgv==3.3.1,chardet==4.0.0,chrome-gnome-shell==0.0.0,click==8.0.3,click-plugins==1.1.1,cligj==0.7.2,cloudpickle==1.3.0,colorama==0.4.4,commonmark==0.9.1,ConfigArgParse==1.5.3,configobj==5.0.6,coverage==4.5.4,cryptography==3.3.2,cssselect==1.1.0,cycler==0.11.0,Cython==0.29.26,decorator==5.1.1,deprecation==2.1.0,distlib==0.3.4,distro==1.6.0,dnspython==2.1.0,docutils==0.17.1,entrypoints==0.3,evdev==1.4.0,eyed3==0.9.6,fastest-pkg==0.2.0,filelock==3.4.2,filetype==1.0.7,Fiona==1.8.20,Flask==2.0.2,Flask-WTF==0.15.1,freezegun==1.0.0,future==0.18.2,GDAL==3.3.3,geojson==2.3.0,geopandas==0.10.2,httplib2==0.20.2,icdiff==2.0.4,identify==2.4.5,idna==2.10,imageio==2.9.0,imageio-ffmpeg==0.4.5,imagesize==1.3.0,importlib-metadata==4.8.1,importlib-resources==5.4.0,incremental==21.3.0,iniconfig==0.0.0,ipython-genutils==0.2.0,iso-639==0.4.5,iso3166==1.0.1,iso8601==0.1.16,isodate==0.6.1,isort==5.10.1,itsdangerous==2.0.1,jedi==0.18.0,jeepney==0.7.1,Jinja2==3.0.1,joblib==1.1.0,josepy==1.11.0,jq==1.2.1,jsonpatch==1.21,jsonpointer==2.0,jsonschema==4.2.1,jupyter-core==4.9.1,keyring==18.0.1,keyrings.alt==3.1.1,kiwisolver==1.3.2,lazy-object-proxy==1.7.1,lensfun==0.3.95,libxml2-python==2.9.12,lxml==4.7.1,Markdown==3.3.4,MarkupSafe==2.0.1,matplotlib==3.4.3,matplotlib-scalebar==0.8.0,mccabe==0.6.1,meson==0.60.3,minidb==2.0.5,mock==3.0.5,mongoengine==0.20.0,more-itertools==8.12.0,munch==2.5.0,mutagen==1.45.1,mypy-extensions==0.4.3,nbformat==5.1.3,networkx==2.6.3,nltk==3.4.1,nodeenv==1.6.0,nose==1.3.7,numexpr==2.8.1,numpy==1.20.3,oauthlib==1.1.2,olefile==0.46,OWSLib==0.25.0,packaging==21.3,pafy==0.5.5,pandas==1.2.5,parsedatetime==2.6,parso==0.8.3,pathspec==0.9.0,pbr==5.5.0,pdftotext==2.2.2,Pillow==8.2.0,pkginfo==1.8.2,platformdirs==2.4.1,plotly==4.14.3,pluggy==0.13.1,ply==3.11,pre-commit==2.17.0,psutil==5.8.0,psycopg2==2.9.2,pwquality==1.4.4,py==1.9.0,pybind11==2.9.0,pycairo==1.18.1,pycodestyle==2.8.0,pycountry==18.5.26,pycparser==2.21,pycryptodome==3.12.0,pycurl==7.44.1,pydot==1.4.2,Pygments==2.7.2,PyGObject==3.38.0,pygraphviz==1.6,pyjq==2.4.0,PyJWT==2.3.0,pylint==2.12.2,pymongo==3.12.0,pyOpenSSL==20.0.1,pypa-docs-theme==0.0.1,pyparsing==3.0.6,pypng==0.0.17,pyproj==3.2.1,PyQRCode==1.2.1,PyQt-builder==1.9.1,PyQt5-sip==12.9.0,pyRFC3339==1.1,pyrsgis==0.4.1,pyrsistent==0.14.11,pyserial==3.5,PySocks==1.7.1,PyStemmer==2.0.1,pytesseract==0.3.9,pytest==4.6.11,python-dateutil==2.8.1,python-docs-theme==2018.2,python-magic==0.4.15,pytz==2021.3,pyudev==0.22.0,PyWavelets==1.2.0,pyxdg==0.27,PyYAML==5.4.1,QScintilla==2.13.0,rasterio==1.2.10,recommonmark==0.5.0,regex==2020.7.14,requests==2.25.1,requests-mock==1.9.3,requests-toolbelt==0.9.1,retrying==1.3.3,scikit-image==0.19.1,scikit-learn==1.0.2,scikit-sparse==0.4.6,scipy==1.7.1,SCons==4.2.0,SecretStorage==3.3.1,setuptools-scm==6.3.2,Shapely==1.8.0,simplejson==3.17.6,sip==5.5.0,six==1.16.0,snowballstemmer==2.2.0,snuggs==1.4.7,Sphinx==4.3.1,sphinx-markdown-tables==0.0.15,sphinx-rtd-theme==1.0.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,sphinxcontrib-websupport==1.2.4,sqlite3==0.0.0,streamlink==2.1.2,termcolor==1.1.0,tifffile==2021.8.30,Tkinter==0.0.0,toml==0.10.2,tomli==1.2.3,tornado==6.1,towncrier==19.2.0,tox==3.12.1,tqdm==4.62.3,traitlets==5.1.1,typed-ast==1.5.1,typing-extensions==3.10.0.2,urllib3==1.26.7,urlwatch==2.24,urwid==2.1.2,urwid-readline==0.13,vcversioner==2.16.0.0,virtualenv==20.13.0,wcwidth==0.1.8,webencodings==0.5.1,websocket-client==0.58.0,websockets==10.1,Werkzeug==2.0.2,wrapt==1.13.3,WTForms==2.1,wxPython==4.0.7,xlrd==2.0.1,xmltodict==0.12.0,ydiff==1.2,zipp==3.4.0,zope.component==4.2.2,zope.event==4.1.0,zope.interface==5.3.0
py39 run-test-pre: PYTHONHASHSEED='2942057313'
py39 run-test: commands[0] | python -bb -m pytest tests
======================================================= test session starts ========================================================
platform freebsd13 -- Python 3.8.12, pytest-4.6.11, py-1.9.0, pluggy-0.13.1 -- /usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/.tox/py39/bin/python
cachedir: .tox/py39/.pytest_cache
rootdir: /usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9, inifile: tox.ini
plugins: requests-mock-1.9.3
collected 47 items                                                                                                                 

tests/pytesseract_test.py::test_image_to_string_with_image_type[jpg] PASSED                                                  [  2%]
tests/pytesseract_test.py::test_image_to_string_with_image_type[pgm] PASSED                                                  [  4%]
tests/pytesseract_test.py::test_image_to_string_with_image_type[png] PASSED                                                  [  6%]
tests/pytesseract_test.py::test_image_to_string_with_image_type[ppm] PASSED                                                  [  8%]
tests/pytesseract_test.py::test_image_to_string_with_image_type[tiff] PASSED                                                 [ 10%]
tests/pytesseract_test.py::test_image_to_string_with_image_type[gif] PASSED                                                  [ 12%]
tests/pytesseract_test.py::test_image_to_string_with_image_type[webp] PASSED                                                 [ 14%]
tests/pytesseract_test.py::test_image_to_string_with_args_type[path_str] PASSED                                              [ 17%]
tests/pytesseract_test.py::test_image_to_string_with_args_type[image_object] PASSED                                          [ 19%]
tests/pytesseract_test.py::test_image_to_string_with_numpy_array PASSED                                                      [ 21%]
tests/pytesseract_test.py::test_image_to_string_european PASSED                                                              [ 23%]
tests/pytesseract_test.py::test_image_to_string_batch PASSED                                                                 [ 25%]
tests/pytesseract_test.py::test_image_to_string_multiprocessing PASSED                                                       [ 27%]
tests/pytesseract_test.py::test_image_to_string_timeout PASSED                                                               [ 29%]
tests/pytesseract_test.py::test_la_image_to_string PASSED                                                                    [ 31%]
tests/pytesseract_test.py::test_image_to_boxes PASSED                                                                        [ 34%]
tests/pytesseract_test.py::test_image_to_osd PASSED                                                                          [ 36%]
tests/pytesseract_test.py::test_image_to_pdf_or_hocr[pdf] PASSED                                                             [ 38%]
tests/pytesseract_test.py::test_image_to_pdf_or_hocr[hocr] PASSED                                                            [ 40%]
tests/pytesseract_test.py::test_image_to_alto_xml PASSED                                                                     [ 42%]
tests/pytesseract_test.py::test_image_to_alto_xml_support SKIPPED                                                            [ 44%]
tests/pytesseract_test.py::test_image_to_data__pandas_support SKIPPED                                                        [ 46%]
tests/pytesseract_test.py::test_image_to_data__pandas_output PASSED                                                          [ 48%]
tests/pytesseract_test.py::test_image_to_data_common_output[bytes] PASSED                                                    [ 51%]
tests/pytesseract_test.py::test_image_to_data_common_output[dict] FAILED                                                     [ 53%]
tests/pytesseract_test.py::test_image_to_data_common_output[string] PASSED                                                   [ 55%]
tests/pytesseract_test.py::test_wrong_prepare_type[int] PASSED                                                               [ 57%]
tests/pytesseract_test.py::test_wrong_prepare_type[float] PASSED                                                             [ 59%]
tests/pytesseract_test.py::test_wrong_prepare_type[none] PASSED                                                              [ 61%]
tests/pytesseract_test.py::test_wrong_tesseract_cmd[executable_name] PASSED                                                  [ 63%]
tests/pytesseract_test.py::test_wrong_tesseract_cmd[absolute_path] PASSED                                                    [ 65%]
tests/pytesseract_test.py::test_main_not_found_cases PASSED                                                                  [ 68%]
tests/pytesseract_test.py::test_proper_oserror_exception_handling[permission_error_path] PASSED                              [ 70%]
tests/pytesseract_test.py::test_proper_oserror_exception_handling[invalid_path] PASSED                                       [ 72%]
tests/pytesseract_test.py::test_get_languages[default_empty_config] PASSED                                                   [ 74%]
tests/pytesseract_test.py::test_get_languages[custom_tessdata_dir] PASSED                                                    [ 76%]
tests/pytesseract_test.py::test_get_languages[incorrect_tessdata_dir] PASSED                                                 [ 78%]
tests/pytesseract_test.py::test_get_languages[invalid_tessdata_dir] PASSED                                                   [ 80%]
tests/pytesseract_test.py::test_get_languages[invalid_config] PASSED                                                         [ 82%]
tests/pytesseract_test.py::test_file_to_dict[input_args0-expected0] PASSED                                                   [ 85%]
tests/pytesseract_test.py::test_file_to_dict[input_args1-expected1] PASSED                                                   [ 87%]
tests/pytesseract_test.py::test_file_to_dict[input_args2-expected2] PASSED                                                   [ 89%]
tests/pytesseract_test.py::test_get_tesseract_version[3.5.0-3.5.0] PASSED                                                    [ 91%]
tests/pytesseract_test.py::test_get_tesseract_version[4.1-a8s6f8d3f-4.1] PASSED                                              [ 93%]
tests/pytesseract_test.py::test_get_tesseract_version[v4.0.0-beta1.9-4.0.0] PASSED                                           [ 95%]
tests/pytesseract_test.py::test_get_tesseract_version_invalid[-Invalid tesseract version: ""] PASSED                         [ 97%]
tests/pytesseract_test.py::test_get_tesseract_version_invalid[invalid-Invalid tesseract version: "invalid"] PASSED           [100%]

============================================================= FAILURES =============================================================
______________________________________________ test_image_to_data_common_output[dict] ______________________________________________

test_file_small = '/usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/tests/data/test-small.jpg', output = 'dict'

    @pytest.mark.skipif(
        TESSERACT_VERSION[:2] < (3, 5),
        reason='requires tesseract >= 3.05',
    )
    @pytest.mark.parametrize(
        'output',
        [Output.BYTES, Output.DICT, Output.STRING],
        ids=['bytes', 'dict', 'string'],
    )
    def test_image_to_data_common_output(test_file_small, output):
        """Test and compare the type of the result."""
        result = image_to_data(test_file_small, output_type=output)
        expected_dict_result = {
            'level': [1, 2, 3, 4, 5],
            'page_num': [1, 1, 1, 1, 1],
            'block_num': [0, 1, 1, 1, 1],
            'par_num': [0, 0, 1, 1, 1],
            'line_num': [0, 0, 0, 1, 1],
            'word_num': [0, 0, 0, 0, 1],
            'left': [0, 11, 11, 11, 11],
            'top': [0, 11, 11, 11, 11],
            'width': [79, 60, 60, 60, 60],
            'height': [47, 24, 24, 24, 24],
            # 'conf': ['-1', '-1', '-1', '-1', 96],
            'text': ['', '', '', '', 'This'],
        }
    
        if output is Output.BYTES:
            assert isinstance(result, bytes)
    
        elif output is Output.DICT:
            confidence_values = result.pop('conf', None)
            assert confidence_values is not None
>           assert 0 <= confidence_values[-1] <= 100
E           TypeError: '<=' not supported between instances of 'int' and 'str'

tests/pytesseract_test.py:318: TypeError
========================================= 1 failed, 44 passed, 2 skipped in 18.43 seconds ==========================================
ERROR: InvocationError for command /usr/ports/graphics/py-pytesseract/work-py39/pytesseract-0.3.9/.tox/py39/bin/python -bb -m pytest tests (exited with code 1)
_____________________________________________________________ summary ______________________________________________________________
ERROR:   py39: commands failed
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jan 26, 2022
Release Notes: https://github.com/madmaze/pytesseract/releases/tag/v0.3.9
* bump minimal Python requirement to 3.7, upstream de-supports EOL 3.6
* upstream made build tweaks

Packager Notes:
* test_image_to_data_common_output[dict] is failing for me,
  => madmaze/pytesseract#406

* if your tox package complains it cannot find py-filelock,
  update the latter to 3.4.2_1 (commit 97cae4a by yuri@ 2022-01-19)
5u623l20 pushed a commit to 5u623l20/freebsd-ports that referenced this issue Jan 27, 2022
bozhodimitrov (Collaborator) commented Jan 27, 2022

This is so weird, actually. How the hell do those tests pass under Linux, then?
Clearly the conversion breaks somewhere for the FreeBSD setup.

PS: I will have to create a dev environment for myself, so this might take some time.
@mandree, can you let me know the string value of confidence_values[-1] in the failing test?
I suspect that it is some negative or bogus value.

PS2: This should be resolved now if the value was just a negative number.

bozhodimitrov pushed a commit that referenced this issue Jan 27, 2022
Account for negative values. Fixes #406
bozhodimitrov pushed a commit that referenced this issue Jan 27, 2022
Account for negative values. Fixes #406
mandree (Author) commented Jan 27, 2022

The patch in 06e7f80 is insufficient.

confidence_values is [-1, -1, -1, -1, '92.865524'] and that explains it; speaking for Python 3.8 and tesseract 5.0.1:
You cannot construct an int from this '92.865524' string because it does not represent an integer.
You can, however, construct a float and then round() or int() it; int() truncates.
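A quick way to see the difference with builtins only: `int()` rejects a decimal string outright, going through `float()` first works, and `int()` and `round()` then differ in how they drop the fraction. The value below is the one reported in this comment:

```python
raw = '92.865524'  # confidence string as reported with tesseract 5.0.1

# int() only accepts integer literals, so a decimal string raises ValueError.
try:
    int(raw)
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '92.865524'

as_float = float(raw)
print(int(as_float))    # 92: int() truncates toward zero
print(round(as_float))  # 93: round() goes to the nearest integer

# Negative integer strings such as '-1' parse either way.
print(int('-1'), int(float('-1')))  # -1 -1
```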

So I am trying this: in the try: block, I have changed val = int(...) to val = int(float(...)), and with that it succeeds on Python 3.7 through 3.9. I cannot currently test 3.10, this needs more work on FreeBSD since it sees distutils stuff and barfs.

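```python
...
for i, head in enumerate(header):
    result[head] = list()
    for row in rows:
        if len(row) <= i:
            continue

        if i != str_col_idx:
            try:
                # go through float() so decimal confidences such as
                # '92.865524' (tesseract 5) parse; int() truncates them
                val = int(float(row[i]))
            except ValueError:
                val = row[i]
        else:
            val = row[i]

        result[head].append(val)

return result
```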
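For anyone who wants to try the same column-wise conversion outside pytesseract, here is a self-contained sketch. It is not pytesseract's own function: the name `tsv_to_dict`, the tab delimiter default, and the sample rows (a trimmed version of the test's columns, with the reported tesseract 5 confidence) are illustrative assumptions.

```python
def tsv_to_dict(tsv, delimiter='\t', str_col_idx=-1):
    """Hypothetical sketch: turn tesseract-style TSV into {column: [values]}.

    Every column except the text column (str_col_idx) goes through
    int(float(...)), so '-1', '96' and '92.865524' all become ints.
    """
    # strip('\n') rather than strip(): a trailing tab on the last row marks
    # an empty text cell and must survive.
    rows = [line.split(delimiter) for line in tsv.strip('\n').split('\n')]
    if len(rows) < 2:
        return {}  # header only, no data rows
    header = rows.pop(0)
    if str_col_idx < 0:
        str_col_idx += len(header)

    result = {}
    for i, head in enumerate(header):
        result[head] = []
        for row in rows:
            if len(row) <= i:
                continue  # short row: nothing in this column
            if i != str_col_idx:
                try:
                    val = int(float(row[i]))  # truncates fractional confidences
                except ValueError:
                    val = row[i]  # keep anything non-numeric as-is
            else:
                val = row[i]  # the text column always stays a string
            result[head].append(val)
    return result


sample = (
    'level\tconf\ttext\n'
    '1\t-1\t\n'
    '5\t92.865524\tThis\n'
)
data = tsv_to_dict(sample)
print(data)  # {'level': [1, 5], 'conf': [-1, 92], 'text': ['', 'This']}
assert 0 <= data['conf'][-1] <= 100  # the test's check now compares ints
```

Excluding the text column from conversion matters: a recognized word such as "1996" should stay a string, while every numeric column ends up with a single type.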

freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jan 27, 2022
...via tsv file; this was the one failing self-test, now passes.

madmaze/pytesseract#406
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jan 28, 2022
Release Notes: https://github.com/madmaze/pytesseract/releases/tag/v0.3.9
* bump minimal Python requirement to 3.7, upstream de-supports EOL 3.6
* upstream made build tweaks

Packager Notes:
* test_image_to_data_common_output[dict] is failing for me,
  => madmaze/pytesseract#406

* if your tox package complains it cannot find py-filelock,
  update the latter to 3.4.2_1 (commit 97cae4a by yuri@ 2022-01-19)

(cherry picked from commit ee92f58)
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Jan 28, 2022
...via tsv file; this was the one failing self-test, now passes.

madmaze/pytesseract#406
(cherry picked from commit fb684ee)
bozhodimitrov (Collaborator)

> I cannot currently test 3.10, this needs more work on FreeBSD since it sees distutils stuff and barfs.

distutils is axed already in the latest release.

For the float addition -- I will add this as well, because it doesn't harm the existing behavior, but it might need a change in the future.
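If that future change were to keep tesseract 5's fractional confidences instead of truncating them, one possible cell converter is sketched below. `parse_cell` is hypothetical and not part of pytesseract; like the excerpt above, it would only be applied to non-text columns. It returns an `int` for integral strings and a `float` for decimal ones:

```python
def parse_cell(raw):
    """Hypothetical converter: '-1' -> -1, '92.865524' -> 92.865524, other -> str."""
    try:
        return int(raw)  # integral strings such as '-1' or '96' stay ints
    except ValueError:
        pass
    try:
        return float(raw)  # decimal strings such as tesseract 5 confidences
    except ValueError:
        return raw  # non-numeric cells (e.g. empty strings) are kept unchanged


print([parse_cell(c) for c in ['-1', '96', '92.865524', '']])
# [-1, 96, 92.865524, '']
```

The trade-off: callers that assume integer confidences would start receiving floats for tesseract 5 output, whereas int(float(...)) keeps every numeric column integer-typed.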

bozhodimitrov pushed a commit that referenced this issue Jan 28, 2022
Account for negative values. Fixes #406
mandree (Author) commented Jan 28, 2022

> > I cannot currently test 3.10, this needs more work on FreeBSD since it sees distutils stuff and barfs.
>
> distutils is axed already in the latest release.

Yup, I'd seen that but hadn't investigated further, since it looked like it might be infrastructure work. Also, for a full, successful test run on Python 3.10 I think we should have pandas/NumPy as well, and that hinges on FreeBSD shipping NumPy 1.22 (the first version to formally support Python 3.10), which it does not provide yet.

> For the float addition -- I will add this as well, because it doesn't harm the existing behavior, but it might need a change in the future.

Thanks.
