Add basic unit tests #215

Merged Jul 30, 2019 (32 commits)

Contributor

@nok nok commented Jul 27, 2019

Hello,

I added unit tests covering all the common methods, along with additional test data.

In the end I wasn't able to use a bitmap file for the tests, because Tesseract (Leptonica) failed on it.
This failure is related to an open issue: tesseract-ocr/tesseract#2558.

Nevertheless, the other tests pass on Python 2.7, Python 3.6 and Python 3.7.

[Image: Screenshot 2019-07-27 at 19 18 38]

nok added 21 commits July 25, 2019 23:58
…es Add tests to validate the output of the method `image_to_data`
@nok nok mentioned this pull request Jul 27, 2019
Contributor Author

nok commented Jul 27, 2019

Evidently, Tesseract v3.04.01 with Leptonica has trouble with gif images.

Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Warning in pixReadMemGif: writing to a temp file, not directly to memory
Error in pixReadStreamGif: Can't use giflib-5.1.2; suggest 5.1.1 or earlier
Error in pixReadStream: gif: no pix returned
Error in pixRead: pix not read
Error in pixReadMemGif: pix not read
Error in pixReadMem: gif: no pix returned
Error during processing.

@nok nok changed the title from "Add more unit tests" to "Add basic unit tests" Jul 27, 2019
@nok nok mentioned this pull request Jul 27, 2019
@bozhodimitrov
Collaborator

@johnthagen please review whenever you have some spare time and let me know if everything looks good for the tests.

…sing `isort` Make method `predict` private by adding a leading underscore `_predict`


@pytest.mark.parametrize('test_file', [
# https://github.com/tesseract-ocr/tesseract/issues/2558
Contributor

Can these tests be turned on for tesseract 4.0 / bionic?

Contributor Author

It also depends on the behaviour and version of Leptonica. On my system I have the newest versions of Tesseract and Leptonica, and it still doesn't work. Nevertheless, I will test it with the Travis images.

JFYI, we can skip tests with this decorator:

@pytest.mark.skipif(
    TESSERACT_VERSION[0] < 4,
    reason='requires tesseract >= 4'
)
# ...
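
The decorator above presumes that `TESSERACT_VERSION` is a tuple of integers. How the PR actually builds that tuple is not shown in this thread, so the following is only a minimal sketch under that assumption: the helper name `parse_version` is hypothetical, and it simply pulls the first dotted number out of a version banner such as the one quoted in the error log.

```python
import re


def parse_version(banner):
    """Return the first dotted version number in `banner` as a tuple of ints.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    match = re.search(r'(\d+(?:\.\d+)+)', banner)
    if match is None:
        raise ValueError('no version number found in %r' % banner)
    return tuple(int(part) for part in match.group(1).split('.'))


# Banner line taken from the error log quoted in this thread.
BANNER = 'Tesseract Open Source OCR Engine v3.04.01 with Leptonica'
TESSERACT_VERSION = parse_version(BANNER)

print(TESSERACT_VERSION)         # (3, 4, 1)
print(TESSERACT_VERSION[0] < 4)  # True: the skipif condition would hold
```

Comparing only the first element keeps the condition readable, and a tuple of integers also compares correctly as a whole (for example `(3, 4, 1) < (4, 0, 0)`) if a finer-grained check is ever needed.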

# os.path.join(DATA_DIR, 'test.bmp'),
# os.path.join(DATA_DIR, 'test.gif'),
os.path.join(DATA_DIR, 'test.jpg'),
Image.open(os.path.join(DATA_DIR, 'test.jpg')),
Contributor

The logic could probably be simplified if the Image.open() and os.path.join calls were moved into the test itself, so that the only thing parameterized is the file itself?

Contributor Author

I will split these tests into two separate tests. You're right, I mixed them up.

Contributor Author

gif images don't work with Tesseract 3, so I skip these cases.
bmp images don't work with any version of Tesseract.

Collaborator

@bozhodimitrov bozhodimitrov Jul 30, 2019

Hmm, interesting. According to the Leptonica docs (Image I/O), BMP and GIF are both supported. Leptonica is the underlying library that Tesseract uses.

Collaborator

Also, as far as I can see from the Travis CI report, the gif test passes on bionic.
So the problem is indeed the Leptonica version: leptonica-1.73 on xenial vs. leptonica-1.75.3 on bionic (which is itself a bit old, dated Feb 16, 2018).

tests/test_pytesseract.py (outdated, resolved)
tox.ini (outdated, resolved)
nok added 7 commits July 30, 2019 16:37
…ting # Please enter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit.
…esseract.tesseract_cmd` at the end of the test
Contributor Author

nok commented Jul 30, 2019

Thanks for your review and feedback, @int3l and @johnthagen.
Could you pull the new changes and review them again, please?

In general, neither Tesseract 3 nor Tesseract 4 can handle bitmap images, but Tesseract 4 can handle gif images. The tests account for both cases.
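
One way to picture how the tests could account for these cases is a small helper that picks the input files by Tesseract major version. This is only a sketch, not the code merged in this PR: the function name `supported_test_files` and the `DATA_DIR` value are assumptions, and the rule is keyed on the Tesseract version as in the summary above, even though the review discussion suggests the Leptonica version is the underlying factor.

```python
import os

# Assumed location of the test images; the PR's DATA_DIR may differ.
DATA_DIR = os.path.join('tests', 'data')


def supported_test_files(tesseract_major):
    """Return the image paths expected to work for a Tesseract major version.

    Hypothetical helper for illustration only.
    """
    names = ['test.jpg']
    if tesseract_major >= 4:
        # gif input is reported to work with Tesseract 4 only.
        names.append('test.gif')
    # test.bmp is left out for every version, as reported in this thread.
    return [os.path.join(DATA_DIR, name) for name in names]


print([os.path.basename(p) for p in supported_test_files(3)])  # ['test.jpg']
print([os.path.basename(p) for p in supported_test_files(4)])  # ['test.jpg', 'test.gif']
```

A list built this way could be passed to `pytest.mark.parametrize`, which keeps the version logic in one place instead of commenting individual entries in and out.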

Contributor

@johnthagen johnthagen left a comment

LGTM. Thanks for this PR.

@bozhodimitrov bozhodimitrov merged commit d416a42 into madmaze:master Jul 30, 2019
Collaborator

bozhodimitrov commented Jul 30, 2019

Once again, @nok and @johnthagen, thank you for the contributions and the time spent on the tasks.
@nok, please let me know if you don't mind being included in the contributors credit section of the project.

Contributor Author

nok commented Jul 31, 2019

> Once again, @nok and @johnthagen, thank you for the contributions and the time spent on the tasks.
> @nok, please let me know if you don't mind being included in the contributors credit section of the project.

@int3l It was fun and I learned something new 😄. Yes, please add me to the list of contributors, thanks!
