Segfault in pixa_to_list #33

Closed
oskaritimperi opened this issue Feb 17, 2017 · 2 comments

@oskaritimperi

Hi, I've encountered a segfault in pixa_to_list and I can reproduce it consistently. I don't have any idea how to fix this though.

This image always makes tesserocr segfault:
[attached image: fail]

This image, on the other hand, works fine:
[attached image: success]

The code I'm using for testing is simple:

import tesserocr
from PIL import Image
import sys

print(tesserocr.tesseract_version())
print(tesserocr.get_languages())

png = Image.open(sys.argv[1]).convert('L')

# print(tesserocr.image_to_text(png))

with tesserocr.PyTessBaseAPI() as api:
    api.SetImage(png)
    boxes = api.GetComponentImages(tesserocr.RIL.WORD, True)
    for _, box, _, _ in boxes:
        # SetRectangle takes integer coordinates; pad the word box on all sides
        pad = int(box['h'] * 0.2)
        api.SetRectangle(box['x'] - pad, box['y'] - pad,
                         box['w'] + 2 * pad, box['h'] + 2 * pad)
        text = api.GetUTF8Text().strip()
        confidence = api.MeanTextConf()
        print(text, confidence)
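The loop pads each word box before calling SetRectangle, which takes integer coordinates. A minimal sketch of a helper that computes a symmetric padding and clamps the result to the image bounds could look like the following. padded_rect is a hypothetical name and is not part of tesserocr; it only assumes the box dicts have the 'x', 'y', 'w', 'h' keys used above.

```python
def padded_rect(box, pad_ratio, img_w, img_h):
    """Grow a {'x','y','w','h'} box by pad_ratio * height on every side.

    Hypothetical helper (not a tesserocr API). Returns integer
    (left, top, width, height) clamped to the image, in the argument
    order SetRectangle expects.
    """
    pad = int(box['h'] * pad_ratio)                  # whole pixels only
    left = max(0, box['x'] - pad)                    # don't cross the left edge
    top = max(0, box['y'] - pad)                     # don't cross the top edge
    right = min(img_w, box['x'] + box['w'] + pad)    # nor the right edge
    bottom = min(img_h, box['y'] + box['h'] + pad)   # nor the bottom edge
    return left, top, right - left, bottom - top


print(padded_rect({'x': 10, 'y': 10, 'w': 50, 'h': 20}, 0.25, 100, 100))
# (5, 5, 60, 30)
print(padded_rect({'x': 2, 'y': 0, 'w': 10, 'h': 20}, 0.25, 100, 100))
# (0, 0, 17, 25)
```

Inside the loop it would be called as api.SetRectangle(*padded_rect(box, 0.2, png.width, png.height)), so boxes near the image border never produce negative coordinates.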

Here is a crash report from OS X: crashreport.txt

Here is the output from a successful run (including version numbers and so on): success.txt

I'm using tesseract 3.05.00, which I compiled myself; I had the same problem with 3.04 and hoped the newer version would fix it.

Here are the relevant environment variables I used when I executed python setup.py install for tesserocr:

declare -x CFLAGS="-g -fno-omit-frame-pointer  -UNDEBUG -O0"
declare -x CPPFLAGS="-I/Users/otimpe/dev/tesseract-3.05.00/dist/include"
declare -x DYLD_LIBRARY_PATH="/Users/otimpe/dev/tesseract-3.05.00/dist/lib"
declare -x LDFLAGS="-L/Users/otimpe/dev/tesseract-3.05.00/dist/lib -g"
declare -x TESSDATA_PREFIX="/usr/local/share"
sirfz added a commit that referenced this issue Feb 18, 2017
@sirfz
Owner

sirfz commented Feb 18, 2017

Hi, this happened because GetComponentImages returns a NULL boxa for this image. I wasn't validating this, which caused pixa_to_list to run against an invalid pointer. I've added a check everywhere pixa_to_list is used, so an empty list is returned instead of segfaulting. Thanks for the catch :)

You can test this using the tesseract4 branch.
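The guard described in the comment above can be modeled in plain Python. This is only a sketch under stated assumptions: the real pixa_to_list is Cython code operating on Leptonica Pixa/Boxa pointers, whereas here None stands in for a NULL pointer and plain lists stand in for the arrays. The function name components_to_list is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-ins: None models a NULL Pixa/Boxa pointer,
# and each box is an (x, y, w, h) tuple.
Box = Tuple[int, int, int, int]


def components_to_list(images: Optional[List[object]],
                       boxes: Optional[List[Box]]) -> List[tuple]:
    """Pair component images with box dicts; return [] for NULL input."""
    # The guard: bail out before touching either array if it is NULL,
    # instead of dereferencing an invalid pointer.
    if images is None or boxes is None:
        return []
    result = []
    for image, (x, y, w, h) in zip(images, boxes):
        result.append((image, {'x': x, 'y': y, 'w': w, 'h': h}))
    return result


print(components_to_list(None, None))              # []
print(components_to_list(['img'], [(1, 2, 3, 4)]))
# [('img', {'x': 1, 'y': 2, 'w': 3, 'h': 4})]
```

With the check in place, a caller's loop over the result simply runs zero times for an image with no components, rather than crashing.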

@oskaritimperi
Author

Hi,

the fixes you made seem to work nicely with tesseract 3.04. :-) Thank you!
