No text lines detected - Regression? #60

Closed
mikegerber opened this issue Sep 12, 2022 · 8 comments · Fixed by #63

@mikegerber
Member

mikegerber commented Sep 12, 2022

Using https://qurator-data.de/examples/actevedef_718448162.first-page.zip, running

ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG -O OCR-D-SEG-LINE-SBB-TLD -P model "/var/lib/textline_detection"

only gives a page border, with no text regions or text lines:

        <pc:Border>
            <pc:Coords points="105,80 2418,80 2418,3952 105,3952"/>
        </pc:Border>

Installed versions:

# pip list | egrep -i 'ocrd|sbb'
ocrd                   2.38.0
ocrd-modelfactory      2.38.0
ocrd-models            2.38.0
ocrd-utils             2.38.0
ocrd-validators        2.38.0
qurator-sbb-textline   0.0.1

I'm investigating.

mikegerber added the bug (Something isn't working) label Sep 12, 2022
mikegerber self-assigned this Sep 12, 2022
@mikegerber
Member Author

The same problem occurs with the non-OCR-D CLI:

sbb_textline_detector  -i OCR-D-IMG_00000024.tif -o test-out -m /home/mike/devel/qurator-data/textline_detection

@mikegerber
Member Author

mikegerber commented Sep 13, 2022

The error "module 'cv2' has no attribute 'cv2'" (an AttributeError) is caught here:

except:
    text_regions=None
    contours=[]

I think the exception handling here is too broad, which is bad practice. If there is a specific exception to catch, it should be named. That would have made this kind of bug easier to track down, because an unexpected error would produce a proper error message instead of being silently ignored.
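To make the point concrete, here is a minimal sketch (not the project's actual code) of how a bare except turns an unrelated AttributeError into "nothing found", while naming the expected exception lets the real error surface. The function names and the choice of ValueError as the anticipated error are assumptions for illustration only.

```python
def find_contours_broken(image):
    """Hypothetical stand-in for a library call that fails unexpectedly."""
    # Simulates the kind of failure reported in this issue.
    raise AttributeError("module 'cv2' has no attribute 'cv2'")


def detect_regions_bare(image):
    """Bare except: every failure looks like 'no contours found'."""
    try:
        contours = find_contours_broken(image)
    except:  # noqa: E722 - deliberately bare, for comparison
        contours = []
    return contours


def detect_regions_specific(image):
    """Only the anticipated error (assumed here to be ValueError) is handled."""
    try:
        contours = find_contours_broken(image)
    except ValueError:
        contours = []
    return contours


if __name__ == "__main__":
    # The real error is hidden and an empty result comes back.
    print(detect_regions_bare(None))
    try:
        detect_regions_specific(None)
    except AttributeError as exc:
        # The unexpected error reaches the caller with its message intact.
        print(f"surfaced: {exc}")
```

With the specific handler, the AttributeError would have shown up on the first run instead of producing an empty result.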

Downgrading opencv-python-headless fixes this. Version 4.6.x from June 2022 seems to break contour detection here, so sbb_textline_detector finds no text regions and therefore no text lines either.

I'm preparing a PR to work around the issue by requiring opencv-python-headless < 4.6.
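For anyone who wants to check an existing environment against that constraint, the following is a rough sketch of a runtime check. It is not part of the PR, and all helper names are hypothetical. It only uses importlib.metadata from the standard library and compares the numeric version prefix against 4.6.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '4.5.1.48' into (4, 5, 1, 48), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_known_good_opencv(version):
    """True if the version is below 4.6, mirroring the proposed requirement."""
    return version_tuple(version) < (4, 6)


def check_installed_opencv(dist_name="opencv-python-headless"):
    """Return the installed version, or raise an error with a clear message."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        raise RuntimeError(f"{dist_name} is not installed") from None
    if not is_known_good_opencv(installed):
        raise RuntimeError(f"{dist_name} {installed} is not supported, need < 4.6")
    return installed
```

Calling check_installed_opencv() at startup would fail loudly on 4.6.x instead of silently returning no text lines.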

👀 @kba This - the broad exception catching and the attribute error with the newest OpenCV version - might come up in eynollah too.

@mikegerber
Member Author

mikegerber commented Sep 14, 2022

PEP 8 (https://peps.python.org/pep-0008/) also has an opinion about this:

When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause:

try:
    import platform_specific_module
except ImportError:
    platform_specific_module = None

A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems. If you want to catch all exceptions that signal program errors, use except Exception: (bare except is equivalent to except BaseException:).

A good rule of thumb is to limit use of bare ‘except’ clauses to two cases:

1. If the exception handler will be printing out or logging the traceback; at least the user will be aware that an error has occurred.
2. If the code needs to do some cleanup work, but then lets the exception propagate upwards with raise. try...finally can be a better way to handle this case.
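The two cases from the quote could look roughly like this. It is a sketch with hypothetical function names, using except Exception rather than a bare except for the logging case, as PEP 8 suggests for catching program errors.

```python
import logging

logger = logging.getLogger(__name__)


def run_and_log(task):
    """Case 1: catch broadly, but log the traceback so the error stays visible."""
    try:
        return task()
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("task failed")
        return None


def run_with_cleanup(task, cleanup):
    """Case 2: do the cleanup work, then let the exception propagate."""
    try:
        return task()
    finally:
        # Runs on success and on failure; the exception is not swallowed.
        cleanup()
```

In both variants the failure leaves a trace: either in the log, or as an exception that reaches the caller.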

@mikegerber
Member Author

> A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems.

Ah that's why I always had problems interrupting the run of this program!
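A small sketch of that effect, with hypothetical function names: the bare except also catches the KeyboardInterrupt that Control-C raises, while except Exception lets it through.

```python
def swallow_everything(action):
    """Bare except: also catches KeyboardInterrupt and SystemExit."""
    try:
        action()
    except:  # noqa: E722 - deliberately bare, to show the problem
        return "swallowed"
    return "ok"


def swallow_errors_only(action):
    """except Exception: lets KeyboardInterrupt and SystemExit through."""
    try:
        action()
    except Exception:
        return "swallowed"
    return "ok"


def press_ctrl_c():
    """Simulates the user pressing Control-C."""
    raise KeyboardInterrupt
```

Passing press_ctrl_c to swallow_everything returns normally and the program keeps running, whereas swallow_errors_only lets the interrupt stop it.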

@mikegerber
Member Author

Something is still broken. With https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip and

ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE-SBB-TLD -P model "/home/mike/devel/qurator-data/textline_detection/"

I get text regions, but no useful text lines (green) are detected:

image

@mikegerber
Member Author

Thanks @vahidrezanezhad, I'll test it!

@mikegerber
Member Author

With opencv-python-headless == 4.5.1.48 (c4df3d6), it looks fine:

image
