TextLine coordinates too coarse #33

Closed
bertsky opened this issue May 28, 2020 · 4 comments

@bertsky
Contributor

bertsky commented May 28, 2020

Would it be possible to get good polygonal outlines from the text line segmentation instead of coarse bounding boxes?

There is a stark contrast between the precise contours of the text regions (which never overlap) and the coarse rectangles of text lines inside them (which often extrude beyond their parent and overlap between adjacent lines).

This makes it risky to apply line-level dewarping afterwards, and requires an OCR engine that can cope with intruders in the line image. In the example given in #29, I get these line images from ocrd-cis-ocropy-dewarp:

[Line images: OCR-D-IMG-DEW-SBB_0001_r21_l24 through OCR-D-IMG-DEW-SBB_0001_r21_l34]
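
To make the problem measurable, here is a minimal sketch of how one could flag line rectangles that stick out of their parent region or overlap a neighbouring line. It is not taken from the segmentation tool: the function names, the `(x_min, y_min, x_max, y_max)` box layout, and the sample coordinates are all assumptions for illustration. The corner test is only a necessary condition for containment (a box can have all corners inside a concave region and still cross its boundary).

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test; points exactly on the boundary may go either way."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal ray through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def box_corners(box: Box) -> List[Point]:
    x0, y0, x1, y1 = box
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def extrudes(box: Box, region: List[Point]) -> bool:
    """True if any corner of the line box lies outside the region polygon."""
    return any(not point_in_polygon(c, region) for c in box_corners(box))


def overlap_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned boxes (0 if they do not intersect)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


# Made-up sample data: one region polygon and two adjacent line rectangles.
region = [(10, 10), (200, 10), (200, 100), (10, 100)]
line_a = (12, 12, 198, 45)
line_b = (12, 40, 205, 70)  # sticks out on the right and overlaps line_a

print(extrudes(line_a, region), extrudes(line_b, region))
print(overlap_area(line_a, line_b))
```

Running something like this over all lines of a region would give a rough count of how many lines are affected before deciding whether dewarping is safe.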

@vahidrezanezhad
Member

@bertsky We can make the text lines tighter, but this has its own disadvantages. By the way, we will publish a new tool that outputs contours for text lines instead of rectangles. However, that method costs us more processing time!

@bertsky
Contributor Author

bertsky commented Aug 14, 2020

@vahidrezanezhad

> We can set more tight textlines but this also has its own disadvantages. however mentioned method costs us more processing time!

Then why not make that behaviour optional (with an ocrd-tool.json parameter), so the user can decide what is needed (precision or performance) for her workflow?
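
As a rough sketch of what such a switch could look like: the parameter name `tight_lines` and the function `line_outline` below are hypothetical, not part of the existing tool. The dict only mirrors the general shape of a `parameters` entry in an ocrd-tool.json file, and the "tight" outline is a simple per-column envelope of a binary line mask, not the contour method mentioned above.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Hypothetical parameter declaration, shaped like an entry under "parameters"
# in ocrd-tool.json. The name "tight_lines" is an assumption.
TIGHT_LINES_PARAM = {
    "tight_lines": {
        "type": "boolean",
        "default": False,
        "description": "Output polygonal line outlines instead of bounding boxes",
    }
}


def line_outline(mask: List[List[int]], tight: bool = False) -> List[Point]:
    """Return a polygon of (x, y) points around the foreground pixels of mask."""
    # For every column with foreground, remember its topmost and bottommost row.
    cols: Dict[int, Tuple[int, int]] = {}
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                top, bottom = cols.get(x, (y, y))
                cols[x] = (min(top, y), max(bottom, y))
    if not cols:
        return []
    xs = sorted(cols)
    if not tight:
        # Fast path: plain bounding rectangle.
        y0 = min(top for top, _ in cols.values())
        y1 = max(bottom for _, bottom in cols.values())
        return [(xs[0], y0), (xs[-1], y0), (xs[-1], y1), (xs[0], y1)]
    # Precise path: upper envelope left to right, lower envelope right to left.
    upper = [(x, cols[x][0]) for x in xs]
    lower = [(x, cols[x][1]) for x in reversed(xs)]
    return upper + lower


# Made-up 3x4 mask of one text line.
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
coarse = line_outline(mask, TIGHT_LINES_PARAM["tight_lines"]["default"])
precise = line_outline(mask, tight=True)
```

With the default the caller gets the rectangle; passing `tight=True` trades more points per line for an outline that follows the ink more closely.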

> By the way we will publish a new tool which throws contours for textlines not rectangles.

Where?

And why did you close the issue already?

@vahidrezanezhad
Member

Dear @bertsky,
First of all, you can see the tool that outputs text lines as contours here:
https://github.com/vahidrezanezhad/newspapers_regions_and_reading_order_curved_lines
It is not integrated as an option into the current model for two reasons. First, it will be a separate tool that can also provide the reading order of text regions. Second, it is still under development. If you use this tool (of course I can share the models with you :) ), you will see that I write the text line contours on the deskewed image and not the original image. However, based on internal decisions at SBB, we decided to write the results on the original image again.

@bertsky
Contributor Author

bertsky commented Aug 17, 2020

@vahidrezanezhad understood – I'll try to follow. Thanks for clarifying!
