ocrx_line example #39

kba · 2016-10-01T15:04:53Z

No description provided.

amitdo · 2016-10-02T12:34:20Z

1.2/spec.md

+
+```html
+...
+<span class="ocrx_line">


ocr_lines nested in ocrx_line? That doesn't look right to me.

It's ocr_line nested in ocrx_line, in this case a single heading split over two lines.

But I'll gladly make a better example if you have an idea. What I've seen in the wild is ocrx_line used simply as a replacement for ocr_line, e.g. https://github.com/jwilk/ocrodjvu/blob/master/lib/hocr.py.
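To make the nesting concrete, here is a minimal Python sketch. It assumes a hypothetical hOCR fragment in which one ocrx_line wraps two ocr_line elements. The heading text and bboxes are invented, and this is not the markup from 1.2/spec.md, which is truncated above. The `LineNesting` class is illustrative only. It uses the standard library's `html.parser` to report the chain of line-level classes enclosing each line element.

```python
from html.parser import HTMLParser

# Hypothetical fragment: a heading split over two lines, with the
# engine-specific ocrx_line wrapping two ordinary ocr_line elements.
FRAGMENT = """
<span class="ocrx_line" title="bbox 100 50 900 160">
  <span class="ocr_line" title="bbox 100 50 900 100">A Heading Split</span>
  <span class="ocr_line" title="bbox 100 110 600 160">Over Two Lines</span>
</span>
"""

LINE_CLASSES = {"ocr_line", "ocrx_line"}


class LineNesting(HTMLParser):
    """Record the chain of line-level classes enclosing each line element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open tag: its line class, or None
        self.paths = []  # e.g. "ocrx_line > ocr_line"

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        line_class = next((c for c in classes if c in LINE_CLASSES), None)
        self.stack.append(line_class)
        if line_class:
            self.paths.append(" > ".join(c for c in self.stack if c))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()


parser = LineNesting()
parser.feed(FRAGMENT)
parser.close()
for path in parser.paths:
    print(path)
# ocrx_line
# ocrx_line > ocr_line
# ocrx_line > ocr_line
```

This only reports the structure; whether this nesting is valid per the spec is exactly what is being discussed here.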

> It's ocr_line nested in ocrx_line

Yeah, I fixed my original mistake...

Sadly, I don't know what the right way is in this case.

ocrx_line is engine-specific line markup. It exists for those cases where your OCR engine outputs text lines that don't correspond to "normal" text lines.

The most common case is if you apply an engine that's not capable of column segmentation to a multi-column document and you want to prevent subsequent processing stages from assuming that the text lines they receive contain text in reading order.
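The multi-column case can be sketched roughly as follows. This is a hypothetical illustration, not any engine's actual code. It assumes an engine that groups words into lines by vertical position alone, so each resulting line mixes text from both columns. The invented helper `to_hocr_line` then labels such lines ocrx_line rather than ocr_line.

```python
import html

# Hypothetical words from a two-column page: (text, x0, y0).
# Columns start at x=100 and x=600; no column segmentation is assumed.
WORDS = [
    ("Left", 100, 50), ("column,", 180, 50),
    ("Right", 600, 50), ("column,", 690, 50),
    ("row", 100, 90), ("two.", 160, 90),
    ("row", 600, 90), ("two.", 660, 90),
]


def rows_without_columns(words, tolerance=10):
    """Group words into rows by vertical position only, so each row
    runs straight across both columns (not in reading order)."""
    rows = []  # list of (row_y, [word texts])
    for text, x0, y0 in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(rows[-1][0] - y0) <= tolerance:
            rows[-1][1].append(text)
        else:
            rows.append((y0, [text]))
    return [" ".join(texts) for _, texts in rows]


def to_hocr_line(text, column_segmented):
    """Use ocr_line only when the line's text can be trusted to be in
    reading order; otherwise fall back to engine-specific ocrx_line."""
    css_class = "ocr_line" if column_segmented else "ocrx_line"
    return f'<span class="{css_class}">{html.escape(text)}</span>'


for row in rows_without_columns(WORDS):
    print(to_hocr_line(row, column_segmented=False))
```

The design point is that the class name records a property of the engine's output (reading order not guaranteed), not a different visual object on the page.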

Basically, if you use ocrx_line instead of ocr_line, you're (intentionally) breaking most subsequent processing, since most OCR output processing will look for ocr_line tags (and assume they are in reading order).
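Here is a minimal sketch of the kind of downstream consumer described above. It assumes a hypothetical page and a hypothetical `OcrLineCollector` that, like most OCR output processing, selects elements by the ocr_line class. Text inside the ocrx_line element is never picked up, which is the (intentional) breakage being described.

```python
from html.parser import HTMLParser

# Hypothetical page: one ordinary line and one engine-specific line.
PAGE = """
<div class="ocr_page">
  <span class="ocr_line">Reading-order text.</span>
  <span class="ocrx_line">Left column, Right column,</span>
</div>
"""


class OcrLineCollector(HTMLParser):
    """Collect text only from elements carrying the given line class."""

    def __init__(self, line_class="ocr_line"):
        super().__init__()
        self.line_class = line_class
        self.depth = 0   # > 0 while inside a matching line element
        self.lines = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a matched line
        elif self.line_class in classes:
            self.depth = 1           # entering a matched line element
            self.lines.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.lines[-1] += data


collector = OcrLineCollector()
collector.feed(PAGE)
collector.close()
print(collector.lines)  # ['Reading-order text.']
```

Constructing it with `line_class="ocrx_line"` instead would return only the engine-specific line, `['Left column, Right column,']`.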

Tom, thanks for clarifying this for us.

fix #19 fix #39

Example for ocrx_line, #19

b69b342

kba force-pushed the ocrx_line-example branch from cd35c43 to b69b342 October 1, 2016 15:06

amitdo reviewed Oct 2, 2016


amitdo mentioned this pull request Oct 22, 2016

ocr_line vs. ocrx_line #19

Open

kba added a commit that referenced this pull request Nov 30, 2017

Add note on ocrx_line by @tmbdev

e2bf67d

fix #19 fix #39

kba mentioned this pull request Nov 30, 2017

Add note on ocrx_line by @tmbdev #105

Open

kba commented Oct 1, 2016

amitdo Oct 2, 2016 (edited)

kba Oct 2, 2016

amitdo Oct 2, 2016

amitdo Oct 2, 2016

tmbdev Oct 22, 2016

amitdo Oct 22, 2016