Describe the bug
What's the problem?
When processing two PDFs that are visibly identical in layout and differ only in their text values, one PDF gives the text as a simple left-to-right block of text whilst the other breaks the text into two columns. I just wanted to know if there is a way to make it read all of it as a single block of text and not try to interpret columns.
I've linked two of the PDFs that have the issue. On the 2nd page of both is the balance sheet I'm trying to read.
Expected behaviour (text read as a block) - bit.ly/3zG1LJO
Undesired behaviour (text split into two columns) - bit.ly/3UbvO5Q
OS: macOS
Python version: 3.10
OCRmyPDF version: 14.0.4
Platform: ARM
The reading order is inferred by the PDF viewer, unless there is markup in the PDF to indicate the appropriate reading order (such as making a tagged PDF). Adding this information is beyond the capabilities of OCR engines at the moment.
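Since the reading order is decided by whatever tool reads the text layer, one possible workaround is to impose the order yourself after extraction. The sketch below is not part of OCRmyPDF; it assumes you have already obtained word positions from some extraction tool as `(x_left, y_top, text)` tuples, with `y` increasing down the page. The `Word` alias, the `words_to_block_text` function, and the `y_tolerance` parameter are hypothetical names for illustration.

```python
from typing import List, Tuple

# Hypothetical input format: (x_left, y_top, text), with y increasing downward.
# Many PDF tools use a bottom-left origin instead; flip y first if so.
Word = Tuple[float, float, str]


def words_to_block_text(words: List[Word], y_tolerance: float = 3.0) -> str:
    """Rebuild plain left-to-right lines, ignoring any column structure."""
    rows: List[List[Word]] = []
    # Visit words top to bottom so that words on the same line are adjacent.
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        # Join the current row if this word sits within y_tolerance of the
        # row's first word; otherwise start a new row.
        if rows and abs(word[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order words left to right regardless of column.
    return "\n".join(
        " ".join(w[2] for w in sorted(row, key=lambda w: w[0])) for row in rows
    )


# Made-up sample: labels in a left column, amounts in a right column.
sample = [
    (300.0, 100.0, "1,000"),
    (50.0, 100.5, "Cash"),
    (50.0, 120.0, "Debtors"),
    (300.0, 119.0, "2,500"),
]
print(words_to_block_text(sample))
```

The tolerance would likely need tuning per document, and skewed scans or lines with mixed font sizes may still be grouped incorrectly.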