Hi,
I'm wondering if it would be beneficial to make the OCR text layer compatible with the PDF/A-2u standard. Since I couldn't find an issue covering this topic, I would like to ask if somebody has already thought about this.
PDF/A-2u is possible. Ghostscript does not have the ability to generate files with this feature, but it would be possible to test whether the output conforms and, if so, promote it to PDF/A-2u. With --force-ocr the output would always conform. Otherwise the output would conform if and only if all fonts have a valid Unicode mapping, which is not an easy test to implement.
2a is not possible, as it requires detailed, user-generated "tagging" of the meaning of text (this is a heading, this is a paragraph, this is an image and the description of the image is as follows) as well as a proper reading order. This requires a complex GUI. It is actually rather difficult to generate a 2a file even with appropriate tools. I do not believe I have ever seen one "in the wild"; the only ones I have ever seen are examples for use in test suites.
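To make the PDF/A-2u condition above concrete, here is a rough sketch of the decision only. It is not how any tool implements this today. `FontInfo`, `has_unicode_mapping` and `may_claim_pdfa_2u` are hypothetical names invented for illustration, and the hard part (actually deciding whether a font has a valid Unicode mapping) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FontInfo:
    """Hypothetical summary of one font found in the output PDF."""
    name: str
    # Assumption: some other (non-trivial) check has already decided
    # whether this font carries a valid Unicode mapping.
    has_unicode_mapping: bool


def may_claim_pdfa_2u(fonts: Iterable[FontInfo], force_ocr: bool = False) -> bool:
    """Return True if the output could be promoted to PDF/A-2u.

    With --force-ocr the output is treated as always conforming.
    Otherwise every font must have a valid Unicode mapping.
    """
    if force_ocr:
        return True
    return all(font.has_unicode_mapping for font in fonts)


if __name__ == "__main__":
    fonts = [FontInfo("Helvetica", True), FontInfo("CustomSymbols", False)]
    print(may_claim_pdfa_2u(fonts))                  # one unmapped font blocks promotion
    print(may_claim_pdfa_2u(fonts, force_ocr=True))  # --force-ocr case
```

The sketch deliberately keeps the per-font check as a plain boolean, since that check is the part described above as not easy to implement.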