Add support for PDF/A-2u or PDF/A-2a #528

Closed
frederictobiasc opened this issue Apr 10, 2020 · 1 comment

@frederictobiasc

Hi,
I'm wondering whether it would be beneficial to make the OCR text layer compatible with the PDF/A-2u standard. Since I couldn't find an issue covering this topic, I would like to ask whether anybody has already thought about this.

jbarlow83 (Collaborator) commented Apr 10, 2020

PDF/A-2u is possible. Ghostscript cannot generate files with this feature, but it would be possible to test whether the output conforms and, if it does, promote it. With --force-ocr the output would always conform. Otherwise the output would conform if and only if all fonts have a valid Unicode mapping, which is not an easy test to implement.
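
To make the promotion rule above concrete, here is a minimal sketch of the decision logic. It is not OCRmyPDF code. It assumes the font dictionaries have already been extracted from the PDF into plain Python dicts (a hypothetical representation; a real check would read them with a PDF library), and the names `font_has_unicode_mapping` and `can_promote_to_2u` are made up for illustration. The per-font test is deliberately simplified: it only looks for a `/ToUnicode` entry or one of three predefined simple-font encodings, whereas the standard has further exemptions and also constrains the mapped values, which is part of why the real test is hard.

```python
from typing import Iterable

# Simplifying assumption: simple fonts using one of these named encodings
# are treated as mappable to Unicode even without a /ToUnicode entry.
PREDEFINED_ENCODINGS = {"/WinAnsiEncoding", "/MacRomanEncoding", "/MacExpertEncoding"}


def font_has_unicode_mapping(font: dict) -> bool:
    """Return True if this (hypothetical, dict-based) font looks mappable to Unicode."""
    if "/ToUnicode" in font:
        return True
    encoding = font.get("/Encoding")
    # /Encoding may also be a dictionary (e.g. with /Differences); only a
    # plain predefined name counts in this simplified check.
    return isinstance(encoding, str) and encoding in PREDEFINED_ENCODINGS


def can_promote_to_2u(fonts: Iterable[dict], force_ocr: bool) -> bool:
    """Apply the rule from the comment above to decide on promotion."""
    if force_ocr:
        # Per the comment above, --force-ocr output would always conform.
        return True
    # Otherwise every font must have a usable Unicode mapping.
    return all(font_has_unicode_mapping(font) for font in fonts)


# Usage with made-up font dictionaries:
good_fonts = [
    {"/Subtype": "/Type0", "/ToUnicode": "<cmap stream>"},
    {"/Subtype": "/TrueType", "/Encoding": "/WinAnsiEncoding"},
]
bad_font = {"/Subtype": "/Type1", "/Encoding": {"/Differences": [1, "/g001"]}}

print(can_promote_to_2u(good_fonts, force_ocr=False))
print(can_promote_to_2u(good_fonts + [bad_font], force_ocr=False))
print(can_promote_to_2u(good_fonts + [bad_font], force_ocr=True))
```

A single font without a mapping is enough to block promotion unless --force-ocr was used, which matches the "if and only if all fonts" condition.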

2a is not possible, as it requires detailed, user-generated "tagging" of the meaning of text (this is a heading, this is a paragraph, this is an image and the description of the image is as follows) as well as a proper reading order. This requires a complex GUI. It is actually rather difficult to generate a 2a file even with appropriate tools. I do not believe I have ever seen one "in the wild"; the only ones I have seen are examples for use in test suites.

jbarlow83 closed this as not planned on Jun 2, 2023