Add support for PDF/A-2u or PDF/A-2a #528

frederictobiasc · 2020-04-10T11:19:43Z

Hi,
I'm wondering if it would be beneficial to make the OCR text layer compatible with the PDF/A-2u standard. Since I couldn't find an issue covering this topic, I would like to ask whether anybody has already thought about this.

jbarlow83 · 2020-04-10T11:45:13Z

PDF/A-2u is possible. Ghostscript does not have the ability to generate files with this feature, but it would be possible to test if the output conforms and promote it. With --force-ocr the output would always conform. Otherwise the output would conform if and only if all fonts have a valid Unicode mapping, which is not an easy test to implement.
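
As a rough illustration of the promotion test described above, here is a minimal sketch in Python. It assumes font dictionaries have already been extracted from the PDF into plain mappings keyed by PDF names; the function names `font_has_unicode_mapping` and `can_promote_to_2u` are hypothetical and not part of OCRmyPDF or Ghostscript. It only checks for the presence of a `/ToUnicode` entry, which is a simplification: the standard allows some fonts without one, and this sketch does not verify that an existing map is actually valid, which is the part that is hard to implement.

```python
from typing import Iterable, Mapping


def font_has_unicode_mapping(font: Mapping[str, object]) -> bool:
    """Simplified heuristic: treat a font as mapped if it carries /ToUnicode.

    Assumption: `font` is a plain mapping of PDF name strings to values.
    The contents of the ToUnicode CMap are NOT validated here.
    """
    return "/ToUnicode" in font


def can_promote_to_2u(fonts: Iterable[Mapping[str, object]], force_ocr: bool) -> bool:
    """Decide whether the output could be labelled PDF/A-2u.

    With force_ocr the output is assumed to always conform, as stated in
    the comment above; otherwise every font must pass the heuristic.
    """
    if force_ocr:
        return True
    return all(font_has_unicode_mapping(font) for font in fonts)
```

A font with no `/ToUnicode` entry makes `can_promote_to_2u` return `False` unless `force_ocr` is set, so in this sketch the file would keep its existing conformance level.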

2a is not possible, as it requires detailed, user-generated "tagging" of the meaning of the text (this is a heading, this is a paragraph, this is an image and the description of the image is as follows) and a proper reading order. This requires a complex GUI. It is actually rather difficult to generate a 2a file even with appropriate tools. I do not believe I have ever seen one "in the wild"; the only ones I have seen are examples for use in test suites.

frederictobiasc added the enhancement label Apr 10, 2020

jbarlow83 closed this as not planned Jun 2, 2023
