This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

Xml Alto output #138

Open: wants to merge 5 commits into base: master

Daniel-KM
Contributor

Hi,

I added an XML output for ALTO (see https://www.loc.gov/standards/alto), the international standard for describing the layout and content of OCR-processed documents. Some features could still be added (such as margins and styles), but the output conforms to the standard.

To test:

pdf2txt.py -t alto -o samples/Alto.alto.xml samples/Alto.pdf

or with a resolution of 300 dpi:

pdf2txt.py -t alto -Z 300 -o samples/Alto.alto.xml samples/Alto.pdf

or in tenths of a millimeter rather than pixels:

pdf2txt.py -t alto -U mm10 -o samples/Alto.alto.xml samples/Alto.pdf
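To make the two unit options concrete, here is a minimal sketch of the arithmetic involved. It assumes coordinates start out in PDF points (the default PDF user space unit, 1/72 inch); the function names are hypothetical and are not taken from the pull request's code.

```python
POINTS_PER_INCH = 72.0   # default PDF user space unit is 1/72 inch
MM10_PER_INCH = 254.0    # 1 inch = 25.4 mm = 254 tenths of a millimeter


def points_to_pixels(value_pt, dpi):
    """Convert a length in PDF points to pixels at the given resolution."""
    return value_pt * dpi / POINTS_PER_INCH


def points_to_mm10(value_pt):
    """Convert a length in PDF points to tenths of a millimeter."""
    return value_pt * MM10_PER_INCH / POINTS_PER_INCH


# A US Letter page is 612 points wide (8.5 inches).
print(points_to_pixels(612, 300))  # 2550.0
print(points_to_mm10(612))         # 2159.0
```

At 300 dpi an 8.5 inch page is 2550 pixels wide, and the same width is 2159 tenths of a millimeter (215.9 mm).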

Sincerely,

Daniel Berthereau
Infodoc & Knowledge management

@Gruoningensis

Thank you for your work on this piece of code. I've noticed that the resulting XML isn't always valid: it seems that special characters aren't properly escaped as XML entities. Could you please fix that in your code? I'd like to do it myself, but Python isn't really my cup of tea...
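For illustration, one possible way to escape text before writing it into an ALTO attribute is the standard library's `xml.sax.saxutils.quoteattr`. This is only a sketch under assumptions: the helper name `alto_string` and its parameters are hypothetical and do not come from the pull request, although `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`, and `CONTENT` are attributes of the ALTO `String` element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def alto_string(content, hpos, vpos, width, height):
    """Build an ALTO <String/> element with a safely escaped CONTENT value.

    Hypothetical helper for illustration only. quoteattr() escapes
    &, < and > and returns the value already wrapped in suitable quotes.
    """
    return '<String HPOS="%.2f" VPOS="%.2f" WIDTH="%.2f" HEIGHT="%.2f" CONTENT=%s/>' % (
        hpos, vpos, width, height, quoteattr(content))


# Text containing characters that would break unescaped XML output.
raw_text = 'Fish & "Chips" <R&D>'
element_xml = alto_string(raw_text, 10, 20, 120.5, 14)

# Parsing the result confirms it is well-formed and round-trips the text.
parsed = ET.fromstring(element_xml)
print(parsed.attrib["CONTENT"] == raw_text)
```

For text written between tags rather than inside an attribute, `xml.sax.saxutils.escape` from the same module covers `&`, `<`, and `>`.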

@ghost

ghost commented Mar 21, 2017 via email

2 participants