Not well-formed ALTO XML #11

Closed
kermitt2 opened this issue Jun 6, 2018 · 3 comments
Labels
bug Something isn't working

Comments

@kermitt2
Owner

kermitt2 commented Jun 6, 2018

Here are some examples of generated XML that is not well-formed: invalid characters in attribute values.
Uploading cphc0012-0609.pdf…
Uploading hel0015-0279.pdf…
Uploading ljii31-131.pdf…

@kermitt2 kermitt2 added the bug Something isn't working label Jun 7, 2018
@Aazhar Aazhar closed this as completed in e2b9f7d Jun 8, 2018
@kermitt2 kermitt2 reopened this Jun 8, 2018
@kermitt2
Owner Author

kermitt2 commented Jun 8, 2018

Some examples still produce XML that is not well-formed:
bmb0036-0262.pdf -> 0x14 in attribute value
em0051-0330.pdf -> 0x18
10989_2010_Article_9230.pdf -> 0x18

The problem is with Unicode control characters in the range 0x12-0x18.
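
For anyone trying to reproduce this, here is a minimal sketch of how such characters could be located in an attribute value. It is not pdfalto code: the function name `find_invalid_xml_chars` and the sample string are hypothetical, and the character class simply negates the `Char` production of XML 1.0.

```python
import re

# Anything NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), hex(ord(m.group())))
            for m in INVALID_XML_CHARS.finditer(text)]


# Hypothetical attribute text containing the two code points reported above.
sample = 'CONTENT="foo\x14bar\x18"'
print(find_invalid_xml_chars(sample))
```

Tab, line feed and carriage return are the only characters below 0x20 that pass this check, so the whole 0x12-0x18 range is flagged.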

@kermitt2
Owner Author

kermitt2 commented Jun 8, 2018

@Aazhar
Collaborator

Aazhar commented Jun 11, 2018

Wrong Unicode mappings were found in the embedded fonts; those characters are now also replaced by a placeholder (7ffb8a0).
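
As a rough illustration of the placeholder approach (a sketch only, not the code from 7ffb8a0), the snippet below substitutes every character that XML 1.0 forbids before the attribute is written. The helper name `sanitize_attribute` and the choice of U+FFFD as the placeholder are assumptions; the issue does not say which placeholder the fix uses.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Characters outside the XML 1.0 "Char" production.
_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Assumption: U+FFFD (REPLACEMENT CHARACTER) stands in for the real placeholder.
PLACEHOLDER = "\uFFFD"


def sanitize_attribute(value):
    """Replace characters that XML 1.0 forbids with PLACEHOLDER."""
    return _INVALID.sub(PLACEHOLDER, value)


# Hypothetical text as it might come out of a font with a bad Unicode mapping.
raw = "fi\x14nal\x18"
clean = sanitize_attribute(raw)

# quoteattr adds the quotes and escapes &, < and > in the value.
element = "<String CONTENT=" + quoteattr(clean) + "/>"

# Parsing succeeds only if the serialized element is well-formed.
parsed = ET.fromstring(element)
print(parsed.get("CONTENT") == clean)
```

Replacing rather than dropping the character keeps the string length stable, which may matter if downstream code aligns text with glyph positions.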
