[Bug]: sandwich renders differently than hocr #1191

Closed
femifrak opened this issue Nov 15, 2023 · 2 comments

@femifrak

Describe the bug

I like the hocr renderer very much, as it is the only renderer that lets you influence the outcome.
However, for some time now it has rendered some letters incorrectly, especially the "s" in old texts.

Steps to reproduce

Running these two commands:

1.) ocrmypdf -l frk --pdf-renderer hocr inmidst.pdf inmidst_hocr.pdf
2.) ocrmypdf -l frk inmidst.pdf inmidst_sandwich.pdf

leads to these two underlying texts:
1.) That's a tent with „3“ inmidnt a word.
2.) That's a test with „3“ inmidst a word.

The two outputs do not differ when an older version of hocrtransform.py is used (I don't know how old it has to be).
Could you please have a look at this? Thanks a lot!
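
For anyone who wants to reproduce the comparison without reading the two text layers by eye, here is a minimal sketch. It is not part of OCRmyPDF. The function names (`ocr_command`, `extract_text`, `word_diff`, `compare_renderers`) are made up for illustration, and it assumes that `ocrmypdf` and Poppler's `pdftotext` are both on the PATH. It runs the two commands from the report and lists the words that differ.

```python
import difflib
import subprocess
from pathlib import Path


def ocr_command(src, dst, renderer=None, lang="frk"):
    """Build the ocrmypdf command line used in the report.

    renderer=None omits --pdf-renderer, as in the second command above.
    """
    cmd = ["ocrmypdf", "-l", lang]
    if renderer is not None:
        cmd += ["--pdf-renderer", renderer]
    return cmd + [str(src), str(dst)]


def extract_text(pdf):
    """Dump the text layer to a string (assumes Poppler's pdftotext is installed)."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def word_diff(a, b):
    """Return (word_in_a, word_in_b) pairs where the two texts disagree."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    return [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def compare_renderers(src):
    """Run both renderers on src and return the differing words."""
    src = Path(src)
    hocr_pdf = src.with_name(f"{src.stem}_hocr.pdf")
    default_pdf = src.with_name(f"{src.stem}_sandwich.pdf")
    subprocess.run(ocr_command(src, hocr_pdf, renderer="hocr"), check=True)
    subprocess.run(ocr_command(src, default_pdf), check=True)
    return word_diff(extract_text(hocr_pdf), extract_text(default_pdf))
```

Calling `compare_renderers("inmidst.pdf")` should return only the words on which the two text layers disagree, which makes regressions like the one described here easier to spot across many files.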

Files

inmidst.pdf

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

15.4.2

Relevant log output

No response

@jbarlow83
Collaborator

See #1194. Your issue is fixed by the new renderer, but I'm curious whether you have other input.

@femifrak
Author

Works great :) Thanks a lot for all your work!! Tested it with some other pdfs as well. 👍

jbarlow83 added a commit that referenced this issue Dec 3, 2023