Pdf-renderer tess4 loses DPI info with any image preprocessing. #147
Comments
Looks like unpaper was indeed responsible for discarding the DPI. It always has, but the loss of this information only matters to the tess4 renderer. Fixed in 4.5.2. |
That was a quick one! Thanks again for your effort and that great piece of software you cobbled together! |
Somehow the automated build for the docker image (jbarlow83/ocrmypdf-tess4) didn't kick in. Can you please build it manually? Thanks a lot! |
Alright, I gave it a kick in the pants and it looks like it's ready again.
|
Further investigation! ;) Most of the problems are solved. However, when using the --clean parameter I still get the error: Surprisingly (probably), with the --clean-final parameter everything works as expected.
|
Could you provide a full command line that is still causing trouble? Also please check that |
Oh, you are right. Docker pulled an update after you kicked it but it still shows 4.5.1.
|
I double-checked that the latest image is 4.5.2. Perhaps you pulled the
latest image but didn't update the "ocrmypdf" tag?
The digest of the 4.5.2 image is
$ docker images --digests jbarlow83/ocrmypdf-tess4
jbarlow83/ocrmypdf-tess4 latest
sha256:fdc6203751c9e09691c1d003b72f4358bf84189862247a5449c10197fdc6f94a
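For anyone hitting the same stale-tag situation, here is a minimal Python sketch for reading the digest of a given repository and tag out of `docker images --digests` output. It assumes the whitespace-separated column order repository, tag, digest seen in the listing above; `digest_for` is a hypothetical helper name, not part of Docker or ocrmypdf.

```python
def digest_for(listing, repository, tag="latest"):
    """Return the digest column for repository:tag, or None if not listed.

    Assumes each row starts with: REPOSITORY TAG DIGEST ...
    (the layout of `docker images --digests`).
    """
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == repository and fields[1] == tag:
            return fields[2]
    return None
```

Feed it the captured text of `docker images --digests` and compare the result with the digest quoted above. If a local tag such as "ocrmypdf" reports a different digest, or none at all, that tag has not been updated to the new image.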
…On Tue, 28 Mar 2017 at 23:53 17Halbe ***@***.***> wrote:
Oh, you are right. Docker pulled an update after you kicked it but it
still shows 4.5.1.
So either the version number somehow didn't make it into the docker
image, or docker didn't build the new image.
docker run --rm -v /myDir/:/home/docker ocrmypdf --version
4.5.1
Docker did pull something new, though, because some errors went away. (Same
PDF, same command line.)
The exact command line is:
docker run --rm -v
/Homepool/Documents/Home-Folder/alex/No-Images/:/home/docker ocrmypdf -j 1
--tesseract-timeout 360 -l deu+eng -c --pdf-renderer tess4 -f input.pdf
output.pdf
|
Yep, that was the problem. I'm fairly new to docker, so excuse my lack of knowledge! Can be closed! |
Hi there,
A 600 dpi PDF scanned with a Fujitsu ScanSnap, not yet OCR'd, that is preprocessed with any of the preprocessing parameters (I tried -c, -r, -d, --oversample DPI and --remove-background) produces this Tesseract message:
INFO - 1: [tesseract] Warning. Invalid resolution 0 dpi. Using 70 instead.
This is happening on the jbarlow83/ocrmypdf-tess4 docker image.
Exact command line:
ocrmypdf -l deu -c --pdf-renderer tess4 input.pdf output.pdf
See also: [Clarification request/bug?] "Warning. Invalid resolution 0 dpi. Using 70 instead." #649
and also a Tesseract Forums entry: Invalid resolution 0 dpi. Using 70 instead.
So it seems the DPI information is lost during preprocessing (by unpaper?).
Anyone else?
regards Alex
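To check whether a given run is affected, the warning quoted in the report can be picked out of captured log output. This is a minimal Python sketch that assumes the message wording shown above ("Invalid resolution 0 dpi. Using 70 instead."); `find_dpi_warnings` is a hypothetical helper, not part of ocrmypdf or Tesseract.

```python
import re

# Matches the Tesseract message quoted in the report, capturing the
# resolution it saw and the fallback value it switched to.
DPI_WARNING = re.compile(r"Invalid resolution (\d+) dpi\. Using (\d+) instead\.")


def find_dpi_warnings(log_text):
    """Return a list of (reported_dpi, fallback_dpi) pairs found in log_text."""
    return [
        (int(match.group(1)), int(match.group(2)))
        for match in DPI_WARNING.finditer(log_text)
    ]
```

An empty list for a run with -c (or any other preprocessing option) would suggest the DPI survived preprocessing. Any pair with a reported value of 0 points to the problem described here.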