not sharp #160
Comments
Uhm, I recognized with dpi 90 and saved with dpi 90, and the output (ZIP, grayscale) looks good to me: |
I made a further try, also with dpi 90, ZIP and grey (Windows 7). It's really strange. Edit: But my grey version is 3 times as large as your version (in KB) (?) |
Can you make a screencast of the steps you perform to produce the PDF? |
screencast deleted |
In the advanced image controls (button left of the OCR mode button), select 90 as dpi. |
Ah OK. I have never paid attention to this setting. Thank you, now it works. My setting there was 300 dpi. Question: what is the difference between the DPI in the export dialog window and the DPI in the image controls? Must the DPI in the image controls always be the same as the DPI of the input file? |
The DPI setting in the advanced image controls toolbar is the DPI at which the input image is sampled to produce the image on which recognition is performed. The purpose of the first DPI control is to be able to artificially upscale the image to improve recognition results. The default is 300, which is recommended for good OCR results. Depending on the ratio between the original image dpi and the sampling dpi, interpolation may produce a blurry sampled image. In theory, if you choose an integer multiple of the original dpi (say 180 or 270 for a 90 dpi source), you should get a smoother image. |
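To illustrate the point about integer multiples, here is a minimal sketch that computes the scale factor between source and sampling dpi. The function names are made up for illustration and this is not gImageReader code; it only shows why 180 and 270 map each source pixel onto a whole number of sampled pixels for a 90 dpi input, while 300 does not.

```python
from fractions import Fraction


def resample_factor(source_dpi: int, sampling_dpi: int) -> Fraction:
    """Exact scale factor applied when a source image is sampled at sampling_dpi."""
    return Fraction(sampling_dpi, source_dpi)


def is_integer_multiple(source_dpi: int, sampling_dpi: int) -> bool:
    """True if the sampling dpi is a whole-number multiple of the source dpi."""
    return resample_factor(source_dpi, sampling_dpi).denominator == 1


# Hypothetical example values taken from the thread: a 90 dpi input file.
source_dpi = 90
for sampling_dpi in (90, 180, 270, 300):
    factor = resample_factor(source_dpi, sampling_dpi)
    print(sampling_dpi, factor, is_integer_multiple(source_dpi, sampling_dpi))
```

With a non-integer factor such as 300/90, sampled pixels straddle source pixel boundaries, so the interpolation has to blend neighbouring pixels, which is one plausible source of the blur described here.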
Now I tried exactly 180. But in this case this does not seem to give good results (?) |
Mh yeah looks like the downsampling in the end adds too much blur for the conversion to monochrome to look decent. If you choose grayscale, it looks decentish. |
Thank you.
-> This is not true. I tried this: When you now compare "Output, grey loseless 180,90.pdf" with the |
I said decent-ish, not decent ;) |
Please try: https://smani.fedorapeople.org/tmp/gImageReader_3.2.1_qt5_i686.exe This version samples the input image directly instead of the recognition image. This means that as far as image quality in the output is concerned it does not matter what dpi you choose in the advanced image controls. If you choose the same dpi in the output as the original source, you should get pretty much the same image again. I'd appreciate if you could test various combinations (PDF with and without invisible text overlay) and various resolutions to check whether it behaves as you expect. Thanks! |
Wow, thank you for making this test version. |
I like these advanced image controls. Yes, of course the text recognition is indeed sometimes much better with a higher dpi here. So far during testing, the following issues caught my attention:
1. A) Input file 90 dpi. Advanced image setting 90 dpi. Saved with 90 dpi and as OCR (no picture): My expectation was that the size of the text in B would now be the same as in A. Input file: Beispiel.pdf. The settings for both A) and B):
2. Input file 300 dpi (TIF). Advanced image setting 300 dpi. Start text recognition. Worked as expected.
3. Input file 300 dpi (TIF). Width: 2835 px; height: 2209 px. Advanced image setting 300 dpi. Start text recognition. Saving with these settings: My expectation was that "width" and "height" do not change. But they did: Input file: Input.zip |
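The expectation in point 3 can be written down as simple arithmetic. The sketch below assumes the output image is the source image rescaled by output dpi divided by source dpi, with rounding to whole pixels. How gImageReader actually rounds is not stated in this thread, and the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def expected_size(width_px: int, height_px: int,
                  source_dpi: int, output_dpi: int) -> tuple[int, int]:
    """Pixel size expected after resampling from source_dpi to output_dpi."""
    scale = output_dpi / source_dpi
    # Rounding to whole pixels is an assumption, not confirmed behaviour.
    return round(width_px * scale), round(height_px * scale)


def physical_size_mm(length_px: int, dpi: int) -> float:
    """Physical length in millimetres of length_px pixels at the given dpi."""
    return length_px / dpi * MM_PER_INCH


# Values from point 3: a 2835 x 2209 px TIF at 300 dpi, saved at 300 dpi.
print(expected_size(2835, 2209, 300, 300))
print(physical_size_mm(2835, 300), physical_size_mm(2209, 300))
```

Under this assumption, saving at the same dpi as the source leaves width and height unchanged, so any difference in the output points to an extra rescaling step somewhere in the export.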
…tead of rescaling image used for recognition (#160)
Addendum:
Closing ticket since the original issue of blurry output has been addressed as far as possible in da93d34. |
Hello,
This is my input- File:
Beispiel.PDF
According to http://pdf-analyser.edpsciences.org/ it has a 90dpi resolution.
1)
For the OCR I made the following settings:
I name the output file "Output 90dpi, b-w":
Output 90dpi, b-w.pdf
2)
For the OCR I made the following settings:
I name the output file "Output 90dpi, colour, loseless":
Output 90dpi, colour, loseless.pdf
So my question is:
Why is neither "Output 90dpi, b-w.pdf" nor "Output 90dpi, colour, loseless.pdf" as sharp as the input file "Beispiel.pdf" ?
Thank you.