
Poorly rendered image quality #7041

Closed
nicholassinggih opened this issue Feb 29, 2016 · 21 comments · Fixed by #11601

Comments

@nicholassinggih

Hi all,

I tried the latest version, 1.3.91. All I did was modify viewer.js to point to my own PDF. This is the result with the fresh, out-of-the-box viewer.html:
[Screenshot: the original PDF (left) and the same PDF in viewer.html (right)]

The left one is the actual PDF, and the right one is rendered by viewer.html. It's the same in IE and Chrome. I did not try Firefox because my users use either IE or Chrome.

I've searched everywhere I could without finding a solution. Here's the PDF file from the screenshot:
.0002.pdf

@Rob--W
Member

Rob--W commented Feb 29, 2016

What exactly is wrong with the PDF from viewer.html?
(By the way, the zoom levels are different: the left one uses 100% zoom, the right one 80%.)

@nicholassinggih
Author

Hi Rob,

Thanks for your reply.

The problem is that the text quality (I'm not sure whether it's strictly font rendering) makes the words a bit harder to read. Look at the word "Billing" at number 12, for example: the one on the right looks like it says "Bllllng". My users need to see at least 8 pages at the same time, as they go through thousands every day, so they will be viewing the PDFs at an even smaller zoom level.

The one on the left is not at 100%; it's actually at less than 80%. Just look at the sizes: the left one shows numbers 1-28 on one screen, while the right one shows only 25.

@Rob--W
Member

Rob--W commented Feb 29, 2016

Could you paste a screenshot with exactly the same zoom level on both sides?

This is not a text rendering issue: what you see is an image. The question is whether the quality deteriorated after scaling (and, of course, whether it can be improved).

@nicholassinggih
Author

Here's the viewer at 100%. Notice that the first 'i' in the word "Billing" still looks like an 'l', and there are white pixels here and there on all the letters.
[Screenshot: viewer.html at 100%]

And this one is Adobe Reader at 100%. The text looks smoother and cleaner.
[Screenshot: Adobe Reader at 100%]

Here's both of them at 50%:
[Screenshot: viewer.html and Adobe Reader, both at 50%]

Here's viewer.html at 60% and Adobe Reader still at 50%:
[Screenshot: viewer.html at 60%, Adobe Reader at 50%]

Even with the viewer at 70%, Adobe Reader at 50% is still easier to read. Notice that in Adobe you can clearly tell that the page number is 2, while in the viewer it looks more like a '3'. The letter 'e' looks a bit like an 'o', and 'i' looks like 'l'. The header looks thinner and distorted, too, and the footer is really hard to read.

[Screenshot: viewer.html at 70%, Adobe Reader at 50%]

@Rob--W
Member

Rob--W commented Feb 29, 2016

Thanks for your extra info. I don't know what goes wrong, but maybe some of the other PDF.js devs know.

@fkaelberer
Contributor

I think the loss of image quality happens in canvas.js#L1895, where the JPEG image of dimensions 2508 x 3525 is drawn onto a canvas that is much smaller than the image. Images look poor when they are scaled down by a large factor in one go; this was already fixed for other images in canvas.js#L2081 (from #3312), and the same fix was applied to thumbnails in #4924. So I guess it should be fixed here for JPEG images as well.
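The multi-step idea can be sketched as a plan of intermediate sizes: halve the image while it is still more than twice the target size, then do one last draw at the target size. This is a minimal, hypothetical helper (`planDownscaleSteps` is an illustrative name, not the canvas.js#L2081 code), and the 400 x 562 target is an arbitrary example rather than a size taken from this report.

```javascript
// Hypothetical helper (not the canvas.js code): lists the sizes an image
// passes through when it is shrunk by at most 2x per axis in each step.
function planDownscaleSteps(srcWidth, srcHeight, dstWidth, dstHeight) {
  const steps = [];
  let w = srcWidth;
  let h = srcHeight;
  // Halve while either axis still needs more than a 2x reduction.
  while (w > 2 * dstWidth || h > 2 * dstHeight) {
    // Only halve the axes that need it; rounding up keeps each halved axis
    // strictly larger than its target.
    if (w > 2 * dstWidth) w = Math.ceil(w / 2);
    if (h > 2 * dstHeight) h = Math.ceil(h / 2);
    steps.push([w, h]);
  }
  // The last step (at most a 2x reduction per axis) lands on the target size.
  steps.push([dstWidth, dstHeight]);
  return steps;
}

// The 2508 x 3525 JPEG from this report, drawn at an arbitrary 400 x 562.
console.log(planDownscaleSteps(2508, 3525, 400, 562));
// logs the sizes [1254, 1763], [627, 882], [400, 562]
```

In a browser, each intermediate size would typically be a temporary canvas filled with `drawImage` from the previous one, so no single draw shrinks the image by more than 2x.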

@nicholassinggih
Author

The PDFs I'm rendering mostly contain text as majority of the content.

The same PDFs display perfectly in an iframe with IE + the Adobe plugin, or IE + the Foxit plugin, even though the iframe's dimensions were set to 360 x 400 (much smaller than the actual size). On Chrome, however, with both the default viewer and the PdfViewer extension, they look really distorted.

Adobe and Foxit use their own engines to render the PDF, while most Chrome extensions translate the PDF contents into HTML elements. The problem is that Adobe and Foxit draw their whole interface on a separate layer on top of the HTML, which prevents my application from displaying context menus and other things over the PDFs.

I'm still hoping I can use pdf.js for my app. But right now, I'm forced to try a different solution.

@fkaelberer
Contributor

The PDFs I'm rendering mostly contain text as majority of the content.

@nicholassinggih The PDF you provided contains a (scanned?) JPEG image in the background. The text that you can select in the document is an invisible text overlay.
The images are downscaled by the browser (not by pdf.js's JavaScript code), so image quality may vary with the browser or OS. Firefox and Chrome (maybe others too) do a terrible job of downscaling big images to small canvases, which causes the bad image quality.

I pushed commit fkaelberer@913d3bc, which downscales the JPEGs in multiple steps, each by a factor of at most 2. As a result, readability improves a lot; see the images below.
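To illustrate what a single 2x step computes, here is a hypothetical pure-JavaScript 2x2 box filter over RGBA pixels (the layout of `ImageData.data`). It is an assumption for illustration only, not the code in that commit: in the browser each step would normally be done with canvas `drawImage`, whose filtering may differ from this sketch.

```javascript
// Hypothetical sketch of a single 2x downscale step as a 2x2 box filter.
// `data` is RGBA, 4 bytes per pixel, row-major (the layout of ImageData.data).
// On odd widths/heights the last column/row is reused (clamped to the edge).
// Alpha is averaged like the color channels, which is fine for opaque JPEGs.
function halveRGBA(data, width, height) {
  const outW = Math.ceil(width / 2);
  const outH = Math.ceil(height / 2);
  const out = new Uint8ClampedArray(outW * outH * 4);
  for (let y = 0; y < outH; y++) {
    const y0 = 2 * y;
    const y1 = Math.min(2 * y + 1, height - 1);
    for (let x = 0; x < outW; x++) {
      const x0 = 2 * x;
      const x1 = Math.min(2 * x + 1, width - 1);
      for (let c = 0; c < 4; c++) {
        // Average the four source samples of this channel.
        const sum =
          data[(y0 * width + x0) * 4 + c] +
          data[(y0 * width + x1) * 4 + c] +
          data[(y1 * width + x0) * 4 + c] +
          data[(y1 * width + x1) * 4 + c];
        out[(y * outW + x) * 4 + c] = Math.round(sum / 4);
      }
    }
  }
  return { data: out, width: outW, height: outH };
}

// A 2x2 black/white checkerboard collapses into one mid-gray pixel.
const checker = new Uint8ClampedArray([
  0, 0, 0, 255, 255, 255, 255, 255,
  255, 255, 255, 255, 0, 0, 0, 255,
]);
console.log(Array.from(halveRGBA(checker, 2, 2).data)); // [ 128, 128, 128, 255 ]
```

Applying this once per halving step, followed by a final resample of at most 2x, gives the multi-step result; because every source pixel contributes to some output pixel, thin strokes such as the 'i' in "Billing" are averaged rather than skipped.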

I did not open a pull request, though, because the code is unfinished and hasn't been tested much. Anyone, please feel free to take and improve the code.

  • better readability / image quality
  • figure out if / how it works if the image does not fill the whole canvas
  • Fix the blurriness of issue1350
  • Bonus: deduplicate the downscaling code
  • Check whether the blurriness of some images can be reduced (the red font in issue2642 looks better without this patch)

At 50%:
[Screenshot: before-after50]

At 70%:
[Screenshot: before-after70]

@nicholassinggih
Author

Hi Felix,

Thank you very much for this. I've briefly tested your new code and it does give a much better result, just as shown in your screenshots. Bravo to you!!

Forgive me if my next questions sound silly, but I just want to clarify if I understand this correctly. About the text vs jpeg image thing, are you saying that the PDF that I provided actually only contain jpeg of a scanned paper with text on it? Therefore, it doesn't actually contain text data and it was rendered as an image which in turn was scaled down by the browser?

Should I close this thread, or let either Yuri or Tim to close this?

@Rob--W
Member

Rob--W commented Mar 6, 2016

About the text vs jpeg image thing, are you saying that the PDF that I provided actually only contain jpeg of a scanned paper with text on it?

Yes. That's why I slapped the jpeg label on this ticket.

Therefore, it doesn't actually contain text data and it was rendered as an image which in turn was scaled down by the browser?

It does contain text data (with a transparent color, probably for text selection), but what you see (and what is printed) is the (scaled) image.
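A transparent color is one way to hide such an overlay; another common way, used by many OCR tools, is text rendering mode 3 (set with the `3 Tr` operator), which neither fills nor strokes the glyphs, so they stay selectable but are never painted. As a rough, hypothetical check, assuming you already have a page's decompressed content stream as a string, you could look for that operator. The regex scan below is deliberately naive and is not part of pdf.js.

```javascript
// Hypothetical, naive scan of a decompressed PDF content stream for the
// text rendering mode operator "Tr". Mode 3 = neither fill nor stroke
// (invisible text). It ignores string contents, inline images and other
// syntax subtleties, so treat the result as a hint only.
function textRenderModes(contentStream) {
  const modes = new Set();
  // An integer operand followed by the Tr operator, e.g. "3 Tr".
  const re = /(?:^|\s)(\d+)\s+Tr\b/g;
  let m;
  while ((m = re.exec(contentStream)) !== null) {
    modes.add(Number(m[1]));
  }
  return modes;
}

function hasInvisibleText(contentStream) {
  return textRenderModes(contentStream).has(3);
}

// A made-up page: paint a full-page image, then draw invisible OCR text on top.
const page = "q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 10 Tf 72 700 Td (Billing) Tj ET";
console.log(hasInvisibleText(page)); // true
```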

Should I close this thread, or let either Yuri or Tim to close this?

The (legitimate) issue hasn't been resolved yet, so I'd keep the issue open.

@nicholassinggih
Author

It does contain text data (with a transparent color, probably for text selection), but what you see (and what is printed) is the (scaled) image.

I see. They ran OCR on the PDF, hence the transparent/invisible text data. If they hadn't run OCR on it, I'm guessing it would just contain a JPEG image with no text.

Thank you both for your help. Really appreciate it.

@yurydelendik
Contributor

That's a first time I see scanned data was packaged as JPEG (vs JBIG2 or CCITT). We probably didn't see this issue early since it's probably a rare case. As mentioned in #7041 (comment) above, we already do this for most images except JPEG (e.g. https://github.com/mozilla/pdf.js/blob/master/src/display/canvas.js#L2081). The trade-off is that more memory and CPU are used. If we decide to move JPEG decoding to the worker side, we won't have to worry about code duplication.
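To put rough numbers on the memory side of that trade-off: an RGBA buffer costs width x height x 4 bytes, and each halving step adds a buffer about a quarter the size of the previous one. The sketch below is a hypothetical estimate for the 2508 x 3525 JPEG from this thread, assuming 4 bytes per pixel, an arbitrary 400 x 562 target, and that all intermediate buffers are alive at once (a worst case; an implementation can free each one after use).

```javascript
// Hypothetical estimate of the extra memory used by multi-step downscaling.
// Assumes 4 bytes per pixel (RGBA) and that every intermediate buffer is
// alive at the same time, which is a worst case.
function intermediateBytes(srcWidth, srcHeight, dstWidth, dstHeight) {
  let w = srcWidth;
  let h = srcHeight;
  let total = 0;
  // Halve while either axis still needs more than a 2x reduction.
  while (w > 2 * dstWidth || h > 2 * dstHeight) {
    if (w > 2 * dstWidth) w = Math.ceil(w / 2);
    if (h > 2 * dstHeight) h = Math.ceil(h / 2);
    total += w * h * 4;
  }
  return total;
}

const MiB = 2 ** 20;
const full = 2508 * 3525 * 4; // the decoded 2508 x 3525 image itself
const extra = intermediateBytes(2508, 3525, 400, 562); // arbitrary target
console.log(`${(full / MiB).toFixed(1)} MiB for the decoded image`); // 33.7
console.log(`${(extra / MiB).toFixed(1)} MiB for intermediate steps`); // 10.5
```

Because each step is about a quarter of the one before, the overhead stays below about a third of the decoded image no matter how many steps are needed; here it is roughly 10.5 MiB on top of 33.7 MiB, plus the CPU time for the extra draws.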

@fkaelberer
Contributor

That's a first time I see scanned data was packaged as JPEG (vs JBIG2 or CCITT). We probably didn't see this issue early since it's probably a rare case.

It affects not only text but also images.
From #2739:
[Screenshots: nvidia_resized, nvidia_resampled]

http://www.ikea.com/at/de/assembly_instructions/rakke-kleiderschrank__AA-808506-1.pdf (notice the thumbnails)
[Screenshots: ikea_resize, ikea_resample]

@nicholassinggih
Author

Hi Felix or anyone,

Is there a chance that you could implement a better downsampling algorithm in the paintJpegXObject method? Some of my users are still complaining about the image quality. Even with the gradual scaling that Felix added, scanned (JPEG) images of handwritten documents are sometimes too faded or blurry to read.

Volume 1 of 4 43_Redacted.pdf
Volume 1 of 4 6_Redacted.pdf

This is all I know about the user's machine:
OS: Windows 7 SP1
Browser: Google Chrome 54.0.2840.71 m
pdf.js: Felix's version, fkaelberer@913d3bc
Graphics card: GeForce GT610
Monitor resolution: 1680 x 1050

I am currently trying to implement a Lanczos or sinc downscaling algorithm in paintJpegXObject. But I don't know how bad it will affect the performance, and not sure if it is going to fix the problem.
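For reference, the Lanczos filter weights source samples with L(x) = sinc(x) * sinc(x/a) for |x| < a and 0 otherwise, where sinc(x) = sin(πx)/(πx). When shrinking by a factor s > 1, the kernel has to be stretched by s so that every source pixel contributes; an unstretched kernel only looks at a few pixels around each output sample and misses the rest. Below is a minimal, hypothetical 1-D sketch with a = 3 for one row of grayscale values (a 2-D version would run it over rows, then columns). It is not code from pdf.js or from the commit mentioned above.

```javascript
// Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
function sinc(x) {
  if (x === 0) return 1;
  const px = Math.PI * x;
  return Math.sin(px) / px;
}

// Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) inside |x| < a.
function lanczos(x, a) {
  return Math.abs(x) < a ? sinc(x) * sinc(x / a) : 0;
}

// Hypothetical 1-D Lanczos resampler for one row of grayscale samples.
// When shrinking (scale > 1), the kernel is widened by `scale`.
function lanczosResampleRow(src, dstLen, a = 3) {
  const scale = src.length / dstLen;
  const stretch = Math.max(scale, 1);
  const support = a * stretch; // kernel radius in source pixels
  const dst = new Float64Array(dstLen);
  for (let i = 0; i < dstLen; i++) {
    // Center of output pixel i, mapped into source pixel coordinates.
    const center = (i + 0.5) * scale - 0.5;
    const lo = Math.max(0, Math.ceil(center - support));
    const hi = Math.min(src.length - 1, Math.floor(center + support));
    let sum = 0;
    let weightSum = 0;
    for (let j = lo; j <= hi; j++) {
      const w = lanczos((j - center) / stretch, a);
      sum += w * src[j];
      weightSum += w;
    }
    // Normalize so the weights sum to 1 (keeps flat areas flat).
    dst[i] = weightSum !== 0 ? sum / weightSum : 0;
  }
  return dst;
}

// A flat row stays flat: 12 samples of 100 shrunk to 4 samples (each ≈ 100).
console.log(Array.from(lanczosResampleRow(new Array(12).fill(100), 4)));
```

Lanczos can overshoot near sharp edges (ringing), so a real implementation would clamp the results to 0-255 before writing them back into an image, and its per-pixel cost grows with the downscale factor.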

Really appreciate your help.

@yurydelendik
Contributor

But I don't know how bad it will affect the performance, and not sure if it is going to fix the problem.

@nicholassinggih if you find a downsampling algorithm that we can use and that works, we will provide pointers on how to improve its performance (e.g. via asm.js).

@nicholassinggih
Author

@yurydelendik I've tried scaling down the JPEGs with bicubic interpolation instead. The result is not as good as @fkaelberer's solution, and the performance is also worse.

This problem has proven to be the most difficult challenge in developing the current application. I'm now looking into simply creating a modified copy of the PDF file with all the pages and images resized to the desired dimensions. Since my users don't actually zoom in and out dynamically when viewing the pages, this could work.

@natarajnattu

var options = options || {
  scale: 1
};

Increase the scale and you will see improved clarity.
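One reason a larger scale helps for scanned pages like this one: the canvas is roughly (page size in PDF points) x scale pixels wide, so a higher scale means the embedded image is shrunk by a smaller factor. A hypothetical sketch, assuming the image spans the full page width, that the page is 595 pt (A4) wide (the real page size of the attached PDF is not stated here), and that the canvas is not shrunk again by CSS:

```javascript
// Hypothetical helper: how strongly a full-width embedded image is shrunk
// when a page is rendered at a given scale. Assumes, as with a pdf.js
// viewport, that one unit at scale 1 is one PDF point, and that the canvas
// is not resized again by CSS afterwards.
function imageShrinkFactor(imageWidthPx, pageWidthPt, scale) {
  const renderedWidthPx = pageWidthPt * scale; // canvas pixels across the page
  return imageWidthPx / renderedWidthPx;
}

// The 2508 px wide scan on a page assumed to be 595 pt (A4) wide.
for (const scale of [0.5, 1, 2]) {
  console.log(`scale ${scale}: shrink ${imageShrinkFactor(2508, 595, scale).toFixed(2)}x`);
}
```

Doubling the scale halves the shrink factor but quadruples the number of canvas pixels, so memory use and rendering time grow accordingly.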

@wotzhs

wotzhs commented Feb 13, 2018

Hey guys, not sure if this applies here, but I have been getting blurry text from the rendered canvas as well, and I was wondering whether this is an HTML5 canvas issue rather than a pdf.js one, until I stumbled on:
https://www.html5rocks.com/en/tutorials/canvas/hidpi/

Before accounting for window.devicePixelRatio, the PDF looks like this:
[Screenshot 23]
After accounting for window.devicePixelRatio, the PDF looks like this:
[Screenshot 24]
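The technique from that article boils down to sizing the canvas backing store in device pixels while keeping its CSS size in CSS pixels, and scaling all drawing by the same ratio. Below is a minimal, hypothetical helper (`hidpiCanvasSize` is an illustrative name) that computes those values. The browser calls in the comments are only a rough outline, and the `transform` render parameter should be checked against the pdf.js version in use.

```javascript
// Hypothetical helper following the HiDPI canvas technique: size the canvas
// backing store in device pixels, keep its CSS size in CSS pixels, and scale
// drawing by the same ratio.
function hidpiCanvasSize(cssWidth, cssHeight, devicePixelRatio) {
  const ratio = devicePixelRatio || 1; // fall back to 1 if undefined or 0
  return {
    // Backing-store size in device pixels (canvas.width / canvas.height).
    width: Math.floor(cssWidth * ratio),
    height: Math.floor(cssHeight * ratio),
    // Displayed size in CSS pixels (canvas.style.width / style.height).
    styleWidth: `${cssWidth}px`,
    styleHeight: `${cssHeight}px`,
    // Transform to apply before drawing, e.g. ctx.setTransform(...transform).
    transform: [ratio, 0, 0, ratio, 0, 0],
  };
}

// In a browser, this would be used roughly as:
//   const s = hidpiCanvasSize(612, 792, window.devicePixelRatio);
//   canvas.width = s.width; canvas.height = s.height;
//   canvas.style.width = s.styleWidth; canvas.style.height = s.styleHeight;
// and then, for pdf.js, passing `transform: s.transform` to page.render().
console.log(hidpiCanvasSize(612, 792, 2));
```

On a display with devicePixelRatio 2, this gives a 1224 x 1584 backing store shown at 612 x 792 CSS pixels, so each CSS pixel is backed by four device pixels instead of being upscaled from one.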

@yurydelendik
Contributor

@wotzhs yes, the pdf.js demo viewer relies on devicePixelRatio to increase the canvas's pixel density relative to CSS pixels.

@532910

532910 commented Apr 12, 2018

Is this a duplicate of #2750?

@nowherenearithaca

@yurydelendik I am confused. Are you saying that the existing pdf.js should already handle devicePixelRatio as that article suggests? If so, do you happen to know when it started doing that? I am seeing blurriness on Wix, which seems to use pdfjs-dist version 2.305 from Feb 1, 2018, and I'm wondering whether they need to upgrade or make use of devicePixelRatio themselves in their use of the library.
