Extremely slow rendering on this PDF #2813

Closed
qlum opened this issue Feb 26, 2013 · 15 comments

Comments

@qlum

qlum commented Feb 26, 2013

This particular PDF takes more than 10 minutes to load in pdf.js on Firefox 20, whereas Chrome's built-in PDF reader loads the same file in about 5 seconds. While Firefox is loading the file it is also slow and unresponsive, and zooming in after it has loaded is a big pain as well.
https://www.prorail.nl/sites/default/files/spoorkaart_prorail_april_2013.pdf

@waddlesplash
Contributor

It only takes about 2.5 minutes to draw for me, but then again, I have a mid-range GPU for hardware acceleration. Still, this is far too slow, and Firefox uses 863 MB of RAM after it finishes.

@timvandermeij
Contributor

Apart from the slowness, the grid lines are also not rendered. However, I think that is already discussed in other issues, since we have seen it before.

@timvandermeij
Contributor

[Edited the original issue with a new link]

@timvandermeij
Contributor

This has improved a tiny bit with the grouping PR, but I think we can improve this much more with even more grouping as suggested by Yury.

@p01
Contributor

p01 commented May 16, 2014

I looked at this document yesterday. Rendering time has gone from 16.5 s down to 13.5 s. I should have a PR ready early next week.

@p01
Contributor

p01 commented May 20, 2014

Here is the PR #4817

@Hengjie
Contributor

Hengjie commented May 26, 2014

@timvandermeij What grouping PR are you referring to?

@Snuffleupagus
Collaborator

> What grouping PR are you referring to?

#4683 (and the work-in-progress patch in https://github.com/yurydelendik/pdf.js/compare/groupshapes).

@yurydelendik
Contributor

Some of the progress is blocked by https://bugzilla.mozilla.org/show_bug.cgi?id=1026009 (Skia backend: 7.5 s vs. Core Graphics: 10.1 s).

yurydelendik removed this from the 2014 Q2 milestone Jun 19, 2014
@nnethercote
Contributor

I have a partial explanation of why the memory usage is high. This PDF contains a StreamsSequenceStream that contains 8 FlateStreams. Each of those sub-streams is 2.9 MiB when decompressed. StreamsSequenceStream concatenates the bytes from each sub-stream into a single buffer, ending up with a 23.2 MiB buffer.

But because of the way DecodeStream uses a doubling growth strategy, each sub-stream has to grow from 1 MiB to 2 MiB to 4 MiB to hold that 2.9 MiB of data. Furthermore, the StreamsSequenceStream starts with a buffer of a mere 512 bytes and has to double it all the way up to 32 MiB in order to hold that 23.2 MiB of data.
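To make the growth pattern concrete, here is a small TypeScript model of a buffer that doubles its capacity whenever it runs out of room. It is a hypothetical illustration, not pdf.js's actual DecodeStream or StreamsSequenceStream code; the helper names `doublingCapacities` and `totalAllocated` are made up, and the sizes are the ones quoted above.

```typescript
// Illustrative sketch only: a generic doubling-growth model, not pdf.js's
// DecodeStream code. Sizes are taken from the comment above.
const MiB = 1024 * 1024;

// Every capacity a doubling buffer passes through, starting at `initial`
// bytes, until it can hold `required` bytes.
function doublingCapacities(initial: number, required: number): number[] {
  if (initial <= 0) throw new RangeError("initial capacity must be positive");
  const capacities = [initial];
  let size = initial;
  while (size < required) {
    size *= 2;
    capacities.push(size);
  }
  return capacities;
}

// Sum of every array allocated while growing; the discarded ones remain
// garbage until the collector reclaims them.
function totalAllocated(capacities: number[]): number {
  return capacities.reduce((sum, c) => sum + c, 0);
}

// Each sub-stream: 1 MiB -> 2 MiB -> 4 MiB to hold ~2.9 MiB.
const sub = doublingCapacities(1 * MiB, 2.9 * MiB);

// The concatenated sequence: 512 B doubling up to 32 MiB to hold ~23.2 MiB.
const required = 8 * 2.9 * MiB;
const seq = doublingCapacities(512, required);
const finalSeq = seq[seq.length - 1];

console.log(sub.map((c) => c / MiB));                    // [ 1, 2, 4 ]
console.log(finalSeq / MiB);                              // 32
console.log(((finalSeq - required) / MiB).toFixed(1));    // unused slack: "8.8"
console.log((totalAllocated(seq) / MiB).toFixed(1));      // "64.0"
```

The model shows two costs of doubling: the final buffer can be up to almost twice the data it holds (here about 8.8 MiB of slack), and the intermediate arrays add up to roughly the size of the final one before they are collected.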

I played around with this code and found that resetting the StreamsSequenceStream buffer for each sub-stream, thus preventing the doubling up to 32 MiB, dropped peak RSS from ~530 MiB to ~430 MiB. The way I did this was not safe in general, and I can't even reproduce it now, but it did help a lot.

This is a general problem with DecodeStreams. We decode each one entirely into memory, and they can be large. But in practice we mostly traverse them linearly, and once a section has been read, it's not looked at again. (FlateStream is an exception; it involves a 32 KiB window of data that can be back-referenced.) I looked into discarding bytes once they've been read, but it's difficult. Although we rarely read bytes more than once from a DecodeStream's buffer, it does happen; it's hard to determine conclusively where, and I failed to come up with a safe, general way of doing this. I'd love to hear if others have ideas about it, though.
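The idea of discarding consumed bytes while keeping a back-reference window can be sketched as a forward-only buffer. This is a hypothetical TypeScript class, not a pdf.js API; `WindowedBuffer`, `consumedUpTo` and `byteAt` are invented names, and the window size is a parameter rather than anything taken from pdf.js. It also shows the hazard described above: once a caller declares a position consumed, any later read before the retained window fails, so the approach is only safe if every reader really never looks further back than the window.

```typescript
// Hypothetical sketch, not pdf.js code: a byte buffer that drops data a
// reader has finished with, keeping only a fixed look-back window
// (analogous to the 32 KiB window a Flate decoder can back-reference).
class WindowedBuffer {
  private buf: Uint8Array;
  private base = 0; // absolute stream offset of buf[0]
  private end = 0;  // number of valid bytes currently held in buf

  constructor(private readonly windowSize: number, initialCapacity = 1024) {
    this.buf = new Uint8Array(Math.max(1, initialCapacity));
  }

  // Absolute offset of the oldest byte still retained.
  get retainedStart(): number {
    return this.base;
  }

  // Append newly decoded bytes, doubling the backing array if needed.
  append(chunk: Uint8Array): void {
    const needed = this.end + chunk.length;
    if (needed > this.buf.length) {
      let cap = this.buf.length;
      while (cap < needed) cap *= 2;
      const bigger = new Uint8Array(cap);
      bigger.set(this.buf.subarray(0, this.end));
      this.buf = bigger;
    }
    this.buf.set(chunk, this.end);
    this.end = needed;
  }

  // Read one byte by absolute stream offset.
  byteAt(pos: number): number {
    const i = pos - this.base;
    if (i < 0) throw new RangeError(`offset ${pos} was already discarded`);
    if (i >= this.end) throw new RangeError(`offset ${pos} is not decoded yet`);
    return this.buf[i];
  }

  // The caller promises it will never read before `pos - windowSize` again.
  // Older bytes are shifted out so later appends can reuse the space;
  // the backing array itself is not shrunk here.
  consumedUpTo(pos: number): void {
    const keepFrom = Math.min(
      this.base + this.end,
      Math.max(this.base, pos - this.windowSize),
    );
    const drop = keepFrom - this.base;
    if (drop <= 0) return;
    this.buf.copyWithin(0, drop, this.end);
    this.end -= drop;
    this.base = keepFrom;
  }
}

// Usage: keep a 4-byte window behind the read position.
const wb = new WindowedBuffer(4);
wb.append(new Uint8Array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]));
wb.consumedUpTo(8);            // offsets 0..3 are dropped
console.log(wb.retainedStart); // 4
console.log(wb.byteAt(4));     // 14
// wb.byteAt(3) would now throw: a late re-read is exactly the unsafe case.
```

The hard part in practice is not the buffer itself but proving that no caller ever reads behind the window, which this sketch simply assumes.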

@nnethercote
Contributor

And the time (on my Linux machine, at least) is mostly spent drawing the text. If I comment out the fillText() call in CanvasGraphics.prototype.showText(), the time to render the document drops from 22.1 s to 6.5 s.

@nnethercote
Contributor

The textLayer accounts for a lot of the memory usage, too.

@nnethercote
Contributor

http://mta.maryland.gov/sites/default/files/MTA-Regional-Transit_0.pdf is a similar document -- a very detailed public transportation map.

@timvandermeij
Contributor

Closing, since this PDF file now renders in around six seconds thanks to the performance optimizations made since this issue was opened, which puts performance on par with, e.g., Okular and PDFium, according to the original report. If there is anything more we can do, a separate issue should be opened with the exact details of what can be improved.

@teaalltr

@timvandermeij It still loads very slowly on 77.0.1.


9 participants