Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance: Chinese PDF #4580

Closed
bthorben opened this issue Apr 9, 2014 · 10 comments
Closed

Slow performance: Chinese PDF #4580

bthorben opened this issue Apr 9, 2014 · 10 comments

Comments

@bthorben
Copy link
Contributor

bthorben commented Apr 9, 2014

@timvandermeij
Copy link
Contributor

There are also many console warnings:

"Warning: Could not find a preferred cmap table." pdf.worker.js:200
"Warning: Unsupported feature "font"" pdf.worker.js:200
"Warning: Unsupported feature "font"" pdf.js:200
"Warning: Error during font loading: cmapMappings is undefined" pdf.js:200

@fkaelberer
Copy link
Contributor

FYI: ~50% of the time is spent for the text layer, and the rest is bottlenecked by
for (charcode in cMap) in src/core/fonts.js#L3847. Changing this to a regular for loop reduces overall time by ~40% (when textLayer=off). But since I don't know how cMap is structured, I'll leave that to someone else.

EDIT: fixed source code link.

@bthorben
Copy link
Contributor Author

@brendandahl: Could you have a look?

@timvandermeij
Copy link
Contributor

Brendan is hiking at the moment, so perhaps @yurydelendik can take a look at this?

@bthorben
Copy link
Contributor Author

@p01: Maybe you can have a look? It's probably not that complicated

@p01
Copy link
Contributor

p01 commented May 13, 2014

Oy! in is really bad for performance. I'll look into that once I'm done with #4764

@timvandermeij
Copy link
Contributor

in still needs to be fixed, but I have closed my PR because the solution became too complicated. Feel free to look into this.

@nnethercote
Copy link
Contributor

The link in the first comment is dead. Does anybody have a copy of the document?

@yurydelendik
Copy link
Contributor

nnethercote added a commit to nnethercote/pdf.js that referenced this issue Jul 30, 2014
This change avoids the element stringification caused by for..in for the
vast majority of CMaps.

When loading the PDF from issue mozilla#4580, this change reduces peak RSS from ~650
to ~600 MiB, and improves overall speed by ~20%, from 902 ms to 713 ms.  Other
CMap-heavy documents will also see improvements.
nnethercote added a commit to nnethercote/pdf.js that referenced this issue Jul 30, 2014
This change avoids the element stringification caused by for..in for the
vast majority of CMaps.

When loading the PDF from issue mozilla#4580, this change reduces peak RSS from ~650
to ~600 MiB, and improves overall speed by ~20%, from 902 ms to 713 ms.  Other
CMap-heavy documents will also see improvements.
nnethercote added a commit to nnethercote/pdf.js that referenced this issue Aug 1, 2014
cid chars are 16-bit unsigned integers. Currently we convert them to
single-char strings when inserting them into the CMap, and then convert
them back to integers when extracting them from the CMap. This patch
changes CMap so that cid chars stay in integer format throughout, saving
both time and space.

When loading the PDF from issue mozilla#4580, this change reduces peak RSS from
~600 to ~370 MiB. It also improves overall speed on that PDF by ~26%,
going from 724 ms to 533 ms.
nnethercote added a commit to nnethercote/pdf.js that referenced this issue Aug 1, 2014
cid chars are 16-bit unsigned integers. Currently we convert them to
single-char strings when inserting them into the CMap, and then convert
them back to integers when extracting them from the CMap. This patch
changes CMap so that cid chars stay in integer format throughout, saving
both time and space.

When loading the PDF from issue mozilla#4580, this change reduces peak RSS from
~600 to ~370 MiB. It also improves overall speed on that PDF by ~26%,
going from 724 ms to 533 ms.
fkaelberer pushed a commit to fkaelberer/pdf.js that referenced this issue Aug 1, 2014
This change avoids the element stringification caused by for..in for the
vast majority of CMaps.

When loading the PDF from issue mozilla#4580, this change reduces peak RSS from ~650
to ~600 MiB, and improves overall speed by ~20%, from 902 ms to 713 ms.  Other
CMap-heavy documents will also see improvements.
nnethercote added a commit to nnethercote/pdf.js that referenced this issue Aug 5, 2014
IdentityCMap uses an array to represent a 16-bit unsigned identity
function. This is very space-inefficient, and some files cause multiple
IdentityCMaps to be instantiated (e.g. the one from mozilla#4580 has 74).

This patch make the representation implicit.

When loading the PDF from issue mozilla#4580, this change reduces peak RSS from
~370 to ~280 MiB. It also improves overall speed on that PDF by ~30%,
going from 522 ms to 366 ms.
@timvandermeij
Copy link
Contributor

Closing as fixed by #6590.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants