Poor performance on NASA Budget #4339
Update on our analysis: PDF.js appears to process the fonts in the document many times, essentially loading each font again for every page. This suggested a caching issue. As a test, we added a cache after the font is translated, and that made the document 10x faster.
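The effect described above can be illustrated with a minimal JavaScript sketch (JavaScript being the language PDF.js is written in). `translateFont` is a hypothetical stand-in for the expensive font translation step, and the 100-page document whose pages all use one font reference `F1` is invented for illustration. Neither reflects actual PDF.js code or the real structure of the NASA budget; the sketch only shows why per-page re-translation is costly and how memoizing on the font reference removes the repeated work.

```javascript
// Hypothetical stand-in for the expensive font translation step.
// It only counts how often it is called.
let translationCount = 0;
function translateFont(fontRef) {
  translationCount += 1;
  return { name: `Translated_${fontRef}` };
}

// Without a cache: every page translates its font again.
function renderPagesUncached(pages) {
  return pages.map((page) => translateFont(page.fontRef));
}

// With a cache keyed by the font reference: each font is translated once.
function makeCachedTranslator() {
  const cache = new Map();
  return function getFont(fontRef) {
    if (!cache.has(fontRef)) {
      cache.set(fontRef, translateFont(fontRef));
    }
    return cache.get(fontRef);
  };
}

function renderPagesCached(pages) {
  const getFont = makeCachedTranslator();
  return pages.map((page) => getFont(page.fontRef));
}

// Hypothetical document: 100 pages that all use the same font.
const pages = Array.from({ length: 100 }, () => ({ fontRef: "F1" }));

translationCount = 0;
renderPagesUncached(pages);
const uncachedCount = translationCount;

translationCount = 0;
renderPagesCached(pages);
const cachedCount = translationCount;

console.log(`uncached: ${uncachedCount}, cached: ${cachedCount}`);
// prints "uncached: 100, cached: 1"
```

The cache trades a small amount of memory (one translated font per distinct reference) for avoiding repeated translation work, which is also why the memory question raised in the thread matters.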
@bthorben Nice! Could you make a pull request for that if it solves the PDF.js issue?
Since there has been a lot of focus on reducing the memory consumption of PDF.js lately, it would also be interesting to know whether, and how, this kind of caching affects memory consumption.
Our "solution" is really just a quick hack, something we added to test our theories about how PDF.js works. The way we cached is actually quite inefficient, and doing it right would probably speed up this document by another factor of 2 to 4. We will spend more time finding an elegant solution.
@Snuffleupagus Can you give us some data that shows the problems with memory consumption? Regarding this issue, caching the fonts instead of generating them many times considerably reduces memory consumption when viewing this document.
Sorry, I don't think I expressed myself clearly enough!
@Snuffleupagus, OK, I see. It would be much nicer if we could have actual benchmarking.
While working on issue mozilla#4339, it was confusing that the code to translate a font is not found in a single place. This commit extracts the code and puts it in a class called FontTranslator.
We analysed the issue further. We wrote a small tool (available here) to gain insight into the document and its object graph. This is the conclusion on which we will base a solution: the document makes use of at least one Type 0 font. Type 0 fonts are basically composed of a CMap, which serves as the font's encoding, and a single descendant CIDFont referenced through the DescendantFonts array.
In this particular case there are many Type 0 fonts (shown at [1]) that use the same CIDFont, as shown by this graph (extracted with our tool from an uncompressed version of the NASA budget):

The node on the left (177065 T6) is the font program of the CIDFont; above it are its FontDescriptor and the CIDFont dictionary. We shortened the graph, but on the right you can see three Type 0 fonts that use this font. The nodes 28, 46 and 10 are the CMap dictionaries, and each references an array as its DescendantFonts that has our CIDFont as its sole entry. This situation shouldn't be that special (I guess it makes sense for a linearised document), but here it gets interesting: the CMaps are all the same, which means that the Type 0 fonts are effectively identical. Because PDF.js currently stores the translated Font object on the Type 0 font node (more precisely, on its parsed dictionary, compare [2]), a separate translated font is created for each of these Type 0 fonts, even though they all share one CIDFont. This is what makes the NASA budget so slow in PDF.js.

[1]
[2]
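The object graph described above, and why caching on the Type 0 dictionary does not help, can be modelled in a small JavaScript sketch. It is a simplified model under stated assumptions, not PDF.js code: the objects are plain JS literals, `translateFont` and `loadFontCachedOnType0` are hypothetical names, and the shared encoding `"Identity-H"` is an assumed value chosen only to show that all three Type 0 fonts use the same CMap.

```javascript
// Simplified model: one CIDFont (with its FontDescriptor and font program)
// is shared by several Type 0 fonts that all use the same CMap.
const fontDescriptor = { fontFile: "<font program bytes>" };
const cidFont = { fontDescriptor };

function makeType0Font(name) {
  return {
    name,
    subtype: "Type0",
    encoding: "Identity-H", // assumption: identical CMap for all of them
    descendantFonts: [cidFont], // the shared CIDFont is the sole entry
  };
}

const type0Fonts = ["A", "B", "C"].map(makeType0Font);

let translations = 0;
function translateFont(type0) {
  // Hypothetical stand-in for translating the embedded font program.
  translations += 1;
  return { loadedName: `g_font_${translations}`, source: type0.descendantFonts[0] };
}

// Caching the result on each Type 0 dictionary: every Type 0 font misses
// the cache, so the shared CIDFont is translated once per Type 0 font.
function loadFontCachedOnType0(type0) {
  if (!type0.translatedFont) {
    type0.translatedFont = translateFont(type0);
  }
  return type0.translatedFont;
}

type0Fonts.forEach(loadFontCachedOnType0);
console.log(translations); // 3, although all three wrap the same CIDFont
```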
Our solution is relatively simple: we create a cache on the FontDescriptor of the CIDFont, indexed by encoding. If the encoding is the same, the expensive font translation is done only once.
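The descriptor-level cache described above can be sketched in JavaScript as follows. This is a minimal illustration under assumptions, not the actual patch: `translatedByEncoding`, `loadFontCachedOnDescriptor` and `translateFont` are hypothetical names, and encodings are represented as plain strings (`"Identity-H"`, `"Identity-V"`) for simplicity.

```javascript
// Shared CIDFont and its descriptor, as in the object graph above.
const fontDescriptor = { fontFile: "<font program bytes>" };
const cidFont = { fontDescriptor };

function makeType0Font(name, encoding) {
  return { name, subtype: "Type0", encoding, descendantFonts: [cidFont] };
}

let translations = 0;
function translateFont(type0) {
  // Hypothetical stand-in for the expensive translation step.
  translations += 1;
  return { loadedName: `g_font_${translations}`, encoding: type0.encoding };
}

function loadFontCachedOnDescriptor(type0) {
  const descriptor = type0.descendantFonts[0].fontDescriptor;
  // Lazily attach a Map to the shared descriptor: encoding -> translated font.
  if (!descriptor.translatedByEncoding) {
    descriptor.translatedByEncoding = new Map();
  }
  const cache = descriptor.translatedByEncoding;
  if (!cache.has(type0.encoding)) {
    cache.set(type0.encoding, translateFont(type0));
  }
  return cache.get(type0.encoding);
}

const fonts = [
  makeType0Font("A", "Identity-H"),
  makeType0Font("B", "Identity-H"),
  makeType0Font("C", "Identity-H"),
  makeType0Font("D", "Identity-V"), // different encoding, translated separately
];
const loaded = fonts.map(loadFontCachedOnDescriptor);

console.log(translations); // 2
console.log(loaded[0] === loaded[2]); // true
```

Keying the cache on the descriptor rather than the Type 0 dictionary is what lets identical Type 0 fonts share one translation, while the encoding key keeps fonts with different CMaps apart.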
This should fix mozilla#4339. We attached an explanation of the idea to the issue.
Different fonts can point to the same font descriptor (see mozilla#4339 for details). With this commit, such fonts are treated as aliases if they also have the same encoding. The corresponding information is stored on the font descriptor. This change must also ensure that aliases always use the same font name, because translated fonts can be cleared depending on the CLEANUP_TIMEOUT setting.
…s aliases
Different fonts can point to the same font descriptor (see mozilla#4339 for details). With this commit, such fonts are treated as aliases if they also have the same encoding and the same toUnicode map. The corresponding information is stored on the font descriptor. This change must also ensure that aliases always use the same font name, because translated fonts can be cleared depending on the CLEANUP_TIMEOUT setting.
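The aliasing rule in the commit message above can be sketched in JavaScript. This is one possible reading, not the code from the commit: `aliases`, `loadedFonts`, `aliasKey` and the `g_font_N` naming scheme are hypothetical, encodings and toUnicode maps are identified by plain strings, and `loadedFonts.clear()` only simulates translated fonts being dropped after CLEANUP_TIMEOUT. The point it shows is that the alias table stores the font *name*, so every alias resolves to the same name even after the translated font has been cleared and rebuilt.

```javascript
// Translated fonts by name; this map may be cleared (e.g. by a cleanup timer).
const loadedFonts = new Map();
let nextId = 0;
let translations = 0;

function aliasKey(font) {
  // Fonts are aliases only if both encoding and toUnicode match.
  return `${font.encoding}|${font.toUnicode}`;
}

function loadFont(font) {
  const descriptor = font.descriptor;
  if (!descriptor.aliases) {
    descriptor.aliases = new Map(); // aliasKey -> loadedName
  }
  const key = aliasKey(font);
  let name = descriptor.aliases.get(key);
  if (name === undefined) {
    nextId += 1;
    name = `g_font_${nextId}`;
    descriptor.aliases.set(key, name);
  }
  // Re-translate only if the font object under this name is missing.
  if (!loadedFonts.has(name)) {
    translations += 1;
    loadedFonts.set(name, { loadedName: name, key });
  }
  return loadedFonts.get(name);
}

const descriptor = {};
const a = loadFont({ descriptor, encoding: "Identity-H", toUnicode: "U1" });
const b = loadFont({ descriptor, encoding: "Identity-H", toUnicode: "U1" });
const c = loadFont({ descriptor, encoding: "Identity-H", toUnicode: "U2" });

loadedFonts.clear(); // simulate translated fonts being cleaned up
const d = loadFont({ descriptor, encoding: "Identity-H", toUnicode: "U1" });

console.log(a === b, a.loadedName === c.loadedName, d.loadedName === a.loadedName);
// prints "true false true"
console.log(translations); // 3
```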
Performance is extremely poor when viewing the NASA 2014 Budget request available at http://www.nasa.gov/pdf/750614main_NASA_FY_2014_Budget_Estimates-508.pdf