performance degradation #401
Please see
I need to add performance testing to my qpdf release and test process. Would you be willing to share any of what you've built already? I would like to see if I can reproduce your results.
Hi, use it like this, for example:

Once the different versions are available in the ~/installManuel/bin directory (each with its respective libraries), I launch the test with the second script. Simply use it like this:

(On my machine it writes to a ramdisk to avoid disk latency.)

My PDF is a private customer PDF, so I can't provide it, but I ran different tests with larger and smaller PDFs, and the behaviour is always the same. With some PDFs, the relative difference between qpdf revisions is worse than the result above (about 350% longer for 9.1.1 vs 7.1.1), for example: this PDF

I hope this is not too much of a mess and that it helps :-). I will check --preserve-unreferenced-resources and tell you afterwards.
@ccdric Any news on whether
I am making good progress on tracking down the root causes of the degradation.
@ccdric My work branch contains fixes that have improved the performance of page splitting to a level that's about 30% worse than 7.1.1 but beats 8.2.1 by a significant margin and is about a 70% improvement over 9.1.1 (meaning 9.1.1 is 70% worse than my current code, which is 30% worse than 7.1.1). I'm not sure I will be able to get it much below this.

Two of the commits that slowed the performance relative to 7.1.1 are important bug fixes. In one case, qpdf was generating invalid output for a file that contained an indirect reference to a non-existent object. The PDF spec explicitly allows this, but qpdf could in some cases overwrite such an object. These files are rare, but unfortunately I can't ignore this case, and detecting it incurs about an 8% overhead. In the second case, qpdf was allocating too much memory for arrays that are "sparse", which is a pattern that sometimes shows up. I made a new implementation for the qpdf array that handles sparse arrays better, but unfortunately it's a little slower than using a vector. However, I tweaked it to bring it down a bit from 9.1.1, so it only adds about a 5% overhead.

The other thing I haven't reverted is a change I made to significantly improve the diagnostics for invalid objects. This change adds about a 3% overhead. I tried adding some code that would allow users to turn this off, but once the code is refactored to make this selectable at runtime, turning it off only saves about 1.5%, and the value of the diagnostic messages is very great. My ability to help people with problems in their PDFs would be greatly reduced if I removed this.

So, when I release qpdf 10.0.0, you can count on its performing better than any release since 7.1.1. It will not quite reach the 7.1.1 level of performance, but the additional slowness comes with more robustness and much better diagnostics. Hopefully it's a tolerable trade-off, especially since this is going to be vastly superior to 9.1.1.
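To make the chained percentages concrete: taking the 7.1.1 page-splitting time as $T$, and reading "X% worse" as a multiplicative factor of $1 + X/100$ (an interpretation, not something stated explicitly above), the figures work out roughly as:

```latex
\begin{aligned}
t_{\text{work branch}} &\approx 1.3\,T \\
t_{9.1.1} &\approx 1.7\, t_{\text{work branch}} \approx 1.7 \times 1.3\,T \approx 2.2\,T
\end{aligned}
```

Under that reading, 9.1.1 takes a little over twice as long as 7.1.1 on this workload.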
I am also going to bake some performance benchmarks into my release process so that I will not accidentally break performance as I have done in the past few years. Hopefully, over time, there will be opportunities for further optimization.
Note that this performance is obtained with
Thanks for your work, Jay! Do you know roughly when you plan to release 10.0.0?
I am hoping to get 10.0.0 out today, but it should be no later than early next week. I have a handful of other issues to look at before I get it out the door. I'm going to go ahead and close this since I have probably squeezed about as much out as I can without major, high-risk work. Thanks again for sharing your findings and technique and opening my eyes to the severity of this issue. Before I release 10, I will have a simple performance benchmarking procedure in my release process. I don't have a way to run it in CI at this time, but I will do it manually just like I do for binary compatibility testing. Also I will have a very easy way to test whether a specific commit had an impact on performance. So this should be the end of surprise performance degradation. Feel free to comment and/or reopen if necessary.
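One simple way to check whether a given commit affected performance is to time the same job several times on a baseline build and on the candidate build, then flag any slowdown beyond a tolerance. The sketch below is illustrative only: it is not qpdf's actual release procedure, and the function names and the 5% default tolerance are hypothetical.

```python
import statistics


def relative_change(baseline, candidate):
    """Return the fractional change of the candidate's mean time vs. the baseline's.

    0.30 means the candidate is 30% slower; -0.10 means it is 10% faster.
    """
    base = statistics.mean(baseline)
    return (statistics.mean(candidate) - base) / base


def is_regression(baseline, candidate, tolerance=0.05):
    """Flag the candidate if it is slower than the baseline by more than `tolerance`.

    The 5% default is a hypothetical choice; it should exceed the
    run-to-run noise observed on the benchmark machine.
    """
    return relative_change(baseline, candidate) > tolerance
```

With only a few noisy samples, a fixed threshold can misfire in either direction, so the tolerance needs to be set above the variation seen between repeated runs of an unchanged build.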
@ccdric In qpdf 10, qpdf will analyze files when run with
Hello,
I have some performance issues with qpdf. Some processes take a very long time, several hours.
I use qpdf (in fact libqpdf directly) to split big PDFs into individual documents (sometimes also adding an underlay).
I did a little test with the qpdf command-line tool across different revisions.
For simplicity, I chose to split into pages with qpdf --split-pages under Linux.
To do that, I compiled the different versions and installed the shared library, qpdf, and the wrapper into separate locations so that I could use the different versions at the same time.
Here is the result:
The methodology was simply to run a script that does the measurement (Linux time command, user time only). It performs 20 measurements for each revision, alternating between revisions, and appends the results to a CSV file. Then I calculated the average time for each version.
(I left my computer alone and disconnected the network during the test.)
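The measurement procedure described above (interleaved runs, user CPU time only, results appended to a CSV, then averaged per version) can be sketched in Python roughly as follows. This is not the script used for these results, which isn't included here; the function names, the CSV columns, and the use of resource.getrusage(RUSAGE_CHILDREN) in place of the Linux time command are all assumptions of the sketch. It is Unix-only because of the resource module.

```python
import csv
import resource
import statistics
import subprocess
from collections import defaultdict


def user_time_of(cmd):
    """Run cmd to completion and return the user CPU seconds it consumed.

    RUSAGE_CHILDREN accumulates over every terminated, waited-for child
    (including grandchildren such as qpdf launched from a wrapper script),
    so the cost of this one run is the difference before and after it.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


def benchmark(commands, runs, csv_path):
    """Time every version `runs` times, interleaving versions, and log to CSV.

    `commands` maps a version label to the argv list that runs that version.
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["run", "version", "user_seconds"])
        for run in range(runs):
            # Alternate between versions on every iteration so that slow drift
            # in machine state affects all versions roughly equally.
            for version, cmd in commands.items():
                writer.writerow([run, version, user_time_of(cmd)])


def averages(csv_path):
    """Return {version: mean user seconds} from a CSV written by benchmark()."""
    samples = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            samples[row["version"]].append(float(row["user_seconds"]))
    return {version: statistics.mean(times) for version, times in samples.items()}


# Example usage (hypothetical paths; one wrapper per installed qpdf version,
# and each output directory must already exist):
# commands = {
#     v: [f"/path/to/qpdf-{v}", "--split-pages", "input.pdf", f"/tmp/split-{v}/page.pdf"]
#     for v in ["7.1.1", "8.2.1", "9.1.1"]
# }
# benchmark(commands, runs=20, csv_path="times.csv")
# print(averages("times.csv"))
```

Interleaving the versions inside each run, rather than running all 20 repetitions of one version before moving to the next, keeps background effects such as thermal throttling or cache state from systematically favouring whichever version happens to run first.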
We can see performance decreasing as the qpdf version increases.
Is there a way to improve performance in the next version? Or is there a way, with the current version, to code a more efficient page split and underlay/overlay?
(Some of my jobs can run for several hours, which is far too long for us.)