Profile-Guided Optimization (PGO) benchmark report #3456
Hi!
Recently I tested optimizations such as Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on multiple projects. The results are available here. According to these tests, such optimizations improve performance in many cases for many applications. I think enabling them for libjxl could be a good idea. I read an article on Phoronix about a new JPEG encoding/decoding library, Jpegli, and decided to optimize it with PGO. I have already run some benchmarks and want to share my results here. Hopefully, they will be helpful.
Test environment
main branch on commit 680d0e38683b6485e39807772c579252fe91f3a4
Benchmark
I didn't find a good benchmark suite to evaluate performance gains on a large dataset. Instead, I use these image samples. In all cases, a 30 MiB image is used, and the library is configured with
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DENABLE_JPEGLI_DEFAULT=ON ..
For the PGO training phase, the additional flag -fprofile-generate is passed to the compiler; for the PGO optimization phase, the -fprofile-use flag is passed instead. The PGO training phase is done with the following command:
cjpegli Sample-png-image-30mb.png converted.jpeg -q 90
where cjpegli is Jpegli's encoder and Sample-png-image-30mb.png is an input image.
All tests are done on the same machine, repeated multiple times, with the same background "noise" (as far as I can guarantee, of course); the results are reproducible at least on my machine. taskset -c 0 is used for better stability across runs (to reduce OS scheduler influence).
Results
Here are the results:
I also tested the case where the training and actual workloads differ. Here are the results for the PGO-optimized build compared to a regular Release build when a different sample image is used (not the one used during the training phase): https://gist.github.com/zamazan4ik/4750fa6424a53e83638f4ab422f901a9
At least in the simple benchmarks above, PGO delivers better performance.
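For anyone who wants to reproduce the measurements, the steps from the Benchmark section can be sketched as the shell helpers below. This is only a rough sketch under assumptions that are not stated above: GCC is the compiler, every step runs from the same build directory inside the libjxl checkout, the flags are passed through CMAKE_C_FLAGS and CMAKE_CXX_FLAGS, and the encoder binary ends up at ./tools/cjpegli. The function names are made up for illustration. With Clang, the raw profiles would additionally have to be merged with llvm-profdata and the merged file passed as -fprofile-use=<file>.

```shell
#!/bin/sh
# Hypothetical helpers for the PGO cycle: instrumented build, training run,
# optimized rebuild, timed runs. Nothing is executed until a function is called.

# Print the compiler flag for a phase: "train", "use", or anything else
# for a plain Release build without PGO.
pgo_flag() {
    case "$1" in
        train) echo "-fprofile-generate" ;;
        use)   echo "-fprofile-use" ;;
        *)     echo "" ;;
    esac
}

# Configure and build one phase in the current build directory.
# Assumption: reusing the same directory lets GCC find the .gcda profile
# files that the training run writes next to the object files.
build_phase() {
    flag=$(pgo_flag "$1")
    cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF \
          -DENABLE_JPEGLI_DEFAULT=ON \
          "-DCMAKE_C_FLAGS=$flag" "-DCMAKE_CXX_FLAGS=$flag" .. \
        && cmake --build .
}

# Training workload, as in the report (binary path is an assumption).
train_profile() {
    ./tools/cjpegli Sample-png-image-30mb.png converted.jpeg -q 90
}

# Run the encoder N times (default 5), pinned to CPU 0 with taskset.
bench() {
    runs=${1:-5}
    i=0
    while [ "$i" -lt "$runs" ]; do
        time taskset -c 0 ./tools/cjpegli Sample-png-image-30mb.png converted.jpeg -q 90
        i=$((i + 1))
    done
}
```

A typical sequence would be: build_phase release and bench for the baseline, then build_phase train, train_profile, build_phase use, and bench again for the PGO build.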
Further steps
I can suggest the following action points:
Here are some examples of how PGO optimization is integrated into other projects:
configure script
I have some examples of how PGO information looks in the documentation:
Please do not treat this issue as a bug or something like that. It's just a benchmark report with a possible improvement idea for the project.