Try to enable PGO #25500

zamazan4ik · 2023-02-12T13:58:38Z

Title: Enable PGO for Envoy

Description:
Profile-Guided Optimization (PGO) can improve software performance because the compiler uses runtime profile information to make better optimization decisions. I guess it could be useful for Envoy.

Possible steps:

Prepare an Envoy build with PGO and benchmark it against the non-PGO build. I expect it will boost Envoy's performance.
At least consider adding PGO to CI. Yes, it has a LOT of caveats, such as a large increase in build time, preparing a good profile, keeping the profile stable between releases, and much more, but in my opinion it could be worth it.
For users who want a "cheap" way to boost their Envoy performance, it may be a good idea to leave a note about this "advanced" option somewhere in the Envoy documentation.

Possible future steps for improving:

Try to play with BOLT. BOLT could also squeeze more performance even out of an LTO + PGO build (but it's not guaranteed). It has drawbacks: BOLT is too unstable on some platforms, it does not support some architectures, etc. But it is definitely a good tool to think about :)

[optional Relevant Links:]

Successful PGO case (there are a lot of others): PGO applicability to Vector vectordotdev/vector#15631
How to use PGO with Clang: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

adisuissa · 2023-02-13T16:49:33Z

Thanks for sharing the idea, overall PGO sounds good to me.
There's been some previous discussion about LTO and some bumps that were encountered (see #4159 for example).

zamazan4ik · 2023-02-13T17:04:27Z

Yep, I've seen the discussion about LTO. I just didn't want to mix the LTO and PGO discussions in the same issue. If you think PGO should be discussed as just another default compiler flag, feel free to mention it in #4159. However, I recommend tracking it separately, since PGO requires a bit more work around it.

adisuissa · 2023-02-13T20:57:17Z

I've added the comment to link a related issue/PR so that when someone attempts to build with PGO this may help them.

zamazan4ik · 2023-09-18T23:10:21Z

I just finished some Profile-Guided Optimization (PGO) benchmarks for Envoy and want to share my results.

Test environment

Fedora 38
Linux kernel 6.4.15
AMD Ryzen 9 5900x
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Clang 14 (the compiler provided by Envoy's tooling with docker-clang config)
Envoy version: one of the latest commits (16da1981d89aecdbfcaa3ffedc36879f32d37cdb) from the main branch
Other details: Turbo boost is disabled (I also tested with it enabled, but the results are, obviously, more stable without it)

Benchmark setup

I got the idea for the benchmark from the Rathole benchmark guide for HTTP load, and implemented the same benchmark for Envoy: Benchmark tool -> Envoy -> Nginx.

As a benchmark tool, I use Nighthawk with this command line: taskset -c 4-5 ./nh/nighthawk_client --rps 10000 --duration 300 --connections 4 --concurrency auto --prefetch-connections -v info http://127.0.0.1:8080.

Envoy was tested with this command line: taskset -c 0 ./envoy_static_release_master --concurrency 1 --config-path envoy-demo.yaml. envoy-demo.yaml content is here: https://pastebin.com/QfZi19Nu . I use --concurrency 1 since I want to load Envoy to 100% on 1 core so I can easily measure the maximum throughput and get the difference in max RPS between Release and PGO builds.

taskset is used everywhere just to reduce OS scheduling noise during the measurements. All measurements are done multiple times, on the same hardware/software with the same background load (as much as I can guarantee).

Optimization steps

Release Envoy is built with the command bazel build -c opt envoy --config=docker-clang

Envoy PGO is built in the following steps:

Build Instrumented Envoy with bazel build -c opt --copt="-fprofile-instr-generate=/home/zamazan4ik/open_source/bench_envoy/profiles/envoy_%m_%p.profraw" --cxxopt="-fprofile-instr-generate=/home/zamazan4ik/open_source/bench_envoy/profiles/envoy_%m_%p.profraw" --linkopt="-fprofile-instr-generate=/home/zamazan4ik/open_source/bench_envoy/profiles/envoy_%m_%p.profraw" envoy --config=docker-clang
Run the instrumented Envoy to collect the profile. As a training workload, I used exactly the same workload as in the benchmark
Compile Envoy once again with the collected profiles with bazel build -c opt --copt="-fprofile-instr-use=/execroot/profiles/envoy.profdata" --copt="-Wno-profile-instr-unprofiled" --copt="-Wno-profile-instr-out-of-date" --cxxopt="-fprofile-instr-use=/execroot/profiles/envoy.profdata" --linkopt="-fprofile-instr-use=/execroot/profiles/envoy.profdata" --cxxopt="-Wno-profile-instr-unprofiled" --cxxopt="-Wno-profile-instr-out-of-date" envoy --config=docker-clang
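One step is implicit in the list above: the instrumented binary writes raw .profraw files, while -fprofile-instr-use reads an indexed .profdata file. With Clang this conversion is normally done with llvm-profdata merge. A minimal sketch, assuming llvm-profdata from the same LLVM release as the compiler is on PATH and that the merged file should land in the profiles directory as envoy.profdata. The merge_profiles helper and the PROFILE_DIR and RUN variables are made up for illustration:

```shell
# Directory where the instrumented Envoy wrote its envoy_%m_%p.profraw files.
# Assumption: adjust to your own layout.
PROFILE_DIR="${PROFILE_DIR:-./profiles}"

# Merge all raw profiles in a directory into one indexed profile.
# By default the command is only printed; set RUN=1 to execute it.
merge_profiles() {
  dir="$1"
  if [ "${RUN:-0}" = "1" ]; then
    llvm-profdata merge -output="$dir/envoy.profdata" "$dir"/*.profraw
  else
    echo "llvm-profdata merge -output=$dir/envoy.profdata $dir/*.profraw"
  fi
}

merge_profiles "$PROFILE_DIR"
```

Running it with RUN=1 after the training workload should leave envoy.profdata next to the raw files, which is the file name the final build step refers to.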

In the last step, there is one tricky part: the PGO profile has to be mounted into the container, since I used the Docker build configuration. I solved it by adding this line to the root .bazelrc file: build:docker-sandbox --sandbox_add_mount_pair=/home/zamazan4ik/open_source/bench_envoy/profiles:/execroot/profiles. It can probably be done via the Bazel command line too; I don't know, since I have almost no experience with Bazel.
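Bazel options from a .bazelrc config section can generally also be given directly on the command line, so the mount can probably be expressed as below. This is an untested sketch: the pgo_use_cmd helper and the PROFILES_HOST_DIR variable are hypothetical names, and the script only prints the command so it can be reviewed before running.

```shell
# Host directory that holds envoy.profdata (assumption: same layout as above).
PROFILES_HOST_DIR="${PROFILES_HOST_DIR:-/home/zamazan4ik/open_source/bench_envoy/profiles}"

# Print the PGO "use" build with the mount pair passed as a flag
# instead of a .bazelrc line. $1 is the host-side profiles directory.
pgo_use_cmd() {
  profdata=/execroot/profiles/envoy.profdata
  echo "bazel build -c opt envoy --config=docker-clang" \
    "--sandbox_add_mount_pair=$1:/execroot/profiles" \
    "--copt=-fprofile-instr-use=$profdata" \
    "--cxxopt=-fprofile-instr-use=$profdata" \
    "--linkopt=-fprofile-instr-use=$profdata" \
    "--copt=-Wno-profile-instr-unprofiled" \
    "--copt=-Wno-profile-instr-out-of-date" \
    "--cxxopt=-Wno-profile-instr-unprofiled" \
    "--cxxopt=-Wno-profile-instr-out-of-date"
}

pgo_use_cmd "$PROFILES_HOST_DIR"
```

Whether the flag is honored the same way under the docker-clang config was not verified here, so the .bazelrc line remains the variant known to work.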

Results

In short, I get the following RPS results from Nighthawk:

Release: ~5200 RPS
Release + PGO: ~6300 RPS
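To put the two numbers in relative terms, the gain can be computed directly from the approximate RPS values above. A small sketch; rps_gain is a hypothetical helper, not part of Nighthawk:

```shell
# Relative throughput gain of a PGO build over the baseline, in percent.
# $1 = baseline RPS, $2 = PGO RPS.
rps_gain() {
  awk -v base="$1" -v pgo="$2" 'BEGIN { printf "%.1f%%\n", (pgo - base) / base * 100 }'
}

rps_gain 5200 6300   # prints 21.2%
```

So on this single-core setup the PGO build handles roughly a fifth more requests per second.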

More detailed reports from Nighthawk are available here:

Release: https://pastebin.com/YQRBy0uR
Release + PGO: https://pastebin.com/3KaM76GX

According to these tests, PGO substantially improves Envoy's performance, in both latency and throughput.

Possible further steps

I can suggest the following action points:

Evaluate Link-Time Optimization (LTO). From my experience, LTO shines with PGO and we can get even more performance.
Perform more robust benchmarks in other modes.
Add a note about the PGO performance improvements to Envoy's documentation, so that users and maintainers are more likely to know they can optimize their Envoy builds this way.
Provide an easier way (e.g. a build option) to build Envoy with PGO. This would help end users and maintainers optimize Envoy for their own workloads.
Optimize pre-built Envoy binaries with PGO (if you think the results above are worth it).

Testing Post-Link Optimization techniques (like LLVM BOLT) might be interesting too, but I recommend starting with regular PGO.

Found caveats / interesting details

I tried to use the built-in Bazel support for PGO (in Bazel it's called Feedback-Driven Optimization (FDO)) with bazel build -c opt envoy --fdo_instrument=/home/zamazan4ik/open_source/bench_envoy/profiles --config=docker-clang but got the following errors: https://pastebin.com/8HtsEC26 . I tried to debug it, but it was too complicated to resolve in 10 minutes, so I decided to inject the compiler options directly instead. If you want to use Bazel's native options, you will probably need to resolve these errors first.
In the benchmark I use PGO via instrumentation. Another option is sampling PGO (also known as AutoFDO). I haven't tested AutoFDO yet, but I expect (almost) the same results from it.

Useful links

Many more PGO results on different kinds of software, possible caveats, and tricky moments can be found in my repo here.

zamazan4ik added the enhancement and triage labels Feb 12, 2023

keith added the area/build label Feb 12, 2023

adisuissa added the area/perf and help wanted labels and removed the triage label Feb 13, 2023

This was referenced Sep 20, 2023

Try to apply PGO haproxy/haproxy#2047

Open

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT qinguoyi/TinyWebServer#247

Open

zamazan4ik mentioned this issue Nov 4, 2023

Evaluate using LTO, Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) like LLVM BOLT evilsocket/legba#10

Open

zamazan4ik mentioned this issue Nov 20, 2023

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) on VTS vozlt/nginx-module-vts#283

Open
