Try to enable PGO #25500

Open
zamazan4ik opened this issue Feb 12, 2023 · 4 comments
Labels
area/build area/perf enhancement Feature requests. Not bugs or questions. help wanted Needs help!

Comments

@zamazan4ik

zamazan4ik commented Feb 12, 2023

Title: Enable PGO for Envoy

Description:
Profile-Guided Optimization (PGO) can give software additional performance: the compiler uses runtime profile information to make better optimization decisions during compilation (for example, which functions to inline and how to lay out hot and cold code). I think it could be useful for Envoy.

Possible steps:

  • Build Envoy with PGO and benchmark it against a non-PGO Envoy build. I expect PGO to improve Envoy's performance.
  • At least consider adding PGO to CI. Yes, it has a LOT of caveats, like a huge increase in build time, preparing a good training profile, profile stability between releases, and much more, but in my opinion it could be worth it.
  • Some users may want to boost their Envoy performance "cheaply". Maybe it would be a good idea to leave a note about this "advanced" option somewhere in the Envoy documentation?

Possible future steps for improving:

  • Try BOLT. BOLT can also help to gain more performance, even on top of an LTO + PGO build (though this is not guaranteed). It has drawbacks: BOLT is too unstable on some platforms, it does not support some architectures, etc. But it is definitely a good tool to think about :)
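A rough sketch of the usual BOLT workflow (sample with perf, convert with perf2bolt, rewrite with llvm-bolt), written as a small Python helper that only assembles the command lines. The flags follow the upstream BOLT README as far as I know, but flag spellings have changed between LLVM releases; the file names, the ./envoy binary name, and the requirement that Envoy be linked with -Wl,--emit-relocs are assumptions to check against your setup.

```python
"""Hypothetical helper: command lines for a basic LLVM BOLT pass over a binary.

Assumptions: the binary was linked with -Wl,--emit-relocs, the CPU supports
LBR sampling for `perf record -j any,u`, and the llvm-bolt flags below are
accepted by the installed BOLT version (check `llvm-bolt --help`).
"""
from __future__ import annotations

import shlex


def bolt_commands(binary: str, run_args: list[str],
                  perf_data: str = "perf.data",
                  fdata: str = "perf.fdata") -> list[list[str]]:
    return [
        # 1. Run the binary under perf with branch sampling while a training
        #    load (e.g. Nighthawk) is sent to it from another terminal.
        ["perf", "record", "-e", "cycles:u", "-j", "any,u", "-o", perf_data,
         "--", binary, *run_args],
        # 2. Convert the perf samples into BOLT's profile format.
        ["perf2bolt", "-p", perf_data, "-o", fdata, binary],
        # 3. Rewrite the binary using the profile; the result is <binary>.bolt.
        ["llvm-bolt", binary, "-o", binary + ".bolt", "-data=" + fdata,
         "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
         "-split-functions", "-split-all-cold", "-dyno-stats"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in bolt_commands("./envoy", ["--concurrency", "1",
                                         "--config-path", "envoy-demo.yaml"]):
        print(shlex.join(cmd))
```

The resulting binary would then be benchmarked exactly like the PGO build, so the BOLT gain can be measured on top of it.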


@zamazan4ik zamazan4ik added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Feb 12, 2023
@adisuissa adisuissa added area/perf help wanted Needs help! and removed triage Issue requires triage labels Feb 13, 2023
@adisuissa
Contributor

Thanks for sharing the idea, overall PGO sounds good to me.
There's been some previous discussion about LTO and some bumps that were encountered (see #4159 for example).

@zamazan4ik
Author

Yep, I've seen the discussion about LTO. I just didn't want to mix LTO and PGO in the same issue. If you think PGO should be discussed as yet another default compiler flag, feel free to mention it in #4159. However, I recommend tracking it separately, since PGO requires a bit more work around it.

@adisuissa
Contributor

I've added the comment linking a related issue/PR so that it may help whoever attempts to build with PGO.

@zamazan4ik
Author

I just finished some Profile-Guided Optimization (PGO) benchmarks for Envoy and want to share my results.

Test environment

  • Fedora 38
  • Linux kernel 6.4.15
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TB
  • Compiler: Clang 14 (the compiler provided by Envoy's tooling with the docker-clang config)
  • Envoy version: a recent commit (16da1981d89aecdbfcaa3ffedc36879f32d37cdb) from the main branch
  • Other details: Turbo Boost is disabled (I also tested with it enabled, but without Turbo the results are more stable, as expected)

Benchmark setup

I took the benchmark idea from the Rathole benchmark guide for HTTP load and implemented the same benchmark for Envoy: Benchmark tool -> Envoy -> Nginx.

As a benchmark tool, I use Nighthawk with this command line: taskset -c 4-5 ./nh/nighthawk_client --rps 10000 --duration 300 --connections 4 --concurrency auto --prefetch-connections -v info http://127.0.0.1:8080.

Envoy was tested with this command line: taskset -c 0 ./envoy_static_release_master --concurrency 1 --config-path envoy-demo.yaml. The content of envoy-demo.yaml is here: https://pastebin.com/QfZi19Nu . I use --concurrency 1 because I want to load Envoy to 100% on a single core, which makes it easy to measure the maximum throughput and compare the max RPS of the Release and PGO builds.

taskset is used everywhere just to reduce OS scheduling noise during the measurements. All measurements were done multiple times, on the same hardware/software, with the same background load (as far as I can guarantee).
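One benchmark round with the same CPU pinning can be scripted roughly like this. It is a Python sketch under assumptions: the binaries and ./nh/nighthawk_client are in the current directory, Nginx is already running as Envoy's upstream, and a fixed sleep is enough for Envoy to open its listener on 127.0.0.1:8080 (the address taken from the Nighthawk URL above).

```python
"""Sketch of one benchmark round with the same CPU pinning as in the issue.

Assumptions: binaries in the current directory, Nginx already running behind
Envoy, and a fixed start-up delay long enough for Envoy's listener.
"""
from __future__ import annotations

import subprocess
import time

NIGHTHAWK_CMD = ["taskset", "-c", "4-5", "./nh/nighthawk_client",
                 "--rps", "10000", "--duration", "300", "--connections", "4",
                 "--concurrency", "auto", "--prefetch-connections",
                 "-v", "info", "http://127.0.0.1:8080"]


def envoy_command(binary: str, config: str = "envoy-demo.yaml") -> list[str]:
    # Envoy pinned to core 0 with a single worker thread, as in the issue.
    return ["taskset", "-c", "0", binary, "--concurrency", "1",
            "--config-path", config]


def run_round(binary: str, startup_delay_s: float = 2.0) -> subprocess.CompletedProcess:
    envoy = subprocess.Popen(envoy_command(binary))
    try:
        time.sleep(startup_delay_s)  # crude wait for the listener to come up
        # Nighthawk's report ends up in the captured output of this call.
        return subprocess.run(NIGHTHAWK_CMD, capture_output=True, text=True,
                               check=True)
    finally:
        envoy.terminate()  # SIGTERM, then wait so the next round starts clean
        envoy.wait()
```

Calling run_round with the Release and the PGO binary in alternation, several times each, gives repeated measurements where slow drift in background load affects both builds similarly.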

Optimization steps

The Release Envoy is built with the bazel build -c opt envoy --config=docker-clang command.

Envoy PGO is built in the following steps:

  • Build an instrumented Envoy with bazel build -c opt --copt="-fprofile-instr-generate=/home/zamazan4ik/open_source/bench_envoy/profiles/envoy_%m_%p.profraw" --cxxopt="-fprofile-instr-generate=/home/zamazan4ik/open_source/bench_envoy/profiles/envoy_%m_%p.profraw" --linkopt="-fprofile-instr-generate=/home/zamazan4ik/open_source/bench_envoy/profiles/envoy_%m_%p.profraw" envoy --config=docker-clang
  • Run the instrumented Envoy to collect the profile. As the training workload, I used exactly the same workload as for the benchmark. The raw .profraw files then have to be merged into a single .profdata file (e.g. with llvm-profdata merge) before the next step.
  • Rebuild Envoy with the collected profile: bazel build -c opt --copt="-fprofile-instr-use=/execroot/profiles/envoy.profdata" --copt="-Wno-profile-instr-unprofiled" --copt="-Wno-profile-instr-out-of-date" --cxxopt="-fprofile-instr-use=/execroot/profiles/envoy.profdata" --linkopt="-fprofile-instr-use=/execroot/profiles/envoy.profdata" --cxxopt="-Wno-profile-instr-unprofiled" --cxxopt="-Wno-profile-instr-out-of-date" envoy --config=docker-clang
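The merge step between running the instrumented Envoy and rebuilding can be sketched in Python as below. Assumptions: the raw profiles match the envoy_*.profraw pattern from the -fprofile-instr-generate path above, and the llvm-profdata on PATH comes from the same LLVM release as the compiler (Clang 14 here), since profile formats change between versions. The LLVM profiling runtime writes .profraw files when the process exits normally, so Envoy should be stopped cleanly (e.g. SIGTERM or Ctrl+C) rather than with SIGKILL.

```python
"""Sketch: merge raw PGO profiles into the .profdata used by -fprofile-instr-use.

Assumptions: raw profiles match envoy_*.profraw (the %m_%p pattern) and
llvm-profdata comes from the same LLVM release as the compiler.
"""
from __future__ import annotations

import glob
import os
import subprocess


def find_raw_profiles(profile_dir: str) -> list[str]:
    return sorted(glob.glob(os.path.join(profile_dir, "envoy_*.profraw")))


def merge_command(raw_files: list[str], output: str,
                  llvm_profdata: str = "llvm-profdata") -> list[str]:
    if not raw_files:
        # No .profraw usually means the instrumented Envoy never exited cleanly.
        raise ValueError("no .profraw files to merge")
    return [llvm_profdata, "merge", f"-output={output}", *raw_files]


def merge_profiles(profile_dir: str) -> str:
    output = os.path.join(profile_dir, "envoy.profdata")
    subprocess.run(merge_command(find_raw_profiles(profile_dir), output),
                   check=True)
    return output
```

With the host directory from the instrumented build, merge_profiles("/home/zamazan4ik/open_source/bench_envoy/profiles") writes envoy.profdata next to the raw files, which is where the rebuild expects /execroot/profiles/envoy.profdata once that directory is mounted into the build container.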

In the last step, there is one tricky part: since I used the Docker build configuration, the PGO profile has to be mounted into the container somehow. I resolved it by adding this line to the root .bazelrc file: build:docker-sandbox --sandbox_add_mount_pair=/home/zamazan4ik/open_source/bench_envoy/profiles:/execroot/profiles. It could probably be done via the Bazel command line too; I don't know, since I have almost no experience with Bazel.
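On the open question of doing this from the command line: options in .bazelrc are ordinary Bazel options, so passing --sandbox_add_mount_pair directly should be equivalent, except that it then applies regardless of which config is active. The Python sketch below only assembles the rebuild command from the list above with the mount added; it has not been tested against Envoy's docker-clang setup.

```python
"""Sketch: the PGO rebuild command with the profile mount on the command line.

Assumption (untested with Envoy's docker-clang config): the mount behaves the
same as the `build:docker-sandbox --sandbox_add_mount_pair=...` .bazelrc line.
"""
from __future__ import annotations

import shlex

HOST_PROFILE_DIR = "/home/zamazan4ik/open_source/bench_envoy/profiles"  # from the issue
SANDBOX_PROFILE_DIR = "/execroot/profiles"
SILENCED_WARNINGS = ["-Wno-profile-instr-unprofiled",
                     "-Wno-profile-instr-out-of-date"]


def pgo_use_command(host_dir: str = HOST_PROFILE_DIR,
                    sandbox_dir: str = SANDBOX_PROFILE_DIR) -> list[str]:
    profile = f"{sandbox_dir}/envoy.profdata"
    cmd = ["bazel", "build", "-c", "opt",
           # Same mount as the .bazelrc line, given on the command line instead.
           f"--sandbox_add_mount_pair={host_dir}:{sandbox_dir}"]
    # Compile flags go to both --copt and --cxxopt, mirroring the issue.
    for flag in [f"-fprofile-instr-use={profile}", *SILENCED_WARNINGS]:
        cmd += [f"--copt={flag}", f"--cxxopt={flag}"]
    cmd += [f"--linkopt=-fprofile-instr-use={profile}",
            "envoy", "--config=docker-clang"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(pgo_use_command()))
```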

Results

In short, I get the following RPS results from Nighthawk:

  • Release: ~5200 RPS
  • Release + PGO: ~6300 RPS
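To put the two numbers side by side: the relative throughput gain is (PGO RPS / Release RPS - 1) x 100%, which for ~5200 vs ~6300 RPS is roughly 21%. A small Python helper for that, plus a median over repeated runs (the median moves less than the mean when a single run is noisy); the helper names are illustrative, not part of any tool.

```python
from __future__ import annotations

from statistics import median


def relative_gain_percent(baseline_rps: float, optimized_rps: float) -> float:
    """Throughput gain of the optimized build over the baseline, in percent."""
    return (optimized_rps / baseline_rps - 1.0) * 100.0


def summarize(rps_runs: list[float]) -> tuple[float, float]:
    """Median RPS and the spread (max - min) across repeated runs."""
    return median(rps_runs), max(rps_runs) - min(rps_runs)


if __name__ == "__main__":
    # The approximate RPS figures reported in this issue.
    print(f"{relative_gain_percent(5200, 6300):.1f}%")  # 21.2%
```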

More detailed reports from Nighthawk are available here:

According to these tests, PGO helps a lot with Envoy's performance, from both the latency and the throughput perspectives.

Possible further steps

I can suggest the following action points:

  • Evaluate Link-Time Optimization (LTO). In my experience, LTO works especially well together with PGO, and we could get even more performance.
  • Perform more robust benchmarks in other modes.
  • Add a note about PGO's performance improvements for Envoy to the Envoy documentation. This way, users and maintainers are more likely to know that they can optimize their Envoy builds with PGO.
  • Provide an easier way (e.g. a build option) to build Envoy with PGO. This can be useful for end users and maintainers, since they will be able to optimize Envoy for their own workloads.
  • Optimize pre-built Envoy binaries with PGO (if you think the results above are worth it).
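One possible shape for such an opt-in build option is a set of named configs in .bazelrc. The Python sketch below just renders those lines. The config names pgo-instrument, pgo-use and thinlto are hypothetical (Envoy does not define them), the PGO flags are the ones from the benchmark, and -flto=thin is Clang's ThinLTO switch for the LTO experiment. ThinLTO needs an LTO-capable linker such as lld, which has not been verified here for Envoy's toolchain.

```python
from __future__ import annotations


def pgo_bazelrc(raw_profile_dir: str, profdata: str) -> str:
    """Render hypothetical .bazelrc configs for an opt-in PGO (+ ThinLTO) build."""
    gen = f"-fprofile-instr-generate={raw_profile_dir}/envoy_%m_%p.profraw"
    use = f"-fprofile-instr-use={profdata}"
    lines = [
        # --copt applies to both C and C++ compile actions, so no --cxxopt here.
        f"build:pgo-instrument --copt={gen}",
        f"build:pgo-instrument --linkopt={gen}",
        f"build:pgo-use --copt={use}",
        "build:pgo-use --copt=-Wno-profile-instr-unprofiled",
        "build:pgo-use --copt=-Wno-profile-instr-out-of-date",
        f"build:pgo-use --linkopt={use}",
        # Optional LTO experiment, meant to be combined with --config=pgo-use.
        "build:thinlto --copt=-flto=thin",
        "build:thinlto --linkopt=-flto=thin",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical raw-profile directory; the profdata path matches the issue.
    print(pgo_bazelrc("/tmp/envoy-profiles",
                      "/execroot/profiles/envoy.profdata"), end="")
```

A PGO build would then be bazel build -c opt --config=pgo-use envoy (plus --config=thinlto for the LTO variant), with the profile mount still handled as in the docker-sandbox setup above.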

Testing post-link optimization techniques (like LLVM BOLT) could be interesting too, but I recommend starting with regular PGO.

Found caveats / interesting details

  • I tried to use Bazel's built-in PGO support (in Bazel it's called Feedback-Directed Optimization (FDO)) with bazel build -c opt envoy --fdo_instrument=/home/zamazan4ik/open_source/bench_envoy/profiles --config=docker-clang but got the following errors: https://pastebin.com/8HtsEC26 . I tried to debug it, but it was too complicated to resolve within 10 minutes, so I decided to pass the compiler options directly instead. If you want to use the native Bazel options, you will probably need to resolve these errors first.
  • In the benchmark I used instrumentation-based PGO. Another option is Sampling PGO (also known as AutoFDO). I haven't tested AutoFDO yet, but I expect (almost) the same results with this kind of PGO too.
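For reference, the two alternatives above could look roughly like this; the Python sketch only builds command lines. Bazel's --fdo_instrument and --fdo_optimize are real options (with Clang, --fdo_optimize accepts an LLVM .profdata file), but this does not address the errors linked above. The AutoFDO part assumes perf with LBR support (perf record -b) and the create_llvm_prof tool from the AutoFDO project; the envoy.afdo file name is made up, and the profiled binary needs debug line info so samples can be mapped back to source.

```python
from __future__ import annotations


def bazel_fdo_commands(raw_dir: str, profdata: str) -> list[list[str]]:
    """Native Bazel FDO: instrumented build, then optimized build."""
    base = ["bazel", "build", "-c", "opt"]
    return [
        base + [f"--fdo_instrument={raw_dir}", "envoy", "--config=docker-clang"],
        # Run the instrumented Envoy and merge its .profraw files before this.
        base + [f"--fdo_optimize={profdata}", "envoy", "--config=docker-clang"],
    ]


def autofdo_commands(binary: str, run_args: list[str],
                     perf_data: str = "perf.data",
                     afdo: str = "envoy.afdo") -> list[list[str]]:
    """Sampling PGO (AutoFDO): profile a normal build, then convert samples."""
    return [
        # -b records branch stacks (LBR) while the training load runs.
        ["perf", "record", "-b", "-o", perf_data, "--", binary, *run_args],
        ["create_llvm_prof", f"--binary={binary}", f"--profile={perf_data}",
         f"--out={afdo}"],
    ]


def autofdo_use_flags(afdo: str) -> list[str]:
    # Clang's flag for consuming a sample-based profile at compile time.
    return [f"--copt=-fprofile-sample-use={afdo}"]
```

The main practical difference is that AutoFDO profiles a regular optimized binary, so the training run can happen on production-like traffic without the instrumentation overhead.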

Useful links

You can find many more PGO results for different kinds of software, possible caveats, tricky parts, and much more in my repo here.

Projects
None yet
Development

No branches or pull requests

3 participants