oliora/habr-switches-perf-test

Disclaimer

  • The code was written purely for performance testing purposes; use it at your own risk
  • Part of the code is compatible only with the x86_64 architecture and requires a CPU with AVX2 support

Prerequisites

  • x86_64 Linux machine with a CPU that supports AVX2 (most modern Intel and AMD CPUs do)
  • GCC or Clang (any version that supports C++17 should be fine)
  • Meson
  • To create graphs: Python 3 with the pandas, matplotlib and jupyterlab packages
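
If you are unsure whether a machine meets the AVX2 requirement, one option on Linux is to look for the avx2 flag in /proc/cpuinfo. The snippet below is a minimal sketch of that check and is not part of this repository; the helper name has_avx2 is hypothetical.

```python
def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # On x86 Linux the CPU feature list is on lines whose key is "flags".
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False
```

On a Linux machine it can be called as has_avx2(open("/proc/cpuinfo").read()).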

How to build

To build the release version (fully optimized):

meson setup build
meson compile -C build

To build the debug version (non-optimized):

meson setup build-debug --buildtype=debug
meson compile -C build-debug

How to run tests

To run all the tests:

./run-benchmarks.sh > results.log

Optionally, you can pass the number of runs for each test (10 by default):

./run-benchmarks.sh 1 > results.log

Tests run on two input files from the input directory: wp.txt (War and Peace by Leo Tolstoy) and long.txt (a 100x copy of wp.txt). long.txt is generated by the run-benchmarks.sh script on its first run.
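
For illustration, here is a minimal Python sketch of how a file like long.txt could be produced by hand. It assumes "100x copy" means 100 back-to-back copies of wp.txt; the run-benchmarks.sh script may do this differently, and the function name make_repeated_copy is hypothetical.

```python
import shutil


def make_repeated_copy(src_path: str, dst_path: str, copies: int = 100) -> None:
    """Write `copies` back-to-back copies of src_path into dst_path.

    Assumption: plain concatenation, with no separator between copies.
    """
    with open(dst_path, "wb") as dst:
        for _ in range(copies):
            # Reopen the source for each copy so it is streamed from the start.
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

For example, make_repeated_copy("input/wp.txt", "input/long.txt") would write the 100-copy file.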

For more definitive results, run the tests on an isolated CPU core using the taskset or numactl command:

taskset -c <core> ./run-benchmarks.sh > results.log

To run a particular algo on a particular input file:

[taskset -c <core>] build/algo-XXX input/wp.txt
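
If you drive individual runs from Python (for example from a notebook), the optional taskset prefix above can be assembled as follows. This is only a sketch: build_command and run_algo are hypothetical helpers that are not part of the repository, and the binary path is whatever build/algo-XXX you want to run.

```python
import subprocess
from typing import List, Optional


def build_command(algo_binary: str, input_file: str,
                  core: Optional[int] = None) -> List[str]:
    """Return the argv for one algo run, pinned with taskset if core is given."""
    cmd = [algo_binary, input_file]
    if core is not None:
        # Mirrors the optional "taskset -c <core>" prefix from the command above.
        cmd = ["taskset", "-c", str(core)] + cmd
    return cmd


def run_algo(algo_binary: str, input_file: str,
             core: Optional[int] = None) -> str:
    """Run the algo binary and return whatever it printed to stdout."""
    result = subprocess.run(build_command(algo_binary, input_file, core),
                            capture_output=True, text=True, check=True)
    return result.stdout
```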

How to run unit tests and micro benchmarks

Build the release version first, then run:

meson test -C build -v

Results

The results directory contains the benchmark results: both the logged data and the final graphs.

Benchmarks were run on two platforms:

  • run-amd-epyc.txt - AMD EPYC 7282, physical server
  • run-icelake-aws.txt - Intel Xeon Platinum 8375C, m6i.metal AWS instance

The benchmark was run under the following conditions:

  • Benchmark process is pinned to an isolated core
  • Hyperthreading is disabled
  • Performance mode is enabled

Each algo is run 10 times for each input, and the best pass is plotted. For the wp.txt input we use 10 passes per run to reduce the variability of the results (otherwise the relative standard deviation, RSD, was up to 40%). For the long.txt input we use 1 pass per run.
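
As a reminder of what these two statistics mean, the sketch below computes the RSD and the best pass for a list of timings. It is not the code used to produce the graphs; it assumes each sample is an elapsed time (so lower is better) and uses the sample standard deviation, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def relative_std_dev(samples: Sequence[float]) -> float:
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(samples)
    return 100.0 * statistics.stdev(samples) / mean


def best_pass(samples: Sequence[float]) -> float:
    """Best result among the runs, assuming samples are elapsed times."""
    return min(samples)
```

With made-up timings such as [8.0, 12.0], relative_std_dev gives roughly 28.3% and best_pass gives 8.0.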

There is a Jupyter notebook, graphs.ipynb, that you can run to regenerate the graphs.

About

Perf test of different implementations of the "switches" algo
