41 changes: 30 additions & 11 deletions libcxx/docs/TestingLibcxx.rst
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
Benchmarks
==========

Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.

@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually using
the instructions for running individual tests.

If you want to compare the results of different benchmark runs, we recommend using the
``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
be installed with:

.. code-block:: bash

$ python -m venv .venv && source .venv/bin/activate # Optional but recommended
$ pip install -r libcxx/utils/requirements.txt

Once that's done, start by configuring CMake in a build directory and running one or
more benchmarks, as usual:

.. code-block:: bash

$ cmake -S runtimes -B <build> [...]
$ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed

Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:

.. code-block:: bash

$ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt

The ``baseline.lnt`` file will contain all the benchmark results present in the build directory,
consolidated into a single file. You can then make the desired modifications to the code, run the
benchmark(s) again, and run:

.. code-block:: bash

$ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
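
The ``.lnt`` files are plain text. As parsed by ``compare-benchmarks``, each non-empty line holds a
single ``<benchmark>.<metric> <value>`` entry, and a benchmark appears several times if it was run
more than once. The benchmark names and timings below are made up purely for illustration:

.. code-block:: none

    std_string_find/32.execution_time 17.2
    std_string_find/32.execution_time 17.5
    std_string_ctor_default.execution_time 3.1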

Finally, use ``compare-benchmarks`` to compare both:

.. code-block:: bash

$ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt

# Useful one-liner when iterating locally:
$ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)

The ``compare-benchmarks`` script provides several useful options, such as generating a chart to
visualize differences in a browser window. Use ``compare-benchmarks --help`` for details.

.. _`Google Benchmark`: https://github.com/google/benchmark

123 changes: 123 additions & 0 deletions libcxx/utils/compare-benchmarks
@@ -0,0 +1,123 @@
#!/usr/bin/env python3

import argparse
import re
import statistics
import sys

import plotly
import tabulate

def parse_lnt(lines):
"""
    Parse lines in LNT format and return a dictionary of the form:

{
'benchmark1': {
'metric1': [float],
'metric2': [float],
...
},
'benchmark2': {
'metric1': [float],
'metric2': [float],
...
},
...
}

Each metric may have multiple values.
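
    For example (the benchmark names and values here are purely illustrative), the lines:

        bm1.execution_time 10.0
        bm1.execution_time 12.0
        bm2.execution_time 3.5

    are parsed into:

        {'bm1': {'execution_time': [10.0, 12.0]}, 'bm2': {'execution_time': [3.5]}}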
"""
results = {}
for line in lines:
line = line.strip()
if not line:
continue

(identifier, value) = line.split(' ')
(name, metric) = identifier.split('.')
if name not in results:
results[name] = {}
if metric not in results[name]:
results[name][metric] = []
results[name][metric].append(float(value))
return results

def plain_text_comparison(benchmarks, baseline, candidate):
"""
Create a tabulated comparison of the baseline and the candidate.
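
    The three arguments are parallel sequences of benchmark names, baseline values and candidate
    values. For example (with purely illustrative values):

        plain_text_comparison(['bm1', 'bm2'], [10.0, None], [9.5, 4.2])

    produces one row per benchmark; when either value is None, the difference columns are left
    empty for that row.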
"""
headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
fmt = (None, '.2f', '.2f', '.2f', '.2f')
table = []
for (bm, base, cand) in zip(benchmarks, baseline, candidate):
diff = (cand - base) if base and cand else None
percent = 100 * (diff / base) if base and cand else None
row = [bm, base, cand, diff, percent]
table.append(row)
return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')

def create_chart(benchmarks, baseline, candidate):
"""
Create a bar chart comparing 'baseline' and 'candidate'.
"""
figure = plotly.graph_objects.Figure()
figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
return figure

def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
"""
Prepare the data for being formatted or displayed as a chart.

Metrics that have more than one value are aggregated using the given aggregation function.
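
    For example (with purely illustrative data), for the metric 'execution_time':

        baseline  = {'bm1': {'execution_time': [10.0, 12.0, 11.0]}}
        candidate = {'bm1': {'execution_time': [9.0]}, 'bm2': {'execution_time': [5.0]}}

    yields the benchmark list ['bm1', 'bm2'], the baseline series [11.0, None] (the median of the
    three baseline samples, then None because 'bm2' has no baseline data), and the candidate
    series [9.0, 5.0].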
"""
all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
baseline_series = []
candidate_series = []
for bm in all_benchmarks:
baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
return (all_benchmarks, baseline_series, candidate_series)

def main(argv):
parser = argparse.ArgumentParser(
prog='compare-benchmarks',
description='Compare the results of two sets of benchmarks in LNT format.',
epilog='This script requires the `tabulate` and the `plotly` Python modules.')
parser.add_argument('baseline', type=argparse.FileType('r'),
help='Path to a LNT format file containing the benchmark results for the baseline.')
parser.add_argument('candidate', type=argparse.FileType('r'),
help='Path to a LNT format file containing the benchmark results for the candidate.')
parser.add_argument('--metric', type=str, default='execution_time',
help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
'this option allows selecting which metric is being analyzed. The default is "execution_time".')
parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
                        help='Path of the file where the resulting comparison should be written. Defaults to stdout.')
parser.add_argument('--filter', type=str, required=False,
help='An optional regular expression used to filter the benchmarks included in the comparison. '
'Only benchmarks whose names match the regular expression will be included.')
parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
'generates a self-contained HTML graph that can be opened in a browser. The default is text.')
args = parser.parse_args(argv)

baseline = parse_lnt(args.baseline.readlines())
candidate = parse_lnt(args.candidate.readlines())

if args.filter is not None:
regex = re.compile(args.filter)
baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}

(benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)

if args.format == 'chart':
figure = create_chart(benchmarks, baseline_series, candidate_series)
plotly.io.write_html(figure, file=args.output)
else:
diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
args.output.write(diff)

if __name__ == '__main__':
main(sys.argv[1:])
36 changes: 36 additions & 0 deletions libcxx/utils/consolidate-benchmarks
@@ -0,0 +1,36 @@
#!/usr/bin/env python3

import argparse
import pathlib
import sys
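
# Illustrative usage (the paths below are made up):
#
#     consolidate-benchmarks <build>/libcxx/test/benchmarks -o all.lnt
#
# This gathers every .lnt file found (recursively) under the given directories and
# concatenates their non-empty lines into a single LNT-format output.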

def main(argv):
parser = argparse.ArgumentParser(
prog='consolidate-benchmarks',
description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
parser.add_argument('files_or_directories', type=str, nargs='+',
help='Path to files or directories containing LNT data to consolidate. Directories are searched '
'recursively for files with a .lnt extension.')
parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
                        help='Where to output the result. Defaults to stdout.')
args = parser.parse_args(argv)

files = []
for arg in args.files_or_directories:
path = pathlib.Path(arg)
if path.is_dir():
for p in path.rglob('*.lnt'):
files.append(p)
else:
files.append(path)

for file in files:
for line in file.open().readlines():
line = line.strip()
if not line:
continue
args.output.write(line)
args.output.write('\n')

if __name__ == '__main__':
main(sys.argv[1:])
57 changes: 0 additions & 57 deletions libcxx/utils/libcxx-benchmark-json

This file was deleted.

73 changes: 0 additions & 73 deletions libcxx/utils/libcxx-compare-benchmarks

This file was deleted.

2 changes: 2 additions & 0 deletions libcxx/utils/requirements.txt
@@ -0,0 +1,2 @@
plotly
Contributor

Should we pin down the versions we use in the scripts?

Member Author

I know people tend to do that a lot, but in my experience this sometimes results in unresolvable dependency problems, when in reality any version of plotly and tabulate should work with the usage we're doing. So I am tempted to keep it as-is.

tabulate