41 changes: 30 additions & 11 deletions libcxx/docs/TestingLibcxx.rst
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
Benchmarks
==========

Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.

@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually using
the instructions for running individual tests.

If you want to compare the results of different benchmark runs, we recommend using the
``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
be installed with:

.. code-block:: bash

$ python -m venv .venv && source .venv/bin/activate # Optional but recommended
$ pip install -r libcxx/utils/requirements.txt

Once that's done, start by configuring CMake in a build directory and running one or
more benchmarks, as usual:

.. code-block:: bash

$ cmake -S runtimes -B <build> [...]
$ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed

Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:

.. code-block:: bash

$ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt

The ``baseline.lnt`` file will contain all the benchmark results present in the build directory,
consolidated into a single file. You can then make the desired modifications to the code, run the
benchmark(s) again, and run:

.. code-block:: bash

$ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
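
The ``.lnt`` files are plain text. As parsed by ``compare-benchmarks``, each non-empty line holds a
single ``<benchmark>.<metric> <value>`` entry, and a benchmark appears several times if it was run
more than once. The benchmark names and timings below are made up purely for illustration:

.. code-block:: none

    std_string_find/32.execution_time 17.2
    std_string_find/32.execution_time 17.5
    std_string_ctor_default.execution_time 3.1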

Finally, use ``compare-benchmarks`` to compare both:

.. code-block:: bash

$ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt

# Useful one-liner when iterating locally:
$ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)

The ``compare-benchmarks`` script provides several useful options, such as generating a chart to
visualize differences in a browser window. Use ``compare-benchmarks --help`` for details.

.. _`Google Benchmark`: https://github.com/google/benchmark

123 changes: 123 additions & 0 deletions libcxx/utils/compare-benchmarks
@@ -0,0 +1,123 @@
#!/usr/bin/env python3

import argparse
import re
import statistics
import sys

import plotly
import tabulate

def parse_lnt(lines):
"""
    Parse lines in LNT format and return a dictionary of the form:

{
'benchmark1': {
'metric1': [float],
'metric2': [float],
...
},
'benchmark2': {
'metric1': [float],
'metric2': [float],
...
},
...
}

Each metric may have multiple values.
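
    For example (the benchmark names and values here are purely illustrative), the lines:

        bm1.execution_time 10.0
        bm1.execution_time 12.0
        bm2.execution_time 3.5

    are parsed into:

        {'bm1': {'execution_time': [10.0, 12.0]}, 'bm2': {'execution_time': [3.5]}}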
"""
results = {}
for line in lines:
line = line.strip()
if not line:
continue

(identifier, value) = line.split(' ')
(name, metric) = identifier.split('.')
if name not in results:
results[name] = {}
if metric not in results[name]:
results[name][metric] = []
results[name][metric].append(float(value))
return results

def plain_text_comparison(benchmarks, baseline, candidate):
"""
Create a tabulated comparison of the baseline and the candidate.
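
    The three arguments are parallel sequences of benchmark names, baseline values and candidate
    values. For example (with purely illustrative values):

        plain_text_comparison(['bm1', 'bm2'], [10.0, None], [9.5, 4.2])

    produces one row per benchmark; when either value is None, the difference columns are left
    empty for that row.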
"""
headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
fmt = (None, '.2f', '.2f', '.2f', '.2f')
table = []
for (bm, base, cand) in zip(benchmarks, baseline, candidate):
diff = (cand - base) if base and cand else None
percent = 100 * (diff / base) if base and cand else None
row = [bm, base, cand, diff, percent]
table.append(row)
return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')

def create_chart(benchmarks, baseline, candidate):
"""
Create a bar chart comparing 'baseline' and 'candidate'.
"""
figure = plotly.graph_objects.Figure()
figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
return figure

def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
"""
Prepare the data for being formatted or displayed as a chart.

Metrics that have more than one value are aggregated using the given aggregation function.
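
    For example (with purely illustrative data), for the metric 'execution_time':

        baseline  = {'bm1': {'execution_time': [10.0, 12.0, 11.0]}}
        candidate = {'bm1': {'execution_time': [9.0]}, 'bm2': {'execution_time': [5.0]}}

    yields the benchmark list ['bm1', 'bm2'], the baseline series [11.0, None] (the median of the
    three baseline samples, then None because 'bm2' has no baseline data), and the candidate
    series [9.0, 5.0].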
"""
all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
baseline_series = []
candidate_series = []
for bm in all_benchmarks:
baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
return (all_benchmarks, baseline_series, candidate_series)

def main(argv):
parser = argparse.ArgumentParser(
prog='compare-benchmarks',
description='Compare the results of two sets of benchmarks in LNT format.',
epilog='This script requires the `tabulate` and the `plotly` Python modules.')
parser.add_argument('baseline', type=argparse.FileType('r'),
help='Path to a LNT format file containing the benchmark results for the baseline.')
parser.add_argument('candidate', type=argparse.FileType('r'),
help='Path to a LNT format file containing the benchmark results for the candidate.')
parser.add_argument('--metric', type=str, default='execution_time',
help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc) -- '
'this option allows selecting which metric is being analyzed. The default is "execution_time".')
parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
                        help='Path of the file where the resulting comparison should be written. Defaults to stdout.')
parser.add_argument('--filter', type=str, required=False,
help='An optional regular expression used to filter the benchmarks included in the comparison. '
'Only benchmarks whose names match the regular expression will be included.')
parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
'generates a self-contained HTML graph that can be opened in a browser. The default is text.')
args = parser.parse_args(argv)

baseline = parse_lnt(args.baseline.readlines())
candidate = parse_lnt(args.candidate.readlines())

if args.filter is not None:
regex = re.compile(args.filter)
baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}

(benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)

if args.format == 'chart':
figure = create_chart(benchmarks, baseline_series, candidate_series)
plotly.io.write_html(figure, file=args.output)
else:
diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
args.output.write(diff)

if __name__ == '__main__':
main(sys.argv[1:])
36 changes: 36 additions & 0 deletions libcxx/utils/consolidate-benchmarks
@@ -0,0 +1,36 @@
#!/usr/bin/env python3

import argparse
import pathlib
import sys
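
# Illustrative usage (the paths below are made up):
#
#     consolidate-benchmarks <build>/libcxx/test/benchmarks -o all.lnt
#
# This gathers every .lnt file found (recursively) under the given directories and
# concatenates their non-empty lines into a single LNT-format output.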

def main(argv):
parser = argparse.ArgumentParser(
prog='consolidate-benchmarks',
description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
parser.add_argument('files_or_directories', type=str, nargs='+',
help='Path to files or directories containing LNT data to consolidate. Directories are searched '
'recursively for files with a .lnt extension.')
parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
                        help='Where to output the result. Defaults to stdout.')
args = parser.parse_args(argv)

files = []
for arg in args.files_or_directories:
path = pathlib.Path(arg)
if path.is_dir():
for p in path.rglob('*.lnt'):
files.append(p)
else:
files.append(path)

for file in files:
for line in file.open().readlines():
line = line.strip()
if not line:
continue
args.output.write(line)
args.output.write('\n')

if __name__ == '__main__':
main(sys.argv[1:])
57 changes: 0 additions & 57 deletions libcxx/utils/libcxx-benchmark-json

This file was deleted.

73 changes: 0 additions & 73 deletions libcxx/utils/libcxx-compare-benchmarks

This file was deleted.

2 changes: 2 additions & 0 deletions libcxx/utils/requirements.txt
@@ -0,0 +1,2 @@
plotly
Contributor

Should we pin down the versions we use in the scripts?

Member Author

I know people tend to do that a lot, but in my experience this sometimes results in unresolvable dependency problems, when in reality any version of plotly and tabulate should work with the usage we're doing. So I am tempted to keep it as-is.

tabulate