feat(benchmark-tools): Update benchmark-tool to support custom measurement benchmarks #21555

chentong7 · 2024-06-20T18:14:15Z

Description

We've run into several scenarios in the past where we want to have benchmark tests which report their own measurements of something (other than the execution time and memory usage that benchmark-tool currently supports), and have those measurements go to Kusto so we can track them through time.

POC: #20772

Sample

  Op Size
    Insert Nodes
      Many Transactions
        ✔ 100 small nodes in 100 transactions @CustomBenchmark (134ms)
        ✔ 100 medium nodes in 100 transactions @CustomBenchmark (89ms)
        ✔ 100 large nodes in 100 transactions @CustomBenchmark (78ms)
      Single Transaction
        ✔ 100 small nodes in 1 transaction @CustomBenchmark (66ms)
        ✔ 100 medium nodes in 1 transaction @CustomBenchmark (54ms)
        ✔ 100 large nodes in 1 transaction @CustomBenchmark (53ms)
    Remove Nodes
      Many Transactions
        ✔ 100 small nodes in 100 transactions (63ms)
        ✔ 100 medium nodes in 100 transactions (62ms)
        ✔ 100 large nodes in 100 transactions (62ms)
      Single Transaction
        ✔ 100 small nodes in 1 transactions containing 1 removal of 100 nodes
        ✔ 100 medium nodes in 1 transactions containing 1 removal of 100 nodes

Unit test

Writing test results relative to package to nyc/junit-report.xml and nyc/junit-report.json

  `benchmarkCustom` function
    uses `before` and `after`
      ✔ test @CustomBenchmark
      ✔ run BenchmarkCustom
  2 passing (8ms)

Josmithr · 2024-06-20T21:01:33Z

Note that this package is versioned and published independently from others in our repo. In order to publish the changes made here, you will want to update the package.json's version, and add notes to the package's CHANGELOG.md file about the changes.

tools/benchmark/api-report/benchmark.alpha.api.md

tools/benchmark/src/MochaCustomOutputReporter.ts

pnpm-lock.yaml

tools/benchmark/src/mocha/customOutputRunner.ts

chentong7 · 2024-06-20T21:52:53Z

> Note that this package is versioned and published independently from others in our repo. In order to publish the changes made here, you will want to update the package.json's version, and add notes to the package's CHANGELOG.md file about the changes.

Updated. Thx!

tools/benchmark/CHANGELOG.md

tools/benchmark/package.json

tools/benchmark/src/mocha/customOutputRunner.ts

CraigMacomber

I left a suggested simplification.

Generally, though, this is looking great. It's exciting to see the reporter refactoring paying off. So clean :)

tools/benchmark/src/mocha/customOutputRunner.ts

Co-authored-by: Craig Macomber (Microsoft) <42876482+CraigMacomber@users.noreply.github.com>

chentong7 · 2024-07-16T22:43:11Z

CI is failing: you should fix the issues it found.

I see new APIs being added, but I don't see any changes to the tests. New APIs should get tests to both show how the new APIs are used (helpful for future developers working on this, customers and reviewers) as well as help ensure the APIs actually work correctly (which also helps reviewers).

Without any tests or docs added to the readme, or an explanation of the new features in the changelog, it's a bit hard to find a good spot to start evaluating the design.

Given that, I think I know what's going on in this change, and I don't think it's the way to get to where we want to go.

Currently, the problem as I understand it is that:

We do not offer an API for reporting arbitrary data as benchmark results.

We can't implement such an API since our reporter is not general enough to support that.

The last time we had this problem, when trying to add memory tests, the approach taken was to duplicate a bunch of code and then customize it for that use case.

What I would like to see:

Rather than adding a bunch of new code which is mostly duplicating code we already have two copies of (for runtime benchmarks and memory benchmarks), I think we should instead work toward deduplicating the existing code.

Before the memory stuff was added, we had one reporter, factored into two parts:

Reporter: logic for the reporter containing everything that's not test-runner specific, making it easy to author reporters for specific test runners.

MochaReporter: Reporter plus the Mocha-specific bits.

This made it easy for customers using other test runners to create the reporters they needed (ex: to work with jest).
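To make the two-part factoring concrete, here is a minimal TypeScript sketch of the idea. Every name in it (`BenchmarkResult`, `ResultCollector`, `RunnerLike`, `attachCollector`, and the `"benchmark end"` event) is hypothetical and chosen for illustration; none of it is the package's actual API. The point is only that the runner-specific part can be a thin adapter over runner-agnostic logic.

```typescript
// Hypothetical sketch: none of these names come from the benchmark package.

/** One benchmark's reported data, independent of any test runner. */
interface BenchmarkResult {
	suite: string;
	test: string;
	data: Record<string, number>;
}

/** Runner-agnostic part: collects results and formats them. */
class ResultCollector {
	private readonly results: BenchmarkResult[] = [];

	public record(result: BenchmarkResult): void {
		this.results.push(result);
	}

	/** One line per benchmark, in the order results were recorded. */
	public summary(): string[] {
		return this.results.map(
			({ suite, test, data }) => `${suite} / ${test}: ${JSON.stringify(data)}`,
		);
	}
}

/** The only thing the adapter assumes about a test runner (assumed event name). */
interface RunnerLike {
	on(event: "benchmark end", listener: (result: BenchmarkResult) => void): void;
}

/** Runner-specific part: a thin adapter that forwards events to the collector. */
function attachCollector(runner: RunnerLike, collector: ResultCollector): void {
	runner.on("benchmark end", (result) => collector.record(result));
}
```

Under this shape, supporting another runner (for example jest) would mean writing another small adapter like `attachCollector`, while `ResultCollector` stays unchanged.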

When the memory stuff was added, it added a second reporter. This made running the tests harder (memory tests need to be configured with a different reporter), and it didn't split out the test-framework-independent parts, so supporting other test runners is much less straightforward. Ideally, a user of our library could just add a memory or custom test without needing to author new reporters for their test framework, reconfigure their testing command line to include those reporters, etc.

I view that as technical debt we should pay off before we continue work on this package.

What I think we should do in this change is modify our existing Reporter and MochaReporter to be sufficiently general that they can handle other kinds of data, like memory or custom. In fact, they should only support arbitrary custom data, since that should be general enough to handle all cases. Then we can delete the memory reporter and get down to a single reporter instead of going up to three different ones.

Once that's complete, then (in a separate PR) we could add the feature of this custom benchmark API. Some of the new API should actually be used in the implementation of the time and memory benchmarks, as they should all be able to be written as helpers using the new general-purpose custom API.

Then we are left with a single reporter that is easy to use with multiple test frameworks and has built-in support for Mocha, a library for making tests which report data to it, and a collection of helper libraries for measuring different things, including time and memory.
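A rough TypeScript sketch of that layering follows, under stated assumptions: `runCustom`, `runTimed`, `runSized`, `MeasurementSink`, and `CustomResult` are hypothetical names invented for this example and are not the benchmark tool's API. It shows one general primitive that collects arbitrary named measurements, with a timing helper and a size helper written on top of it. A memory helper could presumably follow the same pattern.

```typescript
// Hypothetical sketch: these names are illustrative, not the benchmark-tool API.

/** Arbitrary named numeric measurements reported by one benchmark. */
type Measurements = Record<string, number>;

/** Handle passed to a benchmark body so it can report its own data. */
interface MeasurementSink {
	addMeasurement(name: string, value: number): void;
}

interface CustomResult {
	title: string;
	data: Measurements;
}

/** General-purpose primitive: run a body and collect whatever it reports. */
function runCustom(title: string, body: (sink: MeasurementSink) => void): CustomResult {
	const data: Measurements = {};
	body({
		addMeasurement: (name, value) => {
			data[name] = value;
		},
	});
	return { title, data };
}

/** Time helper built on the general primitive (coarse wall-clock time via Date.now). */
function runTimed(title: string, iterations: number, fn: () => void): CustomResult {
	return runCustom(title, (sink) => {
		const start = Date.now();
		for (let i = 0; i < iterations; i++) {
			fn();
		}
		const elapsedMs = Date.now() - start;
		sink.addMeasurement("iterations", iterations);
		sink.addMeasurement("meanMs", elapsedMs / iterations);
	});
}

/** Size helper: reports the JSON-serialized length of a payload, in UTF-16 code units. */
function runSized(title: string, makePayload: () => object): CustomResult {
	return runCustom(title, (sink) => {
		sink.addMeasurement("serializedLength", JSON.stringify(makePayload()).length);
	});
}
```

Because both helpers return the same `CustomResult` shape, a single reporter only needs to understand that one shape, whichever kind of measurement produced it.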

That leaves us with a library that is much simpler to use, with a much more modular implementation that I would expect to be far easier to understand, maintain, test, and extend.

I guess that rant amounts to a proposed design doc for this work.

Link the refactoring PR here: #21730

tools/benchmark/README.md

Co-authored-by: Joshua Smithrud <54606601+Josmithr@users.noreply.github.com>

chentong7 requested review from a team as code owners June 20, 2024 18:14

github-actions bot added the labels area: dds, area: dds: tree, dependencies, public api change, and base: main on Jun 20, 2024