Issues And Benchmarks


Although the benchmark suite is made up of a number of benchmarks, each benchmark is motivated by a specific issue. An issue is an instance of felt performance pain--a symptom that the web development community would like to see fixed. A benchmark is a browser performance test specifically crafted to “capture” the pain of the issue in a quantifiable way, such that an improvement in the score implies a lessening of the felt pain. Whether an issue is appropriate for inclusion is independent of any specific benchmark proposed to address it.

The default for the suite is inclusion. The only valid reason for a committee member to vote against including an issue or benchmark in the suite is a compelling argument that it fails to meet at least one of the criteria below.

What makes a good issue

A good issue...

  • ... is related primarily to a performance problem, not a problem of correctness or missing features.
  • ... is a real problem that multiple real developers have run into in practice--either in production code, or in experiments that required extensive workarounds or an entirely different implementation strategy. It shouldn’t be a purely theoretical problem, although it can represent an ideal use case that isn’t possible today.
  • ... is something that should be possible but isn’t because of a performance cliff.
  • ... is not necessarily a performance problem that all browsers share. It is okay to have an issue that performs very poorly on one browser but reasonably on others--so long as it is a real problem that causes real pain.
  • ... targets a performance problem that continues to exist in the latest versions of browsers.
  • ... does not face compelling arguments that it shouldn’t be fixed (e.g. where every instance of the problem is better accomplished some other way).
  • ... can theoretically be captured in a quantifiable way by a benchmark (one possible shape is sketched after this list).
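
To make the last bullet concrete, here is a minimal sketch of how felt pain can be turned into a number a benchmark can track. It assumes a hypothetical issue (“inserting many table rows feels slow”); the workload size, markup, and timing approach are illustrative only and are not part of any benchmark in the suite.

```typescript
// Hypothetical example: quantifying "inserting many rows into a table feels slow".
// The scale factor and markup are illustrative only.
function measureRowInsertion(rowCount: number): number {
  const table = document.createElement("table");
  document.body.appendChild(table);

  const start = performance.now();
  for (let i = 0; i < rowCount; i++) {
    const row = table.insertRow();
    row.insertCell().textContent = `row ${i}`;
  }
  const elapsed = performance.now() - start;

  document.body.removeChild(table);
  return elapsed; // milliseconds: lower is better, so improvements show up directly in the score
}

// A large, fixed workload so the measurement sits well above timer resolution.
console.log(`Inserted 10000 rows in ${measureRowInsertion(10000).toFixed(1)} ms`);
```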

What makes a good benchmark

A good benchmark...

  • ... is crafted to target a specific issue (see above).
  • ... is the smallest reasonable case that triggers the behavior to be optimized, without being so narrow that it invites micro-optimization.
  • ... operates at scale to demonstrate meaningful performance differences (it doesn’t complete in milliseconds).
  • ... keeps browser-specific code to the bare minimum required to deal with inconsistent APIs. (See the guidelines on benchmark compatibility for more information.)
  • ... fits into the test harness and can be run in the browser easily by visiting a URL.
  • ... is deterministic and exhibits low variance (the sketch after this list touches on several of these points).
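
As an illustration of several of the criteria above (operating at scale, determinism, and low variance), here is a minimal sketch of how a benchmark body might be structured. The seeded generator, the sorting workload, and the runBenchmark entry point are assumptions made for this example; they are not the actual test harness API.

```typescript
// Illustrative sketch only: the structure and names here are assumptions, not the real harness API.

// A small deterministic pseudo-random generator (mulberry32), so every run sees identical input.
function seededRandom(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), seed | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One timed pass over a workload large enough to dwarf timer resolution and noise.
function runOnce(): number {
  const random = seededRandom(12345); // fixed seed: the input data never changes between runs
  const items: number[] = [];
  for (let i = 0; i < 100000; i++) {
    items.push(random());
  }

  const start = performance.now();
  items.sort((a, b) => a - b); // the operation under test
  return performance.now() - start;
}

// Repeat the timed pass and report the median to further reduce run-to-run variance.
function runBenchmark(repetitions: number): number {
  const times: number[] = [];
  for (let i = 0; i < repetitions; i++) {
    times.push(runOnce());
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(repetitions / 2)];
}

console.log(`Median time over 5 runs: ${runBenchmark(5).toFixed(1)} ms`);
```

Taking the median of several repetitions is one simple way to keep run-to-run variance low while still keeping a workload that is large enough to be meaningful.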

See the community participation page for more information on how to evangelize for issues you care about.