This repository was archived by the owner on Nov 28, 2020. It is now read-only.

[Update] Benchmarking Microtests: State of the union of node/benchmark tests and where to go next #25

@ThePrimeagen

Description


The goal of this issue is to demonstrate the following:

  • Why microbenchmarking is important.
  • Why Benchmark.js is a good library to use.
  • Where I am going to start and the path forward.

Why micro benchmark?

Micro benchmarking is arguably more important from a library's standpoint than application / integration level benchmarking (I have also heard the term macro benchmarking). Micro benchmarks will quickly flag slowdowns in the system with little noise, which helps diagnose issues with little to no investigation needed.

Arguments against micro benchmarks

  • Can be less reliable.
    • This is addressed below in more detail (linked article). It can be measured accurately*.
  • Application / integration benchmarks are more meaningful measurements.
    • Correct and incorrect. They are meaningful as an estimate of the performance of the specific application / integration being measured, but that does not mean the library will be as performant in my application, since our calling patterns could differ and therefore have different performance characteristics.
    • Second, there is no reasonable / practical way to determine where performance issues are arising from if the granularity of performance tests is at the application level. The noise is too loud.
    • Finally, with application-level measurements, some operations can become 2 - 3x slower and still be eclipsed by the performance of the application itself. The thousand paper cuts of slowdown accumulate over time with no one able to determine where or when each one happened.

* measured accurately: if a stable platform and multiple runs are used, one gets the most consistent measurements possible from JavaScript measuring JavaScript.

Where are the current tests at and where do we go?

Overview

After reviewing the set of tests in nodejs/node/benchmark, I see an awesome set of micro benchmarks. It really is a great place to start. It appears that the represented set of node-specific libraries is here.

Why not just use those tests?

The primary reason why the existing tests are invalid forms of measurement can be found here (for the TL;DR portion, read how options A and D work). Second, using a well-known performance-measuring library would reduce potential bugs and the learning curve. Especially since the custom benchmark harness suffers from all the same downfalls as benchmark.js, and then some.
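For reference, a benchmark.js micro benchmark looks something like the following. This is only a sketch: the suite name and the measured calls are illustrative, not actual tests from node/benchmark, and it assumes benchmark.js has been installed (`npm install benchmark`).

```javascript
// Minimal benchmark.js sketch; the compared snippets are made up.
const Benchmark = require('benchmark');
const path = require('path');

const suite = new Benchmark.Suite('path.basename');

suite
  .add('posix', () => {
    path.posix.basename('/foo/bar/baz.txt');
  })
  .add('win32', () => {
    path.win32.basename('C:\\foo\\bar\\baz.txt');
  })
  .on('cycle', (event) => {
    // benchmark.js reports ops/sec with a relative margin of error
    console.log(String(event.target));
  })
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run();
```

Because benchmark.js runs each snippet until the sample is statistically stable and reports a margin of error, we get that behavior for free instead of maintaining it in a custom harness.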

Downfalls of Benchmark.js and their workarounds.

The main downfall of benchmark.js is that it is JavaScript measuring JavaScript (a downfall it shares with node/benchmark/common.js). The operating system can do who knows what during performance runs and cause incorrect measurements. Thus a more consistent platform (EC2, as an example) can make results more stable. Multiple runs (say 10), tossing out the high / low, will help remove issues from v8 optimizing mid-run, OS context switching, Wednesday's bad weather, etc.
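The multiple-runs idea can be sketched in plain JavaScript. The `trimmedMean` helper and the timing values below are made up for illustration; in practice the numbers would be the results of repeated benchmark runs.

```javascript
// "Multiple runs, toss out the high/low" as a trimmed mean.
// `runs` would be per-run timings (ms); these values are fabricated.
function trimmedMean(runs) {
  if (runs.length < 3) throw new Error('need at least 3 runs to trim');
  const sorted = [...runs].sort((a, b) => a - b);
  const trimmed = sorted.slice(1, -1); // drop the single high and low outlier
  return trimmed.reduce((sum, t) => sum + t, 0) / trimmed.length;
}

// One outlier run (GC pause, OS context switch, ...) no longer skews the result:
const runs = [101, 99, 100, 240, 98, 102, 100, 97, 103, 100];
console.log(trimmedMean(runs)); // 100.375, versus a plain mean inflated to 114
```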

What about flamegraphs?

Flamegraphs do not give absolute numbers; they give relative numbers. Flamegraphs are amazing for understanding what is taking the most time within a library, not the performance of the library itself.

A side note: this would be a very interesting tool to use for charting performance over time. One could use the % of samples as an indicator of growth in running time. If all tests were measured for a long enough period of time, a complete picture could be established and used build over build / day over day / at some frequency. The only issues I see with this are that there is no out-of-the-box solution for it, and that writing such a library would be a feat in itself. So we will defer discussion / implementation of this for a later time, or never.

Where to go from here?

  • Talk to @mhdawson about where to commit these tests to.
  • Now that we have a baseline of where to start, I'll create a set of tests for require. It may be impossible to test requiring a new (uncached) module with benchmark.js due to require's caching (it really depends on whether I can muck with the cache or not). It will be trivial to test require's cached-result retrieval with benchmark.js.
  • I'll talk to @mhdawson and learn how to integrate the results into the already built charting / storage system.
  • I'll start building a suite of tests using benchmark.js for each of node's subsystems. This would be buffer, path, urlparse, etc. I would follow suit with node/benchmark.
