Currently the benchmarks take upwards of 15 minutes to run, which makes it really hard to determine, for a given change, whether it introduced a performance regression.
It would also be nice to be able to easily compute baseline numbers: e.g. 'node bench.js --did-i-regress' could perform a git clone into a temporary directory, run the benchmarks at origin/master, and then run them again with your changes to compare.
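Roughly, I'm imagining something like the sketch below. This is just an illustration, not a concrete proposal: the assumption that bench.js prints a `mean: <ms>` line, the way the origin URL is discovered, and the 5% threshold are all made up.

```js
// Sketch of a --did-i-regress mode: compare against a fresh clone of origin/master.
const { execSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Run the suite in the given directory and parse its reported mean time.
// Assumes bench.js prints a line like "mean: 123.4" -- adjust to the real output.
function runBench(cwd) {
  const out = execSync('node bench.js', { cwd, encoding: 'utf8' });
  return parseFloat(/mean: ([\d.]+)/.exec(out)[1]);
}

if (process.argv.includes('--did-i-regress')) {
  // Clone origin/master into a temporary directory to get an untouched baseline.
  const url = execSync('git remote get-url origin', { encoding: 'utf8' }).trim();
  const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'bench-baseline-'));
  execSync(`git clone --depth 1 --branch master ${url} ${tmp}`, { stdio: 'inherit' });

  const baseline = runBench(tmp);
  const current = runBench(process.cwd());
  const delta = ((current - baseline) / baseline) * 100;
  console.log(`baseline ${baseline}ms, current ${current}ms (${delta >= 0 ? '+' : ''}${delta.toFixed(1)}%)`);
  if (delta > 5) console.log('looks like a regression');
}
```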
(Also, something that might make the numbers more stable: run under node --expose-gc bench.js and call gc() between each run to invoke the collector deterministically; a sketch follows.)
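For concreteness, a minimal sketch of that, assuming run_benches is the suite's entry point for one full pass (stubbed out here):

```js
// Requires: node --expose-gc bench.js
// run_benches is a stand-in for one full pass over the benchmark suite.
function run_benches() {
  // ... run the actual benchmarks ...
}

// --expose-gc makes V8's collector callable as global.gc().
if (typeof global.gc !== 'function') {
  console.error('re-run with: node --expose-gc bench.js');
  process.exit(1);
}

for (let i = 0; i < 9; i++) {
  run_benches();
  global.gc(); // force a full collection so the next pass starts from a clean heap
}
```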
Edit: for reference, I just tried this, and I was unable to produce more predictable results by calling gc() manually. (Method: I measured the standard deviation across 9 runs with and without manual gc() calls between run_benches; with the gc() calls in, the standard deviation was not noticeably different from the runs without them.)