[View as slides](https://nbviewer.jupyter.org/format/slides/github/lutostag/talks/blob/master/python/Benchmarks.ipynb#/)

<center><h1>Benchmarking + Regressions</h1></center>
  
Greg Lutostanski

[github.com/lutostag](https://github.com/lutostag)

Senior Software Architect
    
  
![The Mobility House](https://www.mobilityhouse.com/media/logo/default/tmh_logo.png)

Goal: give you some tools for benchmarking and a feel for the landscape.

The focus is on tools we already use.

Benchmarks:
* A type of test
* How fast something is (relative to something else)
* Hard to get right
    * what is measured
    * consistency
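
To make "what is measured" and "consistency" concrete, here is a minimal sketch of a timing harness using only the standard library. The `run_benchmark` helper is hypothetical (it is not part of pytest-benchmark); it only illustrates why several rounds and a spread statistic are reported instead of a single timing.

```python
import statistics
import time


def run_benchmark(func, *args, rounds=5):
    """Time func(*args) over several rounds and summarise the samples (seconds)."""
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        # Standard deviation needs at least two samples.
        "stddev": statistics.stdev(samples) if rounds > 1 else 0.0,
        "rounds": rounds,
    }


stats = run_benchmark(time.sleep, 0.02)
print(f"min={stats['min'] * 1000:.2f} ms  stddev={stats['stddev'] * 1000:.3f} ms")
```

The minimum is usually the most stable number to compare between runs, because noise from the machine only ever adds time.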

Regressions:
* Has something gotten worse since some earlier checkpoint?

Standards:
* [xUnit](https://en.wikipedia.org/wiki/XUnit) / jUnit [...](https://martinfowler.com/bliki/Xunit.html)
    
Started in 1998; has a standard, parseable XML output that looks like:
```
<?xml version="1.0" encoding="utf-8"?><testsuite errors="0" failures="0" name="pytest" skips="0" tests="47" time="3.592">
<testcase classname="marketplace.tests.test_aggregator" file="marketplace/tests/test_aggregator.py" line="8" name="test_aggregate_empty_input" time="0.0025908946990966797">
```
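
Because the format is plain XML, per-test timings can be pulled out with the standard library. A minimal sketch, assuming a report shaped like the excerpt above: the `REPORT` string reuses that excerpt with closing tags added so it parses (only one test case is shown), and `slowest_tests` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Excerpt from above, completed with closing tags so it is well-formed XML.
REPORT = """<?xml version="1.0" encoding="utf-8"?>
<testsuite errors="0" failures="0" name="pytest" skips="0" tests="47" time="3.592">
<testcase classname="marketplace.tests.test_aggregator" file="marketplace/tests/test_aggregator.py" line="8" name="test_aggregate_empty_input" time="0.0025908946990966797"/>
</testsuite>"""


def slowest_tests(xml_bytes, limit=5):
    """Return up to `limit` (classname, name, seconds) tuples, slowest first."""
    root = ET.fromstring(xml_bytes)
    cases = [
        (case.get("classname"), case.get("name"), float(case.get("time", 0)))
        for case in root.iter("testcase")
    ]
    return sorted(cases, key=lambda c: c[2], reverse=True)[:limit]


for classname, name, seconds in slowest_tests(REPORT.encode("utf-8")):
    print(f"{classname}::{name} took {seconds * 1000:.2f} ms")
```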

Tools:
* pytest
* pytest-benchmark (for benchmarking obviously)
    
Other languages (e.g. JavaScript):
* karma
* karma-benchmark

Basically, any *good* testing framework should have a benchmark plugin and should write xUnit XML output.

Also CircleCI: it lets you save test artifacts, and it can read xUnit output files and show trends over time.





In [33]:
!pip install -q pytest pytest-benchmark

In [13]:
# test.py
import time

def test_my_stuff(benchmark):
    benchmark(time.sleep, 0.02)

In [26]:
with open('test.py', 'w') as f:
    f.write("""import time

def test_my_stuff(benchmark):
    benchmark(time.sleep, 0.02)""")

In [27]:
!pytest test.py --benchmark-json=/tmp/bench.json

platform linux -- Python 3.7.1, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/lutostag/python
plugins: benchmark-3.2.2
collected 1 item

test.py .                                                                [100%]
Wrote benchmark data in: <_io.BufferedWriter name='/tmp/bench.json'>



---------------------------------------------- benchmark: 1 tests ---------------------------------------------
Name (time in ms)         Min      Max     Mean  StdDev   Median     IQR  Outliers      OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------
test_my_stuff         20.0695  20.2069  20.1267  0.
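
The saved JSON can also be read directly. A minimal sketch, assuming the file holds a top-level `benchmarks` list whose entries carry a `name` and a `stats` mapping with times in seconds (my understanding of the pytest-benchmark layout; check it against your own file). `SAMPLE` is hand-written from the figures in the table above, not copied from a real file, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written stand-in for /tmp/bench.json, using the Min/Max/Mean shown above
# converted from milliseconds to seconds. A real file has many more fields.
SAMPLE = json.loads("""
{"benchmarks": [
  {"name": "test_my_stuff",
   "stats": {"min": 0.0200695, "max": 0.0202069, "mean": 0.0201267}}
]}
""")


def summarize(report):
    """Map each benchmark name to its stats dictionary."""
    return {bench["name"]: bench["stats"] for bench in report["benchmarks"]}


# For a real run: with open("/tmp/bench.json") as f: report = json.load(f)
for name, stats in summarize(SAMPLE).items():
    print(f"{name}: min={stats['min'] * 1000:.4f} ms  mean={stats['mean'] * 1000:.4f} ms")
```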

In [30]:
# Now let's slow it down to compare...
!sed -i test.py -e 's/0.02/0.10/g'
!pytest test.py --benchmark-compare=/tmp/bench.json

Comparing against benchmarks from: /tmp/bench.json
platform linux -- Python 3.7.1, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/lutostag/python
plugins: benchmark-3.2.2
collected 1 item

test.py .                                                                [100%]


---------------------------------------------------------------------------------------- benchmark: 2 tests ---------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------

In [31]:
# We can also make it fail if it regresses too far; just add the option:
!pytest test.py --benchmark-compare=/tmp/bench.json --benchmark-compare-fail=min:5%

Comparing against benchmarks from: /tmp/bench.json
platform linux -- Python 3.7.1, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
benchmark: 3.2.2 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/lutostag/python
plugins: benchmark-3.2.2
collected 1 item

test.py .                                                                [100%]


---------------------------------------------------------------------------------------- benchmark: 2 tests ---------------------------------------------------------------------------------------
Name (time in ms)                    Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
----------------------------------------------

But how do we automate this?

CircleCI:
* [build artifacts](https://circleci.com/docs/2.0/artifacts/#section=jobs)
* [API usage](https://circleci.com/docs/api/#get-authenticated)
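
As a rough sketch of how a baseline could be located through the API: the code below builds the URL for the artifacts of the latest successful build on a branch and picks one file out of the returned list. The endpoint shape and the `path`/`url` response fields are assumptions based on the CircleCI v1.1 API and should be checked against the docs linked above; both helper names and the sample response are hypothetical, and no network call is made here.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://circleci.com/api/v1.1"  # assumption: v1.1 REST API


def latest_artifacts_url(vcs, user, project, branch="master"):
    """Build the URL listing artifacts of the latest successful build on a branch."""
    path = "/".join(quote(part, safe="") for part in (vcs, user, project))
    query = urlencode({"branch": branch, "filter": "successful"})
    return f"{API_ROOT}/project/{path}/latest/artifacts?{query}"


def find_artifact(artifacts, filename):
    """Return the download URL of the first artifact ending in `filename`, else None."""
    for artifact in artifacts:
        if artifact.get("path", "").endswith(filename):
            return artifact.get("url")
    return None


# Hypothetical response body, shaped like the assumed API output.
sample_response = [
    {"path": "results/junit.xml", "url": "https://example.invalid/junit.xml"},
    {"path": "results/bench.json", "url": "https://example.invalid/bench.json"},
]

print(latest_artifacts_url("github", "lutostag", "benchmark-circleci"))
print(find_artifact(sample_response, "bench.json"))
```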

Basically: store the benchmark result for each run as a build artifact, then compare it against a baseline run.

Not entirely live demo:
* [github](https://github.com/lutostag/benchmark-circleci)
* [circleci](https://circleci.com/build-insights/gh/lutostag/benchmark-circleci/master)

Questions:
* Now that you have some tools for benchmarking, where does it make sense to use them?
* What would you actually want to benchmark?