
Baseline benchmarking tests #54

Merged: 5 commits merged into master on Jul 7, 2017

Conversation

@goodboy (Contributor) commented Jul 5, 2017:

This adds some basic speed tests for the _MultiCall loop.
The goal is to extend this out so that we avoid performance degradation when addressing #23 and #15.
The next set of tests will include the full code path(s) initiated by _HookCaller methods.
I was also thinking of adding collection speed tests.

Treat this PR as a first draft to get some opinions going.
So far the results are about what I'd expect: pypy killing it, with py27 and py36 as the runners-up.
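
For anyone unfamiliar with pytest-benchmark, here is a minimal, self-contained sketch of the style of test being added. The simple dispatch loop below is only a stand-in for pluggy's _MultiCall.execute(), not the real implementation, and the hook names are made up.

import pytest

def hook_a(arg1, arg2, arg3):
    return arg1

def hook_b(arg1, arg2, arg3):
    return arg2

def call_hooks(hooks, kwargs):
    # Simplified 1:N dispatch loop standing in for _MultiCall.execute().
    return [hook(**kwargs) for hook in hooks]

@pytest.mark.benchmark
def test_call_loop_speed(benchmark):
    # The benchmark fixture repeatedly calls the target and records timings.
    benchmark(call_hooks, [hook_a, hook_b], {"arg1": 1, "arg2": 2, "arg3": 3})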

@RonnyPfannschmidt (Member) commented:

At first glance it looks good. Is there an easy way to publish the results?

@goodboy (Contributor, author) commented Jul 5, 2017:

@RonnyPfannschmidt good question.
The Travis CI runs contain the results in the console output, but I don't know if there's a better way.

Hopefully @ionelmc can comment.


Review thread on the following test from the diff:

@pytest.mark.benchmark
def test_hookwrapper_speed(benchmark):
    benchmark(inner_exec, [w1, w2])

@ionelmc (Member) commented on this code:

You could have something like

execute = MC(methods, {'arg1': 1, 'arg2': 2, 'arg3': 3}).execute
benchmark(execute, [w1, w2])

to avoid including the wrapper overhead in your timings. This would matter if the execute method is really really fast.

@goodboy (Contributor, author) replied:

@ionelmc the wrapper overhead is actually more representative of real call usage by a _HookCaller, which internally calls its _HookCaller._hookexec(), which in turn eventually invokes PluginManager._inner_hookexec().

My next move, once this stuff lands, is to add benchmarks around those higher-level components so that we speed-test the full API code paths.
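
As a rough illustration, a higher-level benchmark along those lines could drive the public PluginManager hook-calling path directly. This is only a sketch; the "example" project name, the hook signature, and the test name are made up rather than taken from this PR.

import pytest
from pluggy import PluginManager, HookspecMarker, HookimplMarker

hookspec = HookspecMarker("example")
hookimpl = HookimplMarker("example")

class Spec(object):
    @hookspec
    def myhook(self, arg1, arg2, arg3):
        pass

class Plugin(object):
    @hookimpl
    def myhook(self, arg1, arg2, arg3):
        return arg1 + arg2 + arg3

@pytest.mark.benchmark
def test_hookcaller_speed(benchmark):
    # Exercise the full public path: _HookCaller -> _hookexec -> inner loop.
    pm = PluginManager("example")
    pm.add_hookspecs(Spec)
    pm.register(Plugin())
    benchmark(pm.hook.myhook, arg1=1, arg2=2, arg3=3)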

@ionelmc by the way, did you see our question above about how to better publish the benchmarking results for CI or on GitHub? I noticed in your docs that you have notes on how to render results to JSON, but is there any known way to present the results here?

@ionelmc (Member) replied:

Yes, right. I don't know much about the pluggy internals.

About displaying results and running pytest-benchmark in CI ... well, it's not good. CI will run each test with lots of external bias, so timings aren't consistent. You could look at perf counters or profile data (my plugin only shows some profile data, not well integrated into the stats yet), or you could compare against a baseline (e.g. you have a test that only does 10 function calls to establish a baseline to compare against; if there's a perf regression you will get slower relative to the baseline).

Just to restate: saving benchmark data somewhere and comparing against previous runs would be unreliable due to virtualization and contention on Travis CI. We can talk about instruction counters I guess (since those shouldn't care about preemption, right?) - what would be useful?

And testing on pypy further complicates things because of warmup (the plugin has such a feature, but you need to set it correctly).
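
To make the baseline idea concrete, here is a hedged sketch: one trivial benchmark that only performs ten plain function calls, with explicit warmup via benchmark.pedantic (relevant for pypy). The round counts are arbitrary placeholders.

import pytest

def noop(x):
    return x

@pytest.mark.benchmark
def test_baseline_ten_calls(benchmark):
    def ten_calls():
        for i in range(10):
            noop(i)

    # warmup_rounds lets the JIT settle before timing starts; the real
    # benchmarks can then be judged relative to this baseline.
    benchmark.pedantic(ten_calls, rounds=100, warmup_rounds=10)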

@goodboy (Contributor, author) replied:

@ionelmc I didn't even think about the effects of the CI server... very good point.

or you could compare against a baseline (e.g. you have a test that only does 10 function calls to establish a baseline to compare against; if there's a perf regression you will get slower relative to the baseline)

Can you elaborate on this? Are you suggesting this approach will work in CI?

I think I'll still include a baseline from the machine I develop on, at least as a reference.

We can talk about instruction counters I guess (since those shouldn't care about preemption, right?) - what would be useful?

Yes, this would be super useful. Can I pull this from the profiling data? I'm guessing it can't be used with --benchmark-compare-fail?

@goodboy force-pushed the benchmarking branch 2 times, most recently from 4a2f399 to d8d6fdd on July 5, 2017 at 18:11

@nicoddemus (Member) commented:

I like the output, nice work!

How do we plan to track whether some change improves or degrades performance? Manually? Any idea if it would be possible to automate it without too much effort?

@goodboy (Contributor, author) commented Jul 5, 2017:

@nicoddemus I was planning a couple of progress-tracking ideas:

  • parameterize tests with different core-component implementations and group them to demonstrate perf differences
  • commit snapshots of benchmark results and track regressions with the --benchmark-compare=NUM|ID and --benchmark-compare-fail=EXPR flags (a sketch of such an invocation is at the end of this comment)

I was going to add this in my next PR, which will attempt to address #23 with accompanying benchmarks so that everyone can be confident about the changes.

This PR should be viewed as just an initial (and non-invasive) introduction of pytest-benchmark.
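
For reference, the comparison workflow mentioned in the second bullet could look roughly like this, driven through pytest.main(). The flags are real pytest-benchmark options, but the saved-run id "0001", the 10% threshold, and the testing/benchmarks.py path are placeholders.

import pytest

# First run: save the results (stored under .benchmarks/ by default).
pytest.main(["testing/benchmarks.py", "--benchmark-autosave"])

# Later run: compare against the saved run and fail if the minimum time
# regresses by more than 10%.
pytest.main([
    "testing/benchmarks.py",
    "--benchmark-compare=0001",
    "--benchmark-compare-fail=min:10%",
])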

@goodboy (Contributor, author) commented Jul 5, 2017:

Also note the AppVeyor builds are failing because of missing pypy?
Do we actually need the AppVeyor runs given we already have Travis?

@nicoddemus (Member) commented:

@tgoodlet awesome, sounds like a good plan.

One idea: how about creating a new environment just to run the pytest-benchmark tests? That would avoid cluttering all the other runs with them, plus it would give us the flexibility to run different comparisons on the CI.

@nicoddemus (Member) commented:

Also note the AppVeyor builds are failing because of missing pypy?
Do we actually need the AppVeyor runs given we already have Travis?

Well, it is good practice, given that we also test all Python versions on both Linux and Windows. It should be working, though; pytest uses basically the same mechanism.

@nicoddemus (Member) commented:

Actually pypy is installed; it is crashing midway through. But as I mentioned earlier, I suggest we have a separate env just for the benchmark tests. This would solve the problem, plus we can choose to skip them on Windows.

@goodboy (Contributor, author) commented Jul 5, 2017:

@nicoddemus ah, I see the error now:
ERROR: InvocationError: 'C:\\projects\\pluggy\\.tox\\pypy-pytest30\\bin\\py.test.EXE testing/'
Any idea what's going on there - are you saying it's due to the new tests?

@nicoddemus in terms of separate (tox) envs, that sounds fine to me, although won't we still be interested in performance diffs on each pytest and Python version anyway? I was also thinking it might make more sense to put all the perf tests in a separate test module, clearly isolating them instead of relying solely on marks. I'll try to add the separate env later today.

@nicoddemus (Member) commented:

Any idea what's going on there - are you saying it's due to the new tests?

It seems to me the tests are terminating abruptly, which indicates a crash to me.

in terms of separate (tox) envs, that sounds fine to me, although won't we still be interested in performance diffs on each pytest and Python version anyway?

We might have different benchmark envs (bench-py27-pytest28, etc.), although I feel that initially we should keep things simple with a single benchmark env; we want to keep benchmark logs in the repository for comparison, and adding another level of complexity might be overkill at this point.

I was also thinking it might make more sense to put all the perf tests in a separate test module, clearly isolating them instead of relying solely on marks.

Good point. In fact, we can name it so pytest won't pick it up by default (benchmarks.py?), which makes sense, so the benchmark env would look like this:

[testenv:bench]
commands = py.test testing/benchmarks.py
deps =
    pytest
    pytest-benchmark

@ionelmc (Member) commented Jul 6, 2017:

In fact, we can name it so pytest won't pick it up by default (benchmarks.py?), which makes sense, so the benchmark env would look like this

You can also stick --benchmark-disable or --benchmark-skip in addopts or somewhere for the same effect.

Tyler Goodlet added 3 commits July 7, 2017 01:40
This begins an effort to incorporate run-time speed tests using
`pytest-benchmark`. This initial test set audits the `_MultiCall`
loop with hookimpls, hookwrappers and the combination of both.
The intention is to eventually have a reliable set of tests which
enable making core component modifications without disrupting
performance as per #37.

@goodboy (Contributor, author) commented Jul 7, 2017:

@nicoddemus @RonnyPfannschmidt alright, I've made the requested changes.
I made two Travis runtimes, for py27 and py36, which can be used to track results.

If possible I'd like to get this PR in before adding more tests.
Let me know what you guys think!

@goodboy (Contributor, author) commented Jul 7, 2017:

Just made it even better by parametrizing over multiple hook/wrapper counts.
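
A rough sketch of what that parametrization could look like (the counts and the simple dispatch loop are illustrative stand-ins, not the code in this PR):

import pytest

def dispatch(hooks, kwargs):
    # Stand-in for the multicall loop being measured.
    return [hook(**kwargs) for hook in hooks]

def make_hooks(count):
    def hook(arg1, arg2, arg3):
        return arg1
    return [hook] * count

@pytest.mark.benchmark
@pytest.mark.parametrize("num_hooks", [1, 10, 100])
def test_dispatch_scaling(benchmark, num_hooks):
    # Each parametrized count appears as its own row in the benchmark table.
    benchmark(dispatch, make_hooks(num_hooks), {"arg1": 1, "arg2": 2, "arg3": 3})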

@nicoddemus (Member) left a review comment:

Nice work!

Just a note: you might want to consider pushing your branches to your fork instead of to the main repository; this avoids running the CI services twice. That's our recommendation in pytest and its plugins, at least. 👍

@goodboy (Contributor, author) commented Jul 7, 2017:

you might want to consider pushing your branches to your fork instead of to the main repository

@nicoddemus I'm fine with doing that if it's preferred. What do you mean by CI running twice? It seems to me it's the same either way?

@goodboy merged commit 9b2bc8c into master on Jul 7, 2017
@goodboy deleted the benchmarking branch on July 7, 2017 at 19:07

@nicoddemus (Member) commented:

What do you mean by CI running twice? It seems to me it's the same either way?

The end result is the same; it's just that Travis and AppVeyor will each run twice: once for the PR and once for the branch (at least in the current configuration). Also, this leaves the main repository cleaner, without topic branches. But really, it's no big deal; I'm mentioning it just because four builds might take longer than two. 👍
