rough performance analysis of the current internals #2206
Can I work on it? |
Feel free to! Please send a mail to the mailing list and join the IRC channels, if IRC works out for you. |
I'll look into creating an example project with the performance issues tomorrow at work. It's probably just a matter of creating lots of irrelevant files and directories, I think. What's the deal with the "marked as off topic" stuff, by the way? This issue was about performance issues generally and I gave a pretty clear example of a huge performance issue. I don't see how I could be more on topic than that ^_- |
Cool, looking forward to this. Re off-topic: this issue is about investigating performance altogether / in general, so your report is not really off-topic, but it should be a separate issue - my intention was to keep this issue easier to read / skim in the future. |
For pytest's own suite the `cache_info()` looks as follows:

```
> session.config._getconftest_pathlist.cache_info()
CacheInfo(hits=231, misses=19, maxsize=None, currsize=19)
```

While it does not really make a difference for me, this might help with larger test suites / the case mentioned in pytest-dev#2206 (comment).
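For context, a minimal sketch of the memoization being discussed - the function name and body below are simplified stand-ins, not pytest's actual implementation:

```python
import functools


@functools.lru_cache(maxsize=None)
def getconftest_pathlist(relpath: str):
    # Stand-in for the expensive conftest.py lookup done per collected path;
    # with the cache, repeated queries for the same path become dict hits.
    return ()


# After exercising it during a run, the hit/miss ratio can be inspected:
print(getconftest_pathlist.cache_info())
# e.g. CacheInfo(hits=231, misses=19, maxsize=None, currsize=19)  (numbers from the comment above)
```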
I've done some more investigation on the collect phase speed issues:
So as you can see I've only tried the released versions, but it's narrowed down a lot. I was very impressed by the speed of 3.6.0, so much that I had to check and recheck that it did indeed find all my tests. I have written a little test script that demonstrates some (but not all!) of the behavior we see:

```python
from os import makedirs
from os.path import expanduser, join
from shutil import rmtree

base_dir = expanduser('~/Downloads/test_pytest_collect_speed/tests/')
rmtree(base_dir, ignore_errors=True)  # start from a clean tree

for i in range(100):
    d = join(base_dir, f'foo_{i}')
    makedirs(d)
    with open(join(d, '__init__.py'), 'w') as f:
        pass
    for j in range(10):
        with open(join(d, f'test_bar_{j}.py'), 'w') as f:
            f.write('def test_foo(): pass')
```

Running this script and then timing `pytest` against the generated tree:
As you can see 3.6 isn't much faster with this test, unlike our real code base, but the slowdown in 3.7 is clearly visible. |
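A minimal way to produce such timings, assuming the tree generated by the script above (the path is taken from that script; the pytest version is whichever is installed in the active environment):

```python
import subprocess
import time
from os.path import expanduser

target = expanduser("~/Downloads/test_pytest_collect_speed/tests/")

# Time a collect-only run; repeat per pytest version to compare.
start = time.perf_counter()
subprocess.run(["python", "-m", "pytest", "--collect-only", "-q", target], check=True)
print(f"collect-only took {time.perf_counter() - start:.2f}s")
```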
Thanks @boxed for the analysis. 3.7.0 saw the introduction of package-scoped fixtures; that's very likely the cause of the slowdown, since it changed the collection code significantly. |
That sounded plausible, but looking at the cProfile output I don't think that's the case: 3.6.4:
3.7.0:
I don't see how package-scoped fixtures can explain the number of calls to posix.fspath going from 383193 to 7339091. That's 19x more. |
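For reproducibility, a minimal sketch of how such a cProfile comparison can be produced (the target path and the number of printed rows are placeholders); run it once per pytest version and diff the call counts:

```python
import cProfile
import pstats

import pytest

prof = cProfile.Profile()
prof.enable()
pytest.main(["--collect-only", "-q", "tests/"])  # profile collection only
prof.disable()

# Sort by cumulative time and show the top 30 entries.
pstats.Stats(prof).sort_stats("cumulative").print_stats(30)
```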
We're spending the vast majority of the time in this piece of code in `_pytest/main.py`: `if not any(x in path.parts() for x in paths):` - I'm having trouble wrapping my head around what it's doing. It was introduced with package-scoped fixtures, as you suspected. |
Here is some data from running @boxed's test case under pyflame. The worst offenders in terms of performance regressions are:
Both were indeed introduced with the package-scoped fixtures feature. https://github.com/martenbr/flamecharts/blob/master/3.6.4.svg (8.63s) |
Well.. we found it. I'll send a PR. |
we really ought to take a look at using strings and a set there ^^ |
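A minimal sketch of the "strings and a set" idea (not the actual pytest code): precompute the known package/root paths once as plain strings in a set, then answer "does this candidate live under any of them" by walking the candidate's ancestors, with no filesystem access and no repeated `path.parts()` list scans.

```python
import os


def has_ancestor_in(candidate, known_paths):
    """known_paths: set of absolute path strings (e.g. the collected package dirs)."""
    current = os.path.abspath(str(candidate))
    while True:
        if current in known_paths:   # O(1) set lookup per ancestor level
            return True
        parent = os.path.dirname(current)
        if parent == current:        # reached the filesystem root
            return False
        current = parent


# Usage sketch with made-up paths:
known = {os.path.abspath(p) for p in ("/repo/tests/foo_1", "/repo/tests/foo_2")}
print(has_ancestor_in("/repo/tests/foo_1/test_bar_3.py", known))  # True
```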
It seems to me like a purge of py's path handling is the way to go. It looks like a high-level path-juggling lib, but it hits the filesystem a crazy amount. I also noticed that between 3.7.0 (which I tested my performance fix on) and master, more performance regressions have landed, resulting in my test going from 7 seconds (in 3.7 with my patch) to 9.3 seconds. |
@boxed personally I'd be delighted if we could remove all of `py.path` |
Here's a PR: #4237. There's a lot of performance left on the table still, so I'll keep looking. This is still ~50% slower than 3.9.1 on my machine. |
Let's commit this to the examples folder - those roughly 1000 files share content, so it's inexpensive to keep in git - and we can have a direct folder in the checkout to test with |
i made |
this makes me wonder how much of that is pytest hooks/pluggy and whether the idea of cythonizing hooks would make a good difference |
Here is some analysis, not of collection but mostly of execution.

**Methodology**

I use the idea from #401, without the

```python
import pytest


@pytest.mark.parametrize("x", range(5000))
def test_foo(x):
    pass
```

I analyze how much overhead each non-essential plugin adds, compared to the default with all plugins enabled. Run with `python -m pytest xx.py -p no:<plugin>` (see the script below).

Specs: Linux, Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz, HDD, Python 3.8, current master (7d5f5a8), performance governor, turbo off.

**Hacky script**

```python
#!/usr/bin/env python
from typing import List
import subprocess
import time

plugins = (
    "assertion",
    "cacheprovider",
    "capture",
    "debugging",
    "doctest",
    "faulthandler",
    "freeze_support",
    "junitxml",
    "logging",
    "monkeypatch",
    "nose",
    "pastebin",
    "recwarn",
    "reports",
    "resultlog",
    "setuponly",
    "setupplan",
    "skipping",
    "stepwise",
    "unittest",
    "warnings",
)


def check(plugin: str, iterations: int = 5) -> List[float]:
    timings = []
    for i in range(iterations):
        start = time.perf_counter()
        subprocess.run(
            ["python", "-m", "pytest", "xx.py", "-p", f"no:{plugin}"],
            stdout=subprocess.DEVNULL,
            check=True,
            env={"PYTHONPATH": "src/"},
        )
        timings.append(time.perf_counter() - start)
    return timings


print("PLUGIN", "T1", "T2", "T3", "T4", "T5", "TMIN", "DIFF", "DIFF%")

timings = check("$DEFAULT")
baseline = min(timings)
print(
    "{:15} {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:+.4f}s {:+.3f}%".format(
        "BASELINE",
        timings[0],
        timings[1],
        timings[2],
        timings[3],
        timings[4],
        min(timings),
        baseline - min(timings),
        ((baseline - min(timings)) / baseline) * 100,
    )
)

for plugin in plugins:
    timings = check(plugin)
    print(
        "{:15} {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:2.4f}s {:+.4f}s {:+.3f}%".format(
            plugin,
            timings[0],
            timings[1],
            timings[2],
            timings[3],
            timings[4],
            min(timings),
            baseline - min(timings),
            ((baseline - min(timings)) / baseline) * 100,
        )
    )
```

**Results**

Detailed results
So overall, for this simple case, the plugins take 57.5% of the time, almost all of it from these 5:
|
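For reference, the mechanism the script above measures can also be used directly: any built-in plugin can be disabled for a single run with `-p no:<name>`. The plugin names picked below are illustrative (taken from the tuple in the script); which ones are worth disabling depends on your own measurements.

```python
import pytest

# Run a test file with a couple of optional built-in plugins disabled.
pytest.main(["xx.py", "-p", "no:cacheprovider", "-p", "no:pastebin"])
```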
I ran this test with pytest (no config): 11.6s. Not sure if those parameters are correct, because passing any random string there doesn't give an error message, so I'm not sure pytest isn't just swallowing an invalid configuration silently... Another interesting test is to remove the plain-asserts param and change the test to (I would argue that hammett actually provides a slightly better user experience for failures right now.) |
Hi @boxed, thanks for checking! It's nice to have a lower bound with hammett. Interesting that it's only 11.6 -> 9.2 for you. Can you describe your system (operating system and hardware)?
You can verify with
Hmm. I haven't looked at assertion rewriting. IIRC assertion rewriting is cached/compiled to pyc though, is it still 19.9s on second run?
A large chunk of the remaining time (after turning off the plugins) is taken by |
I realized I had run this test in a rather dirty venv that had the django plugin. I made a fresh new venv and got 6 seconds with tests passing. Big improvement from 11s! Still bad compared to hammett's 2 seconds, but at least the same order of magnitude. Assertion rewriting on or off has a small effect. It's very obvious that the degradation in performance is due to the printing at the end. Hammett also prints local variables, so that's not it. |
All tests failing is probably not a very common case, but 39.2s does sound excessive. If you have time to do a cProfile of this scenario, I'd be interested. I'd also be interested to know how long it takes with the output redirected to `/dev/null`. |
I tried `> /dev/null` and it made no difference. As for a cProfile: it's simple enough to reproduce; it's not specific to my machine. |
I wrote up a little benchmark suite: https://github.com/boxed/test-benchmarks |
Neither nose2 nor hammett collects a list of tests before running; they just collect a list of files containing tests (for nose2, based on filename heuristics). From some experience writing our own collector to work around pytest collector slowness, I doubt they are capable of finding all the types of tests that pytest supports. |
What do you mean "don't collect tests"? Hammett finds all pytest tests, fixtures and parametrized tests and runs them all. What do you mean "collect" is more than that? And also note that the benchmarks contain an empty directory. What is pytest "collecting" that justifies it taking so much more time doing nothing? I can make a benchmark for explicitly sending the full list of tests. Shall we say 100? Or just one? Hammett will run circles around pytest on that too; I will gladly put down a $100 bet on this before having written the benchmark. Let's be serious though: the benchmarks are not super strange and they show a real problem in pytest. There is no point in trying to hide simple facts. |
Just as an observation point: while working on a pytest plugin (pytest-terraform) and using xdist on a many-core machine, I've noticed that pytest does some fairly invasive/expensive introspection on any Python module - basically a getattr on everything it can find in a module, trying to find plugins and fixtures. Looking at some of the perf traces, this appears to cause significant runtime execution counts and time accumulation; with xdist this gets multiplied, and pytest collection appears to dominate (60-70%) the overall test execution time. As a first step it might be nice just to have some instrumentation within pytest (it's unclear if that can be done out of tree via a plugin to accurately capture the cost, versus needing to be in core) to get stats on the cost of plugin scanning. Another step would be reworking the pluggy hook and fixture mechanism to populate a module global so that lookups are explicit instead of implicit. A speed-infrastructure setup would also be useful to prevent significant regressions over time. |
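A rough illustration of the kind of scanning described above - this is not pytest's actual code, just a sketch of the pattern: every attribute of a collected module gets fetched and inspected, which adds up across thousands of modules and xdist workers.

```python
import types


def scan_module(module: types.ModuleType):
    found = []
    for name in dir(module):           # every attribute, including re-exports
        obj = getattr(module, name)    # one attribute access per name
        # Check for a marker attribute, roughly how fixture detection works.
        if getattr(obj, "_pytestfixturefunction", None) is not None:
            found.append(name)
    return found
```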
I started playing around with PYTEST_DEBUG (didn't realize it existed when I wrote the previous comment) and py-spy on test collection, i.e. the delta for running
Actually running tests instead of just collection, with xdist:
The PYTEST_DEBUG output showing the plugin processing of other data files in the same directory:
For plugins that want to load files, it might be better to have some explicit file-extension registration on the session rather than dispatching on every file found. |
@kapilt pytest "has" to dispatch the hook on these files, because there might be a plugin which turns them into a test, check out https://docs.pytest.org/en/stable/example/nonpython.html for example. In cloud-custodian it seems all of the data files are nicely encapsulated in one directory, so you might want to tell pytest to skip it. There are amusingly many ways to do it: |
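One of those ways can be expressed in plain Python in a conftest.py. A hedged sketch, assuming the data files sit under a directory named `data` next to the conftest (adjust names to the actual layout):

```python
# conftest.py placed next to the "data" directory; names are assumptions.
collect_ignore = ["data"]          # skip this directory entirely during collection
collect_ignore_glob = ["*.json"]   # or skip files matching a pattern
```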
That non-Python-files feature seems like a perfect example of something that should be opt-in, so that not everyone gets the performance hit. |
@bluetech thanks for the pointers re config options; however, the plethora of overlapping options around this is indicative that many people have had this issue. While I love the configurability/pluggability of pytest, ideally features should not impose significant costs on those not using them, i.e. zero-cost abstractions. In this context, I would hope that having plugins register their interest in file extensions in an early hook would allow fast, non-plugin-based filtering at the session level when collecting files, with a default of just .py files - i.e. zen of Python, explicit is better than implicit. See the hypothetical sketch below. |
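As a purely hypothetical sketch of that idea (this hook does not exist in pytest today; the name and return type are invented for illustration), a plugin could declare up front which extensions it cares about, so the session could filter candidate files cheaply before dispatching per-file collection hooks:

```python
# Hypothetical plugin code: pytest_register_collect_extensions is NOT a real
# pytest hook; it only illustrates the "register interest in file extensions" idea.
def pytest_register_collect_extensions():
    return {".tf", ".yaml"}  # only these non-Python extensions would be dispatched
```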
I kinda agree, but that would be a breaking change at this point, so my preference is to just make it fast instead :) From the snippet above I see you are using pytest 5.3.5. There have been several performance improvements since then. It would be great if you could test it with the latest release, or even pytest master, or even with #7671:
|
Thanks, I appreciate the pitfalls of compatibility here. We had that version pinned due to issues with pytest-sugar (output-writer compat), along with pytest-xdist and pytest-terraform (both of which needed the tmp factory root location). Trying #7671 vs trunk to compare, via
I think pytest is overly concerned with backwards compatibility. The performance problems are much worse than breaking backwards compatibility, especially if the change is that users just need to add some config in setup.cfg (or pyproject.toml or whatever the flavor of the month is.. the need to check multiple files seems like a problem too, but a smaller one :P ) |
Thanks for the performance benchmarks!
Indeed it is a bit surprising, but at a cursory glance I see that your test files are all in the root. I don't think that in this case pytest calling hooks for just some file extensions would really help much; hook calls are supposed to be cheap, and we still need to traverse the directories and list the files anyway. For this specific case I think the best bet is to use

```ini
[pytest]
norecursedirs=data
```

This simple configuration change should speed up your specific case significantly.
Not sure, I think it is the responsible thing to do given that pytest is part of the infrastructure of so many projects and users. Having said that, I agree that we can speed things up by introducing a backward-incompatible change that can be reverted in a simple manner (say a config option, or plugins implementing an extra hook); we just need to make sure the change brings real gains to justify the breakage, and to do it in the least disruptive way possible. |
Depends on your perspective. Yes, it might introduce some breakage which will take some time to fix, but breakage+fixing is O(1), while overhead for invocations is O(n), sometimes even O(n*m). It's not at all obvious which one is bigger.
Agreed. |
Indeed there's a balance. 👍 |
Adding one data point here: running a single test on our django codebase takes about 25 seconds.
Running the same but narrowing the search path shaves a good 10 seconds off that:
Selecting a test that doesn't exist shaves another 10 to 12 seconds, so it would seem that this does not come from our own code:
[EDIT] After discussing with my coworkers it seems that not everyone has the same behavior. It seems to be this slow only on macOS, but not for everybody... |
Adding another data point:
Examining the hot path, it looks like recompiling the tests dominates the time spent collecting tests, with walking the AST while rewriting assertions also being significant. Has caching modules to disk for reuse (after they've been rewritten) been considered? I haven't given it much thought, but it seems plausible at first consideration. I assume that would turn subsequent collection into primarily a lookup operation (except for modules that have changed, of course). |
@holmanb recompiled modules are cached in pyc files with a pytest tag to denote the difference |
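To make that concrete, a small sketch for inspecting the cache (the directory and file names below are illustrative; the exact tag embeds the interpreter and pytest versions):

```python
from pathlib import Path

# List bytecode caches for a test package after a pytest run.
for pyc in sorted(Path("tests/__pycache__").glob("*.pyc")):
    print(pyc.name)

# Expected shape of the output (illustrative names):
#   test_foo.cpython-38.pyc               <- regular CPython bytecode
#   test_foo.cpython-38-pytest-6.2.0.pyc  <- pytest's assertion-rewritten cache
```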
Of course, caching doesn't really work well for CI setups that often rebuild from scratch anyway - for example on GitHub Actions. |
This is the second follow-up to #401.
It's about time we take a good look at what eats time when executing pytest;
the last deep look was quite a while ago.