Context: I write Hypothesis, a randomized testing library for Python. It works "well" under py.test, but only in the sense that it ignores py.test almost completely other than doing its best to expose functions in a way that py.test fixtures can understand.
A major problem with using Hypothesis with py.test is that function level fixtures get evaluated once per top-level function, not once per example. When these fixtures are mutable and mutated by the test this is really bad, because you end up running the test against the fixture many times, changing it each time.
People keep running into this as an issue, but currently it seems to be impossible to fix without significant changes to py.test. @RonnyPfannschmidt asked me to write a ticket about this as an example use-case of subtests, so here I am.
So what's the problem?
A test using Hypothesis looks something like:
```python
@given(b=some_strategy)
def test_some_stuff(a, b):  # `a` is a pytest fixture, `b` comes from Hypothesis
    ...
```
This translates into something approximately like:
```python
def test_some_stuff(a, b=special_default):
    if b == special_default:
        for b in examples():
            test_some_stuff(a, b)
    else:
        ...  # the original test body, run once per example
```
The key problem here is that examples() cannot be evaluated at collect time because it depends on the results of previous test execution.
The reasons for this, in order of decreasing "this seems to be impossible" (i.e. with the current feature set of py.test I have no idea how to solve the first and neither does anyone else, we could maybe solve the second, and could definitely do something about the third):
There's a somewhat similar issue in pytest-dev/pytest-qt#63 - it adds a way for pytest-qt to test Qt models to ensure they behave correctly.
The tests can't easily change the data in the model (a model has a defined interface for getting data out of it, but not necessarily for adding/removing/changing data), so the approach of the original C++ tests is to re-run all tests whenever the model changes: you "attach" the tester, then make the model do something, and the tests re-run as soon as the model changes.
I've not found a satisfying way to do that yet, since the tests aren't known at collection time. What the code currently does is provide a qtmodeltester.setup_and_run(model) method which runs the tests once and listens for changes, and the user then modifies the model as part of their (single) test.
This however poses several problems, e.g. how to tell the user which of the "sub-tests" has failed and which tests did run, etc.
@The-Compiler I think your use-case is fundamentally different:
as far as I understand, @DRMacIver needs sub-test level operations with setup/teardown,
while you need something that's more like a set of attached checks that run per model change.
I think both use-cases would be satisfied by having a way to generate new tests (or sub-tests) while a test is running. Then pytest would take care of running the new tests and handling setup/teardown for each one.
Generating new first-class tests while the tests are already running would be awkward for the UI, so I think subtests are the only option (for a start, only the parent test would be visible in the UI).
I wonder if, for Hypothesis' case, there's an upper bound on the number of test runs needed that could be determined at collection time.
There isn't right now, but one could be introduced. However, it's going to be somewhere between 10 and 100 times larger than the typical number of runs.
Also note that Hypothesis in its default configuration runs 200 subtests per test as part of a typical run, so if you want to display those in the UI it's already going to be, um, fun.
I see. The idea was, as a workaround, to generate as many test cases as could possibly be needed for Hypothesis, and then just skip the ones that aren't needed.
Yeah, I figured it would be something like that. It's... sort of possible, but the problem is that Hypothesis can't really know in advance what each example is going to be, so there'd have to be a bunch of work to match the two up. I think I would rather simply not support the feature than use this workaround.
I'm currently fooling around with this. Would it be an OK API if there's a way to instantiate sub-sessions (on the same config)?
@untitaker you mean subtests (#153)? or something else?
No, I meant to actually instantiate a new _pytest.Session within the existing test session. Nevermind, it seems to be unnecessary.
Meanwhile I've come up with https://gist.github.com/untitaker/49a05d4ea9c426b179e9, the thing works for function-scoped fixtures only.
@untitaker that looks pretty much like what I mean by subtests; however, the way it's implemented might add extra unnecessary setup/teardown cost due to nextitem
I'm not sure if we can set nextitem properly without changes to at least Hypothesis.
I'm not expecting this to work automatically. :-) Hypothesis doesn't depend on py.test by default, but I can either hook into things from the hypothesis-pytest plugin or provide people with a decorator they can use to make this work (the former would be better).
What sort of unnecessary setup/teardown cost did you have in mind? Does it just run the fixtures an extra time?
Currently it seems that module-level fixtures are set up and torn down for each subtest. I wonder if that's because of the incorrect nextitem value.
Ah, yes, that would be unfortunate.
@untitaker that's exactly the problem, but I consider that a pytest bug - unfortunately it's a structural one, so it's hard to fix before 3.0
as a hack you could perhaps use the parent as nextitem; that way the teardown_towards mechanism should keep things intact
@untitaker in future I'd like to see a subtest mechanism help with those details
I'm currently experimenting with this, I fear that this might leak state to subsequent testfuncs in different modules/classes.
the state leak should be prevented by the outer runtest_protocol of the actual real test function:
since it does a teardown_towards with a nextitem there, cleanup should be expected,
but to ensure it works, an acceptance test with a fnmatch_lines check is needed
I've updated the gist.
BTW should this hack rather go into hypothesis-pytest for trying it out, or do you already want to stabilize an API in pytest?
Also I'd like to hide the generated tests from the UI.
Yeah I was just about to ask if there was a way to do that. This looks great (just tried it locally), but I'd rather not spam the UI with 200 tests, particularly for people like me who typically run in verbose mode.
@untitaker it should go into something external, and we should later figure out a proper feature so we can kill the hack off
@DRMacIver the proper solution is still a bit away (it would hide the number of sub-tests)
however, making that happen is a bit major, and between personal life and a job I can't make any promises for quick progress
right now I'm not even putting the needed amount of time into the pytest-cache merge and the yield-test refactoring
That's of course totally fine. Life always takes priority over free work. I'm also probably not going to be that active on the Hypothesis side of this in the near term.
I don't think the number of subtests is relevant -- Hypothesis would bump this count by hundreds for each test case. I'd like to see the exact same UI as before.
To clarify, the number of collected tests in the pytest UI is still 1 with that gist.
@untitaker subtests are named at test-item execution time, not collection time
it's needed for fixing #16 as well (and it's perfectly fine to collect and report exactly one item in most cases)
(for Hypothesis, for example, it would be very sensible to pick and choose which of the subtests to report in error cases (i.e. the minimized, still-failing examples), and to generally hide the non-error cases)
A thing to note there is that I would like to at some point start reporting multiple distinct errors per test when multiple are found, even though Hypothesis currently only reports one error per test.
So I'm looking into this approach and I don't think it works. The problem is that the inner tests seem to run after the outer test has executed, which makes it impossible for minimization to work (and also to stop execution when the first failure is found). The behaviour Hypothesis needs is to run the test function and get an exception immediately if it fails.
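To make the minimization requirement concrete, here is a toy shrinker (a hedged sketch, not Hypothesis's actual algorithm; all names are made up): it re-runs the test body on successively simpler candidates and uses the raised exception as its only signal. If failures are only reported after the outer test finishes, this loop has nothing to drive it.

```python
# Toy shrinker: keep a candidate only if the test body still raises on it.
def shrink(failing_example, check):
    current = failing_example
    while current:
        candidate = current[:-1]       # propose a strictly simpler example
        try:
            check(candidate)
        except AssertionError:
            current = candidate        # still failing: keep it and continue
        else:
            break                      # candidate passes: minimum reached
    return current

def check(s):
    assert len(s) <= 2

print(shrink("000000", check))  # '000': the minimal still-failing example
```

This matches the `assert len(s) <= 2` experiment mentioned below, where the expected falsifying example is s='000'.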
Try again? I'm not sure if I'm not missing some critical setup now though.
Yes, this seems to work. FWIW my test for it working is just to add "assert len(s) <= 2" in test_inner: What should happen is that it prints out a falsifying example of s='000', and it does. Yay!
Although actually it's not quite right: something you're doing is interfering with output capturing, so it prints the falsifying example in the wrong place. It normally appears in captured output. (I have a pytest plugin for better reporting integration, so this isn't intrinsically a problem and I could just add it to the report in the same way.)
Well, I think I've reached a dead end with trying random hooks! 😆
As far as I can tell, pytest's capturing stuff is not cleanly nestable.
currently capture is not nestable; I have a rough plan to change that, but it needs discussion with @hpk42 and about 2 days of work to build a proof of concept - so it's definitely not a soon item
I'm considering putting my hack above into a new package, as I need it for one of my projects. Has a better solution appeared since then, or is there some other reason (other than shitty diagnostics) I shouldn't use it?
There is no better solution yet, but please document it as a hack that might break between pytest minor releases
I've now published https://github.com/untitaker/pytest-subtesthack
There's also an experimental drop-in replacement for given, https://gist.github.com/untitaker/49a05d4ea9c426b179e9, but it's extremely buggy because it's too simple. I don't want to replicate all the argspec-juggling logic in hypothesis :(
Yeah, that's fair. I wrote it, and I don't want to replicate all the argspec juggling logic in Hypothesis :-)
I wonder if there's a way to split up given into unstable internal APIs such that I can hook better into it.
I'm open to suggestions. It's not totally obvious where that would be though.