Switch to pytest from our homegrown test runner #1673

Closed
refi64 opened this issue Jun 7, 2016 · 23 comments

Comments

@refi64
Contributor

refi64 commented Jun 7, 2016

Ok, so when I saw the PR a while back that "parallelized" the test runner, I was super hyped. But the hype wore off when I realized that the unit tests were still run sequentially.

In particular, it continually bugs me that, when running the full unit test suite, there's no progress indicator of any kind. To make things worse, runtests.py is largely undocumented, leaving me to wonder why stuff like ./runtests.py unit-test -a '*something*' works but ./runtests.py -a '*something*' doesn't.

In addition, -a is borderline useless: because the naming convention for tests is so inconsistent, it's rarely actually helpful.

Also, test fixtures are a cruel mess. It took me 30 minutes of debugging to figure out why the tests were giving out obscure KeyErrors that I couldn't reproduce in my own code. Turns out, the fixture I was using didn't define list. But defining list caused other tests to brutally fail, so then I just created a new fixture. IMO this whole thing should either be reworked or (better yet) completely gutted and thrown into a fire pit. A very hot fire pit. With sulfur.

So I see two options:

  • Move to a different test runner. Some have suggested pytest, which I personally really like! Pytest has the xdist plugin, which supports running multiple tests in parallel.
  • Revamp the current test runner. Ideally, this would include fixing fixtures (oh, the irony...), improving runtests.py, and making the actual unit tests run in parallel, preferably also with a progress bar of some sort.

Thoughts? I could try and help out here a bit if you guys come to a consensus of some sort.

@gnprice
Collaborator

gnprice commented Jun 7, 2016

Yep -- I think we all agree that the current test runner is pretty frustrating. Last week at the PyCon sprints where we had a bunch of new contributors getting involved, I found myself apologizing every time I explained how to filter tests (with the positional argument and with -a).

We were actually just discussing this also on #1668, and so far everyone agrees that the right answer will be to move to something like pytest. If you want to try putting together a PR, that would be very useful!

Pytest is (mostly for good reasons) kind of a complicated beast with a lot of choices in how to set it up, so I recommend discussing early and often, either on this thread or on a work-in-progress PR.

gnprice changed the title from "Either fix up the current test runner or move to a new one" to "Switch to pytest from our homegrown test runner" on Jun 7, 2016
@ddfisher
Collaborator

ddfisher commented Jun 7, 2016

I also hate hate hate the builtins test fixtures. +1 to killing them with fire. They only exist for perf reasons, and I'm not at all sure they're worth it.

@gvanrossum
Member

gvanrossum commented Jun 7, 2016 via email

@gnprice
Collaborator

gnprice commented Jun 7, 2016

Yeah, one thing we'll definitely want is for pytest to see the individual test cases inside the data files. I think it should be possible to do this pretty nicely with appropriate use of pytest.mark.parametrize (https://pytest.org/latest/parametrize.html), but I am not a pytest expert, and this is one of the things I had in mind when I said that pytest is something of a complicated beast with a lot of choices in how to set it up. So that will be for Ryan or whoever takes this on to figure out. :-)

On Tue, Jun 7, 2016 at 3:29 PM, Guido van Rossum notifications@github.com wrote:

A big problem here seems to be that (for reasons having to do with mypy's early history as a separate language, probably) the non-data-driven tests use a set of classes that is custom to mypy's "myunit" test runner (which is wrapped by runtests.py). I don't know enough about pytest to understand whether we'd have to redo all that infrastructure. I also don't know how easy it is to get pytest to understand our data-driven tests. It would be pretty frustrating if pytest treated each file of test data as a single test.
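
To make the parametrize idea concrete, here is a self-contained toy sketch, not mypy's actual test glue; the DATA string, parse_cases helper, and test names are invented for illustration. Each [case ...] block becomes its own pytest test, selectable with -k and reported individually, instead of the whole file counting as one test.

import pytest

DATA = """\
[case testAddInts]
1 + 1 == 2
[case testAddStrs]
'a' + 'b' == 'ab'
"""

def parse_cases(text):
    # Split a myunit-style data string into (name, lines) pairs.
    cases, name, lines = [], None, []
    for line in text.splitlines():
        if line.startswith("[case ") and line.endswith("]"):
            if name is not None:
                cases.append((name, lines))
            name, lines = line[len("[case "):-1], []
        elif name is not None:
            lines.append(line)
    if name is not None:
        cases.append((name, lines))
    return cases

# Each parsed case becomes a separate pytest test with its own id.
@pytest.mark.parametrize("name,lines", parse_cases(DATA),
                         ids=[name for name, _ in parse_cases(DATA)])
def test_data_case(name, lines):
    for expression in lines:
        assert eval(expression)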



@refi64
Contributor Author

refi64 commented Jun 7, 2016

I'll try to see if I can tackle this over the weekend!

@JukkaL
Collaborator

JukkaL commented Jun 8, 2016

Cool, switching to a standard test runner would have many potential benefits. (Note that this has been discussed before, see #380. That issue also explains why things are as they are.)

The last time the test runner was rewritten we had a bunch of issues and regressions; let's try to avoid them this time. Here are some things to consider:

  • Running a single test case shouldn't be (significantly) slower than it is now, as this is critical to productivity.
  • Generally try to support current workflows and try not to make them less convenient or slower. Otherwise the switch could even hurt productivity instead of improving it, at least in the short term. I don't have a full list, but here are some:
    • Running a single unit test or a bunch of related unit tests (based on keywords). The py.test -k option seems to do the trick.
    • Running all "unit tests" (the idea is to make it easy to run only fast test cases locally).
    • Running all tests in parallel locally.
    • Running all tests in parallel on Travis CI.
    • Reporting test failures and file locations clearly. This includes locations of test cases in .test data files and generally whatever (useful) output the current runner generates.
  • Try to only change one thing at a time. For example, don't try to deal with test fixtures at the same time as switching to py.test. Also, try to avoid touching the .test data files when switching the test runner.

Test fixtures are mostly a separate issue. I'll file a new issue.

@gvanrossum
Member

FWIW the fast way to run a single test case is currently to use the myunit incantation given in README.md, ignoring runtests.py. It would be much nicer if pytest could handle all of these.

@gnprice
Collaborator

gnprice commented Jun 8, 2016

Yeah, "change one thing at a time" is definitely good advice for this kind of migration.

I'd recommend taking that even farther, and trying to find a way to migrate one section of things at a time. For example, one good strategy could be to take just one of the larger "tasks" in runtests.py, like the testcheck driver, and rewrite that driver using pytest, so that

  • the tests can be run directly with a nice py.test command, and also
  • runtests.py continues to run it, now through that py.test command.

You'd want to preserve the ability to run just one or a few related unit tests at a time (perhaps by having runtests.py pass any -a argument it gets straight through to py.test -k, if it's running the task you migrate). And I think if you reuse DataDrivenTestCase and parse_test_cases from mypy.test.data, perhaps with some adaptations, it should be possible to keep parsing the *.data files unmodified and pass the results to a mechanism like pytest.mark.parametrize so that pytest knows about all the separate test cases.

A PR that does that for one driver without making a ton of changes in total would be a really solid step forward. There'll probably be discussion on the details (made possible by having the code up in a PR), and then once it's merged it should be pretty smooth sailing to do the same for most of the other drivers. Once they're all converted, runtests.py can be dropped.
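
As a rough illustration of that -a passthrough, something along these lines could work; this is a hypothetical sketch, not actual runtests.py code, and run_pytest_task and its arguments are invented names:

import subprocess

def run_pytest_task(test_module, filters):
    # Build a py.test command for one migrated task, forwarding any -a
    # patterns runtests.py received as a single -k expression.
    cmd = ["py.test", test_module]
    if filters:
        # pytest accepts one -k expression, so join patterns with "or";
        # -k matches substrings, so the '*' wildcards can simply be dropped.
        cmd += ["-k", " or ".join(pattern.strip("*") for pattern in filters)]
    return subprocess.call(cmd)

# e.g. ./runtests.py testcheck -a '*TupleJoin*' could then end up running:
#   py.test mypy/test/testcheck.py -k TupleJoin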

gnprice added a commit that referenced this issue Jul 27, 2016
…over to it (#1944)

This is a step toward #1673 (switching entirely to pytest from myunit
and runtests.py), using some of the ideas developed in @kirbyfan64's
PR #1723.

Both `py.test` with no arguments and `py.test mypy/test/testcheck.py`
work just as you'd hope, while `./runtests.py` continues to run all
the tests.

The output is very similar to the myunit output.  It doesn't spew
voluminous stack traces or other verbose data, and it continues using
`assert_string_arrays_equal` to produce nicely-formatted comparisons
when e.g. type-checker error messages differ.  On error it even
includes the filename and line number for the test case itself, which
I've long wanted, and I think pytest's separator lines and coloration
make the output slightly easier to read when there are multiple
failures.

The `-i` option from myunit is straightforwardly ported over as
`--update-data`, giving it a longer name because it feels like the
kind of heavyweight and uncommon operation that deserves such a name.
It'd be equally straightforward to port over `-u`, but in the presence
of source control I think `--update-data` does the job on its own.

One small annoyance is that if there's a failure early in a test run,
pytest doesn't print the detailed report on that failure until the
whole run is over.  This has often annoyed me in using pytest on other
projects; useful workarounds include passing `-x` to make it stop at
the first failure, `-k` to filter the set of tests to be run, or
(especially with our tests where errors often go through
`assert_string_arrays_equal`) `-s` to let stdout and stderr pass
through immediately.  For interactive use I think it'd nearly always
be preferable to do what myunit does by immediately printing the
detailed information, so I may come back to this later to try to get
pytest to do that.

We don't yet take advantage of `xdist` to parallelize within a
`py.test` run (though `xdist` works for me out of the box in initial
cursory testing).  For now we just stick with the `runtests.py`
parallelization, so we set up a separate `py.test` command for each
test module.
@gnprice
Collaborator

gnprice commented Jul 27, 2016

With #1944 merged, the migration is begun!

Some next steps, not necessarily in a particular order:

  • Migrate testcmdline, which is the other DataDrivenTestCase-using driver. This should be a straightforward re-use of what's in Set up to use pytest for our data-driven tests, and switch testcheck over to it #1944.
  • Migrate the other parse_test_cases-using drivers that don't use DataDrivenTestCase -- testparse, testpythoneval, testsemanal, teststubgen, testtransform, testtypegen. This should generally also be a pretty straightforward re-use of what's in Set up to use pytest for our data-driven tests, and switch testcheck over to it #1944, with a little more massaging of the test code.
    • testpythoneval is especially appealing because it's the long pole and in combination with xdist (below) will make a full test run faster.
  • Migrate the non-parse_test_cases-using drivers: testargs, testdocstring, testgraph, testinfer, testlex, testmoduleinfo, testsolve, testsubtypes, testtypes. These probably become plain old bog-standard pytest tests. Probably will require loosening up python_functions and/or python_classes in pytest.ini, which in turn may require some tweaking elsewhere to avoid spuriously picking up functions back in parse_test_cases-land that are meant as helpers and not as test cases themselves.
  • Add the xdist plugin, do some testing that it's working correctly, and set up pytest.ini so that when someone just types py.test they get the benefit of it by default (see the pytest.ini sketch below). Then adjust runtests.py to fit with that. Initially this could just be runtests.py passing -n 1 to knock out xdist and carrying on with its own concurrency model... but the real prize is when it doesn't do that. Say, run pytest alone first, so it can have all the cores to itself, then everything else with the runtests/waiter concurrency mechanism.
  • Cut out myunit from the code and documentation, simplifying things.
  • Migrate everything else in runtests.py: flake8, type-checking, and importing.
  • Cut out runtests.py and mypy.waiter from code and documentation, simplifying things again.

The big prizes here are the simplification from cutting things out at the end; and the speedups from getting effective parallelism, especially on testpythoneval and other individual slow drivers.
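
To make the pytest.ini bullets above concrete, the file might end up shaped roughly like this; this is a hypothetical sketch only, and the exact patterns (plus whether xdist belongs in addopts) are precisely the open questions listed above:

# Hypothetical pytest.ini sketch -- not the actual configuration.
[pytest]
# Collect plain test functions, but keep collection narrow enough that
# helpers over in parse_test_cases-land aren't picked up as tests.
python_files = test*.py
python_classes =
python_functions = test
# Once xdist is known to work reliably, parallelism could become the default:
# addopts = -n auto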

@gnprice
Collaborator

gnprice commented Jul 27, 2016

Oh, also:

  • Incorporate the test data file (like check-classes.test) into what py.test -k can match against. It's already in the reporting, which is something I've wanted for a while and I know at least some other people have too.

@refi64
Contributor Author

refi64 commented Jul 27, 2016

Uhhh...I think I kind of screwed this one up... So sorry! Glad @gnprice was able to actually, y'know, finish it!

@gnprice
Collaborator

gnprice commented Jul 28, 2016

@kirbyfan64 Nothing to apologize for! Thanks for your work in #1723 -- I drew on that for #1944 and it's now landed. Want to take on some of the further steps of the migration? ;-)

In particular I think you mentioned in a comment on #1723 that you'd already done some testing of xdist, and gotten it to work after a bugfix upstream. Setting things up so a plain py.test command runs things in parallel by default would be awfully nice...

@ambv
Contributor

ambv commented Jan 5, 2017

The biggest issue with test speed now is not the runner anymore but excessive subprocessing and disk thrashing. If we switch to py.test but still have to subprocess for every myunit invocation and for every mypy test or flake8 run, it's not going to win us much, especially with so many cases generating temporary throw-away *.py files.

As far as I can tell, the myunit *.test format is very appealing because of its minimal boilerplate. There are no plans to move away from that syntax, right?

Are there plans to make myunit use the API so it doesn't have to fire a fresh mypy subprocess for every check? mypy --help takes 0.5s on my laptop. Since myunit is already subprocessed from runtests.py and doesn't itself do any parallelization, there's little point in invoking mypy via the command line. We could start simplifying myunit so that it just becomes a set of helpers for pytest to handle *.test as parameters, like Greg is suggesting above. Then making those tests use the API instead of command-line mypy would give us the biggest efficiency win.
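
As a hedged sketch of that last point, here is what an API-driven check could look like from a pytest test, assuming mypy's in-process entry point mypy.api.run; the helper and test names are invented:

# Type-check a generated snippet in-process via mypy's API instead of
# spawning a mypy subprocess per test case.  mypy.api.run returns a tuple
# of (stdout, stderr, exit_status).
from mypy import api

def check_source(tmp_path, source):
    test_file = tmp_path / "prog.py"
    test_file.write_text(source)
    stdout, stderr, status = api.run([str(test_file)])
    return stdout.splitlines(), status

def test_reports_bad_assignment(tmp_path):  # tmp_path is a built-in pytest fixture
    messages, status = check_source(tmp_path, "x = 1\nx = 'not an int'\n")
    assert status != 0
    assert any("Incompatible types in assignment" in m for m in messages)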

gvanrossum pushed a commit that referenced this issue Jan 6, 2017
This change is a step towards removing `runtests.py` (see #1673).

The exclusion list in the flake8 configuration in `setup.cfg` has been updated
to enable running the linter from the root of the project by simply invoking
`flake8`.  This enables it to leverage its own file discovery and its own
multiprocessing queue without excessive subprocessing for linting every file.
This gives a minor speedup in local test runs. Before:

  total time in lint: 130.914682

After:

  total time in lint: 20.379915

There's an additional speedup on Travis because linting is now only performed
on Python 3.6.

More importantly, this means flake8 is now running over all files unless
explicitly excluded in `setup.cfg`. This will help avoid unintentional
omissions in the future (see comments on #2637).

Note: running `flake8` as a single lazy subprocess in `runtests.py` doesn't
sacrifice any parallelism because the linter has its own process pool.

Minimal whitespace changes were required to `mypy_extensions.py` but in return
flake8 will now check it exactly like it checks the rest of the `mypy/*`
codebase.  Those are also done on #2637 but that hasn't landed yet.

Finally, flake8-bugbear and flake8-pyi were added to test requirements to make
the linter configuration consistent with typeshed.  I hope the additional
checks will speed up future pull requests by automating bigger parts of the
code review.  The pyi plugin enables forward reference support when linting
.pyi files.  That means it's now possible to run `flake8` inside the typeshed
submodule or on arbitrary .pyi files during development (which your editor
could do for you), for example on fixtures.  See discussion on #2629 on checks
that are disabled and why.
@elazarg
Contributor

elazarg commented Aug 26, 2017

The way I see it, moving the rest of the tests (i.e. the non-data-driven tests) to pytest requires using @pytest.fixture: see the POC implementation of the fixture file, with functions like

@pytest.fixture
def fx_contra() -> TypeFixture:
    return TypeFixture(CONTRAVARIANT)

And a test, with test cases such as

def test_is_proper_subtype_contravariance(fx_contra) -> None:
    assert_true(is_proper_subtype(fx_contra.gsab, fx_contra.gb))

Type checking the consistency of pytest's fixtures will require a nontrivial mypy plugin, at the least. Is it OK to leave it unchecked and open a separate issue for that?

@JukkaL
Collaborator

JukkaL commented Aug 30, 2017

Pytest can also run tests defined using the stdlib unittest module. unittest is closer to myunit, and using it would make the migration easier for non-data-driven tests. In particular, the fixtures should be easy to migrate. Type checking test cases would also be simpler.
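
A minimal sketch of that shape: a myunit-style suite written as a stdlib unittest.TestCase, which pytest collects and runs directly. The mypy-internal import paths and fixture attributes below are assumptions based on the snippet earlier in this thread:

import unittest

from mypy.nodes import CONTRAVARIANT
from mypy.subtypes import is_proper_subtype
from mypy.test.typefixture import TypeFixture

class ProperSubtypingSuite(unittest.TestCase):
    def setUp(self) -> None:
        # setUp plays the role of the pytest fixture from the earlier example.
        self.fx_contra = TypeFixture(CONTRAVARIANT)

    def test_is_proper_subtype_contravariance(self) -> None:
        self.assertTrue(is_proper_subtype(self.fx_contra.gsab, self.fx_contra.gb))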

@elazarg
Contributor

elazarg commented Sep 20, 2017

#3880 completed the migration of the data-driven tests to pytest.

@gvanrossum
Member

Great! Thanks for working through the various test modules.

What should be the criteria for closing this issue? The retirement of runtests.py? Why would we still need that script if we can run all tests through pytest?

@JukkaL
Collaborator

JukkaL commented Sep 20, 2017

runtests.py also type checks various things, verifies that modules can be imported, and runs lint. These would need to be migrated to pytest as well, in addition to the remaining non-data-driven myunit tests.

A reasonable next milestone would be the removal of myunit.

@ethanhs
Collaborator

ethanhs commented Dec 15, 2017

After #4369, I think we should replace some of the simpler assert* functions, as pytest works better without them and using bare assert statements should lead to nicer output on errors.

This would entail e.g. replacing assert_false calls with just assert not ..., but leaving assert_join and other non-straightforward assert helpers.
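
For illustration, with stand-in names rather than actual mypy test code:

# Stand-in illustration of the suggested cleanup.
def assert_false(value):
    # myunit-era style helper: on failure pytest only reports this line,
    # hiding the interesting values at the call site.
    assert not value

def test_old_style():
    assert_false(isinstance(1, str))

def test_new_style():
    # Bare assert: pytest's assertion rewriting shows the operands on failure.
    assert not isinstance(1, str)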

JukkaL pushed a commit that referenced this issue Feb 6, 2018
Migrate the remaining myunit test cases to pytest. This uses 
unittest.TestCase as a shortcut, as suggested by @JukkaL. It 
is aliased as Suite.

The transition is crude and aims at functionality; I intend to clean it
up in later PRs.

Work towards #1673.
@JukkaL
Collaborator

JukkaL commented Feb 6, 2018

Myunit is finally gone. Thanks @elazarg!

@ilevkivskyi
Member

We are still not quite there; the current status is:

$ ./runtests.py -l
0:lint
1:pytest
2:check file runtests.py
3:check file waiter.py
4:check package mypy
5:check stubs
6:check stdlibsamples (3.2)
7:check file test-data/samples/bottles.py
8:check file test-data/samples/class.py
9:check file test-data/samples/cmdline.py
10:check file test-data/samples/crawl.py
11:check file test-data/samples/crawl2.py
12:check file test-data/samples/dict.py
13:check file test-data/samples/fib.py
14:check file test-data/samples/files.py
15:check file test-data/samples/for.py
16:check file test-data/samples/generators.py
17:check file test-data/samples/greet.py
18:check file test-data/samples/guess.py
19:check file test-data/samples/hello.py
20:check file test-data/samples/input.py
21:check file test-data/samples/itertool.py
22:check file test-data/samples/regexp.py

@elazarg As I understand it, you are working on this, right?

gvanrossum pushed a commit that referenced this issue Jun 23, 2018
Using the API, similarly to #5087.

stdlibsamples is not included in this PR since it requires special handling of the working directory and paths.

This is another item for #1673.
@ilevkivskyi
Member

This is now finally fixed with #5274 🎉
