Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup regr_test.py by running test cases concurrently #10714

Merged
merged 7 commits into from
Sep 23, 2023

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Sep 15, 2023

As we add more and more test cases, regr_test.py is getting kinda slow. Other than mypy_primer, it's now the slowest CI job we have by a long way (and we don't run mypy_primer on all PRs -- for example, it's skipped on this PR!).

The reason for the slowness is that we now have regression tests for 11 stubs packages. We run all regression tests on Python 3.8-12 inclusive, and we run them all on linux, darwin and win32. That adds up for a total of 165 subprocesses that are created for each run of the test when it's run with --all (the flag we use in CI).

At some point we may want to consider sharding this test between GitHub Actions workers, similar to the way we run mypy_test.py and pyright in CI. (We can also possibly reconsider whether we need to, e.g., run all tests on darwin, linux and Windows). For now, though, we can speed things up a lot just by running the subprocess concurrently using a ProcessPoolExecutor. This cuts the execution time in CI roughly in half, from around 5-6 minutes to around 2-3 minutes.

(N.B.: A ProcessPoolExecutor feels like a slightly blunt instrument here with a lot of overhead; I'm sure there are more efficient ways of spawning subprocesses concurrently. I got this to work reasonably quickly, however, and when I tried different approaches I quickly ran into race conditions. I think this is ~good enough for now.)

@AlexWaygood AlexWaygood marked this pull request as ready for review September 15, 2023 14:35
@AlexWaygood
Copy link
Member Author

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 17, 2023

Here's an alternative approach that uses asyncio to create the concurrent subprocesses rather than a ProcessPoolExecutor: https://github.com/python/typeshed/compare/main...AlexWaygood:typeshed:optimise-regrtest-async?expand=1.

It seems to work fine, without race conditions, and is a similar approach to what mypy_primer does. It's possibly a slightly more elegant approach. But it seems to consistently be a little slower.

@JelleZijlstra
Copy link
Member

Any specific reason to use processes rather than threads here? Seems like the workers are mostly waiting on external processes, so the GIL shouldn't be much of an issue. I tend to avoid multiprocessing as it's more prone to failing in weird ways than threads are.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 23, 2023

I initially tried a ThreadPoolExecutor, but quickly ran into weird race conditions where some of the mypy subprocesses couldn't find various stdlib modules. But I'll take another look and see if I can make threads work here. I agree they make more sense for this kind of thing.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Sep 23, 2023

I couldn't repro the race conditions anymore when I tried to, so I switched to a ThreadPoolExecutor. I'm a tiny bit wary of this after our experiences in #9537 (which never repro'd on Windows), but threads do feel like a better fit here, so let's see how it goes.

(Note that although we have the infrastructure setup to run test cases on stubs that have non-types dependencies, we don't yet have any test cases for any stubs with non-types dependencies. So there's some bits of this script that are currently "dead code, lying in wait". Those bits of the script might be more susceptible to #9537-type issues than the bits that are currently being used.)

@AlexWaygood AlexWaygood merged commit e40b5be into python:main Sep 23, 2023
62 checks passed
@AlexWaygood AlexWaygood deleted the optimise-regrtest branch September 23, 2023 14:39
@AlexWaygood
Copy link
Member Author

Thanks @JelleZijlstra!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants