Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: split Windows Azure tests in half #43468

Merged
merged 7 commits into from
Sep 10, 2021
Merged

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Sep 9, 2021

Trying an experiment for Windows CI where PYTEST_WORKERS is auto rather than 2.

@Dr-Irv Dr-Irv added CI Continuous Integration Windows Windows OS labels Sep 9, 2021
@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

how'd this work out?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 9, 2021

how'd this work out?

auto doesn't solve the problem. Current hypothesis is that the expression for skipping tests is wrong. Here's the evidence:

For the Windows py38 tests, which are "not slow and not network", we get: 122956 passed, 6965 skipped
For the Windows py39 tests, which are ""not slow and not network and not high_memory"", we get 124788 passed, 4943 skipped

Doesn't make sense that skipping more tests in the expression would yield less tests skipped.

That test is running now....

The other issue is that it isn't clear to me how many cores are actually assigned to the Azure instances. Using auto means use as many cores as are available. In the first set of tests under this PR, py38 and py39 both used two cores. In the latest push, py38 is using 4 cores and py39 is using 2. The more cores we can get, the faster these tests will run.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

38 and 39 might have different deps though

@mzeitlin11
Copy link
Member

In the latest push, py38 is using 4 cores and py39 is using 2. The more cores we can get, the faster these tests will run.

Just noting from looking at this in #42236, seemed like the opposite was actually true based on looking at a few runs using different numbers of cores.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 9, 2021

Revised hypothesis. The number of cores that we get from Azure is variable. But the issue here is that if you only have 2 cores and use both of them, performance can get worse than if you use just 1 core. Here is some evidence. On my 16 core laptop, I ran pytest --skip-slow -m "not single" -n XX pandas/tests/groupby for XX=1,2,4,8,16 and auto. Here are the timing results:

XX Time (secs)
16 32.09
8 30.16
4 37.56
2 55.04
1 90.53
auto 28.87

For "auto", it created 8 processes.

Note that using all 16 cores doesn't help - that's because these are hyperthreaded cores, and using them usually doesn't help computational performance. Secondly, even going from 1 core to 8 only sped things up by a factor of 3.

So if Azure gives us 2 cores, and we use both, that could be worse than just using 1 core. My latest commit is doing that, and also (hopefully) printing out the CPU info.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

oh so this really could be a timeout?

also is there a way to request 4 cores (min) from azure for these builds?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 9, 2021

oh so this really could be a timeout?

Yes. The last test confirms it. Here is the processor info that was obtained (which was the same for the py38 and py39 runs):

Caption                            DeviceID  MaxClockSpeed  Name                                            NumberOfCores  

Intel64 Family 6 Model 85 Stepping 7  CPU0      2594           Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz  2              

In the last test, I had it not do any parallelization, and the py38 run timed out, the py39 run did not. So I think this has to do with where the virtual container is sitting on the hardware. Most of the time before, I'd see py38 succeed and py39 time out.

also is there a way to request 4 cores (min) from azure for these builds?

I don't know the answer to that! Who is responsible for the relationship we have with azure?

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

I don't know the answer to that! Who is responsible for the relationship we have with azure?

umm, don't really have anything specific

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 9, 2021

Another possible solution is to split the tests for Azure. Here's an example from Microsoft on how to do that:
https://github.com/PBoraMSFT/ParallelTestingSample-Python

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

maybe best is just to remove some tests (e.g. have a marker that we dont' use here) and/or move some to a slow build.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 9, 2021

maybe best is just to remove some tests (e.g. have a marker that we dont' use here) and/or move some to a slow build.

Issue here is that the slowest test is 10 seconds. Our issue is volume (over 124K tests) and on Azure, we're running on a "slow" machine compared to Travis.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2021

maybe best is just to remove some tests (e.g. have a marker that we dont' use here) and/or move some to a slow build.

Issue here is that the slowest test is 10 seconds. Our issue is volume (over 124K tests) and on Azure, we're running on a "slow" machine compared to Travis.

right one way is for example to NOT run any pandas/tests/windows (which @mroeschke disabled for a while)
not great, but we do really test these

@jbrockmendel
Copy link
Member

If we cant get more cores/build, could we run more azure builds, e.g. for each of the current azure builds split it in two with one ignoring tests/window and the other doing only tests/window?

@mroeschke
Copy link
Member

Or maybe we can test a separate Github Actions Window runner for a subset of tests (seems to only be a 2 core CPU https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources)

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 10, 2021

I'm trying something else here by setting up a PYTEST_TARGET in the azure pipelines and then for Windows having separate jobs where I separate the tests into "pandas/tests/[a-i]*" and "pandas/tests/[j-z]*"

@Dr-Irv Dr-Irv changed the title CI: try windows with auto workers CI: split Windows Azure tests in half Sep 10, 2021
@Dr-Irv Dr-Irv marked this pull request as ready for review September 10, 2021 04:09
@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

looking good

can u close and reopen to force a run again (just checking on the timeouts)

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 10, 2021

can u close and reopen to force a run again (just checking on the timeouts)

Closing and reopening..

@Dr-Irv Dr-Irv closed this Sep 10, 2021
@Dr-Irv Dr-Irv reopened this Sep 10, 2021
@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

@Dr-Irv this looks good. i think splitting the macos tests along the same lines makes sense as well (in this PR or followon)

@jreback jreback added this to the 1.4 milestone Sep 10, 2021
@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

@simonjayhawkins lmk if you want to backport (either way ok by me)

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 10, 2021

@jreback all green

We also have the option of speeding up some of the other checks by doing a similar split of the tests. Some of them are taking close to an hour. That would allow people to get quicker feedback on their PRs. If you think that's a good idea, I could do another PR for that.

@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

@jreback all green

We also have the option of speeding up some of the other checks by doing a similar split of the tests. Some of them are taking close to an hour. That would allow people to get quicker feedback on their PRs. If you think that's a good idea, I could do another PR for that.

yep, we actually have a pretty large azure allocation, so a few more jobs per PR is fine.

@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

going to backport this (@simonjayhawkins can always not merge it on 1.3 if too many conflicts)

@jreback jreback modified the milestones: 1.4, 1.3.3 Sep 10, 2021
@jreback jreback merged commit 6e19bdc into pandas-dev:master Sep 10, 2021
@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

thanks @Dr-Irv

@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app

This comment has been minimized.

@lumberbot-app

This comment has been minimized.

@jreback
Copy link
Contributor

jreback commented Sep 10, 2021

@Dr-Irv if you can push the backport as above (guess we have a conflict)

Dr-Irv added a commit to Dr-Irv/pandas that referenced this pull request Sep 10, 2021
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 10, 2021

@Dr-Irv if you can push the backport as above (guess we have a conflict)

Created this PR #43496

Not sure which labels and milestones to put on it, and the backport PR is showing conflicts

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Sep 10, 2021

@jreback Bad PR #43496 (was merging to master). New PR #43497 merges to 1.3.x .

@jbrockmendel
Copy link
Member

Awesome, thanks for figuring this out @Dr-Irv

@Dr-Irv Dr-Irv mentioned this pull request Sep 11, 2021
lithomas1 pushed a commit that referenced this pull request Sep 13, 2021
* Backport PR #43468: CI: split Windows Azure tests in half

* split macos tests in 2 (#43517)

* fix macos backport to use 3.7
@Dr-Irv Dr-Irv deleted the wintests branch February 13, 2023 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Windows Windows OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants