Spurious appveyor 32-bit test timeouts #46903

Closed
arielb1 opened this issue Dec 21, 2017 · 9 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. I-slow Issue: Problems and improvements with respect to performance of generated code. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@arielb1
Contributor

arielb1 commented Dec 21, 2017

The 32-bit MinGW test builders on AppVeyor are sometimes slower than expected and exceed the 3-hour limit, so the build times out (I think this also happened at the start of December, if someone can be bothered to dig up those PRs).

It appears that a "good" build (e.g. https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5766) takes 150 minutes, while a "bad" build on the same code can exceed the 3 hour (180 minutes) limit.

It appears that in some cases (e.g. https://ci.appveyor.com/project/rust-lang/rust/build/1.0.5551) other builders also get close to the limit, but I haven't seen any of them hit it yet. The reason appears to be that the 32-bit test builders (both MSVC and GNU) are the slowest, taking the "full" 150 minutes even on a good day.

I'm not that sure what the best solution is - eventually we could play with checkpoint/restart, but I would not want to do that on Windows first.

Maybe it's possible to investigate the cause of the slowness, or to bump the time limit, or to split the pc-windows-gnu builders (the latter would also speed up the cycle time).

However, the Windows 32-bit test builders being the slowest of our entire group seems to be a good reason to split them (this also makes some sense, because they spawn a lot of processes, which is slow on Windows).

Cases:

@arielb1 arielb1 added A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Dec 21, 2017
@kennytm kennytm added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Dec 21, 2017
@pnkfelix pnkfelix added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 21, 2017
@alexcrichton
Member

Picking two random logs (one good, one bad), the major difference seems to be that the good log finishes compiling the compiler at 01:01:30, whereas the bad log finishes at 01:21:05, a delay of roughly 20 minutes relative to the good one. AFAIK no real extra work was done in the bad log. I believe that AppVeyor doesn't guarantee a constant level of performance (shared hosting and whatnot), so I think that we just get less CPU time during peak hours (or at least that's my guess).

In that sense I think the only real solution here is to do less work per job. That may mean cutting tests from the 32-bit MinGW builders or sharding the builder.
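The gap between the two logs, and how much of the time budget it consumes, can be checked with a few lines of Python. This is only a sketch: the timestamps and the 150-minute and 180-minute figures are taken from the comments above, while the helper name `parse_hms` is made up for illustration.

```python
from datetime import timedelta


def parse_hms(stamp: str) -> timedelta:
    """Parse an elapsed-time stamp of the form HH:MM:SS (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Time at which each log reports that the compiler finished building.
good_compiler_done = parse_hms("01:01:30")
bad_compiler_done = parse_hms("01:21:05")
delay = bad_compiler_done - good_compiler_done

# Margin between a "good" 150-minute build and the 180-minute limit.
limit = timedelta(minutes=180)
good_total = timedelta(minutes=150)
headroom = limit - good_total

print("delay:", delay)
print("headroom:", headroom)
print("fraction of headroom used:", delay / headroom)
```

On these numbers, the slower compile phase alone uses up about two thirds of the 30-minute margin, which is consistent with a bad run tipping over the limit once the tests are also slower.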

@kennytm
Member

kennytm commented Jan 6, 2018

#47154 may be a cause of the recent explosion in timeouts. The timing also matches, since #46278 was merged at 2018-01-01T19:04:27Z. There is a fix in #47161.

#46910 has caused a 40–50% increase in the time spent on fulldeps tests. But that is not sufficient to explain the previous timeouts, since it means an additional 4 minutes at most.

@kennytm
Member

kennytm commented Jan 9, 2018

#47161 has landed but the error rate is still not decreasing 😢

@alexcrichton
Member

I've done some analysis of our historical trends to see what's going on here. This is specifically for the i686-pc-windows-msvc builder that's running tests on AppVeyor.

First up we have the trend of the total build time over time:

https://i.imgur.com/ePWXNQX.png

Clearly we're on the up and up!

Next I broke it down by stage. Here I was taking a look at various stages in the build:

https://i.imgur.com/akskz7I.png

Here we can see for sure that various stages are getting slower, and if we look at each of them in isolation (not stacked up) we get:

https://i.imgur.com/1pmu9m8.png

which from this seems to indicate:

  • The run-pass test suite is getting steadily slower over time. I'm not sure if this is a slower compiler or more tests, but my guess is a slower compiler.
  • The bootstrap itself is getting steadily slower over time. Both stage0 and stage1 are getting slower at what appears to be roughly the same pace.
  • Something I haven't focused on here (the "other" blob) has added nearly half an hour to the build time over the past month or so.

The raw data (not smoothed, but stacked and not stacked) is unfortunately pretty hard to decipher. I also unfortunately don't quite know where to go from here..
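For readers who want to reproduce this kind of smoothing, a trailing moving average is one simple option. The sketch below is an assumption about how such a trend line could be produced, not the method actually used for the graphs above, and the sample numbers are invented purely for illustration.

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_sum = 0.0
    smoothed = []
    for value in values:
        # When the window is full, the oldest value is about to be evicted.
        if len(recent) == window:
            running_sum -= recent[0]
        recent.append(value)
        running_sum += value
        smoothed.append(running_sum / len(recent))
    return smoothed


# Hypothetical total build times in minutes, NOT taken from the real data.
build_minutes = [150, 148, 171, 152, 155, 180, 158, 160]
print(moving_average(build_minutes, window=3))
```

A wider window hides one-off slow runs (such as a noisy shared host) and makes a steady upward drift easier to see, at the cost of reacting more slowly to a sudden regression.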

@withoutboats
Contributor

Surely the size of the code base and test suite is growing over time. I think this is the expected result unless compiler speed is improving at a greater rate than the code base is growing (which seems unlikely).

@alexcrichton
Member

@withoutboats I agree, yeah, but there's been a severe uptick over the past ~200 builds, which means our build time is increasing way faster than it was before, which seems worrisome..

@Aaron1011
Member

This seems to be another example: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6426/job/do1stdu2mywwkyf7 (MSYS_BITS=32, RUST_CONFIGURE_ARGS=--build=i686-pc-windows-gnu)

bors added a commit that referenced this issue Feb 24, 2018
Split MinGW tests into two builders on AppVeyor

Run-pass and compile-fail tests appear to take the most significant chunk of time, so split them into their own builder.

Should help with #46903.

r? @kennytm
cc @alexcrichton
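
As a rough illustration of the idea behind splitting a builder, the sketch below assigns test suites to builders greedily, always giving the longest remaining suite to the least-loaded builder. It is not how the commit above was implemented (that split was chosen by hand), the function name `split_suites` is hypothetical, and the durations are invented; only the suite names run-pass and compile-fail come from the commit message.

```python
import heapq
from typing import Dict, List, Tuple


def split_suites(durations: Dict[str, float], builders: int) -> List[Tuple[float, List[str]]]:
    """Greedy split: give the longest remaining suite to the least-loaded builder."""
    if builders < 1:
        raise ValueError("need at least one builder")
    # Heap entries are (current load, builder index, suites assigned so far).
    # The unique index breaks ties, so the lists are never compared.
    heap = [(0.0, index, []) for index in range(builders)]
    heapq.heapify(heap)
    by_longest = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, minutes in by_longest:
        load, index, names = heapq.heappop(heap)
        names.append(name)
        heapq.heappush(heap, (load + minutes, index, names))
    ordered = sorted(heap, key=lambda entry: entry[1])
    return [(load, names) for load, _index, names in ordered]


# Hypothetical durations in minutes, NOT measured from the real builders.
suite_minutes = {"run-pass": 40, "compile-fail": 25, "everything else": 60}
for load, names in split_suites(suite_minutes, builders=2):
    print(load, names)
```

With these made-up numbers the two heavy suites end up together on one builder and the remainder on the other, which mirrors the shape of the split described in the commit message.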
@Mark-Simulacrum
Member

Closing as fixed. We've had multiple successful builds on AppVeyor, and the 32-bit MinGW builders are both now around 2 hours.
