
Collect CPU utilization statistics of CI builders #48828

Open
alexcrichton opened this Issue Mar 7, 2018 · 4 comments

@alexcrichton (Member) commented Mar 7, 2018

One of the easiest ways to make CI faster is to make things parallel and simply use the hardware we have available to us. Unfortunately though we don't have a lot of data about how parallel our build is. Are there steps we think are parallel but actually aren't? Are we pegged to one core for long durations when there's other work we could be doing?

The general idea here is that we'd spin up a daemon at the very start of the build which would sample CPU utilization every so often. This daemon would then update a file that's either displayed or uploaded at the end of the build.

Hopefully we could then use these logs to get a better view into how the builders are working during the build, diagnose non-parallel portions of the build, and implement fixes to use all the cpus we've got.
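As a rough illustration, here is a minimal Linux-only sketch of such a sampler (the log path, interval, and sample count are placeholders; a real daemon would loop until the build ends):

```shell
#!/bin/sh
# Minimal Linux-only sketch of the sampling daemon described above.
# It reads the aggregate "cpu" line of /proc/stat before and after a
# short sleep and logs the system-wide busy percentage over that window.
LOG=/tmp/cpu-usage.log

snapshot() {
    # total jiffies (user+nice+system+idle+iowait+irq+softirq) and idle jiffies
    awk '/^cpu /{print $2+$3+$4+$5+$6+$7+$8, $5+$6}' /proc/stat
}

cpu_busy_pct() {
    a=$(snapshot)
    sleep 1
    b=$(snapshot)
    # busy% = (delta_total - delta_idle) / delta_total
    echo "$a $b" | awk '{ t = $3 - $1; i = $4 - $2
        if (t > 0) printf "%.1f", 100 * (t - i) / t; else printf "0.0" }'
}

# Three samples for illustration; a real daemon would run until killed.
for _ in 1 2 3; do
    echo "$(date +%T) $(cpu_busy_pct)%" >> "$LOG"
done
```

The file could then be uploaded or dumped into the main log at the end of the build, as described above.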

cc @rust-lang/infra

@retep998 (Member) commented Mar 8, 2018

On Windows this can be done by taking advantage of job objects. If the entire build is wrapped in a job object, then we can call `QueryInformationJobObject` with `JobObjectBasicAccountingInformation` to get a bunch of useful accounting data (including total user and kernel CPU time for every process in the job).

@matthiaskrgr (Contributor) commented Mar 8, 2018

I made a script that prints `top` output into the Travis log every 30 seconds (log, raw log).

# launch in travis as 'pathto/script.sh &'
while sleep 30
do
    # condense the first four lines of `top` output onto a single log line
    top -ibn 1 | head -n4 | tr "\n" " " | tee -a /tmp/top.log
    echo "" | tee -a /tmp/top.log
done

Some findings:

Cloning the jemalloc, libcompiler_builtins and liblibc submodules alone takes 30 seconds.

While building bootstrap, compiling the serde_derive, serde_json and bootstrap crates seems to take 30 seconds (total build time: 47 seconds).

stage0:
Compiling tidy crate seems to take around 30 seconds.
Compiling rustc_errors takes at least 2 minutes, only one codegen-unit is used
Compiling syntax_ext takes 9 minutes, only one CGU used

stage0 codegen artifacts:
Compiling rustc_llvm takes 1.5 minutes, one CGU

During stage1, the rustc_errors and syntax_ext builds are approximately as slow as during stage0; rustc_plugins takes 2 minutes, one CGU.

stage2:
rustdoc took 2 minutes to build, one CGU

compiletest suite=run-make mode=run-make:
It looks like there is a single test that takes around 3 minutes to complete and has no parallelization.

Testing alloc stage1:
building liballoc takes around a minute

Testing syntax stage1:
building syntax takes 1.5 minutes, one CGU

Notes:
When the load average dropped towards 1, I assumed only one codegen unit was active.
The script was only applied to the default pull-request Travis CI configuration.

@kennytm referenced this issue Mar 8, 2018: Tracking issue on Reducing bors cycle time #10 (open, 1 of 5 tasks complete)
@kennytm (Member) commented Mar 12, 2018

As shown in #48480 (comment), the CPUs assigned to each job may have some performance difference:

  • Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  • Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

The clock-rate difference (2.4 GHz vs 2.5 GHz) shouldn't make any noticeable difference though: even if everything were CPU-bound, it would slow the build down by at most 7.2 minutes out of 3 hours. That is not enough to explain the timeout in #48480.
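The arithmetic behind that bound can be checked directly, assuming a perfectly CPU-bound 180-minute job and throughput proportional to clock rate:

```shell
# Work done per minute scales with clock rate, so relative to the 2.5 GHz
# part, the 2.4 GHz part falls behind by at most 180 * (1 - 2.4/2.5) minutes
# over a 3-hour build.
awk 'BEGIN { printf "%.1f minutes\n", 180 * (1 - 2.4 / 2.5) }'
# → 7.2 minutes
```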

@alexcrichton (Member, Author) commented Mar 19, 2018

I was recently working on https://github.com/alexcrichton/cpu-usage-over-time for this. It periodically prints out the CPU usage as a percentage of the whole system (i.e. one saturated core on a 4-core machine shows as 25%). I only got Linux/OSX working though, and was unable to figure out a good way to do it on Windows.

My thinking is that we'd download a binary near the beginning of the build (or set up some script). We'd then run `stamp that-script > some-output.log` just before we run `stamp run-the-build.sh`. That way we could correlate the timestamps of the two logs (the main log and some-output.log) to similar moments in time.

Initially I was also thinking we'd just cat some-output.log at the end of the build and scrape it later if need be.
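The wrapping described above might be sketched roughly like this (all names are hypothetical: `with_timestamps` stands in for the `stamp` wrapper, and the placeholder `echo`s stand in for the sampler daemon and the build script):

```shell
# Prefix every line of a command's output with the current wall-clock time,
# so lines in the two logs can be matched up afterwards.
with_timestamps() {
    "$@" 2>&1 | while IFS= read -r line; do
        printf '[%s] %s\n' "$(date +%H:%M:%S)" "$line"
    done
}

# In CI the first command would be the sampler daemon (backgrounded with &)
# and the second the main build script; the echos are placeholders here.
with_timestamps echo "cpu: 42.0%" > some-output.log
with_timestamps echo "Compiling rustc_errors v0.0.0"
```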
