Collect CPU utilization statistics of CI builders #48828
alexcrichton added A-build T-infra labels Mar 7, 2018
On Windows this can be done by taking advantage of job objects. If the entire build is wrapped in a job object then we can call
matthiaskrgr referenced this issue Mar 8, 2018: travis: print some statistics from "top". #48841 (Closed)
I made a script that prints a short summary from top every 30 seconds:
# launch in travis as 'pathto/script.sh &'
# every 30 seconds, flatten the first four lines of top's batch output
# into one line and append it to /tmp/top.log (also echoed to stdout)
while sleep 30
do
    top -ibn 1 | head -n4 | tr "\n" " " | tee -a /tmp/top.log
    echo "" | tee -a /tmp/top.log
done
Some findings:
Cloning submodules jemalloc, libcompiler_builtins and liblibc alone takes 30 seconds.
While building bootstrap, compiling the serde_derive, serde_json and bootstrap crates seems to take 30 seconds (total build time: 47 seconds).
stage0:
stage0 codegen artifacts:
During stage1, rustc_errors and syntax_ext builds are approximately as slow as during stage0, rustc_plugins 2 minutes, one CGU.
stage2:
compiletest suite=run-make mode=run-make:
Testing alloc stage1:
Testing syntax stage1:
Notes:
As shown in #48480 (comment), the CPUs assigned to each job may have some performance difference:
The clock rate difference (2.4 GHz vs 2.5 GHz) shouldn't be noticeable though: it is about 4% (0.1 / 2.5), so it would slow a 3-hour (180-minute) build by at most about 180 × 0.04 = 7.2 minutes, and only if everything is CPU-bound. That is not enough to explain the timeout in #48480.
I was working on https://github.com/alexcrichton/cpu-usage-over-time recently for this. It periodically prints out the CPU usage as a percentage for the whole system (i.e. 1 busy core on a 4-core machine is 25%). I only got Linux/OSX working though, and was unable to figure out a good way to do it on Windows. My thinking for how we'd do this is to download a binary near the beginning of the build (or set up some script). We'd then run Initially I was also thinking we'd just
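To make the "percentage for the whole system" idea concrete, here is a minimal shell sketch for Linux only. It is not taken from the linked repository; it assumes the usual layout of the aggregate `cpu` line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal), and the function names `cpu_ticks` and `cpu_percent` are made up for illustration.

```shell
#!/bin/sh
# Sketch: whole-system CPU usage between two samples of the aggregate
# "cpu" line from /proc/stat (Linux only; function names are hypothetical).

# Print "<busy> <total>" tick counts for one "cpu ..." line.
cpu_ticks() {
    # Split the line into words and drop the leading "cpu" label.
    set -- $1
    shift
    user=$1 nice=$2 system=$3 idle=$4
    iowait=${5:-0} irq=${6:-0} softirq=${7:-0} steal=${8:-0}
    total=$(( user + nice + system + idle + iowait + irq + softirq + steal ))
    # idle and iowait both count as "not busy".
    echo "$(( total - idle - iowait )) $total"
}

# Print the busy percentage (integer, rounded down) between two samples.
cpu_percent() {
    # After this, $1/$2 are busy/total of the first sample, $3/$4 of the second.
    set -- $(cpu_ticks "$1") $(cpu_ticks "$2")
    busy=$(( $3 - $1 ))
    total=$(( $4 - $2 ))
    if [ "$total" -le 0 ]; then
        echo 0
    else
        echo $(( 100 * busy / total ))
    fi
}

# Usage on a Linux builder:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_percent "$a" "$b"
```

Because the aggregate line sums all cores, one fully busy core on an otherwise idle 4-core machine comes out as roughly 25, matching the convention described above.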
alexcrichton commented Mar 7, 2018
One of the easiest ways to make CI faster is to make things parallel and simply use the hardware we have available to us. Unfortunately though we don't have a lot of data about how parallel our build is. Are there steps we think are parallel but actually aren't? Are we pegged to one core for long durations when there's other work we could be doing?
The general idea here is that we'd spin up a daemon at the very start of the build which would sample CPU utilization every so often. This daemon would then update a file that's either displayed or uploaded at the end of the build.
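As a rough illustration of the daemon idea, the sketch below starts a background sampler that appends a timestamped copy of the aggregate /proc/stat line to a log file, and a second function that stops it and displays the log. It assumes a Linux builder; `start_sampler`, `stop_sampler`, `SAMPLER_PID` and the log path are hypothetical names, not part of any existing CI script.

```shell
#!/bin/sh
# Sketch: background CPU sampler for a CI job (Linux, /proc/stat).
# All names here are hypothetical and chosen for illustration only.

# start_sampler <log file> <interval in seconds>
# Appends one "<unix time> <aggregate cpu line>" record per interval.
start_sampler() {
    log=$1 interval=$2
    (
        while :; do
            echo "$(date +%s) $(head -n 1 /proc/stat 2>/dev/null)" >> "$log"
            sleep "$interval"
        done
    ) &
    SAMPLER_PID=$!
}

# stop_sampler <log file>
# Stops the background loop and prints what it collected.
stop_sampler() {
    kill "$SAMPLER_PID" 2>/dev/null || true
    wait "$SAMPLER_PID" 2>/dev/null || true
    cat "$1"
}

# Possible use at the top of a build script:
#   start_sampler /tmp/cpu.log 30
#   trap 'stop_sampler /tmp/cpu.log' EXIT
```

Storing raw tick counts rather than percentages keeps the sampler trivial; the percentages can be computed afterwards from consecutive records.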
Hopefully we could then use these logs to get a better view into how the builders are working during the build, diagnose non-parallel portions of the build, and implement fixes to use all the cpus we've got.
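One way such logs might be used to spot non-parallel stretches is to filter for samples below some utilization threshold. The sketch below assumes a hypothetical log format of one "<seconds since build start> <cpu percent>" pair per line; both the format and the name `low_utilization` are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: list samples where the machine was mostly idle.
# Assumed (hypothetical) log format: "<seconds since start> <cpu percent>".

# low_utilization <log file> <threshold percent>
# Prints every record whose CPU percentage is below the threshold.
low_utilization() {
    awk -v limit="$2" '$2 + 0 < limit + 0' "$1"
}

# Example: on a 4-core builder, anything under about 30% suggests the
# build was pegged to a single core at that point:
#   low_utilization /tmp/cpu-percent.log 30
```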
cc @rust-lang/infra