Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Update binutils for powerpc64 and powerpc64le #58986

Merged
merged 2 commits into from
Mar 20, 2019

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Mar 7, 2019

Cargo powerpc64 and powerpc64le are seeing SIGILL crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
#57345 (comment)

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes #57345
Closes rust-lang/rustup#1620

r? @alexcrichton

Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 7, 2019
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:06c91bf0:start=1551932172688926405,finish=1551932173597035017,duration=908108612
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
Setting environment variables from .travis.yml
---
[00:02:00]  ---> 5a459b5b6133
[00:02:00] Successfully built 5a459b5b6133
[00:02:00] Successfully tagged rust-ci:latest
[00:02:00] Built container sha256:5a459b5b6133c303505bda700757809a29e55343b4a3b83d1f7cae08c62dcfd8
[00:02:00] Uploading finished image to s3://rust-lang-ci-sccache2/docker/3010b440ec30a8a3a8347df228d54e1ca9d78fcce0ba5b618ad3c3d0ade5b8a5f2d5b466d856e4e2587679d1e7f6f47e49cd9fa80519b41565b1bbc2f1be23e2
[00:02:50] upload failed: - to s3://rust-lang-ci-sccache2/docker/3010b440ec30a8a3a8347df228d54e1ca9d78fcce0ba5b618ad3c3d0ade5b8a5f2d5b466d856e4e2587679d1e7f6f47e49cd9fa80519b41565b1bbc2f1be23e2 Unable to locate credentials

[00:02:50] travis_time:end:00c4f43c:start=1551932197322908505,finish=1551932354505169632,duration=157182261127
[CI_JOB_NAME=x86_64-gnu-llvm-6.0]
[00:02:50] [CI_JOB_NAME=x86_64-gnu-llvm-6.0]
---

[00:05:09] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:05:09] tidy error: /checkout/src/ci/docker/dist-powerpc64-linux/build-powerpc64-toolchain.sh:26: line longer than 100 chars
[00:05:11] some tidy checks failed
[00:05:11] 
[00:05:11] 
[00:05:11] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:05:11] 
[00:05:11] 
[00:05:11] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:05:11] Build completed unsuccessfully in 0:00:45
[00:05:11] Build completed unsuccessfully in 0:00:45
[00:05:11] make: *** [tidy] Error 1
[00:05:11] Makefile:68: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:09a0ed6e
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Mar  7 04:21:36 UTC 2019
---
travis_time:end:37b033e4:start=1551932496795864106,finish=1551932496800640516,duration=4776410
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:003ffaa8
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:1e354676
travis_time:start:1e354676
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:008d9168
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@alexcrichton
Copy link
Member

@bors: r+

Nice finds!

@bors
Copy link
Contributor

bors commented Mar 7, 2019

📌 Commit c843fe7 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Mar 7, 2019

🌲 The tree is currently closed for pull requests below priority 500, this pull request will be tested once the tree is reopened

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 7, 2019
kennytm added a commit to kennytm/rust that referenced this pull request Mar 11, 2019
[CI] Update binutils for powerpc64 and powerpc64le

Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620
kennytm added a commit to kennytm/rust that referenced this pull request Mar 15, 2019
[CI] Update binutils for powerpc64 and powerpc64le

Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620

r? @alexcrichton
sanxiyn added a commit to sanxiyn/rust that referenced this pull request Mar 18, 2019
[CI] Update binutils for powerpc64 and powerpc64le

Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620

r? @alexcrichton
kennytm added a commit to kennytm/rust that referenced this pull request Mar 19, 2019
[CI] Update binutils for powerpc64 and powerpc64le

Cargo powerpc64 and powerpc64le are seeing `SIGILL` crashes in openssl,
which was found to be a linking problem, fixed by newer binutils. See
<rust-lang#57345 (comment)>

For powerpc64 we're using crosstool-ng, which doesn't offer a newer
binutils version, but we can just compile it separately. On powerpc64le
we're already building binutils. Both are now updated to binutils 2.32.

Closes rust-lang/cargo#6320
Closes rust-lang#57345
Closes rust-lang/rustup#1620

r? @alexcrichton
bors added a commit that referenced this pull request Mar 20, 2019
Rollup of 5 pull requests (all of which changes `src/ci/docker`)

Successful merges:

 - #58986 ([CI] Update binutils for powerpc64 and powerpc64le)
 - #59038 (Track embedded-book in the toolstate)
 - #59055 (CI: Set job names.)
 - #59253 (Calculate Docker cache hash precisely from Dockerfile's dependencies)
 - #59257 (Update CI configuration for building Redox libraries)

Failed merges:

r? @ghost
@bors bors merged commit c843fe7 into rust-lang:master Mar 20, 2019
@cuviper cuviper added beta-nominated Nominated for backporting to the compiler in the beta channel. stable-nominated Nominated for backporting to the compiler in the stable channel. labels Mar 21, 2019
@cuviper
Copy link
Member Author

cuviper commented Mar 21, 2019

I've nominated this for beta and stable to get the fixed builds out sooner, though I'm aware that we'll only get stable if there's a point release. cc @rust-lang/release

@cuviper cuviper added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label Mar 27, 2019
@cuviper
Copy link
Member Author

cuviper commented Mar 27, 2019

@rust-lang/infra -- please consider this for beta.

@pietroalbini
Copy link
Member

We discussed this at the infra meeting, and decided not to backport this.

@pietroalbini pietroalbini removed beta-nominated Nominated for backporting to the compiler in the beta channel. stable-nominated Nominated for backporting to the compiler in the stable channel. labels Apr 2, 2019
@llebout
Copy link

llebout commented Apr 11, 2019

@pietroalbini What is the cost of making a working build for ppc64le? Really don't understand the behavior here.

@pietroalbini
Copy link
Member

Backporting CI changes is always risky for us, so we tend to reject most of the backports that are not essential for a Tier 1 platform, as it happened with this PR.

Most of our CI is built around custom Docker images which are rebuilt every time the CI configuration is changed. In a perfect world rebuilding a Docker image twice with the same Dockerfile would result in the same image being produced, but making Docker builds deterministic and reproducible is hard.

It's not rare that a change on an image breaks an unrelated platform (for example it happened a few days ago with Android and its SDK that doesn't support pinning at all), and we can't afford these kinds of breakages on beta. I hope you understand.

@llebout
Copy link

llebout commented Apr 11, 2019

Backporting CI changes is always risky for us, so we tend to reject most of the backports that are not essential for a Tier 1 platform, as it happened with this PR.

Most of our CI is built around custom Docker images which are rebuilt every time the CI configuration is changed. In a perfect world rebuilding a Docker image twice with the same Dockerfile would result in the same image being produced, but making Docker builds deterministic and reproducible is hard.

It's not rare that a change on an image breaks an unrelated platform (for example it happened a few days ago with Android and its SDK that doesn't support pinning at all), and we can't afford these kinds of breakages on beta. I hope you understand.

You don't have a way to make changes specific to each platform?

@pietroalbini
Copy link
Member

You don't have a way to make changes specific to each platform?

Sort of.

The main issue is we have to cache the Docker images, since we don't have the CI capacity to rebuild them every time (our time cap is 3 hours, and for example rebuilding the linux x86 ones take 2.5 hours), so we need to decide at the start of the build whether to use the cached one or rebuild the image.

In our cache the images are identified by the hash of all the related src/ci/docker files, but when building the hashing code we opted to include more (potentially unneeded) files in the hash rather than leaving some of them out, since a collision could make huge damages to our ability to land PRs.

Also we share lots of scripts between Docker images, so rebuilding one could trigger the rebuild of another one even if all the previous issues didn't exist.

Rest assured we don't want to ship broken lower-tier platforms, but backporting those kinds of changes is too risky, and there is a reason why we have an (informal) policy against it.

@llebout
Copy link

llebout commented Apr 11, 2019

I feel like something could be done to make this situation easier, it sounds like a mess.

Making stuff works should never be risky.

@madscientist159 can and will donate ppc64 hardware from IntegriCloud, that way you can build things with more capacity.

@cuviper
Copy link
Member Author

cuviper commented Apr 11, 2019

Even with native hardware (which would be nice!), we still need to take care with toolchain compatibility, so that wouldn't remove problems like this.

@pietroalbini
Copy link
Member

I feel like something could be done to make this situation easier, it sounds like a mess.

Yeah, as you saw our CI is quirky and messy, but it gets its job done and it has some really nice features (like the ability to run the same exact build locally with src/ci/docker/run.sh image-name) :)

@madscientist159 can and will donate ppc64 hardware from IntegriCloud, that way you can build things with more capacity.

We still need to figure out a way to properly make use of hardware donations, but thanks for the offer! When all the stuff on our end is sorted out we'll try to remember to ping y'all again!

To be clear though, the problem is not the lack of hardware to test things on, but the time it takes to run a full test: x86 tests at the moment take around 2 to 3 hours to execute, and the Docker image takes 2.5 hours to build. Since we test only one PR at the time (not for capacity issues, but to ensure all the commits on master are always working -- more details if you're curious) being able to land a PR every 4/5 hours instead of one every 2/3 hours would be really bad for our queue.

@llebout
Copy link

llebout commented Apr 11, 2019

@pietroalbini It is lack of hardware to test things on, if the hardware is faster or you have more, then that takes less time and your problem is solved.
Or I am misunderstanding the issue?

@llebout
Copy link

llebout commented Apr 11, 2019

Even with native hardware (which would be nice!), we still need to take care with toolchain compatibility, so that wouldn't remove problems like this.

Yes, I think there should be a way to edit the build process, per platform and per release quality (stable, beta, nightly). AIUI it's not the case and they're scared to make changes because a fix for powerpc64/le can break the others.

@pietroalbini
Copy link
Member

It is lack of hardware to test things on, if the hardware is faster or you have more, then that takes less time and your problem is solved.
Or I am misunderstanding the issue?

Well, sort of. To be fair the main problem if we skipped caching altogether is not PowerPC but x86, and even throwing a lot of money at the problem won't help too much since it's not possible to distribute building the docker container.

Yes, I think there should be a way to edit the build process, per platform and per release quality (stable, beta, nightly).

The build process is already clearly split between stable, beta and nightly. The line between the platforms' build processes is a bit fuzzy though, and I agree with you we should strive to improve that.

Kenny already improved in #59253 the cache hashing code so it's more precise, and that's included in the 1.35 CI configuration. If you have any ideas on how to improve the caching invalidation (without rebuilding the image every time) I'd love to hear them!

it's not the case and they're scared to make changes because a fix for powerpc64/le can break the others.

By the way, we're wary of making changes to the CI only for beta and stable, we have no problems tweaking things on nightly and let the changes ride the trains as it's happening with this PR ;)

@llebout
Copy link

llebout commented Apr 11, 2019

We (ppc64 users) will wait, we wish it was in our hands to have a working build through rustup and it would already have been fixed, thankfully there's workarounds available.

@madscientist159
Copy link

madscientist159 commented Apr 11, 2019

I second this; the fact that Rust is having issues on POWER is making it very hard to recommend Rust for secure applications. While I understand the x86 focus, x86 is a very poor platform to be tied exclusively to for a language that is focusing on security aspects -- a safe language will not paper over the serious issues that the x86 platform presents at a firmware and hardware level, and for many applications it's more important to control the base platform and use something like C++ than it is to use Rust and deal with the unfixable x86 platform security problems.

We can get you direct dedicated access to a fairly large POWER9 box if build / test time is the main problem. In terms of benchmarks, the POWER9 systems compete directly with expensive x86 machines for build time; it's not like you're going to be waiting for the POWER9 box to finish when the x86 machine is done. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
7 participants