Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: switch to old docker builder to temporary solve caching issue #114621

Closed
wants to merge 5 commits into from

Conversation

klensy
Copy link
Contributor

@klensy klensy commented Aug 8, 2023

Switch to old docker builder to temporary solve caching issues.
Also silence unzip, it too noisy

@rustbot
Copy link
Collaborator

rustbot commented Aug 8, 2023

r? @pietroalbini

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Aug 8, 2023
@klensy
Copy link
Contributor Author

klensy commented Aug 8, 2023

Oh, yes, i've tried to solve caching issue, but for PR jobs it didn't worked before too.

@Kobzol
Copy link
Contributor

Kobzol commented Aug 10, 2023

@bors try

@bors
Copy link
Contributor

bors commented Aug 10, 2023

⌛ Trying commit 5592aa3 with merge c4e6680c1530436eac185b091edd9f1a292520c2...

@klensy
Copy link
Contributor Author

klensy commented Aug 10, 2023

This isn't working currently.

Docker updated from 20.10.25+azure-2 to 23.0.6+azure-2, changing it's behavior and switching builder to buildx.

Problem us that previously builded image has all image ids in history, so doing

docker history -q rust-ci | \
          grep -v missing | \
          xargs docker save | \
          gzip | \
          $upload

saved all layers, but now it's empty all, except last one:

 IMAGE          CREATED          CREATED BY                                      SIZE      COMMENT
9914c1307d3e   6 seconds ago    ENV SCRIPT=python3 ../x.py --stage 2 test sr…   0B        buildkit.dockerfile.v0
<missing>      6 seconds ago    ENV RUN_CHECK_WITH_PARALLEL_QUERIES=1           0B        buildkit.dockerfile.v0
<missing>      6 seconds ago    COPY host-x86_64/mingw-check/validate-error-…   555B      buildkit.dockerfile.v0
<missing>      6 seconds ago    COPY host-x86_64/mingw-check/validate-toolst…   367B      buildkit.dockerfile.v0
<missing>      6 seconds ago    RUN |1 DEBIAN_FRONTEND=noninteractive /bin/s…   13.7MB    buildkit.dockerfile.v0
<missing>      11 seconds ago   COPY host-x86_64/mingw-check/reuse-requireme…   5.33kB    buildkit.dockerfile.v0
<missing>      11 seconds ago   RUN |1 DEBIAN_FRONTEND=noninteractive /bin/s…   23.1MB    buildkit.dockerfile.v0
<missing>      12 seconds ago   COPY scripts/sccache.sh /scripts/ # buildkit    482B      buildkit.dockerfile.v0
<missing>      12 seconds ago   RUN |1 DEBIAN_FRONTEND=noninteractive /bin/s…   41.8MB    buildkit.dockerfile.v0
<missing>      18 seconds ago   ENV RUST_CONFIGURE_ARGS=--set rust.validate-…   0B        buildkit.dockerfile.v0
<missing>      18 seconds ago   ENV PATH=/node-v16.9.0-linux-x64/bin:/usr/lo…   0B        buildkit.dockerfile.v0
<missing>      18 seconds ago   RUN |1 DEBIAN_FRONTEND=noninteractive /bin/s…   95.9MB    buildkit.dockerfile.v0
<missing>      20 seconds ago   RUN |1 DEBIAN_FRONTEND=noninteractive /bin/s…   1.57GB    buildkit.dockerfile.v0
<missing>      20 seconds ago   ARG DEBIAN_FRONTEND=noninteractive              0B        buildkit.dockerfile.v0
<missing>      5 weeks ago      /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B        
<missing>      5 weeks ago      /bin/sh -c #(nop) ADD file:140fb5108b4a2861b…   77.8MB    
<missing>      5 weeks ago      /bin/sh -c #(nop)  LABEL org.opencontainers.…   0B        
<missing>      5 weeks ago      /bin/sh -c #(nop)  LABEL org.opencontainers.…   0B        
<missing>      5 weeks ago      /bin/sh -c #(nop)  ARG LAUNCHPAD_BUILD_ARCH     0B        
<missing>      5 weeks ago      /bin/sh -c #(nop)  ARG RELEASE                  0B  

Either restoring missing ids or playing with inline cache can fix things.

@bors
Copy link
Contributor

bors commented Aug 10, 2023

☀️ Try build successful - checks-actions
Build commit: c4e6680c1530436eac185b091edd9f1a292520c2 (c4e6680c1530436eac185b091edd9f1a292520c2)

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 12, 2023
…-Simulacrum

CI: fix Docker layer caching

As reported by `@klensy` on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/242791-t-infra/topic/docker.20images.20always.20rebuilded), Github Actions have recently updated their Docker version from 20.x to 23.x, which enabled the BuildKit build backend by default.

This broke our way of performing Docker layer caching on CI, which immediately made all non-PR CI builds (including try builds) ~1 hour longer (Docker caching didn't work on PR builds before, so it wasn't affected). The moment this started happening can be seen [here](https://github.com/rust-lang-ci/rust/actions?page=2&query=branch%3Aauto+is%3Asuccess).

The problem is with the following command:
```
docker history -q rust-ci | \
          grep -v missing | \
          xargs docker save | \
          gzip | \
          $upload
```
which returns the intermediate layers as `<missing>`, if BuildKit is enabled. This was investigated by `@klensy` in rust-lang#114621. Thanks for that!

I will continue experimenting with how we can enable the cache with BuildKit in rust-lang#114762, but for the time being, I think that we should just hotfix this.

This PR reverts the build backend back to the old one, which fixes the caching. However, we also have to bust the cache of all Dockerfiles, otherwise caching would only start kicking in for them the next time they are updated (or the next time GH updates their docker version). Because when the Docker version was updated the last time, the Dockerfiles were cached on S3 with basically an empty cache, and unless we bust it, even after reverting to the old build engine, the CI script would just download the empty cache and rebuild the Dockerfile from scratch, thus nullifying our fix.

r? `@Mark-Simulacrum`
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 13, 2023
…acrum

CI: use smaller machines in PR runs

mingw-check job-linux-16c -> job-linux-4c
~job-linux-4c 20 min in auto job
~job-linux-16c 13 min in pr job
with current pr regressed to almost 21 min, it's ok.

mingw-check-tidy job-linux-16c -> job-linux-4c small enough, so reduce to minimal
~ job-linux-16c 3 min
with current pr regressed to almost 5 min, it's ok.

x86_64-gnu-tools job-linux-16c this is top job by time in PR, so don't touch it
~ job-linux-8c 1.30 hour in auto job
~ job-linux-16c 1 hour in pr job (affected by rust-lang#114613, actual time ~ 30 min)

x86_64-gnu-llvm-15 job-linux-16c don't change too
~ job-linux-8c 1.30 hour in auto job
~ job-linux-16c 30 min in pr job

Noticed while working on rust-lang#114621, so current time affected by always rebuilded docker images (but pr images always rebuilded before too, so nvm)
@klensy
Copy link
Contributor Author

klensy commented Aug 13, 2023

Okay, will close @Kobzol applied fix.

@klensy klensy closed this Aug 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants