Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v0.10] cache: add fallback for snapshotID #3595

Merged
merged 1 commit into from
Feb 8, 2023

Conversation

tonistiigi
Copy link
Member

@tonistiigi tonistiigi commented Feb 8, 2023

fix for moby/moby#44943

In older BuildKit versions, snapshotID was not always set if the record was not created with GetByBlob method. Old code defaulted to cache record ID in that case, but that broke with the metadata interface refactor.

Regression for this fallback is in:
before: a9f1980#diff-e4a99c1c50054f36891bbe03da3784b5d2d4ca6b279ff32c7999d3ac5db48dc7L81

image

after: a9f1980#diff-e4a99c1c50054f36891bbe03da3784b5d2d4ca6b279ff32c7999d3ac5db48dc7R250

image

It doesn't appear in current versions (at least) because snapshotID is explicitly set in finalize/commit a9f1980#diff-94cb4012b7ea73b110c7ac219d52793e11b96d9141c3ad631603f34a7fc9289aR917

finalize:
image

commit:
image

@neersighted @thaJeztah @sipsma

Signed-off-by: Tonis Tiigi tonistiigi@gmail.com

In older BuildKit versions snapshotID was not always set if record
was not created with GetByBlob method. Old code defaulted to cache
record ID in that case but that broke with the metadata interface
refactor.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick questions as someone who is naive to BuildKit:

  • Does this fix also need to go in v0.11, as an old cache could still be loaded?
  • Does it make more sense to encapsulate this logic inside md.GetString()?

@tonistiigi
Copy link
Member Author

Does this fix also need to go in v0.11, as an old cache could still be loaded?

Yes. If we want cache from 20.10 to not break Moby master.

Does it make more sense to encapsulate this logic inside md.GetString()?

No, this is specific fallback only for the snapshotID value.

@thaJeztah
Copy link
Member

Any idea what this failiure is / was ? Looks weird https://github.com/moby/buildkit/actions/runs/4120008980/jobs/7114304809

 > [linux/amd64->riscv64 runc 3/3] RUN --mount=from=runc-src,src=/usr/src/runc,target=. --mount=target=/root/.cache,type=cache   CGO_ENABLED=1 xx-go build -mod=vendor -ldflags '-extldflags -static' -tags 'apparmor seccomp netgo cgo static_build osusergo' -o /usr/bin/runc ./ &&   xx-verify --static /usr/bin/runc:
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x006.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x007.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x008.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x009.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x010.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: -march=rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0: unknown z ISA extension `zmmul'
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file /riscv64-alpine-linux-musl//usr/lib/gcc/riscv64-alpine-linux-musl/12.2.1/crtendS.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: -march=rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0: unknown z ISA extension `zmmul'
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file /riscv64-alpine-linux-musl//usr/lib/gcc/riscv64-alpine-linux-musl/12.2.1/crtn.o
#60 106.3 clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
------
Dockerfile:47
--------------------
  46 |       [ "$(xx-info arch)" != "ppc64le" ] || XX_CC_PREFER_LINKER=ld xx-clang --setup-target-triple
  47 | >>> RUN --mount=from=runc-src,src=/usr/src/runc,target=. --mount=target=/root/.cache,type=cache \
  48 | >>>   CGO_ENABLED=1 xx-go build -mod=vendor -ldflags '-extldflags -static' -tags 'apparmor seccomp netgo cgo static_build osusergo' -o /usr/bin/runc ./ && \
  49 | >>>   xx-verify --static /usr/bin/runc
  50 |     

@crazy-max
Copy link
Member

Any idea what this failiure is / was ? Looks weird https://github.com/moby/buildkit/actions/runs/4120008980/jobs/7114304809

 > [linux/amd64->riscv64 runc 3/3] RUN --mount=from=runc-src,src=/usr/src/runc,target=. --mount=target=/root/.cache,type=cache   CGO_ENABLED=1 xx-go build -mod=vendor -ldflags '-extldflags -static' -tags 'apparmor seccomp netgo cgo static_build osusergo' -o /usr/bin/runc ./ &&   xx-verify --static /usr/bin/runc:
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x006.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x007.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x008.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x009.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file $WORK/b192/_x010.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: -march=rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0: unknown z ISA extension `zmmul'
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file /riscv64-alpine-linux-musl//usr/lib/gcc/riscv64-alpine-linux-musl/12.2.1/crtendS.o
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: -march=rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zmmul1p0: unknown z ISA extension `zmmul'
#60 106.3 /usr/bin/riscv64-alpine-linux-musl-ld: failed to merge target specific data of file /riscv64-alpine-linux-musl//usr/lib/gcc/riscv64-alpine-linux-musl/12.2.1/crtn.o
#60 106.3 clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
------
Dockerfile:47
--------------------
  46 |       [ "$(xx-info arch)" != "ppc64le" ] || XX_CC_PREFER_LINKER=ld xx-clang --setup-target-triple
  47 | >>> RUN --mount=from=runc-src,src=/usr/src/runc,target=. --mount=target=/root/.cache,type=cache \
  48 | >>>   CGO_ENABLED=1 xx-go build -mod=vendor -ldflags '-extldflags -static' -tags 'apparmor seccomp netgo cgo static_build osusergo' -o /usr/bin/runc ./ && \
  49 | >>>   xx-verify --static /usr/bin/runc
  50 |     

Yes catch this one in #3587 (review). I'm not sure yet but best guess is this could be linked to libseccomp package from alpine repo:

RUN set -e; xx-apk add musl-dev gcc libseccomp-dev libseccomp-static; \

@tonistiigi tonistiigi merged commit 4f0ee09 into moby:v0.10 Feb 8, 2023
@neersighted
Copy link
Member

Can we tag a release for this fix before we re-vendor in Moby?

@neersighted
Copy link
Member

Also, just thinking about this change/bug a bit more:

  • Can we create a regression test that generates a cache without a metadata.ID() to guard against this in the future?
  • It feels like a code smell that an empty string was allowed to silently break things -- should functions up the call chain be somehow validating that they are not passed an empty layer ID to load (or whatever the abstraction is)?

@tonistiigi
Copy link
Member Author

Can we create a regression test that generates a cache without a metadata.ID() to guard against this in the future?

We do not have any version upgrade tests atm. that could have caught this.

It feels like a code smell that an empty string was allowed to silently break things -- should functions up the call chain be somehow validating that they are not passed an empty layer ID to load (or whatever the abstraction is)?

Sure, but the old code was pretty strict that this variable could never be empty. A regression that broke that seems to have been an oversight during a big refactor, and there was no intention to change the behavior behind it. This way, any time we pass a variable to a function, we would need to do a if var == "" check if we would want to check for any bug that could be setting var="" in some other function. A language that distinguishes empty values in the type system would have caught this bug.

@neersighted
Copy link
Member

neersighted commented Feb 8, 2023

If there's not an explicit way to catch it without nil-guarding, fair enough. ZSTs are certainly the thing that makes the most sense here, but we don't work with a language that has such a rich type system 😆

It seems like the best thing to add is a regression for this case/upgrade tests with a gold corpus in general, that way we can guard against future similar regressions.

@neersighted
Copy link
Member

Related: #3605

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants