Fix docker invalidation bug. #16419
Assume a "child" image that depends on a "parent" image (via a FROM command). Previously, the child image building Process had no mention of the parent image's unique ID. So, even though we would re-run the parent building process every time (because it was set to PER_SESSION scope in pantsbuild#13464), Pants would not know that the child building process depended on the parent building process completing first (and creating the new parent image as a side effect). Now we write a dummy file containing the ids of all upstream images into the context in which a child image build is executed. This causes the child building process to invalidate correctly. Note that we still need the PER_SESSION scope because we have to re-run all image build processes even when there were no changes at all, in case the user did a `docker image rm`. This highlights the complications and dangers of interacting with a stateful external daemon like docker. [ci skip-rust] [ci skip-build-wheels]
In the long run it would be nice if Process had a field for "strings that should invalidate the process", i.e., that participate in the cache key but don't affect process execution. For now I emulate this by writing a dummy file into the sandbox. The downside of this is that this file could potentially be copied into the image, if there was a COPY that copied the entire context (not that there's harm in this, it's just weird). An alternative hack might be to put this in an env var, although that is less obvious when debugging in the sandbox.
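To make the two hacks concrete, here is a minimal sketch of a content-addressed cache key. Everything in it is hypothetical and not Pants code: `ToyProcess`, `child_build`, the file name `upstream_image_ids.txt`, and the env var name `UPSTREAM_IMAGE_IDS` are made up for illustration. It only shows that anything folded into a process's inputs (a file or an env var) changes the key, even if the process never reads it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProcess:
    """Hypothetical stand-in for a build process; NOT the Pants Process type."""

    argv: tuple
    env: tuple = ()          # ((name, value), ...)
    input_files: tuple = ()  # ((path, content), ...)

    def cache_key(self) -> str:
        # Every field participates in the key, whether or not the
        # command line actually consumes it.
        payload = json.dumps(
            {"argv": self.argv, "env": self.env, "input_files": self.input_files},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def child_build(parent_image_id: str, via: str) -> ToyProcess:
    """Build a toy child-image process that records the parent image ID."""
    argv = ("docker", "build", "--tag", "child:latest", ".")
    if via == "file":
        # Dummy-file variant: the ID lands in the build context.
        return ToyProcess(
            argv, input_files=(("upstream_image_ids.txt", parent_image_id),)
        )
    if via == "env":
        # Env-var variant: the ID stays out of the build context.
        return ToyProcess(argv, env=(("UPSTREAM_IMAGE_IDS", parent_image_id),))
    raise ValueError(f"unknown variant: {via!r}")


if __name__ == "__main__":
    for via in ("file", "env"):
        old = child_build("sha256:aaa", via).cache_key()
        new = child_build("sha256:bbb", via).cache_key()
        print(via, "key changed:", old != new)
```

In this toy model both variants invalidate equally well; the difference is only where the marker is visible (in the sandbox contents versus in the environment).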
Thanks for figuring this out!
May fix #16356
If there's a COPY copying the entire context they are also copying the Dockerfile, among other things 🙈
Thanks a lot!
Yea would recommend the env var. Since nothing consumes it, it shouldn't impact the efficacy of debugging.
I'm off tower for a few days, so don't wait on me. I trust that you've tested the use cases with the code, and the approach is sound to me.
Opened #16422 for the general issue of invalidating Processes based on out-of-band data.
Well, my thinking was that it would be more obvious that the sandbox has been "tampered with" with the file. E.g., when trying to debug why something was invalidated.
But then if we implement #16422 it will be even less obvious, and yet that would still be better than this hack, so I guess this is fine. Will switch to env vars. Theoretically env var values are capped in length, but that length is 128K on Linux and unknown but large on macOS (and 32K even on Windows) so not a problem in practice.
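As a rough sanity check on "not a problem in practice", the sketch below estimates how many image IDs fit under the 128K Linux figure quoted above. It assumes each ID has the form `sha256:` plus 64 hex characters and that IDs are joined with a single separator character; the actual encoding used in the change may differ, and the helper names are made up.

```python
# Assumption: an image ID looks like "sha256:" followed by 64 hex digits.
IMAGE_ID_LEN = len("sha256:") + 64

# The 128K per-value figure for Linux, taken from the comment above.
LINUX_LIMIT = 128 * 1024


def joined_length(count: int) -> int:
    """Length of `count` IDs joined by a one-character separator."""
    return count * IMAGE_ID_LEN + max(count - 1, 0)


def max_ids_that_fit(limit: int) -> int:
    """Largest number of IDs whose joined form stays within `limit`."""
    count = 0
    while joined_length(count + 1) <= limit:
        count += 1
    return count


if __name__ == "__main__":
    print(max_ids_that_fit(LINUX_LIMIT))
```

Under these assumptions the limit is only reached with well over a thousand upstream images for a single child, which supports the conclusion in the comment.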
Not that env var length is an issue (as you say) but we could always sha256 hash the input (further making it like a poor man's solution to #16422)
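A minimal sketch of that hashing idea, using only the standard library. The function name `fingerprint_image_ids` is hypothetical, and sorting the IDs first is a design choice of this sketch (so the result does not depend on the order upstream images are discovered), not something stated in the discussion.

```python
import hashlib
from typing import Iterable


def fingerprint_image_ids(image_ids: Iterable[str]) -> str:
    """Collapse any number of image IDs into one fixed-length hex digest."""
    hasher = hashlib.sha256()
    for image_id in sorted(image_ids):
        hasher.update(image_id.encode("utf-8"))
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        hasher.update(b"\0")
    return hasher.hexdigest()


if __name__ == "__main__":
    digest = fingerprint_image_ids(["sha256:bbb", "sha256:aaa"])
    # The digest is always 64 hex characters, regardless of input size.
    print(len(digest), digest)
```

The fixed-length digest sidesteps the env var length question entirely, at the cost of no longer being able to read the individual IDs when debugging.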
Note that we use env var for Go cache invalidation like this. It seems preferable to use an env var rather than having to hit LMDB store to create a digest & merge it? Update: I see you changed to env var. Cool.
Switched to env var. PTAL.
AFAICT, this is not about the cache key at all: rather, about ordering. Process 1 "depends on" process 2, but that fact is completely out of band. The process is already marked […]. It seems like the effect of this change might instead be that we now finish running process 1 to get its id before we begin running process 2. But I can't quite tell from the change how the ids are propagated between images. Note that another way to do this is recursively: building image 1 recursively depends on building image 2, etc.
Yes, the paragraph you quote doesn't mention the cache key. This is exactly about ordering. I think that paragraph pretty much states the same as yours? Possibly recursing is a better way to do this though.
Yea, sorry... I didn't quote a relevant part. But the discussion of the cache key on here and in #16422 is slightly confusing. The […] My comment has more to do with not seeing how you introduced a new data dependency here: I don't see anything that would cause the process to wait for the […]
On closer inspection, to remind myself of the moving parts here, this is already implemented via recursion (and was before). See here. But that alone is not enough because of speculation - we speculate on the parent image and the child image concurrently, and there was nothing to cause the speculative child image build to be canceled and re-run. The data dependency on the parent image's id is introduced via the DockerBuildContext. When the parent image completes, the child image's inputs will have changed, and that will cancel the speculative child image build and re-run it. That causes […] I had added some detail in the issue. At least, that is my understanding of all this...
Aha. Yea, that makes more sense. I don't have the entire picture in my mind (maybe 85%?), but […]
Oh, speculative process execution. Yea I didn't consider that, hah. Thx for addressing it!
Imagine a "child" image that depends on a "parent" image
(via a FROM command).
Previously, the child image building Process had no mention
of the parent image's unique ID. So, even though we would re-run
the parent building process every time (because it was set
to PER_SESSION scope in #13464), Pants would not know that the
child building process depended on the parent building process
completing first (and creating the new parent image as a side effect).
Now we write a dummy file containing the ids of all upstream images
into the context in which a child image build is executed. This
causes the child building process to have the correct causal chain.
Note that we still need the PER_SESSION scope because we have
to re-run all image build processes even when there were no changes at
all, in case the user did a `docker image rm`.
This highlights the complications and dangers of interacting with
a stateful external daemon like docker.
Fixes #16101
[ci skip-rust]
[ci skip-build-wheels]