New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose Docker build context hash for image tag interpolation. #13959
Conversation
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
Motivation from at least one user ;) #13934 (comment) |
version_context["PANTS"] = { | ||
# Present hash for all inputs that can be used for image tagging. | ||
"HASH": str(hash((build_args, build_env, snapshot.digest))).replace("-", "0"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure if lowercase keys would be better, or lower.UPPER, or not nesting, so it would be "...{pants_hash}.."
, etc.. 🤷🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opted for lowercasing, so it is {pants.hash}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. How do you think this should be documented?
With a growing number of sources for the interpolation context, it's becoming increasingly motivated to maybe create a dedicated section for that, detailing what goes into the context, including how the |
Oh wait actually please wait to merge: what do you think about stability guarantees for the hash value? Is this going to slow us done from making changes to this part of the codebase? |
Good point. Let's examine:
If we change the implementation so that it affects any of the input data, it will also affect the output image (potentially, at least), and a change of hash value is desirable. A change of hash value due to change of Python version is not desirable, but I think we can live with that risk (I think it is rather low). There's also a hash value embedded in one of the test cases, alerting us of unexpected changes to the calculated hash value. wdyt? |
Also, I think this use case is more in the form of optimization.. so it will not be critical to get a hash value "miss", and rebuild because of it. |
Perhaps @vputz could fills us in on how this would affect them, in case of unexpected hash value changes? |
Actually, I realize we should be using
Okay. I can't really comment to how useful this is, good idea to ask @vputz. I think it will be important to document the stability guarantees so users can decide if they want to rely on this or not. |
[ci skip-rust] [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
src/python/pants/util/hash.py
Outdated
@@ -0,0 +1,38 @@ | |||
# Copyright 2021 Pants project contributors (see CONTRIBUTORS.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I recommend having this be in docker/util_rules
. It seems unlikely that we'll want this code to be used in other parts of the codebase without careful thought. If we do want to use it in other places, we can move it to this more generic place. For now, avoid "premature generalization".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about in the docker/utils.py
file, as it's rather small?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I tend to want to only put files with rules in them in docker/util_rules/..
, otherwise I find the name misleading.. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works! And yeah...we put interpreter_constraints
in backend/python/util_rules
and it's a lie - there are no rules for it. But at this point it would be disruptive to change.
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, but please wait for feedback from @vputz on stability guarantees.
Also did you have a chance to try this in the real world? Specifically, to make sure the JSON encoder works there and not only in tests.
Thanks for iterating on this!
The encoder is taken from the peek goal. Also, it does always a str as last resort for any unknown data, so "can't" fail ;) |
I'm thinking we may want to state that the hash may change between stable versions of Pants, to not put too restrictive boundaries on what we can do in the future. In cases where we may add to the build args/env in ways we don't do now, for instance. |
That makes sense to me to retain flexibility. |
Thanks for all the work! This is part of an ongoing question about how to integrate monorepo workflow into CI/CD, with the general question being both "how do I deploy the minimum real changes to a configuration" and "how can I independently version deployable artifacts". We have so far been doing the "easy but expensive" route of tagging all our deployable containers with the git hash. This works well, but if for example a source file for a single microservice got changed, this would force redeployment of all microservices. Semver is one option, but relies on human vigilance (which is necessarily fallible). So that's the motivation for a "hash for the entire build context" of a container. And I agree with what I'm seeing on the above: "build args, environment, and [source files]", ie the full docker context once it's built (although this may have uses beyond the docker targets of course). As noted above, we've just been redeploying EVERYTHING for any git hash change once that branch is marked as approved. So from my perspective and these goals, if the hash changes rarely without that (such as from a python version change) that's really not a problem--we'd still say "the system under test behaves normally, deploy all the changed containers" and just do a few more spurious deployments than are strictly necessary since the container itself may not have changed. So in short, this sounds pretty good to me; it would let us say "okay, a bunch of things changed in the repo, but it's obvious that only these container artifacts really are different and worth deploying" and that could avoid a lot of unnecessary deployments (although updating our CI/CD system to use this would take a while). I really like the idea and look forward to trying it! |
Great, thank you @vputz! I'm comfortable landing this as long as we communicate the limitations to users, like that there are no guarantees for a stable hash across Pants versions. |
This allows users to tag their Docker images with a stable hash value based on build args, env and input sources.