Expose Docker build context hash for image tag interpolation. #13959

kaos · 2021-12-22T15:52:18Z

This allows users to tag their Docker images with a stable hash value based on build args, env and input sources.

docker_image(
  image_tags=["1.2.3-{pants.hash}"],
)

# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust]
# Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]

kaos · 2021-12-22T16:40:41Z

Motivation from at least one user ;) #13934 (comment)

kaos · 2021-12-23T14:01:09Z

src/python/pants/backend/docker/util_rules/docker_build_context.py

+        version_context["PANTS"] = {
+            # Present hash for all inputs that can be used for image tagging.
+            "HASH": str(hash((build_args, build_env, snapshot.digest))).replace("-", "0"),
+        }


Not really sure whether lowercase keys would be better, or lower.UPPER, or no nesting at all, so that it would be "...{pants_hash}...", etc. 🤷🏽

Opted for lowercasing, so it is {pants.hash}.
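The nested, lowercase form relies on how Python's `str.format` resolves a field such as `{pants.hash}`: it looks up `pants` in the keyword arguments and then does attribute access for `hash`. A minimal sketch, assuming a hypothetical `InterpolationValue` dict subclass (not necessarily what Pants uses) that exposes its keys as attributes:

```python
from typing import Any, List, Mapping, Sequence


class InterpolationValue(dict):
    """Hypothetical dict that also exposes its keys as attributes.

    str.format() resolves "{pants.hash}" as getattr(kwargs["pants"], "hash"),
    so a plain dict would raise AttributeError for that field.
    """

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError as e:
            raise AttributeError(name) from e


def interpolate_tags(tags: Sequence[str], context: Mapping[str, Any]) -> List[str]:
    # Each tag is treated as a format string; unknown fields raise KeyError/AttributeError.
    return [tag.format(**context) for tag in tags]


# The hash value is made up for illustration.
context = {"pants": InterpolationValue(hash="0123abcd")}
print(interpolate_tags(["1.2.3-{pants.hash}"], context))  # ['1.2.3-0123abcd']
```

With lowercase keys, a tag written as `{pants.HASH}` would raise `AttributeError`, so the key case is something users need to be told about.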

Eric-Arellano

Interesting. How do you think this should be documented?

kaos · 2021-12-28T18:54:43Z

With a growing number of sources for the interpolation context, there is more and more reason to create a dedicated documentation section for it, detailing what goes into the context, including how the pants.hash value is calculated. Also, as the interpolation context may be used in several fields/scenarios, repeating the context information for each of them is only useful up to a certain level of detail.

Eric-Arellano · 2021-12-29T17:46:24Z

Oh wait, actually please hold off on merging: what do you think about stability guarantees for the hash value? Is this going to slow us down from making changes to this part of the codebase?

kaos · 2021-12-29T19:41:41Z

Good point. Let's examine:

Algorithm. Likely stable. May potentially change between versions of Python (as we rely on the builtin hash function).
Input data: build args, env and snapshot digest (i.e. sources). Whatever we do, if any of those change, we want the hash to change.

If we change the implementation so that it affects any of the input data, it will also affect the output image (potentially, at least), and a change of hash value is desirable.

A change of hash value due to change of Python version is not desirable, but I think we can live with that risk (I think it is rather low).

There's also a hash value embedded in one of the test cases, alerting us of unexpected changes to the calculated hash value.

wdyt?

kaos · 2021-12-29T19:43:13Z

Also, I think this use case is mostly an optimization, so getting a hash value "miss" and rebuilding because of it will not be critical.

kaos · 2021-12-29T19:45:13Z

Perhaps @vputz could fill us in on how unexpected hash value changes would affect them?

Eric-Arellano · 2021-12-29T23:29:35Z

> Algorithm. Likely stable. May potentially change between versions of Python (as we rely on the builtin hash function).

Actually, I realize we should be using hashlib rather than hash: https://stackoverflow.com/questions/5583907/is-the-builtin-hash-method-of-python2-6-stable-across-architectures.
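To make the concern concrete: since Python 3.3, `hash()` of `str` and `bytes` values is salted per interpreter process unless `PYTHONHASHSEED` is fixed, so the same input can hash differently from one run to the next, not only across Python versions or architectures. The sketch below (helper names are made up for illustration) runs `hash()` in fresh interpreters under two different seeds and contrasts it with `hashlib.sha256`, whose output is fixed by the SHA-256 specification:

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_fresh_process(value: str, seed: str) -> int:
    """Evaluate hash(value) in a new interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def sha256_hex(value: str) -> str:
    """Same result in every process, Python version and platform."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Different seeds (by default, simply different processes) give different hash() values.
print(builtin_hash_in_fresh_process("abc", "1") == builtin_hash_in_fresh_process("abc", "2"))
# → False
print(sha256_hex("abc"))
# → ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```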

> If we change the implementation so that it affects any of the input data, it will also affect the output image (potentially, at least), and a change of hash value is desirable.

Okay. I can't really comment on how useful this is; good idea to ask @vputz. I think it will be important to document the stability guarantees so users can decide whether they want to rely on this.


Eric-Arellano · 2021-12-30T17:43:36Z

src/python/pants/util/hash.py

@@ -0,0 +1,38 @@
+# Copyright 2021 Pants project contributors (see CONTRIBUTORS.md).


I'd recommend putting this in docker/util_rules. It seems unlikely that we'll want this code to be used in other parts of the codebase without careful thought. If we do want to use it elsewhere, we can move it to this more generic place then. For now, avoid "premature generalization".

How about in the docker/utils.py file, as it's rather small?

Also, I tend to only put files that contain rules in docker/util_rules/; otherwise I find the name misleading ;)

That works! And yeah...we put interpreter_constraints in backend/python/util_rules and it's a lie - there are no rules for it. But at this point it would be disruptive to change.


Eric-Arellano

The code looks good, but please wait for feedback from @vputz on stability guarantees.

Also did you have a chance to try this in the real world? Specifically, to make sure the JSON encoder works there and not only in tests.

Thanks for iterating on this!

kaos · 2021-12-30T23:18:57Z

The encoder is taken from the peek goal. Also, it always falls back to str() for any unknown data, so it "can't" fail ;)
But no, I haven't tried it outside tests yet. However, it sits in a well-exercised code path.
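The approach named in a later commit ("replace hash() with hashlib.sha256() via a json dump") can be sketched as follows. This is not the Pants code: the encoder only mimics the behaviour described above (falling back to `str()` for unknown types), `get_hash()`'s signature is an assumption, and `Digest` is a hypothetical stand-in for the snapshot digest. `sort_keys=True` makes the serialization independent of dict insertion order.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


class FallbackJSONEncoder(json.JSONEncoder):
    """Encode dataclass instances as dicts; fall back to str() for anything else unknown."""

    def default(self, o: Any) -> Any:
        if is_dataclass(o) and not isinstance(o, type):
            return asdict(o)
        return str(o)


def get_hash(value: Any) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so equal inputs serialize identically.
    serialized = json.dumps(
        value, cls=FallbackJSONEncoder, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Digest:
    # Hypothetical stand-in for the snapshot digest of the build context.
    fingerprint: str
    serialized_bytes_length: int


build_args = {"PYTHON_VERSION": "3.9"}
build_env = {"HOME": "/home/user"}
digest = Digest("0" * 64, 1234)

h1 = get_hash((build_args, build_env, digest))
h2 = get_hash(({"PYTHON_VERSION": "3.10"}, build_env, digest))
print(len(h1), h1 != h2)  # 64 True
```

Because anything unknown is turned into a string, serialization effectively cannot fail, but an object whose `str()` contains something unstable (such as a default `repr` with a memory address) would make the hash unstable, so it matters what goes into the hashed inputs.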

kaos · 2021-12-31T08:59:05Z

I'm thinking we may want to state that the hash may change between stable versions of Pants, so as not to put overly restrictive boundaries on what we can do in the future, for instance if we add to the build args/env in ways we don't today.

Eric-Arellano · 2022-01-03T18:29:18Z

That makes sense to me to retain flexibility.

vputz · 2022-01-03T20:45:12Z

Thanks for all the work! This is part of an ongoing question about how to integrate a monorepo workflow into CI/CD, with the general question being both "how do I deploy the minimum real changes to a configuration" and "how can I independently version deployable artifacts".

We have so far been doing the "easy but expensive" route of tagging all our deployable containers with the git hash. This works well, but if for example a source file for a single microservice got changed, this would force redeployment of all microservices. Semver is one option, but relies on human vigilance (which is necessarily fallible).

So that's the motivation for a "hash for the entire build context" of a container. And I agree with what I'm seeing above: "build args, environment, and [source files]", i.e. the full Docker context once it's built (although this may have uses beyond the docker targets, of course).

As noted above, we've just been redeploying EVERYTHING for any git hash change once that branch is marked as approved. So from my perspective and with these goals, if the hash occasionally changes without a real change (such as from a Python version change), that's really not a problem: we'd still say "the system under test behaves normally, deploy all the changed containers" and just do a few more spurious deployments than strictly necessary, since the container itself may not have changed.

So in short, this sounds pretty good to me; it would let us say "okay, a bunch of things changed in the repo, but it's obvious that only these container artifacts really are different and worth deploying" and that could avoid a lot of unnecessary deployments (although updating our CI/CD system to use this would take a while).
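The deployment flow described above could look roughly like this: keep the tag each service was last deployed with, compute the new tags (for example built from `{pants.hash}`), and redeploy only the services whose tag changed. A minimal sketch; the service names, tags and the stored "last deployed" mapping are all hypothetical:

```python
from typing import Dict, List


def services_to_deploy(deployed: Dict[str, str], candidate: Dict[str, str]) -> List[str]:
    """Return services whose candidate image tag differs from the deployed one.

    Services that were never deployed are included as well.
    """
    return sorted(
        name for name, tag in candidate.items() if deployed.get(name) != tag
    )


deployed = {"api": "1.2.3-aaa", "worker": "1.2.3-bbb"}
candidate = {"api": "1.2.3-aaa", "worker": "1.2.3-ccc", "web": "1.2.3-ddd"}
print(services_to_deploy(deployed, candidate))  # ['web', 'worker']
```

As noted in the thread, a spurious hash change (e.g. after a Pants or Python upgrade) would only cause an extra redeployment, never a missed one.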

I really like the idea and look forward to trying it!

Eric-Arellano · 2022-01-03T21:23:34Z

Great, thank you @vputz! I'm comfortable landing this as long as we communicate the limitations to users, like that there are no guarantees for a stable hash across Pants versions.

Expose Docker build context hash for image tag interpolation.

6018405


kaos mentioned this pull request Dec 22, 2021

docker-image does not yet support --target #13934

Closed

kaos requested a review from Eric-Arellano December 22, 2021 16:40

kaos requested a review from stuhood December 23, 2021 13:59

kaos commented Dec 23, 2021


Eric-Arellano reviewed Dec 28, 2021


kaos added 3 commits December 29, 2021 12:51

Merge branch 'main' into docker_build_hash

5ded57a

Lowercase the pants.hash interpolation context keys.

a3faba1

fix test

0ef5b52

Eric-Arellano approved these changes Dec 29, 2021


kaos added 2 commits December 30, 2021 10:29

Merge branch 'main' into docker_build_hash

23dcda9


replace hash() with hashlib.sha256() via a json dump.

41f41ab


kaos requested a review from Eric-Arellano December 30, 2021 10:59

Eric-Arellano reviewed Dec 30, 2021


move get_hash() to docker.utils.

8acf470


Eric-Arellano approved these changes Dec 30, 2021


kaos mentioned this pull request Jan 4, 2022

[Documentation] Add section describing the value interpolation available for Docker #14060

Closed

kaos merged commit 269aa34 into pantsbuild:main Jan 4, 2022

kaos deleted the docker_build_hash branch January 4, 2022 09:02
