Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JVM lockfile invalidation headers and verification #14185

Merged

Conversation

chrisjrn
Copy link
Contributor

@chrisjrn chrisjrn commented Jan 18, 2022

This adds support for JVM lockfile invalidation headers. Most of the infrastructure is shared with the equivalent Python support code.

Closes #13373


Notes for reviewers:

The lockfile format now adds a header of the same form as our Python lockfiles (which have been thoroughly bikeshedded at this stage). Once you've seen one lockfile with contents in the valid_for_requirements field, you've seen them all, and feel free to skip.

There's no new tests, but both the tool and user lockfile invalidation code seems to be hit pretty thoroughly by all the existing tests (note the numerous failures in the commit history :)). If there's any specific behaviour in lockfile_metadata.py that you feel needs testing, please note it in the review.

Christopher Neugebauer added 14 commits January 13, 2022 14:09
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…; deserializes more centrally

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…eading toml lockfiles

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
WIP
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…e to avoid a circuilar import later

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…e to avoid a circuilar import later

[ci skip-rust]

[ci skip-build-wheels]

# Conflicts:
#	src/python/pants/jvm/goals/coursier.py
#	src/python/pants/jvm/resolve/coursier_fetch.py
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Conflicts:
#	3rdparty/jvm/global_scalac_plugins.lock
#	3rdparty/jvm/testprojects.lockfile
#	src/python/pants/backend/codegen/avro/java/avro-tools.default.lockfile.txt
#	src/python/pants/backend/codegen/avro/java/rules.py
#	src/python/pants/backend/codegen/protobuf/scala/rules.py
#	src/python/pants/backend/codegen/protobuf/scala/scalapbc.default.lockfile.txt
#	src/python/pants/backend/codegen/thrift/scrooge/rules.py
#	src/python/pants/backend/codegen/thrift/scrooge/scrooge.default.lockfile.txt
#	src/python/pants/backend/java/dependency_inference/java_parser.lockfile
#	src/python/pants/backend/java/lint/google_java_format/google_java_format.default.lockfile.txt
#	src/python/pants/backend/java/lint/google_java_format/rules.py
#	src/python/pants/backend/scala/dependency_inference/scala_parser.lockfile
#	src/python/pants/backend/scala/lint/scalafmt/rules.py
#	src/python/pants/backend/scala/lint/scalafmt/scalafmt.default.lockfile.txt
#	src/python/pants/backend/scala/subsystems/scalatest.default.lockfile.txt
#	src/python/pants/backend/scala/test/scalatest.py
#	src/python/pants/jvm/goals/coursier.py
#	src/python/pants/jvm/resolve/coursier_fetch.py
#	src/python/pants/jvm/resolve/jvm_tool.py
#	src/python/pants/jvm/test/junit.default.lockfile.txt
#	src/python/pants/jvm/test/junit.py

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn marked this pull request as draft January 18, 2022 17:35
resolved_serialized_lockfile = resolved_lockfile.to_serialized()
metadata = JVMLockfileMetadata.new(requirements)
resolved_serialized_lockfile = metadata.add_header_to_lockfile(
resolved_serialized_lockfile, regenerate_command="./pants generate-lockfiles"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: check if this is correct after Eric's changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be correct once the code gets moved, but presently it isn't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm close to putting up this change, so this is fine.


contents = toml.loads(lockfile_str)
entries = tuple(
CoursierLockfileEntry.from_json_dict(entry) for entry in (contents["entries"])
)
metadata = JVMLockfileMetadata.from_lockfile(lockfile_bytes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was written before I factored out the TOML-parsing code in #14175, this will fail if the metadata header block is not in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if we want to add a super-quick deprecation cycle here, but I don't believe we've put the contents of #14175 in a release yet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't, no need to handle deprecation.


if lockfile.metadata and not lockfile.metadata.is_valid_for(requirements):
raise ValueError(
f"Lockfile for {request.options_scope} was generated with different "
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling code needs to be better (at least as good as Python's :))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the Python tool lockfile error message code is a hundred lines of code with carefully specified error messages. We should probably tackle this as a separate issue, as this PR is already pretty big.

Christopher Neugebauer added 5 commits January 18, 2022 10:35
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
fmt
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn changed the title [WIP/DRAFT] JVM lockfile invalidation headers and verification (for tools only) [WIP/DRAFT] JVM lockfile invalidation headers and verification Jan 18, 2022
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Christopher Neugebauer added 2 commits January 18, 2022 15:39
fmt
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
assert (
Path(rule_runner.build_root, "3rdparty/jvm/default.lock").read_bytes()
== HAMCREST_EXPECTED_LOCKFILE.to_serialized()
compare_lockfiles(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the test -- we're now deserializing the file on disk and comparing to the structure we defined in the test, rather than checking that the structure is byte-for-byte correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better :)

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Christopher Neugebauer added 2 commits January 19, 2022 08:29
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
lockfile_path = jvm.resolves[request.resolve]

# If the lockfile hasn't changed, don't overwrite it.
existing_lockfile_digest_contents = await Get(DigestContents, PathGlobs([lockfile_path]))
if (
existing_lockfile_digest_contents
and resolved_lockfile_serialized == existing_lockfile_digest_contents[0].content
and resolved_serialized_lockfile == existing_lockfile_digest_contents[0].content
Copy link
Contributor Author

@chrisjrn chrisjrn Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I chose different variable names in #14175. Whoops :)

@@ -449,13 +456,22 @@ async def fetch_with_coursier(request: CoursierFetchRequest) -> FallibleClasspat
# TODO: Loading this per JvmArtifact.
lockfile = await Get(CoursierResolvedLockfile, CoursierResolveKey, request.resolve)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: The following code is for validating user lockfiles, and is necessarily done per JvmArtifact, because of how we bubble a resolved classpath up to the consuming code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be good to put in a comment. To clarify, that is because this is the rule for CoursierFetchRequest?

requirement = ArtifactRequirement.from_jvm_artifact_target(request.component.representative)

if lockfile.metadata and not lockfile.metadata.is_valid_for([requirement]):
raise ValueError(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueError seems to be the exception we've been using in this part of the codebase. Let me know if there's anything more appropriate here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably define a new error type, but NBD.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@chrisjrn chrisjrn marked this pull request as ready for review January 19, 2022 16:45
@chrisjrn
Copy link
Contributor Author

Tagging @Eric-Arellano for familiarity with how we handle lockfiles on the Python side, and @stuhood for signoff on actual JVM functionality

@chrisjrn chrisjrn changed the title [WIP/DRAFT] JVM lockfile invalidation headers and verification JVM lockfile invalidation headers and verification Jan 19, 2022
from pants.util.logging import LogLevel
from pants.util.strutil import pluralize

logger = logging.getLogger(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: not necessary

resolved_serialized_lockfile = resolved_lockfile.to_serialized()
metadata = JVMLockfileMetadata.new(requirements)
resolved_serialized_lockfile = metadata.add_header_to_lockfile(
resolved_serialized_lockfile, regenerate_command="./pants generate-lockfiles"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm close to putting up this change, so this is fine.

assert (
Path(rule_runner.build_root, "3rdparty/jvm/default.lock").read_bytes()
== HAMCREST_EXPECTED_LOCKFILE.to_serialized()
compare_lockfiles(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better :)


contents = toml.loads(lockfile_str)
entries = tuple(
CoursierLockfileEntry.from_json_dict(entry) for entry in (contents["entries"])
)
metadata = JVMLockfileMetadata.from_lockfile(lockfile_bytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't, no need to handle deprecation.

@@ -449,13 +456,22 @@ async def fetch_with_coursier(request: CoursierFetchRequest) -> FallibleClasspat
# TODO: Loading this per JvmArtifact.
lockfile = await Get(CoursierResolvedLockfile, CoursierResolveKey, request.resolve)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be good to put in a comment. To clarify, that is because this is the rule for CoursierFetchRequest?

requirement = ArtifactRequirement.from_jvm_artifact_target(request.component.representative)

if lockfile.metadata and not lockfile.metadata.is_valid_for([requirement]):
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably define a new error type, but NBD.

src/python/pants/jvm/resolve/jvm_tool.py Outdated Show resolved Hide resolved
src/python/pants/jvm/resolve/jvm_tool.py Show resolved Hide resolved
Comment on lines -73 to +84
lockfiles=(tool.resolved_lockfile(),),
lockfiles=(lockfile,),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh huh, this was violating the engine by using open() inside lockfile_content

Comment on lines +186 to +192
lockfile = request.lockfile
requirements = await Get(
ArtifactRequirements,
GatherJvmCoordinatesRequest(
request.artifact_inputs, f"[{request.options_scope}].artifacts"
),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to reason about if this should be requesting JvmLockfileRequest or not, and then looking at the request.artifacts. It seems wrong for us to be computing the two requirements in distinct ways. But I'm having a hard time following everything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had some difficulty here. Let's discuss what a refactor would look like elsewhere.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, not a blocker, but this is odd. I think that I might have caused some of the oddity though, because global scalac_plugins have a lockfile, but don't have an associated subsystem. And so ValidatedJvmToolLockfileRequest cannot take a subsystem as an argument, because not all requests have one.

@Eric-Arellano
Copy link
Contributor

Would it make sense to have this PR only add the metadata to the lockfiles, but not yet do anything with them?

I'm having a hard time reasoning about how the validation happens, and it might change once I land very soon unifying coursier-resolve with generate-lockfiles. Also, our consumption of lockfiles is currently broken because we are using open() rather than the engine.

I agree with your intuition that we would likely benefit from refactoring the JVM lockfile code. So I'm thinking it might make sense to do that before changing any consumption code.

Regardless of what we do there, the writing of metadata looks great and seems ready to land. Wdyt?

@chrisjrn
Copy link
Contributor Author

I'm OK with adding the invalidation headers in a separate PR, but I'm hesitant to defer the remainder of the PR for long.

Christopher Neugebauer added 2 commits January 19, 2022 13:15
…ckfile-invalidation

# Conflicts:
#	src/python/pants/jvm/goals/lockfile.py
Comment on lines +60 to +67
def _header_dict(self) -> dict[Any, Any]:
"""Produce a dictionary to be serialized into the lockfile header.

Subclasses should call `super` and update the resulting dictionary.
"""

d = super()._header_dict()
return d
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why override this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to override it in subclasses, and you need to keep super in place. This override here is defensive work to make sure that people who eventually add things into _header_dict in this class don't do it the wrong way.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I find that it is preferable in OO to avoid APIs that require overriding concrete methods, and to stick to completely abstract (or empty) methods or fields. But not a blocker.

@chrisjrn
Copy link
Contributor Author

Following discussion with @Eric-Arellano, we're probably good to merge this as-is. Can you review please, @stuhood?

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed over DM, there are some known issues:

  • Don't use tool.resolved_lockfiles()
  • Figure out how to get this working with scalac_plugins.py
  • Figure out how the consumption rule should be computing the ArtifactRequirements

But this gets us closer to where we want, and it's fine to fast-follow.

Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

It would be good to update the title to indicate that this is only for tool lockfiles (afaict).

Comment on lines 18 to 20
class LockfileScope(Enum):
JVM = "jvm"
PYTHON = "python"
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eric-Arellano : Not a blocker for this change, but this seems like something that should be computed from one of the @unions involved here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe which metadata class to create when generating a lockfile, but nothing else.

Comment on lines +186 to +192
lockfile = request.lockfile
requirements = await Get(
ArtifactRequirements,
GatherJvmCoordinatesRequest(
request.artifact_inputs, f"[{request.options_scope}].artifacts"
),
)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, not a blocker, but this is odd. I think that I might have caused some of the oddity though, because global scalac_plugins have a lockfile, but don't have an associated subsystem. And so ValidatedJvmToolLockfileRequest cannot take a subsystem as an argument, because not all requests have one.

Comment on lines +60 to +67
def _header_dict(self) -> dict[Any, Any]:
"""Produce a dictionary to be serialized into the lockfile header.

Subclasses should call `super` and update the resulting dictionary.
"""

d = super()._header_dict()
return d
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I find that it is preferable in OO to avoid APIs that require overriding concrete methods, and to stick to completely abstract (or empty) methods or fields. But not a blocker.

@chrisjrn
Copy link
Contributor Author

@stuhood This has support for user lockfiles now, too.

@stuhood
Copy link
Sponsor Member

stuhood commented Jan 19, 2022

@stuhood This has support for user lockfiles now, too.

So it does! I think that that means that CoursierResolvedLockfile._coordinate_not_found should be unreachable now.

@chrisjrn chrisjrn merged commit e771911 into pantsbuild:main Jan 19, 2022
Eric-Arellano added a commit that referenced this pull request Jan 25, 2022
…thRequest` (#14241)

To consume a tool lockfile, we before would:

1. Read the lockfile, either by:
     1. Using `ValidatedJvmToolLockfileRequest`, which only works if it's a `JvmToolBase`
     2. Using `GenerateJvmLockfile -> CoursierResolvedLockfile` rule, which we used with `scalac_plugins.py`
2. Pass the lockfile to `ToolClasspathRequest -> ToolClasspath`

The rule in 1.2 was not using the validation added in #14185, and it was generally confusing to have two code paths.

This unifies everything by having `ToolClasspathRequest -> ToolClasspath` now handle reading, validating, and then fetching the lockfile. 

Reading and validating a lockfile requires the same metadata we need for generating a lockfile with `generate-lockfiles` via `GenerateJvmLockfile`. So, we reuse that type for the sake of DRY.

[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

jvm: Add invalidation headers to lockfiles
3 participants