Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include buildtool support for bundling reachability metadata into jar files #322

Merged
merged 4 commits into from
Oct 12, 2022

Conversation

philwebb
Copy link
Contributor

@philwebb philwebb commented Sep 28, 2022

I believe that this request was discussed recently with @bclozel who kindly wrote the following:

Motivation

Currently, Native Build Tools build plugins fetch reachability metadata from the official repository right before building a native image of the application.
While the library in charge of fetching the metadata repository is available and can be reused in different contexts, there is still a non-trivial amount of code required to do the following:

  • get reachability metadata for the libraries in the application classpath
  • apply filtering options described in the plugin configuration

Frameworks using NBT can generate their own reachability metadata before packaging the application as a JAR.
It would be useful if vanilla or shaded JARs could be packaged with all the required reachability metadata (Framework + metadata repository) so that such JARs could be used by CI or cloud infrastructure to generate a native image directly.

This would provide the following advantages:

  • the application JAR is the single source for running the application in JVM mode, or building it as a native binary
  • this enables build reproducibility - without this change, building the JAR as a native binary at different times could yield different results if the shared repo has changed.

Requested behavior

The current change proposal would require a new build tool task, for the Maven and Gradle plugins. Those tasks would:

  1. Fetch the repository, collect the relevant metadata, apply filters and in general the plugin configuration related to this step
  2. Write the resulting metadata to a location configurable in the plugin. By default the location should align with the build tool defaults to pack these files with the resulting JAR.
  3. Be designed to be run independently of other NBT tasks
  4. Be designed to be called by Framework plugins or as a task/goal directly by the developer

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. label Sep 28, 2022
@oracle-contributor-agreement
Copy link

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

@philwebb
Copy link
Contributor Author

There are four commits in this pull-request. The first updates common support code to enable copying. The second adds Maven support and the third adds Gradle support.

The fourth commit is a minor refactoring that attempts to remove duplicated code. It isn't directly related to this pull-request and could be dropped or submitted as another PR if required.

@aclement
Copy link
Collaborator

approved by me as a contribution. Will let the engineers comment on the details :)

@melix
Copy link
Collaborator

melix commented Sep 29, 2022

Thanks for the PR, @philwebb ! I took a quick look at the PR and will submit some review comments. Regarding the feature itself, I think it's a reasonable thing to have an option to copy the metadata in some output directory, even if technically speaking, as we see today, this is not required for native image to build. It's interesting for debugging purposes, and possibly for what you are mentioning. However...

Frameworks using NBT can generate their own reachability metadata before packaging the application as a JAR.
It would be useful if vanilla or shaded JARs could be packaged with all the required reachability metadata (Framework + metadata repository) so that such JARs could be used by CI or cloud infrastructure to generate a native image directly.

I'm still not convinced that bundling metadata in the jar is the right thing to do. I understand that this is aimed at final applications, not libraries, but it could create a precedence for what we ask folks not to do (have a jar provide metadata about transitive libraries).

the application JAR is the single source for running the application in JVM mode, or building it as a native binary

I don't think that stands true. native-image has more options than just metadata. There are command line flags, or ordering issues (order of jars on classpath, ...) which are also to take into consideration. While it may be possible to build a native image with just a single jar, this isn't generally the case. It's even more unlikely if the 3rd party dependencies are repackaged, in which case metadata would be just wrong.

this enables build reproducibility - without this change, building the JAR as a native binary at different times could yield different results if the shared repo has changed.

This is only the case if you don't use the native build tools, right? Because we are reproducible: the repo has a default version, and if not using the default version, you need to provide the repository itself, which means that even if the repository changes, you would still build against the same version of the repository. So I understand this feature is in case you don't use a build tool to build the native image and try to emulate the feature.

I wouldn't recommend, personally, to do this. For Micronaut for example, we provide a way to build native images in Docker, using the same build process, so it doesn't matter if you build locally or on CI. So this feature probably only makes sense if, for some reason, you plan to not use NBT, in which case you need:

  1. the application jar and classpath
  2. native image metadata (this PR) in some form
  3. the whole command line options which would have been used if going through the NBT

So I'm not against this PR, but we should be clearer with regards to what it is suitable for, and I would not recommend to put the metadata in the jar (actually I would suggest to publish another artifact, possibly with a classifier, which only contains metadata, and possibly other "build reproducibility" metadata).

@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Sep 29, 2022
@philwebb
Copy link
Contributor Author

I'm still not convinced that bundling metadata in the jar is the right thing to do. I understand that this is aimed at final applications, not libraries, but it could create a precedence for what we ask folks not to do.

The Maven goal and Gradle task are pretty generic at the moment, the only problem is the defaults that I picked. If we change things so that the destination always needs to be specified, would that be better?

That way we're adding general purpose tasks/goals without guiding the user in potentially bad practices. I can also change the documentation to remove any suggestion of adding files to jars.

This is only the case if you don't use the native build tools, right? Because we are reproducible

Right. If you're always using NBT then the metadata repository is pinned and things are repeatable. The reproducibility only happens if you want some other process to create the native image. In that case, for the other process to work it will either need the metadata, or it will need to know the GAVs of dependencies and the version of the metadata repository to use.

For Micronaut for example, we provide a way to build native images in Docker, using the same build process, so it doesn't matter if you build locally or on CI.

I think that's something we'd like to do eventually as well. There's a lot less ceremony in creating a Docker image for a native executable. It's basically a from JVM and here's a layer with the executable.

There is one benefit of having Source -> Jar -> Native Image, namely that you can take the same jar and create native images for lots of different architectures. As I understand it, you can't currently create an MacOS executable on a Linux OS.

So I'm not against this PR, but we should be clearer with regards to what it is suitable for, and I would not recommend to put the metadata in the jar (actually I would suggest to publish another artifact, possibly with a classifier, which only contains metadata, and possibly other "build reproducibility" metadata).

If we make the goal/task general purpose then I think folks could use it to create copy the files wherever they see fit. I suspect it would be quite easy to configure a Gradle build to create a distinct artifact. I imagine it's also possible with Maven, albeit with more configuration.

Thanks for taking the time to look it. I'll await your feedback then update the PR.

Copy link
Collaborator

@melix melix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added comments on the implementation for the Gradle plugin, I haven't looked at the Maven side yet (possibly @alvarosanchez can look?).

@melix
Copy link
Collaborator

melix commented Oct 4, 2022

As long as we don't do something by default for the jar (or tasks of type Jar), I think we're good.

If we make the goal/task general purpose then I think folks could use it to create copy the files wherever they see fit. I suspect it would be quite easy to configure a Gradle build to create a distinct artifact.

Yes, that would be pretty straightforward.

I imagine it's also possible with Maven, albeit with more configuration.

No idea.

@philwebb
Copy link
Contributor Author

philwebb commented Oct 5, 2022

@melix I've forced pushed an update addressing the previous review comments. The only one I'm not too sure about is registering the task by default. I've gone with a simple getTasks().register(...) call and left the conventions as a configureEach callback. I hope that's what you meant, but let me know if I've got it wrong.

I picked nativeReachabilityMetadata as that task name. There's already nativeBuild, nativeTest, nativeBuild, nativeTestBuild so the native prefix seemed sensible. I noticed that there's also a metadataCopy task that's registered. I wonder if at some point this should be renamed to something like nativeAgentMetadata or nativeAgentMetadataCopy?

@sdeleuze
Copy link
Collaborator

sdeleuze commented Oct 5, 2022

Not sure I would prefix pure metadata generation by native since this prefix implies a native compilation and metadata are designed to be not tied to native, but I agree that's a grey area given the name of the plugin. I think I would be fine with metadataGenerate or something like that, up to @melix to decide.

I will discuss the reachability-metadata.properties thing with @vjovanov and share my feedback here later, see my related comment here.

@melix
Copy link
Collaborator

melix commented Oct 5, 2022

It happens that we already have a metadataCopy task (https://github.com/oracle/graalvm-reachability-metadata/blob/master/docs/CollectingMetadata.md) which is aimed at copying metadata collected by the agent into the local sources. It's unfortunate that we have a name clash because this would have made sense to use this task name.

So maybe, for this new task, we should use a different name like collectReachabilityMetadata?

@sdeleuze
Copy link
Collaborator

sdeleuze commented Oct 5, 2022

What about metadataGenerate since here it is IMO more a generation than a copy? Also that would be closer to current naming pattern.

@melix
Copy link
Collaborator

melix commented Oct 5, 2022

What about metadataGenerate since here it is IMO more a generation than a copy? Also that would be closer to current naming pattern.

The task already exists as metadataCopy, so it would be a breaking change. One could argue that it is a copy, since its responsibility is to copy the metadata from the agent output directory into another place. So I tend to think it's ok to keep metadataCopy as is today, and use collectReachabilityMetadata for the new task, which has a more explicit meaning imo.

@philwebb
Copy link
Contributor Author

philwebb commented Oct 5, 2022

I like collectReachabilityMetadata. I'll update the PR and take another look at the naming used in the Maven plugin. Thanks @melix!

@sdeleuze
Copy link
Collaborator

sdeleuze commented Oct 5, 2022

After discussing with @vjovanov, I think we should move forward with what you implemented @philwebb with reachability-metadata.properties

Update `DirectoryConfiguration` with a new `copy` method that
can be used to copy matadata files into `META-INF/native-image`
directories.

The `DirectoryConfiguration` class now has group, artifact and
version information so that it can write files into the
appropriate folder. The `copy` method will also write a
`reachability-metadata.properties` file that indicates if the
'override' property was set on the metadata.
@philwebb
Copy link
Contributor Author

philwebb commented Oct 6, 2022

Forced push with the suggested task name.

@melix
Copy link
Collaborator

melix commented Oct 10, 2022

@alvarosanchez do you have time to review the Maven side of this PR or should I give it a try?

docs/src/docs/asciidoc/maven-plugin.adoc Outdated Show resolved Hide resolved
Update the maven plugin with a new `add-metadata-hints` goal that can
be used to bundle hints obtained from the metadata repository into
the jar.
Update the gradle plugin with a new `ReachabilityMetadataCopyTask`
class that can be used to copy hints obtained from the metadata
repository.
Refactor the `configureJvmReachabilityConfigurationDirectories`
and `configureJvmReachabilityExcludeConfigArgs` to use a common
method since they share the same logic.
@melix melix added this to the 0.9.15 milestone Oct 12, 2022
@melix melix merged commit a4592a7 into graalvm:master Oct 12, 2022
@philwebb
Copy link
Contributor Author

Thanks everyone for the feedback and fast turnaround.

@sdeleuze
Copy link
Collaborator

Thanks to you for this epic contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants