Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embed a graal native-image configuration inside the zinc wrapper #7506

Conversation

@cosmicexplorer
Copy link
Contributor

commented Apr 5, 2019

Problem

native-image is a tool from the Graal project which can compile jvm bytecode to a native executable with very short startup time.

As of 1.0.0rc13, native-image became able to use special files in META-INF/ to configure the command-line arguments and json configuration it needed to run, described in https://medium.com/graalvm/simplifying-native-image-generation-with-maven-plugin-and-embeddable-configuration-d5b283b92f57. This allows us to publish a zinc artifact which works the exact same when invoked by java, but can also be made into a native-image without any further configuration.

Solution

  • Add the reflective/resource access json files into META-INF/ in the zinc jar, as well as substitutions.
  • Add descriptive readmes linking to external documentation and blog posts about Graal which were used to construct the configuration being added.

Result

#8036 (once the artifacts from this PR are published to maven central) will be able to remove an extra git clone from its script which does all of the configuration automatically!

@illicitonion
Copy link
Contributor

left a comment

Exciting stuff! I would appreciate a lot more documentation so that I can hopefully make changes in the future as needed, rather than having to ask you to :)

Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/zinc/zinc_compile.py Outdated
Show resolved Hide resolved src/python/pants/backend/jvm/tasks/jvm_compile/zinc/zinc_compile.py Outdated
Show resolved Hide resolved src/python/pants/java/executor.py Outdated
Show resolved Hide resolved src/python/pants/java/executor.py Outdated
@@ -1,6 +1,9 @@
# Copyright 2017 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

# NB: To build a native-image of zinc *for a specific scalac version*, the scala path

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 5, 2019

Contributor

Currently these can be specified at runtime; does this mean we need to stop accepting these flags (both in pants and in our zinc wrapper)?

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 5, 2019

Member

If you passed the flags to pants, you would presumably want a new image bootstrapped.

But yea, would need to remove the flags from zinc.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 7, 2019

Author Contributor

What #6893 does is generate native images on demand, and I almost but didn't quite get to just making that into its own executor yet. I think the approach of making a separate execution strategy for this will work and can be done at runtime. This is still an open question right now.

.withLogLevel(settings.consoleLog.javaLogLevel)
.withPositionMapper(positionMapper)
)
val reporter = ReporterUtil.getReporter(DummyLogger, ReporterManager.getDefaultReporterConfig)

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 5, 2019

Contributor

Do we understand what the root cause is that meant we needed to do this, so that we can be confident that there isn't other code in Zinc that will fail, or behave differently, which we may not have hit in testing yet?

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 7, 2019

Author Contributor

The normal logging actually works now after making a very simple substitution (see the TODO I've added in the PR description), but adds a whopping 5 seconds to startup time compared to the DummyLogger. But I intend to add very clear documentation on how every build or runtime failure was investigated and worked around before I mark this un-WIP.

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 15, 2019

Contributor

Do we know why using the standard logger adds 5 seconds of startup time? That sounds scary...

@@ -0,0 +1,29 @@
package scala.tools.nsc.substitutions;

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 5, 2019

Contributor

Can you add a comment at the top of this file (or in a README) explaining what it does, how these substitutions were reached at, and when people should add/remove things to this file?

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 5, 2019

Member

I'm not sure that we're going to be able to publish things in this package to maven-central.

If they need to live at this package location, we might need to dynamically add them to the image at native-image build time...?

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 7, 2019

Author Contributor

I'm 90% sure that the package names are arbitrary, this was ripped from the example at https://github.com/graalvm/graalvm-demos/tree/master/scala-days-2018/scalac-native and this package doesn't actually exist in scalac.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 7, 2019

Author Contributor

Also, I intend to add very clear documentation on how every build or runtime failure was investigated and worked around before I mark this un-WIP.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 12, 2019

Author Contributor

I have added a README.md describing where these substitutions came from and how they are used.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 12, 2019

Author Contributor

I have also modified the package names to be nested in the pantsbuild package namespace.

Show resolved Hide resolved ...a/META-INF/native-image/org/pantsbuild/zinc/compiler/reflect-config.json
Show resolved Hide resolved ...TA-INF/native-image/org/pantsbuild/zinc/compiler/native-image.properties
@stuhood
Copy link
Member

left a comment

Thanks!

Show resolved Hide resolved src/scala/META-INF/native-image/org/pantsbuild/zinc/compiler/README.md Outdated
@@ -1,6 +1,9 @@
# Copyright 2017 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

# NB: To build a native-image of zinc *for a specific scalac version*, the scala path

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 5, 2019

Member

If you passed the flags to pants, you would presumably want a new image bootstrapped.

But yea, would need to remove the flags from zinc.

@@ -0,0 +1,29 @@
package scala.tools.nsc.substitutions;

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 5, 2019

Member

I'm not sure that we're going to be able to publish things in this package to maven-central.

If they need to live at this package location, we might need to dynamically add them to the image at native-image build time...?

@cosmicexplorer cosmicexplorer changed the title embed a graal native-image configuration inside the zinc wrapper [WIP] embed a graal native-image configuration inside the zinc wrapper Apr 7, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2019

Marked this WIP, which I should have done in the first place (sorry!). Will remove the changes to pants code and move out the testing into a separate script. Now that graal 1.0.0rc15 is released (which has a fix needed to build zinc from oracle/graal#1071), we can probably improve on this to actually add testing. I will ping reviewers when this is ready for review again.

@cosmicexplorer cosmicexplorer moved this from Done to In progress in native-image for remote execution Apr 8, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch from 3831144 to 8c0048b Apr 12, 2019

@cosmicexplorer cosmicexplorer changed the title [WIP] embed a graal native-image configuration inside the zinc wrapper embed a graal native-image configuration inside the zinc wrapper Apr 12, 2019

@cosmicexplorer cosmicexplorer requested review from illicitonion and stuhood Apr 12, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Apr 12, 2019

Tentatively un-marking this as WIP after making a push to document how each file is created.

@stuhood
Copy link
Member

left a comment

Looks great, thanks.

@@ -12,6 +15,7 @@ scala_library(
'3rdparty/jvm/com/martiansoftware:nailgun-server',
'3rdparty/jvm/org/scala-lang/modules:scala-java8-compat',
'3rdparty/jvm/org/scala-sbt:io',
'3rdparty/jvm/org/scala-sbt:sbt-compiler-bridge',

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 12, 2019

Member

Since we're planning to build the image dynamically, we should probably not add the compiler-bridge dep here... or we should add a separate target in this file (zinc-for-native) that you would use if you wanted to build the image for a particular scala version.

So: yes to config being added to the main target, but no to adding the bridge?

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 12, 2019

Author Contributor

So I might be doing this wrong, but I believe the difference between this and the scala compiler jars (which are added to the command line which actually creates the native image) is that the sbt compiler bridge is not pinned to a specific scala patch version (in the way that e.g. org.scala-lang:scala-library:2.12.3 is pinned). We also want to make sure that the compiler-bridge version is the same as the zinc version, which we get if we add it to the jar here, but would have to somehow ensure if we added it later. In terms of "where the classpath entries should be located", I think inserting this dep in here makes sense, as it's definitely going to be needed anyway at build time, and doesn't depend on the (specific) scala version we're compiling for.

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 12, 2019

Member

The zinc bootstrapper takes the sources of the jar, and then compiles them on a language-by-language basis:

opt[File]("compiler-bridge-src")
.required()
.valueName("<file>")
.action((x, c) => c.copy(compilerBridgeSource = x))
.validate { file =>
if (file.exists) success else failure(s"$file does not exist.")
}
.text("Compiler bridge source code.")

I think the effect of what you're doing here is to embed a pre-compiled version... but it will be specific to whatever version of scala you are currently using.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 14, 2019

Author Contributor

Ok. I'll have to add a method for additional classpath entries at build time just to inject the scala compiler/library/etc, so there's no reason that wouldn't be able to contain the compiler-bridge jar.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 14, 2019

Author Contributor

but it will be specific to whatever version of scala you are currently using.

But also, how does coursier/etc resolve this then, if the jars are only marked with _2.12, and the version (1.1.7) isn't related to the scala version? I don't understand what information coursier has that could let it be attached to a single scalac version.

@stuhood

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

(love the readme by the way: thanks!)

@illicitonion
Copy link
Contributor

left a comment

This looks great, thanks!

I'm still super curious about the logging initialisation time - I'm kind of scared that we'll find that some other zinc operation takes a long time that we haven't happened to profile... Would be interested to know what is happening in those 5 seconds...

The Graal VM's `native-image` tool[^1] converts JVM bytecode to a native compiled executable[^2] executing via the Substrate VM. The `native-image` tool is not yet immediately compatible with all JVM code[^3], but the tool is stable and featureful enough to successfully build many codebases with a mixture of (mostly) automated and (some) manual configuration.

The `native-image` tool accepts JSON configuration files which overcome some of the current limitations[^3]:
1. `reflect-config.json`: The largest file, this would typically be generated automatically[^5].

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 15, 2019

Contributor

Can you add the literal command lines you used to generate these files, so that I can re-generate them if I want to by copying and pasting those exact command lines? (And so that I can easily work out what manual edits were made, by running diff :))

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 16, 2019

Author Contributor

I would need to regenerate these files by running the java agent again on linux, as the reflect configuration currently contains:

  1. output from a run of zinc -help with the java agent from a previous graal release
  2. manual edits
  3. output from a run of zinc as used in a pants compile command line with the java agent from the current graal release

(3) was created by running ./pants -ldebug binary zinc:compiler, stealing one of the command lines, then adding the java agent to the command line. Some of the entries in the reflect-config.json which were produced from the runs with the java agent had to be removed, as they caused failures (NoClassDefFound) when building native-image. See this comment on the graal PR required to make building zinc work: oracle/graal#1071 (comment) -- the java agent is "still in an experimental state", and the output does change across versions, and while jq --sort-keys . will sort keys of individual objects, it doesn't sort different entries of a JSON array in the output, and the output of the native-image-configure tool is not guaranteed to be deterministically sorted.

Also, there's no guarantee we're exercising all the right logic paths -- see https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md:

It is advisable to manually review the generated configuration files. Because the agent observes only code that was executed, the resulting configurations can be missing elements that are used in other code paths. It could also make sense to simplify the generated configurations to make any future manual maintenance easier.

If we were to attempt to describe an automated approach, I think we would need to change the approach to generating these to use the environment variable approach mentioned in https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md to invoke zinc exactly the way pants is, and then to use the native-image-configure tool on the output generated from that.

If the result works, then great. But I'm not sure that given the current instability of the native-image-configure tool and surrounding code that it would be worth the time to create that harness right now. I can provide the list I gave above on how this was generated, at least.

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 16, 2019

Contributor

The thing I'm scared of here is that if we need to re-generate them (say, graal fixes a bug, or we update zinc, or we pick up a new compiler), it sounds like it would be hard for you to re-make these files, let alone for me to... And it also sounds like I would need to re-work-out all of the lessons you already learnt to do the manual changes, because I can't know what's different because graal changed vs what's different because you manually modified it.

I'm not saying that I need a script that produces the files exactly as-is, but a script that I can run which does everything except the manual tweaks you made, so that I can run it and run git diff against what's checked in to see what you manually did, would be really really useful.

It sounds like this would take the form roughly of an invocation to the java agent tool (at a particular version), with a single zinc command line invocation, at a particular pants sha, and then piping that to jq --sort-keys .? Can we try to write down that command line?

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 19, 2019

Author Contributor

Yes -- and because I've found that the java agent which generates these files actually works great on OSX (even though it says it's only supported for Linux), and also that it supports the config-merge-dir option (which automates the process of generating a config across multiple invocations), producing these command lines will be significantly easier than I expected.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch 2 times, most recently from d6f7c4a to 1739d53 May 21, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch 2 times, most recently from 2bfee88 to 2ffca9a Jun 2, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch 2 times, most recently from 0196723 to 7edf270 Jun 27, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch from 7edf270 to ba08d63 Jul 11, 2019

make the zinc-compiler binary native-image compatible
sort json files

this doesn't work at all

the `+ args` part is what made it work

add dummy logger

enable using the normal sbt logger by copying sbt logging code into a substitution

Revert "enable using the normal sbt logger by copying sbt logging code into a substitution"

This reverts commit afcb8ab.

update DummyLogger to BareBonesLogger

remove hacks for testing

add some ~documentation~

appropriately namespace the substituted classes

add a public class to the java substitutions library to avoid a javadoc error

remove sbt-compiler-bridge from the zinc jar

add hacks

add LD_LIBRARY_PATH hacks

HACK

add options to zinc compile to use a provided native-image binary

remove all the configs!

remove commented-out code!

remove devel zinc jar lib decl!

remove old native-image hacks

remove accidentally-committed files

revert more
add hacky protobuf override to match source to avoid bytecode error w…
…hen native-image building -- DO NOT PUSH YET

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch from ba08d63 to 4bb8cb7 Jul 12, 2019

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor Author

commented Jul 14, 2019

This appears to be getting bit by #7860 again -- will figure out what to do with that before merging.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch from 0e0ae1e to 4bb8cb7 Jul 15, 2019

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:native-image-ready-zinc-wrapper branch from 4bb8cb7 to 1fb3183 Jul 15, 2019

@cosmicexplorer cosmicexplorer merged commit 534bed4 into pantsbuild:master Jul 15, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

native-image for remote execution automation moved this from In progress to Done Jul 15, 2019

stuhood added a commit to twitter/pants that referenced this pull request Jul 16, 2019

stuhood added a commit that referenced this pull request Jul 16, 2019

Add provides clauses for new zinc deps. (#8055)
In order to publish zinc, its new dependencies from #7506 must name their artifacts via provides clauses.

cosmicexplorer added a commit that referenced this pull request Jul 24, 2019

add script to generate zinc native-images, with example usage (#8036)
### Problem

A script to automatically create zinc native-images from pants for JVM code is currently being created in #7506. The script will infer macro usages from the compiler in an initial run with the native-image java agent, described in https://github.com/oracle/graal/blob/master/substratevm/CONFIGURE.md. This requires providing a library built from the graal repo on the runtime library search path.

### Solution

- Pipe in `LD_LIBRARY_PATH` or `DYLD_LIBRARY_PATH` into the hermetic zinc invocation.
- Pull down graal and other dependencies by `git clone` for now.
- Use `./pants classmap` to get the targets providing the classes corresponding to macros, and generate a `BUILD` file with a fake `jvm_app()` bundling all those targets.

### Result

A script works to generate zinc images for arbitrary scala code!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants
You can’t perform that action at this time.