graalvm/taming-build-time-initialization

Taming Build-Time Initialization in Native Image

Why Build-Time Initialization?

Better Peak Performance

By the semantics of Java, accessing classes, methods, or fields can trigger class initialization. In just-in-time (JIT) compilers this introduces no performance overhead: every class referenced in the compiled code is already initialized because the interpreter has executed that code before.

In ahead-of-time compilers such as GraalVM Native Image, class-initialization checks cannot be removed, as this would break Java semantics. For example, a simple field access is translated into a class-initialization check followed by the field access, e.g.,

Math.PI

will become

if (!Math.class.isInitialized) { // hidden field in Native Image intrinsic
  initialize(Math.class)         // invocation of an intrinsic function
}
Math.PI

The performance overhead of extra checks becomes particularly obvious in hot code (e.g., tight loops). If the class Math is initialized at build-time, the extra check is not necessary and the code will be as performant as when using the JIT compiler.

An example of performance-critical code where initialization is a problem can be found here.
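The lazy initialization that these checks implement can be observed on a regular JVM. The following is a minimal sketch, not taken from the repository; the names LazyInitDemo and Constants are hypothetical. It shows that the static initializer runs on the first access inside the loop, and only once, which is why every access has to be guarded conceptually.

```java
import java.util.ArrayList;
import java.util.List;

class LazyInitDemo {
    // Records initialization events in the order they happen.
    static final List<String> EVENTS = new ArrayList<>();

    static class Constants {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final double SCALE = Double.parseDouble("2.0");

        static {
            EVENTS.add("Constants.<clinit>");
        }
    }

    static double sum(int iterations) {
        double total = 0;
        for (int i = 0; i < iterations; i++) {
            // Conceptually, each access is preceded by an "is Constants initialized?" check.
            total += Constants.SCALE;
        }
        return total;
    }

    public static void main(String[] args) {
        EVENTS.add("main started");
        System.out.println(sum(3));
        // "main started" is recorded before "Constants.<clinit>".
        System.out.println(EVENTS);
    }
}
```

A JIT compiler can drop the check once the class is known to be initialized; an ahead-of-time compiler can drop it only if the class is already initialized when the image is built.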

Smaller Binary and Less Configuration

Class initializers can pull a lot of unnecessary code into the resulting native image, even though the application would be functional without it. A good example is Netty, where certain classes traverse all methods just to reach a single declaration and store it into the image.

Netty is currently initialized at build time. In the past this has caused many issues with incorrect cross-library initializations. To address this issue we made a PR to change the default initialization of Netty to run time, but the results were somewhat disappointing: the Netty "Hello, World!" application grew from 15 MB to 20 MB in binary size. The extra necessary configuration grew by more than 5x, since most of the reflection that needs configuration happens in static initializers.

Faster Startup via Heap Snapshotting

When a class is initialized at image build time, its static fields are saved to the image heap in the generated executable. When the application starts up, this saved heap is mapped into memory with almost no overhead.

Parse Configuration at Build Time

Heap snapshotting can be used, for example, to parse and load configuration or static data at image build time.

In the config-initialization example, a big list of (fake!) employee accounts in the JSON format is parsed in the static initializer of ConfigExample. By initializing this class at build time, we avoid the overhead of parsing this configuration file at runtime.

Data in this sample was generated using https://www.json-generator.com/.
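The pattern can be sketched without Jackson. The class below is a hypothetical stand-in for ConfigExample: the class name, the inline data, and the key=value format are assumptions made for illustration. All parsing happens in the class initializer, so if the class is initialized at build time (for example via --initialize-at-build-time=ConfigSketch), only the finished map would end up in the image heap.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigSketch {
    // Stand-in for the contents of a configuration file (assumption for this sketch).
    private static final String RAW = "name=demo;workers=4;region=eu";

    // Parsed once, in the class initializer. With build-time initialization,
    // this map is part of the heap snapshot and no parsing happens at run time.
    static final Map<String, String> SETTINGS = parse(RAW);

    static Map<String, String> parse(String raw) {
        Map<String, String> result = new HashMap<>();
        for (String entry : raw.split(";")) {
            String[] keyValue = entry.split("=", 2);
            result.put(keyValue[0], keyValue[1]);
        }
        // Unmodifiable copy: the snapshot should not be mutated later.
        return Map.copyOf(result);
    }

    public static void main(String[] args) {
        System.out.println("workers = " + SETTINGS.get("workers"));
    }
}
```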

Context pre-Initialization for GraalVM Languages

Another good place to use heap snapshotting is pre-initialization of language contexts. For example, in GraalVM JS the first context is initialized and stored into the JavaScript image. This makes "Hello, World!" in JS more than 55% less expensive. With the context pre-initialized we have 5,367,730 instructions executed

$ valgrind --tool=callgrind ../jre/bin/js -e 'print("Hello, World!")'
...
==1729206==
==1729206== I   refs:      5,367,730

while without the context stored in the image we have 12,101,651

$ valgrind --tool=callgrind ../jre/bin/js-no-context -e 'print("Hello, World!")'
...
==1729206==
==1729206== I   refs:      12,101,651

The results are even better for Ruby where we have a reduction from 56 ms to 14 ms with the pre-initialized context.

Rules of Build-Time Initialization and Heap Snapshotting

Types of Classes in GraalVM Native Image

In GraalVM Native Image there are three possible initialization states for each class:

  1. BUILD_TIME - marks that a class is initialized at build time and that all of its reachable static fields are saved in the image heap.
  2. RUN_TIME - marks that a class is initialized at run time and that all static fields and the class initializer will be evaluated at run time.
  3. RERUN - internal state for a class that was initialized at build time by accident. Its static fields and class initializer will be evaluated at run time.

Properties of Build-Time Initialized Classes

  1. All classes of objects stored in the image heap must be initialized at build time. This is necessary because calling a virtual method on such an object could otherwise execute code in a class that does not have a consistent state, as its static initializer has not been executed.
  2. All super classes, and super interfaces with default methods, of a build-time class must be build-time as well.
  3. Code reached through the class initializer of a build-time class must be marked as either BUILD_TIME or RERUN. In the example of JSON parsing at build time, most of the Jackson library is initialized at build time.
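Rule 2 follows from plain Java semantics: initializing a class first initializes its superclass. A minimal sketch on a regular JVM, with hypothetical class names (InitOrderDemo, Base, Derived):

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Records the order in which the class initializers run.
    static final List<String> ORDER = new ArrayList<>();

    static class Base {
        static {
            ORDER.add("Base");
        }
    }

    static class Derived extends Base {
        // Non-final static field: reading it initializes Derived.
        static int value = 42;

        static {
            ORDER.add("Derived");
        }
    }

    static int touchDerived() {
        // Initializes Base first, then Derived.
        return Derived.value;
    }

    public static void main(String[] args) {
        System.out.println(touchDerived());
        System.out.println(ORDER);
    }
}
```

Because Base is always initialized as a side effect of initializing Derived, Derived cannot be a build-time class unless Base is one too.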

Proving a Class is Build-Time Initialized

The default for GraalVM Native Image is that classes are initialized at run time. However, for performance reasons, Native Image will prove certain classes safe to initialize at build time and will initialize them then anyway.

Proving Safe Initialization During Analysis and After Analysis

GraalVM Native Image can prove classes safe in two places:

  1. During Analysis - all of the static fields will be folded during analysis and the resulting image size can be smaller. These proofs work on simple class initializers without recursion or cyclic dependencies.
  2. After Analysis - the fields will not have an effect on static analysis.

The best place to see the types of classes that can be proven early is the test for class initialization.

Limitations of Heap Snapshotting

Not every object can be stored in the image heap. The major categories of disallowed objects are the ones that keep state from the build machine:

  1. Objects containing build-system information, e.g., open files (java.io.FileDescriptor).
  2. Objects containing host VM data, e.g., running threads and continuations.
  3. Objects containing pointers to native memory (e.g., java.nio.MappedByteBuffer).
  4. Known random seeds (it is impossible to prove that no random seed ends up in the image).

Properties for Run-Time Classes

  1. All sub-classes of a run-time class (or of a run-time interface with default methods) must also be run-time classes. Otherwise, initialization of the sub-class would also initialize the run-time class. (This is the inverse of the rule for build-time initialization.)

  2. Run-Time initialized classes must not end up in the image heap.

Hidden Dangers of Class Initialization

Security Vulnerabilities: Cryptographic Keys, Random Seeds, etc.

Storing security-sensitive information such as private keys or having a PRNG in static fields of classes initialized at build time is a recipe for trouble. The keys in such classes would remain in the image executable, readily discoverable by Eve. PRNGs in static fields initialized with a random seed during the image build would always use the same seed, leading to the exact same sequence of numbers being generated in every application run.

In the security-problems example, the following problematic fields are initialized at build time:

public class SecurityProblems {
   ...
   // Will "bake" the private key found during the image build into the image!
   private static final PrivateKey runtimeSuppliedPrivateKey = loadPrivateKey();

   // Will always contain the same random seed at image runtime!
   private static final SimplePRNG randomNumberGenerator = new SimplePRNG(System.currentTimeMillis());
   ...
}

The bytes of the private key will be embedded in the image heap, and while it may take a bit of time to analyze the executable, it is possible to retrieve and compromise it. The simple random number generator will be initialized with a random seed (never use the current time as the random seed in a real app!) at image build time. This seed will not change between subsequent runs:

$ ./target/org.graalvm.securityproblems
An entirely random sequence: 2843 5686 3435
$ ./target/org.graalvm.securityproblems
An entirely random sequence: 2843 5686 3435
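The effect can be reproduced on a regular JVM by fixing the seed. The sketch below uses java.util.Random rather than the example's SimplePRNG, and the class name SeedSketch is hypothetical. It also shows one possible remedy: creating the generator lazily, so that the seed is chosen on first use at run time instead of in the class initializer.

```java
import java.util.Arrays;
import java.util.Random;

class SeedSketch {
    // Simulates a generator whose seed was fixed when the image was built:
    // the same seed always produces the same sequence.
    static int[] sequence(long seed, int count) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(10_000);
        }
        return values;
    }

    // Possible remedy: the field stays null during class initialization,
    // so no seed is captured in a heap snapshot.
    private static Random lazyRandom;

    static synchronized Random random() {
        if (lazyRandom == null) {
            lazyRandom = new Random(); // seeded when first called
        }
        return lazyRandom;
    }

    public static void main(String[] args) {
        // Both lines print the same numbers, like the two runs above.
        System.out.println(Arrays.toString(sequence(7L, 3)));
        System.out.println(Arrays.toString(sequence(7L, 3)));
        System.out.println(random().nextInt(10_000));
    }
}
```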

Host-Machine Data Leakage

Storing paths in static fields of classes initialized at build time can leak information about the machine used to build the image. A prime example of this is storing System.getProperty("user.home") in a static field. However, contents of any file or directory structure that is saved into the image heap can fall into this category.

In the security-problems example, the following problematic field is initialized at image build time:

public class SecurityProblems {
   ...
   // Will leak the home directory of the user building the image!
   private static final String USER_HOME = System.getProperty("user.home");
   ...
}

Regardless of where the final image is executed, USER_HOME will always contain the user.home path on the original machine used to build the image. A basic check for these directories in the image heap is provided and can be enabled with -H:+DetectUserDirectoriesInImageHeap.
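One way to avoid capturing the value is to read the property when it is needed instead of in a static field. The sketch below is illustrative only; UserHomeSketch and its members are hypothetical names, and it runs on a regular JVM to show the difference between the two styles.

```java
class UserHomeSketch {
    // Problematic style: evaluated once, when the class is initialized.
    // With build-time initialization this would be the build machine's path.
    static final String CAPTURED_AT_INIT = System.getProperty("user.home");

    // Safer style: evaluated on every call, so it reflects the machine
    // on which the code is actually running.
    static String userHome() {
        return System.getProperty("user.home");
    }

    public static void main(String[] args) {
        System.out.println("captured at class initialization: " + CAPTURED_AT_INIT);
        System.out.println("read on demand: " + userHome());
    }
}
```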

Correctness

Read a Property from the Build Machine and Always use it in Production

Let us look at InetAddress, where the IP preference is determined in the class initializer:

static {
    String str = java.security.AccessController.doPrivileged(
            new GetPropertyAction("java.net.preferIPv6Addresses"));
    if (str == null) {
        preferIPv6Address = PREFER_IPV4_VALUE;
    } else if (str.equalsIgnoreCase("true")) {
        preferIPv6Address = PREFER_IPV6_VALUE;
    } else if (str.equalsIgnoreCase("false")) {
    ...

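This class was initialized at build time in Native Image, which caused a bug: the preference read on the build machine was stored in the image and always used in production.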

Simple Code Changes can Cause Unintended and Unknown Correctness Problems

Correctness problems appear if a read of a system property is introduced anywhere in the code that is reachable from a static initializer.

The writer of the code can't know whether the property will be read from a static initializer. For example, the writer of ReadPropertyHolder does not know who could use this class during build-time initialization.

This is especially problematic when initialization crosses library boundaries.

Crossing the Library Boundaries

Initializing classes at build time in one library can unintentionally ripple and wrongly initialize classes in a different library. The most widespread victims of cross-library initialization are logging libraries.

Most Java frameworks have the following structure:

public class MyBuildTimeInitClass {
   ...
   private static final Logger logger = MyFrameworkLogFactory.getLogger(MyBuildTimeInitClass.class);
   ...
}

If the underlying logging library is configurable by the user, build-time initialization of the above class would wrongly initialize classes of the selected logging library at build time.

Code Compatibility

Initializing Run-Time Classes Unintentionally as a Consequence of Build-Time Initialization

Parsing the configuration at build time comes with a major caveat: in the config-initialization example, the library used to parse the data, Jackson, must not be referenced by any code at run time. Doing so will result in:

com.fasterxml.jackson.databind.SerializationConfig was unintentionally initialized at build time. To see why com.fasterxml.jackson.databind.SerializationConfig got initialized use --trace-class-initialization=com.fasterxml.jackson.databind.SerializationConfig
com.fasterxml.jackson.annotation.JsonSetter$Value was unintentionally initialized at build time. To see why com.fasterxml.jackson.annotation.JsonSetter$Value got initialized use --trace-class-initialization=com.fasterxml.jackson.annotation.JsonSetter$Value

The proposed fix for this problem is to allow classes that were initialized at build time but are not marked with --initialize-at-build-time to be used at run time, without affecting the state of the classes that were proven safe by the image builder.

Storing an Object of a Run-Time Initialized Class in the Image Heap

This can happen across library boundaries through values returned by regular functions (possibly written by a third party).

Changing a Class from Build Time to Run Time is a Backwards Incompatible Change

  1. Explicit changes in the configuration. See, for example, the changes in Netty that occurred over time. Each was a breaking change for the rest of the community.
  2. Modifying code so that it can't be initialized at build time anymore, e.g., storing disallowed heap objects in build-time classes.

Image Bloating by Using Inadequate Data Structures

In the config-initialization example, the collections holding the parsed data will be written to the image heap in the executable. Such collections will introduce size overhead:

  • Size of the image with parsing the config at build time: 58 MB
  • Size of the image without parsing the config at build time: 30 MB
  • Size of the data file: 15 MB
  • Total overhead: 58 MB - 30 MB - 15 MB = 13 MB

To fix this, it is recommended to use lean data structures (e.g., EconomicMap from Graal or trimmed ArrayLists).
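As a small illustration of the "trimmed ArrayList" suggestion, the sketch below builds a list incrementally and then calls ArrayList.trimToSize() so that the backing array holds no unused slots before the list would be written to the image heap. The class name LeanListSketch is hypothetical, and the actual saving depends on the data.

```java
import java.util.ArrayList;

class LeanListSketch {
    static ArrayList<Integer> buildTrimmed(int count) {
        ArrayList<Integer> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Growing the list step by step leaves spare capacity in the backing array.
            values.add(i);
        }
        // Shrinks the backing array to exactly values.size() elements.
        values.trimToSize();
        return values;
    }

    public static void main(String[] args) {
        System.out.println(buildTrimmed(5));
    }
}
```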

Build-Time Class Initialization Without Regret

Inspecting the Results of Build-Time Initialization

To see how and where a class got initialized, we introduce the flag -H:+PrintClassInitialization. For each class, this flag outputs where the decision is coming from and why the class got initialized. The output is a CSV file that lists the initialization kind and the reason for each class, for example:

Class Name, Initialization Kind, Reason for Initialization
boolean, BUILD_TIME, primitive types are initialized at build time
boolean[], BUILD_TIME, arrays are initialized at build time
...
com.oracle.graal.compiler.enterprise.BulkAllocationSnippetTemplates, BUILD_TIME, Native Image classes are always initialized at build time
...
com.oracle.svm.core.heap.Target_jdk_internal_ref_SoftCleanable, BUILD_TIME, substitutions are always initialized at build time
com.oracle.svm.core.heap.Target_jdk_internal_ref_WeakCleanable, BUILD_TIME, substitutions are always initialized at build time
...
io.netty.bootstrap.AbstractBootstrap, BUILD_TIME, from jar:file:///<path>/substratevm-netty-hello-world-1.0.0-SNAPSHOT.jar!/META-INF/native-image/io.netty/common/native-image.properties (with 'io.netty.util.AbstractReferenceCounted') and from jar:file:///<path>/substratevm-netty-hello-world-1.0.0-SNAPSHOT.jar!/META-INF/native-image/io.netty/codec-http/native-image.properties (with 'io.netty')
...
sun.util.calendar.ZoneInfoFile$Checksum, RERUN, from feature com.oracle.svm.core.jdk.LocalizationFeature.addBundleToCache with 'class sun.util.resources.cldr.CalendarData'

Rewrite the Code so Native Image can Prove Critical Classes

For this we will use the example where the inverse square root implementation is selected by a property. Simply rewriting the code from

private static final boolean fastSquareRoot = ReadPropertyHolder.useFastInverseSquareRoot();

private static boolean useFastSquareRoot() {
    return fastSquareRoot;
}

into

private static class FastSquareRootHolder {
    static final boolean fastSquareRoot = ReadPropertyHolder.useFastInverseSquareRoot();
}

private static boolean useFastSquareRoot() {
    return FastSquareRootHolder.fastSquareRoot;
}

makes the code fast: the property read moves out of the enclosing class's initializer into the holder class, so that Native Image can prove the enclosing class safe.
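The rewrite is an instance of the initialization-on-demand holder idiom: a nested class is initialized only when one of its members is first used, not when the enclosing class is initialized. A minimal sketch on a regular JVM; the names HolderSketch and FlagHolder and the property sketch.fast are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class HolderSketch {
    // Records initialization events in the order they happen.
    static final List<String> EVENTS = new ArrayList<>();

    private static class FlagHolder {
        // The property read lives here, not in the initializer of HolderSketch.
        static final boolean FAST = Boolean.getBoolean("sketch.fast");

        static {
            EVENTS.add("FlagHolder initialized");
        }
    }

    static boolean useFast() {
        // The first call initializes FlagHolder; later calls reuse the value.
        return FlagHolder.FAST;
    }

    public static void main(String[] args) {
        EVENTS.add("HolderSketch is running");
        System.out.println(useFast());
        // "HolderSketch is running" is recorded before "FlagHolder initialized".
        System.out.println(EVENTS);
    }
}
```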

Hand-Pick Classes Important for Build-Time Initialization

Sometimes proofs are impossible (e.g., Netty PlatformDependent0) but we still need to initialize this class at build time.

The solution is simple: rewrite the code of the class so that it can be initialized at build time. For that we can use the system properties injected by GraalVM Native Image. In the avoiding-library-initialization example, AvoidingLibraryInitialization could be initialized at build time if it did not have a static logger.

To work around this, we refactor the logger creation to a utility method:

    private static Logger getLogger() {
        if ("buildtime".equals(System.getProperty("org.graalvm.nativeimage.imagecode"))) {
            return NOPLogger.NOP_LOGGER;
        } else {
            return LoggerFactory.getLogger(AvoidingLibraryInitialization.class);
        }
    }

During the image build, calls to getLogger will return a no-op logger and avoid initializing (and subsequently configuring) logging at build time. Native Image exposes the org.graalvm.nativeimage.imagecode system property, which can contain:

  • null: code is executing on regular Java
  • buildtime: code is executing in the image builder
  • runtime: code is executing in the image, at runtime

In the example, logging is configured using logback.xml. Initializing the logger at build-time would also unintentionally initialize XML parsing at build-time, creating an issue if XML is used elsewhere in the code.
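The three values of the property can be handled in one helper. The following is a minimal sketch that uses only the property name and values listed above; the class ImageCodeSketch and its method names are hypothetical. On a regular JVM the property is normally unset, so the tests and the main method set it by hand.

```java
class ImageCodeSketch {
    static final String IMAGE_CODE_PROPERTY = "org.graalvm.nativeimage.imagecode";

    // True only while the code is executing inside the image builder.
    static boolean inImageBuilder() {
        return "buildtime".equals(System.getProperty(IMAGE_CODE_PROPERTY));
    }

    // Maps the property value to a readable description.
    static String describeMode() {
        String code = System.getProperty(IMAGE_CODE_PROPERTY);
        if (code == null) {
            return "regular Java";
        }
        switch (code) {
            case "buildtime":
                return "image builder";
            case "runtime":
                return "image at run time";
            default:
                return "unknown value: " + code;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeMode());
        System.setProperty(IMAGE_CODE_PROPERTY, "buildtime"); // simulated for the demo
        System.out.println(describeMode() + ", inImageBuilder = " + inImageBuilder());
    }
}
```

A guard such as inImageBuilder() keeps build-time-only shortcuts, like the no-op logger, in one place instead of scattering string comparisons through the code.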

Debugging Class Initialization

Two useful options for debugging class initialization problems are:

  • --trace-class-initialization=
  • --trace-object-instantiation=

These options instruct the native-image builder to trace initialization/object instantiation of given classes.

If a given class is wrongly initialized at build-time, a trace with the culprit class is printed. In the class-initialization-tracing example, class A wrongly initializes class B and the tracing gives us:

org.graalvm.A caused initialization of this class with the following trace: 
	at org.graalvm.B.<clinit>(ClassInitializedByAccident.java)
	at org.graalvm.A.<clinit>(ClassInitializedByAccident.java:10)

If an object of a forbidden class (e.g., java.lang.Thread) is instantiated in the image builder and is reachable at run time, a trace showing how the object got instantiated is printed. In the object-instantiation-tracing example, SneakyRunningThread starts a thread during the image build. Tracing object instantiation of java.lang.Thread gives us:

Error: Detected a started Thread in the image heap. Threads running in the image generator are no longer running at image runtime.  Object has been initialized by the org.graalvm.SneakyRunningThread class initializer with a trace: 
 	at java.lang.Thread.<init>(Thread.java:489)
	at org.graalvm.SneakyRunningThread.<clinit>(SneakyRunningThread.java:7)
. Try avoiding to initialize the class that caused initialization of the Thread. The object was probably created by a class initializer and is reachable from a static field. You can request class initialization at image runtime by using the option --initialize-at-run-time=<class-name>. Or you can write your own initialization methods and call them explicitly from your main entry point.
Detailed message:
Trace: Object was reached by 
	reading field org.graalvm.SneakyRunningThread.sneakyThread
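The last suggestion in the error message, an explicit initialization method called from the main entry point, could look like the sketch below. It is not the repository's code; LazyThreadSketch and startWorker are hypothetical names. The static field stays null during class initialization, so no Thread object exists until the method is called.

```java
class LazyThreadSketch {
    // Stays null while the class is initialized, so no Thread would be
    // reachable from a static field in a heap snapshot.
    private static Thread worker;

    // Explicit initialization method: creates and starts the thread on first call.
    static synchronized Thread startWorker(Runnable task) {
        if (worker == null) {
            worker = new Thread(task, "sketch-worker");
            worker.setDaemon(true);
            worker.start();
        }
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread thread = startWorker(() -> System.out.println("working at run time"));
        thread.join();
    }
}
```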
