[GR-39406] Add new class initialization strategy that allows all classes to be used at image build time. #4684

Merged
merged 1 commit into from
Jul 2, 2022

@graalvmbot graalvmbot commented Jun 30, 2022

This PR adds a less strict class initialization mode that allows all classes to be used at image build time. It can be enabled with -H:+UseNewExperimentalClassInitialization.

The new approach is functional, i.e., it can be used for experiments to see whether it is useful and whether there are compatibility problems. It is not optimized yet, i.e., far more classes will be initialized at image run time, so peak performance will be lower for now. The new approach will only be enabled by default once all optimizations of the current approach are also available for the new approach.

Current approach

  • The user can configure whether a class should be initialized at image build time or at image run time. Various equivalent ways for this exist: the --initialize-at-build-time option, the org.graalvm.nativeimage.hosted.RuntimeClassInitialization class that can be called from a Feature, ...
  • If a class that is not explicitly marked for initialization at build time gets initialized by accident, the image build is aborted and an error is reported to the user.
  • A special mode to "re-run initialization at run time" exists. Such classes are then initialized at image build time, and again at image run time (so the class initializer runs twice). This mode is not exposed in the supported public API, for a good reason: it can lead to strange side effects when a class initializer has a side effect on static fields of another class.
  • Classes whose class initializer can be proven to not have side effects are automatically initialized at image build time. This is a performance optimization: it saves the initialization at run time and the initialization checks before all static member accesses for such classes. The first round of such initializations happens before static analysis. Another round happens after static analysis, when more virtual method calls can already be de-virtualized and therefore be proven allowed.
  • When a class is initialized at image build time, the class initializer is executed by the Java HotSpot VM that runs the image builder. All object allocations, field modifications, ... directly happen in the HotSpot heap.

Problems with the current approach

  • Debugging why a class marked for run-time initialization was initialized at image build time is quite difficult for users, even with the additional diagnostic options that we provide that try to collect a stack trace that triggered the initialization.
  • Running initialization code at image build time is harder than it should be. We want people to, e.g., load configuration files at image build time. But that requires that the whole configuration file parser (like a JSON or YAML parser) is marked for initialization at build time. It is unrealistic to audit large existing code bases to verify that they are unconditionally safe for such initialization at build time.
  • Automatically initializing classes at image build time is not 100% safe. We analyze the Graal IR that we would put into the image - but due to intrinsifications, that can be different code than what really gets executed when the class initializer runs in the image generator.
  • Some frameworks like Quarkus opt to initialize all classes at image build time. That is highly unsafe, but for them it is the only realistic option right now. For example, when Quarkus bootstraps Hibernate to find out which parts of Hibernate are necessary at image run time, most classes of Hibernate need to be initialized, even though the outcome of this bootstrap is just a limited set of configuration objects.
  • Some frameworks like Spring do not want to initialize classes at image build time at all because of the dangers of unintended side effects, for example the configuration of logging code.
  • For frameworks like Netty, we tried to push for "initialize as much as possible at image build time" using explicit configurations in Netty. That did not work out overly well: there were always unintended side effects like buffers being sized based on the Java heap size of the image generator.
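
The buffer-sizing side effect mentioned in the last item can be illustrated with a minimal plain-Java sketch. The class BufferDefaults and its sizing rule are hypothetical and are not taken from Netty; the point is only that a value computed in a class initializer reflects the VM that ran the initializer.

```java
// Hypothetical illustration in plain Java; BufferDefaults is not a Netty class.
final class BufferDefaults {
    // Captured once, when the class initializer runs. If that happens inside the
    // image generator, this is the generator's heap size, not the application's.
    static final long MAX_HEAP_AT_INIT = Runtime.getRuntime().maxMemory();

    // Derived value that would be written into the image heap with the class.
    static final long DEFAULT_BUFFER_BYTES = deriveBufferSize(MAX_HEAP_AT_INIT);

    // Assumed sizing rule for this sketch: 1/1024 of the heap, at least 4 KiB.
    static long deriveBufferSize(long maxHeapBytes) {
        return Math.max(4096L, maxHeapBytes / 1024L);
    }
}

final class BufferDefaultsDemo {
    public static void main(String[] args) {
        System.out.println("heap seen by initializer: " + BufferDefaults.MAX_HEAP_AT_INIT);
        System.out.println("buffer size derived from it: " + BufferDefaults.DEFAULT_BUFFER_BYTES);
    }
}
```

If such a class were initialized at build time, DEFAULT_BUFFER_BYTES would stay fixed at the value derived from the image generator's heap, whatever heap the resulting executable runs with.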

New less restricted approach

  • All classes are allowed to be used (and therefore be initialized) in the image generator, regardless of the class initialization configuration.
  • For classes that are configured as "initialize at build time", the static fields of the image generator are preserved at image run time.
  • For classes that are configured as "initialize at run time", the class appears as uninitialized again at run time. In particular, if the class was already initialized at build time, static fields of the image generator are not preserved at image run time. This is equivalent to the current "re-run initialization at run time".
  • Classes whose class initializer can be proven to not have side effects can no longer be "just initialized by the image builder VM", because we now need to distinguish the build-time initialization state used by build-time code from the "clean state" produced by just running the class initializer. A new optimization approach is necessary and possible.

Compatibility impact

  • All code that is currently properly configured for build-time initialization using command line options (i.e., all classes that end up on the image heap are explicitly listed as "initialize at build time") will continue to work. The new approach will allow users to run more code at image build time, or to reduce the number of classes explicitly marked for initialization at build time.

  • In the current system, classes that are proven to be safe for build time initialization are treated the same as classes explicitly configured for build-time initialization. This is no longer possible with the new architecture. If an instance of such a class is in the image heap, then the image build will fail with an error that the class is not allowed in the image heap.

Examples

To make the examples more readable, each class name has a suffix:

  • _InitAtBuildTime: The class is explicitly marked as "initialize at build time" by the user
  • _InitAtRunTime: The class is not marked for initialization at build time and is therefore initialized at run time. This is the default.

We use the pseudo-field $$initialized to refer to the class initialization status of a class.

Example 1

class A_InitAtBuildTime {
  static int a = 42;
}

class B_InitAtRunTime {
  static int b = 123;

  static {
    A_InitAtBuildTime.a = A_InitAtBuildTime.a + 1;
  }
}

Assume that B_InitAtRunTime is not used by a Feature and therefore does not get initialized by the image builder VM. A_InitAtBuildTime gets initialized by the image builder VM because of its manual designation, regardless of how it is used at image build time. So in the image builder, the static field values are

A_InitAtBuildTime.$$initialized = true
A_InitAtBuildTime.a = 42

B_InitAtRunTime.$$initialized = false
B_InitAtRunTime.b = 0

The same values are written out into the image heap.

After the class B_InitAtRunTime gets initialized at run time, the static fields have the following values:

A_InitAtBuildTime.$$initialized = true
A_InitAtBuildTime.a = 43

B_InitAtRunTime.$$initialized = true
B_InitAtRunTime.b = 123

Example 2

class A_InitAtBuildTime {
  static int a = 42;
}

class B_InitAtRunTime {
  static int b = 123;

  static {
    A_InitAtBuildTime.a = A_InitAtBuildTime.a + 1;
  }
}

Assume that B_InitAtRunTime is used by a Feature and therefore gets initialized by the image builder VM. So in the image builder, the static field values are

A_InitAtBuildTime.$$initialized = true
A_InitAtBuildTime.a = 43

B_InitAtRunTime.$$initialized = true
B_InitAtRunTime.b = 123

In the written out image, the following static field values are in the image heap:

A_InitAtBuildTime.$$initialized = true
A_InitAtBuildTime.a = 43

B_InitAtRunTime.$$initialized = false
B_InitAtRunTime.b = 0

After the class B_InitAtRunTime gets initialized at run time, the static fields have the following values:

A_InitAtBuildTime.$$initialized = true
A_InitAtBuildTime.a = 44

B_InitAtRunTime.$$initialized = true
B_InitAtRunTime.b = 123

The usage of B_InitAtRunTime at image build time has a bad side effect on A_InitAtBuildTime, whose field got incremented twice. But by explicitly marking A_InitAtBuildTime as initialize-at-build-time, the user has acknowledged that such side effects are understood.
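
The difference between Example 1 and Example 2 can be replayed with a small plain-Java model. This is only a sketch under stated assumptions: it does not use any Native Image API, the class ClassInitModel and its methods are hypothetical, and the static fields and the $$initialized flags are modeled as instance fields so that the "image builder" state and the "image heap" state can be kept apart.

```java
// Plain-Java model of Examples 1 and 2; all names here are hypothetical.
final class ClassInitModel {
    int a;                 // models A_InitAtBuildTime.a
    int b;                 // models B_InitAtRunTime.b
    boolean aInitialized;  // models A_InitAtBuildTime.$$initialized
    boolean bInitialized;  // models B_InitAtRunTime.$$initialized

    void initializeA() {
        if (aInitialized) {
            return;
        }
        aInitialized = true;
        a = 42;
    }

    void initializeB() {
        if (bInitialized) {
            return;
        }
        bInitialized = true;
        b = 123;
        initializeA(); // the static access to A triggers A's initialization first
        a = a + 1;
    }

    // Writing the image: state of the build-time class is preserved,
    // the run-time class appears as uninitialized again.
    ClassInitModel writeImageHeap() {
        ClassInitModel image = new ClassInitModel();
        image.a = a;
        image.aInitialized = aInitialized;
        return image; // b and bInitialized keep their defaults (0 and false)
    }

    static ClassInitModel buildAndRun(boolean featureUsesB) {
        ClassInitModel builder = new ClassInitModel();
        builder.initializeA();     // A is explicitly marked for build time
        if (featureUsesB) {
            builder.initializeB(); // Example 2: a Feature uses B in the builder
        }
        ClassInitModel image = builder.writeImageHeap();
        image.initializeB();       // B gets initialized at run time
        return image;
    }
}

final class ClassInitModelDemo {
    public static void main(String[] args) {
        System.out.println("Example 1: a = " + ClassInitModel.buildAndRun(false).a); // prints 43
        System.out.println("Example 2: a = " + ClassInitModel.buildAndRun(true).a);  // prints 44
    }
}
```

In the model, the double increment comes only from initializeB running once in the builder and once on the written-out image while the state of A is carried over.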

Our recommendation for users should be: Only mark a class for initialization at build time if it does not have 1) any mutable static state (including any mutable data structures reachable from static final fields), and 2) a class initializer that accesses any mutable state of another class.
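
As a rough illustration of this recommendation, the following plain-Java sketch contrasts one class that meets both conditions with two that do not. All class names are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Meets both conditions: List.of returns an unmodifiable list, and there is
// no class initializer that touches the state of another class.
final class StatusNames {
    static final List<String> NAMES = List.of("OK", "NOT_FOUND");
}

// Violates condition 1: the field is final, but the list reachable from it is mutable.
final class RequestLog {
    static final List<String> ENTRIES = new ArrayList<>();
}

// Violates condition 2: the class initializer modifies mutable state of RequestLog.
final class StartupHook {
    static {
        RequestLog.ENTRIES.add("StartupHook initialized");
    }

    // Calling a static method forces the class initializer to run.
    static void touch() {
    }
}

final class RecommendationDemo {
    public static void main(String[] args) {
        StartupHook.touch();
        System.out.println("entries written by another class's initializer: " + RequestLog.ENTRIES.size());
    }
}
```

Only StatusNames would be a reasonable candidate for "initialize at build time" under the two conditions above.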

Example 3

class A_InitAtBuildTime {
  static Object a;
}

class MyFeature implements Feature {
  void beforeAnalysis(...) {
    A_InitAtBuildTime.a = new A_InitAtBuildTime();
  }
}

The feature code runs before static analysis and initializes the static field. So in the image builder, the static field values are

A_InitAtBuildTime.$$initialized = true
A_InitAtBuildTime.a = <A_InitAtBuildTime instance>

This is a correct use of build time initialization: classes that store information computed at build time must be marked as "initialize at build time".

Example 4

class A_InitAtBuildTime {
  static Object a;
}

class B_InitAtRunTime {
}

class MyFeature implements Feature {
  void beforeAnalysis(...) {
    A_InitAtBuildTime.a = new B_InitAtRunTime();
  }
}

The image build fails: The image heap must only contain instances of classes that are marked as "initialize at build time".
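
The rule behind Examples 3 and 4 can be sketched as a simple check in plain Java. This is a simplified model, not the image builder's implementation: ImageHeapCheck, requireAllowed, and the error text are hypothetical, and the model only looks at a single object instead of everything reachable from static fields.

```java
import java.util.Set;

class A_InitAtBuildTime {
    static Object a;
}

class B_InitAtRunTime {
}

// Hypothetical helper that models the image heap rule for one object.
final class ImageHeapCheck {
    static void requireAllowed(Object heapObject, Set<Class<?>> initAtBuildTime) {
        Class<?> type = heapObject.getClass();
        if (!initAtBuildTime.contains(type)) {
            // Assumed message text; the real builder reports its own error.
            throw new IllegalStateException("Class is not allowed in the image heap: " + type.getName());
        }
    }
}

final class ImageHeapCheckDemo {
    public static void main(String[] args) {
        Set<Class<?>> initAtBuildTime = Set.of(A_InitAtBuildTime.class);

        // Example 3: the stored instance is of a build-time class, so it is accepted.
        A_InitAtBuildTime.a = new A_InitAtBuildTime();
        ImageHeapCheck.requireAllowed(A_InitAtBuildTime.a, initAtBuildTime);

        // Example 4: the stored instance is of a run-time class, so the check fails.
        A_InitAtBuildTime.a = new B_InitAtRunTime();
        try {
            ImageHeapCheck.requireAllowed(A_InitAtBuildTime.a, initAtBuildTime);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```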

@graalvmbot graalvmbot force-pushed the cwi/GR-39406-new-class-init-review branch from 104d71f to c521f01 Compare June 30, 2022 23:18

galderz commented Jul 1, 2022

@christianwimmer I just gave this a go with the latest Quarkus and I don't see any new failures as a result of this work. FYI I used a Quarkus branch where I set -H:+UseNewExperimentalClassInitialization in the native image command line builder we use.

@graalvmbot graalvmbot force-pushed the cwi/GR-39406-new-class-init-review branch from c521f01 to d90833a Compare July 1, 2022 20:28
@graalvmbot graalvmbot merged commit faeaf3d into master Jul 2, 2022
@graalvmbot graalvmbot deleted the cwi/GR-39406-new-class-init-review branch July 2, 2022 04:58
@fniephaus fniephaus added this to To do in Native Image via automation Oct 4, 2022