Initial class init handling proposal #458

DanHeidinga · 2021-04-16T15:14:20Z

Some background and thoughts on how to handle class initialization in qbicc

Signed-off-by: Dan Heidinga <heidinga@redhat.com>

dmlloyd · 2021-04-19T16:55:04Z

docs/ClassInitialization.adoc

+Placing a check in front of every such access would result in a lot of overhead.  We can use some basic heuristics to limit where the init
+checks are needed.
+
+Heuristics:


One behavior of SVM that Quarkus takes extensive advantage of is that it runs all run-time initializations eagerly (but still in dependency order) at application startup. This avoids the need for class initialization checks of any kind in the run-time image. It also nearly eliminates the possibility of class initialization deadlock. I think it may make sense to consider retaining this behavior.
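To make "eagerly, but still in dependency order" concrete, here is a minimal sketch of how such an order could be computed with a depth-first traversal. `EagerInitOrder` and its dependency map are hypothetical names for illustration only; this is not SVM's or qbicc's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders classes so each one follows the classes it depends on. */
final class EagerInitOrder {
    private EagerInitOrder() {}

    /**
     * @param deps maps a class name to the classes its initializer depends on
     * @return every class reachable from deps, dependencies first
     */
    static List<String> order(Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String cls : deps.keySet()) {
            visit(cls, deps, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String cls, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> out) {
        if (done.contains(cls)) {
            return;
        }
        if (!inProgress.add(cls)) {
            // Cyclic dependency: a recursive request on the same thread returns
            // immediately in the JVM as well, so the cycle is simply cut here.
            return;
        }
        for (String dep : deps.getOrDefault(cls, List.of())) {
            visit(dep, deps, done, inProgress, out);
        }
        inProgress.remove(cls);
        done.add(cls);
        out.add(cls); // all dependencies are already in the list
    }
}
```

Running the resulting list on a single thread at startup gives the behavior described above: no lazy checks are needed afterwards, provided no initializer lets another thread observe a class that is not yet initialized.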

My understanding was that SVM flipped the default from buildtime-init to runtime-init, though they still have some (limited?) auto-detection of "safe" <clinit>, and provide options to allow the user to control when a class is init'ed. I'll add a section in here about user control of when to run <clinit>.

Yes, they did change the default. Initializing all classes (or as many as possible) at build time represents an ideal in many respects, yielding the minimum startup time (since startup is when the run time initialization occurs with SVM) and maximizing the potential for optimizations such as constant folding, partial evaluation, etc., and consequent dead code elimination opportunities. Quarkus relies heavily on build time initialization for this reason.

The move to run-time initialization by default was not driven by idealism but pragmatism. The performance and density benefits of build-time initialization may be undeniable, but the JDK and many libraries require substantial changes to be compatible with the SVM paradigm of choosing an initialization strategy on a per-class basis. It was determined for various reasons to align more with the JDK as it exists today. However this change brought with it a new problem: any class that depends (directly or indirectly) upon a run-time-initialized class must itself be initialized at run time, cascading through the dependency tree.
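The cascade can be illustrated with a small fixed-point computation over a dependency map. This is only a sketch under assumed names (`RuntimeInitCascade`, and a map from each class to the classes its initializer depends on); it does not reflect how SVM actually tracks this.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: propagates "must initialize at run time" through dependents. */
final class RuntimeInitCascade {
    private RuntimeInitCascade() {}

    /**
     * @param deps         maps a class name to the classes its initializer depends on
     * @param runtimeSeeds classes explicitly configured for run-time initialization
     * @return the seeds plus every class that depends on them, directly or indirectly
     */
    static Set<String> propagate(Map<String, List<String>> deps, Set<String> runtimeSeeds) {
        Set<String> runtime = new HashSet<>(runtimeSeeds);
        boolean changed = true;
        while (changed) { // repeat until no further class is pulled in
            changed = false;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (runtime.contains(e.getKey())) {
                    continue;
                }
                for (String dep : e.getValue()) {
                    if (runtime.contains(dep)) {
                        runtime.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return runtime;
    }
}
```

With a single seed such as a class that creates a random number generator, every class whose initializer reaches it ends up in the returned set, which is the cascading effect described above.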

In the context of Leyden there may exist more possibilities. One idea would be to divide class initialization of every class into two stages - build and run time - thereby eliminating the cascading dependency problem. In a hypothetical language enhancement, one might imagine a new and separate static initialization block which applies "late" at run time in the event that the program is compiled into an image. Static fields could then be divided into build time versus run time. The former would be accessible to all initialization stages, but the latter would only be accessible at run time. Field-level reinitialization is also a possibility, where a field can be initialized in both static initialization stages.

Within the remit of qbicc, we could try to implement this kind of concept using annotations to mark a field as run-time initialized or run-time re-initialized, with the default being build-time initialized. We could analyze the program graph of each initializer and divide their graph into build time and run time subgraphs. Any node with side-effects would be assigned to either one or the other graph; in the case where a single ordered node is used in both build-time-only and run-time-only contexts, we could raise a warning or even an error. Any access of a run-time field at build time would result in a compilation error.
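As a rough sketch of what such annotations might look like, the following uses two hypothetical markers, `@RunTimeInit` and `@RunTimeReinit`, and a reflection-based helper that partitions the static fields of a class by stage. None of these names exist in qbicc; a real compiler would work on its own program graph rather than on reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical marker: the field is assigned only at run time. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RunTimeInit {}

/** Hypothetical marker: the field is assigned at build time and again at run time. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RunTimeReinit {}

/** Example class mixing the three kinds of static field. */
final class PoolConfig {
    static final String NAME = "worker-pool"; // build-time initialized (the default)
    @RunTimeInit static int processors;       // depends on the host, run time only
    @RunTimeReinit static long seed = 42L;    // build-time value, replaced at run time
}

/** Hypothetical helper: partitions static fields by initialization stage. */
final class FieldStages {
    private FieldStages() {}

    /** Static fields that need work in the run-time stage. */
    static Set<String> runTimeFields(Class<?> cls) {
        Set<String> names = new TreeSet<>();
        for (Field f : cls.getDeclaredFields()) {
            if (isCandidate(f) && (f.isAnnotationPresent(RunTimeInit.class)
                    || f.isAnnotationPresent(RunTimeReinit.class))) {
                names.add(f.getName());
            }
        }
        return names;
    }

    /** Static fields that may be accessed during the build-time stage. */
    static Set<String> buildTimeFields(Class<?> cls) {
        Set<String> names = new TreeSet<>();
        for (Field f : cls.getDeclaredFields()) {
            if (isCandidate(f) && !f.isAnnotationPresent(RunTimeInit.class)) {
                names.add(f.getName());
            }
        }
        return names;
    }

    private static boolean isCandidate(Field f) {
        return Modifier.isStatic(f.getModifiers()) && !f.isSynthetic();
    }
}
```

Under this model, a build-time initializer that reads a field outside `buildTimeFields` would be the case reported as a compilation error.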

In terms of implementation, by segregating the program graph we can divide the initialization nodes into two. The build-time nodes would remain within the initializer element, to be executed by the interpreter. The run-time nodes would be recorded to be appended to a run-time initialization startup sequence - for example, they could be added to the constructor of the static field object (such as what is proposed in #457), or a function that is executed as a part of an ELF/etc. object file constructor on library or executable load.
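A minimal sketch of that split, assuming each initializer has already been divided into a build-time part and an optional run-time part. `StagedInitializers` is a hypothetical stand-in: the build-time part runs immediately (where the interpreter would run it), and the run-time part is queued for a startup sequence.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a two-stage initializer pipeline. */
final class StagedInitializers {
    // Run-time parts, in the order their classes were initialized at build time.
    private final List<Runnable> startupSequence = new ArrayList<>();

    /** Build time: run the build-time part now and defer the rest. */
    void initializeAtBuild(Runnable buildTimePart, Runnable runTimePart) {
        buildTimePart.run();
        if (runTimePart != null) {
            startupSequence.add(runTimePart);
        }
    }

    /** Run time: invoked once at startup, e.g. from an object-file constructor. */
    void runStartup() {
        for (Runnable part : startupSequence) {
            part.run();
        }
        startupSequence.clear(); // each deferred part runs exactly once
    }
}
```

Because the queue preserves build-time order, the run-time parts execute in the same dependency order that the build-time parts did.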

> The move to run-time initialization by default was not driven by idealism but pragmatism.

Apart from the JDK libraries, user libraries will be subject to the same pragmatic constraints. Quarkus uses its extensions to address these libraries externally when the library can't be updated. Even with this, there will always be code that cannot be initialized at build time - from RandomNumberGenerators, to System properties, to processor counts for sizing threadpools, and even just creating Threads.

There's no escaping the need to support both runtime and build time class initialization, for either some <clinit> or at least some fields.

> However this change brought with it a new problem: any class that depends (directly or indirectly) upon a run-time-initialized class must itself be initialized at run time, cascading through the dependency tree.

Right, build time init depends on the super classes and super-interfaces with default methods also being build-time init'd.

> One idea would be to divide class initialization of every class into two stages - build and run time - thereby eliminating the cascading dependency problem.

Great minds think alike: I've proposed variations of this in multiple forums - adding a <static-clinit> method that can always be run ahead of time by static compilers while dynamic JVMs always run <static-clinit> before the regular <clinit>. It's a good path forward - and we should try to start on it - but will take time to ripple out through the ecosystem.

qbicc / Leyden / others will still need to allow the user to control compile-time/runtime <clinit> decisions and attempt to determine which <clinit> are build time safe when working with today's classfiles.

We need to build those capabilities in now even if we intend to push the dial as far to compile-time as possible.

> Within the remit of qbicc, we could try to implement this kind of concept using annotations to mark a field as run-time initialized or run-time re-initialized ....

This is an interesting approach to explore. I'd like to build the conservatively correct runtime <clinit> model first and then gradually migrate to something like this, so that we know we have all the tools in place to handle the different use cases: annotated fields, command-line options for {build/run}time, and "safe" analysis for un-updated classes.

> In terms of implementation, by segregating the program graph we can divide the initialization nodes into two. The build-time nodes would remain within the initializer element, to be executed by the interpreter. The run-time nodes would be recorded to be appended to a run-time initialization startup sequence - for example, they could be added to the constructor of the static field object (such as what is proposed in #457), or a function that is executed as a part of an ELF/etc. object file constructor on library or executable load.

I wouldn't be surprised to find that forcibly initializing classes at startup has a negative effect on time-to-first-response, especially in larger applications. We've seen in other experiments that initializing, at startup, classes that are only used on conditionally executed code paths (i.e., only if a certain type of request comes in) can hurt the time-to-first-response for the common case. It's best to init at build time and, failing that, to let application behaviour drive the initialization, since some expensive <clinit> wouldn't otherwise run at all.

> I wouldn't be surprised to find that forcibly initializing classes at startup has a negative effect on time-to-first response, especially in larger applications. We've seen in other experiments that initializing classes that are only used on conditionally executed code paths (i.e., if a certain type of request comes in) at startup can hurt the time-to-first response for the common case. It's best to init at build time, and failing that to let application behaviour drive the initialization as some <clinit> that are expensive wouldn't otherwise run.

We might have some data from Quarkus on this point. I'll see what I can find.

We don't presently have data about what fraction of startup is spent in static initialization; startup time of native images has never been consistently significant enough to really determine this information (which is maybe enough to come to a conclusion already, since any fraction of a small number is a small number). However, our Quarkus performance expert is looking at ways to instrument just the class initialization part of native image execution to see if it is possible to glean some useful numbers.

The Quarkus presumption is that, in an executable which has already gone through aggressive DCE, the percentage of classes which are configured for run time initialization yet which are not on the hot path of execution (and thus would not be necessary for the first request) would be likely to be small. If class init were purely lazy, and most of the classes were initialized to handle a first request, it's possible that the overhead of the checks would end up being higher than the cost of aggressively initializing the extra classes.

Another difference to consider is that eager initialization would be completely single-threaded, barring asynchronous thread creation from a static initializer. The advantage (as I mentioned before) is that this generally precludes class init deadlocks, and it also brings a level of determinism which could manifest in various ways (such as a more consistent startup and response time during early application execution). However it's possible that opportunistic concurrent initialization would be more performant overall in certain situations.

However, the ability for a static initializer to spawn a thread could invalidate the correctness of certain assumptions about eager initialization, as such a thread could observe a class that is not yet initialized in a way that violates the JLS.

Your example of spawning a thread from a <clinit> is key. Once that happens, all other static field accesses, static method calls, and allocations need to check whether the class being accessed is initialized, which basically degenerates to needing class init checks on all such accesses at runtime.

There are ways to optimize the placement of those checks but they need to be present once runtime <clinit> is possible.
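For illustration, the check that would guard each such access could look roughly like the sketch below. `InitCheck` is a hypothetical name, and the state handling is deliberately simplified compared with the procedure in JLS 12.4.2: the monitor is held while the initializer runs, and a failed initializer is simply retried instead of marking the class as erroneous.

```java
/** Hypothetical per-class guard: a cheap fast path plus a locked slow path. */
final class InitCheck {
    private final Runnable clinit;        // stands in for the class's <clinit>
    private volatile boolean initialized; // read on every guarded access
    private boolean inProgress;           // guarded by this

    InitCheck(Runnable clinit) {
        this.clinit = clinit;
    }

    /** Called before a static field access, static method call, or allocation. */
    void ensureInitialized() {
        if (initialized) {
            return; // fast path: a single volatile read
        }
        synchronized (this) {
            // Re-check under the lock. inProgress can only be seen as true by the
            // initializing thread itself (a recursive request), since the monitor
            // is held for the whole run of the initializer.
            if (initialized || inProgress) {
                return;
            }
            inProgress = true;
            try {
                clinit.run();
                initialized = true;
            } finally {
                inProgress = false;
            }
        }
    }
}
```

Even the fast path costs a load and a branch on every guarded access, which is why limiting where the checks are placed matters once run-time <clinit> is possible.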

I'll add a section to the document that talks about batching up some set of the runtime <clinit> methods and running them immediately. We need to think through whether it can be all runtime <clinit> methods (default thought: probably not, due to the Thread example) and how that can further remove the need for runtime init checks. This is very similar to what Multi-Tenant JVM did to ensure that critical classes - Object, Class, String, etc. - were pre-initialized by the interpreter on tenant creation, so that JITted code never did the checks for those classes and was broadly shareable.

Added description of this in 05923f1

Initial class init handling proposal

95c8bd0

Signed-off-by: Dan Heidinga <heidinga@redhat.com>

DanHeidinga added ⚠️ WIP / draft Design document 📝 A design document/proposal labels Apr 16, 2021

dmlloyd reviewed Apr 19, 2021

DanHeidinga added 2 commits April 19, 2021 13:12

Add mention of user commandlines to control when to perform <clinit>

dbc4381

Signed-off-by: Dan Heidinga <heidinga@redhat.com>

Describe immediate batched <clinit> calls and potential concern

05923f1

Signed-off-by: Dan Heidinga <heidinga@redhat.com>

DanHeidinga mentioned this pull request Apr 22, 2021

Some groundwork for support <clinit> #467

Merged

Clarify where state should live - heap vs native

e770a89

Signed-off-by: Dan Heidinga <heidinga@redhat.com>

DanHeidinga mentioned this pull request May 6, 2021

More <clinit> groundwork #501

Closed

dmlloyd marked this pull request as draft May 20, 2021 13:48

dmlloyd removed the ⚠️ WIP / draft label May 20, 2021

DanHeidinga closed this May 25, 2021

This was referenced May 25, 2021

More <clinit> groundwork #530

Merged

Initial class init handling proposal #532

Closed

DanHeidinga commented Apr 16, 2021

dmlloyd Apr 19, 2021

DanHeidinga Apr 19, 2021

dmlloyd Apr 20, 2021

DanHeidinga Apr 20, 2021

dmlloyd Apr 20, 2021

dmlloyd Apr 20, 2021

dmlloyd Apr 20, 2021

DanHeidinga Apr 20, 2021

DanHeidinga Apr 20, 2021
