
Roadmap to Jacobin source code

Andrew Binstock edited this page May 3, 2022 · 24 revisions

This is a roadmap to the source code for Jacobin as of March 2022. (The symbol ❒ indicates work to be done.)

The JVM consists of a handful of fundamental parts:

  • a command-line parser
  • a classloader
  • an execution engine
  • garbage collection (GC), which is handled by go's built-in GC
  • many libraries for doing additional tasks

In a program in another language, the source tree would reflect this division of labor. However, go is persnickety about package layouts and will refuse to compile if packages have mutual dependencies. So the source tree is divided in part according to the preceding and in part based on what's possible in a go project structure.

Command-line parsing

Jacobin handles a subset of the JVM's command-line options. Currently, it accepts many basic options (those that are displayed with java -help), identifies the class name, and captures and passes in any program arguments. It also accepts command-line options from environment variables, as described here.

The initial handling of the command-line interface (CLI) is in jvm/cli.go. The parsing of JVM options and the setting of the corresponding switches are done in jvm/option_table_loader.go. This file loads a table with all the JVM options that Jacobin responds to; each entry includes a first-class function that is executed when the option is specified. Instructions in option_table_loader.go explain how to add switches when they are supported in Jacobin.
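
The table-of-functions approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in option_table_loader.go: the Settings and Option types, the LoadOptionsTable() and HandleOption() names, and the particular options shown are all assumptions made for the example.

```go
package main

import "fmt"

// Settings is a hypothetical stand-in for the switches that options set.
type Settings struct {
	ShowHelp    bool
	ShowVersion bool
}

// Option pairs an option with the function to run when it is specified.
type Option struct {
	Supported bool
	Action    func(s *Settings)
}

// LoadOptionsTable builds the table, keyed by the option text.
func LoadOptionsTable() map[string]Option {
	return map[string]Option{
		"-help":    {Supported: true, Action: func(s *Settings) { s.ShowHelp = true }},
		"-version": {Supported: true, Action: func(s *Settings) { s.ShowVersion = true }},
		"-cp":      {Supported: false}, // recognized, but not yet implemented
	}
}

// HandleOption looks up arg and, if it is supported, runs its action.
// It reports whether the option was handled.
func HandleOption(table map[string]Option, arg string, s *Settings) bool {
	opt, found := table[arg]
	if !found || !opt.Supported || opt.Action == nil {
		return false
	}
	opt.Action(s)
	return true
}

func main() {
	table := LoadOptionsTable()
	var s Settings
	for _, arg := range []string{"-help", "-cp", "-bogus"} {
		fmt.Printf("%s handled: %v\n", arg, HandleOption(table, arg, &s))
	}
}
```

With this layout, supporting a new switch means adding one table entry and its function, which is the kind of change the instructions in the loader file describe.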

❒ Jacobin expects a class name to end in .class, whereas the OpenJDK JVM expects no extension. Follow the JVM convention. (required)
❒ Support for JAR (and EAR, WAR) files. (required)
❒ Support for the -cp and -classpath options. (required)
❒ Separate parsing routines for switches that begin with the -X: and -XX: prefixes. (not urgent)

Classloader

A classloader consists of a utility that parses a class file into useful fields, does some format checking, and places the parsed and checked data into the method area of the JVM. Jacobin's classloading is done in the classloader module. The heavy lifting is done by classloader.go, which begins with many of the data structures used in classloading and then has a variety of different functions for loading a class. The most important of these is loadClassFromFile().

loadClassFromFile() calls parseAndLoadClass(). This function calls the class-file parser (parse() in parser.go), then format-checks the parsed file via a call to formatCheckClass() in formatCheck.go, and if all is successful, posts the parsed and format-checked data to the method area.
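
The parse, format-check, and post sequence can be pictured with a short sketch. It is only an illustration under stated assumptions: the parsedClass type, the methodArea map, and the parseAndLoad() helper are hypothetical, and the sketch reads just the first ten bytes of a class file (magic number, minor and major version, constant pool count) rather than doing what parser.go and formatCheck.go actually do.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parsedClass holds only the header fields this sketch extracts.
type parsedClass struct {
	name    string
	magic   uint32
	major   uint16
	cpCount uint16
}

// methodArea is a hypothetical stand-in for the JVM's method area.
var methodArea = map[string]*parsedClass{}

// parse extracts raw fields without judging them.
func parse(name string, raw []byte) (*parsedClass, error) {
	if len(raw) < 10 {
		return nil, errors.New("class file too short")
	}
	return &parsedClass{
		name:    name,
		magic:   binary.BigEndian.Uint32(raw[0:4]),
		major:   binary.BigEndian.Uint16(raw[6:8]),
		cpCount: binary.BigEndian.Uint16(raw[8:10]),
	}, nil
}

// formatCheck validates the parsed fields.
func formatCheck(pc *parsedClass) error {
	if pc.magic != 0xCAFEBABE {
		return fmt.Errorf("%s: invalid magic number %#x", pc.name, pc.magic)
	}
	if pc.cpCount < 1 {
		return fmt.Errorf("%s: invalid constant pool count %d", pc.name, pc.cpCount)
	}
	return nil
}

// parseAndLoad posts the class to the method area only if both steps succeed.
func parseAndLoad(name string, raw []byte) error {
	pc, err := parse(name, raw)
	if err != nil {
		return err
	}
	if err := formatCheck(pc); err != nil {
		return err
	}
	methodArea[name] = pc
	return nil
}

func main() {
	// Header bytes: magic, minor version 0, major version 55, CP count 2.
	raw := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x37, 0x00, 0x02}
	fmt.Println("load error:", parseAndLoad("Hello", raw))
	fmt.Println("classes in method area:", len(methodArea))
}
```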

In addition to loading the classes of the app, Jacobin (like the OpenJDK JVM) pre-loads a series of base classes from the JDK, which cover basic Java functionality. The location of those classes is specified by the JACOBIN_HOME environment variable. This is handled by LoadBaseClasses() in classloader.go.

Another set of classes is preloaded, namely the classes referenced by the main application class. These are preloaded by LoadReferencedClasses() in classloader.go. It is called at program start-up by JVMrun() in jvmStart.go, which is the main line that drives Jacobin. This preloading is done in parallel with the execution of the main class, via the use of a channel and a wait group. (Note: originally this parallelization was a proof of concept and the idea was to also parallelize the preloading of the base classes. However, on standard hardware, the roughly 1400 base classes load in Jacobin in less than 500ms, so it's not clear that parallelizing would add much in terms of performance, although it's certainly something to explore whenever optimization becomes a primary goal.)
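
For readers unfamiliar with the channel-plus-wait-group pattern mentioned above, here is a minimal sketch of it. The preloadReferenced() function and the load callback are hypothetical names chosen for the example; they are not Jacobin's LoadReferencedClasses(), and the class names are just sample data.

```go
package main

import (
	"fmt"
	"sync"
)

// preloadReferenced starts a goroutine that loads every class name sent
// over a channel. It returns immediately so the caller can carry on with
// other work, and hands back a function that blocks until loading is done.
func preloadReferenced(names []string, load func(string)) (wait func()) {
	var wg sync.WaitGroup
	ch := make(chan string, len(names))

	wg.Add(1)
	go func() {
		defer wg.Done()
		for name := range ch { // ends when the channel is closed
			load(name)
		}
	}()

	for _, name := range names {
		ch <- name
	}
	close(ch)
	return wg.Wait
}

func main() {
	loaded := []string{}
	wait := preloadReferenced(
		[]string{"java/lang/String", "java/io/PrintStream"},
		func(name string) { loaded = append(loaded, name) },
	)
	fmt.Println("main class starts executing while preloading proceeds")
	wait() // the wait group guarantees loading has finished before we read
	fmt.Println("preloaded:", loaded)
}
```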

Parsing the class file, as mentioned above, is driven by parse() in parser.go. The class file consists of three main parts: some basic fields, the constant pool, and the attribute area. Java methods are attributes and found in the attribute area. parse() parses the initial fields. The constant pool (uniformly referred to in the code as CP) is parsed in cpParser.go. At present, the parsing of the CP should be complete and all Java record types are parsed as completely as needed for Jacobin. The code, however, is cumbersome due to go's lack of support for generics (even the basic generics in go 1.18 don't satisfy the needs of the CP parser). If need arises, this page can be expanded to explain how the CP data is parsed and, especially, how its data is stored for fast access.

Methods are complex attributes in Java. They are parsed by ParseMethods() in classloader/methodParser.go. Despite being attributes themselves, methods have attributes. These are incompletely parsed at present--sufficiently for Jacobin to function properly, but not enough to provide advanced features, such as debugging, etc.

Several class-file attributes of at present secondary importance are not parsed. These include attributes relating to annotations, modules, packages, etc.

The format checking performed by the Jacobin classloader both exceeds the JVM spec in some ways and is incomplete in others. The goal of format checking is to make sure that the class file has not been accidentally altered. It's an integrity check done by validating data fields against each other, validating sequences of items, data ranges, etc. It's required of all classes. Verification, in JVM terms, is a separate process that entails more advanced checking to make sure that there is no malicious insertion of code. Per the JVM spec, class files are verified at various points in the initialization process (except for classes taken from the JDK/JRE, which are presumed not to be malicious). Jacobin presently does very few of these checks and assumes any class it executes is legitimate.

❒ pre-load classes from Java SDK, rather than from unzipped classes in JACOBIN_HOME (required). This will also close GitHub issue #7.
❒ parse all remaining class and method attributes (eventually, not a priority)
❒ plan out the extent to which initialization, linking, and preparation steps need to be done and how to best sequence them (priority)

Java classes implemented in go

The classloader contains several Java classes that have been partially implemented in go. For example, javaPrintStream.go. These classes are explained here. When developing a JVM from scratch, it's beneficial to implement some Java classes in the development language. For example, println() in Java is a complex set of operations. Because an incipient JVM engine might not have all the capabilities to perform these operations, yet still want to print messages to the console, it's customary to write println() in the JVM implementation language. This was done in javaPrintStream.go, which implements various forms of println(). The comments in that file explain the way this is implemented.

The Java classes implemented in go are referred to as go-style functions. (They would normally be called native functions, but that is a term of art in Java, with a different meaning. Calling them go-type functions doesn't work well either, as type is an already overloaded word. Style might not be the permanent way of referring to this. In fact, the open issue JACOBIN-131 refers to this naming problem for methods.)

Go-style functions are loaded into Jacobin's method table (generally referred to as the MTable, or MT). The call to load the MTable with the go-style functions is made from the StartExec() function, which is where the actual bytecode execution begins. This is located in the jvm package in run.go. The MTable is part of the execution engine but is in the classloader package due to dependency requirements that go imposes. The MTable holds the name of a function in fully qualified form (e.g., java/lang/System.currentTimeMillis()), a pointer to the function body, and a flag stating whether the function is Java-style or go-style. The first time a function is searched for by the execution engine, a pointer to it is placed in the MTable. On future look-ups, the engine checks the MTable first before doing the slower search through the class data. Because the engine always checks the MTable for a method, inserting the go-style methods into it guarantees that they, rather than their Java counterparts, will always be executed.
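
The lookup behavior just described can be sketched as follows. This is an illustration under assumptions, not Jacobin's MTable code: the MTentry layout, the 'G'/'J' flag values, the classData map standing in for the slow search, and the FetchMethod() and loadGoStyleFunctions() names are all made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// MTentry is one row of the hypothetical method table.
type MTentry struct {
	Kind  byte                     // 'G' for go-style, 'J' for Java-style
	GFunc func(args []int64) int64 // set only for go-style functions
}

// MTable is JVM-wide and keyed by the fully qualified method name.
var MTable = map[string]MTentry{}

// classData stands in for the slower search through the parsed class data.
var classData = map[string]MTentry{
	"Hello.main()":                         {Kind: 'J'},
	"java/lang/System.currentTimeMillis()": {Kind: 'J'},
}

// slowLookups counts how often the slow path is taken.
var slowLookups int

// loadGoStyleFunctions inserts the go-style functions before execution starts.
func loadGoStyleFunctions() {
	MTable["java/lang/System.currentTimeMillis()"] = MTentry{
		Kind:  'G',
		GFunc: func(args []int64) int64 { return time.Now().UnixMilli() },
	}
}

// FetchMethod checks the MTable first and falls back to the class data,
// caching whatever it finds there for later lookups.
func FetchMethod(name string) (MTentry, error) {
	if entry, ok := MTable[name]; ok {
		return entry, nil
	}
	slowLookups++
	entry, ok := classData[name]
	if !ok {
		return MTentry{}, fmt.Errorf("method %s not found", name)
	}
	MTable[name] = entry
	return entry, nil
}

func main() {
	loadGoStyleFunctions()
	e, _ := FetchMethod("java/lang/System.currentTimeMillis()")
	fmt.Printf("kind=%c millis=%d\n", e.Kind, e.GFunc(nil))
	FetchMethod("Hello.main()")
	FetchMethod("Hello.main()")
	fmt.Println("slow lookups:", slowLookups)
}
```

Because the go-style entry is already in the table when the first lookup happens, the Java version of the same method in the class data is never reached.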

The use of go-style substitutes for Java functions should also have performance benefits--in essence, it acts as a method cache--and so it's likely to continue in use even after the initial stages of Jacobin are complete.

Note that this MTable is JVM-wide: all executed functions are cached in it. This is not how the MTable is usually implemented. More commonly, it's a per-class structure holding the function addresses for that class and for all the methods in that class's superclasses. This form of the MTable enables a lookup of all possible methods a class can execute without forcing the engine to climb the hierarchy of superclasses.

Eventually, when the class-specific MTable equivalent is added to Jacobin, it might make sense to get rid of the present MTable setup. It's not clear what exactly should be done and what the performance impacts are/would be. This will need to be addressed when handling task JACOBIN-120 (calling a superclass method), but it might well deserve its own entry in the task list.

❒ Design per-class MTables when implementing calls to superclass methods
❒ Decide what to do about the two kinds of MTables
❒ If the universal MTable model is retained, consider preloading all the main() class's methods into it, per JACOBIN-121

Execution engine

The Jacobin execution engine, like the OpenJDK's, consists of one or more executing threads, each operating on a stack of frames, each of which represents a call to a method. Each method is executed by sequentially executing the bytecodes. If the method calls another method, a new frame is created and pushed onto the frame stack and execution begins on the bytecode of the called method. When the called method exits, the frame is popped off the stack and execution resumes on the bytecode after the method call. When all the frames in the stack of the main() thread have been popped off, the program exits and the JVM shuts down.

In Jacobin, the execution engine's parts (except for the MTable) are primarily in the jvm module, with some elements also located in the thread module and the frames module.

Threads

Threads are simple data structures, defined in jvmThread.go, containing a few fields of metadata and a pointer to the frame stack, known as the JVM stack. Threads are created and then added to a linked list of threads, globals.Threads, in globals.go in the globals module, which is simply a repository for select global data. The MainThread (the one with the main() method) is created in the StartExec() function in jvm\run.go.

Frames

Frames are more complex data structures, defined in frames.go. They contain all the needed execution environment for the present executing method. In addition to expected metadata (thread number, method name, pointer to the CP for this class, etc.), a frame contains the local variables, the operand stack, and the program counter (PC), which holds the location of the currently executing bytecode.

The local variables and the nodes of the operand stack in Jacobin are all 64 bits wide. In the OpenJDK, these values are all 32 bits wide. Using 64 bits avoids some complexities. For example, all primitives as well as pointers to objects can fit in the same-sized value. One size fits all, which makes many operations straightforward. Passing arguments to methods is one example: go-style functions are passed an array of 64-bit values, which can be zero-sized if there are no args to pass in.

This design avoids a problem facing 64-bit values on the OpenJDK's operand stacks, which take up two 32-bit slots. To get the value, the two slots have to be soldered together. In go, that's not an easy or fast process. The 64-bit Jacobin approach does require care in certain operations involving longs and doubles so that they match what the OpenJDK does.
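
To make the difference concrete, the sketch below shows what splitting and rejoining a 64-bit value across two 32-bit slots involves, next to the single-slot alternative. The function names are hypothetical, and storing a double in a slot by its bit pattern is an assumption made for the example, not a statement of how Jacobin represents doubles.

```go
package main

import (
	"fmt"
	"math"
)

// splitSlots breaks a long into the two 32-bit slots a 32-bit-wide
// operand stack would need.
func splitSlots(v int64) (hi, lo int32) {
	return int32(v >> 32), int32(v)
}

// joinSlots rebuilds the long. The low slot must be zero-extended,
// not sign-extended, before it is combined with the high slot.
func joinSlots(hi, lo int32) int64 {
	return int64(hi)<<32 | int64(uint32(lo))
}

// doubleToSlot stores a double in one 64-bit slot by its bit pattern.
func doubleToSlot(d float64) int64 {
	return int64(math.Float64bits(d))
}

// slotToDouble recovers the double from a 64-bit slot.
func slotToDouble(s int64) float64 {
	return math.Float64frombits(uint64(s))
}

func main() {
	v := int64(1234567890123)
	hi, lo := splitSlots(v)
	fmt.Println("round trip through two slots ok:", joinSlots(hi, lo) == v)
	fmt.Println("double through one slot:", slotToDouble(doubleToSlot(3.5)))
}
```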

Frames in Jacobin each have their own program counter (PC). This is a slight variation from the OpenJDK JVM, which uses a single program counter in the thread metadata and updates it each time a frame is pushed or popped. Keeping the PC with the frame makes tracking easier.

The frame stack is not implemented as a stack, but as a linked list. The current frame is always the head node. New frames are pushed to the head and deleted there, which makes the next node the head node. The frame stack is defined in frames.go and created for the main thread in StartExec() in jvm\run.go.
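
A linked-list frame stack of this kind can be sketched with go's container/list package. The Frame fields and the helper names below are assumptions for illustration and are simpler than what frames.go defines.

```go
package main

import (
	"container/list"
	"fmt"
)

// Frame is a pared-down, hypothetical frame.
type Frame struct {
	MethName string
	PC       int     // each frame keeps its own program counter
	Locals   []int64 // local variables, one 64-bit slot each
	OpStack  []int64 // operand stack, one 64-bit slot each
}

// newFrameStack returns an empty frame stack.
func newFrameStack() *list.List {
	return list.New()
}

// pushFrame makes f the head node, and so the current frame.
func pushFrame(fs *list.List, f *Frame) {
	fs.PushFront(f)
}

// currentFrame returns the head node's frame, or nil if the stack is empty.
func currentFrame(fs *list.List) *Frame {
	head := fs.Front()
	if head == nil {
		return nil
	}
	return head.Value.(*Frame)
}

// popFrame removes the head node, which makes the next node the head.
func popFrame(fs *list.List) *Frame {
	head := fs.Front()
	if head == nil {
		return nil
	}
	return fs.Remove(head).(*Frame)
}

func main() {
	fs := newFrameStack()
	pushFrame(fs, &Frame{MethName: "main"})
	pushFrame(fs, &Frame{MethName: "callee"})
	fmt.Println("current:", currentFrame(fs).MethName)
	popFrame(fs)
	fmt.Println("after pop:", currentFrame(fs).MethName)
}
```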

Bytecode execution

When a method is loaded into a frame, bytecodes are executed sequentially unless there are jump instructions. The execution is done via a large switch statement in runFrame() in jvm\run.go with one branch for each bytecode. These operations are largely unremarkable, except for method invocation.
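
A toy version of such a switch-driven loop is sketched below. It is not runFrame(): the frame type and helper functions are hypothetical, and only six bytecodes are handled. The opcode values themselves come from the JVM spec. The iadd case also shows the kind of care 64-bit slots require, since int arithmetic must still wrap at 32 bits.

```go
package main

import (
	"errors"
	"fmt"
)

// Opcode values as defined in the JVM spec.
const (
	iconst1 = 0x04
	iconst2 = 0x05
	bipush  = 0x10
	iadd    = 0x60
	goTo    = 0xA7
	ireturn = 0xAC
)

// frame is a hypothetical, minimal frame for this sketch.
type frame struct {
	pc      int
	code    []byte
	opStack []int64
}

func push(f *frame, v int64) { f.opStack = append(f.opStack, v) }

func pop(f *frame) int64 {
	v := f.opStack[len(f.opStack)-1]
	f.opStack = f.opStack[:len(f.opStack)-1]
	return v
}

// run executes bytecodes sequentially, one switch branch per bytecode.
func run(f *frame) (int64, error) {
	for f.pc < len(f.code) {
		switch op := f.code[f.pc]; op {
		case iconst1:
			push(f, 1)
		case iconst2:
			push(f, 2)
		case bipush: // the operand is the signed byte that follows
			f.pc++
			push(f, int64(int8(f.code[f.pc])))
		case iadd: // wrap at 32 bits, as Java int addition does
			b, a := pop(f), pop(f)
			push(f, int64(int32(a)+int32(b)))
		case goTo: // signed 16-bit offset relative to this opcode
			offset := int16(f.code[f.pc+1])<<8 | int16(f.code[f.pc+2])
			f.pc += int(offset)
			continue
		case ireturn:
			return pop(f), nil
		default:
			return 0, fmt.Errorf("unsupported bytecode %#x at pc %d", op, f.pc)
		}
		f.pc++
	}
	return 0, errors.New("reached end of code without a return")
}

func main() {
	// bipush 40; goto +4 (skips iconst_1); iconst_1; iconst_2; iadd; ireturn
	code := []byte{0x10, 40, 0xA7, 0x00, 0x04, 0x04, 0x05, 0x60, 0xAC}
	result, err := run(&frame{code: code})
	fmt.Println(result, err)
}
```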

When a method is invoked (see invokestatic or invokevirtual bytecodes), the method is looked up in the MTable. If it's a go-style function, runGmethod() in goFunctionExect.go is called. This creates a new frame for the function, pushes it onto the frame stack, and begins execution of the go-style function. This execution is still done via the runFrame() method, which identifies the frame as pertaining to a go-style function and calls runGframe() in goFunctionExect.go. It returns any return value from the called function and an error code.

If the function returns a value, it is placed on the operand stack of the calling function. This is done in runFrame(). Essentially, the code reaches into the next-lower frame in the frame stack and places the returned value on its operand stack. This is done with this code (where fs is the frame stack):

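    if retval != nil {
        // the caller's frame is the node just below the head of the frame stack
        f = fs.Front().Next().Value.(*frames.Frame)
        // all values are 64 bits wide, so the push needs no type-specific handling
        push(f, retval.(int64))
    }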

Note the benefit of normalizing values to 64 bits. Java functions (and hence go-style functions) can return at most one value. By having all values normalized to 64 bits, any returned value can be pushed without regard to its type.

At present, all go-style functions are terminal functions, in that they do not call other JVM-based functions, whether Java-style functions or go-style. So, when the go-style function returns, the frame is popped off the stack and processing resumes at the next bytecode after the original method call in runFrame().

If the method being called is a Java-style function (the majority of cases), the logic takes place directly in the switch statement's section for the invoking bytecode. A new frame is created and any passed-in arguments are popped off the operand stack of the calling function and placed into the called method's local variables. runFrame() is then called (this is a recursive call) and frame handling as well as returned variables are handled as they are with go-style functions described above.
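
The argument hand-off for a Java-style call can be sketched as below. The frame type and the newCalleeFrame() helper are hypothetical, and the sketch assumes a static method, so that the first argument lands in local variable 0.

```go
package main

import "fmt"

// frame is a hypothetical, minimal frame for this sketch.
type frame struct {
	locals  []int64
	opStack []int64
}

func pop(f *frame) int64 {
	v := f.opStack[len(f.opStack)-1]
	f.opStack = f.opStack[:len(f.opStack)-1]
	return v
}

// newCalleeFrame creates the called method's frame and moves argCount
// arguments from the caller's operand stack into the callee's locals.
// The last argument is on top of the stack, so the locals are filled
// from the highest index down. maxLocals must be at least argCount.
func newCalleeFrame(caller *frame, argCount, maxLocals int) *frame {
	callee := &frame{locals: make([]int64, maxLocals)}
	for i := argCount - 1; i >= 0; i-- {
		callee.locals[i] = pop(caller)
	}
	return callee
}

func main() {
	caller := &frame{opStack: []int64{10, 20, 30}}
	callee := newCalleeFrame(caller, 2, 4)
	fmt.Println("callee locals:", callee.locals)
	fmt.Println("caller operand stack:", caller.opStack)
}
```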

At present, Jacobin does not handle exceptions, so execution proceeds safely and predictably as described here. Test program Hello3.java is a proof of concept of multiple levels of method calls.

❒ Implement invokespecial
❒ Implement remaining bytecodes
❒ Implement exceptions

Testing

From the very start, Jacobin has aimed for super-high reliability. Its goal is to fail only in ways the OpenJDK would fail. Consequently, it has a large testbed, currently at 2x the size of the codebase. All the modules contain go unit-test files, which end with the _test.go suffix. Some of these files are fairly substantial. For example, formatCheck_test.go is nearly 1900 lines long and contains a detailed inventory of the tests in the comments at the head of the file. Many of these files redirect stderr and stdout and test for correct messages to users or correct logged data in order to validate the test's success or failure.
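
One common way to redirect stderr so that a test can inspect what was written is to swap os.Stderr for the write end of a pipe. The sketch below shows the idea; captureStderr() and logWarning() are hypothetical names, and the Jacobin tests may do this differently. Note that a pipe has a limited buffer, so this simple form suits only short messages.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// logWarning is a stand-in for code under test that writes to stderr.
func logWarning(msg string) {
	fmt.Fprintln(os.Stderr, msg)
}

// captureStderr runs fn with os.Stderr redirected to a pipe and returns
// everything fn wrote to it.
func captureStderr(fn func()) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	orig := os.Stderr
	os.Stderr = w
	fn()
	os.Stderr = orig // restore before reading
	w.Close()        // closing the write end lets ReadAll see EOF
	out, err := io.ReadAll(r)
	r.Close()
	return string(out), err
}

func main() {
	out, err := captureStderr(func() { logWarning("class not found") })
	fmt.Printf("captured %q, err=%v\n", out, err)
}
```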

Some code coverage metrics are generated by go. Those metrics cover only the unit tests.

Beyond the unit tests are Jacobin's integration tests, which test a run of Jacobin against a single Java class. These tests are located in the wholeClassTests module. They rely on an executable version of Jacobin being located in the \src directory, which is called as a spawned process. These tests typically run Jacobin several times against a given class file found in the testdata folder. The source code for those Java classes is included as a comment in the test. For example, Hello3_test.go runs Jacobin against the class Hello3.class in \testclass. The Java source code for Hello3.class is found in the comments in Hello3_test.go. On each test run, different logging levels are set, and the data and output are verified. These tests take a long time to run. To run the Jacobin tests without running these integration tests, use the -short option on the command line. (Note that -short is an option in the standard go test framework.) For example:

go test -short .\...

Eventually, Jacobin will be tested against standard JVM test suites, likely starting with the Malva test suite, then Eclipse AQAvit, and progressing to the Oracle TCK test suites.

[Note: this description is accurate as of March 2022.]