CSEC: Precompiled classes #79
Open: mattcce wants to merge 14 commits into main from ece-precompiled-classes
Coverage report
Test suite run success: 1123 tests passing in 64 suites. Report generated by 🧪 jest coverage report action from 4d10a34.
Good progress. Test cases currently failing; can you take a look?
Oops, should've marked this one as draft. Still quite a bit of baking left to do for this one.
Removed the expectation of an empty `Object` class from all tests, since that behaviour is now deprecated. A blank `Object` class will instead be synthesized by a utility function.
Previously, the library classes would be injected but the evaluator would not run the program to completion as it was limited by the input target step.
Based on top of #78.
Note to reviewers: This PR appears much, much larger than it actually is because a formatting change was previously applied to the CSE machine files, but was not applied to test files. The test files were formatted in this PR as they were updated here.
Introduction
This feature adds precompiled classes for the CSEC machine to use. This is analogous to the precompiled library classes offered by a JDK (also commonly referred to as the Java SE API).
Motivation
Library classes and methods implemented in them can be written in Java code. Java programmers have come to naturally expect that common library classes and their methods are readily available. Some of these (in method reference syntax) include:
- `System.out::println`
- `Object::hashCode`
- `Object::toString`

Instead of injecting these classes as a preamble every time a program is run in the CSE machine, we 'precompile' them, and have them initialised in the environment from the beginning in a single step to avoid extraneous CSE machine states caused by evaluating library classes.
Implementation
Injecting Precompiled Classes
We can 'precompile' Java classes for the CSE machine simply by parsing them and running them as the CSE machine would. The environment that this results in is precisely the initialised environment that should be used for all other programs run in the CSE machine.
We do this before running the main program. The CSEC machine is made to ingest the library classes and thus make them available in the environment before running a new program.
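The idea can be illustrated with a deliberately simplified model. This is only a sketch and not the PR's actual code: `ClassDecl`, `Environment`, `evaluate`, and `runProgram` are hypothetical names, and "running" a program is reduced to binding class declarations in an environment.

```typescript
// Hypothetical, simplified model: a program is a list of class declarations,
// and the environment maps class names to those declarations.
type ClassDecl = { name: string; methods: string[] };
type Environment = Map<string, ClassDecl>;

// Stand-in for running a program in the machine: evaluating a class
// declaration binds it in a fresh copy of the given environment.
function evaluate(program: ClassDecl[], env: Environment): Environment {
  const next: Environment = new Map(env);
  for (const decl of program) {
    next.set(decl.name, decl);
  }
  return next;
}

// Assumed library contents, for illustration only.
const libraryClasses: ClassDecl[] = [
  { name: "Object", methods: ["hashCode", "toString"] },
  { name: "PrintStream", methods: ["println"] },
];

// 'Precompile' once: the environment left behind by the library classes
// becomes the starting environment for every user program.
const initialEnv: Environment = evaluate(libraryClasses, new Map());

function runProgram(program: ClassDecl[]): Environment {
  return evaluate(program, initialEnv);
}
```

Because `evaluate` copies the environment in this sketch, each user program starts from the same library-only environment and cannot leak its own classes into the next run.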
Type Checking Modifications
The type checker must also be able to see such library classes. We achieve this in a similar manner as before: the library code is injected into the type checker ahead of the rest of the program code.
We modify the type checker to ingest any number of programs. It will type check them as though the programs were concatenated one after the other (in order).
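A minimal sketch of the "check several programs as if concatenated" behaviour, under heavy simplification. The names `ClassDecl`, `Program`, `checkSingle`, and `checkAll` are hypothetical, and the only rule checked here is that a superclass must be declared somewhere; the real type checker does far more.

```typescript
// Hypothetical, simplified declarations: a class with an optional superclass.
type ClassDecl = { name: string; superclass?: string };
type Program = ClassDecl[];

// Toy check for one program: every named superclass must be declared.
function checkSingle(program: Program): string[] {
  const declared = new Set<string>(program.map((c) => c.name));
  const errors: string[] = [];
  for (const c of program) {
    if (c.superclass !== undefined && !declared.has(c.superclass)) {
      errors.push(c.name + ": unknown superclass " + c.superclass);
    }
  }
  return errors;
}

// Accept any number of programs and check them as one concatenated program,
// preserving the order in which they were supplied.
function checkAll(...programs: Program[]): string[] {
  const combined: Program = [];
  for (const p of programs) {
    combined.push(...p);
  }
  return checkSingle(combined);
}
```

With this shape, library code can be passed as the first program and user code as the second, so user classes can refer to library classes without the user program declaring them.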
We further modify the type checker to use the supplied `Object` class if provided with one, else it retains its present behaviour.
A future PR might look at injecting the definitions directly into the type checker (that is, effectively bypass type checking for library code). This option is more sensible, but may be presently too complicated to be necessary. The intent is to avoid repeated work and to keep externally imported classes from visibly interfering with the running of the CSE machine.
The frontend presently calls the type checker before attempting to run any Java code at all (whether it be through the compiler or the CSE machine). The target of interest is in `frontend/src/commons/utils/JavaHelper.ts`. The function `JavaRun` handles calls to the Java CSE machine. Only after the type checker runs successfully will the frontend attempt to run the Java program.
This is likely a bad way to do it, because it couples the compiler and CSE machine. A minimal modification has been made to call the CSE machine directly and let the CSE machine handle type checking. This lets us decide what needs to get type-checked. A future PR might look at refactoring this and unifying the different components to use the same library classes.
Note: The library classes implementable in the CSE machine and the skeleton classes defined in the JVM (which are presently unimplemented) may genuinely clash. A future PR might look at the feasibility and sensibility of refactoring library classes, and in particular native methods, to be unified across all components. JLS code is no different from normal Java code, so the fact that all of the components presently handle `Object` so specially is concerning (and was a significant source of pain in implementing this feature).
As for this particular feature, the type checker can be modified to include the library classes. We isolate the library classes to only the CSE machine.
CSEC Machine Callbacks and Static Lifetimes
In order to implement the target primitives (described below), a number of changes were made to the CSEC Machine's handling of `Context`s. A new field, `interfaces`, holds all the IO callbacks (`stdout`, `stderr`) and new static lifetime items (e.g. `LFSR`). Various CSEC Machine components have been modified accordingly to propagate this change. Interfaces are presently only used when invoking methods, as only native methods should require and exercise access to items external to the CSEC machine.
The need to propagate changes to other components of the CSEC machine uncovers a level of coupling that may be potentially undesirable from a software engineering perspective. It may just be better to pass in the full `Context` to each `CmdEvaluator`. (I say potentially because such changes are generally exceedingly rare.)
Implemented Primitives
The following primitive functions are supplied as part of this feature.
Object::hashCode
Because switching states in the CSEC machine always refreshes/reruns the entire program, the hash code generator must be deterministic for a given program, so that hash codes are consistently generated. The easiest way to implement this is to use a PRNG and seed it with the program string (or some constant transformation thereof).
To seed the PRNG with the program string, we first hash it with the FNV1A algorithm (a very simple hashing algorithm). We never have to worry about collisions, because we don't use this hash for anything collision-sensitive at all. This hash is purely just to initialise the PRNG to be used in a deterministic manner.
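For reference, a 32-bit FNV-1a hash can be sketched as below. This is not necessarily the exact code in this PR: the function name `fnv1a32` is hypothetical, and the sketch assumes the hash is taken over the string's UTF-16 code units (which coincides with the byte-wise definition for ASCII input).

```typescript
// 32-bit FNV-1a: start from the offset basis, then for each input unit
// XOR it into the hash and multiply by the FNV prime (mod 2^32).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i); // assumption: hash UTF-16 code units
    hash = Math.imul(hash, 0x01000193); // 32-bit FNV prime, wrapping multiply
  }
  return hash >>> 0; // reinterpret as an unsigned 32-bit integer
}

// The same program string always yields the same seed.
const seed = fnv1a32("class Main { public static void main(String[] args) {} }");
```

`Math.imul` is used because a plain `*` would exceed the range in which JS numbers represent integers exactly.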
JS/ECMAScript does not specify a seedable random number generator (`Math.random` accepts no seed), so a PRNG is implemented solely for generating object hash codes. We use an incredibly simple PRNG architecture called a linear feedback shift register (LFSR).
An LFSR is initialised for the `Context` every time the CSEC machine is run with a program.
We skip the mathematical description. The taps selected for the LFSR guarantee that it is of maximal length. Due to issues with JS' internal representation of numbers, we select a 31-bit maximal-length Fibonacci LFSR with taps at bit 31, 30, 29, and 28.
`1 << 31` and `0`, for different reasons.
We store the hash code directly on the target that is passed to this method. This is done to mimic JVM implementations. Hash codes are not to be stored as fields on the object instance as the `hashCode` method is a native method and thus calls a foreign function; this implementation mirrors the recommendation made by the JLS.
System.out::println
This is an interesting native function to implement. We are forced to properly mimic/mirror the Java class structure used.
`System.out` is a `PrintStream` instance. In this implementation, we just create a blank `PrintStream` instance and make `PrintStream::println` native instead.
This is purely just so that we can have the same syntax for printing lines. Notably, we do not care about properly building the `System` class and `PrintStream` object involved to match that of Java's.
Object::toString
We follow the JLS for this method:
As previously explained, this relies on the behaviour of `hashCode`. We do not yet have reflection in the CSE machine, so we implement this as a native method (even though, of course, this could be implemented entirely in Java).
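As an illustration of how the two native methods could fit together, here is a sketch under assumed names (`JavaObject`, `nativeHashCode`, `nativeToString`, and the `nextRandom` callback are hypothetical). It follows the format documented for Java's `Object.toString`, namely the class name, `@`, and the hash code in unsigned hexadecimal, and it caches the hash code on the machine-level object rather than in the Java-level field table.

```typescript
// Hypothetical machine-level representation of a Java object instance.
interface JavaObject {
  className: string;
  fields: Map<string, unknown>; // Java-level instance fields
  hashCode?: number; // machine-level slot, deliberately not a Java field
}

// Assign a hash code on first use and reuse it afterwards.
function nativeHashCode(target: JavaObject, nextRandom: () => number): number {
  if (target.hashCode === undefined) {
    target.hashCode = nextRandom();
  }
  return target.hashCode;
}

// className + "@" + hexadecimal hash code; assumes a non-negative hash code.
function nativeToString(target: JavaObject, nextRandom: () => number): string {
  return target.className + "@" + nativeHashCode(target, nextRandom).toString(16);
}

const obj: JavaObject = { className: "Main", fields: new Map<string, unknown>() };
const text = nativeToString(obj, () => 255);
```

Because the hash code is cached on the target, repeated calls agree with each other even if the generator has advanced in the meantime.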