@mattcce
Contributor

@mattcce mattcce commented Jul 30, 2025

Based on top of #78.

Note to reviewers: This PR appears much, much larger than it actually is because a formatting change was previously applied to the CSE machine files, but was not applied to test files. The test files were formatted in this PR as they were updated here.

Introduction

This feature adds precompiled classes for the CSEC machine to use. This is analogous to the precompiled library classes offered by a JDK (commonly referred to as the Java SE API).

Motivation

Library classes and methods implemented in them can be written in Java code. Java programmers have come to naturally expect that common library classes and their methods are readily available. Some of these (in method reference syntax) include:

  • System.out::println
  • Object::hashCode
  • Object::toString

Instead of injecting these classes as a preamble every time a program is run in the CSE machine, we 'precompile' them, and have them initialised in the environment from the beginning in a single step to avoid extraneous CSE machine states caused by evaluating library classes.

Implementation

Injecting Precompiled Classes

We can 'precompile' Java classes for the CSE machine simply by parsing them and running them as the CSE machine would. The environment that this results in is precisely the initialised environment that should be used for all other programs run in the CSE machine.

We do this before running the main program. The CSEC machine is made to ingest the library classes and thus make them available in the environment before running a new program.
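The two-step flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `Machine`, `runWithLibrary`, and the toy machine (which merely records the class names it sees) are hypothetical names introduced here.

```typescript
// Hypothetical abstraction of a machine that evaluates source text
// starting from a given environment and returns the resulting environment.
interface Machine<Env> {
  emptyEnvironment(): Env;
  evaluate(source: string, env: Env): Env;
}

// Step 1: evaluate the library classes once to obtain the initialised environment.
// Step 2: evaluate the user's program starting from that environment.
function runWithLibrary<Env>(machine: Machine<Env>, librarySource: string, program: string): Env {
  const initialised = machine.evaluate(librarySource, machine.emptyEnvironment());
  return machine.evaluate(program, initialised);
}

// Toy stand-in for the real machine (assumption): the environment is just
// the ordered list of class names declared so far.
const toyMachine: Machine<string[]> = {
  emptyEnvironment: () => [],
  evaluate: (source: string, env: string[]) => {
    const found: string[] = [];
    const pattern = /class\s+(\w+)/g;
    let match = pattern.exec(source);
    while (match !== null) {
      found.push(match[1]);
      match = pattern.exec(source);
    }
    return env.concat(found);
  },
};

const finalEnv = runWithLibrary(
  toyMachine,
  "class Object {} class PrintStream {}",
  "class Main {}"
);
// finalEnv lists the library classes first, then the program's classes.
```

The point of the split is that the library evaluation happens outside the steps shown to the user, so only the second call contributes visible machine states.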

Type Checking Modifications

The type checker must also be able to see such library classes. We achieve this in a similar manner as before, by injecting the type definitions into the type checker. We do so here by just injecting the library code into the type checker before the rest of the program code.

We modify the type checker to ingest any number of programs. It will type check them as though the programs were concatenated one after the other (in order).

We further modify the type checker to use the supplied Object class if provided with one, else it retains its present behaviour.
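A rough sketch of these two type checker changes is given below. The types `ClassDecl` and `Program`, the function `collectClasses`, and the fallback `BUILTIN_OBJECT` are hypothetical and exist only for illustration; the real type checker's data structures are not shown in this PR description.

```typescript
// Hypothetical, heavily simplified class declaration.
interface ClassDecl {
  name: string;
  methods: string[];
}
type Program = ClassDecl[];

// Assumed stand-in for the type checker's existing built-in Object.
const BUILTIN_OBJECT: ClassDecl = { name: "Object", methods: [] };

// Treat any number of programs as though they were concatenated in order.
// If one of them supplies Object, use it; otherwise keep the built-in one.
function collectClasses(programs: Program[]): ClassDecl[] {
  const all: ClassDecl[] = [];
  programs.forEach((program) => program.forEach((decl) => all.push(decl)));
  const suppliesObject = all.some((decl) => decl.name === "Object");
  return suppliesObject ? all : [BUILTIN_OBJECT].concat(all);
}

const libraryProgram: Program = [{ name: "Object", methods: ["hashCode", "toString"] }];
const userProgram: Program = [{ name: "Main", methods: ["main"] }];

const withLibrary = collectClasses([libraryProgram, userProgram]);
const withoutLibrary = collectClasses([userProgram]);
```

With the library supplied, `Object` comes from the library and carries its methods; without it, the checker falls back to its built-in definition.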

A future PR might look at injecting the definitions directly into the type checker (that is, effectively bypassing type checking for library code). This option is more sensible, but may presently be more complicated than is warranted. The intent is to avoid repeated work and to keep externally imported classes from visibly interfering with the running of the CSE machine.

The frontend presently calls the type checker before attempting to run any Java code at all (whether it be through the compiler or the CSE machine). The target of interest is in frontend/src/commons/utils/JavaHelper.ts. The function JavaRun handles calls to the Java CSE machine. Only after the type checker runs successfully will the frontend attempt to run the Java program.

This is likely a bad way to do it, because it couples the compiler and CSE machine. A minimal modification has been made to call the CSE machine directly and let the CSE machine handle type checking. This lets us decide what needs to get type-checked. A future PR might look at refactoring this and unifying the different components to use the same library classes.

Note: The library classes implementable in the CSE machine and the skeleton classes defined in the JVM (which are presently unimplemented) may genuinely clash. A future PR might look at the feasibility and sensibility of refactoring library classes, and in particular native methods, to be unified across all components. Library code is no different from normal Java code, so the fact that all of the components presently handle Object so specially is concerning (and was a significant source of pain in implementing this feature).

As for this particular feature, the type checker can be modified to include the library classes. We isolate the library classes to only the CSE machine.

CSEC Machine Callbacks and Static Lifetimes

In order to implement the target primitives (described below), a number of changes were made to the CSEC Machine's handling of Contexts. A new field, interfaces, holds all the IO callbacks (stdout, stderr) and new static lifetime items (e.g. LFSR). Various CSEC Machine components have been modified accordingly to propagate this change. Interfaces are presently only used when invoking methods, as only native methods should require and exercise access to items external to the CSEC machine.
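The shape of this change might look roughly like the sketch below. Only the field name `interfaces` and its contents (stdout, stderr, an LFSR) come from the description above; the exact types, the `statics` grouping, the `Method` union, and `invoke` are assumptions made for illustration.

```typescript
// Assumed shape of the new field: IO callbacks plus static-lifetime items.
interface Interfaces {
  stdout: (text: string) => void;
  stderr: (text: string) => void;
  statics: { lfsr: { next(): number } };
}

// Other Context fields are elided.
interface Context {
  interfaces: Interfaces;
}

// Hypothetical method representation: only native methods receive the interfaces.
type Method =
  | { kind: "native"; run: (args: unknown[], io: Interfaces) => unknown }
  | { kind: "java"; run: (args: unknown[]) => unknown };

function invoke(method: Method, args: unknown[], context: Context): unknown {
  if (method.kind === "native") {
    return method.run(args, context.interfaces);
  }
  return method.run(args);
}

// Demo wiring: stderr is captured in an array, the LFSR is a stub.
const errors: string[] = [];
const context: Context = {
  interfaces: {
    stdout: () => {},
    stderr: (text: string) => { errors.push(text); },
    statics: { lfsr: { next: () => 1 } },
  },
};

const warn: Method = {
  kind: "native",
  run: (args: unknown[], io: Interfaces) => { io.stderr(String(args[0])); return null; },
};
const add: Method = {
  kind: "java",
  run: (args: unknown[]) => (args[0] as number) + (args[1] as number),
};

const sum = invoke(add, [2, 3], context);
invoke(warn, ["oops"], context);
```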

The need to propagate changes to other components of the CSEC machine uncovers a level of coupling that is potentially undesirable from a software engineering perspective. It may just be better to pass in the full Context to each CmdEvaluator. (I say potentially because such changes are generally exceedingly rare.)


Implemented Primitives

The following primitive functions are supplied as part of this feature.

Object::hashCode

Because switching states in the CSEC machine always refreshes/reruns the entire program, the hash code generator must be deterministic for a given program, so that hash codes are consistently generated. The easiest way to implement this is to use a PRNG and seed it with the program string (or some constant transformation thereof).

To seed the PRNG with the program string, we first hash it with the FNV-1a algorithm (a very simple hashing algorithm). We do not have to worry about collisions, because this hash is not used for anything collision-sensitive. Its only purpose is to initialise the PRNG deterministically.
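For reference, a 32-bit FNV-1a hash can be sketched as below. The offset basis and prime are the standard 32-bit FNV constants. Folding in each UTF-16 code unit directly (rather than encoding the string to bytes first) is a simplification assumed here; it matches byte-wise FNV-1a only for ASCII input, and may differ from what the PR actually does.

```typescript
// 32-bit FNV-1a over the UTF-16 code units of a string (sketch).
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // standard 32-bit FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the 32-bit FNV prime 16777619 (2^24 + 2^8 + 0x93) using shifts,
    // which keeps every intermediate value exactly representable.
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
  }
  return hash >>> 0; // reinterpret as an unsigned 32-bit integer
}

const emptyHash = fnv1a32("");  // 0x811c9dc5, the offset basis itself
const hashOfA = fnv1a32("a");   // 0xe40c292c
```

Since the same program string always yields the same hash, the PRNG seed is stable across reruns of the same program.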

JS/ECMAScript does not specify that the built-in random number generator accepts a seed. A PRNG is therefore implemented for the sole purpose of generating object hash codes. We use an incredibly simple PRNG architecture called a linear feedback shift register (LFSR).

An LFSR is initialised for the Context every time the CSEC machine is run with a program.

We skip the mathematical description. The taps selected for the LFSR guarantee that it is maximal-length. Due to issues with JS' internal representation of numbers, we select a 31-bit maximal-length Fibonacci LFSR with taps at bits 31, 30, 29, and 28.

  • In exceedingly rare cases, these functions may fail. In particular, the LFSR fails for the values 1 << 31 and 0, for different reasons.
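A 31-bit Fibonacci LFSR with the taps named above could be sketched as follows. Several details are assumptions rather than the PR's code: the class name `Lfsr31`, the convention that tap n corresponds to a right shift of 31 - n, and the choice to mask the seed to 31 bits and substitute 1 when the result is zero. Under this masking, both 0 and 1 << 31 would otherwise give the all-zero state, from which an LFSR never leaves.

```typescript
const MASK_31 = 0x7fffffff;

// 31-bit Fibonacci LFSR with taps at bits 31, 30, 29, and 28 (sketch).
// Keeping the state below 2^31 avoids the sign bit of JS 32-bit bitwise operations.
class Lfsr31 {
  private state: number;

  constructor(seed: number) {
    const masked = seed & MASK_31;
    // Assumption: fall back to 1 rather than start in the all-zero state.
    this.state = masked === 0 ? 1 : masked;
  }

  next(): number {
    const s = this.state;
    // XOR the tapped bits (shifts 0, 1, 2, 3 under the assumed convention).
    const bit = (s ^ (s >>> 1) ^ (s >>> 2) ^ (s >>> 3)) & 1;
    // Shift right and feed the new bit in at the top (bit 30).
    this.state = (s >>> 1) | (bit << 30);
    return this.state;
  }
}

const first = new Lfsr31(1).next(); // 0x40000000

// Two generators with the same seed (for example, a hash of the program
// string) produce the same sequence.
const a = new Lfsr31(12345);
const b = new Lfsr31(12345);
const sequencesMatch = [a.next(), a.next(), a.next()].join(",") === [b.next(), b.next(), b.next()].join(",");
```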

We store the hash code directly on the target that is passed to this method. This is done to mimic JVM implementations. Hash codes are not to be stored as fields on the object instance, as the hashCode method is a native method and thus calls a foreign function; this implementation mirrors the note in the Java SE API documentation for Object::hashCode:

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)"
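One way to keep the hash code on the target without exposing it as a Java field is sketched below. The `HeapObject` layout with a separate `header` slot, and the function `identityHashCode`, are hypothetical; the PR description does not show how the CSEC machine represents objects.

```typescript
// Hypothetical runtime representation of an object instance.
interface HeapObject {
  className: string;
  fields: { [name: string]: unknown }; // visible to Java code
  header: { hashCode?: number };       // runtime-only metadata, not a Java field
}

// Assign a hash code lazily on first use and return the same value afterwards.
function identityHashCode(target: HeapObject, nextRandom: () => number): number {
  let code = target.header.hashCode;
  if (code === undefined) {
    code = nextRandom();
    target.header.hashCode = code;
  }
  return code;
}

// Demo with a counter standing in for the PRNG.
let counter = 100;
const nextFromCounter = () => ++counter;

const obj: HeapObject = { className: "Main", fields: {}, header: {} };
const firstCall = identityHashCode(obj, nextFromCounter);
const secondCall = identityHashCode(obj, nextFromCounter);
```

The second call returns the stored value without drawing from the generator again, so an object's hash code stays stable for its lifetime.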

System.out::println

This is an interesting native function to implement. We are forced to properly mimic/mirror the Java class structure used.

System.out is a PrintStream instance. In this implementation, we just create a blank PrintStream instance and make PrintStream::println native instead.

This is purely so that we can have the same syntax for printing lines. Notably, we do not attempt to build the System class and PrintStream object to match Java's.
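The arrangement can be pictured with the sketch below: `System.out` is a blank `PrintStream` instance, and the call is resolved to a native keyed by class and method name. The `Instance` type, the `natives` table, the key format, and `callNative` are all assumptions for illustration and are not taken from the PR's code.

```typescript
// Hypothetical runtime instance: a class name plus Java-visible fields.
interface Instance {
  className: string;
  fields: { [name: string]: unknown };
}

type Native = (receiver: Instance, args: unknown[], stdout: (text: string) => void) => void;

// A blank PrintStream instance exposed as the static field System.out.
const printStream: Instance = { className: "PrintStream", fields: {} };
const systemStatics: { [name: string]: Instance } = { out: printStream };

// Assumed registry of native methods, keyed as "Class::method".
const natives: { [signature: string]: Native } = {
  "PrintStream::println": (_receiver, args, stdout) => {
    stdout(String(args[0]) + "\n");
  },
};

function callNative(
  receiver: Instance,
  method: string,
  args: unknown[],
  stdout: (text: string) => void
): void {
  const native = natives[receiver.className + "::" + method];
  if (native === undefined) {
    throw new Error("no native registered for " + receiver.className + "::" + method);
  }
  native(receiver, args, stdout);
}

// Equivalent of System.out.println("hello"), with stdout captured in an array.
const printed: string[] = [];
callNative(systemStatics["out"], "println", ["hello"], (text) => { printed.push(text); });
```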

Object::toString

We follow the Java SE API documentation for this method, which specifies the returned string as:

getClass().getName() + '@' + Integer.toHexString(hashCode())

As previously explained, this relies on the behaviour of hashCode. We do not yet have reflection in the CSE machine, so we implement this as a native method (even though, of course, this could be implemented entirely in Java).
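On the native side, the formula reduces to string concatenation once the class name and hash code are known. The sketch below assumes the machine can hand both to the native directly (the function names are hypothetical). `Integer.toHexString` treats its argument as unsigned, which `>>> 0` reproduces.

```typescript
// Mirrors Integer.toHexString: the int is formatted as an unsigned 32-bit value.
function toHexString(value: number): string {
  return (value >>> 0).toString(16);
}

// Mirrors getClass().getName() + '@' + Integer.toHexString(hashCode()).
function objectToString(className: string, hashCode: number): string {
  return className + "@" + toHexString(hashCode);
}

const small = objectToString("Main", 255);   // "Main@ff"
const negative = objectToString("Main", -1); // "Main@ffffffff"
```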

@mattcce mattcce self-assigned this Jul 30, 2025
@mattcce mattcce added the enhancement New feature or request label Jul 30, 2025
@github-actions

github-actions bot commented Jul 30, 2025

Coverage report

| St. | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🟡 | Statements | 73.65% (-0.3% 🔻) | 7260/9858 |
| 🟡 | Branches | 60.41% (+0.01% 🔼) | 2383/3945 |
| 🟡 | Functions | 68.94% (-0.52% 🔻) | 1296/1880 |
| 🟡 | Lines | 74.52% (-0.37% 🔻) | 6827/9161 |

New covered files 🐣

| St. | File | Statements | Branches | Functions | Lines |
| --- | --- | --- | --- | --- | --- |
| 🔴 | ... / natives.ts | 17.39% | 0% | 0% | 17.39% |
| 🟡 | ec-evaluator/lib.ts | 62.5% | 100% | 66.67% | 60% |
| 🔴 | ... / index.ts | 59.18% | 60% | 37.5% | 53.49% |
| 🔴 | types/index.ts | 35% | 0% | 0% | 33.33% |

Files with reduced coverage 🔻

| St. | File | Statements | Branches | Functions | Lines |
| --- | --- | --- | --- | --- | --- |
| 🔴 | ... / errors.ts | 42.03% (-1.05% 🔻) | 27.27% | 14.71% (-0.92% 🔻) | 43.94% (-1.22% 🔻) |
| 🟢 | ... / interpreter.ts | 98.77% (-0.27% 🔻) | 91.36% (-0.53% 🔻) | 98.15% | 98.68% (-0.29% 🔻) |
| 🟢 | ... / nodeCreator.ts | 96.77% (-3.23% 🔻) | 100% | 90% (-10% 🔻) | 100% |
| 🟢 | ... / index.ts | 73.85% (-0.52% 🔻) | 52.38% (-0.63% 🔻) | 95.83% (+0.18% 🔼) | 83.41% (-0.84% 🔻) |
| 🟡 | types/errors.ts | 63.64% (-0.88% 🔻) | 0% | 41.38% (-3.07% 🔻) | 65.08% (-1.02% 🔻) |
| 🟢 | ... / prechecks.ts | 81.58% (-1.52% 🔻) | 55.88% (-2.18% 🔻) | 100% | 86.15% (-2.18% 🔻) |
| 🔴 | ... / extractor.ts | 48.65% (-0.13% 🔻) | 36.97% (-0.09% 🔻) | 45.28% | 52.81% (-0.17% 🔻) |

Test suite run success

1123 tests passing in 64 suites.

Report generated by 🧪jest coverage report action from 4d10a34

@martin-henz
Member

Good progress. Test cases currently failing; can you take a look?

@mattcce
Contributor Author

mattcce commented Jul 31, 2025

Oops — should've marked this one as draft. Still quite a bit of baking left to do for this one.

@mattcce mattcce marked this pull request as draft July 31, 2025 06:38
@mattcce mattcce mentioned this pull request Sep 3, 2025
2 tasks
@mattcce mattcce marked this pull request as ready for review October 17, 2025 07:27