Skip to content

Jp/wip/epm#140

Merged
melkyades merged 42 commits intomainfrom
jp/wip/epm
May 4, 2026
Merged

Jp/wip/epm#140
melkyades merged 42 commits intomainfrom
jp/wip/epm

Conversation

@melkyades
Copy link
Copy Markdown
Contributor

jp/wip/epm — EPM, kernel/runtime fixes, and bootstrap-from-source

This branch lands a long line of work around the new EPM (Egg Package Manager)
module, the supporting kernel/VM/runtime fixes that bootstrap from source needs,
and a batch of docs + test infrastructure.

Highlights

EPM and module loading

  • [modules/epm] — new EPM module with config-driven workflows
    (new/init/start/dev/install/list/test), built on top of...
  • [argparser] — new ArgParser module (commands, options, parse results).
  • [modules/toml] — TOML parser/writer with tests (used for EPM project config).
  • [loader] — support dotted module names and load-from-path.
  • [kernel/host] — add filesystem & env primitives (writeFile,
    createDirectory, pathExists, currentDirectory, getEnv,
    loadModuleFromPath) and search-path-driven HostSystem>>load:.

Kernel & VM

  • [kernel]ProtoObject _uLongAtOffset:/_uLongAtValidOffset:put:,
    WideSymbol class>>intern: arg fix, ReadStream peek refactor, property
    table fix.
  • [vm]SmallInteger arithmetic/bitwise overflow fallback,
    Float>>timesTwoPower: primitive fix.
  • [runtime] small fixes: debugRuntime, Character,
    loadModuleFromPath_; GC-protect DynamicSymbolProvider symbol table.
  • [runtime/allocator] fix commitMemoryUpTo_ length calculation.
  • [runtime/primitive] fix StringReplaceFromToWithStartingAt bounds.
  • [runtime/evaluator] warn instead of aborting on missing primitive
    (so bootstrap can proceed when a host primitive is unwired).
  • [modules/ston] initialize STONWriter on module load.

Compiler

  • [compiler] LargeInteger as little-endian bytes; parser cleanup;
    bootstrap helpers.
  • [compiler/inliner] do not inline any cascade message (fixes a segfault
    triggered by expr foo ifTrue: [...] ; bar).

Bootstrap-from-source

  • [bootstrap/tonel] support Extension type, comments, method category,
    and $-escaped delimiters.
  • [bootstrap] extract MethodDictBuilder abstraction.
  • [runtime/bootstrap] add Catch2 tests for parser, source loader,
    and compilation pipeline.
  • [compiler/tests] new Compiler.Tests module with SmalltalkScanner
    tests.

Tonel / SUnit

  • [tonel] fix block reader, sort methods, write class comment, add tests.
  • [sunit] TestSuite class>>forModule:, TestSuite>>runDebug for
    log-as-you-go runs.

Misc

  • [random] new PCG XSH-RR PRNG module.
  • [bare-tests] add bitwise/shift regression tests and runnable main:,
    including test182BitShiftRightFullWidth.
  • [build] default Debug to -O1 for a usable VM speed.
  • [runtime/build] migrate conanfile.txtconanfile.py (pins
    cppstd=20, declares catch2/2.13.10); drop -s compiler.cppstd=20
    from the Makefile.
  • [docs] extract coding-style rules to CODING_STYLE.md, add C++
    coding-style guide, runtime/Bootstrap/Compiler READMEs, refresh
    build/run instructions; fix commit-tag format example.
  • [chore] drop committed .vscode/settings.json; ignore build dirs.

Testing

  • Compiler.Tests
  • compiler_tests (Catch2):
  • bootstrapper_parser_tests (Catch2):
  • ./egg TinyBenchmarks runs cleanly.

melkyades added 30 commits April 3, 2026 22:44
Initial implementation of a CLI argument-parsing module used by EPM.

  * ArgParser: top-level parser with global options and named commands;
    extracts globals, dispatches to a Command and runs its action block.
  * Command: per-command options, positionals, subcommands, defaults,
    and required-option validation.
  * Option: short/long flag, value/no-value, default and required.
  * ParseResult: bag of parsed options + positionals + selected command.
  * ArgParserModule: imports Error and OrderedDictionary from Kernel.
Initial EPM (Egg Package Manager) entry-point module exposing a single
`test <module-name>` command that loads <name>.Tests and runs it,
optionally under a debug suite that does not swallow exceptions.

  * EPMModule>>main: wires up an ArgParser and dispatches.
  * commandTest: loads <name>.Tests and either runs main: or, with
    --debug, builds a TestSuite from its TestCase subclasses and runs
    it via runDebug.
  * Imports Kernel, ArgParser and SUnit.
A PCG XSH-RR pseudo-random number generator with 64-bit internal state,
ported from Pharo's Random class and M.E. O'Neill's PCG implementation
(MIT / Apache-2.0).

  * Random: float and integer generators (next, next:, nextInteger:,
    nextBetween:and:, nextIntegerBetween:and:); state held in a 1-elem
    Array mutated in place by primitiveRandomNumber:; seed: resets both
    the seed ivar and the internal state for save/restore.
  * RandomModule: module spec and imports.
  * Tests/RandomTest + Tests/TestsModule: SUnit tests covering range,
    determinism with fixed seed, and bulk generation.
…tests

TonelReader>>nextBlock rewritten as a clearer nested-bracket scanner that
treats $$ as an escape (so $$, $[ and $] inside source no longer skew the
nesting count) and raises an error on unterminated blocks instead of
spinning past end-of-stream.

TonelReader>>readComments now stores the parsed class comment in the
spec under #comment so it survives a round-trip.

TonelWriter:
  * sortedMethods groups class-side methods first, then instance-side,
    each sorted by selector, and writeMethods uses it for stable output.
  * position:afterSelector: replaces the missing Stream>>nextKeyword
    call with a local skipIdentifierIn: helper that skips alphanumeric
    and underscore characters (matching the original semantics).
  * writeComments emits "..." when the class has a non-empty comment.
  * writeMethod: uses sourceObject (guarding against nil) so methods
    without source are skipped instead of crashing.

New Tonel/Module.st extension adds writeSourceTo:/writeExtensionsTo:/
writeModuleClassTo: helpers to dump a module to a Tonel directory.

Tonel/Tests adds TonelWriterTest covering round-trips, dollar-dollar
edge cases, unterminated blocks, method sorting, the no-extra-tabs
regression, and a Latin-1 unicode body. The Tests module uses the new
TestSuite forModule: helper.
Cover SmallInteger/LargeInteger interactions for bitAnd:, bitOr:, bitXor: and bitShift: (test200-test232). Each test exercises a single operation with a recognizable 0xAA bit pattern.

Add BareTestsModule>>main: and runTest: so the suite is runnable via 'egg Kernel.BareTests', plus a README documenting how to run.
Detect overflow in SMI +/-/* primitives and underprimitives so Smalltalk falls back to LargeInteger arithmetic instead of silently wrapping. Use __builtin_mul_overflow for *.

Fail SMI bitAnd:/bitOr:/bitXor: primitives when the argument is not a SmallInteger so the Smalltalk fallback (which dispatches to LargeInteger) runs.

Clamp right-shift amounts >= word width in primitiveSMIBitShift and underprimitiveSMIBitShiftRight; on ARM 'value >> 64' is UB and was silently masked, which made e.g. (-2^64) negated collapse to 0.

Add corresponding Smalltalk fallback paths in Kernel.VM.SmallInteger for +/-/*, bitAnd:/bitOr:/bitXor:, and split bitShift: into bitShiftLeft:/bitShiftRight:.

Covered by Kernel.BareTests test200-test232.
Cover SmallInteger>>bitShift: with a negative shift amount equal to or
greater than the machine word width. On ARM the C++ shift count is
masked, making `1 >> 64 == 1` (UB), which used to corrupt
LargeInteger arithmetic (e.g. `(-2^64) negated` collapsed to '0').
The clamp landed in 8956c90; this test pins the behaviour.
primitiveFloatTimesTwoPower was a copy-paste of an equality
primitive: it required the argument to be a Float and returned a
boolean comparing the two doubles' bit patterns. The Smalltalk
selector Float>>timesTwoPower: takes an Integer scale, so the
primitive always failed and fell through to errorVMSpecific.

That break was hidden until Fraction>>asFloat exercised it for
fractions whose denominator does not fit the IEEE-754 mantissa
(e.g. 1/10^16, produced by scanning literals like 0.1e-35). The
errorVMSpecific path then recursed forever and the VM died with
'Error: stack overflow' instead of returning a sensible Float.

Reimplement the primitive with std::ldexp(self, intArg) and add a
regression bare test (test183FloatTimesTwoPower) that exercises
both the direct primitive and the Fraction>>asFloat code path.
intern: was sending isByteCompliant/reduced/asSymbol to self (the WideSymbol class) instead of aString, so the byte-compliant fast path never worked and would fail on the class side.
Smalltalk-side wrappers around the _primitiveULongAtOffset: underprim, mirroring the existing LMRProtoObject overrides. Required by WideString>>uLongAtValidOffset:put: and ArrayedCollection's raw-access helpers.
…trap helpers

LiteralValue: new LargeInteger tag (bytes + negative); fromIntegerDigits factory asserting base in [2,36] and digits < base; printLargeIntegerString helper; ASSERT throughout.

SSmalltalkParser: drop trailing _ from no-arg methods; split parseLiteralValue into parseIntegerString / parseFloatString; Egg::error on out-of-range radix; pragma() local-var shadow fix.

Bootstrapper / SourceModuleLoader: transferLiteral_ slimmed via newLargeInteger_ and transferCharacter_ helpers; Bootstrapper identity-maps characters; SourceModuleLoader instantiates Character via value:; phase reordering and extension-method handling.
18 tests covering identifiers, keywords, binary selectors, numeric literals (including radix and large integers), strings, symbols, character literals, comments, arrays, byte arrays. testUnicodeScanning marked #knownIssue.
Introduces MethodDictBuilder interface separating array-backed bootstrap dictionaries from real MethodDictionary objects. Bootstrapper.cpp/.h already reference it (committed in 120d6b1); this adds the missing implementation files and wires them into CMakeLists.txt.
…and $-escaped delimiters

CodeSpecs: MethodSpec gains category; ClassSpec gains comment and isExtension.

TonelReader: parseFile recognises "Extension" (vs "Class"); errors on unknown type; preserves leading comment as class comment; passes STON #category through to MethodSpec; nextBlock now correctly skips $[ $] $' $" character literals (not just nesting brackets).

TonelWriter: drop redundant utf8 conversion when writing method head/body.
…ding

Adds host primitives for writeFile, createDirectory, pathExists,
currentDirectory, getEnv and loadModuleFromPath, registered alongside
the existing Host* primitives in Evaluator.

HostSystem now owns a `searchPaths` instvar holding `path -> #ems|#tonel`
associations. `load:` walks the configured paths probing
`base/Name.ems` for #ems entries and `base/Name` directories for #tonel,
then delegates to the host's loadModuleFromPath: primitive (which picks
the right backend by extension).

`setupDefaultSearchPaths` is portable across Linux/macOS/Windows: it
honours `EGG_MODULES_PATH` (split by the platform PATH separator),
probes `./modules` up to four levels above cwd (so the runtime works
from both the project root and a build subdirectory), and adds
`~/.egg/cache/modules` and `/usr/local/share/egg/modules` on Unix.

KernelModule >> useHostModuleLoader now configures the default search
paths before registering the host loader, and exposes passthroughs for
the new host services.
Adds Loader::loadModuleFromPath_ used by the HostLoadModuleFromPath
primitive: dispatches to FileImageSegment for .ems files or to
SourceModuleLoader for directories, caching the result by basename.

Adds Loader::modulePath_ to map dotted module names to filesystem
paths (Compiler.Tests -> Compiler/Tests), and uses it in hasSourceDir_
and loadModule_ so dotted module names resolve correctly.
New TOML 1.0 parser (TOMLParser), writer (TOMLWriter), and a Tests
sub-module (TOMLParserTest). Will be consumed by EPM for epm.toml
manifest handling.
Adds Config (loads ~/.egg/config.toml + ./epm.toml via TOMLParser,
merging user defaults with project overrides) and ProjectGenerator
(scaffolds a fresh Tonel project with package.st, Module.st and
epm.toml).

EPMModule grows commands new, init, start, dev, install, list and
auto-detects the test target from cwd. loadConfig prepends each
project-declared module path (paths.modules) to HostSystem's search
path for both #tonel and #ems lookup.

bin/epm shell wrapper resolves the egg binary via $EGG_HOME or
PATH and execs 'egg EPM "$@"'.
Pass the size of the newly-committed region (newLimit - _committedLimit)
instead of (newLimit - _base), which was committing memory measured from
the space base every time and growing without bound past _committedLimit.
Wrap _symbolTable in a GCedRef so the table survives GC cycles.
Without this, the raw HeapObject* would dangle after a collection.
- Set debugRuntime in the Runtime ctor so error/crash paths can
  print Smalltalk backtraces.
- Bind Kernel's Character class on the runtime side.
- addSegmentSpace_ tolerates a nil module (bootstrap kernel has none).
- Add Runtime::loadModuleFromPath_ wrapper for the Loader's new
  path-based entry point.
- Use the native integer arguments (fromint/toint) when computing the
  replacement length; the previous code subtracted the tagged Object*
  pointers, producing nonsense lengths.
- Treat (from > to) as a no-op rather than running with a negative
  length, matching the Smalltalk fallback semantics.
- Add the missing lower-bound checks (from >= 1, startingAt >= 1) and
  cast sizes to intptr_t so the upper-bound comparisons work even when
  the negative arguments would otherwise wrap around through unsigned
  promotion.

Compiler.Tests: 19 run, 18 passed, 1 known issue, 0 errors.
The inliner used to skip cascades only when the cascade receiver was a
block, but the cascade machinery needs the original receiver value to
deliver the subsequent messages -- inlining ifTrue:/whileTrue:/etc.
substitutes the receiver and breaks evaluation. The original
InternalReadStream>>peekFor: was the case that surfaced this:

    ^self peek = token ifTrue: [position := position + 1]; yourself

Now any cascade message is excluded from inlining, on both the
Smalltalk-side compiler and the C++ port used during bootstrap.

Tests:
- modules/Compiler/Tests/MessageInlinerTest.st: AST-level checks for
  cascade vs non-cascade ifTrue:/whileTrue:/keyword cascades.
- runtime/cpp/Compiler/tests/MessageInlinerTest.cpp: matching Catch2
  tests; the orphan tests/ subdir is now wired into CMake under
  BUILD_TESTING (also added the missing egg_runtime link dep).
- TestsModule now builds the suite via TestSuite forModule: self so new
  TestCase classes are picked up automatically.
- InternalReadStream>>peekFor: rewritten to a non-cascade form (clearer
  intent); the original cascade form also works once the fix is in.
melkyades added 12 commits May 3, 2026 21:20
…mpilation

Adds the previously-untracked Bootstrap/tests/ tree (parser smoke
tests, source-loading, method parsing, integration, treecode
compilation) and wires it into CMake under BUILD_TESTING.

CompilationTest used to call TreecodeEncoder::encodeMethod on a raw
parsed AST without running semantic analysis or setting the encoder's
SCompiledMethod -- it segfaulted on any method that referenced an
identifier (as soon as encodeDynamicVar_ tried to look up the symbol
in the literal table). Switched both treecode tests to the proper
compileMethod_ pipeline and read the resulting treecodes from the
SCompiledMethod.

26 test cases / 69 assertions, all passing.
Editor-local config; should not be tracked. .gitignore already
excludes .vscode/ in the working tree going forward.
Switch the C++ runtime from the legacy conanfile.txt to a Python
conanfile, which lets us:
- pin compiler.cppstd=20 in configure() so users no longer have to
  pass `-s compiler.cppstd=20` on `conan install`
- declare catch2/2.13.10 as a real dep (consumed by the
  Compiler/tests and Bootstrap/tests Catch2 targets)

Generators (CMakeDeps + CMakeToolchain) and the libffi/cxxopts
versions are unchanged.
- CODING_STYLE.md: add "single-assignment temporaries" rule and a
  "No Abbreviations" section.
- CONTRIBUTING.md: fix the commit-tag example so both forms use the
  bracket syntax.
- runtime/cpp/CODING_STYLE.md: new C++ coding style guide for the
  runtime.
- runtime/cpp/README.md: refresh build/run instructions.
- runtime/cpp/Bootstrap/README.md: new doc for the bootstrapper.
- runtime/cpp/Compiler/README.md: new doc for the C++ compiler port.
cppstd=20 is now pinned in conanfile.py's configure(), so passing it
on the command line is redundant.
When a method's pragma names a primitive the C++ runtime doesn't
implement, fall through to the Smalltalk body and print a warning
instead of aborting. This lets bootstrap continue when the kernel
references a host primitive that hasn't been wired up yet.
STONWriter has class-side state that needs to be initialized before
first use; do it from STONModule>>justLoaded so the module is ready
to write as soon as it's loaded.
- build/ (root cmake out-of-tree dir)
- runtime/cpp/build-* (per-platform build dirs)
- .vscode/settings.json (editor-local)
compiler_tests pulls in egg_runtime, which contains Loader.cpp and
SymbolProvider.cpp. Those reference Bootstrapper::newSymbol_() and
SourceModuleLoader::~SourceModuleLoader(), defined in
bootstrapper_lib.

On macOS the static archive was lazy enough not to pull those object
files in (no test referenced them transitively), but CI's Linux
linker does, breaking the link. Always link bootstrapper_lib so the
build is portable.
egg_runtime's Loader.cpp and SymbolProvider.cpp call into
bootstrapper_lib (Bootstrapper::newSymbol_, ~SourceModuleLoader),
while bootstrapper_lib already links egg_runtime. macOS's ld pulls
all archive members and resolves transitively, so the cycle was
hidden; GNU ld is single-pass and needs the cycle declared so CMake
emits --start-group/--end-group.

Add the reverse edge target_link_libraries(egg_runtime PUBLIC
bootstrapper_lib). Bump the subdir's cmake_minimum_required to 3.13
and set CMP0079=NEW so we can target a parent-directory target.
Register and implement Pharo-side host primitives that were missing
from EggEvaluator, so the metacircular runtime matches the C++ VM:

- HostCurrentDirectory: returns FileLocator workingDirectory fullName.
- HostGetEnv: looks up OSEnvironment current, returns nil on miss.
- HostPathExists: tests asFileReference exists.
- HostLoadModuleFromPath: loads a module given a filesystem path.

Refactor Ring2MetacircularConverter#loadModule: to delegate to a new
loadModuleFromPath: (used by HostLoadModuleFromPath), and drop the now
unused readModuleSpec:. Also fix print:on: to render metaclasses as
<ClassName class> instead of falling through to the generic printer.
MSVC's cl has no __builtin_mul_overflow. Introduce a small Compat.h
header providing Egg::mul_overflow_iptr, dispatched at compile time:

- GCC/Clang: __builtin_mul_overflow.
- MSVC x64: _mul128 from <intrin.h>, overflow detected via hi != (lo >> 63).
- Otherwise: #error (we don't currently support 32-bit MSVC builds).

Use it from primitiveSMITimes / underprimitiveSMITimes in Evaluator.cpp.
@melkyades melkyades merged commit f5bfbf2 into main May 4, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant