Fix to resolve Windows 32-bit Out Of Memory benchmark tests#57
Conversation
Root cause: TSouffleGarbageCollector.Initialize was never called anywhere in the codebase, so the Souffle GC singleton was nil. All AllocateObject calls were no-ops (guarded by 'if Assigned(FGC)'), and CollectIfNeeded never triggered. Souffle heap objects (closures, arrays, records, blueprints, strings, upvalues) accumulated without limit, causing EOutOfMemory on Windows after ~29s of benchmark execution. Changes: - Goccia.Engine.Backend.pas: Initialize Souffle GC in TGocciaSouffleBackend.Create (with Enabled := False to prevent automatic collection during execution), run full Collect in destructor, add finalization section for Shutdown - Souffle.Compound.pas: Add GCMarked cycle-breaking checks in TSouffleArray.MarkReferences and TSouffleRecord.MarkReferences (prevents infinite recursion on self-referential objects like globalThis.globalThis) - Souffle.VM.Closure.pas: Add GCMarked check in TSouffleClosure.MarkReferences for upvalue marking - Goccia.Runtime.Operations.pas: Add FMapDelegate, FSetDelegate, and FBlueprintSuperValues to MarkExternalRoots (previously unmarked Souffle heap objects) - Update AGENTS.md and docs/souffle-vm.md with GC lifecycle docs Automatic collection (CollectIfNeeded) is disabled because the mark phase does not yet cover all cross-GC roots during bridge calls. The per-backend-destruction Collect approach is sufficient to prevent OOM in multi-file scenarios (benchmarks, test suites). Co-authored-by: Johannes Stein <frostney@users.noreply.github.com>
|
Cursor Agent can help with this pull request. Just |
📝 WalkthroughWalkthroughThis PR implements garbage collection lifecycle management for the Souffle backend system. Changes include: initializing the GC singleton with automatic collection disabled, adding GCMarked guards to prevent redundant marking in Souffle compound objects and closures, integrating collection during backend destruction, expanding external-root marking coverage for super-values and delegates, and updating documentation to reflect the new GC workflow. Changes
Sequence DiagramsequenceDiagram
participant Backend as TGocciaSouffleBackend
participant GC as TSouffleGarbageCollector
participant Runtime as MarkExternalRoots
participant SouffleObj as Souffle Objects<br/>(Compound, Closure)
Backend->>GC: Create() - Initialize singleton
Backend->>GC: Disable automatic collection
note over GC: Enabled := False
Runtime->>Runtime: Mark external roots
Runtime->>SouffleObj: MarkReferences(obj)
activate SouffleObj
alt GCMarked = False
SouffleObj->>SouffleObj: Recursively mark references
else GCMarked = True
note over SouffleObj: Skip (already marked)
end
deactivate SouffleObj
Backend->>GC: Destroy() - Explicit collection
activate GC
GC->>GC: Collect() - Free Souffle heap
deactivate GC
note over Backend: Module Finalization
Backend->>GC: Shutdown()
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This reverts commit 898ed1e.
Benchmark Results254 benchmarks (no baseline) arraybuffer.js — 14 benchmarks
arrays.js — 19 benchmarks
async-await.js — 6 benchmarks
classes.js — 31 benchmarks
closures.js — 11 benchmarks
collections.js — 12 benchmarks
destructuring.js — 22 benchmarks
fibonacci.js — 8 benchmarks
for-of.js — 7 benchmarks
iterators.js — 20 benchmarks
json.js — 20 benchmarks
jsx.jsx — 21 benchmarks
numbers.js — 11 benchmarks
objects.js — 7 benchmarks
promises.js — 12 benchmarks
strings.js — 11 benchmarks
typed-arrays.js — 22 benchmarks
Measured on ubuntu-latest x64. Changes within ±7% are considered insignificant. |
The Windows CI uses i386-win32 FPC (32-bit), limiting the process to ~2 GB of address space. In bytecode mode, the Souffle VM creates heap objects (closures, arrays, records, blueprints) that were never tracked or collected — TSouffleGarbageCollector.Initialize was never called anywhere in the codebase. On 64-bit platforms this leak is invisible (sufficient address space), but on 32-bit Windows the benchmark runner exhausts memory after ~29 seconds across 254 benchmarks. Initialize the Souffle GC in TGocciaSouffleBackend.Create with automatic collection disabled (Enabled := False). A full Collect runs in the destructor to free all Souffle heap objects between backend instances. Also fix latent bugs in the Souffle GC mark phase (never triggered before because the GC was never initialized): - Add GCMarked cycle-breaking checks in TSouffleArray.MarkReferences, TSouffleRecord.MarkReferences, and TSouffleClosure.MarkReferences - Add FMapDelegate, FSetDelegate, and FBlueprintSuperValues to MarkExternalRoots Mid-execution Collect (like the GocciaScript GC does between benchmark measurement rounds) is not yet possible because the Souffle mark phase does not cover all cross-GC roots in bridge call paths. Per-backend destruction collection is sufficient to bound memory on 32-bit. Co-authored-by: Johannes Stein <frostney@users.noreply.github.com>
a11a150 to
795ad62
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 173: The documented test count is inconsistent: AGENTS.md currently
states "3,433" while docs/souffle-vm.md's "Known Limitations" section still
shows "3,406"; update the numeric string in both documents so they match
(replace "3,406" with "3,433" or vice versa to the authoritative current value)
and ensure the "Known Limitations" heading text remains intact so search/links
still work.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c58fa0f1-3c2e-4802-a386-79a6855472e7
📒 Files selected for processing (6)
AGENTS.mddocs/souffle-vm.mdsouffle/Souffle.Compound.passouffle/Souffle.VM.Closure.pasunits/Goccia.Engine.Backend.pasunits/Goccia.Runtime.Operations.pas
| | GocciaScript Runtime | `Goccia.Runtime.Operations.pas` | `TGocciaRuntimeOperations` — GocciaScript semantics for Souffle VM, bridge caches (`FClosureBridgeCache`, `FArrayBridgeCache`, `FArrayBridgeReverse`, `FRecordBridgeCache`, `FBlueprintBridgeCache`), cross-GC rooting, array bridge sync (`SyncCachedGocciaToSouffle` at bridge entry, `SyncSouffleArrayToCache` after native mutations), native delegate methods (array, string, number, Map, Set), `FBlueprintSuperValues` for built-in subclass static property inheritance | | ||
|
|
||
| **Souffle VM known limitations:** The bytecode backend passes 100% of the test suite (3,406 tests). Most language features execute natively in the VM: user-defined classes with constructors, named methods, getters, setters, static members, private fields/methods, computed property names, and instance fields compile to `TSouffleBlueprint` with `__fields__` initializer closures; classes extending built-in constructors (`Array`, `Map`, `Set`, `Promise`, `Object`) compile natively via `FBlueprintSuperValues` and `TGocciaSuperCallHelper`; array and string iteration uses `TGocciaSouffleArrayIterator`/`TGocciaSouffleStringIterator`; Map and Set iteration uses direct `TGocciaMapIteratorValue`/`TGocciaSetIteratorValue` creation; built-in constructors use a `TGocciaBridgedFunction` fast path; Map/Set property access and method calls use native delegates (`FMapDelegate`/`FSetDelegate`). The bridge still delegates to the interpreter for: module imports, async/await (microtask queue), and decorator evaluation. Bridged arrays use a dual-representation model with `SyncCachedGocciaToSouffle` at bridge entry and `SyncSouffleArrayToCache` after native mutations; `FArrayBridgeReverse` preserves reference identity across round-trips. `FBlueprintBridgeCache` persists across bridge calls to maintain `instanceof` identity for blueprint-backed classes. Only classes with decorators are deferred to the interpreter via `FPendingClasses` / `GOCCIA_EXT_EVAL_CLASS`. `.sbc` binary format uses native endianness (not yet portable). ABC-encoded instructions limit constant pool references to 255 per prototype. See [docs/souffle-vm.md § Known Limitations](docs/souffle-vm.md#known-limitations) for the full list. | ||
| **Souffle VM known limitations:** The bytecode backend passes 100% of the test suite (3,433 tests). Most language features execute natively in the VM: user-defined classes with constructors, named methods, getters, setters, static members, private fields/methods, computed property names, and instance fields compile to `TSouffleBlueprint` with `__fields__` initializer closures; classes extending built-in constructors (`Array`, `Map`, `Set`, `Promise`, `Object`) compile natively via `FBlueprintSuperValues` and `TGocciaSuperCallHelper`; array and string iteration uses `TGocciaSouffleArrayIterator`/`TGocciaSouffleStringIterator`; Map and Set iteration uses direct `TGocciaMapIteratorValue`/`TGocciaSetIteratorValue` creation; built-in constructors use a `TGocciaBridgedFunction` fast path; Map/Set property access and method calls use native delegates (`FMapDelegate`/`FSetDelegate`). The bridge still delegates to the interpreter for: module imports, async/await (microtask queue), and decorator evaluation. Bridged arrays use a dual-representation model with `SyncCachedGocciaToSouffle` at bridge entry and `SyncSouffleArrayToCache` after native mutations; `FArrayBridgeReverse` preserves reference identity across round-trips. `FBlueprintBridgeCache` persists across bridge calls to maintain `instanceof` identity for blueprint-backed classes. Only classes with decorators are deferred to the interpreter via `FPendingClasses` / `GOCCIA_EXT_EVAL_CLASS`. `.sbc` binary format uses native endianness (not yet portable). ABC-encoded instructions limit constant pool references to 255 per prototype. The Souffle GC is initialized by `TGocciaSouffleBackend.Create` with automatic collection disabled; a full `Collect` runs in the backend destructor to free Souffle heap objects between files (critical on 32-bit Windows where address space is limited to ~2 GB). See [docs/souffle-vm.md § Known Limitations](docs/souffle-vm.md#known-limitations) for the full list. |
There was a problem hiding this comment.
Keep the test-count docs in sync.
This now says 3,433 tests, but docs/souffle-vm.md still says 3,406 in the Known Limitations section. Please update both sources together so contributors do not end up with conflicting baseline numbers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@AGENTS.md` at line 173, The documented test count is inconsistent: AGENTS.md
currently states "3,433" while docs/souffle-vm.md's "Known Limitations" section
still shows "3,406"; update the numeric string in both documents so they match
(replace "3,406" with "3,433" or vice versa to the authoritative current value)
and ensure the "Known Limitations" heading text remains intact so search/links
still work.
Initialize Souffle GC and fix related memory management issues to resolve Windows OOM in benchmarks.
The Souffle garbage collector was never initialized, causing all Souffle heap objects to accumulate indefinitely and leading to out-of-memory errors. Enabling the GC exposed several bugs in its mark phase, including missing roots and infinite recursion for cyclic references, which were addressed by fixing the marking logic and disabling automatic collection during VM execution, opting for explicit collection at controlled safe points.
Summary by CodeRabbit
Release Notes
Documentation
Bug Fixes