Summary
Add support for persistent global variables shared across independently compiled scripts, in the spirit of GNU awk's pm-gawk, while preserving Jawk's current slot-based runtime execution model.
The preferred direction is to remap global slots at execution boundaries when switching to a different compiled script, instead of replacing runtime variable access with HashMap<String, Object> lookups or introducing per-access cell indirection.
Context
Jawk currently compiles AWK identifiers to fixed offsets and executes them through array-backed global and local slots. That design is efficient because:
- variable names are resolved at compile time
- runtime variable access is reduced to slot dereference
- locals and globals have simple, predictable layouts
The problem is that each compiled program has its own global layout.
Example:
- script 1 globals: A -> 0, B -> 1, C -> 2
- script 2 globals: C -> 0, D -> 1
If script 1 runs first and then script 2 runs later in the same persistent memory context, script 2 should observe the value previously assigned to C, even though C is compiled to a different slot offset in script 2.
Persistence therefore requires name-based identity across runs, but not necessarily name-based lookups during execution.
Desired behavior
We want to support a persistence model where:
- user-defined global variables survive across script executions
- independently compiled scripts can observe the same global variables by name
- locals remain local to their function invocation and are not persistent
- built-in AWK variables such as NR, NF, FS, RS, etc. remain runtime-managed and non-persistent
- persistent functions are treated as a separate concern
Proposed direction
Use execution-boundary global slot remapping.
High-level idea
When a new compiled script is about to run:
- read that script's compiled global name-to-offset mapping
- rebuild the AVM global slot array so the new script's expected offsets contain the correct persistent values
- preserve any previously known persistent globals that are not referenced by the new script by appending them after the new script's globals
- replace the AVM's current global-layout metadata with the new layout
- execute the script normally with unchanged slot-based opcodes
Conceptually:
- persistence is keyed by variable name between runs
- execution remains keyed by slot offset within a run
Example
After script 1 runs:
0 -> A
1 -> B
2 -> C
Now script 2 is about to run and expects:
0 -> C
1 -> D
The AVM would rebuild the global layout for script 2 as:
0 -> C with the value previously stored for C
1 -> D with no prior value
2 -> A
3 -> B
That allows script 2 to execute with its own compiled offsets while preserving previously created globals for future runs.
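The remapping step can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Jawk's implementation: `GlobalRemapSketch`, `Remapped`, and `remap` are hypothetical names, layouts are modeled as plain `Map<String, Integer>`, offsets are assumed to be dense (0..n-1), and a `null` slot stands in for an uninitialized global.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are existing Jawk APIs. */
final class GlobalRemapSketch {

    /** Result of a remap: the merged name-to-offset layout and the rebuilt slots. */
    static final class Remapped {
        final Map<String, Integer> layout;
        final Object[] slots;

        Remapped(Map<String, Integer> layout, Object[] slots) {
            this.layout = layout;
            this.slots = slots;
        }
    }

    static Remapped remap(Map<String, Integer> oldLayout, Object[] oldSlots,
                          Map<String, Integer> newLayout) {
        // The new script's compiled offsets are kept exactly as compiled.
        Map<String, Integer> merged = new LinkedHashMap<>(newLayout);
        int next = newLayout.size(); // assumption: offsets are dense, 0..n-1

        // Append globals the new script does not reference, in old offset order.
        List<Map.Entry<String, Integer>> old = new ArrayList<>(oldLayout.entrySet());
        old.sort(Map.Entry.comparingByValue());
        for (Map.Entry<String, Integer> e : old) {
            if (!merged.containsKey(e.getKey())) {
                merged.put(e.getKey(), next++);
            }
        }

        // Copy every previously known value into its new offset.
        Object[] slots = new Object[next];
        for (Map.Entry<String, Integer> e : merged.entrySet()) {
            Integer oldOffset = oldLayout.get(e.getKey());
            if (oldOffset != null) {
                slots[e.getValue()] = oldSlots[oldOffset];
            }
            // Names with no prior value stay null (uninitialized).
        }
        return new Remapped(merged, slots);
    }
}
```

With script 1's layout A -> 0, B -> 1, C -> 2 and script 2's layout C -> 0, D -> 1, this produces C -> 0, D -> 1, A -> 2, B -> 3, and slot 0 holds whatever script 1 left in C.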
Why this direction
This approach preserves the current runtime strengths:
- no hot-path hash lookup for global reads or writes
- no opcode-level change for DEREFERENCE, ASSIGN, PLUS_EQ, and similar operations
- locals can remain slot-based and frame-scoped exactly as they are today
- independently compiled scripts keep their own offset layouts without conflict
It also appears less invasive than introducing a GlobalCell indirection layer for every global slot access.
Required runtime changes
This is not a zero-refactor change. At minimum, the runtime will need to:
- retain persistent global values across executions instead of discarding them on reset
- retain enough metadata to map current slots back to global names
- rebuild the globals array when switching to a different compiled global layout
- update the AVM's active global name/offset metadata after each remap
In other words, Jawk still needs a canonical name-based view of persistent globals at execution boundaries, even if the interpreter itself remains slot-based during execution.
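As a rough illustration of mapping slots back to names at the end of a run, the sketch below keeps a name-keyed view that is only touched at execution boundaries. Everything here is an assumption made for illustration: `PersistentStoreSketch`, `capture`, and `view` are hypothetical names, a `Map<String, Object>` is only one candidate representation, and the `runtimeManaged` set assumes that some built-ins may appear in the global layout and must be skipped.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; names and representation are assumptions, not Jawk APIs. */
final class PersistentStoreSketch {

    /** Canonical name-keyed view kept between runs. */
    private final Map<String, Object> byName = new LinkedHashMap<>();

    /**
     * Called after a script finishes: maps each slot back to its global name
     * and records the value, skipping runtime-managed built-ins.
     */
    void capture(Map<String, Integer> layout, Object[] slots, Set<String> runtimeManaged) {
        for (Map.Entry<String, Integer> e : layout.entrySet()) {
            if (runtimeManaged.contains(e.getKey())) {
                continue; // e.g. NR, NF: not persisted
            }
            byName.put(e.getKey(), slots[e.getValue()]);
        }
    }

    /** Read-only view, for inspection or for seeding the next script's slots. */
    Map<String, Object> view() {
        return Collections.unmodifiableMap(byName);
    }
}
```

Because the map is read and written only between runs, global reads and writes during execution remain plain slot accesses.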
Important design constraints
- Uninitialized globals must retain correct AWK semantics after remapping.
- Array vs scalar behavior must remain correct across runs.
- Built-in runtime-managed variables must stay outside this persistence/remapping scheme unless explicitly designed otherwise.
- The remap logic should preserve previously known globals even when the next script does not reference them.
- The design should be explicit about whether it assumes a single sequential AVM or must also support concurrent execution safely.
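One possible way to enforce the array-versus-scalar constraint is to compare kinds at the remap boundary and fail before the script runs. The issue leaves this policy open, so the sketch below is only one option. `KindCheckSketch`, `Kind`, `kindOf`, and `checkCompatible` are hypothetical names, and treating any `java.util.Map` as an AWK array is a stand-in for whatever array type the runtime actually uses.

```java
import java.util.Map;

/** Hypothetical sketch of a boundary-time kind check; not an existing Jawk API. */
final class KindCheckSketch {

    enum Kind { UNSET, SCALAR, ARRAY }

    /** Classifies a stored value; null models an uninitialized global. */
    static Kind kindOf(Object value) {
        if (value == null) {
            return Kind.UNSET;
        }
        // Assumption: associative arrays are represented by some Map type.
        return (value instanceof Map) ? Kind.ARRAY : Kind.SCALAR;
    }

    /**
     * Throws if the restored value's kind conflicts with how the next script
     * uses the name. UNSET on either side is compatible with anything, so an
     * uninitialized global can still become a scalar or an array.
     */
    static void checkCompatible(String name, Object stored, Kind expected) {
        Kind actual = kindOf(stored);
        if (actual != Kind.UNSET && expected != Kind.UNSET && actual != expected) {
            throw new IllegalStateException(
                "persistent global " + name + " is " + actual
                    + " but the script uses it as " + expected);
        }
    }
}
```

Checking once per remap keeps the cost off the hot path, at the price of rejecting a script before it runs even if the conflicting access would never have executed.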
Non-goals
This issue does not cover:
- persistent local variables
- replacing all runtime variable access with name-based map lookups
- persistent user-defined functions
- persistence file format or on-disk heap implementation details
Those may be follow-up issues.
Open questions
- What should be the canonical persistent representation between runs: Map<String, Object>, name-to-slot metadata plus a globals array, or another structure?
- Should remapping happen only when switching to a different compiled program, or on every execution?
- How should array/scalar misuse checks interact with values restored from a previous script?
- Should built-in but materialized globals such as ARGC, ARGV, or ENVIRON participate in this scheme, or remain special cases?
- How should the runtime expose load/save/clear operations for persistent state?
- What concurrency guarantees, if any, should be provided?
Why not a full HashMap<String, Object> runtime
A full name-based runtime would simplify cross-program identity, but it would also:
- make every global variable access pay a map lookup cost
- require reworking scope handling that the current slot model already solves well
- cut across the existing tuple/interpreter contract instead of extending it
For Jawk, that looks like the wrong tradeoff if persistent globals can be achieved by remapping at execution boundaries.
Acceptance direction
A successful implementation should make it possible for:
- one compiled script to assign a user-defined global variable
- a separate compiled script, executed later in the same persistence context, to read or modify that same variable by name
- both scripts to keep their own compiled slot layouts without conflict
- the AVM to remap globals for the active script without changing hot-path opcode behavior
- local variables and built-in runtime state to remain non-persistent unless explicitly designed otherwise
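The first three criteria can be phrased as an executable check. The harness below is a toy model, not Jawk code: `AcceptanceSketch`, `run`, `valueOf`, and `demo` are hypothetical names, each "script" is a lambda that touches only its own slot offsets, and the boundary step rebuilds slots from a name-keyed map for brevity rather than appending unreferenced globals to the slot array.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical harness; scripts are modeled as lambdas over the slot array. */
final class AcceptanceSketch {

    /** Persistent values, keyed by global name between runs. */
    private final Map<String, Object> persistent = new HashMap<>();

    void run(Map<String, Integer> layout, Consumer<Object[]> script) {
        Object[] slots = new Object[layout.size()];
        // Boundary: seed this script's offsets from the persistent values.
        for (Map.Entry<String, Integer> e : layout.entrySet()) {
            slots[e.getValue()] = persistent.get(e.getKey());
        }
        // Execution: the script sees only slot offsets, never names.
        script.accept(slots);
        // Boundary: map slots back to names for the next run.
        for (Map.Entry<String, Integer> e : layout.entrySet()) {
            persistent.put(e.getKey(), slots[e.getValue()]);
        }
    }

    Object valueOf(String name) {
        return persistent.get(name);
    }

    /** Runs the two example scripts and returns the final value of C. */
    static Object demo() {
        Map<String, Integer> script1 = new HashMap<>();
        script1.put("A", 0);
        script1.put("B", 1);
        script1.put("C", 2);
        Map<String, Integer> script2 = new HashMap<>();
        script2.put("C", 0);
        script2.put("D", 1);

        AcceptanceSketch ctx = new AcceptanceSketch();
        ctx.run(script1, g -> g[2] = 41);                  // script 1: C = 41 at offset 2
        ctx.run(script2, g -> g[0] = (Integer) g[0] + 1);  // script 2: C += 1 at offset 0
        return ctx.valueOf("C");
    }
}
```

Here `demo()` returns 42: script 2 reads and updates the value script 1 assigned, even though the two scripts compile C to different offsets.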
Reference
GNU awk persistent memory manual:
https://www.gnu.org/software/gawk/manual/pm-gawk/pm-gawk.html