
feat: Automated opcode generation tool and 43 new interpreter operators #209

Merged: fglock merged 7 commits into master from feature/runtime-jvm-compiler on Feb 18, 2026

@fglock fglock commented Feb 18, 2026

Summary

Adds an automated code generation tool for interpreter opcodes and uses it to add 43 built-in operators (chr, ord, abs, int, uc, lc, hex, oct, eq, ne, cmp, lt, gt, etc.). The operators are dispatched through direct switch cases over contiguous opcodes, so the interpreter's dispatch switch compiles to a tableswitch.

Key Features

Automated Code Generation Tool (dev/tools/generate_opcode_handlers.pl)

  • Parses OperatorHandler.java for eligible operators (scalar unary/binary/ternary)
  • Generates handler classes with zero-overhead switch patterns
  • Auto-updates 4 source files with marker-based insertion
  • Handles deduplication and complex signature filtering
  • Assigns contiguous opcodes grouped by signature type, so javac can emit a tableswitch for dispatch

43 New Interpreter Operators

  • 31 unary: chr, ord, abs, int, log, sqrt, cos, sin, exp, lc, uc, lcfirst, ucfirst, fc, hex, oct, quotemeta, chr_bytes, ord_bytes, length_bytes, binary_not, integer_bitwise_not, srand, sleep, tell, rmdir, closedir, rewinddir, telldir, chdir, exit
  • 12 binary: atan2, eq, ne, lt, le, gt, ge, cmp, x (string repeat), binary&, binary|, binary^

LASTOP-Relative Numbering

  • Generated opcodes use LASTOP + offset notation
  • Easy manual opcode additions: update LASTOP, run tool
  • All generated opcodes auto-adjust
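A minimal Java sketch of what LASTOP-relative constants look like, assuming a trimmed-down, hypothetical OpcodesSketch class with LASTOP = 220 and illustrative offsets (the real names and offsets in Opcodes.java may differ). Because LASTOP + n is a compile-time constant expression, the generated constants still work as switch case labels:

```java
// Hypothetical, trimmed-down stand-in for Opcodes.java; values are illustrative only.
final class OpcodesSketch {
    // Last manually assigned opcode; generated opcodes are numbered after it.
    public static final short LASTOP = 220;

    // GENERATED_OPCODES_START
    public static final short ATAN2 = LASTOP + 1;  // constant expression, narrowed to short
    public static final short EQ = LASTOP + 2;
    public static final short CHR = LASTOP + 13;
    // GENERATED_OPCODES_END

    private OpcodesSketch() {}

    // Constant expressions are legal case labels, so the interpreter can switch on them.
    static String name(short opcode) {
        return switch (opcode) {
            case ATAN2 -> "ATAN2";
            case EQ -> "EQ";
            case CHR -> "CHR";
            default -> "UNKNOWN";
        };
    }
}
```

Bumping LASTOP to 221 shifts every generated constant by one without editing the generated lines themselves.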

Contiguous Opcode Assignment

  • Binary operators: LASTOP+1 through LASTOP+12
  • Unary operators: LASTOP+13 through LASTOP+43
  • Lets javac emit a tableswitch (O(1) jump table) instead of a lookupswitch (O(log n) key search)
  • Verified with javap: "tableswitch { // 0 to 263"
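The grouping rule can be sketched in Java as follows. This is not the Perl tool's code: OpcodeAssigner and its Op record are hypothetical names, and operators are assumed to arrive in OperatorHandler.java appearance order with their arity already known:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OpcodeAssigner {
    // Hypothetical description of an eligible operator: its name and operand count.
    record Op(String name, int arity) {}

    private OpcodeAssigner() {}

    /**
     * Numbers operators from lastOp + 1 upward: all binary operators first, then all
     * unary ones, keeping source order inside each group. Each group therefore gets a
     * gap-free opcode range even when the source interleaves the two kinds.
     */
    static Map<String, Integer> assign(int lastOp, List<Op> opsInSourceOrder) {
        Map<String, Integer> opcodes = new LinkedHashMap<>();
        int next = lastOp + 1;
        for (int arity : new int[] {2, 1}) {
            for (Op op : opsInSourceOrder) {
                if (op.arity() == arity) {
                    opcodes.put(op.name(), next++);
                }
            }
        }
        return opcodes;
    }
}
```

With 12 binary and 31 unary operators and LASTOP = 220, this scheme yields 221-232 for binary and 233-263 for unary operators, matching the ranges above.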

Comprehensive Documentation

  • Updated dev/interpreter/SKILL.md with code generator guide
  • Eligibility criteria, LASTOP management, common gotchas
  • Testing procedures and workflow examples

Performance

  • ✓ Zero-overhead dispatch: read registers once, switch with direct method calls
  • ✓ JVM tableswitch optimization maintained (contiguous opcodes 0-263)
  • ✓ Handlers call the same runtime operator methods as the JVM compiler backend (signatures taken from OperatorHandler.java)
  • ✓ All 17 test cases pass (unary, binary, bitwise operators)

Files Changed

Generated Files:

  • ScalarUnaryOpcodeHandler.java (115 lines)
  • ScalarBinaryOpcodeHandler.java (74 lines)

Auto-Updated Files:

  • Opcodes.java - 43 new opcode constants with LASTOP-relative numbering
  • BytecodeInterpreter.java - dispatch cases for all handlers
  • InterpretedCode.java - disassembly cases
  • BytecodeCompiler.java - 576 lines of auto-generated emit cases

Tool & Documentation:

  • dev/tools/generate_opcode_handlers.pl (589 lines)
  • dev/interpreter/SKILL.md (175 lines added)

Test plan

  • Build successful (make)
  • All unit tests pass (make test-unit)
  • 17 operator tests pass with --interpreter flag
  • JVM bytecode verified with javap: "tableswitch { // 0 to 263" (O(1) dispatch)
  • Opcodes contiguous: no gaps between 0-263
  • Tool regeneration: perl dev/tools/generate_opcode_handlers.pl works correctly
  • LASTOP management: manual opcode additions tested

🤖 Generated with Claude Code

fglock and others added 7 commits February 18, 2026 18:57
…dling

This commit consolidates runtime and JVM compiler enhancements:

Features:
- Add escapeInvalidQuantifierBraces function for Perl regex compatibility
  (currently disabled due to test regressions - needs more work)
- Add DEBUG_REGEX environment variable support for regex debugging

Fixes:
- Preserve RUNTIME context for RHS of logical operators in JVM compiler
- Evaluate LHS of logical operators in SCALAR context (for boolean test)
- Add debug logging to RuntimeRegex.compile() and matchRegexDirect()

Implementation Details:
- EmitLogicalOperator: Changed context handling for logical operators
  - LHS evaluated in SCALAR context for boolean test
  - RHS preserves RUNTIME context when in RUNTIME mode
  - Prevents context loss at subroutine exits

- RegexPreprocessor: Added escapeInvalidQuantifierBraces()
  - Handles Perl-style quantifier braces like {1}, {,3}, {2,5}
  - Escapes invalid braces that would cause Java Pattern.compile() errors
  - Currently disabled (lines 82-84) due to edge-case regressions
  - Function ready for future refinement and re-enabling

- RuntimeRegex: Added DEBUG_REGEX support
  - Set DEBUG_REGEX=1 environment variable to enable regex debug output
  - Logs pattern compilation, cache hits/misses, and matching operations
  - Helps diagnose regex preprocessing and matching issues
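The idea behind escapeInvalidQuantifierBraces() can be sketched as below. This is a hypothetical, simplified helper, not the disabled RegexPreprocessor code: it escapes any `{` that does not begin a `{n}`, `{n,}` or `{n,m}` quantifier, leaves character classes and the braces of escapes such as \p{L} or \x{263A} alone, and treats `{,n}` as literal text. It does not handle nested classes or a leading `]` inside a class:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierBraces {
    // A well-formed Java quantifier starting at '{': {n}, {n,} or {n,m}.
    private static final Pattern VALID_QUANTIFIER = Pattern.compile("\\{\\d+(?:,\\d*)?\\}");
    // Escape letters whose argument is written in braces, e.g. \p{L} or \x{263A}.
    private static final String BRACED_ESCAPES = "pPxN";

    private QuantifierBraces() {}

    /** Escapes '{' characters that do not start a valid quantifier (java.util.regex generally rejects them). */
    static String escapeInvalidQuantifierBraces(String regex) {
        StringBuilder out = new StringBuilder(regex.length() + 8);
        Matcher quantifier = VALID_QUANTIFIER.matcher(regex);
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                char next = regex.charAt(++i);
                out.append(c).append(next);
                // Copy a braced escape argument such as {L} through its closing brace.
                if (BRACED_ESCAPES.indexOf(next) >= 0
                        && i + 1 < regex.length() && regex.charAt(i + 1) == '{') {
                    int close = regex.indexOf('}', i + 1);
                    if (close >= 0) {
                        out.append(regex, i + 1, close + 1);
                        i = close;
                    }
                }
                continue;
            }
            if (inClass) {
                // Inside [...], '{' is already literal in Java.
                if (c == ']') inClass = false;
                out.append(c);
            } else if (c == '[') {
                inClass = true;
                out.append(c);
            } else if (c == '{') {
                quantifier.region(i, regex.length());
                if (quantifier.lookingAt()) {
                    out.append(quantifier.group());
                    i = quantifier.end() - 1;  // resume after the closing '}'
                } else {
                    out.append("\\{");
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```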

Files Modified:
- EmitLogicalOperator.java: +17/-12 lines
- RegexPreprocessor.java: +212/-0 lines
- RegexPreprocessorHelper.java: +123/-71 lines (refactored)
- RuntimeRegex.java: +41/-13 lines

Test Results (vs master):
- re/regexp.t: 1788/2210 (+2)
- re/pat.t: 896/1296 (+1)
- re/pat_rt_report.t: 2384/2514 (+3)
- re/reg_mesg.t: 1642/2479 (no change)
- Net: +6 improvements, 0 regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The BytecodeCompiler was emitting STRING_BITWISE_* opcodes for the
default bitwise operators (&, |, ^) when it should emit BITWISE_*_BINARY
opcodes. In Perl, these operators perform numeric bitwise operations
whenever an operand is a number (and always under the "bitwise" feature);
explicit string bitwise operations use the separate &., |., ^. operators.

This bug caused eval STRING expressions like 'eval "83 | 120"' to return
930 (string bitwise OR result) instead of 123 (numeric bitwise OR result).
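The arithmetic behind the two results can be checked with a small Java sketch (BitwiseOr is a hypothetical helper). 83 | 120 is 0b1010011 | 0b1111000 = 0b1111011 = 123, while a string OR combines character codes pairwise ('8'|'1' = '9', '3'|'2' = '3', and the extra '0' of the longer string is kept), giving "930":

```java
final class BitwiseOr {
    private BitwiseOr() {}

    // Numeric bitwise OR: what Perl's | does when the operands are numbers like 83 and 120.
    static long numericOr(long a, long b) {
        return a | b;
    }

    // String bitwise OR: OR character codes position by position; the shorter
    // string is treated as padded with zero characters, so the longer tail is kept.
    static String stringOr(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int n = Math.max(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int x = i < a.length() ? a.charAt(i) : 0;
            int y = i < b.length() ? b.charAt(i) : 0;
            sb.append((char) (x | y));
        }
        return sb.toString();
    }
}
```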

Fixed:
- & now emits BITWISE_AND_BINARY (was STRING_BITWISE_AND)
- | now emits BITWISE_OR_BINARY (was STRING_BITWISE_OR)
- ^ now emits BITWISE_XOR_BINARY (was STRING_BITWISE_XOR)

The string bitwise operators (&., |., ^.) continue to emit STRING_BITWISE_*
opcodes correctly.

Impact: Fixes interpreter parity for bitwise operations in eval STRING context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Created dev/tools/generate_opcode_handlers.pl to automatically generate
opcode handlers for built-in functions from OperatorHandler.java.

Key Features:
- Automatically reads LASTOP from Opcodes.java to determine next opcode
- Skips existing opcodes to avoid duplicates
- Generates handler classes with efficient zero-overhead dispatch pattern
- Automatically updates Opcodes.java, BytecodeInterpreter.java, and
  InterpretedCode.java at marker locations
- Uses arrow-form switch cases (case X -> ...) for concise, modern Java code

Generated Handlers:
- ScalarUnaryOpcodeHandler: 31 operators (chr, ord, abs, sin, cos, lc, uc, etc.)
- ScalarBinaryOpcodeHandler: 12 operators (atan2, eq, ne, lt, le, gt, ge, cmp,
  binary&, binary|, binary^, x)
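A hedged sketch of the dispatch shape described above: read the operand register once, then use an arrow-form switch whose arms are direct method calls. The class name, the Object[] register file, the opcode values and the plain JDK methods are all assumptions for illustration; the real generated handlers presumably call the runtime methods listed in OperatorHandler.java:

```java
import java.util.Locale;

final class ScalarUnaryHandlerSketch {
    // Hypothetical opcode values; the real constants live in Opcodes.java.
    static final short ABS = 1;
    static final short SQRT = 2;
    static final short UC = 3;
    static final short LC = 4;

    private ScalarUnaryHandlerSketch() {}

    /** Reads the source register once, then dispatches with direct static calls. */
    static void execute(short opcode, Object[] registers, int rd, int rs) {
        Object operand = registers[rs];
        registers[rd] = switch (opcode) {
            case ABS -> Math.abs(toDouble(operand));   // double result is boxed to Double
            case SQRT -> Math.sqrt(toDouble(operand));
            case UC -> operand.toString().toUpperCase(Locale.ROOT);
            case LC -> operand.toString().toLowerCase(Locale.ROOT);
            default -> throw new IllegalStateException("Unknown unary opcode: " + opcode);
        };
    }

    private static double toDouble(Object value) {
        return value instanceof Number n ? n.doubleValue() : Double.parseDouble(value.toString());
    }
}
```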

Opcodes Generated:
- Reserved range: 221-263 (43 opcodes)
- Next available: 264

Markers Added:
- // GENERATED_OPCODES_START/END in Opcodes.java
- // GENERATED_HANDLERS_START/END in BytecodeInterpreter.java
- // GENERATED_DISASM_START/END in InterpretedCode.java
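Marker-based insertion can be sketched like this. MarkerRegion is a hypothetical Java helper, not the Perl tool itself; it assumes each marker sits on its own line and that the generated text ends with a newline. Because the marker lines are kept, rerunning it with the same input is idempotent:

```java
final class MarkerRegion {
    private MarkerRegion() {}

    /**
     * Replaces every line strictly between the start and end marker lines with
     * the generated text, keeping both marker lines so the tool can be rerun.
     */
    static String replaceBetweenMarkers(String source, String startMarker, String endMarker, String generated) {
        int start = source.indexOf(startMarker);
        int end = source.indexOf(endMarker);
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("Markers missing or out of order: " + startMarker);
        }
        int startLineEnd = source.indexOf('\n', start);
        if (startLineEnd < 0 || startLineEnd > end) {
            throw new IllegalArgumentException("Markers must be on separate lines");
        }
        int bodyStart = startLineEnd + 1;                 // first line after the start marker
        int bodyEnd = source.lastIndexOf('\n', end) + 1;  // beginning of the end-marker line
        return source.substring(0, bodyStart) + generated + source.substring(bodyEnd);
    }
}
```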

Implementation:
- Added LASTOP constant to track manually-assigned opcodes
- Tool excludes generated sections when reading existing opcodes
- Skips operators with complex signatures (varargs, etc.)
- Skips operators that already have opcodes (rand, length, rindex, index,
  require, isa, bless, ref, join, prototype, getc)
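A sketch of how LASTOP and the existing, manually assigned opcodes can be read while ignoring the generated section. It assumes declarations of the form `public static final short NAME = 123;` and the GENERATED_OPCODES_START/END markers listed above; ExistingOpcodes is a hypothetical name, and the real tool is written in Perl:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExistingOpcodes {
    // Assumed declaration shape for manually numbered opcodes.
    private static final Pattern NUMBERED_CONSTANT =
            Pattern.compile("public static final short (\\w+) = (\\d+);");
    private static final Pattern LASTOP_DECL =
            Pattern.compile("public static final short LASTOP = (\\d+);");

    private ExistingOpcodes() {}

    /** Drops the text between the generated-opcode markers so it is not mistaken for manual opcodes. */
    static String withoutGeneratedSection(String source) {
        int start = source.indexOf("// GENERATED_OPCODES_START");
        int end = source.indexOf("// GENERATED_OPCODES_END");
        if (start < 0 || end < start) {
            return source;
        }
        return source.substring(0, start) + source.substring(end);
    }

    static int lastOp(String source) {
        Matcher m = LASTOP_DECL.matcher(source);
        if (!m.find()) {
            throw new IllegalStateException("LASTOP declaration not found");
        }
        return Integer.parseInt(m.group(1));
    }

    /** Names of manually numbered opcodes; operators already in this set would be skipped. */
    static Set<String> manualNames(String source) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = NUMBERED_CONSTANT.matcher(withoutGeneratedSection(source));
        while (m.find()) {
            if (!m.group(1).equals("LASTOP")) {
                names.add(m.group(1));
            }
        }
        return names;
    }
}
```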

Future Work:
- Add BytecodeCompiler.java generation for emit cases
- Add more operator types (list, array, hash operations)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added manual emit cases for key unary operators (chr, ord, hex, oct, abs,
int, uc, lc) in BytecodeCompiler.java to enable interpreter execution.

Updated SKILL.md with comprehensive code generator documentation:
- Quick start guide
- Eligibility criteria for operators
- LASTOP management critical for opcode numbering
- Common gotchas and solutions
- Testing procedures
- Manual implementation guidance

All 17 test cases now pass with interpreter:
✓ chr, ord, abs, int, uc, lc, hex, oct (unary)
✓ eq, ne, cmp, lt, gt, x (binary)
✓ Bitwise OR, AND, XOR

Next: Enhance tool to auto-generate BytecodeCompiler emit cases to reduce
code repetition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enhanced dev/tools/generate_opcode_handlers.pl to automatically generate
emit cases in BytecodeCompiler.java, eliminating 150+ lines of repetitive code.

Changes:
- Tool now updates 4 files automatically (was 3):
  * Opcodes.java - opcode constants
  * BytecodeInterpreter.java - dispatch cases
  * InterpretedCode.java - disassembly cases
  * BytecodeCompiler.java - emit cases (NEW!)

- Removed 150+ lines of repetitive manual emit code
- All 31 unary operators now generated automatically
- Binary/ternary operators can be added similarly

Verification:
- LASTOP tracking works correctly (starts at 221 = LASTOP + 1)
- All 17 test cases pass ✓
- Build successful, no compilation errors

Benefits:
- Eliminates manual code repetition
- Consistent pattern across all operators
- Easy to add new operators (just run tool)
- Reduces maintenance burden

Tool Usage:
```
perl dev/tools/generate_opcode_handlers.pl
make
```

Next: Add binary/ternary emit case generation for complete automation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changed generated opcodes to use LASTOP + offset notation instead of
hardcoded numbers, making manual opcode additions much easier.

Before:
    public static final short ATAN2 = 228;
    public static final short INT = 221;

After:
    public static final short ATAN2 = LASTOP + 8;
    public static final short INT = LASTOP + 1;

Benefits:
- Add manual opcode: just update LASTOP, run tool
- All 43 generated opcodes auto-adjust
- No manual renumbering needed
- Clear relationship to LASTOP visible in code

Example workflow:
1. Add manual opcode at 221
2. Update LASTOP = 221
3. Run perl dev/tools/generate_opcode_handlers.pl
4. Generated opcodes shift from 221-263 to 222-264 automatically

Verification:
- All 17 tests pass ✓
- INT = LASTOP + 1 = 221 (correct)
- Build successful

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ptimization

Fixed critical opcode ordering issue where opcodes were assigned in
OperatorHandler.java appearance order, creating gaps that prevented JVM
tableswitch optimization. Now assigns opcodes contiguously grouped by
signature type:
- Binary operators: LASTOP+1 through LASTOP+12 (12 contiguous)
- Unary operators: LASTOP+13 through LASTOP+43 (31 contiguous)

This lets javac emit a tableswitch (O(1) jump table) instead of a lookupswitch
(O(log n) key search) for the interpreter's dispatch. Verified with javap
showing "tableswitch { // 0 to 263" covering all opcodes.

All 17 test cases pass (chr, ord, abs, int, uc, lc, hex, oct, eq, ne,
cmp, lt, gt, x, bitwise |, &, ^).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fglock merged commit 98f0899 into master on Feb 18, 2026
2 checks passed
fglock deleted the feature/runtime-jvm-compiler branch on February 18, 2026 20:03