Skip to content

Major refactor — recursive descent parser, arithmetic, unary operators, and token system overhaul#26

Merged
nicoSWD merged 49 commits intomasterfrom
ast
Apr 29, 2026
Merged

Major refactor — recursive descent parser, arithmetic, unary operators, and token system overhaul#26
nicoSWD merged 49 commits intomasterfrom
ast

Conversation

@nicoSWD
Copy link
Copy Markdown
Owner

@nicoSWD nicoSWD commented Apr 29, 2026

Release Notes

New Features

  • Arithmetic operations — Full support for math expressions with proper operator precedence: +, -, *, /, %. Parentheses can be used to override precedence. (2 + 3 * 4 == 14)
  • String concatenation — The + operator now performs string concatenation when either operand is a string, and numeric addition otherwise. ("hello " + "world")
  • Unary minus and logical NOT operators — Support for negation (-5), double negation (--5), and logical NOT (!false, !(1 == 2), !foo).
  • Expression result APIRule::result() returns the actual computed value of an expression (e.g., int(15) for 5 * 3), not just true/false.
  • Syntax highlighter — A built-in custom syntax highlighter with customizable styles for token types.
  • Duplicate regex modifier validation — Invalid or duplicate regex modifiers now throw an error.

Major Refactors & Improvements

  • Recursive descent parser — Replaced the old parser with a proper recursive descent parser, improving correctness and maintainability.
  • Token system simplification — Consolidated 27 token classes into a single GenericToken + TokenKind enum, drastically reducing complexity.
  • Lexer streamingLexer::tokenize() converted from array-based to generator-based streaming for better memory efficiency.
  • Eliminated TokenStream God object — Cleaned up the dependency tree by removing the monolithic TokenStream class.
  • RuleEngineBuilder — Extracted dependency wiring from the God constructor in RuleEngine into a dedicated builder.
  • Interpreter pattern in AstEvaluator — Replaced fragile match expressions with the Interpreter pattern for cleaner node evaluation.
  • Consolidated InternalFunction/InternalMethod — Merged into a single InternalCallable DTO.
  • Consolidated Token/TokenKind enums — Reduced duplication across token enums.
  • Removed dead code — Cleaned up the Expression directory, EvaluableExpression, unused TokenIterator methods, and more.
  • PSR-12 coding standard — Migrated from PSR-2 to PSR-12.

Bug Fixes

  • Division by zero — Now properly handled in the evaluator instead of causing errors.
  • String escape sequences — Fixed unescaping of escape sequences in TokenEncapsedString.
  • AST cache not used — Fixed double parsing in Rule.php where the AST cache was never utilized.
  • Infinite recursion — Fixed infinite recursion in Rule.php property hook.
  • Lexer reentrancy — Fixed mutable state / reentrancy issue in the Lexer.
  • FunctionRegistry performance — Eliminated unnecessary new instance creation on every function call.
  • TokenCollection data structure — Replaced SplObjectStorage with ArrayObject to fix wrong data structure usage.
  • Fragile method prefix fallback — Refactored fragile method prefix fallback loop in ObjectMethodCaller.
  • Double evaluation — Eliminated double evaluation in isValid()/getError().
  • Duplicated node evaluation — Fixed duplicated node evaluation logic in AstEvaluator.
  • Stack-based token inspection — Replaced fragile $afterValue boolean with stack-based token inspection.

Dependency Updates

  • Updated PHP requirement to >=8.5
  • Updated phpunit/phpunit to ^13.1
  • Updated mockery/mockery to ^1.6
  • Updated squizlabs/php_codesniffer to ^4.0

nicoSWD added 30 commits April 28, 2026 14:04
Each AST node now implements evaluate(EvaluationContext): mixed,
eliminating the two massive match expressions in AstEvaluator
that violated the Open/Closed Principle.

- Added EvaluationContext DTO (wraps TokenStream + TokenFactory)
- Added evaluate() to Node base class
- Each concrete node implements its own evaluation logic
- AstEvaluator is now a thin orchestrator with zero match expressions
- All 263 tests pass
Made evaluate() delegate to resolve() instead of both methods
independently calling $node->evaluate(). This removes the latent
bug risk where one method could be modified without updating the other.
…ction

The $afterValue boolean flag was fragile because it required manual
maintenance at every token emission point. Adding new token types
required remembering to update the flag, and whitespace/newline
tokens implicitly relied on the previous state persisting.

Replaced it with lastTokenIsValue() which walks backwards through
the token stack, skipping ignorable tokens (whitespace/comments),
and checks the actual token type hierarchy:
- Does the token implement the Value interface?
- Is it a closing parenthesis or closing array?

This is more robust because:
- Context is derived from actual emitted tokens, not a manual flag
- New token types implementing Value automatically work
- Whitespace/comments are naturally skipped via canBeIgnored()
- The logic is self-documenting and testable
The getValue() method was stripping surrounding quotes but not unescaping
the content, so escape sequences like \n, \t, \, etc. were being treated
as literal characters instead of their actual control character equivalents.

Added an unescape() method that handles common JavaScript escape sequences:
\n, \r, \t, \, ", \', $, \0. Unknown sequences are preserved as-is.

Added 9 unit tests in LexerTest.php and 1 integration test in ScalarTest.php.
DivisionNode and ModuloNode now check for a zero divisor before
performing arithmetic operations, following JavaScript semantics:

- Division by zero returns INF, -INF, or NAN (matching JS Infinity/NaN)
- Modulo by zero returns NAN (matching JS NaN)

Added 5 test methods in OperatorsTest.php covering all edge cases.
The getNativeValue() method was defined in the abstract ValueNode class and
implemented in all 6 concrete subclasses (BoolNode, FloatNode, IntegerNode,
NullNode, RegexNode, StringNode), but was never called anywhere in the
codebase. The evaluate() method already returns the native PHP value directly,
making getNativeValue() entirely redundant.
…okenKind enum

- Created TokenKind enum with all token types
- Created GenericToken class backed by TokenKind to replace single-purpose token classes
- Added abstract getKind(): TokenKind to BaseToken
- Made getType() concrete in BaseToken, deriving TokenType from getKind()
- Removed isOfType() method (unused)
- Removed kindToType() from TokenFactory (duplicated in BaseToken::getType())
- Updated canBeIgnored() to use getKind() directly
- Updated ExpressionFactory to use TokenKind matching instead of instanceof
- Removed 27 old token class files
- Updated all tests accordingly

All 278 tests pass.
- Split TokenStream into VariableRegistry, FunctionRegistry, MethodRegistry
- Made VariableRegistry mutable to avoid recreating dependency tree on each parse()
- Inlined TokenIteratorFactory in RuleEngine constructor
- Made defaultVariables a promoted constructor property
- Removed lazy initialization ( flag) from FunctionRegistry and MethodRegistry
- Restored readonly on RuleEngine
The cached AST in Rule::isTrue() and Rule::result() was never actually used
because they called RuleEngine::evaluate() and RuleEngine::result() with the
raw rule string, which re-parsed it every time.

- Added RuleEngine::evaluateNode(Node) and RuleEngine::resolveNode(Node)
  to allow evaluating a pre-parsed AST without re-parsing
- Updated Rule::isTrue() to use evaluateNode() with the cached AST
- Updated Rule::result() to use resolveNode() with the cached AST
- Refactored RuleEngine::evaluate() and RuleEngine::result() to delegate
  to the new node-based methods for consistency
The get accessor on the  property was returning ->error,
which triggered the get accessor again, causing infinite recursion.
Since the get hook had no transformation logic, the entire property
hook was redundant and has been replaced with a simple typed property.
- Replace BaseToken & Operator intersection type with BaseToken in
  ExpressionFactory, ExpressionFactoryInterface, EvaluableExpression,
  and ParserException
- Remove implements Operator from GenericToken
- Remove implements Method from TokenMethod
- Delete unused empty marker interfaces: Logical, Method, Operator,
  Parenthesis, Whitespace
- Keep Value interface as it's still used with instanceof in Lexer
…onRegistry

Refactored FunctionRegistry to store class strings instead of Closures,
with instance caching so each function class is instantiated only once
per registry lifetime. Updated FunctionCallNode and TokenIterator to use
CallableInterface::call() instead of Closure invocation.
nicoSWD added 19 commits April 29, 2026 15:27
Extracted all mutable scanning state ($input, $pos, $length) into a new
LexerContext value object. The Lexer no longer holds any mutable state as
instance properties - a LexerContext is created locally inside tokenize()
and passed to all private helper methods. This makes the Lexer fully
reentrant: multiple calls to tokenize() on the same Lexer instance will
not interfere with each other.

- Created src/Tokenizer/LexerContext.php with peek(), current(), advance(),
  isValid(), and startsWith() convenience methods
- Refactored Lexer.php to remove $input, $pos, $length instance properties
- All private methods now accept LexerContext instead of accessing $this->input/$this->pos/$this->length
…y wiring to RuleEngineBuilder

The RuleEngine constructor was a classic God Constructor - it created and wired
all internal dependencies (TokenFactory, Lexer, VariableRegistry, FunctionRegistry,
MethodRegistry, ObjectMethodCallerFactory, TokenIteratorFactory, Parser,
AstEvaluator) directly inside the constructor body.

Changes:
- RuleEngine now accepts fully-wired dependencies (Parser, AstEvaluator,
  VariableRegistry) via constructor injection
- RuleEngineBuilder.build() now handles all object graph assembly
- Rule.php updated to use builder instead of direct RuleEngine instantiation
- Updated test that directly instantiated RuleEngine with old constructor
Replaced the do...while loop with manual index tracking in
findCallableMethod() with a cleaner two-step approach:
1. Check exact method name first with is_callable()
2. Use foreach loop over prefixes for fallback matching

This preserves the same behavior and prefix priority order
('', 'get', 'is', 'get_', 'is_') while being more maintainable
and less error-prone.
…ted test

- Removed entire src/Expression/ directory (BaseExpression, all comparison
  expression classes, ExpressionFactory, ExpressionFactoryInterface)
- Removed src/Parser/EvaluableExpression.php and EvaluableExpressionFactory.php
  which depended on the Expression classes
- Removed tests/unit/Expression/ExpressionFactoryTest.php which tested dead code

All comparison logic is handled by ComparisonNode + ComparisonOperator enum.
- Remove the redundant Token enum (src/TokenStream/Token/Token.php)
- Update TokenFactory::createFromToken() to accept TokenKind instead of Token
- Remove the 40-line tokenToKind() mapping function
- Update Lexer to use TokenKind instead of Token throughout
- Update test to use TokenKind directly
…on, getMethod)

These methods were only referenced in tests and never used in production code.
Also cleaned up the now-unused constructor parameters from TokenIterator and
TokenIteratorFactory, and updated RuleEngineBuilder accordingly.
…ed streaming

Replace the O(n) array storage in tokenize() with a PHP generator (yield),
eliminating the need to hold all tokens in memory simultaneously.

- Remove  array and ArrayIterator wrapping
- Replace lastTokenIsValue(array ) with isTokenValue(?BaseToken )
  for O(1) context tracking instead of O(n) stack walk
- Track last non-ignorable token via  variable
- Remove unused ArrayIterator import
…y from TokenIterator

- Remove TokenIteratorFactory (pointless abstraction — single create() method
  that just does 'new TokenIterator(...)')
- Inline TokenIterator creation directly in Parser::parse()
- Remove peekRaw() from TokenIterator (identical to current(), misleading name)
- Replace all peekRaw() calls with current() in Parser
- Remove readonly from TokenIterator (implements Iterator with mutating methods)
- Update RuleEngineBuilder and ParserTest to remove factory dependency
isValid() was duplicating the full parse+evaluate pipeline that getError()
already performed. Refactored isValid() to delegate to getError() by
checking if the error string is empty, eliminating the code duplication
while preserving identical behavior.
The class 'TokenizerInterface' was actually an abstract class, not an
interface. Renamed it to 'Tokenizer' to accurately reflect its nature,
and changed the  property from public to protected to fix the
encapsulation issue. Updated all references in Lexer, Parser,
RuleEngineBuilder, and ParserTest.
- Create abstract Lexer class at src/Lexer/Lexer.php
- Rename concrete class to DefaultLexer at src/Lexer/DefaultLexer.php
- Update LexerContext namespace from Tokenizer to Lexer
- Update all imports and references in Parser, RuleEngineBuilder, Highlighter
- Update test files (ParserTest, LexerTest)
- Remove old src/Tokenizer/ directory
- Rename tests/unit/Tokenizer/ to tests/unit/Lexer/
- All 261 tests pass
…idate token classes

- Replace the old stack-based Parser with a clean recursive descent parser
  that builds an AST directly, with documented operator precedence levels
- Consolidate 15+ single-purpose token classes into GenericToken with
  TokenKind enum for type identification
- Delete 10 unused token class files (TokenArray, TokenBoolFalse,
  TokenBoolTrue, TokenFloat, TokenFunction, TokenInteger, TokenNull,
  TokenObject, TokenRegex, TokenVariable)
- Remove dead Value marker interface
- Update all grammar method implementations to use isOfKind() instead
  of instanceof checks
- Update TokenFactory to create GenericToken instances
- Fix TokenBool to extend GenericToken
- Update tests to use GenericToken instead of deleted token classes
- All 261 tests pass with 947 assertions
- Change CallableInterface::call() signature from ?BaseToken to mixed
- Update CallableFunction to accept mixed token and parseParameter() return type
- Remove token wrapping in FunctionCallNode::resolveArguments()
- Remove token wrapping in MethodCallNode::resolveArguments()
- Pass raw object value to MethodRegistry::get() for non-object/non-regex types
- Update ObjectMethodCaller::call() to accept mixed params, remove getTokenValues()
- Update all 12 method and 2 function implementations to work with raw values
- Updated README.md to reference PSR-12 instead of PSR-2
- Updated .styleci.yml preset from psr2 to psr12
- Fixed 338 PSR-12 coding style violations across 109 PHP files
- Moved file-level docblocks before declare(strict_types=1) per PSR-12
- Added phpcs:ignore annotations for intentional snake_case test methods
- Added squizlabs/php_codesniffer as dev dependency
- All 261 tests continue to pass
- phpunit/phpunit: ^12.3 → ^13.1 (12.3.15 → 13.1.7)
- mockery/mockery: ^1.6 (1.6.12, already up to date)
- squizlabs/php_codesniffer: ^4.0 (4.0.1, already up to date)

All transitive dependencies updated accordingly.
@nicoSWD nicoSWD self-assigned this Apr 29, 2026
@nicoSWD nicoSWD merged commit 0bf4368 into master Apr 29, 2026
1 of 5 checks passed
@nicoSWD nicoSWD deleted the ast branch April 30, 2026 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant