Add regular expression support (RegExp constructor and /pattern/flags literals) #115

@frostney

Summary

GocciaScript currently has no regular expression support. This is documented in docs/language-restrictions.md under "Deferred Built-ins". String methods like replace, replaceAll, and split work with string patterns only.

What's needed

Lexer (Goccia.Lexer.pas)

  • Add gttRegex token type to Goccia.Token.pas
  • Context-sensitive scanning to distinguish /pattern/flags from the division operator /
    • After a keyword, an operator, or one of ( [ { , ; → regex
    • After an identifier, a number, a string, or one of ) ] → division
  • Parse flags: i, g, m, s, u, y
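The scanning rule above could be sketched roughly as follows. This is an illustration in TypeScript, not the Pascal lexer itself: `PrevToken`, `regexAllowed`, and `scanRegex` are hypothetical names, and the rule is a heuristic. Cases such as `this / 2` (a keyword followed by division) or `if (x) /re/.test(y)` (a regex after `)`) would need extra handling.

```typescript
// Hypothetical token shape; the real token types live in Goccia.Token.pas.
interface PrevToken {
  kind: "identifier" | "number" | "string" | "keyword" | "punctuator";
  text: string;
}

// Decide whether a "/" at the current position starts a regex literal,
// based only on the previous significant token (null = start of input).
function regexAllowed(prev: PrevToken | null): boolean {
  if (prev === null) return true;
  if (prev.kind === "identifier" || prev.kind === "number" || prev.kind === "string") {
    return false; // a value just ended, so "/" is division
  }
  if (prev.kind === "punctuator" && (prev.text === ")" || prev.text === "]")) {
    return false; // closing bracket: treated as the end of a value
  }
  return true; // keywords, operators, and ( [ { , ;
}

interface RegexToken {
  pattern: string;
  flags: string;
  end: number; // index just past the last flag character
}

const ALLOWED_FLAGS = "igmsuy";

// Scan a regex literal whose opening "/" is at src[start].
function scanRegex(src: string, start: number): RegexToken {
  let i = start + 1;
  let inClass = false; // inside [...], where "/" does not end the literal
  while (i < src.length && src[i] !== "\n") {
    const ch = src[i];
    if (ch === "\\") {
      i += 2; // skip the escaped character
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) {
      let j = i + 1;
      let flags = "";
      while (j < src.length && /[A-Za-z]/.test(src[j])) {
        const f = src[j];
        if (ALLOWED_FLAGS.indexOf(f) === -1 || flags.indexOf(f) !== -1) {
          throw new SyntaxError("Invalid or duplicate regex flag: " + f);
        }
        flags += f;
        j++;
      }
      return { pattern: src.slice(start + 1, i), flags: flags, end: j };
    }
    i++;
  }
  throw new SyntaxError("Unterminated regex literal");
}
```

For example, `scanRegex("/a[/]b\\/c/gi + 1", 0)` yields the pattern `a[/]b\/c` with flags `gi`, because the `/` inside the character class and the escaped `/` do not terminate the literal.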

Parser (Goccia.Parser.pas)

  • Add TGocciaRegexLiteralExpression to Goccia.AST.Expressions.pas
  • Emit regex literal node when gttRegex token is encountered

Runtime

  • New TGocciaRegExpValue in Goccia.Values.RegExpValue.pas:
    • Properties: source, flags, lastIndex, global, ignoreCase, multiline, dotAll, sticky
    • Methods: test(string), exec(string), toString()
  • New Goccia.Builtins.GlobalRegExp.pas for RegExp(pattern, flags?) constructor
  • Bytecode support in Souffle VM for regex literal opcode
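As a reference for the behavior these properties and methods would presumably mirror, here is how the standard ECMAScript RegExp behaves, shown in TypeScript on an existing JavaScript engine. Whether GocciaScript matches every detail (for example, canonical flag ordering) is an assumption, not something this issue specifies.

```typescript
const re = new RegExp("(\\d+)-(\\d+)", "g");
const input = "10-20 and 30-40";

const reSource = re.source;       // the pattern text, without delimiters
const reFlags = re.flags;         // "g"
const literalForm = re.toString(); // "/" + source + "/" + flags

// With the g flag, exec() resumes from lastIndex and advances it.
const first = re.exec(input);     // match "10-20" at index 0
const afterFirst = re.lastIndex;  // 5
const second = re.exec(input);    // match "30-40" at index 10
const afterSecond = re.lastIndex; // 15
const third = re.exec(input);     // null: no more matches
const afterThird = re.lastIndex;  // reset to 0 after a failed match

// flags is reported in a canonical order regardless of constructor order.
const canonical = new RegExp("a", "yig").flags; // "giy"

// With the y (sticky) flag, a match must start exactly at lastIndex.
const sticky = new RegExp("and", "y");
sticky.lastIndex = 6;
const stickyHit = sticky.test(input);  // true: "and" starts at index 6
sticky.lastIndex = 5;
const stickyMiss = sticky.test(input); // false: index 5 is a space
```

The `lastIndex` handling for `g` and `y` is the stateful part of the object and is likely the piece that needs the most care when wrapping an underlying regex engine.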

String method updates (Goccia.Values.StringObjectValue.pas)

  • replace(regex|string, replacement|fn) — already supports callbacks, add regex dispatch
  • replaceAll(regex|string, replacement|fn) — same
  • split(regex|string) — add regex dispatch
  • New: match(regex) — return match array
  • New: matchAll(regex) — return iterator of matches
  • New: search(regex) — return index of first match
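For orientation, the standard ECMAScript behavior of these string methods with regex arguments looks like the following TypeScript sketch. It runs on an existing JavaScript engine and only illustrates the target semantics; the sample strings are made up for the example.

```typescript
const text = "2024-01-15, 2025-06-30";
const dateRe = /(\d{4})-(\d{2})-(\d{2})/;

// replace without the g flag rewrites only the first match; $n refers to groups.
const swapped = text.replace(dateRe, "$3/$2/$1"); // "15/01/2024, 2025-06-30"

// replace with a callback and the g flag rewrites every match.
const years = text.replace(/\d{4}/g, (y) => String(Number(y) + 1)); // "2025-01-15, 2026-06-30"

// replaceAll with a regex requires the g flag (a non-global regex throws TypeError).
const masked = text.replaceAll(/\d/g, "#"); // "####-##-##, ####-##-##"

// split uses each regex match as a separator.
const parts = "a1b22c333d".split(/\d+/); // ["a", "b", "c", "d"]

// match without g returns an exec-style array (groups and index);
// match with g returns only the full matches.
const single = text.match(dateRe);    // single[1] is "2024", single.index is 0
const every = text.match(/\d{4}/g);   // ["2024", "2025"]

// matchAll returns an iterator of exec-style arrays and requires the g flag.
const months = Array.from(text.matchAll(/-(\d{2})-/g), (x) => x[1]); // ["01", "06"]

// search returns the index of the first match, or -1.
const found = text.search(/2025/);    // 12
const missing = text.search(/1999/);  // -1
```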

Implementation proposal

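Use Free Pascal's RegExpr unit (bundled with FPC, no external dependency). It provides Perl-style regular expressions with named groups, lookahead, and Unicode support, although which of these are available depends on the version of the unit shipped with the targeted FPC release. Its syntax and semantics are not identical to ECMAScript's, so any differences (for example around flags and lastIndex handling) would need to be bridged in TGocciaRegExpValue.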

  1. Phase 1 — Core runtime: TGocciaRegExpValue, RegExp constructor, test(), exec(). No literal syntax yet — construct via new RegExp("pattern", "flags").

  2. Phase 2 — Lexer integration: Context-sensitive /pattern/flags literal scanning. This is the trickiest part due to the division ambiguity.

  3. Phase 3 — String integration: Update replace, replaceAll, split to accept regex. Add match, matchAll, search.

  4. Phase 4 — Testing matcher: Add toMatch(regex) to the test API (ties into the Vitest-compatible testing API issue).
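A Phase 4 matcher could be as small as the sketch below. It follows the Vitest convention that `toMatch` accepts either a regex or a substring; `expectString` is a hypothetical stand-in for the real `expect`, and the error message format is made up.

```typescript
// Minimal stand-in for expect(value).toMatch(...); not the actual test API.
function expectString(actual: string) {
  return {
    toMatch(expected: RegExp | string): void {
      let pass: boolean;
      if (typeof expected === "string") {
        pass = actual.indexOf(expected) !== -1; // substring check
      } else {
        // Reset lastIndex so a reused g or y regex does not skip the match.
        expected.lastIndex = 0;
        pass = expected.test(actual);
      }
      if (!pass) {
        throw new Error(
          "Expected " + JSON.stringify(actual) + " to match " + String(expected)
        );
      }
    },
  };
}

expectString("hello world").toMatch(/wor/);
expectString("hello world").toMatch("hello");
```

Resetting `lastIndex` before calling `test` is a design choice here: it keeps the assertion independent of any earlier use of the same regex object.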

Labels: new feature
