feat(lexer): add unicode identifier support by ohah · Pull Request #9 · ohah/zts

ohah · 2026-03-18T12:19:12Z

Summary

unicode.zig: UTF-8 decoding plus Unicode ID_Start/ID_Continue classification
Scans Unicode identifiers (café, 변수, α, etc.)
Handles \uXXXX and \u{XXXX} escape sequences
Keeps the ASCII fast path
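
As a rough illustration of the UTF-8 decoding step, here is a minimal sketch in Zig. It is not the code from this PR: `Decoded` and `decodeUtf8Sketch` are hypothetical names, and the real decodeUtf8() signature and error handling are not shown on this page. The sketch assumes invalid input (truncated, overlong, surrogate, or out-of-range sequences) should be rejected with `null`.

```zig
const std = @import("std");

/// One decoded codepoint plus the number of bytes it occupied.
/// Hypothetical type: the PR does not show what decodeUtf8() returns.
const Decoded = struct { codepoint: u21, len: u3 };

/// Decodes the UTF-8 sequence at the start of `bytes`.
/// Returns null for empty, truncated, overlong, surrogate, or
/// out-of-range input (an assumption about the desired strictness).
fn decodeUtf8Sketch(bytes: []const u8) ?Decoded {
    if (bytes.len == 0) return null;
    const b0 = bytes[0];

    // ASCII fast path: a byte below 0x80 is its own codepoint.
    if (b0 < 0x80) return Decoded{ .codepoint = b0, .len = 1 };

    // The lead byte fixes the sequence length and the first payload bits.
    var len: u3 = 0;
    var cp: u21 = 0;
    if ((b0 & 0xE0) == 0xC0) {
        len = 2;
        cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
        len = 3;
        cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
        len = 4;
        cp = b0 & 0x07;
    } else {
        return null; // stray continuation byte or invalid lead byte
    }
    if (bytes.len < len) return null; // truncated sequence

    // Each continuation byte must look like 10xxxxxx and adds 6 bits.
    var i: usize = 1;
    while (i < len) : (i += 1) {
        const b = bytes[i];
        if ((b & 0xC0) != 0x80) return null;
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    const min: u21 = switch (len) {
        2 => 0x80,
        3 => 0x800,
        else => 0x10000,
    };
    if (cp < min) return null;
    if (cp >= 0xD800 and cp <= 0xDFFF) return null;
    if (cp > 0x10FFFF) return null;

    return Decoded{ .codepoint = cp, .len = len };
}

test "decodes a 2-byte sequence and rejects an overlong one" {
    const e_acute = decodeUtf8Sketch("\xC3\xA9").?; // é, U+00E9
    try std.testing.expectEqual(@as(u21, 0xE9), e_acute.codepoint);
    try std.testing.expectEqual(@as(u3, 2), e_acute.len);
    try std.testing.expect(decodeUtf8Sketch("\xC0\x80") == null);
}
```

Returning the byte length next to the codepoint lets a scanner advance its cursor without re-inspecting the lead byte.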

Test plan

zig build test passes (102 tests)
11 Unicode tests

🤖 Generated with Claude Code

New file: src/lexer/unicode.zig
- decodeUtf8(): UTF-8 byte sequence → codepoint decoder
- isIdentifierStart(): Unicode ID_Start + $ + _
- isIdentifierContinue(): Unicode ID_Continue + ZWNJ + ZWJ
- Simplified Unicode range tables (Latin, Greek, Cyrillic, CJK, Hangul, etc.)

Scanner changes:
- scanIdentifierTail() now handles multi-byte UTF-8 identifiers
- scanIdentifierEscape() for \uXXXX and \u{XXXX} in identifiers
- Non-ASCII bytes in next() trigger unicode identifier path
- \u escape at identifier start → escaped_keyword token
- ASCII fast path preserved for performance

Tests: 11 new tests (7 unicode.zig + 4 scanner unicode identifier)
- UTF-8 decoding (1/2/3/4 byte)
- Unicode identifier detection (Latin, CJK, Hangul, Greek, Cyrillic)
- Scanner: café, 변수, α, mixed ASCII+CJK identifiers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
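
The identifier checks described in the commit message could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `Range`, `id_start_ranges`, `inRanges`, and the `...Sketch` functions are hypothetical names, and the table is a small, deliberately incomplete subset of ID_Start chosen to cover the scripts the commit message lists. The PR's actual tables and lookup strategy are not shown here.

```zig
const std = @import("std");

/// Inclusive codepoint range. Hypothetical helper type.
const Range = struct { lo: u21, hi: u21 };

// Illustrative, incomplete subset of ID_Start, sorted by `lo`.
// These are NOT the tables from the PR.
const id_start_ranges = [_]Range{
    .{ .lo = 0x00C0, .hi = 0x00D6 }, // Latin-1 letters (before ×)
    .{ .lo = 0x00D8, .hi = 0x00F6 }, // Latin-1 letters (between × and ÷)
    .{ .lo = 0x00F8, .hi = 0x00FF }, // Latin-1 letters (after ÷)
    .{ .lo = 0x0391, .hi = 0x03A1 }, // Greek capitals Α..Ρ
    .{ .lo = 0x03A3, .hi = 0x03C9 }, // Greek Σ..ω
    .{ .lo = 0x0410, .hi = 0x044F }, // Cyrillic А..я
    .{ .lo = 0x4E00, .hi = 0x9FFF }, // CJK Unified Ideographs block
    .{ .lo = 0xAC00, .hi = 0xD7A3 }, // Hangul syllables
};

/// Binary search over sorted, non-overlapping ranges.
fn inRanges(ranges: []const Range, cp: u21) bool {
    var lo: usize = 0;
    var hi: usize = ranges.len;
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        const r = ranges[mid];
        if (cp < r.lo) {
            hi = mid;
        } else if (cp > r.hi) {
            lo = mid + 1;
        } else {
            return true;
        }
    }
    return false;
}

/// ID_Start plus `$` and `_`, with ASCII answered before any table lookup.
fn isIdentifierStartSketch(cp: u21) bool {
    if (cp < 0x80) {
        return switch (cp) {
            'a'...'z', 'A'...'Z', '$', '_' => true,
            else => false,
        };
    }
    return inRanges(&id_start_ranges, cp);
}

/// Approximates ID_Continue plus ZWNJ (U+200C) and ZWJ (U+200D).
/// Real ID_Continue also covers combining marks and non-ASCII digits,
/// which this sketch omits.
fn isIdentifierContinueSketch(cp: u21) bool {
    if (cp < 0x80) {
        return switch (cp) {
            'a'...'z', 'A'...'Z', '0'...'9', '$', '_' => true,
            else => false,
        };
    }
    if (cp == 0x200C or cp == 0x200D) return true;
    return inRanges(&id_start_ranges, cp);
}

test "classifies a few codepoints" {
    try std.testing.expect(isIdentifierStartSketch(0x03B1)); // α
    try std.testing.expect(isIdentifierStartSketch(0xBCC0)); // 변
    try std.testing.expect(!isIdentifierStartSketch('1'));
    try std.testing.expect(isIdentifierContinueSketch('1'));
    try std.testing.expect(isIdentifierContinueSketch(0x200D)); // ZWJ
}
```

Answering ASCII in a `switch` before touching the table mirrors the "ASCII fast path preserved" point above: most source text never reaches the binary search.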

ohah merged commit 6706e85 into main Mar 18, 2026

ohah merged 1 commit into main from feature/lexer-unicode
