feat(lexer): add base Scanner struct with operators, identifiers, whitespace #3
Merged
…tifiers

Scanner core:
- init() with BOM skip, deinit() for cleanup
- next() main scan loop: tokenizes one token per call
- peek(), peekAt(), advance(), isAtEnd(): basic read helpers
- tokenText(): current token source text
- line_offsets table + getLineColumn() for lazy line/column calculation

Whitespace:
- Space, tab, VT, FF skipping
- Newline handling: \n, \r\n, \r, U+2028 (LS), U+2029 (PS)
- U+00A0 (NBSP), U+FEFF (BOM/ZWNBSP) as whitespace
- has_newline_before flag for ASI

Operators (all compound forms):
- Arithmetic: + - * / % ** ++ --
- Comparison: < > <= >= == != === !==
- Bitwise: & | ^ ~ << >> >>>
- Logical: && || !
- Assignment: = += -= *= /= %= **= &= |= ^= <<= >>= >>>= &&= ||= ??=
- Nullish/Optional: ?? ?.
- Arrow: =>
- Spread: ...

Identifiers & Keywords:
- ASCII identifier scan (Unicode in a later PR)
- Keyword lookup via StaticStringMap
- Private identifier (#name)
- Hashbang (#!)

Literals (placeholder; detailed parsing in future PRs):
- Numeric: basic digit scan
- String: basic quote matching
- Template: basic backtick matching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
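The line_offsets table and getLineColumn() from the commit above suggest a lazy lookup: the scanner only records where each line starts, and a line/column pair is computed on demand. The following is a minimal sketch of that idea, not the PR's actual code. The LineColumn type, the free-function signature, and the 1-based, byte-counted line and column are all assumptions made for illustration.

```zig
const std = @import("std");

/// Hypothetical result type for this sketch (not taken from the PR).
const LineColumn = struct { line: u32, column: u32 };

/// Maps a byte offset to a line/column pair by binary search.
/// Assumes line_offsets is sorted, non-empty, and starts with 0
/// (the offset of the first line). Both results are 1-based, and
/// the column counts bytes rather than code points.
fn getLineColumn(line_offsets: []const u32, offset: u32) LineColumn {
    var lo: usize = 0;
    var hi: usize = line_offsets.len;
    // Invariant: line_offsets[lo] <= offset, and every index >= hi is > offset.
    while (hi - lo > 1) {
        const mid = lo + (hi - lo) / 2;
        if (line_offsets[mid] <= offset) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return .{
        .line = @intCast(lo + 1),
        .column = offset - line_offsets[lo] + 1,
    };
}
```

For the source "abc\ndefg\nhi" the line starts are 0, 4 and 9, so offset 5 (the 'e') maps to line 2, column 2. Storing only line starts keeps the scan loop cheap and leaves the lookup cost to the places that actually need a position, such as error reporting.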
Code reuse:
- skipWhitespace 0xE2 branch delegates to handleNewline() directly (removes duplicate check)
- BOM checks use std.mem.startsWith for readability

Code quality:
- Add 4GB source limit assert in init() (D015 u32 offset constraint)
- line_offsets initial append uses @panic instead of catch {} (OOM = unusable state)

Tests added:
- Empty string literals ('', "")
- /= operator
- \r alone as line terminator
- Whitespace-only source
- NBSP (U+00A0) whitespace skipping
- All 16 assignment operators

Backlog updated with 9 deferred optimization items from review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
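The BOM check and the 4GB limit described in this commit could look roughly like the sketch below. It is written as a standalone helper for illustration only: the name startOffset and the utf8_bom constant are hypothetical, and the real init() presumably does this work inside the Scanner struct.

```zig
const std = @import("std");

/// UTF-8 encoding of U+FEFF (byte order mark).
const utf8_bom = "\xEF\xBB\xBF";

/// Hypothetical helper: returns the byte offset at which scanning should
/// start, skipping a leading UTF-8 BOM if one is present.
fn startOffset(source: []const u8) u32 {
    // Offsets are stored as u32, so the source must fit in that range.
    std.debug.assert(source.len <= std.math.maxInt(u32));
    if (std.mem.startsWith(u8, source, utf8_bom)) {
        return @intCast(utf8_bom.len);
    }
    return 0;
}
```

std.mem.startsWith also covers sources shorter than three bytes, so no separate length check is needed before comparing against the BOM.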
Summary
- Scanner struct: the core structure of the lexer
- next(): the main scan function called by the parser
- Whitespace and line terminators: \n, \r\n, \r, U+2028, U+2029, NBSP, BOM
Design Decisions Applied
Placeholder (future PRs)
Test plan
- zig build test passes
- zig fmt --check src/ passes

🤖 Generated with Claude Code