feat(lexer): add base Scanner struct with operators, identifiers, whitespace by ohah · Pull Request #3 · ohah/zts

ohah · 2026-03-18T11:18:46Z

Summary

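- Scanner struct: the lexer's core struct
- next(): the main scan function, called by the parser
- Whitespace/newline handling: \n, \r\n, \r, U+2028, U+2029, NBSP, BOM
- All operator/punctuator tokens (including 51 compound forms)
- ASCII identifiers + keyword mapping
- Hashbang (#!) and private identifiers (#name)
- line_offsets table + lazy line/column computation via getLineColumn()
- Literals are placeholders (detailed implementation in later PRs)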
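The line_offsets table with lazy getLineColumn() can be illustrated with the minimal Python sketch below. It is not the Zig implementation from this PR. `build_line_offsets` and `get_line_column` are hypothetical names, and the sketch assumes the table stores the byte offset where each line starts. A binary search then maps any byte offset back to a 1-based line and a 0-based byte column. Building the whole table up front is a simplification made for the sketch.

```python
from bisect import bisect_right

# Line terminators besides \n and \r, as UTF-8 byte sequences.
LS = "\u2028".encode("utf-8")  # b"\xe2\x80\xa8"
PS = "\u2029".encode("utf-8")  # b"\xe2\x80\xa9"


def build_line_offsets(source: bytes) -> list[int]:
    """Return the byte offset at which each line starts (hypothetical helper)."""
    offsets = [0]  # line 1 always starts at offset 0
    i = 0
    n = len(source)
    while i < n:
        b = source[i]
        if b == 0x0A:  # \n
            i += 1
        elif b == 0x0D:  # \r alone, or \r\n counted as a single terminator
            i += 2 if source[i + 1 : i + 2] == b"\n" else 1
        elif source.startswith(LS, i) or source.startswith(PS, i):
            i += 3
        else:
            i += 1
            continue  # not a terminator: no new line starts here
        offsets.append(i)  # the next line starts right after the terminator
    return offsets


def get_line_column(line_offsets: list[int], offset: int) -> tuple[int, int]:
    """Map a byte offset to (1-based line, 0-based byte column) by binary search."""
    line_index = bisect_right(line_offsets, offset) - 1
    return line_index + 1, offset - line_offsets[line_index]
```

Columns here are byte-based, which matches the byte offsets of D015. The lookup costs O(log lines) and only runs when a position is actually needed, for example when reporting an error.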

Design Decisions Applied

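- D015: tokens carry start and end byte offsets
- D019: recognize the BOM and every line terminator character
- D035: UTF-8 by default
- D036: the parser drives the lexer by calling it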
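D015 and D036 can be made concrete with a small pull-style scanner, sketched in Python under stated assumptions:

- The parser calls `next()` for one token at a time.
- Each token records only its start and end byte offsets. Its text is recovered by slicing the source, which is the role tokenText() plays.
- `Scanner`, `Token`, `Kind`, and `token_text` are hypothetical names.
- `KEYWORDS` is a small subset standing in for the StaticStringMap keyword lookup.
- Whitespace handling is deliberately simplified.
- Any byte that is not an identifier is treated as a one-byte punctuator.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    IDENTIFIER = auto()
    PRIVATE_IDENTIFIER = auto()
    KEYWORD = auto()
    PUNCTUATOR = auto()
    EOF = auto()


# Small subset of the keyword table (the PR uses a StaticStringMap in Zig).
KEYWORDS = frozenset({b"let", b"const", b"if", b"else", b"return", b"function"})


@dataclass(frozen=True)
class Token:
    kind: Kind
    start: int  # byte offset of the first byte (D015)
    end: int    # byte offset one past the last byte (D015)


def is_ident_start(b: int) -> bool:
    return b == 0x24 or b == 0x5F or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A  # $ _ A-Z a-z


def is_ident_part(b: int) -> bool:
    return is_ident_start(b) or 0x30 <= b <= 0x39  # additionally 0-9


class Scanner:
    def __init__(self, source: bytes) -> None:
        self.source = source
        self.pos = 0

    def _scan_ident_tail(self) -> None:
        while self.pos < len(self.source) and is_ident_part(self.source[self.pos]):
            self.pos += 1

    def next(self) -> Token:
        """Scan exactly one token; the parser calls this repeatedly (D036)."""
        src = self.source
        while self.pos < len(src) and src[self.pos] in b" \t\n":  # simplified whitespace
            self.pos += 1
        start = self.pos
        if start >= len(src):
            return Token(Kind.EOF, start, start)
        b = src[start]
        if is_ident_start(b):
            self._scan_ident_tail()
            kind = Kind.KEYWORD if src[start:self.pos] in KEYWORDS else Kind.IDENTIFIER
            return Token(kind, start, self.pos)
        if b == 0x23 and start + 1 < len(src) and is_ident_start(src[start + 1]):  # '#name'
            self.pos += 1
            self._scan_ident_tail()
            return Token(Kind.PRIVATE_IDENTIFIER, start, self.pos)
        self.pos += 1  # anything else: a one-byte punctuator in this sketch
        return Token(Kind.PUNCTUATOR, start, self.pos)

    def token_text(self, tok: Token) -> bytes:
        """Recover token text by slicing, since tokens store only offsets."""
        return self.source[tok.start:tok.end]
```

Because a token is just two integers plus a kind, it stays small and cheap to copy. Its text is only materialized when the parser actually asks for it.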

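Placeholders (future PRs)

- Detailed numeric literal parsing (hex, octal, binary, bigint, numeric separators)
- String literal escape sequences
- Template literal ${} interpolation
- Comment handling (// and /* */)
- Unicode identifiers
- SIMD optimizations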

Test plan

🤖 Generated with Claude Code

…tifiers

Scanner core:
- init() with BOM skip, deinit() for cleanup
- next(): main scan loop, tokenizes one token per call
- peek(), peekAt(), advance(), isAtEnd(): basic read helpers
- tokenText(): source text of the current token
- line_offsets table + getLineColumn() for lazy line/column calculation

Whitespace:
- Space, tab, VT, FF skipping
- Newline handling: \n, \r\n, \r, U+2028 (LS), U+2029 (PS)
- U+00A0 (NBSP) and U+FEFF (BOM/ZWNBSP) as whitespace
- has_newline_before flag for ASI

Operators (all compound forms):
- Arithmetic: + - * / % ** ++ --
- Comparison: < > <= >= == != === !==
- Bitwise: & | ^ ~ << >> >>>
- Logical: && || !
- Assignment: = += -= *= /= %= **= &= |= ^= <<= >>= >>>= &&= ||= ??=
- Nullish/Optional: ?? ?.
- Arrow: =>
- Spread: ...

Identifiers & Keywords:
- ASCII identifier scan (Unicode support in a later PR)
- Keyword lookup via StaticStringMap
- Private identifier (#name)
- Hashbang (#!)

Literals (placeholder; detailed parsing in future PRs):
- Numeric: basic digit scan
- String: basic quote matching
- Template: basic backtick matching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
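Scanning every compound operator form comes down to maximal munch: at each position, take the longest punctuator that matches. The Python sketch below shows the idea using a table built from the operator groups listed above, tried longest-first.

- `PUNCTUATORS` and `scan_punctuator` are hypothetical names.
- The PR's Zig scanner may instead dispatch on characters with nested switches.
- The sketch encodes one rule from the ECMAScript grammar: `?.` is the optional-chaining token only when a decimal digit does not follow it. So in `a?.5:b` the scanner must emit `?`, not `?.`.

```python
from typing import Optional

# Operators from the groups above, plus a few single-character punctuators.
PUNCTUATORS = [
    # arithmetic
    "+", "-", "*", "/", "%", "**", "++", "--",
    # comparison
    "<", ">", "<=", ">=", "==", "!=", "===", "!==",
    # bitwise
    "&", "|", "^", "~", "<<", ">>", ">>>",
    # logical
    "&&", "||", "!",
    # assignment (16 forms)
    "=", "+=", "-=", "*=", "/=", "%=", "**=", "&=", "|=", "^=",
    "<<=", ">>=", ">>>=", "&&=", "||=", "??=",
    # nullish / optional chaining, arrow, spread
    "??", "?.", "=>", "...",
    # other punctuators
    "(", ")", "[", "]", "{", "}", ";", ",", ":", ".", "?",
]

# Longest candidates first, so the first match found is the maximal munch.
_BY_LENGTH = sorted({p.encode("ascii") for p in PUNCTUATORS}, key=len, reverse=True)


def scan_punctuator(source: bytes, pos: int) -> Optional[bytes]:
    """Return the longest punctuator starting at pos, or None if none matches."""
    for cand in _BY_LENGTH:
        if source.startswith(cand, pos):
            if cand == b"?." and source[pos + 2 : pos + 3].isdigit():
                continue  # '?.' before a digit is '?' followed by a number
            return cand
    return None
```

Because candidates are tried longest-first, `>>>=` wins over `>>>`, `>>=`, `>>`, and `>`. A hand-written switch avoids the linear scan over the table but has to preserve exactly this ordering.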

Code reuse:
- skipWhitespace: the 0xE2 branch delegates to handleNewline() directly (removes a duplicate check)
- BOM checks use std.mem.startsWith for readability

Code quality:
- Add a 4GB source-size assert in init() (D015: offsets are u32)
- The initial line_offsets append uses @panic instead of catch {} (OOM leaves the scanner unusable)

Tests added:
- Empty string literals ('', "")
- /= operator
- \r alone as a line terminator
- Whitespace-only source
- NBSP (U+00A0) whitespace skipping
- All 16 assignment operators

Backlog updated with 9 deferred optimization items from review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
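The whitespace rules and the review fixes above can be illustrated with the Python sketch below. It covers the prefix-based BOM check, `\r` alone as a line terminator, NBSP skipping, and the size limit implied by u32 offsets.

- `init_position` and `skip_whitespace` are hypothetical helpers.
- `skip_whitespace` returns the new position together with a `has_newline_before` flag, the information the scanner keeps for ASI.
- The exact bound asserted in the Zig `init()` may differ from the one used here.

```python
BOM = b"\xef\xbb\xbf"  # U+FEFF in UTF-8
NBSP = b"\xc2\xa0"     # U+00A0 in UTF-8
LINE_SEPARATORS = (b"\xe2\x80\xa8", b"\xe2\x80\xa9")  # U+2028 (LS), U+2029 (PS)
U32_MAX = 0xFFFF_FFFF


def init_position(source: bytes) -> int:
    """Return the starting byte offset, skipping a leading BOM."""
    # Offsets are stored as u32 (D015), so larger sources cannot be addressed.
    assert len(source) <= U32_MAX, "source exceeds the u32 offset range"
    return len(BOM) if source.startswith(BOM) else 0


def skip_whitespace(source: bytes, pos: int) -> tuple[int, bool]:
    """Skip whitespace and line terminators starting at pos.

    Returns (new_pos, has_newline_before). The flag records whether a line
    terminator preceded the next token, which automatic semicolon insertion needs.
    """
    has_newline_before = False
    n = len(source)
    while pos < n:
        b = source[pos]
        if b in (0x20, 0x09, 0x0B, 0x0C):  # space, tab, VT, FF
            pos += 1
        elif b == 0x0A or b == 0x0D:  # \n or \r (a \r\n pair takes two iterations)
            has_newline_before = True
            pos += 1
        elif source.startswith(NBSP, pos):
            pos += 2
        elif source.startswith(BOM, pos):  # U+FEFF after the start is whitespace (ZWNBSP)
            pos += 3
        elif source.startswith(LINE_SEPARATORS, pos):  # bytes.startswith accepts a tuple
            has_newline_before = True
            pos += 3
        else:
            break
    return pos, has_newline_before
```

Usage: a scanner would call `init_position` once, then call `skip_whitespace` before each token, storing the returned flag on that token.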

ohah and others added 2 commits March 18, 2026 20:18

ohah merged commit 4ce02a0 into main Mar 18, 2026
7 checks passed

ohah merged 2 commits into main from feature/lexer-scanner-base

ohah commented Mar 18, 2026


1 participant
