feat: rowan-based lossless YAML CST parser (Phase 1 of unified parsing)#114

Merged
avrabe merged 1 commit into main from feat/rowan-yaml-cst
Apr 2, 2026
@avrabe commented Apr 2, 2026

Summary

Phase 1 of the unified parsing architecture: a rowan-based lossless YAML CST parser.

What it does

  • Parses rivet's YAML subset: block mappings, sequences, flow sequences, scalars (plain, single-quoted, double-quoted, block literal/folded), comments
  • Lossless round-trip: parse(source).text() == source for every file
  • Precise spans via rowan TextRange on every node
  • Error recovery: wraps unparseable spans in Error nodes, continues parsing
  • Utility: line_starts() + offset_to_line_col() for LSP position conversion
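
The position utilities in the last bullet can be sketched as follows. This is a minimal, std-only illustration and not the code from the PR: the signatures of `line_starts` and `offset_to_line_col` are assumptions, lines and columns are zero-based, and the column is counted in bytes (an LSP client that negotiates UTF-16 positions would need an extra conversion step).

```rust
/// Byte offset at which each line begins. Hypothetical signature,
/// the PR's actual function may differ.
fn line_starts(source: &str) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, b) in source.bytes().enumerate() {
        if b == b'\n' {
            // The next line starts right after the newline byte.
            starts.push(i + 1);
        }
    }
    starts
}

/// Convert a byte offset into a zero-based (line, column) pair.
/// The column is a byte count from the start of the line.
fn offset_to_line_col(starts: &[usize], offset: usize) -> (usize, usize) {
    // partition_point returns how many line starts are <= offset;
    // starts[0] is always 0, so the result is at least 1.
    let line = starts.partition_point(|&s| s <= offset) - 1;
    (line, offset - starts[line])
}
```

Computing the line table once and binary-searching it keeps each conversion logarithmic in the number of lines, which matters when many spans from the same file are converted.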

What it doesn't do (yet)

  • No integration with artifact extraction (Phase 2)
  • No schema-driven section detection (Phase 3)
  • No salsa integration (Phase 5)
  • Doesn't replace serde_yaml yet — standalone module

Design

  • Follows bazel.rs rowan pattern: SyntaxKind enum, YamlLanguage trait, hand-written lexer, recursive-descent parser with GreenNodeBuilder
  • Indent tracking via byte offsets in source text (not token positions)
  • 28 SyntaxKind variants (tokens + composite nodes + Error)
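
To make the lossless and error-recovery properties concrete, here is a small std-only sketch of a hand-written lexer. It is an illustration under stated assumptions, not the PR's code: the seven `SyntaxKind` variants are a hypothetical subset of the 28, the `Token` struct and `round_trip` helper are invented for the example, and rowan is deliberately left out (in the design described above, such tokens would be fed to a `GreenNodeBuilder` instead of a `Vec`). Plain scalars here simply run to the next delimiter byte, so cases like URLs with colons would need the context-sensitive handling the real lexer provides.

```rust
/// Illustrative subset of token kinds (the PR's enum has 28 variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    Whitespace,
    Newline,
    Comment,
    Colon,
    Dash,
    PlainScalar,
    Error,
}

/// A token borrows its exact source text, so nothing is lost.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token<'a> {
    kind: SyntaxKind,
    text: &'a str,
}

/// Bytes that end a plain scalar in this simplified sketch.
fn is_scalar_end(b: u8) -> bool {
    matches!(b, b' ' | b'\n' | b':' | b'#' | b'\t' | b'{' | b'}')
}

fn lex(source: &str) -> Vec<Token<'_>> {
    let bytes = source.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let start = pos;
        let kind = match bytes[pos] {
            b' ' => {
                while pos < bytes.len() && bytes[pos] == b' ' {
                    pos += 1;
                }
                SyntaxKind::Whitespace
            }
            b'\n' => {
                pos += 1;
                SyntaxKind::Newline
            }
            b'#' => {
                // A comment runs up to, but not including, the newline.
                while pos < bytes.len() && bytes[pos] != b'\n' {
                    pos += 1;
                }
                SyntaxKind::Comment
            }
            b':' => {
                pos += 1;
                SyntaxKind::Colon
            }
            b'-' => {
                pos += 1;
                SyntaxKind::Dash
            }
            // Unsupported input becomes an Error token; lexing continues.
            b'\t' | b'{' | b'}' => {
                pos += 1;
                SyntaxKind::Error
            }
            _ => {
                // Consume at least one byte, then run to the next delimiter.
                // Delimiters are ASCII, so slices stay on char boundaries.
                pos += 1;
                while pos < bytes.len() && !is_scalar_end(bytes[pos]) {
                    pos += 1;
                }
                SyntaxKind::PlainScalar
            }
        };
        tokens.push(Token { kind, text: &source[start..pos] });
    }
    tokens
}

/// Concatenating every token's text must reproduce the input exactly.
fn round_trip(tokens: &[Token<'_>]) -> String {
    tokens.iter().map(|t| t.text).collect()
}
```

Because every byte of the input lands in exactly one token, including whitespace, comments, and `Error` tokens, `round_trip(&lex(src)) == src` holds for any input, which is the same invariant the PR states as `parse(source).text() == source`.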

Tests: 18

Round-trip, nested structures, flow sequences, block scalars, comments, quoted strings, URLs with colons, colons in values, error recovery.

Refs #22.

🤖 Generated with Claude Code

Lossless, span-preserving YAML parser for rivet's YAML subset:
- SyntaxKind enum (28 variants: tokens + composite nodes + Error)
- Hand-written lexer: handles plain/quoted scalars, block scalars,
  flow sequences, comments, document markers
- Recursive-descent parser with indent tracking via byte offsets
- Error recovery: wraps unparseable spans in Error nodes
- Round-trip guarantee: parse(source).text() == source

18 tests: simple/nested mappings, sequences, flow sequences,
block scalars, comments, quoted strings, URLs with colons,
colons in values, document markers, error recovery.

Utility functions: line_starts(), offset_to_line_col() for
converting rowan TextRange to LSP line/column positions.

Phase 1 of the unified parsing architecture plan. No integration
with the rest of rivet yet — standalone module.

@avrabe merged commit 8321f8b into main on Apr 2, 2026
11 of 13 checks passed
@avrabe deleted the feat/rowan-yaml-cst branch on April 2, 2026, 18:20
@github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Rivet Criterion Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 9070f02 | Previous: 2c9fb62 | Ratio |
|---|---|---|---|
| store_insert/10000 | 14934409 ns/iter (± 964832) | 10743106 ns/iter (± 494627) | 1.39 |
| validate/10000 | 12547326 ns/iter (± 1382543) | 9515609 ns/iter (± 605808) | 1.32 |
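
The ratios can be checked by hand. The sketch below assumes the ratio is simply the current time divided by the previous time, which matches the reported figures; the function names are hypothetical and are not part of github-action-benchmark.

```rust
/// Ratio of current to previous timing; values above 1.0 mean slower.
fn regression_ratio(current_ns: u64, previous_ns: u64) -> f64 {
    current_ns as f64 / previous_ns as f64
}

/// True when the slowdown exceeds the alert threshold (1.20 in this report).
fn exceeds_threshold(current_ns: u64, previous_ns: u64, threshold: f64) -> bool {
    regression_ratio(current_ns, previous_ns) > threshold
}
```

For `store_insert/10000`, 14934409 / 10743106 is about 1.39, and for `validate/10000`, 12547326 / 9515609 is about 1.32, so both exceed the 1.20 threshold.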

This comment was automatically generated by workflow using github-action-benchmark.

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 82.68072% with 115 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rivet-core/src/yaml_cst.rs | 82.41% | 115 Missing ⚠️ |
