security: block URI scheme bypasses + tree-sitter grammar v1.0 parity (v1.0.1) by rjmitchell · Pull Request #19 · rjmitchell/clear-notation

rjmitchell · 2026-04-10T22:46:50Z

Summary

Ships v1.0.1 with two main things plus a pile of hardening.

Security (CSO audit findings #10, #11 + adversarial review):

Block javascript: and data: URI schemes in link href and figure src via allowlist (http, https, mailto, tel), defaulting to #. Both Python and JS renderers.
Block protocol-relative URLs (//evil.com) — browsers treat these as absolute, so [text -> //evil.com] was a valid CLN document that slipped past the sanitizer as an open redirect.
Block percent-encoded bypasses (javascript%3a..., JAVASCRIPT%3A...) — browsers decode before routing, so literal-colon detection was bypassable. Both renderers now percent-decode before scheme extraction and handle malformed sequences safely.

Tree-sitter grammar v1.0 parity (closes the gap where the Python parser supported nested lists + multi-paragraph items but the grammar was still on v0.1):

External scanner with indent stack (12-level cap, serialized/deserialized across parses with corrupt-buffer guards).
scan_indent_tokens distinguishes nested list markers (INDENT), multi-paragraph continuations (LIST_CONTINUATION requires after_blank_line + indent > 0), and dedents (zero-width DEDENT). Bails during tree-sitter error recovery to avoid corrupting the indent stack.
New grammar rules: list_item_body, list_item_continuation, nested_list. Both unordered and ordered nesting work, including mixed nesting.
9 new tree-sitter corpus tests covering flat, nested (both types), multi-paragraph (both types), and list-followed-by-block regressions.

Editor internals (to consume the new CST shape):

clnComment block type for // comment lines.
Converter populates children[] from list_item_body — shared convertListItemBody helper between unordered and ordered.
Depth-aware serializer with content-column alignment for continuations (including double-digit ordered markers like 10. ).
Comment converter also strips trailing whitespace (round-trip fidelity fix).

Hardening / refactors:

Scanner uses TREE_SITTER_SERIALIZATION_BUFFER_SIZE instead of magic 1024, removes unused <string.h> include, collapses redundant deserialize guards.
Editor converter DRY'd (30 lines eliminated).
Stronger test assertions: positive href="#" checks, case-insensitive scheme coverage, converter unit tests for nested list paths.

Test Coverage

Python: 362 tests (was 349, +13 for URL security including 5 new bypass tests)
Editor (vitest): 336 tests (was 316, +20 for clnComment, nested list converter, depth-aware serializer)
clearnotation-js (vitest): 129 passed + 3 skipped (was 116, +13 for URL security including protocol-relative and percent-encoded bypass tests)
Tree-sitter corpus: 35 tests (was 26, +9 for list grammar)
Conformance fixtures: 70/70 still passing

AI-assessed coverage: ~95% (above 80% target). All critical paths have ★★★ tests (behavior + edge cases + error paths). The one structural gap (scanner corrupt-buffer branches) is defensive C code with no direct test framework — acceptable.

Pre-Landing Review

Two stages of review ran:

Structural pass: CLEAN. Two informational items (both non-blocking, both tracked):

Converter recursion has no explicit depth guard (scanner's MAX_STACK_DEPTH=12 already bounds input, and JS stack is ~10,000 frames — no realistic overflow).
Comment serializer with programmatically-set multi-line props.text would round-trip incorrectly, but this can't happen from a parsed CLN source (grammar enforces single-line comments).

Specialist review: 9 informational items found across testing + maintainability. All 7 mechanical items auto-fixed across 3 commits:

Added missing data: figure src test to JS, ordered list continuation converter test, multi-paragraph ordered corpus test
Extracted convertListItemBody helper (DRY, -30 lines)
Replaced magic 1024 with TREE_SITTER_SERIALIZATION_BUFFER_SIZE
Removed unused <string.h> include
Collapsed redundant deserialize branches

Skipped: 2 stylistic items (LIST_INDENT constant for a 2-char string, cross-language safeUrl reference comment).

Adversarial review: Found 2 CRITICAL bypasses in the new URL sanitizer (C-1 protocol-relative, C-2 percent-encoded). Both fixed in commit 2ad7306 with regression tests in both renderers. Plus 1 INFORMATIONAL scanner hardening fix (error-recovery bail) in commit 3969fbd.

Plan Completion

All 39 checkboxes in docs/superpowers/plans/2026-04-09-editor-v1-parity.md marked done. Every planned task delivered. Additional scope added during ship: adversarial-driven URL bypass fixes (C-1, C-2), scanner error-recovery guard, DRY refactors.

Scope Drift

CLEAN. Intent: editor v1.0 parity + security fix. Delivered: exactly that, plus ship-time hardening driven by adversarial review.

TODOS

Completed: "Editor v1.0 parity + URL security (v1.0.1)" entry updated in TODOS.md with the full shipped scope.

Test plan

All 362 Python tests pass
All 336 editor tests pass
All 129 clearnotation-js tests pass (+3 skipped)
All 35 tree-sitter corpus tests pass
All 70 conformance fixtures pass
Protocol-relative URL bypass blocked (verified: //evil.com → #)
Percent-encoded scheme bypass blocked (verified: javascript%3aalert(1) → #)
Flat + nested (both types) + multi-paragraph (both types) lists parse correctly
Editor round-trip: parse → convert → serialize produces identical CLN source

🤖 Generated with Claude Code

Specs: - JS renderer parity (completed this session) - Include-aware watch (completed this session) - Landing page redesign (completed this session) - Editor v1.0 parity: tree-sitter grammar upgrade, nested lists, comments, converter/serializer updates (ready for implementation) Plans: - JS renderer parity (completed) - Include-aware watch (completed) - Landing page redesign (completed) - Editor v1.0 parity (ready for next session, includes CSO security fix for javascript: URI XSS in renderers)

…ns and case-insensitive case

…items

…umn-0 blocks

…pport

…for editor v1.0 parity - Strip trailing whitespace from comment node text in block-converter (round-trip fix) - Add 3 unit tests for comment conversion (trailing newline, indented prefix, no newline) - Update CLAUDE.md test counts: 357 Python / 331 editor / 123 JS - Add Editor v1.0 parity entry to TODOS.md Completed section

…boxes done

…ulti-paragraph ordered corpus

…eserialize guards

Block //evil.com (protocol-relative) URLs which browsers treat as absolute, and decode percent-encoding before scheme extraction to catch javascript%3a... bypasses. Both Python and JS renderers updated; regression tests added.

Add guard at the top of scan_indent_tokens that detects the error-recovery state (all 5 external symbols valid simultaneously) and returns false immediately to prevent incorrect state mutation during recovery.

rjmitchell added 17 commits April 9, 2026 15:22

security: block javascript: and data: URIs in link href and figure src

2ef362c

test: strengthen URL scheme sanitization tests with positive assertio…

15ee821

…ns and case-insensitive case

feat: add indent stack to tree-sitter scanner for nested list support

d5c8d33

feat: tree-sitter grammar v1.0 with nested lists and multi-paragraph …

07dfec0

…items

fix: LIST_CONTINUATION only fires for indented continuations, not col…

ff39f65

…umn-0 blocks

feat: add clnComment block, update converter for nested lists

cee63a5

feat: depth-aware list serialization with nesting and continuation su…

d2c2d7e

…pport

docs: README test counts, CHANGELOG unreleased entry, mark plan check…

8b82bc7

…boxes done

test: unit tests for nested list converter paths

ef8e653

test: add missing data: figure src case, ordered list continuation, m…

33e6e1d

…ulti-paragraph ordered corpus

refactor: extract convertListItemBody helper for DRY

efcdc30

chore: scanner cleanups — buffer constant, unused include, collapse d…

ca92820

…eserialize guards

chore: scanner bails during tree-sitter error recovery

3969fbd

Add guard at the top of scan_indent_tokens that detects the error-recovery state (all 5 external symbols valid simultaneously) and returns false immediately to prevent incorrect state mutation during recovery.

chore: bump version and changelog (v1.0.1)

924d6e0

rjmitchell merged commit 8b34258 into main Apr 10, 2026
2 checks passed

rjmitchell deleted the feature/editor-v1-parity branch April 10, 2026 23:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: block URI scheme bypasses + tree-sitter grammar v1.0 parity (v1.0.1)#19

security: block URI scheme bypasses + tree-sitter grammar v1.0 parity (v1.0.1)#19
rjmitchell merged 17 commits intomainfrom
feature/editor-v1-parity

rjmitchell commented Apr 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rjmitchell commented Apr 10, 2026

Summary

Test Coverage

Pre-Landing Review

Plan Completion

Scope Drift

TODOS

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant