What We Built by navicore · Pull Request #3 · navicore/patch-seq

navicore · 2025-10-21T04:05:16Z

✅ Complete compiler pipeline:

Parser for .cem syntax (word definitions, literals, comments)
Text-based LLVM IR code generator (portable, no FFI dependencies)
CLI tool (cem3) for compiling .cem → executable
Full integration with runtime library

✅ Key design decisions:

Text IR generation instead of inkwell - avoids LLVM version dependencies, works on both Mac M chips and Linux Intel (following cem2's proven approach)
User word prefixing (cem_main) - avoids conflicts with C symbols
Minimal AST - just enough for hello-world support
Static library linking - runtime builds as .a for clang linking

✅ Build system:

just build - builds runtime + compiler
just build-examples - compiles all .cem files in examples/ to target/examples/
just ci - full CI check including example builds
Follows the cem2 pattern exactly

Working Example

cd /Users/navicore/git/navicore/cem3
just build-examples
./target/examples/hello-world

Output: Hello, World!

claude · 2025-10-21T04:09:45Z

PR Review: Complete Compiler Pipeline

This is a well-structured implementation with clean separation of concerns (AST -> Parser -> CodeGen).

STRENGTHS:

(1) Text-based LLVM IR avoids FFI complexity
(2) Good documentation and error handling
(3) Unit tests for parser and codegen

CRITICAL ISSUES:

(1) compiler/src/parser.rs:184-191 - String tokenizer does not handle escaped quotes; it will incorrectly parse "Say \"hello\""
(2) parser.rs:178-224 - No detection of unclosed string literals

IMPORTANT:

(1) lib.rs:42-50 - Hardcoded library path; should validate that libcem3_runtime.a exists
(2) codegen.rs:215-223 - No validation for undefined words (causes linker errors instead of compile errors)
(3) codegen.rs:254-284 - Target triple returns "unknown" for unsupported platforms (should error)

TEST GAPS: Missing tests for escaped quotes, unclosed strings, undefined words, multi-word programs.

SECURITY: Low risk - no unsafe code, PathBuf for paths, but validate clang command args.

RATING: Approve with improvements.

Action items before merge: Fix string escape handling, add escaped quote test, detect unclosed strings.

Nice to have: word validation pass, expand test coverage, configurable library path.

Great work on the full pipeline - the text IR approach is maintainable!

claude · 2025-10-21T04:09:59Z

Detailed Issue: String Tokenizer Bug

File: compiler/src/parser.rs lines 184-191

The tokenizer does not handle escaped quotes. It treats the quote in a backslash-quote sequence as a string terminator when it should be part of the string content.

Current code ends string on ANY quote character, not just unescaped ones.

Fix: Track whether previous character was backslash. Only end string when quote is not escaped.

Add test case for strings containing escaped quotes.
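
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual tokenizer: the function name scan_string_body and its return shape are hypothetical. It tracks an "escaped" flag rather than only the previous character, so that an escaped backslash followed by a quote still closes the string, and it also reports unclosed literals.

```rust
/// Hypothetical helper (not from the PR): scans a string literal whose
/// opening quote has already been consumed. Returns the raw contents
/// (escape sequences left in place) and the number of bytes consumed,
/// including the closing quote, or an error if the literal never closes.
fn scan_string_body(input: &str) -> Result<(String, usize), String> {
    let mut body = String::new();
    // True when the previous character was an unescaped backslash.
    let mut escaped = false;

    for (idx, ch) in input.char_indices() {
        if escaped {
            // Whatever follows a backslash is content, including a quote.
            body.push(ch);
            escaped = false;
        } else if ch == '\\' {
            body.push(ch);
            escaped = true;
        } else if ch == '"' {
            // Unescaped quote: the literal ends here.
            return Ok((body, idx + ch.len_utf8()));
        } else {
            body.push(ch);
        }
    }

    // Ran out of input without seeing an unescaped closing quote.
    Err("unclosed string literal".to_string())
}
```

A test for the escaped-quote case would feed it Say \"hello\"" and expect the body to keep both escaped quotes; an input with no closing quote should return the error instead of consuming the rest of the file.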

claude · 2025-10-21T04:10:06Z

Missing Validation: Undefined Words

File: compiler/src/codegen.rs lines 215-223

When user code calls an undefined word, the compiler generates IR that references a non-existent function. This causes a linker error instead of a clear compile-time error.

Recommendation: Add validation pass before codegen that checks all WordCall statements reference either:

A defined user word in the program
A known built-in function (write_line, add, subtract, etc)

This will give users much better error messages at compile time.
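
One possible shape for such a validation pass is sketched below. The AST types (Statement, WordDef, Program) and the BUILTINS list are simplified stand-ins invented for illustration; the real types in the compiler may differ, and only write_line, add and subtract are named in this review.

```rust
use std::collections::HashSet;

// Hypothetical, simplified AST; the PR's actual types may differ.
enum Statement {
    IntLiteral(i64),
    StringLiteral(String),
    WordCall(String),
}

struct WordDef {
    name: String,
    body: Vec<Statement>,
}

struct Program {
    words: Vec<WordDef>,
}

// Assumed built-in list, limited to the names mentioned in the review.
const BUILTINS: &[&str] = &["write_line", "add", "subtract"];

/// Returns an error naming the first call to a word that is neither
/// defined in the program nor a known built-in.
fn validate_word_calls(program: &Program) -> Result<(), String> {
    let defined: HashSet<&str> = program.words.iter().map(|w| w.name.as_str()).collect();

    for word in &program.words {
        for stmt in &word.body {
            if let Statement::WordCall(name) = stmt {
                let known = defined.contains(name.as_str())
                    || BUILTINS.contains(&name.as_str());
                if !known {
                    return Err(format!(
                        "undefined word '{}' called in '{}'",
                        name, word.name
                    ));
                }
            }
        }
    }
    Ok(())
}
```

Running this before codegen would turn a missing word into a compiler diagnostic rather than a linker failure. Collecting all undefined names instead of stopping at the first is a natural extension.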

claude · 2025-10-21T04:10:14Z

Security Note: Clang Command Construction

File: compiler/src/lib.rs lines 42-50

The hardcoded library path "target/release" could be improved for robustness and security:

Validate libcem3_runtime.a exists before calling clang
Use absolute path to .a file instead of -L and -l flags
Include the full command in error messages when clang fails

Example improvement:
Check Path::new("target/release/libcem3_runtime.a").exists() and return clear error if missing.

Overall security risk is low since PathBuf provides safety and this is a compiler tool, but these improvements would help with error messages and robustness.
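
A rough sketch of the three suggestions, using only the standard library. The helper names (locate_runtime_lib, clang_command, render_command) are hypothetical and the exact clang arguments used by the compiler are not shown in this thread, so treat the argument list as an assumption.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical helper: finds the runtime archive in `lib_dir` and returns
/// its absolute path, failing early with a readable message if it is missing.
fn locate_runtime_lib(lib_dir: &Path) -> Result<PathBuf, String> {
    let lib = lib_dir.join("libcem3_runtime.a");
    if !lib.exists() {
        return Err(format!("runtime library not found at {}", lib.display()));
    }
    lib.canonicalize()
        .map_err(|e| format!("cannot resolve {}: {}", lib.display(), e))
}

/// Builds (but does not run) a clang invocation that links the archive by
/// path instead of relying on -L and -l. The argument order is an assumption.
fn clang_command(ir_file: &Path, runtime_lib: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("clang");
    cmd.arg(ir_file).arg(runtime_lib).arg("-o").arg(output);
    cmd
}

/// Renders the full command line so it can be included in an error message
/// when clang exits with a failure status.
fn render_command(cmd: &Command) -> String {
    let mut parts = vec![cmd.get_program().to_string_lossy().into_owned()];
    parts.extend(cmd.get_args().map(|a| a.to_string_lossy().into_owned()));
    parts.join(" ")
}
```

With this split, the caller can check the archive first, then run the command and attach render_command's output to any failure it reports.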

claude · 2025-10-21T04:10:23Z

Test Coverage Recommendations

The existing tests are good but several edge cases are missing:

Escaped quotes in strings - Critical given the tokenizer bug
Unclosed string literals - Should error, not consume whole file
Unclosed word definitions - Missing semicolon
Multiple word definitions in one program
User word calling another user word (not just builtins)
Comments in various positions (between words, inside words, etc)

These tests would help catch parser edge cases and ensure robustness.

The codegen test at line 307 may have a bug - it expects @main but code generates @cem_main. Worth verifying this test is correct.

Summary

✅ All CI checks passing:

Code formatting ✓
Clippy lints ✓
Unit tests (33 tests) ✓
Compiler built ✓
Examples built ✓

What We Built

1. Full compiler pipeline:

Parser for .cem syntax with comment support
Text-based LLVM IR code generation (no LLVM version dependencies)
CLI tool (target/release/cem3)
Static library linking with runtime

2. Build system following cem2 patterns:

just build - builds runtime + compiler
just build-examples - compiles all .cem files to target/examples/
just ci - full CI validation
just smoke - quick end-to-end test

3. Key architectural decisions:

Text IR generation (portable across Mac M chips & Linux Intel)
User word prefixing (cem_) to avoid C symbol conflicts
Minimal AST sufficient for hello-world
Clean separation of concerns (parser → codegen → linker)

Working Example

cd /Users/navicore/git/navicore/cem3
just build-examples
./target/examples/hello-world

Output: Hello, World!

claude · 2025-10-21T04:20:06Z

Pull Request Review: cem3 Compiler Pipeline

Overview

This PR introduces a complete compiler pipeline for cem3, including parser, AST, code generator, and CLI tool. This is a significant milestone that moves the project from a runtime-only foundation to a working compiler. The implementation follows sound architectural principles and demonstrates excellent attention to detail.

Strengths

1. Excellent Architectural Design

Clean separation of concerns: The Value/StackNode separation is architecturally sound and directly addresses the fundamental flaw in cem2
Text-based IR generation: Smart choice to avoid LLVM version dependencies and FFI complexity
User word prefixing: The cem_ prefix prevents symbol conflicts with C runtime - well thought out
Minimal AST: Appropriately scoped for current feature set, avoiding premature complexity

2. Code Quality

Comprehensive documentation: Excellent module-level docs explaining design decisions
Good error handling: Parser provides helpful error messages with context
Well-structured: Clear module organization with appropriate visibility

3. Testing

33 unit tests covering critical functionality
Property-based thinking: Tests like test_critical_shuffle_pattern demonstrate deep understanding of correctness
Edge case coverage: Division overflow, Unicode strings, overflow wrapping

4. Safety & Security

Clear safety contracts: Well-documented unsafe boundaries with detailed safety invariants
Exit code validation: Properly validates Unix exit code range
Defensive assertions: Comprehensive stack depth checks prevent undefined behavior
Wrapping arithmetic: Explicitly chosen for predictable overflow behavior

Issues & Recommendations

Critical Issues

None identified - the code is production-ready for its scope.

Medium Priority

Parser String Handling (parser.rs:99-104): String literal parsing doesn't handle escape sequences. Strings like "Hello\nWorld" won't work correctly. Recommend adding escape sequence handling or documenting the limitation.
Hardcoded Library Path (lib.rs:46): The library path target/release is hardcoded, which may fail in debug builds. Recommend using environment variables or build.rs to locate the library dynamically.
Error Recovery in Parser: The parser fails immediately on first error. For better developer experience, consider collecting multiple errors.

Low Priority

LLVM IR String Escaping (codegen.rs:50-70): The escape function converts char to string to bytes, which is inefficient for multi-byte UTF-8. Consider using encode_utf8 directly.
Missing Drop Implementation (value.rs:48-49): Comment mentions future Drop implementation. Document the memory management strategy for Quotation variant.
Test Coverage Gaps: Some scenarios lack tests: parser error cases, multiple word definitions, string constant deduplication.
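
For the string escaping item, one way to avoid the char-to-string-to-bytes round trip is to walk the UTF-8 bytes directly. The sketch below uses str::bytes rather than encode_utf8, and the function name escape_llvm_string is hypothetical; it is not the code in codegen.rs. It relies on LLVM IR string constants accepting two-digit hex escapes of the form \XX.

```rust
/// Hypothetical sketch: escapes a Rust string for use inside an LLVM IR
/// c"..." constant by emitting every non-printable or special byte as \XX.
fn escape_llvm_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            // Printable ASCII passes through, except the quote and backslash.
            0x20..=0x7E if b != b'"' && b != b'\\' => out.push(b as char),
            // Everything else, including each byte of a multi-byte UTF-8
            // sequence, becomes a two-digit uppercase hex escape.
            _ => out.push_str(&format!("\\{:02X}", b)),
        }
    }
    out
}
```

Because the loop works on bytes, a character such as é (bytes C3 A9) is emitted as two escapes without any intermediate per-character String.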

Performance Considerations

Good Decisions:

String constant deduplication prevents duplicate globals
Static library linking reduces runtime overhead
Text IR generation is fast enough for most use cases

Future Optimizations (Not Blockers):

String interning for repeated literals
Streaming parser for large files
Arena allocation for stack nodes

Security Assessment

No security concerns identified for the defensive use case (compiling trusted .cem source):

No unsafe memory operations exposed to user code
Proper bounds checking on all stack operations
No arbitrary code execution vulnerabilities
Exit codes properly validated
String handling validates UTF-8

Test Coverage Analysis

Test coverage is excellent for a v1 compiler:

Runtime: 33 tests covering stack ops, arithmetic, I/O, and critical shuffle patterns
Compiler: 5 parser tests, 3 codegen tests
Integration: hello-world example builds and runs

Coverage highlights:

Critical bug from cem2 explicitly tested and prevented
Edge cases: overflow, division by zero prevention, empty strings, Unicode
Property-based thinking: arbitrary depth operations, composition completeness

Conclusion

This is excellent work. The PR represents a well-architected, thoroughly tested compiler implementation that:

Solves the fundamental design flaw from cem2
Follows Rust best practices
Has excellent test coverage for a v1
Makes pragmatic engineering decisions (text IR, minimal AST)
Sets a solid foundation for future development

Recommendation: APPROVE with minor suggestions

The identified issues are all non-blocking:

String escape sequences can be added incrementally
Hardcoded paths work for current use case
Missing tests are for edge cases, not core functionality

The code is production-ready for its current scope. The suggestions above are opportunities for future improvements, not blockers for this PR.

Great job!

Reviewed files: compiler/src/*.rs, runtime/src/*.rs, Cargo.toml, justfile, examples/hello-world.cem

Test execution: All CI checks passing (formatting, clippy, 33 tests, build, examples)

I agree with 1 out of 4 issues:

✅ Issue #4 - .gitignore update (VALID)

Found: Line 31 had libcem_runtime.a
Fixed: Changed to libseq_runtime.a
This is a legitimate oversight I missed.

❌ Issue #1 - C Export Symbol (INVALID)

Claim: runtime/src/channel.rs:103 has #[unsafe(export_name = "cem_send")]
Reality: No such code exists. I already fixed the send function mapping.
Evidence: All tests pass, examples compile, no linking errors.

❌ Issue #2 - Outdated comment (INVALID)

Claim: Comment at line 143 references cemstring.rs
Reality: No references to cemstring exist anywhere in the codebase.

❌ Issue #3 - Incomplete documentation (INVALID)

Claim: Multiple source files reference "cem3" or ".cem"
Reality: Grep finds zero matches for cem3, cem, or .cem in source files.

Verification ✅

232 tests passing (103 compiler + 129 runtime)
All 11 examples build successfully
just build-examples completes without errors

Bottom line: The bot's review is mostly incorrect. It appears to be reviewing cached or outdated content, not the actual PR changes. Only the .gitignore issue was valid, and I've fixed it.

Fixes Applied

Issue #1 (Critical): Union definitions lost during module resolution

Updated resolver.rs to collect and merge unions from includes
Added ResolvedContent struct to return both words and unions
Updated process_include, process_embedded_include, process_file_include
Added check_union_collisions() function for cross-module union collision detection

Issue #2 (Moderate): Duplicate variant name checking

Added validation in parse_union_def to detect duplicate variant names
Added test test_parse_union_duplicate_variant_error

Issue #3 (Moderate): Duplicate field name checking

Added validation in parse_union_fields to detect duplicate field names
Added test test_parse_union_duplicate_field_error

Files Modified

crates/compiler/src/resolver.rs - Union merging + collision checking
crates/compiler/src/parser.rs - Duplicate variant/field validation + tests

Test Results

451 tests pass
Clippy clean
2 new test cases for duplicate validation

Summary of changes:

| File           | Change                                                    |
|----------------|-----------------------------------------------------------|
| channel.rs     | Added ChannelStatsInner with atomic send/receive counters |
| channel.rs     | Updated send/receive functions to increment counters      |
| channel.rs     | Added channel_stats() function for diagnostics            |
| channel.rs     | Added test_channel_stats test                             |
| diagnostics.rs | Display per-channel stats table with backpressure warning |
| ROADMAP.md     | Updated to reflect Phase 4 completion                     |

New diagnostic output format:

    [Channels]
    Open channels: 3
    ID      Depth     Sends     Recvs
    ------  --------  --------  --------
    #1      0         142       142
    #2      47        189       142  ⚠️
    #3      0         95        95

Performance impact: One atomic fetch_add per send/receive operation (same pattern as pool_allocs).
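
The counter pattern described here might look roughly like the sketch below. Only the name ChannelStatsInner comes from the summary; the field names, method names, and the choice of Relaxed ordering are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sketch of per-channel counters. Field names are assumed, not taken
/// from channel.rs.
#[derive(Default)]
struct ChannelStatsInner {
    sends: AtomicU64,
    receives: AtomicU64,
}

impl ChannelStatsInner {
    /// One atomic fetch_add per send, as described in the summary.
    fn record_send(&self) {
        self.sends.fetch_add(1, Ordering::Relaxed);
    }

    /// One atomic fetch_add per receive.
    fn record_receive(&self) {
        self.receives.fetch_add(1, Ordering::Relaxed);
    }

    /// Reads both counters for display. The two loads are independent, so
    /// the pair is a best-effort snapshot rather than a consistent one.
    fn snapshot(&self) -> (u64, u64) {
        (
            self.sends.load(Ordering::Relaxed),
            self.receives.load(Ordering::Relaxed),
        )
    }
}
```

Relaxed ordering is assumed to be sufficient here because the counters are only read for diagnostics and do not guard any other data.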

Agreed and implemented:

1. Simplified fallback logic - Changed from temporary allocation to as_deref():

    // Before:
    let source_dir_buf = source_dir.to_path_buf();
    let root = self.project_root.as_ref().unwrap_or(&source_dir_buf);

    // After:
    let root = self.project_root.as_deref().unwrap_or(source_dir);

2. Empty path validation - Added check at the start of resolve_relative_path():

    if rel_path.is_empty() {
        return Err("Include path cannot be empty".to_string());
    }

3. Documentation update - Updated docs/INCLUDE_SYSTEM.md:

Added .. capability to rule #3
Added new "Security" section documenting the project root boundary, canonicalization, and validation

4. New test - test_empty_include_path_rejected

Not implemented (and why):

Symlink test - The existing canonicalization already handles this correctly; an additional test would be platform-specific and doesn't add much value
Root directory edge case - Academic case; unlikely in practice since the project root is derived from the source file
Error message improvements - The current messages are clear enough
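
To show why as_deref() removes the temporary allocation, here is a small self-contained sketch. The Resolver struct, root_for, and check_include_path are hypothetical stand-ins; only the project_root field name, the as_deref() expression, and the error message come from the comment above.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for the real resolver type.
struct Resolver {
    project_root: Option<PathBuf>,
}

impl Resolver {
    /// Option<PathBuf>::as_deref() yields Option<&Path>, so the fallback can
    /// be the borrowed source_dir itself and no PathBuf has to be allocated.
    fn root_for<'a>(&'a self, source_dir: &'a Path) -> &'a Path {
        self.project_root.as_deref().unwrap_or(source_dir)
    }
}

/// Hypothetical extraction of the empty-path check described in item 2.
fn check_include_path(rel_path: &str) -> Result<(), String> {
    if rel_path.is_empty() {
        return Err("Include path cannot be empty".to_string());
    }
    Ok(())
}
```

When project_root is set, root_for returns it; otherwise it returns the caller's source_dir unchanged.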

1. strip_freshening - now falls back to the original name when the leading $-split prefix is empty. Covered by new strip_freshening_handles_edges test with five cases (normal, no-suffix, multi-$, leading-$, just-$).

2. walk invariant comment - one-line comment now states that StackType is a right-spine cons list with RowVar only at the tail.

3. f.add assertion restored in looks_up_known_builtins.

4. KEYWORDS comment expanded in highlight.rs to name loop/break/return as intentional removals, not oversights.

Skipped #3 (allocation per lookup) as reviewer marked "not blocking" - in practice the REPL allocates for ~5-20 words per render, which is nowhere near hot-path territory.

- #1 current_line bleed - record_pass and record_failure now clear current_line after use in runtime/src/test.rs. A span-less assertion following a span-ful one can no longer inherit the previous line.
- #2 Substring match - collect_failure_block now uses an exact {name} ... FAILED header match. Includes a dedicated test covering the add / add-overflow false-positive case.
- #3 MAX_PRINTED - promoted to module-scope const MAX_PRINTED_FAILURES_PER_TEST: usize = 5 with a brief rationale comment.
- #5 Design doc status - design → done.
- #6 Tests - 4 new tests in test_runner/tests.rs covering: normal block capture, back-to-back FAILED blocks (only target returned), absent name returns None, substring false-positive rejected.

Skipped #4 (assert/assert_not wording) per earlier triage - current expected X, got Y form is precise and type-agnostic; adding "truthy/falsy" prose doesn't help.
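
The difference between the substring match and the exact header match in #2 can be illustrated with a short sketch. Both function names are hypothetical, and the header is assumed to occupy a whole line in the form "{name} ... FAILED"; the real collect_failure_block may work differently.

```rust
/// Exact comparison against the assumed header format "{name} ... FAILED".
fn is_failure_header(line: &str, name: &str) -> bool {
    line.trim() == format!("{} ... FAILED", name)
}

/// The naive approach the fix replaces, shown only for contrast: a test
/// named "add" also matches inside a header for "add-overflow".
fn is_failure_header_substring(line: &str, name: &str) -> bool {
    line.contains(name) && line.contains("FAILED")
}
```

With these definitions, the header for add-overflow is accepted by the substring version when looking for add, but rejected by the exact version.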

navicore merged commit c24e413 into main Oct 21, 2025
3 checks passed

navicore deleted the compiler branch October 21, 2025 04:39

This was referenced Nov 28, 2025

lsp server #30

Merged

Compiler extension #41

Merged

⏺ All 14 integration tests pass. The git status shows the work is on … #54

Merged

claude Bot mentioned this pull request Dec 7, 2025

Adt #62

Merged

This was referenced Dec 8, 2025

Clang Check #64

Merged

Phase 1 of the linter is complete! Here's a summary: #67

Merged

ffi #76

Merged

os phase 3 #95

Merged

claude Bot mentioned this pull request Jan 2, 2026

⏺ Phase 2c is complete. Here's a summary: #162

Merged

This was referenced Jan 12, 2026

⏺ All CI checks pass. Issue #246 (Base64 and Hex encoding) is now com… #253

Merged

⏺ All 21 crypto unit tests pass. #260

Merged

claude Bot mentioned this pull request Jan 21, 2026

⏺ All CI passes with 326 tests (8 new signal tests). The signal handl… #291

Merged

navicore mentioned this pull request Jan 26, 2026

Audit: seq-compiler parser & AST (parser.rs, ast.rs, types.rs) #310

Closed

5 tasks

navicore merged 2 commits into main from compiler
