Conversation
✅ Complete compiler pipeline: - Parser for .cem syntax (word definitions, literals, comments) - Text-based LLVM IR code generator (portable, no FFI dependencies) - CLI tool (cem3) for compiling .cem → executable - Full integration with runtime library ✅ Key design decisions: - Text IR generation instead of inkwell - avoids LLVM version dependencies, works on both Mac M chips and Linux Intel (following cem2's proven approach) - User word prefixing (cem_main) - avoids conflicts with C symbols - Minimal AST - just enough for hello-world support - Static library linking - runtime builds as .a for clang linking ✅ Build system: - just build - builds runtime + compiler - just build-examples - compiles all .cem files in examples/ to target/examples/ - just ci - full CI check including example builds - Follows the cem2 pattern exactly Working Example cd /Users/navicore/git/navicore/cem3 just build-examples ./target/examples/hello-world # Output: Hello, World!
|
PR Review: Complete Compiler Pipeline - This is a well-structured implementation with clean separation of concerns (AST -> Parser -> CodeGen). STRENGTHS: (1) Text-based LLVM IR avoids FFI complexity, (2) Good documentation and error handling, (3) Unit tests for parser and codegen. CRITICAL ISSUES: (1) compiler/src/parser.rs:184-191 - String tokenizer does not handle escaped quotes - will incorrectly parse "Say "hello"", (2) parser.rs:178-224 - No detection of unclosed string literals. IMPORTANT: (1) lib.rs:42-50 - Hardcoded library path should validate libcem3_runtime.a exists, (2) codegen.rs:215-223 - No validation for undefined words (causes linker errors instead of compile errors), (3) codegen.rs:254-284 - Target triple returns "unknown" for unsupported platforms (should error). TEST GAPS: Missing tests for escaped quotes, unclosed strings, undefined words, multi-word programs. SECURITY: Low risk - no unsafe code, PathBuf for paths, but validate clang command args. RATING: Approve with improvements. Action items before merge: Fix string escape handling, add escaped quote test, detect unclosed strings. Nice to have: word validation pass, expand test coverage, configurable library path. Great work on the full pipeline - the text IR approach is maintainable! |
Detailed Issue: String Tokenizer BugFile: compiler/src/parser.rs lines 184-191 The tokenizer does not handle escaped quotes. It treats backslash-quote as a string terminator when it should be part of the string content. Current code ends string on ANY quote character, not just unescaped ones. Fix: Track whether previous character was backslash. Only end string when quote is not escaped. Add test case for strings containing escaped quotes. |
Missing Validation: Undefined WordsFile: compiler/src/codegen.rs lines 215-223 When user code calls an undefined word, the compiler generates IR that references a non-existent function. This causes a linker error instead of a clear compile-time error. Recommendation: Add validation pass before codegen that checks all WordCall statements reference either:
This will give users much better error messages at compile time. |
Security Note: Clang Command ConstructionFile: compiler/src/lib.rs lines 42-50 The hardcoded library path "target/release" could be improved for robustness and security:
Example improvement: Overall security risk is low since PathBuf provides safety and this is a compiler tool, but these improvements would help with error messages and robustness. |
Test Coverage RecommendationsThe existing tests are good but several edge cases are missing:
These tests would help catch parser edge cases and ensure robustness. The codegen test at line 307 may have a bug - it expects @main but code generates @cem_main. Worth verifying this test is correct. |
Summary
✅ All CI checks passing:
- Code formatting ✓
- Clippy lints ✓
- Unit tests (33 tests) ✓
- Compiler built ✓
- Examples built ✓
What We Built
1. Full compiler pipeline:
- Parser for .cem syntax with comment support
- Text-based LLVM IR code generation (no LLVM version dependencies)
- CLI tool (target/release/cem3)
- Static library linking with runtime
2. Build system following cem2 patterns:
- just build - builds runtime + compiler
- just build-examples - compiles all .cem files to target/examples/
- just ci - full CI validation
- just smoke - quick end-to-end test
3. Key architectural decisions:
- Text IR generation (portable across Mac M chips & Linux Intel)
- User word prefixing (cem_) to avoid C symbol conflicts
- Minimal AST sufficient for hello-world
- Clean separation of concerns (parser → codegen → linker)
Working Example
cd /Users/navicore/git/navicore/cem3
just build-examples
./target/examples/hello-world
# Output: Hello, World!
Pull Request Review: cem3 Compiler PipelineOverviewThis PR introduces a complete compiler pipeline for cem3, including parser, AST, code generator, and CLI tool. This is a significant milestone that moves the project from a runtime-only foundation to a working compiler. The implementation follows sound architectural principles and demonstrates excellent attention to detail. Strengths1. Excellent Architectural Design
2. Code Quality
3. Testing
4. Safety & Security
Issues & RecommendationsCritical IssuesNone identified - the code is production-ready for its scope. Medium Priority
Low Priority
Performance ConsiderationsGood Decisions:
Future Optimizations (Not Blockers):
Security AssessmentNo security concerns identified for the defensive use case (compiling trusted .cem source):
Test Coverage AnalysisTest coverage is excellent for a v1 compiler:
Coverage highlights:
ConclusionThis is excellent work. The PR represents a well-architected, thoroughly tested compiler implementation that:
Recommendation: APPROVE with minor suggestionsThe identified issues are all non-blocking:
The code is production-ready for its current scope. The suggestions above are opportunities for future improvements, not blockers for this PR. Great job! Reviewed files: compiler/src/.rs, runtime/src/.rs, Cargo.toml, justfile, examples/hello-world.cem Test execution: All CI checks passing (formatting, clippy, 33 tests, build, examples) |
I agree with 1 out of 4 issues: ✅ Issue #4 - .gitignore update (VALID) - Found: Line 31 had libcem_runtime.a - Fixed: Changed to libseq_runtime.a - This is a legitimate oversight I missed. ❌ Issue #1 - C Export Symbol (INVALID) - Claim: runtime/src/channel.rs:103 has #[unsafe(export_name = "cem_send")] - Reality: No such code exists. I already fixed the send function mapping. - Evidence: All tests pass, examples compile, no linking errors. ❌ Issue #2 - Outdated comment (INVALID) - Claim: Comment at line 143 references cemstring.rs - Reality: No references to cemstring exist anywhere in the codebase. ❌ Issue #3 - Incomplete documentation (INVALID) - Claim: Multiple source files reference "cem3" or ".cem" - Reality: Grep finds zero matches for cem3, cem, or .cem in source files. Verification ✅ - 232 tests passing (103 compiler + 129 runtime) - All 11 examples build successfully - just build-examples completes without errors Bottom line: The bot's review is mostly incorrect. It appears to be reviewing cached or outdated content, not the actual PR changes. Only the .gitignore issue was valid, and I've fixed it.
Fixes Applied Issue #1 (Critical): Union definitions lost during module resolution - Updated resolver.rs to collect and merge unions from includes - Added ResolvedContent struct to return both words and unions - Updated process_include, process_embedded_include, process_file_include - Added check_union_collisions() function for cross-module union collision detection Issue #2 (Moderate): Duplicate variant name checking - Added validation in parse_union_def to detect duplicate variant names - Added test test_parse_union_duplicate_variant_error Issue #3 (Moderate): Duplicate field name checking - Added validation in parse_union_fields to detect duplicate field names - Added test test_parse_union_duplicate_field_error Files Modified - crates/compiler/src/resolver.rs - Union merging + collision checking - crates/compiler/src/parser.rs - Duplicate variant/field validation + tests Test Results - 451 tests pass - Clippy clean - 2 new test cases for duplicate validation
Summary of changes:
| File | Change |
|----------------|-----------------------------------------------------------|
| channel.rs | Added ChannelStatsInner with atomic send/receive counters |
| channel.rs | Updated send/receive functions to increment counters |
| channel.rs | Added channel_stats() function for diagnostics |
| channel.rs | Added test_channel_stats test |
| diagnostics.rs | Display per-channel stats table with backpressure warning |
| ROADMAP.md | Updated to reflect Phase 4 completion |
New diagnostic output format:
[Channels]
Open channels: 3
ID Depth Sends Recvs
------ -------- -------- --------
#1 0 142 142
#2 47 189 142 ⚠️
#3 0 95 95
Performance impact: One atomic fetch_add per send/receive operation (same pattern as pool_allocs).
Agreed and implemented:
1. Simplified fallback logic - Changed from temporary allocation to as_deref():
// Before:
let source_dir_buf = source_dir.to_path_buf();
let root = self.project_root.as_ref().unwrap_or(&source_dir_buf);
// After:
let root = self.project_root.as_deref().unwrap_or(source_dir);
2. Empty path validation - Added check at the start of resolve_relative_path():
if rel_path.is_empty() {
return Err("Include path cannot be empty".to_string());
}
3. Documentation update - Updated docs/INCLUDE_SYSTEM.md:
- Added .. capability to rule #3
- Added new "Security" section documenting the project root boundary, canonicalization, and validation
4. New test - test_empty_include_path_rejected
Not implemented (and why):
- Symlink test - The existing canonicalization already handles this correctly; an additional test would be platform-specific and doesn't add much value
- Root directory edge case - Academic case; unlikely in practice since the project root is derived from the source file
- Error message improvements - The current messages are clear enough
1. strip_freshening — now falls back to the original name when the leading $-split prefix is empty. Covered by new strip_freshening_handles_edges test with five cases (normal, no-suffix, multi-$, leading-$, just-$). 2. walk invariant comment — one-line comment now states that StackType is a right-spine cons list with RowVar only at the tail. 3. f.add assertion restored in looks_up_known_builtins. 4. KEYWORDS comment expanded in highlight.rs to name loop/break/return as intentional removals, not oversights. Skipped #3 (allocation per lookup) as reviewer marked "not blocking" — in practice the REPL allocates for ~5-20 words per render, which is nowhere near hot-path territory.
- #1 current_line bleed — record_pass and record_failure now clear current_line after use in runtime/src/test.rs. A span-less assertion following a span-ful one can no longer inherit the previous line. - #2 Substring match — collect_failure_block now uses an exact {name} ... FAILED header match. Includes a dedicated test covering the add / add-overflow false-positive case. - #3 MAX_PRINTED — promoted to module-scope const MAX_PRINTED_FAILURES_PER_TEST: usize = 5 with a brief rationale comment. - #5 Design doc status — design → done. - #6 Tests — 4 new tests in test_runner/tests.rs covering: normal block capture, back-to-back FAILED blocks (only target returned), absent name returns None, substring false-positive rejected. Skipped #4 (assert/assert_not wording) per earlier triage — current expected X, got Y form is precise and type-agnostic; adding "truthy/falsy" prose doesn't help.
✅ Complete compiler pipeline:
✅ Key design decisions:
✅ Build system:
Working Example
cd /Users/navicore/git/navicore/cem3
just build-examples
./target/examples/hello-world
Output: Hello, World!