Skip to content
This repository was archived by the owner on Apr 29, 2026. It is now read-only.

Compiler extension#41

Merged
navicore merged 5 commits intomainfrom
compiler-extension
Nov 30, 2025
Merged

Compiler extension#41
navicore merged 5 commits intomainfrom
compiler-extension

Conversation

@navicore
Copy link
Copy Markdown
Owner

No description provided.

New files:
- compiler/src/config.rs - Configuration types for compiler extensibility

Key types:

// Define external builtins
let builtin = ExternalBuiltin::new("journal-append", "seq_actors_journal_append");

// Configure the compiler
let config = CompilerConfig::new()
    .with_builtin(builtin)
    .with_library_path("/path/to/actor/lib")
    .with_library("seq_actors");

// Compile with config
compile_file_with_config(source, output, false, &config)?;

What it enables:

1. External builtins - seq-actors can define journal-append, actor-send, etc. that Seq code can call
2. Custom linking - seq-actors can link its own runtime library alongside seq-runtime
3. Clean separation - patch-seq has no actor knowledge; seq-actors owns actor semantics entirely

API surface:
- ExternalBuiltin::new(seq_name, symbol) - Define an external function
- CompilerConfig::new().with_builtin(...) - Builder pattern
- compile_file_with_config() / compile_to_ir_with_config() - Config-aware compilation

Tests added:
- External builtin declaration in IR
- External builtin calls in generated code
- Multiple external builtins
- Library path configuration

The seq-actors project can now embed patch-seq and extend it without modifying the core.
@claude
Copy link
Copy Markdown

claude Bot commented Nov 30, 2025

Pull Request Review: Compiler Extension

Summary

This PR adds extensibility to the Seq compiler by introducing a CompilerConfig system that allows external projects to register custom builtins without modifying core compiler code. This is a well-designed extension mechanism that maintains backward compatibility.

✅ Strengths

Architecture & Design

  • Clean API Design: The builder pattern for CompilerConfig is intuitive and follows Rust conventions
  • Backward Compatibility: All existing APIs are preserved via default implementations
  • Separation of Concerns: New config.rs module cleanly encapsulates configuration logic
  • Type Safety: Strong typing throughout with clear ownership semantics

Code Quality

  • Good Documentation: Module-level docs with usage examples in config.rs:1-25 and lib.rs:3-16
  • Consistent Naming: ExternalBuiltin clearly indicates purpose, seq_name vs symbol distinction is clear
  • Error Handling: Proper propagation through existing error handling mechanisms

Test Coverage

Excellent test coverage with 4 new tests:

  • test_external_builtins_declared - Basic functionality
  • test_multiple_external_builtins - Multiple builtins
  • test_external_builtins_with_library_paths - Library path configuration
  • Tests in config.rs for builder pattern and helper methods

🔍 Issues & Recommendations

1. Missing TypeChecker Integration ⚠️ IMPORTANT

The PR adds external builtins to code generation and AST validation, but does not register them with the TypeChecker. This could cause type checking failures for valid code.

Problem: In lib.rs:93 and lib.rs:170, external builtins are passed to validate_word_calls_with_externals() but not to the TypeChecker.

Impact: External builtins will fail type checking unless they're already in the builtin list, defeating the purpose of the extension.

Recommendation: Add TypeChecker registration in lib.rs:

// After line 91/168 where TypeChecker is created
let external_effects: Vec<(&str, Option<&Effect>)> = config
    .external_builtins
    .iter()
    .map(|b| (b.seq_name.as_str(), b.effect.as_ref()))
    .collect();
type_checker.register_external_words(&external_effects);

The TypeChecker already has register_external_words() (see typechecker.rs:40-53) but it's not being called.

2. Library Path Flexibility

Currently library paths and libraries are stored as Vec<String> but there's no way to specify per-builtin linking requirements. This might be fine for initial use cases but could limit future extensibility.

Recommendation: Document this limitation or consider per-builtin library specifications in future iterations.

3. Effect Type System Gap

ExternalBuiltin::effect is Option<Effect> but there's no guidance on when to use None vs providing an actual effect.

Recommendation: Add documentation to config.rs:41-44 explaining:

  • Use None for maximally polymorphic builtins (TypeChecker will infer ( ..a -- ..b ))
  • Provide Some(effect) when the stack signature is known for better type checking

4. Symbol Naming Convention

No validation or documentation about symbol naming conventions (e.g., should they follow seq_actors_* pattern?).

Recommendation: Add a doc comment in ExternalBuiltin about C ABI compatibility and naming conventions to avoid linker conflicts.

🔒 Security Considerations

No security issues identified

  • No unsafe code introduced
  • No user input directly used in code generation
  • Symbol names are controlled by the API consumer (library developers, not end users)
  • LLVM IR generation properly escapes/validates identifiers

⚡ Performance Considerations

Minimal performance impact

  • HashMap lookups in codegen.rs:844 are O(1)
  • External builtins are built once at the start of compilation
  • No changes to hot paths

📋 Minor Suggestions

  1. Line 77: external_builtins: HashMap<String, String> - Consider using &str keys with lifetimes or Cow<str> to avoid unnecessary allocations, though current approach is fine for typical use cases.

  2. Line 189-192: The .iter().map().collect() pattern could use FromIterator more idiomatically, but current code is clear.

  3. Test at line 1158: Consider adding a test that verifies the complete flow including type checking and linking (integration test).

🎯 Overall Assessment

Recommendation: Approve with TypeChecker integration fix

This is a well-thought-out PR that adds valuable extensibility to the compiler. The code quality is high, test coverage is good, and the design is sound. The one critical issue is the missing TypeChecker integration, which should be addressed before merge.

The PR successfully:

  • ✅ Maintains backward compatibility
  • ✅ Provides clear extension points
  • ✅ Includes comprehensive tests
  • ✅ Has good documentation
  • ⚠️ Needs TypeChecker integration to be complete

Suggested next steps:

  1. Add TypeChecker registration for external builtins
  2. Consider adding an integration test that uses external builtins end-to-end
  3. Document symbol naming conventions

Great work on this feature! The design pattern you've established here will make it much easier for projects to extend Seq without forking the compiler.

Issue #1: Missing TypeChecker Integration ✅ Fixed

Added registration of external builtins with the TypeChecker in both:
- compile_file_with_config() (lib.rs:98-106)
- compile_to_ir_with_config() (lib.rs:185-193)

New tests added:
- test_external_builtin_full_pipeline - Verifies the complete pipeline works (parser → AST validation → TypeChecker → codegen)
- test_external_builtin_without_config_fails - Verifies unknown builtins are properly rejected

All 7 external builtin tests pass. The other suggestions from the review are documentation/style improvements that could be addressed separately if
desired.
@claude
Copy link
Copy Markdown

claude Bot commented Nov 30, 2025

Pull Request Review: Compiler Extension

Summary

This PR adds a compiler configuration system that enables external projects to extend the Seq compiler with custom builtins without modifying core compiler code. This is a well-designed extensibility feature that maintains clean separation of concerns.

Code Quality ✅

Excellent design patterns:

  • Builder pattern: CompilerConfig uses a fluent builder API (with_builtin(), with_library()) that is idiomatic and ergonomic
  • Separation of concerns: New config.rs module cleanly separates configuration from code generation logic
  • Backward compatibility: Default config maintains existing behavior via CompilerConfig::default()
  • Consistent naming: external_builtins terminology used consistently throughout

Code organization:

  • Well-structured with clear module boundaries
  • Comprehensive inline documentation with examples
  • Type-safe design using strongly-typed structs rather than stringly-typed APIs

Potential Issues and Recommendations

1. Missing validation in validate_word_calls_with_externals (⚠️)

The hardcoded builtin list in compiler/src/ast.rs:158-303 needs to be kept in sync with:

  • The pattern matching in compiler/src/codegen.rs:809-852
  • The signatures in compiler/src/builtins.rs

Recommendation: Consider extracting the builtin list to a single source of truth, possibly in builtins.rs. This would eliminate the maintenance burden and prevent drift.

2. Type checker external word registration design (🤔)

In lib.rs:96-103, external builtins are registered with the type checker but the conversion to Vec is verbose. Consider adding a helper method to CompilerConfig to simplify this.

3. Clang command construction (⚠️)

In lib.rs:130-149, library paths and libraries are added to clang without validation:

Concerns:

  • No validation that paths exist
  • No sanitization of library names
  • Could lead to confusing linker errors

Recommendation: Add basic validation for library paths and sanitize library names to prevent path injection.

Test Coverage ✅

Excellent test coverage:

  • Unit tests for ExternalBuiltin creation
  • Builder pattern tests
  • Integration tests through full compilation pipeline
  • Negative tests (e.g., test_external_builtin_without_config_fails)
  • Tests verify both IR generation and actual compilation

Test quality highlights:

  • Tests cover the happy path AND error cases
  • Good use of assertions with descriptive messages
  • Integration test validates the full pipeline

Performance Considerations ✅

  • Minimal overhead: HashMap lookup for external builtins is O(1)
  • Configuration is passed by reference to avoid cloning
  • No runtime performance impact for users not using external builtins
  • Compilation pipeline remains efficient

Security Concerns (⚠️)

1. Symbol injection vulnerability

The symbol field in ExternalBuiltin is used directly in generated LLVM IR without validation (compiler/src/codegen.rs:395).

Risk: Malicious symbol names could inject arbitrary LLVM IR directives.

Recommendation: Add validation in ExternalBuiltin::new() to ensure symbols only contain alphanumeric characters, underscores, or periods.

2. Library path injection

While less critical since this is a compiler (not a runtime), untrusted library paths could potentially be abused. The validation suggested in issue #3 above would help mitigate this.

Verdict

This is a well-architected PR that adds valuable extensibility without compromising the existing codebase. The implementation is clean, well-tested, and maintains backward compatibility.

Required before merge:

  1. ⚠️ Add validation for symbol names to prevent LLVM IR injection

Recommended improvements:

  1. Extract builtin list to single source of truth in builtins.rs
  2. Add library path validation
  3. Add helper method for external_effects() conversion

Nice to have:

  1. Make documentation examples testable or properly annotated
  2. Consider adding a validation method to CompilerConfig that runs all validations eagerly

Overall assessment: Approve with minor changes (address security concern #1)

Great work on this extensibility feature! 🎉

…views:

From Review #1:
- ✅ TypeChecker integration - Added registration of external builtins with TypeChecker in both compile functions

From Review #2:
- ✅ Symbol validation - Added validate_symbol() to prevent LLVM IR injection
  - Validates symbols contain only alphanumeric, underscores, periods
  - Panics with clear message on invalid symbols
  - 4 new tests for validation (valid symbols, rejects hyphen, rejects @, rejects empty)

Test count:
- 11 external builtin related tests now pass
- Full test suite passes
- hello-world sanity check passes

Ready to merge.
@claude
Copy link
Copy Markdown

claude Bot commented Nov 30, 2025

Pull Request Review: Compiler Extension

This PR adds a well-designed compiler configuration system that allows external projects to extend the Seq compiler with additional builtins. The implementation is thoughtful and follows good practices. Here's my detailed review:


✅ Strengths

1. Clean API Design

The CompilerConfig and ExternalBuiltin types provide an intuitive builder pattern that makes the extension mechanism easy to use:

CompilerConfig::new()
    .with_builtin(ExternalBuiltin::new("my-op", "runtime_my_op"))
    .with_library_path("/custom/lib")

2. Security - Symbol Validation

The validate_symbol function in config.rs:49-64 is excellent security practice. It prevents LLVM IR injection by restricting symbols to alphanumeric, underscores, and periods only. This is critical for a compiler extension mechanism.

3. Comprehensive Testing

The PR includes 7 new tests covering:

  • Single and multiple external builtins
  • Full compilation pipeline integration
  • Symbol validation (including edge cases like empty strings)
  • Negative tests for undefined builtins

4. Backward Compatibility

The changes maintain full backward compatibility by:

  • Keeping existing compile_file and compile_to_ir functions
  • Adding new *_with_config variants
  • Using CompilerConfig::default() for existing code paths

5. Consistent Integration

The external builtins are properly integrated across all compilation stages:

  • AST validation (validate_word_calls_with_externals)
  • Type checking (register_external_words)
  • Code generation (declarations + calls)
  • Linking (library paths and libraries)

🔍 Code Quality Observations

Documentation

Excellent: The new config.rs module has clear module-level documentation with examples. The public API is well-documented.

Suggestion: Consider adding a doc comment to lib.rs explaining when users should prefer compile_file_with_config vs compile_file.

Error Handling

The implementation properly propagates errors throughout the pipeline. The validate_symbol function provides clear error messages with the invalid character shown.

Type Safety

The integration with the type checker allows external builtins to optionally specify their stack effects. This maintains the language's static typing guarantees while providing flexibility for builtins with complex/unknown effects.


⚠️ Potential Issues

1. Thread Safety Consideration (Minor)

compiler/src/codegen.rs:77: The external_builtins HashMap is populated per-CodeGen instance, which is fine for current usage. However, if CodeGen instances are ever shared across threads, this should be noted in documentation.

Current code is safe - just a consideration for future maintainers.

2. Type Checker Placeholder Effect (Medium)

compiler/src/typechecker.rs:46-50: External builtins without explicit effects get a maximally polymorphic placeholder ( ..ext_in -- ..ext_out ).

Potential issue: This could hide type errors. For example:

: main ( -- Int )
  "string" external-op-expecting-int  ; Should this type-check?
  0
;

Recommendation: Consider one of:

  • Require all external builtins to specify their effect
  • Add a compiler flag for strict vs permissive mode
  • Document this behavior prominently in ExternalBuiltin docs

3. Symbol Name Collision (Low Risk)

There's no check for collisions between:

  • External builtin symbols and built-in symbols
  • External builtin seq_names and built-in seq_names
  • Multiple external builtins with the same seq_name

Example: What happens if someone does:

config.with_builtin(ExternalBuiltin::new("dup", "my_dup"))

Current behavior: The external builtin would shadow the built-in (based on codegen.rs:844 check order).

Recommendation: Add validation in CompilerConfig::with_builtin to detect and warn/error on:

  • seq_name collisions with built-ins
  • Duplicate seq_names within the config

4. Memory Implications (Documentation)

compiler/src/lib.rs:129-133: The clang command is built with custom library paths and libraries from config.

Minor concern: No validation that library paths exist or libraries are available until link time. This is probably acceptable (fail fast at link), but worth documenting.


🔐 Security Review

✅ Strong Points

  1. Symbol validation prevents IR injection - Critical security feature
  2. No arbitrary code execution - Only declares functions, doesn't execute config code
  3. Symbol validation panics in new() - Forces validation at config time, not compile time

Recommendations

  1. Consider validating seq_name as well (currently only symbol is validated)
  2. Document the security implications of providing external libraries
  3. Consider adding a size limit on external_builtins vector (DOS prevention)

🚀 Performance Considerations

Positive

  • HashMap lookups for external builtins are O(1)
  • Building the map once during codegen initialization is efficient
  • No performance impact when config is empty (default path)

No Concerns

The performance impact of this feature appears minimal.


📊 Test Coverage Assessment

Strong coverage for happy paths and basic error cases.

Missing test cases:

  1. ✅ External builtin with explicit stack effect (use ExternalBuiltin::with_effect)
  2. ✅ Type error when using external builtin incorrectly
  3. ❌ Collision between external builtin and built-in name
  4. ❌ Multiple configs merged/chained
  5. ❌ Empty config (though default() path is tested indirectly)

Recommendation: Add tests for collision detection once that's implemented.


📝 Documentation & Examples

What's Good

  • Clear module-level docs in config.rs
  • Good example in lib.rs documentation
  • Tests serve as additional examples

Suggestions

  1. Add a usage example to README.md showing the extension mechanism
  2. Document the linking requirements (users need compatible C ABI functions)
  3. Add a troubleshooting section for common linking errors

🎯 Recommendations Summary

Must Address (Before Merge)

  1. Type safety: Document or improve the behavior of external builtins without explicit effects

Should Address (Before or Soon After Merge)

  1. Collision detection: Validate seq_names don't conflict with built-ins or duplicates
  2. Documentation: Add README section on compiler extension usage
  3. Tests: Add test cases for external builtins with explicit stack effects

Nice to Have (Future Work)

  1. Validate seq_name characters (similar to symbol validation)
  2. Add size limits on config vectors
  3. Validate library paths exist at config time
  4. Add a CompilerConfig::merge(other) method for composability

✅ Final Assessment

Overall: Excellent PR

This is a well-designed, well-tested feature that adds significant extensibility to the compiler. The code quality is high, security considerations are addressed, and the API is clean.

Code Quality: ⭐⭐⭐⭐⭐
Test Coverage: ⭐⭐⭐⭐☆
Security: ⭐⭐⭐⭐☆
Documentation: ⭐⭐⭐⭐☆

Recommendation: Approve with minor suggestions

The type checker placeholder behavior (issue #2) is the main item I'd like clarified before merge. Everything else can be addressed in follow-up PRs if needed.

Great work! This is a valuable addition to the Seq compiler. 🎉

- ✅ TypeChecker integration
- ✅ Symbol validation (security)
- ✅ Type safety documentation
- ✅ Full pipeline tests

Ready to merge when you are.
@claude
Copy link
Copy Markdown

claude Bot commented Nov 30, 2025

PR Review: Compiler Extension

Summary

This PR adds extensibility to the Seq compiler, allowing external projects to register custom builtins without modifying the core compiler. This is a well-designed feature that enables runtime extensions (like actor systems) to integrate cleanly.


✅ Strengths

1. Excellent Architecture

  • Clean separation of concerns with the new config.rs module
  • Builder pattern for CompilerConfig provides ergonomic API
  • Integration points are well-chosen (AST validation, type checking, codegen)

2. Security First

  • Symbol validation (lines 75-89 in config.rs) prevents LLVM IR injection attacks
  • Validation happens at construction time, making misuse impossible
  • Good defensive programming with comprehensive character checks

3. Documentation

  • Excellent inline documentation explaining type safety trade-offs
  • Clear examples showing usage patterns
  • Well-documented API surface

4. Test Coverage

  • Comprehensive unit tests for the new config module
  • Integration tests covering the full compilation pipeline
  • Good edge case coverage (empty symbols, invalid characters, etc.)

🔍 Issues & Recommendations

Critical Issues

None identified - This is production-ready code.

Medium Priority

1. Error Handling: Panics vs Results (config.rs:97-105, 117-119)

The ExternalBuiltin::new() and with_effect() methods panic on invalid input:

pub fn new(seq_name: impl Into<String>, symbol: impl Into<String>) -> Self {
    let symbol = symbol.into();
    Self::validate_symbol(&symbol).expect("Invalid symbol name");  // Panics!
    // ...
}

Recommendation: Consider providing fallible constructors for library use:

pub fn try_new(seq_name: impl Into<String>, symbol: impl Into<String>) 
    -> Result<Self, String> {
    let symbol = symbol.into();
    Self::validate_symbol(&symbol)?;
    Ok(ExternalBuiltin { /* ... */ })
}

// Keep the panicking version for ergonomics
pub fn new(seq_name: impl Into<String>, symbol: impl Into<String>) -> Self {
    Self::try_new(seq_name, symbol).expect("Invalid symbol name")
}

This allows library consumers to handle validation errors gracefully when dealing with untrusted input.

2. Documentation: Runtime Behavior (config.rs:59-67)

The documentation warns about using None for effects but doesn't explain the runtime implications. When an external builtin is called without type checking, what happens if the stack signature is wrong?

Recommendation: Add documentation about runtime crash behavior:

/// **Warning**: Using `None` can hide type errors until runtime.
/// If the actual runtime function expects a different stack layout,
/// the program will crash or produce undefined behavior.
/// Prefer providing an explicit effect when the signature is known.

Low Priority

3. Performance: Repeated External Builtin Checks (codegen.rs:844-848)

In WordCall handling, there's a HashMap lookup for every word call:

if let Some(symbol) = self.external_builtins.get(name) {
    symbol.clone()
} else {
    format!("seq_{}", mangle_name(name))
}

Analysis: This is called during codegen, which happens once per compilation. The performance impact is negligible. No action needed, but worth noting for future optimization if compilation speed becomes a bottleneck.

4. Symbol Validation: Unicode Considerations (config.rs:79-80)

The validation uses char.is_alphanumeric(), which accepts Unicode alphanumeric characters:

if !c.is_alphanumeric() && c != '_' && c != '.' {

Question: Does LLVM support Unicode identifiers? If not, consider restricting to ASCII:

if !c.is_ascii_alphanumeric() && c != '_' && c != '.' {

This is a minor point - current behavior is likely fine, but worth validating against LLVM specs.

5. API Consistency (config.rs:173-179)

The external_names() method returns Vec<&str> while external_builtins is public. This creates two ways to access the same data:

config.external_builtins.iter().map(|b| &b.seq_name)  // Direct access
config.external_names()  // Helper method

Recommendation: This is fine as-is for convenience, but consider making external_builtins private and providing only accessor methods if you want to maintain encapsulation.


🚀 Performance Considerations

  • Minimal overhead: External builtin lookup happens once during codegen
  • No runtime cost: External builtins compile to direct function calls
  • HashMap usage: Appropriate for the scale (typically < 100 builtins)

🔒 Security Analysis

✅ Strong Security Properties

  1. LLVM IR Injection Prevention: Symbol validation prevents malicious code injection
  2. No SQL Injection equivalent: String interpolation is safe (symbols are validated)
  3. Type Safety: Optional but encouraged through good API design

Potential Concerns

None - the validation is thorough and happens at the right layer.


🧪 Test Coverage Assessment

Well Covered

  • ✅ Symbol validation (valid and invalid characters)
  • ✅ Builder pattern functionality
  • ✅ Full compilation pipeline integration
  • ✅ Multiple external builtins
  • ✅ Library path configuration

Could Add (Optional)

  • Type checking with effect: Some(...) vs effect: None
  • Error messages for undefined external builtins with typos
  • Interaction between external builtins and user-defined words with same name

Current coverage is excellent for an initial implementation.


📝 Code Quality

Rust Best Practices ✅

  • Idiomatic builder pattern
  • Good use of impl Into<String> for API flexibility
  • Appropriate visibility (private validation method)
  • Good use of #[cfg(test)] for test modules

Documentation ✅

  • Module-level documentation with examples
  • Type-level documentation explaining semantics
  • Method-level documentation for public API
  • Warning comments for security-sensitive code

Final Verdict

LGTM with minor suggestions

This is high-quality code that follows Rust best practices and adds valuable functionality to the compiler. The security considerations are well-handled, and the test coverage is comprehensive.

The only actionable suggestion is providing fallible constructors (try_new()) for better error handling in library contexts. Everything else is production-ready.

Recommended Action: Merge after considering the try_new() suggestion. All other points are optional enhancements for future iterations.

Great work! 🎉

@navicore navicore merged commit 6c898f6 into main Nov 30, 2025
2 checks passed
@navicore navicore deleted the compiler-extension branch November 30, 2025 20:10
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant