1. Resolves includes to find available words from std:json and other … by navicore · Pull Request #31 · navicore/patch-seq

navicore · 2025-11-29T01:56:38Z

…modules

2. Registers those words with both the AST validator and typechecker
3. Shows errors for genuinely undefined words like xson-serialize
4. Clears errors when you fix them back to valid words like json-serialize

The include-aware completion and diagnostics are now fully functional.

claude · 2025-11-29T01:58:54Z

PR Review: Include-Aware LSP Completion and Diagnostics

Summary

This PR implements a significant enhancement to the LSP server by adding include-aware completion and diagnostics. The implementation resolves includes to find available words from std:json and other modules, registers those words with both the AST validator and typechecker, and provides proper error handling.

✅ Strengths

1. Well-Architected Design

Clean separation of concerns with the new includes.rs module
Proper caching strategy with DocumentState to avoid re-parsing on every keystroke
Context-aware completion filtering (differentiates between code, strings, comments, include statements, etc.)
Recursive include resolution with cycle detection and depth limits

2. Good Error Handling

Graceful degradation when stdlib is not found (lsp/src/main.rs:41)
Proper handling of parse errors in included files (lsp/src/includes.rs:165-168)
Depth limit prevents runaway recursion (lsp/src/includes.rs:125)
Cycle detection using canonicalized paths (lsp/src/includes.rs:139-149)

3. Comprehensive Testing

Unit tests for context detection (lsp/src/completion.rs:383-467)
Tests for URI parsing and include resolution (lsp/src/includes.rs:262-336)
Tests verify actual stdlib resolution, which is excellent

4. Compiler Integration

Clean API additions to AST validator (validate_word_calls_with_externals)
Typechecker support for external words with polymorphic placeholders
Maintains backward compatibility with existing validate_word_calls()

🐛 Potential Issues

1. Path Canonicalization on Non-Existent Files

Location: lsp/src/includes.rs:139-145

let canonical = match path.canonicalize() {
    Ok(p) => p,
    Err(_) => {
        debug!("Could not canonicalize: {:?}", path);
        return;
    }
};

Issue: canonicalize() fails if the file doesn't exist. This could silently skip valid include paths if there's a race condition or permission issue.

Suggestion: Consider using the non-canonical path for cycle detection if canonicalization fails, or log a more specific warning about why it failed.
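A minimal sketch of that fallback, assuming cycle detection keys on a `HashSet<PathBuf>`. `cycle_key` and `mark_visited` are hypothetical names, and `eprintln!` stands in for the `debug!`/`warn!` logging the LSP already uses:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Key used for cycle detection: the canonical path when it can be resolved,
/// otherwise the path as given (hypothetical helper, not the PR's code).
fn cycle_key(path: &Path) -> PathBuf {
    match path.canonicalize() {
        Ok(p) => p,
        Err(err) => {
            // Report *why* canonicalization failed instead of silently skipping.
            eprintln!("could not canonicalize {:?}: {}", path, err);
            path.to_path_buf()
        }
    }
}

/// Records `path` as visited; returns false if it was already seen (a cycle).
fn mark_visited(visited: &mut HashSet<PathBuf>, path: &Path) -> bool {
    visited.insert(cycle_key(path))
}
```

Because the depth limit still bounds recursion, a cycle missed through a non-canonical path costs extra work rather than causing an infinite loop.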

2. Percent Decoding Implementation

Location: lsp/src/includes.rs:239-260

if hex.len() == 2
    && let Ok(byte) = u8::from_str_radix(&hex, 16)
{
    // Each decoded byte becomes its own char: wrong for multi-byte UTF-8.
    result.push(byte as char);
}

Issue: This assumes single-byte characters. URIs with multibyte UTF-8 sequences (e.g., %C3%A9 for é) will be decoded incorrectly as two separate characters.

Suggestion: Consider using the percent-encoding crate for robust URL decoding, or accumulate bytes and convert to UTF-8 properly:

let mut bytes = Vec::new();
// collect all %XX sequences
String::from_utf8_lossy(&bytes)
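A self-contained version of that byte-collecting approach might look like the following. The name `percent_decode` matches the function mentioned later in this thread, but the body is a sketch rather than the PR's code, and keeping malformed escapes (such as `%zz` or a trailing `%`) literally is an assumption:

```rust
/// Value of a single ASCII hex digit, or None if `b` is not one.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decodes %XX escapes into raw bytes first and converts to UTF-8 once at the
/// end, so a multi-byte sequence like %C3%A9 becomes a single 'é'.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        // Literal byte, or a malformed escape kept as-is.
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```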

3. Maximally Polymorphic Placeholder May Hide Type Errors

Location: lsp/src/diagnostics.rs:56-60, compiler/src/typechecker.rs:45-51

let placeholder = Effect::new(
    StackType::RowVar("ext_in".to_string()),
    StackType::RowVar("ext_out".to_string()),
);

Issue: When an included word's effect is None, a maximally polymorphic placeholder (..a -- ..b) is used. This could allow type-incorrect code to pass the typechecker.

Impact: Medium - may cause runtime errors for words without effect signatures
Suggestion: Consider:

Emitting a warning when a word has no effect signature
Or requiring all exported words to have effects
Or tracking which words used placeholders and showing them in diagnostics

4. Missing Error Context in Diagnostics

Location: lsp/src/diagnostics.rs:42-48

When checking with external words, the error message from validate_word_calls_with_externals doesn't indicate that included words are available. Users might not understand why json-serialize is valid in one file but not another.

Suggestion: Enhance error messages to show available included words or at least mention include resolution.

⚡ Performance Considerations

1. Re-parsing on Every Change

Location: lsp/src/main.rs:184-185

self.update_document(uri.as_str(), text.clone(), file_path);

Every keystroke triggers:

Full document parse
Include resolution (file I/O + parsing)
Diagnostics run

Impact: Could be slow for large projects with many includes

Suggestion:

Debounce the did_change handler (common in LSP servers)
Only re-resolve includes if the include statements changed
Consider incremental parsing for local words
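The second suggestion can be sketched as a cheap pre-check in the did_change path. This assumes each include statement sits on its own line starting with `include ` (the exact Seq grammar is not shown in this review); `include_lines` and `includes_changed` are hypothetical helpers:

```rust
/// Collects the include statements of a document, one per line.
fn include_lines(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| line.starts_with("include "))
        .collect()
}

/// True when the include statements differ between two versions of a
/// document, i.e. when include resolution actually needs to re-run.
fn includes_changed(old_text: &str, new_text: &str) -> bool {
    include_lines(old_text) != include_lines(new_text)
}
```

When `includes_changed` returns false, the cached included words can be reused and only the local parse and diagnostics need to run again.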

2. Cloning Included/Local Words on Every Completion

Location: lsp/src/main.rs:246-247

state.included_words.clone(),
state.local_words.clone(),

Impact: Low - typically small vectors, but could add up

Suggestion: Use Arc<Vec<>> to make cloning O(1) instead of O(n)
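A sketch of the Arc approach, using a simplified, hypothetical `DocumentState` that keeps only the two word lists as `Arc<Vec<String>>` (the real element types are not shown here). Cloning an `Arc` only increments a reference count, and taking the handles in a short lock scope also keeps lock hold time minimal:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, reduced per-document state holding shared word lists.
struct DocumentState {
    included_words: Arc<Vec<String>>,
    local_words: Arc<Vec<String>>,
}

/// Takes cheap handles to both word lists while the lock is held; the
/// vectors themselves are never copied.
fn snapshot(state: &Mutex<DocumentState>) -> (Arc<Vec<String>>, Arc<Vec<String>>) {
    let guard = state.lock().unwrap();
    (Arc::clone(&guard.included_words), Arc::clone(&guard.local_words))
    // `guard` is dropped when the function returns, releasing the lock.
}
```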

3. Linear Search for Stdlib Path

Location: lsp/src/includes.rs:28-77

The find_stdlib_path function checks multiple locations sequentially, including canonicalize() calls which can be expensive.

Impact: Very low - only called once at startup

Suggestion: The current implementation is fine; a possible optimization would be caching the result in a OnceLock if multiple LSP instances share code.
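If caching ever becomes worthwhile, `std::sync::OnceLock` (stable since Rust 1.70) gives a run-once cache. In this sketch `search_stdlib_path` and the `SEQ_STDLIB` environment variable are hypothetical stand-ins for the real multi-location search in `find_stdlib_path`:

```rust
use std::path::PathBuf;
use std::sync::OnceLock;

static STDLIB_PATH: OnceLock<Option<PathBuf>> = OnceLock::new();

/// Hypothetical stand-in for the real search: only checks an env variable.
fn search_stdlib_path() -> Option<PathBuf> {
    std::env::var_os("SEQ_STDLIB").map(PathBuf::from)
}

/// Runs the search at most once per process; later calls reuse the result,
/// including a cached "not found".
fn stdlib_path() -> Option<&'static PathBuf> {
    STDLIB_PATH.get_or_init(search_stdlib_path).as_ref()
}
```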

🔒 Security Considerations

1. Path Traversal Risk (Low)

Location: lsp/src/includes.rs:195-220

For Include::Relative, the code does:

let path = dir.join(format!("{}.seq", name));

Issue: If name contains .., this could potentially access files outside the intended directory.

Impact: Low - LSP context is already trusted (user's own code)

Mitigation: Already somewhat protected by checking path.exists(), but consider validating that the canonicalized path stays within expected boundaries.
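One way to express that boundary check, as a sketch: canonicalize both the including file's directory and the resolved include, then compare path components with `Path::starts_with`. `stays_within` is a hypothetical helper; it only works for paths that exist, which fits alongside the existing `path.exists()` check:

```rust
use std::io;
use std::path::Path;

/// Ok(true) if `candidate` resolves to a location inside `base`.
/// Both paths must exist: `canonicalize` resolves `..` and symlinks
/// against the real file system.
fn stays_within(base: &Path, candidate: &Path) -> io::Result<bool> {
    let base = base.canonicalize()?;
    let candidate = candidate.canonicalize()?;
    // Component-wise prefix check, so "/a/bc" is not treated as inside "/a/b".
    Ok(candidate.starts_with(&base))
}
```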

2. No Resource Limits on Include Resolution

Location: lsp/src/includes.rs:100-113

While there's a depth limit (10), there's no limit on total number of files or total size of included content.

Impact: Low - same trust model as above

Suggestion: Consider adding limits on total included files (e.g., max 100 files) to prevent pathological cases.
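The depth limit, cycle detection, and a total-file cap can share one traversal. The sketch below walks an in-memory include graph instead of the file system so it stays self-contained; `resolve`, `visit`, and `MAX_FILES = 100` are hypothetical, and only the depth limit of 10 comes from the review:

```rust
use std::collections::{HashMap, HashSet};

const MAX_DEPTH: usize = 10; // depth limit mentioned in the review
const MAX_FILES: usize = 100; // hypothetical cap on total included files

/// Walks an include graph (file -> files it includes) depth-first and
/// returns the files visited, in visit order.
fn resolve(graph: &HashMap<&str, Vec<&str>>, root: &str) -> Vec<String> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    visit(graph, root, 0, &mut visited, &mut order);
    order
}

fn visit(
    graph: &HashMap<&str, Vec<&str>>,
    name: &str,
    depth: usize,
    visited: &mut HashSet<String>,
    order: &mut Vec<String>,
) {
    if depth > MAX_DEPTH || order.len() >= MAX_FILES {
        return; // a limit was reached: stop descending
    }
    if !visited.insert(name.to_string()) {
        return; // already seen: a cycle or a shared include
    }
    order.push(name.to_string());
    if let Some(children) = graph.get(name) {
        for child in children {
            visit(graph, child, depth + 1, visited, order);
        }
    }
}
```

The same graph form also makes the missing circular-include and nested-include tests straightforward to write.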

📊 Test Coverage

Good:

Context detection tests (strings, comments, word defs, etc.)
URI parsing with spaces
Include resolution with actual stdlib
Document parsing with effects

Missing:

Integration test for completion with included words
Test for circular includes (verify cycle detection works)
Test for nested includes (depth > 1)
Test for the register_external_words typechecker API
Test showing that diagnostics clear when undefined words are fixed

Recommendation: Add at least one integration test showing:

// file1.seq: include "file2"
// file2.seq: : helper ( -- ) ;
// Verify completion shows "helper" in file1

📝 Code Quality

Positive:

✅ Consistent error handling patterns
✅ Good documentation comments
✅ Appropriate use of logging (debug!, info!, warn!)
✅ Clean separation of concerns
✅ Backward compatibility maintained

Minor Suggestions:

Unused pattern prefix in goto_definition
- lsp/src/main.rs:211,223 - _uri, _position could just be _ since they're not used
Consider adding tracing spans
- Would help debug include resolution: #[tracing::instrument]
PLAN.md in production code
- lsp/PLAN.md is 150 lines of planning notes. Consider moving to docs/ or removing before merge.

🎯 Overall Assessment

Quality: ⭐⭐⭐⭐ (4/5)

This is a solid, well-thought-out implementation. The architecture is clean, error handling is generally good, and the test coverage is reasonable. The main concerns are:

Percent decoding bug (should fix)
Polymorphic placeholder hiding errors (consider addressing)
Performance of re-parsing on every keystroke (could optimize later)

🚀 Recommendation

✅ APPROVE with minor fixes

Suggested before merge:

Fix the percent decoding for UTF-8 URIs (use a proper library or fix the implementation)
Consider moving/removing PLAN.md
Add a TODO comment about the polymorphic placeholder issue for future work

Nice to have:

Add integration test for circular includes
Consider debouncing did_change

Great work on this feature! The include-aware completion will significantly improve the developer experience.

Review generated with assistance from Claude Code

1. Fixed percent decoding for UTF-8 - Now properly handles multi-byte UTF-8 sequences like %C3%A9 → é by collecting bytes and decoding with from_utf8_lossy
2. Removed PLAN.md - Planning notes shouldn't be in production code
3. Added TODO comment - Documented the polymorphic placeholder limitation in diagnostics.rs:54-58 for future consideration

The other suggestions (debouncing, Arc<Vec<>>, tracing spans) are nice-to-have optimizations that can be done later if performance becomes an issue.

claude · 2025-11-29T02:15:03Z

PR Review #31: Include-Aware LSP Completion and Diagnostics

Overview

This PR successfully implements include-aware completion and diagnostics for the LSP server, addressing the previous feedback. The implementation properly resolves includes, registers external words with the validator and typechecker, and provides context-aware completions.

✅ Excellent Work

1. Previous Feedback Addressed

UTF-8 percent decoding fixed (includes.rs:242-268) - Now properly handles multi-byte UTF-8 sequences
TODO comment added (diagnostics.rs:54-58) - Documents the polymorphic placeholder limitation
Commit confirms other suggestions deferred as nice-to-have optimizations

2. Clean Architecture

Separation of concerns: New includes.rs module is well-organized
Context-aware completions: Excellent detection logic (completion.rs:66-112)
Caching strategy: DocumentState avoids unnecessary re-parsing
Backward compatibility: validate_word_calls() still works

3. Robust Error Handling

Graceful degradation when stdlib not found
Cycle detection with canonical paths
Depth limit prevents runaway recursion
Parse errors in included files don't crash the LSP

4. Comprehensive Testing

Unit tests for context detection
URI parsing tests including UTF-8 cases
Tests verify actual stdlib resolution
Diagnostics tested for undefined words

claude · 2025-11-29T02:15:16Z

🔍 Issues Found

1. Critical: Row Variable Name Collision Risk

Location: compiler/src/typechecker.rs:46-49

All external words without effects get the same row variable names "ext_in" and "ext_out". If multiple such words are used in the same program, the typechecker will unify their effects incorrectly.

Fix: Use fresh variables for each external word:

let placeholder = Effect::new(
    StackType::RowVar(format!("ext_in_{}", name)),
    StackType::RowVar(format!("ext_out_{}", name)),
);

Impact: High - can cause type errors to go undetected
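A minimal sketch of the per-word naming, using simplified stand-ins for `Effect` and `StackType` (the compiler's real definitions have more variants and fields). It only matters if the typechecker does not already rename row variables per external word, which is what this review assumes:

```rust
/// Hypothetical, reduced versions of the compiler's types.
#[derive(Debug, Clone, PartialEq)]
enum StackType {
    RowVar(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Effect {
    inputs: StackType,
    outputs: StackType,
}

impl Effect {
    fn new(inputs: StackType, outputs: StackType) -> Self {
        Effect { inputs, outputs }
    }
}

/// Builds a `( ..a -- ..b )` placeholder whose row variables are named after
/// the word, so two unannotated external words never share them.
fn placeholder_effect(name: &str) -> Effect {
    Effect::new(
        StackType::RowVar(format!("ext_in_{}", name)),
        StackType::RowVar(format!("ext_out_{}", name)),
    )
}
```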

2. Bug: Incorrect String Detection in Comments

Location: completion.rs:77-91

The hash detection logic can incorrectly identify comments. After a string like "hello # world", a real comment may be treated as code because rfind finds the # inside the string.

Test case that would fail:

assert_eq!(detect_context("\"#\" # comment"), ContextType::InComment);

Impact: Medium - autocomplete appears when it shouldn't
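The usual way to avoid this is a single left-to-right scan that tracks whether the cursor is inside a string, rather than searching backwards for `#`. A sketch with a reduced, hypothetical `ContextType` (the real enum has more cases, such as include statements and stack effects), assuming `"`-delimited strings without escape sequences:

```rust
/// Hypothetical subset of the completion contexts.
#[derive(Debug, PartialEq)]
enum ContextType {
    Code,
    InString,
    InComment,
}

/// Scans the text before the cursor from the left, so `#` only starts a
/// comment when it appears outside a string literal.
fn detect_context(line_before_cursor: &str) -> ContextType {
    let mut in_string = false;
    for ch in line_before_cursor.chars() {
        match ch {
            '"' => in_string = !in_string,
            '#' if !in_string => return ContextType::InComment,
            _ => {}
        }
    }
    if in_string { ContextType::InString } else { ContextType::Code }
}
```

The author later notes that the existing code already checks whether `#` is inside a string; the test case above is still useful as a regression test.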

3. Bug: Paren Counting Includes Strings

Location: completion.rs:104-110

Counts ALL parens including those in strings, causing incorrect stack effect detection.

Impact: Medium - wrong completions in edge cases
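Paren counting can reuse the same string-tracking scan. This is a sketch of what a function like `count_unmatched_parens` could do; its signature and `i32` return type are assumptions, and strings are again assumed to have no escape sequences:

```rust
/// `(` minus `)` outside string literals. A positive result means the cursor
/// is inside an open stack effect such as `( a b --`.
fn count_unmatched_parens(text: &str) -> i32 {
    let mut depth = 0;
    let mut in_string = false;
    for ch in text.chars() {
        match ch {
            '"' => in_string = !in_string,
            '(' if !in_string => depth += 1,
            ')' if !in_string => depth -= 1,
            _ => {}
        }
    }
    depth
}
```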

claude · 2025-11-29T02:15:28Z

4. Performance: Cloning on Every Completion

Location: main.rs:234-248

Every completion request clones two potentially large vectors (included_words and local_words). Consider using Arc<Vec<>> as suggested in the previous review.

Impact: Low-Medium - completion requests are frequent

5. Inconsistency: cfg(test) on Public Function

Location: diagnostics.rs:10-12

check_document is pub but only available in tests - confusing API. Either make it pub(crate) or remove cfg(test).

⚡ Performance & Security

Performance:

Re-parsing on every change is reasonable for now
Path canonicalization involves syscalls - consider caching for stdlib files
Storing full document content is fine for current use

Security:

✅ Path traversal well handled
✅ Resource exhaustion protected (depth limit, cycle detection)
⚠️ Consider file size limits for included files (low priority - LSP runs locally)

📋 Test Coverage

Strengths: Context detection, URI parsing, include resolution well tested

Gaps:

Missing tests for completions with included words
Missing edge cases in context detection (bugs mentioned above)
Missing error cases in percent_decode

claude · 2025-11-29T02:15:40Z

🎯 Recommendations

Must Fix (Before Merge):

⚠️ Row variable name collision (Critical) - Use unique names per word in typechecker.rs
⚠️ String detection in comments (Medium) - Fix the # detection logic in completion.rs

Should Fix (Soon):

Paren counting in strings - Fix context detection to ignore parens in strings
Performance of cloning - Use Arc or release lock earlier

Nice to Have:

Improve test coverage to 30%+
Add file size limits for security
Cache canonical paths

✨ Overall Assessment

Quality: 8/10 - Well-architected, addresses previous feedback, good test coverage

Readiness: Almost ready - needs critical bug fixes before merge

Impact: High - Significantly improves LSP experience with include-aware features

This is excellent work that substantially improves the developer experience. The architecture is sound and the implementation is clean. Once the critical type variable collision is fixed and the context detection bugs are addressed, this will be ready to merge.

Great job on addressing the previous feedback, especially the UTF-8 decoding fix! 🎉

claude · 2025-11-29T02:38:09Z

Pull Request Review - PR 31

Summary

This PR implements include-aware completions and diagnostics for the LSP server. It adds the ability to resolve include statements, extract word definitions from included modules, and use them for both autocompletion and error checking. This is a significant enhancement to the developer experience.

Code Quality and Best Practices - Excellent

Strengths

Well-structured architecture with clean separation between includes resolution, completion logic, and diagnostics
Comprehensive context detection in completion.rs handles multiple contexts (strings, comments, stack effects, includes, etc.)
Good documentation with clear doc comments
Proper error handling with graceful degradation when stdlib cannot be found
Cycle detection prevents infinite loops with visited set and depth limit
Excellent test coverage for URI parsing, context detection, and include resolution

Minor Concerns

Hardcoded depth limit of 10 at lsp/src/includes.rs:125 - consider making configurable
Memory duplication in completion handler clones included_words and local_words - could use Arc for large projects
Sort text uses string prefixes, which works but is fragile - consider using an enum or constants
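A sketch of the enum approach. The variant names and their order (local words, then included words, then builtins) are hypothetical; the point is that a single `match` owns the mapping from priority to `sortText` prefix, since LSP clients order items by comparing `sortText` strings:

```rust
/// Completion sources in display order (hypothetical ordering).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CompletionPriority {
    Local,
    Included,
    Builtin,
}

impl CompletionPriority {
    /// Prefixes must sort lexicographically in the same order as the enum.
    fn sort_prefix(self) -> &'static str {
        match self {
            CompletionPriority::Local => "0",
            CompletionPriority::Included => "1",
            CompletionPriority::Builtin => "2",
        }
    }

    /// Value for the completion item's sortText field.
    fn sort_text(self, label: &str) -> String {
        format!("{}_{}", self.sort_prefix(), label)
    }
}
```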

Potential Bugs and Issues

Critical Issues

Type safety hole with placeholder effects (lsp/src/diagnostics.rs:54-58): Words without effect signatures get a maximally polymorphic placeholder that could allow type-incorrect code to pass. Consider emitting warnings or documenting this limitation.
Incorrect sed usage in release workflow (.github/workflows/release.yml:35): The sed command will replace the FIRST version line it finds, not necessarily the workspace version. Should target only the [workspace.package] section to avoid updating wrong version fields.
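The logic the corrected command needs (only touch `version` while inside `[workspace.package]`) is shown here in Rust purely as an illustration; the real fix belongs in the workflow's shell step, and `bump_workspace_version` is a hypothetical line-based helper that does not handle every TOML layout:

```rust
/// Rewrites the `version = "..."` line inside `[workspace.package]` only,
/// leaving version lines in every other table untouched.
fn bump_workspace_version(cargo_toml: &str, new_version: &str) -> String {
    let mut in_workspace_package = false;
    let mut out: Vec<String> = Vec::new();
    for line in cargo_toml.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Entering a new table: track whether it is [workspace.package].
            in_workspace_package = trimmed == "[workspace.package]";
        }
        if in_workspace_package && trimmed.starts_with("version") && trimmed.contains('=') {
            out.push(format!("version = \"{}\"", new_version));
        } else {
            out.push(line.to_string());
        }
    }
    let mut result = out.join("\n");
    if cargo_toml.ends_with('\n') {
        result.push('\n'); // lines() drops the trailing newline; restore it
    }
    result
}
```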

Medium Priority Issues

Canonicalization failure handling returns early - consider using non-canonical path as fallback
Missing escape sequence handling will need update when language adds escape sequences
Windows path handling code only compiles on Windows - untested on Unix
Paren counting does not handle parens inside strings

Performance Considerations

Good Practices

Document state caching per URI prevents repeated parsing
Lazy resolution only when documents open/change
Early returns in context detection

Optimizations to Consider

File I/O in LSP thread could block server - consider async I/O or background thread
Stdlib files are re-parsed for every include resolution - consider workspace-level cache

Security Concerns - Low Risk

Path traversal potential exists but low risk in trusted LSP environment
No limit on breadth of includes - could cause excessive I/O with malicious files
Redundant token passing in release workflow

Test Coverage - Very Good

Excellent Coverage

URI parsing with spaces and UTF-8
Context detection tests
Parse error, type error, and undefined word tests
Real stdlib resolution test

Missing Tests

Include cycle detection verification
Include depth limit verification
Completion priority ordering
Error recovery when included files have parse errors

Recommendations

Must Fix Before Merge

Fix sed command in release workflow line 35
Address or document the type safety hole with placeholder effects

Should Fix Soon

Add canonicalization fallback
Add tests for circular includes and depth limits
Consider async file I/O for include resolution

Nice to Have

Make depth limit configurable
Use Arc for shared data
Add completion priority tests

Overall Assessment - 4.5/5 Stars

This is a high-quality PR that significantly improves the LSP functionality. The code is well-structured, well-tested, and follows good practices. The include resolution logic is sound with proper cycle detection.

Main concerns:

Critical sed bug in release workflow that could break releases
Documented type safety issue that should be addressed
Some missing tests for edge cases

With the sed fix and type safety consideration addressed, this PR is ready to merge. The remaining issues are minor optimizations.

Great work on this enhancement! The context-aware completions and include-aware diagnostics will greatly improve the developer experience.

Review generated with Claude Code

Fixed:
1. sed command in release workflow - Now targets specifically the line after [workspace.package] instead of replacing the first version line found
2. Paren counting in strings - New count_unmatched_parens() function ignores parens inside string literals
3. Added tests - For parens inside strings

The comment detection was already correct (lines 82-91 check if # is inside a string before treating it as a comment), so no change was needed there.

…ckage] section (0.3.0 → 0.4.0) and leaves everything else untouched. The type safety issue (polymorphic placeholder) was already documented with a TODO comment in diagnostics.rs:54-58 from the earlier fix. That's the "documented" part the review refers to.

claude · 2025-11-29T03:12:52Z

PR Review: Include-aware LSP Completions and Diagnostics

Overview

This PR adds significant LSP functionality by implementing include-aware completions and diagnostics.

✅ Strengths

Well-Structured Architecture - Clean separation of concerns with includes.rs, completion.rs, diagnostics.rs
Robust Context Detection - Smart handling of strings, comments, parentheses (completion.rs:65-112)
UTF-8 Handling - Excellent fix for percent decoding with multi-byte sequences
Type Safety Integration - Good integration with AST validator and typechecker

⚠️ Issues Found

1. Path Traversal Vulnerability 🔴 HIGH

Location: lsp/src/includes.rs:211-215

The Include::Relative path is not sanitized. An attacker could use include "../../../etc/passwd" to read arbitrary files.

Fix: Add sanitization to reject paths with "..", "/", or "\".
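A sketch of that check as a hypothetical `is_safe_include_name` helper, applied to `name` before it is joined onto the directory. It is deliberately conservative and also rejects subdirectory includes like `lib/util`, which may need relaxing if Seq supports them:

```rust
/// Rejects relative include names that could escape the including file's
/// directory: empty names, any "..", and either path separator.
fn is_safe_include_name(name: &str) -> bool {
    !name.is_empty()
        && !name.contains("..")
        && !name.contains('/')
        && !name.contains('\\')
}
```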

2. Incorrect Parenthesis Counting 🟡 MEDIUM

Location: lsp/src/completion.rs:115-128

The count_unmatched_parens function doesn't handle escaped quotes. If Seq supports escape sequences, this will break.
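If Seq gains escape sequences, the string-skipping step would need to honour them. A sketch assuming backslash escapes (`\"` and `\\`), which the language may not support yet; `string_literal_end` is a hypothetical helper that a paren counter could call to jump past a whole literal:

```rust
/// Byte index just past the closing quote of the string literal starting at
/// `start` (which must point at `"`), treating a backslash as escaping the
/// next character. None if the literal is unclosed (cursor inside it).
fn string_literal_end(text: &str, start: usize) -> Option<usize> {
    let bytes = text.as_bytes();
    debug_assert_eq!(bytes.get(start), Some(&b'"'));
    let mut i = start + 1;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => i += 2, // skip the escaped character
            b'"' => return Some(i + 1),
            _ => i += 1,
        }
    }
    None
}
```

Working on bytes is safe here because `"` and `\` are ASCII and never appear inside multi-byte UTF-8 sequences.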

3. Unbounded File Reading 🟡 MEDIUM

Location: lsp/src/includes.rs:154

No size limit on included files - could cause memory exhaustion with large/malicious files.

Fix: Add 10MB file size check before reading.
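A sketch of the size check using `std::fs::metadata` before reading. `read_include` is hypothetical and takes the limit as a parameter so it can be tested without a 10 MB file; a file that grows between the metadata call and the read is not caught:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The 10 MB cap suggested in the review.
const MAX_INCLUDE_BYTES: u64 = 10 * 1024 * 1024;

/// Reads an included file only if its reported size is within `max_bytes`.
fn read_include(path: &Path, max_bytes: u64) -> io::Result<String> {
    let len = fs::metadata(path)?.len();
    if len > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} is {} bytes, over the {} byte include limit", path.display(), len, max_bytes),
        ));
    }
    fs::read_to_string(path)
}
```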

4. Polymorphic Placeholders 🟡 MEDIUM

Location: lsp/src/diagnostics.rs:54-58

Words without effects get ( ..a -- ..b ) placeholder which allows type-incorrect code to pass. Consider emitting warnings.

5. Release Workflow Race Condition 🟡 MEDIUM

Location: .github/workflows/release.yml:70-78

Fixed 30s sleep is fragile. Tests run after version bump. Add retry logic and verify indexing properly.

📊 Test Coverage

Good coverage for context detection, URI conversion, and include parsing.

Missing tests for:

Escaped quotes in count_unmatched_parens
Cycle detection and depth limit
Malformed paths
Integration tests for LSP completion with includes

🎯 Performance

Good caching practices. Consider:

Debouncing diagnostics
Arc sharing for included words
Lazy stdlib loading

🔒 Security

🔴 Path traversal vulnerability (MUST FIX)
🟡 Unbounded file reading (SHOULD FIX)
✅ No unsafe code, good UTF-8 handling

Final Recommendations

Must Fix Before Merge:

1. Fix path traversal in includes.rs:211-215
2. Add file size limit

Should Fix Soon:

3. Document/fix escaped quote handling
4. Improve release workflow
5. Add integration tests

Score: 7.5/10 (9/10 after security fixes)

Great work on the include resolution system and UTF-8 handling!

release

136cb9b

navicore added 2 commits November 28, 2025 18:49

navicore merged commit 7e20b0e into main Nov 29, 2025
3 checks passed

navicore deleted the smart-completion branch November 29, 2025 03:17

navicore merged 5 commits into main from smart-completion
