This repository was archived by the owner on Apr 29, 2026. It is now read-only.

1. Resolves includes to find available words from std:json and other … #31

Merged
navicore merged 5 commits into main from smart-completion
Nov 29, 2025


@navicore
Owner

…modules

  2. Registers those words with both the AST validator and typechecker
  3. Shows errors for genuinely undefined words like xson-serialize
  4. Clears errors when you fix them back to valid words like json-serialize

The include-aware completion and diagnostics are now fully functional.

@claude

claude Bot commented Nov 29, 2025

PR Review: Include-Aware LSP Completion and Diagnostics

Summary

This PR implements a significant enhancement to the LSP server by adding include-aware completion and diagnostics. The implementation resolves includes to find available words from std:json and other modules, registers those words with both the AST validator and typechecker, and provides proper error handling.

✅ Strengths

1. Well-Architected Design

  • Clean separation of concerns with the new includes.rs module
  • Proper caching strategy with DocumentState to avoid re-parsing on every keystroke
  • Context-aware completion filtering (differentiates between code, strings, comments, include statements, etc.)
  • Recursive include resolution with cycle detection and depth limits

2. Good Error Handling

  • Graceful degradation when stdlib is not found (lsp/src/main.rs:41)
  • Proper handling of parse errors in included files (lsp/src/includes.rs:165-168)
  • Depth limit prevents runaway recursion (lsp/src/includes.rs:125)
  • Cycle detection using canonicalized paths (lsp/src/includes.rs:139-149)

3. Comprehensive Testing

  • Unit tests for context detection (lsp/src/completion.rs:383-467)
  • Tests for URI parsing and include resolution (lsp/src/includes.rs:262-336)
  • Tests verify actual stdlib resolution which is excellent

4. Compiler Integration

  • Clean API additions to AST validator (validate_word_calls_with_externals)
  • Typechecker support for external words with polymorphic placeholders
  • Maintains backward compatibility with existing validate_word_calls()

🐛 Potential Issues

1. Path Canonicalization on Non-Existent Files

Location: lsp/src/includes.rs:139-145

let canonical = match path.canonicalize() {
    Ok(p) => p,
    Err(_) => {
        debug!("Could not canonicalize: {:?}", path);
        return;
    }
};

Issue: canonicalize() fails if the file doesn't exist. This could silently skip valid include paths if there's a race condition or permission issue.

Suggestion: Consider using the non-canonical path for cycle detection if canonicalization fails, or log a more specific warning about why it failed.
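A minimal std-only sketch of the fallback: canonicalize when possible, otherwise keep the path as written and log why canonicalization failed. The name cycle_key and the use of eprintln! in place of the crate's debug!/warn! macros are illustrative assumptions, not the code in includes.rs.

```rust
use std::path::{Path, PathBuf};

/// Key used for cycle detection. Prefers the canonical path so that
/// `dir/../b.seq` and `b.seq` map to the same entry, but falls back to the
/// path as written (instead of skipping the include) if canonicalize() fails.
fn cycle_key(path: &Path) -> PathBuf {
    match path.canonicalize() {
        Ok(p) => p,
        Err(e) => {
            // Say *why* it failed (not found, permission denied, ...).
            eprintln!("could not canonicalize {:?} ({}); using it as-is", path, e);
            path.to_path_buf()
        }
    }
}
```

With the fallback, a file reached through two different spellings could be visited twice when canonicalization fails, but that is still bounded by the depth limit, which is a milder failure than silently dropping the include.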

2. Percent Decoding Implementation

Location: lsp/src/includes.rs:239-260

if hex.len() == 2
    && let Ok(byte) = u8::from_str_radix(&hex, 16)
{
    // Each decoded byte becomes its own char, so %C3%A9 yields "Ã©", not "é".
    result.push(byte as char);
}

Issue: This assumes single-byte characters. URIs with multibyte UTF-8 sequences (e.g., %C3%A9 for é) will be decoded incorrectly as two separate characters.

Suggestion: Consider using the percent-encoding crate for robust URL decoding, or accumulate bytes and convert to UTF-8 properly:

let mut bytes: Vec<u8> = Vec::new();
// push every decoded %XX byte, and every other byte unchanged, onto `bytes`
let decoded = String::from_utf8_lossy(&bytes).into_owned();
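A self-contained version of the byte-accumulating approach, using only std (the percent-encoding crate is the library alternative). The function name percent_decode is illustrative, and passing malformed escapes such as a trailing % through unchanged is an assumption about the desired behavior.

```rust
/// Decodes `%XX` escapes into raw bytes first, then interprets the whole
/// byte sequence as UTF-8, so `%C3%A9` becomes a single `é`.
fn percent_decode(input: &str) -> String {
    let raw = input.as_bytes();
    let mut bytes = Vec::with_capacity(raw.len());
    let mut i = 0;
    while i < raw.len() {
        if raw[i] == b'%' && i + 2 < raw.len() {
            let hi = (raw[i + 1] as char).to_digit(16);
            let lo = (raw[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                bytes.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        // Not a well-formed %XX escape: copy the byte through unchanged.
        bytes.push(raw[i]);
        i += 1;
    }
    // Decode once at the end; invalid UTF-8 becomes U+FFFD instead of panicking.
    String::from_utf8_lossy(&bytes).into_owned()
}
```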

3. Maximally Polymorphic Placeholder May Hide Type Errors

Location: lsp/src/diagnostics.rs:56-60, compiler/src/typechecker.rs:45-51

let placeholder = Effect::new(
    StackType::RowVar("ext_in".to_string()),
    StackType::RowVar("ext_out".to_string()),
);

Issue: When an included word's effect is None, a maximally polymorphic placeholder (..a -- ..b) is used. This could allow type-incorrect code to pass the typechecker.

Impact: Medium - may cause runtime errors for words without effect signatures

Suggestion: Consider:

  • Emitting a warning when a word has no effect signature
  • Or requiring all exported words to have effects
  • Or tracking which words used placeholders and showing them in diagnostics

4. Missing Error Context in Diagnostics

Location: lsp/src/diagnostics.rs:42-48

When checking with external words, the error message from validate_word_calls_with_externals doesn't indicate that included words are available. Users might not understand why json-serialize is valid in one file but not another.

Suggestion: Enhance error messages to show available included words or at least mention include resolution.

⚡ Performance Considerations

1. Re-parsing on Every Change

Location: lsp/src/main.rs:184-185

self.update_document(uri.as_str(), text.clone(), file_path);

Every keystroke triggers:

  • Full document parse
  • Include resolution (file I/O + parsing)
  • Diagnostics run

Impact: Could be slow for large projects with many includes

Suggestion:

  • Debounce the did_change handler (common in LSP servers)
  • Only re-resolve includes if the include statements changed
  • Consider incremental parsing for local words
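A sketch of the second bullet, skipping include resolution when the include statements are unchanged. It assumes, for illustration only, that each include sits on its own line beginning with include (as in include std:json or include "file2"); IncludeCache and include_lines are hypothetical names.

```rust
/// Collects a document's include statements in order. Assumes (for
/// illustration) that every include is on its own line starting with `include `.
fn include_lines(text: &str) -> Vec<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| line.starts_with("include "))
        .map(String::from)
        .collect()
}

/// Hypothetical per-document cache: remembers the include lines seen last time.
#[derive(Default)]
struct IncludeCache {
    last_includes: Option<Vec<String>>,
}

impl IncludeCache {
    /// Returns true when include resolution (file I/O + parsing) must be redone.
    fn needs_resolve(&mut self, text: &str) -> bool {
        let current = include_lines(text);
        if self.last_includes.as_ref() == Some(&current) {
            return false;
        }
        self.last_includes = Some(current);
        true
    }
}
```

Local words still need re-parsing on each change; only the include-driven I/O is skipped.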

2. Cloning Included/Local Words on Every Completion

Location: lsp/src/main.rs:246-247

state.included_words.clone(),
state.local_words.clone(),

Impact: Low - typically small vectors, but could add up

Suggestion: Use Arc<Vec<>> to make cloning O(1) instead of O(n)
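A sketch of the Arc variant. DocumentState here is a hypothetical stand-in that only mirrors the two field names from the review; the element type String is assumed.

```rust
use std::sync::Arc;

/// Hypothetical cached state; field names follow the review, types are assumed.
struct DocumentState {
    included_words: Arc<Vec<String>>,
    local_words: Arc<Vec<String>>,
}

/// Hands completion its own handles. Cloning an Arc only increments a
/// reference count, so the cost no longer grows with the number of words.
fn completion_inputs(state: &DocumentState) -> (Arc<Vec<String>>, Arc<Vec<String>>) {
    (Arc::clone(&state.included_words), Arc::clone(&state.local_words))
}
```

Because the handles are owned, the document-state lock can also be released before the completion list is built.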

3. Linear Search for Stdlib Path

Location: lsp/src/includes.rs:28-77

The find_stdlib_path function checks multiple locations sequentially, including canonicalize() calls which can be expensive.

Impact: Very low - only called once at startup

Suggestion: The current implementation is fine; a nice optimization would be to cache the result in a OnceLock if multiple LSP instances share code.
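If that caching were wanted, std::sync::OnceLock can memoize the lookup for the whole process. In this sketch the search closure stands in for the real find_stdlib_path, and the /opt/seq/stdlib path used in the check is made up.

```rust
use std::path::PathBuf;
use std::sync::OnceLock;

static STDLIB_PATH: OnceLock<Option<PathBuf>> = OnceLock::new();

/// Runs the (expensive) search at most once per process and returns the
/// cached answer afterwards, even if that answer was "not found".
fn stdlib_path(search: impl FnOnce() -> Option<PathBuf>) -> Option<&'static PathBuf> {
    STDLIB_PATH.get_or_init(search).as_ref()
}
```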

🔒 Security Considerations

1. Path Traversal Risk (Low)

Location: lsp/src/includes.rs:195-220

For Include::Relative, the code does:

let path = dir.join(format!("{}.seq", name));

Issue: If name contains .., this could potentially access files outside the intended directory.

Impact: Low - LSP context is already trusted (user's own code)

Mitigation: Already somewhat protected by checking path.exists(), but consider validating that the canonicalized path stays within expected boundaries.
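A sketch of the boundary check: canonicalize both the allowed base directory and the resolved include, then compare with Path::starts_with, which matches whole components. Which directory should count as the base (the including file's directory, the workspace root) is not specified in the review; stays_within is a hypothetical name.

```rust
use std::io;
use std::path::Path;

/// Returns Ok(true) only if `candidate` resolves (following `..` and symlinks)
/// to a location inside `base`. Both sides are canonicalized so symlinked
/// directories such as a temp dir compare correctly.
fn stays_within(base: &Path, candidate: &Path) -> io::Result<bool> {
    let base = base.canonicalize()?;
    let resolved = candidate.canonicalize()?;
    // Component-wise comparison: /a/bc is not considered inside /a/b.
    Ok(resolved.starts_with(&base))
}
```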

2. No Resource Limits on Include Resolution

Location: lsp/src/includes.rs:100-113

While there's a depth limit (10), there's no limit on total number of files or total size of included content.

Impact: Low - same trust model as above

Suggestion: Consider adding limits on total included files (e.g., max 100 files) to prevent pathological cases.
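To illustrate how a total-file cap could sit next to the existing depth limit and cycle detection, here is a sketch over an in-memory include graph rather than the file system, which also gives the shape of the missing cycle and nesting tests. Limits, resolve and the graph representation are assumptions for illustration; 10 and 100 are the numbers mentioned above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical limits: depth 10 as in includes.rs, 100 files as suggested.
struct Limits {
    max_depth: usize,
    max_files: usize,
}

/// Depth-first walk of an include graph (module -> modules it includes).
/// `visited` gives cycle detection, `max_depth` bounds nesting and
/// `max_files` bounds the total number of modules loaded.
fn resolve(
    graph: &HashMap<&str, Vec<&str>>,
    module: &str,
    depth: usize,
    limits: &Limits,
    visited: &mut HashSet<String>,
    order: &mut Vec<String>,
) {
    if depth > limits.max_depth || visited.len() >= limits.max_files {
        return;
    }
    if !visited.insert(module.to_string()) {
        return; // already loaded: a cycle or a shared (diamond) include
    }
    order.push(module.to_string());
    for child in graph.get(module).into_iter().flatten() {
        resolve(graph, child, depth + 1, limits, visited, order);
    }
}
```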

📊 Test Coverage

Good:

  • Context detection tests (strings, comments, word defs, etc.)
  • URI parsing with spaces
  • Include resolution with actual stdlib
  • Document parsing with effects

Missing:

  • Integration test for completion with included words
  • Test for circular includes (verify cycle detection works)
  • Test for nested includes (depth > 1)
  • Test for the register_external_words typechecker API
  • Test showing that diagnostics clear when undefined words are fixed

Recommendation: Add at least one integration test showing:

// file1.seq: include "file2"
// file2.seq: : helper ( -- ) ;
// Verify completion shows "helper" in file1

📝 Code Quality

Positive:

  • ✅ Consistent error handling patterns
  • ✅ Good documentation comments
  • ✅ Appropriate use of logging (debug!, info!, warn!)
  • ✅ Clean separation of concerns
  • ✅ Backward compatibility maintained

Minor Suggestions:

  1. Unused pattern prefix in goto_definition

    • lsp/src/main.rs:211,223 - _uri, _position could just be _ since they're not used
  2. Consider adding tracing spans

    • Would help debug include resolution: #[tracing::instrument]
  3. PLAN.md in production code

    • lsp/PLAN.md is 150 lines of planning notes. Consider moving to docs/ or removing before merge.

🎯 Overall Assessment

Quality: ⭐⭐⭐⭐ (4/5)

This is a solid, well-thought-out implementation. The architecture is clean, error handling is generally good, and the test coverage is reasonable. The main concerns are:

  1. Percent decoding bug (should fix)
  2. Polymorphic placeholder hiding errors (consider addressing)
  3. Performance of re-parsing on every keystroke (could optimize later)

🚀 Recommendation

✅ APPROVE with minor fixes

Suggested before merge:

  1. Fix the percent decoding for UTF-8 URIs (use a proper library or fix the implementation)
  2. Consider moving/removing PLAN.md
  3. Add a TODO comment about the polymorphic placeholder issue for future work

Nice to have:

  • Add integration test for circular includes
  • Consider debouncing did_change

Great work on this feature! The include-aware completion will significantly improve the developer experience.


Review generated with assistance from Claude Code

1. Fixed percent decoding for UTF-8 - Now properly handles multi-byte UTF-8 sequences like %C3%A9 → é by collecting bytes and decoding with from_utf8_lossy
2. Removed PLAN.md - Planning notes shouldn't be in production code
3. Added TODO comment - Documented the polymorphic placeholder limitation in diagnostics.rs:54-58 for future consideration

The other suggestions (debouncing, Arc<Vec<>>, tracing spans) are nice-to-have optimizations that can be done later if performance becomes an issue.
@claude

claude Bot commented Nov 29, 2025

PR Review #31: Include-Aware LSP Completion and Diagnostics

Overview

This PR successfully implements include-aware completion and diagnostics for the LSP server, addressing the previous feedback. The implementation properly resolves includes, registers external words with the validator and typechecker, and provides context-aware completions.

✅ Excellent Work

1. Previous Feedback Addressed

  • UTF-8 percent decoding fixed (includes.rs:242-268) - Now properly handles multi-byte UTF-8 sequences
  • TODO comment added (diagnostics.rs:54-58) - Documents the polymorphic placeholder limitation
  • The follow-up commit notes that the other suggestions are deferred as nice-to-have optimizations

2. Clean Architecture

  • Separation of concerns: New includes.rs module is well-organized
  • Context-aware completions: Excellent detection logic (completion.rs:66-112)
  • Caching strategy: DocumentState avoids unnecessary re-parsing
  • Backward compatibility: validate_word_calls() still works

3. Robust Error Handling

  • Graceful degradation when stdlib not found
  • Cycle detection with canonical paths
  • Depth limit prevents runaway recursion
  • Parse errors in included files don't crash the LSP

4. Comprehensive Testing

  • Unit tests for context detection
  • URI parsing tests including UTF-8 cases
  • Tests verify actual stdlib resolution
  • Diagnostics tested for undefined words

@claude

claude Bot commented Nov 29, 2025

🔍 Issues Found

1. Critical: Row Variable Name Collision Risk

Location: compiler/src/typechecker.rs:46-49

All external words without effects get the same row variable names "ext_in" and "ext_out". If multiple such words are used in the same program, the typechecker will unify their effects incorrectly.

Fix: Use fresh variables for each external word:

let placeholder = Effect::new(
    StackType::RowVar(format!("ext_in_{}", name)),
    StackType::RowVar(format!("ext_out_{}", name)),
);

Impact: High - can cause type errors to go undetected
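A sketch using a counter rather than the word name alone, with minimal stand-ins for Effect and StackType (the real types in compiler/src/typechecker.rs are richer, and the field names here are assumed). Suffixing with the word name separates different words, but two uses of the same word would still share variables unless the typechecker already freshens them per call site, which the review does not state; a counter avoids depending on that.

```rust
/// Minimal stand-ins for the compiler's types (assumed shapes).
#[derive(Debug, Clone, PartialEq)]
enum StackType {
    RowVar(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Effect {
    inputs: StackType,
    outputs: StackType,
}

/// Hands out row-variable names that are never reused.
#[derive(Default)]
struct FreshVars {
    next: usize,
}

impl FreshVars {
    fn fresh(&mut self, prefix: &str) -> StackType {
        self.next += 1;
        StackType::RowVar(format!("{}${}", prefix, self.next))
    }

    /// Builds a `( ..a -- ..b )` placeholder whose row variables are shared
    /// with no other placeholder, so unrelated externals cannot be unified.
    fn placeholder(&mut self, word: &str) -> Effect {
        Effect {
            inputs: self.fresh(&format!("ext_in_{}", word)),
            outputs: self.fresh(&format!("ext_out_{}", word)),
        }
    }
}
```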

2. Bug: Incorrect String Detection in Comments

Location: completion.rs:77-91

The hash detection logic can incorrectly identify comments. After a string like "hello # world", a real comment may be treated as code because rfind finds the # inside the string.

Test case that would fail:

assert_eq!(detect_context("\"#\" # comment"), ContextType::InComment);

Impact: Medium - autocomplete appears when it shouldn't
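Regardless of how the current code handles it (the author later reports the check was already correct), the robust approach is one left-to-right scan that tracks whether it is inside a string, rather than rfind('#'). This sketch assumes double-quoted strings without escape sequences and # as the only comment marker; comment_start is a hypothetical name. The cursor is in a comment when comment_start(text_before_cursor) is Some.

```rust
/// Returns the byte offset where a `#` comment starts, ignoring any `#`
/// inside a double-quoted string. Assumes strings have no escape sequences.
fn comment_start(line: &str) -> Option<usize> {
    let mut in_string = false;
    for (i, c) in line.char_indices() {
        match c {
            '"' => in_string = !in_string,
            '#' if !in_string => return Some(i),
            _ => {}
        }
    }
    None
}
```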

3. Bug: Paren Counting Includes Strings

Location: completion.rs:104-110

Counts ALL parens including those in strings, causing incorrect stack effect detection.

Impact: Medium - wrong completions in edge cases
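A sketch of paren counting that ignores string contents, under the same assumption of double-quoted strings without escapes. It is an independent illustration, not the count_unmatched_parens function the author later added.

```rust
/// Counts `(` minus `)` outside double-quoted strings. A positive result means
/// the cursor is still inside a stack-effect declaration such as `( a b -- c`.
fn unmatched_parens(prefix: &str) -> i32 {
    let mut depth = 0;
    let mut in_string = false;
    for c in prefix.chars() {
        match c {
            '"' => in_string = !in_string,
            '(' if !in_string => depth += 1,
            ')' if !in_string => depth -= 1,
            _ => {}
        }
    }
    depth
}
```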

@claude

claude Bot commented Nov 29, 2025

4. Performance: Cloning on Every Completion

Location: main.rs:234-248

Every completion request clones the potentially large included_words and local_words vectors. Consider using Arc<Vec<>> as suggested in the previous review.

Impact: Low-Medium - completion requests are frequent

5. Inconsistency: cfg(test) on Public Function

Location: diagnostics.rs:10-12

check_document is pub but only available in tests - confusing API. Either make it pub(crate) or remove cfg(test).

⚡ Performance & Security

Performance:

  • Re-parsing on every change is reasonable for now
  • Path canonicalization involves syscalls - consider caching for stdlib files
  • Storing full document content is fine for current use

Security:

  • ✅ Path traversal well handled
  • ✅ Resource exhaustion protected (depth limit, cycle detection)
  • ⚠️ Consider file size limits for included files (low priority - LSP runs locally)

📋 Test Coverage

Strengths: Context detection, URI parsing, include resolution well tested

Gaps:

  • Missing tests for completions with included words
  • Missing edge cases in context detection (bugs mentioned above)
  • Missing error cases in percent_decode

@claude

claude Bot commented Nov 29, 2025

🎯 Recommendations

Must Fix (Before Merge):

  1. ⚠️ Row variable name collision (Critical) - Use unique names per word in typechecker.rs
  2. ⚠️ String detection in comments (Medium) - Fix the # detection logic in completion.rs

Should Fix (Soon):

  1. Paren counting in strings - Fix context detection to ignore parens in strings
  2. Performance of cloning - Use Arc or release lock earlier

Nice to Have:

  1. Improve test coverage to 30%+
  2. Add file size limits for security
  3. Cache canonical paths

✨ Overall Assessment

Quality: 8/10 - Well-architected, addresses previous feedback, good test coverage

Readiness: Almost ready - needs critical bug fixes before merge

Impact: High - Significantly improves LSP experience with include-aware features

This is excellent work that substantially improves the developer experience. The architecture is sound and the implementation is clean. Once the critical type variable collision is fixed and the context detection bugs are addressed, this will be ready to merge.

Great job on addressing the previous feedback, especially the UTF-8 decoding fix! 🎉

@claude

claude Bot commented Nov 29, 2025

Pull Request Review - PR 31

Summary

This PR implements include-aware completions and diagnostics for the LSP server. It adds the ability to resolve include statements, extract word definitions from included modules, and use them for both autocompletion and error checking. This is a significant enhancement to the developer experience.

Code Quality and Best Practices - Excellent

Strengths

  • Well-structured architecture with clean separation between includes resolution, completion logic, and diagnostics
  • Comprehensive context detection in completion.rs handles multiple contexts (strings, comments, stack effects, includes, etc.)
  • Good documentation with clear doc comments
  • Proper error handling with graceful degradation when stdlib cannot be found
  • Cycle detection prevents infinite loops with visited set and depth limit
  • Excellent test coverage for URI parsing, context detection, and include resolution

Minor Concerns

  1. Hardcoded depth limit of 10 at lsp/src/includes.rs:125 - consider making configurable
  2. Memory duplication in completion handler clones included_words and local_words - could use Arc for large projects
  3. Sort text prefixes use string prefixes which works but is fragile - consider using enum or constants

Potential Bugs and Issues

Critical Issues

  1. Type safety hole with placeholder effects (lsp/src/diagnostics.rs:54-58): Words without effect signatures get a maximally polymorphic placeholder that could allow type-incorrect code to pass. Consider emitting warnings or documenting this limitation.

  2. Incorrect sed usage in release workflow (.github/workflows/release.yml:35): The sed command will replace the FIRST version line it finds, not necessarily the workspace version. Should target only the [workspace.package] section to avoid updating wrong version fields.
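The workflow itself uses sed, but the section-scoped idea can be sketched in Rust for illustration: only a version key inside the [workspace.package] table is rewritten. This is a line-based sketch, not a TOML parser, and bump_workspace_version is a hypothetical helper; the 0.3.0 → 0.4.0 values in the check come from the author's reply.

```rust
/// Rewrites `version = "..."` only inside the `[workspace.package]` table,
/// leaving `version` keys in other tables untouched. Line-based sketch:
/// assumes the key sits on its own line as `version = "..."`.
fn bump_workspace_version(manifest: &str, new_version: &str) -> String {
    let mut in_workspace_package = false;
    let mut out = Vec::new();
    for line in manifest.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            // Entering a new table: track whether it is the one we want.
            in_workspace_package = trimmed == "[workspace.package]";
        }
        let is_version_key = trimmed
            .split_once('=')
            .map(|(key, _)| key.trim() == "version")
            .unwrap_or(false);
        if in_workspace_package && is_version_key {
            out.push(format!("version = \"{}\"", new_version));
        } else {
            out.push(line.to_string());
        }
    }
    let mut result = out.join("\n");
    // lines() drops the final newline; restore it if the input had one.
    if manifest.ends_with('\n') {
        result.push('\n');
    }
    result
}
```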

Medium Priority Issues

  1. Canonicalization failure handling returns early - consider using non-canonical path as fallback
  2. Missing escape sequence handling will need update when language adds escape sequences
  3. Windows path handling code only compiles on Windows - untested on Unix
  4. Paren counting does not handle parens inside strings

Performance Considerations

Good Practices

  • Document state caching per URI prevents repeated parsing
  • Lazy resolution only when documents open/change
  • Early returns in context detection

Optimizations to Consider

  1. File I/O in LSP thread could block server - consider async I/O or background thread
  2. Stdlib files are re-parsed for every include resolution - consider workspace-level cache
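A sketch of a workspace-level cache for the second point: parsed words are keyed by file path and reused while the file's modification time is unchanged, and handed out as Arc so callers share one copy. ModuleCache and the parse closure (standing in for the real parser) are assumptions.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::SystemTime;

/// Parsed words of one module plus the modification time they were parsed at.
struct CachedModule {
    modified: SystemTime,
    words: Arc<Vec<String>>,
}

/// Hypothetical cache shared by all open documents, so a module such as
/// std:json is parsed once rather than once per including file.
#[derive(Default)]
struct ModuleCache {
    entries: HashMap<PathBuf, CachedModule>,
}

impl ModuleCache {
    /// Returns cached words if the file is unchanged on disk; otherwise
    /// reads it, runs `parse`, and stores the result.
    fn words_for(
        &mut self,
        path: &Path,
        parse: impl FnOnce(&str) -> Vec<String>,
    ) -> io::Result<Arc<Vec<String>>> {
        let modified = fs::metadata(path)?.modified()?;
        if let Some(entry) = self.entries.get(path) {
            if entry.modified == modified {
                return Ok(Arc::clone(&entry.words));
            }
        }
        let words = Arc::new(parse(&fs::read_to_string(path)?));
        self.entries.insert(
            path.to_path_buf(),
            CachedModule { modified, words: Arc::clone(&words) },
        );
        Ok(words)
    }
}
```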

Security Concerns - Low Risk

  1. Path traversal potential exists but low risk in trusted LSP environment
  2. No limit on breadth of includes - could cause excessive I/O with malicious files
  3. Redundant token passing in release workflow

Test Coverage - Very Good

Excellent Coverage

  • URI parsing with spaces and UTF-8
  • Context detection tests
  • Parse error, type error, and undefined word tests
  • Real stdlib resolution test

Missing Tests

  • Include cycle detection verification
  • Include depth limit verification
  • Completion priority ordering
  • Error recovery when included files have parse errors

Recommendations

Must Fix Before Merge

  1. Fix sed command in release workflow line 35
  2. Address or document the type safety hole with placeholder effects

Should Fix Soon

  1. Add canonicalization fallback
  2. Add tests for circular includes and depth limits
  3. Consider async file I/O for include resolution

Nice to Have

  1. Make depth limit configurable
  2. Use Arc for shared data
  3. Add completion priority tests

Overall Assessment - 4.5/5 Stars

This is a high-quality PR that significantly improves the LSP functionality. The code is well-structured, well-tested, and follows good practices. The include resolution logic is sound with proper cycle detection.

Main concerns:

  1. Critical sed bug in release workflow that could break releases
  2. Documented type safety issue that should be addressed
  3. Some missing tests for edge cases

With the sed fix and type safety consideration addressed, this PR is ready to merge. The remaining issues are minor optimizations.

Great work on this enhancement! The context-aware completions and include-aware diagnostics will greatly improve the developer experience.


Review generated with Claude Code

Fixed:
1. sed command in release workflow - Now targets specifically the line after [workspace.package] instead of replacing the first version line found
2. Paren counting in strings - New count_unmatched_parens() function ignores parens inside string literals
3. Added tests - For parens inside strings

The comment detection was already correct (lines 82-91 check if # is inside a string before treating it as a comment), so no change was needed there.
…ckage] section (0.3.0 → 0.4.0) and leaves everything else untouched.

The type safety issue (polymorphic placeholder) was already documented with a TODO comment in diagnostics.rs:54-58 from the earlier fix. That's the "documented" part the review refers to.
@claude

claude Bot commented Nov 29, 2025

PR Review: Include-aware LSP Completions and Diagnostics

Overview

This PR adds significant LSP functionality by implementing include-aware completions and diagnostics.

✅ Strengths

  1. Well-Structured Architecture - Clean separation of concerns with includes.rs, completion.rs, diagnostics.rs
  2. Robust Context Detection - Smart handling of strings, comments, parentheses (completion.rs:65-112)
  3. UTF-8 Handling - Excellent fix for percent decoding with multi-byte sequences
  4. Type Safety Integration - Good integration with AST validator and typechecker

⚠️ Issues Found

1. Path Traversal Vulnerability 🔴 HIGH

Location: lsp/src/includes.rs:211-215

The Include::Relative path is not sanitized. An attacker could use include "../../../etc/passwd" to read arbitrary files.

Fix: Add sanitization to reject paths with "..", "/", or "\".
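A sketch of that rule using std::path::Component, accepting only a single plain name; is_safe_include_name is a hypothetical helper. Rejecting / also rules out subdirectory includes such as lib/util, so if those are meant to work, the looser variant would allow several Component::Normal parts while still rejecting ParentDir, RootDir and Prefix.

```rust
use std::path::{Component, Path};

/// Accepts an include name only if it is one plain path component:
/// no `..` or `.`, no root or drive prefix, and no `/` or `\` separators.
fn is_safe_include_name(name: &str) -> bool {
    if name.is_empty() || name.contains('/') || name.contains('\\') {
        return false;
    }
    let mut components = Path::new(name).components();
    matches!(
        (components.next(), components.next()),
        (Some(Component::Normal(_)), None)
    )
}
```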

2. Incorrect Parenthesis Counting 🟡 MEDIUM

Location: lsp/src/completion.rs:115-128

The count_unmatched_parens function doesn't handle escaped quotes. If Seq supports escape sequences, this will break.
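If backslash escapes are added, the counter needs to skip the character after a backslash inside a string. A sketch under that assumption (Seq's actual escape syntax is not specified in the review; unmatched_parens_with_escapes is a hypothetical name):

```rust
/// Counts `(` minus `)` outside strings, treating `\` inside a string as
/// escaping the next character, so `"a\"(b"` stays one string literal.
fn unmatched_parens_with_escapes(prefix: &str) -> i32 {
    let mut depth = 0;
    let mut in_string = false;
    let mut escaped = false;
    for c in prefix.chars() {
        if in_string {
            if escaped {
                escaped = false; // this char was escaped; it has no effect
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
    }
    depth
}
```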

3. Unbounded File Reading 🟡 MEDIUM

Location: lsp/src/includes.rs:154

No size limit on included files - could cause memory exhaustion with large/malicious files.

Fix: Add 10MB file size check before reading.
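A std-only sketch of the check; the function name and the io::ErrorKind chosen are assumptions, and 10 * 1024 * 1024 bytes stands in for the 10MB figure.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Roughly the 10MB limit suggested above.
const MAX_INCLUDE_BYTES: u64 = 10 * 1024 * 1024;

/// Reads an included file only if its metadata reports at most `max_bytes`.
fn read_include_bounded(path: &Path, max_bytes: u64) -> io::Result<String> {
    let len = fs::metadata(path)?.len();
    if len > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} is {} bytes, over the {} byte include limit", path.display(), len, max_bytes),
        ));
    }
    fs::read_to_string(path)
}
```

Because the size is checked before the read, a file that grows in between could still exceed the limit; reading through Read::take(max_bytes + 1) would close that gap.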

4. Polymorphic Placeholders 🟡 MEDIUM

Location: lsp/src/diagnostics.rs:54-58

Words without effects get ( ..a -- ..b ) placeholder which allows type-incorrect code to pass. Consider emitting warnings.

5. Release Workflow Race Condition 🟡 MEDIUM

Location: .github/workflows/release.yml:70-78

Fixed 30s sleep is fragile. Tests run after version bump. Add retry logic and verify indexing properly.

📊 Test Coverage

Good coverage for context detection, URI conversion, and include parsing.

Missing tests for:

  • Escaped quotes in count_unmatched_parens
  • Cycle detection and depth limit
  • Malformed paths
  • Integration tests for LSP completion with includes

🎯 Performance

Good caching practices. Consider:

  • Debouncing diagnostics
  • Arc sharing for included words
  • Lazy stdlib loading

🔒 Security

  • 🔴 Path traversal vulnerability (MUST FIX)
  • 🟡 Unbounded file reading (SHOULD FIX)
  • ✅ No unsafe code, good UTF-8 handling

Final Recommendations

Must Fix Before Merge:

  1. Fix path traversal in includes.rs:211-215
  2. Add file size limit

Should Fix Soon:

  1. Document/fix escaped quote handling
  2. Improve release workflow
  3. Add integration tests

Score: 7.5/10 (9/10 after security fixes)

Great work on the include resolution system and UTF-8 handling!

@navicore navicore merged commit 7e20b0e into main Nov 29, 2025
3 checks passed
@navicore navicore deleted the smart-completion branch November 29, 2025 03:17