feat: tree-sitter, arboard clipboard, GitHub templates#57
Conversation
1. tree-sitter AST-based symbol extraction (repo_map.rs): - Rust, Python, JS/TS, Go via tree-sitter grammars - Regex fallback for unsupported languages - Accurate: no false positives from comments/strings - Binary +0.3MB (8.7→9.0M) 2. arboard cross-platform clipboard: - Replaces shell pbcopy/xclip with native arboard crate - Works on Wayland, X11, macOS, Windows 3. GitHub project health: - Issue templates (bug report, feature request) - PR template with checklist - SECURITY.md vulnerability reporting policy 274 tests, 0 fail.
📝 WalkthroughWalkthroughAdds GitHub issue and PR templates and a SECURITY.md; replaces shell-based clipboard calls with the arboard crate in mc-cli; and adds Tree-sitter-based symbol extraction with a regex fallback and increased per-file symbol cap in mc-core. Changes
Sequence DiagramsequenceDiagram
participant File as Source File
participant Detect as Language Detector
participant TS as Tree-sitter Parser
participant Collect as Symbol Collector
participant Regex as Regex Fallback
participant Out as Symbol List
File->>Detect: provide content + extension
Detect->>TS: initialize parser for language
alt Supported & parsed
TS->>TS: build AST
TS->>Collect: traverse AST
Collect->>Out: emit symbols
else Unsupported or parse failure
Detect->>Regex: invoke regex extractor
Regex->>Out: emit symbols
end
Out->>Out: truncate to cap (30)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
Code Review
This pull request introduces project templates, a security policy, and significantly enhances the clipboard and symbol extraction functionality. Clipboard operations in the CLI now utilize the arboard library instead of manual shell commands. Symbol extraction in mc-core has been upgraded to use tree-sitter for Rust, Python, JavaScript/TypeScript, and Go, providing more accurate AST-based results with a regex fallback. The review feedback highlights several improvement opportunities in the tree-sitter implementation, including the need for specific TSX grammar handling, correcting field lookups for Rust impl blocks, optimizing tree traversal performance from
mc/crates/mc-core/src/repo_map.rs
Outdated
| "rs" => tree_sitter_rust::LANGUAGE, | ||
| "py" => tree_sitter_python::LANGUAGE, | ||
| "js" | "jsx" => tree_sitter_javascript::LANGUAGE, | ||
| "ts" | "tsx" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT, |
There was a problem hiding this comment.
For .tsx files, you should use tree_sitter_typescript::LANGUAGE_TSX to ensure JSX elements are parsed correctly. Using the standard TypeScript grammar for TSX files can lead to parsing errors when encountering JSX syntax.
| "ts" | "tsx" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT, | |
| "ts" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT, | |
| "tsx" => tree_sitter_typescript::LANGUAGE_TSX, |
| "function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => { | ||
| node.child_by_field_name("name").map(|n| { | ||
| let prefix = match kind { | ||
| "struct_item" => "struct ", | ||
| "enum_item" => "enum ", | ||
| "trait_item" => "trait ", | ||
| "impl_item" => "impl ", | ||
| _ => "fn ", | ||
| }; | ||
| format!("{prefix}{}", n.utf8_text(source).unwrap_or("")) | ||
| }) | ||
| } |
There was a problem hiding this comment.
In the Rust grammar, impl_item nodes do not have a name field; the type being implemented is associated with the type field (and optionally a trait field). Consequently, impl blocks are currently omitted from the symbol list because child_by_field_name("name") returns None. You should check for the type field when the node kind is impl_item.
| "function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => { | |
| node.child_by_field_name("name").map(|n| { | |
| let prefix = match kind { | |
| "struct_item" => "struct ", | |
| "enum_item" => "enum ", | |
| "trait_item" => "trait ", | |
| "impl_item" => "impl ", | |
| _ => "fn ", | |
| }; | |
| format!("{prefix}{}", n.utf8_text(source).unwrap_or("")) | |
| }) | |
| } | |
| "function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => { | |
| let name_node = if kind == "impl_item" { | |
| node.child_by_field_name("type") | |
| } else { | |
| node.child_by_field_name("name") | |
| }; | |
| name_node.map(|n| { | |
| let prefix = match kind { | |
| "struct_item" => "struct ", | |
| "enum_item" => "enum ", | |
| "trait_item" => "trait ", | |
| "impl_item" => "impl ", | |
| _ => "fn ", | |
| }; | |
| format!("{prefix}{}", n.utf8_text(source).unwrap_or("")) | |
| }) | |
| } |
| "function_declaration" => node | ||
| .child_by_field_name("name") | ||
| .map(|n| n.utf8_text(source).unwrap_or("").to_string()), | ||
| "class_declaration" => node |
There was a problem hiding this comment.
For consistency with the other supported languages (Rust, Python, Go), consider adding a prefix like fn to JavaScript/TypeScript function declarations. Currently, they are added to the repository map without any prefix, which makes the output inconsistent.
"js" | "jsx" | "ts" | "tsx" => match kind {
"function_declaration" => node
.child_by_field_name("name")
.map(|n| format!("fn {}", n.utf8_text(source).unwrap_or(""))),| (0..node.child_count()) | ||
| .filter_map(|i| node.child(i)) | ||
| .find(|c| c.kind() == "type_spec") | ||
| .and_then(|ts| ts.child_by_field_name("name")) |
There was a problem hiding this comment.
Iterating over children using node.child(i) in a loop is an child(i) performs a traversal from the start of the children list. It is significantly more efficient to use the children() iterator with a TreeCursor.
.or_else(|| {
let mut cursor = node.walk();
node.children(&mut cursor)
.find(|c| c.kind() == "type_spec")
.and_then(|ts| ts.child_by_field_name("name"))
})
mc/crates/mc-core/src/repo_map.rs
Outdated
| let mut cursor = node.walk(); | ||
| for i in 0..node.child_count() { | ||
| if let Some(child) = node.child(i) { | ||
| collect_symbols(&child, source, ext, symbols); | ||
| } | ||
| } |
There was a problem hiding this comment.
The cursor variable is currently unused, which will trigger a compiler warning. Additionally, using node.child(i) in a loop results in children() iterator is the idiomatic and efficient way to traverse the tree. You should also consider limiting recursion to container nodes (like classes or modules) and skipping function bodies to improve performance on large files.
let mut cursor = node.walk();
for child in node.children(&mut cursor) {
collect_symbols(&child, source, ext, symbols);
}There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
.github/pull_request_template.md (1)
10-13: Add Clippy to the checklist.The current checklist stops at
cargo test, so contributors can mark it complete without running the repo’s expected lint pass.♻️ Suggested update
- - [ ] `cargo fmt --all && cargo test --workspace` passes + - [ ] `cargo fmt --all && cargo test --workspace && cargo clippy --workspace --all-targets` passesBased on learnings, Run
cargo test --workspace && cargo clippy --workspace --all-targetsbefore committing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/pull_request_template.md around lines 10 - 13, Update the PR checklist to include running Clippy: add a new checkbox alongside the existing `cargo fmt --all && cargo test --workspace` entry to require `cargo clippy --workspace --all-targets` (or combine into a single checklist item such as `cargo test --workspace && cargo clippy --workspace --all-targets`) so contributors must run the lint pass before marking the checklist complete; modify the checklist in .github/pull_request_template.md near the `cargo fmt`/`cargo test` lines and ensure the wording matches the repo’s CI invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mc/crates/mc-core/src/repo_map.rs`:
- Around line 214-216: collect_symbols() currently walks the full subtree so
nested declarations are being indexed; replace the call that builds symbols
(collect_symbols(&root, bytes, ext, &mut symbols)) with a top-level-only
collection: either add a depth/only_top_level flag to collect_symbols or
implement a new collect_top_level_symbols(root, bytes, ext, &mut symbols) that
only emits children immediately under root (i.e., does not recurse into child
blocks/classes/functions), then use that to populate symbols (and do the same
replacement for the other occurrences referenced in the diff) so only file-level
declarations are included before truncation to 30 entries.
- Around line 280-285: The unused variable `cursor` created by `node.walk()`
should be removed to avoid a warning/error; delete the `let mut cursor =
node.walk();` line and keep the loop that iterates over `node.child_count()` and
calls `collect_symbols(&child, source, ext, symbols)` as-is so `node`,
`collect_symbols`, `node.child_count`, and `node.child` remain unchanged.
- Around line 198-206: The extract_symbols_treesitter function currently maps
both "ts" and "tsx" to tree_sitter_typescript::LANGUAGE_TYPESCRIPT which
misparses JSX; update the match arm so "ts" uses
tree_sitter_typescript::LANGUAGE_TYPESCRIPT and "tsx" uses
tree_sitter_typescript::LANGUAGE_TSX (or the equivalent TSX constant from the
crate) so .tsx files get the TSX grammar; adjust the match clause that handles
"ts" | "tsx" into two separate arms and ensure the LANGUAGE_TSX symbol is
imported/available.
---
Nitpick comments:
In @.github/pull_request_template.md:
- Around line 10-13: Update the PR checklist to include running Clippy: add a
new checkbox alongside the existing `cargo fmt --all && cargo test --workspace`
entry to require `cargo clippy --workspace --all-targets` (or combine into a
single checklist item such as `cargo test --workspace && cargo clippy
--workspace --all-targets`) so contributors must run the lint pass before
marking the checklist complete; modify the checklist in
.github/pull_request_template.md near the `cargo fmt`/`cargo test` lines and
ensure the wording matches the repo’s CI invocation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b8333364-f38e-45d5-86a3-89bd2e100e81
⛔ Files ignored due to path filters (1)
mc/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
.github/ISSUE_TEMPLATE/bug_report.yml.github/ISSUE_TEMPLATE/feature_request.yml.github/pull_request_template.mdSECURITY.mdmc/crates/mc-cli/Cargo.tomlmc/crates/mc-cli/src/main.rsmc/crates/mc-core/Cargo.tomlmc/crates/mc-core/src/repo_map.rs
| let mut symbols = Vec::new(); | ||
| collect_symbols(&root, bytes, ext, &mut symbols); | ||
| Some(symbols) |
There was a problem hiding this comment.
This now indexes nested declarations, not just top-level symbols.
collect_symbols() walks the entire subtree, so nested Python defs/classes and other non-file-level declarations are emitted the same way as top-level items. Because Line 194 truncates to 30 entries, nested symbols can crowd out later top-level declarations and change ranking quality.
Also applies to: 219-285
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mc/crates/mc-core/src/repo_map.rs` around lines 214 - 216, collect_symbols()
currently walks the full subtree so nested declarations are being indexed;
replace the call that builds symbols (collect_symbols(&root, bytes, ext, &mut
symbols)) with a top-level-only collection: either add a depth/only_top_level
flag to collect_symbols or implement a new collect_top_level_symbols(root,
bytes, ext, &mut symbols) that only emits children immediately under root (i.e.,
does not recurse into child blocks/classes/functions), then use that to populate
symbols (and do the same replacement for the other occurrences referenced in the
diff) so only file-level declarations are included before truncation to 30
entries.
| "rs" => match kind { | ||
| "function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => { | ||
| node.child_by_field_name("name").map(|n| { | ||
| let prefix = match kind { | ||
| "struct_item" => "struct ", | ||
| "enum_item" => "enum ", | ||
| "trait_item" => "trait ", | ||
| "impl_item" => "impl ", | ||
| _ => "fn ", | ||
| }; | ||
| format!("{prefix}{}", n.utf8_text(source).unwrap_or("")) | ||
| }) | ||
| } | ||
| _ => None, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd mc/crates/mc-core/src && head -n 250 repo_map.rs | tail -n +200Repository: kienbui1995/mc-code
Length of output: 2106
🏁 Script executed:
cd mc/crates/mc-core/src && grep -B 10 -A 20 "collect_symbols" repo_map.rs | head -n 50Repository: kienbui1995/mc-code
Length of output: 1621
🏁 Script executed:
cd mc && git log --oneline --all -- "crates/mc-core/src/repo_map.rs" | head -20Repository: kienbui1995/mc-code
Length of output: 441
🏁 Script executed:
cd mc && git show 6785b7e:crates/mc-core/src/repo_map.rs | head -n 300 | tail -n +220Repository: kienbui1995/mc-code
Length of output: 270
🏁 Script executed:
cd mc && git show 6785b7e:mc/crates/mc-core/src/repo_map.rs | sed -n '220,250p'Repository: kienbui1995/mc-code
Length of output: 1393
🏁 Script executed:
cd mc && git show 91a5b7b:mc/crates/mc-core/src/repo_map.rs 2>/dev/null | head -n 50 || echo "File not found in that commit"Repository: kienbui1995/mc-code
Length of output: 1654
🏁 Script executed:
cd mc && git show 91a5b7b:mc/crates/mc-core/src/repo_map.rs | grep -A 50 "fn extract_symbols" | head -n 60Repository: kienbui1995/mc-code
Length of output: 1638
🏁 Script executed:
cd mc && web_search "tree-sitter rust grammar impl_item name field"Repository: kienbui1995/mc-code
Length of output: 113
🏁 Script executed:
python3 <<'EOF'
import json
import subprocess
# Check if tree-sitter-cli or docs are available
result = subprocess.run(["find", ".", "-name", "*tree*sitter*", "-type", "d"],
capture_output=True, text=True, timeout=5)
print("Tree-sitter directories found:")
print(result.stdout[:500])
EOFRepository: kienbui1995/mc-code
Length of output: 94
🌐 Web query:
tree-sitter rust grammar impl_item name field structure
💡 Result:
In the tree-sitter-rust grammar (https://github.com/tree-sitter/tree-sitter-rust), "impl_item" is a named node type representing an impl block in Rust source code, such as "impl Trait for Type { ... }". The exact field structure (named children like "name") is defined in the grammar.js file and reflected in src/node-types.json, which lists node types, subtypes, fields, and children for the generated parser. Node types JSON follows the standard Tree-sitter format with "type", "named", "fields" (named children), and "children" (anonymous/positional children). To access fields in Rust API: - Use node.child_by_field_name("name") to get the named child node for the impl name. - Fields are assigned in grammar rules using field("name", $.identifier). The grammar repository contains grammar.js (rules), grammar.json (compiled), and node-types.json (structure). tree-sitter-rust crate exports NODE_TYPES constant with this JSON. Crates like tss-rust generate enums from it, confirming "impl_item" exists among 280+ node types. Example usage: let mut parser = Parser::new; parser.set_language(&tree_sitter_rust::LANGUAGE).unwrap; let tree = parser.parse(r#" impl MyTrait for MyType { fn method(&self) {} } "#, None).unwrap; let impl_node = tree.root_node.child_by_field_name("item").unwrap.child_by_field_name("impl_item").unwrap; // Adjust path let name_node = impl_node.child_by_field_name("name").unwrap; Precise fields require viewing node-types.json in repo src/.
Citations:
- 1: https://github.com/tree-sitter/tree-sitter-rust
- 2: https://github.com/tree-sitter/tree-sitter-rust/blob/master/queries/highlights.scm
- 3: https://docs.rs/tree-sitter/latest/tree_sitter/struct.Node.html
- 4: https://tree-sitter.github.io/tree-sitter/creating-parsers/3-writing-the-grammar.html
- 5: https://docs.rs/tree-sitter/*/tree_sitter/struct.Language.html
- 6: Documentation about tree node fields tree-sitter/tree-sitter#416
- 7: https://mintlify.com/tree-sitter/tree-sitter/api/rust/language
- 8: https://docs.rs/tss-rust/latest/tree_sitter_symbols_rust/
- 9: https://docs.rs/tss-rust/latest/src/tree_sitter_symbols_rust/lib.rs.html
- 10: https://github.com/hydro-project/rust-sitter
- 11: https://www.npmjs.com/package/tree-sitter-rust
🏁 Script executed:
cd mc && grep -r "tree-sitter-rust" Cargo.tomlRepository: kienbui1995/mc-code
Length of output: 45
🏁 Script executed:
cd mc && find . -name "Cargo.toml" -exec grep -l "tree" {} \;Repository: kienbui1995/mc-code
Length of output: 90
🏁 Script executed:
cd mc && cat crates/mc-core/Cargo.toml | grep -A 5 -B 5 "tree-sitter"Repository: kienbui1995/mc-code
Length of output: 397
🌐 Web query:
tree-sitter-rust impl_item node-types.json name field
💡 Result:
In the tree-sitter-rust grammar, the impl_item node represents items declared within an impl block (e.g., fn, const, type aliases inside impl Trait for Type { ... }). The node-types.json file, generated from the grammar.js, defines the structure of all nodes including impl_item. While the exact content of tree-sitter-rust's node-types.json for impl_item is not directly quoted in search results, Tree-sitter node definitions follow a standard format where named fields like "name" are explicitly listed in the "fields" object if defined in the grammar using field("name", rule). To access the "name" field of an impl_item node in Rust code: - Use node.child_by_field_name("name") or node.field("name").child to get the child node representing the name (typically an identifier). - First confirm the field exists via language.field_id_for_name("name"), where language is tree_sitter_rust::LANGUAGE. The tss-rust crate generates type-safe enums from node-types.json, confirming impl_item exists among 280+ node types with metadata like fields accessible via features like "meta_fields". Example from similar grammars (e.g., method_definition) shows fields like "name": {"multiple": false, "required": true, "types": [{"type": "identifier", "named": true}] }. The grammar.js (at https://github.com/tree-sitter/tree-sitter-rust/blob/master/src/grammar.js) likely defines impl_item with field("name", ...) for the item name, standard for Rust item nodes (function_item, struct_item, etc.).
Citations:
- 1: https://tree-sitter.github.io/tree-sitter/using-parsers/6-static-node-types
- 2: https://docs.rs/tss-rust/latest/tree_sitter_symbols_rust/
- 3: https://docs.rs/tree-sitter-javascript/%3E=0.20/constant.NODE_TYPES.html
- 4: https://mintlify.com/tree-sitter/tree-sitter/api/rust/language
- 5: https://github.com/tree-sitter/tree-sitter-rust/blob/master/queries/highlights.scm
- 6: https://github.com/tree-sitter/tree-sitter-rust/
- 7: https://www.npmjs.com/package/tree-sitter-rust
🏁 Script executed:
cd mc && python3 <<'EOF'
# Let's check what tree-sitter-rust version 0.23 defines for impl_item
# by looking at the crate documentation or source
import subprocess
import json
# Try to find tree-sitter docs or check the grammar directly
result = subprocess.run(
["rg", "impl_item", ".", "-t", "rs", "-A", "3", "-B", "1"],
capture_output=True,
text=True,
timeout=10
)
print("References to impl_item in the codebase:")
print(result.stdout[:2000] if result.stdout else "No matches")
EOFRepository: kienbui1995/mc-code
Length of output: 114
🏁 Script executed:
cd mc && python3 <<'EOF'
# Check if we can inspect tree-sitter-rust node types at runtime
# by looking at what the actual grammar defines
result = subprocess.run(
["rg", "\"impl", ".", "--max-count", "20"],
capture_output=True,
text=True,
timeout=5
)
print("Examples of impl usage:")
print(result.stdout[:1500] if result.stdout else "No examples")
EOFRepository: kienbui1995/mc-code
Length of output: 179
🏁 Script executed:
cd mc && find . -name "*.rs" -type f -exec grep -l "impl " {} \; | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -50 {}'Repository: kienbui1995/mc-code
Length of output: 9082
🌐 Web query:
tree-sitter-rust grammar.js impl_item field definition
💡 Result:
The tree-sitter-rust grammar defines impl_item as a non-terminal node in its JavaScript grammar file (grammar.js) located at https://github.com/tree-sitter/tree-sitter-rust/blob/master/src/grammar.js. This node represents Rust's impl blocks, such as impl Trait for Type { ... } or inherent implementations. From the highlights.scm query file, impl is highlighted as a keyword, and impl_item is referenced in usage examples like Emacs tree-sitter configurations for identifying impl blocks (e.g., (rust-mode . (impl_item))). The grammar.js uses Tree-sitter's DSL with functions like seq, choice, field, etc., to define impl_item. Likely structure (inferred from typical Tree-sitter Rust grammar and DSL docs): impl_item: $ => choice( field('inherent',
Citations:
- 1: https://github.com/tree-sitter/tree-sitter-rust
- 2: https://github.com/tree-sitter/tree-sitter-rust/blob/master/queries/highlights.scm
- 3: https://github.com/tree-sitter/tree-sitter-rust/
- 4: https://www.npmjs.com/package/tree-sitter-rust
- 5: https://github.com/tree-sitter/tree-sitter-rust/tree/3a56481f8d13b6874a28752502a58520b9139dc7
- 6: https://tree-sitter.github.io/tree-sitter/creating-parsers/2-the-grammar-dsl.html
- 7: https://linkedin.com/in/jakub-dupak
Rust impl extraction may silently drop symbols if Tree-sitter impl_item lacks a stable name field.
The previous regex-based implementation (commit 91a5b7b) extracted impl symbols by matching the "impl " prefix and splitting on delimiters, capturing all impl blocks. The current Tree-sitter-based approach at lines 222–235 calls node.child_by_field_name("name") on impl_item nodes. If this field is absent, optional, or varies across impl variants (inherent impls, generic impls, trait impls), child_by_field_name() returns None, causing the .map() operation to return None without adding a symbol to the vec. This is a silent behavioral change from the old implementation. Verify that all Rust impl block variants reliably expose a name field in the tree-sitter-rust grammar, or add fallback extraction logic to handle impl blocks without this field.
… tsx grammar 1. PR template: add clippy to checklist 2. Remove unused cursor variable in collect_symbols 3. TSX uses LANGUAGE_TSX (was using LANGUAGE_TYPESCRIPT) 274 tests, 0 fail.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
mc/crates/mc-core/src/repo_map.rs (2)
224-235:⚠️ Potential issue | 🟠 MajorRust
implblocks may be skipped entirely here.This branch only emits an
implsymbol whenchild_by_field_name("name")succeeds, butimplblocks do not have a simple identifier in source the waystructorfndo. That makes this a likely regression from the old regex path, which did indeximpl ...lines. Please verify the pinnedtree-sitter-rustgrammar and fall back to the target type / trait when nonamefield exists.#!/bin/bash set -euo pipefail version=$(python3 - <<'PY' import pathlib, re text = pathlib.Path("mc/crates/mc-core/Cargo.toml").read_text() m = re.search(r'tree-sitter-rust\s*=\s*"(?:[=^~])?([^"]+)"', text) print(m.group(1) if m else "") PY ) for ref in "v${version}" "${version}" "master"; do url="https://raw.githubusercontent.com/tree-sitter/tree-sitter-rust/${ref}/src/node-types.json" if curl -fsSL "$url" -o /tmp/tree-sitter-rust-node-types.json; then echo "Fetched $url" break fi done python3 - <<'PY' import json with open("/tmp/tree-sitter-rust-node-types.json") as fh: nodes = json.load(fh) impl_item = next(node for node in nodes if node.get("type") == "impl_item") print("impl_item fields:", sorted(impl_item.get("fields", {}).keys())) print("impl_item children types:", [t["type"] for t in impl_item.get("children", {}).get("types", [])]) PYExpected result: if
nameis not listed inimpl_item fields, the current lookup can never produce animplsymbol.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mc/crates/mc-core/src/repo_map.rs` around lines 224 - 235, The impl_item branch in repo_map.rs only emits a symbol when child_by_field_name("name") exists, which misses impl blocks; update that branch so when kind == "impl_item" and node.child_by_field_name("name") is None you instead look up the impl's target children (e.g. child_by_field_name("type") and child_by_field_name("trait") or similarly named fields from the tree-sitter-rust grammar) and build a fallback symbol (examples: "impl <Trait> for <Type>" or "impl <Type>") by extracting utf8_text(source) from those child nodes; ensure you still fall back to the existing "fn"/"struct"/etc. behavior for other kinds and handle missing children gracefully.
220-285:⚠️ Potential issue | 🟠 MajorKeep symbol collection at file scope.
collect_symbols()walks the full subtree, so nested Python defs/classes and Rust items insideimplblocks are emitted the same way as top-level declarations. Because the list is truncated later, nested items can crowd out later file-level symbols and materially change ranking quality. Limit this to root-level declarations, with explicit handling only for wrapper nodes like exports.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mc/crates/mc-core/src/repo_map.rs` around lines 220 - 285, collect_symbols currently descends the entire subtree so nested defs (e.g. Python functions/classes inside classes or Rust items inside impls) get emitted; change it to only emit top-level declarations by checking node.parent() (or by only iterating the root node's direct children) and skip adding symbols for nodes whose parent is not the file/module root, but still recurse into explicit wrapper nodes like "export_statement", module/file nodes, or other known wrappers so their direct children are considered; update collect_symbols (and any callers) to perform this root-level check (or add a new helper that handles a single-level scan) and keep handling of ext-specific kinds ("function_definition", "class_definition", "impl_item", "function_item", etc.) as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mc/crates/mc-core/src/repo_map.rs`:
- Around line 190-191: The Tree-sitter branch currently discards TypeScript
interface-only declarations because symbols is assigned from
extract_symbols_treesitter(ext, &content).unwrap_or_else(||
extract_symbols_regex(&content)); — either update extract_symbols_treesitter to
also collect `export interface` nodes (handle TS/TSX
InterfaceDeclaration/TSInterfaceDeclaration in the AST traversal used there) or,
alternatively, call extract_symbols_regex(&content) and merge its results into
the Tree-sitter result when present (e.g., if extract_symbols_treesitter returns
Some(vec), extend that vec with any missing regex-only symbols before assigning
to symbols); also apply the same merge/collection logic to the other similar
block referenced (the code around the 247-255 region) so interface-only files
aren’t dropped.
---
Duplicate comments:
In `@mc/crates/mc-core/src/repo_map.rs`:
- Around line 224-235: The impl_item branch in repo_map.rs only emits a symbol
when child_by_field_name("name") exists, which misses impl blocks; update that
branch so when kind == "impl_item" and node.child_by_field_name("name") is None
you instead look up the impl's target children (e.g. child_by_field_name("type")
and child_by_field_name("trait") or similarly named fields from the
tree-sitter-rust grammar) and build a fallback symbol (examples: "impl <Trait>
for <Type>" or "impl <Type>") by extracting utf8_text(source) from those child
nodes; ensure you still fall back to the existing "fn"/"struct"/etc. behavior
for other kinds and handle missing children gracefully.
- Around line 220-285: collect_symbols currently descends the entire subtree so
nested defs (e.g. Python functions/classes inside classes or Rust items inside
impls) get emitted; change it to only emit top-level declarations by checking
node.parent() (or by only iterating the root node's direct children) and skip
adding symbols for nodes whose parent is not the file/module root, but still
recurse into explicit wrapper nodes like "export_statement", module/file nodes,
or other known wrappers so their direct children are considered; update
collect_symbols (and any callers) to perform this root-level check (or add a new
helper that handles a single-level scan) and keep handling of ext-specific kinds
("function_definition", "class_definition", "impl_item", "function_item", etc.)
as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ee596355-a2e8-47b0-8550-8624831c4db2
📒 Files selected for processing (2)
.github/pull_request_template.mdmc/crates/mc-core/src/repo_map.rs
✅ Files skipped from review due to trivial changes (1)
- .github/pull_request_template.md
| let symbols = extract_symbols_treesitter(ext, &content) | ||
| .unwrap_or_else(|| extract_symbols_regex(&content)); |
There was a problem hiding this comment.
TypeScript interfaces drop out of indexing on the AST path.
The regex fallback below still extracts export interface ..., but supported .ts/.tsx files never reach that path once Tree-sitter parsing succeeds. As written, interface-only files now return no symbols, which is a real search-quality regression for TS-heavy repos. Please either collect interfaces in the AST branch too, or merge in the regex-only constructs before truncating.
Also applies to: 247-255
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mc/crates/mc-core/src/repo_map.rs` around lines 190 - 191, The Tree-sitter
branch currently discards TypeScript interface-only declarations because symbols
is assigned from extract_symbols_treesitter(ext, &content).unwrap_or_else(||
extract_symbols_regex(&content)); — either update extract_symbols_treesitter to
also collect `export interface` nodes (handle TS/TSX
InterfaceDeclaration/TSInterfaceDeclaration in the AST traversal used there) or,
alternatively, call extract_symbols_regex(&content) and merge its results into
the Tree-sitter result when present (e.g., if extract_symbols_treesitter returns
Some(vec), extend that vec with any missing regex-only symbols before assigning
to symbols); also apply the same merge/collection logic to the other similar
block referenced (the code around the 247-255 region) so interface-only files
aren’t dropped.
Leveraging open-source ecosystem
tree-sitter AST symbol extraction
Replaces regex-based symbol extraction with proper AST parsing:
arboard clipboard
Cross-platform clipboard (replaces shell
pbcopy/xclip):GitHub project health
274 tests, 0 fail.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation