Skip to content

feat: tree-sitter, arboard clipboard, GitHub templates#57

Merged
kienbui1995 merged 2 commits intomainfrom
feat/tree-sitter-gh-templates
Apr 13, 2026
Merged

feat: tree-sitter, arboard clipboard, GitHub templates#57
kienbui1995 merged 2 commits intomainfrom
feat/tree-sitter-gh-templates

Conversation

@kienbui1995
Copy link
Copy Markdown
Owner

@kienbui1995 kienbui1995 commented Apr 13, 2026

Leveraging open-source ecosystem

tree-sitter AST symbol extraction

Replaces regex-based symbol extraction with proper AST parsing:

  • Rust: function_item, struct_item, enum_item, trait_item, impl_item
  • Python: function_definition, class_definition
  • JS/TS: function_declaration, class_declaration
  • Go: function_declaration, type_declaration
  • Regex fallback for unsupported languages
  • Binary: 8.7MB → 9.0MB (+0.3MB)

arboard clipboard

Cross-platform clipboard (replaces shell pbcopy/xclip):

  • Works on Wayland, X11, macOS, Windows
  • No external tool dependency

GitHub project health

  • Bug report + feature request issue templates
  • PR template with checklist
  • SECURITY.md

274 tests, 0 fail.

Summary by CodeRabbit

  • New Features

    • Multi-language symbol extraction for improved code analysis across Rust, Python, JavaScript/TypeScript, and Go.
  • Bug Fixes

    • Improved clipboard handling with more reliable behavior and clearer error reporting.
  • Documentation

    • Added a security policy for vulnerability reporting and response expectations.
    • Added standardized issue and pull-request templates for bug reports, feature requests, and PRs.

1. tree-sitter AST-based symbol extraction (repo_map.rs):
   - Rust, Python, JS/TS, Go via tree-sitter grammars
   - Regex fallback for unsupported languages
   - Accurate: no false positives from comments/strings
   - Binary +0.3MB (8.7→9.0M)

2. arboard cross-platform clipboard:
   - Replaces shell pbcopy/xclip with native arboard crate
   - Works on Wayland, X11, macOS, Windows

3. GitHub project health:
   - Issue templates (bug report, feature request)
   - PR template with checklist
   - SECURITY.md vulnerability reporting policy

274 tests, 0 fail.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

Adds GitHub issue and PR templates and a SECURITY.md; replaces shell-based clipboard calls with the arboard crate in mc-cli; and adds Tree-sitter-based symbol extraction with a regex fallback and increased per-file symbol cap in mc-core.

Changes

Cohort / File(s) Summary
GitHub Configuration
.github/ISSUE_TEMPLATE/bug_report.yml, .github/ISSUE_TEMPLATE/feature_request.yml, .github/pull_request_template.md, SECURITY.md
Add bug/feature issue templates and a PR template with checklist; add SECURITY.md describing vulnerability reporting and response timelines.
CLI Clipboard Integration
mc/crates/mc-cli/Cargo.toml, mc/crates/mc-cli/src/main.rs
Add arboard dependency and replace OS-specific shell clipboard subprocesses with arboard::Clipboard; surface clipboard errors to app UI instead of ignoring.
Core Parsing Enhancement
mc/crates/mc-core/Cargo.toml, mc/crates/mc-core/src/repo_map.rs
Add tree-sitter and language grammars; implement Tree-sitter-based symbol extraction with helpers and fallback to existing regex extractor; remove per-extractor cap and apply a global per-file cap increased to 30.

Sequence Diagram

sequenceDiagram
    participant File as Source File
    participant Detect as Language Detector
    participant TS as Tree-sitter Parser
    participant Collect as Symbol Collector
    participant Regex as Regex Fallback
    participant Out as Symbol List

    File->>Detect: provide content + extension
    Detect->>TS: initialize parser for language
    alt Supported & parsed
        TS->>TS: build AST
        TS->>Collect: traverse AST
        Collect->>Out: emit symbols
    else Unsupported or parse failure
        Detect->>Regex: invoke regex extractor
        Regex->>Out: emit symbols
    end
    Out->>Out: truncate to cap (30)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 With tiny paws I tweak and hop,

clipboards fixed—no shell to stop.
I parse the trees and hunt each name,
fall back to regex when it's tame.
Security doors closed, I dance—hip hop! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main changes: tree-sitter integration, arboard clipboard replacement, and GitHub templates addition.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/tree-sitter-gh-templates

Comment @coderabbitai help to get the list of available commands and usage tips.

@socket-security
Copy link
Copy Markdown

socket-security bot commented Apr 13, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedtree-sitter@​0.24.75810093100100
Addedtree-sitter-typescript@​0.23.291100100100100
Addedtree-sitter-python@​0.23.691100100100100
Addedtree-sitter-javascript@​0.23.191100100100100
Addedtree-sitter-go@​0.23.491100100100100
Addedtree-sitter-rust@​0.23.391100100100100

View full report

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces project templates, a security policy, and significantly enhances the clipboard and symbol extraction functionality. Clipboard operations in the CLI now utilize the arboard library instead of manual shell commands. Symbol extraction in mc-core has been upgraded to use tree-sitter for Rust, Python, JavaScript/TypeScript, and Go, providing more accurate AST-based results with a regex fallback. The review feedback highlights several improvement opportunities in the tree-sitter implementation, including the need for specific TSX grammar handling, correcting field lookups for Rust impl blocks, optimizing tree traversal performance from $O(N^2)$ to $O(N)$, and ensuring consistent symbol naming across languages.

"rs" => tree_sitter_rust::LANGUAGE,
"py" => tree_sitter_python::LANGUAGE,
"js" | "jsx" => tree_sitter_javascript::LANGUAGE,
"ts" | "tsx" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For .tsx files, you should use tree_sitter_typescript::LANGUAGE_TSX to ensure JSX elements are parsed correctly. Using the standard TypeScript grammar for TSX files can lead to parsing errors when encountering JSX syntax.

Suggested change
"ts" | "tsx" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT,
"ts" => tree_sitter_typescript::LANGUAGE_TYPESCRIPT,
"tsx" => tree_sitter_typescript::LANGUAGE_TSX,

Comment on lines +223 to +234
"function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => {
node.child_by_field_name("name").map(|n| {
let prefix = match kind {
"struct_item" => "struct ",
"enum_item" => "enum ",
"trait_item" => "trait ",
"impl_item" => "impl ",
_ => "fn ",
};
format!("{prefix}{}", n.utf8_text(source).unwrap_or(""))
})
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the Rust grammar, impl_item nodes do not have a name field; the type being implemented is associated with the type field (and optionally a trait field). Consequently, impl blocks are currently omitted from the symbol list because child_by_field_name("name") returns None. You should check for the type field when the node kind is impl_item.

Suggested change
"function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => {
node.child_by_field_name("name").map(|n| {
let prefix = match kind {
"struct_item" => "struct ",
"enum_item" => "enum ",
"trait_item" => "trait ",
"impl_item" => "impl ",
_ => "fn ",
};
format!("{prefix}{}", n.utf8_text(source).unwrap_or(""))
})
}
"function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => {
let name_node = if kind == "impl_item" {
node.child_by_field_name("type")
} else {
node.child_by_field_name("name")
};
name_node.map(|n| {
let prefix = match kind {
"struct_item" => "struct ",
"enum_item" => "enum ",
"trait_item" => "trait ",
"impl_item" => "impl ",
_ => "fn ",
};
format!("{prefix}{}", n.utf8_text(source).unwrap_or(""))
})
}

Comment on lines +247 to +250
"function_declaration" => node
.child_by_field_name("name")
.map(|n| n.utf8_text(source).unwrap_or("").to_string()),
"class_declaration" => node
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with the other supported languages (Rust, Python, Go), consider adding a prefix like fn to JavaScript/TypeScript function declarations. Currently, they are added to the repository map without any prefix, which makes the output inconsistent.

        "js" | "jsx" | "ts" | "tsx" => match kind {
            "function_declaration" => node
                .child_by_field_name("name")
                .map(|n| format!("fn {}", n.utf8_text(source).unwrap_or(""))),

Comment on lines +263 to +266
(0..node.child_count())
.filter_map(|i| node.child(i))
.find(|c| c.kind() == "type_spec")
.and_then(|ts| ts.child_by_field_name("name"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Iterating over children using node.child(i) in a loop is an $O(N^2)$ operation in tree-sitter because child(i) performs a traversal from the start of the children list. It is significantly more efficient to use the children() iterator with a TreeCursor.

                .or_else(|| {
                    let mut cursor = node.walk();
                    node.children(&mut cursor)
                        .find(|c| c.kind() == "type_spec")
                        .and_then(|ts| ts.child_by_field_name("name"))
                })

Comment on lines +280 to +285
let mut cursor = node.walk();
for i in 0..node.child_count() {
if let Some(child) = node.child(i) {
collect_symbols(&child, source, ext, symbols);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cursor variable is currently unused, which will trigger a compiler warning. Additionally, using node.child(i) in a loop results in $O(N^2)$ complexity for nodes with many children (like the root node). Using the children() iterator is the idiomatic and efficient way to traverse the tree. You should also consider limiting recursion to container nodes (like classes or modules) and skipping function bodies to improve performance on large files.

    let mut cursor = node.walk();
    for child in node.children(&mut cursor) {
        collect_symbols(&child, source, ext, symbols);
    }

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
.github/pull_request_template.md (1)

10-13: Add Clippy to the checklist.

The current checklist stops at cargo test, so contributors can mark it complete without running the repo’s expected lint pass.

♻️ Suggested update
- - [ ] `cargo fmt --all && cargo test --workspace` passes
+ - [ ] `cargo fmt --all && cargo test --workspace && cargo clippy --workspace --all-targets` passes

Based on learnings, Run cargo test --workspace && cargo clippy --workspace --all-targets before committing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/pull_request_template.md around lines 10 - 13, Update the PR
checklist to include running Clippy: add a new checkbox alongside the existing
`cargo fmt --all && cargo test --workspace` entry to require `cargo clippy
--workspace --all-targets` (or combine into a single checklist item such as
`cargo test --workspace && cargo clippy --workspace --all-targets`) so
contributors must run the lint pass before marking the checklist complete;
modify the checklist in .github/pull_request_template.md near the `cargo
fmt`/`cargo test` lines and ensure the wording matches the repo’s CI invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mc/crates/mc-core/src/repo_map.rs`:
- Around line 214-216: collect_symbols() currently walks the full subtree so
nested declarations are being indexed; replace the call that builds symbols
(collect_symbols(&root, bytes, ext, &mut symbols)) with a top-level-only
collection: either add a depth/only_top_level flag to collect_symbols or
implement a new collect_top_level_symbols(root, bytes, ext, &mut symbols) that
only emits children immediately under root (i.e., does not recurse into child
blocks/classes/functions), then use that to populate symbols (and do the same
replacement for the other occurrences referenced in the diff) so only file-level
declarations are included before truncation to 30 entries.
- Around line 280-285: The unused variable `cursor` created by `node.walk()`
should be removed to avoid a warning/error; delete the `let mut cursor =
node.walk();` line and keep the loop that iterates over `node.child_count()` and
calls `collect_symbols(&child, source, ext, symbols)` as-is so `node`,
`collect_symbols`, `node.child_count`, and `node.child` remain unchanged.
- Around line 198-206: The extract_symbols_treesitter function currently maps
both "ts" and "tsx" to tree_sitter_typescript::LANGUAGE_TYPESCRIPT which
misparses JSX; update the match arm so "ts" uses
tree_sitter_typescript::LANGUAGE_TYPESCRIPT and "tsx" uses
tree_sitter_typescript::LANGUAGE_TSX (or the equivalent TSX constant from the
crate) so .tsx files get the TSX grammar; adjust the match clause that handles
"ts" | "tsx" into two separate arms and ensure the LANGUAGE_TSX symbol is
imported/available.

---

Nitpick comments:
In @.github/pull_request_template.md:
- Around line 10-13: Update the PR checklist to include running Clippy: add a
new checkbox alongside the existing `cargo fmt --all && cargo test --workspace`
entry to require `cargo clippy --workspace --all-targets` (or combine into a
single checklist item such as `cargo test --workspace && cargo clippy
--workspace --all-targets`) so contributors must run the lint pass before
marking the checklist complete; modify the checklist in
.github/pull_request_template.md near the `cargo fmt`/`cargo test` lines and
ensure the wording matches the repo’s CI invocation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8333364-f38e-45d5-86a3-89bd2e100e81

📥 Commits

Reviewing files that changed from the base of the PR and between 6a6ea88 and 6785b7e.

⛔ Files ignored due to path filters (1)
  • mc/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • .github/ISSUE_TEMPLATE/bug_report.yml
  • .github/ISSUE_TEMPLATE/feature_request.yml
  • .github/pull_request_template.md
  • SECURITY.md
  • mc/crates/mc-cli/Cargo.toml
  • mc/crates/mc-cli/src/main.rs
  • mc/crates/mc-core/Cargo.toml
  • mc/crates/mc-core/src/repo_map.rs

Comment on lines +214 to +216
let mut symbols = Vec::new();
collect_symbols(&root, bytes, ext, &mut symbols);
Some(symbols)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This now indexes nested declarations, not just top-level symbols.

collect_symbols() walks the entire subtree, so nested Python defs/classes and other non-file-level declarations are emitted the same way as top-level items. Because Line 194 truncates to 30 entries, nested symbols can crowd out later top-level declarations and change ranking quality.

Also applies to: 219-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mc/crates/mc-core/src/repo_map.rs` around lines 214 - 216, collect_symbols()
currently walks the full subtree so nested declarations are being indexed;
replace the call that builds symbols (collect_symbols(&root, bytes, ext, &mut
symbols)) with a top-level-only collection: either add a depth/only_top_level
flag to collect_symbols or implement a new collect_top_level_symbols(root,
bytes, ext, &mut symbols) that only emits children immediately under root (i.e.,
does not recurse into child blocks/classes/functions), then use that to populate
symbols (and do the same replacement for the other occurrences referenced in the
diff) so only file-level declarations are included before truncation to 30
entries.

Comment on lines +222 to +235
"rs" => match kind {
"function_item" | "struct_item" | "enum_item" | "trait_item" | "impl_item" => {
node.child_by_field_name("name").map(|n| {
let prefix = match kind {
"struct_item" => "struct ",
"enum_item" => "enum ",
"trait_item" => "trait ",
"impl_item" => "impl ",
_ => "fn ",
};
format!("{prefix}{}", n.utf8_text(source).unwrap_or(""))
})
}
_ => None,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd mc/crates/mc-core/src && head -n 250 repo_map.rs | tail -n +200

Repository: kienbui1995/mc-code

Length of output: 2106


🏁 Script executed:

cd mc/crates/mc-core/src && grep -B 10 -A 20 "collect_symbols" repo_map.rs | head -n 50

Repository: kienbui1995/mc-code

Length of output: 1621


🏁 Script executed:

cd mc && git log --oneline --all -- "crates/mc-core/src/repo_map.rs" | head -20

Repository: kienbui1995/mc-code

Length of output: 441


🏁 Script executed:

cd mc && git show 6785b7e:crates/mc-core/src/repo_map.rs | head -n 300 | tail -n +220

Repository: kienbui1995/mc-code

Length of output: 270


🏁 Script executed:

cd mc && git show 6785b7e:mc/crates/mc-core/src/repo_map.rs | sed -n '220,250p'

Repository: kienbui1995/mc-code

Length of output: 1393


🏁 Script executed:

cd mc && git show 91a5b7b:mc/crates/mc-core/src/repo_map.rs 2>/dev/null | head -n 50 || echo "File not found in that commit"

Repository: kienbui1995/mc-code

Length of output: 1654


🏁 Script executed:

cd mc && git show 91a5b7b:mc/crates/mc-core/src/repo_map.rs | grep -A 50 "fn extract_symbols" | head -n 60

Repository: kienbui1995/mc-code

Length of output: 1638


🏁 Script executed:

cd mc && web_search "tree-sitter rust grammar impl_item name field"

Repository: kienbui1995/mc-code

Length of output: 113


🏁 Script executed:

python3 <<'EOF'
import json
import subprocess

# Check if tree-sitter-cli or docs are available
result = subprocess.run(["find", ".", "-name", "*tree*sitter*", "-type", "d"], 
                       capture_output=True, text=True, timeout=5)
print("Tree-sitter directories found:")
print(result.stdout[:500])
EOF

Repository: kienbui1995/mc-code

Length of output: 94


🌐 Web query:

tree-sitter rust grammar impl_item name field structure

💡 Result:

In the tree-sitter-rust grammar (https://github.com/tree-sitter/tree-sitter-rust), "impl_item" is a named node type representing an impl block in Rust source code, such as "impl Trait for Type { ... }". The exact field structure (named children like "name") is defined in the grammar.js file and reflected in src/node-types.json, which lists node types, subtypes, fields, and children for the generated parser. Node types JSON follows the standard Tree-sitter format with "type", "named", "fields" (named children), and "children" (anonymous/positional children). To access fields in Rust API: - Use node.child_by_field_name("name") to get the named child node for the impl name. - Fields are assigned in grammar rules using field("name", $.identifier). The grammar repository contains grammar.js (rules), grammar.json (compiled), and node-types.json (structure). tree-sitter-rust crate exports NODE_TYPES constant with this JSON. Crates like tss-rust generate enums from it, confirming "impl_item" exists among 280+ node types. Example usage: let mut parser = Parser::new; parser.set_language(&tree_sitter_rust::LANGUAGE).unwrap; let tree = parser.parse(r#" impl MyTrait for MyType { fn method(&self) {} } "#, None).unwrap; let impl_node = tree.root_node.child_by_field_name("item").unwrap.child_by_field_name("impl_item").unwrap; // Adjust path let name_node = impl_node.child_by_field_name("name").unwrap; Precise fields require viewing node-types.json in repo src/.

Citations:


🏁 Script executed:

cd mc && grep -r "tree-sitter-rust" Cargo.toml

Repository: kienbui1995/mc-code

Length of output: 45


🏁 Script executed:

cd mc && find . -name "Cargo.toml" -exec grep -l "tree" {} \;

Repository: kienbui1995/mc-code

Length of output: 90


🏁 Script executed:

cd mc && cat crates/mc-core/Cargo.toml | grep -A 5 -B 5 "tree-sitter"

Repository: kienbui1995/mc-code

Length of output: 397


🌐 Web query:

tree-sitter-rust impl_item node-types.json name field

💡 Result:

In the tree-sitter-rust grammar, the impl_item node represents items declared within an impl block (e.g., fn, const, type aliases inside impl Trait for Type { ... }). The node-types.json file, generated from the grammar.js, defines the structure of all nodes including impl_item. While the exact content of tree-sitter-rust's node-types.json for impl_item is not directly quoted in search results, Tree-sitter node definitions follow a standard format where named fields like "name" are explicitly listed in the "fields" object if defined in the grammar using field("name", rule). To access the "name" field of an impl_item node in Rust code: - Use node.child_by_field_name("name") or node.field("name").child to get the child node representing the name (typically an identifier). - First confirm the field exists via language.field_id_for_name("name"), where language is tree_sitter_rust::LANGUAGE. The tss-rust crate generates type-safe enums from node-types.json, confirming impl_item exists among 280+ node types with metadata like fields accessible via features like "meta_fields". Example from similar grammars (e.g., method_definition) shows fields like "name": {"multiple": false, "required": true, "types": [{"type": "identifier", "named": true}] }. The grammar.js (at https://github.com/tree-sitter/tree-sitter-rust/blob/master/src/grammar.js) likely defines impl_item with field("name", ...) for the item name, standard for Rust item nodes (function_item, struct_item, etc.).

Citations:


🏁 Script executed:

cd mc && python3 <<'EOF'
# Let's check what tree-sitter-rust version 0.23 defines for impl_item
# by looking at the crate documentation or source
import subprocess
import json

# Try to find tree-sitter docs or check the grammar directly
result = subprocess.run(
    ["rg", "impl_item", ".", "-t", "rs", "-A", "3", "-B", "1"],
    capture_output=True,
    text=True,
    timeout=10
)
print("References to impl_item in the codebase:")
print(result.stdout[:2000] if result.stdout else "No matches")
EOF

Repository: kienbui1995/mc-code

Length of output: 114


🏁 Script executed:

cd mc && python3 <<'EOF'
# Check if we can inspect tree-sitter-rust node types at runtime
# by looking at what the actual grammar defines

result = subprocess.run(
    ["rg", "\"impl", ".", "--max-count", "20"],
    capture_output=True,
    text=True,
    timeout=5
)

print("Examples of impl usage:")
print(result.stdout[:1500] if result.stdout else "No examples")
EOF

Repository: kienbui1995/mc-code

Length of output: 179


🏁 Script executed:

cd mc && find . -name "*.rs" -type f -exec grep -l "impl " {} \; | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && head -50 {}'

Repository: kienbui1995/mc-code

Length of output: 9082


🌐 Web query:

tree-sitter-rust grammar.js impl_item field definition

💡 Result:

The tree-sitter-rust grammar defines impl_item as a non-terminal node in its JavaScript grammar file (grammar.js) located at https://github.com/tree-sitter/tree-sitter-rust/blob/master/src/grammar.js. This node represents Rust's impl blocks, such as impl Trait for Type { ... } or inherent implementations. From the highlights.scm query file, impl is highlighted as a keyword, and impl_item is referenced in usage examples like Emacs tree-sitter configurations for identifying impl blocks (e.g., (rust-mode . (impl_item))). The grammar.js uses Tree-sitter's DSL with functions like seq, choice, field, etc., to define impl_item. Likely structure (inferred from typical Tree-sitter Rust grammar and DSL docs): impl_item: $ => choice( field('inherent', $.inherent_impl_item), field('trait', $.trait_impl_item) ), where inherent_impl_item might be seq('impl', optional($.type_parameters), optional($.generics), '{', repeat($.impl_item_list), '}'), and trait_impl_item includes 'for' clause, etc. Fields would use field('name', ...) for the impl name/type, field('trait', ...) for trait name, etc. The repo's latest release is v0.24.2 (2026-03-27), confirming it's actively maintained. node-types.json lists all node types including impl_item with its fields. To view exact definition, check grammar.js directly in the repo.

Citations:


Rust impl extraction may silently drop symbols if Tree-sitter impl_item lacks a stable name field.

The previous regex-based implementation (commit 91a5b7b) extracted impl symbols by matching the "impl " prefix and splitting on delimiters, capturing all impl blocks. The current Tree-sitter-based approach at lines 222–235 calls node.child_by_field_name("name") on impl_item nodes. If this field is absent, optional, or varies across impl variants (inherent impls, generic impls, trait impls), child_by_field_name() returns None, causing the .map() operation to return None without adding a symbol to the vec. This is a silent behavioral change from the old implementation. Verify that all Rust impl block variants reliably expose a name field in the tree-sitter-rust grammar, or add fallback extraction logic to handle impl blocks without this field.

… tsx grammar

1. PR template: add clippy to checklist
2. Remove unused cursor variable in collect_symbols
3. TSX uses LANGUAGE_TSX (was using LANGUAGE_TYPESCRIPT)

274 tests, 0 fail.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
mc/crates/mc-core/src/repo_map.rs (2)

224-235: ⚠️ Potential issue | 🟠 Major

Rust impl blocks may be skipped entirely here.

This branch only emits an impl symbol when child_by_field_name("name") succeeds, but impl blocks do not have a simple identifier in source the way struct or fn do. That makes this a likely regression from the old regex path, which did index impl ... lines. Please verify the pinned tree-sitter-rust grammar and fall back to the target type / trait when no name field exists.

#!/bin/bash
set -euo pipefail

version=$(python3 - <<'PY'
import pathlib, re
text = pathlib.Path("mc/crates/mc-core/Cargo.toml").read_text()
m = re.search(r'tree-sitter-rust\s*=\s*"(?:[=^~])?([^"]+)"', text)
print(m.group(1) if m else "")
PY
)

for ref in "v${version}" "${version}" "master"; do
  url="https://raw.githubusercontent.com/tree-sitter/tree-sitter-rust/${ref}/src/node-types.json"
  if curl -fsSL "$url" -o /tmp/tree-sitter-rust-node-types.json; then
    echo "Fetched $url"
    break
  fi
done

python3 - <<'PY'
import json

with open("/tmp/tree-sitter-rust-node-types.json") as fh:
    nodes = json.load(fh)

impl_item = next(node for node in nodes if node.get("type") == "impl_item")
print("impl_item fields:", sorted(impl_item.get("fields", {}).keys()))
print("impl_item children types:", [t["type"] for t in impl_item.get("children", {}).get("types", [])])
PY

Expected result: if name is not listed in impl_item fields, the current lookup can never produce an impl symbol.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mc/crates/mc-core/src/repo_map.rs` around lines 224 - 235, The impl_item
branch in repo_map.rs only emits a symbol when child_by_field_name("name")
exists, which misses impl blocks; update that branch so when kind == "impl_item"
and node.child_by_field_name("name") is None you instead look up the impl's
target children (e.g. child_by_field_name("type") and
child_by_field_name("trait") or similarly named fields from the tree-sitter-rust
grammar) and build a fallback symbol (examples: "impl <Trait> for <Type>" or
"impl <Type>") by extracting utf8_text(source) from those child nodes; ensure
you still fall back to the existing "fn"/"struct"/etc. behavior for other kinds
and handle missing children gracefully.

220-285: ⚠️ Potential issue | 🟠 Major

Keep symbol collection at file scope.

collect_symbols() walks the full subtree, so nested Python defs/classes and Rust items inside impl blocks are emitted the same way as top-level declarations. Because the list is truncated later, nested items can crowd out later file-level symbols and materially change ranking quality. Limit this to root-level declarations, with explicit handling only for wrapper nodes like exports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mc/crates/mc-core/src/repo_map.rs` around lines 220 - 285, collect_symbols
currently descends the entire subtree so nested defs (e.g. Python
functions/classes inside classes or Rust items inside impls) get emitted; change
it to only emit top-level declarations by checking node.parent() (or by only
iterating the root node's direct children) and skip adding symbols for nodes
whose parent is not the file/module root, but still recurse into explicit
wrapper nodes like "export_statement", module/file nodes, or other known
wrappers so their direct children are considered; update collect_symbols (and
any callers) to perform this root-level check (or add a new helper that handles
a single-level scan) and keep handling of ext-specific kinds
("function_definition", "class_definition", "impl_item", "function_item", etc.)
as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mc/crates/mc-core/src/repo_map.rs`:
- Around line 190-191: The Tree-sitter branch currently discards TypeScript
interface-only declarations because symbols is assigned from
extract_symbols_treesitter(ext, &content).unwrap_or_else(||
extract_symbols_regex(&content)); — either update extract_symbols_treesitter to
also collect `export interface` nodes (handle TS/TSX
InterfaceDeclaration/TSInterfaceDeclaration in the AST traversal used there) or,
alternatively, call extract_symbols_regex(&content) and merge its results into
the Tree-sitter result when present (e.g., if extract_symbols_treesitter returns
Some(vec), extend that vec with any missing regex-only symbols before assigning
to symbols); also apply the same merge/collection logic to the other similar
block referenced (the code around the 247-255 region) so interface-only files
aren’t dropped.

---

Duplicate comments:
In `@mc/crates/mc-core/src/repo_map.rs`:
- Around line 224-235: The impl_item branch in repo_map.rs only emits a symbol
when child_by_field_name("name") exists, which misses impl blocks; update that
branch so when kind == "impl_item" and node.child_by_field_name("name") is None
you instead look up the impl's target children (e.g. child_by_field_name("type")
and child_by_field_name("trait") or similarly named fields from the
tree-sitter-rust grammar) and build a fallback symbol (examples: "impl <Trait>
for <Type>" or "impl <Type>") by extracting utf8_text(source) from those child
nodes; ensure you still fall back to the existing "fn"/"struct"/etc. behavior
for other kinds and handle missing children gracefully.
- Around line 220-285: collect_symbols currently descends the entire subtree so
nested defs (e.g. Python functions/classes inside classes or Rust items inside
impls) get emitted; change it to only emit top-level declarations by checking
node.parent() (or by only iterating the root node's direct children) and skip
adding symbols for nodes whose parent is not the file/module root, but still
recurse into explicit wrapper nodes like "export_statement", module/file nodes,
or other known wrappers so their direct children are considered; update
collect_symbols (and any callers) to perform this root-level check (or add a new
helper that handles a single-level scan) and keep handling of ext-specific kinds
("function_definition", "class_definition", "impl_item", "function_item", etc.)
as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ee596355-a2e8-47b0-8550-8624831c4db2

📥 Commits

Reviewing files that changed from the base of the PR and between 6785b7e and c7e427a.

📒 Files selected for processing (2)
  • .github/pull_request_template.md
  • mc/crates/mc-core/src/repo_map.rs
✅ Files skipped from review due to trivial changes (1)
  • .github/pull_request_template.md

Comment on lines +190 to +191
let symbols = extract_symbols_treesitter(ext, &content)
.unwrap_or_else(|| extract_symbols_regex(&content));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

TypeScript interfaces drop out of indexing on the AST path.

The regex fallback below still extracts export interface ..., but supported .ts/.tsx files never reach that path once Tree-sitter parsing succeeds. As written, interface-only files now return no symbols, which is a real search-quality regression for TS-heavy repos. Please either collect interfaces in the AST branch too, or merge in the regex-only constructs before truncating.

Also applies to: 247-255

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mc/crates/mc-core/src/repo_map.rs` around lines 190 - 191, The Tree-sitter
branch currently discards TypeScript interface-only declarations because symbols
is assigned from extract_symbols_treesitter(ext, &content).unwrap_or_else(||
extract_symbols_regex(&content)); — either update extract_symbols_treesitter to
also collect `export interface` nodes (handle TS/TSX
InterfaceDeclaration/TSInterfaceDeclaration in the AST traversal used there) or,
alternatively, call extract_symbols_regex(&content) and merge its results into
the Tree-sitter result when present (e.g., if extract_symbols_treesitter returns
Some(vec), extend that vec with any missing regex-only symbols before assigning
to symbols); also apply the same merge/collection logic to the other similar
block referenced (the code around the 247-255 region) so interface-only files
aren’t dropped.

@kienbui1995 kienbui1995 merged commit 6fea2f8 into main Apr 13, 2026
9 checks passed
@kienbui1995 kienbui1995 deleted the feat/tree-sitter-gh-templates branch April 13, 2026 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant