feat: codebase_search — symbol-aware code retrieval#38

Merged
kienbui1995 merged 1 commit into main from feat/smart-context-retrieval
Apr 12, 2026

@kienbui1995 (Owner) commented Apr 12, 2026

Smart Context Retrieval

New codebase_search tool for agents to find relevant code:

{"query": "authentication handler config", "max_results": 5}

Returns:

📄 src/auth.rs (score: 8.0)
   fn authenticate, struct AuthConfig
📄 src/handler.rs (score: 3.0)
   fn handle_request

How it works

  • Builds symbol index from repo (fn, struct, class, trait, impl, def)
  • Multi-term, TF-IDF-style scoring with weighted matches: filename (3x), symbol (2x), path (1x)
  • No external dependencies (no embedding server needed)
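The weighting described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FileEntry`, the case-insensitive substring matching, and the rule that a filename hit replaces the path hit for the same term are all assumptions, and the sketch does not try to reproduce the exact scores in the example output.

```rust
/// One indexed file: its path plus the symbol names extracted from it.
/// (Hypothetical shape; the PR does not show RepoMap's internals.)
pub struct FileEntry {
    pub path: String,
    pub symbols: Vec<String>,
}

/// A ranked hit with the fields the PR describes: path, symbols, score.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchResult {
    pub path: String,
    pub symbols: Vec<String>,
    pub score: f64,
}

/// Score every file against the lowercased query terms and return the
/// top `max_results` hits, highest score first.
pub fn search(files: &[FileEntry], query: &str, max_results: usize) -> Vec<SearchResult> {
    let terms: Vec<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
    let mut results: Vec<SearchResult> = Vec::new();

    for file in files {
        let path_lc = file.path.to_lowercase();
        // The file name is the last path component.
        let name_lc = path_lc.rsplit('/').next().unwrap_or("").to_string();
        let mut score = 0.0;
        let mut matched: Vec<String> = Vec::new();

        for term in &terms {
            if name_lc.contains(term.as_str()) {
                score += 3.0; // filename match (3x)
            } else if path_lc.contains(term.as_str()) {
                score += 1.0; // match elsewhere in the path (1x)
            }
            for sym in &file.symbols {
                if sym.to_lowercase().contains(term.as_str()) {
                    score += 2.0; // symbol match (2x)
                    if !matched.contains(sym) {
                        matched.push(sym.clone());
                    }
                }
            }
        }

        // Files that match no term are left out of the results.
        if score > 0.0 {
            results.push(SearchResult { path: file.path.clone(), symbols: matched, score });
        }
    }

    results.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
    results.truncate(max_results);
    results
}
```

With this sketch, a query term that appears in both a file name and its symbols accumulates weight from each, which is why files named after the concept being searched tend to rank first.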

Why

Before: agent had to grep blindly or read files one by one.
After: agent searches symbols first, then reads only relevant files.

192 tests pass.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added codebase search functionality that enables querying code repositories with relevance-based ranking. Results include matching file paths and code symbols with configurable result limits.
  • Tests

    • Added unit tests for search functionality validation.

New tool: codebase_search
- Searches codebase index for symbols (fn, struct, class, trait, impl)
  and file paths matching query terms
- TF-IDF-style scoring: filename match (3x), symbol match (2x), path match (1x)
- Returns ranked results with file paths and matching symbols
- Agent can find relevant code before reading files

Enhanced RepoMap:
- Added search() method with multi-term scoring
- Stored as struct (not String) so search is available at runtime
- to_prompt_section() still used for system prompt injection
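The design choice of keeping the map as a struct can be sketched as below. The field layout of `RepoMap`, the `build_system_prompt` helper, and the prompt text format are assumptions for illustration; only the `Option<RepoMap>` field, the `ConversationRuntime` name, and `to_prompt_section()` come from the PR.

```rust
/// Hypothetical minimal RepoMap: (path, symbol names) pairs.
pub struct RepoMap {
    pub files: Vec<(String, Vec<String>)>,
}

impl RepoMap {
    /// Render the map as text for the system prompt.
    /// The output format here is an assumption.
    pub fn to_prompt_section(&self) -> String {
        let mut out = String::from("## Repository map\n");
        for (path, symbols) in &self.files {
            out.push_str(&format!("{}: {}\n", path, symbols.join(", ")));
        }
        out
    }
}

pub struct ConversationRuntime {
    /// Stored as the struct itself (previously Option<String>), so the
    /// same instance can also serve search queries at runtime.
    pub repo_map: Option<RepoMap>,
}

impl ConversationRuntime {
    /// Hypothetical helper: the text form is produced only when the
    /// prompt is built, instead of being cached as a String.
    pub fn build_system_prompt(&self, base: &str) -> String {
        match &self.repo_map {
            Some(map) => format!("{}\n\n{}", base, map.to_prompt_section()),
            None => base.to_string(),
        }
    }
}
```

Rendering at request-build time costs a little extra formatting per request, but it keeps a single source of truth for both the prompt text and the search index.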

Tests: 190→192 (search_finds_symbols, search_ranks_by_relevance)
coderabbitai bot commented Apr 12, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b354423d-1b41-474b-bd05-4afeeb38973d

📥 Commits

Reviewing files that changed from the base of the PR and between be049af and c88feaa.

📒 Files selected for processing (4)
  • mc/crates/mc-core/src/repo_map.rs
  • mc/crates/mc-core/src/runtime.rs
  • mc/crates/mc-tools/src/registry.rs
  • mc/crates/mc-tools/src/spec.rs

📝 Walkthrough

The changes implement a new codebase search feature by adding a search method to RepoMap that tokenizes queries and ranks files by relevance (with higher weight for filename matches), introduces a SearchResult struct to return matched files with associated symbols, and exposes this functionality as a new "codebase_search" tool in the conversation runtime's tool dispatcher.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Search Implementation (mc/crates/mc-core/src/repo_map.rs) | Added public search() method that tokenizes queries, scores files by substring matches in paths and symbol names, filters and sorts results by relevance, and returns a Vec<SearchResult>. Introduced new public SearchResult struct with path, symbols, and relevance score fields. Includes unit tests for symbol finding and relevance ranking. |
| Tool Integration (mc/crates/mc-core/src/runtime.rs) | Changed repo_map field type from Option<String> to Option<RepoMap> to store the built map directly. Added "codebase_search" tool handler in dispatch_tool that accepts query and max_results parameters, calls the search method, and returns formatted results. Updated prompt construction to convert RepoMap to a prompt section at request-build time. |
| Tool Registry (mc/crates/mc-tools/src/registry.rs, mc/crates/mc-tools/src/spec.rs) | Added new "codebase_search" ToolSpec with schema requiring query string and optional max_results integer. Updated registry test to expect 27 tools instead of 26. |
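For readers unfamiliar with how such a tool is declared, the registry entry summarized above might look roughly like this. The `ToolSpec` fields shown here, the description text, and the `codebase_search_spec` function are hypothetical stand-ins; the project's real type in mc-tools may differ. Only the tool name and the required `query` / optional `max_results` inputs come from the summary.

```rust
/// Hypothetical stand-in for the project's ToolSpec type.
pub struct ToolSpec {
    pub name: &'static str,
    pub description: &'static str,
    /// JSON Schema for the tool input, kept as raw text in this sketch.
    pub input_schema: &'static str,
}

/// Builds the spec for the search tool: `query` is required,
/// `max_results` is optional because it is not listed under "required".
pub fn codebase_search_spec() -> ToolSpec {
    ToolSpec {
        name: "codebase_search",
        description: "Search the repo index for files and symbols matching the query terms.",
        input_schema: r#"{
  "type": "object",
  "properties": {
    "query": { "type": "string" },
    "max_results": { "type": "integer" }
  },
  "required": ["query"]
}"#,
    }
}
```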

Sequence Diagram

sequenceDiagram
    participant User
    participant Runtime as ConversationRuntime
    participant Dispatcher as Tool Dispatcher
    participant RepoMap
    participant Results

    User->>Runtime: Call "codebase_search" tool<br/>(query, max_results)
    activate Runtime
    Runtime->>Dispatcher: dispatch_tool("codebase_search", input)
    activate Dispatcher
    Dispatcher->>Dispatcher: Parse query & max_results<br/>from input JSON
    alt repo_map initialized
        Dispatcher->>RepoMap: search(query, max_results)
        activate RepoMap
        RepoMap->>RepoMap: Tokenize query<br/>Score files by relevance<br/>Collect matching symbols
        RepoMap-->>Dispatcher: Vec<SearchResult>
        deactivate RepoMap
        Dispatcher->>Dispatcher: Format results with<br/>paths and symbols
        Dispatcher-->>Results: Formatted matches
    else repo_map not initialized
        Dispatcher-->>Results: "Repo map not initialized"<br/>(is_error: true)
    end
    deactivate Dispatcher
    Runtime-->>User: Tool response
    deactivate Runtime
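The two branches of the diagram can be sketched in Rust as follows. This is an illustration under assumptions, not the handler from runtime.rs: `ToolOutput`, `dispatch_codebase_search`, the default of 5 results, the "No matches found." message, and the placeholder ranking inside `RepoMap::search` are all hypothetical, and the input is taken as already-parsed arguments rather than JSON to avoid external crates.

```rust
/// Minimal stand-ins so the sketch compiles on its own.
#[derive(Debug, Clone)]
pub struct SearchResult {
    pub path: String,
    pub symbols: Vec<String>,
    pub score: f64,
}

pub struct RepoMap {
    pub files: Vec<(String, Vec<String>)>, // (path, symbol names)
}

impl RepoMap {
    /// Placeholder ranking: one point per query term found in the path
    /// or in a symbol name. The real search() weights matches differently.
    pub fn search(&self, query: &str, max_results: usize) -> Vec<SearchResult> {
        let terms: Vec<String> = query.split_whitespace().map(|t| t.to_lowercase()).collect();
        let mut hits: Vec<SearchResult> = self
            .files
            .iter()
            .filter_map(|(path, symbols)| {
                let haystack = format!("{} {}", path, symbols.join(" ")).to_lowercase();
                let score = terms.iter().filter(|t| haystack.contains(t.as_str())).count() as f64;
                if score > 0.0 {
                    Some(SearchResult { path: path.clone(), symbols: symbols.clone(), score })
                } else {
                    None
                }
            })
            .collect();
        hits.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        hits.truncate(max_results);
        hits
    }
}

/// Hypothetical tool response shape with the is_error flag from the diagram.
pub struct ToolOutput {
    pub content: String,
    pub is_error: bool,
}

pub fn dispatch_codebase_search(
    repo_map: Option<&RepoMap>,
    query: &str,
    max_results: Option<usize>,
) -> ToolOutput {
    // Error branch: no map was built for this session.
    let map = match repo_map {
        Some(map) => map,
        None => {
            return ToolOutput { content: "Repo map not initialized".to_string(), is_error: true }
        }
    };

    // Assumed default when max_results is omitted.
    let hits = map.search(query, max_results.unwrap_or(5));
    if hits.is_empty() {
        // Assumed wording for the empty case.
        return ToolOutput { content: "No matches found.".to_string(), is_error: false };
    }

    // Format each hit as a path line followed by an indented symbol line.
    let mut out = String::new();
    for hit in &hits {
        out.push_str(&format!("📄 {} (score: {:.1})\n", hit.path, hit.score));
        if !hit.symbols.is_empty() {
            out.push_str(&format!("   {}\n", hit.symbols.join(", ")));
        }
    }
    ToolOutput { content: out, is_error: false }
}
```

Returning an error result instead of panicking when the map is missing lets the agent fall back to other tools in the same turn.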

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A search across the codebase wide,
With symbols found and ranked with pride,
The RepoMap now helps us seek,
Through paths and functions, peak to peak! ✨

@kienbui1995 kienbui1995 merged commit 91a5b7b into main Apr 12, 2026
4 of 5 checks passed
@kienbui1995 kienbui1995 deleted the feat/smart-context-retrieval branch April 12, 2026 14:57