@waynesun09
Contributor

Summary

Fixes token truncation behavior in the MCP search_code tool to return partial content instead of discarding files completely when token limits are exceeded.

Problem

Currently, when a search result file would push the response past the maxTokens limit, the entire file is discarded and only a truncation message is returned. When the first file alone exceeds the limit, the user receives no file content at all.

Example failure scenario:

  • User requests search with maxTokens=6000
  • First result file contains 10K tokens
  • Current behavior: Returns ONLY truncation message, no file data
  • Result: User gets no information at all

Solution

Modified the truncation logic in packages/mcp/src/index.ts to:

  1. Calculate remaining token budget before breaking the loop
  2. If meaningful space remains (>100 tokens), truncate the file content to fit
  3. Append a clear truncation marker: ...[content truncated due to token limit]
  4. Add the truncated content to results
  5. Continue to add the overall truncation message at the end
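
The five steps above can be sketched as a self-contained function. This is a minimal sketch, not the code from packages/mcp/src/index.ts: `collectWithinBudget`, `estimateTokens`, the `TextContent` shape, and the wording of the final message are assumptions made for illustration. Only the >100-token threshold, the chars/4 approximation, and the marker text come from this PR.

```typescript
// Hypothetical shapes and names for illustration; the real code may differ.
interface TextContent {
    type: "text";
    text: string;
}

const TRUNCATION_MARKER = "\n\n...[content truncated due to token limit]";
const MIN_USEFUL_TOKENS = 100; // threshold from step 2
const CHARS_PER_TOKEN = 4;     // chars/4 approximation

// Rough token estimate: about one token per four characters (assumed rounding).
function estimateTokens(text: string): number {
    return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function collectWithinBudget(
    texts: string[],
    maxTokens: number,
): { content: TextContent[]; isResponseTruncated: boolean } {
    const content: TextContent[] = [];
    let totalTokens = 0;
    let isResponseTruncated = false;

    for (const text of texts) {
        const tokens = estimateTokens(text);

        if (totalTokens + tokens > maxTokens) {
            // Step 1: work out what is left before giving up on this file.
            const remainingTokens = maxTokens - totalTokens;

            // Steps 2-4: keep a prefix only if enough space remains.
            if (remainingTokens > MIN_USEFUL_TOKENS) {
                const maxLength = Math.floor(remainingTokens * CHARS_PER_TOKEN);
                content.push({
                    type: "text",
                    text: text.substring(0, maxLength) + TRUNCATION_MARKER,
                });
            }

            isResponseTruncated = true;
            break;
        }

        totalTokens += tokens;
        content.push({ type: "text", text });
    }

    // Step 5: overall truncation message (the wording here is assumed).
    if (isResponseTruncated) {
        content.push({
            type: "text",
            text: `Response truncated to stay within ${maxTokens} tokens.`,
        });
    }

    return { content, isResponseTruncated };
}
```

Called with a single 40,000-character file and maxTokens=6000, the sketch returns two entries: the first 24,000 characters of the file followed by the marker, then the overall truncation message.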

Changes

File: packages/mcp/src/index.ts (lines 125-142)

Before:

if ((totalTokens + tokens) > maxTokens) {
    isResponseTruncated = true;
    break;  // Discards the file completely
}

After:

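if ((totalTokens + tokens) > maxTokens) {
    // Tokens still available before the limit is reached
    const remainingTokens = maxTokens - totalTokens;

    // Only include a partial file when a meaningful amount of space is left
    if (remainingTokens > 100) {
        // Convert the token budget to characters (~4 characters per token)
        const maxLength = Math.floor(remainingTokens * 4);
        const truncatedText = text.substring(0, maxLength) +
            "\n\n...[content truncated due to token limit]";

        content.push({
            type: "text",
            text: truncatedText,
        });
    }

    isResponseTruncated = true;
    break;
}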
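
One detail in the snippet above: the marker is appended after the cut, so the pushed text runs a few dozen characters past the remaining budget. If a strict bound matters, a variant could reserve room for the marker first. The helper below is a hypothetical sketch of that idea (`truncateToBudget` is not part of this PR), assuming the same chars/4 estimate.

```typescript
const TRUNCATION_MARKER = "\n\n...[content truncated due to token limit]";
const CHARS_PER_TOKEN = 4; // chars/4 approximation

// Hypothetical helper, not part of the PR: returns a prefix of `text` plus the
// marker, sized so the combined string fits within `remainingTokens` under the
// chars/4 estimate. If the budget is smaller than the marker, only the marker
// is returned.
function truncateToBudget(text: string, remainingTokens: number): string {
    const charBudget = Math.floor(remainingTokens * CHARS_PER_TOKEN);
    const prefixLength = Math.max(0, charBudget - TRUNCATION_MARKER.length);
    return text.substring(0, prefixLength) + TRUNCATION_MARKER;
}
```

For the failure scenario described earlier (maxTokens=6000 and a first file of roughly 10K tokens, about 40,000 characters), this yields a string of exactly 24,000 characters ending in the marker.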

Benefits

✅ Users receive partial data instead of nothing
✅ Better debugging and analysis experience
✅ More useful for AI-powered code analysis workflows
✅ Consistent with expected truncation behavior
✅ Maintains backward compatibility (still includes truncation message)

Example Impact

Scenario: Search returns 50 files, each ~2K tokens, with maxTokens=11000

Before: Returns first 5 complete files (10K tokens), discards file #6 completely
After: Returns first 5 complete files + truncated version of file #6 (the remaining ~1K tokens, about 4K characters)
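
A scenario like this can be checked with a small calculator that works on token counts only. This is a sketch under assumptions: `planBudget` and its return shape are hypothetical names, and the 100-token threshold is taken from the Solution section.

```typescript
// Hypothetical helper for reasoning about a scenario; not part of the PR.
function planBudget(
    fileTokens: number[],
    maxTokens: number,
    minUsefulTokens = 100,
): { completeFiles: number; partialTokens: number } {
    let totalTokens = 0;
    let completeFiles = 0;

    for (const tokens of fileTokens) {
        if (totalTokens + tokens > maxTokens) {
            const remaining = maxTokens - totalTokens;
            // A partial file is only emitted when more than the threshold remains.
            return {
                completeFiles,
                partialTokens: remaining > minUsefulTokens ? remaining : 0,
            };
        }
        totalTokens += tokens;
        completeFiles++;
    }

    // Everything fit; nothing was truncated.
    return { completeFiles, partialTokens: 0 };
}

const files = new Array<number>(50).fill(2000);
console.log(planBudget(files, 11000)); // { completeFiles: 5, partialTokens: 1000 }
```

If every file were exactly 2,000 tokens and maxTokens were 10000, five files would consume the whole budget, nothing would remain, and no partial file would be added.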

Testing

  • ✅ Verified token calculation logic (chars/4 approximation)
  • ✅ Tested with various token limits (100, 1000, 10000, 150000)
  • ✅ Confirmed truncation marker appears correctly
  • ✅ Validated backward compatibility (truncation message still appended)

Related Issues

This fix specifically addresses issues observed in AI agent workflows, where large files among the early results hit the token limit and leave analysis tasks with no usable data.


Commit: c5b8fda
Branch: fix/mcp-search-truncation

@coderabbitai

coderabbitai bot commented Nov 6, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@waynesun09 waynesun09 force-pushed the fix/mcp-search-truncation branch from c5b8fda to e9903f4 Compare November 7, 2025 16:50
@brendan-kellam
Contributor

LGTM, this approach makes sense. Thanks for the contribution 👍 Left one comment, once resolved happy to merge

…_code

When search results exceed maxTokens limit, now returns partial truncated
content instead of discarding the file completely.

Changes:
- Calculate remaining token budget before breaking
- Truncate file content to fit within remaining tokens (if > 100 tokens left)
- Append truncation marker to indicate content was cut off
- Still add truncation message at end of all results

Benefits:
- Users get partial data instead of nothing
- Better debugging and analysis experience
- More useful for AI-powered code analysis tasks
- Consistent with expected behavior when limits are reached

Example: If file would use 10K tokens but only 2K remain, return
first ~8K chars of content + truncation marker instead of dropping it.

Signed-off-by: Wayne Sun <gsun@redhat.com>
@waynesun09 waynesun09 force-pushed the fix/mcp-search-truncation branch from d99c047 to a1da34a Compare November 7, 2025 18:24
@brendan-kellam brendan-kellam merged commit 278c0dc into sourcebot-dev:main Nov 10, 2025
5 checks passed