prototype: SEP-1577 - Sampling With Tools #991
Draft
ochafik wants to merge 31 commits into main from ochafik/sep1577
Add comprehensive tool calling support to MCP sampling:
- New content types: ToolCallContent and ToolResultContent
- Split SamplingMessage into role-specific UserMessage/AssistantMessage
- Add ToolChoice schema for controlling tool usage behavior
- Update CreateMessageRequest with tools and tool_choice parameters
- Update CreateMessageResult with new stop reasons (toolUse, refusal, other)
- Enhance ClientCapabilities.sampling with context and tools sub-capabilities
- Mark includeContext as soft-deprecated in favor of explicit tools
- Add comprehensive unit tests (27 new test cases covering all new schemas)

All tests pass (47/47 in types.test.ts).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
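The shape of the additions in this commit can be sketched with plain TypeScript types (the SDK defines them as Zod schemas). This is an illustration only: the discriminator values (`tool_use`, `tool_result`), the field names (`id`, `name`, `input`, `toolUseId`), the `mode` values of `ToolChoice`, and the `isToolCall` helper are assumptions, not copied from the PR.

```typescript
// Sketch only: discriminators and field names are assumptions, not the PR's schemas.
type TextContent = { type: "text"; text: string };
type ToolCallContent = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultContent = { type: "tool_result"; toolUseId: string; content: Record<string, unknown> };

// Each role only accepts the content types that make sense for it.
type UserContent = TextContent | ToolResultContent;
type AssistantContent = TextContent | ToolCallContent;

// SamplingMessage split into role-specific variants.
type UserMessage = { role: "user"; content: UserContent };
type AssistantMessage = { role: "assistant"; content: AssistantContent };
type SamplingMessage = UserMessage | AssistantMessage;

// Assumed shape; the commit only says a ToolChoice schema controls tool usage.
type ToolChoice = { mode: "auto" | "required" | "none" };

// Existing MCP stop reasons plus the three added by this commit.
type StopReason = "endTurn" | "stopSequence" | "maxTokens" | "toolUse" | "refusal" | "other";

// Hypothetical type guard for picking tool calls out of a conversation.
function isToolCall(block: UserContent | AssistantContent): block is ToolCallContent {
  return block.type === "tool_use";
}

const conversation: SamplingMessage[] = [
  { role: "user", content: { type: "text", text: "Find TODO comments" } },
  { role: "assistant", content: { type: "tool_use", id: "call-1", name: "ripgrep", input: { pattern: "TODO" } } },
  { role: "user", content: { type: "tool_result", toolUseId: "call-1", content: { matches: 3 } } },
];
const toolCalls = conversation.map((m) => m.content).filter(isToolCall);
```

Splitting the message type by role lets the compiler reject, for example, a tool result inside an assistant message.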
…OpenAI APIs

Remove the non-standard isError field from ToolResultContentSchema. Errors should be represented in the content object itself, matching the standard behavior of Claude and OpenAI tool result APIs.

Updated tests to validate error content directly without isError flag.

All tests pass (47/47 in types.test.ts, 683/683 in full suite).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update backfillSampling.ts to support SEP-1577 tool calling:

**New Conversions:**
- MCP Tool → Claude API tool format (toolToClaudeFormat)
- MCP ToolChoice → Claude tool_choice (toolChoiceToClaudeFormat)
- Claude tool_use → MCP ToolCallContent (in contentToMcp)
- MCP ToolResultContent → Claude tool_result (in contentFromMcp)

**Message Handling:**
- Extract and convert tools/tool_choice from CreateMessageRequest
- Pass tools to Claude API messages.create
- Handle multi-content responses (prioritize tool_use over text)
- Map stop reasons: tool_use → toolUse, end_turn → endTurn, etc.

**Flow Support:**
The proxy now fully supports agentic tool calling loops:
1. Server sends request with tools
2. Claude responds with tool_use
3. Server executes tool and sends tool_result
4. Claude provides final answer

All conversions maintain type safety with proper MCP types (UserMessage, AssistantMessage, ToolCallContent, ToolResultContent).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
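Two of the conversions named in this commit are small enough to sketch. The function name `toolToClaudeFormat` comes from the commit message, but its body here is a guess at the obvious mapping (`inputSchema` to `input_schema`). `mapStopReason` is a hypothetical name, and falling back to `other` for unrecognized reasons is an assumption, since the commit only lists the first two mappings followed by "etc.".

```typescript
// Minimal local types so the sketch is self-contained; the real code would
// use the types from the MCP SDK and the Anthropic SDK instead.
type McpTool = { name: string; description?: string; inputSchema: Record<string, unknown> };
type ClaudeTool = { name: string; description?: string; input_schema: Record<string, unknown> };

// MCP Tool -> Claude API tool format: same data, snake_case schema key.
function toolToClaudeFormat(tool: McpTool): ClaudeTool {
  return { name: tool.name, description: tool.description, input_schema: tool.inputSchema };
}

// Claude stop_reason -> MCP stopReason (snake_case to camelCase).
const STOP_REASONS = new Map<string, string>([
  ["end_turn", "endTurn"],
  ["max_tokens", "maxTokens"],
  ["stop_sequence", "stopSequence"],
  ["tool_use", "toolUse"],
]);

// Hypothetical helper; the "other" fallback is an assumption.
function mapStopReason(reason: string | null): string {
  return STOP_REASONS.get(reason ?? "") ?? "other";
}
```

A `Map` is used rather than an object literal so that a lookup such as `toString` cannot accidentally hit an inherited property.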
…le search

Adds a comprehensive example showing how to use MCP sampling with a tool loop. The server exposes a fileSearch tool that uses an LLM with locally-defined ripgrep and read tools to intelligently search and read files.

Key features:
- Implements a full agentic tool loop pattern
- Uses systemPrompt parameter for proper LLM instruction
- Validates tool inputs using Zod schemas
- Ensures path safety with canonicalization and CWD constraints
- Demonstrates recursive tool use (LLM decides which tools to call)
- Proper error handling throughout the tool loop
- Includes iteration limit to prevent infinite loops

This example demonstrates SEP-1577 tool calling support in MCP sampling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
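The loop pattern described in this commit can be sketched roughly as follows. This is not the example's actual code: `Sampler` is a hypothetical stand-in for the server's sampling call, the block shapes are assumed, and the default limit of 10 iterations is arbitrary. Tool failures are returned to the model as content rather than thrown, in line with the error-handling bullet above.

```typescript
type TextBlock = { type: "text"; text: string };
type Block =
  | TextBlock
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: unknown };
type Message = { role: "user" | "assistant"; content: Block[] };
type SampleResult = { content: Block[]; stopReason: string };

// Hypothetical stand-in for the server's sampling request to the client.
type Sampler = (messages: Message[]) => Promise<SampleResult>;
type LocalTool = (input: unknown) => Promise<unknown>;

async function runToolLoop(
  query: string,
  sample: Sampler,
  tools: Map<string, LocalTool>,
  maxIterations = 10, // arbitrary default for this sketch
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: [{ type: "text", text: query }] }];
  for (let i = 0; i < maxIterations; i++) {
    const result = await sample(messages);
    messages.push({ role: "assistant", content: result.content });

    // Anything other than a tool request ends the loop with the text answer.
    if (result.stopReason !== "toolUse") {
      return result.content
        .filter((b): b is TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("\n");
    }

    // Run each requested tool and report failures back to the model as content.
    const results: Block[] = [];
    for (const block of result.content) {
      if (block.type !== "tool_use") continue;
      const tool = tools.get(block.name);
      let content: unknown;
      try {
        content = tool ? await tool(block.input) : { error: `Unknown tool: ${block.name}` };
      } catch (e) {
        content = { error: String(e) };
      }
      results.push({ type: "tool_result", toolUseId: block.id, content });
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error(`Tool loop exceeded ${maxIterations} iterations`);
}
```

Because the sampler is passed in, the loop can be exercised in tests with a scripted fake instead of a real LLM.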
Analyzed existing test files and examples to document:
- How to set up Client with StdioClientTransport for testing
- How to implement sampling request handlers
- Proper test structure and cleanup patterns
- Example code snippets for sampling handlers
- How to simulate a tool loop conversation in tests
- Common pitfalls and solutions

This analysis covers both unit testing (InMemoryTransport) and integration testing (StdioClientTransport) patterns for servers that use MCP sampling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Adds integration tests for toolLoopSampling server that verify:
- Complete tool loop flow (ripgrep → read → final answer)
- Path validation and security (prevents directory traversal)
- Error handling for invalid tool names
- Input validation with malformed tool inputs
- Maximum iteration limit enforcement

Tests use StdioClientTransport to spawn actual server process and implement sampling handlers that simulate LLM behavior with tool calls and responses.

All 5 tests pass successfully, providing solid coverage of the agentic tool loop pattern.

Also updates toolLoopSampling.ts with linter formatting fixes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
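The directory-traversal check that these tests exercise could look something like the sketch below. `resolveInside` is a hypothetical helper, not the example's code. It only normalizes the path with `path.resolve`; it does not follow symlinks, which full canonicalization would need `fs.realpath` for.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve `requested` against `root` and reject anything
// that ends up outside `root`. Symlinks are NOT resolved in this sketch.
function resolveInside(root: string, requested: string): string {
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`Path "${requested}" is outside ${root}`);
  }
  return resolved;
}
```

Comparing via `path.relative` avoids the prefix pitfall of a plain `startsWith(root)` check, where `/srv/project-evil` would pass for a root of `/srv/project`.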
…pSampling

Updated toolLoopSampling.ts to properly handle CreateMessageResult.content as both single content blocks and arrays:
- Changed runToolLoop return type to include both answer and transcript
- Extract and execute ALL tool_use blocks in parallel using Promise.all()
- Concatenate all text content blocks for final answer
- Return full message transcript for debugging

This ensures the tool loop works correctly when the LLM returns multiple content blocks (text + tool calls, or multiple tool calls in one turn).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
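A rough sketch of the single-or-array handling described in this commit follows. The helper names (`asArray`, `executeToolCalls`, `finalAnswer`), the block shapes, and the newline used to join text blocks are all assumptions for illustration.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;
type ToolResult = { type: "tool_result"; toolUseId: string; content: unknown };

// Normalize "single block or array of blocks" to an array.
function asArray(content: ContentBlock | ContentBlock[]): ContentBlock[] {
  return Array.isArray(content) ? content : [content];
}

// Execute every tool_use block concurrently; Promise.all keeps the results
// in the same order as the calls, regardless of which finishes first.
async function executeToolCalls(
  content: ContentBlock | ContentBlock[],
  execute: (name: string, input: unknown) => Promise<unknown>,
): Promise<ToolResult[]> {
  const calls = asArray(content).filter((b): b is ToolUseBlock => b.type === "tool_use");
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => ({
      type: "tool_result",
      toolUseId: call.id,
      content: await execute(call.name, call.input),
    })),
  );
}

// Concatenate all text blocks into the final answer (separator is assumed).
function finalAnswer(content: ContentBlock | ContentBlock[]): string {
  return asArray(content)
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}
```

Note that `Promise.all` rejects as soon as any call rejects, so `execute` should catch tool failures and return them as content if the loop is meant to continue.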
Added sendLoggingMessage calls to provide real-time feedback on tool loop operations:
- Log iteration number at the start of each loop
- Log number and names of tools being executed
- Log completion message with total iteration count

Also fixed toolWithSampleServer.ts to handle CreateMessageResult.content as arrays (extract and concatenate all text blocks).

This provides visibility into the tool loop's progress for debugging and monitoring purposes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…oolLoopSampling

Added line range support to the read tool:
- Added optional startLineInclusive and endLineInclusive parameters
- Returns numbered lines when range is specified
- Validates line ranges and provides helpful error messages

Improved logging with tool-specific messages:
- Loop iteration logs: "Loop iteration N: X tool invocation(s) requested"
- Ripgrep logs: "Searching pattern 'X' under Y"
- Read logs: "Reading file X" or "Reading file X (lines A-B)"

Updated tool descriptions:
- Added hint to read tool about requesting context lines around matches
- Emphasized that ripgrep output includes line numbers

This provides better visibility into tool operations and enables more efficient file reading by fetching only relevant line ranges.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
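The line-range behaviour of the read tool might be implemented along these lines. The parameter names `startLineInclusive` and `endLineInclusive` come from the commit; the function name `readLines`, the `N: text` numbering format, the error wording, and clamping the end of the range to the file length are assumptions of this sketch. It operates on already-loaded text to stay self-contained.

```typescript
// Hypothetical helper: return the whole text when no range is given,
// otherwise the requested 1-based inclusive range with line numbers.
function readLines(text: string, startLineInclusive?: number, endLineInclusive?: number): string {
  if (startLineInclusive === undefined && endLineInclusive === undefined) {
    return text;
  }
  const lines = text.split("\n");
  const start = startLineInclusive ?? 1;
  const requestedEnd = endLineInclusive ?? lines.length;

  if (!Number.isInteger(start) || start < 1) {
    throw new Error(`startLineInclusive must be an integer >= 1 (got ${start})`);
  }
  if (!Number.isInteger(requestedEnd) || requestedEnd < start) {
    throw new Error(`endLineInclusive must be an integer >= ${start} (got ${requestedEnd})`);
  }
  if (start > lines.length) {
    throw new Error(`startLineInclusive ${start} is past the end of the file (${lines.length} lines)`);
  }

  // Clamp rather than fail when the range runs past the end (a design choice).
  const end = Math.min(requestedEnd, lines.length);
  return lines
    .slice(start - 1, end)
    .map((line, i) => `${start + i}: ${line}`)
    .join("\n");
}
```

Numbering the returned lines lets the model line up a follow-up read with the line numbers reported by ripgrep.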
This was referenced Oct 1, 2025
Added comprehensive token usage tracking and reporting to the tool loop sampling example:

**backfillSampling.ts**:
- Pass Anthropic API usage data through _meta field of CreateMessageResult
- Includes all token counts from Claude API response

**toolLoopSampling.ts**:
- Added AggregatedUsage interface to track cumulative token counts
- Aggregate usage across all API calls in the tool loop:
  - input_tokens (regular input)
  - cache_creation_input_tokens (tokens to create cache)
  - cache_read_input_tokens (tokens read from cache)
  - output_tokens (generated output)
  - api_calls (number of createMessage calls)
- Updated runToolLoop return type to include usage
- Display formatted usage summary in fileSearch tool output:
  - Total input tokens with breakdown by type
  - Total output tokens
  - Total tokens consumed
  - Number of API calls made

Example output:
```
--- Token Usage Summary ---
Total Input Tokens: 1234
  - Regular: 800
  - Cache Creation: 200
  - Cache Read: 234
Total Output Tokens: 567
Total Tokens: 1801
API Calls: 3
```

This provides complete visibility into Claude API token consumption for monitoring costs and optimizing cache usage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
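The aggregation described in this commit amounts to summing per-call counters. The sketch below reuses the field names listed above for `AggregatedUsage`; the helper names `emptyUsage`, `addUsage` and `formatUsage` are hypothetical, and how the per-call usage object is read out of `_meta` is left out because the commit does not say which key it lives under.

```typescript
interface AggregatedUsage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
  api_calls: number;
}

// Usage reported for one call; any counter may be absent.
type CallUsage = Partial<Omit<AggregatedUsage, "api_calls">>;

function emptyUsage(): AggregatedUsage {
  return { input_tokens: 0, cache_creation_input_tokens: 0, cache_read_input_tokens: 0, output_tokens: 0, api_calls: 0 };
}

// Fold one call's usage into the running total (missing counters count as 0).
function addUsage(total: AggregatedUsage, usage: CallUsage = {}): AggregatedUsage {
  return {
    input_tokens: total.input_tokens + (usage.input_tokens ?? 0),
    cache_creation_input_tokens: total.cache_creation_input_tokens + (usage.cache_creation_input_tokens ?? 0),
    cache_read_input_tokens: total.cache_read_input_tokens + (usage.cache_read_input_tokens ?? 0),
    output_tokens: total.output_tokens + (usage.output_tokens ?? 0),
    api_calls: total.api_calls + 1,
  };
}

// Render the summary in the layout shown in the example output above.
function formatUsage(u: AggregatedUsage): string {
  const totalInput = u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return [
    "--- Token Usage Summary ---",
    `Total Input Tokens: ${totalInput}`,
    `  - Regular: ${u.input_tokens}`,
    `  - Cache Creation: ${u.cache_creation_input_tokens}`,
    `  - Cache Read: ${u.cache_read_input_tokens}`,
    `Total Output Tokens: ${u.output_tokens}`,
    `Total Tokens: ${totalInput + u.output_tokens}`,
    `API Calls: ${u.api_calls}`,
  ].join("\n");
}
```

In this reading, "Total Input Tokens" is the sum of the three input counters, which matches the example figures (800 + 200 + 234 = 1234).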
Updated backfillSampling and toolLoopSampling to support the new schema where UserMessage and AssistantMessage content can be arrays:

**backfillSampling.ts**:
- Split contentFromMcp into contentBlockFromMcp (single block) and contentFromMcp (handles both single and arrays)
- Updated message mapping to pass content arrays directly to Claude API
- Now properly handles messages with multiple content blocks

**toolLoopSampling.ts**:
- Removed flattening logic that created multiple messages
- SamplingMessage now natively supports content arrays
- Simplified message history management

**toolLoopSampling.test.ts**:
- Added helper to handle content as potentially an array
- Updated all test assertions to work with array content
- All 5 tests passing

This aligns with the MCP protocol change allowing content arrays in UserMessage and AssistantMessage, enabling multi-block responses (e.g., text + tool calls in one message).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
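The split between `contentBlockFromMcp` and `contentFromMcp` described in this commit could be sketched as below. The two function names come from the commit message; everything else is assumed, including the MCP block shapes, serializing the tool result with `JSON.stringify`, and always returning an array on the Claude side.

```typescript
// Assumed MCP-side block shapes (camelCase toolUseId).
type McpBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; toolUseId: string; content: Record<string, unknown> };

// Claude-side blocks (snake_case tool_use_id, string result content).
type ClaudeBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };

// Convert exactly one block.
function contentBlockFromMcp(block: McpBlock): ClaudeBlock {
  switch (block.type) {
    case "text":
      return { type: "text", text: block.text };
    case "tool_use":
      return { type: "tool_use", id: block.id, name: block.name, input: block.input };
    case "tool_result":
      // Serializing the result as JSON text is an assumption of this sketch.
      return { type: "tool_result", tool_use_id: block.toolUseId, content: JSON.stringify(block.content) };
  }
}

// Accept either a single block or an array and always return an array.
function contentFromMcp(content: McpBlock | McpBlock[]): ClaudeBlock[] {
  return (Array.isArray(content) ? content : [content]).map(contentBlockFromMcp);
}
```

Keeping the per-block conversion separate means the array handling lives in one place and the block logic stays unchanged.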
…ocol/typescript-sdk into ochafik/sep1577
…ion of tool loop (no tools!)
Draft reference implementation + examples for SEP-1577 - Sampling With Tools
- Allow `CreateMessageResult.content` and `SamplingMessage.content` to be an array of content (similar to "Allow Prompt/Sampling Messages to contain multiple content blocks", modelcontextprotocol#198)
- Add a `localResearch` tool that has two local tools (`ripgrep` & `read`) and uses sampling to do its bidding

cc/ @bhosmer-ant
Motivation and Context
This allows & demonstrates tool call loops in MCP servers using improvements to sampling proposed in modelcontextprotocol/modelcontextprotocol#1577
How Has This Been Tested?
Then give a prompt to the `localResearch` tool such as `summarize the main classes`. It crunches for a while (logs tool calls on the fly) then should give its final output w/ the full tool loop transcript as debug content:
- see final output
- see debug output
Breaking Changes
- `CreateMessageResult.content` is now the union of a content block or an array of content blocks
- `includeContext` w/ values != `none` will fail if `capabilities.sampling.context` isn't set on the client.

Types of changes
Checklist