Improve-MCP-Server-tools #335

Merged
omar-inkeep merged 6 commits into main from Improve-MCP-Server-tools
Sep 26, 2025

@omar-inkeep omar-inkeep commented Sep 26, 2025

Several changes related to MCP server tools:

Available tools:

  • No longer stored in the DB. We fetch them every time we need them: MCP server details, MCP server overview page, graph page, etc.

Active tools:

  • The SDK can set active tools, and the MCP server details page shows a warning badge if an active tool is not available

Selected tools:

  • The SDK can set selected tools via agentMcp(), and the graph page shows a warning badge if a selected tool is not available

Other things:

  • Show the status on the MCP server card when the server is not healthy
  • Show the last error message on the MCP server details page

vercel Bot commented Sep 26, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
agents-docs | Ready | Preview | Comment | Sep 26, 2025 1:55pm
agents-manage-api | Ready | Preview | Comment | Sep 26, 2025 1:55pm
agents-manage-ui | Ready | Preview | Comment | Sep 26, 2025 1:55pm
agents-run-api | Ready | Preview | Comment | Sep 26, 2025 1:55pm

claude Bot commented Sep 26, 2025

Claude finished @omar-inkeep's task


PR Review: MCP Server Tools Improvements

Conducted architectural review of the MCP server tools changes as requested.

Review Checklist:

  • Read repository guidelines (CLAUDE.md)
  • Understand the architectural shift from DB storage to on-demand fetching
  • Review new active/selected tools concepts and implementation
  • Analyze UI/UX changes and status indicators
  • Review test coverage and quality changes
  • Assess performance and scalability implications
  • Check for security and error handling considerations
  • Provide comprehensive feedback on system design

Executive Summary

This PR implements a significant architectural improvement to MCP server tool management, shifting from persistent database storage of available tools to on-demand fetching. The changes introduce sophisticated concepts of "active tools" and "selected tools" while improving the user experience with better status indicators and error reporting.

Major Architectural Changes

🏗️ Database Schema Simplification

  • Removed: available_tools column from tools table (evident from schema migrations)
  • Added: selectedTools BLOB field to agent_tool_relations for per-agent tool selection
  • Impact: Cleaner separation of concerns - tools table stores server configuration, relations table stores agent-specific selections

🔄 On-Demand Tool Discovery

The shift from persistent to dynamic tool discovery is well-architected:

Strengths:

  • dbResultToMcpTool() function elegantly handles the transformation from DB records to runtime tool objects
  • Proper error handling and status computation (healthy, needs_auth, unhealthy)
  • Health status automatically updated in DB on each discovery attempt
  • Credential integration through CredentialStuffer maintains security patterns

Performance Considerations:

// tools.ts:167-171 - Each tool fetched on-demand
const toolDefinitions: McpToolDefinition[] = Object.entries(serverTools).map(
  ([name, toolDef]) => ({
    name,
    description: (toolDef as any).description || '',
    inputSchema: extractInputSchema(toolDef as any),
  })
);

  • Concern: N+1 pattern when loading multiple tools, because each tool record triggers its own discovery round trip to its MCP server
  • Mitigation: Consider caching tool discovery results with a TTL

🎯 Active vs Selected Tools Concept

The three-tier tool concept is well-designed:

  1. Available Tools: All tools discovered from MCP server
  2. Active Tools: Subset enabled by server configuration (config.mcp.activeTools)
  3. Selected Tools: Per-agent subset chosen from active tools
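
The three tiers above can be sketched as successive filters over the discovered tool names. This is a minimal illustration, not the PR's implementation: `ToolTiers` and `resolveToolTiers` are hypothetical names, and treating a missing filter as "no restriction" is an assumption.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface ToolTiers {
  available: string[]; // every tool name the MCP server reports
  active: string[];    // available tools enabled by the server configuration
  selected: string[];  // active tools chosen for one agent
}

// Assumption: a null/undefined filter means "no restriction at this tier".
function resolveToolTiers(
  available: string[],
  activeTools?: string[] | null,
  selectedTools?: string[] | null
): ToolTiers {
  const active =
    activeTools == null
      ? [...available]
      : available.filter((name) => activeTools.includes(name));
  const selected =
    selectedTools == null
      ? [...active]
      : active.filter((name) => selectedTools.includes(name));
  return { available: [...available], active, selected };
}

// 'removed_tool' is configured as active but no longer offered by the server,
// so it drops out of the active tier.
const tiers = resolveToolTiers(
  ['search', 'fetch', 'summarize'],
  ['search', 'fetch', 'removed_tool'],
  ['fetch']
);
```

Here `tiers.active` holds `search` and `fetch`, and `tiers.selected` holds only `fetch`. Each tier is always a subset of the one before it, which is what makes a configured-but-missing tool detectable.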

Implementation Quality:

  • orphaned-tools-detector.ts provides robust validation logic
  • Proper handling of null/empty states in tool selection
  • Good separation between temporary UI state and persisted selections

Code Quality Assessment

Excellent Error Handling

// tools.ts:191-197
const lastErrorComputed = toolNeedsAuth
  ? 'Authentication required - OAuth login needed'
  : error instanceof Error
    ? error.message
    : 'Tool discovery failed';

Well-structured error classification and user-friendly messaging.

Strong Type Safety

// generateTaskHandler.ts:331-346
const isValidTransferResult = (output: unknown): output is {
  type: 'transfer';
  target: string;
} => {
  return (
    typeof output === 'object' &&
    output !== null &&
    'type' in output &&
    // ... proper type guards
  );
};

Proper type guards prevent runtime errors in agent transfer logic.

Comprehensive UI Feedback

The orphaned tools detection system provides excellent user experience:

  • Clear warning messages about unavailable selected tools
  • Node-specific feedback in the graph interface
  • Proactive validation before runtime failures
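
The detection described above amounts to a set difference between what an agent has selected and what the server currently offers. The sketch below is an assumption about the general shape only; `findOrphanedTools` is a hypothetical helper, and the contents of orphaned-tools-detector.ts are not shown in this review.

```typescript
// Hypothetical helper; the real orphaned-tools-detector.ts is not shown in this review.
function findOrphanedTools(
  selectedTools: string[] | null | undefined,
  availableTools: string[]
): string[] {
  // Assumption: null/undefined means no explicit selection, so nothing can be orphaned.
  if (selectedTools == null) return [];
  const availableSet = new Set(availableTools);
  return selectedTools.filter((name) => !availableSet.has(name));
}

// 'legacy_export' was selected earlier but the server no longer reports it.
const orphaned = findOrphanedTools(['search', 'legacy_export'], ['search', 'fetch']);
```

Here `orphaned` contains only `legacy_export`, which a graph node could surface as a warning badge before the agent runs.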

Areas for Improvement

🔴 High Priority: Performance

Issue: Potential N+1 queries in tool discovery

// generateTaskHandler.ts:167-171
const toolsForAgentResult: McpTool[] =
  (await Promise.all(
    toolsForAgent.data.map(
      async (item) => await dbResultToMcpTool(item.tool, dbClient, credentialStoreRegistry)
    )
  )) ?? [];

Recommendation: Implement tool discovery batching or caching layer:

// Suggested approach
class ToolDiscoveryCache {
  private cache = new Map<string, { tools: McpToolDefinition[]; expires: number }>();
  
  async getOrDiscoverTools(toolIds: string[], ttlMs = 300000): Promise<Map<string, McpToolDefinition[]>> {
    // Batch discovery with cache
  }
}
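
One way the suggested cache could be fleshed out is shown below. It is a sketch under stated assumptions: `ToolDefinitionSketch` stands in for the PR's `McpToolDefinition`, the `discover` callback stands in for whatever performs discovery, and the injectable `now` clock exists only to make expiry testable.

```typescript
// Hypothetical stand-in for the PR's McpToolDefinition type.
interface ToolDefinitionSketch {
  name: string;
  description: string;
}

// Assumption: discovery is keyed by a tool (server) id and returns its definitions.
type DiscoverFn = (toolId: string) => Promise<ToolDefinitionSketch[]>;

class TtlToolDiscoveryCache {
  private cache = new Map<string, { tools: ToolDefinitionSketch[]; expires: number }>();

  constructor(
    private discover: DiscoverFn,
    private ttlMs = 300_000,
    private now: () => number = () => Date.now()
  ) {}

  // Returns cached definitions while fresh; otherwise re-discovers and caches the result.
  async getOrDiscover(toolId: string): Promise<ToolDefinitionSketch[]> {
    const hit = this.cache.get(toolId);
    if (hit && hit.expires > this.now()) return hit.tools;
    const tools = await this.discover(toolId);
    this.cache.set(toolId, { tools, expires: this.now() + this.ttlMs });
    return tools;
  }

  // Resolves several tool ids concurrently, consulting the cache for each one.
  async getOrDiscoverMany(toolIds: string[]): Promise<Map<string, ToolDefinitionSketch[]>> {
    const entries = await Promise.all(
      toolIds.map(async (id) => [id, await this.getOrDiscover(id)] as const)
    );
    return new Map(entries);
  }
}
```

Failed discoveries are not cached, so an unhealthy server is retried on the next call. Concurrent requests for the same uncached id each trigger their own discovery; deduplicating in-flight requests would be a further refinement.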

🟡 Medium Priority: Error Recovery

Issue: No automatic retry mechanism for transient failures

// tools.ts:177-179
availableTools = await discoverToolsFromServer(dbResult, dbClient, credentialStoreRegistry);
status = 'healthy';

Recommendation: Add exponential backoff retry for network-related failures:

const discoverWithRetry = async (tool: ToolSelect, retries = 3) => {
  for (let i = 0; i < retries; i++) {
    try {
      return await discoverToolsFromServer(tool, dbClient, credentialStoreRegistry);
    } catch (error) {
      if (i === retries - 1 || !isRetryableError(error)) throw error;
      await delay(Math.pow(2, i) * 1000);
    }
  }
};
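
The snippet above relies on `delay` and `isRetryableError`, which are not defined in this review. A self-contained variant is sketched below; `withRetry`, `RetryOptions`, and the injectable `sleep` are hypothetical names introduced for illustration, and which errors count as retryable is left to the caller.

```typescript
interface RetryOptions {
  retries?: number; // total attempts, including the first
  baseDelayMs?: number; // delay before the second attempt; doubles after each failure
  isRetryable?: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not have to wait
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> {
  const {
    retries = 3,
    baseDelayMs = 1000,
    isRetryable = () => true,
    sleep = defaultSleep,
  } = options;
  let lastError: unknown = new Error('withRetry requires retries >= 1');
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Stop on the final attempt or on errors that retrying cannot fix.
      if (attempt === retries - 1 || !isRetryable(error)) break;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

A discovery call could then be wrapped as `withRetry(() => discoverToolsFromServer(tool, dbClient, credentialStoreRegistry))`, with an `isRetryable` predicate that returns false for authentication failures so that a `needs_auth` status is reported immediately rather than after several delays.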

🟡 Medium Priority: Race Conditions

Issue: Concurrent tool discoveries could lead to inconsistent state

// tools.ts:202-209 - Race condition possible
await updateTool(dbClient)({
  scopes: { tenantId: dbResult.tenantId, projectId: dbResult.projectId },
  toolId: dbResult.id,
  data: {
    updatedAt: now,
    lastError: lastErrorComputed,
  },
});

Recommendation: Use database-level optimistic locking or mutex pattern.
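
As a rough illustration of the mutex option, the sketch below serializes work per tool id inside a single process. `KeyedMutex` is a hypothetical helper, not part of the PR, and it does not coordinate across multiple server instances; that case would still need database-level locking.

```typescript
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  // Runs `task` after all previously queued tasks for the same key have settled.
  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(task);
    // The stored tail never rejects, so one failed task does not block later ones.
    const tail = result.then(
      () => undefined,
      () => undefined
    );
    this.tails.set(key, tail);
    try {
      return await result;
    } finally {
      // Drop the entry once no newer task has queued behind this one.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Wrapping the discovery-plus-`updateTool` sequence in `runExclusive(dbResult.id, ...)` would keep two concurrent discoveries for the same tool from interleaving their status writes, while discoveries for different tools still run in parallel.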

🟡 Medium Priority: Memory Usage

Issue: Tool definitions held in memory without bounds

  • Large MCP servers could return hundreds of tool definitions
  • No memory pressure management in tool discovery

Recommendation: Implement streaming or pagination for tool discovery.

Security Review

Strong Credential Management

  • Proper integration with CredentialStoreRegistry
  • No credential leakage in error messages or logs
  • OAuth flow properly isolated in dedicated service

Input Validation

  • Zod schemas for all API endpoints
  • Proper type guards throughout the codebase
  • SQL injection prevention through parameterized queries

Testing Assessment

Good Mock Strategy

// tools.test.ts:30-60
vi.mock('@inkeep/agents-core', async () => {
  const actual = await vi.importActual('@inkeep/agents-core');
  return {
    ...actual,
    dbResultToMcpTool: vi.fn().mockImplementation(async (dbResult) => ({
      // Consistent test data structure
    })),
  };
});

🟡 Missing Integration Tests

  • No end-to-end tests for tool discovery flow
  • OAuth flow testing appears limited
  • No performance/load testing for multiple tool discovery

Recommendation: Add integration tests covering:

describe('Tool Discovery Integration', () => {
  // it.todo registers each case as pending until a test body is written
  it.todo('should handle concurrent tool discoveries gracefully');
  it.todo('should recover from transient MCP server failures');
  it.todo('should respect active tool filtering');
});

Customer Impact Assessment

Positive UX Improvements

  • Real-time tool availability status
  • Clear feedback on configuration issues
  • Proper OAuth integration for authenticated tools

Backward Compatibility

  • Existing tool configurations remain valid
  • Migration path preserves functionality
  • No breaking API changes

⚠️ Performance Considerations

  • Initial tool loading may be slower due to on-demand discovery
  • Network failures now directly impact tool availability
  • Recommend documenting expected latency characteristics

Recommendations

Immediate (Pre-merge)

  1. Add tool discovery caching to prevent performance degradation
  2. Implement retry logic for transient failures
  3. Add integration tests for critical paths

Short Term (Next Sprint)

  1. Add telemetry/metrics for tool discovery performance
  2. Implement circuit breaker pattern for failing MCP servers
  3. Add tool discovery rate limiting to prevent abuse

Long Term (Architectural)

  1. Consider background tool sync for frequently used tools
  2. Implement tool discovery webhooks for real-time updates
  3. Add tool usage analytics for optimization insights

Overall Assessment

This is a well-architected improvement that successfully modernizes the MCP tool management system. The shift to on-demand discovery is the right architectural direction, and the implementation demonstrates strong engineering practices.

Key Strengths:

  • Clean separation of concerns
  • Robust error handling and user feedback
  • Strong type safety throughout
  • Good UI/UX considerations

Key Areas for Attention:

  • Performance optimization needs immediate attention
  • Integration testing should be expanded
  • Consider production monitoring for tool discovery reliability

Recommendation: APPROVE with suggested performance improvements implemented.

The changes are production-ready with the understanding that performance optimization should be prioritized in the immediate follow-up work.


@omar-inkeep
Contributor Author

Also addresses: PRD-4904, PRD-4897

@omar-inkeep omar-inkeep merged commit 77bd54d into main Sep 26, 2025
2 of 4 checks passed
@omar-inkeep omar-inkeep deleted the Improve-MCP-Server-tools branch September 26, 2025 13:52