diff --git a/.github/upstream-projects.yaml b/.github/upstream-projects.yaml
index df8d66f0..c7f6e540 100644
--- a/.github/upstream-projects.yaml
+++ b/.github/upstream-projects.yaml
@@ -35,7 +35,7 @@ projects:
   - id: toolhive
     repo: stacklok/toolhive
-    version: v0.24.0
+    version: v0.24.1
     # toolhive is a monorepo covering the CLI, the Kubernetes
     # operator, and the vMCP gateway. It also introduces cross-
     # cutting features that land in concepts/, integrations/,
diff --git a/docs/toolhive/guides-vmcp/scaling-and-performance.mdx b/docs/toolhive/guides-vmcp/scaling-and-performance.mdx
index 6a20cf1c..5a163ef7 100644
--- a/docs/toolhive/guides-vmcp/scaling-and-performance.mdx
+++ b/docs/toolhive/guides-vmcp/scaling-and-performance.mdx
@@ -138,6 +138,81 @@ a dedicated vMCP instance per team instead.
 :::
+## Capacity limits
+
+Review these limits before planning capacity for a vMCP deployment.
+
+### Per-pod session cache
+
+Each vMCP pod holds a node-local LRU cache capped at **1,000 concurrent
+sessions**. When the cache is full, the least-recently-used session is evicted
+and its backend connections are closed. Any request in flight at eviction time
+fails, and the next request for that session ID triggers a cache miss.
+
+When Redis session storage is configured, the session manager transparently
+rebuilds the session from stored metadata and reconnects to backends, so clients
+do not need to reinitialize. Without Redis, an evicted session is lost and the
+client must reinitialize.
+
+To serve more than 1,000 concurrent sessions per replica, add vMCP replicas and
+configure Redis session storage. Total capacity scales as `replicas × 1,000`.
+
+### Session time-to-live (TTL)
+
+The vMCP server applies a **30-minute inactivity TTL** to session metadata. A
+session that receives no activity for 30 minutes expires, and the client must
+reinitialize it.
+
+With Redis session storage, the TTL is a sliding window: every request
+atomically refreshes the key's expiry. Active sessions remain valid indefinitely
+as long as they receive at least one request per TTL window. There is no
+absolute maximum session lifetime.
+
+### File descriptors
+
+Each open backend connection consumes one file descriptor on the vMCP pod. A pod
+aggregating many MCP backends at high session concurrency can exhaust the
+container's `nofile` limit before hitting the 1,000-session cache cap.
+
+Estimate the requirement as `concurrent_sessions × backends_per_session`, plus
+overhead for incoming client connections. The default Linux soft `nofile` limit
+is typically 1,024; raise it in the container spec or at the node level if you
+expect to serve hundreds of sessions aggregating multiple backends.
+
+### Redis sizing
+
+When you enable Redis session storage, size the Redis instance for the full
+fleet. Session payloads include routing tables and tool metadata. A rough
+estimate is 10-50 KB per session depending on backend count and tool count, with
+a fleet-wide maximum of `replicas × 1,000` concurrent sessions.
+
+Configure Redis with the `allkeys-lru` eviction policy so Redis sheds stale
+sessions under memory pressure rather than returning errors on new writes. Redis
+persistence is not required for session storage; if the Redis instance restarts,
+all sessions are lost and clients must reinitialize.
+
+The Redis client uses these default timeouts. They are hardcoded defaults and
+are not currently exposed through the VirtualMCPServer CRD.
+
+| Setting       | Default   |
+| ------------- | --------- |
+| Dial timeout  | 5 seconds |
+| Read timeout  | 3 seconds |
+| Write timeout | 3 seconds |
+
+### Stateful backend data loss on pod restart
+
+vMCP is a stateless proxy: it holds routing tables and tool aggregation state,
+but backend MCP servers own their own state (browser sessions, database cursors,
+open files). When a vMCP pod restarts or is evicted, backend connections are
+torn down without a graceful MCP shutdown sequence.
+
+With Redis session storage, the routing table survives and clients can
+reconnect. However, the new connection does not recover any backend-side state;
+it starts fresh. In-flight tool calls are lost without a response. Implement
+retry logic with idempotency guards for tool invocations that modify external
+state.
+
 ## Next steps
 - [Explore Kubernetes operator guides](../guides-k8s/index.mdx) for managing MCP
diff --git a/static/api-specs/toolhive-api.yaml b/static/api-specs/toolhive-api.yaml
index e6c07a3e..d601f942 100644
--- a/static/api-specs/toolhive-api.yaml
+++ b/static/api-specs/toolhive-api.yaml
@@ -372,8 +372,8 @@ components:
         subject_token_type:
           description: |-
             SubjectTokenType specifies the type of the subject token being exchanged.
-            Common values: tokenTypeAccessToken (default), tokenTypeIDToken, tokenTypeJWT.
-            If empty, defaults to tokenTypeAccessToken.
+            Common values: oauth.TokenTypeAccessToken (default), oauth.TokenTypeIDToken, oauth.TokenTypeJWT.
+            If empty, defaults to oauth.TokenTypeAccessToken.
           type: string
         token_url:
           description: TokenURL is the OAuth 2.0 token endpoint URL
@@ -1176,6 +1176,13 @@ components:
             K8sPodTemplatePatch is a JSON string to patch the Kubernetes pod template
             Only applicable when using Kubernetes runtime
           type: string
+        mcpserver_generation:
+          description: |-
+            MCPServerGeneration is the K8s .metadata.generation of the MCPServer CR that rendered
+            this RunConfig. The Kubernetes runtime uses it as a monotonic version to prevent stale
+            rolling-update pods from overwriting a newer RunConfig's StatefulSet apply. Zero value
+            means unversioned (backward-compat with older operators, or non-operator callers).
+          type: integer
         middleware_configs:
           description: |-
             MiddlewareConfigs contains the list of middleware to apply to the transport
@@ -4324,12 +4331,30 @@ paths:
               schema:
                 type: string
           description: Bad Request
+        "401":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Unauthorized (registry refused credentials)
+        "404":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Not Found (artifact not present in registry)
         "409":
           content:
             application/json:
               schema:
                 type: string
           description: Conflict
+        "429":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Too Many Requests (registry rate limit)
         "500":
           content:
             application/json:
@@ -4341,7 +4366,13 @@ paths:
             application/json:
               schema:
                 type: string
-          description: Bad Gateway
+          description: Bad Gateway (upstream registry failure)
+        "504":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Gateway Timeout (upstream pull timed out)
       summary: Install a skill
       tags:
       - skills
@@ -4560,6 +4591,24 @@ paths:
               schema:
                 type: string
           description: Bad Request
+        "401":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Unauthorized (registry refused credentials)
+        "404":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Not Found (artifact not present in registry)
+        "429":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Too Many Requests (registry rate limit)
         "500":
           content:
             application/json:
@@ -4571,7 +4620,13 @@ paths:
             application/json:
               schema:
                 type: string
-          description: Bad Gateway
+          description: Bad Gateway (upstream registry or git resolver failure)
+        "504":
+          content:
+            application/json:
+              schema:
+                type: string
+          description: Gateway Timeout (upstream pull timed out)
       summary: Get skill content
       tags:
       - skills
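
Reviewer note: the sizing formulas in the new "Capacity limits" section (`replicas × 1,000` sessions, `concurrent_sessions × backends_per_session` file descriptors, 10-50 KB of Redis per session) can be sanity-checked with a short calculation. The following is a minimal sketch in Python, not part of ToolHive. The names `estimate_capacity` and `CapacityEstimate` are hypothetical, and it assumes one incoming client connection per session as the "overhead" term, which the docs leave unspecified.

```python
from dataclasses import dataclass

# Figures quoted in the "Capacity limits" section of the diff above.
SESSIONS_PER_POD = 1_000      # per-pod LRU session cache cap
SESSION_KB_RANGE = (10, 50)   # rough Redis payload per session, in KB
DEFAULT_SOFT_NOFILE = 1_024   # typical default Linux soft nofile limit


@dataclass(frozen=True)
class CapacityEstimate:
    fleet_sessions: int
    fds_per_pod: int
    exceeds_default_nofile: bool
    redis_kb_low: int
    redis_kb_high: int


def estimate_capacity(
    replicas: int,
    backends_per_session: int,
    sessions_per_pod: int = SESSIONS_PER_POD,
) -> CapacityEstimate:
    """Rough capacity estimate for a vMCP fleet (illustrative only)."""
    if replicas < 1 or backends_per_session < 0:
        raise ValueError("replicas must be >= 1 and backends_per_session >= 0")
    if not 0 <= sessions_per_pod <= SESSIONS_PER_POD:
        raise ValueError("sessions_per_pod cannot exceed the per-pod cache cap")

    fleet_sessions = replicas * sessions_per_pod
    # Assumption: one client connection per session on top of the
    # backend connections; the docs only say "plus overhead".
    fds_per_pod = sessions_per_pod * backends_per_session + sessions_per_pod
    low_kb, high_kb = SESSION_KB_RANGE
    return CapacityEstimate(
        fleet_sessions=fleet_sessions,
        fds_per_pod=fds_per_pod,
        exceeds_default_nofile=fds_per_pod > DEFAULT_SOFT_NOFILE,
        redis_kb_low=fleet_sessions * low_kb,
        redis_kb_high=fleet_sessions * high_kb,
    )
```

Under these assumptions, `estimate_capacity(3, 4)` gives 3,000 fleet-wide sessions, 5,000 file descriptors per pod (well above a 1,024 soft limit), and 30,000 to 150,000 KB of Redis session data, which suggests the `nofile` limit is likely to be the first constraint hit at full cache occupancy.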