Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/upstream-projects.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ projects:

- id: toolhive
repo: stacklok/toolhive
version: v0.24.0
version: v0.24.1
# toolhive is a monorepo covering the CLI, the Kubernetes
# operator, and the vMCP gateway. It also introduces cross-
# cutting features that land in concepts/, integrations/,
Expand Down
75 changes: 75 additions & 0 deletions docs/toolhive/guides-vmcp/scaling-and-performance.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,81 @@ a dedicated vMCP instance per team instead.

:::

## Capacity limits

Review these limits before planning capacity for a vMCP deployment.

### Per-pod session cache

Each vMCP pod holds a node-local LRU cache capped at **1,000 concurrent
sessions**. When the cache is full, the least-recently-used session is evicted
and its backend connections are closed. Any request in flight at eviction time
fails, and the next request for that session ID triggers a cache miss.

When Redis session storage is configured, the session manager transparently
rebuilds the session from stored metadata and reconnects to backends, so clients
do not need to reinitialize. Without Redis, an evicted session is lost and the
client must reinitialize.

To serve more than 1,000 concurrent sessions per replica, add vMCP replicas and
configure Redis session storage. Total capacity scales as `replicas × 1,000`.

### Session time-to-live (TTL)

The vMCP server applies a **30-minute inactivity TTL** to session metadata. A
session that receives no activity for 30 minutes expires, and the client must
reinitialize it.

With Redis session storage, the TTL is a sliding window: every request
atomically refreshes the key's expiry. Active sessions remain valid indefinitely
as long as they receive at least one request per TTL window. There is no
absolute maximum session lifetime.

### File descriptors

Each open backend connection consumes one file descriptor on the vMCP pod. A pod
aggregating many MCP backends at high session concurrency can exhaust the
container's `nofile` limit before hitting the 1,000-session cache cap.

Estimate the requirement as `concurrent_sessions × backends_per_session`, plus
overhead for incoming client connections. The default Linux soft `nofile` limit
is typically 1,024; raise it in the container spec or at the node level if you
expect to serve hundreds of sessions aggregating multiple backends.

### Redis sizing

When you enable Redis session storage, size the Redis instance for the full
fleet. Session payloads include routing tables and tool metadata. A rough
estimate is 10-50 KB per session depending on backend count and tool count, with
a fleet-wide maximum of `replicas × 1,000` concurrent sessions.

Configure Redis with the `allkeys-lru` eviction policy so Redis sheds stale
sessions under memory pressure rather than returning errors on new writes. Redis
persistence is not required for session storage; if the Redis instance restarts,
all sessions are lost and clients must reinitialize.

The Redis client uses these default timeouts. They are hardcoded defaults and
are not currently exposed through the VirtualMCPServer CRD.

| Setting | Default |
| ------------- | --------- |
| Dial timeout | 5 seconds |
| Read timeout | 3 seconds |
| Write timeout | 3 seconds |

### Stateful backend data loss on pod restart

vMCP is a stateless proxy: it holds routing tables and tool aggregation state,
but backend MCP servers own their own state (browser sessions, database cursors,
open files). When a vMCP pod restarts or is evicted, backend connections are
torn down without a graceful MCP shutdown sequence.

With Redis session storage, the routing table survives and clients can
reconnect. However, the new connection does not recover any backend-side state;
it starts fresh. In-flight tool calls are lost without a response. Implement
retry logic with idempotency guards for tool invocations that modify external
state.

## Next steps

- [Explore Kubernetes operator guides](../guides-k8s/index.mdx) for managing MCP
Expand Down
63 changes: 59 additions & 4 deletions static/api-specs/toolhive-api.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -372,8 +372,8 @@ components:
subject_token_type:
description: |-
SubjectTokenType specifies the type of the subject token being exchanged.
Common values: tokenTypeAccessToken (default), tokenTypeIDToken, tokenTypeJWT.
If empty, defaults to tokenTypeAccessToken.
Common values: oauth.TokenTypeAccessToken (default), oauth.TokenTypeIDToken, oauth.TokenTypeJWT.
If empty, defaults to oauth.TokenTypeAccessToken.
type: string
token_url:
description: TokenURL is the OAuth 2.0 token endpoint URL
Expand Down Expand Up @@ -1176,6 +1176,13 @@ components:
K8sPodTemplatePatch is a JSON string to patch the Kubernetes pod template
Only applicable when using Kubernetes runtime
type: string
mcpserver_generation:
description: |-
MCPServerGeneration is the K8s .metadata.generation of the MCPServer CR that rendered
this RunConfig. The Kubernetes runtime uses it as a monotonic version to prevent stale
rolling-update pods from overwriting a newer RunConfig's StatefulSet apply. Zero value
means unversioned (backward-compat with older operators, or non-operator callers).
type: integer
middleware_configs:
description: |-
MiddlewareConfigs contains the list of middleware to apply to the transport
Expand Down Expand Up @@ -4324,12 +4331,30 @@ paths:
schema:
type: string
description: Bad Request
"401":
content:
application/json:
schema:
type: string
description: Unauthorized (registry refused credentials)
"404":
content:
application/json:
schema:
type: string
description: Not Found (artifact not present in registry)
"409":
content:
application/json:
schema:
type: string
description: Conflict
"429":
content:
application/json:
schema:
type: string
description: Too Many Requests (registry rate limit)
"500":
content:
application/json:
Expand All @@ -4341,7 +4366,13 @@ paths:
application/json:
schema:
type: string
description: Bad Gateway
description: Bad Gateway (upstream registry failure)
"504":
content:
application/json:
schema:
type: string
description: Gateway Timeout (upstream pull timed out)
summary: Install a skill
tags:
- skills
Expand Down Expand Up @@ -4560,6 +4591,24 @@ paths:
schema:
type: string
description: Bad Request
"401":
content:
application/json:
schema:
type: string
description: Unauthorized (registry refused credentials)
"404":
content:
application/json:
schema:
type: string
description: Not Found (artifact not present in registry)
"429":
content:
application/json:
schema:
type: string
description: Too Many Requests (registry rate limit)
"500":
content:
application/json:
Expand All @@ -4571,7 +4620,13 @@ paths:
application/json:
schema:
type: string
description: Bad Gateway
description: Bad Gateway (upstream registry or git resolver failure)
"504":
content:
application/json:
schema:
type: string
description: Gateway Timeout (upstream pull timed out)
summary: Get skill content
tags:
- skills
Expand Down