bug: vMCP marks unauthenticated backends as unhealthy, causing PhaseFailed for OAuth-protected groups #4824

@lorr1

Bug description

When all backends in an MCPGroup require upstream OAuth (e.g., upstreamInject auth), the vMCP health checker marks them as BackendUnauthenticated because health check probes don't carry user tokens. Since determinePhase() only counts BackendHealthy backends, the vMCP enters PhaseFailed with "No healthy backends available" — even though the backends are running and serve real user requests with valid OAuth tokens.

This means any vMCP whose backends all use per-user upstream OAuth will permanently report as failed.

Steps to reproduce

  1. Create an MCPGroup with a single MCPServer that uses upstreamInject auth (e.g., Google Drive with OAuth)
  2. Create a VirtualMCPServer referencing that group
  3. Wait for health checks to run
  4. Observe the VirtualMCPServer status:
    phase: Failed
    message: "No healthy backends (0 degraded, 0 unhealthy, 0 unknown, 1 unauthenticated)"
    

Expected behavior

Backends returning 401 due to missing user-level OAuth tokens should not cause the vMCP to enter PhaseFailed. The vMCP should treat BackendUnauthenticated as a distinct state — the backend is reachable and running, it just requires per-request user auth. The vMCP should still route traffic to these backends.
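The distinction described above can be sketched as a small classification step plus a routability check. This is only an illustration: `classifyProbe` and `isRoutable` are hypothetical helpers, the status type is a local stand-in for the vmcp one, and it assumes the probe result is available as a plain HTTP status code, which the real health checker may not expose in this form.

```go
package main

import (
	"fmt"
	"net/http"
)

// BackendStatus is a local stand-in for the vmcp backend status type.
// Only the statuses needed for this illustration are declared.
type BackendStatus string

const (
	BackendHealthy         BackendStatus = "healthy"
	BackendUnhealthy       BackendStatus = "unhealthy"
	BackendUnauthenticated BackendStatus = "unauthenticated"
)

// classifyProbe is a hypothetical helper that maps a probe outcome to a
// backend status. A 401 means the backend answered but wants user
// credentials, so it is kept separate from a transport failure.
func classifyProbe(statusCode int, err error) BackendStatus {
	switch {
	case err != nil:
		// The backend could not be reached at all.
		return BackendUnhealthy
	case statusCode == http.StatusUnauthorized:
		// Reachable and running, but the probe carries no user token.
		return BackendUnauthenticated
	case statusCode >= 200 && statusCode < 300:
		return BackendHealthy
	default:
		return BackendUnhealthy
	}
}

// isRoutable is a hypothetical predicate: real user requests carry their
// own OAuth token, so an unauthenticated backend can still serve them.
func isRoutable(s BackendStatus) bool {
	return s == BackendHealthy || s == BackendUnauthenticated
}

func main() {
	status := classifyProbe(http.StatusUnauthorized, nil)
	fmt.Println(status, isRoutable(status)) // prints "unauthenticated true"
}
```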

Actual behavior

The vMCP enters PhaseFailed and reports "No healthy backends available" because determinePhase() does not count unauthenticated backends toward summary.Healthy, so Healthy stays at 0 when every backend requires user auth.

Root cause

In pkg/vmcp/health/monitor.go, determinePhase() only considers BackendHealthy when determining the phase:

// monitor.go:616-618
if summary.Healthy == 0 {
    return vmcp.PhaseFailed
}

BackendUnauthenticated is tracked separately and not counted as healthy or routable:

// monitor.go:522-533
switch state.Status {
case vmcp.BackendHealthy:
    summary.Healthy++
case vmcp.BackendDegraded:
    summary.Degraded++
case vmcp.BackendUnhealthy:
    summary.Unhealthy++
case vmcp.BackendUnknown:
    summary.Unknown++
case vmcp.BackendUnauthenticated:
    summary.Unauthenticated++
}

Additionally, ToCRDStatus() in pkg/vmcp/types.go maps both BackendUnhealthy and BackendUnauthenticated to "unavailable", losing the distinction in the CRD status.

Suggested fix

In determinePhase(), treat BackendUnauthenticated as routable — count it toward availability when deciding if the vMCP should be PhaseFailed vs PhaseDegraded or PhaseReady. For example:

routable := summary.Healthy + summary.Unauthenticated
if routable == summary.Total {
    return vmcp.PhaseReady
}
if routable == 0 {
    return vmcp.PhaseFailed
}
return vmcp.PhaseDegraded
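For anyone who wants to try the proposed logic in isolation, here is a self-contained sketch of it. The `Phase` and `Summary` types are local stand-ins rather than the real definitions in pkg/vmcp, and `Total` is assumed to be the number of backends being tracked.

```go
package main

import "fmt"

// Phase is a local stand-in for the vmcp phase type named in this issue.
type Phase string

const (
	PhaseReady    Phase = "Ready"
	PhaseDegraded Phase = "Degraded"
	PhaseFailed   Phase = "Failed"
)

// Summary mirrors the counters incremented in the switch quoted earlier.
// Total is assumed to be the number of backends being tracked.
type Summary struct {
	Total           int
	Healthy         int
	Degraded        int
	Unhealthy       int
	Unknown         int
	Unauthenticated int
}

// determinePhase counts unauthenticated backends as routable, following
// the branch order of the snippet suggested in this issue.
func determinePhase(s Summary) Phase {
	routable := s.Healthy + s.Unauthenticated
	if routable == s.Total {
		return PhaseReady
	}
	if routable == 0 {
		return PhaseFailed
	}
	return PhaseDegraded
}

func main() {
	// The case from the steps to reproduce: one backend, unauthenticated.
	s := Summary{Total: 1, Unauthenticated: 1}
	fmt.Println(determinePhase(s)) // prints "Ready"
}
```

With this branch order an empty summary (Total == 0) also reports Ready. How the real function should handle an empty group is outside the scope of this issue.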

Also consider giving BackendUnauthenticated its own CRD status string (e.g., "unauthenticated") instead of mapping it to "unavailable", so operators can distinguish "backend needs user auth" from "backend is down".
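A rough sketch of what a separate CRD status string might look like. `toCRDStatus` here is a hypothetical free function, not the real ToCRDStatus() in pkg/vmcp/types.go. Only the "unavailable" mapping for BackendUnhealthy comes from this issue. The "unauthenticated" string is the proposal, and the pass-through default is a placeholder, not the existing mapping for other statuses.

```go
package main

import "fmt"

// BackendStatus is a local stand-in for the vmcp backend status type.
type BackendStatus string

const (
	BackendHealthy         BackendStatus = "healthy"
	BackendUnhealthy       BackendStatus = "unhealthy"
	BackendUnauthenticated BackendStatus = "unauthenticated"
)

// toCRDStatus is a hypothetical mapping that keeps "needs user auth"
// distinct from "is down" in the CRD status.
func toCRDStatus(s BackendStatus) string {
	switch s {
	case BackendUnauthenticated:
		// Proposed: a dedicated string instead of "unavailable".
		return "unauthenticated"
	case BackendUnhealthy:
		// Matches the mapping described in this issue.
		return "unavailable"
	default:
		// Placeholder only: the real mapping for other statuses is
		// not described in this issue.
		return string(s)
	}
}

func main() {
	fmt.Println(toCRDStatus(BackendUnauthenticated)) // prints "unauthenticated"
	fmt.Println(toCRDStatus(BackendUnhealthy))       // prints "unavailable"
}
```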

Environment

  • ToolHive operator 0.20.0 on Kubernetes (EKS)
  • VirtualMCPServer with upstreamInject auth (Google OAuth)
  • Backend MCPServer pod is running and Ready

Labels: auth, bug (Something isn't working), operator, vmcp (Virtual MCP Server related issues)
