Intermittent HTTP 500 from remote MCP endpoint: "can't get copilot user by id" twirp permission_denied #2508

@KentWangYQ

Description

Since approximately May 18, 2026 (coinciding with the v1.0.5 release), the remote MCP endpoint at https://api.githubcopilot.com/mcp/ intermittently returns HTTP 500 with a raw text/plain error body. The failure rate is approximately 7–10% and affects write operations such as create_branch. Read-only operations (search_repositories, get_file_contents) appear unaffected.

Error Response

HTTP Status: 500 Internal Server Error
Content-Type: text/plain; charset=utf-8 (not a valid JSON-RPC or SSE response)

can't get copilot user by id: error getting copilot user details: twirp error permission_denied: Error from intermediary with HTTP status code 403 "Forbidden"
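To tell this failure apart from an ordinary repository-permission error on the client side, the error chain above can be parsed. This is a minimal sketch that assumes the body is a Go-style wrapped error (layers joined by ": "), which is inferred from this single message rather than a documented format; `parse_error_chain` and `is_intermediary_denial` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns for the two machine-readable pieces of the message.
_TWIRP_CODE = re.compile(r"twirp error (\w+)")
_HTTP_STATUS = re.compile(r"HTTP status code (\d{3})")


@dataclass
class ErrorChain:
    layers: List[str]               # wrapped messages, outermost first
    twirp_code: Optional[str]       # e.g. "permission_denied"
    upstream_status: Optional[int]  # status reported by the intermediary


def parse_error_chain(body: str) -> ErrorChain:
    """Split an "outer: inner: ..." error string (assumed Go-style wrapping)."""
    text = body.strip()
    layers = [part.strip() for part in text.split(": ") if part.strip()]
    code = _TWIRP_CODE.search(text)
    status = _HTTP_STATUS.search(text)
    return ErrorChain(
        layers=layers,
        twirp_code=code.group(1) if code else None,
        upstream_status=int(status.group(1)) if status else None,
    )


def is_intermediary_denial(chain: ErrorChain) -> bool:
    """Heuristic: a Twirp permission_denied caused by an intermediary 403,
    as opposed to a 403 about the caller's own repository permissions."""
    return (
        chain.twirp_code == "permission_denied"
        and chain.upstream_status == 403
        and any("intermediary" in layer for layer in chain.layers)
    )


body = (
    "can't get copilot user by id: error getting copilot user details: "
    "twirp error permission_denied: Error from intermediary with "
    'HTTP status code 403 "Forbidden"'
)
chain = parse_error_chain(body)
print(chain.twirp_code, chain.upstream_status, is_intermediary_denial(chain))
```

For the body above this yields four layers, a Twirp code of permission_denied, and an upstream status of 403, so the heuristic flags it as an infrastructure denial rather than a user-permission problem.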

Full Request/Response Context

Request:

POST https://api.githubcopilot.com/mcp/
Authorization: Bearer <valid-PAT-with-repo-scope>
Content-Type: application/json
Accept: application/json, text/event-stream
Mcp-Session-Id: <valid-session-id>

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "id": 9,
  "params": {
    "name": "create_branch",
    "arguments": {
      "owner": "<org>",
      "repo": "<private-repo>",
      "branch": "test-branch-name",
      "from_branch": "main"
    }
  }
}
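For scripted reproduction, the request above can be rebuilt with only the Python standard library. This is a sketch: `build_tools_call` and `build_headers` are hypothetical names, the token and session ID are the same placeholders as above, and the request object is constructed but not sent.

```python
import json
import urllib.request

MCP_URL = "https://api.githubcopilot.com/mcp/"


def build_tools_call(request_id: int, tool: str, arguments: dict) -> dict:
    """JSON-RPC 2.0 tools/call request with the same shape as the captured one."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "id": request_id,
        "params": {"name": tool, "arguments": arguments},
    }


def build_headers(token: str, session_id: str) -> dict:
    """Headers from the captured request; both values are placeholders."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "Mcp-Session-Id": session_id,
    }


payload = build_tools_call(9, "create_branch", {
    "owner": "<org>",
    "repo": "<private-repo>",
    "branch": "test-branch-name",
    "from_branch": "main",
})
# Build (but do not send) the POST; urllib.request.urlopen(req) would send it.
req = urllib.request.Request(
    MCP_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers=build_headers("<valid-PAT-with-repo-scope>", "<valid-session-id>"),
    method="POST",
)
```

In a real run the session ID would come from the Mcp-Session-Id header returned by the initialize call, not from a constant.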

Response Headers:

HTTP/2 500
content-type: text/plain; charset=utf-8
x-github-backend: Kubernetes
x-github-request-id: D44E:1D706B:5F7B92F:675BE79:6A0C6556
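Under the MCP Streamable HTTP transport, a server answers a POSTed request with either application/json or text/event-stream, so a text/plain body like this one cannot be parsed as a JSON-RPC message. A client could check the media type before handing the body to its parser, and keep the x-github-request-id value for bug reports. The function and category names below are hypothetical, chosen for this sketch.

```python
from typing import Mapping, Optional

# Media types the Streamable HTTP transport uses for a response to a request.
MCP_MEDIA_TYPES = {"application/json", "text/event-stream"}


def media_type(content_type: str) -> str:
    """'text/plain; charset=utf-8' -> 'text/plain'."""
    return content_type.split(";", 1)[0].strip().lower()


def classify_response(status: int, content_type: str) -> str:
    """Decide whether a body may be handed to a JSON-RPC/SSE parser."""
    if media_type(content_type) in MCP_MEDIA_TYPES:
        return "mcp_payload"        # parse as JSON-RPC or SSE, even if status >= 400
    if status >= 500:
        return "raw_server_error"   # e.g. the text/plain 500 in this issue
    return "unexpected_non_mcp"


def github_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Case-insensitive lookup of x-github-request-id, worth quoting in bug reports."""
    for name, value in headers.items():
        if name.lower() == "x-github-request-id":
            return value
    return None
```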

Reproduction

  • Frequency: ~7–10% of create_branch calls (1 in 9 to 1 in 15 attempts)
  • Pattern: Appears random; not correlated with request timing or concurrency
  • Session state: Occurs on freshly initialized sessions (not a stale-session issue)
  • Authentication: PAT is valid (other calls in the same session succeed; initialize and tools/list always return 200)

Minimal reproduction: send 15–30 consecutive create_branch tool calls, each in a new session. Typically 1–3 of them return HTTP 500.
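The reproduction can be scripted as a loop that counts failing attempts. Here `call_once` is a hypothetical callable (not part of any SDK) that opens a fresh session, issues one create_branch call, and returns the HTTP status. The two helpers check the reported numbers, assuming failures are independent as the "Pattern" bullet suggests: at a per-call failure rate p, n calls give p * n failures on average and at least one failure with probability 1 - (1 - p)^n.

```python
from typing import Callable, List


def run_repro(call_once: Callable[[int], int], attempts: int = 30) -> List[int]:
    """Return the indices of attempts that came back as HTTP 500.

    call_once(i) is hypothetical: it should open a brand-new session,
    issue one create_branch call (e.g. with a unique branch name), and
    return the HTTP status code of that call.
    """
    return [i for i in range(attempts) if call_once(i) == 500]


def expected_failures(p: float, n: int) -> float:
    """Mean number of failures in n independent calls with failure rate p."""
    return p * n


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that n independent calls include at least one failure."""
    return 1 - (1 - p) ** n


for p, n in [(0.07, 15), (0.10, 30)]:
    print(f"p={p:.2f} n={n}: mean {expected_failures(p, n):.2f}, "
          f"P(>=1) {prob_at_least_one(p, n):.2f}")
```

With p between 0.07 and 0.10 and n between 15 and 30, the expected count ranges from about 1.05 to 3.0, consistent with the observed 1–3 failures. At p = 0.07 and n = 15, the chance of seeing at least one failure is only about 66%, so a clean run of 15 calls does not rule the bug out.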

Analysis

The error chain suggests the failure occurs in the Copilot hosting infrastructure (not in the open-source MCP server logic):

  1. The request passes HTTP gateway authentication (the PAT is valid, so there is no 401)
  2. The hosting layer makes an internal Twirp RPC to look up the Copilot user: getCopilotUserById
  3. An internal service-mesh intermediary answers that Twirp call with 403 Forbidden
  4. The unhandled error propagates to the client as a raw HTTP 500

The 403 from the "intermediary" points to a service-to-service authentication failure (likely mTLS certificate rotation, internal token expiry, or inconsistent sidecar state during deployment) rather than a user-permission issue.

Impact

  • MCP clients using streamablehttp_client have their entire transport killed by this error (the SDK's post_writer TaskGroup crashes, closing the read and write streams)
  • Retrying within the same session does not help, because the session is dead after the 500
  • Automated workloads (CI/CD, evaluation pipelines) see cascading failures
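Since the session cannot be reused, a client-side workaround has to retry the whole session rather than the single call. This is a minimal sketch using only asyncio; `open_session_and_call` is a hypothetical factory that must open a new transport and session, make the call, and close everything on each invocation. The backoff parameters are arbitrary assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_fresh_session_retry(
    open_session_and_call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Retry the whole session (initialize + tool call), never the dead one.

    open_session_and_call is a hypothetical factory: every invocation must open
    a new transport and session, make the call, and close everything again.
    """
    for attempt in range(max_attempts):
        try:
            return await open_session_and_call()
        except Exception:
            # Broad on purpose: a crashed transport may surface as an
            # ExceptionGroup rather than a single HTTP error.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter: sleep in [0, cap].
            cap = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

With the Python MCP SDK, the factory might wrap `streamablehttp_client(url, headers=...)` and `ClientSession` from the `mcp` package, calling `initialize()` and then `call_tool("create_branch", arguments)` inside it. Because create_branch is a write, a retry after an ambiguous failure could in general find the branch already created, so it is safer to treat "already exists" as success or to check for the branch before retrying.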

Expected Behavior

  • HTTP 500 responses should not leak internal error details to clients
  • The getCopilotUserById lookup should have server-side retry or circuit-breaker logic
  • Alternatively, the error should be wrapped in a proper JSON-RPC error response so that MCP clients can handle it gracefully
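The expected behavior can be sketched on the server side as follows. This is not the actual hosting code, which the issue does not show; `TransientLookupError`, `lookup`, `run_tool`, and the `retryable` data field are hypothetical. The lookup is retried with exponential backoff, and if it still fails the client receives a JSON-RPC error with code -32603 (the JSON-RPC 2.0 "Internal error" code) and a generic message instead of the raw error chain. A full circuit breaker is not shown.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

INTERNAL_ERROR = -32603  # JSON-RPC 2.0 reserved code for "Internal error"


class TransientLookupError(Exception):
    """Hypothetical: the user lookup failed for infrastructure reasons (e.g. an intermediary 403)."""


def lookup_with_retry(lookup: Callable[[], Dict[str, Any]],
                      attempts: int = 3, base_delay: float = 0.05) -> Dict[str, Any]:
    """Call lookup(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return lookup()
        except TransientLookupError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def jsonrpc_error(request_id: Any, code: int, message: str,
                  data: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 error response object."""
    error: Dict[str, Any] = {"code": code, "message": message}
    if data is not None:
        error["data"] = data
    return {"jsonrpc": "2.0", "id": request_id, "error": error}


def handle_tools_call(request_id: Any,
                      lookup: Callable[[], Dict[str, Any]],
                      run_tool: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Tuple[int, str, str]:
    """Return (HTTP status, Content-Type, body); the body is always valid JSON-RPC."""
    try:
        user = lookup_with_retry(lookup)
    except TransientLookupError:
        # Generic message only; the internal error chain is not returned to the client.
        reply = jsonrpc_error(request_id, INTERNAL_ERROR,
                              "Temporary upstream failure, please retry",
                              {"retryable": True})
        return 200, "application/json", json.dumps(reply)
    result = run_tool(user)
    reply = {"jsonrpc": "2.0", "id": request_id, "result": result}
    return 200, "application/json", json.dumps(reply)
```

Returning HTTP 200 with a JSON-RPC error body is one option, assumed here; the point is that the response stays valid JSON-RPC, which a client SDK can surface as a tool error instead of tearing down the transport.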

Environment

  • MCP Protocol Version: 2024-11-05
  • Transport: Streamable HTTP (POST)
  • Client: Python MCP SDK (mcp package)
  • Authentication: GitHub PAT (repo + read:org scopes)
  • GitHub MCP Server version: v1.0.5 (remote, hosted)

