SEP-2663: Tasks Extension#2663

Merged
CaitieM20 merged 58 commits into
modelcontextprotocol:mainfrom
LucaButBoring:feat/ext-tasks
May 15, 2026

Conversation

@LucaButBoring
Contributor

@LucaButBoring LucaButBoring commented Apr 29, 2026

This SEP defines an extension that allows a server to respond to a tools/call request with an asynchronous task handle instead of a final result; the client then retrieves the eventual result by polling. The extension introduces three methods: tasks/get, tasks/update, and tasks/cancel; a polymorphic-result discriminator (resultType: "task"); and a Task shape that carries a task status, in-progress server-to-client requests, and a final result or error. Task creation is server-directed: the client signals support by including the extension in its per-request capabilities, and the server decides on a per-request basis whether to materialize a task.
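
As a rough illustration of the polymorphic result described above, the sketch below shows how a client might decide what to do after a tools/call response and after each tasks/get response. Only resultType: "task", taskId, and the method names come from this thread; the TaskLike and Step shapes, the "working" status, the pollIntervalMilliseconds field name, and the 1000 ms default are assumptions made for the example, not the SEP's schema.

```typescript
// Illustrative shapes only; the SEP's schema is authoritative and may differ.
type TaskStatus = "working" | "input_required" | "completed" | "failed" | "cancelled";

interface TaskLike {
  taskId: string;
  status: TaskStatus;
  pollIntervalMilliseconds?: number; // assumed field name
}

// What the client should do next (hypothetical helper type).
type Step =
  | { kind: "done"; result: unknown }
  | { kind: "poll"; taskId: string; delayMs: number }
  | { kind: "provide_input"; taskId: string };

const TERMINAL: ReadonlyArray<TaskStatus> = ["completed", "failed", "cancelled"];

// Interpret a tools/call response: an ordinary result, or a task handle to poll.
function afterToolCall(response: unknown, defaultDelayMs: number = 1000): Step {
  if (typeof response === "object" && response !== null) {
    const fields = response as Record<string, unknown>;
    const taskId = fields["taskId"];
    const interval = fields["pollIntervalMilliseconds"];
    if (fields["resultType"] === "task" && typeof taskId === "string") {
      const delayMs = typeof interval === "number" ? interval : defaultDelayMs;
      return { kind: "poll", taskId, delayMs };
    }
  }
  // No task discriminator: the server answered inline.
  return { kind: "done", result: response };
}

// Interpret a tasks/get response for a task the client is already tracking.
function afterTaskGet(task: TaskLike, defaultDelayMs: number = 1000): Step {
  if (TERMINAL.indexOf(task.status) !== -1) {
    return { kind: "done", result: task };
  }
  if (task.status === "input_required") {
    // The client would answer the pending requests via tasks/update.
    return { kind: "provide_input", taskId: task.taskId };
  }
  return {
    kind: "poll",
    taskId: task.taskId,
    delayMs: task.pollIntervalMilliseconds ?? defaultDelayMs,
  };
}
```

A client loop would call afterToolCall once, then alternate between waiting delayMs and calling tasks/get until afterTaskGet returns a "done" step.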

Tasks will become a foundational building block of MCP and are expected to be supported in future protocol versions. The experimental tasks feature in the 2025-11-25 specification served as a stopgap until the protocol's extension mechanism was available. Now that extensions have been formalized, moving tasks to an official extension gives the feature time to incubate and evolve based on additional real-world implementation feedback, without being constrained by the core specification's release cadence. Once the extension has stabilized and achieved broad adoption, it is intended to be promoted into the core protocol.

This proposal removes the version of tasks specified in the 2025-11-25 release. It is shaped by implementation feedback since that release and by several changes to the base protocol expected to arrive in the 2026-06-30 specification:

Motivation and Context

The experimental tasks feature served as an alternate execution mode for tool calls, elicitation, and sampling, allowing receivers to return a poll handle instead of blocking until a final result was ready. Implementation experience surfaced several challenges:

  1. The handshake is fragile. Tasks today expose method-level capabilities (tasks.requests.tools.call declares that tools/call MAY be task-augmented) alongside a tool-level execution.taskSupport field that declares whether a particular tool will accept the augmentation. Clients express their own support for tasks by passing a task parameter on their requests, but MUST NOT include it if the method/tool does not support tasks. A client that wants to opt into tasks must therefore prime its state with a tools/list call before issuing any task-augmented request, and cannot blindly attach a task parameter to every request to handle tools isomorphically. This is confusing, implicit, and easy to get wrong.

  2. tasks/result is a blocking trap. In the current flow, a client that observes input_required is required to call tasks/result prematurely so that the server has an SSE stream on which to side-channel elicitation or sampling requests. tasks/result then blocks until the entire operation completes. This forces long-lived persistent connections that many clients and servers do not want to implement, and it conflicts with SEP-2260, which disallows unsolicited server-to-client requests outright. Under SEP-2260, the SSE semantics that justified the blocking behavior no longer apply.

  3. tasks/list scoping cannot be defined. To avoid clients cancelling or retrieving results for tasks they shouldn't have access to, all tasks should be bound to some sort of "authorization context," the implementation of which is left to individual servers according to their existing bespoke permission models. However, in many cases, it is not possible to perform this binding, in which case the task ID becomes the only line of defense against contamination. In this scenario, it is unsafe for a server to support tasks/list at all. While it was possible for tasks to instead be bound to a session, SEP-2567 removes sessions from the protocol. There is no other natural scope a server can define unilaterally — task IDs can be unguessable handles that a server can recognize one at a time, but servers cannot reliably correlate two unrelated handles to the same caller without additional state.

Beyond implementation challenges, tasks face another structural issue: Client-hosted tasks are no longer expressible. SEP-1686 permitted clients to host tasks for elicitation and sampling, in part to avoid coupling tasks to tool calls. SEP-2260 makes any unsolicited server-to-client request invalid; every server-to-client polling request under client-hosted tasks would be unsolicited by definition.

This proposal intends to solve the above issues by redesigning certain aspects of the feature and moving tasks out to an official extension. Redefining tasks as an official extension gives the feature more time to incubate and evolve independently of the core specification, promoting adoption. As part of the redesign, this proposal consolidates the polling lifecycle into tasks/get and a new tasks/update to remove the blocking tasks/result method. The redesign allows servers to return tasks unsolicited (in response to ordinary, non-task-flagged requests) to eliminate the per-request opt-in and the tools/list warmup, relying instead on the extension capability as the single handshake point. Finally, this proposal removes client-hosted elicitation and sampling tasks in compliance with SEP-2260.

How Has This Been Tested?

Conformance test suite: modelcontextprotocol/conformance#262

Breaking Changes

Described in proposal.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Supersedes #2557.


AI Use Disclosure: The extension SEP document in this PR was initially drafted using claude.ai with the previous iteration as a reference. I rewrote/rephrased many sections myself and verified its correctness, using claude.ai as a reviewer to iteratively scrub out issues.

@LucaButBoring LucaButBoring changed the title SEP-XXXX: Tasks Extension SEP-2663: Tasks Extension Apr 29, 2026
@LucaButBoring LucaButBoring requested review from a team as code owners April 29, 2026 01:14
@LucaButBoring LucaButBoring added this to the 2026-06-30-RC milestone Apr 29, 2026
@LucaButBoring LucaButBoring added SEP in-review SEP proposal ready for review. extension roadmap/agents Roadmap: Agent Communication (Tasks lifecycle) labels Apr 29, 2026
@LucaButBoring
Contributor Author

Moving discussion from #2557 over here @CaitieM20 @markdroth @Randgalt @kurtisvg @localden @pja-ant @dsp-ant @maxisbey @maciej-kisiel @ylxlpl

(Tagged everyone who commented on #2557)

@localden
Contributor

Thanks for putting this together, @LucaButBoring - I'll post the comment that I was typing earlier in #2557 and let you validate how much of this is still relevant.

Some notes beyond the other bits I called out in the review. There's a few places where I think the SEP is a little underspecified:

  1. CreateTaskResult and GetTaskResult both carry resultType: "task" but have different shapes. In schema.ts, CreateTaskResult is { task: Task } (nested), while GetTaskResult is Result & DetailedTask (flat). So a client switching on resultType === "task" then has to also check whether it got result.task.taskId or result.taskId. Is the nesting on CreateTaskResult intentional, or a holdover from before GetTaskResult was flattened?
  2. The inputRequests key contract stops at "SHOULD dedupe." The tasks spec says clients should dedupe by key, and the inputResponses JSDoc says keys match inputRequests keys. But it doesn't say whether a key is unique for the task's lifetime or can be reused after the server consumes the response.
  3. Retry of tasks/get with inputResponses. What happens if a client sends tasks/get { inputResponses: { k: ... } }, network blips, and client retries? If the response was a sampling result the server feeds into a downstream API call, that call just ran twice. IMO the smallest fix is to say the server MUST treat inputResponses keyed on a request it has already consumed as a no-op. That makes the key the idempotency token, and it's almost what the key-matching contract says already.
  4. "The same requests will be included" is ambiguous for partial responses. The Input Requests section says if the client polls again before providing all responses, the same requests reappear. Does "the same requests" mean the full original set (client must re-send what it already provided) or only the still-unfulfilled remainder? I'd read it as the remainder, but the text doesn't say so, and it interacts with the idempotency point above.
  5. The cancel behavior has two stories. The spec says servers MAY ignore cancellation but MUST support tasks/cancel, which I read as: always return a valid CancelTaskResult, possibly with a non-cancelled status. But the "Cancellation Not Supported" example returns a -32603 JSON-RPC error instead. Those are different contracts. Can we formalize that the response carries the task's current status, which may not be cancelled, and drop the -32603 example?
  6. ttl and pollInterval are now in different units. The schema still documents pollInterval in milliseconds while the SEP moves ttl to seconds. So { ttl: 60, pollInterval: 5000 } is 60 seconds next to 5000 milliseconds. @pja-ant raised this before and I don't see it landed yet. Both fields should match.
  7. The Failed example might be the wrong status under the new rule. The Task Flow Change section says failed is for JSON-RPC errors and application faults go to completed with isError: true. The Failed example shows error: { code: -32603, message: "API rate limit exceeded" }. A downstream API rate-limiting the tool is an application fault (exactly the case the new rule routes to completed). If -32603 here means the MCP server itself fell over, the message should say that; otherwise the example is the case the rule says not to use failed for.
  8. Is taskId alone always sufficient for tasks/get? requestState lets a server externalize lookup state to the client (a backend job ID, a serialized continuation) so it doesn't have to keep a mapping table - that makes sense. But in a fully stateless deployment a server could push that to the limit and put the entire task record in requestState, keeping nothing locally. At that point tasks/get { taskId } without requestState has nothing to look up, which runs into the "MUST NOT return CreateTaskResult until tasks/get would find it" guarantee. Should we be explicit about the taskId always being sufficient as a standalone index of a task?

A couple of schema regressions I noticed too:

  • CallToolRequestParams, CreateMessageRequestParams, and the ElicitRequest*Params types no longer extend anything after TaskAugmentedRequestParams was removed, so they've lost the RequestParams base and _meta? with it.
  • ServerRequest still includes GetTaskRequest and CancelTaskRequest even though client-hosted tasks are removed.

@localden localden moved this to In Review in SEP Review Pipeline Apr 29, 2026
@localden localden moved this from In Review to Review Batch in SEP Review Pipeline Apr 29, 2026
Comment thread seps/2663-tasks-extension.md Outdated
Comment thread seps/2663-tasks-extension.md Outdated
Comment thread seps/2663-tasks-extension.md
Comment thread seps/2663-tasks-extension.md
@LucaButBoring
Contributor Author

LucaButBoring commented Apr 29, 2026

@localden Thanks for the feedback, going through this:

  1. CreateTaskResult and GetTaskResult both carry resultType: "task" but have different shapes. In schema.ts, CreateTaskResult is { task: Task } (nested), while GetTaskResult is Result & DetailedTask (flat). So a client switching on resultType === "task" then has to also check whether it got result.task.taskId or result.taskId. Is the nesting on CreateTaskResult intentional, or a holdover from before GetTaskResult was flattened?

This revision limits resultType: "task" to CreateTaskResult to avoid any ambiguity, noticed that issue while rewriting this. GetTaskResult was always flat, the distinction was that we made CreateTaskResult nested at the last minute in 2025-11-25 to allow switching on it. That nesting is a holdover from before we had resultType, so we can actually flatten CreateTaskResult, too.

edit: updated

  2. The inputRequests key contract stops at "SHOULD dedupe." The tasks spec says clients should dedupe by key, and the inputResponses JSDoc says keys match inputRequests keys. But it doesn't say whether a key is unique for the task's lifetime or can be reused after the server consumes the response.

This revision does require keys to be unique over the lifetime of a task, and not reused between distinct requests.

  3. Retry of tasks/get with inputResponses. What happens if a client sends tasks/get { inputResponses: { k: ... } }, network blips, and client retries? If the response was a sampling result the server feeds into a downstream API call, that call just ran twice. IMO the smallest fix is to say the server MUST treat inputResponses keyed on a request it has already consumed as a no-op. That makes the key the idempotency token, and it's almost what the key-matching contract says already.

Yup, that's how tasks/update works in this revision.
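
To make the idempotency rule concrete, here is a minimal server-side sketch in which a retried tasks/update carrying an already-consumed key is a no-op. The TaskInputs class, its method names, and the choice to silently ignore unknown keys are hypothetical; only the ideas of keyed inputRequests/inputResponses, lifetime-unique keys, and no-op retries come from this thread.

```typescript
// Hypothetical in-memory bookkeeping for one task's input requests.
class TaskInputs {
  private pending = new Map<string, unknown>(); // key -> outstanding input request
  private consumed = new Set<string>(); // keys that have already been answered

  // Register a new input request; keys must be unique over the task's lifetime.
  request(key: string, inputRequest: unknown): void {
    if (this.pending.has(key) || this.consumed.has(key)) {
      throw new Error(`input request key reused: ${key}`);
    }
    this.pending.set(key, inputRequest);
  }

  // Apply the inputResponses of a tasks/update call.
  // Returns the keys that were newly consumed by this call.
  applyResponses(
    inputResponses: Record<string, unknown>,
    onResponse: (key: string, value: unknown) => void
  ): string[] {
    const applied: string[] = [];
    for (const [key, value] of Object.entries(inputResponses)) {
      if (!this.pending.has(key)) {
        // Already consumed (a retry) or unknown: treat as a no-op.
        continue;
      }
      this.pending.delete(key);
      this.consumed.add(key);
      onResponse(key, value);
      applied.push(key);
    }
    return applied;
  }

  // Keys the client still needs to answer.
  outstanding(): string[] {
    return Array.from(this.pending.keys());
  }
}
```

Because the side effect (onResponse) runs only when a key moves from pending to consumed, a client that retries after a dropped connection cannot trigger the downstream call twice.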

  4. "The same requests will be included" is ambiguous for partial responses. The Input Requests section says if the client polls again before providing all responses, the same requests reappear. Does "the same requests" mean the full original set (client must re-send what it already provided) or only the still-unfulfilled remainder? I'd read it as the remainder, but the text doesn't say so, and it interacts with the idempotency point above.

I struck out that phrasing in this revision, now it can actually be either, as tasks/update is eventually-consistent - but the new key uniqueness constraint means that this is fine from the client's perspective, now.

  5. The cancel behavior has two stories. The spec says servers MAY ignore cancellation but MUST support tasks/cancel, which I read as: always return a valid CancelTaskResult, possibly with a non-cancelled status. But the "Cancellation Not Supported" example returns a -32603 JSON-RPC error instead. Those are different contracts. Can we formalize that the response carries the task's current status, which may not be cancelled, and drop the -32603 example?

To deal with that, in this revision, tasks/cancel no longer has any result (and is also eventually-consistent, like tasks/update).

  6. ttl and pollInterval are now in different units. The schema still documents pollInterval in milliseconds while the SEP moves ttl to seconds. So { ttl: 60, pollInterval: 5000 } is 60 seconds next to 5000 milliseconds. @pja-ant raised this before and I don't see it landed yet. Both fields should match.

A TTL in integer seconds makes sense, but I'm not sure if a polling interval in integer seconds does - 500ms would be a reasonable polling interval for a relatively quick, but high-variance (1s-20s) task. A duration is probably better-expressed with units included in the value (e.g. "500ms"), but that would be nonstandard for us - I suppose I could name it pollIntervalMilliseconds, but that feels awkward and inconsistent in its own right, since nothing else includes units in the field name so far.

edit: updated to include units in the field names

  7. The Failed example might be the wrong status under the new rule. The Task Flow Change section says failed is for JSON-RPC errors and application faults go to completed with isError: true. The Failed example shows error: { code: -32603, message: "API rate limit exceeded" }. A downstream API rate-limiting the tool is an application fault (exactly the case the new rule routes to completed). If -32603 here means the MCP server itself fell over, the message should say that; otherwise the example is the case the rule says not to use failed for.

Noted, I'll update the phrasing here - it actually doesn't really mean the MCP server fell over either, the literal intent is just that if the inner request returns a JSON-RPC error, that's failed, and in every other case (including a tool call with isError: true), that's completed.

edit: updated
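
The rule stated above can be sketched as a small classifier: an inner request that ends in a JSON-RPC error maps to failed, and everything else, including a tool result with isError: true, maps to completed. The InnerOutcome type and the terminalStatus function are hypothetical names for illustration.

```typescript
// Hypothetical representation of how the inner request (e.g. tools/call) ended.
type JsonRpcError = { code: number; message: string };
type InnerOutcome =
  | { error: JsonRpcError }
  | { result: { isError?: boolean; content?: unknown[] } };

// failed: the inner request returned a JSON-RPC error.
// completed: every other case, including a tool result with isError: true.
function terminalStatus(outcome: InnerOutcome): "completed" | "failed" {
  return "error" in outcome ? "failed" : "completed";
}
```

Under this reading, a downstream rate limit reported as a tool result with isError: true is a completed task, while a -32603 returned for the inner request is a failed one.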

  8. Is taskId alone always sufficient for tasks/get? requestState lets a server externalize lookup state to the client (a backend job ID, a serialized continuation) so it doesn't have to keep a mapping table - that makes sense. But in a fully stateless deployment a server could push that to the limit and put the entire task record in requestState, keeping nothing locally. At that point tasks/get { taskId } without requestState has nothing to look up, which runs into the "MUST NOT return CreateTaskResult until tasks/get would find it" guarantee. Should we be explicit about the taskId always being sufficient as a standalone index of a task?

I don't think there's an inconsistency here? requestState is already on the request shape for tasks/get - the requirement is that the client echoes whatever the server gives it. So, in the case where the full task record is in requestState, the server would return the initial value in CreateTaskResult, the client would pick that up, and then it would echo it in tasks/get, maintaining the full record through that flow.

edit: updated, I misinterpreted this - noted here

A couple of schema regressions I noticed too:

  • CallToolRequestParams, CreateMessageRequestParams, and the ElicitRequest*Params types no longer extend anything after TaskAugmentedRequestParams was removed, so they've lost the RequestParams base and _meta? with it.
  • ServerRequest still includes GetTaskRequest and CancelTaskRequest even though client-hosted tasks are removed.

Ah, I missed that on #2557 - I'll make sure this is handled correctly when I write the schema changes here.

@He-Pin
Contributor

He-Pin commented Apr 29, 2026

This is great, allows integration of various organizational extensions.

@pja-ant
Contributor

pja-ant commented Apr 29, 2026

A TTL in integer seconds makes sense, but I'm not sure if a polling interval in integer seconds does - 500ms would be a reasonable polling interval for a relatively quick, but high-variance (1s-20s) task. A duration is probably better-expressed with units included in the value (e.g. "500ms"), but that would be nonstandard for us - I suppose I could name it pollIntervalMilliseconds, but that feels awkward and inconsistent in its own right, since nothing else includes units in the field name so far.

The option space is:

  1. Have everything as seconds
  2. Allow different units, but don't include it in the name or value
  3. Allow different units, but use a string (e.g. "500ms")
  4. Allow different units, and add it to the name

IMO:

  1. Too limiting - seconds isn't appropriate for everything
  2. Strongly prefer we don't do this. We know what happens: https://en.wikipedia.org/wiki/Mars_Climate_Orbiter
  3. An option, but IMO having to parse is just annoying.
  4. My strong preference. It's simple and avoids any confusion. It's a little more verbose.

I agree that (4) is non-standard, but IMO we just make it the standard starting now and make sure that TTL lists also adopts this standard.
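
For illustration, option (4) might look like the sketch below. pollIntervalMilliseconds is the name floated earlier in this thread; ttlSeconds and the maxPollsWithinTtl helper are assumed names, and the final field names in the SEP may differ.

```typescript
// Units live in the field names, so mixed units stay visible at the call site.
interface TaskTiming {
  ttlSeconds: number; // assumed field name
  pollIntervalMilliseconds: number; // name floated in this thread
}

// Number of polls at the suggested interval that fit inside the TTL.
function maxPollsWithinTtl(timing: TaskTiming): number {
  const ttlMilliseconds = timing.ttlSeconds * 1000; // explicit unit conversion
  return Math.floor(ttlMilliseconds / timing.pollIntervalMilliseconds);
}
```

With { ttlSeconds: 60, pollIntervalMilliseconds: 5000 } the helper returns 12, and the conversion between seconds and milliseconds is spelled out rather than implied.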

@LucaButBoring
Contributor Author

We've decided not to add it for now, we're holding off until someone very specifically needs an updating value for their use case, as it was adding a lot of requirements to this spec for not much value in exchange.

For use cases where you need an arbitrary but unchanging value, you can use the task ID field instead, and do something like encode a JWT or similar into it (this is also true of tasks today).

@LucaButBoring
Contributor Author

Ported over all corrections after review round on modelcontextprotocol/experimental-ext-tasks#2

@localden localden added accepted SEP accepted by core maintainers, but still requires final wording and reference implementation. and removed accepted-with-changes labels May 13, 2026
Comment thread seps/2663-tasks-extension.md Outdated
Contributor

@CaitieM20 CaitieM20 left a comment

This looks good.

A couple of things: let's mark this as Final (see comment).
Also, I think there are examples in schema/draft/examples that we should be deleting as well.

  • GetTaskPayloadRequest
  • GetTaskPayloadResult
  • GetTaskPayloadResultResponse
  • TaskInputResponseRequest
  • TaskInputResponseRequestParams

Can you do a quick pass and make sure we've deleted all the examples that are tied to the schema we are removing?

Also, I think we are missing a changelog comment.

@CaitieM20 CaitieM20 added final SEP finalized. and removed accepted SEP accepted by core maintainers, but still requires final wording and reference implementation. labels May 15, 2026
Resolve conflicts keeping tasks out of core schema (moved to extension),
incorporate SEP-2260 final status, SEP-2549 TTL additions, and add MRTR
changelog entry from main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CaitieM20 CaitieM20 merged commit 3395973 into modelcontextprotocol:main May 15, 2026
5 checks passed

Labels

extension final SEP finalized. roadmap/agents Roadmap: Agent Communication (Tasks lifecycle) SEP

Projects

Status: Review Batch
