Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions core/docs/design-decouple-cancellation-token.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Design: Decouple CancellationToken from Incoming HTTP Request

## Problem

When a bot handler performs long-running work — most notably streaming LLM responses back to Teams — the `CancellationToken` passed into the handler is tied to the lifetime of the **incoming HTTP request** (`HttpContext.RequestAborted`). Teams closes that connection once it receives the initial HTTP response (typically within ~15 seconds), which fires the cancellation token and aborts any in-flight outbound calls the handler is still making.

### Observed behavior

```
dbug: HTTP POST .../v3/conversations/.../activities/... Response Status 202
fail: Error processing activity: Id=...
System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.IO.IOException: Unable to read data from the transport connection:
The I/O operation has been aborted because of either a thread exit or an application request.
```

The exception propagates through the OpenAI streaming pipeline, through `BotApplication.ProcessAsync`, and surfaces as a 500 to the ASP.NET middleware — even though the bot was functioning correctly.

### Why this matters

Streaming bots send responses via the **Bot Connector API** (`ConversationClient.SendActivityAsync`), not through the original HTTP response body. The handler legitimately outlives the HTTP request, so cancellation of that request should **not** cancel the handler's work.

## Solution

Replace the HTTP-bound `CancellationToken` with a **configurable timeout-based token** inside `BotApplication.ProcessAsync`.

### Changes

#### 1. `BotApplicationOptions.ProcessActivityTimeout`

A new property on `BotApplicationOptions`:

```csharp
public TimeSpan ProcessActivityTimeout { get; set; } = TimeSpan.FromMinutes(5);
```

- **Default: 5 minutes** — long enough for streaming LLM responses, short enough to prevent runaway handlers.
- Set to `Timeout.InfiniteTimeSpan` to disable the timeout entirely.
- Configurable per application instance via DI / builder options.

#### 2. `BotApplication.ProcessAsync` — token replacement

Before this change:

```csharp
CancellationToken token = Debugger.IsAttached ? CancellationToken.None : cancellationToken;
await MiddleWare.RunPipelineAsync(this, activity, this.OnActivity, 0, token);
```

After:

```csharp
using var cts = new CancellationTokenSource(_processActivityTimeout);
CancellationToken token = Debugger.IsAttached ? CancellationToken.None : cts.Token;
await MiddleWare.RunPipelineAsync(this, activity, this.OnActivity, 0, token);
```

The HTTP request's `cancellationToken` is no longer forwarded to the handler pipeline.

#### 3. Graceful timeout handling

A new catch clause handles the timeout without crashing:

```csharp
catch (OperationCanceledException) when (cts.IsCancellationRequested)
{
_logger.LogWarning("Activity processing timed out after {Timeout}: Id={Id}",
_processActivityTimeout, activity.Id);
}
```

This prevents `BotHandlerException` from being thrown when the timeout fires, which is a recoverable situation (the handler simply took too long).

## Design Decisions

### Why not keep the HTTP token as a linked source?

Using `CancellationTokenSource.CreateLinkedTokenSource(cancellationToken)` would still propagate HTTP disconnection to the handler — defeating the purpose. The HTTP request completing is an **expected** event for streaming handlers, not an error signal.

### Why a timeout instead of `CancellationToken.None`?

Unbounded processing is a resource leak risk. A timeout provides a safety net:
- Prevents handlers from running indefinitely if the LLM or external service hangs.
- Gives operators a tuning knob via `ProcessActivityTimeout`.
- Preserves the existing `Debugger.IsAttached` → `CancellationToken.None` escape hatch for debugging.

### Why handle this at the framework level?

- Every streaming bot would need the same workaround in user code.
- The framework owns the token plumbing and is the right place to define its semantics.
- Non-streaming bots are unaffected — 5 minutes is generous for synchronous handlers and can be reduced via options.

### Impact on non-streaming bots

Non-streaming handlers that complete within the HTTP request lifetime are unaffected. The 5-minute default is well above typical synchronous handler durations. Apps that want tighter timeouts can set `ProcessActivityTimeout` to a lower value.

## Alternatives Considered

| Alternative | Drawback |
|---|---|
| Catch `TaskCanceledException` in each sample/handler | Pushes framework responsibility to every consumer; easy to forget |
| Use `CancellationToken.None` unconditionally | No timeout safety net; runaway handlers can leak resources |
| Expose a `bool IsStreaming` flag to switch behavior | Over-engineered; all handlers benefit from decoupling |
| Let ASP.NET Core's `RequestTimeout` middleware handle it | That controls the *HTTP* timeout, not the *handler processing* timeout — different concerns |
12 changes: 11 additions & 1 deletion core/src/Microsoft.Teams.Bot.Core/BotApplication.cs
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ public class BotApplication
private readonly ILogger<BotApplication> _logger;
private readonly ConversationClient? _conversationClient;
private readonly UserTokenClient? _userTokenClient;
private readonly TimeSpan _processActivityTimeout = TimeSpan.FromMinutes(5);
internal TurnMiddleware MiddleWare { get; }

/// <summary>
Expand Down Expand Up @@ -46,6 +47,7 @@ public BotApplication(ConversationClient conversationClient, UserTokenClient use
MiddleWare = new TurnMiddleware();
_conversationClient = conversationClient;
_userTokenClient = userTokenClient;
_processActivityTimeout = options.ProcessActivityTimeout;
logger.LogInformationGuarded("Started {ThisType} listener for AppID:{AppId} with SDK version {SdkVersion}", GetType().Name, options.AppId, Version);
}

Expand Down Expand Up @@ -101,11 +103,19 @@ public virtual async Task ProcessAsync(HttpContext httpContext, CancellationToke
using (_logger.BeginScope("ActivityType={ActivityType} ActivityId={ActivityId} ServiceUrl={ServiceUrl} MSCV={MSCV}",
activity.Type, activity.Id, activity.ServiceUrl, httpContext.Request.GetCorrelationVector()))
{
// Use a dedicated timeout instead of the HTTP request's cancellation token.
// The HTTP token fires when the client disconnects, which is expected for
// streaming handlers that outlive the original request.
using var cts = new CancellationTokenSource(_processActivityTimeout);
try
{
CancellationToken token = Debugger.IsAttached ? CancellationToken.None : cancellationToken;
CancellationToken token = Debugger.IsAttached ? CancellationToken.None : cts.Token;
await MiddleWare.RunPipelineAsync(this, activity, this.OnActivity, 0, token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (cts.IsCancellationRequested)
{
_logger.LogWarning("Activity processing timed out after {Timeout}: Id={Id}", _processActivityTimeout, activity.Id);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error processing activity: Id={Id}", activity.Id);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,12 @@ public sealed class BotApplicationOptions
/// Gets or sets the application (client) ID, used for logging and diagnostics.
/// </summary>
public string AppId { get; set; } = string.Empty;

/// <summary>
/// Gets or sets the maximum time allowed for processing an incoming activity.
/// This timeout replaces the HTTP request's cancellation token so that handlers
/// (especially streaming handlers) are not canceled when the incoming HTTP connection closes.
/// Defaults to 5 minutes. Set to <see cref="Timeout.InfiniteTimeSpan"/> to disable the timeout.
/// </summary>
public TimeSpan ProcessActivityTimeout { get; set; } = TimeSpan.FromMinutes(5);
}
10 changes: 6 additions & 4 deletions core/test/Microsoft.Teams.Bot.Core.UnitTests/MiddlewareTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -156,11 +156,13 @@ public async Task Middleware_ReceivesCancellationToken()

DefaultHttpContext httpContext = CreateHttpContextWithActivity(activity);

CancellationTokenSource cts = new();

await botApp.ProcessAsync(httpContext, cts.Token);
await botApp.ProcessAsync(httpContext);

Assert.Equal(cts.Token, receivedToken);
// ProcessAsync creates its own timeout-based CancellationToken instead of
// forwarding the caller's token, so the middleware should receive a valid
// (non-default) token that is not yet cancelled.
Assert.NotEqual(default, receivedToken);
Assert.False(receivedToken.IsCancellationRequested);
}

[Fact]
Expand Down
Loading