
fix(apps): resolve streaming error handling for 2-minute timeout & content stream not allowed #453

Open
Athosone wants to merge 10 commits into microsoft:main from Athosone:fix/http-stream-rate-limit

Conversation

@Athosone

@Athosone Athosone commented Jun 6, 2026


PR has evolved due to a number of raised issues wrt streaming error handling. This PR solely focuses on handling the 403 for the 2-minute timeout, and surfacing the "Content stream is not allowed" error.

This table lists all the streaming error codes; this PR is concerned with the 403s:
https://learn.microsoft.com/en-us/microsoftteams/platform/bots/streaming-ux?tabs=csharp#error-codes

  1. Content stream is not allowed -> Error is surfaced, not as a cancelled error
  2. Content stream is not allowed on an already completed streamed message -> Not possible with the SDK
  3. Content stream finished due to exceeded streaming time. -> Handled in this PR. A new regular message is sent instead.
  4. Content stream was canceled by user -> Already handled today
  5. Request streamed content should contain the previously streamed content -> Not possible with the SDK

TO BE DISCUSSED:

  1. Bursty informative updates - continued discussion in HttpStream sends streaming activities faster than the 1 request/second limit #452 to reach an ideal solution
  2. Message size too large -> To be discussed in smoother error handling for "message size too large" on stream #488

OLD DESCRIPTION:

Fixes #452

Problem

HttpStream can send streaming activities faster than the Teams 1 request/second
per-stream limit. Teams then throttles the stream and returns
403 ContentStreamNotAllowed, which _send converts to StreamCancelledError
and the user sees as a red error toast.

Two causes in http_stream.py:

  1. _flush() sent every queued informative update back to back, then the text
    chunk. A burst of update() calls became a burst of POSTs in one flush.
  2. The only pacing was a call_later(0.5, ...) reschedule, armed only when a
    backlog survived a flush. Once the queue drained, _timeout was None, so the
    next emit() fired an immediate flush with no pacing.

Fix

A per-stream leaky-slot limiter (make_limiter in utils/limiter.py), gating
every chunk send in _send_activity. Its next_slot is instance state shared
across flushes, so it paces both the in-flush burst and the next emit() after
the queue drains. The first send is not delayed, so the progress bar still
appears promptly.

  • min_send_interval on HttpStream.__init__ (default 1.0s) lets callers buffer
    toward the docs' 1.5 to 2s advice.
  • coalesce_informative_updates defaults to True: a burst of informative
    updates in one flush collapses to the latest, matching the docs' "one
    informative message, reused for each update". Set it False to pace out every
    update at 1 req/s instead (see Notes).
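
For readers who have not opened the diff, the limiter described above can be sketched roughly as follows. This is a minimal illustration, not the code from `utils/limiter.py`: it uses the later `make_limiter(interval)` shape mentioned in this thread, and the body (including the `next_slot` bookkeeping) is an assumption reconstructed from the description.

```python
import asyncio
import time
from typing import Awaitable, Callable


def make_limiter(interval: float) -> Callable[[], Awaitable[None]]:
    """Return an async acquire() that spaces calls at least `interval` seconds apart.

    Sketch only: the real make_limiter in the PR may differ in detail.
    """
    if interval < 0:
        raise ValueError("interval must be >= 0")

    next_slot = 0.0  # earliest monotonic time at which the next send may go out

    async def acquire() -> None:
        nonlocal next_slot
        now = time.monotonic()
        wait = next_slot - now
        # Reserve the following slot before sleeping so callers queue up in order.
        next_slot = max(now, next_slot) + interval
        if wait > 0:
            # Only sleep when the send would actually be too soon.
            await asyncio.sleep(wait)

    return acquire
```

Because `next_slot` starts in the past, the first `acquire()` returns immediately, and `make_limiter(0)` never sleeps, which matches the "first send is not delayed" and "interval 0 disables pacing" behaviour described in this PR.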

Tests

Limiter spacing and input validation; rapid update() burst coalesces to the
latest by default; with coalesce_informative_updates=False every update is sent
in order, none dropped, paced at the interval; emits are paced across flushes (the
post-drain case); the interval is configurable; the flag never drops text.
Existing test_http_stream.py behaviour is unchanged. Full packages/apps suite
passes, ruff and pyright clean.

Notes

  • The limiter is acquired in _send, so every HTTP attempt is paced: retries and
    close()'s final send included. acquire() only waits when a send would
    actually be too soon, so close pays no latency unless it lands within the
    interval of the last chunk. Pass min_send_interval=0 to disable pacing.
  • With coalesce_informative_updates=False, _flush holds its lock across the
    paced sleeps, so a very long informative burst could keep the lock long enough
    that close() times out. The default (coalesce on) avoids this.
  • The same gap likely exists in the sibling TS and .NET SDKs; not addressed here.

Copilot AI review requested due to automatic review settings June 6, 2026 21:22

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Introduces a reusable async rate limiter and integrates it into HttpStream to pace outbound activity sends, with additional tests covering pacing and coalescing behavior.

Changes:

  • Added make_limiter() utility for async rate limiting.
  • Updated HttpStream to enforce a per-stream minimum send interval and optionally coalesce informative updates.
  • Added/updated tests to validate pacing, ordering, and coalescing semantics.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/apps/src/microsoft_teams/apps/utils/limiter.py | New async limiter utility based on monotonic time and asyncio.sleep(). |
| packages/apps/src/microsoft_teams/apps/utils/__init__.py | Exposes make_limiter via package exports. |
| packages/apps/src/microsoft_teams/apps/http_stream.py | Adds pacing/coalescing knobs and enforces pacing via limiter in _send_activity. |
| packages/apps/tests/test_limiter.py | New unit test validating limiter spacing semantics. |
| packages/apps/tests/test_http_stream.py | Updates existing tests to use shorter intervals; adds new pacing/coalescing tests. |

from typing import Awaitable, Callable


def make_limiter(rate: int, period: float = 1.0) -> Callable[[], Awaitable[None]]:

Author

Already handled: make_limiter raises ValueError for rate < 1 and period < 0, so rate == 0 (ZeroDivisionError) and negative periods fail fast at construction. period == 0 is allowed on purpose and means "no pacing" (interval 0), which is how callers disable the limiter.
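
The validation described in this reply could look something like the sketch below. The helper name `limiter_interval` is hypothetical, and deriving the interval as `period / rate` is an assumption inferred from the mention of `ZeroDivisionError`; the actual checks live inside `make_limiter` in the PR.

```python
def limiter_interval(rate: int, period: float = 1.0) -> float:
    """Validate (rate, period) and return the minimum spacing between sends.

    Hypothetical helper: the name and the period / rate formula are assumptions.
    """
    if rate < 1:
        # Fail fast instead of hitting ZeroDivisionError below when rate == 0.
        raise ValueError("rate must be >= 1")
    if period < 0:
        raise ValueError("period must be >= 0")
    # period == 0 is allowed on purpose: an interval of 0 means "no pacing".
    return period / rate
```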

Comment thread packages/apps/src/microsoft_teams/apps/utils/limiter.py Outdated
Comment thread packages/apps/src/microsoft_teams/apps/http_stream.py Outdated
Comment thread packages/apps/tests/test_http_stream.py Outdated
HttpStream could send streaming activities faster than the Teams
1 request/second per-stream limit, which gets the stream throttled and
returned as 403 ContentStreamNotAllowed (surfaced to the user as a red
error toast). Two causes: _flush() sent every queued informative update
back to back plus the text chunk, and the only pacing was a 0.5s
reschedule that was dropped once the queue drained, so the next emit()
fired immediately.

Add a per-stream leaky-slot limiter (make_limiter) and gate every chunk
send through it in _send_activity. Because the limiter state is shared
across flushes, it paces both the in-flush burst and the next post-drain
emit. min_send_interval is exposed (default 1.0s) so callers can buffer
toward the docs' 1.5-2s advice. A burst of informative updates in one
flush coalesces to the latest by default (coalesce_informative_updates,
matching the docs' one-reused-message guidance); set it False to pace
out every update instead.
@Athosone Athosone force-pushed the fix/http-stream-rate-limit branch from 813b322 to 402286b Compare June 6, 2026 21:34
@Athosone

Athosone commented Jun 6, 2026

Author

@microsoft-github-policy-service agree company="Michelin"

Acquire the per-stream limiter in _send instead of _send_activity so every
HTTP attempt is paced, including retries and close()'s final send. This keeps
a retry or the final message from landing within the interval of the last
chunk and tripping the Teams 1 req/s throttle. acquire() only waits when a
send would actually be too soon, so close() pays no latency after an idle gap;
min_send_interval=0 disables pacing. Tests that don't assert pacing construct
with min_send_interval=0 to stay fast.

Addresses Copilot review feedback on microsoft#453.
Comment thread packages/apps/src/microsoft_teams/apps/http_stream.py Outdated
Comment thread packages/apps/src/microsoft_teams/apps/http_stream.py Outdated
Comment thread packages/apps/src/microsoft_teams/apps/http_stream.py Outdated
Comment thread packages/apps/src/microsoft_teams/apps/utils/limiter.py Outdated
@Athosone

Author

Thanks a lot for taking the time to review @lilyydu I will come back to you soon.

Btw I noticed two other issues while playing with it:

  1. The 2-minute timeout is hard-coded for the streaming response. This is a problem for us: we run a chat bot that helps our support users resolve incidents, do RCA, etc., and it can sometimes take a while to gather all the required info. Would you mind if I opened a new issue/PR to address this? The fix would be to let the user configure the timeout via the app config.
  2. When we send a payload to the stream and it is too big, we receive a 413 Entity Too Large. The fix I would like to propose is that HttpStream chunks the payload and sends it in batches of "MAX_SIZE". I can also open a PR/issue if you are OK with it.

However, I wonder whether HttpStream should carry that responsibility alone. Are these constraints tied only to it?
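
To make the second proposal concrete, here is a rough sketch of what such a chunker might look like. Everything in it is an assumption: `MAX_SIZE` is a placeholder value rather than the real Teams limit, `chunk_text` is a hypothetical name, and the sketch only budgets the text itself, not the full activity payload.

```python
from typing import Iterator, List

MAX_SIZE = 16  # hypothetical byte budget, kept tiny so the example is easy to follow


def chunk_text(text: str, max_bytes: int = MAX_SIZE) -> Iterator[str]:
    """Yield pieces of `text` whose UTF-8 encoding fits within `max_bytes` each."""
    if max_bytes < 4:
        # A single code point can take up to 4 bytes in UTF-8.
        raise ValueError("max_bytes must be at least 4")
    current: List[str] = []
    size = 0
    for ch in text:
        ch_size = len(ch.encode("utf-8"))
        if current and size + ch_size > max_bytes:
            # Adding this character would overflow the budget: emit and start over.
            yield "".join(current)
            current, size = [], 0
        current.append(ch)
        size += ch_size
    if current:
        yield "".join(current)
```

Splitting on characters rather than raw bytes keeps multi-byte characters intact. How the resulting pieces would then be sent (as stream chunks or as separate messages) is left open here, as it is in the comment above.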

…ault, simpler limiter

- Plumb stream_min_send_interval and stream_coalesce_informative_updates
  through AppOptions -> ActivitySender -> HttpStream
- Flip coalesce_informative_updates default to False so informative
  updates are paced, not silently dropped; coalescing is now opt-in
- Simplify make_limiter(rate, period) to make_limiter(interval) and
  reword its docstring as a fixed-interval gate (token bucket of size 1)
@heyitsaamir

Collaborator

Thanks for the push here!

Some thoughts:

  1. Let’s not expose configurable flags for this. Informative updates should be coalesced by default because they’re status replacements, not cumulative content. If we skip intermediate informative updates, let’s log only the count at debug level.

  2. I think HttpStream should have two modes:

    • informative mode: before text streaming starts, send the latest pending informative update
    • text mode: once text streaming starts, do not send informative updates anymore

Can we still use the same retry path and the same send pacing/limiter for informative updates, text chunks, retries, and the final close?

@heyitsaamir

Collaborator

Thanks a lot for taking the time to review @lilyydu I will come back to you soon.

Btw I noticed two other issues while playing with it:

  1. The 2-minute timeout is hard-coded for the streaming response. This is a problem for us: we run a chat bot that helps our support users resolve incidents, do RCA, etc., and it can sometimes take a while to gather all the required info. Would you mind if I opened a new issue/PR to address this? The fix would be to let the user configure the timeout via the app config.
  2. When we send a payload to the stream and it is too big, we receive a 413 Entity Too Large. The fix I would like to propose is that HttpStream chunks the payload and sends it in batches of "MAX_SIZE". I can also open a PR/issue if you are OK with it.

However, I wonder whether HttpStream should carry that responsibility alone. Are these constraints tied only to it?

Yeah these are outlined in the official streaming docs. Admittedly, the streaming system here could help by just closing the stream and just doing regular updates.
Were you thinking something along the same lines?

I reached out to the folks that implemented streaming to ask what their preferred recommendation would be for these cases.

@Athosone

Author

Hi!
Thanks for your feedback.

If this is a platform limit that you cannot increase, then yes, we could fall back to updating the message (this is what I am doing in my app at the moment, but the experience is not as nice as with a stream :p).

For your comment on this PR, I totally agree and will update the code: removing the flag from the app config, coalescing the informative updates, and applying the limiter to the different code paths that emit bytes to the platform.

It matches my experience, in the sense that the user doesn't really care about the next narrative update of an agent or the result of a tool call if the answer is already ready to be sent.

Athosone and others added 3 commits June 10, 2026 21:42
…stream modes

Informative updates are status replacements, not cumulative content, so a
burst now always collapses to the latest one (skipped count logged at debug
level) instead of being paced out one-by-one or gated behind a flag.

The stream now has two explicit modes: informative mode sends the latest
pending informative update per flush; once text streaming starts the stream
permanently switches to text mode and informative updates are dropped. Text
landing in the same flush as informative updates supersedes them.

Removes the stream_min_send_interval / stream_coalesce_informative_updates
plumbing from AppOptions and ActivitySender; pacing stays fixed at the Teams
1 req/s limit and all sends keep going through the same limiter/retry path.
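
The two-mode behaviour described in this commit message can be sketched as a pure function that decides what a single flush should send. This is an illustrative reconstruction, not the code in `http_stream.py`: `StreamState`, `plan_flush`, and the tuple-based queue are all hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


@dataclass
class StreamState:
    text: str = ""           # cumulative text streamed so far
    text_mode: bool = False  # flips to True permanently once text arrives


def plan_flush(state: StreamState, queued: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the single activity to send for this flush, or None for a no-op.

    Queue items are ("informative", message) or ("text", chunk).
    """
    latest_informative: Optional[str] = None
    informative_seen = 0
    added_text = False
    for kind, payload in queued:
        if kind == "text":
            state.text += payload
            state.text_mode = True
            added_text = True
        elif kind == "informative":
            informative_seen += 1
            latest_informative = payload
        else:
            raise ValueError(f"unknown queue item kind: {kind!r}")

    if added_text:
        # Text landing in the same flush supersedes every informative update.
        skipped, result = informative_seen, ("streaming", state.text)
    elif state.text_mode or latest_informative is None:
        # Text mode drops informative updates; nothing new to send.
        skipped, result = informative_seen, None
    else:
        # Informative mode: only the latest pending update goes out.
        skipped, result = informative_seen - 1, ("informative", latest_informative)

    if skipped:
        logger.debug("Skipped %d informative update(s)", skipped)
    return result
```

Returning `None` for an informative-only flush in text mode avoids re-sending unchanged cumulative text, so no limiter slot is spent on a no-op request.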
…eping

An informative-only flush after text streaming starts used to fall through
and re-send the unchanged cumulative text, burning a paced limiter slot
under the flush lock for a no-op request. The chunk send is now guarded on
text actually having been added by this flush.

Also from the cleanup pass: track the latest informative update with a
scalar and counter instead of building a list that only ever yields its
last element, drop the unreachable start_length branch, use the typed
ChannelData.stream_type field instead of getattr, trim a comment that
restated the limiter docs, and deduplicate test boilerplate behind
_await_pending_flush/_track_send_times helpers (tests with no timing
assertions now run with pacing disabled).
@lilyydu

lilyydu commented Jun 11, 2026

Collaborator

Hey @Athosone,

We're still waiting on guidance from internal folks, but I'm gonna test out your implementation in the meantime. Thanks for responding to our requests!

@lilyydu

lilyydu commented Jun 15, 2026

Collaborator

Hey @Athosone,

Can you share what error message you get back for the 403 you are receiving from the 1 request /sec limitation? There's a couple that fall into that bucket here https://learn.microsoft.com/en-us/microsoftteams/platform/bots/streaming-ux?tabs=csharp#error-codes and want to make sure we cover our bases

@lilyydu

lilyydu commented Jun 23, 2026

Collaborator

Hey @Athosone,

Can you share what error message you get back for the 403 you are receiving from the 1 request /sec limitation? There's a couple that fall into that bucket here https://learn.microsoft.com/en-us/microsoftteams/platform/bots/streaming-ux?tabs=csharp#error-codes and want to make sure we cover our bases

@Athosone soft bump so we can unblock you!

@Athosone

Author

Sorry about that, I totally missed the notification.

I got different error messages:

The bot failed to complete the streaming process within the strict time limit of two minutes

And

The bot sent a message that exceeds the current [message size](https://learn.microsoft.com/en-us/microsoftteams/platform/bots/how-to/format-your-bot-messages) restriction

For the first one, I morph the stream into a message.

For the second, I implemented a chunker so that my adapter hides the complexity from my main code.
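
The first workaround (morphing the stream into a regular message) might be sketched like this. The names are all hypothetical: `StreamTimedOutError` stands in for whatever error signals the two-minute limit, and the two callables stand in for the streaming and non-streaming send paths.

```python
import asyncio
from typing import Awaitable, Callable


class StreamTimedOutError(Exception):
    """Hypothetical error for the two-minute streaming limit."""


async def send_final(
    text: str,
    send_stream_chunk: Callable[[str], Awaitable[None]],
    send_regular_message: Callable[[str], Awaitable[None]],
) -> str:
    """Try the streaming path first; fall back to a plain message on timeout.

    Returns which path delivered the text, to make the behaviour easy to observe.
    """
    try:
        await send_stream_chunk(text)
        return "stream"
    except StreamTimedOutError:
        # The stream can no longer be updated, so deliver the full text instead.
        await send_regular_message(text)
        return "message"
```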

@lilyydu

lilyydu commented Jun 24, 2026

Collaborator

Hey @Athosone ,

First, thanks for all the investigation and work you've done here! Streaming can definitely be improved so we highly appreciate your feedback and suggestions. A number of issues have been raised so we think it's best to isolate each into its own respective PR

In regard to the bursty informative updates, I repro'd the code snippets you attached, and it does result in 429s but these are handled via our retry logic. However, we do agree that we should have some mechanism to control the bursty behaviour. I tried out the fix you're proposing here with 100 informative updates but it coalesced into one single msg - not sure if that's the ideal experience too. Let's continue bouncing ideas in #452 for this

The 403s you were receiving are due to the 2 errors you attached above (2 minute limitation and chunk size). We currently return all 403s as "cancelled" which does not properly showcase the error table I linked above. I'm going to update this PR to surface the 2 minute error instead and then open up additional PRs following after to surface the other errors.

Apologies for the delay with resolving this but glad we are surfacing and inching closer to a better streaming exp :)

lilyydu and others added 2 commits June 24, 2026 15:36
Removes the per-stream leaky-bucket limiter and associated changes,
resetting the affected files to their main baseline. This provides a
clean starting point for a new implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lilyydu lilyydu changed the title from "fix(apps): rate-limit HttpStream to 1 req/s streaming limit" to "fix(apps): resolve streaming error handling for 2-minute timeout" Jun 26, 2026
@lilyydu

lilyydu commented Jun 26, 2026

Collaborator

Updated the description:

PR has evolved due to a number of raised issues wrt streaming error handling. This PR solely focuses on handling the 403 for the 2-minute timeout, and surfacing the "Content stream is not allowed" error.

This table lists all the streaming error codes; this PR is concerned with the 403s:
https://learn.microsoft.com/en-us/microsoftteams/platform/bots/streaming-ux?tabs=csharp#error-codes

  1. Content stream is not allowed -> Error is surfaced, not as a cancelled error
  2. Content stream is not allowed on an already completed streamed message -> Not possible with the SDK
  3. Content stream finished due to exceeded streaming time. -> Handled in this PR. A new regular message is sent instead.
  4. Content stream was canceled by user -> Already handled today
  5. Request streamed content should contain the previously streamed content -> Not possible with the SDK

TO BE DISCUSSED:

  1. Bursty informative updates - continued discussion in HttpStream sends streaming activities faster than the 1 request/second limit #452 to reach an ideal solution
  2. Message size too large -> To be discussed in smoother error handling for "message size too large" on stream #488

@lilyydu lilyydu changed the title from "fix(apps): resolve streaming error handling for 2-minute timeout" to "fix(apps): resolve streaming error handling for 2-minute timeout & content stream not allowed" Jun 26, 2026
Comment on lines +33 to +34
stream_min_send_interval: float = 1.0,
stream_coalesce_informative_updates: bool = False,

Collaborator

stream_coalesce_informative_updates should always be true. Let's not expose this.

def __init__(
    self,
    client: Client,
    stream_min_send_interval: float = 1.0,

Collaborator

also don't think we need this.

Comment on lines +353 to +357
if message != "Content stream was cancelled by user.":
if message == "Content stream finished due to exceeded streaming time.":
self._timed_out = True
logger.warning("Teams encountered an error while streaming. Sending as a regular message.")
raise StreamCancelledError(message) from e

Collaborator

This will raise the StreamCancelled error to the handler. We should probably raise a different error here.

Can we just do a switch statement here for various types of canceled errors? Also point to the learn docs that describe these.
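
One possible shape for this, sketched as a dispatch table (the closest Python equivalent to a switch that works on older interpreters). Only the two message strings quoted in the diff are mapped; `StreamError` and `StreamTimedOutError` are hypothetical names, and `StreamCancelledError` here is a stand-in for the SDK's existing class.

```python
from typing import Dict, Type

# Error table in the Teams streaming docs (linked in the PR description).
DOCS_URL = (
    "https://learn.microsoft.com/en-us/microsoftteams/platform/bots/"
    "streaming-ux?tabs=csharp#error-codes"
)


class StreamError(Exception):
    """Hypothetical base class for 403 streaming errors."""


class StreamCancelledError(StreamError):
    """Stand-in for the SDK's existing cancellation error."""


class StreamTimedOutError(StreamError):
    """Hypothetical error for the two-minute streaming limit."""


# Exact-match table; strings are the ones quoted in the diff hunk above.
_ERRORS_BY_MESSAGE: Dict[str, Type[StreamError]] = {
    "Content stream was cancelled by user.": StreamCancelledError,
    "Content stream finished due to exceeded streaming time.": StreamTimedOutError,
}


def error_for_403(message: str) -> StreamError:
    """Map a 403 error message to a specific exception, defaulting to StreamError."""
    error_type = _ERRORS_BY_MESSAGE.get(message, StreamError)
    return error_type(f"{message} (see {DOCS_URL})")
```

A handler could then `raise error_for_403(message) from e`, so callers can catch the timeout separately from a user cancellation.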

if e.response.status_code == 403:
    error = e.response.json().get("error", {})
    message = error.get("message", "")
    if message != "Content stream was cancelled by user.":

Collaborator

let's also make sure that the backend has tests for these particular strings

Collaborator

(and maybe a more resilient error could be to just check for phrases like "canceled", "exceeded streaming time" etc.)
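
A phrase-based check along those lines could look roughly like the sketch below. The function name and the returned labels are hypothetical; the phrases come from the error messages quoted in this thread, and both spellings of "canceled" are accepted because both appear in it.

```python
def classify_stream_403(message: str) -> str:
    """Classify a 403 streaming error by phrase rather than exact string.

    Returns one of "cancelled", "timed_out", "not_allowed", or "unknown".
    """
    text = message.lower()
    if "canceled" in text or "cancelled" in text:
        return "cancelled"
    if "exceeded streaming time" in text:
        return "timed_out"
    if "not allowed" in text:
        return "not_allowed"
    return "unknown"
```

Matching on phrases tolerates small wording or punctuation changes on the backend, at the cost of possibly misclassifying a future message that happens to contain one of the phrases.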



Development

Successfully merging this pull request may close these issues.

HttpStream sends streaming activities faster than the 1 request/second limit

4 participants