Skip to content

.NET: Fix off-thread RunStatus race where GetStatusAsync can return Running after ResumeAsync halts#5412

Open
peibekwe wants to merge 3 commits intomainfrom
peibekwe/workflow-test-fix
Open

.NET: Fix off-thread RunStatus race where GetStatusAsync can return Running after ResumeAsync halts#5412
peibekwe wants to merge 3 commits intomainfrom
peibekwe/workflow-test-fix

Conversation

@peibekwe
Copy link
Copy Markdown
Contributor

Motivation and Context

In StreamingRunEventStream.RunLoopAsync, after every _inputWaiter.WaitForInputAsync wake-up, _runStatus was being flipped to Running unconditionally. This can lead stale signal to flip status back to running after consumer already observed halt signal.

Fixes #5411

Description

Only flip RunStatus to Running when there are messages to process.
LockstepRunEventStream is not affected — it runs supersteps synchronously on the consumer thread and has no comparable wake-up race.

This addresses one of the intermittent workflow test failures in the pipeline.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@peibekwe peibekwe self-assigned this Apr 21, 2026
Copilot AI review requested due to automatic review settings April 21, 2026 21:23
@moonbox3 moonbox3 added .NET workflows Related to Workflows in agent-framework labels Apr 21, 2026
@github-actions github-actions Bot changed the title Fix off-thread RunStatus race where GetStatusAsync can return Running after ResumeAsync halts .NET: Fix off-thread RunStatus race where GetStatusAsync can return Running after ResumeAsync halts Apr 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a race in the off-thread workflow run loop where RunStatus could transiently flip back to Running after a halt had already been observed, and adds a stress regression test to prevent reintroduction.

Changes:

  • Update StreamingRunEventStream.RunLoopAsync to set RunStatus.Running only when there are unprocessed messages to run.
  • Add an off-thread stress regression test that repeatedly runs the Step9 sample to catch the previously intermittent status race.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs Prevents stale input signals from transiently setting RunStatus back to Running when there is no work to process.
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/SampleSmokeTest.cs Adds a stress regression test targeting the Step9 multi-response resume scenario in off-thread execution.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@peibekwe peibekwe marked this pull request as ready for review April 21, 2026 23:08
@peibekwe peibekwe requested review from alliscode and lokitoth April 21, 2026 23:08
@peibekwe peibekwe added this pull request to the merge queue Apr 22, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 22, 2026
@peibekwe peibekwe added this pull request to the merge queue Apr 22, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

.NET workflows Related to Workflows in agent-framework

Projects

None yet

Development

Successfully merging this pull request may close these issues.

.NET: [Bug]: Workflows - Fix off-thread RunStatus race where GetStatusAsync can return Running after ResumeAsync halts

5 participants