.NET: Fixing issue where OpenTelemetry span is never exported in .NET in-process workflow execution #4196
Open
alliscode wants to merge 6 commits into microsoft:main
…ity never stopped in streaming OffThread path

The WorkflowRunActivity_IsStopped_Streaming_OffThread test demonstrates that the workflow.run OpenTelemetry Activity created in StreamingRunEventStream.RunLoopAsync is started but never stopped when using the OffThread/Default streaming execution. The background run loop keeps running after event consumption completes, so the using Activity? declaration never disposes until explicit StopAsync() is called.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2. Fix workflow.run Activity never stopped in streaming OffThread execution (microsoft#4155)

The workflow.run OpenTelemetry Activity in StreamingRunEventStream.RunLoopAsync was scoped to the method lifetime via 'using'. Since the run loop only exits on cancellation, the Activity was never stopped/exported until explicit disposal.

Fix: Remove 'using' and explicitly dispose the Activity when the workflow reaches Idle status (all supersteps complete). A safety-net disposal in the finally block handles cancellation and error paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
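The pattern described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual `StreamingRunEventStream` code: the class name, the `ActivitySource` name, and the two delegates standing in for "wait for input" and "run supersteps until Idle" are assumptions made for this example.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class RunLoopSketch
{
    // Hypothetical source name, chosen for this sketch only.
    public const string SourceName = "Sketch.Workflows.RunLoop";
    private static readonly ActivitySource Source = new(SourceName);

    // Both delegates are hypothetical stand-ins for the real run loop's work.
    public static async Task RunLoopAsync(
        Func<CancellationToken, Task> waitForInputAsync,
        Func<CancellationToken, Task> runSuperstepsAsync,
        CancellationToken cancellationToken)
    {
        // Deliberately no 'using': with 'using', the span would only stop when
        // this long-lived method returns, which happens only on cancellation.
        // StartActivity returns null when no listener is sampling the source.
        Activity? activity = Source.StartActivity("workflow.run");
        try
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                await waitForInputAsync(cancellationToken).ConfigureAwait(false);
                await runSuperstepsAsync(cancellationToken).ConfigureAwait(false);

                // The workflow is now idle: stop the span so it can be exported.
                activity?.Dispose();
                activity = null;
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way out of the loop in this sketch.
        }
        finally
        {
            // Safety net for cancellation and error paths before the first Idle.
            activity?.Dispose();
        }
    }
}
```

Setting `activity` to `null` after the first disposal keeps the `finally` block from touching an already-stopped span.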
TaoChenOSU
reviewed
Feb 23, 2026
dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs
TaoChenOSU
reviewed
Feb 23, 2026
dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs
Pull request overview
This PR aims to ensure OpenTelemetry workflow-run spans (Activity) are reliably stopped/disposed (and therefore exported) during .NET in-process workflow execution, including streaming scenarios, and adds regression tests around activity lifecycle behavior.
Changes:
- Updated `StreamingRunEventStream.RunLoopAsync` to manually manage the workflow-run `Activity` lifecycle (stop on `Idle` and ensure disposal on loop exit).
- Added `WorkflowRunActivityStopTests` to assert workflow-run activities are started and stopped across multiple execution modes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs | Changes workflow-run Activity disposal timing to stop/export spans earlier and adds a safety-net disposal on exit. |
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/WorkflowRunActivityStopTests.cs | Adds regression coverage validating workflow-run activities are stopped/disposed in lockstep, off-thread, and streaming usage. |
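A regression check of the kind described in the table could be built on `ActivityListener`, which reports every start and stop for a given source. The helper below is only a sketch of that idea and is not taken from `WorkflowRunActivityStopTests.cs`; the `ActivityCapture` type and its members are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical test helper: records started and stopped activities for one source.
public sealed class ActivityCapture : IDisposable
{
    private readonly ActivityListener _listener;

    public ConcurrentBag<Activity> Started { get; } = new();
    public ConcurrentBag<Activity> Stopped { get; } = new();

    public ActivityCapture(string sourceName)
    {
        _listener = new ActivityListener
        {
            // Only observe the source under test.
            ShouldListenTo = source => source.Name == sourceName,
            // Sample everything so StartActivity returns non-null activities.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            ActivityStarted = activity => Started.Add(activity),
            ActivityStopped = activity => Stopped.Add(activity),
        };
        ActivitySource.AddActivityListener(_listener);
    }

    public void Dispose() => _listener.Dispose();
}
```

A test would then run a workflow with the capture in place and assert that `Stopped` contains every activity in `Started`.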
TaoChenOSU
reviewed
Feb 23, 2026
dotnet/src/Microsoft.Agents.AI.Workflows/Execution/StreamingRunEventStream.cs
…

Implements two-level telemetry hierarchy per PR feedback from lokitoth:
- workflow.session: spans the entire run loop / stream lifetime
- workflow_invoke: per input-to-halt cycle, nested within the session

This ensures the session activity stays open across multiple turns, while individual run activities are created and disposed per cycle.

Also fixes linkedSource CancellationTokenSource disposal leak in StreamingRunEventStream (added using declaration).
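The two-level hierarchy can be illustrated with nested activities, where each per-cycle span is started while the session span is current and so becomes its child. This is a simplified, synchronous sketch under assumed names (`SessionSketch`, the source name, and the `sketch.input` tag are hypothetical); only the activity names `workflow.session` and `workflow_invoke` come from the commit message.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Diagnostics;

public static class SessionSketch
{
    // Hypothetical source name, chosen for this sketch only.
    public const string SourceName = "Sketch.Workflows.Session";
    private static readonly ActivitySource Source = new(SourceName);

    public static void RunSession(IEnumerable<string> inputs)
    {
        // Outer span: stays open for the whole session (all turns).
        using Activity? session = Source.StartActivity("workflow.session");

        foreach (string input in inputs)
        {
            // Inner span: one per input-to-halt cycle. The 'using' declaration
            // is scoped to the loop body, so it stops at the end of each turn.
            using Activity? run = Source.StartActivity("workflow_invoke");
            run?.SetTag("sketch.input", input); // hypothetical tag name

            // ... process the input until the workflow halts ...
        }
    }
}
```

Here `using` is safe for the inner span because its scope matches the cycle it measures, unlike a `using` scoped to a run loop that exits only on cancellation.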
Member, Author
@copilot open a new pull request to apply changes based on the comments in this thread
…dd error tag

1. LockstepRunEventStream: Remove 'using' from Activity in async iterator and manually dispose in finally block (fixes microsoft#4155 pattern). Also dispose linkedSource CTS in finally to prevent leak.
2. Tags.cs: Add ErrorMessage ("error.message") tag for runtime errors, distinct from BuildErrorMessage ("build.error.message").
3. ActivityNames: Rename WorkflowRun from "workflow_invoke" to "workflow.run" for cross-language consistency.
4. WorkflowTelemetryContext: Fix XML doc to say "outer/parent span" instead of "root-level span".
5. ObservabilityTests: Assert WorkflowSession absence when DisableWorkflowRun is true.
6. WorkflowRunActivityStopTests: Fix streaming test race by disposing StreamingRun before asserting activities are stopped.
7. StreamingRunEventStream/LockstepRunEventStream: Use Tags.ErrorMessage instead of Tags.BuildErrorMessage for runtime error events.
…urce, move SessionStarted earlier

- Revert ActivityNames.WorkflowRun back to "workflow_invoke" (OTEL semantic convention contract)
- Use 'using' declaration for linkedSource CTS in LockstepRunEventStream (no timing sensitivity)
- Move SessionStarted event before WaitForInputAsync in StreamingRunEventStream to match Lockstep behavior
This pull request addresses the issue where workflow run telemetry spans (`Activity` objects) were not always properly stopped and exported, particularly in streaming and lockstep execution environments. The changes ensure that workflow run activities are disposed as soon as the workflow reaches the idle state or when the run loop exits, preventing telemetry data from being lost. Additionally, comprehensive regression tests are added to verify correct activity lifecycle management.

Improvements to Activity Lifecycle Management:
- The `workflow.run` `Activity` is disposed immediately when the workflow reaches the `Idle` state, so telemetry spans are promptly exported rather than waiting for cancellation or disposal.
- A safety-net disposal stops the `workflow.run` `Activity` if it was not already stopped when the run loop exits, covering cancellation and error scenarios.
- Removed the `using` statement from the activity initialization to allow manual control over the activity's disposal timing.

Testing and Regression Coverage:
- Added `WorkflowRunActivityStopTests.cs` to verify that workflow run activities are always properly stopped and exported to telemetry backends, covering lockstep, off-thread, and streaming execution environments, as well as ensuring that all started activities are stopped.

Closes #4155
Contribution Checklist