.NET: [Bug]: DurableTask: SuperstepState.AccumulatedEvents overflows CustomStatus 16 KB cap on multi-executor workflows with typed outputs #5745

@mfalkiewicz

Description

Summary

For Durable Functions workflows built on Microsoft.Agents.AI.DurableTask, the orchestrator-level SuperstepState.AccumulatedEvents accumulates serialized executor events across every superstep and projects the cumulative list into Durable Functions' CustomStatus via SetCustomStatus(LiveStatus). Durable Functions caps CustomStatus at 16 KB (UTF-16), so any workflow whose executor lifecycle events sum to more than ~16 KB hard-fails at the orchestrator level once that ceiling is crossed. There is no public hook to trim, suppress, or compact AccumulatedEvents from runtime code.

Observed in production usage on Microsoft.Agents.AI.DurableTask 1.5.0-preview.260507.1 with Microsoft.Agents.AI.Hosting.AzureFunctions.

The mechanism (decompiled from 1.5.0-preview.260507.1)

Microsoft.Agents.AI.DurableTask ships an internal SuperstepState whose AccumulatedEvents list lives for the entire orchestration:

// SuperstepState
public List<string> AccumulatedEvents { get; } = new List<string>();
public DurableWorkflowLiveStatus LiveStatus { get; } = new DurableWorkflowLiveStatus();

The orchestrator's per-executor loop appends each activity's serialized events:

state.AccumulatedEvents.AddRange(executorResultInfo.Events);
flag |= executorResultInfo.HaltRequested;
if (!context.IsReplaying)
{
    PublishEventsToLiveStatus(context, state);
}

PublishEventsToLiveStatus then projects the cumulative list directly into CustomStatus:

private static void PublishEventsToLiveStatus(TaskOrchestrationContext context, SuperstepState state)
{
    state.LiveStatus.Events = state.AccumulatedEvents;
    context.SetCustomStatus((object)state.LiveStatus);
}

executorResultInfo.Events comes from each activity's IWorkflowContext.OutboundEvents, which carries ExecutorInvokedEvent, ExecutorCompletedEvent, and WorkflowOutputEvent entries. The WorkflowOutputEvent payload contains the executor's typed output, so across N executors AccumulatedEvents grows by the sum of every executor's serialized events and is never trimmed during the orchestration.
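The growth pattern can be reproduced outside MAF with a small simulation. This is a minimal sketch, not the library's code: CustomStatusGrowth, Utf16Size, and FirstOverflowingSuperstep are hypothetical names, the per-step payload is a placeholder string, and it assumes the cap is checked against the UTF-16 byte count of the JSON-serialized status object.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class CustomStatusGrowth
{
    // Cap quoted in the error message: 16 KB, measured in UTF-16.
    public const int CapBytes = 16 * 1024;

    // UTF-16 byte count of the JSON that would be stored as CustomStatus.
    // Assumption: the status is serialized as an object with an Events array.
    public static int Utf16Size(IReadOnlyList<string> events) =>
        Encoding.Unicode.GetByteCount(JsonSerializer.Serialize(new { Events = events }));

    // Returns the 1-based superstep at which the cumulative payload first
    // exceeds the cap, or -1 if it never does within the given supersteps.
    public static int FirstOverflowingSuperstep(int supersteps, int charsPerStep)
    {
        var accumulated = new List<string>();
        for (int step = 1; step <= supersteps; step++)
        {
            // Stand-in for one executor's serialized events (hypothetical payload).
            accumulated.Add(new string('x', charsPerStep));
            if (Utf16Size(accumulated) > CapBytes)
            {
                return step;
            }
        }
        return -1;
    }
}
```

With roughly 1,400 characters of serialized events per executor (about 2.7 KB in UTF-16), CustomStatusGrowth.FirstOverflowingSuperstep(10, 1400) crosses the cap at the sixth step, even though each individual contribution is far below 16 KB.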

Repro

Any MAF workflow registered against the AzureFunctions companion host with:

  • 6+ scoped executors chained sequentially
  • typed outputs in the 1–4 KB serialized range (i.e. records with several fields plus a moderately sized string property, common for document-extraction / structured-response workflows)
  • run via the Functions companion (Durable Functions backed)

Such a workflow fails at the orchestrator step that runs PublishEventsToLiveStatus once the cumulative size crosses 16 KB:

Microsoft.Azure.WebJobs.Extensions.DurableTask:
CustomStatus is too large: limit = 16 KB (UTF-16), actual = 17.32 KB.

A concrete repro is available in connells-tech/ai-agent-runtime-dusty: the document-processing workflow (src/Dusty.WorkflowFunctions/DocumentProcessingFunctionsWorkflow.cs) chains 6 executors (entry-unpack → validate-inputs → load-configuration → extract-layout → analyse-document → validate-outputs). The orchestrator fails at superstep 7 with exactly the message above. Each activity's own event payload is well under the cap; only the cumulative total exceeds it.

Why per-activity workarounds don't fix it

ConnellsGroup.Ai.Agent.Base ships an opt-in ScopedWorkflowExecutor with enableStatusSlimming: true that clears ExecutorInvokedEvent and prior ExecutorCompletedEvent from IWorkflowContext.OutboundEvents after the application handler returns. This trims what executorResultInfo.Events will contain when MAF reads OutboundEvents post-return. However, the orchestrator's AccumulatedEvents already holds events from prior supersteps, and those cannot be reached from inside an activity. Verified empirically: the same workflow fails at the same byte count (17.32 KB) with and without the slimming hook enabled.

Manual reflection workarounds against OutboundEvents have the same limitation: they can only affect the current activity's contribution. The orchestrator's SuperstepState is not exposed to executors.

Proposed fixes (any one would unblock)

  1. Make AccumulatedEvents trim-able from a public hook. Expose an option on WorkflowOptions (or equivalent) like OnSuperstepCompleted: Func<IReadOnlyList<string>, IReadOnlyList<string>> that lets the host filter/compact the accumulated event log before PublishEventsToLiveStatus runs. Lowest-risk; orchestrators that don't opt in retain current behavior.

  2. Reset AccumulatedEvents per superstep. Change PublishEventsToLiveStatus to publish only the current superstep's events, on the grounds that Durable history retains everything for after-the-fact retrieval, and CustomStatus is the live-status projection. Replace the cumulative semantic with a per-superstep one. Behavioral change but matches the way external streaming consumers typically poll (latest delta, not full history).

  3. Skip serializing executor lifecycle events into CustomStatus by default. Most production consumers care about LiveStatus.PendingEvents (HITL request ports) and final results, not per-executor ExecutorInvokedEvent/ExecutorCompletedEvent payloads. An opt-in to omit lifecycle event payloads from LiveStatus.Events while keeping PendingEvents and result metadata would solve this for the common case.

Option 1 is the most flexible and the smallest behavioral change; option 3 is the easiest to recommend by default since the data is duplicated in history anyway.
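To make option 1 concrete, here is a sketch of a compaction function a host could plug into such a hook. The hook itself does not exist today; OnSuperstepCompleted is only the shape proposed above, and EventCompaction, KeepNewestWithinBudget, and the budget value are hypothetical. The sketch measures each event's UTF-16 size and ignores JSON framing overhead, so a real budget would need some headroom below 16 KB.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EventCompaction
{
    // Keeps the newest events whose combined UTF-16 size fits within budgetBytes.
    // Original order is preserved; the oldest events are dropped first.
    public static IReadOnlyList<string> KeepNewestWithinBudget(
        IReadOnlyList<string> events, int budgetBytes)
    {
        var kept = new List<string>();
        int used = 0;

        // Walk backwards so the most recent events are admitted first.
        for (int i = events.Count - 1; i >= 0; i--)
        {
            int size = Encoding.Unicode.GetByteCount(events[i]);
            if (used + size > budgetBytes)
            {
                break;
            }
            used += size;
            kept.Add(events[i]);
        }

        kept.Reverse(); // restore chronological order
        return kept;
    }
}
```

A host would then supply something like `events => EventCompaction.KeepNewestWithinBudget(events, 12 * 1024)` as the Func<IReadOnlyList<string>, IReadOnlyList<string>> from option 1. Dropping the oldest events first fits a live-status view, on the assumption stated in option 2 that Durable history still retains the full event log.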

Workarounds we're using today

  • enableStatusSlimming: true on ScopedWorkflowExecutor (helps for small workflows; insufficient here)
  • Synchronous extraction path that bypasses the orchestrator entirely (works, but loses the durable-execution semantics this whole satellite exists to provide)
  • Splitting workflows into smaller chained orchestrations (significant refactor)

Impact

Any consumer building a non-trivial structured-output workflow on Microsoft.Agents.AI.DurableTask + Azure Functions will hit this once executor count × per-executor output size exceeds the ~16 KB envelope. The threshold is reached quickly for document processing, multi-step extraction, or any workflow with typed outputs larger than a few hundred bytes per executor.

Versions

  • Microsoft.Agents.AI.DurableTask 1.5.0-preview.260507.1
  • Microsoft.Agents.AI.Workflows 1.5.0-preview.260507.1
  • Microsoft.Agents.AI.Hosting.AzureFunctions 1.5.0-preview.260507.1
  • Microsoft.Azure.Functions.Worker (Azure Functions Core Tools 4.9.0)
  • Microsoft.Azure.WebJobs.Extensions.DurableTask 3.8.2

Filed via Claude Code on behalf of @mfalkiewicz (Connells Group) — happy to provide a minimal repro repo / additional reflection-dump if useful.
