Skip to content

.NET: [Bug]: ForeachExecutor loses iteration state (_index, _values) after checkpoint restore — loop exits immediately on resume #5009

@yurii-beketov

Description

@yurii-beketov

Description

Describe the bug

When a Question action (human-in-the-loop / ExternalInputRequest) is placed inside a Foreach loop in a declarative workflow, the loop terminates after the first iteration upon resume. Instead of continuing to the next item in the collection, the loop exits immediately after the user answers the question.

To reproduce

  1. Create a declarative workflow with a Foreach loop iterating over a collection of 3+ items
  2. Inside the loop body, include a Question action
  3. Execute the workflow — the first iteration runs, the Question pauses the workflow
  4. Resume the workflow by providing the user's answer
  5. Expected: The loop advances to iteration 2 and asks the question again
  6. Actual: The loop exits immediately after the first answer; no further iterations occur

Root cause analysis

ForeachExecutor stores its iteration state as private instance fields:

// ForeachExecutor.cs
private int _index;
private FormulaValue[] _values;

These fields are populated during ExecuteAsync (evaluates Model.Items into _values, resets _index = 0) and advanced in TakeNextAsync (_index++).

The problem: When a checkpoint is created (e.g., when a Question pauses the workflow), these fields are not persisted. On resume:

  1. The workflow is rebuilt from YAML → a new ForeachExecutor instance is created with _index = 0 and _values = [] (constructor defaults)
  2. OnCheckpointRestoredAsync is called, but ForeachExecutor does not override it — the base DeclarativeActionExecutor implementation only restores PowerFx formula scopes via WorkflowFormulaState.RestoreAsync
  3. ExecuteAsync is not re-invoked (the loop already started before the checkpoint)
  4. After the Question is answered, control flows back to TakeNextAsync, which checks:
    if (this.HasValue = this._index < this._values.Length)  // 0 < 0 → false
  5. HasValue = false → the loop exits

Contrast with QuestionExecutor, which correctly survives checkpoint/restore because it uses DurableProperty<T> for its mutable state:

// QuestionExecutor.cs — these survive checkpoints
private readonly DurableProperty<int> _promptCount = new(nameof(_promptCount));
private readonly DurableProperty<bool> _hasExecuted = new(nameof(_hasExecuted));

DurableProperty reads/writes through context.ReadStateAsync / context.QueueStateUpdateAsync, which are included in the checkpoint's StateManager export. ForeachExecutor uses plain private int/FormulaValue[] instead, making them invisible to the checkpoint system.

Suggested fix

ForeachExecutor should persist _index and restore both _index and _values across checkpoints. Two approaches:

Option A — Override OnCheckpointRestoredAsync:

// In ForeachExecutor:
private readonly DurableProperty<int> _durableIndex = new(nameof(_durableIndex));

public override async ValueTask TakeNextAsync(IWorkflowContext context, object? _, CancellationToken cancellationToken)
{
    if (this.HasValue = this._index < this._values.Length)
    {
        // ... existing code ...
        this._index++;
        await this._durableIndex.WriteAsync(context, this._index).ConfigureAwait(false);
    }
}

protected override async ValueTask OnCheckpointRestoredAsync(IWorkflowContext context, CancellationToken cancellationToken = default)
{
    await base.OnCheckpointRestoredAsync(context, cancellationToken);

    // Re-evaluate Items expression (same logic as ExecuteAsync)
    EvaluationResult<DataValue> expressionResult = this.Evaluator.GetValue(this.Model.Items);
    if (expressionResult.Value is TableDataValue tableValue)
    {
        this._values = [.. tableValue.Values.Select(value => value.Properties.Values.First().ToFormula())];
    }
    else
    {
        this._values = [expressionResult.Value.ToFormula()];
    }

    // Restore index from durable state
    this._index = await this._durableIndex.ReadAsync(context).ConfigureAwait(false);
}

Option B — Use OnCheckpointingAsync + OnCheckpointRestoredAsync to serialize/deserialize both _index and _values via context.QueueStateUpdateAsync / context.ReadStateAsync.

Additional context

  • This affects any action that triggers checkpoint/resume inside a foreach loop, not only Question — it would also impact RequestExternalInput, function calls that pause the workflow, etc.
  • The existing ForeachExecutorTest unit tests do not cover checkpoint/restore scenarios.
  • The ForeachTemplate.cs (T4 code-gen) has the same issue since it mirrors ForeachExecutor.

Version

Observed on the main branch of microsoft/agent-framework (dotnet, Microsoft.Agents.AI.Workflows.Declarative).

Code Sample

Error Messages / Stack Traces

Package Versions

Microsoft.Agents.AI.Workflows.Declarative: 1.0.0-rc2

.NET Version

.NET 10

Additional Context

No response

Metadata

Metadata

Assignees

Labels

.NETbugSomething isn't working

Type

No fields configured for Bug.

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions