Python: Fix incorrect workflow timings in DevUI by adding `created_at` to executor events by moonbox3 · Pull Request #5615 · microsoft/agent-framework

moonbox3 · 2026-05-04T02:27:58Z

Motivation and Context

Workflow steps in DevUI appeared to take seconds apart even when they completed nearly instantly. This was caused by the frontend falling back to a synthetic timestamp that forced a minimum 1-second gap between events, because executor-related events emitted by the backend contained no created_at field.

Fixes #5545

Description

The root cause was two-fold: CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent had no created_at field, so the mapper emitted them without timestamps; and workflow-view.tsx synthesized _uiTimestamp using Math.max(Math.floor(Date.now()/1000), lastTimestamp + 1) — integer-second precision with a forced +1 s gap — never consulting event.created_at. The fix adds an optional created_at: float field to both event models, populates it with time.time() in the mapper for all three executor event paths (invoked, completed, failed), and updates both call sites in workflow-view.tsx to prefer event.created_at over the synthetic fallback. A regression test was added to test_mapper.py to ensure executor-mapped events always carry a created_at timestamp going forward.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…orkflow timings (microsoft#5545) CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent lacked a created_at field, causing the frontend to synthesize timestamps using integer-second precision with a forced +1s minimum gap between events. This made instant workflows appear to take 3+ seconds in the DevUI timeline. Fix: - Add optional created_at: float | None field to both custom event models - Populate created_at=float(time.time()) in the mapper for executor_invoked, executor_completed, and executor_failed events Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

workflow-view.tsx synthesized _uiTimestamp using Math.max(baseTimestamp, lastTimestamp + 1) with integer-second precision, forcing a minimum 1-second gap between every sequential event. This made instant workflows appear to take several seconds in the DevUI timeline. The fix prefers event.created_at (a float Unix timestamp populated by the backend mapper for all executor events) and only falls back to the synthetic timestamp when created_at is absent. This matches the pattern already used in devuiStore.ts:addDebugEvent. Added a regression test in test_mapper.py verifying that the mapper attaches created_at to all executor lifecycle events (invoked, completed, failed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3 · 2026-05-04T02:30:35Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
TOTAL	31050	3592	88%

report-only-changed-files is enabled. No files were changed during this commit :)

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
6235	30 💤	0 ❌	0 🔥	1m 43s ⏱️

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 91% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The PR correctly fixes the workflow timing bug by (1) adding created_at: float | None = None to CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent, (2) having the mapper populate created_at=float(time.time()) for all executor events, and (3) updating workflow-view.tsx to prefer the real event timestamp over the synthesized Math.max(baseTimestamp, lastTimestamp+1) fallback. The devuiStore.ts already had the same created_at-preference pattern, so the PR makes workflow-view.tsx consistent. The old bug-characterization tests are properly replaced by a passing regression test in test_mapper.py. No correctness issues found.

✓ Security Reliability

The fix is correct and complete: created_at fields are added to CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent, the mapper populates them with float(time.time()) for all three executor event paths, and the frontend now prefers the event's own created_at over the synthesized lastTimestamp + 1 value. One minor reliability concern: when created_at is present, the previous monotonic guarantee (Math.max(baseTimestamp, lastTimestamp + 1)) is bypassed. In the rare case of out-of-order event delivery or clock skew, two events could share the same _uiTimestamp. In practice, given sequential mapper execution and Python's float(time.time()) sub-second precision, this risk is negligible. The test reorganisation is sound: the deleted bug-repro tests are superseded by the new passing regression test in test_mapper.py.

✓ Test Coverage

The PR deletes a dedicated bug-documenting test file (test_workflow_timings_bug.py) and replaces it with a well-formed regression test in test_mapper.py, while also fixing the frontend timestamp logic to prefer created_at over synthesized values. The mapper already sets created_at=float(time.time()) unconditionally on all three executor event paths (lines 1059, 1092, 1126 of _mapper.py) and both custom model classes now declare created_at as an optional float field (lines 67, 81 of _openai_custom.py). The new test correctly uses an isolated mapper2 to avoid context-state cross-contamination for the failed path, which is an improvement over the deleted test. One minor gap: the deleted file contained test_custom_event_models_lack_created_at_field, which directly asserted that created_at is present in CustomResponseOutputItemAddedEvent.model_fields and CustomResponseOutputItemDoneEvent.model_fields. That model-level guard is not replaced by the new mapper-output test; if created_at were removed from either model, the mapper would raise a ValidationError (not produce None), so in practice the gap is caught differently, but the direct model-field assertion provided a clearer signal. No blocking issues found. The frontend change has no automated test coverage, but there is no existing frontend test infrastructure in the repo.

✗ Design Approach

The PR moves the UI toward using real event timestamps, but the approach is still too narrow for the workflow timeline contract it supports today. workflow-view.tsx now only looks for a top-level created_at, while the still-supported response.workflow_event.completed fallback shape carries its time in data.timestamp from the mapper. Because execution-timeline.tsx renders those fallback events from _uiTimestamp, they will continue to pick up the old synthetic Math.max(baseTimestamp, lastTimestamp + 1) clock and can still show fabricated 1-second gaps.

Flagged Issues

workflow-view.tsx only normalizes top-level created_at, but the supported fallback event response.workflow_event.completed is produced by the mapper with data.timestamp (_mapper.py:1241-1248). execution-timeline.tsx uses _uiTimestamp for that fallback path (:258-260, :343-369), so those workflow events still receive the fabricated lastTimestamp + 1 synthetic timeline. The timestamp normalization in workflow-view.tsx should also consume openAIEvent.data.timestamp before synthesizing a value.

Suggestions

The new test test_executor_events_carry_created_at_timestamp asserts getattr(event, 'created_at', None) is not None but does not verify the value is a valid Unix timestamp. Add assert event.created_at > 0 to guard against zero or negative values being emitted.
Using created_at directly removes the previous monotonic guarantee. If two events share the same float timestamp (e.g., rapid mapper calls on a low-resolution clock), _uiTimestamp will collide. Consider a tiebreaker: const uniqueTimestamp = eventTimestamp !== undefined ? Math.max(eventTimestamp, lastTimestamp) : Math.max(baseTimestamp, lastTimestamp + 1);
The deleted test_custom_event_models_lack_created_at_field directly verified created_at is present in model_fields for both custom event classes. Consider adding a model-level assertion to test_mapper.py or a dedicated model test so that accidentally removing the field produces a clear named failure rather than a downstream ValidationError.
Consider consolidating timestamp normalization into a single helper that reads created_at, then data.timestamp, then falls back to synthesis. This fixes the broader event-shape problem rather than only the custom executor-item path.

Automated review by moonbox3's agents

Copilot

Pull request overview

Fixes incorrect workflow step timings in the DevUI by ensuring executor-related streamed events include a usable timestamp and teaching the frontend to prefer it.

Changes:

Add optional created_at to DevUI custom output-item event models (response.output_item.added / done).
Populate created_at for executor invoked/completed/failed events in the Python DevUI mapper.
Update workflow-view.tsx to prefer event.created_at over the synthetic +1s timestamp scheme, and add a regression test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
python/packages/devui/tests/devui/test_mapper.py	Adds regression test asserting executor-mapped events always carry `created_at`.
python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx	Prefers `created_at` when generating `_uiTimestamp` for timeline/debug display.
python/packages/devui/agent_framework_devui/models/_openai_custom.py	Extends custom output-item event models with optional `created_at`.
python/packages/devui/agent_framework_devui/_mapper.py	Sets `created_at=time.time()` on executor output-item events.

- Read data.timestamp (ISO string) and response.created_at in addition to top-level created_at when deriving _uiTimestamp, so response.workflow_event.completed events get a real server timestamp instead of a synthesized one - Change uniqueTimestamp tiebreaker: when a real server timestamp is available use Math.max(eventTimestamp, lastTimestamp) rather than lastTimestamp + 1, eliminating artificial 1-second gaps while still preserving monotonic ordering - Apply the same fix in the HIL streaming path (second setOpenAIEvents call in workflow-view.tsx) - Add assert event.created_at > 0 to regression test to guard against zero or negative timestamps - Add test_custom_output_item_event_models_have_created_at_field model- level test so removing the field produces a clear named failure rather than a downstream ValidationError Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 89%

✗ Correctness

The new timestamp extraction logic correctly adds response.created_at (branch 2) and data.timestamp (branch 3) fallbacks, and the test additions are sound. However, both instances of the data.timestamp branch (lines 591 and 1022) produce a raw new Date(...).getTime() / 1000 that can be NaN when the ISO string is unparseable (e.g., Python microsecond precision without a timezone indicator in edge environments). Because NaN !== undefined evaluates to true, the subsequent eventTimestamp !== undefined guard on lines 602–603 and 1033–1034 does not intercept it, Math.max(NaN, lastTimestamp) returns NaN, and _uiTimestamp is silently set to NaN, corrupting timeline ordering for every downstream consumer.

✗ Security Reliability

The multi-source timestamp extraction logic introduces a reliability hole in the data.timestamp branch: new Date(anyInvalidString).getTime() / 1000 silently produces NaN. Because NaN !== undefined is true, the eventTimestamp !== undefined guard does not catch this, so Math.max(NaN, lastTimestamp) propagates NaN as _uiTimestamp. This breaks timeline rendering for that event (execution-timeline.tsx line 260: event._uiTimestamp * 1000 becomes NaN) and resets the monotonic ordering seed (NaN || 0 → 0) for all subsequent events. The same pattern appears in both duplicate blocks at lines 590–591 and 1021–1022. The Python test additions are correct and well-scoped.

✓ Test Coverage

The Python-side test additions are solid: the new created_at > 0 assertion on executor events is meaningful, and the model-field presence test guards against accidental field removal. The mapper does populate created_at=float(time.time()) for both executor event types, so these tests will exercise real behaviour. One gap: the referenced failing test (test_workflow_timings_bug.py) is absent from the source tree (only a stale .pyc exists), meaning there is no test verifying the two new frontend timestamp-extraction paths (response.created_at and data.timestamp). The frontend changes themselves are logically sound, but they are entirely untested. Additionally, the test_custom_output_item_event_models_have_created_at_field test does not verify that the mapper actually populates created_at with a non-None value when constructing these events — it only checks the model schema. A companion integration-style assertion (similar to test_executor_events_carry_created_at_timestamp) would close that gap.

✗ Design Approach

I found one design-level regression in the frontend timing change. Removing the forced monotonic increment for real timestamps fixes the visible 1-second gaps, but the fallback workflow timeline still treats _uiTimestamp as part of a synthetic per-run identifier. That means same-second executor events can now collapse onto the same synthetic ID, which reintroduces incorrect merging of fallback runs and their output buckets.

Flagged Issues

Lines 590–591 and 1021–1022: new Date(isoString).getTime() / 1000 yields NaN for unparseable strings (e.g. Python's datetime.now().isoformat() with microsecond precision and no Z). NaN !== undefined is true, so the eventTimestamp !== undefined guard does not catch it. Math.max(NaN, lastTimestamp) returns NaN, setting _uiTimestamp: NaN, which breaks timeline rendering at execution-timeline.tsx:260 and silently resets the monotonic ordering seed at workflow-view.tsx:597 (NaN || 0 → 0). Guard the parsed value with Number.isFinite() before use.
Line 603: replacing the forced monotonic increment with Math.max(eventTimestamp, lastTimestamp) removes the uniqueness guarantee the fallback timeline depends on. execution-timeline.tsx:359–360 builds syntheticItemId from executorId + uiTimestamp, and keys run output/state off that ID at lines 367, 382, and 408. Two runs of the same executor within the same second now share an identical _uiTimestamp and their fallback entries collide. Ordering and identity should be decoupled — use sequence_number or a local counter to guarantee uniqueness.

Suggestions

Add test_workflow_timings_bug.py to the source tree (currently only a stale .pyc exists). Without it there is no regression guard for the response.created_at and data.timestamp frontend extraction paths.
Extend test_custom_output_item_event_models_have_created_at_field (or add a companion test) to assert that mapper.convert_event(...) actually populates created_at with a non-None, positive value — mirroring the pattern in test_executor_events_carry_created_at_timestamp.

Automated review by moonbox3's agents

…, add regression tests - workflow-view.tsx (×2): Wrap data.timestamp ISO→number conversion in a Number.isFinite() guard. Python's datetime.now().isoformat() emits microseconds without a trailing 'Z' (e.g. '2024-01-15T12:34:56.123456'), which some JS engines cannot parse, returning NaN. NaN !== undefined is true so the eventTimestamp !== undefined guard did not catch it, poisoning _uiTimestamp and resetting the monotonic ordering seed (NaN || 0 → 0). - execution-timeline.tsx: Replace uiTimestamp in the fallback syntheticItemId with the per-executor runNumber counter. Two runs of the same executor within the same second previously received identical _uiTimestamp values and therefore identical syntheticItemIds, causing their output buckets, state, and run entries to collide (execution-timeline.tsx:360–408). - Add missing test_workflow_timings_bug.py source file (only a stale .pyc existed). Three regression tests: · test_custom_event_models_lack_created_at_field – model field guard · test_workflow_executor_events_lack_created_at – mapper populates created_at · test_rapid_workflow_events_have_no_top_level_timestamps – confirms data.timestamp format that requires the frontend NaN guard Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…imings in DevUI are incorrect

…ated bug file - Delete test_workflow_timings_bug.py; tests belong in existing module files - The two tests already present in test_mapper.py (test_executor_events_carry_created_at_timestamp and test_custom_output_item_event_models_have_created_at_field) cover the same ground as the first two tests in the deleted file - Add test_executor_completed_maps_to_output_item_done_event to test_mapper.py, replacing the third test from the deleted file with a generic, issue-agnostic name and docstring Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot and others added 2 commits May 4, 2026 02:16

Copilot AI review requested due to automatic review settings May 4, 2026 02:27

moonbox3 self-assigned this May 4, 2026

moonbox3 added the python label May 4, 2026

Copilot started reviewing on behalf of moonbox3 May 4, 2026 02:28 View session

github-actions Bot reviewed May 4, 2026

View reviewed changes

moonbox3 commented May 4, 2026

View reviewed changes

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated

Copilot AI reviewed May 4, 2026

View reviewed changes

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated

moonbox3 commented May 4, 2026

View reviewed changes

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated

Copilot and others added 2 commits May 4, 2026 02:54

Address review feedback for microsoft#5545: Python: [Bug]: Workflow t…

0ba2089

…imings in DevUI are incorrect

moonbox3 commented May 4, 2026

View reviewed changes

Comment thread python/packages/devui/tests/devui/test_workflow_timings_bug.py Outdated

Comment thread python/packages/devui/tests/devui/test_workflow_timings_bug.py Outdated

Copilot and others added 2 commits May 4, 2026 04:46

Address review feedback for microsoft#5545: review comment fixes

eb30420

eavanvalkenburg approved these changes May 4, 2026

View reviewed changes

moonbox3 enabled auto-merge May 4, 2026 05:14

TaoChenOSU approved these changes May 4, 2026

View reviewed changes

moonbox3 added this pull request to the merge queue May 4, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026

eavanvalkenburg added this pull request to the merge queue May 5, 2026

Merged via the queue into microsoft:main with commit 27f9266 May 5, 2026
33 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Fix incorrect workflow timings in DevUI by adding `created_at` to executor events#5615

Python: Fix incorrect workflow timings in DevUI by adding `created_at` to executor events#5615
eavanvalkenburg merged 7 commits into
microsoft:mainfrom
moonbox3:agent/fix-5545-1

moonbox3 commented May 4, 2026

Uh oh!

moonbox3 commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot left a comment

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

moonbox3 commented May 4, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Automated Code Review

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

✗ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✗ Correctness

✗ Security Reliability

✓ Test Coverage

✗ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

moonbox3 commented May 4, 2026 •

edited

Loading