Skip to content

Python: Fix incorrect workflow timings in DevUI by adding created_at to executor events#5615

Merged
eavanvalkenburg merged 7 commits into
microsoft:mainfrom
moonbox3:agent/fix-5545-1
May 5, 2026
Merged

Python: Fix incorrect workflow timings in DevUI by adding created_at to executor events#5615
eavanvalkenburg merged 7 commits into
microsoft:mainfrom
moonbox3:agent/fix-5545-1

Conversation

@moonbox3
Copy link
Copy Markdown
Contributor

@moonbox3 moonbox3 commented May 4, 2026

Motivation and Context

Workflow steps in DevUI appeared to take seconds apart even when they completed nearly instantly. This was caused by the frontend falling back to a synthetic timestamp that forced a minimum 1-second gap between events, because executor-related events emitted by the backend contained no created_at field.

Fixes #5545

Description

The root cause was two-fold: CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent had no created_at field, so the mapper emitted them without timestamps; and workflow-view.tsx synthesized _uiTimestamp using Math.max(Math.floor(Date.now()/1000), lastTimestamp + 1) — integer-second precision with a forced +1 s gap — never consulting event.created_at. The fix adds an optional created_at: float field to both event models, populates it with time.time() in the mapper for all three executor event paths (invoked, completed, failed), and updates both call sites in workflow-view.tsx to prefer event.created_at over the synthetic fallback. A regression test was added to test_mapper.py to ensure executor-mapped events always carry a created_at timestamp going forward.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

Copilot and others added 2 commits May 4, 2026 02:16
…orkflow timings (microsoft#5545)

CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent lacked a
created_at field, causing the frontend to synthesize timestamps using integer-second
precision with a forced +1s minimum gap between events. This made instant workflows
appear to take 3+ seconds in the DevUI timeline.

Fix:
- Add optional created_at: float | None field to both custom event models
- Populate created_at=float(time.time()) in the mapper for executor_invoked,
  executor_completed, and executor_failed events

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
workflow-view.tsx synthesized _uiTimestamp using Math.max(baseTimestamp,
lastTimestamp + 1) with integer-second precision, forcing a minimum 1-second
gap between every sequential event. This made instant workflows appear to take
several seconds in the DevUI timeline.

The fix prefers event.created_at (a float Unix timestamp populated by the
backend mapper for all executor events) and only falls back to the synthetic
timestamp when created_at is absent. This matches the pattern already used in
devuiStore.ts:addDebugEvent.

Added a regression test in test_mapper.py verifying that the mapper attaches
created_at to all executor lifecycle events (invoked, completed, failed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 4, 2026 02:27
@moonbox3 moonbox3 self-assigned this May 4, 2026
@moonbox3 moonbox3 added the python label May 4, 2026
@moonbox3
Copy link
Copy Markdown
Contributor Author

moonbox3 commented May 4, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
TOTAL31050359288% 
report-only-changed-files is enabled. No files were changed during this commit :)

Python Unit Test Overview

Tests Skipped Failures Errors Time
6235 30 💤 0 ❌ 0 🔥 1m 43s ⏱️

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 91% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by moonbox3's agents

Copy link
Copy Markdown
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The PR correctly fixes the workflow timing bug by (1) adding created_at: float | None = None to CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent, (2) having the mapper populate created_at=float(time.time()) for all executor events, and (3) updating workflow-view.tsx to prefer the real event timestamp over the synthesized Math.max(baseTimestamp, lastTimestamp+1) fallback. The devuiStore.ts already had the same created_at-preference pattern, so the PR makes workflow-view.tsx consistent. The old bug-characterization tests are properly replaced by a passing regression test in test_mapper.py. No correctness issues found.

✓ Security Reliability

The fix is correct and complete: created_at fields are added to CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent, the mapper populates them with float(time.time()) for all three executor event paths, and the frontend now prefers the event's own created_at over the synthesized lastTimestamp + 1 value. One minor reliability concern: when created_at is present, the previous monotonic guarantee (Math.max(baseTimestamp, lastTimestamp + 1)) is bypassed. In the rare case of out-of-order event delivery or clock skew, two events could share the same _uiTimestamp. In practice, given sequential mapper execution and Python's float(time.time()) sub-second precision, this risk is negligible. The test reorganisation is sound: the deleted bug-repro tests are superseded by the new passing regression test in test_mapper.py.

✓ Test Coverage

The PR deletes a dedicated bug-documenting test file (test_workflow_timings_bug.py) and replaces it with a well-formed regression test in test_mapper.py, while also fixing the frontend timestamp logic to prefer created_at over synthesized values. The mapper already sets created_at=float(time.time()) unconditionally on all three executor event paths (lines 1059, 1092, 1126 of _mapper.py) and both custom model classes now declare created_at as an optional float field (lines 67, 81 of _openai_custom.py). The new test correctly uses an isolated mapper2 to avoid context-state cross-contamination for the failed path, which is an improvement over the deleted test. One minor gap: the deleted file contained test_custom_event_models_lack_created_at_field, which directly asserted that created_at is present in CustomResponseOutputItemAddedEvent.model_fields and CustomResponseOutputItemDoneEvent.model_fields. That model-level guard is not replaced by the new mapper-output test; if created_at were removed from either model, the mapper would raise a ValidationError (not produce None), so in practice the gap is caught differently, but the direct model-field assertion provided a clearer signal. No blocking issues found. The frontend change has no automated test coverage, but there is no existing frontend test infrastructure in the repo.

✗ Design Approach

The PR moves the UI toward using real event timestamps, but the approach is still too narrow for the workflow timeline contract it supports today. workflow-view.tsx now only looks for a top-level created_at, while the still-supported response.workflow_event.completed fallback shape carries its time in data.timestamp from the mapper. Because execution-timeline.tsx renders those fallback events from _uiTimestamp, they will continue to pick up the old synthetic Math.max(baseTimestamp, lastTimestamp + 1) clock and can still show fabricated 1-second gaps.

Flagged Issues

  • workflow-view.tsx only normalizes top-level created_at, but the supported fallback event response.workflow_event.completed is produced by the mapper with data.timestamp (_mapper.py:1241-1248). execution-timeline.tsx uses _uiTimestamp for that fallback path (:258-260, :343-369), so those workflow events still receive the fabricated lastTimestamp + 1 synthetic timeline. The timestamp normalization in workflow-view.tsx should also consume openAIEvent.data.timestamp before synthesizing a value.

Suggestions

  • The new test test_executor_events_carry_created_at_timestamp asserts getattr(event, 'created_at', None) is not None but does not verify the value is a valid Unix timestamp. Add assert event.created_at > 0 to guard against zero or negative values being emitted.
  • Using created_at directly removes the previous monotonic guarantee. If two events share the same float timestamp (e.g., rapid mapper calls on a low-resolution clock), _uiTimestamp will collide. Consider a tiebreaker: const uniqueTimestamp = eventTimestamp !== undefined ? Math.max(eventTimestamp, lastTimestamp) : Math.max(baseTimestamp, lastTimestamp + 1);
  • The deleted test_custom_event_models_lack_created_at_field directly verified created_at is present in model_fields for both custom event classes. Consider adding a model-level assertion to test_mapper.py or a dedicated model test so that accidentally removing the field produces a clear named failure rather than a downstream ValidationError.
  • Consider consolidating timestamp normalization into a single helper that reads created_at, then data.timestamp, then falls back to synthesis. This fixes the broader event-shape problem rather than only the custom executor-item path.

Automated review by moonbox3's agents

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect workflow step timings in the DevUI by ensuring executor-related streamed events include a usable timestamp and teaching the frontend to prefer it.

Changes:

  • Add optional created_at to DevUI custom output-item event models (response.output_item.added / done).
  • Populate created_at for executor invoked/completed/failed events in the Python DevUI mapper.
  • Update workflow-view.tsx to prefer event.created_at over the synthetic +1s timestamp scheme, and add a regression test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
python/packages/devui/tests/devui/test_mapper.py Adds regression test asserting executor-mapped events always carry created_at.
python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Prefers created_at when generating _uiTimestamp for timeline/debug display.
python/packages/devui/agent_framework_devui/models/_openai_custom.py Extends custom output-item event models with optional created_at.
python/packages/devui/agent_framework_devui/_mapper.py Sets created_at=time.time() on executor output-item events.

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated
Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated
- Read data.timestamp (ISO string) and response.created_at in addition
  to top-level created_at when deriving _uiTimestamp, so
  response.workflow_event.completed events get a real server timestamp
  instead of a synthesized one
- Change uniqueTimestamp tiebreaker: when a real server timestamp is
  available use Math.max(eventTimestamp, lastTimestamp) rather than
  lastTimestamp + 1, eliminating artificial 1-second gaps while still
  preserving monotonic ordering
- Apply the same fix in the HIL streaming path (second setOpenAIEvents
  call in workflow-view.tsx)
- Add assert event.created_at > 0 to regression test to guard against
  zero or negative timestamps
- Add test_custom_output_item_event_models_have_created_at_field model-
  level test so removing the field produces a clear named failure rather
  than a downstream ValidationError

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 89%

✗ Correctness

The new timestamp extraction logic correctly adds response.created_at (branch 2) and data.timestamp (branch 3) fallbacks, and the test additions are sound. However, both instances of the data.timestamp branch (lines 591 and 1022) produce a raw new Date(...).getTime() / 1000 that can be NaN when the ISO string is unparseable (e.g., Python microsecond precision without a timezone indicator in edge environments). Because NaN !== undefined evaluates to true, the subsequent eventTimestamp !== undefined guard on lines 602–603 and 1033–1034 does not intercept it, Math.max(NaN, lastTimestamp) returns NaN, and _uiTimestamp is silently set to NaN, corrupting timeline ordering for every downstream consumer.

✗ Security Reliability

The multi-source timestamp extraction logic introduces a reliability hole in the data.timestamp branch: new Date(anyInvalidString).getTime() / 1000 silently produces NaN. Because NaN !== undefined is true, the eventTimestamp !== undefined guard does not catch this, so Math.max(NaN, lastTimestamp) propagates NaN as _uiTimestamp. This breaks timeline rendering for that event (execution-timeline.tsx line 260: event._uiTimestamp * 1000 becomes NaN) and resets the monotonic ordering seed (NaN || 0 → 0) for all subsequent events. The same pattern appears in both duplicate blocks at lines 590–591 and 1021–1022. The Python test additions are correct and well-scoped.

✓ Test Coverage

The Python-side test additions are solid: the new created_at > 0 assertion on executor events is meaningful, and the model-field presence test guards against accidental field removal. The mapper does populate created_at=float(time.time()) for both executor event types, so these tests will exercise real behaviour. One gap: the referenced failing test (test_workflow_timings_bug.py) is absent from the source tree (only a stale .pyc exists), meaning there is no test verifying the two new frontend timestamp-extraction paths (response.created_at and data.timestamp). The frontend changes themselves are logically sound, but they are entirely untested. Additionally, the test_custom_output_item_event_models_have_created_at_field test does not verify that the mapper actually populates created_at with a non-None value when constructing these events — it only checks the model schema. A companion integration-style assertion (similar to test_executor_events_carry_created_at_timestamp) would close that gap.

✗ Design Approach

I found one design-level regression in the frontend timing change. Removing the forced monotonic increment for real timestamps fixes the visible 1-second gaps, but the fallback workflow timeline still treats _uiTimestamp as part of a synthetic per-run identifier. That means same-second executor events can now collapse onto the same synthetic ID, which reintroduces incorrect merging of fallback runs and their output buckets.

Flagged Issues

  • Lines 590–591 and 1021–1022: new Date(isoString).getTime() / 1000 yields NaN for unparseable strings (e.g. Python's datetime.now().isoformat() with microsecond precision and no Z). NaN !== undefined is true, so the eventTimestamp !== undefined guard does not catch it. Math.max(NaN, lastTimestamp) returns NaN, setting _uiTimestamp: NaN, which breaks timeline rendering at execution-timeline.tsx:260 and silently resets the monotonic ordering seed at workflow-view.tsx:597 (NaN || 0 → 0). Guard the parsed value with Number.isFinite() before use.
  • Line 603: replacing the forced monotonic increment with Math.max(eventTimestamp, lastTimestamp) removes the uniqueness guarantee the fallback timeline depends on. execution-timeline.tsx:359–360 builds syntheticItemId from executorId + uiTimestamp, and keys run output/state off that ID at lines 367, 382, and 408. Two runs of the same executor within the same second now share an identical _uiTimestamp and their fallback entries collide. Ordering and identity should be decoupled — use sequence_number or a local counter to guarantee uniqueness.

Suggestions

  • Add test_workflow_timings_bug.py to the source tree (currently only a stale .pyc exists). Without it there is no regression guard for the response.created_at and data.timestamp frontend extraction paths.
  • Extend test_custom_output_item_event_models_have_created_at_field (or add a companion test) to assert that mapper.convert_event(...) actually populates created_at with a non-None, positive value — mirroring the pattern in test_executor_events_carry_created_at_timestamp.

Automated review by moonbox3's agents

Comment thread python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx Outdated
Copilot and others added 2 commits May 4, 2026 02:54
…, add regression tests

- workflow-view.tsx (×2): Wrap data.timestamp ISO→number conversion in a
  Number.isFinite() guard.  Python's datetime.now().isoformat() emits
  microseconds without a trailing 'Z' (e.g. '2024-01-15T12:34:56.123456'),
  which some JS engines cannot parse, returning NaN.  NaN !== undefined is
  true so the eventTimestamp !== undefined guard did not catch it, poisoning
  _uiTimestamp and resetting the monotonic ordering seed (NaN || 0 → 0).

- execution-timeline.tsx: Replace uiTimestamp in the fallback syntheticItemId
  with the per-executor runNumber counter.  Two runs of the same executor
  within the same second previously received identical _uiTimestamp values
  and therefore identical syntheticItemIds, causing their output buckets,
  state, and run entries to collide (execution-timeline.tsx:360–408).

- Add missing test_workflow_timings_bug.py source file (only a stale .pyc
  existed).  Three regression tests:
    · test_custom_event_models_lack_created_at_field – model field guard
    · test_workflow_executor_events_lack_created_at – mapper populates created_at
    · test_rapid_workflow_events_have_no_top_level_timestamps – confirms
      data.timestamp format that requires the frontend NaN guard

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread python/packages/devui/tests/devui/test_workflow_timings_bug.py Outdated
Comment thread python/packages/devui/tests/devui/test_workflow_timings_bug.py Outdated
Copilot and others added 2 commits May 4, 2026 04:46
…ated bug file

- Delete test_workflow_timings_bug.py; tests belong in existing module files
- The two tests already present in test_mapper.py (test_executor_events_carry_created_at_timestamp
  and test_custom_output_item_event_models_have_created_at_field) cover the same ground as the
  first two tests in the deleted file
- Add test_executor_completed_maps_to_output_item_done_event to test_mapper.py, replacing the
  third test from the deleted file with a generic, issue-agnostic name and docstring

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@moonbox3 moonbox3 enabled auto-merge May 4, 2026 05:14
@moonbox3 moonbox3 added this pull request to the merge queue May 4, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue May 5, 2026
Merged via the queue into microsoft:main with commit 27f9266 May 5, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: Workflow timings in DevUI are incorrect

4 participants