Python: Fix incorrect workflow timings in DevUI by adding created_at to executor events #5615
…orkflow timings (microsoft#5545)

CustomResponseOutputItemAddedEvent and CustomResponseOutputItemDoneEvent lacked a created_at field, causing the frontend to synthesize timestamps using integer-second precision with a forced +1s minimum gap between events. This made instant workflows appear to take 3+ seconds in the DevUI timeline.

Fix:
- Add optional created_at: float | None field to both custom event models
- Populate created_at=float(time.time()) in the mapper for executor_invoked, executor_completed, and executor_failed events

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
workflow-view.tsx synthesized _uiTimestamp using Math.max(baseTimestamp, lastTimestamp + 1) with integer-second precision, forcing a minimum 1-second gap between every sequential event. This made instant workflows appear to take several seconds in the DevUI timeline.

The fix prefers event.created_at (a float Unix timestamp populated by the backend mapper for all executor events) and only falls back to the synthetic timestamp when created_at is absent. This matches the pattern already used in devuiStore.ts:addDebugEvent.

Added a regression test in test_mapper.py verifying that the mapper attaches created_at to all executor lifecycle events (invoked, completed, failed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
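The effect described in this commit can be reproduced in isolation. The sketch below is illustrative only: `ExecutorEvent`, `syntheticTimestamp`, `preferCreatedAt`, and `stampAll` are hypothetical names, not code from `workflow-view.tsx`. It assumes three events that all occur within 40 ms of each other and compares the timeline span under the old synthetic scheme with the span when `created_at` is preferred.

```typescript
// Hypothetical, simplified event shape; the real DevUI types carry more fields.
interface ExecutorEvent {
  type: string;
  created_at?: number; // float Unix seconds, present only when the backend sets it
}

// Original scheme: whole seconds with a forced +1 s step between events.
function syntheticTimestamp(baseTimestamp: number, lastTimestamp: number): number {
  return Math.max(baseTimestamp, lastTimestamp + 1);
}

// First fix: take the backend value when present, otherwise synthesize.
function preferCreatedAt(event: ExecutorEvent, baseTimestamp: number, lastTimestamp: number): number {
  return event.created_at ?? syntheticTimestamp(baseTimestamp, lastTimestamp);
}

type Picker = (event: ExecutorEvent, baseTimestamp: number, lastTimestamp: number) => number;

// Assign a timestamp to each event in order, threading the previous value through.
function stampAll(events: ExecutorEvent[], baseTimestamp: number, pick: Picker): number[] {
  let last = 0;
  return events.map((event) => {
    last = pick(event, baseTimestamp, last);
    return last;
  });
}

// Distance between the earliest and latest assigned timestamp.
function span(stamps: number[]): number {
  return Math.max(...stamps) - Math.min(...stamps);
}

const base = 1700000000; // stands in for Math.floor(Date.now() / 1000)
const events: ExecutorEvent[] = [
  { type: "response.output_item.added", created_at: base + 0.01 },
  { type: "response.output_item.done", created_at: base + 0.025 },
  { type: "response.output_item.added", created_at: base + 0.04 },
];

// Old scheme ignores created_at entirely: the three events land 1 s apart.
const oldSpan = span(stampAll(events, base, (_event, b, last) => syntheticTimestamp(b, last)));
// Preferring created_at keeps the real spacing of roughly 30 ms.
const newSpan = span(stampAll(events, base, preferCreatedAt));
```

Under these assumptions `oldSpan` is 2 seconds while `newSpan` stays near 0.03 seconds, which is the gap between a fabricated and a real timeline.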
moonbox3
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 90%
✓ Correctness
The PR correctly fixes the workflow timing bug by (1) adding `created_at: float | None = None` to `CustomResponseOutputItemAddedEvent` and `CustomResponseOutputItemDoneEvent`, (2) having the mapper populate `created_at=float(time.time())` for all executor events, and (3) updating `workflow-view.tsx` to prefer the real event timestamp over the synthesized `Math.max(baseTimestamp, lastTimestamp+1)` fallback. `devuiStore.ts` already had the same `created_at`-preference pattern, so the PR makes `workflow-view.tsx` consistent. The old bug-characterization tests are properly replaced by a passing regression test in `test_mapper.py`. No correctness issues found.
✓ Security Reliability
The fix is correct and complete: `created_at` fields are added to `CustomResponseOutputItemAddedEvent` and `CustomResponseOutputItemDoneEvent`, the mapper populates them with `float(time.time())` for all three executor event paths, and the frontend now prefers the event's own `created_at` over the synthesized `lastTimestamp + 1` value. One minor reliability concern: when `created_at` is present, the previous monotonic guarantee (`Math.max(baseTimestamp, lastTimestamp + 1)`) is bypassed. In the rare case of out-of-order event delivery or clock skew, two events could share the same `_uiTimestamp`. In practice, given sequential mapper execution and the sub-second precision of Python's `float(time.time())`, this risk is negligible. The test reorganisation is sound: the deleted bug-repro tests are superseded by the new passing regression test in `test_mapper.py`.
✓ Test Coverage
The PR deletes a dedicated bug-documenting test file (test_workflow_timings_bug.py) and replaces it with a well-formed regression test in test_mapper.py, while also fixing the frontend timestamp logic to prefer created_at over synthesized values. The mapper already sets created_at=float(time.time()) unconditionally on all three executor event paths (lines 1059, 1092, 1126 of _mapper.py) and both custom model classes now declare created_at as an optional float field (lines 67, 81 of _openai_custom.py). The new test correctly uses an isolated mapper2 to avoid context-state cross-contamination for the failed path, which is an improvement over the deleted test. One minor gap: the deleted file contained test_custom_event_models_lack_created_at_field, which directly asserted that created_at is present in CustomResponseOutputItemAddedEvent.model_fields and CustomResponseOutputItemDoneEvent.model_fields. That model-level guard is not replaced by the new mapper-output test; if created_at were removed from either model, the mapper would raise a ValidationError (not produce None), so in practice the gap is caught differently, but the direct model-field assertion provided a clearer signal. No blocking issues found. The frontend change has no automated test coverage, but there is no existing frontend test infrastructure in the repo.
✗ Design Approach
The PR moves the UI toward using real event timestamps, but the approach is still too narrow for the workflow timeline contract it supports today. `workflow-view.tsx` now only looks for a top-level `created_at`, while the still-supported `response.workflow_event.completed` fallback shape carries its time in `data.timestamp` from the mapper. Because `execution-timeline.tsx` renders those fallback events from `_uiTimestamp`, they will continue to pick up the old synthetic `Math.max(baseTimestamp, lastTimestamp + 1)` clock and can still show fabricated 1-second gaps.
Flagged Issues
- `workflow-view.tsx` only normalizes top-level `created_at`, but the supported fallback event `response.workflow_event.completed` is produced by the mapper with `data.timestamp` (`_mapper.py:1241-1248`). `execution-timeline.tsx` uses `_uiTimestamp` for that fallback path (:258-260, :343-369), so those workflow events still receive the fabricated `lastTimestamp + 1` synthetic timeline. The timestamp normalization in `workflow-view.tsx` should also consume `openAIEvent.data.timestamp` before synthesizing a value.
Suggestions
- The new test `test_executor_events_carry_created_at_timestamp` asserts `getattr(event, 'created_at', None) is not None` but does not verify the value is a valid Unix timestamp. Add `assert event.created_at > 0` to guard against zero or negative values being emitted.
- Using `created_at` directly removes the previous monotonic guarantee. If two events share the same float timestamp (e.g., rapid mapper calls on a low-resolution clock), `_uiTimestamp` will collide. Consider a tiebreaker: `const uniqueTimestamp = eventTimestamp !== undefined ? Math.max(eventTimestamp, lastTimestamp) : Math.max(baseTimestamp, lastTimestamp + 1);`
- The deleted `test_custom_event_models_lack_created_at_field` directly verified `created_at` is present in `model_fields` for both custom event classes. Consider adding a model-level assertion to `test_mapper.py` or a dedicated model test so that accidentally removing the field produces a clear named failure rather than a downstream `ValidationError`.
- Consider consolidating timestamp normalization into a single helper that reads `created_at`, then `data.timestamp`, then falls back to synthesis. This fixes the broader event-shape problem rather than only the custom executor-item path.
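The tiebreaker expression from the suggestions can be exercised on its own to see what it does and does not guarantee. This is a minimal sketch, assuming a hypothetical wrapper named `nextUiTimestamp` around that expression; the input sequence is made up and deliberately includes one out-of-order value and one event with no server timestamp.

```typescript
// Hypothetical wrapper around the tiebreaker expression quoted in the review.
function nextUiTimestamp(
  eventTimestamp: number | undefined,
  baseTimestamp: number,
  lastTimestamp: number,
): number {
  return eventTimestamp !== undefined
    ? Math.max(eventTimestamp, lastTimestamp) // real time, clamped so it never moves backwards
    : Math.max(baseTimestamp, lastTimestamp + 1); // synthetic fallback keeps the +1 s step
}

const baseTimestamp = 1700000000;
let lastTimestamp = 0;
const assigned: number[] = [];

// The third value is earlier than the second (out-of-order delivery);
// the fourth event carries no server timestamp at all.
const incoming: Array<number | undefined> = [
  baseTimestamp + 0.2,
  baseTimestamp + 0.5,
  baseTimestamp + 0.3,
  undefined,
];

for (const eventTimestamp of incoming) {
  lastTimestamp = nextUiTimestamp(eventTimestamp, baseTimestamp, lastTimestamp);
  assigned.push(lastTimestamp);
}
```

With these inputs the out-of-order event is clamped up to the previous value, so `assigned` never decreases, but the second and third entries are equal. The expression therefore preserves ordering without guaranteeing that every `_uiTimestamp` is unique.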
Automated review by moonbox3's agents
Pull request overview
Fixes incorrect workflow step timings in the DevUI by ensuring executor-related streamed events include a usable timestamp and teaching the frontend to prefer it.
Changes:
- Add optional `created_at` to DevUI custom output-item event models (`response.output_item.added/done`).
- Populate `created_at` for executor invoked/completed/failed events in the Python DevUI mapper.
- Update `workflow-view.tsx` to prefer `event.created_at` over the synthetic `+1s` timestamp scheme, and add a regression test.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/packages/devui/tests/devui/test_mapper.py | Adds regression test asserting executor-mapped events always carry created_at. |
| python/packages/devui/frontend/src/components/features/workflow/workflow-view.tsx | Prefers created_at when generating _uiTimestamp for timeline/debug display. |
| python/packages/devui/agent_framework_devui/models/_openai_custom.py | Extends custom output-item event models with optional created_at. |
| python/packages/devui/agent_framework_devui/_mapper.py | Sets created_at=time.time() on executor output-item events. |
- Read data.timestamp (ISO string) and response.created_at in addition to top-level created_at when deriving _uiTimestamp, so response.workflow_event.completed events get a real server timestamp instead of a synthesized one
- Change uniqueTimestamp tiebreaker: when a real server timestamp is available use Math.max(eventTimestamp, lastTimestamp) rather than lastTimestamp + 1, eliminating artificial 1-second gaps while still preserving monotonic ordering
- Apply the same fix in the HIL streaming path (second setOpenAIEvents call in workflow-view.tsx)
- Add assert event.created_at > 0 to regression test to guard against zero or negative timestamps
- Add test_custom_output_item_event_models_have_created_at_field model-level test so removing the field produces a clear named failure rather than a downstream ValidationError

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
Automated Code Review
Reviewers: 4 | Confidence: 89%
✗ Correctness
The new timestamp extraction logic correctly adds `response.created_at` (branch 2) and `data.timestamp` (branch 3) fallbacks, and the test additions are sound. However, both instances of the `data.timestamp` branch (lines 591 and 1022) produce a raw `new Date(...).getTime() / 1000` that can be `NaN` when the ISO string is unparseable (e.g., Python microsecond precision without a timezone indicator in edge environments). Because `NaN !== undefined` evaluates to `true`, the subsequent `eventTimestamp !== undefined` guard on lines 602–603 and 1033–1034 does not intercept it, `Math.max(NaN, lastTimestamp)` returns `NaN`, and `_uiTimestamp` is silently set to `NaN`, corrupting timeline ordering for every downstream consumer.
✗ Security Reliability
The multi-source timestamp extraction logic introduces a reliability hole in the `data.timestamp` branch: `new Date(anyInvalidString).getTime() / 1000` silently produces `NaN`. Because `NaN !== undefined` is `true`, the `eventTimestamp !== undefined` guard does not catch this, so `Math.max(NaN, lastTimestamp)` propagates `NaN` as `_uiTimestamp`. This breaks timeline rendering for that event (`execution-timeline.tsx` line 260: `event._uiTimestamp * 1000` becomes `NaN`) and resets the monotonic ordering seed (`NaN || 0 → 0`) for all subsequent events. The same pattern appears in both duplicate blocks at lines 590–591 and 1021–1022. The Python test additions are correct and well-scoped.
✓ Test Coverage
The Python-side test additions are solid: the new `created_at > 0` assertion on executor events is meaningful, and the model-field presence test guards against accidental field removal. The mapper does populate `created_at=float(time.time())` for both executor event types, so these tests will exercise real behaviour. One gap: the referenced failing test (`test_workflow_timings_bug.py`) is absent from the source tree (only a stale `.pyc` exists), meaning there is no test verifying the two new frontend timestamp-extraction paths (`response.created_at` and `data.timestamp`). The frontend changes themselves are logically sound, but they are entirely untested. Additionally, the `test_custom_output_item_event_models_have_created_at_field` test does not verify that the mapper actually populates `created_at` with a non-None value when constructing these events; it only checks the model schema. A companion integration-style assertion (similar to `test_executor_events_carry_created_at_timestamp`) would close that gap.
✗ Design Approach
I found one design-level regression in the frontend timing change. Removing the forced monotonic increment for real timestamps fixes the visible 1-second gaps, but the fallback workflow timeline still treats `_uiTimestamp` as part of a synthetic per-run identifier. That means same-second executor events can now collapse onto the same synthetic ID, which reintroduces incorrect merging of fallback runs and their output buckets.
Flagged Issues
- Lines 590–591 and 1021–1022: `new Date(isoString).getTime() / 1000` yields `NaN` for unparseable strings (e.g. Python's `datetime.now().isoformat()` with microsecond precision and no `Z`). `NaN !== undefined` is `true`, so the `eventTimestamp !== undefined` guard does not catch it. `Math.max(NaN, lastTimestamp)` returns `NaN`, setting `_uiTimestamp: NaN`, which breaks timeline rendering at `execution-timeline.tsx:260` and silently resets the monotonic ordering seed at `workflow-view.tsx:597` (`NaN || 0 → 0`). Guard the parsed value with `Number.isFinite()` before use.
- Line 603: replacing the forced monotonic increment with `Math.max(eventTimestamp, lastTimestamp)` removes the uniqueness guarantee the fallback timeline depends on. `execution-timeline.tsx:359–360` builds `syntheticItemId` from `executorId + uiTimestamp`, and keys run output/state off that ID at lines 367, 382, and 408. Two runs of the same executor within the same second now share an identical `_uiTimestamp` and their fallback entries collide. Ordering and identity should be decoupled: use `sequence_number` or a local counter to guarantee uniqueness.
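The `NaN` propagation chain in the first flagged issue is easy to verify directly. The sketch below assumes a hypothetical helper, `isoToUnixSeconds`, that applies the recommended `Number.isFinite()` guard; it is not the code in `workflow-view.tsx`. The input `"not-a-timestamp"` is a stand-in for any string the JavaScript engine fails to parse.

```typescript
// Hypothetical helper: ISO string to Unix seconds, or undefined when unparseable.
function isoToUnixSeconds(iso: unknown): number | undefined {
  if (typeof iso !== "string") return undefined;
  const seconds = new Date(iso).getTime() / 1000;
  // An invalid Date yields NaN from getTime(); reject it here so callers
  // can keep using a plain `!== undefined` check.
  return Number.isFinite(seconds) ? seconds : undefined;
}

// Unguarded path, as described in the review.
const unguarded = new Date("not-a-timestamp").getTime() / 1000; // NaN
const slipsThrough = unguarded !== undefined; // true: the guard does not catch NaN
const poisoned = Math.max(unguarded, 1700000000); // NaN: Math.max propagates it
const reseeded = poisoned || 0; // 0: the monotonic seed is reset

// Guarded path.
const guarded = isoToUnixSeconds("not-a-timestamp"); // undefined, so synthesis can take over
const parsed = isoToUnixSeconds("2024-01-15T12:34:56.123Z"); // finite Unix seconds
```

Returning `undefined` instead of `NaN` keeps the existing `eventTimestamp !== undefined` branch meaningful, so an unparseable `data.timestamp` simply falls through to the synthetic timestamp.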
Suggestions
- Add `test_workflow_timings_bug.py` to the source tree (currently only a stale `.pyc` exists). Without it there is no regression guard for the `response.created_at` and `data.timestamp` frontend extraction paths.
- Extend `test_custom_output_item_event_models_have_created_at_field` (or add a companion test) to assert that `mapper.convert_event(...)` actually populates `created_at` with a non-None, positive value, mirroring the pattern in `test_executor_events_carry_created_at_timestamp`.
Automated review by moonbox3's agents
…, add regression tests
- workflow-view.tsx (×2): Wrap data.timestamp ISO→number conversion in a
Number.isFinite() guard. Python's datetime.now().isoformat() emits
microseconds without a trailing 'Z' (e.g. '2024-01-15T12:34:56.123456'),
which some JS engines cannot parse, returning NaN. NaN !== undefined is
true so the eventTimestamp !== undefined guard did not catch it, poisoning
_uiTimestamp and resetting the monotonic ordering seed (NaN || 0 → 0).
- execution-timeline.tsx: Replace uiTimestamp in the fallback syntheticItemId
with the per-executor runNumber counter. Two runs of the same executor
within the same second previously received identical _uiTimestamp values
and therefore identical syntheticItemIds, causing their output buckets,
state, and run entries to collide (execution-timeline.tsx:360–408).
- Add missing test_workflow_timings_bug.py source file (only a stale .pyc
existed). Three regression tests:
· test_custom_event_models_lack_created_at_field – model field guard
· test_workflow_executor_events_lack_created_at – mapper populates created_at
· test_rapid_workflow_events_have_no_top_level_timestamps – confirms
data.timestamp format that requires the frontend NaN guard
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
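The identity change in this commit, keying the fallback ID on a per-executor run counter instead of the timestamp, can be sketched as follows. Everything here is an assumption for illustration: the `FallbackEvent` shape, the function names, and the ID string formats are hypothetical and do not reproduce `execution-timeline.tsx`.

```typescript
// Hypothetical fallback event; field names follow the review text only.
interface FallbackEvent {
  executorId: string;
  _uiTimestamp: number;
}

// Timestamp-keyed ID: two runs with the same timestamp get the same ID.
function idFromTimestamp(event: FallbackEvent): string {
  return `${event.executorId}-${event._uiTimestamp}`;
}

// Counter-keyed IDs: each executor gets its own run number, independent of time.
function idsFromRunNumber(events: FallbackEvent[]): string[] {
  const runCounts: Record<string, number> = {};
  return events.map((event) => {
    const runNumber = (runCounts[event.executorId] ?? 0) + 1;
    runCounts[event.executorId] = runNumber;
    return `${event.executorId}-run-${runNumber}`;
  });
}

// Two runs of "planner" and one run of "writer", all with the same timestamp.
const runs: FallbackEvent[] = [
  { executorId: "planner", _uiTimestamp: 1700000000.5 },
  { executorId: "planner", _uiTimestamp: 1700000000.5 },
  { executorId: "writer", _uiTimestamp: 1700000000.5 },
];

const byTimestamp = runs.map(idFromTimestamp); // first two entries are identical
const byRunNumber = idsFromRunNumber(runs); // every entry is distinct
```

Separating the two concerns lets `_uiTimestamp` carry ordering and display time only, while the counter carries identity, so equal timestamps no longer merge distinct runs.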
…imings in DevUI are incorrect
…ated bug file

- Delete test_workflow_timings_bug.py; tests belong in existing module files
- The two tests already present in test_mapper.py (test_executor_events_carry_created_at_timestamp and test_custom_output_item_event_models_have_created_at_field) cover the same ground as the first two tests in the deleted file
- Add test_executor_completed_maps_to_output_item_done_event to test_mapper.py, replacing the third test from the deleted file with a generic, issue-agnostic name and docstring

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Workflow steps in DevUI appeared to take seconds apart even when they completed nearly instantly. This was caused by the frontend falling back to a synthetic timestamp that forced a minimum 1-second gap between events, because executor-related events emitted by the backend contained no `created_at` field.

Fixes #5545
Description
The root cause was two-fold:
- `CustomResponseOutputItemAddedEvent` and `CustomResponseOutputItemDoneEvent` had no `created_at` field, so the mapper emitted them without timestamps; and
- `workflow-view.tsx` synthesized `_uiTimestamp` using `Math.max(Math.floor(Date.now()/1000), lastTimestamp + 1)` (integer-second precision with a forced +1 s gap), never consulting `event.created_at`.

The fix adds an optional `created_at: float` field to both event models, populates it with `time.time()` in the mapper for all three executor event paths (invoked, completed, failed), and updates both call sites in `workflow-view.tsx` to prefer `event.created_at` over the synthetic fallback. A regression test was added to `test_mapper.py` to ensure executor-mapped events always carry a `created_at` timestamp going forward.

Contribution Checklist