[Data] Add scheduling-loop max metric to DatasetStatsSummary#63345
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces percentile and average tracking for dataset execution statistics, specifically for the streaming scheduling loop duration, and adds optional py-spy profiling support to the worker scaling benchmark. The code review identified several critical issues: returning infinity instead of zero when no samples exist will cause crashes in formatting utilities due to overflow errors; storing all timing samples in an unbounded list creates a memory leak risk; and changing the scheduling duration metric from a total to an average breaks existing wall-time breakdown calculations. Additionally, the reviewer noted that sorting the sample list on every percentile calculation is inefficient and suggested using more performant data structures or caching.
7a5ccc9 to
622cafb
Compare
622cafb to
a68bf78
Compare
a68bf78 to
e9366e2
Compare
77d7af2 to
49300ca
Compare
49300ca to
8a150e0
Compare
Adds two new fields to ``DatasetStatsSummary``:
- ``streaming_exec_schedule_avg_s``: per-iteration average
(sourced from ``Timer.avg()``)
- ``streaming_exec_schedule_max_s``: per-iteration max
(sourced from the already-tracked ``Timer.max()``)
``streaming_exec_schedule_s`` keeps its existing meaning (total
wall-clock time across all scheduler iterations) so the
``runtime_metrics()`` breakdown — which divides this value by
total_wall_time to compute a percentage — remains correct.
Memory stays O(1) per Timer; no sample buffer.
Release-test helper ``collect_dataset_stats`` is updated to surface
the per-iteration values as ``avg_/max_scheduling_loop_duration_s``.
Zero-iteration safety: the previous ``if self.streaming_exec_schedule_s``
guard at the build site was a dead check (``Timer()`` is always
truthy). With no scheduler iterations recorded (e.g. non-streaming
execution), ``Timer.avg()`` returns ``float("inf")``, which would
break JSON serialization in release-test output and produce nonsense
in ``runtime_metrics()``. Replaced with an explicit
``_total_count > 0`` check that returns 0 for all three fields when
no samples are present.
Rationale: total scheduler time scales with run duration and is hard
to compare across runs of different lengths. The new avg/max fields
give per-iteration quantities that directly reflect scheduler
efficiency, without breaking the existing total-based breakdown.
Signed-off-by: xgui <xgui@anyscale.com>
8a150e0 to
a1fd253
Compare
Three small follow-ups to the review thread, none requiring a guard
or private-attribute access at the call site:
1. ``Timer.avg()`` now returns 0 when no samples have been recorded
(was ``float("inf")``). Matches the zero-sample semantics of
``Timer.get()`` and ``Timer.max()``, which both return 0. The
previous ``inf`` return broke JSON serialization downstream and
forced consumers to special-case the empty path.
2. ``StreamingExecutor._generate_stats`` now assigns ``Timer()`` (an
empty Timer) instead of ``None`` when ``_initial_stats`` is falsy.
The type annotation on the field is ``Timer``, not
``Optional[Timer]``; this makes runtime match the annotation.
3. The ``DatasetStats.to_summary`` call site drops the
``schedule_timer._total_count > 0`` guard. It now just calls
``.get()`` / ``.avg()`` / ``.max()`` directly — the Timer is
always present (per #2) and the methods all return 0 for an
empty Timer (per #1).
The pickle PR's reviewers asked for both:
- "not access ``schedule_timer._total_count``"
- "make ``streaming_exec_schedule_s`` always non-None"
This commit does both, plus the Timer.avg() consistency fix that
makes the simplified call site safe.
Signed-off-by: xgui <xgui@anyscale.com>
Reverts the global ``Timer.avg() -> 0`` change from the prior commit.
The previous behavior of returning ``float("inf")`` for an empty Timer
is retained as the default to preserve the "undefined" signal for
display callers (``fmt(timer.avg())`` renders it as ``"inf s"``).
The summary build site that needs a JSON-safe / arithmetic-safe value
opts out explicitly:
schedule_timer.avg(default=0.0)
Only the schedule-Timer call site uses the override. All other
``Timer.avg()`` callers (the iterator-side metrics) keep the existing
``inf`` semantics unchanged.
Signed-off-by: xgui <xgui@anyscale.com>
Reverts the ``default=`` kwarg added in the previous commit; call
``avg()`` directly with its existing signature. ``StreamingExecutor.
_generate_stats`` always assigns a ``Timer``, and downstream
consumers of ``streaming_exec_schedule_avg_s`` can interpret the
``float('inf')`` empty-Timer signal the same way they already do for
the other ``Timer.avg()`` call sites.
Signed-off-by: xgui <xgui@anyscale.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 29ba77e. Configure here.

Description
Adds two new fields to
DatasetStatsSummary:streaming_exec_schedule_sTimer.get()(unchanged)streaming_exec_schedule_avg_s(new)Timer.avg()streaming_exec_schedule_max_s(new)Timer.max()The existing
streaming_exec_schedule_skeeps its total-duration meaning soruntime_metrics()(which divides it bytotal_wall_timeto compute a percentage) continues to produce a correct breakdown. Per-iteration values are exposed under separate field names.Memory stays O(1) per Timer — no sample buffer;
_maxis already tracked.Tested using https://buildkite.com/ray-project/release/builds/92867#019e28c9-c8ad-4fb3-9776-c89f52f93200