Skip to content

fix(workflows): correct values_difference aggregation in Data Aggregator#2388

Open
madhavcodez wants to merge 1 commit into
roboflow:mainfrom
madhavcodez:fix/data-aggregator-values-difference
Open

fix(workflows): correct values_difference aggregation in Data Aggregator#2388
madhavcodez wants to merge 1 commit into
roboflow:mainfrom
madhavcodez:fix/data-aggregator-values-difference

Conversation

@madhavcodez
Copy link
Copy Markdown

What's broken

The Data Aggregator block's values_difference aggregation is documented as "Calculate difference between max and min value observed" (data_aggregator/v1.py), but it returns wrong — sometimes negative — values.

ValuesDifferenceState.on_data locked the first observed value into the min slot and the second into the max slot, and never cross-compared them:

def on_data(self, value):
    if self._min_value is None:
        self._min_value = value      # 1st value -> min slot only
        return None
    if self._max_value is None:
        self._max_value = value      # 2nd value -> max slot only, no comparison
        return None
    self._min_value = min(self._min_value, value)
    self._max_value = max(self._max_value, value)

So:

Observed values values_difference Correct (max − min)
[10, 1] −9 9
[10, 1, 5] 0 9
[5, 4, 3, 2, 1] 3 4

A max−min difference can never be negative, yet this produced −9.

The clearest demonstration is internal self-contradiction: feeding 10, 1, 5, 5 with aggregation_mode={"speed": ["values_difference", "max", "min"]}, the block returns

{'speed_values_difference': 0, 'speed_max': 10, 'speed_min': 1}

— it reports max=10 and min=1 but values_difference=0, in a single result.

Root cause

The two-slot bootstrap assigned values to min/max by arrival order instead of by comparison, so the first value was never eligible to become the max and the second was never eligible to become the min.

The fix

Seed both min and max from the first observation, then update both on every subsequent value:

def on_data(self, value):
    if self._min_value is None:
        self._min_value = value
        self._max_value = value
        return None
    self._min_value = min(self._min_value, value)
    self._max_value = max(self._max_value, value)

This matches the idiom already used by the sibling MaxState / MinState classes in the same file. No other aggregation mode is affected.

Behavior change to note

A window with a single observed value now returns 0 (was None). 0 is the correct max−min for one point and is consistent with max/min/avg, which already return a value for a single observation; the old None was an artifact of the broken two-slot seeding, not an intentional "insufficient data" signal. None is still returned when no values were observed.

Tests

Added tests/workflows/unit_tests/core_steps/analytics/test_data_aggregator_v1.py (11 tests): the regression cases above, all-equal / negative / float ranges, single-value (0) and empty (None), and an end-to-end test through DataAggregatorBlockV1.run() asserting values_difference == max − min for [10, 1, 5, 5].

pytest tests/workflows/unit_tests/core_steps/analytics/test_data_aggregator_v1.py -v
# 11 passed

The block-level test exercises the full aggregation path. Happy to add a full workflow integration test under tests/workflows/integration_tests/execution/ (extending test_workflow_with_data_aggregation.py with a values_difference assertion) if you'd prefer that coverage as well.

ValuesDifferenceState locked the first observed value into the min slot and the second into the max slot and never cross-compared them, so values_difference could be negative or wrong (e.g. [10,1] gave -9, [10,1,5] gave 0). Seed both min and max from the first value and update both on every observation so it always equals max-min. Adds unit tests for the aggregation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant