fix(workflows): correct values_difference aggregation in Data Aggregator #2388
Open
madhavcodez wants to merge 1 commit into
ValuesDifferenceState locked the first observed value into the min slot and the second into the max slot and never cross-compared them, so values_difference could be negative or wrong (e.g. [10,1] gave -9, [10,1,5] gave 0). Seed both min and max from the first value and update both on every observation so it always equals max-min. Adds unit tests for the aggregation.
What's broken
The Data Aggregator block's values_difference aggregation is documented as "Calculate difference between max and min value observed" (data_aggregator/v1.py), but it returns wrong, sometimes negative, values. ValuesDifferenceState.on_data locked the first observed value into the min slot and the second into the max slot, and never cross-compared them.
Affected inputs include [10, 1] (returned −9 instead of 9), [10, 1, 5] (returned 0 instead of 9), and [5, 4, 3, 2, 1]. A max−min difference can never be negative, yet this produced −9.
The clearest demonstration is internal self-contradiction: feeding 10, 1, 5, 5 with aggregation_mode={"speed": ["values_difference", "max", "min"]}, the block returns a single result that reports max=10 and min=1 but values_difference=0.
Root cause
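The numbers reported above can be reproduced with a minimal reconstruction of the two-slot bootstrap. This is a sketch, not the original source: the class name BuggyValuesDifference, the get_result method, and the min/max update applied from the third value onward are assumptions, chosen because they match the reported results.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


class BuggyValuesDifference:
    """Hypothetical reconstruction of the old behavior (not the real class)."""

    def __init__(self) -> None:
        self._min: Optional[Number] = None
        self._max: Optional[Number] = None

    def on_data(self, value: Number) -> None:
        if self._min is None:
            # The first value is locked into the min slot, whatever its size.
            self._min = value
            return
        if self._max is None:
            # The second value is locked into the max slot and is never
            # compared against the first.
            self._max = value
            return
        # Assumed update for later values; both slots move independently.
        self._min = min(self._min, value)
        self._max = max(self._max, value)

    def get_result(self) -> Optional[Number]:
        if self._min is None or self._max is None:
            return None
        return self._max - self._min


def run_buggy(values: Sequence[Number]) -> Optional[Number]:
    state = BuggyValuesDifference()
    for value in values:
        state.on_data(value)
    return state.get_result()


print(run_buggy([10, 1]))     # -9
print(run_buggy([10, 1, 5]))  # 0
print(run_buggy([10]))        # None
```

In the second case the third value, 5, replaces both the stale min (10) and the stale max (1), which is how the two slots collapse to the same number.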
The two-slot bootstrap assigned values to min/max by arrival order instead of by comparison, so the first value was never eligible to become the max and the second was never eligible to become the min.
The fix
Seed both min and max from the first observation, then update both on every subsequent value. This matches the idiom already used by the sibling MaxState/MinState classes in the same file. No other aggregation mode is affected.
Behavior change to note
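As a rough sketch of the seeding described under "The fix", and of the single-value and empty cases this section covers, the corrected state might look like the following. The class name FixedValuesDifference and the get_result method are hypothetical stand-ins; this is not the PR's actual diff.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


class FixedValuesDifference:
    """Hypothetical sketch of the corrected state (not the real class)."""

    def __init__(self) -> None:
        self._min: Optional[Number] = None
        self._max: Optional[Number] = None

    def on_data(self, value: Number) -> None:
        if self._min is None or self._max is None:
            # Seed both slots from the first observation.
            self._min = value
            self._max = value
            return
        # Every later value is compared against both slots.
        self._min = min(self._min, value)
        self._max = max(self._max, value)

    def get_result(self) -> Optional[Number]:
        if self._min is None or self._max is None:
            return None  # no values observed
        return self._max - self._min


def run_fixed(values: Sequence[Number]) -> Optional[Number]:
    state = FixedValuesDifference()
    for value in values:
        state.on_data(value)
    return state.get_result()


print(run_fixed([10, 1]))        # 9
print(run_fixed([10, 1, 5, 5]))  # 9
print(run_fixed([7]))            # 0
print(run_fixed([]))             # None
```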
A window with a single observed value now returns 0 (was None). 0 is the correct max−min for one point and is consistent with max/min/avg, which already return a value for a single observation; the old None was an artifact of the broken two-slot seeding, not an intentional "insufficient data" signal. None is still returned when no values were observed.
Tests
Added tests/workflows/unit_tests/core_steps/analytics/test_data_aggregator_v1.py (11 tests): the regression cases above, all-equal / negative / float ranges, single-value (0) and empty (None), and an end-to-end test through DataAggregatorBlockV1.run() asserting values_difference == max − min for [10, 1, 5, 5].
The block-level test exercises the full aggregation path. Happy to add a full workflow integration test under tests/workflows/integration_tests/execution/ (extending test_workflow_with_data_aggregation.py with a values_difference assertion) if you'd prefer that coverage as well.
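For reviewers who want a feel for the case coverage listed above, here is a table-driven sketch of that kind of check. It is not the added test file: the values_difference helper is a hypothetical stand-in for feeding values through the fixed aggregation, and the all-equal, negative, and float inputs are illustrative choices rather than the PR's actual fixtures.

```python
from typing import List, Optional, Sequence, Tuple, Union

Number = Union[int, float]


def values_difference(values: Sequence[Number]) -> Optional[Number]:
    """Hypothetical stand-in for the fixed aggregation: streaming max - min."""
    lowest: Optional[Number] = None
    highest: Optional[Number] = None
    for value in values:
        if lowest is None or highest is None:
            lowest = highest = value
            continue
        lowest = min(lowest, value)
        highest = max(highest, value)
    if lowest is None or highest is None:
        return None
    return highest - lowest


# (input values, expected values_difference)
CASES: List[Tuple[List[Number], Optional[Number]]] = [
    ([10, 1], 9),           # regression: previously -9
    ([10, 1, 5], 9),        # regression: previously 0
    ([5, 4, 3, 2, 1], 4),   # strictly decreasing input
    ([3, 3, 3], 0),         # all-equal (illustrative)
    ([-5, -1, -3], 4),      # negative range (illustrative)
    ([0.5, 2.0], 1.5),      # float range (illustrative)
    ([7], 0),               # single value
    ([], None),             # no values observed
]


def check_cases() -> None:
    for values, expected in CASES:
        assert values_difference(values) == expected, (values, expected)


def check_matches_max_minus_min() -> None:
    # Mirrors the end-to-end assertion: values_difference == max - min.
    values = [10, 1, 5, 5]
    assert values_difference(values) == max(values) - min(values)


check_cases()
check_matches_max_minus_min()
```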