Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement cumulative_sum() function #7388

Merged
merged 1 commit into from
Oct 7, 2016
Merged

Conversation

jsternberg
Copy link
Contributor

@jsternberg jsternberg commented Sep 30, 2016

The cumulative_sum() function can be used to sum each new point and
output the current total. For the following points:

cpu value=2 0
cpu value=4 10
cpu value=6 20

This would output the following points:

> SELECT cumulative_sum(value) FROM cpu
time    value
----    -----
0       2
10      6
20      12

As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.

The function can also be used with an aggregate like derivative().

> SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)

@jsternberg
Copy link
Contributor Author

@jwilder @nathanielc

@jwilder
Copy link
Contributor

jwilder commented Oct 3, 2016

Can you update the description with more details about the use of this function?

@jwilder jwilder added this to the 1.1.0 milestone Oct 3, 2016
@jsternberg
Copy link
Contributor Author

Updated the description.

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

I don't understand why we are skipping duplicate values. Why is that part of a cumulative sum? Do any other reducers skip duplicates?

}

func (r *FloatCumulativeSumReducer) AggregateFloat(p *FloatPoint) {
if !r.curr.Nil && r.curr.Time == p.Time {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this logic for? Skipping points in the sum that have the same time? What is the use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the other stream processors will skip points at the same time such as derivative() and difference() so I had this one do it too. It's to avoid having it emit multiple values at the same time. I can remove this though. I don't think I have the same thing with moving_average() even though it's also a stream processor. Which do you think is more correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't skip points in this case. The reason that derivative does it is it needs to use the elapsed time and it would create a divide by 0 otherwise. Since this is just a sum I would rather it simply sum the values and nothing else, similar to moving_avg.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would concur with @nathanielc's logic here, if this is meant to be a 'wrapper' function to something like derivative anyway, let the derivative handle the dupes accordingly and simply sum up the reseults it presents. That should keep things easy.

// FloatCumulativeSumReducer cumulates the values from each point.
type FloatCumulativeSumReducer struct {
curr FloatPoint
emitted bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emitted doesn't seem to be used anywhere, did I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emitted is used to skip points at the same time value. It is used in Emit().

@jsternberg
Copy link
Contributor Author

@nathanielc I updated this with your recommendation. Can you take another look?

Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks

The `cumulative_sum()` function can be used to sum each new point and
output the current total. For the following points:

    cpu value=2 0
    cpu value=4 10
    cpu value=6 20

This would output the following points:

    > SELECT cumulative_sum(value) FROM cpu
    time    value
    ----    -----
    0       2
    10      6
    20      12

As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.

The function can also be used with an aggregate like `derivative()`.

    > SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
@jsternberg jsternberg merged commit e955d16 into master Oct 7, 2016
@jsternberg jsternberg deleted the js-cumulative-sum-over-time branch October 7, 2016 20:43
@jwilder jwilder mentioned this pull request Oct 25, 2016
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants