Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arithmetic in aggregations #622

Closed
grobie opened this Issue Apr 3, 2015 · 15 comments

Comments

Projects
None yet
4 participants
@grobie
Copy link
Member

grobie commented Apr 3, 2015

I come across use cases from time to time where I'd love being able to do some arithmetic in by clauses of aggregations. For example:

count by (> 64) sum(process_rss_bytes) by (app, proc)) / 1024 / 1024 / 1024)

In this example (with an exemplary syntax) I want to plot the number of services below and above a certain threshold. It's possible to construct graphs roughly showing the desired data with at least promdash, but it'd be convenient to have first class support for that. For me at least ;), what do people think?

@brian-brazil @beorn7

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Apr 3, 2015

That expression has unbalanced brackets. You can already mostly do that via:

count (
  sum by (app, proc)(process_rss_bytes) > 64
)

What you want is a way for the filtering functions to return 1/0 instead of filtering, and then an aggregation function to group things by value. Then it'd look something like:

count_values("output_label",  sum by (app, proc)(process_rss_bytes) > bool 64)
@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Sep 3, 2015

See #1049

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Sep 3, 2015

We still need a count_values function to complete this request.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Sep 11, 2015

Actually count_values should be an aggreagator

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Sep 24, 2015

Hmm, this will need to be passed a parameter to know what the output labelname is. We don't currently have those for aggregators.

Possible syntaxes:

count_values by (shard) into labelname (expression)
count_values into labelname by (shard)(expression)
count_values(labelname) by (shard) (expression)
count_values by (shard)(expression)   # Hardcode to 'value'

@fabxc Thoughts?

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Sep 24, 2015

In general I expect most people being happy with "value" and it would keep the language simpler.
For the rare cases where one doesn't value, relabeling will work.

Then again topk and bottomk are arguably aggregators too. If we change those, we need parameterization anyway. into is not a good general purpose fit and we don't want to introduce new keywords for single features.

So we might just as well allow parameters for count_values too.
This would give us a consistent syntax:

count_values(<labelname>) by(<shard>)(<expr>)
topk(<num>) by(<shard>)(<expr>)
bottomk(<num>) by(<shard>)(<expr>)
@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Sep 24, 2015

The current reason why topk() / bottomk() aren't just aggregators is that they select (filter) elements, instead of aggregating. That is, the N bottom/top elements are outputted as is (with all the labels of the individual elements). One could imagine those functions having a by clause, but then that selection vs. aggregation behavior would still need to be maintained. That's also why max(foo) is not equivalent to topk(1, foo).

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Sep 24, 2015

Another option is

count_values by (shard)(labelname, expression)

which makes more sense in the topk/bottomk case.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Sep 26, 2015

Makes sense and would also not make the current expressions invalid.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Sep 26, 2015

Yeah, I think the last one is probably the best.

@brian-brazil brian-brazil added this to the v1.0.0 milestone May 2, 2016

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jun 8, 2016

@brian-brazil you flagged this for 1.0 – will this be breaking?

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented Jun 8, 2016

This isn't breaking, just a new aggregator.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jun 8, 2016

Can we remove the 1.0 milestone then? I'd like to focus on getting that
release out of the door.

On Wed, Jun 8, 2016 at 12:16 PM Brian Brazil notifications@github.com
wrote:

This isn't breaking, just a new aggregator.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#622 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AEuA8mGpggjw5-Gqs1Jl8zMCAjARgzsmks5qJpZjgaJpZM4D5jI0
.

@fabxc

This comment has been minimized.

Copy link
Member

fabxc commented Jun 24, 2016

Removing from the milestone – cool if we get it in but no requirement.

@fabxc fabxc removed this from the v1.0.0 milestone Jun 24, 2016

brian-brazil added a commit that referenced this issue Jul 5, 2016

Add count_values() aggregator.
This is useful for counting how many instances
of a job are running a particular version/build.

Fixes #622

@fabxc fabxc closed this in #1793 Jul 8, 2016

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.