Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`max` with time window #383

Closed
bernerdschaefer opened this Issue Mar 13, 2014 · 31 comments

Comments

Projects
None yet
6 participants
@bernerdschaefer
Copy link
Contributor

bernerdschaefer commented Mar 13, 2014

Given a gauge "instance_memory_usage_bytes" with a "service" label,
I want to find the maximum value of "instance_memory_usage_bytes" by service for a given time window.

That is, max(instance_memory_usage_bytes) by (service) tells me the maximum value across timeseries at a given point in time. But something like max(instance_memory_usage_bytes)[7d] by (service) would tell me the maximum value observed across timeseries during the last 7 days.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Mar 13, 2014

It would probably look more like rate(), without supporting BY-clauses. Then you could do the same by doing:

max(max_over_time(instance_memory_usage_bytes[7d])) by (service)

Note that this will look at all 7d worth of data for every single timeseries involved. Could get slow-ish, but probably fine in tabular view.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

This would be a very useful feature.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

/cc @stuartnelson3 This would be a very good starter project in Prometheus if you're interested. You'd need to implement a new function in https://github.com/prometheus/prometheus/blob/master/rules/ast/functions.go. Actually, a set of functions to do min, max, avg, and median would be nice.

Anyone have feelings what they should be called? Candidates (with avg as an example):

  • moving_avg
  • range_avg
  • avg_range
  • avg_over_range
  • avg_over_time
@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

Min/max/mean/sum would be good, I'm not sure it's practical to implement median as it can't be done in constant space and if we add retention policies correctness could be a problem.

Maybe a grammar construct rather than a new set of functions? In principle it should be all the same aggregation functions.

Will there be any limitations as to how much data you can pull in?

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

Ah yeah, median could be problematic. Would anyone need a sum over time? It isn't very meaningful, and the result completely depends on how many scrapes happened under a time window. Still, might be useful for meta-things like, "I have this series that's always 1, and I want to know how many data points there were for it in a given window of time".

What would a grammar construct look like if it doesn't look like a function?

Not sure about limits. I wasn't planning on putting any limits in upfront, but we could change that if it becomes necessary.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

I can imagine some advanced use cases for sum involving self-referential rules.

Maybe:

mean across(var)
mean over(var)
across(var) do mean
across(var) calculate mean

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

Hmm, I prefer a simpler grammar without space-separated multi-word ops. That is, except for the cases where each word adds optional functionality by itself, like in sum(...) by (...). We also need to make the time aspect clear to differentiate these new functions/ops from the existing aggregation ops that do vertical aggregation over multiple timeseries.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

I'm considering in the above that 'across' is like 'by', rather than a multi-word op.

I think having some orthogonality between cross-section and longitudinal aggergattions would make the system easier to use.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

But the "across" doesn't tell me linguistically whether it means averaging across series or across time. Just visually, I still strongly prefer functions/ops that are a single word without spaces... Less loose parts flying around in the expression.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

But the "across" doesn't tell me linguistically whether it means averaging across series or across time.

As it stands "sum" doesn't tell you that either, so another option is to have it serve both roles.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

Right, at least seeing both next to each other, it should be clear which one is which. With across this wouldn't be the case, but if I saw avg and moving_avg I'd at least have a chance to infer which one is for averaging over time vs. over series. And as I said, I want to avoid dangling words if possible. Plus, it's technically easier because you just need to add a function and not add more complexity to the grammar.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

I'm not sure about moving_avg in particular, I'd expect that to return a matrix rather than a vector. avg_over_time is the only one that gets the right idea across to me, it still feels clunky though.

Maybe poll the mailing list for ideas?

Plus, it's technically easier because you just need to add a function and not add more complexity to the grammar.

I think we should work towards what's the easier to learn and reason about as an end-user, unless implementation turns out to be particular thorny.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

I'd like a single-word construct more as an end-user as well. IMO the language is clearer the fewer loose (whitespace-separated) words there are. I am quite ok with avg_over_time then, actually. Does it feel clunky mainly because of underscores? We'd still have an opportunity to switch to camelCase (like Graphite's function names), but mixing that with underscores in metric names would be weird as well, right?

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 26, 2014

It feels clunky due to the lack of orthogonality, that you treat matrices differently to vectors. Maybe we need to think more broadly - e.g. rate also has a time window so arguably follow the same pattern.

I've no strong preference on underscores vs. camelCase, as long as we're consistent.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 26, 2014

But this is not primarily about treating matrices differently to vectors - the whole operation is a totally different kind of average operation. It's the average over time within each series vs. the average of all series at a fixed time. Also, the only reason why the existing aggregations are operators vs. functions is because they have extra grammatical clauses like by and keeping_extra which don't exist conceptually for aggregations over time within each series...

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 27, 2014

Also, the only reason why the existing aggregations are operators vs. functions is because they have extra grammatical clauses like by and keeping_extra which don't exist conceptually for aggregations over time within each series...

The current aggregations appear to be functions, as they're a prefixes on a parenthesised expression. sort(... is the same as sum(... lexically, and they both take in a Vector and return a Vector. Putting arguments to a function call after the closing paren of the function call is rather odd, usually they'd prefix or be arguments.

I think we should step back and analyse all the functions, and see how to best handle the categories of M->V, V->V and V->S generally to make sure that everything is internally self-consistent, and thus make the language more intuitive. For example whatever we do for sum_over_time() grammar/convention wise, we should also do for rate() as it too converts a matrix to a vector.

@matttproud

This comment has been minimized.

Copy link
Member

matttproud commented May 27, 2014

+1 to consistency of nomenclature conveying the character of inputs and
outputs.

On Tue, May 27, 2014 at 9:55 AM, brian-brazil notifications@github.comwrote:

Also, the only reason why the existing aggregations are operators vs.
functions is because they have extra grammatical clauses like by and
keeping_extra which don't exist conceptually for aggregations over time
within each series...

The current aggregations appear to be functions, as they're a prefixes on
a parenthesised expression. sort(... is the same as sum(... lexically, and
they both take in a Vector and return a Vector. Putting arguments to a
function call after the closing paren of the function call is rather odd,
usually they'd prefix or be arguments.

I think we should step back and analyse all the functions, and see how to
best handle the categories of M->V, V->V and V->S generally to make sure
that everything is internally self-consistent, and thus make the language
more intuitive. For example whatever we do for sum_over_time()
grammar/convention wise, we should also do for rate() as it too converts a
matrix to a vector.


Reply to this email directly or view it on GitHubhttps://github.com//issues/383#issuecomment-44245182
.

Key: 0xC42234343FDE43A8
Fingerprint: 6945 9248 4362 68CB EA83 F1EA C422 3434 3FDE 43A8

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 27, 2014

On Tue, May 27, 2014 at 9:55 AM, brian-brazil notifications@github.comwrote:

The current aggregations appear to be functions, as they're a prefixes on
a parenthesised expression. sort(... is the same as sum(... lexically, and
they both take in a Vector and return a Vector. Putting arguments to a
function call after the closing paren of the function call is rather odd,
usually they'd prefix or be arguments.

Ok, my terminology of "function" and "operator" is probably off, so the
only thing I meant to say here is that the existing aggregation
operators/functions need to be treated specially by the grammar because it
is possible to specify grouping labels for these and passing those as
normal function arguments would be weird.

As for putting the grouping labels after the closing function paren of the
aggregation function, there's precedent for that at least in SQL:

SELECT AVG(foo) FROM something GROUP BY bar, baz;

More importantly, changing the structure of the existing avg/max/min/sum
would be quite costly at this stage. Everyone at SC is already using it.

I think we should step back and analyse all the functions, and see how to
best handle the categories of M->V, V->V and V->S generally to make sure
that everything is internally self-consistent, and thus make the language
more intuitive. For example whatever we do for sum_over_time()
grammar/convention wise, we should also do for rate() as it too converts a
matrix to a vector.

Ok, help me understand the problem here. Both sum_over_time() and rate()
would already be structurally the same (single word function name with one
matrix argument, returning a vector). So is this only about the name or am
I missing more?

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 27, 2014

Ok, help me understand the problem here. Both sum_over_time() and rate()
would already be structurally the same (single word function name with one
matrix argument, returning a vector). So is this only about the name or am
I missing more?

If sum_over_time works over time, and sum works over a cross-section I'd expect rate to work over a cross-section (ignoring that such a computation makes no sense) and rate_over_time to work over time.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 27, 2014

As for putting the grouping labels after the closing function paren of the
aggregation function, there's precedent for that at least in SQL:

Consider how you'd count the distinct number of labels:
count(sum(some really long and complex expression) by (labelA)) by (labelB)

How do we make such a thing readable?

I think that
count by (labelB) (sum by (labelA) (some really long and complex expression)
would be easier to read and cause less confusion. We'll likely also want by clauses on the binary operators at some point.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 27, 2014

Ok, so for the rate vs. sum_over_time, the issue seems to be only naming. I think purity is again at odds with practicality here:

  • we need a really good reason to break existing users, and I still don't see this as being important enough for that
  • a rate is always over time, so rate_over_time would be redundant
  • for very frequently used functions like rate and sum, it's also just nice and user-friendly to have short names, both for typing and for not blowing up expression lengths

I see your point about the BY-clause position in nested expressions. On the other hand, we initially chose the sum(...) by (...) ordering to be more readable to humans (/cc @matttproud was in favor of that), because in natural language you wouldn't say, "sum by labels these series", but "sum these series by labels". So besides not breaking existing users, there's also this trade-off between better readability in simple expressions vs. in nested expressions.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 27, 2014

@u-c-l @bernerdschaefer @grobie @matttproud et. al. - maybe some of you have more opinions on this.

@beorn7

This comment has been minimized.

Copy link
Member

beorn7 commented May 27, 2014

Wow... you are already three levels higher up than me... :)

WRT "by (...)": Syntactically, we could allow both orders, couldn't we? That would accommodate complex expressions as well as simple ones, but has a potential of being confusing.... Natural language comparisons are kind of dangerous, especially if you talk to German speakers... ;) [1]

WRT to types of function arguments and return values: The ancient discussion if a function names should contain hints about that, or if you simply overload a function name (max(vector): across metrics, max(matrix): over time...). Some find overloading elegant, some find it a source of errors from hell... "xxx_over_time" sounds like a reasonable naming scheme to break ambiguities, but it should probably only be used if there actually is an ambiguity to break (e.g. rate() looks good to me as it is). It would then be a rule of thumb that in doubt, the "simpler" name refers to vectors (max(...) is for vectors, max_over_time(...) is for matrices).
I'm kind of reluctant to look for a solution deeper in the grammar of the language. Functions that only work for certain types of arguments is something most users are very familiar with (and thanks to Java/C++, many even find if natural if the same function behaves differently depending on the type of the argument...).

[1] Mark Twain: "An average sentence, in a German newspaper, is a sublime and impressive curiosity; it occupies a quarter of a column; it contains all the ten parts of speech -- not in regular order, but mixed; it is built mainly of compound words constructed by the writer on the spot, and not to be found in any dictionary -- six or seven words compacted into one, without joint or seam -- that is, without hyphens; it treats of fourteen or fifteen different subjects, each inclosed in a parenthesis of its own, with here and there extra parentheses which reinclose three or four of the minor parentheses, making pens within pens: finally, all the parentheses and reparentheses are massed together between a couple of king-parentheses, one of which is placed in the first line of the majestic sentence and the other in the middle of the last line of it -- after which comes the VERB, and you find out for the first time what the man has been talking about; and after the verb -- merely by way of ornament, as far as I can make out -- the writer shovels in "haben sind gewesen gehabt haben geworden sein," or words to that effect, and the monument is finished."

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 28, 2014

@u-c-l:

  • WRT by: yes, I guess we could actually allow both orders. It means a slightly more complicated grammar and potentially confused users, but I could see it being worth it. Then again, that's what the C++ committee probably thought at each step along the way too (http://www.ustream.tv/recorded/47947981) :P In all seriousness, it's probably fine.
  • WRT to function naming, it sounds like we agree.

As another reference point, this is how Graphite names average functions:

  • avg (long: averageSeries) to average multiple series together
  • movingAverage to do what we discussed with avg_over_time. I think I really like the terminology "moving average" much better than "average over time", since the former is the common name for it in statistics. So maybe moving_avg?
@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 28, 2014

moving_avg works for avg as it's a common term that I'd expect users to have heard of. I'm not sure about moving_count, moving_sum or moving_max so I'd tend towards the over_time. Particularly as the moving only happens if you use it across different time periods, a given calculation doesn't have any movement at all (i.e. moving_avg implies returning a matrix).

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 28, 2014

Right, moving_(sum|min|max|count) etc. wouldn't be as clear. If we allowed multiple aliases for each function, we could have both avg_over_time and moving_avg for averages, and only (sum|min|max|count)_over_time for the others.

@brian-brazil

This comment has been minimized.

Copy link
Member

brian-brazil commented May 28, 2014

I'd say we go with over_time for them all rather than complicating things.

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented May 28, 2014

Fine with me for now. We can still easily add an alias later if we want.

So basically, to summarize the feature request: build five new functions that each take a matrix argument and within each series, do aggregation over the matrix's time window:

  • avg_over_time
  • sum_over_time
  • min_over_time
  • max_over_time
  • count_over_time
@grobie

This comment has been minimized.

Copy link
Member

grobie commented May 28, 2014

that was a long read down here. I'll keep it short ;)

+1 to the feature request
+1 to the summary
-1 to aliases

@juliusv

This comment has been minimized.

Copy link
Member

juliusv commented Jul 28, 2014

Implemented in http://review.prometheus.io/#/c/348/ and waiting for review.

juliusv added a commit that referenced this issue Jul 28, 2014

Add abs() and over-time aggregation functions.
This implements aggregation functions over time as request in
#383.

Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e

@juliusv juliusv closed this Jul 31, 2014

juliusv added a commit that referenced this issue Nov 25, 2014

Add abs() and over-time aggregation functions.
This implements aggregation functions over time as request in
#383.

Change-Id: Ifd69b850de8cfdf6e7a6c0e042056fa4c672410e

simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017

@lock

This comment has been minimized.

Copy link

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.