`max` with time window #383
Comments
It would probably look more like
Note that this will look at all 7d worth of data for every single timeseries involved. Could get slow-ish, but probably fine in tabular view.
This would be a very useful feature.
/cc @stuartnelson3 This would be a very good starter project in Prometheus if you're interested. You'd need to implement a new function in https://github.com/prometheus/prometheus/blob/master/rules/ast/functions.go. Actually, a set of functions to do min, max, avg, and median would be nice. Anyone have feelings about what they should be called? Candidates (with avg as an example):
Min/max/mean/sum would be good. I'm not sure it's practical to implement median, as it can't be done in constant space, and if we add retention policies, correctness could be a problem. Maybe a grammar construct rather than a new set of functions? In principle it should be all the same aggregation functions. Will there be any limitations as to how much data you can pull in?
Ah yeah, median could be problematic. Would anyone need a sum over time? It isn't very meaningful, and the result completely depends on how many scrapes happened within the time window. Still, it might be useful for meta-things like, "I have this series that's always 1, and I want to know how many data points there were for it in a given window of time". What would a grammar construct look like if it doesn't look like a function? Not sure about limits. I wasn't planning on putting any limits in upfront, but we could change that if it becomes necessary.
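In the PromQL that eventually shipped, this "count the data points" use case got a dedicated function, so the always-1 workaround isn't needed. A sketch using the standard `up` series:

```promql
# Number of samples stored for each matching series over the last hour,
# independent of the sample values:
count_over_time(up[1h])

# The always-1 workaround described above; only equivalent to the count
# while the series really is constantly 1:
sum_over_time(up[1h])
```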
I can imagine some advanced use cases for sum involving self-referential rules. Maybe: `mean across(var)`
Hmm, I prefer a simpler grammar without space-separated multi-word ops. That is, except for the cases where each word adds optional functionality by itself, like in
I'm considering in the above that 'across' is like 'by', rather than a multi-word op. I think having some orthogonality between cross-sectional and longitudinal aggregations would make the system easier to use.
But the "across" doesn't tell me linguistically whether it means averaging across series or across time. Just visually, I still strongly prefer functions/ops that are a single word without spaces... fewer loose parts flying around in the expression.
As it stands, "sum" doesn't tell you that either, so another option is to have it serve both roles.
Right, at least seeing both next to each other, it should be clear which one is which. With
I'm not sure about moving_avg in particular; I'd expect that to return a matrix rather than a vector. avg_over_time is the only one that gets the right idea across to me, though it still feels clunky. Maybe poll the mailing list for ideas?
I think we should work towards whatever is easiest to learn and reason about as an end-user, unless the implementation turns out to be particularly thorny.
I'd like a single-word construct more as an end-user as well. IMO the language is clearer the fewer loose (whitespace-separated) words there are. I am quite ok with
It feels clunky due to the lack of orthogonality, in that you treat matrices differently from vectors. Maybe we need to think more broadly - e.g. rate also has a time window, so it should arguably follow the same pattern. I've no strong preference on underscores vs. camelCase, as long as we're consistent.
But this is not primarily about treating matrices differently from vectors - the whole operation is a totally different kind of average operation. It's the average over time within each series vs. the average of all series at a fixed time. Also, the only reason why the existing aggregations are operators rather than functions is that they have extra grammatical clauses like
The current aggregations appear to be functions, as they're prefixes on a parenthesised expression. `sort(...` is the same as `sum(...` lexically, and they both take in a vector and return a vector. Putting arguments to a function call after the closing paren of the call is rather odd; usually they'd be prefixed or be arguments. I think we should step back and analyse all the functions, and see how to best handle the categories of M->V, V->V and V->S generally, to make sure that everything is internally self-consistent and thus make the language more intuitive. For example, whatever we do for sum_over_time() grammar/convention-wise, we should also do for rate(), as it too converts a matrix to a vector.
+1 to consistency of nomenclature conveying the character of inputs and outputs.
On Tue, May 27, 2014 at 9:55 AM, brian-brazil notifications@github.com wrote:
As for putting the grouping labels after the closing function paren of the
`SELECT AVG(foo) FROM something GROUP BY bar, baz;`
More importantly, changing the structure of the existing avg/max/min/sum
If sum_over_time works over time and sum works over a cross-section, I'd expect rate to work over a cross-section (ignoring that such a computation makes no sense) and rate_over_time to work over time.
Consider how you'd count the distinct number of labels: how do we make such a thing readable? I think that
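With the nested aggregations that today's PromQL allows, counting distinct label values reads like this (a sketch; `service` and `http_requests_total` are stand-ins):

```promql
# Inner aggregation yields one series per distinct `service` value;
# the outer count then counts those series:
count(count by (service) (http_requests_total))
```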
Ok, so for the
I see your point about the
@u-c-l @bernerdschaefer @grobie @matttproud et al. - maybe some of you have more opinions on this.
Wow... you are already three levels higher up than me... :)

WRT "by (...)": Syntactically, we could allow both orders, couldn't we? That would accommodate complex expressions as well as simple ones, but has a potential of being confusing... Natural language comparisons are kind of dangerous, especially if you talk to German speakers... ;) [1]

WRT types of function arguments and return values: this is the ancient discussion of whether a function name should contain hints about those, or whether you simply overload a function name (max(vector): across metrics, max(matrix): over time...). Some find overloading elegant, some find it a source of errors from hell... "xxx_over_time" sounds like a reasonable naming scheme to break ambiguities, but it should probably only be used if there actually is an ambiguity to break (e.g. rate() looks good to me as it is). It would then be a rule of thumb that, in doubt, the "simpler" name refers to vectors (max(...) is for vectors, max_over_time(...) is for matrices).

[1] Mark Twain: "An average sentence, in a German newspaper, is a sublime and impressive curiosity; it occupies a quarter of a column; it contains all the ten parts of speech -- not in regular order, but mixed; it is built mainly of compound words constructed by the writer on the spot, and not to be found in any dictionary -- six or seven words compacted into one, without joint or seam -- that is, without hyphens; it treats of fourteen or fifteen different subjects, each inclosed in a parenthesis of its own, with here and there extra parentheses which reinclose three or four of the minor parentheses, making pens within pens: finally, all the parentheses and reparentheses are massed together between a couple of king-parentheses, one of which is placed in the first line of the majestic sentence and the other in the middle of the last line of it -- after which comes the VERB, and you find out for the first time what the man has been talking about; and after the verb -- merely by way of ornament, as far as I can make out -- the writer shovels in "haben sind gewesen gehabt haben geworden sein," or words to that effect, and the monument is finished."
@u-c-l:
As another reference point, this is how Graphite names average functions:
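From Graphite's function documentation (sketched here as a rough reference, so the exact signatures may be off):

```
averageSeries(some.metric.*)         # average across series at each point in time
movingAverage(some.metric.*, '7d')   # windowed average over time, per series
```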
moving_avg works for avg, as it's a common term that I'd expect users to have heard of. I'm not sure about moving_count, moving_sum or moving_max, so I'd tend towards the over_time. Particularly as the moving only happens if you use it across different time periods; a given calculation doesn't have any movement at all (i.e. moving_avg implies returning a matrix).
Right,
I'd say we go with over_time for them all rather than complicating things.
Fine with me for now. We can still easily add an alias later if we want. So basically, to summarize the feature request: build five new functions that each take a matrix argument and, within each series, aggregate over the matrix's time window:
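For reference, the five functions as they exist in PromQL today, each mapping a range vector (matrix) to an instant vector with one aggregated sample per series:

```promql
avg_over_time(v[7d])     # average of all points in each series
min_over_time(v[7d])     # minimum
max_over_time(v[7d])     # maximum
sum_over_time(v[7d])     # sum of all points
count_over_time(v[7d])   # number of points
```

Here `v` stands in for any instant-vector selector, e.g. instance_memory_usage_bytes.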
That was a long read down here. I'll keep it short ;) +1 to the feature request
Implemented in http://review.prometheus.io/#/c/348/ and waiting for review. |
juliusv added a commit that referenced this issue Jul 28, 2014
juliusv closed this Jul 31, 2014
juliusv added a commit that referenced this issue Nov 25, 2014
simonpasquier pushed a commit to simonpasquier/prometheus that referenced this issue Oct 12, 2017
idavidmcdonald referenced this issue Aug 30, 2018
Merged: Add alerts for exceeding 80% disk space or CPU on registers apps #104
bernerdschaefer commented Mar 13, 2014
Given a gauge "instance_memory_usage_bytes" with a "service" label,
I want to find the maximum value of "instance_memory_usage_bytes" by service for a given time window.
That is,
`max(instance_memory_usage_bytes) by (service)` tells me the maximum value across timeseries at a given point in time. But something like `max(instance_memory_usage_bytes)[7d] by (service)` would tell me the maximum value observed across timeseries during the last 7 days.
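In the syntax that was eventually adopted, this 7-day request combines an over-time aggregation inside a cross-series aggregation:

```promql
# Maximum observed per service over the last 7 days:
max by (service) (max_over_time(instance_memory_usage_bytes[7d]))
```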