[Lens] Discuss types of rates #62375

Closed
wylieconlon opened this issue Apr 2, 2020 · 20 comments
Labels
discuss Feature:Lens Team:Visualizations Visualization editors, elastic-charts and infrastructure
Comments

@wylieconlon
Contributor

wylieconlon commented Apr 2, 2020

Edit: This discussion is no longer focused specifically on the "average event rate"; it has turned into a more general discussion of rates. Keeping the old discussion for posterity. The discussion about a generic rate function begins at this comment: #62375 (comment)


Average event rate lets users represent very small time intervals in their visualizations without building slow queries, and is common in timeseries use cases. For example, if the user wants to visualize the average sales per second, they can either build a date histogram with an interval of 1 second, or build a date histogram with a larger interval and then use an average event rate aggregation.

The definition of average event rate is: the number of documents added, divided by the time interval, then scaled to a target unit (milliseconds, seconds, hours, etc.).
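As a rough illustration of this definition (a sketch only, not Lens code; the helper and constant names are hypothetical):

```ts
// Illustrative only: scale a date-histogram bucket's document count to a
// target unit, following the definition above.
const MS_PER_UNIT = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 } as const;

function averageEventRate(
  docCount: number,
  bucketIntervalMs: number,
  unit: keyof typeof MS_PER_UNIT
): number {
  // documents per millisecond, scaled up to the requested unit
  return (docCount / bucketIntervalMs) * MS_PER_UNIT[unit];
}

// 7200 documents in a 1-hour bucket => 2 events per second
averageEventRate(7200, 3_600_000, 's');
```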

This aggregation is already possible to do by using clever logic in TSVB, but we can simplify this and make it more widely available to handle time series use cases in other tools in Kibana.

For users, this metric will appear in two ways:

  • When the index pattern has a time field, the user can build a metric or gauge visualization using average event rate.
  • When using a date histogram, the user can choose an interval smaller than the histogram interval to calculate the average.

An important part of this calculation is handling timezones and leap seconds.

@wylieconlon wylieconlon added Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:TSVB TSVB (Time Series Visual Builder) Team:AppArch Feature:Lens labels Apr 2, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@kibanamachine kibanamachine added this to Long-term goals in Lens Apr 2, 2020
@kibanamachine kibanamachine added this to To triage in kibana-app-arch Apr 2, 2020
@lukeelmers lukeelmers moved this from To triage to Long Horizon in kibana-app-arch May 5, 2020
@wylieconlon wylieconlon moved this from Long-term goals to 7.10 in Lens May 28, 2020
@flash1293 flash1293 moved this from 7.10 to Long-term goals in Lens Jul 21, 2020
@flash1293 flash1293 added the enhancement New value added to drive a business result label Aug 6, 2020
@flash1293 flash1293 moved this from Long-term goals to Next minors in Lens Aug 6, 2020
@wylieconlon
Contributor Author

This aggregation may be added to Elasticsearch as a convenience: elastic/elasticsearch#60674

@wylieconlon
Contributor Author

@cchaos @AlonaNadler here are two UI concepts that I have come up with for this aggregation, to show how we could support this:

As a rate option on the Count metric:

[Screen recording: rate option on the Count metric]

As a separate function:

[Screen recording: separate rate function]

@AlonaNadler

I think I prefer to have it as a dedicated function for discoverability. It also isn't always clear what the relationship with count is.
Sounds like event rate only requires time scaling.

@wylieconlon
Contributor Author

@AlonaNadler the relationship with count is that "event rate" is something like "count per second". We definitely want a separate function for positive rate. You're saying you would prefer to have both "Positive rate" and "Event rate" as separate functions?

@wylieconlon wylieconlon changed the title Average event rate aggregation Average event rate aggregation (count per second) Aug 14, 2020
@flash1293 flash1293 moved this from Lens by default to Long-term goals in Lens Aug 17, 2020
@wylieconlon
Contributor Author

The new rate aggregation was just merged into Elasticsearch, and supports two options:

  • Event rate (example: Count per second)
  • Rate of a field based on the sum of values (Example: Average quantity sold per day over a month)

I can see both of these being valuable, but in my mind they have separate names. My personal preference is to expose this in two ways:

  1. I would prefer having a rate option for the Count metric
  2. For the "rate of a field", I would add a new function named "Average rate"

We can't begin work on this until it's supported in esaggs, so it's currently blocked.
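For reference, here is a rough sketch of what a request using the new Elasticsearch rate aggregation might look like (based on elastic/elasticsearch#60674; the field names come from the ecommerce sample data, and the exact request shape is an assumption, not the final Lens integration):

```ts
// Sketch of an Elasticsearch request body using the new rate aggregation.
const requestBody = {
  size: 0,
  aggs: {
    per_month: {
      date_histogram: { field: 'order_date', calendar_interval: 'month' },
      aggs: {
        // Event rate: no `field`, so the aggregation counts documents per day
        orders_per_day: { rate: { unit: 'day' } },
        // Rate of a field: sum of products.quantity, normalized to per day
        quantity_per_day: { rate: { field: 'products.quantity', unit: 'day' } },
      },
    },
  },
};
```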

@AlonaNadler

What if we had a general rate function?
If there is already a field in the configurator, then Lens does the rate of that field, e.g. rate of bytes.
If there is no field, it does the rate of events.
If there is a field, the user can still choose to select rate of events, and in that case it removes the field from that dimension in the configuration.

The idea behind this is that most users don't know, and wouldn't be able to tell, the difference between the rates the way we define them in Elastic, and they shouldn't need to.

@wylieconlon
Contributor Author

wylieconlon commented Aug 25, 2020

@AlonaNadler The idea you're proposing is possible from a technical perspective. Based on the use cases I've analyzed for rates, I am not sure that there is a "general" rate like you are asking about. There are specific types of rates based on the data:

  1. Count per second
  2. Average rate per second of a field
  3. Positive rate per second of a field that is always increasing
  4. Growth rate from previous interval

So it's definitely possible to combine all of these types of rates into a single function, but we will need to make the user choose one of the 4 options.

I think that we would benefit here from real data and examples. I'm planning on writing up example data for each of these 4 options, unless you'd prefer to take this on @AlonaNadler. I also think we are missing clarity from @cchaos on the different options and how we want to present them to the user.
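To make the 4 options concrete, here is a rough sketch (illustrative TypeScript, not Lens code; the bucket shape and names are hypothetical) of how each type of rate could be computed from pre-aggregated date histogram buckets:

```ts
interface Bucket {
  docCount: number;   // number of documents in the bucket
  valueSum: number;   // sum of a numeric field in the bucket
  counterMax: number; // max of a monotonically increasing counter field
}

const intervalSeconds = 3600; // example: hourly buckets

// 1. Count per second
const countRate = (b: Bucket) => b.docCount / intervalSeconds;

// 2. Average rate per second of a field
const fieldRate = (b: Bucket) => b.valueSum / intervalSeconds;

// 3. Positive rate per second of an always-increasing (counter) field
const counterRate = (prev: Bucket, curr: Bucket) =>
  Math.max(0, curr.counterMax - prev.counterMax) / intervalSeconds;

// 4. Growth rate from the previous interval
const growthRate = (prev: Bucket, curr: Bucket) =>
  (curr.valueSum - prev.valueSum) / prev.valueSum;
```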

@AlonaNadler

Sounds good Wylie, please focus on the first 3. Growth might be considered a rate, but it shouldn't be. The rate function mostly addresses our observability and metrics users. Exploring online, I see several ways it is being calculated, though none of them directly correspond to the top 3 you have above.
Since most users are not familiar with these semantics, our goal is to simplify that for them and choose the right default. Based on your research, let's try to see how we can have the first 3 items coupled together under one rate function.

@wylieconlon
Contributor Author

Okay, the next step is to get a mockup from @cchaos.

@AlonaNadler

AlonaNadler commented Sep 2, 2020

Based on my research, which included talking with multiple observability folks and looking at the ways various solutions calculate rates, I suggest the following:

There are two types of fields users want to calculate rates on:
Gauges (more common for our user base) - point-in-time metrics: CPU, memory, revenue, etc.
Counters - accumulating metrics: they accumulate over time until they reach a certain point and reset.

Looking at our Beats, most are gauges. Looking at other vendors, it seems they more commonly calculate rates assuming gauge metrics.

Goals:

  • Reduce the number of decisions users need to make
  • Help users with friendly messages to get to what they need
  • Make the most common operations the default

What do we expose to users?
Two new functions:

  • Rate
  • % change

Rate

  • By default, Lens assumes a gauge metric and normalizes to a per-second interval. For example bytes per second, CPU per second.
  • The calculation treats the field as a gauge metric: average(field) normalized by the interval, or sum(field) normalized by the interval.
  • Rate can be applied to the Records field, giving a rate of events.
  • If users open the advanced options, they can specify that their field is a counter. In that case, Lens calculates the rate using max instead of average.
  • In the advanced popup, there is also a checkbox for positive only; the checkbox is checked by default so the rate will not have negative values by default.

% change:

  • Used often in business to show growth or decline as a percentage, in a normalized way.
  • Also referred to as month over month or d/d.
  • Example: % change of revenue or transactions.
  • Can show a negative percentage.
  • Should be exposed outside of rate, since it is a common function for business users, who don't always refer to it as a rate in business terms.

cc: @cchaos @crowens
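A minimal sketch of the % change calculation described above, assuming it means the change relative to the previous interval expressed as a percentage (the helper name is hypothetical):

```ts
// Hypothetical helper; not the Lens implementation.
function percentChange(previous: number, current: number): number | null {
  if (previous === 0) return null; // undefined when the previous value is zero
  return ((current - previous) / previous) * 100;
}

// revenue of 120 following 100 in the previous interval => +20
percentChange(100, 120);
```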

@wylieconlon
Contributor Author

Based on your comment and offline discussions, I think we mostly agree, with the exception of gauges. I think this might be a confusion about the terminology, and will attempt to clarify this using examples. These examples are based on the work I've been doing to create a comprehensive list of time series functions for us to work backwards from.

  1. Count:

Count per hour, count per second. For example, I can show the number of hourly transactions in the ecommerce sample data, even when I query the data per day:

Count per hour

  2. Value of a summable field:

For fields that would usually be displayed as a Sum, such as quantity, we can convert these into a rate by taking the Sum over the time interval. For example, the ecommerce sample data has products.quantity, and we can look at the average number of products sold per hour, even in a daily chart.

Quantity per hour

  3. Counter:

Counters are monotonically increasing numbers, such as network traffic. The function to convert a counter into a rate does not work for other types of numbers. Counters usually have a separate ID field, and if the user doesn't provide an ID we'll produce incorrect data. Here is a correct dataset, showing the average Megabytes per second:

[Screenshot: counter rate showing average megabytes per second]

  4. Gauges:

Gauges represent point-in-time data like CPU or memory. They aren't usually shown as a rate, because they are usually shown as an average. Despite this, some timeseries tools offer this functionality, but I think we should discourage users from attempting it on gauges:

What does CPU per second mean?

It would make sense to apply smoothing functions to gauges, such as moving averages.

@AlonaNadler

After talking with observability folks @sorantis @crowens @ruflin @exekias:

This is what I suggest:

  • Rate by default assumes the field is an accumulating counter, calculated as positive(derivative(max(field)))
  • By default Lens normalizes to a second. Users can change it to minutes, hours, days, etc.
  • Users can specify explicitly that their field is not a counter, in which case Lens will calculate it as a rate on a gauge field: derivative(average(field)) normalized by the interval
  • Rate of events is an edge case; it can be fulfilled in Lens in two ways (we should choose one):
    • allow users to perform a rate on the count of records - this will be calculated by treating the count as a gauge
    • we can add this as an advanced option within the Count function in Lens
  • Users can select a numeric field, drag it to the preview or configurator, change to the rate function, and get a preview based on rate per second (while users can use this in the wrong way and create rates on fields which shouldn't use rate, Lens will still allow it)
  • % change is a separate quick function in Lens
  • Ideally (but not sure if it's possible): when Lens detects that a field is not an accumulating counter, it will calculate the rate based on the gauge formula without the user needing to explicitly configure it.

@cchaos wireframe for rate:
[Wireframe image]
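The counter formula above, positive(derivative(max(field))), could be expressed with Elasticsearch pipeline aggregations roughly like this (a sketch only; the field name is an example from Metricbeat, and this is not necessarily how Lens would implement it):

```ts
// Sketch: max per bucket -> derivative -> drop negative deltas and normalize to seconds.
const counterRateRequest = {
  size: 0,
  aggs: {
    per_minute: {
      date_histogram: { field: '@timestamp', fixed_interval: '1m' },
      aggs: {
        counter_max: { max: { field: 'system.network.in.bytes' } },
        counter_deriv: { derivative: { buckets_path: 'counter_max' } },
        positive_rate_per_second: {
          bucket_script: {
            buckets_path: { d: 'counter_deriv' },
            // clamp counter resets (negative deltas) to 0 and normalize 1m buckets to per-second
            script: 'params.d > 0 ? params.d / 60 : 0',
          },
        },
      },
    },
  },
};
```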

@wylieconlon
Contributor Author

wylieconlon commented Sep 16, 2020

@AlonaNadler I don't think that you've addressed the points I made in my previous comment. Can you please respond to my points more specifically? Here are the main differences I've identified:

  1. You are saying that we should be supporting "rates on gauges", but like I wrote previously, this does not make sense to me. I could not find any examples where this is the desired behavior. Can you provide a specific example, or is this a terminology issue?

  2. The "rate of summable number", which I listed as 2 above, is missing from your comment. If this was unintentional, then I think your wireframe needs to be updated.

  3. You think the default type of number is the "counter", but this doesn't match the data I was looking at. There are very few counters tracked in metricbeat. Overall, most numbers in metricbeat are gauges, and I wouldn't want to convert these into rates. Why do you think the default should be counters? Please provide more evidence.

@AlonaNadler

You are saying that we should be supporting "rates on gauges", but like I wrote previously, this does not make sense to me. I could not find any examples where this is the desired behavior. Can you provide a specific example, or is this a terminology issue?

Check out my last comment: "Rate by default assumes the field is an accumulating counter - calculated positive(derivative(max(field)))"

The "rate of summable number", which I listed as 2 above, is missing from your comment. If this was unintentional, then I think your wireframe needs to be updated.

Yes, I don't think we should support it in this form. I suggest we support a simple rate function that doesn't require adding another calculation and is based on the field being an accumulating counter.

You think the default type of numbers are "counters", but this doesn't match the data I was looking at. There are very few counters tracked in metricbeat. Overall, most numbers in metricbeat are gauges, and I wouldn't want to convert these into rates. Why do you think the default should be counters? Please provide more evidence.

Your 3rd point seems to contradict your 1st. Based on my discussion with the observability team, it seems like counters will continue to be used, and these are the fields their users need rates for most often. The other examples I showed that were different variations in the Beats dashboards were a calculation mistake. To support other users who might be using gauges (related or unrelated to Beats), I suggested the approach in the wireframe that supports both gauges and counters, assuming counters by default.

@exekias
Contributor

exekias commented Sep 17, 2020

Thanks for moving this forward, folks! Some clarifications from what we do in Metricbeat and elsewhere:

Our definition of counter is in fact what you are talking about: a monotonically increasing number. Calculating its rate is done with positive(derivative(max(field))).

I love that the rate is normalized to seconds by default, with users being able to change this!

@exekias
Contributor

exekias commented Sep 17, 2020

I did some more thinking about this:

"normalizing values to the bucket times" is actually calculating a rate 🤦, so it all depends on the definition we want to have.

If we take:

  • positive(derivative(max(field))) / normalized to interval for counters
  • average(field) / normalized to interval for the rest (gauges).

things add up.

Answering myself on examples:

A gauge could be the document count per interval, either calculated by Kibana or sent by the Agent. Calculating a rate on it would result in document count per second, which matches the definition of rate.

A counter could be network bytes sent since the system started, incrementing on each Agent poll. Calculating a rate on it would result in network bytes sent per second.

I think that, for the sake of this conversation, the value of a summable field matches the definition of a gauge. Users would just need to ask for the rate of the sum of the field. I understand that chaining these will be possible?
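For completeness, the gauge variant above (average(field) / normalized to the interval) could look something like this (again a sketch with a placeholder field name, not a final implementation):

```ts
// Sketch: average per bucket, divided by the bucket length in seconds.
const gaugeRateRequest = {
  size: 0,
  aggs: {
    per_minute: {
      date_histogram: { field: '@timestamp', fixed_interval: '1m' },
      aggs: {
        gauge_avg: { avg: { field: 'some.gauge.field' } }, // hypothetical field
        per_second: {
          bucket_script: {
            buckets_path: { v: 'gauge_avg' },
            script: 'params.v / 60', // 1-minute buckets normalized to per-second
          },
        },
      },
    },
  },
};
```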

@wylieconlon wylieconlon changed the title Average event rate aggregation (count per second) [Lens] Discuss types of rates Sep 17, 2020
@wylieconlon wylieconlon added discuss and removed Team:AppArch Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:TSVB TSVB (Time Series Visual Builder) blocked enhancement New value added to drive a business result labels Sep 17, 2020
@kibanamachine kibanamachine removed this from Long Horizon in kibana-app-arch Sep 17, 2020
@wylieconlon wylieconlon added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Sep 17, 2020
@wylieconlon
Contributor Author

This issue has turned into a discussion about rates in general, so I've split out my original questions about just the "event rate" function into a separate issue: #77811

@wylieconlon
Contributor Author

wylieconlon commented Sep 17, 2020

@exekias @AlonaNadler I think we are on the same page about counter-type numbers, as well as about "event rates". To summarize:

  • Counter type: We can offer a convenient function to calculate the rate. This is case 3 that I listed above.
  • Event rates by multiplying the Count/Sum/Average into a "per second" rate: This is not a separate function, but it's an option on existing functions. This would handle the cases which I described above as 1 and 2.

The remaining issue is gauge-type numbers, which I listed as case 4 above:

  4. Gauges: These represent point-in-time data like CPU or memory. Gauges aren't usually shown as a rate, because they are usually shown as an average.

I think that I've already shown that CPU and memory don't make sense when converted into a "rate per second", but I've been looking for examples where it might make sense. I finally found one: metricbeat has an elasticsearch module which tracks the total document count for each index. Here's a graph of the Average of elasticsearch.index.total.docs_count:

[Screenshot: Average of elasticsearch.index.total.docs_count]

As you can see, the value goes up and down over time. Here's what it looks like if I take the derivative of this value and scale it to 1 minute:

[Screenshot: derivative of the value, scaled to 1 minute]

Here's what it looks like with the same calculation, but split by the index that we're calculating from:

[Screenshot: the same calculation, split by index]

All of these calculations will be possible in Lens by default, even if we don't offer a rate function which is able to do them. So do we need this at all?

I am proposing that the only dedicated "rate" functions in Lens would be:

  • Monotonic rate for counters
  • Percentage change

@flash1293
Contributor

The kinds of rates discussed here are supported in Lens today.

Lens automation moved this from Long-term goals to Done Feb 4, 2021