[lens] Add "counter rate" for monotonically increasing numbers #46627

Closed
simianhacker opened this issue Sep 25, 2019 · 19 comments · Fixed by #82948
Assignees
Labels
enhancement New value added to drive a business result Feature:Lens Project:LensDefault Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@simianhacker
Member

simianhacker commented Sep 25, 2019

Goal statement

Lens should support the "positive growth rate" aggregation, used for showing the rate of increase of a monotonic counter such as network traffic, with time scaling in date units such as 1s, 60s, etc.

Example

Visualizing network traffic per second, scaled down from 1 hour intervals. It would show 2Mb/s.

Decisions to be made

The most correct way of calculating this value is by implementing a new aggregation in Elasticsearch. Should we wait for this new aggregation, or implement a workaround in Kibana? The workaround will not produce the same numbers in all cases, because Elasticsearch is able to handle more edge cases than we can.

Decision:
For our first version we will implement the logic client side using the same approach as TSVB uses today.

UI

Rate will be a separate operation which can be chosen like sum/min/max/avg/... It lets the user pick a field and a time unit.
The operation should be called "Rate". The unit always has to be picked.

Screenshot 2020-10-12 at 09 31 49

As this operation requires a date histogram to make sense, we need to handle the case where no date histogram is available (both when a rate operation already exists and when it does not):

Screen Shot 2020-10-07 at 10 30 49 AM

Screen Shot 2020-10-07 at 10 30 57 AM

Implementation

This behavior will be implemented as a separate Lens-private expression function that calculates the derivative/"positive only" value based on the Elasticsearch max metric of the selected field.

Prior discussion

This is how the Infra UI calculates rates for monotonically increasing numbers like system.network.out.bytes for both the Inventory View and the Metrics Explorer. An added bonus is if we had rate as an option, the Metrics Explorer could link to Lens instead of TSVB:

{
  rate_max: { max: { field: '<field goes here>' } },
  rate_deriv: {
    derivative: {
      buckets_path: 'rate_max',
      gap_policy: 'skip',
      unit: '<user defined, defaults to 1s>',
    },
  },
  rate: {
    bucket_script: {
      buckets_path: { value: 'rate_deriv[normalized_value]' },
      script: {
        source: 'params.value > 0.0 ? params.value : 0.0',
        lang: 'painless',
      },
      gap_policy: 'skip',
    },
  },
}
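
For readers who want to see what the three steps above compute, here is a minimal client-side sketch in Python. It is an illustration only, not the Infra UI or Lens code: the function name `positive_rate`, the `UNIT_SECONDS` table, and the way empty buckets are handled are all assumptions made for this example, and Elasticsearch's own gap handling may differ in detail.

```python
from typing import List, Optional

# Hypothetical subset of time units, expressed in seconds.
UNIT_SECONDS = {"1s": 1, "1m": 60, "1h": 3600}


def positive_rate(
    bucket_max: List[Optional[float]],
    bucket_seconds: float,
    unit: str = "1s",
) -> List[Optional[float]]:
    """Mimic max -> derivative (normalized to `unit`) -> positive-only.

    `bucket_max` holds the max of the counter per date histogram bucket;
    None marks an empty bucket.
    """
    scale = bucket_seconds / UNIT_SECONDS[unit]
    rates: List[Optional[float]] = []
    previous: Optional[float] = None
    for current in bucket_max:
        if current is None:
            # Assumption: an empty bucket produces no value and is skipped.
            rates.append(None)
            continue
        if previous is None:
            # The first non-empty bucket has nothing to diff against.
            rates.append(None)
        else:
            normalized = (current - previous) / scale
            # Same rule as the painless script: clamp negatives to zero.
            rates.append(normalized if normalized > 0.0 else 0.0)
        previous = current
    return rates


# One-minute buckets, rate shown per second; the drop from 160 to 40
# (a counter reset) is clamped to 0.0.
print(positive_rate([100, 160, 40, 100], bucket_seconds=60, unit="1s"))
```

Clamping to zero means the increase that happened around a reset is thrown away, which is the main limitation of doing this on the client.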
@simianhacker simianhacker added enhancement New value added to drive a business result Feature:Lens labels Sep 25, 2019
@wylieconlon wylieconlon added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Sep 30, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-app

@wylieconlon
Contributor

@simianhacker Can you show an example of what this looks like when visualized?

@wylieconlon
Contributor

I was able to set up the Infra UI with my Metricbeat data and use the rate aggregation to get this chart:

Screenshot 2019-10-02 14 28 54

Unlike the normal "Bytes" formatter, this is showing bytes per user defined interval.

@agirbal

agirbal commented Feb 24, 2020

Copying over details of #58189

Kibana's basic visualizations, or Lens, should have a way to convert your data count to a rate per unit of time, e.g. requests per second. This is the usual way of thinking of metrics ("each of my instances can do 500 RPS") and it should be made easily available.

As mentioned to @AlonaNadler, many metrics we look at and compare with other systems are typically expressed as a rate, e.g. "requests per second" or "clients per hour". Usually data in ES is of discrete form: you get 1 document per event (e.g. logs). How do you plot your "logs per second"?

There are some answers out there in Discuss but most of them are wrong:

  • use Derivative agg: this only works if your data is a counter that tallies up a value over time, which is not often the case.
  • use moving avg or bucket agg: the parameter is a fixed number of entries or time unit to average over. It just does smoothing, so you need to have a rate in the first place, it does not convert your count to a rate.
  • adjust kibana's time window: yea, you can get lucky and pick a window so that kibana decides to precisely split your count by that unit of time. Not workable :)
  • change the time interval in TSVB: this seems to work but is actually unusable because it makes data granularity extremely small. Looking at more than 30min will actually return an error.
  • the only way I found was to use scale_interval in Timelion.

Thanks!

@simianhacker
Member Author

simianhacker commented Feb 24, 2020

@agirbal Side note: for TSVB, if you set the interval to something like >=10s, you still get all the zooming capabilities, but it prevents the buckets from going smaller than 10s. The minimum bucket size should match the event/sample rate of your data. I always use this feature, and it would be the default behavior if there were a programmatic way to determine the event rate (starting in 7.5 we do this in the Metrics UI via the metricset.period field).

There are some other tricks you can employ, like using cumulative_sum and derivative together to scale the data. This is a good trick for "log rate": do a cumulative_sum on doc_count, then use a derivative to scale it down to 1 second.
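
As a rough illustration of that trick, the arithmetic can be sketched in plain Python. This is not what TSVB or Elasticsearch runs; the function name `log_rate_per_second` and the fixed bucket width are assumptions made for this example.

```python
from itertools import accumulate
from typing import List


def log_rate_per_second(doc_counts: List[int], bucket_seconds: float) -> List[float]:
    """Running total of doc_count, then a derivative scaled to one second."""
    # Step 1: cumulative sum turns per-bucket counts into a growing counter.
    cumulative = list(accumulate(doc_counts))
    # Step 2: the derivative is the change between adjacent buckets, so the
    # first bucket has no value and the result is one element shorter.
    deltas = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    # Step 3: normalize from "per bucket" to "per second".
    return [delta / bucket_seconds for delta in deltas]


# One-minute buckets holding 600, 1200 and 300 log documents.
print(log_rate_per_second([600, 1200, 300], bucket_seconds=60))
```

The derivative of a running total is just the count in each later bucket, so the only real work is the division by the bucket width.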

The tricky part is abstracting all of this away from the user. Using "rate" on a number that is not monotonically increasing will almost always produce something the user doesn't want, except when they know what they are doing. Until we have a concept of "number types" (i.e. counters, gauges, etc.) in Elasticsearch/Kibana, it's going to be difficult to guide the user in the right direction. In Observability, we plan on using the field metadata to store information like this from the Metricbeat modules.

@agirbal

agirbal commented Feb 25, 2020

@simianhacker thanks for the info! I didn't think of that trick "cumulative sum + derivative", that's a good one, is it pretty much how it could get implemented under the hood anyway?

I think the feature of "sample rate" could cleanly be attached to the count function, no? A user already has to decide whether to do count (which implies counter) or some field function like avg of field X, which would work on a gauge or rate field. Since you already have specialized fields for each Y-axis value type, you could just have this for the count one.

From there you can have another general setting smooth or step that would do the moving average and smooth out your graphs, and that can be applied to any function. I think one difficulty today is that we're mixing the 2 concepts which makes it more confusing to the user maybe.

It's a good idea to abstract / automate it all from the user if possible in the future, just I don't think this one is a very complex concept, would love to be able to simply do RPS in TSVB :)

@simianhacker
Member Author

I was talking with a colleague last week about how we should just add "rate" to TSVB that essentially does max, derivative, and positive_only since that's the TSVB formula we (Observability) use for Metrics Explorer.

@simianhacker
Member Author

Just created PR for adding rate to TSVB: #59843

@wylieconlon wylieconlon changed the title [lens] Add rate to the aggregation options for the Y axis → [lens] Add positive growth rate to the aggregation options for the Y axis Apr 6, 2020
@wylieconlon
Contributor

@agirbal Your request might not actually be solved by this issue, but I think it is still important to track. I wrote up an issue describing what I would call Average event rate, which is different from the kind of rate that @simianhacker is describing in this request.

@wylieconlon
Contributor

Tracking this in Elasticsearch because it might be possible to get a more-correct implementation in Elasticsearch. The main reason is that counter resets in the middle of a bucket should be handled, and the client-side implementation is only able to throw it away.

elastic/elasticsearch#60619

@wylieconlon
Contributor

@AlonaNadler @cchaos We have been calling this the "positive rate" in TSVB, but with a sentence indicating that this should only be used for monotonically increasing numbers. I would expect us to have exactly the same name and descriptive text.

Screen Shot 2020-08-20 at 12 10 08 PM

@wylieconlon
Contributor

There are several decisions that we have not finalized:

  • Naming and description: Needs input from @AlonaNadler
  • Expected form interface: Needs input from @cchaos
  • Technical question: Should we use the workaround with edge cases, or wait for Elasticsearch to implement the correct aggregation?

More questions will probably be raised once we get these first few.

@wylieconlon
Contributor

In terms of naming, my preference is "Positive rate" or "Counter rate", because these names clearly indicate that there is something unusual about this function. I am opposed to calling it "Rate" because it's a clearly confusing name (evidence is that we discussed the name and meaning for weeks). Using the word "growth" is also confusing because it means something different in a business context than what it means here.

Can we settle on the name "Positive rate"?

@wylieconlon wylieconlon changed the title [lens] Add positive growth rate for monotonically increasing numbers → [lens] Add positive rate for monotonically increasing numbers Oct 2, 2020
@flash1293 flash1293 removed loe:needs-research This issue requires some research before it can be worked on or estimated needs design labels Oct 12, 2020
@wylieconlon
Contributor

I believe we've settled on "Counter rate" after discussion. Updating the title.

@wylieconlon wylieconlon changed the title [lens] Add positive rate for monotonically increasing numbers → [lens] Add "counter rate" for monotonically increasing numbers Oct 12, 2020
@wylieconlon
Contributor

Based on a suggestion by @exekias in the parallel Elasticsearch issue, I think we should slightly tweak the algorithm that TSVB is using. The main tweak is that when the value decreases, we should use the new value instead of resetting it to zero. This is expressed in pseudocode the following way:

rate = 0
loop_over_values(lambda (current, previous):
  if current >= previous:
    rate = rate + (current - previous)
  else:
    rate = rate + current
)

For the types of counters that we've considered so far this algorithm is going to appear more correct by avoiding sudden drops to zero.
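
A runnable reading of that pseudocode, as a minimal Python sketch. It is one interpretation for illustration, not the code that shipped in Lens: the function name `counter_increases` is hypothetical, and it returns the per-bucket increase rather than a running total so that each bucket can be plotted on its own.

```python
from typing import List, Optional


def counter_increases(values: List[float]) -> List[Optional[float]]:
    """Per-bucket increase of a counter, tolerating resets.

    When the counter drops, assume it restarted from zero and count the new
    value as the increase instead of clamping the result to zero.
    """
    if not values:
        return []
    # The first bucket has no predecessor, so it has no increase.
    increases: List[Optional[float]] = [None]
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            increases.append(current - previous)
        else:
            # Counter reset: everything counted since the restart is new.
            increases.append(current)
    return increases


# The counter resets between 25 and 5; that bucket reports 5 rather than 0.
print(counter_increases([10, 25, 5, 12]))
```

Any increase that happened between the last sample before the reset and the reset itself is still lost, which is the edge case a server-side aggregation could handle better.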

@flash1293
Contributor

Reopening as the UI part is still missing

@flash1293 flash1293 reopened this Nov 20, 2020
@flash1293 flash1293 assigned flash1293 and unassigned mbondyra Nov 20, 2020
@flash1293
Contributor

Closed by #84384

10 participants