Switch counter metrics to use CUMULATIVE #19

Closed
prog8 wants to merge 1 commit

@prog8 prog8 commented Aug 10, 2021

fixes #18

@google-cla

google-cla bot commented Aug 10, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


@tam7t
Contributor

tam7t commented Aug 11, 2021

I'll need to do some manual testing on this because I'm not sure that converting an existing metric from GAUGE to CUMULATIVE is backwards compatible.

@@ -365,7 +365,7 @@ func (s *Sink) report(ctx context.Context) {
 					Type:   fmt.Sprintf("custom.googleapis.com/%s%s", s.prefix, name),
 					Labels: labels,
 				},
-				MetricKind: metricpb.MetricDescriptor_GAUGE,
+				MetricKind: metricpb.MetricDescriptor_CUMULATIVE,
 				Resource:   resource,
 				Points: []*monitoringpb.Point{
 					{
Contributor

I think for CUMULATIVE the points also need to have a StartTime:

https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors#metrickind

@prog8
Author

prog8 commented Aug 11, 2021

@tam7t let me know what you find once you have done the basic backwards-compatibility testing.

@rf
Contributor

rf commented Aug 31, 2021

So I tested this and I'm getting this error:

Failed to write time series data: rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].interval.start_time had an invalid value of "2021-08-31T14:19:47-07:00": The start time must be before the end time (2021-08-31T14:19:47-07:00) for the non-gauge metric 'custom.googleapis.com/go-metrics/my-metric-id

And yes, this is a backwards-incompatible change: I had to manually delete the old metric since the type changed.

@rf
Contributor

rf commented Aug 31, 2021

The correct MetricKind may actually be DELTA.

@prog8
Author

prog8 commented Sep 1, 2021

@rf thanks for your effort.
@tam7t would you accept a PR after changing MetricKind to DELTA?

EDIT: I think it's better to avoid forking this repo to fix it. Ideally it stays under Google's control, but I see there is not much info from Google's side 😞

@tam7t
Contributor

tam7t commented Sep 1, 2021

#19 (comment) is the reason for the error there. I'm still evaluating the effects of changing the metric types at all, but I do suspect that DELTA will make it more difficult, as additional time periods would need to be accounted for.

@rf
Contributor

rf commented Sep 1, 2021

It turns out DELTA metrics are not supported in Stackdriver for custom metrics.

I'm not sure that counters in go-metrics are intended to be treated like a statsd counter (i.e. something that resets after an interval). It's not clear if other sinks treat it this way.

// edit: Was looking at the wrong go-metrics library (lol). The statsd sink does increment a statsd counter when you call IncrCounter. So this library has counters implemented incorrectly. When we do a deep copy of the stats in order to report, it should be clearing the counters. I think once that is implemented, using a GAUGE to report them might actually work fine.
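A rough sketch of the "clear the counters when copying for a report" idea, assuming a simplified store with a single mutex-protected map. `counterStore`, `Incr`, and `snapshotAndReset` are hypothetical names and do not reflect how this sink is actually structured:

```go
package main

import (
	"fmt"
	"sync"
)

// counterStore is a hypothetical stand-in for the sink's counter state.
type counterStore struct {
	mu       sync.Mutex
	counters map[string]float64
}

func newCounterStore() *counterStore {
	return &counterStore{counters: make(map[string]float64)}
}

// Incr adds val to the named counter.
func (s *counterStore) Incr(name string, val float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counters[name] += val
}

// snapshotAndReset hands back the counters accumulated so far and starts
// a fresh map, so the next report only sees increments made after this call.
func (s *counterStore) snapshotAndReset() map[string]float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := s.counters
	s.counters = make(map[string]float64)
	return snap
}

func main() {
	s := newCounterStore()
	s.Incr("requests", 3)
	s.Incr("requests", 2)
	fmt.Println(s.snapshotAndReset()["requests"]) // increments since start

	s.Incr("requests", 1)
	fmt.Println(s.snapshotAndReset()["requests"]) // increments since last report
}
```

Swapping the map under the lock avoids a copy and guarantees that no increment is counted in two reports.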

@tam7t
Contributor

tam7t commented Sep 1, 2021

@rf I believe that what you are describing is how to implement and report a DELTA metric (clearing each interval and reporting the start/end time of the interval), because you are reporting the change of the counter over a specific time interval. To report a counter as CUMULATIVE, you need to include the start time of the accumulation in the report to Stackdriver, as the histogram metrics do.

@rf
Contributor

rf commented Sep 1, 2021

I did have some trouble getting CUMULATIVE to work -- I was also trying to use a StartTime that was basically the end of the last reporting interval. Not sure if that was right; should I be using the same start time as for histograms?

I was able to get this to work, however, by using GAUGE and just resetting the counters every time we do the deep copy in order to flush. It has the benefit of being somewhat backwards compatible (because the metrics don't need to be deleted from Stackdriver), but it will end up reporting very different values.


Successfully merging this pull request may close these issues.

Switch counter metrics to use cumulative or delta
3 participants