datadog output plugin doesn't send the metric type #6822

Closed
orensho opened this issue Dec 25, 2019 · 10 comments · Fixed by #10979
Labels
cloud: Issues or requests around cloud environments
feature request: Requests for new plugin and for new features to existing plugins

Comments

@orensho

orensho commented Dec 25, 2019

The datadog output plugin doesn't send the (optional) metric type, so the metric type is displayed as 'Other' in Datadog and COUNT or RATE queries can't be applied.
https://docs.datadoghq.com/api/?lang=bash#metrics

Although telegraf.Metric holds the Type (see https://github.com/influxdata/telegraf/blob/master/metric.go), the Write function of the datadog output plugin (https://github.com/influxdata/telegraf/blob/master/plugins/outputs/datadog/datadog.go) discards it and sends only Metric, Tags, Host and Points:

metric := &Metric{
    Metric: dname,
    Tags:   metricTags,
    Host:   host,
}
metric.Points[0] = dogM
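As a rough illustration of what is being asked for, the sketch below adds optional `type` and `interval` fields to a payload struct modelled on the one above. The `Type` and `Interval` fields and the `encodeMetric` helper are hypothetical additions for this example and are not part of the plugin; the JSON key names follow the Datadog timeseries API linked above.

```go
package main

import "encoding/json"

// Point is a [timestamp, value] pair.
type Point [2]float64

// Metric is modelled on the plugin's payload struct. Type and Interval are
// hypothetical additions; both are omitted from the JSON when left unset.
type Metric struct {
	Metric   string   `json:"metric"`
	Points   [1]Point `json:"points"`
	Host     string   `json:"host"`
	Type     string   `json:"type,omitempty"`     // hypothetical: "gauge", "count" or "rate"
	Interval int64    `json:"interval,omitempty"` // hypothetical: seconds, used with count and rate
	Tags     []string `json:"tags,omitempty"`
}

// encodeMetric is a hypothetical helper that serializes one metric to JSON.
func encodeMetric(m Metric) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}
```

With `Type` left empty the serialized payload is the same as before, so untyped metrics would keep their current behaviour.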

@danielnelson danielnelson added the feature request Requests for new plugin and for new features to existing plugins label Dec 26, 2019
@danielnelson danielnelson changed the title from "datadog output plugin doesn't send the metric type: bug or feature request?" to "datadog output plugin doesn't send the metric type" Dec 26, 2019
@danielnelson
Contributor

It doesn't look like any type support was considered for this output, I believe the plugin was added before we had any metric type support. I was curious how long Datadog has supported metric types as well, but didn't find any information in the Wayback Machine.

The current state of metric types in Telegraf is that they are mostly supported as passthrough for the prometheus output. Even when we add support for metric types to Datadog output, there will likely be many metrics that are untyped, so it may be worth asking Datadog if they can add support for using untyped metrics in the COUNT and RATE functions. Would it help or hinder if we sent all untyped metrics as Gauges?

We also currently only have the concept of a counter type; it might be worth considering whether we should have both rate and count types.

@orensho
Author

orensho commented Dec 31, 2019

Daniel,

Datadog support replied that there is a workaround: edit the metric and apply as_count(). But since this is applied to a metric of type 'other', it could make the aggregations inaccurate.
They recommend using another metrics submission system/library.

Since the root cause (no offense) is inherent in this output plugin, I would like to recommend adding a function that checks the prometheus metric type and sets count to count, rate to rate, and others to unset.

What do you think?

see https://docs.datadoghq.com/api/?lang=bash#post-timeseries-points and https://docs.datadoghq.com/developers/metrics/types/?tab=count

@danielnelson
Contributor

Yes this would be good, let's try to hammer out a few of the details.

Our Counter type represents a non-resetting counter type, so there doesn't seem to be a direct analogue with DataDog, as both the COUNT and RATE types appear to be resetting counters. Would it make sense to still report these as COUNT type, or would GAUGE be more appropriate?

Converting Gauge to GAUGE seems obvious and should work well. Do you think Untyped should also be GAUGE, or would it be better as OTHER?

Our Summary and Histogram types don't match up very well with the Datadog types, so they should also probably do the same as the Gauge type.

We probably won't know what conversions are best ahead of time, so if our choices are wrong we can always revisit. One thing we should check, though, is how Datadog handles type changes: will we run into difficulties migrating to the new format or making any future changes?
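One of the mappings discussed above can be sketched as follows. This is only an illustration of a possible choice, not a settled design: `ValueType` and its constants are local stand-ins for Telegraf's value types, and `datadogType` is a hypothetical helper. It reports every typed metric as a gauge and leaves Untyped unset, which is one of the open options in this thread.

```go
package main

// ValueType is a local stand-in for Telegraf's metric value type,
// defined here so the sketch is self-contained.
type ValueType int

const (
	Counter ValueType = iota
	Gauge
	Untyped
	Summary
	Histogram
)

// datadogType is a hypothetical helper. It returns the Datadog type string
// for a value type, and false when no type should be sent at all.
func datadogType(vt ValueType) (string, bool) {
	switch vt {
	case Gauge, Summary, Histogram:
		return "gauge", true
	case Counter:
		// A non-resetting counter has no direct Datadog analogue,
		// so this sketch reports it as a gauge as well.
		return "gauge", true
	default:
		// Untyped (or unknown): leave the type unset.
		return "", false
	}
}
```

Keeping the mapping in one function would make it easy to revisit if the initial choices turn out to be wrong.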

@orensho
Author

orensho commented Jan 7, 2020

Daniel and team,

It took me several days to get good answers from Datadog (we are a paying customer).

  1. The workaround I stated above is valid when applied to a metric that was originally a counter. There is also a way to work around the interval, BUT the vendor does not guarantee the accuracy of some of their monitoring features (such as SLO).
  2. I think Datadog represents counter and count the same way, since it is just a representation in their system and the submitter handles the logic (increase/decrease).
  3. I took a look at your plugin. At first it looked like a "simple" change, as you stated above, but I couldn't find a way to retrieve the agent interval to set it as the metric interval.

@danielnelson
Contributor

> I couldn't find a way to retrieve the agent interval to set it as the metric interval

It isn't possible. Not only can each input plugin have a unique interval, but many plugins do not use an interval at all and are event driven (such as socket_listener). Additionally, even inputs that normally work on an interval can of course run longer, missing a scheduled collection or delaying it.

This is why I'm wondering if we should report basically all current types as GAUGE. Converting non-resetting counters to resetting counters will be a bit trickier, especially since we will want to do it in a backwards compatible way, but should be doable. I think we may want to do this initially in a processor/aggregator and introduce a new Rate type, but I'd need some time to consider this further.

@orensho
Author

orensho commented Jan 12, 2020

Understood.
Thank you for your time, and I will follow up on this RFE

@ssoroka ssoroka added the cloud Issues or requests around cloud environments label Oct 30, 2020
@markbastiaans

markbastiaans commented Jan 15, 2021

We're currently running into this issue. We'd like to use Telegraf to directly submit metrics to Datadog. This works fine for gauges, but counters and rates are not possible. It's of course possible to specify metric type, units and rollup intervals on the Datadog side, but that requires manual work and as discussed in this issue, this can also lead to inaccuracies.

Would you be open to a PR that is similar to #8397?

My idea is to be able to specify which fields to interpret as a rate, counter and monotonic counter. I see a couple of challenges there:

  • The interval is a required field to submit metrics of type rate and count via the Datadog API. Aside from comparing timestamps in measurements, is there a way to get the interval from the producing plugin?
  • There is no monotonic_count type on the Datadog API, so we would have to calculate the delta ourselves in this case based on the previous value for the metric, and account for resetting counters along the way.

@ssoroka
Contributor

ssoroka commented Jan 21, 2021

I think we'd be open to something like that. If you can tell directly from the metric type how to convert it, that is best, but adding a setting for the cases where you can't infer it is fine.

@ssoroka
Contributor

ssoroka commented Jan 21, 2021

I would say counting the deltas and submitting diffs, while possible, is probably error prone. I think instead you would map count to gauge.

@jrimmer-housecallpro
Contributor

We're having hiccups with Rates. I wonder if this issue is the underlying cause of #10944.

5 participants