Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(outputs.stackdriver): Options to use official path and types #13454

Merged
merged 8 commits into from
Jun 26, 2023

Conversation

powersj
Copy link
Contributor

@powersj powersj commented Jun 15, 2023

fixes: #13303

@telegraf-tiger telegraf-tiger bot added fix pr to fix corresponding bug plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins labels Jun 15, 2023
@powersj powersj self-assigned this Jun 15, 2023
@influxdata influxdata deleted a comment from telegraf-tiger bot Jun 20, 2023

var kind string
switch m.Type() {
case telegraf.Gauge, telegraf.Untyped:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is saying that when a metric is untyped to send it in as a gauge. From our experience with tons of customers using GMP, this logic will definitely lead to incorrect behavior. Tons of metrics get untyped at the source for reasons unknown to me, and we have to account for this because people really don't want to change their exporters.

The way we handle this with our other collectors is, when a metric is untyped, to send it twice: once with the suffix "unknown" and once with the suffix "unknown:counter". Then we do some heuristic magic on the query side to choose the gauge (unknown) or the counter (unknown:counter) depending on our best guess based on the query functions you use. If you then type the metric, we union the data.

This is a bit larger of a change but without it this commit will 100% lead to unhappy customers on our (Google's) end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at the PR and giving insight into how this ends up getting used.

The way we handle this with our other collectors is, when a metric is untyped, to send it twice

Because the two points would have different metric types, both can be sent as part of the same timeseries, the only difference being the suffix of the metric type?

This is a bit larger of a change but without it this commit will 100% lead to unhappy customers on our (Google's) end.

fwiw this change will not be the default behavior and is opt-in. We do not want to change things for existing users, who as of now are not even setting these types.

Tons of metrics get untyped at the source for reasons unknown to me

My understanding is any metric from Telegraf not read from a Prometheus or using the Prometheus parser is internally stored as an untyped metric.

Thanks again!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look at the PR and giving insight into how this ends up getting used.

The way we handle this with our other collectors is, when a metric is untyped, to send it twice

Because the two points would have different metric types, both can be sent as part of the same timeseries, the only difference being the suffix of the metric type?

Yeah - you can send the two metrics in the same call for sure. They'd basically be identical time series with just the suffix of the metric name changed. In our system they're treated as two separate metrics (as they have different metric names due to the suffix), so there are no collisions or anything if you send both together.

This is a bit larger of a change but without it this commit will 100% lead to unhappy customers on our (Google's) end.

fwiw this change will not be the default behavior and is opt-in. We do not want to change things for existing users, who as of now are not even setting these types.

Ack - makes sense, thanks.

Tons of metrics get untyped at the source for reasons unknown to me

My understanding is any metric from Telegraf not read from a Prometheus or using the Prometheus parser is internally stored as an untyped metric.

Yeah - we're running into this issue with a different customer using influxdb. One of our official collectors isn't doing the write-twice thing by accident (bug), and they're unable to query counter metrics properly as a result. There also doesn't seem to be a way for them to change it at the source without modifying the exporter code which they're loathe to do, so they're stuck until we can make an upstream change to our collector.

Totally understandable from them and a very common situation, which is why handling untyped metrics properly at ingestion time is so important on our end.

Thanks again!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, I have pushed a change to:

  • Mark untyped metrics with "unknown"
  • If we are using the official metric name format, and we run across a metric with an unknown kind, then we will create another timeseries, identical, but with the "unknown:counter" kind.

I need to add some additional test cases, but wanted to get this in folks hands quickly to try out.

@crflanigan
Copy link
Contributor

I have been running this for about 20 hours and everything seems to be working correctly.
@powersj Have you tested this with histograms?

@powersj
Copy link
Contributor Author

powersj commented Jun 23, 2023

@crflanigan,

I have been running this for about 20 hours and everything seems to be working correctly.

Awesome, thank you for the feedback!

Have you tested this with histograms?

Histograms and summary types are not supported by the plugin and will produce an error. I am sure we could look at changing that in the future with some help.

@crflanigan
Copy link
Contributor

crflanigan commented Jun 23, 2023

@crflanigan,

I have been running this for about 20 hours and everything seems to be working correctly.

Awesome, thank you for the feedback!

Have you tested this with histograms?

Histograms and summary types are not supported by the plugin and will produce an error. I am sure we could look at changing that in the future with some help.

No problem, do we have anything outstanding before we merge? @powersj

@powersj
Copy link
Contributor Author

powersj commented Jun 23, 2023

I need to update the README with a bit more info and get a final review from @srebhan. I'll work on the README now. Thanks again for the feedback!

@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Jun 23, 2023
@powersj powersj assigned srebhan and unassigned powersj Jun 23, 2023
@powersj powersj marked this pull request as ready for review June 23, 2023 21:06
Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just one small typo...

plugins/outputs/stackdriver/stackdriver.go Outdated Show resolved Hide resolved
Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commited the typo fix... Thanks @powersj!

@srebhan srebhan assigned powersj and unassigned srebhan Jun 26, 2023
@telegraf-tiger
Copy link
Contributor

@powersj powersj merged commit 45f9942 into influxdata:master Jun 26, 2023
24 checks passed
@powersj powersj deleted the fix/13303 branch June 26, 2023 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve outputs.stackdriver to better support GMP use case
4 participants