feat(inputs.prometheus): Add internal metrics #14424

Merged
merged 47 commits into influxdata:master on Feb 12, 2024

Conversation

tguenneguez
Contributor

Summary

Expose information about this plugin's data collection as metrics.

Checklist

  • [x] No AI generated code was used in this PR

Related issues

resolves #13103

@tguenneguez tguenneguez changed the title from "Add internal metrics" to "feat(inputs.prometheus): Add internal metrics" Dec 8, 2023
@tguenneguez tguenneguez closed this Dec 8, 2023
@tguenneguez tguenneguez reopened this Dec 11, 2023
plugins/inputs/prometheus/prometheus.go (outdated, resolved)
plugins/inputs/prometheus/prometheus.go (outdated, resolved)
Contributor

@Hipska Hipska left a comment

However, I would still only add this metric when enabled in the config.

@tguenneguez
Contributor Author

However, I would still only add this metric when enabled in the config.

Hello,

It's possible, but what is the point of making these metrics conditional?
Personally, I see two reasons to make an option conditional:

  1. The change would otherwise be a breaking change.
  2. The collection requires significant effort from the plugin.

In this case, I admit that I do not see the point of offering conditional activation...

If you still want conditional activation, which configuration variable should it use? => enable_request_metrics
Should it be enabled by default?

Thanks,
Thomas

@Hipska
Contributor

Hipska commented Feb 5, 2024

No it should not be enabled by default, to keep backwards compatibility. This way users can control whether to have additional metrics.
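
The opt-in behaviour described here can be sketched in Go as follows. This is a minimal illustration, not the merged implementation: the option name `enable_request_metrics` is the one proposed earlier in this thread, while the `Config` type and the `requestFields` helper are hypothetical. Because Go's zero value for `bool` is `false`, an option that is absent from an existing configuration leaves the extra fields off.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the plugin's settings struct.
type Config struct {
	// The zero value of bool is false, so the option is off unless the
	// user sets it explicitly. Existing configurations are unaffected.
	EnableRequestMetrics bool `toml:"enable_request_metrics"`
}

// requestFields returns the extra request fields only when the option is on.
// The field names follow the response_time and content_length metrics
// discussed in this PR.
func requestFields(c Config, responseTime float64, contentLength int) map[string]interface{} {
	if !c.EnableRequestMetrics {
		return nil
	}
	return map[string]interface{}{
		"response_time":  responseTime,
		"content_length": contentLength,
	}
}

func main() {
	// Default configuration: no additional fields are produced.
	fmt.Println(len(requestFields(Config{}, 0.12, 2048)))
	// Explicitly enabled: both fields are produced.
	fmt.Println(len(requestFields(Config{EnableRequestMetrics: true}, 0.12, 2048)))
}
```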

Contributor

@srebhan srebhan left a comment

Thanks @tguenneguez! Some small comments...

plugins/inputs/prometheus/prometheus.go (outdated, resolved)
plugins/inputs/prometheus/prometheus_test.go (outdated, resolved)
plugins/inputs/prometheus/prometheus_test.go (outdated, resolved)
plugins/inputs/prometheus/prometheus_test.go (outdated, resolved)
tguenneguez and others added 4 commits February 6, 2024 17:02
Good idea

Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Contributor

@srebhan srebhan left a comment

Thanks @tguenneguez!

@srebhan srebhan added the "ready for final review" label Feb 6, 2024
@srebhan srebhan assigned powersj and unassigned srebhan Feb 6, 2024
}
}

return nil
Contributor

Are there any other types of errors? What happens if there are?

}

tags["result"] = resultString
requestFields["result_code"] = resultCodes[resultString]
Contributor

Translating the error string to a numeric value is up to the user and this should not be hard-coded.

The result string can continue to exist as a tag, but the result code field should be dropped.
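
A minimal Go sketch of what this suggestion amounts to, assuming a simplified `setResult` helper (hypothetical, reduced from the snippet under review): the outcome is recorded as a `result` tag only, and no `result_code` field is written. The value `connection_failed` is taken from the code in this PR.

```go
package main

import "fmt"

// setResult records the outcome as a tag only. No numeric result_code
// field is written; mapping strings to numbers is left to the user.
// Hypothetical helper, simplified for illustration.
func setResult(result string, tags map[string]string) {
	tags["result"] = result
}

func main() {
	tags := map[string]string{}
	fields := map[string]interface{}{}

	setResult("connection_failed", tags)

	// The tag is set; the fields map stays free of any result_code entry.
	_, hasCode := fields["result_code"]
	fmt.Println(tags["result"], hasCode)
}
```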

Contributor Author

var metrics []telegraf.Metric
requestFields := make(map[string]interface{})
tags := map[string]string{}
u.OriginalURL.User = nil
Contributor

What is the purpose of this?

Contributor

@tguenneguez thanks for the updates - just another ping on this question.

Contributor Author

thanks for the updates - just another ping on this question.

Contributor

thanks for the updates - just another ping on this question.

I think you meant to respond not just copy my response? :)

Contributor Author

Sorry, which line exactly is your question about?

Contributor

u.OriginalURL.User = nil

Contributor Author

OK, I was looking at the line:

var metrics []telegraf.Metric
You have a keen eye ;-)
New commit pushed.

@@ -422,6 +491,7 @@ func (p *Prometheus) gatherURL(u URLAndAddress, acc telegraf.Accumulator) error

var err error
var resp *http.Response
start := time.Now()
Contributor

This should be right before the client.Do and the time captured immediately after the client.Do. That way we avoid the if statements + error checking getting included in the total time.

return fmt.Errorf("error making HTTP request to %q: %w", u.URL, err)
if setError(err, requestFields, tags) == nil {
// Any error not recognized by `set_error` is considered a "connection_failed"
setResult("connection_failed", requestFields, tags)
Contributor

Capturing the return code and response time of the request is acceptable for a new metric; however, adding all these additional error conditions is not something we want to start adding to all the plugins. These errors are already reported via the internal metrics, and you can use that or the lack of data for alerting.

@powersj
Contributor

powersj commented Feb 7, 2024

@tguenneguez,

We agreed to adding the basic support for capturing timing information. Duplicating the entire http_response plugin inside the prometheus plugin is not acceptable.

See this comment and Sven's response: #13103 (comment)

@tguenneguez
Contributor Author

tguenneguez commented Feb 7, 2024

I understand your point of view, but when a Telegraf agent runs several prometheus inputs, relying on the internal metrics makes it difficult to remotely identify which source is failing.
This makes it harder to handle the anomaly, and especially to call on the right expert for the failing data source.
As written in: #13103 (comment)

@tguenneguez after discussing this within the team, we decided to accept a PR adding the proposed metrics to inputs.prometheus. Looking forward to your PR!

The metrics were:
response_time (float, seconds)
content_length (int, response body length)
response_status_code_match (int, 0 = mismatch, 1 = match)
http_response_code (int, response status code)
result_code (int, see below)

@powersj
Contributor

powersj commented Feb 7, 2024

The metrics were:

No, they were not. We should have been more explicit, but it was in reference to what Hipska had proposed previously: response_time and content_length.

then it is difficult to remotely identify which source is defective.

You have also been insistent that you must collect timing information. Which means you are trying to resolve two things in one PR. Do one at a time.

As-is this is not going to land.

@telegraf-tiger
Contributor

Contributor

@powersj powersj left a comment

Thanks for working with us to get this in a great state.

@powersj powersj merged commit de66a2f into influxdata:master Feb 12, 2024
26 checks passed
@github-actions github-actions bot added this to the v1.30.0 milestone Feb 12, 2024
Labels
area/prometheus, feat, plugin/input, ready for final review
Development

Successfully merging this pull request may close these issues.

[Inputs.prometheus] Add collection information
4 participants