Increment gather_errors for all errors emitted by inputs #2339

cosmopetrich · 2017-01-29T23:34:22Z

- CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
- Sign CLA (if not already signed)
- ~~README.md updated (if adding a new plugin)~~

I'm not sure if the change made here is the correct way to go about this, but I figured a PR would be a better way to start a discussion than an issue or a groups post.

Most Telegraf input plugins don't currently seem to provide a metric that can be used to determine if the gather operation is running successfully. For example, the prometheus input plugin logs an error if given a target address that returns HTTP 401, but won't return any metrics. That makes it difficult to tell whether a particular Prometheus client is "up".

I'd thought of deploying http_response inputs alongside httpjson/prometheus inputs, but that's a fair bit of extra configuration and doesn't handle endpoints that return 200 with invalid data, etc. Adding a new metric to the prometheus input was another option, but it looks like some other plugins behave similarly, and it would be nice to have a more generic solution.

Telegraf 1.2 added the internal plugin, which exposes an internal_agent_gather_errors metric. That seems like a reasonable thing to monitor; however, as far as I can tell, it's only incremented by the SNMP plugin. This PR aims to increment the metric whenever any input returns an error from its gather call. This won't catch all errors, as quite a number of plugins handle and log errors internally. I can update those too, but that's probably best done in a separate PR.

cosmopetrich · 2017-01-29T23:35:50Z

agent/agent.go

@@ -157,7 +157,7 @@ func gatherWithTimeout(
 		select {
 		case err := <-done:
 			if err != nil {
-				log.Printf("E! ERROR in input [%s]: %s", input.Name(), err)
+				acc.AddError(err)
 			}
 			return
 		case <-ticker.C:


I wasn't sure whether it made sense to invoke AddError on "took longer than collection interval" errors.

sparrc · 2017-02-02T14:56:06Z

this certainly seems like a good idea to me, what do you think @phemmer?

we should probably document that plugins should never both call acc.AddError and return the same error, otherwise it would be counted twice
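
A rough sketch of the double-reporting problem, assuming the agent passes every returned error to AddError as this PR proposes. `countingAcc`, `runGather`, `doubleReport`, and `singleReport` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// countingAcc is a hypothetical accumulator that only counts errors.
type countingAcc struct {
	n int
}

// AddError counts non-nil errors.
func (a *countingAcc) AddError(err error) {
	if err != nil {
		a.n++
	}
}

// runGather mimics the agent after this change: any error returned by the
// input is passed to AddError.
func runGather(gather func(acc *countingAcc) error, acc *countingAcc) {
	if err := gather(acc); err != nil {
		acc.AddError(err)
	}
}

// doubleReport records the error itself AND returns it, so it is counted twice.
func doubleReport(acc *countingAcc) error {
	err := errors.New("connection refused")
	acc.AddError(err)
	return err
}

// singleReport only returns the error and lets the agent record it.
func singleReport(acc *countingAcc) error {
	return errors.New("connection refused")
}

func main() {
	a, b := &countingAcc{}, &countingAcc{}
	runGather(doubleReport, a)
	runGather(singleReport, b)
	fmt.Println(a.n, b.n) // prints "2 1"
}
```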

sparrc · 2017-02-02T14:56:22Z

@cosmopetrich feel free to update the changelog

phemmer · 2017-02-02T15:40:04Z

I'm for it.
But I think we should adjust line 164 as well.

sparrc · 2017-02-03T11:22:23Z

done

closes #2339

Increment gather_errors for all input errors

44cb249

sparrc added this to the 1.3.0 milestone Jan 30, 2017

sparrc closed this in b1945c0 Feb 3, 2017

maxunt pushed a commit that referenced this pull request Jun 26, 2018

Increment gather_errors for all input errors

aeffacb

closes #2339
