Prometheus [Input] plugin - Optimizing for bigger kubernetes clusters (500+ pods) when scraping thru 'monitor_kubernetes_pods' #8762

gracewehner · 2021-01-27T22:37:05Z

Required for all PRs:

Associated README.md updated.
Has appropriate unit tests.

PR for the issue #8705.

If monitor_kubernetes_pods = true and the new option monitor_kubernetes_pods_version = 2 are both set, the pod list is queried locally on the node every 60s instead of through the watch API. This lets large clusters scale, because the pods to scrape are distributed among the nodes; it requires telegraf to run as a DaemonSet. The same annotations select which pods to scrape, and the optional filtering by namespace and selectors is also unchanged, except that it is applied after the pod list is retrieved. The feature is backwards compatible and is used if and only if monitor_kubernetes_pods_version = 2 is specified in the config.
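To illustrate how annotations can turn a node-local pod list into scrape targets, here is a minimal sketch. It is not the code from this PR: the `pod` type and `scrapeTarget` function are hypothetical names, the opt-in annotation `prometheus.io/scrape` is assumed, and the defaults (port 9102, path /metrics) are taken from the README comments quoted in this conversation.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// pod is a hypothetical, trimmed-down view of one entry in a pod list.
// Real pod list responses carry many more fields.
type pod struct {
	Name        string
	IP          string
	Annotations map[string]string
}

// scrapeTarget returns the URL to scrape for a pod. The second return value
// is false when the pod has no IP yet or has not opted in through the
// (assumed) prometheus.io/scrape annotation.
func scrapeTarget(p pod) (string, bool) {
	if p.IP == "" || p.Annotations["prometheus.io/scrape"] != "true" {
		return "", false
	}
	port := p.Annotations["prometheus.io/port"]
	if port == "" {
		port = "9102" // default port named in the README comments
	}
	path := p.Annotations["prometheus.io/path"]
	if path == "" {
		path = "/metrics" // default path named in the README comments
	}
	u := url.URL{Scheme: "http", Host: net.JoinHostPort(p.IP, port), Path: path}
	return u.String(), true
}

func main() {
	// Example data only; in the node-local mode this list would come from
	// the periodic query against the node.
	pods := []pod{
		{Name: "app-a", IP: "10.0.0.5", Annotations: map[string]string{"prometheus.io/scrape": "true"}},
		{Name: "app-b", IP: "10.0.0.6", Annotations: map[string]string{
			"prometheus.io/scrape": "true",
			"prometheus.io/port":   "8080",
			"prometheus.io/path":   "/stats",
		}},
		{Name: "app-c", IP: "10.0.0.7"}, // not annotated, so it is skipped
	}
	for _, p := range pods {
		if target, ok := scrapeTarget(p); ok {
			fmt.Println(p.Name, target)
		}
	}
}
```

Because each telegraf instance only sees the pods on its own node, the same selection logic runs on a much smaller list than a cluster-wide watch would deliver.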

telegraf-tiger

🤝 ✅ CLA has been signed. Thank you!

lgtm-com · 2021-01-27T23:15:09Z

This pull request introduces 1 alert when merging 49336fa into d415d9f - view on LGTM.com

new alerts:

1 for Disabled TLS certificate check

lgtm-com · 2021-01-27T23:41:35Z

This pull request introduces 1 alert when merging 79e0e71 into d415d9f - view on LGTM.com

new alerts:

1 for Disabled TLS certificate check

lgtm-com · 2021-01-28T01:25:41Z

This pull request introduces 1 alert when merging cdc439c into d415d9f - view on LGTM.com

new alerts:

1 for Disabled TLS certificate check

gracewehner · 2021-01-28T22:55:19Z

#8705

vishiy · 2021-02-03T19:39:22Z

@ssoroka - could you please review this so it can be merged?

ssoroka

Looks pretty good. I'd like to see more tests around the cAdvisor code if possible, as it looks as if it will panic when run.

ssoroka · 2021-02-10T23:16:55Z

plugins/inputs/prometheus/README.md

@@ -33,6 +33,10 @@ in Prometheus format.
  ## - prometheus.io/path: If the metrics path is not /metrics, define it with this annotation.
  ## - prometheus.io/port: If port is not 9102 use this annotation
  # monitor_kubernetes_pods = true
+  ## Get the list of pods to scrape from either:
+  ##    - version 1 (default): the kubernetes watch api (cluster-wide)
+  ##    - version 2: the local cadvisor api (node-wide); for scalability. Note that the environment variable NODE_IP must be set to the host IP.


How do you feel about accepting node_ip as a string parameter? I find environment variables aren't great for discoverability. I'm not against it also defaulting to the value of the NODE_IP environment variable in init().

gracewehner

Since this is meant to solve scalability, there will be a large number of nodes. With the new approach, telegraf needs to run on every node, so putting a different node IP value in every telegraf config could be impractical.

Also, if the environment variable does not exist, we would need as many kube API calls as there are nodes during startup, which could overload the kube API server at scale. I have also added instructions to the README for how to set the environment variable.

Please let me know what you think.

gracewehner · 2021-03-04T00:04:31Z

Hi @ssoroka, I apologize for the delay. I have addressed and/or responded to your comments. Could you please take another look? Thank you!

ssoroka · 2021-03-05T17:29:49Z

plugins/inputs/prometheus/prometheus.go

+		p.nodeIP = os.Getenv("NODE_IP")
+		if p.nodeIP == "" {
+			return errors.New("The environment variable NODE_IP is not set. Cannot get pod list for monitor_kubernetes_pods using node scrape scope")
+		}


Consider defaulting this to os.Getenv("NODE_IP") and allowing it to be set from config. ... Actually, I think it's a requirement that all settings be configurable from the config, though env variable defaults are fine.

gracewehner

I have added this as a config setting. If the config value is empty or the specified IP is invalid, the plugin then checks whether the env var exists and is valid.

ssoroka · 2021-03-08T16:01:12Z

Thanks for all the work here, much appreciated!

rashmichandrashekar · 2021-03-08T18:53:41Z

Thanks for all the work here, much appreciated!

Thanks @ssoroka, @vishiy and @gracewehner!

gracewehner and others added 18 commits January 13, 2021 13:16

more cleanup and error handling

bdcb7ea

add cadvisor option for scalable telegraf scraping

b65a1a4

comments and cleanup

24cda5d

logging and executable

d08750f

use k8s package for selector parsing and matching

76af87b

fixes

4e20fef

cleanup and error handling for selector parsing

2436923

more cleanup

9bda346

Delete telegraf.zip

4ea31a3

Update README.md

1255e25

Fix whitespace

7d29972

More error handling

673b181

Add tests

4e4141c

Update README.md

797c07f

fix test and dependencies

1cfec11

add licenses

cdc439c

Add error handling for NODE_IP env var

34fb243

Add note about NODE_IP

353e008

telegraf-tiger bot approved these changes Jan 27, 2021

gracewehner and others added 3 commits January 27, 2021 14:50

go mod tidy changes

768e899

remove duplicate line

49336fa

make fmt

79e0e71

gracewehner mentioned this pull request Jan 28, 2021

Prometheus [Input] plugin - Optimizing for bigger kubernetes clusters (500+ pods) when scraping thru 'monitor_kubernetes_pods' #8705

Closed

ssoroka suggested changes Feb 10, 2021

addressed PR comments

a100a6b

don't change all the whitespace in the readme

d662caf

ssoroka reviewed Mar 5, 2021

gracewehner added 3 commits March 5, 2021 12:05

add config for node ip and use env var as default

d367504

go fmt

1a80d8e

revert readme whitespace changed from go fmt

f97278c

ssoroka approved these changes Mar 8, 2021

ssoroka merged commit d7df2c5 into influxdata:master Mar 8, 2021

ssoroka pushed a commit that referenced this pull request Mar 10, 2021

Prometheus [Input] plugin - Optimizing for bigger kubernetes clusters…

a1ee816

… (500+ pods) when scraping thru 'monitor_kubernetes_pods' (#8762) (cherry picked from commit d7df2c5)

sspaink mentioned this pull request Jul 6, 2021

prometheus input plugin: mistaken delete #9408

Closed

imranismail mentioned this pull request Jul 13, 2021

Fix prometheus cadvisor authentication #9497

Merged

arstercz pushed a commit to arstercz/telegraf that referenced this pull request Mar 5, 2023

Prometheus [Input] plugin - Optimizing for bigger kubernetes clusters…

706673d

… (500+ pods) when scraping thru 'monitor_kubernetes_pods' (influxdata#8762) (cherry picked from commit d7df2c5)

gracewehner commented Jan 27, 2021

telegraf-tiger bot left a comment

lgtm-com bot commented Jan 27, 2021

lgtm-com bot commented Jan 27, 2021

lgtm-com bot commented Jan 28, 2021

gracewehner commented Jan 28, 2021

vishiy commented Feb 3, 2021

ssoroka left a comment

ssoroka Feb 10, 2021

gracewehner Mar 3, 2021

gracewehner commented Mar 4, 2021

ssoroka Mar 5, 2021

gracewehner Mar 5, 2021

ssoroka commented Mar 8, 2021

rashmichandrashekar commented Mar 8, 2021
