Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vsphere resource discovery #10

Merged
merged 7 commits into from
Apr 1, 2022
Merged

Conversation

rlankfo
Copy link
Member

@rlankfo rlankfo commented Mar 29, 2022

This PR includes ported code from the telegraf vsphere plugin to discover vsphere resources. There are a few minimal changes to the code, for example, resource filtering is currently excluded. As we begin collecting prometheus metrics for discovered resources, this is expected to change a bit as well for generating the appropriate labels (for resource hierarchy).

Closes #1

LICENSE Outdated Show resolved Hide resolved
LICENSE Outdated Show resolved Hide resolved
vsphere/client.go Show resolved Hide resolved
vsphere/collector.go Outdated Show resolved Hide resolved
vsphere/endpoint.go Outdated Show resolved Hide resolved
vsphere/endpoint.go Show resolved Hide resolved
vsphere/endpoint.go Show resolved Hide resolved
vsphere/finder.go Outdated Show resolved Hide resolved
MetricLookback: 3,
ForceDiscoverOnInit: true,
ObjectDiscoveryInterval: time.Second * 300,
Timeout: time.Second * 60,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about how the Timeout currently works (along with finding it a little hard to reason when there is/isn't a timeout set in the context). As far as I can tell, the timeout currently gets reset per-object/metric as the discovery process progresses or during collection of metrics. This will make the total timeout time fairly unpredictable, and make it difficult to give tight constraints on how fast things should be completing.

IMO instead of having a Timeout per-operation, we should have a timeout for the entire discovery/scrape. This would allow users tighter control over how long things are taking rather than letting it go on unbounded. You can even consider using the X-Prometheus-Scrape-Timeout-Seconds header that Prometheus sets here.

I'm thinking that getting the timeout easy to reason about for users pretty important. We should be careful to cancel scrapes and avoid piling up a number of pending /metrics calls from Prometheus that get ignored because Prometheus already timed out way before the exporter did.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you here, this is something I'll note to come back to once further along and testing in live environments.

@rfratto
Copy link
Member

rfratto commented Mar 29, 2022

(FWIW I understand that most of the comments I had above is relevant to code we copied and not something you wrote directly :)

vsphere/client.go Outdated Show resolved Hide resolved
vsphere/endpoint.go Outdated Show resolved Hide resolved
Copy link

@mattdurham mattdurham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a lot of places we are swallowing or not returning errors. Feels like we should have comments on why we are doing that.

@rlankfo rlankfo merged commit f1870fa into grafana:main Apr 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vCenter Server Inventory Hierarchy
3 participants