-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vsphere resource discovery #10
Conversation
MetricLookback: 3, | ||
ForceDiscoverOnInit: true, | ||
ObjectDiscoveryInterval: time.Second * 300, | ||
Timeout: time.Second * 60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about how the Timeout currently works (along with finding it a little hard to reason when there is/isn't a timeout set in the context). As far as I can tell, the timeout currently gets reset per-object/metric as the discovery process progresses or during collection of metrics. This will make the total timeout time fairly unpredictable, and make it difficult to give tight constraints on how fast things should be completing.
IMO instead of having a Timeout per-operation, we should have a timeout for the entire discovery/scrape. This would allow users tighter control over how long things are taking rather than letting it go on unbounded. You can even consider using the X-Prometheus-Scrape-Timeout-Seconds
header that Prometheus sets here.
I'm thinking that getting the timeout easy to reason about for users pretty important. We should be careful to cancel scrapes and avoid piling up a number of pending /metrics calls from Prometheus that get ignored because Prometheus already timed out way before the exporter did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you here, this is something I'll note to come back to once further along and testing in live environments.
(FWIW I understand that most of the comments I had above is relevant to code we copied and not something you wrote directly :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are a lot of places we are swallowing or not returning errors. Feels like we should have comments on why we are doing that.
85fe04e
to
b20f689
Compare
This PR includes ported code from the telegraf vsphere plugin to discover vsphere resources. There are a few minimal changes to the code, for example, resource filtering is currently excluded. As we begin collecting prometheus metrics for discovered resources, this is expected to change a bit as well for generating the appropriate labels (for resource hierarchy).
Closes #1