Detect duplicates across files in textfile collector #274
Conversation
```go
if seenIn, ok := seen[h]; ok {
	repr := friendlyString(*metricFamily.Name, names, values)
	log.Warnf("Metric %s was read from %s, but has already been collected from file %s, skipping", repr, path, seenIn)
	continue
}
```
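For context, the check above presumably consults a `seen` map from a metric's identity hash to the file it was first read from. A minimal self-contained sketch of that pattern (all names here are hypothetical stand-ins, not the PR's actual code, which hashes the metric family name plus label names and values):

```go
package main

import "fmt"

// hashKey is a hypothetical stand-in for the metric identity hash used by
// the collector: the metric name plus its label pairs.
func hashKey(name string, labels map[string]string) string {
	key := name
	for k, v := range labels { // real code would hash in a fixed label order
		key += "|" + k + "=" + v
	}
	return key
}

func main() {
	// seen maps a metric identity to the file it was first collected from.
	seen := map[string]string{}

	type sample struct {
		file   string
		name   string
		labels map[string]string
	}
	samples := []sample{
		{"a.prom", "node_custom_metric", map[string]string{"job": "x"}},
		{"b.prom", "node_custom_metric", map[string]string{"job": "x"}}, // duplicate
	}

	for _, s := range samples {
		h := hashKey(s.name, s.labels)
		if seenIn, ok := seen[h]; ok {
			// The later file loses; the first occurrence wins.
			fmt.Printf("duplicate of %s, already collected from %s, skipping\n", s.name, seenIn)
			continue
		}
		seen[h] = s.file
	}
}
```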
This is not a good idea, you're now randomly dropping metrics depending on the iteration order. We've had issues with this type of approach in the past.
Yeah, it's not great, I agree. However, failing the scrape completely isn't that good either. Do you see some middle road or better approach?
Also, I just realized I've forgotten to set the error flag when this happens, so now it is not alertable. That needs to be fixed.
The node exporter fails, as this is an invalid setup. It's better to hard fail than silently return partial data.
@SuperQ FYI
Should we be failing the entire scrape, though? Skipping the textfile collector's output completely seems okay (although I'll note that the metrics aren't randomly dropped: iteration order is deterministic here, so which metrics are dropped is consistent across scrapes, given consistent input), but any individual breakage in any collector resulting in up=0 seems a bit harsh?
You should approach it however you approach any individual collector failing. If you permit that, then you should have metrics indicating which collectors did/didn't work.
Right, that would be my preference. It will require a bit of gymnastics, though, since promhttp bails out of the entire scrape if we allow the data to get there.
I guess I'll do some sort of buffer to be able to detect failure before passing the metrics through the channel. Would this be something that should be done in node_exporter as well?
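The buffering idea could look roughly like this: gather everything first, and only forward metrics through the channel once the whole pass is known to be clean, flipping an error flag otherwise. A minimal sketch under those assumptions (names like bufferAndSend and the error-flag metric are hypothetical, and real code would key duplicates on the full name-plus-labels hash, not the bare name):

```go
package main

import "fmt"

// metric is a simplified stand-in for a parsed textfile metric.
type metric struct {
	name string
	file string
}

// bufferAndSend buffers all metrics, returns 1 (the would-be value of a
// scrape-error gauge) on a duplicate without sending anything, and
// otherwise sends the buffered metrics through ch and returns 0.
func bufferAndSend(metrics []metric, ch chan<- metric) int {
	seen := map[string]string{}
	buf := make([]metric, 0, len(metrics))
	for _, m := range metrics {
		if seenIn, ok := seen[m.name]; ok {
			fmt.Printf("metric %s already collected from %s; flagging error\n", m.name, seenIn)
			return 1 // nothing reaches promhttp, so the scrape itself survives
		}
		seen[m.name] = m.file
		buf = append(buf, m)
	}
	for _, m := range buf {
		ch <- m
	}
	return 0
}

func main() {
	ch := make(chan metric, 8)
	errFlag := bufferAndSend([]metric{
		{"metric_a", "a.prom"},
		{"metric_a", "b.prom"}, // duplicate: whole batch is withheld
	}, ch)
	fmt.Println("textfile_scrape_error", errFlag)
}
```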
That'd be a question for @SuperQ, I've heard no plans in that direction.
Won't be taking this approach, so closing this.
Fixes #272.
There are still a couple of cases where the textfile collector can cause the entire scrape to fail, but I hope this solves a major contributor to the real-world cases.