Detect duplicates across files in textfile collector #274

Closed
wants to merge 1 commit from the handle-textfile-duplicates branch

Conversation

carlpett (Collaborator)

Fixes #272.
There are still a couple of cases where the textfile collector can cause the entire scrape to fail, but I hope this addresses a major source of the real-world failures.

// Another file already produced a series with this hash: warn and skip
// the duplicate instead of re-emitting it.
if seenIn, ok := seen[h]; ok {
    repr := friendlyString(*metricFamily.Name, names, values)
    log.Warnf("Metric %s was read from %s, but has already been collected from file %s, skipping", repr, path, seenIn)
    continue
}
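
For readers without the diff context, here is a minimal standalone sketch of the duplicate-detection idea under discussion: hash each series by metric name plus label values, and remember which file it was first read from. The helper and variable names (hashSeries, seen) are illustrative only, not the PR's actual code.

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

// hashSeries builds a stable hash from the metric name and its label
// key/value pairs, so identical series from different files collide.
func hashSeries(name string, labels map[string]string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(name))
    keys := make([]string, 0, len(labels))
    for k := range labels {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    for _, k := range keys {
        h.Write([]byte(k))
        h.Write([]byte(labels[k]))
    }
    return h.Sum64()
}

func main() {
    // seen maps series hash -> file the series was first read from.
    seen := map[uint64]string{}
    series := []struct {
        file, name string
        labels     map[string]string
    }{
        {"a.prom", "job_duration_seconds", map[string]string{"job": "backup"}},
        {"b.prom", "job_duration_seconds", map[string]string{"job": "backup"}},
    }
    for _, s := range series {
        h := hashSeries(s.name, s.labels)
        if seenIn, ok := seen[h]; ok {
            fmt.Printf("Metric %s was read from %s, but has already been collected from file %s, skipping\n", s.name, s.file, seenIn)
            continue
        }
        seen[h] = s.file
    }
}
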
Collaborator:

This is not a good idea: you're now randomly dropping metrics depending on the iteration order. We've had issues with this type of approach in the past.

carlpett (author):

Yeah, it's not great, I agree. However, failing the scrape completely isn't great either. Do you see a middle road or a better approach?
Also, I just realized I forgot to set the error flag when this happens, so it is not currently alertable. That needs to be fixed.

Collaborator:

The node exporter fails, as this is an invalid setup. It's better to hard fail than silently return partial data.

@SuperQ FYI

carlpett (author):

Should we be failing the entire scrape, though? Skipping the textfile metrics completely seems okay. (I'll note that the metrics aren't randomly dropped: iteration order is deterministic here, so which metrics are dropped is consistent across scrapes given consistent input.) But having any individual breakage in any collector result in up=0 seems a bit harsh?

Collaborator:

You should approach it however you approach any individual collector failing. If you permit that, then you should have metrics indicating which collectors did/didn't work.
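
A rough sketch of what such per-collector metrics could look like, modeled on node_exporter's node_scrape_collector_success gauge. The Collector interface and the exporter_collector_success metric name below are assumptions for illustration, not this project's actual API.

package collector

import (
    "log"

    "github.com/prometheus/client_golang/prometheus"
)

// Collector is a hypothetical project-level interface whose Collect returns
// an error; the real interface in this exporter may look different.
type Collector interface {
    Name() string
    Collect(ch chan<- prometheus.Metric) error
}

// scrapeSuccessDesc describes a per-collector success gauge; the metric name
// is illustrative (node_exporter calls its equivalent node_scrape_collector_success).
var scrapeSuccessDesc = prometheus.NewDesc(
    "exporter_collector_success",
    "Whether the collector succeeded (1) or failed (0).",
    []string{"collector"},
    nil,
)

// collectAll runs every collector and reports a success gauge per collector
// instead of failing the whole scrape when one of them breaks.
func collectAll(collectors []Collector, ch chan<- prometheus.Metric) {
    for _, c := range collectors {
        success := 1.0
        if err := c.Collect(ch); err != nil {
            log.Printf("collector %s failed: %v", c.Name(), err)
            success = 0
        }
        ch <- prometheus.MustNewConstMetric(
            scrapeSuccessDesc, prometheus.GaugeValue, success, c.Name(),
        )
    }
}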

carlpett (author):

Right, that would be my preference. It will require a bit of gymnastics, though, since promhttp bails on the entire scrape if we let the data get that far.
I guess I'll use some sort of buffer so failures can be detected before the metrics are passed through the channel. Would this be something that should be done in node_exporter as well?
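
A minimal sketch of that buffering idea, assuming a hypothetical textFileCollector with parseFile and scrapeError members (none of which are the PR's actual code): metrics are collected into a slice first and only forwarded to the channel once every file has parsed cleanly, so errors set an alertable gauge instead of breaking the scrape mid-stream.

package collector

import "github.com/prometheus/client_golang/prometheus"

// textFileCollector and parseFile are hypothetical stand-ins; only the
// buffering pattern itself is the point of this sketch.
type textFileCollector struct {
    paths       []string
    scrapeError prometheus.Gauge
}

// parseFile would read one *.prom file and return its metrics; the actual
// parsing is elided here.
func (c *textFileCollector) parseFile(path string) ([]prometheus.Metric, error) {
    return nil, nil
}

// collectBuffered gathers every metric into a slice first and only forwards
// it to the scrape channel once all files parsed cleanly. A bad file flips
// an error gauge (alertable) instead of aborting the scrape after data has
// already been sent.
func (c *textFileCollector) collectBuffered(ch chan<- prometheus.Metric) {
    var buffered []prometheus.Metric
    for _, path := range c.paths {
        metrics, err := c.parseFile(path)
        if err != nil {
            c.scrapeError.Set(1)
            return
        }
        buffered = append(buffered, metrics...)
    }
    c.scrapeError.Set(0)
    for _, m := range buffered {
        ch <- m
    }
}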

Collaborator:

That'd be a question for @SuperQ; I've heard of no plans in that direction.

carlpett (author) commented Mar 2, 2020:

Won't be taking this approach, so closing this.

carlpett closed this on Mar 2, 2020.
carlpett deleted the handle-textfile-duplicates branch on March 28, 2020.
Successfully merging this pull request may close these issues:
duplicate metric in textfile cause scraping to fail completely (#272)