Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter invalid lines. #87

Closed
wants to merge 1 commit into from

Conversation

piotr1212
Copy link
Contributor

Checks the number of fields in a line, there should be three fields. Fields are
separated by spaces, count the spaces and forward only lines which have two
spaces in them.

It's been a while since I wrote C... please check carefully.

Checks the number of fields in a line, there should be three fields. Fields are
separated by spaces, count the spaces and forward only lines which have two
spaces in them.
@grobian
Copy link
Owner

grobian commented Jul 27, 2015

so,the problem is you want the relay to discard lines that do not follow the <metric> <value> <timestamp>\n pattern?

@piotr1212
Copy link
Contributor Author

Yes, that is correct. I'm currently at a large organisation with an "open" Graphite for other teams to use. A lot of them send badly formatted metrics, I would like to filter them out on the relay instead of sending them to the downstream carbon-cache's.

@grobian
Copy link
Owner

grobian commented Jul 27, 2015

Hmm, I see the usecase here. We have the same setup ("open" Graphite) but rely on the stores dealing with random crap people seem to produce. I guess performance-wise this doesn't make much difference, visibility wise, I think it needs a new counter metricsInvalid or something to check we're not dropping like mad unexpectedly. I need to think this through a bit.

@piotr1212
Copy link
Contributor Author

I eventuall want to log the invalid lines someway, not sure yet how as I want to avoid disk run full when someone messes up and starts sending millions invalid lines every second (things like that happen..).

The reason why I would prefer this in the relay instead of the stores is that the relay has the source IP of the offender (and could log that), where the stores only see source ip of the relay.

@grobian
Copy link
Owner

grobian commented Jul 27, 2015

Heh, now you actually request the same as issue #78, and the source IP is currently hard to get at the level where we'd log.

@piotr1212
Copy link
Contributor Author

I'm more interested in logging wrongly formatted lines, the other request is for correct lines based on non-wanted metricname

Take for example the following lines:

  1. metric.somepath.aa 42 1438155337
  2. metric.somepath.aa 42 nonsense
  3. unwanted.path.bb 42 1438155337

with #78 you can configure that 1 and 2 will be sent to dest1 and 3 to be logged. My intention is to only drop/log line 2.
These lines could be discarded before the routing logic occurs, thus making it easier to include an IP. I actually hacked some code to see if it would work, and seems to do so. I expect to have some more time next week to work on this.

@grobian
Copy link
Owner

grobian commented Jul 30, 2015

Something like this https://github.com/Skyscanner/carbon-relay-ng/commit/69eafe322140b59d728cdc0be2e04b36f7552fa3 makes me feel like it should be an option whether or not to do validation.

@piotr1212
Copy link
Contributor Author

Sorry for the long delay. I've started a new project at a new customer and my focus got a bit changed by that. I don't expect to have time to finish it soon. :(
Work this far is https://github.com/piotr1212/carbon-c-relay/commits/filter2 which implements:

  • basic filtering
  • logging invalid lines
  • logging ip addresses for upd and tcp socket both ipv4 and ipv6
  • configurable
    I'm not really satisfied by the filtering part, after I made it configurable it made the code more complex and less efficient.

Anyway, I don't mind if you close this one and leave unimplemented.

@piotr1212
Copy link
Contributor Author

Sorry for late response. Thank you for reviewing and commenting my code. The suggested regex would be great! Unfortunately I still don't expect to have time to rework this soon ... :(

@grobian
Copy link
Owner

grobian commented Jan 28, 2016

don't worry, this is on my todo stack for implementing, I hope to be able to reuse your intial patch.

@ravibhooshan
Copy link

Is this Value filtering been implemented in latest release.

@grobian
Copy link
Owner

grobian commented Mar 14, 2016

nope

grobian added a commit that referenced this pull request Nov 11, 2016
This implements a solution for issue #228, issue #142, #issue #121,
pull #227 and pull #87.

Introduce new `validate <regexp> else <drop | log>` construct in match
rules to do checks on the data part of the message.
@grobian
Copy link
Owner

grobian commented Nov 11, 2016

I've introduced a validate clause to match rules

@grobian grobian closed this Nov 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants