Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation of the metric lenght #354

Closed
azhiltsov opened this issue Dec 12, 2018 · 8 comments
Closed

Validation of the metric lenght #354

azhiltsov opened this issue Dec 12, 2018 · 8 comments

Comments

@azhiltsov
Copy link

What do you think about introducing a metric length validation as a configurable parameter?
The question comes from this issue: go-graphite/go-carbon#258
In order to ease our lives we would like to filter metrics on relays and currently can do this by the next rule:

match (^|\.)[^.]{250,}(\.|$)
 send to blackhole
 stop
 ;

Might be this can be done in the code internally in more efficient manner?

@grobian
Copy link
Owner

grobian commented Dec 12, 2018

I wonder if it is possible to do so. What you do (I think) is search for consecutive bits between dots (is directory or file) which is what the code would have to do too. Unless you'd be happy with a generic length cap which doesn't look at anything but character count of course.

@azhiltsov
Copy link
Author

You are right, it looks for bits between dots. Depending on what kind of the storage back-end is used the set of limits can be different, so I am not sure that generic metric length will be sufficient.
If you don't see any ways to improve this, then you might consider to document this solution in examples. If so I can send a PR.

@grobian
Copy link
Owner

grobian commented Dec 12, 2018

The only way is to add custom logic at the core that doesn't involve regex machinery, but it means you'd always have to pay for this cost.

I think an example is a good start in any case, we can alway revisit this once it seems it is too expensive to perform like this.

@azhiltsov
Copy link
Author

It turned out that this regex is very costly, It dropped the relay capacity on
older E5-2640 0 @ 2.50GHz to 70K points/s (from ~270K for our set of rules)
and on
newer E5-2620 v3 @ 2.40GHz to 130K points/s (from ~320K for our set of rules)
We are running v2.6 of c-relay

@grobian
Copy link
Owner

grobian commented Dec 12, 2018

which regex impl are you using?

@azhiltsov
Copy link
Author

perf top saying:
libc-2.17.so re_search_internal
I know that in the latest versions you switched to pcre

@grobian
Copy link
Owner

grobian commented Dec 13, 2018

I switched to a configurable impl, but most importantly oniguruma is by orders of magnitude better than the libc impl.

@grobian
Copy link
Owner

grobian commented Jan 4, 2019

What appears to be possible here, is to take the length of the entire input into account. That is, not just the metric name. Capping that value probably is enough for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants