Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove local REs for label and metric names and use fast checks #256

Merged
merged 1 commit into from
Nov 12, 2016

Conversation

beorn7
Copy link
Member

@beorn7 beorn7 commented Nov 11, 2016

All was a mess, we had duplicates of the REs for label name and metric
names here, and we sometimes used them, sometimes we used those from
common/model.

Now we are consistently using the fast checking functions from common/model.

(Tests for leading colons are included there, see
prometheus/common#66 .)

@stuartnelson3 @grobie @kamalmarhubi

All was a mess, we had duplicates of the REs for label name and metric
names here, and we sometimes used them, sometimes we used those from
common/model.

Now we are consistently using the fast checking functions from common/model.

(Tests for leading colons are included there, see
prometheus/common#66 .)
@kamalmarhubi
Copy link

kamalmarhubi commented Nov 11, 2016

Now we are consistently using the fast checking functions from common/model.

Have you benchmarked to support the implicit performance claim? Given the typical use case is incredibly infrequent calls, it might be easier to be sure things are correct if you stick to regexes. It may also be faster.

@grobie
Copy link
Member

grobie commented Nov 11, 2016

👍

@brian-brazil These variables are not exported, where could they be used?

@brian-brazil
Copy link
Contributor

Ah right. I wasn't sure where I'd been importing for.

👍

@grobie
Copy link
Member

grobie commented Nov 11, 2016

@kamalmarhubi Setting up a regular expression engine environment and testing all possible states is a lot slower than doing manual string comparison if easily possible. I'd expect the regex solution to be at least an order of magnitude slower. Therefore I'd rather see a benchmark proving that a regex is indeed faster than asking for the inverse.

@kamalmarhubi
Copy link

@grobie my argument is that correctness of the regex is easy to determine, since the formats are defined as regexes. Given that this is fixing a bug in the validation, I think that should carry weight. For the current use case of validating metric and label names, performance is not a major concern: they're validated at metric creation time, which happens a constant number of times early in the program's life. So I think the that hand-rolled validation is the one that needs to prove its case.

@kamalmarhubi
Copy link

kamalmarhubi commented Nov 11, 2016

Benchmarking on my laptop with name http_requests_total, I'm seeing ~1us for regex, and 100ns for hand rolled. This means that in ~1ms of time at startup, the regex approach can validate hundreds of metrics. For ease of verifying correctness, compare

MetricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
func IsValidMetricName(n LabelValue) bool {
    if len(n) == 0 {
        return false
    }
    for i, b := range n {
        if !((b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || b == '_' || b == ':' || (b >= '0' && b <= '9' && i > 0)) {
            return false
        }
    }
    return true
}

If using the function approach, there ought to be a reasonable set of tests to show that it's equivalent to the spec, which is a regex.

(This is the last comment I'll make on this topic.)

@grobie
Copy link
Member

grobie commented Nov 11, 2016

We can't remove the efficient implementation, as it's used heavily in
decoding and scraping inside of Prometheus where the performance matters.
Given that it's already available and tested, it makes sense to me to use
it in the client as well (instead of copy&pasting the regular expression
from prometheus/common).

On Fri, Nov 11, 2016 at 12:36 PM Kamal Marhubi notifications@github.com
wrote:

Benchmarking on my laptop with name http_requests_total, I'm seeing ~1us
for regex, and 100ns for hand rolled. This means that in ~1ms of time at
startup, the regex approach can validate 1000 metrics or so. For ease of
verifying correctness, compare

MetricNameRE = regexp.MustCompile(^[a-zA-Z_:][a-zA-Z0-9_:]*$)

func IsValidMetricName(n LabelValue) bool {
if len(n) == 0 {
return false
}
for i, b := range n {
if !((b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || b == '_' || b == ':' || (b >= '0' && b <= '9' && i > 0)) {
return false
}
}
return true
}

(This is the last comment I'll make on this topic.)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#256 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAANaJPTGyAT2XtPcM5oSc4_qwLeHjHDks5q9LW9gaJpZM4Kv4kw
.

@beorn7 beorn7 merged commit 198ef62 into master Nov 12, 2016
@beorn7 beorn7 deleted the beorn7/desc branch November 12, 2016 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants