New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: align semantics of metric and log query label extraction #11587
Conversation
Trivy scan found the following vulnerabilities:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please add a changelog entry and add the appropriate labels for backporting.
pkg/logql/log/parser_hints.go
Outdated
found := map[string]interface{}{} | ||
for _, e := range p.extracted { | ||
for _, l := range p.requiredLabels { | ||
if e == l { | ||
found[l] = nil | ||
break | ||
} | ||
} | ||
} | ||
|
||
return len(p.requiredLabels) == len(found) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is needed because previously, RecordExtracted
was only recording required fields that were extracted. As a result, it was previously acceptable to just test the length of the 2 slices against each other. However, now that we're recording all extracted labels, we have to actually compare extracted to required. I'm using a map to prevent duplicate extractions from causing an incorrect result here.
Ok, I think I've figured out why the benchmark was off. Here's a new benchstat after pushing 147dbab:
The change needed was in @MasslessParticle kudos to your diligence here in making sure we understood why this benchmark was off! |
pkg/logql/log/parser_hints.go
Outdated
return false | ||
} | ||
return len(p.extracted) == len(p.requiredLabels) | ||
|
||
found := map[string]interface{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to say found := make(map[string]struct{}, len(p.requiredLabels))
here. struct{}
is smaller than interface{}
and initializing with the max len we'll need ensures only one alloc
pkg/logql/log/parser_hints.go
Outdated
return len(p.extracted) == len(p.requiredLabels) | ||
|
||
found := map[string]interface{}{} | ||
for _, e := range p.extracted { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be worth seeing if it's faster to just compare the list in two loops. It's an n^2 algorithm but it might be faster because it requires no allocs. (This function is called a lot)
It's the same reason extractedLabels
and requiredLabels
are []string
here rather than map[string]
. It's actually faster to iterate a small slice than index all these things from a map!
**What this PR does / why we need it**: A data race introduced in #11587 was caught in the [backport to k184](#11668). This removes the shared state of a single global `noParserHints` in favor of creating an empty `Hint` object for each label builder, since the `Hints` is keeping state of `extracted` and `requiredLabels`.
…11668) Backport 9759c13 from #11587 --- **What this PR does / why we need it**: Align the label parsing logic of metric and log queries to both only extract the first instance of a label when the same label is requested multiple times. **Which issue(s) this PR fixes**: Fixes #11647 --------- Co-authored-by: Trevor Whitney <trevorjwhitney@gmail.com>
…traction (#11667) Backport 9759c13 from #11587 --- **What this PR does / why we need it**: Fix label parsing logic so metric and log queries both only extract the first instance of a label that is requested multiple times. **Which issue(s) this PR fixes**: Fixes #11647 --------- Co-authored-by: Trevor Whitney <trevorjwhitney@gmail.com>
…a#11587) both metric and log queries use the first extracted label when multiple values are requested for the same label Fixes grafana#11647
) **What this PR does / why we need it**: A data race introduced in grafana#11587 was caught in the [backport to k184](grafana#11668). This removes the shared state of a single global `noParserHints` in favor of creating an empty `Hint` object for each label builder, since the `Hints` is keeping state of `extracted` and `requiredLabels`.
What this PR does / why we need it:
As a result of the query optimization work, we accidentally introduced a discrepancy between the semantics of logs and metrics queries. Metric queries can benefit from a short-circuit during label extraction, where we only need the labels needed for grouping and filtering. Log queries need to always extract all labels, as a user may want to inspect the key=value pairs of all detected fields, not just those filtered on.
However, given a query with nested labels of the same name (ie
{"message": {"message": "foo"}}
) this short circuit introduces a problem where the metric query will use the value of the firstmessage
(since it stops parsing themessage
key after finding it once), but the log query will use the value of the secondmessage
(since it will continue to extract all labels, even those it has already seen). This PR changes the semantics so that both types of queries will only use the first value.The result of the change is a slight improvement in the hot path of label extraction, which I interpret as us having to do a few more operations due to the removal of the
len(requiredLabels) == 0
short circuit, but those operations are quick, and thus more are done in the same runtime.Which issue(s) this PR fixes:
Fixes #11647