Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add group_{left,right} to LogQL #4510

Merged
merged 6 commits into from
Oct 29, 2021
Merged

Conversation

lingpeng0314
Copy link
Contributor

@lingpeng0314 lingpeng0314 commented Oct 21, 2021

What this PR does / why we need it:
Add group_{left,right} to LogQL, the group_left/group_right modifier allows one-to-many and many-to-one vector matching.

Which issue(s) this PR fixes:
Fixes TODO: Add group_{left,right} to LogQL

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated

@CLAassistant
Copy link

CLAassistant commented Oct 21, 2021

CLA assistant check
All committers have signed the CLA.

@lingpeng0314 lingpeng0314 force-pushed the group_left_right branch 2 times, most recently from 6437ba4 to 8cfe7fd Compare October 21, 2021 04:35
@lingpeng0314 lingpeng0314 marked this pull request as draft October 21, 2021 04:42
@lingpeng0314 lingpeng0314 marked this pull request as ready for review October 21, 2021 05:54
@dannykopping
Copy link
Contributor

@lingpeng0314 thanks for this submission! Looks quite exciting.
We'll get this reviewed in the next few days.

@@ -163,6 +163,23 @@ This example will return every machine total count within the last minutes ratio
sum by(machine) (count_over_time({app="foo"}[1m])) / on() sum(count_over_time({app="foo"}[1m]))
```

### Many-to-one and one-to-many vector matches
Many-to-one and one-to-many matchings refer to the case where each vector element on the "one"-side can match with multiple elements on the "many"-side. This has to be explicitly requested using the group_left or group_right modifier, where left/right determines which vector has the higher cardinality.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Many-to-one and one-to-many matchings refer to the case where each vector element on the "one"-side can match with multiple elements on the "many"-side. This has to be explicitly requested using the group_left or group_right modifier, where left/right determines which vector has the higher cardinality.
Many-to-one and one-to-many matchings occur when each vector element on the "one"-side can match with multiple elements on the "many"-side. You must explicitly request matching by using the group_left or group_right modifier, where left or right determines which vector has the higher cardinality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

<vector expr> <bin-op> on(<labels>) group_left(<labels>) <vector expr>
<vector expr> <bin-op> on(<labels>) group_right(<labels>) <vector expr>
```
The label list provided with the group modifier contains additional labels from the "one"-side to be included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The label list provided with the group modifier contains additional labels from the "one"-side to be included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.
The label list provided with the group modifier contains additional labels from the "one"-side that are included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.

Copy link
Contributor

@chri2547 chri2547 Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify the sentence that starts "For on a label..." What does that mean?

Copy link
Contributor Author

@lingpeng0314 lingpeng0314 Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<vector expr> <bin-op> on(<labels>) group_right(<labels>) <vector expr>
What i want to say is a label shouldn;t appear in the on label list and group_x label list at the same time because on label as the join key, the value in the lef/right vector is the same. So the label in group_x is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

<vector expr> <bin-op> on(<labels>) group_right(<labels>) <vector expr>
```
The label list provided with the group modifier contains additional labels from the "one"-side to be included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.
Grouping modifiers can only be used for comparison and arithmetic. Operations as and, unless and or operations match with all possible entries in the right vector by default.
Copy link
Contributor

@chri2547 chri2547 Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible wording below for the sentence starting with "Operations as..."

"By default, the system matches as, unless, and or operations with all entries in the right vector."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

The label list provided with the group modifier contains additional labels from the "one"-side to be included in the result metrics. For on a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.
Grouping modifiers can only be used for comparison and arithmetic. Operations as and, unless and or operations match with all possible entries in the right vector by default.

This example will return sum results for the same app with the many part labels and the labels specified by group_right in the one side.
Copy link
Contributor

@chri2547 chri2547 Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update wording to:

"The following example returns sum results for the same application with the many part labels and the labels specified by the group_right operation."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@chri2547 chri2547 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided wording suggestions.

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review:
This is looking wonderful. Thanks for adding this feature set and fixing some existing bugs for us. I had to play around in Prometheus to compare/validate some of this behavior :) and came to the conclusion that you're right!

Looking forward to reading the rest of the PR and thanks a ton for the contribution!

promql.Vector{
promql.Sample{Point: promql.Point{T: 60 * 1000, V: 0}, Metric: labels.Labels{labels.Label{Name: "app", Value: "foo"}}},
},
promql.Vector{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

promql.Vector{
promql.Sample{Point: promql.Point{T: 60 * 1000, V: 0}, Metric: labels.Labels{}},
},
promql.Vector{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@@ -649,7 +645,24 @@ func TestEngine_LogsInstantQuery(t *testing.T) {
{&logproto.SampleQueryRequest{Start: time.Unix(0, 0), End: time.Unix(60, 0), Selector: `sum by (app) (count_over_time({app="foo"}[1m]))`}},
},
promql.Vector{
promql.Sample{Point: promql.Point{T: 60 * 1000, V: 120}, Metric: labels.Labels{labels.Label{Name: "app", Value: "foo"}, labels.Label{Name: "machine", Value: "fuzz"}}},
promql.Sample{Point: promql.Point{T: 60 * 1000, V: 120}, Metric: labels.Labels{}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

errors.New("multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)"),
},
{
`sum by (app,machine) (count_over_time({app="foo"}[1m])) > bool on () group_left (app) sum by (app) (count_over_time({app="foo"}[1m]))`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can safely remove the (app) here from group_left because it's already ensured via the sum by (app,machine) grouping on the left hand side of this binary operation.

Suggested change
`sum by (app,machine) (count_over_time({app="foo"}[1m])) > bool on () group_left (app) sum by (app) (count_over_time({app="foo"}[1m]))`,
`sum by (app,machine) (count_over_time({app="foo"}[1m])) > bool on () group_left sum by (app) (count_over_time({app="foo"}[1m]))`,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, removed it.

@lingpeng0314
Copy link
Contributor Author

Partial review: This is looking wonderful. Thanks for adding this feature set and fixing some existing bugs for us. I had to play around in Prometheus to compare/validate some of this behavior :) and came to the conclusion that you're right!

Looking forward to reading the rest of the PR and thanks a ton for the contribution!

Thanks for your review and validation!
Actually when i implement the feature, i found the unit tests were broken so i went to prometheus to double confirm the behavior and finally decide to keep consistent with prometheus, which seems more meaningful.

Copy link
Contributor

@chri2547 chri2547 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@owen-d
Copy link
Member

owen-d commented Oct 28, 2021

@lingpeng0314 I've opened a PR into your fork (lingpeng0314#1) which removes the CardManyToMany, which we don't actually support. I think this makes the code a little cleaner. I opted to make the PR mysefl as I found that a bit easier than trying to explain it in PR review - I hope that's ok :) .

Edit: finished up the review - great work! If you're willing to merge the PR I linked, I'll approve this PR and we can merge.

@lingpeng0314
Copy link
Contributor Author

@lingpeng0314 I've opened a PR into your fork (lingpeng0314#1) which removes the CardManyToMany, which we don't actually support. I think this makes the code a little cleaner. I opted to make the PR mysefl as I found that a bit easier than trying to explain it in PR review - I hope that's ok :) .

Edit: finished up the review - great work! If you're willing to merge the PR I linked, I'll approve this PR and we can merge.

Merged the PR, thanks a lot!

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks!

@owen-d owen-d merged commit 76e2a14 into grafana:main Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants