move validation in distributor into separate middleware #3386
4ac3590 to 2d6da45
Meta and please ignore: Test out new subteam of the Docs Squad.
A nit
I just ran the benchmark
I now know why the benchmarks show many more allocations in this branch:
The commit d84ed76 fixes this in the new middleware.
@@ -1007,155 +1144,57 @@ func (d *Distributor) push(ctx context.Context, pushReq *push.Request) (*mimirpb
		return nil, err
	}

	d.updateReceivedMetrics(req, userID)
This could be moved below the return on :1150, but then certain metrics which are currently registered with 0 would no longer be registered at all, which would require me to update the tests. I didn't do that because I'd prefer not to update the tests at all in this PR, so that it's easy to see that they all still pass without any changes.
PR looks fine, but I think it needs a test showing that invalid series are no longer forwarded. The test should fail on the main branch, but pass here. WDYT?
I would suggest double-checking the reuse of timeseries; I think we don't return yoloSlice to the pool at all.
pkg/distributor/distributor.go (outdated)
@@ -805,6 +806,9 @@ func (d *Distributor) prePushRelabelMiddleware(next push.Func) push.Func {
		}

		if len(removeTsIndexes) > 0 {
			for _, removeTsIndex := range removeTsIndexes {
				mimirpb.ReuseTimeseries(req.Timeseries[removeTsIndex].TimeSeries)
I think this doesn't properly return yoloSlice from PreallocTimeseries back to the pool. Perhaps we need to introduce ReusePreallocTimeseries to handle both *TimeSeries and yoloSlice?
I am wondering if we could simplify the cleanup by 1) keeping the original slice in the push handler, and 2) reusing that original slice. However, that wouldn't work, as elements in the slice get rewritten too (e.g. in this code, when we drop some of them).
I'm not sure why this wouldn't return the yoloSlice to the pool. util.RemoveSliceIndexes shouldn't replace the slice, although it might shorten it by updating the length property.
Now that I think about it... if util.RemoveSliceIndexes removes index 0 of the slice, then even though the underlying data array of the slice doesn't get replaced, the offset 0 gets shifted forward, and I suspect that the positions in the underlying array which are before the new offset 0 will never be accessible again. Is this what you were referring to?
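The question above depends on how elements are removed from a Go slice. As a minimal sketch (removeIndexes is a hypothetical helper written for illustration, not the actual util.RemoveSliceIndexes), in-place compaction keeps the start of the slice where it was, while re-slicing from the front moves the start and shrinks the capacity:

```go
package main

import "fmt"

// removeIndexes is a hypothetical stand-in for a helper such as
// util.RemoveSliceIndexes. It compacts s in place, dropping the elements at
// the given sorted, unique indexes, and returns the shortened slice.
func removeIndexes(s []string, sortedIdx []int) []string {
	out := s[:0] // shares s's backing array and starts at the same offset
	next := 0    // position in sortedIdx of the next index to drop
	for i, v := range s {
		if next < len(sortedIdx) && sortedIdx[next] == i {
			next++
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	s := []string{"a", "b", "c", "d"}
	kept := removeIndexes(s, []int{0, 2})
	fmt.Println(kept, len(kept), cap(kept)) // [b d] 2 4

	// The start did not move: kept[0] lives at the same address as s[0].
	fmt.Println(&kept[0] == &s[0]) // true

	// Re-slicing from the front, by contrast, moves the start and shrinks cap.
	shifted := s[1:]
	fmt.Println(len(shifted), cap(shifted)) // 3 3
}
```

With compaction, the removed values are overwritten by later elements rather than skipped over, so anything they referenced has to be released before the removal.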
> I'm not sure why this wouldn't return the yoloSlice to the pool, util.RemoveSliceIndexes shouldn't replace the slice although it might shorten it by updating the length property.

1. For elements that we're going to remove, we call mimirpb.ReuseTimeseries(req.Timeseries[removeTsIndex].TimeSeries) here.
2. For the entire slice, we will eventually call mimirpb.ReuseSlice(req.Timeseries), added as a "cleanup" function in push.go. We perform this call on the updated req.Timeseries, which at the point when cleanup runs only has the timeseries that were actually pushed to the ingester, but not the timeseries that were removed due to validation or relabeling.
The call to mimirpb.ReuseSlice(req.Timeseries) in step 2 will call ReuseTimeseries(ts[i].TimeSeries) on individual elements, and then also return their yoloSlice to the pool:
mimir/pkg/mimirpb/timeseries.go
Lines 290 to 301 in 70f91b8
func ReuseSlice(ts []PreallocTimeseries) {
	for i := range ts {
		ReuseTimeseries(ts[i].TimeSeries)
		if ts[i].yoloSlice != nil {
			reuseYoloSlice(ts[i].yoloSlice)
			ts[i].yoloSlice = nil
		}
	}
	slicePool.Put(ts[:0]) //nolint:staticcheck //see comment on slicePool for more details
}
But in step 1, we only call mimirpb.ReuseTimeseries and don't return yoloSlice to the pool. And since these elements are no longer part of req.Timeseries when cleanup (step 2) runs, their yoloSlice will never be returned.
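To make the two steps concrete, here is a minimal simulation of the leak. Everything in it (prealloc, pools, reuseElement, run) is a hypothetical stand-in with counters in place of real pools, not Mimir code. It drops one element from a three-element request and counts what is handed back, first releasing only the series on removal and then releasing the buffer as well:

```go
package main

import "fmt"

// prealloc is a simplified, hypothetical stand-in for PreallocTimeseries:
// a pooled payload plus an optional pooled byte buffer.
type prealloc struct {
	series *string
	buf    []byte // plays the role of yoloSlice
}

// pools counts what has been handed back, in place of real sync.Pool instances.
type pools struct {
	seriesReturned int
	bufsReturned   int
}

func (p *pools) reuseSeries(s *string) {
	if s != nil {
		p.seriesReturned++
	}
}

// reuseElement releases both parts of one element, along the lines of the
// ReusePreallocTimeseries helper proposed in the review.
func (p *pools) reuseElement(e *prealloc) {
	p.reuseSeries(e.series)
	e.series = nil
	if e.buf != nil {
		p.bufsReturned++
		e.buf = nil
	}
}

// reuseSlice is the end-of-request cleanup (step 2). It only sees the
// elements that are still in ts when it runs.
func (p *pools) reuseSlice(ts []prealloc) {
	for i := range ts {
		p.reuseElement(&ts[i])
	}
}

func newRequest(n int) []prealloc {
	ts := make([]prealloc, n)
	for i := range ts {
		s := fmt.Sprintf("series-%d", i)
		ts[i] = prealloc{series: &s, buf: make([]byte, 8)}
	}
	return ts
}

// run drops element 0 from a three-element request (step 1), then runs the
// cleanup (step 2). returnBufOnDrop selects what step 1 releases.
func run(returnBufOnDrop bool) pools {
	var p pools
	ts := newRequest(3)
	if returnBufOnDrop {
		p.reuseElement(&ts[0])
	} else {
		p.reuseSeries(ts[0].series) // the buffer is forgotten here
	}
	ts = append(ts[:0], ts[1:]...) // compact: element 0 is overwritten
	p.reuseSlice(ts)
	return p
}

func main() {
	leaky, fixed := run(false), run(true)
	fmt.Println(leaky.seriesReturned, leaky.bufsReturned) // 3 2
	fmt.Println(fixed.seriesReturned, fixed.bufsReturned) // 3 3
}
```

In the first run one buffer is never handed back, which is the situation described in the comment above.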
You're right, I missed that.
Fixed here: ead8758
Thanks, fix looks good!
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
d84ed76 to 193c8f2
That's a good idea. I added this test: https://github.com/grafana/mimir/pull/3386/files#diff-16b7662ec0e247fa72bf86b63ec8558a8cea51ac2280baf05a25887578132c44R3203-R3331
I force-pushed in order to move the test to the beginning of the PR as the first commit. This allows us to verify that it fails without any further changes by checking out that commit directly. When checking out the last commit of the PR, it passes:
Thanks for adding the test.
LGTM, thank you!
After the latest changes I ran the distributor benchmark one more time, comparing the commit which this PR is based on with the last commit of this PR. It looks good:
This reverts commit 5356edd. See grafana/mimir-squad#973
* Revert "Distributor push wrapper should only receive unforwarded samples. (#2980)"
  This reverts commit 3d14b39. See grafana/mimir-squad#973
* Revert "move validation in distributor into separate middleware (#3386)"
  This reverts commit 5356edd. See grafana/mimir-squad#973
* Pin doc-validator version
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
	}

	if ts.yoloSlice != nil {
		reuseYoloSlice(ts.yoloSlice)
[nit] I would set ts.yoloSlice to nil and then reuse this function in ReuseSlice().
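One reason to clear the field, sketched below under assumptions: once the buffer reference is set to nil, the per-element release becomes idempotent, so the slice-level cleanup can call it on every element without putting the same buffer into the pool twice. The names entry, bufPool, release, and releaseAll are hypothetical and only illustrate the pattern; they are not the Mimir types or functions.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool and entry are hypothetical stand-ins for the yoloSlice pool and
// PreallocTimeseries.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 64)
		return &b
	},
}

type entry struct {
	buf *[]byte
}

// release returns e's buffer to the pool and reports whether it did so.
// Clearing the field makes a second call a no-op, so the same buffer can
// never be put into the pool twice.
func release(e *entry) bool {
	if e.buf == nil {
		return false
	}
	*e.buf = (*e.buf)[:0]
	bufPool.Put(e.buf)
	e.buf = nil
	return true
}

// releaseAll is the slice-level cleanup, built on the per-element function.
// It returns how many buffers it released.
func releaseAll(es []entry) int {
	n := 0
	for i := range es {
		if release(&es[i]) {
			n++
		}
	}
	return n
}

func main() {
	es := make([]entry, 3)
	for i := range es {
		es[i].buf = bufPool.Get().(*[]byte)
	}
	release(&es[1])             // released early, e.g. when the element is dropped
	fmt.Println(releaseAll(es)) // 2: only the remaining buffers are released
}
```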
Done in #3464
* add test to show that validation happens before forwarding
* move validation in distributor into separate middleware
* fixes
* cleanup
* dont send empty requests to ingesters and fix metrics
* fix error handling
* changelog
* PR feedback
* return timeseries to pool
* reuse yoloSlice correctly
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Ursula Kallio <ursula.kallio@grafana.com>
This change moves the validation of incoming series before the Distributor's forwarding functionality, so that we don't forward invalid series.
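The ordering can be pictured as a chain of middlewares in which validation wraps forwarding. The sketch below is a simplified illustration, not the Distributor's code: pushFunc, middleware, validate, forward, and chain are hypothetical names, the real push.Func takes a context and a *push.Request, and "invalid" is reduced here to "empty series name". Skipping the rest of the chain when nothing valid remains is also an assumption made for the example.

```go
package main

import "fmt"

// pushFunc and middleware are simplified, hypothetical stand-ins for the
// Distributor's push function and middleware types.
type pushFunc func(series []string) error

type middleware func(next pushFunc) pushFunc

// validate drops invalid series (here: empty names) before calling next.
// If nothing valid remains, it skips the rest of the chain.
func validate(next pushFunc) pushFunc {
	return func(series []string) error {
		valid := make([]string, 0, len(series))
		for _, s := range series {
			if s != "" {
				valid = append(valid, s)
			}
		}
		if len(valid) == 0 {
			return nil
		}
		return next(valid)
	}
}

// forward records what it would forward, then calls next.
func forward(forwarded *[]string) middleware {
	return func(next pushFunc) pushFunc {
		return func(series []string) error {
			*forwarded = append(*forwarded, series...)
			return next(series)
		}
	}
}

// chain wraps final so that the first middleware listed runs first.
func chain(final pushFunc, mws ...middleware) pushFunc {
	for i := len(mws) - 1; i >= 0; i-- {
		final = mws[i](final)
	}
	return final
}

func main() {
	var forwarded, ingested []string
	ingest := func(series []string) error {
		ingested = append(ingested, series...)
		return nil
	}
	// Validation is listed before forwarding, so forwarding never sees "".
	push := chain(ingest, validate, forward(&forwarded))
	_ = push([]string{"up", "", "http_requests_total"})
	fmt.Println(forwarded, ingested)
}
```

Swapping the order of validate and forward in the chain call would let the empty name reach the forwarder, which is the behavior this change is meant to prevent.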