Fix data race in ingester #2327
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #2327    +/-   ##
=========================================
  Coverage   60.99%   61.00%
=========================================
  Files         158      158
  Lines       12751    12754     +3
=========================================
+ Hits         7778     7781     +3
- Misses       4390     4391     +1
+ Partials      583      582     -1
pkg/ingester/flush.go
Outdated
if err := c.Encode(); err != nil {
	return err
}
streamsMtx.Unlock()
I think you should do a defer here.
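A minimal sketch of what that would look like, assuming streamsMtx is locked earlier in the same function (the surrounding code is illustrative, not the actual flush implementation):

streamsMtx.Lock()
// defer releases the mutex on every return path,
// including the early return when Encode fails.
defer streamsMtx.Unlock()

if err := c.Encode(); err != nil {
	return err
}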
Can you explain who is racing for the stream data here? Querying?
I think we should not defer the Unlock. If we did, putting chunks into the store here would happen while the lock is still held, so other flush threads couldn't call Put until it is released. The IO would become sequential and eventually make all flush threads sequential.
Line 337 in e22f365
if err := i.store.Put(ctx, wireChunks); err != nil {
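A sketch of the pattern being argued for: hold the lock only around the chunk mutation and do the store write outside it, so flush goroutines can overlap on IO. Names mirror the snippets above; the structure is illustrative, not the actual flushChunks code:

streamsMtx.Lock()
err := c.Encode() // the chunk mutation happens under the lock
streamsMtx.Unlock()
if err != nil {
	return err
}

// The store write is the slow, IO-bound part; keeping it outside the lock
// lets other flush threads encode and Put concurrently.
if err := i.store.Put(ctx, wireChunks); err != nil {
	return err
}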
If you return an error you're deadlocked.
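To spell the deadlock out: in the snippet above the error path returns while streamsMtx is still held, so any later Lock or RLock on it blocks forever. A simplified illustration using the same names:

streamsMtx.Lock()
if err := c.Encode(); err != nil {
	// BUG: returns with streamsMtx still locked; every later
	// Lock()/RLock() on it (queries, other flushes) blocks forever.
	return err
}
streamsMtx.Unlock()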
Can you explain who is racing for the stream data here? Querying?
Yeah, it is querying, which accesses the stream chunks here:
Line 261 in e22f365
for _, c := range s.chunks {
That read is protected by the streamsMtx read lock taken here:
Line 316 in e22f365
i.streamsMtx.RLock()
but since c.Encode is not write-locked, it races with those reads.
Also, reading the first chunk of the stream and getting its bounds causes the race:
Line 174 in e22f365
firstTime, _ := stream.chunks[0].chunk.Bounds()
There are other places that would race too. All of those are already protected with streamsMtx, but c.Encode was not.
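A self-contained sketch of this kind of race, with made-up types (not the ingester's actual structs): the query side iterates the chunks under the read lock, while Encode mutates chunk state without taking the write lock. Running it with go run -race reports the conflict.

package main

import (
	"sync"
	"time"
)

// Illustrative types only; not the ingester's actual structs.
type chunk struct{ encoded []byte }

func (c *chunk) Encode() error {
	c.encoded = []byte("compressed") // writes shared state
	return nil
}

type stream struct {
	mtx    sync.RWMutex
	chunks []*chunk
}

func main() {
	s := &stream{chunks: []*chunk{{}}}

	// "Query" side: reads the chunks under the read lock.
	go func() {
		for i := 0; i < 1000; i++ {
			s.mtx.RLock()
			for _, c := range s.chunks {
				_ = len(c.encoded)
			}
			s.mtx.RUnlock()
		}
	}()

	// "Flush" side: Encode mutates the chunk without the write lock,
	// so the race detector flags a conflict with the reader above.
	for i := 0; i < 1000; i++ {
		for _, c := range s.chunks {
			_ = c.Encode()
		}
	}

	time.Sleep(100 * time.Millisecond)
}

Putting the mutation behind the write lock removes the race, which is effectively what this PR does by locking streamsMtx around c.Encode().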
firstTime, _ := stream.chunks[0].chunk.Bounds(): is this one fixed or already ok?
If you return an error you're deadlocked
Oh yeah, you are right. I didn't realize that.
firstTime, _ := stream.chunks[0].chunk.Bounds(): is this one fixed or already ok?
This one was already protected.
Great!
Force-pushed from c0740d3 to 4a8ee6d
LGTM
What this PR does / why we need it:
Fixes a data race in the ingester.
Which issue(s) this PR fixes:
Fixes #2265