-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Fix plasma store segfault #15071
[Core] Fix plasma store segfault #15071
Conversation
@suquark Can you check if this seems like a valid approach to fix the issue? (We can also probably make ReturnFromGet no-op if get_request pointer is nullptr) |
@rkooo567 I think make it noop is more reasonable. A request is deleted if
Under both cases, we should not reply to the client anymore. If we do not make it a noop, then we could reply to the client twice. BTW, I think the most important thing is to understand why there is a concurrency issue. I did not figure it out actually. Maybe the boost timer works in a way I do not expected. |
I see. Good point. Do you mind if I make it no-op still with the shared pointer? If you'd like to keep the raw pointer, lmk. My approach will be like
Thought?
About this; I think this could be related? #15070 |
Yes! I think #15070 is the cause. It surprises me that boost timer cancellation is not an atomic operation (but actually reasonable considering the implementation of the eventloop). We should use shared_ptr. The raw pointer comes from the original codebase that is written with pure C. Set the private attribute is good. |
Didn't know it was written pure C first! Good to know... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Failed window tests are failing in the master too. |
Why are these changes needed?
It looks like when there are some pressure to the object store, this type of segfault rarely happens.
I looked into code, and it seems like we are using the raw pointer for
GetRequest
. It is highly likely theGetRequest
is already GC'ed when ReturnFromGet is called. (Most likely due to this part of code)This try fixing the issue by using the shared pointer in this path.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.