-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(heavier_once_cell): assertion failure can be hit #6652
Conversation
2394 tests run: 2277 passed, 0 failed, 117 skipped (full report)Code coverage (full report)
The comment gets automatically updated with the latest test results
f5b0e72 at 2024-02-07T07:35:08.193Z :recycle: |
Earlier you mentioned that the loser making any forward progress requires that there cannot be a forever loop which does Solving that would mean a custom Semaphore and ll_sem in tokio speak. Especially considering the new RWLock, it's really nice to be able to close the semaphore. Alternatively we could use the Inner::value as a state variable more ( EDIT: tried to implement this in #6657. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the init_take_deinit_scenario test cases are worth it. I would have preferred a failpoint-style approach. Obviously not with failpoint's global state, but, a cfg(test)
field in the heavier_once_cell to configure the failpoint.
Why did we have to switch from acquire_owned()
to acquire()
in this PR? Would help the clarity of this change if we could avoid that.
Regarding the core change, I think it prevents the bug we found from occuring. 95% confident it doesn't introduce new issues.
Commit message should point out that the essence of this change is the Arc::ptr_eq
.
So that the
I don't think the |
This PR reverts - #6589 - #6652 because there's a performance regression that's particularly visible at high layer counts. Most likely it's because the switch to RwLock inflates the ``` inner: heavier_once_cell::OnceCell<ResidentOrWantedEvicted>, ``` size from 48 to 88 bytes, which, by itself is almost a doubling of the cache footprint, and probably the fact that it's now larger than a cache line also doesn't help. See this chat on the Neon discord for more context: https://discord.com/channels/1176467419317940276/1204714372295958548/1205541184634617906 I'm reverting 6652 as well because it might also have perf implications, and we're getting close to the next release. We should re-do its changes after the next release, though. cc @koivunej cc @ivaxer
@problame noticed that the
tokio::sync::AcquireError
branch assertion can be hit like in the first commit. We haven't seen this yet in production, but I'd prefer not to see it there. Theretake_and_deinit
is being used, but this race must be quite timing sensitive.