Fix segfault cuvs bench#2088
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughSummary by CodeRabbit
WalkthroughReplaces unsynchronized lazy mmap initialization in blob_mmap::handle() with std::call_once using a heap-allocated once_flag and an internal retry loop for hugepage fallbacks. Also pre-initializes ground-truth blob-backed data on the main thread before spawning worker threads. ChangesThread-safety fix for blob_mmap
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
dantegd
left a comment
There was a problem hiding this comment.
Nice catch on this one!
One thing I noticed while reading: ground_truth_map's ctor in dataset.hpp is the only place that goes parallel directly over a blob (everywhere else in dataset serializes via mutex_). With this PR it's safe, but if you ever want to make it future proof against new lazy paths in blob.hpp, prewarming ground_truth_set.data() and filter_bitset->data(...) on the main thread before spawning workers would do it. Not blocking, just a thought.
| // Heap-allocate the once_flag so that blob_mmap remains movable (std::once_flag | ||
| // itself is neither copyable nor movable). Multiple threads can race on the | ||
| // first call to handle() via blob<T>::data(); without this serialization each | ||
| // racing thread would mmap the file and the losing emplace would munmap the | ||
| // winner's mapping out from under it -- a flaky SIGSEGV in ann benchmarks. |
There was a problem hiding this comment.
Tiny suggestion — the comment explains the destructor-during-emplace half of the bug really well, but the other half is that the reference handle() already returned to the winning thread also gets invalidated by the second emplace. Maybe worth tweaking the last line to something like:
// racing thread would mmap the file and the losing emplace would both munmap // the winner's mapping AND invalidate the reference the winner had already // returned to its caller -- a flaky SIGSEGV in ann benchmarks.
Totally optional, the existing comment is already good.
There was a problem hiding this comment.
Thanks for the review! I addressed the future proofing and comments update here 39f5d9a
There was a problem hiding this comment.
Yes, it got switched because of the rebase
39f5d9a to
fa616f0
Compare
|
/ok to test fa616f0 |
|
/merge |
972de08
into
rapidsai:release/26.06
Resolves #2087
What does this PR do?
There is a race condition on the std::optional
!handle_.has_value().While thread 1 works on assigning handle with mmap, thread 2 can race through and try to assign it as well. However, during reassignment it causes destructor of thread 1 to be called which calls unmap, leading to a segfault.
To solve this we use
std::onceso that other threads can't enter the critical section.std::onceis non reentrant, so we need to also refactor from using recursion to a while loop.std::onceis not movable, so we allocate it on the heap and use a unique_pointer to keep track of it instead.We need to ensure member variables of blob are movable since blob is used in a std::variant