cross-shard-barrier: Capture shared barrier in complete #11553
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When cross-shard barrier is abort()-ed it spawns a background fiber that will wake-up other shards (if they are sleeping) with exception.
This fiber is implicitly waited by the owning sharded service .stop, because barrier usage is like this:
If abort happens, the invoke_on_all() will only resolve after it queues up the waking lambdas into smp queues, thus the subseqent stop will queue its stopping lambdas after barrier's ones.
However, in debug mode the queue can be shuffled, so the owning service can suddenly be freed from under the barrier's feet causing use after free. Fortunately, this can be easily fixed by capturing the shared pointer on the shared barrier instead of a regular pointer on the shard-local barrier.
fixes: #11303
Signed-off-by: Pavel Emelyanov xemul@scylladb.com