New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[branch 5.2] Iterator invalidation in lsa_partition_reader::reset_state() #14696
Comments
…_state do_refresh_state() keeps iterators to rows_entry in a vector. This vector might be resized during the procedure, triggering memory reclaim and invalidating the iterators, which can cause arbitrarily long loops and/or a segmentation fault during make_heap(). To fix this, do_refresh_state has to always be called from the allocating section. Additionally, it turns out that the first do_refresh_state is useless, because reset_state() doesn't set _change_mark. This causes do_refresh_state to be needlessly repeated during a next_row() or next_range_tombstone() which happens immediately after it. Therefore this patch moves the _change_mark assignment from maybe_refresh_state to do_refresh_state, so that the change mark is properly set even after the first refresh. Fixes scylladb#14696
…_state do_refresh_state() keeps iterators to rows_entry in a vector. This vector might be resized during the procedure, triggering memory reclaim and invalidating the iterators, which can cause arbitrarily long loops and/or a segmentation fault during make_heap(). To fix this, do_refresh_state has to always be called from the allocating section. Additionally, it turns out that the first do_refresh_state is useless, because reset_state() doesn't set _change_mark. This causes do_refresh_state to be needlessly repeated during a next_row() or next_range_tombstone() which happens immediately after it. Therefore this patch moves the _change_mark assignment from maybe_refresh_state to do_refresh_state, so that the change mark is properly set even after the first refresh. Fixes #14696 Closes #14697
@michoecho I understand that #14696 was merged to 5.2 @scylladb/scylla-maint should we close this issue and add the |
It was fixed by #12048, which rewrote the entire implementation of memtable readers (unifying it with |
Fixed in 41aef6d (Gihub didn't auto-close the issue because the fix was targeted at 5.2). |
Should we only backport to 5.1? How about enterprise? @michoecho what are the user-visible symptoms and how hard it is to trigger? |
We should backport it to all supported versions, including enterprise.
Segmentation fault.
Very hard — it took us a few years to discover it. But on the other hand, it was discovered when it happened in production, and it took down two nodes at once. In other words, you have to be very unlucky to trigger it, but there are some workloads which make it likely enough. |
I see no test attached to the fix. I presume this is very hard to reproduce in a unit test. Was this patch tested manually, perhaps on the cluster this bug affected? |
Yes, see #14697 (comment). In general it's impossible to test for invalidation bugs reliably. This is a systemic problem. It would require additional LSA machinery to fix. For example, we could force all accesses to LSA to go through handles (rather than raw pointers/references) which would check (at least in debug mode) that no invalidation happened. (The key is that tests should also check for potential invalidation — any instance of allocation with reclaiming enabled, even if the allocation didn't happened to trigger reclaim).
The only "testing" here was me fooling around trying to write the unit test for this. The patch relied on review rather than testing. You can't test it on the affected cluster, because it's not reproducing the bug reliably. It happened once — on two nodes, but only once — and didn't happen again. |
Ok, let's let the fix see at least one OSS release, to make sure there are no adverse effects, then we can backport. |
…_state do_refresh_state() keeps iterators to rows_entry in a vector. This vector might be resized during the procedure, triggering memory reclaim and invalidating the iterators, which can cause arbitrarily long loops and/or a segmentation fault during make_heap(). To fix this, do_refresh_state has to always be called from the allocating section. Additionally, it turns out that the first do_refresh_state is useless, because reset_state() doesn't set _change_mark. This causes do_refresh_state to be needlessly repeated during a next_row() or next_range_tombstone() which happens immediately after it. Therefore this patch moves the _change_mark assignment from maybe_refresh_state to do_refresh_state, so that the change mark is properly set even after the first refresh. Fixes #14696 Closes #14697 (cherry picked from commit 41aef6d)
Backported to 5.1. |
do_refresh_state() keeps iterators to rows_entry in a vector.
This vector might be resized during the procedure, triggering memory reclaim and invalidating the iterators, which can cause arbitrarily long loops and/or a segmentation fault during make_heap().
The text was updated successfully, but these errors were encountered: