Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[branch 5.2] Iterator invalidation in lsa_partition_reader::reset_state() #14696

Closed
michoecho opened this issue Jul 14, 2023 · 10 comments
Closed
Assignees
Labels
Milestone

Comments

@michoecho
Copy link
Contributor

do_refresh_state() keeps iterators to rows_entry in a vector.
This vector might be resized during the procedure, triggering memory reclaim and invalidating the iterators, which can cause arbitrarily long loops and/or a segmentation fault during make_heap().

@michoecho
Copy link
Contributor Author

michoecho added a commit to michoecho/scylla that referenced this issue Jul 14, 2023
…_state

do_refresh_state() keeps iterators to rows_entry in a vector.
This vector might be resized during the procedure, triggering
memory reclaim and invalidating the iterators, which can cause
arbitrarily long loops and/or a segmentation fault during make_heap().
To fix this, do_refresh_state has to always be called from the allocating
section.

Additionally, it turns out that the first do_refresh_state is useless,
because reset_state() doesn't set _change_mark. This causes do_refresh_state
to be needlessly repeated during a next_row() or next_range_tombstone() which
happens immediately after it. Therefore this patch moves the _change_mark
assignment from maybe_refresh_state to do_refresh_state, so that the change mark
is properly set even after the first refresh.

Fixes scylladb#14696
@michoecho michoecho self-assigned this Jul 14, 2023
@michoecho michoecho added type/bug Requires-Backport-to-5.1 backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed labels Jul 14, 2023
tgrabiec pushed a commit that referenced this issue Jul 17, 2023
…_state

do_refresh_state() keeps iterators to rows_entry in a vector.
This vector might be resized during the procedure, triggering
memory reclaim and invalidating the iterators, which can cause
arbitrarily long loops and/or a segmentation fault during make_heap().
To fix this, do_refresh_state has to always be called from the allocating
section.

Additionally, it turns out that the first do_refresh_state is useless,
because reset_state() doesn't set _change_mark. This causes do_refresh_state
to be needlessly repeated during a next_row() or next_range_tombstone() which
happens immediately after it. Therefore this patch moves the _change_mark
assignment from maybe_refresh_state to do_refresh_state, so that the change mark
is properly set even after the first refresh.

Fixes #14696

Closes #14697
@bhalevy
Copy link
Member

bhalevy commented Jul 23, 2023

@michoecho I understand that #14696 was merged to 5.2
Can you please document what fixed this issue in later release?

@scylladb/scylla-maint should we close this issue and add the Backport Candidate label to facilitate backport to 5.1?

@michoecho
Copy link
Contributor Author

Can you please document what fixed this issue in later release?

It was fixed by #12048, which rewrote the entire implementation of memtable readers (unifying it with partition_snapshot_row_cursor).

@denesb
Copy link
Contributor

denesb commented Jul 24, 2023

Fixed in 41aef6d (Gihub didn't auto-close the issue because the fix was targeted at 5.2).

@denesb denesb closed this as completed Jul 24, 2023
@denesb denesb added Backport candidate and removed backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed labels Jul 24, 2023
@denesb
Copy link
Contributor

denesb commented Jul 24, 2023

@scylladb/scylla-maint should we close this issue and add the Backport Candidate label to facilitate backport to 5.1?

Should we only backport to 5.1? How about enterprise? @michoecho what are the user-visible symptoms and how hard it is to trigger?

@michoecho
Copy link
Contributor Author

michoecho commented Jul 24, 2023

Should we only backport to 5.1? How about enterprise?

We should backport it to all supported versions, including enterprise.

@michoecho what are the user-visible symptoms

Segmentation fault.

how hard it is to trigger?

Very hard — it took us a few years to discover it.

But on the other hand, it was discovered when it happened in production, and it took down two nodes at once.

In other words, you have to be very unlucky to trigger it, but there are some workloads which make it likely enough.

@denesb
Copy link
Contributor

denesb commented Jul 24, 2023

I see no test attached to the fix. I presume this is very hard to reproduce in a unit test. Was this patch tested manually, perhaps on the cluster this bug affected?

@michoecho
Copy link
Contributor Author

I see no test attached to the fix. I presume this is very hard to reproduce in a unit test.

Yes, see #14697 (comment).

In general it's impossible to test for invalidation bugs reliably. This is a systemic problem. It would require additional LSA machinery to fix. For example, we could force all accesses to LSA to go through handles (rather than raw pointers/references) which would check (at least in debug mode) that no invalidation happened. (The key is that tests should also check for potential invalidation — any instance of allocation with reclaiming enabled, even if the allocation didn't happened to trigger reclaim).

Was this patch tested manually, perhaps on the cluster this bug affected?

The only "testing" here was me fooling around trying to write the unit test for this. The patch relied on review rather than testing.

You can't test it on the affected cluster, because it's not reproducing the bug reliably. It happened once — on two nodes, but only once — and didn't happen again.

@denesb
Copy link
Contributor

denesb commented Jul 24, 2023

Ok, let's let the fix see at least one OSS release, to make sure there are no adverse effects, then we can backport.

denesb pushed a commit that referenced this issue Aug 9, 2023
…_state

do_refresh_state() keeps iterators to rows_entry in a vector.
This vector might be resized during the procedure, triggering
memory reclaim and invalidating the iterators, which can cause
arbitrarily long loops and/or a segmentation fault during make_heap().
To fix this, do_refresh_state has to always be called from the allocating
section.

Additionally, it turns out that the first do_refresh_state is useless,
because reset_state() doesn't set _change_mark. This causes do_refresh_state
to be needlessly repeated during a next_row() or next_range_tombstone() which
happens immediately after it. Therefore this patch moves the _change_mark
assignment from maybe_refresh_state to do_refresh_state, so that the change mark
is properly set even after the first refresh.

Fixes #14696

Closes #14697

(cherry picked from commit 41aef6d)
@denesb
Copy link
Contributor

denesb commented Aug 9, 2023

Backported to 5.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants