Coredump after nodetool decommission (fixed via PR - compaction: Fix sstable cleanup after resharding on refresh) #14001
Comments
@avikivity, unrelated (I hope) to the crash itself, I've noticed we have a call to `__memcpy_avx_unaligned_erms` - don't we take care of aligning structs (as much as possible)?
@mykaul I don't know what `__memcpy_avx_unaligned_erms` signifies (unaligned with respect to AVX? To quadwords? Doublewords?), but note that this piece of code is specifically in the RPC serializer, and serializers are the place where unaligned writes/reads naturally happen.
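Unaligned access is indeed inherent to serialization: a deserializer reads fixed-width integers from arbitrary byte offsets, and the portable idiom for that is `std::memcpy`, which compilers lower to unaligned load instructions (glibc's `memcpy` then dispatches to variants such as `__memcpy_avx_unaligned_erms`). A minimal illustration of the idiom, not Scylla's actual RPC code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a little-endian uint32_t from an arbitrary (possibly unaligned)
// offset in a byte buffer. memcpy is the portable way to express this;
// on x86 the compiler emits a single unaligned load for it.
uint32_t read_u32_le(const std::vector<unsigned char>& buf, std::size_t off) {
    uint32_t v;
    std::memcpy(&v, buf.data() + off, sizeof(v));
    return v;  // assumes a little-endian host, as on x86
}
```

The `memcpy` avoids undefined behavior from casting an unaligned pointer to `uint32_t*` and dereferencing it, while costing nothing extra on architectures with fast unaligned loads.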
Not really? Depends on what you mean.
It's an assert failure. And the decoded stacktraces posted earlier are wrong (addr2line expects addresses offset by 1 with respect to what coredumpctl (?) is printing).
Here's the assert:
Here's its real stacktrace:
Here's the assert's location in source code:
@raphaelsc - please take a look
@bhalevy / @raphaelsc please help with some first assessment
happened again during the weekend runs, during
Coredump details
Installation details
Kernel Version: 5.19.0-1025-gcp
Cluster size: 6 nodes (n1-highmem-16)
Scylla Nodes used in this run:
OS / Image: `` (gce: undefined_region)
Test:
Logs and commands
Logs:
Decoded:
longevity-10gb-3h-master-db-node-0b6cacb3-0-1/messages.log:
It looks like:
I'm trying to reproduce this with a dtest (still to no avail)
@Mark-Gurevich / @fruch what's the exact scenario in this test?
Shouldn't that be visible in Argus?
On Mon, Jun 5, 2023, 09:56 Benny Halevy ***@***.***> wrote:
I'm trying to reproduce this with a dtest (still to no avail)
Probably a regression. What is probably happening is resharding postponing
its work to cleanup but cleanup cannot work on shared SSTables. Reproducer
can be done with a 2 node cluster, RF 1. Add first node, write a few
sstables, add the second node, wait for streaming. Restart first node with
different smp count.
But the log message says that resharding was done successfully for the offending uploaded sstable.
We have resharding dtests with different smp counts and they didn't hit this. I tried the following scenario, but it passes reliably:
We're probably leaking resharding sstables into the cleanup set somehow.
Turns out my theory is correct. See that reshard in distributed_loader.cc is updating cleanup state, but not clearing it. I don't even see why we need to insert resharding sstables into the cleanup set. We're forwarding the owned_ranges_ptr into the compaction descriptor anyway, so unowned tokens will be filtered out. It's not like we'll allow regular / cleanup compaction (whatever) to act in parallel on shared sstables. Suggested fix:
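The suspected leak can be modeled in isolation. In this sketch (hypothetical names, not Scylla's actual types), resharding marks its input sstables as requiring cleanup but nothing on the resharding path ever erases them, so later cleanup passes still see the already-resharded inputs:

```cpp
#include <unordered_set>

// Toy model of the suspected bug; the names are hypothetical stand-ins
// for the real Scylla state, not its API.
struct sstable { int id; bool shared; };

struct cleanup_state {
    std::unordered_set<int> requiring_cleanup;

    // What reshard in distributed_loader.cc effectively does: mark inputs...
    void mark_for_cleanup(const sstable& s) { requiring_cleanup.insert(s.id); }

    // ...but on the resharding path nothing calls this, so entries leak.
    void done_with(const sstable& s) { requiring_cleanup.erase(s.id); }

    bool leaked(const sstable& s) const { return requiring_cleanup.count(s.id) != 0; }
};
```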
You can follow the information in Argus to see which nemeses were running. The rest of the information about what this test is doing is part of its configuration; in this run, one specific c-s write command. In this case the cleanup was done as part of a decommission-node sequence (i.e. after the node was decommissioned and a new node was added).
The sstable cleanup state is cleared when compaction is done.
The mechanics of tracking the cleanup state aren't specific to the compaction type, on purpose.
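The regular-compaction path gets this for free through an RAII registration (`compacting_sstable_registration` in the real code; the sketch below is a simplified stand-in) whose destructor releases the sstables from the cleanup set on completion, whether the compaction succeeds or throws:

```cpp
#include <unordered_set>
#include <utility>

// Simplified stand-in for compacting_sstable_registration: an RAII guard
// that unregisters the compacting sstables when the compaction finishes,
// regardless of how the scope is exited. Names are hypothetical.
struct compaction_manager_model {
    std::unordered_set<int> requiring_cleanup;
};

class compacting_registration {
    compaction_manager_model& cm_;
    std::unordered_set<int> ids_;
public:
    compacting_registration(compaction_manager_model& cm, std::unordered_set<int> ids)
        : cm_(cm), ids_(std::move(ids)) {}
    ~compacting_registration() {
        // Always runs, even on exception: releases entries from the set.
        for (int id : ids_) cm_.requiring_cleanup.erase(id);
    }
};
```

A custom job that bypasses this guard (as resharding does via `run_custom_job()`, per the comment below) has to release its entries manually, or they stay in the set forever.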
@fruch how come nodetool refresh at 09:33:25 target node is listed as node 10, but I see it on node 1?
Actually it's not. That's resharding compaction, which goes through compaction_manager::run_custom_job(). Output sstables aren't added to the table's main set, nor do we have the RAII (through compacting_sstable_registration) for automatically releasing sstables from the cleanup set on completion.
That snippet I mentioned above is incorrect, and I think it should be patched as I suggested. We don't need to mark shared sstables as requiring cleanup. Cleanup will happen if owned ranges are available, and that's propagated through the compaction descriptor. We mark sstables for cleanup as a way to allow different compaction types (usually regular) to operate on sstables needing cleanup, but shared sstables are restricted to resharding compaction.
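The point that owned ranges alone are sufficient can also be modeled: whether a compaction performs cleanup depends only on whether its descriptor carries an owned-ranges pointer, so a resharding job with a populated descriptor already filters out unowned tokens with no cleanup-set bookkeeping. A toy model with hypothetical names:

```cpp
#include <algorithm>
#include <memory>
#include <utility>
#include <vector>

// Toy token ranges: each owned range is [first, last] inclusive.
using owned_ranges = std::vector<std::pair<long, long>>;

// Hypothetical stand-in for the relevant part of a compaction descriptor.
struct compaction_descriptor_model {
    std::shared_ptr<const owned_ranges> owned_ranges_ptr;  // may be null
};

// If the descriptor carries owned ranges, unowned tokens are dropped
// (cleanup happens); otherwise all tokens are kept (no cleanup).
std::vector<long> compact_tokens(const compaction_descriptor_model& desc,
                                 std::vector<long> tokens) {
    if (!desc.owned_ranges_ptr) {
        return tokens;
    }
    auto owned = [&](long t) {
        return std::any_of(desc.owned_ranges_ptr->begin(), desc.owned_ranges_ptr->end(),
                           [&](const auto& r) { return r.first <= t && t <= r.second; });
    };
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(),
                                [&](long t) { return !owned(t); }),
                 tokens.end());
    return tokens;
}
```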
You're right that there's no need to add the sstables to
I am not following the logs closely, but I am convinced that's the root cause.
Sequence of events:
you're welcome :-)
Awesome, can you please submit the fix and the test that reproduces it? I also have this typo fix:
commit 94a0368bb0ae882b70f3eb34465e12df74b9d600
Author: Benny Halevy <bhalevy@scylladb.com>
Date: Mon Jun 5 12:18:46 2023 +0300
compaction_manager: compact_sstables: fix typo in log message about cleanup
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
diff --git a/compaction/compaction_manager.cc b/compaction/compaction_manager.cc
index e847e9b8f1..bc25221867 100644
--- a/compaction/compaction_manager.cc
+++ b/compaction/compaction_manager.cc
@@ -371,7 +371,7 @@ future<sstables::compaction_result> compaction_task_executor::compact_sstables(s
}
}
if (!sstables_requiring_cleanup.empty()) {
- cmlog.info("The following SSTables require cleaned up in this compaction: {}", sstables_requiring_cleanup);
+ cmlog.info("The following SSTables require cleanup in this compaction: {}", sstables_requiring_cleanup);
if (!cs.owned_ranges_ptr) {
on_internal_error_noexcept(cmlog, "SSTables require cleanup but compaction state has null owned ranges");
}
Also, there's: scylladb/replica/distributed_loader.cc Lines 160 to 161 in d2e0897
We probably should use |
Good point.
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup set and never removes it)
6) cleanup (tries to clean up resharded sstables which were leaked in the cleanup set)
Bumps into assert "Assertion `!sst->is_shared()' failed", as cleanup picks a shared sstable that was leaked and already processed by resharding. The fix is about not inserting shared sstables into the cleanup set, as shared sstables are restricted to resharding and cannot be processed later by cleanup (nor should they be, because resharding itself cleaned up its input files).
Dtest: scylladb/scylla-dtest#3206
Fixes scylladb#14001.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
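The fix described above can be sketched with the same toy model: shared sstables never enter the cleanup set, which makes the cleanup-side invariant (the real code's `assert(!sst->is_shared())`) hold by construction. Hypothetical names, not the actual patch:

```cpp
#include <vector>

// Toy model of the fix; types and names are hypothetical.
struct sst { int id; bool shared; };

// The fix: shared sstables are restricted to resharding, which cleans up
// its own inputs, so they must never be inserted into the cleanup set.
bool belongs_in_cleanup_set(const sst& s) {
    return !s.shared;
}

// Models the invariant the failing assert enforces: cleanup must never
// pick a shared sstable out of the set.
bool cleanup_invariant_holds(const std::vector<sst>& cleanup_set) {
    for (const auto& s : cleanup_set) {
        if (s.shared) {
            return false;  // real code: Assertion `!sst->is_shared()' failed
        }
    }
    return true;
}
```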
@scylladb/scylla-maint please backport to 5.3
Problem can be reproduced easily:
1) wrote some sstables with smp 1
2) shut down scylla
3) moved sstables to upload
4) restarted scylla with smp 2
5) ran refresh (resharding happens, adds sstable to cleanup set and never removes it)
6) cleanup (tries to clean up resharded sstables which were leaked in the cleanup set)
Bumps into assert "Assertion `!sst->is_shared()' failed", as cleanup picks a shared sstable that was leaked and already processed by resharding. The fix is about not inserting shared sstables into the cleanup set, as shared sstables are restricted to resharding and cannot be processed later by cleanup (nor should they be, because resharding itself cleaned up its input files).
Dtest: scylladb/scylla-dtest#3206
Fixes scylladb#14001.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb#14147
(cherry picked from commit 156d771)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
5.3 backport queued as fa689c8.
@bhalevy this weekly run had a very similar coredump during cleanup after decommission:
Coredump
Installation details
Kernel Version: 5.15.0-1036-gcp
Cluster size: 6 nodes (n1-highmem-16)
Scylla Nodes used in this run:
OS / Image: `` (gce: undefined_region)
Test:
Logs and commands
Logs:
@fruch can you please open a new issue with that.
@bhalevy I am off this morning, but I'd check if we are also not leaking sstables when running off-strategy compaction. If you're busy, I can pick it up once I am back.
I'll open the issue then.
@bhalevy what's the backport status?
@avikivity
Download instructions:
download_instructions=
gsutil cp gs://upload.scylladb.com/core.scylla.113.5b12a47580bd4dd288d51b116a2ff667.12344.1682673563000000/core.scylla.113.5b12a47580bd4dd288d51b116a2ff667.12344.1682673563000000.gz .
gunzip /var/lib/systemd/coredump/core.scylla.113.5b12a47580bd4dd288d51b116a2ff667.12344.1682673563000000.gz
The coredump:
Issue description
Impact
How frequently does it reproduce?
Installation details
Kernel Version: 5.19.0-1022-gcp
Scylla version (or git commit hash):
2023.2.0~dev-20230427.e4ffb1fcf18f
with build-id 6233f9b12c6c2204bcbc461bfe8c69d8f65811a5
Cluster size: 6 nodes (n1-highmem-16)
Scylla Nodes used in this run:
OS / Image:
https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/900493631077932048
(gce: us-east1)
Test:
longevity-10gb-3h-gce-test
Test id:
6813e59a-bfbb-46c9-92f5-a8f07246108a
Test name:
scylla-enterprise/longevity/longevity-10gb-3h-gce-test
Test config file(s):
Logs and commands
$ hydra investigate show-monitor 6813e59a-bfbb-46c9-92f5-a8f07246108a
$ hydra investigate show-logs 6813e59a-bfbb-46c9-92f5-a8f07246108a
Logs:
Jenkins job URL
Argus