coredump during creating instance on azure on single node #10494

aleksbykov · 2022-05-05T15:42:20Z

Installation details

Kernel Version: 5.13.0-1022-azure
Scylla version (or git commit hash): 5.1.dev-20220504.b26a3da584cc with build-id ab2a33a30756c1513f4c516cd272291e75acec0e
Cluster size: 6 nodes (Standard_L8s_v2)

Scylla Nodes used in this run:

longevity-10gb-3h-master-db-node-835fbc85-eastus-8 (20.127.8.251 | 10.0.0.14) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-7 (20.120.98.177 | 10.0.0.14) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-6 (20.121.13.143 | 10.0.0.10) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-5 (20.121.13.124 | 10.0.0.9) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-4 (20.119.62.231 | 10.0.0.8) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-3 (20.119.59.43 | 10.0.0.7) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-2 (20.25.96.108 | 10.0.0.6) (shards: 8)
longevity-10gb-3h-master-db-node-835fbc85-eastus-1 (20.232.111.57 | 10.0.0.5) (shards: 8)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/ScyllaDB-5.1.dev-0.20220504.b26a3da584cc-1-build-148 (azure: eastus)

Test: longevity-10gb-3h-azure-test
Test id: 835fbc85-2bdf-46aa-a87d-04348bbbc1f8
Test name: scylla-master/longevity/longevity-10gb-3h-azure-test
Test config file(s):

longevity-10gb-3h.yaml

Issue description

A coredump occurred while creating node 2, even before cluster configuration:

2022-05-04 07:24:32.864 <2022-05-04 07:13:27.000>: (CoreDumpEvent Severity.ERROR) period_type=one-time event_id=1fe38754-d16a-423e-9b40-9b0b73e0cf60 node=Node longevity-10gb-3h-master-db-node-835fbc85-eastus-2 [20.25.96.108 | 10.0.0.6] (seed: False)
corefile_url=https://storage.cloud.google.com/upload.scylladb.com/core.scylla.113.e9ea3c0c29724c5c8ff102ee668da941.5560.1651648407000000000000/core.scylla.113.e9ea3c0c29724c5c8ff102ee668da941.5560.1651648407000000000000.gz
backtrace=           PID: 5560 (scylla)
UID: 113 (scylla)
GID: 121 (scylla)
Signal: 11 (SEGV)
Timestamp: Wed 2022-05-04 07:13:27 UTC (9min ago)
Command Line: /usr/bin/scylla --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-7 --lock-memory=1
Executable: /opt/scylladb/libexec/scylla
Control Group: /scylla.slice/scylla-server.slice/scylla-server.service
Unit: scylla-server.service
Slice: scylla-server.slice
Boot ID: e9ea3c0c29724c5c8ff102ee668da941
Machine ID: 00776e0e9e334a95b9dd4a446cd6fbcd
Hostname: longevity-10gb-3h-master-db-node-eastus-2
Storage: /var/lib/systemd/coredump/core.scylla.113.e9ea3c0c29724c5c8ff102ee668da941.5560.1651648407000000000000
Message: Process 5560 (scylla) of user 113 dumped core.
Stack trace of thread 5567:
#0  0x00007f3e08dbc815 __memmove_avx_unaligned_erms (libc.so.6 + 0x163815)
#1  0x0000000002fe8ef4 _ZN7locator22production_snitch_base9set_my_dcERKN7seastar13basic_sstringIcjLj15ELb1EEE (scylla + 0x2de8ef4)
#2  0x0000000002f3282c _ZNSt17_Function_handlerIFN7seastar6futureIvEERN7locator10snitch_ptrEEZNS0_7shardedIS4_E13invoke_on_allIZNS3_12azure_snitch11load_configEvE3$_0JEEES2_NS0_21smp_submit_to_optionsET_DpT0_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ (scylla + 0x2d3282c)
#3  0x000000000110803b _ZN7seastar17smp_message_queue15async_work_itemIZZNS_7shardedIN7locator10snitch_ptrEE13invoke_on_allENS_21smp_submit_to_optionsESt8functionIFNS_6futureIvEERS4_EEENKUljE_clEjEUlvE_E15run_and_disposeEv (scylla + 0xf0803b)
#4  0x000000000466bd05 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x446bd05)
#5  0x000000000466d0e8 _ZN7seastar7reactor6do_runEv (scylla + 0x446d0e8)
#6  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#7  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#8  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#9  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5568:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5575:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5571:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5570:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae247 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae247)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5565:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#6  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#7  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#8  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5564:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#6  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#7  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#8  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5561:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#6  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#7  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#8  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5572:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5560:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000466c33d _ZN7seastar7reactor3runEv (scylla + 0x446c33d)
#6  0x0000000004613349 _ZN7seastar12app_template14run_deprecatedEiPPcOSt8functionIFvvEE (scylla + 0x4413349)
#7  0x0000000004612822 _ZN7seastar12app_template3runEiPPcOSt8functionIFNS_6futureIiEEvEE (scylla + 0x4412822)
#8  0x000000000105ef70 _ZL11scylla_mainiPPc (scylla + 0xe5ef70)
#9  0x000000000105c79b main (scylla + 0xe5c79b)
#10 0x00007f3e08c80b75 __libc_start_main (libc.so.6 + 0x27b75)
#11 0x000000000105b72e _start (scylla + 0xe5b72e)
Stack trace of thread 5574:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5573:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5562:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#6  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#7  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#8  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5563:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#6  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#7  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#8  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5569:
#0  0x00007f3e0985794c read (libpthread.so.0 + 0x1294c)
#1  0x00000000046ae285 _ZN7seastar11thread_pool4workENS_13basic_sstringIcjLj15ELb1EEE (scylla + 0x44ae285)
#2  0x00000000046ae5c0 _ZNSt17_Function_handlerIFvvEZN7seastar11thread_poolC1EPNS1_7reactorENS1_13basic_sstringIcjLj15ELb1EEEE3$_0E9_M_invokeERKSt9_Any_data (scylla + 0x44ae5c0)
#3  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#4  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#5  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
Stack trace of thread 5566:
#0  0x00007f3e08d53ddd syscall (libc.so.6 + 0xfaddd)
#1  0x00000000046b43f1 _ZN7seastar8internal13io_pgeteventsEmllPNS0_9linux_abi8io_eventEPK8timespecPK10__sigset_tb (scylla + 0x44b43f1)
#2  0x00000000046afff5 _ZN7seastar19reactor_backend_aio12await_eventsEiPK10__sigset_t (scylla + 0x44afff5)
#3  0x00000000046b0764 _ZN7seastar19reactor_backend_aio23wait_and_process_eventsEPK10__sigset_t (scylla + 0x44b0764)
#4  0x000000000466d46d _ZN7seastar7reactor6do_runEv (scylla + 0x446d46d)
#5  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#6  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#7  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#8  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
download_instructions=gsutil cp gs://upload.scylladb.com/core.scylla.113.e9ea3c0c29724c5c8ff102ee668da941.5560.1651648407000000000000/core.scylla.113.e9ea3c0c29724c5c8ff102ee668da941.5560.1651648407000000000000.gz .
gunzip /var/lib/systemd/coredump/core.scylla.113.e9ea3c0c29724c5c8ff102ee668da941.5560.1651648407000000000000.gz

Restore Monitor Stack command: $ hydra investigate show-monitor 835fbc85-2bdf-46aa-a87d-04348bbbc1f8
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 835fbc85-2bdf-46aa-a87d-04348bbbc1f8

Logs:

db-cluster-835fbc85.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/db-cluster-835fbc85.tar.gz
loader-set-835fbc85.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/loader-set-835fbc85.tar.gz
sct-runner-835fbc85.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/sct-runner-835fbc85.tar.gz
parallel-timelines-report-835fbc85.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/835fbc85-2bdf-46aa-a87d-04348bbbc1f8/20220504_120807/parallel-timelines-report-835fbc85.tar.gz

Jenkins job URL


asias · 2022-05-17T03:10:33Z

The crash is from azure snitch.

#0  0x00007f3e08dbc815 __memmove_avx_unaligned_erms (libc.so.6 + 0x163815)
#1  0x0000000002fe8ef4 _ZN7locator22production_snitch_base9set_my_dcERKN7seastar13basic_sstringIcjLj15ELb1EEE (scylla + 0x2de8ef4)
#2  0x0000000002f3282c _ZNSt17_Function_handlerIFN7seastar6futureIvEERN7locator10snitch_ptrEEZNS0_7shardedIS4_E13invoke_on_allIZNS3_12azure_snitch11load_configEvE3$_0JEEES2_NS0_21smp_submit_to_optionsET_DpT0_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_ (scylla + 0x2d3282c)
#3  0x000000000110803b _ZN7seastar17smp_message_queue15async_work_itemIZZNS_7shardedIN7locator10snitch_ptrEE13invoke_on_allENS_21smp_submit_to_optionsESt8functionIFNS_6futureIvEERS4_EEENKUljE_clEjEUlvE_E15run_and_disposeEv (scylla + 0xf0803b)
#4  0x000000000466bd05 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x446bd05)
#5  0x000000000466d0e8 _ZN7seastar7reactor6do_runEv (scylla + 0x446d0e8)
#6  0x000000000468bc96 _ZNSt17_Function_handlerIFvvEZN7seastar3smp9configureERKNS1_11smp_optionsERKNS1_15reactor_optionsEE4$_89E9_M_invokeERKSt9_Any_data (scylla + 0x448bc96)
#7  0x000000000463fa8b _ZN7seastar12posix_thread13start_routineEPv (scylla + 0x443fa8b)
#8  0x00007f3e0984e2a5 start_thread (libpthread.so.0 + 0x92a5)
#9  0x00007f3e08d59323 __clone (libc.so.6 + 0x100323)
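
The frames above are mangled Itanium C++ ABI names, which makes the crashing call hard to read. As a minimal sketch (assuming a GCC or Clang toolchain that ships `<cxxabi.h>`; the helper name `demangle` is made up for illustration), frame #1 can be demangled like this:

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <memory>
#include <string>

// Demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc()-allocated buffer that the caller
    // must release with free(); unique_ptr takes care of that here.
    std::unique_ptr<char, void (*)(void*)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Passing the symbol from frame #1 should yield a name starting with `locator::production_snitch_base::set_my_dc(`, i.e. the crash is inside the `memmove` performed while assigning the `sstring` argument to the snitch's dc field.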

The regression was probably introduced by the following commits. @xemul, can you please check?

commit 633746b87d02a214fe86d8ee3507258119ed2496
Author: Pavel Emelyanov <xemul@scylladb.com>
Date:   Thu Apr 7 16:14:22 2022 +0300

    snitch: Make config-based construction of all drivers
    
    Currently snitch drivers register themselves in class-registry with all
    sorts of construction options possible. All those different constuctors
    are in fact "config options".
    
    When later snitch will declare its dependencies (gossiper and system
    keyspace), it will require patching all this registrations, which's very
    inconvenient.
    
    This patch introduces the snitch_config struct and replaces all the
    snitch constructors with the snitch_driver(snitch_config cfg) one.
    
    Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

commit 552a08ecd060aba2697b455b1a699c2efac3d2e8
Author: Pavel Emelyanov <xemul@scylladb.com>
Date:   Thu Apr 7 19:41:20 2022 +0300

    snitch: Introduce container() method
    
    Some snitch drivers want the peering_sharded_service::container()
    functionality, but they can't directly use it, because the driver
    class is in fact the pimplification behind the sharded<snitch_ptr>
    service. To overcome this there's a _my_distributed pointer on the
    driver base class that points back to sharded<snitch_ptr> object.
    
    This patch replaces the direct _my_distributed usage with the
    container() method that does it and also asserts that the pointer
    in question is initialized (some drivers already do it, some don't).
    
    Other than making the code more peering_sharded_service-like, this
    patch allows changing _my_distributed into _backreference that
    points to this shard's snitch_ptr, see next patch.
    
    Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

xemul · 2022-05-17T05:59:34Z

I cannot find logs that show how the system behaved right before the crash. All I can see from them is that the snitch initialized successfully. @aleksbykov, please point me to the correct logs.

aleksbykov · 2022-05-17T06:00:46Z

The core dump happens even before we start collecting any significant log data.

xemul · 2022-05-17T06:01:22Z

But Scylla itself does log somewhere on startup.

xemul · 2022-05-17T06:09:18Z

Actually, the azure snitch code looks weird from the very beginning (e44fa8d)

future<> azure_snitch::load_config() {
    ...
    _my_rack = azure_zone;
    _my_dc = azure_region;

    co_return co_await _my_distributed->invoke_on_all([this] (snitch_ptr& local_s) {
        if (this_shard_id() != io_cpu_id()) {
            local_s->set_my_dc(_my_dc);
            local_s->set_my_rack(_my_rack);
        }
    });
}

future<> azure_snitch::start() {
    return load_config().then(...);
}

It sets _my_rack and _my_dc on each shard, then each shard goes and re-sets the same fields on the others (set_my_... re-sets the corresponding _my_... field) with values from its own this. No wonder it crashes.

xemul · 2022-05-17T06:15:18Z

IOW

shard-1                shard-2                shard-3
_my_dc = foo;          _my_dc = foo;          _my_dc = foo;
invoke_on_all                                 invoke_on_all
            \--------> [my_dc = 1._my_dc]     |
1._my_dc = 3._my_dc <-------------------------/
old._my_dc.~sstring()  ...
                       2._my_dc = [my_dc]   <- the my_dc is already dead
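
The interleaving in the diagram can be replayed deterministically in a small single-threaded model. This is only an illustrative sketch, not ScyllaDB or Seastar code: `shard_model`, `pending_set_dc` and `replay_interleaving` are hypothetical names, and a `std::weak_ptr` stands in for the raw reference to the sender's `_my_dc` so that the dangling access can be detected safely instead of actually being performed.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for one shard's snitch state (not ScyllaDB code).
struct shard_model {
    std::shared_ptr<std::string> my_dc;
};

// A queued cross-shard task. The weak_ptr plays the role of the raw
// reference to the sender's _my_dc that the real lambda reaches via `this`.
struct pending_set_dc {
    std::weak_ptr<std::string> source;
    shard_model* target;
};

// Replays the interleaving from the diagram. Returns true if the delayed
// task would have read a string that was already destroyed.
bool replay_interleaving() {
    std::vector<shard_model> shards(3);
    for (auto& s : shards) {
        s.my_dc = std::make_shared<std::string>("foo");
    }
    // Shard 1 queues "set your dc from mine" for shard 2; it has not run yet.
    pending_set_dc delayed{shards[0].my_dc, &shards[1]};
    // Shard 3's task runs on shard 1 first and replaces shard 1's string,
    // which destroys the old one.
    shards[0].my_dc = std::make_shared<std::string>(*shards[2].my_dc);
    // Shard 2 finally runs the delayed task: is its source still alive?
    return delayed.source.expired();
}
```

In this model `replay_interleaving()` reports that the source string is gone by the time the delayed task runs, which corresponds to the "already dead" `my_dc` in the last row of the diagram.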

slivne · 2022-05-19T11:09:15Z

@xemul, are you sending a patch to fix this?

If we run a machine and give you access to it, would that help?

xemul · 2022-05-19T11:19:27Z

@slivne, I can prepare a patch, yes. I don't think I need a machine; it's a race + use-after-free that would be hard to trigger intentionally, from my point of view.

All snitch drivers are supposed to snitch info on some shard and replicate the dc/rack info across the others. All but the Azure one really do so. The Azure one gets dc/rack on all shards, which is excessive but not terrible, but when all shards start to replicate their data to all the others, this may lead to use-after-frees.

fixes: scylladb#10494
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
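
The intended pattern described above (one shard obtains dc/rack, then replicates it to the rest) can be sketched as follows. This is not the actual patch and does not use Seastar; `shard_state` and `replicate_from` are hypothetical names, and the shards are modelled as plain vector elements. The point it illustrates is that the values are copied once from a single source and each target receives its own copy, so no shard reads another shard's live fields.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-shard state (not ScyllaDB code).
struct shard_state {
    std::string my_dc;
    std::string my_rack;
};

// Only `source` has queried the metadata. Every other shard receives its
// own copy of the values.
void replicate_from(std::vector<shard_state>& shards, std::size_t source) {
    // Copies are taken once, before anything is handed to other shards.
    const std::string dc = shards[source].my_dc;
    const std::string rack = shards[source].my_rack;
    for (std::size_t i = 0; i < shards.size(); ++i) {
        if (i == source) {
            continue;  // the source already holds the values
        }
        shards[i].my_dc = dc;
        shards[i].my_rack = rack;
    }
}
```

With eight shards and shard 0 as the source, calling `replicate_from(shards, 0)` leaves every shard with the same dc and rack. In real Seastar code the analogous precaution would presumably be to run the replication from a single shard and capture the values by copy in the cross-shard lambda, but that mapping is an assumption here.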

slivne · 2022-05-25T11:26:12Z

@xemul ping

xemul · 2022-05-25T11:28:09Z

[PATCH 0/3] Fix snitching on Azure sent 20.05.2022

All snitch drivers are supposed to snitch info on some shard and replicate the dc/rack info across the others. All but the Azure one really do so. The Azure one gets dc/rack on all shards, which is excessive but not terrible, but when all shards start to replicate their data to all the others, this may lead to use-after-frees.

fixes: #10494
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit c6d0bc8)

avikivity · 2022-07-17T11:22:41Z

Backported to 4.6, 5.0.

avikivity · 2022-07-17T11:23:25Z

Strangely, no "Backport candidate" label to remove.

soyacz mentioned this issue May 6, 2022

Core dump on startup (Azure) #10501

Closed

slivne assigned asias May 15, 2022

slivne added cloud/azure Azure related issues type/bug labels May 15, 2022

slivne added this to the 5.0 milestone May 15, 2022

xemul assigned xemul and unassigned asias May 19, 2022

scylladb-promoter closed this as completed in c6d0bc8 May 30, 2022

xemul mentioned this issue Jul 4, 2022

Scrub fails on Azure #10928

Closed

scylladb-promoter added the Backport candidate label Jul 21, 2022

avikivity removed the Backport candidate label Jul 25, 2022

DoronArazii modified the milestones: 5.0, 5.1 Nov 8, 2022

xemul mentioned this issue Dec 14, 2022

Move cloud snitches logic to scripts #12306

Open
