deadlock in write_thread LinkOne function during stall #637

Closed
ayulas opened this issue Aug 15, 2023 · 7 comments · Fixed by #737

ayulas commented Aug 15, 2023

deadlock stall

@ayulas ayulas self-assigned this Aug 15, 2023

ayulas commented Aug 16, 2023

There is a race in which the WBM stall does not trigger a flush, so the memory is never freed. As a result, the stall never ends and all writes are stuck.

On the current machine, thread 58 is the one blocked in the WBM stall:

(gdb) bt
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x7f632d86dec8) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f632d86de78, cond=0x7f632d86dea0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7f632d86dea0, mutex=0x7f632d86de78) at pthread_cond_wait.c:647
#3 0x00005632819bffe1 in rocksdb::port::CondVar::Wait (this=) at port/port_posix.cc:119
#4 0x00005632817dfea1 in rocksdb::DBImpl::WBMStallInterface::Block (this=0x7f632d86de70) at ./db/db_impl/db_impl.h:1249
#5 rocksdb::DBImpl::WriteBufferManagerStallWrites (this=0x7f632d87ec00) at db/db_impl/db_impl_write.cc:1932
#6 0x00005632817e54fa in rocksdb::DBImpl::PreprocessWrite (this=, write_options=..., log_context=0x7f62c85d5660, write_context=) at db/db_impl/db_impl_write.cc:1228
#7 0x00005632817e99de in rocksdb::DBImpl::WriteImpl (this=0x7f632d87ec00, write_options=..., my_batch=, callback=, log_used=, log_ref=, disable_memtable=,
seq_used=, batch_cnt=, pre_release_callback=, post_memtable_callback=) at db/db_impl/db_impl_write.cc:391
#8 0x00005632817ead44 in rocksdb::DBImpl::Write (this=0x7f632d87ec00, write_options=..., my_batch=0x7f62c85d5da0) at db/db_impl/db_impl_write.cc:146
#9 0x00005632817eb07a in rocksdb::DB::Put (this=0x7f632d87ec00, opt=..., column_family=0x7f6322bc29c0, key=..., value=...) at db/db_impl/db_impl_write.cc:2349
#10 0x00005632817eb204 in rocksdb::DBImpl::Put (this=this@entry=0x7f632d87ec00, o=..., column_family=0x7f6322bc29c0, key=..., val=...) at db/db_impl/db_impl_write.cc:28
#11 0x00005632816f1a84 in rocksdb::Benchmark::ReadRandomWriteRandom (this=0x7f632d893a00, thread=0x7f6321502400) at tools/db_bench_tool.cc:7974
#12 0x00005632816e73d6 in rocksdb::Benchmark::ThreadBody (v=0x7f632d927668) at tools/db_bench_tool.cc:4210
#13 0x00005632819127c9 in rocksdb::(anonymous namespace)::StartThreadWrapper (arg=0x7f632d868c70) at env/env_posix.cc:461
#14 0x00007f632e3e2609 in start_thread (arg=) at pthread_create.c:477
#15 0x00007f632de8e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
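
Frame #4 in this stack is rocksdb::DBImpl::WBMStallInterface::Block(), which parks the write thread on a condition variable until the WriteBufferManager signals the end of the stall. A simplified sketch of that wait/signal pair (paraphrased, not the exact code in db/db_impl/db_impl.h):

// Simplified sketch of the wait/signal pair behind frames #3-#4 above.
void WBMStallInterface::Block() {
  MutexLock lock(&state_mutex_);
  while (state_.load(std::memory_order_relaxed) == State::BLOCKED) {
    state_cv_.Wait();  // thread 58 is parked here
  }
}

// Called by the WriteBufferManager once enough memory has been freed to end
// the stall. If no flush is ever initiated, memory is never freed, Signal()
// is never called, and Block() never returns.
void WBMStallInterface::Signal() {
  {
    MutexLock lock(&state_mutex_);
    state_.store(State::RUNNING, std::memory_order_relaxed);
  }
  state_cv_.SignalAll();
}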

Ongoing writes are stuck behind that stall:

(gdb) bt
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x7f632d87fb50) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f632d87fb00, cond=0x7f632d87fb28) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x7f632d87fb28, mutex=0x7f632d87fb00) at pthread_cond_wait.c:647
#3 0x00005632819bffe1 in rocksdb::port::CondVar::Wait (this=) at port/port_posix.cc:119
#4 0x00005632818f538c in rocksdb::WriteThread::LinkOne (this=0x7f632d87f9c0, w=0x7f62c7dd47a0, newest_writer=0x7f632d87f9e8) at db/write_thread.cc:248
#5 0x00005632818f589b in rocksdb::WriteThread::JoinBatchGroup (this=0x7f632d87f9c0, w=0x7f62c7dd47a0) at db/write_thread.cc:405
#6 0x00005632817e9520 in rocksdb::DBImpl::WriteImpl (this=0x7f632d87ec00, write_options=..., my_batch=, callback=0x0, log_used=0x0, log_ref=0, disable_memtable=false, seq_used=0x0, batch_cnt=0, pre_release_callback=0x0,
post_memtable_callback=0x0) at db/db_impl/db_impl_write.cc:318
#7 0x00005632817ead44 in rocksdb::DBImpl::Write (this=0x7f632d87ec00, write_options=..., my_batch=0x7f62c7dd4da0) at db/db_impl/db_impl_write.cc:146
#8 0x00005632817eb07a in rocksdb::DB::Put (this=0x7f632d87ec00, opt=..., column_family=0x7f6322bc28c0, key=..., value=...) at db/db_impl/db_impl_write.cc:2349
#9 0x00005632817eb204 in rocksdb::DBImpl::Put (this=this@entry=0x7f632d87ec00, o=..., column_family=0x7f6322bc28c0, key=..., val=...) at db/db_impl/db_impl_write.cc:28
#10 0x00005632816f1a84 in rocksdb::Benchmark::ReadRandomWriteRandom (this=0x7f632d893a00, thread=0x7f632150a000) at tools/db_bench_tool.cc:7974
#11 0x00005632816e73d6 in rocksdb::Benchmark::ThreadBody (v=0x7f632d927690) at tools/db_bench_tool.cc:4210
#12 0x00005632819127c9 in rocksdb::(anonymous namespace)::StartThreadWrapper (arg=0x7f632d868c80) at env/env_posix.cc:461
#13 0x00007f632e3e2609 in start_thread (arg=) at pthread_create.c:477
#14 0x00007f632de8e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
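
Frame #4 in this stack is WriteThread::LinkOne(), which refuses to link new writers while a write stall is in effect and instead blocks them on stall_cv_. Roughly (a simplified paraphrase of db/write_thread.cc, not the exact code):

// Simplified paraphrase of the stall path in WriteThread::LinkOne().
bool WriteThread::LinkOne(Writer* w, std::atomic<Writer*>* newest_writer) {
  Writer* writers = newest_writer->load(std::memory_order_relaxed);
  while (true) {
    if (writers == &write_stall_dummy_) {
      // A write stall is active: park the writer until the stall is cleared
      // (EndWriteStall() swaps the dummy out and signals stall_cv_).
      MutexLock lock(&stall_mu_);
      writers = newest_writer->load(std::memory_order_relaxed);
      if (writers == &write_stall_dummy_) {
        stall_cv_.Wait();  // the ongoing writes above are parked here
        writers = newest_writer->load(std::memory_order_relaxed);
        continue;
      }
    }
    w->link_older = writers;
    if (newest_writer->compare_exchange_weak(writers, w)) {
      return (writers == nullptr);  // true if this writer becomes the leader
    }
  }
}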

There are 32 CFs, but none of the memtables is full. Still, because the data is spread across all the CFs, the WBM reached the threshold that starts the stall... BUT no flush activity was ever initiated!!! So we will be stuck forever! The proactive flush should have kicked in here, but it looks like it was never triggered. A toy illustration of the failure mode follows below.
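
To make the failure mode concrete, here is a minimal, self-contained illustration (toy code, not Speedb's implementation; the class name and MaybeInitiateFlush() are made up): if the stall is entered without guaranteeing that a flush was scheduled first, nothing ever frees memory, so the condition the stalled writer waits on can never become true.

// Toy illustration of the failure mode described above (not Speedb code).
#include <condition_variable>
#include <mutex>

struct ToyWriteBufferManager {
  std::mutex mu;
  std::condition_variable stall_cv;
  size_t memory_used = 0;
  const size_t buffer_size = 1ull << 30;

  void ReserveMem(size_t n) {
    std::unique_lock<std::mutex> lk(mu);
    memory_used += n;
    if (memory_used > buffer_size) {
      // The race being described: the stall begins without making sure a
      // flush was scheduled to free memory.
      // MaybeInitiateFlush();  // <-- skipped in the bad interleaving
      stall_cv.wait(lk, [&] { return memory_used <= buffer_size; });  // waits forever
    }
  }

  // Would be called when a flush completes; never runs if no flush started.
  void FreeMem(size_t n) {
    std::lock_guard<std::mutex> lk(mu);
    memory_used -= n;
    stall_cv.notify_all();
  }
};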

The WBM object's state is:

(gdb) p $14
$15 = {static kDfltStartDelayPercentThreshold = 70, static kNoDelayedWriteFactor = 0, static kMaxDelayedWriteFactor = 100, static kStopDelayedWriteFactor = 100, static kStartFlushPercentThreshold = 80, static kDfltAllowStall = false, static kDfltInitiateFlushes = true, static kNoneCodedUsageState = 0, static kStopCodedUsageState = 101,
coded_usage_state_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 101}, static is_always_lock_free = true},
controllers_to_refcount_map_ = std::unordered_map with 1 element = {[std::shared_ptr<rocksdb::WriteController> (use count 70, weak count 0) = {get() = 0x7f632d824140}] = 1},
controllers_map_mutex_ = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}},
buffer_size_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 1073741824}, static is_always_lock_free = true},
mutable_limit_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 939524096}, static is_always_lock_free = true},
memory_used_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 1073807360}, static is_always_lock_free = true},
memory_inactive_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 0}, static is_always_lock_free = true},
memory_being_freed_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 0}, static is_always_lock_free = true},
cache_res_mgr_ = std::shared_ptr<rocksdb::CacheReservationManager> (use count 1, weak count 1) = {get() = 0x7f632d8c9890},
cache_res_mgr_mu_ = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}},
queue_ = std::__cxx11::list = {[0] = 0x7f632d86de70},
mu_ = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}},
allow_stall_ = true, start_delay_percent_ = 70,
stall_active_ = {_M_base = {static _S_alignment = 1, _M_i = true}, static is_always_lock_free = true},
static kInvalidInitiatorIdx = 18446744073709551615,
initiate_flushes_ = true,
flush_initiation_options_ = {static kDfltMaxNumParallelFlushes = 4, max_num_parallel_flushes = 4},
flush_initiators_ = std::vector of length 1, capacity 1 = {{initiator = 0x7f632d87ec00, cb = std::function bound to the lambda in rocksdb::DBImpl::Open(const rocksdb::DBOptions&, const string&, const std::vector<rocksdb::ColumnFamilyDescriptor>&, std::vector<rocksdb::ColumnFamilyHandle*>*, rocksdb::DB**, bool, bool)::<lambda(size_t)>, _M_functor = {_M_object = 0x7f632d87ec00}, _M_manager = 0x5632817b6920, _M_invoker = 0x5632817b69c0}},
next_candidate_initiator_idx_ = 0,
num_running_flushes_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 0}, static is_always_lock_free = true},
num_flushes_to_initiate_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 4}, static is_always_lock_free = true},
flush_initiation_start_size_ = 214748364,
additional_flush_step_size_ = 214748364,
additional_flush_initiation_size_ = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 1073741824}, static is_always_lock_free = true},
min_mutable_flush_size_ = 67108864,
flushes_mu_ = std::unique_ptr<rocksdb::InstrumentedMutex> = {get() = 0x7f632d8ae7c0},
flushes_initiators_mu_ = std::unique_ptr<rocksdb::InstrumentedMutex> = {get() = 0x7f632d8ae840},
flushes_wakeup_cv_ = std::unique_ptr<rocksdb::InstrumentedCondVar> = {get() = 0x7f632d847aa0},
new_flushes_wakeup_ = false,
flushes_thread_ = {_M_id = {_M_thread = 140063937656576}},
terminate_flushes_thread_ = false}
As you can see, there is no flush in progress, and the used memory (memory_used_ = 1073807360) has crossed the limit (buffer_size_ = 1073741824).
Also note min_mutable_flush_size_ = 67108864.
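
Reading the key values out of the dump (just arithmetic on the numbers above; the comments are my interpretation):

// Key values from the dump above (bytes).
//   buffer_size_                 = 1073741824  (1 GiB)
//   mutable_limit_               =  939524096  (7/8 of buffer_size_)
//   memory_used_                 = 1073807360  (above buffer_size_, so the stall threshold is crossed)
//   flush_initiation_start_size_ =  214748364  (~20% of buffer_size_)
//   num_flushes_to_initiate_     = 4, num_running_flushes_ = 0, new_flushes_wakeup_ = false
// memory_used_ is above every flush-initiation threshold, yet no flush is
// running and the flushes-thread wakeup flag is not set, which matches the
// description above: the stall is active but nothing will ever free memory.
static_assert(1073807360 > 1073741824, "memory_used_ > buffer_size_");
static_assert(1073807360 >  939524096, "memory_used_ > mutable_limit_");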


ayulas commented Aug 16, 2023

@udi-speedb enjoy :-) The machine is up for now, with gdb attached (9.150.1.17).

@erez-speedb

@udi-speedb @ayulas Can the instance be destroyed?


ayulas commented Aug 22, 2023

I did some work on it, so is tomorrow OK?

@udi-speedb udi-speedb removed their assignment Sep 5, 2023
@erez-speedb

@ayulas no linked PR yet


ayulas commented Sep 28, 2023

There is one now.

@Guyme Guyme linked a pull request Oct 17, 2023 that will close this issue

Guyme commented Oct 29, 2023

@ayulas - status, please.
Also, please verify that #683 is indeed the fix for this deadlock (the content seems to be something else; the history updates refer to #653).
