latency degradation during decommission nodes on write workload - ~77% degradation #14537
Comments
This was referenced Aug 21, 2023
bhalevy
added a commit
to bhalevy/scylla
that referenced
this issue
Aug 21, 2023
Although to_repair_rows_list may yield if needed between rows and mutation fragments, the input `repair_rows_on_wire` is freed in one shot, and that may cause stalls as seen in QA:

```
| bytes_ostream::free_chain at ././bytes_ostream.hh:163
++ - addr=0x4103be0:
| bytes_ostream::~bytes_ostream at ././bytes_ostream.hh:199
| (inlined by) frozen_mutation_fragment::~frozen_mutation_fragment at ././mutation/frozen_mutation.hh:273
| (inlined by) std::destroy_at<frozen_mutation_fragment> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_construct.h:88
| (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/alloc_traits.h:537
| (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/list.tcc:77
| (inlined by) std::__cxx11::_List_base<frozen_mutation_fragment, std::allocator<frozen_mutation_fragment> >::~_List_base at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_list.h:575
| (inlined by) partition_key_and_mutation_fragments::~partition_key_and_mutation_fragments at ././repair/repair.hh:203
| (inlined by) std::destroy_at<partition_key_and_mutation_fragments> at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_construct.h:88
| (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/alloc_traits.h:537
| (inlined by) ?? at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/list.tcc:77
| (inlined by) std::__cxx11::_List_base<partition_key_and_mutation_fragments, std::allocator<partition_key_and_mutation_fragments> >::~_List_base at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_list.h:575
| (inlined by) to_repair_rows_list at ./repair/row_level.cc:597
```

This change consumes the rows and frozen mutation fragments incrementally, freeing each after being processed.

Fixes scylladb#14537

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
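To illustrate the idea in the commit message above, here is a minimal sketch in plain C++ of consuming a nested list incrementally so that each element is destroyed right after it is processed. It is not the actual patch: `fragment`, `wire_row`, `decoded_row`, `consume_gently` and the `maybe_yield` callback are hypothetical stand-ins, and the callback only marks where a Seastar-based implementation would presumably offer a preemption point.

```cpp
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real wire and result types.
struct fragment { std::string payload; };
struct wire_row { std::string key; std::list<fragment> fragments; };
struct decoded_row { std::string key; std::string payload; };

// Converts `input` into decoded rows while destroying every fragment and
// every wire row as soon as it has been processed. `maybe_yield` is an
// assumed hook called after each free, standing in for a preemption point.
std::vector<decoded_row> consume_gently(std::list<wire_row> input,
                                        const std::function<void()>& maybe_yield) {
    std::vector<decoded_row> out;
    while (!input.empty()) {
        wire_row& row = input.front();
        while (!row.fragments.empty()) {
            out.push_back({row.key, std::move(row.fragments.front().payload)});
            row.fragments.pop_front();  // frees this one fragment now
            maybe_yield();
        }
        input.pop_front();              // frees this one (now empty) row
        maybe_yield();
    }
    // `input` is empty here, so destroying it on return has nothing left to free.
    return out;
}
```

The point of the design is that the cost of freeing is spread across many small steps with a chance to yield between them, instead of being paid in one long destructor call when the function returns.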
Last change in this area in to_repair_rows_list was @avikivity's coroutinization done in e482cb1
avikivity
added a commit
that referenced
this issue
Aug 22, 2023
This short series deals with two stall sources in row-level repair `to_repair_rows_list`:
1. Freeing the input `repair_rows_on_wire` in one shot on return (as seen in #14537).
2. Freeing the result `row_list` in one shot on error. This hasn't been seen in testing, but I have no reason to believe it is not susceptible to stalls exactly like `repair_rows_on_wire` with the same number of rows and mutations.

Fixes #14537

Closes #15102

* github.com:scylladb/scylladb:
  repair: reindent to_repair_rows_list
  repair: to_repair_rows_list: clear_gently on error
  repair: to_repair_rows_list: consume frozen rows gently
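The second stall source listed above (freeing the partially built result in one shot on error) can be sketched in the same spirit. This is an assumption-laden illustration, not the code from the series: `clear_gently_sketch` and `build_rows` are hypothetical names, the rows are plain strings, and `maybe_yield` again stands in for a preemption point.

```cpp
#include <functional>
#include <list>
#include <stdexcept>
#include <string>

// Pops elements one at a time so each destructor runs separately,
// calling the assumed `maybe_yield` hook between frees.
template <typename T>
void clear_gently_sketch(std::list<T>& l, const std::function<void()>& maybe_yield) {
    while (!l.empty()) {
        l.pop_front();
        maybe_yield();
    }
}

// Builds a result list from `raw`. If a row is rejected, the partial
// result is released element by element before the error propagates,
// rather than all at once in the list destructor during unwinding.
std::list<std::string> build_rows(const std::list<std::string>& raw,
                                  const std::function<void()>& maybe_yield) {
    std::list<std::string> result;
    try {
        for (const auto& r : raw) {
            if (r.empty()) {
                throw std::invalid_argument("empty row");
            }
            result.push_back(r);
        }
    } catch (...) {
        clear_gently_sketch(result, maybe_yield);
        throw;  // rethrow the original error after the gentle cleanup
    }
    return result;
}
```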
raphaelsc
pushed a commit
to raphaelsc/scylla
that referenced
this issue
Aug 29, 2023
Performance only, not backporting.
Installation details
Scylla version (or git commit hash): 5.4.0~dev.20230629.f6f974cdeb11 with build-id 7afc85749bdc68e7ee32eead35d51badd480c79f
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0b7891d4fe168d4b1 (eu-west-1)
test_id: 4493c5fb-0803-47a0-a6b1-9e86837a2019
Based on previous runs of this operation, we have a 77% degradation in latency: this run had 4.74 ms (the average of 3.32, 4.67 and 6.23), while the previous runs had 2.68 ms for 5.3.0 and 3.92 ms for 5.4.0~dev.

We have 308 stalls during the 3 node decommissions, where they started and ended as:
decoded:
logs can be found:
db_logs
loader_logs
monitor_logs
sct_logs
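As a sanity check on the figures quoted in this report, the arithmetic can be written out as follows. It assumes the 2.68 ms baseline belongs to 5.3.0 and the 3.92 ms baseline to the earlier 5.4.0~dev run, which is how the report reads; under that assumption the 77% figure is relative to 5.3.0.

```math
\begin{aligned}
\bar{t}_{\text{this run}} &= \frac{3.32 + 4.67 + 6.23}{3} = \frac{14.22}{3} = 4.74\ \text{ms} \\
\text{vs. } 2.68\ \text{ms}: \quad \frac{4.74 - 2.68}{2.68} &\approx 0.769 \approx 77\% \\
\text{vs. } 3.92\ \text{ms}: \quad \frac{4.74 - 3.92}{3.92} &\approx 0.209 \approx 21\%
\end{aligned}
```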