High CPU usage on some threads after the "nodetool drain" command on Linux kernels 3.x / 4.x #13377

Closed
mark-bb opened this issue Mar 30, 2023 · 2 comments

mark-bb commented Mar 30, 2023

Installation details
Scylla version: 5.1.5 Open Source; 4.6.x versions are affected as well.
OS (RHEL/CentOS/Ubuntu/AWS AMI): RHEL/CentOS/Ubuntu - any of them on kernels 3.x / 4.x, but not on 5.x.

We see high CPU usage on some ScyllaDB threads after running "nodetool drain".
The problem is easy to reproduce.
We believe this is unexpected behavior.

Below is the output of some OS commands for comparison.

top [-1] -H -n1 -b -p $(pidof scylla)

Linux kernels 3.x / 4.x
Ubuntu 18.04, CentOS 7/8, RHEL 8.1

Threads:  12 total,   1 running,  11 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.8 us,  9.8 sy,  0.0 ni, 73.8 id,  0.0 wa,  0.0 hi,  1.6 si,  0.0 st
KiB Mem :  3861256 total,  3041724 free,   455896 used,   363636 buff/cache
KiB Swap:  4063228 total,  4063228 free,        0 used.  3171324 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  8993 scylla    20   0   16.0t 201992  32912 R 93.3  5.2   1:46.35 scylla    <--
  8994 scylla    20   0   16.0t 201992  32912 S  0.0  5.2   0:01.36 reactor-1
  ...

Linux kernels 5.x
Ubuntu 20.04, CentOS 7 (5.x kernel installed manually)

Threads:   8 total,   0 running,   8 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  1.7 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3990136 total,  3286472 free,   406180 used,   297484 buff/cache
KiB Swap:  4063228 total,  4063228 free,        0 used.  3354412 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
  1263 scylla    20   0   16.0t 239768  65260 S  0.0  6.0   0:02.38 scylla
  1265 scylla    20   0   16.0t 239768  65260 S  0.0  6.0   0:01.70 reactor-1
  ...

On distros where top supports the -1 option, we see that the first one or two threads are 100% busy.
The situation is slightly different in a multi-node environment:
on the drained node, the reactor-1 thread also consumes 100% CPU (so two threads are 100% busy in this case).
On the other nodes reactor-1 is not affected; only the main scylla thread consumes 100% CPU.
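
To track the same per-thread numbers over time without top, one can sample /proc/<pid>/task/<tid>/stat twice and compare the utime/stime counters. Below is a minimal Python sketch, assuming a Linux /proc filesystem. The helpers read_thread_ticks and thread_cpu_percent are hypothetical names used for illustration; they are not part of Scylla or any existing tool.

```python
import os
import sys
import time


def read_thread_ticks(pid):
    """Return {tid: (comm, utime + stime)} in clock ticks for every thread of pid."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()/read()
        # comm sits in parentheses and may contain spaces, so split around
        # the last ')' instead of splitting the whole line on whitespace.
        comm = stat[stat.index("(") + 1 : stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2 :].split()
        # fields[0] is field 3 (state) in proc(5), so utime (field 14)
        # and stime (field 15) land at indices 11 and 12.
        ticks[int(tid)] = (comm, int(fields[11]) + int(fields[12]))
    return ticks


def thread_cpu_percent(pid, interval=1.0):
    """Per-thread CPU usage of pid over `interval` seconds, similar to top -H."""
    hz = os.sysconf("SC_CLK_TCK")
    before = read_thread_ticks(pid)
    time.sleep(interval)
    after = read_thread_ticks(pid)
    usage = {}
    for tid, (comm, total) in after.items():
        if tid in before:  # ignore threads created during the interval
            usage[tid] = (comm, 100.0 * (total - before[tid][1]) / hz / interval)
    return usage


if __name__ == "__main__" and len(sys.argv) > 1:
    rows = thread_cpu_percent(int(sys.argv[1]))
    for tid, (comm, pct) in sorted(rows.items(), key=lambda kv: -kv[1][1]):
        print(f"{tid:>8} {comm:<16} {pct:6.1f}%")
```

Saved as, say, thread_cpu.py, it can be run as `python3 thread_cpu.py $(pidof scylla)`. A thread stuck in a busy loop should show close to 100%.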

strace -p $(pidof scylla) -c

Linux kernels 3.x / 4.x
Ubuntu 18.04, CentOS 7/8, RHEL 8.1

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.58    4.569387           3   1359826           epoll_pwait
  0.22    0.010306          20       510           write
  0.16    0.007365           6      1115           timerfd_settime
  0.02    0.000973          10        97           timer_settime
  0.01    0.000638           6        95           rt_sigreturn
  0.00    0.000046           5         8           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    4.588715               1361651           total

Linux kernels 5.x
Ubuntu 20.04, CentOS 7 (5.x kernel installed manually)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.14    0.692684         507      1365           io_pgetevents
  6.17    0.060087          12      4635           timerfd_settime
  5.31    0.051699          27      1859           io_submit
  4.71    0.045870          57       794           write
  3.95    0.038451          20      1901       182 read
  3.46    0.033645          22      1510           membarrier
  2.78    0.027074           9      2886           rt_sigprocmask
  2.48    0.024196          14      1702           timer_settime
------ ----------- ----------- --------- --------- ----------------
100.00    0.973706                 16652       182 total
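
The two strace -c summaries differ mainly in how the waiting syscall behaves. On 3.x/4.x, epoll_pwait is called 1,359,826 times at about 3 µs per call (4.569387 s / 1,359,826 ≈ 3.4 µs), so it returns almost immediately. On 5.x, io_pgetevents is called 1,365 times at about 507 µs per call, so it actually waits. Below is a minimal Python sketch for spotting that pattern in a saved summary. The set of wait syscalls and the thresholds (max_usecs_per_call=50, min_calls=10_000) are hypothetical choices made for illustration; neither strace nor Scylla defines them.

```python
from dataclasses import dataclass

# Syscalls an event loop typically blocks in while idle (illustrative, not exhaustive).
WAIT_SYSCALLS = {
    "epoll_wait", "epoll_pwait", "epoll_pwait2", "poll", "ppoll",
    "select", "pselect6", "io_getevents", "io_pgetevents",
}


@dataclass
class SyscallStat:
    name: str
    seconds: float
    usecs_per_call: int
    calls: int
    errors: int


def parse_strace_summary(text):
    """Parse the table printed by `strace -c` into SyscallStat rows."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        # skip blank lines, the header, the dashed separators and the "total" row
        if not fields or fields[0].startswith(("%", "-")) or fields[-1] == "total":
            continue
        if len(fields) == 5:  # the errors column is empty for this syscall
            _, secs, usecs, calls, name = fields
            errors = "0"
        elif len(fields) == 6:
            _, secs, usecs, calls, errors, name = fields
        else:
            continue
        try:
            stats.append(SyscallStat(name, float(secs), int(usecs), int(calls), int(errors)))
        except ValueError:
            continue  # not a table row
    return stats


def find_busy_waits(stats, max_usecs_per_call=50, min_calls=10_000):
    """Wait-type syscalls that are called very often yet return almost at once."""
    return [
        s.name for s in stats
        if s.name in WAIT_SYSCALLS
        and s.calls >= min_calls
        and s.usecs_per_call <= max_usecs_per_call
    ]
```

Fed the 3.x/4.x table above, find_busy_waits reports epoll_pwait. Fed the 5.x table, it reports nothing, because io_pgetevents is neither frequent nor fast.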

mark-bb commented Apr 28, 2023

Hello.
Is there any news on this?

mark-bb closed this as completed May 5, 2023
mykaul closed this as not planned May 7, 2023
DoronArazii added this to the 5.3 milestone May 7, 2023
tzach referenced this issue Dec 5, 2023
* seastar 830ce8673...55a821524 (34):
  > Revert "reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc"
  > epoll: Avoid spinning on aborted connections
Fixes #12774
Fixes #7753
Fixes #13337
  > Merge 'Sanitize test-only reactor facilities' from Pavel Emelyanov
  > test/unit: fix fmt version check
  > reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc
  > build: add spaces before () and after commands
  > reactor: use zero-initialization to initialize io_uring_params
  > Merge 'build: do not return a non-false condition if the option is off ' from Kefu Chai
  > memory: do not use variable length array
  > build: use tri_state_option() to link against Sanitizers
  > build: do not define SEASTAR_TYPE_ERASE_MORE on all builds
  > Revert "shared_future: make available() immediate after set_value()"
  > test_runner: do not throw when seastar.app fails to start
  > Merge 'Address issue where Seastar faults in toeplitz hash when reassembling fragment' from John Hester
  > defer, closeable: do not use [[nodiscard(str)]]
  > Merge 'build: generate config-specific rules using generator expressions' from Kefu Chai
  > treewide: use *_v and *_t for better readability
  > build: use different names for .pc files for each build mode
  > perftune.py: skip discovering IRQs for iSCSI disks
  > io-tester: explicit use uint64_t for boost::irange(...)
  > gate: correct the typo in doxygen comment
  > shared_future: make available() immediate after set_value()
  > smp: drop unused templates
  > include fmt/ostream.h to make headers self-sufficient
  > Support ccache in ./configure.py
  > rpc_tester: Disable -Wuninitialized when including boost.accumulators
  > file: construct directory_entry with aggregated ctor
  > file: s/ino64_t/ino_t/, s/off64_t/off_t/
  > sstring_test: include fmt/std.h only if fmtlib >= 10.0.0
  > file: do not include coroutine headers if coroutine is disabled
  > fair_queue::unregister_priority_class:fix assertion
  > Merge 'Generalize `net::udp_channel` into `net::datagram_channel`' from Michał Sala
  > Merge 'Add file::list_directory() that co_yields entries' from Pavel Emelyanov
  > http/file_handler: remove unnecessary cast

Closes #16201
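
The Seastar change that later referenced this issue is titled "epoll: Avoid spinning on aborted connections". This thread does not describe how that fix works. The Python sketch below is therefore only a generic illustration of how an epoll-based loop can spin, and whether this is the actual cause here is an assumption. It shows that a level-triggered epoll registration on a socket whose peer has closed keeps reporting EPOLLHUP right away, even with an empty interest mask, so every wait returns at once instead of sleeping. That pattern matches the 3.x/4.x strace output above (many very short epoll_pwait calls). count_immediate_wakeups is a hypothetical helper.

```python
import select
import socket
import time


def count_immediate_wakeups(polls=1000, timeout=1.0):
    """Poll a socket whose peer has gone away, `polls` times.

    Returns (number of polls that reported EPOLLHUP, elapsed seconds).
    """
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        # Level-triggered registration with an empty interest mask:
        # epoll always reports EPOLLHUP and EPOLLERR regardless of the mask.
        ep.register(a.fileno(), 0)
        b.close()  # the peer disappears, similar to an aborted connection
        woken = 0
        start = time.monotonic()
        for _ in range(polls):
            events = ep.poll(timeout)
            if any(mask & select.EPOLLHUP for _, mask in events):
                woken += 1  # returned immediately instead of waiting `timeout`
        return woken, time.monotonic() - start
    finally:
        ep.close()
        a.close()
```

With the default arguments, all 1000 polls return at once, so the loop finishes far faster than one second even though each poll allows up to one second. A loop that neither unregisters nor closes such a descriptor burns a full CPU in exactly this way.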