
[BUG] redis sentinel 100% cpu usage #9956

Open
sedflix opened this issue Dec 17, 2021 · 36 comments

Comments

@sedflix

sedflix commented Dec 17, 2021

Describe the bug
Redis Sentinel is showing 100% CPU usage. We are using the redis-ha Helm chart version 4.12.15 with the default configuration, except that we have increased the resource requests and limits significantly. The Redis version we use is 6.0.7.

To reproduce

I'm not sure if it can be reproduced by anyone else.
Even after a Redis restart, one or two sentinels reach 100% CPU utilization.
Other Redis metrics look fine.

Expected behavior
Sentinel should not be taking so much CPU.

Additional information

I ran strace on one of the sentinels and found this:

1 09:20:08 recvfrom(11, 0x7ffd55fab4a0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
1 09:20:08 recvfrom(10, 0x7ffd55fab4a0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
1 09:20:08 epoll_pwait(5, [{events=EPOLLERR, data={u32=11, u64=11}}, {events=EPOLLERR, data={u32=10, u64=10}}], 10128, 8, NULL, 8) = 2

This seems like a repeating pattern: Redis tries to read from two sockets, fails continuously, and this is followed by an epoll_wait. It happens more than 100 times per second.

Sockets 11 and 10 seem to point to a connection with the Redis process for this sentinel. To be exact, they point to the ClusterIP of the redis-cluster-announce for the pod on which the sentinel was running, and the port is 6379.
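
(As a rough way to double-check that fd-to-address mapping from inside the container, the following sketch assumes the sentinel runs as PID 1 and that procfs is readable; port 6379 shows up as hex 18EB in the kernel table:)

ls -l /proc/1/fd          # fds 10 and 11 appear as socket:[inode]
cat /proc/1/net/tcp       # find those inodes; local/remote addresses are hex ip:port
ss -tnp                   # alternative, if ss is available in the image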

Redis INFO output:

# Server
redis_version:6.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4b6e03cc4dd13d7e
redis_mode:standalone
os:Linux 5.10.68-62.173.amzn2.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:1
run_id:9e5a59931ada7e4b1198b6a5beb98defb5834cc0
tcp_port:6379
uptime_in_seconds:66843
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12347300
executable:/data/redis-server
config_file:/data/conf/redis.conf
io_threads_active:0

# Clients
connected_clients:8
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:2727256
used_memory_human:2.60M
used_memory_rss:8065024
used_memory_rss_human:7.69M
used_memory_peak:2981024
used_memory_peak_human:2.84M
used_memory_peak_perc:91.49%
used_memory_overhead:2137792
used_memory_startup:803016
used_memory_dataset:589464
used_memory_dataset_perc:30.63%
allocator_allocated:2827152
allocator_active:3174400
allocator_resident:6668288
total_system_memory:66245685248
total_system_memory_human:61.70G
used_memory_lua:48128
used_memory_lua_human:47.00K
used_memory_scripts:3072
used_memory_scripts_human:3.00K
number_of_cached_scripts:4
maxmemory:0
maxmemory_human:0B
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.12
allocator_frag_bytes:347248
allocator_rss_ratio:2.10
allocator_rss_bytes:3493888
rss_overhead_ratio:1.21
rss_overhead_bytes:1396736
mem_fragmentation_ratio:2.96
mem_fragmentation_bytes:5339328
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:135888
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:23
rdb_bgsave_in_progress:0
rdb_last_save_time:1639737083
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:524288
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:20058
total_commands_processed:491483
instantaneous_ops_per_sec:5
total_net_input_bytes:33351575
total_net_output_bytes:275568645
instantaneous_input_kbps:0.54
instantaneous_output_kbps:1.59
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:331
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:501438
total_writes_processed:1017489
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Replication
role:slave
master_host:172.20.61.168
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:226855520
slave_priority:100
slave_read_only:1
connected_slaves:0
min_slaves_good_slaves:0
master_replid:6d50a55ecc1d386809fc186708bcc56db6f92765
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:226855520
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:225806945
repl_backlog_histlen:1048576

# CPU
used_cpu_sys:81.727130
used_cpu_user:54.375754
used_cpu_sys_children:0.204695
used_cpu_user_children:0.501349

# Modules

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=2861,expires=0,avg_ttl=0

Sentinel INFO output:

127.0.0.1:26379> info
# Server
redis_version:6.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4b6e03cc4dd13d7e
redis_mode:sentinel
os:Linux 5.10.68-62.173.amzn2.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.3.0
process_id:1
run_id:0919ce29eb1b7f9dce36cea0a2eb73bf1740939c
tcp_port:26379
uptime_in_seconds:66926
uptime_in_days:0
hz:11
configured_hz:10
lru_clock:12347383
executable:/data/redis-sentinel
config_file:/data/conf/sentinel.conf
io_threads_active:0

# Clients
connected_clients:4
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# CPU
used_cpu_sys:30052.725830
used_cpu_user:9940.032902
used_cpu_sys_children:0.000000
used_cpu_user_children:0.005161

# Stats
total_connections_received:8925
total_commands_processed:200173
instantaneous_ops_per_sec:5
total_net_input_bytes:11111722
total_net_output_bytes:1205818
instantaneous_input_kbps:0.35
instantaneous_output_kbps:0.03
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:772
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_reads_processed:204104
total_writes_processed:195182
io_threaded_reads_processed:0
io_threaded_writes_processed:0

# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.20.61.168:6379,slaves=2,sentinels=3

Any tips for debugging this?

@betermieux

Same problem here.
After a few weeks of usage, the slave sentinels spike to 100% CPU while the master sentinel is idle.
I am using the redis-ha chart from https://dandydeveloper.github.io/charts
Let me know if I can provide more information for this issue.

@oranagra
Member

oranagra commented Nov 5, 2022

can you run pstack or gdb -batch -ex "bt" -p <pid> a few times so we know which code path runs in a loop?
anything you can tell us about the configuration so we can reproduce it? which version are you using?
@moticless PTAL.
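
(As a sketch of "a few times": the following untested loop collects roughly 30 one-second samples; gdb needs ptrace permission in the container, or run it from a debug sidecar sharing the PID namespace.)

for i in $(seq 1 30); do
  gdb -batch -ex "bt" -p <pid>
  sleep 1
done > /tmp/sentinel-bt.txt 2>&1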

@moticless
Collaborator

@sedflix ,
According to INFO you are using a rather old version (6.0.7).
Can you try a newer version?

@oranagra made some fix that might be related to your issue:
Fix busy loop in ae.c when timer event is about to fire (#8764)

@betermieux

Due to a reboot, the sentinel was restarted and is currently not using 100% CPU any more. I will have to wait until it happens again.
I am using the Docker image redis:7.0.4-alpine3.16. Since the containers are running on a locked-down Fedora CoreOS machine, I will have to debug them with a parallel container running gdb, but this should pose no problem.

@yevhen-harmonizehr

Faced the same issue today on a 3-node cluster. Restarting a single pod didn't help; only scaling to 0 and back to 3 nodes solved the issue.

@oranagra
Member

@yevhen-harmonizehr please specify which version you're using.

@yevhen-harmonizehr

@yevhen-harmonizehr please specify which version you're using.

redis:7.0.5-alpine

@oranagra
Member

so it can't be the above mentioned fix, which was part of 6.2.
if anyone reproduces this again (regardless of the version), we'd love to get some info.
see the instructions i posted here #9956 (comment)

@kfirfer

kfirfer commented Nov 21, 2022

It reproduces with this helm chart:

https://artifacthub.io/packages/helm/dandydev-charts/redis-ha

Tried multiple versions of Redis (6.X and 7.X).

It doesn't reproduce every time;
it reproduces roughly 1 out of every 3 times.

Some info:

 kubectl top po --containers=true | grep redis                                                                                                                                                             


redis-ha-haproxy-679fcbf55b-6bgxj        haproxy                   4m           69Mi
redis-ha-haproxy-679fcbf55b-hjrf4        haproxy                   3m           69Mi
redis-ha-haproxy-679fcbf55b-jcgk6        haproxy                   3m           69Mi
redis-ha-server-0                        redis                     5m           2Mi
redis-ha-server-0                        sentinel                  7m           2Mi
redis-ha-server-0                        split-brain-fix           0m           0Mi
redis-ha-server-1                        redis                     5m           2Mi
redis-ha-server-1                        sentinel                  8m           2Mi
redis-ha-server-1                        split-brain-fix           0m           0Mi
redis-ha-server-2                        redis                     2m           3Mi
redis-ha-server-2                        sentinel                  955m         3Mi
redis-ha-server-2                        split-brain-fix           0m           0Mi

Inside the sentinel container:

Mem: 21717868K used, 1839616K free, 441888K shrd, 635924K buff, 9427004K cached
CPU:  25% usr  23% sys   0% nic  50% idle   0% io   0% irq   0% sirq
Load average: 2.40 6.70 9.04 5/4816 741
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 1000     R    33452   0%   1  24% redis-sentinel *:26379 [sentinel]
  456     0 1000     S     1784   0%   1   0% sh
  464   456 1000     R     1692   0%   2   0% top

It's at 24% because the VM has 4 cores (so it's 100% of 1 core).

lsof output in the sentinel container:

/data $ lsof
1       /usr/local/bin/redis-server     0       /dev/null
1       /usr/local/bin/redis-server     1       pipe:[29328910]
1       /usr/local/bin/redis-server     2       pipe:[29328911]
1       /usr/local/bin/redis-server     3       pipe:[29328991]
1       /usr/local/bin/redis-server     4       pipe:[29328991]
1       /usr/local/bin/redis-server     5       anon_inode:[eventpoll]
1       /usr/local/bin/redis-server     6       socket:[29328992]
1       /usr/local/bin/redis-server     7       socket:[29328993]
1       /usr/local/bin/redis-server     8       socket:[29328998]
1       /usr/local/bin/redis-server     9       socket:[29328999]
1       /usr/local/bin/redis-server     10      socket:[29327138]
1       /usr/local/bin/redis-server     11      socket:[29327139]
1       /usr/local/bin/redis-server     12      socket:[29327140]
1       /usr/local/bin/redis-server     13      socket:[29327141]
1       /usr/local/bin/redis-server     14      socket:[29327142]
1       /usr/local/bin/redis-server     17      socket:[29330538]
1       /usr/local/bin/redis-server     18      socket:[29596831]
1       /usr/local/bin/redis-server     19      socket:[29596847]
1       /usr/local/bin/redis-server     20      socket:[29330576]
1       /usr/local/bin/redis-server     21      socket:[29330578]
456     /bin/busybox    0       /dev/pts/0
456     /bin/busybox    1       /dev/pts/0
456     /bin/busybox    2       /dev/pts/0
456     /bin/busybox    10      /dev/tty

@oranagra
Member

i'd like a stack trace to see what it's doing.
you can maybe do that with the instructions here #9956 (comment)
or if you're willing to kill the process, then kill -11 <pid> will cause it to print the trace to the log file.
the first method is preferred since you can take several samples.

@kfirfer

kfirfer commented Nov 21, 2022

@oranagra

/tmp # gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
0x00007fc15ceb44a3 in ?? () from /lib/ld-musl-x86_64.so.1
#0  0x00007fc15ceb44a3 in ?? () from /lib/ld-musl-x86_64.so.1
#1  0x00007fc15ceb16fa in ?? () from /lib/ld-musl-x86_64.so.1
#2  0x00007fc15cb5b914 in ?? ()
#3  0x0000000000000008 in ?? ()
#4  0x0000000012b82ba2 in ?? ()
#5  0x0000000000000000 in ?? ()
[Inferior 1 (process 1) detached]

I'm running as 'root' and it doesn't show the file and line numbers.
I don't know why.

@kfirfer

kfirfer commented Nov 21, 2022

@oranagra

Tried with the bullseye Docker image and it gives more info:

# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]

# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=100) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=100) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]

# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=68) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=68) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]

# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]

These stack traces were taken on the sentinel that uses high CPU for no apparent reason.

@kfirfer

kfirfer commented Nov 21, 2022

Running multiple times:

# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# 
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:362
362     ae.c: No such file or directory.
#0  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:362
#1  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#2  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=59) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=59) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00005569536b9cd1 in getMonotonicUs_posix () at monotonic.c:141
141     monotonic.c: No such file or directory.
#0  0x00005569536b9cd1 in getMonotonicUs_posix () at monotonic.c:141
#1  0x00005569535eab33 in usUntilEarliestTimer (eventLoop=0x7f71fd82b1e0, eventLoop=0x7f71fd82b1e0) at ae.c:271
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:375
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=99) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=99) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=40) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=40) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=49) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=49) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=89) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=89) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
#0  0x00007f71fddaed56 in epoll_wait (epfd=5, events=0x7f71fd8e9a00, maxevents=10128, timeout=69) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00005569535ea866 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f71fd82b1e0) at ae_epoll.c:113
#2  aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:404
#3  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#4  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=10) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=10, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d19500, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d19500) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d19500) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]
# gdb -batch -ex "bt" -p 1
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 10]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
28      ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
#0  __libc_recv (flags=<optimized out>, len=16384, buf=0x7ffe2acb3ce0, fd=14) at ../sysdeps/unix/sysv/linux/recv.c:28
#1  __libc_recv (fd=14, buf=0x7ffe2acb3ce0, len=16384, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recv.c:23
#2  0x00005569536c44a7 in redisNetRead (c=0x556953d1f220, buf=<optimized out>, bufcap=<optimized out>) at net.c:61
#3  0x00005569536c7be9 in redisBufferRead (c=c@entry=0x556953d1f220) at hiredis.c:950
#4  0x00005569536c3959 in redisAsyncRead (ac=0x556953d1f220) at async.c:640
#5  0x00005569535ea90a in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71fd82b1e0, flags=flags@entry=27) at ae.c:436
#6  0x00005569535eacad in aeMain (eventLoop=0x7f71fd82b1e0) at ae.c:496
#7  0x00005569535e65a1 in main (argc=<optimized out>, argv=0x7ffe2acb7ee8) at server.c:7075
[Inferior 1 (process 1) detached]

@oranagra
Member

Thanks, this is useful.

@michael-grunder can you take a look? any reason why redisNetRead will eat CPU, or do a blocking read (following an epoll indication that it's readable)?

@tillkruss

@michael-grunder can you take a look? any reason why redisNetRead will eat CPU, or do a blocking read (following an epoll indication that it's readable)?

No, he’s busy fixing macOS bugs.

@michael-grunder
Contributor

michael-grunder commented Nov 22, 2022

any reason why redisNetRead will eat CPU, or do a blocking read (following an epoll indication that it's readable)

From the call stack, it seems like maybe it's getting into a loop where recv immediately fails with EAGAIN, so it just spins eating CPU.

I'll take a closer look tomorrow.

Edit: It would also be interesting to see if there was anything of note in the sentinel logs.
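
(A quick way to check for that spin from the outside, as a sketch: it assumes strace can attach to the sentinel, e.g. from a privileged debug container sharing its PID namespace.

timeout 10 strace -c -p 1 -e trace=recvfrom,epoll_pwait

A process stuck in this loop would show thousands of recvfrom calls, almost all failing with EAGAIN, in the 10-second summary.)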

@kfirfer

kfirfer commented Dec 1, 2022

@michael-grunder

Logs of the sentinel that is using 100% CPU:

$ kubectl logs redis-ha-server-2 sentinel                              
1:X 29 Nov 2022 19:38:35.509 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 29 Nov 2022 19:38:35.509 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 29 Nov 2022 19:38:35.509 # Configuration loaded
1:X 29 Nov 2022 19:38:35.510 * monotonic clock: POSIX clock_gettime
1:X 29 Nov 2022 19:38:35.511 * Running mode=sentinel, port=26379.
1:X 29 Nov 2022 19:38:35.511 # Sentinel ID is ce2a8c84e9e96e3cf079fafb2ad20bcde4a7b533
1:X 29 Nov 2022 19:38:35.511 # +monitor master mymaster 10.48.8.79 6379 quorum 2
1:X 29 Nov 2022 19:38:35.513 * +slave slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:35.521 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.522 * +slave slave 10.48.12.117:6379 10.48.12.117 6379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:35.527 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.854 * +sentinel sentinel a0ddf4367a72f8e768e2546287bf0b303985c19f 10.48.6.161 26379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:35.860 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.866 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:38:35.866 # +new-epoch 10
1:X 29 Nov 2022 19:38:36.637 * +sentinel sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:38:36.645 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:39:52.343 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:39:52.343 # +new-epoch 11
1:X 29 Nov 2022 19:39:53.281 # +config-update-from sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.8.79 6379
1:X 29 Nov 2022 19:39:53.281 # +switch-master mymaster 10.48.8.79 6379 10.48.12.117 6379
1:X 29 Nov 2022 19:39:53.337 * +slave slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:39:53.337 * +slave slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:39:53.343 * Sentinel new configuration saved on disk
1:X 29 Nov 2022 19:40:07.245 # +sdown sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:40:14.554 # +sdown slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:41:08.813 * +reboot slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:41:08.911 # -sdown slave 10.48.8.79:6379 10.48.8.79 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:41:11.977 # -sdown sentinel bc16ddc9a19ce8f3ff5ec8f31a2d6b94436af8a3 10.48.8.79 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:42:19.449 # +sdown slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:42:24.029 # +sdown sentinel a0ddf4367a72f8e768e2546287bf0b303985c19f 10.48.6.161 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:43:24.637 # -sdown sentinel a0ddf4367a72f8e768e2546287bf0b303985c19f 10.48.6.161 26379 @ mymaster 10.48.12.117 6379
1:X 29 Nov 2022 19:43:25.051 * +reboot slave 10.48.6.161:6379 10.48.6.161 6379 @ mymaster 10.48.12.117 6379

@yevhen-harmonizehr

yevhen-harmonizehr commented Dec 7, 2022

I can see the same lines with gdb:

#9  0x000056375716c6fc in beforeSleep (eventLoop=<optimized out>) at server.c:1648
#10 0x00005637571680c1 in aeApiPoll (tvp=<optimized out>, eventLoop=0x7f689822b1e0) at ae.c:113
#11 aeProcessEvents (eventLoop=eventLoop@entry=0x7f689822b1e0, flags=flags@entry=27) at ae.c:404
#12 0x0000563757168c5d in aeProcessEvents (flags=27, eventLoop=0x7f689822b1e0) at ae.c:496
#13 aeMain (eventLoop=0x7f689822b1e0) at ae.c:496
#14 0x000056375716450a in main (argc=<optimized out>, argv=0x7ffdab146bf8) at server.c:7075

and here are the sentinel logs:

k logs -f argo-cd-redis-ha-server-1 -c sentinel
1:X 06 Dec 2022 20:14:13.642 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 06 Dec 2022 20:14:13.642 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 06 Dec 2022 20:14:13.642 # Configuration loaded
1:X 06 Dec 2022 20:14:13.643 * monotonic clock: POSIX clock_gettime
1:X 06 Dec 2022 20:14:13.643 * Running mode=sentinel, port=26379.
1:X 06 Dec 2022 20:14:13.644 # Sentinel ID is c524e76c625cc68c94f25c99e20efce31df6f4c2
1:X 06 Dec 2022 20:14:13.644 # +monitor master argocd 172.20.173.199 6379 quorum 2
1:X 06 Dec 2022 20:14:13.646 * +slave slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:13.651 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:14:13.651 * +slave slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:13.656 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:14:13.939 * +sentinel sentinel 2f679f20284b54ed430b97819302f9dc0d48cb09 172.20.112.236 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:13.945 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:14:14.099 * +sentinel sentinel 4794d0beec6abdd90a3821f7d9bf66804f290941 172.20.173.199 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:14:14.104 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:18:07.337 # +sdown sentinel 2f679f20284b54ed430b97819302f9dc0d48cb09 172.20.112.236 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:18:07.798 # +sdown slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:19:04.549 * +reboot slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:19:04.593 # -sdown slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:19:05.762 # -sdown sentinel 2f679f20284b54ed430b97819302f9dc0d48cb09 172.20.112.236 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:26.004 # +sdown sentinel 4794d0beec6abdd90a3821f7d9bf66804f290941 172.20.173.199 26379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.601 # +sdown master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.659 # +odown master argocd 172.20.173.199 6379 #quorum 2/2
1:X 06 Dec 2022 20:24:28.659 # +new-epoch 1
1:X 06 Dec 2022 20:24:28.659 # +try-failover master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.763 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:24:28.764 # +vote-for-leader c524e76c625cc68c94f25c99e20efce31df6f4c2 1
1:X 06 Dec 2022 20:24:28.776 # 2f679f20284b54ed430b97819302f9dc0d48cb09 voted for c524e76c625cc68c94f25c99e20efce31df6f4c2 1
1:X 06 Dec 2022 20:24:28.822 # +elected-leader master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.822 # +failover-state-select-slave master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.884 # +selected-slave slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.884 * +failover-state-send-slaveof-noone slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:28.955 * +failover-state-wait-promotion slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:29.940 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:24:29.940 # +promoted-slave slave 172.20.150.209:6379 172.20.150.209 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:29.940 # +failover-state-reconf-slaves master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:29.993 * +slave-reconf-sent slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.420 * +slave-reconf-inprog slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.420 * +slave-reconf-done slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.505 # +failover-end master argocd 172.20.173.199 6379
1:X 06 Dec 2022 20:24:30.505 # +switch-master argocd 172.20.173.199 6379 172.20.150.209 6379
1:X 06 Dec 2022 20:24:30.505 * +slave slave 172.20.112.236:6379 172.20.112.236 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:24:30.505 * +slave slave 172.20.173.199:6379 172.20.173.199 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:24:30.511 * Sentinel new configuration saved on disk
1:X 06 Dec 2022 20:24:40.539 # +sdown slave 172.20.173.199:6379 172.20.173.199 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:25:45.791 # -sdown slave 172.20.173.199:6379 172.20.173.199 6379 @ argocd 172.20.150.209 6379
1:X 06 Dec 2022 20:25:49.641 # -sdown sentinel 4794d0beec6abdd90a3821f7d9bf66804f290941 172.20.173.199 26379 @ argocd 172.20.150.209 6379

@kfirfer

kfirfer commented Dec 18, 2022

Is there some workaround?

Edit:
Does it happen only on Docker? Maybe an incompatibility with the OS?

@silvpol

silvpol commented Mar 13, 2023

This also happens on GKE using containerd, so it is not Docker specific. We try to manage this by setting up a liveness check on Sentinel CPU usage so that it gets restarted automatically if it uses too much CPU.
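
A rough sketch of such a probe (this is not from the chart; the assumption that the sentinel runs as PID 1, the 90%-of-one-core threshold, and all timings are made up and need tuning): it samples the process's CPU ticks from /proc/1/stat twice and fails when too many ticks were burned in between.

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - |
        # utime+stime of PID 1, in clock ticks (usually 100 Hz); assumes the comm field has no spaces
        t1=$(awk '{print $14+$15}' /proc/1/stat)
        sleep 2
        t2=$(awk '{print $14+$15}' /proc/1/stat)
        # fail if more than ~90% of one core was used during the 2s window
        [ $((t2 - t1)) -lt 180 ]
  timeoutSeconds: 5
  periodSeconds: 60
  failureThreshold: 5

Whether the redis-ha chart exposes a hook to override the sentinel container's liveness probe is something to check in its values.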

@Zebradil

We observe this issue on GKE as well.

@davidln777

@oranagra - any update on this issue?
We keep getting it, and it's resolved after a pod restart.

@oranagra
Member

oranagra commented Jun 6, 2023

@michael-grunder can you please take a closer look?

@Sven1410

Sven1410 commented Nov 16, 2023

Is there any workaround or hint on how to prevent/solve this?
We have the same problem after upgrading redis-ha from 7.0.0 to 7.0.13 (inside of Argo CD).
(I think the problem was there before, but now it occurs on nearly every restart.)

What additionally confuses me:

kubectl top pods shows that the (sentinel) container runs at its limit (CPU 1005m),
but when I execute top inside the container the result is always a CPU usage of 12% (?)

Thanks in advance!

Edit:
It looks like the 12% CPU (busybox top command) inside the container reflects the capacity of the node
--> the limit is 1000m, which is 1/8 of the node's CPUs = 12%.
For a liveness check, the "load" would be a better indicator (also available via the "uptime" command).

@Sven1410

Maybe a Redis expert can see something here. This is the "info" command output of the "bad" sentinel container with 100% CPU consumption ("sentinel pending-scripts" returns nothing, failover works fine):

redis-cli -p 26379
info

# Server
redis_version:7.0.11
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:183dd15edc8d4d6a
redis_mode:sentinel
os:Linux 5.15.135-gardenlinux-cloud-amd64 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:12.2.1
process_id:1
process_supervised:no
run_id:8986c630a7fd774f791223f327bea3154f794675
tcp_port:26379
server_time_usec:1701163193604078
uptime_in_seconds:66249
uptime_in_days:0
hz:13
configured_hz:10
lru_clock:6664377
executable:/data/redis-sentinel
config_file:/data/conf/sentinel.conf
io_threads_active:0

# Clients
connected_clients:6
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Stats
total_connections_received:340692
total_commands_processed:863856
instantaneous_ops_per_sec:8
total_net_input_bytes:27922297
total_net_output_bytes:23261548
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.29
instantaneous_output_kbps:0.22
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:574
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:198318
dump_payload_sanitizations:0
total_reads_processed:1345332
total_writes_processed:1057540
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:132427
reply_buffer_expands:6

# CPU
used_cpu_sys:40470.884147
used_cpu_user:25442.201681
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:40470.870627
used_cpu_user_main_thread:25442.177889

# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=argocd,status=ok,address=100.xx.xxx.38:6379,slaves=2,sentinels=3

sentinel pending-scripts
<--empty line returned-->
sentinel CKQUORUM argocd
OK 3 usable Sentinels. Quorum and failover authorization can be reached

@Sven1410
Copy link

Sven1410 commented Dec 1, 2023

I found a workaround (for docker/k8s):

I played a bit with the sentinel CLI and found out that a "sentinel reset *" always solves the problem and the CPU consumption drops back down to normal.
https://lzone.de/cheat-sheet/Redis%20Sentinel
https://redis.io/docs/management/sentinel/#sentinel-commands

So far the problem has only occurred after a rolling update or container restarts.
So I added a delayed reset to all the sentinel containers via helm chart values:

argo-cd:
  redis-ha:
    sentinel:
      lifecycle: 
        postStart:
          exec:
            command: ["/bin/sh", "-c", "sleep 30; redis-cli -p 26379 sentinel reset argocd "]

@jortkoopmans
Copy link

Thank you for the proposed workaround @Sven1410. Did the issue return for you in the meantime over the last few days (as it is intermittent)?
What I've done based on your post:

  • Ran the manual sentinel reset command. This indeed worked for me to bring the utilization back to normal.
  • Implemented the lifecycle hook, but although I used a 60s sleep in the command, I had at least one occasion where the hook did not prevent the utilization spike. It is possible that timing matters (30s vs 60s, in relation to the entrypoint or other containers in the pod); alternatively the hook may only reduce the likelihood of hitting this condition (which is probably a race condition?) rather than prevent it. More/longer testing should give clearer results, specifically on this workaround. I'll report back if I obtain robust conclusions.

A root cause analysis is (still) preferred though 😊 .

@Sven1410
Copy link

Sven1410 commented Dec 5, 2023

So far the problem has not come back.
I would also think a longer delay would be better, but if the delay is too long our alert fires first.
Another option is to do this reset periodically (once a day maybe?).
My versions: (redis-ha: 7.0.13-alpine, haproxy: 2.8.3-alpine)
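A minimal sketch of such a periodic reset as a Kubernetes CronJob; the announce-service host names, the master name "argocd", the schedule, and the image tag are assumptions based on a typical redis-ha release and would need adjusting:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sentinel-periodic-reset
spec:
  schedule: "0 4 * * *"              # once a day
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: reset
              image: redis:7.0.13-alpine
              command:
                - /bin/sh
                - -c
                - |
                  # reset every sentinel individually; the per-pod announce
                  # service names below are assumptions for illustration
                  for host in redis-ha-announce-0 redis-ha-announce-1 redis-ha-announce-2; do
                    redis-cli -h "$host" -p 26379 sentinel reset argocd
                  done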

Thanks for testing + sharing the results!

@jortkoopmans
Copy link

Unfortunately, the workaround does not (consistently) work on my deployments. I have 2 sets of versions and configurations, and I still observe the CPU drain, at least on the 6.2.5 version.

  • redis 7.0.7-alpine, ha-proxy 2.6.4
  • redis 6.2.5-alpine, no ha-proxy

Note that I do not observe the sentinel pending-scripts line, described here

@Sven1410
Copy link

"sentinel pending-scripts" was already the next command I typed in - it returns nothing, only an empty line...

@jortkoopmans
Copy link

"sentinel pending-scripts" was already the next command I typed in - it returns nothing, only an empty line...

Ah OK my bad, should have checked the sentinel cli directives. I'll verify this the next time I observe the issue.
In normal conditions I get this response:

(empty array)

@silvpol
Copy link

silvpol commented Dec 14, 2023

We have experienced this in prod today and sentinel pending-scripts did return an empty array.

@jortkoopmans
Copy link

Following up after observing this for a while (on a number of independent redis-ha deployments), findings:

  • The workaround to reset the sentinel after pod start (on k8s) does not work in my case. I don't have long-term statistics on the bug's likelihood for specific configuration combinations, but in all cases it does not alleviate the issue enough to run this unattended in production (i.e. other mitigation is required).
  • sentinel pending-scripts still returns an empty array (same as in normal conditions, identical to the finding by @silvpol ).

Now I'll stop posting behavior patterns for this issue, in the hope of a root cause analysis. 🙏

@Sven1410
Copy link

After debugging with gdb I got exactly the same stack traces as others posted here.
I think it must be this epoll_wait() call, but the timeout (-1) looks correct.
If you google "epoll_wait" "100% cpu" you will find dozens of nearly identical problems.
Maybe we have a similar problem to the one described here: joyent/libuv#1099
The fact that we restart the redis process(es) on the same (Linux) node and reuse the same sockets - especially the "announce" sockets - could also be relevant.

Is there an option to fall back (for Linux docker images) to "ae_kqueue.c" (#define HAVE_EPOLL 0)?

@posh-raj-kuni
Copy link

We are seeing the same issue in our environment. Our environment is a GKE cluster running on k8s version 1.26.

The perplexing thing is, this CPU issue is only happening in one namespace in the cluster. If we try deploying redis-sentinel in another namespace, we are not seeing the issue!

When I read that it may be a networking issue, I thought the network policies in the namespace where the CPU ramp-up was observed might be to blame, so I added a couple of policies allowing all egress/ingress traffic. But that didn't resolve the issue either.

So still stumped as to what's causing this issue and why it's only happening in one namespace!

@posh-raj-kuni
Copy link

posh-raj-kuni commented Jan 30, 2024

We figured out what the issue was on our end.

When I monitored the processes on the redis and sentinel containers, I noticed that the processes that get created for the health checks (liveness, readiness probes) were taking up a lot of CPU (around 40% - 50%). This didn't seem right. In the namespaces where the overall CPU usage was low, these health check processes took up a negligible amount of CPU.

One key difference between the two namespaces was that the pod in the namespace where we saw high CPU usage had a very large number of environment variables set by k8s service links.

When we disabled the service links, we no longer observed the high CPU usage! It looks like the large number of environment variables causes the liveness/readiness probes to take up a lot of CPU.
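A minimal sketch of that change, assuming you can edit the pod template of the redis-ha workload directly (or set an equivalent chart value, if the chart exposes one):

spec:
  template:
    spec:
      # stop injecting one set of *_SERVICE_* env vars per Service in the namespace
      enableServiceLinks: false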

But then again, I'm not sure the high number of environment variables is the root cause of the CPU usage, because we have other Redis Bitnami deployments running in standalone mode where the same health checks are run, and there we don't see the health checks using up CPU like in the redis-sentinel pods.

Perhaps a combination of having a large number of environment variables and multiple containers in a pod is causing the health check processes to take up a lot of CPU?

I also found an article discussing how exec probes can result in large CPU usage, but I'm not sure if it is related to the health check problems in redis-sentinel.

And another GitHub thread discussing the same high CPU issue for exec probes.
