KeyDB crashes when running as slave using FLASH #701

Open
marcelogaio-groovinads opened this issue Jul 22, 2023 · 1 comment

@marcelogaio-groovinads

Crash report

Paste the complete crash log between the quotes below. Please include a few lines from the log preceding the crash report to provide some context.

[log_before_crash.txt](https://github.com/Snapchat/KeyDB/files/12137023/log_before_crash.txt)

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
251825:228978:C 22 Jul 2023 15:18:09.357 # KeyDB 6.3.3 crashed by signal: 6, si_code: -6
251825:228978:C 22 Jul 2023 15:18:09.357 # Killed by PID: 251825, UID: 115
251825:228978:C 22 Jul 2023 15:18:09.357 # Crashed running the instruction at: 0x7f169c306a7c

------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f169c306a7c]

Backtrace:
/lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f169c2b2520]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f169c306a7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f169c2b2476]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7f169c2987f3]
/lib/x86_64-linux-gnu/libc.so.6(+0x2871b) [0x7f169c29871b]
/lib/x86_64-linux-gnu/libc.so.6(+0x39e96) [0x7f169c2a9e96]
keydb-rdb-bgsave *:6389(RocksDBStorageProvider::enumerate(std::function<bool (char const*, unsigned long, void const*, unsigned long)>) const+0x204) [0x55bb4f959614]
keydb-rdb-bgsave *:6389(redisDbPersistentDataSnapshot::iterate_threadsafe_core(std::function<bool (char const*, robj_roptr)>&, bool, bool, bool) const+0x1bd) [0x55bb4f95a81d]
keydb-rdb-bgsave *:6389(rdbSaveRio(_rio*, redisDbPersistentDataSnapshot const**, int*, int, rdbSaveInfo*)+0x2b0) [0x55bb4f88fb90]
keydb-rdb-bgsave *:6389(rdbSaveFile(char*, redisDbPersistentDataSnapshot const**, rdbSaveInfo*)+0x115) [0x55bb4f8900c5]
keydb-rdb-bgsave *:6389(rdbSave(redisDbPersistentDataSnapshot const**, rdbSaveInfo*)+0x72) [0x55bb4f890772]
keydb-rdb-bgsave *:6389(rdbSaveBackgroundFork(rdbSaveInfo*)+0x122) [0x55bb4f891052]
keydb-rdb-bgsave *:6389(rdbSaveBackground(rdbSaveInfo*)+0xd8) [0x55bb4f891708]
keydb-rdb-bgsave *:6389(serverCron(aeEventLoop*, long long, void*)+0xd52) [0x55bb4f8326d2]
keydb-rdb-bgsave *:6389(aeProcessEvents+0x22c) [0x55bb4f82947c]
keydb-rdb-bgsave *:6389(aeMain+0x3e) [0x55bb4f829bde]
keydb-rdb-bgsave *:6389(workerThreadMain(void*)+0x1b0) [0x55bb4f841770]
/lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f169c304b43]
/lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f169c396a00]

------ REGISTERS ------
251825:228978:C 22 Jul 2023 15:18:09.359 #
RAX:0000000000000000 RBX:00007f168abff640
RCX:00007f169c306a7c RDX:0000000000000006
RDI:000000000003d7b1 RSI:000000000003d7b1
RBP:000000000003d7b1 RSP:00007f168abfb180
R8 :00007f168abfb250 R9 :00007f0a40000000
R10:0000000000000008 R11:0000000000000246
R12:0000000000000006 R13:0000000000000016
R14:000055bb4fc50807 R15:0000000006066f97
RIP:00007f169c306a7c EFL:0000000000000246
CSGSFS:002b000000000033
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb18f) -> 6fa3a1d4f5e65f00
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb18e) -> ffffffffffffffff
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb18d) -> 6fa3a1d4f5e65f00
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb18c) -> 00007f1687e0c090
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb18b) -> 0000000000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb18a) -> 00007f168abf0000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb189) -> 0000000106066f97
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb188) -> 0000008000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb187) -> 0000000000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb186) -> 0000000000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb185) -> 0000000000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb184) -> 0000000000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb183) -> 0000000000000000
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb182) -> 00007f076f713dac
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb181) -> 00007f076f713c80
251825:228978:C 22 Jul 2023 15:18:09.359 # (00007f168abfb180) -> 00007f076f713dac

------ INFO OUTPUT ------
# Server
redis_version:6.3.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:e067ff01e34480a9
redis_mode:standalone
os:Linux 5.15.0-76-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:11.2.0
process_id:251825
process_supervised:no
run_id:5dd168cf464d19c26ae452e1a0dfccc9a86a7375
tcp_port:6389
server_time_usec:1690049448863607
uptime_in_seconds:8020
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12327848
executable:/usr/bin/keydb-server
config_file:/etc/keydb/keydb.conf

# Clients
connected_clients:1
cluster_connections:0
maxclients:50000
client_recent_max_input_buffer:137288
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:0
thread_0_clients:1
thread_1_clients:0
thread_2_clients:0
thread_3_clients:0

# Memory
used_memory:32886766000
used_memory_human:30.63G
used_memory_rss:36613906432
used_memory_rss_human:34.10G
used_memory_peak:32907707880
used_memory_peak_human:30.65G
used_memory_peak_perc:99.94%
used_memory_overhead:9578959276
used_memory_startup:16098000
used_memory_dataset:23307806724
used_memory_dataset_perc:70.91%
allocator_allocated:32889824936
allocator_active:34910052352
allocator_resident:35443187712
total_system_memory:134805303296
total_system_memory_human:125.55G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:61275137861
maxmemory_human:57.07G
maxmemory_policy:allkeys-lru
allocator_frag_ratio:1.06
allocator_frag_bytes:2020227416
allocator_rss_ratio:1.02
allocator_rss_bytes:533135360
rss_overhead_ratio:1.03
rss_overhead_bytes:1170718720
mem_fragmentation_ratio:1.11
mem_fragmentation_bytes:3727163280
mem_not_counted_for_evict:0
mem_replication_backlog:1048576
mem_clients_slaves:0
mem_clients_normal:146724
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:flash
flash_memory:44462146058

# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:19894494
rdb_bgsave_in_progress:0
rdb_last_save_time:1690045112
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:393
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:20000768
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:0
total_commands_processed:19898807
instantaneous_ops_per_sec:4821
total_net_input_bytes:79015931833
total_net_output_bytes:65949
instantaneous_input_kbps:6046.88
instantaneous_output_kbps:0.05
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1172457
total_forks:9
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:4063863
total_writes_processed:1473
instantaneous_lock_contention:1
avg_lock_contention:0.656250
storage_provider_read_hits:2110710
storage_provider_read_misses:4126368

# Replication
role:slave
master_global_link_status:up
connected_masters:1
master_host:lan_dmm01
master_port:6389
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:20024792990
slave_repl_offset:20024732295
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:5daa97bc81d3358e75495604895c6d5faf8bc77d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:20024732295
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:20023683720
repl_backlog_histlen:1048576

# CPU
used_cpu_sys:28.563275
used_cpu_user:404.256802
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:4
long_lock_waits:46424
used_cpu_sys_main_thread:28.563275
used_cpu_user_main_thread:404.256802

# Modules

# Commandstats
cmdstat_ping:calls=263,usec=100,usec_per_call=0.38,rejected_calls=0,failed_calls=0
cmdstat_set:calls=15946676,usec=116154118,usec_per_call=7.28,rejected_calls=0,failed_calls=0
cmdstat_del:calls=3951866,usec=242492661,usec_per_call=61.36,rejected_calls=0,failed_calls=0
cmdstat_select:calls=2,usec=3,usec_per_call=1.50,rejected_calls=0,failed_calls=0

# Errorstats

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=101085079,expires=217344705,avg_ttl=518848137,cached_keys=2766794

# KeyDB
mvcc_depth:0

------ CLIENT LIST OUTPUT ------
id=9 addr=192.168.0.80:6389 laddr=192.168.0.81:42900 fd=111 name= age=1521 idle=0 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=65530 argv-mem=60668 obl=0 oll=0 omem=0 tot-mem=146724 events=r cmd=set user=(superuser) redir=-1

------ MODULES INFO OUTPUT ------

------ DUMPING CODE AROUND EIP ------
Symbol: pthread_kill (base: 0x7f169c306950)
Module: /lib/x86_64-linux-gnu/libc.so.6 (base 0x7f169c270000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x7f169c306950 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
251825:228978:C 22 Jul 2023 15:18:09.359 # dump of function (hexdump of 428 bytes):
f30f1efa41568d56e0415541bd16000000415455534881ec9000000064488b042528000000488984248800000031c083fa01767e4889fb4189f464483b3c25100000000f84c70000004989e641ba0800000031ffb80e0000004c89f2488d35adc013000f0531c0488dab74090000ba01000000f00fb155000f85ca00000080bb730900000074594531ed31d287937409000083fa010f8fbd00000041ba0800000031d24c89f6bf02000000b80e0000000f05488b84248800000064482b0425280000000f859c0000004881c4900000004489e85b5d415c415d415ec30f1f4000448babd0020000e8745605004489e289c74489eeb8ea0000000f053d00f0ffff76854189c541f7ddeb80660f1f440000b8ba0000000f0589c5e8425605004489e289ee89c7b8ea0000000f054189c541f7dd3d00f0ffffb800000000440f46e8e96dffffff0f1f004889efe810a8ffffe929ffffff0f1f004889efe8d0a8ffffe936ffffffe866fc0900660f1f440000f30f1efac3662e0f1f840000000000904157415641554154554889fd534883ec2864488b042528000000488944241831c0648b0425d0020000894424

=== KEYDB BUG REPORT END. Make sure to include from START to END. ===

Additional information

  1. OS distribution and version
    KeyDB 6.3.3 (00000000/0) 64 bit
    Running FLASH on hardware RAID 10 (new Intel SSDs)
    Ubuntu 22.04
    CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz (2 CPUs, 24 cores / 48 threads)
    128 GB RAM

  2. Steps to reproduce (if any)
    This is a slave node.
    We started it and it replicated from the master (around 60 GB on SSD storage).
    Replication completed OK.
    A few minutes later, the slave crashes.
    dmesg also shows a segfault:
    [74716.219476] keydb-server[226968]: segfault at 7f6e61bfb910 ip 00007f6e746c5c5d sp 00007f6e623f6510 error 4 in libc.so.6[7f6e7465b000+195000]
    [74716.219488] Code: 08 5b 5d c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 55 48 81 ec a0 00 00 00 64 48 8b 04 25 28 00 00 00 48 89 84 24 98 00 00 00 <8b> 87 d0 02 00 00 85 c0 75 29 48 8b 84 24 98 00 00 00 64 48 2b 04
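
For reference, the dmesg segfault line can be decoded mechanically. The sketch below assumes the usual x86-64 Linux format, where the `error` field is the hardware page-fault error code printed in hex (bit 0: page was present, bit 1: write access, bit 2: user mode). The name `parse_segfault_line` is made up for this illustration, and the computed offset is relative to the start of the mapping shown in brackets, which is not necessarily the load base of libc.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error N in MODULE[BASE+SIZE]" line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) in (?P<module>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault_line(line):
    """Return the decoded fields of a dmesg segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    error = int(m.group("error"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "module": m.group("module"),
        # Offset of the faulting instruction inside the bracketed mapping.
        "offset_in_mapping": ip - base,
        # x86 page-fault error code bits (assumed layout, see lead-in).
        "page_present": bool(error & 0x1),
        "write_access": bool(error & 0x2),
        "user_mode": bool(error & 0x4),
    }


line = (
    "[74716.219476] keydb-server[226968]: segfault at 7f6e61bfb910 "
    "ip 00007f6e746c5c5d sp 00007f6e623f6510 error 4 in "
    "libc.so.6[7f6e7465b000+195000]"
)
info = parse_segfault_line(line)
print(info["module"], hex(info["offset_in_mapping"]))
```

Under that assumption, `error 4` in this report decodes as a user-mode read of an address with no mapped page.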

@msotheeswaran-sc
Collaborator

Hi @marcelogaio-groovinads, the crash report you shared is from the bgsave child process. It looks like it may have been killed because the parent process crashed. Can you provide logs from the actual keydb-server process, as well as your config file?
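
One way to check which process wrote a crash report is to look at the prefix of each backtrace frame, which in this report appears to carry the process title (`keydb-rdb-bgsave *:6389`). A minimal sketch, assuming frames have the form `<title>(<symbol>+<offset>) [<address>]`; `frame_titles` is a hypothetical helper, not part of KeyDB:

```python
import re
from collections import Counter

# <title>(<symbol+offset>) [0x<address>]; the title is everything before the first "(".
FRAME_RE = re.compile(r"^(?P<title>[^()]+)\((?P<symbol>.*)\) \[0x[0-9a-f]+\]$")


def frame_titles(backtrace_lines):
    """Count backtrace frames per module or process title."""
    titles = Counter()
    for line in backtrace_lines:
        m = FRAME_RE.match(line.strip())
        if m:
            titles[m.group("title").strip()] += 1
    return titles


# Three frames copied from the backtrace in this issue.
sample = [
    "/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7f169c2987f3]",
    "keydb-rdb-bgsave *:6389(rdbSaveBackground(rdbSaveInfo*)+0xd8) [0x55bb4f891708]",
    "keydb-rdb-bgsave *:6389(aeMain+0x3e) [0x55bb4f829bde]",
]
titles = frame_titles(sample)
print(titles.most_common())
```

Frames titled `keydb-rdb-bgsave` rather than `keydb-server` suggest the report was written by the background-save process, which is why logs from the main server process are still needed.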
