
[CRASH] 7.0.10 memcpy sigabrt due to buffer overflow in replication.c line 367 #11965

Closed
darix opened this issue Mar 24, 2023 · 41 comments · Fixed by #11982

Comments


darix commented Mar 24, 2023

This code was changed between 7.0.8 and 7.0.10:

git diff -w 7.0.8..7.0.10 -- src/replication.c
diff --git a/src/replication.c b/src/replication.c
index 2c7755512..e86aa4b59 100644
--- a/src/replication.c
+++ b/src/replication.c
@@ -324,9 +324,8 @@ void feedReplicationBuffer(char *s, size_t len) {
     static long long repl_block_id = 0;
 
     if (server.repl_backlog == NULL) return;
-    server.master_repl_offset += len;
-    server.repl_backlog->histlen += len;
 
+    while(len > 0) {
         size_t start_pos = 0; /* The position of referenced block to start sending. */
         listNode *start_node = NULL; /* Replica/backlog starts referenced node. */
         int add_new_block = 0; /* Create new block if current block is total used. */
@@ -345,20 +344,27 @@ void feedReplicationBuffer(char *s, size_t len) {
             tail->used += copy;
             s += copy;
             len -= copy;
+            server.master_repl_offset += copy;
+            server.repl_backlog->histlen += copy;
         }
         if (len) {
             /* Create a new node, make sure it is allocated to at
              * least PROTO_REPLY_CHUNK_BYTES */
             size_t usable_size;
-        size_t size = (len < PROTO_REPLY_CHUNK_BYTES) ? PROTO_REPLY_CHUNK_BYTES : len;
+            /* Avoid creating nodes smaller than PROTO_REPLY_CHUNK_BYTES, so that we can append more data into them,
+             * and also avoid creating nodes bigger than repl_backlog_size / 16, so that we won't have huge nodes that can't
+             * trim when we only still need to hold a small portion from them. */
+            size_t limit = max((size_t)server.repl_backlog_size / 16, (size_t)PROTO_REPLY_CHUNK_BYTES);
+            size_t size = min(max(len, (size_t)PROTO_REPLY_CHUNK_BYTES), limit);
             tail = zmalloc_usable(size + sizeof(replBufBlock), &usable_size);
             /* Take over the allocation's internal fragmentation */
             tail->size = usable_size - sizeof(replBufBlock);
-        tail->used = len;
+            size_t copy = (tail->size >= len) ? len : tail->size;
+            tail->used = copy;
             tail->refcount = 0;
-        tail->repl_offset = server.master_repl_offset - tail->used + 1;
+            tail->repl_offset = server.master_repl_offset + 1;
             tail->id = repl_block_id++;
-        memcpy(tail->buf, s, len);
+            memcpy(tail->buf, s, copy);
             listAddNodeTail(server.repl_buffer_blocks, tail);
             /* We also count the list node memory into replication buffer memory. */
             server.repl_buffer_mem += (usable_size + sizeof(listNode));
@@ -367,6 +373,10 @@ void feedReplicationBuffer(char *s, size_t len) {
                 start_node = listLast(server.repl_buffer_blocks);
                 start_pos = 0;
             }
+            s += copy;
+            len -= copy;
+            server.master_repl_offset += copy;
+            server.repl_backlog->histlen += copy;
         }
 
         /* For output buffer of replicas. */
@@ -408,6 +418,7 @@ void feedReplicationBuffer(char *s, size_t len) {
          * in freeMemoryGetNotCountedMemory for details. */
         incrementalTrimReplicationBacklog(REPL_BACKLOG_TRIM_BLOCKS_PER_CALL);
     }
+}
 
 /* Propagate write commands to replication stream.
  *

It seems the changed buffer size calculation leads to:

(gdb) bt
#0  0x00007fe91f64367b in __GI_kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005568bf6c67d3 in bugReportEnd (killViaSignal=<optimized out>, sig=6) at /usr/src/debug/redis-7.0.10/src/debug.c:2114
#2  <signal handler called>
#3  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#4  0x00007fe91f694e03 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#5  0x00007fe91f643356 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#6  0x00007fe91f62b897 in __GI_abort () at abort.c:79
#7  0x00007fe91f62c611 in __libc_message (fmt=fmt@entry=0x7fe91f7b02e9 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:150
#8  0x00007fe91f72ae9b in __GI___fortify_fail (msg=msg@entry=0x7fe91f7b028f "buffer overflow detected") at fortify_fail.c:24
#9  0x00007fe91f729316 in __GI___chk_fail () at chk_fail.c:28
#10 0x00005568bf68aa79 in memcpy (__len=72100, __src=0x7fe9199f21fd, __dest=0x7fe919a28ca8) at /usr/include/bits/string_fortified.h:29
#11 feedReplicationBuffer (s=<optimized out>, s@entry=0x7fe9199ef389 "[snip]"..., len=<optimized out>)
    at /usr/src/debug/redis-7.0.10/src/replication.c:367
#12 0x00005568bf68ae32 in feedReplicationBuffer (len=<optimized out>, s=0x7fe9199ef389 "[snip]"...)
    at /usr/src/debug/redis-7.0.10/src/replication.c:326
#13 feedReplicationBufferWithObject (o=<optimized out>) at /usr/src/debug/redis-7.0.10/src/replication.c:242
#14 0x00005568bf68b08c in replicationFeedSlaves (slaves=<optimized out>, dictid=<optimized out>, argv=<optimized out>, argc=<optimized out>) at /usr/src/debug/redis-7.0.10/src/replication.c:501
#15 0x00005568bf65714e in propagatePendingCommands () at /usr/src/debug/redis-7.0.10/src/server.c:3268
#16 0x00005568bf6578e7 in afterCommand (c=<optimized out>) at /usr/src/debug/redis-7.0.10/src/server.c:3597
#17 afterCommand (c=0x7fe91ae2ad00) at /usr/src/debug/redis-7.0.10/src/server.c:3590
#18 call (c=c@entry=0x7fe91ae2ad00, flags=<optimized out>, flags@entry=15) at /usr/src/debug/redis-7.0.10/src/server.c:3539
#19 0x00005568bf6584ad in processCommand (c=0x7fe91ae2ad00) at /usr/src/debug/redis-7.0.10/src/server.c:4018
#20 0x00005568bf671eeb in processCommandAndResetClient (c=0x7fe91ae2ad00) at /usr/src/debug/redis-7.0.10/src/networking.c:2472
#21 processInputBuffer (c=c@entry=0x7fe91ae2ad00) at /usr/src/debug/redis-7.0.10/src/networking.c:2576
#22 0x00005568bf6723a8 in readQueryFromClient (conn=<optimized out>) at /usr/src/debug/redis-7.0.10/src/networking.c:2712
#23 0x00005568bf743d53 in callHandler (handler=<optimized out>, conn=0x7fe9198160c0) at /usr/src/debug/redis-7.0.10/src/connhelpers.h:79
#24 connSocketEventHandler (el=<optimized out>, fd=<optimized out>, clientData=0x7fe9198160c0, mask=<optimized out>) at /usr/src/debug/redis-7.0.10/src/connection.c:310
#25 0x00005568bf64e148 in aeProcessEvents (eventLoop=eventLoop@entry=0x7fe91f02b0f0, flags=flags@entry=27) at /usr/src/debug/redis-7.0.10/src/ae.c:436
#26 0x00005568bf64e52d in aeProcessEvents (flags=27, eventLoop=0x7fe91f02b0f0) at /usr/src/debug/redis-7.0.10/src/ae.c:362
#27 aeMain (eventLoop=0x7fe91f02b0f0) at /usr/src/debug/redis-7.0.10/src/ae.c:496
#28 0x00005568bf646662 in main (argc=2, argv=<optimized out>) at /usr/src/debug/redis-7.0.10/src/server.c:7167

Additional information

  1. openSUSE Tumbleweed

darix commented Mar 24, 2023

FWIW: the machine doesn't even do replication; it is a standalone Redis server.


sundb commented Mar 24, 2023

@darix Thanks, could you provide the full crash logs?


darix commented Mar 24, 2023

Which part are you missing? I'm not sure the parts from the Redis log are helpful here; the SIGABRT because memcpy would write beyond the allocated buffer is pretty obvious.


sundb commented Mar 24, 2023

Like:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
...
------ FAST MEMORY TEST ------
...


darix commented Mar 24, 2023

------ FAST MEMORY TEST ------
20863:M 24 Mar 2023 03:36:14.932 # Bio thread for job type #0 terminated
20863:M 24 Mar 2023 03:36:14.932 # Bio thread for job type #1 terminated
20863:M 24 Mar 2023 03:36:14.932 # Bio thread for job type #2 terminated
*** Preparing to test memory region 5568bf8f3000 (2306048 bytes)
*** Preparing to test memory region 5568c0c93000 (270336 bytes)
*** Preparing to test memory region 7fe918efc000 (63963136 bytes)
*** Preparing to test memory region 7fe91cbfd000 (8388608 bytes)
*** Preparing to test memory region 7fe91d3fe000 (8388608 bytes)
*** Preparing to test memory region 7fe91dbff000 (8388608 bytes)
*** Preparing to test memory region 7fe91e400000 (8388608 bytes)
*** Preparing to test memory region 7fe91ec00000 (8388608 bytes)
*** Preparing to test memory region 7fe91f604000 (4096 bytes)
*** Preparing to test memory region 7fe91f7f2000 (57344 bytes)
*** Preparing to test memory region 7fe91fc25000 (12288 bytes)
*** Preparing to test memory region 7fe91fc34000 (24576 bytes)
*** Preparing to test memory region 7fe91fc78000 (8192 bytes)
*** Preparing to test memory region 7fe91fd72000 (8192 bytes)
*** Preparing to test memory region 7fe91feeb000 (4096 bytes)
*** Preparing to test memory region 7fe91ffd8000 (12288 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.


sundb commented Mar 24, 2023

@darix Are there any logs after "REDIS BUG REPORT START: Cut & paste starting from here"?


darix commented Mar 24, 2023

Yes, but as I said, I'm not sure any more relevant info is in there.

=== REDIS BUG REPORT START: Cut & paste starting from here ===
20863:M 24 Mar 2023 03:36:14.929 # Redis 7.0.10 crashed by signal: 6, si_code: -6
20863:M 24 Mar 2023 03:36:14.929 # Crashed running the instruction at: 0x7fe91f694d7c

------ STACK TRACE ------
EIP:
/lib64/libc.so.6(+0x8fd7c)[0x7fe91f694d7c]

Backtrace:
/lib64/libc.so.6(+0x3e420)[0x7fe91f643420]
/lib64/libc.so.6(+0x8fd7c)[0x7fe91f694d7c]
/lib64/libc.so.6(gsignal+0x18)[0x7fe91f643356]
/lib64/libc.so.6(abort+0xd9)[0x7fe91f62b897]
/lib64/libc.so.6(+0x27611)[0x7fe91f62c611]
/lib64/libc.so.6(+0x125e9b)[0x7fe91f72ae9b]
/lib64/libc.so.6(+0x124316)[0x7fe91f729316]
/usr/sbin/redis-server 127.0.0.1:6379(+0xbaa79)[0x5568bf68aa79]
/usr/sbin/redis-server 127.0.0.1:6379(feedReplicationBufferWithObject+0xa2)[0x5568bf68ae32]
/usr/sbin/redis-server 127.0.0.1:6379(replicationFeedSlaves+0x19c)[0x5568bf68b08c]
/usr/sbin/redis-server 127.0.0.1:6379(propagatePendingCommands+0xfe)[0x5568bf65714e]
/usr/sbin/redis-server 127.0.0.1:6379(call+0x6a7)[0x5568bf6578e7]
/usr/sbin/redis-server 127.0.0.1:6379(processCommand+0x81d)[0x5568bf6584ad]
/usr/sbin/redis-server 127.0.0.1:6379(processInputBuffer+0xeb)[0x5568bf671eeb]
/usr/sbin/redis-server 127.0.0.1:6379(readQueryFromClient+0x2d8)[0x5568bf6723a8]
/usr/sbin/redis-server 127.0.0.1:6379(+0x173d53)[0x5568bf743d53]
/usr/sbin/redis-server 127.0.0.1:6379(+0x7e148)[0x5568bf64e148]
/usr/sbin/redis-server 127.0.0.1:6379(aeMain+0x1d)[0x5568bf64e52d]
/usr/sbin/redis-server 127.0.0.1:6379(main+0x352)[0x5568bf646662]
/lib64/libc.so.6(+0x27bb0)[0x7fe91f62cbb0]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7fe91f62cc79]
/usr/sbin/redis-server 127.0.0.1:6379(_start+0x25)[0x5568bf646d35]

------ REGISTERS ------
20863:M 24 Mar 2023 03:36:14.931 # 
RAX:0000000000000000 RBX:000000000000517f
RCX:00007fe91f694d7c RDX:0000000000000006
RDI:000000000000517f RSI:000000000000517f
RBP:00007fe91fc36040 RSP:00007ffe7b3ea9d0
R8 :00000000ffffffff R9 :0000000000000000
R10:0000000000000008 R11:0000000000000246
R12:00007fe91ffd8000 R13:0000000000000006
R14:0000000000001000 R15:00007fe91f7b02ef
RIP:00007fe91f694d7c EFL:0000000000000246
CSGSFS:002b000000000033
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9df) -> 00005568bf7da0b9
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9de) -> 00007ffe7b3eab67
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9dd) -> 0000000000000030
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9dc) -> 0000000000000000
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9db) -> 0000000000000040
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9da) -> 0000000000000020
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d9) -> 00007fe91f62b897
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d8) -> 00007fe91f7f2430
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d7) -> 00007fe91f643356
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d6) -> 0000000000001000
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d5) -> 00007ffe7b3eab10
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d4) -> 00007fe91ffd8000
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d3) -> 00007fe91fc36040
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d2) -> 0000000000000006
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d1) -> 59c14cc4e9038d00
20863:M 24 Mar 2023 03:36:14.931 # (00007ffe7b3ea9d0) -> 00007fe91f2032e8

------ INFO OUTPUT ------
# Server
redis_version:7.0.10
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:ce21150194f99eee
redis_mode:standalone
os:Linux 6.2.6-1-kvmsmall x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:13.0.1
process_id:20863
process_supervised:systemd
run_id:df21e7e25b0f43cc9775520c2e446f315d348137
tcp_port:6379
server_time_usec:1679628974929758
uptime_in_seconds:0
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:1907374
executable:/usr/sbin/redis-server
config_file:/etc/redis/[snip].conf
io_threads_active:0

# Clients
connected_clients:9
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:64452528
used_memory_human:61.47M
used_memory_rss:75104256
used_memory_rss_human:71.62M
used_memory_peak:64452528
used_memory_peak_human:61.47M
used_memory_peak_perc:100.13%
used_memory_overhead:2935188
used_memory_startup:862328
used_memory_dataset:61517340
used_memory_dataset_perc:96.74%
allocator_allocated:64412696
allocator_active:64897024
allocator_resident:69849088
total_system_memory:25194160128
total_system_memory_human:23.46G
used_memory_lua:33792
used_memory_vm_eval:33792
used_memory_lua_human:33.00K
used_memory_scripts_eval:232
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:66560
used_memory_vm_total_human:65.00K
used_memory_functions:184
used_memory_scripts:416
used_memory_scripts_human:416B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.01
allocator_frag_bytes:484328
allocator_rss_ratio:1.08
allocator_rss_bytes:4952064
rss_overhead_ratio:1.08
rss_overhead_bytes:5255168
mem_fragmentation_ratio:1.17
mem_fragmentation_bytes:10964992
mem_not_counted_for_evict:0
mem_replication_backlog:20508
mem_total_replication_buffers:20504
mem_clients_slaves:0
mem_clients_normal:191496
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:25
rdb_bgsave_in_progress:0
rdb_last_save_time:1679628974
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:13
rdb_last_load_keys_loaded:30886
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:9
total_commands_processed:64
instantaneous_ops_per_sec:34
total_net_input_bytes:95420
total_net_output_bytes:246502
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:5.76
instantaneous_output_kbps:150.31
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:39
keyspace_misses:48
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:3
dump_payload_sanitizations:0
total_reads_processed:62
total_writes_processed:59
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:6
reply_buffer_expands:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:8fd1dc32b44953bf7a850a0e697b29ee40d52ed4
master_replid2:edb552d6b5a80e2cb3fb92c1d963ddda31111ce5
master_repl_offset:34288358
second_repl_offset:34267919
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:34267919
repl_backlog_histlen:20440

# CPU
used_cpu_sys:0.088843
used_cpu_user:0.114109
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.095072
used_cpu_user_main_thread:0.106956

# Modules

# Commandstats
cmdstat_eval:calls=1,usec=61,usec_per_call=61.00,rejected_calls=0,failed_calls=0
cmdstat_multi:calls=3,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=0
cmdstat_get:calls=22,usec=54,usec_per_call=2.45,rejected_calls=1,failed_calls=0
cmdstat_setex:calls=1,usec=6,usec_per_call=6.00,rejected_calls=0,failed_calls=0
cmdstat_ltrim:calls=3,usec=6,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_mget:calls=5,usec=237,usec_per_call=47.40,rejected_calls=1,failed_calls=0
cmdstat_exec:calls=3,usec=57,usec_per_call=19.00,rejected_calls=0,failed_calls=0
cmdstat_exists:calls=3,usec=8,usec_per_call=2.67,rejected_calls=0,failed_calls=0
cmdstat_hset:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_llen:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_expire:calls=4,usec=12,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_incr:calls=3,usec=17,usec_per_call=5.67,rejected_calls=0,failed_calls=0
cmdstat_setnx:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_rpush:calls=1,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=0
cmdstat_evalsha:calls=3,usec=68,usec_per_call=22.67,rejected_calls=0,failed_calls=1
cmdstat_lrange:calls=2,usec=9,usec_per_call=4.50,rejected_calls=0,failed_calls=0
cmdstat_hget:calls=2,usec=9,usec_per_call=4.50,rejected_calls=0,failed_calls=0
cmdstat_lrem:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_hexists:calls=1,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_LOADING:count=2
errorstat_NOSCRIPT:count=1

# Latencystats
latency_percentiles_usec_eval:p50=61.183,p99=61.183,p99.9=61.183
latency_percentiles_usec_multi:p50=0.001,p99=0.001,p99.9=0.001
latency_percentiles_usec_get:p50=2.007,p99=14.015,p99.9=14.015
latency_percentiles_usec_setex:p50=6.015,p99=6.015,p99.9=6.015
latency_percentiles_usec_ltrim:p50=2.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_mget:p50=45.055,p99=81.407,p99.9=81.407
latency_percentiles_usec_exec:p50=25.087,p99=27.007,p99.9=27.007
latency_percentiles_usec_exists:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_hset:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_llen:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_expire:p50=3.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_incr:p50=6.015,p99=6.015,p99.9=6.015
latency_percentiles_usec_setnx:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_lpush:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_rpush:p50=0.001,p99=0.001,p99.9=0.001
latency_percentiles_usec_evalsha:p50=30.079,p99=31.103,p99.9=31.103
latency_percentiles_usec_lrange:p50=3.007,p99=6.015,p99.9=6.015
latency_percentiles_usec_hget:p50=4.015,p99=5.023,p99.9=5.023
latency_percentiles_usec_lrem:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_hexists:p50=0.001,p99=0.001,p99.9=0.001

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=30890,expires=9651,avg_ttl=33727081572

------ CLIENT LIST OUTPUT ------
id=8 addr=127.0.0.1:56672 laddr=127.0.0.1:6379 fd=12 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=mget user=default redir=-1 resp=2
id=9 addr=127.0.0.1:56678 laddr=127.0.0.1:6379 fd=13 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=get user=default redir=-1 resp=2
id=3 addr=127.0.0.1:56630 laddr=127.0.0.1:6379 fd=8 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=12 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=get user=default redir=-1 resp=2
id=4 addr=127.0.0.1:56636 laddr=127.0.0.1:6379 fd=9 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37632 events=r cmd=get user=default redir=-1 resp=2
id=10 addr=127.0.0.1:56690 laddr=127.0.0.1:6379 fd=14 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=5 obl=0 oll=0 omem=0 tot-mem=37632 events=r cmd=get user=default redir=-1 resp=2
id=5 addr=127.0.0.1:56646 laddr=127.0.0.1:6379 fd=7 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=4096 rbp=3851 obl=0 oll=0 omem=0 tot-mem=25344 events=r cmd=exec user=default redir=-1 resp=2
id=6 addr=127.0.0.1:56662 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=98294 argv-mem=84105 multi-mem=0 rbs=1024 rbp=1024 obl=5 oll=0 omem=0 tot-mem=184241 events=r cmd=setex user=default redir=-1 resp=2
id=7 addr=127.0.0.1:56664 laddr=127.0.0.1:6379 fd=11 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=12 obl=0 oll=0 omem=0 tot-mem=22272 events=r cmd=get user=default redir=-1 resp=2
id=11 addr=127.0.0.1:56700 laddr=127.0.0.1:6379 fd=15 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37632 events=r cmd=mget user=default redir=-1 resp=2

------ CURRENT CLIENT INFO ------
id=6 addr=127.0.0.1:56662 laddr=127.0.0.1:6379 fd=10 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=98294 argv-mem=84105 multi-mem=0 rbs=1024 rbp=1024 obl=5 oll=0 omem=0 tot-mem=184241 events=r cmd=setex user=default redir=-1 resp=2
argv[0]: '"SET"'
argv[1]: '"default:ANON_CACHE_text/html_[snip]/|m=false|c=true|o=false|d=false|b=false|t=|ca=|l=_body"'
argv[2]: '"[snip]"'
argv[3]: '"PXAT"'
argv[4]: '"1679629034929"'
20863:M 24 Mar 2023 03:36:14.931 # key 'default:ANON_CACHE_text/html_[snip]/|m=false|c=true|o=false|d=false|b=false|t=|ca=|l=_body' found in DB containing the following object:
20863:M 24 Mar 2023 03:36:14.931 # Object type: 0
20863:M 24 Mar 2023 03:36:14.931 # Object encoding: 0
20863:M 24 Mar 2023 03:36:14.931 # Object refcount: 3

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
lazyfree-lazy-user-del no
io-threads-do-reads no
list-compress-depth 0
lazyfree-lazy-server-del no
activedefrag no
lazyfree-lazy-expire no
repl-diskless-load disabled
lazyfree-lazy-user-flush no
slave-read-only yes
replica-read-only yes
sanitize-dump-payload no
lazyfree-lazy-eviction no
proto-max-bulk-len 512mb
client-query-buffer-limit 1gb
io-threads 1
repl-diskless-sync no

------ FAST MEMORY TEST ------
20863:M 24 Mar 2023 03:36:14.932 # Bio thread for job type #0 terminated
20863:M 24 Mar 2023 03:36:14.932 # Bio thread for job type #1 terminated
20863:M 24 Mar 2023 03:36:14.932 # Bio thread for job type #2 terminated
*** Preparing to test memory region 5568bf8f3000 (2306048 bytes)
*** Preparing to test memory region 5568c0c93000 (270336 bytes)
*** Preparing to test memory region 7fe918efc000 (63963136 bytes)
*** Preparing to test memory region 7fe91cbfd000 (8388608 bytes)
*** Preparing to test memory region 7fe91d3fe000 (8388608 bytes)
*** Preparing to test memory region 7fe91dbff000 (8388608 bytes)
*** Preparing to test memory region 7fe91e400000 (8388608 bytes)
*** Preparing to test memory region 7fe91ec00000 (8388608 bytes)
*** Preparing to test memory region 7fe91f604000 (4096 bytes)
*** Preparing to test memory region 7fe91f7f2000 (57344 bytes)
*** Preparing to test memory region 7fe91fc25000 (12288 bytes)
*** Preparing to test memory region 7fe91fc34000 (24576 bytes)
*** Preparing to test memory region 7fe91fc78000 (8192 bytes)
*** Preparing to test memory region 7fe91fd72000 (8192 bytes)
*** Preparing to test memory region 7fe91feeb000 (4096 bytes)
*** Preparing to test memory region 7fe91ffd8000 (12288 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /lib64/libc.so.6 (base 0x7fe91f605000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------

=== REDIS BUG REPORT END. Make sure to include from START to END. ===


darix commented Mar 24, 2023

Now running with:

cat get-old-size-calculations.patch 
diff --git a/src/replication.c b/src/replication.c
index e86aa4b59..2e6f49af3 100644
--- a/src/replication.c
+++ b/src/replication.c
@@ -354,8 +354,11 @@ void feedReplicationBuffer(char *s, size_t len) {
             /* Avoid creating nodes smaller than PROTO_REPLY_CHUNK_BYTES, so that we can append more data into them,
              * and also avoid creating nodes bigger than repl_backlog_size / 16, so that we won't have huge nodes that can't
              * trim when we only still need to hold a small portion from them. */
+            /*
             size_t limit = max((size_t)server.repl_backlog_size / 16, (size_t)PROTO_REPLY_CHUNK_BYTES);
             size_t size = min(max(len, (size_t)PROTO_REPLY_CHUNK_BYTES), limit);
+            */
+            size_t size = (len < PROTO_REPLY_CHUNK_BYTES) ? PROTO_REPLY_CHUNK_BYTES : len;
             tail = zmalloc_usable(size + sizeof(replBufBlock), &usable_size);
             /* Take over the allocation's internal fragmentation */
             tail->size = usable_size - sizeof(replBufBlock);


hwware commented Mar 24, 2023

@darix Hi, you said this crash happens on a standalone-mode Redis. Could you tell us under which conditions it crashes (for example with RDB enabled, with AOF, or with no persistence), or provide us the config parameters?
BTW, do I understand correctly that this crash happened on the openSUSE platform?

Thanks


darix commented Mar 24, 2023

Isn't the config part of the crash report output on top? And yes, it is on openSUSE Tumbleweed, but the package is not yet in the main distro, only in the devel project so far: https://build.opensuse.org/package/show/server:database/redis

And the server started crashing during the night. Once I switched to the version with the patch above, it is stable again.

@oranagra

I was puzzled as to why memcpy would throw a SIGABRT (and not a SIGSEGV).
From looking at the gdb stack trace and sniffing around the code, it seems like some voluntary overflow protection implemented by the toolchain (__builtin___memcpy_chk calls __glibc_objsize0(__dest)), but I don't think it can do the right thing in our case (since we over-allocate replBufBlock on purpose, and use an allocator that's not the default one).

My guess is that for some reason, the change we did in the code caused the compiler to make some wrong assumption, and changing it back avoids that false assumption, when in fact we have no bug.

More research is needed.
@tezc @yossigo, maybe you have something to add.


darix commented Mar 26, 2023

  1. This is _FORTIFY_SOURCE in glibc. Most Linux distributions turn this on, as it makes the impact of security bugs less severe.
  2. As you can see above, when I changed the size calculation the crash went away.

@oranagra

You make it sound so simple and clear, but I don't think that's the case.

  1. We haven't seen this problem elsewhere yet, nor in our CI, so I think it may not be as simple as that (i.e. always failing with fortify_source).
  2. Reviewing the code, I don't see any miscalculation of the buffer or copy size, and I suspect your change just has the side effect of hiding the problem (or confusing the compiler enough to avoid that check). I don't think it solves a bug.

i'll also add that:

  1. If I understand correctly, __glibc_objsize0 returns the size when it is known at compile time (the size of a struct), so it looks inappropriate for our case (where we allocate a big buffer, and even query the size of the allocation to make use of the internal fragmentation).
  2. In your specific case it looks related to keys being expired during restart from RDB (an arguably rare case), but I suspect that if you simply run the Redis test suite you'll see it reproduces quite quickly in many tests.


darix commented Mar 27, 2023

Sadly I do not have the original coredump anymore, because Redis crashed 557 times before I stopped the auto-restart, so I cannot tell you what the original crash cause was.

@oranagra

@darix Can you run the Redis test suite on a vanilla 7.0.10 on your system?


tezc commented Mar 27, 2023

#include <stdio.h>
#include <malloc.h>
#include <string.h>

struct a {
    char c;
    char buf[];
};

int main() {
    const int BUFSIZE = 5;

    struct a *a = malloc(sizeof(*a) + BUFSIZE);
    size_t n = malloc_usable_size(a);
    printf("usable size: %zu \n", n);

    // Dummy condition to prevent the compiler from warning at compile time;
    // n will always be >= BUFSIZE, so copy ends up as BUFSIZE + 1 (one byte
    // past the allocated flexible-array size).
    size_t copy = n >= BUFSIZE ? BUFSIZE + 1 : BUFSIZE;

    memcpy(a->buf, "ozanozan", copy);
    printf("%.*s \n", (int) copy, (char*) a->buf);

    return 0;
}
gcc -D_FORTIFY_SOURCE=1 -O3 main.c 
./a.out 

usable size: 24 
*** buffer overflow detected ***: terminated

@oranagra I think your analysis is correct. It looks like _FORTIFY_SOURCE=1 works with the allocated size, not with the "usable size".
Maybe we don't have a test that triggers a "memcpy of larger than the allocated size" here, so we didn't see this issue before?


sundb commented Mar 27, 2023

By disassembling the binary from https://build.opensuse.org/package/show/server:database/redis, we can see that it does use __memcpy_chk.
[screenshot: disassembly showing the call to __memcpy_chk]


sundb commented Mar 27, 2023

Reproduction steps:

  1. Install Redis 7.0.10 in openSUSE Tumbleweed (https://software.opensuse.org//download.html?project=server%3Adatabase&package=redis)

  2. Start Redis

# start redis
SETEX k 3 v
# stop redis and wait for 3 seconds (let redis create backlog due to expiration)
# start redis
  3. Feed the backlog with big keys of various sizes:
for((i=65698;i<=1000000;i++));
do   
dd if=/dev/zero of=test bs=$i count=1
./src/redis-cli -x set k < test
if [ $? -ne 0 ]; then
    echo "failed"
    exit
fi
done

@oranagra
Copy link
Member

@sundb can you run the redis test suite on that platform and report whether it passes or fails?
i did try to run your reproduction on my machine and it passed, so i assume the test suite will fail on openSUSE Tumbleweed.

i can easily reproduce @tezc's example on my Jammy Ubuntu (even without explicitly specifying _FORTIFY_SOURCE), and indeed it seems to use the allocated size (not the usable size, and also not sizeof(a)).
so i wonder how come, despite our extensive use of this approach (writing into the extra allocated space, even before 7.0.10), we never saw any problems with this check...

my guess is that a newer compiler got "smarter" and can remember an allocation size for some cases, whereas if we store it for later and then re-use the allocation, the compiler doesn't do that validation.

@cryptomilk
Copy link

openSUSE Tumbleweed moved to gcc 13 lately.

@darix
Copy link
Author

darix commented Mar 27, 2023

i just remembered that a lot of debug information is in the log file. so it seems the sigabrt was triggered after a restart, while the server was reading the saved files again.

709:M 23 Mar 2023 11:05:50.030 * Background saving terminated with success
709:signal-handler (1679569569) Received SIGTERM scheduling shutdown...
709:M 23 Mar 2023 11:06:09.724 # User requested shutdown...
709:M 23 Mar 2023 11:06:09.724 * Saving the final RDB snapshot before exiting.
709:M 23 Mar 2023 11:06:10.530 * DB saved on disk
709:M 23 Mar 2023 11:06:10.530 * Removing the pid file.
709:M 23 Mar 2023 11:06:10.531 # Redis is now ready to exit, bye bye...
24352:C 23 Mar 2023 11:06:10.646 * Supervised by systemd. Please make sure you set appropriate values for TimeoutStartSec and TimeoutStopSec in your service unit.
24352:C 23 Mar 2023 11:06:10.646 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
24352:C 23 Mar 2023 11:06:10.646 # Redis version=7.0.10, bits=64, commit=00000000, modified=0, pid=24352, just started
24352:C 23 Mar 2023 11:06:10.646 # Configuration loaded
24352:M 23 Mar 2023 11:06:10.646 * monotonic clock: POSIX clock_gettime
24352:M 23 Mar 2023 11:06:10.647 # Server initialized
24352:M 23 Mar 2023 11:06:10.647 * Loading RDB produced by version 7.0.10
24352:M 23 Mar 2023 11:06:10.648 * RDB age 1 seconds
24352:M 23 Mar 2023 11:06:10.648 * RDB memory usage when created 66.47 Mb
24352:M 23 Mar 2023 11:06:10.882 * Done loading RDB, keys loaded: 34931, keys expired: 39.
24352:M 23 Mar 2023 11:06:10.882 * DB loaded from disk: 0.235 seconds
24352:M 23 Mar 2023 11:06:10.882 * Ready to accept connections


=== REDIS BUG REPORT START: Cut & paste starting from here ===
24352:M 23 Mar 2023 11:06:58.981 # Redis 7.0.10 crashed by signal: 6, si_code: -6
24352:M 23 Mar 2023 11:06:58.981 # Crashed running the instruction at: 0x7fa7abe94d7c

------ STACK TRACE ------
EIP:
/lib64/libc.so.6(+0x8fd7c)[0x7fa7abe94d7c]

Backtrace:
/lib64/libc.so.6(+0x3e420)[0x7fa7abe43420]
/lib64/libc.so.6(+0x8fd7c)[0x7fa7abe94d7c]
/lib64/libc.so.6(gsignal+0x18)[0x7fa7abe43356]
/lib64/libc.so.6(abort+0xd9)[0x7fa7abe2b897]
/lib64/libc.so.6(+0x27611)[0x7fa7abe2c611]
/lib64/libc.so.6(+0x125e9b)[0x7fa7abf2ae9b]
/lib64/libc.so.6(+0x124316)[0x7fa7abf29316]
/usr/sbin/redis-server 127.0.0.1:6379(+0xbaa79)[0x563ca5071a79]
/usr/sbin/redis-server 127.0.0.1:6379(feedReplicationBufferWithObject+0xa2)[0x563ca5071e32]
/usr/sbin/redis-server 127.0.0.1:6379(replicationFeedSlaves+0x19c)[0x563ca507208c]
/usr/sbin/redis-server 127.0.0.1:6379(propagatePendingCommands+0xfe)[0x563ca503e14e]
/usr/sbin/redis-server 127.0.0.1:6379(call+0x6a7)[0x563ca503e8e7]
/usr/sbin/redis-server 127.0.0.1:6379(processCommand+0x81d)[0x563ca503f4ad]
/usr/sbin/redis-server 127.0.0.1:6379(processInputBuffer+0xeb)[0x563ca5058eeb]
/usr/sbin/redis-server 127.0.0.1:6379(readQueryFromClient+0x2d8)[0x563ca50593a8]
/usr/sbin/redis-server 127.0.0.1:6379(+0x173d53)[0x563ca512ad53]
/usr/sbin/redis-server 127.0.0.1:6379(+0x7e148)[0x563ca5035148]
/usr/sbin/redis-server 127.0.0.1:6379(aeMain+0x1d)[0x563ca503552d]
/usr/sbin/redis-server 127.0.0.1:6379(main+0x352)[0x563ca502d662]
/lib64/libc.so.6(+0x27bb0)[0x7fa7abe2cbb0]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7fa7abe2cc79]
/usr/sbin/redis-server 127.0.0.1:6379(_start+0x25)[0x563ca502dd35]

@tezc
Copy link
Collaborator

tezc commented Mar 27, 2023

#include <stdio.h>
#include <malloc.h>
#include <string.h>

struct a {
    char c;
    char buf[];
};

void* test()
{
    const int BUFSIZE = 5;
    struct a *a = malloc(sizeof(*a) + BUFSIZE);
    size_t n = malloc_usable_size(a);
    printf("usable size: %zu \n", n);

    return a;
}

int main() {
    struct a *a = test();

    memcpy(a->buf, "ozanozan", 6);
    printf("%.*s \n", (int) 6, (char*) a->buf);

    return 0;
}

I think the compiler won't ever store the allocation size on the heap for these checks. If the allocation & usage happen in the same call stack (as happens here in replication.c), it can perform this analysis without going through the heap. Maybe this is the new thing in the latest gcc.
So, for example, the above example won't abort on my machine. (Adding -flto to the compile flags makes it abort, I guess because -flto inlines the function and then the fortify machinery can inject its checks.)

@marxin
Copy link

marxin commented Mar 27, 2023

Hey guys, GCC developer here. Yes, it's a known issue that _FORTIFY_SOURCE does not work with malloc_usable_size.

@marxin
Copy link

marxin commented Mar 27, 2023

Here is an extensive discussion with the fortification developer and the systemd project, where they also use it:
systemd/systemd#22801.

@darix
Copy link
Author

darix commented Mar 27, 2023

For the sake of easier testing:

'CFLAGS=-O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g'

@sundb
Copy link
Collaborator

sundb commented Mar 27, 2023

@oranagra the test suite also failed.

[exception]: Executing test client: I/O error reading reply.
I/O error reading reply
    while executing
"[srv $level "client"] {*}$args"
    (procedure "r" line 7)
    invoked from within
"r set key [string repeat A 100000] "
    ("uplevel" body line 3)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 51)
    invoked from within
"test {trim on SET with big value} {
        # set a big value to trigger increasing the query buf
        r set key [string repeat A 100000] 
        ..."
    ("uplevel" body line 480)
    invoked from within
"uplevel 2 $code"
    (procedure "run_external_server_test" line 41)
    invoked from within
"run_external_server_test $code $overrides"
    (procedure "start_server" line 55)
    invoked from within
"start_server {tags {"string"}} {
    test {SET and GET an item} {
        r set x foobar
        r get x
    } {foobar}

    test {SET and GET an empt..."
    (file "tests/unit/type/string.tcl" line 1)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "
=== REDIS BUG REPORT START: Cut & paste starting from here ===
14921:M 27 Mar 2023 19:54:59.302 # Redis 7.0.10 crashed by signal: 6, si_code: -6
14921:M 27 Mar 2023 19:54:59.302 # Crashed running the instruction at: 0x7f9b79e94d7c

------ STACK TRACE ------
EIP:
/lib64/libc.so.6(+0x8fd7c)[0x7f9b79e94d7c]

Backtrace:
/lib64/libc.so.6(+0x3e420)[0x7f9b79e43420]
/lib64/libc.so.6(+0x8fd7c)[0x7f9b79e94d7c]
/lib64/libc.so.6(gsignal+0x18)[0x7f9b79e43356]
/lib64/libc.so.6(abort+0xd9)[0x7f9b79e2b897]
/lib64/libc.so.6(+0x27611)[0x7f9b79e2c611]
/lib64/libc.so.6(+0x125e9b)[0x7f9b79f2ae9b]
/lib64/libc.so.6(+0x124316)[0x7f9b79f29316]
./src/redis-server 127.0.0.1:6379(+0xbaa79)[0x55e585223a79]
./src/redis-server 127.0.0.1:6379(feedReplicationBufferWithObject+0xa2)[0x55e585223e32]
./src/redis-server 127.0.0.1:6379(replicationFeedSlaves+0x19c)[0x55e58522408c]
./src/redis-server 127.0.0.1:6379(propagatePendingCommands+0xfe)[0x55e5851f014e]
./src/redis-server 127.0.0.1:6379(call+0x6a7)[0x55e5851f08e7]
./src/redis-server 127.0.0.1:6379(processCommand+0x81d)[0x55e5851f14ad]
./src/redis-server 127.0.0.1:6379(processInputBuffer+0xeb)[0x55e58520aeeb]
./src/redis-server 127.0.0.1:6379(readQueryFromClient+0x2d8)[0x55e58520b3a8]
./src/redis-server 127.0.0.1:6379(+0x173d53)[0x55e5852dcd53]
./src/redis-server 127.0.0.1:6379(+0x7e148)[0x55e5851e7148]
./src/redis-server 127.0.0.1:6379(aeMain+0x1d)[0x55e5851e752d]
./src/redis-server 127.0.0.1:6379(main+0x352)[0x55e5851df662]
/lib64/libc.so.6(+0x27bb0)[0x7f9b79e2cbb0]
/lib64/libc.so.6(__libc_start_main+0x8b)[0x7f9b79e2cc79]
./src/redis-server 127.0.0.1:6379(_start+0x25)[0x55e5851dfd35]

------ REGISTERS ------
14921:M 27 Mar 2023 19:54:59.303 # 
RAX:0000000000000000 RBX:0000000000003a49
RCX:00007f9b79e94d7c RDX:0000000000000006
RDI:0000000000003a49 RSI:0000000000003a49
RBP:00007f9b7a41a040 RSP:00007fff51b31430
R8 :00000000ffffffff R9 :0000000000000000
R10:0000000000000008 R11:0000000000000246
R12:00007f9b7a7a5000 R13:0000000000000006
R14:0000000000001000 R15:00007f9b79fb02ef
RIP:00007f9b79e94d7c EFL:0000000000000246
CSGSFS:002b000000000033
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b3143f) -> 000055e5853730b9
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b3143e) -> 00007fff51b315c7
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b3143d) -> 0000000000000030
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b3143c) -> 0000000000000000
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b3143b) -> 0000000000000040
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b3143a) -> 0000000000000020
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31439) -> 00007f9b79e2b897
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31438) -> 00007f9b79ff2430
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31437) -> 00007f9b79e43356
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31436) -> 0000000000001000
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31435) -> 00007fff51b31570
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31434) -> 00007f9b7a7a5000
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31433) -> 00007f9b7a41a040
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31432) -> 0000000000000006
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31431) -> 0183a3ec6e879b00
14921:M 27 Mar 2023 19:54:59.303 # (00007fff51b31430) -> 00007f9b79a032e8

------ INFO OUTPUT ------
# Server
redis_version:7.0.10
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:ce21150194f99eee
redis_mode:standalone
os:Linux 6.2.6-1-default x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:13.0.1
process_id:14921
process_supervised:no
run_id:de6f9de26153a322a4978d312f23bc8e21dfd184
tcp_port:6379
server_time_usec:1679918099302699
uptime_in_seconds:35
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:2196499
executable:/home/sundb/redis/./src/redis-server
config_file:/home/sundb/redis/redis.conf
io_threads_active:0

# Clients
connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:3409408
used_memory_human:3.25M
used_memory_rss:17129472
used_memory_rss_human:16.34M
used_memory_peak:34617064
used_memory_peak_human:33.01M
used_memory_peak_perc:9.85%
used_memory_overhead:1459588
used_memory_startup:862328
used_memory_dataset:1949820
used_memory_dataset_perc:76.55%
allocator_allocated:3136952
allocator_active:4173824
allocator_resident:8216576
total_system_memory:8316571648
total_system_memory_human:7.75G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.33
allocator_frag_bytes:1036872
allocator_rss_ratio:1.97
allocator_rss_bytes:4042752
rss_overhead_ratio:2.08
rss_overhead_bytes:8912896
mem_fragmentation_ratio:5.54
mem_fragmentation_bytes:14038000
mem_not_counted_for_evict:0
mem_replication_backlog:574116
mem_total_replication_buffers:574112
mem_clients_slaves:0
mem_clients_normal:22272
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:22069
rdb_bgsave_in_progress:0
rdb_last_save_time:1679918098
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:2
rdb_last_cow_size:761856
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:199
total_commands_processed:102586
instantaneous_ops_per_sec:3867
total_net_input_bytes:33979463
total_net_output_bytes:255961332
total_net_repl_input_bytes:0
total_net_repl_output_bytes:637794
instantaneous_input_kbps:429.31
instantaneous_output_kbps:757.18
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:383.53
rejected_connections:0
sync_full:2
sync_partial_ok:0
sync_partial_err:0
expired_keys:2
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:19096
keyspace_misses:27
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:322
total_forks:2
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:34
dump_payload_sanitizations:0
total_reads_processed:105695
total_writes_processed:101880
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:5
reply_buffer_expands:4

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:f1ce9033f8b3adab3f701cf02d643bef8b689bab
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:572320
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:572320

# CPU
used_cpu_sys:3.439076
used_cpu_user:0.097839
used_cpu_sys_children:0.014915
used_cpu_user_children:0.008735
used_cpu_sys_main_thread:3.437728
used_cpu_user_main_thread:0.097557

# Modules

# Commandstats
cmdstat_pttl:calls=5,usec=4,usec_per_call=0.80,rejected_calls=0,failed_calls=0
cmdstat_set:calls=15453,usec=20013,usec_per_call=1.30,rejected_calls=0,failed_calls=0
cmdstat_blpop:calls=2,usec=14,usec_per_call=7.00,rejected_calls=0,failed_calls=0
cmdstat_getdel:calls=3,usec=8,usec_per_call=2.67,rejected_calls=0,failed_calls=0
cmdstat_mset:calls=4,usec=710,usec_per_call=177.50,rejected_calls=0,failed_calls=1
cmdstat_get:calls=13061,usec=33468,usec_per_call=2.56,rejected_calls=0,failed_calls=0
cmdstat_rename:calls=8,usec=16,usec_per_call=2.00,rejected_calls=0,failed_calls=2
cmdstat_zscan:calls=99,usec=1738,usec_per_call=17.56,rejected_calls=0,failed_calls=0
cmdstat_move:calls=5,usec=7,usec_per_call=1.40,rejected_calls=0,failed_calls=1
cmdstat_getex:calls=9,usec=33,usec_per_call=3.67,rejected_calls=1,failed_calls=1
cmdstat_xreadgroup:calls=5,usec=43,usec_per_call=8.60,rejected_calls=0,failed_calls=0
cmdstat_srem:calls=93,usec=4215,usec_per_call=45.32,rejected_calls=0,failed_calls=0
cmdstat_xinfo|stream:calls=3,usec=45,usec_per_call=15.00,rejected_calls=0,failed_calls=0
cmdstat_dump:calls=10,usec=32,usec_per_call=3.20,rejected_calls=0,failed_calls=0
cmdstat_object|freq:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_object|encoding:calls=27,usec=35,usec_per_call=1.30,rejected_calls=0,failed_calls=0
cmdstat_object|idletime:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_object|refcount:calls=18,usec=11,usec_per_call=0.61,rejected_calls=0,failed_calls=0
cmdstat_hmset:calls=3,usec=1120,usec_per_call=373.33,rejected_calls=0,failed_calls=0
cmdstat_select:calls=49,usec=34,usec_per_call=0.69,rejected_calls=0,failed_calls=0
cmdstat_hello:calls=9,usec=17,usec_per_call=1.89,rejected_calls=0,failed_calls=0
cmdstat_strlen:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_rpush:calls=2,usec=7,usec_per_call=3.50,rejected_calls=0,failed_calls=0
cmdstat_hscan:calls=98,usec=774,usec_per_call=7.90,rejected_calls=0,failed_calls=0
cmdstat_getrange:calls=1013,usec=843,usec_per_call=0.83,rejected_calls=0,failed_calls=0
cmdstat_restore:calls=11,usec=30,usec_per_call=2.73,rejected_calls=0,failed_calls=2
cmdstat_info:calls=13,usec=1151,usec_per_call=88.54,rejected_calls=0,failed_calls=0
cmdstat_multi:calls=1,usec=0,usec_per_call=0.00,rejected_calls=0,failed_calls=0
cmdstat_sync:calls=2,usec=947,usec_per_call=473.50,rejected_calls=0,failed_calls=0
cmdstat_getbit:calls=15,usec=22,usec_per_call=1.47,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=7,usec=6,usec_per_call=0.86,rejected_calls=0,failed_calls=1
cmdstat_hset:calls=65,usec=93,usec_per_call=1.43,rejected_calls=0,failed_calls=0
cmdstat_del:calls=182,usec=1280,usec_per_call=7.03,rejected_calls=0,failed_calls=0
cmdstat_sscan:calls=1672,usec=7346,usec_per_call=4.39,rejected_calls=0,failed_calls=0
cmdstat_randomkey:calls=102,usec=76,usec_per_call=0.75,rejected_calls=0,failed_calls=0
cmdstat_setbit:calls=2012,usec=1762,usec_per_call=0.88,rejected_calls=0,failed_calls=7
cmdstat_xadd:calls=1006,usec=2304,usec_per_call=2.29,rejected_calls=0,failed_calls=0
cmdstat_ttl:calls=21,usec=26,usec_per_call=1.24,rejected_calls=0,failed_calls=0
cmdstat_incrbyfloat:calls=2,usec=18,usec_per_call=9.00,rejected_calls=0,failed_calls=0
cmdstat_config|set:calls=16,usec=43,usec_per_call=2.69,rejected_calls=0,failed_calls=0
cmdstat_config|get:calls=4,usec=17,usec_per_call=4.25,rejected_calls=0,failed_calls=0
cmdstat_xdel:calls=1,usec=3,usec_per_call=3.00,rejected_calls=0,failed_calls=0
cmdstat_exec:calls=1,usec=7,usec_per_call=7.00,rejected_calls=0,failed_calls=0
cmdstat_function|flush:calls=8,usec=16,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_flushdb:calls=19,usec=4778,usec_per_call=251.47,rejected_calls=0,failed_calls=0
cmdstat_dbsize:calls=9,usec=4,usec_per_call=0.44,rejected_calls=0,failed_calls=0
cmdstat_scan:calls=680,usec=2431,usec_per_call=3.58,rejected_calls=0,failed_calls=0
cmdstat_getset:calls=2,usec=8,usec_per_call=4.00,rejected_calls=0,failed_calls=0
cmdstat_setrange:calls=15,usec=38,usec_per_call=2.53,rejected_calls=0,failed_calls=4
cmdstat_substr:calls=3,usec=4,usec_per_call=1.33,rejected_calls=0,failed_calls=0
cmdstat_setnx:calls=4,usec=34,usec_per_call=8.50,rejected_calls=0,failed_calls=0
cmdstat_setex:calls=10000,usec=23563,usec_per_call=2.36,rejected_calls=0,failed_calls=0
cmdstat_copy:calls=17,usec=98,usec_per_call=5.76,rejected_calls=0,failed_calls=1
cmdstat_xgroup|setid:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_xgroup|create:calls=2,usec=5,usec_per_call=2.50,rejected_calls=0,failed_calls=0
cmdstat_flushall:calls=8,usec=14000,usec_per_call=1750.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=4,usec=24,usec_per_call=6.00,rejected_calls=0,failed_calls=0
cmdstat_keys:calls=8,usec=19,usec_per_call=2.38,rejected_calls=0,failed_calls=0
cmdstat_renamenx:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_sadd:calls=55815,usec=61813,usec_per_call=1.11,rejected_calls=0,failed_calls=0
cmdstat_expire:calls=4,usec=11,usec_per_call=2.75,rejected_calls=0,failed_calls=0
cmdstat_srandmember:calls=3,usec=5,usec_per_call=1.67,rejected_calls=0,failed_calls=0
cmdstat_exists:calls=12,usec=7,usec_per_call=0.58,rejected_calls=0,failed_calls=0
cmdstat_debug:calls=207,usec=15679,usec_per_call=75.74,rejected_calls=0,failed_calls=1
cmdstat_mget:calls=6,usec=29,usec_per_call=4.83,rejected_calls=0,failed_calls=0
cmdstat_msetnx:calls=5,usec=19,usec_per_call=3.80,rejected_calls=0,failed_calls=1
cmdstat_zadd:calls=634,usec=4864,usec_per_call=7.67,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_BUSYKEY:count=1
errorstat_ERR:count=31
errorstat_WRONGTYPE:count=2

# Latencystats
latency_percentiles_usec_pttl:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_set:p50=1.003,p99=6.015,p99.9=16.063
latency_percentiles_usec_blpop:p50=2.007,p99=7.007,p99.9=7.007
latency_percentiles_usec_getdel:p50=3.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_mset:p50=8.031,p99=692.223,p99.9=692.223
latency_percentiles_usec_get:p50=1.003,p99=34.047,p99.9=69.119
latency_percentiles_usec_rename:p50=2.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_zscan:p50=15.039,p99=66.047,p99.9=71.167
latency_percentiles_usec_move:p50=1.003,p99=3.007,p99.9=3.007
latency_percentiles_usec_getex:p50=2.007,p99=17.023,p99.9=17.023
latency_percentiles_usec_xreadgroup:p50=8.031,p99=12.031,p99.9=12.031
latency_percentiles_usec_srem:p50=55.039,p99=101.375,p99.9=103.423
latency_percentiles_usec_xinfo|stream:p50=15.039,p99=17.023,p99.9=17.023
latency_percentiles_usec_dump:p50=3.007,p99=5.023,p99.9=5.023
latency_percentiles_usec_object|freq:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_object|encoding:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_object|idletime:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_object|refcount:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_hmset:p50=14.015,p99=1097.727,p99.9=1097.727
latency_percentiles_usec_select:p50=1.003,p99=3.007,p99.9=3.007
latency_percentiles_usec_hello:p50=2.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_strlen:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_rpush:p50=3.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_hscan:p50=7.007,p99=20.095,p99.9=22.015
latency_percentiles_usec_getrange:p50=1.003,p99=2.007,p99.9=3.007
latency_percentiles_usec_restore:p50=3.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_info:p50=46.079,p99=335.871,p99.9=335.871
latency_percentiles_usec_multi:p50=0.001,p99=0.001,p99.9=0.001
latency_percentiles_usec_sync:p50=417.791,p99=532.479,p99.9=532.479
latency_percentiles_usec_getbit:p50=1.003,p99=10.047,p99.9=10.047
latency_percentiles_usec_ping:p50=1.003,p99=3.007,p99.9=3.007
latency_percentiles_usec_hset:p50=1.003,p99=3.007,p99.9=4.015
latency_percentiles_usec_del:p50=7.007,p99=30.079,p99.9=48.127
latency_percentiles_usec_sscan:p50=4.015,p99=11.007,p99.9=19.071
latency_percentiles_usec_randomkey:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_setbit:p50=1.003,p99=2.007,p99.9=10.047
latency_percentiles_usec_xadd:p50=2.007,p99=5.023,p99.9=17.023
latency_percentiles_usec_ttl:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_incrbyfloat:p50=6.015,p99=12.031,p99.9=12.031
latency_percentiles_usec_config|set:p50=3.007,p99=5.023,p99.9=5.023
latency_percentiles_usec_config|get:p50=3.007,p99=9.023,p99.9=9.023
latency_percentiles_usec_xdel:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_exec:p50=7.007,p99=7.007,p99.9=7.007
latency_percentiles_usec_function|flush:p50=2.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_flushdb:p50=43.007,p99=2424.831,p99.9=2424.831
latency_percentiles_usec_dbsize:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_scan:p50=3.007,p99=9.023,p99.9=16.063
latency_percentiles_usec_getset:p50=2.007,p99=6.015,p99.9=6.015
latency_percentiles_usec_setrange:p50=2.007,p99=10.047,p99.9=10.047
latency_percentiles_usec_substr:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_setnx:p50=1.003,p99=29.055,p99.9=29.055
latency_percentiles_usec_setex:p50=2.007,p99=10.047,p99.9=18.047
latency_percentiles_usec_copy:p50=3.007,p99=17.023,p99.9=17.023
latency_percentiles_usec_xgroup|setid:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_xgroup|create:p50=2.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_flushall:p50=1196.031,p99=3571.711,p99.9=3571.711
latency_percentiles_usec_lpush:p50=5.023,p99=10.047,p99.9=10.047
latency_percentiles_usec_keys:p50=2.007,p99=4.015,p99.9=4.015
latency_percentiles_usec_renamenx:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_sadd:p50=1.003,p99=3.007,p99.9=11.007
latency_percentiles_usec_expire:p50=3.007,p99=3.007,p99.9=3.007
latency_percentiles_usec_srandmember:p50=2.007,p99=2.007,p99.9=2.007
latency_percentiles_usec_exists:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_debug:p50=22.015,p99=3506.175,p99.9=3555.327
latency_percentiles_usec_mget:p50=2.007,p99=20.095,p99.9=20.095
latency_percentiles_usec_msetnx:p50=2.007,p99=10.047,p99.9=10.047
latency_percentiles_usec_zadd:p50=4.015,p99=33.023,p99.9=78.335

# Cluster
cluster_enabled:0

# Keyspace
db9:keys=14,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=143 addr=127.0.0.1:43419 laddr=127.0.0.1:6379 fd=8 name= age=8 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=100006 multi-mem=0 rbs=1024 rbp=1024 obl=5 oll=0 omem=0 tot-mem=216510 events=r cmd=set user=default redir=-1 resp=2

------ CURRENT CLIENT INFO ------
id=143 addr=127.0.0.1:43419 laddr=127.0.0.1:6379 fd=8 name= age=8 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=100006 multi-mem=0 rbs=1024 rbp=1024 obl=5 oll=0 omem=0 tot-mem=216510 events=r cmd=set user=default redir=-1 resp=2
argv[0]: '"set"'
argv[1]: '"key"'
argv[2]: '"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"'
14921:M 27 Mar 2023 19:54:59.304 # key 'key' found in DB containing the following object:
14921:M 27 Mar 2023 19:54:59.304 # Object type: 0
14921:M 27 Mar 2023 19:54:59.304 # Object encoding: 0
14921:M 27 Mar 2023 19:54:59.304 # Object refcount: 3

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
slave-read-only yes
replica-read-only yes
io-threads-do-reads no
lazyfree-lazy-expire no
proto-max-bulk-len 512mb
lazyfree-lazy-user-del no
repl-diskless-load disabled
repl-diskless-sync yes
client-query-buffer-limit 1gb
sanitize-dump-payload no
activedefrag no
list-compress-depth 0
io-threads 1
lazyfree-lazy-eviction no
lazyfree-lazy-server-del no
lazyfree-lazy-user-flush no

------ FAST MEMORY TEST ------
14921:M 27 Mar 2023 19:54:59.305 # Bio thread for job type #0 terminated
14921:M 27 Mar 2023 19:54:59.305 # Bio thread for job type #1 terminated
14921:M 27 Mar 2023 19:54:59.305 # Bio thread for job type #2 terminated
*** Preparing to test memory region 55e58548c000 (2306048 bytes)
*** Preparing to test memory region 55e5857e2000 (270336 bytes)
*** Preparing to test memory region 7f9b74d7c000 (40370176 bytes)
*** Preparing to test memory region 7f9b773fd000 (8388608 bytes)
*** Preparing to test memory region 7f9b77bfe000 (8388608 bytes)
*** Preparing to test memory region 7f9b783ff000 (8388608 bytes)
*** Preparing to test memory region 7f9b78c00000 (8388608 bytes)
*** Preparing to test memory region 7f9b79400000 (8388608 bytes)
*** Preparing to test memory region 7f9b79e04000 (4096 bytes)
*** Preparing to test memory region 7f9b79ff2000 (57344 bytes)
*** Preparing to test memory region 7f9b7a411000 (12288 bytes)
*** Preparing to test memory region 7f9b7a418000 (24576 bytes)
*** Preparing to test memory region 7f9b7a442000 (8192 bytes)
*** Preparing to test memory region 7f9b7a533000 (8192 bytes)
*** Preparing to test memory region 7f9b7a6ac000 (4096 bytes)
*** Preparing to test memory region 7f9b7a7a5000 (12288 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

@marxin
Copy link

marxin commented Mar 27, 2023

CCing the glibc maintainer of this functionality: @siddhesh. So basically the function malloc_usable_size is a diagnostic function of glibc and should not be used in production:

https://man7.org/linux/man-pages/man3/malloc_usable_size.3.html:

NOTES
...

       The main use of this function is for debugging and introspection.

@siddhesh
Copy link

siddhesh commented Mar 27, 2023

Thanks @marxin!

Right, the problem is the usable-size function returning a different value than what was allocated, thus breaking the compiler's assumption that the two sizes ought to be the same. We (i.e. the glibc community) have been considering deprecating malloc_usable_size because it is used (and other allocators have grown compatible interfaces) for non-diagnostic purposes despite being clearly documented as a diagnostic-only interface. Accessing the extra memory continues to violate the C standard, so the pattern remains unsafe.

The reason why redis is running into this now and didn't before is in fact that the compiler now has more fortification coverage with _FORTIFY_SOURCE=3 and can see these inconsistencies better.

The ideal fix for this should be to have the size returned from the allocator be consistent with the requested size, but if that's not possible (as it seems to me from a quick peek at the allocator; it seems to do a lot of things, including wrapping around external allocators), then the fix may be the same as systemd, i.e. to wrap the zmalloc_usable_size and zmalloc_size calls with a call to a dummy realloc that hints to the compiler that the block is larger.

Something like this:

zmalloc.h:
...
extern void *extend_to_usable (void *, size_t) __attribute__((alloc_size(2))) __attribute__((noinline)) __attribute__((__returns_nonnull__));

#define zmalloc_size(_p) \
({ \
  size_t ret = real_zmalloc_size (_p); \
  _p = extend_to_usable (_p, ret); \
  ret; \
})
zmalloc.c:
...
void *extend_to_usable (void *p, size_t sz __attribute__((unused)))
{
  return p;
}

Alternatively, you could do this during allocation, by extending the zmalloc call to zmalloc() + zmalloc_size() + extend_to_usable(), likewise for zcalloc, zrealloc, etc. But I reckon the usable_size hack is easier...

@oranagra
Copy link
Member

Thanks to all the GCC and glibc developers for stepping in to clear this up.
I think Redis will continue using malloc_usable_size to remain memory efficient, despite violating the standard.

i.e. to wrap the zmalloc_usable_size and zmalloc_size calls with a call to a dummy realloc that hints to the compiler that the block is larger

@siddhesh by dummy realloc do you mean actually calling realloc to request that the allocation be resized to the size we already know it has, or an empty function like in your example above that just returns the pointer blindly?

please correct me if i'm wrong, but won't that (the second approach) completely disable the fortification checks (the compiler will be completely unaware of the size of the allocation and won't do any checks)? maybe instead we can disable only part of the fortification checks?

@sundb can you try that suggestion to make sure it works and also check if it has any impact on performance?

@sundb
Copy link
Collaborator

sundb commented Mar 28, 2023

@oranagra I will try it.

@cryptomilk

talloc provides functions to get the size of a talloc chunk, even including its children.

@siddhesh

Thanks to all the GCC and glibc developers for stepping in to clear this up.
I think Redis will continue using malloc_usable_size to remain memory efficient, despite violating the standard.

So to be clear, with the workaround I suggested, the code will actually become compliant. However, there's a good chance that glibc will deprecate malloc_usable_size in future and maybe provide a different interface to query the usable size in a manner that is safe.

Also, the risk is not just limited to such usage being standards-unsafe; there's also no guarantee that the size is consistent. That is, the underlying allocator is free to grow that chunk at any time, making the return value unstable and actual usage unsafe. None of the allocator implementations do that at the moment, but there's nothing stopping them from doing that if they find it to be somehow more performant. That's really what makes it a very bad interface to actually use the extra size.

There are standards discussions to provide safer interfaces to do something like this, which may be a better fit for the redis (and systemd) use case and the workaround I suggested basically tries to get the usage closer to something like that.

@siddhesh by dummy realloc you mean actually calling realloc to request the allocation to be resized to the size we already know it has, or did you mean an empty function like in your example above which just returns the pointer blindly?

I did mean an empty function like the example I wrote above; I wrote a workaround like that for systemd; see systemd/systemd@7929e18 and systemd/systemd@4f79f54

please correct me if i'm wrong, won't that (the second approach) completely disable the fortification checks (the compiler will be completely unaware of the size of the allocation and won't do any checks)? maybe instead we can disable part of the fortification checks?

On the contrary, the compiler will be able to see the new size and do the right thing for your code base. For the compiler, the code will look like this:

ptr = zmalloc (sz); // Compiler sees SZ as the size of PTR.
usable_sz = zmalloc_usable_size (ptr);
ptr = extend_to_usable (ptr, usable_sz); // Compiler sees that the size of PTR is now USABLE_SZ
memcpy (ptr, src, usable_sz);  // Compiler will generate __memcpy_chk (ptr, src, usable_sz, usable_sz) 
                               // and then optimize it to memcpy (ptr, src, usable_sz) since it knows
                               // at compile time that the access is safe.

@sundb can you try that suggestion to make sure it works and also check if it has any impact on performance?

If needed, I'll be happy to help with review to verify that it matches what I suggested (I obviously won't be very useful with nuances of the code base), please feel free to tag me into the PR.

@oranagra
Member

oranagra commented Mar 28, 2023

@siddhesh thank you.
i'm curious to understand why the compiler thinks the size of ptr was changed to usable_sz. does it consider extend_to_usable a realloc-like function? is it just because its interface takes a size_t and returns a void*?

also, am i correct to understand that the extra call will be optimized out and this change has zero impact on our performance?
and that it relies on the fact that the fortification decisions are taken before optimization (for a moment i thought that maybe because we're now using -flto, the fortification could reach better conclusions than before).

[Edit] I now see _noinline_ in the second commit you referred to, so i suppose my above paragraph is wrong, would love to get confirmation / clarification.

[Edit] i now see the _alloc_(2) attribute, so that answers my first question. sorry for missing them in your initial code snippet.

@siddhesh

[Edit] I now see _noinline_ in the second commit you referred to, so i suppose my above paragraph is wrong, would love to get confirmation / clarification.

yeah, we can't let the function get optimized away yet, because gcc (at least; maybe clang too, but I haven't checked) loses attributes whenever a function is inlined, so the _alloc_(2) is missed, the compiler no longer sees it as an allocator function, and it simply optimizes it away. It's being tracked here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503

So the impact is a function call overhead for now, hopefully we'll claw it back soon in gcc with a fix to that bug.

[Edit] i now see the _alloc_(2) attribute, so that answers my first question. sorry for missing them in your initial code snippet.

Correct too :)

@sundb
Collaborator

sundb commented Mar 28, 2023

@siddhesh I tried to make some changes (sundb@19ed6a4) in Redis, but ran into a strange problem.

Calling extend_to_usable() after zmalloc_usable() is fine.

tail = zmalloc_usable(size + sizeof(replBufBlock), &usable_size);
tail = extend_to_usable(tail, usable_size); /* gcc can see it */

But when I move extend_to_usable() in to ztrymalloc_usable(), gcc doesn't see the new size.

zmalloc_usable
    -> ztrymalloc_usable
        -> size = zmalloc_size(ptr);
           ptr = extend_to_usable(ptr, size);

I am not sure what is the right behavior.

@siddhesh

Calling extend_to_usable inside ztrymalloc_usable is not going to work because the function itself is declared with alloc_size(1), indicating that the size of the returned block is argument 1 of the call.

Basically, the extend_to_usable call has to follow all calls to *_usable functions with the returned usable size.

@sundb
Collaborator

sundb commented Mar 28, 2023

@siddhesh Thanks for your help.

@darix
Author

darix commented Apr 11, 2023

do we have an ETA for the release?

@oranagra
Member

i suppose that since a recent change is what caused this (old) issue to get exposed, we should release soon.
still, i'd like to let the change mature for a while in unstable, considering it's not very small.
i think i'll aim for next week.

@oranagra
Member

I noticed that on 7.0.x we didn't have any alloc_size attributes in zmalloc.h, and we're also not using -flto, so i wonder how come it crashed in the initial report of this issue.
as far as i can tell the Makefile is unmodified.
@siddhesh is gcc 13 able to somehow track these attributes even from other object files?

@darix
Author

darix commented Apr 13, 2023

distros started to enable -flto via CFLAGS
