@sundb sundb commented Aug 29, 2025

This PR fixes two crashes caused by defragmentation of Lua scripts, both introduced by #13108.

  1. During long-running Lua script execution, active defragmentation may be triggered, causing the luaScript structure to be reallocated to a new memory location. After the script finishes, we then access l->node (which may have been reallocated) to update the Lua LRU list.
    In this PR, we don't defrag during blocked scripts, so we don't mess up the LRU update when the script ends.
    Note that defrag while the server is blocked is now only permitted during loading.
    This PR also reverts the changes made by #14274 (Prevent active defrag from triggering during replicaof db flush).
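
To make the first failure mode concrete, here is a minimal sketch of the hazard and of the guard described above. Everything in it is an assumption for illustration: `demoScript`, `demoServer`, and the `demo_*` functions are hypothetical stand-ins, not the real Redis structures or functions, and the relocation is simulated with plain `malloc`/`free`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real Redis structures. */
typedef struct demoScript {
    int lru_slot;            /* plays the role of l->node */
} demoScript;

typedef struct demoServer {
    demoScript *cached;      /* owning reference, as held by the script dict */
    bool loading;            /* true while the server is loading data */
    int relocations;         /* how many times the script has been moved */
} demoServer;

static void demo_server_init(demoServer *s) {
    s->cached = malloc(sizeof(*s->cached));
    assert(s->cached != NULL);
    s->cached->lru_slot = 0;
    s->loading = false;
    s->relocations = 0;
}

/* Move the script to a fresh allocation, as a defrag pass might. */
static void demo_defrag_script(demoServer *s) {
    demoScript *moved = malloc(sizeof(*moved));
    assert(moved != NULL);
    memcpy(moved, s->cached, sizeof(*moved));
    free(s->cached);
    s->cached = moved;
    s->relocations++;
}

/* Periodic work that runs while a long script keeps the server busy.
 * The guard mirrors the rule described above: defrag only while loading. */
static void demo_while_blocked_cron(demoServer *s) {
    if (!s->loading) return;
    demo_defrag_script(s);
}

/* Returns true when a pointer captured before the script ran would still
 * be valid afterwards, i.e. no relocation happened in between. */
static bool demo_run_script(demoServer *s) {
    int before = s->relocations;
    demo_while_blocked_cron(s);   /* the script "blocks" and the cron fires */
    return s->relocations == before;
}
```

With `loading` false, `demo_run_script` returns true because the cron skips the move. With `loading` true the script is still relocated, so any pointer captured before execution would be stale.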

crash report:

Logged crash report (pid 344907):
=== REDIS BUG REPORT START: Cut & paste starting from here ===
344907:M 29 Aug 2025 15:00:47.873 # Redis 255.255.255 crashed by signal: 11, si_code: 1
344907:M 29 Aug 2025 15:00:47.873 # Accessing address: 0x18
344907:M 29 Aug 2025 15:00:47.873 # Crashed running the instruction at: 0x564402cd2900

------ STACK TRACE ------
EIP:
src/redis-server 127.0.0.1:21121(listUnlinkNode+0x10)[0x564402cd2900]

344911 bio_lazy_free
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7f1496498d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7f149649b7ed]
src/redis-server 127.0.0.1:21121(bioProcessBackgroundJobs+0x2b7)[0x564402ddd027]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7f149649caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7f1496529c3c]

344910 bio_aof
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7f1496498d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7f149649b7ed]
src/redis-server 127.0.0.1:21121(bioProcessBackgroundJobs+0x2b7)[0x564402ddd027]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7f149649caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7f1496529c3c]

344909 bio_close_file
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7f1496498d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7f149649b7ed]
src/redis-server 127.0.0.1:21121(bioProcessBackgroundJobs+0x2b7)[0x564402ddd027]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7f149649caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7f1496529c3c]

344907 redis-server *
/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7f1496445330]
src/redis-server 127.0.0.1:21121(listUnlinkNode+0x10)[0x564402cd2900]
src/redis-server 127.0.0.1:21121(evalGenericCommand+0x1f9)[0x564402dd3339]
src/redis-server 127.0.0.1:21121(call+0x171)[0x564402cf3a91]
src/redis-server 127.0.0.1:21121(processCommand+0xae8)[0x564402cf5188]
src/redis-server 127.0.0.1:21121(processInputBuffer+0xd9)[0x564402d1ba79]
src/redis-server 127.0.0.1:21121(readQueryFromClient+0x358)[0x564402d214c8]
src/redis-server 127.0.0.1:21121(+0x2267c4)[0x564402e697c4]
src/redis-server 127.0.0.1:21121(aeMain+0xf9)[0x564402cd69e9]
src/redis-server 127.0.0.1:21121(main+0x4a7)[0x564402cd08d7]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7f149642a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f149642a28b]
src/redis-server 127.0.0.1:21121(_start+0x25)[0x564402cd2155]

4/4 expected stacktraces.

------ STACK TRACE DONE ------

------ REGISTERS ------
344907:M 29 Aug 2025 15:00:47.875 # 
RAX:0000000000000010 RBX:00007f149605fe18
RCX:0000564402dd32a4 RDX:00007f149601b668
RDI:00007f1496025cf0 RSI:00007f149605fe18
RBP:00007ffc6193f2a0 RSP:00007ffc6193f1e8
R8 :000000000098fc48 R9 :0000000000000008
R10:00007ffc6193f13c R11:0000000000000246
R12:00007f1496025cf0 R13:0000000000000000
R14:00007ffc6193f200 R15:00007f14960f6e00
RIP:0000564402cd2900 EFL:0000000000010202
CSGSFS:002b000000000033
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f7) -> 00007ffc6193f2f8
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f6) -> 577b1cfe39003765
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f5) -> 3438663064336437
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f4) -> 6337333539643939
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f3) -> 3266623161366334
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f2) -> 6137396264643165
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f1) -> 6635613439365f66
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1f0) -> fffffffeffffffff
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1ef) -> 00000004fcb39a6d
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1ee) -> 0000000300000190
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1ed) -> 00007f14960f6e00
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1ec) -> 00007f14960f6000
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1eb) -> 00007ffc6193f232
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1ea) -> 00007ffc6193f232
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1e9) -> 00007ffc6193f2cc
344907:M 29 Aug 2025 15:00:47.875 # (00007ffc6193f1e8) -> 0000564402dd3339

------ INFO OUTPUT ------
# Server
redis_version:255.255.255
redis_git_sha1:35aacdf8
redis_git_dirty:1
redis_build_id:87ae0a9ca86744b8
redis_mode:standalone
os:Linux 6.8.0-78-generic x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:13.3.0
process_id:344907
process_supervised:no
run_id:9fd151d4898cd35b559282a26b3a224921f97621
tcp_port:21121
server_time_usec:1756450847873832
uptime_in_seconds:0
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:11620383
executable:/home/sundb/data/redis_fork_3/src/redis-server
config_file:/home/sundb/data/redis_fork_3/./tests/tmp/redis.conf.344262.22
io_threads_active:0
listener0:name=tcp,bind=127.0.0.1,port=21121
listener1:name=unix,bind=/home/sundb/data/redis_fork_3/tests/tmp/server.344262.21/socket

# Clients
connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
pubsub_clients:0
watching_clients:0
clients_in_timeout_table:0
total_watched_keys:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

# Memory
used_memory:838056
used_memory_human:818.41K
used_memory_rss:9961472
used_memory_rss_human:9.50M
used_memory_peak:838056
used_memory_peak_human:818.41K
used_memory_peak_time:1756450847
used_memory_peak_perc:100.36%
used_memory_overhead:720096
used_memory_startup:652144
used_memory_dataset:117960
used_memory_dataset_perc:63.45%
allocator_allocated:1160896
allocator_active:1503232
allocator_resident:9125888
allocator_muzzy:0
total_system_memory:66516320256
total_system_memory_human:61.95G
used_memory_lua:33792
used_memory_vm_eval:33792
used_memory_lua_human:33.00K
used_memory_scripts_eval:176
number_of_cached_scripts:1
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:33792
used_memory_vm_total:67584
used_memory_vm_total_human:66.00K
used_memory_functions:192
used_memory_scripts:368
used_memory_scripts_human:368B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.39
allocator_frag_bytes:266304
allocator_rss_ratio:6.07
allocator_rss_bytes:7622656
rss_overhead_ratio:1.09
rss_overhead_bytes:835584
mem_fragmentation_ratio:15.13
mem_fragmentation_bytes:9302960
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_replica_full_sync_buffer:0
mem_clients_slaves:0
mem_clients_normal:0
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
mem_overhead_db_hashtable_rehashing:0
active_defrag_running:99
lazyfree_pending_objects:0
lazyfreed_objects:0

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1756450847
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Threads
io_thread_0:clients=2,reads=8,writes=6

# Stats
total_connections_received:3
total_commands_processed:5
instantaneous_ops_per_sec:0
total_net_input_bytes:192
total_net_output_bytes:116
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_subkeys:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
evicted_scripts:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:3
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:250
current_active_defrag_time:91
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:2
dump_payload_sanitizations:0
total_reads_processed:8
total_writes_processed:6
io_threaded_reads_processed:0
io_threaded_writes_processed:0
io_threaded_total_prefetch_batches:0
io_threaded_total_prefetch_entries:0
client_query_buffer_limit_disconnections:0
client_output_buffer_limit_disconnections:0
reply_buffer_shrinks:0
reply_buffer_expands:0
eventloop_cycles:13196
eventloop_duration_sum:113416
eventloop_duration_cmd_sum:4
instantaneous_eventloop_cycles_per_sec:7341
instantaneous_eventloop_duration_usec:0
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:ed3c5bebfe1f4c1169ad27399fc5ddbc460e378d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.013026
used_cpu_user:0.306627
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.013015
used_cpu_user_main_thread:0.306360

# Modules
module:name=vectorset,ver=1,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors|handle-repl-async-load]

# Commandstats
cmdstat_ping:calls=1,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
cmdstat_config|set:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_script|kill:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_select:calls=2,usec=3,usec_per_call=1.50,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_BUSY:count=1
errorstat_ERR:count=1

# Latencystats
latency_percentiles_usec_ping:p50=0.001,p99=0.001,p99.9=0.001
latency_percentiles_usec_config|set:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_script|kill:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_select:p50=1.003,p99=2.007,p99.9=2.007

# Cluster
cluster_enabled:0

# Keyspace

# Keysizes

------ CLIENT LIST OUTPUT ------
id=5 addr=127.0.0.1:41925 laddr=127.0.0.1:21121 fd=12 name= age=0 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=script|kill user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 tot-net-in=117 tot-net-out=104 tot-cmds=3
id=6 addr=127.0.0.1:40183 laddr=127.0.0.1:21121 fd=13 name= age=0 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=45 qbuf-free=20429 argv-mem=22 multi-mem=0 rbs=16384 rbp=16384 obl=117 oll=0 omem=0 tot-mem=37806 events=r cmd=eval user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 tot-net-in=68 tot-net-out=5 tot-cmds=1

------ CURRENT CLIENT INFO ------
id=6 addr=127.0.0.1:40183 laddr=127.0.0.1:21121 fd=13 name= age=0 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=45 qbuf-free=20429 argv-mem=22 multi-mem=0 rbs=16384 rbp=16384 obl=117 oll=0 omem=0 tot-mem=37806 events=r cmd=eval user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 tot-net-in=68 tot-net-out=5 tot-cmds=1
argc: '3'
argv[0]: '"eval"'
argv[1]: '"while true do end"'
argv[2]: '"0"'

------ EXECUTING CLIENT INFO ------
id=6 addr=127.0.0.1:40183 laddr=127.0.0.1:21121 fd=13 name= age=0 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=45 qbuf-free=20429 argv-mem=22 multi-mem=0 rbs=16384 rbp=16384 obl=117 oll=0 omem=0 tot-mem=37806 events=r cmd=eval user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 tot-net-in=68 tot-net-out=5 tot-cmds=1
argc: '3'
argv[0]: '"eval"'
argv[1]: '"while true do end"'
argv[2]: '"0"'

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
client-query-buffer-limit 1gb
slave-read-only yes
proto-max-bulk-len 512mb
repl-diskless-sync yes
io-threads 1
activedefrag yes
lazyfree-lazy-user-flush no
lazyfree-lazy-server-del no
sanitize-dump-payload no
lazyfree-lazy-user-del no
repl-diskless-load disabled
lazyfree-lazy-expire no
replica-read-only yes
list-compress-depth 0
lazyfree-lazy-eviction no

------ FAST MEMORY TEST ------
344907:M 29 Aug 2025 15:00:47.876 # Bio worker thread #0 terminated
344907:M 29 Aug 2025 15:00:47.876 # Bio worker thread #1 terminated
344907:M 29 Aug 2025 15:00:47.876 # Bio worker thread #2 terminated
*** Preparing to test memory region 56440305a000 (2322432 bytes)
*** Preparing to test memory region 56443a6d1000 (135168 bytes)
*** Preparing to test memory region 7f148c000000 (135168 bytes)
*** Preparing to test memory region 7f14923fc000 (8388608 bytes)
*** Preparing to test memory region 7f1492bfd000 (8388608 bytes)
*** Preparing to test memory region 7f14933fe000 (8388608 bytes)
*** Preparing to test memory region 7f1493bff000 (8388608 bytes)
*** Preparing to test memory region 7f1494400000 (8388608 bytes)
*** Preparing to test memory region 7f1494c00000 (10485760 bytes)
*** Preparing to test memory region 7f1495c00000 (8388608 bytes)
*** Preparing to test memory region 7f1496605000 (53248 bytes)
*** Preparing to test memory region 7f1496a7a000 (16384 bytes)
*** Preparing to test memory region 7f1496bf7000 (28672 bytes)
*** Preparing to test memory region 7f1496d2c000 (8192 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: listUnlinkNode (base: 0x564402cd28f0)
Module: src/redis-server 127.0.0.1:21121 (base 0x564402c43000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x564402cd28f0 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
344907:M 29 Aug 2025 15:00:48.043 # dump of function (hexdump of 144 bytes):
f30f1efa488b06488b56084885c07420488950084885d2741f660fefc04889020f110648836f2801c30f1f80000000004889174885d275e1660fefc0488947080f110648836f2801c30f1f8000000000f30f1efa554889e54883ec1064488b042528000000488945f831c04885ff742e488d75f0e8e77f2100488b4df064488b042548f5ffff4883f8ff742448c1e006

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  If a Redis module was involved, please open in the module's repo instead.

  Suspect RAM error? Use redis-server --test-memory to verify it.

  Some other issues could be detected by redis-server --check-system

  2. Forgot to update the Lua LRU list node's value.
    Since each lua_scripts_lru_list node stores a pointer to the lua_script's key, we also need to update node->value when the key is reallocated.
    In this PR, after performing defragmentation on a Lua script, if the script is in the LRU list, its reference in the LRU list will be unconditionally updated.
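
The second fix can be sketched roughly as follows. This is a simplified model under stated assumptions: `sketchNode`, `sketchEntry`, and the `sketch_*` helpers are hypothetical names, the key is a plain C string rather than an sds, and "defrag may or may not move the key" is modelled with an explicit flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real Redis structures. */
typedef struct sketchNode {
    void *value;             /* LRU node payload: pointer to the script key */
} sketchNode;

typedef struct sketchEntry {
    char *key;               /* the script's key, owned by this entry */
    sketchNode *node;        /* LRU node, or NULL if not in the LRU list */
} sketchEntry;

/* Returns a relocated copy of the key and frees the old one, or NULL when
 * the allocation is left where it is (should_move == 0). */
static char *sketch_try_move_key(char *old, int should_move) {
    if (!should_move) return NULL;
    size_t n = strlen(old) + 1;
    char *moved = malloc(n);
    assert(moved != NULL);
    memcpy(moved, old, n);
    free(old);
    return moved;
}

/* Defrag one entry, then refresh the LRU node so it never keeps pointing
 * at the freed key. The refresh is unconditional for listed scripts. */
static void sketch_defrag_entry(sketchEntry *e, int should_move) {
    char *moved = sketch_try_move_key(e->key, should_move);
    if (moved != NULL) e->key = moved;
    if (e->node != NULL) e->node->value = e->key;
}
```

Refreshing `node->value` whether or not the key moved keeps the update path trivial: there is no need to track which of several allocations was relocated.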

crash report:

Logged crash report (pid 347529):
=== REDIS BUG REPORT START: Cut & paste starting from here ===
347529:M 29 Aug 2025 15:02:29.879 # === ASSERTION FAILED CLIENT CONTEXT ===
347529:M 29 Aug 2025 15:02:29.879 # client->flags = 536870912
347529:M 29 Aug 2025 15:02:29.879 # client->conn = fd=12
347529:M 29 Aug 2025 15:02:29.879 # client->argc = 3
347529:M 29 Aug 2025 15:02:29.879 # client->argv[0] = "eval" (refcount: 1)
347529:M 29 Aug 2025 15:02:29.879 # client->argv[1] = "return 502" (refcount: 1)
347529:M 29 Aug 2025 15:02:29.879 # client->argv[2] = "0" (refcount: 1)
347529:M 29 Aug 2025 15:02:29.879 # === RECURSIVE ASSERTION FAILED ===
347529:M 29 Aug 2025 15:02:29.879 # ==> eval.c:509 'de' is not true

------ STACK TRACE ------

347533 bio_lazy_free
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7f0e4ce98d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7f0e4ce9b7ed]
src/redis-server 127.0.0.1:21628(bioProcessBackgroundJobs+0x2b7)[0x55e12faa9fd7]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7f0e4ce9caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7f0e4cf29c3c]

347529 redis-server *
src/redis-server 127.0.0.1:21628(luaDeleteFunction+0x14d)[0x55e12fa9f6ed]
src/redis-server 127.0.0.1:21628(luaCreateFunction+0x6ef)[0x55e12fa9fe3f]
src/redis-server 127.0.0.1:21628(evalGenericCommand+0x2c8)[0x55e12faa03c8]
src/redis-server 127.0.0.1:21628(call+0x171)[0x55e12f9c0ad1]
src/redis-server 127.0.0.1:21628(processCommand+0xae8)[0x55e12f9c21c8]
src/redis-server 127.0.0.1:21628(processInputBuffer+0xd9)[0x55e12f9e8ab9]
src/redis-server 127.0.0.1:21628(readQueryFromClient+0x358)[0x55e12f9ee508]
src/redis-server 127.0.0.1:21628(+0x2267d4)[0x55e12fb367d4]
src/redis-server 127.0.0.1:21628(aeMain+0xf9)[0x55e12f9a39d9]
src/redis-server 127.0.0.1:21628(main+0x4a7)[0x55e12f99d8c7]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7f0e4ce2a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f0e4ce2a28b]
src/redis-server 127.0.0.1:21628(_start+0x25)[0x55e12f99f145]

347531 bio_close_file
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7f0e4ce98d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7f0e4ce9b7ed]
src/redis-server 127.0.0.1:21628(bioProcessBackgroundJobs+0x2b7)[0x55e12faa9fd7]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7f0e4ce9caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7f0e4cf29c3c]

347532 bio_aof
/lib/x86_64-linux-gnu/libc.so.6(+0x98d71)[0x7f0e4ce98d71]
/lib/x86_64-linux-gnu/libc.so.6(pthread_cond_wait+0x20d)[0x7f0e4ce9b7ed]
src/redis-server 127.0.0.1:21628(bioProcessBackgroundJobs+0x2b7)[0x55e12faa9fd7]
/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7f0e4ce9caa4]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7f0e4cf29c3c]

4/4 expected stacktraces.

------ STACK TRACE DONE ------

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/redis/redis/issues

  If a Redis module was involved, please open in the module's repo instead.

  Suspect RAM error? Use redis-server --test-memory to verify it.

  Some other issues could be detected by redis-server --check-system

snyk-io bot commented Aug 29, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@sundb sundb requested review from Copilot and oranagra August 29, 2025 07:04
@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 29, 2025
@kaplanben
Checkmarx One – Scan Summary & Details (647f6360-7349-4f99-ae54-fbdf660d4e52)

New Issues (1)

Checkmarx found the following issues in this Pull Request

Severity: CRITICAL
Issue: Buffer_Overflow_Wrong_Buffer_Size
Source File / Package: /src/sha1.c: 65
Checkmarx Insight: The buffer buffer created in /src/sha1.c at line 65 is written to a buffer in /src/sha1.c at line 65 by block, but an error in calculating the al...
ID: N9gGLsUP8UQvFZEl1N39fgD7jYQ%3D

@oranagra
Member

oranagra commented Aug 29, 2025

Probably needs to be backported as far back as 7.4.
But maybe in these old versions it would be simpler to relocate the LRU list reinsertion rather than change defrag.c?

@sundb sundb changed the title from "Fix crash during lua script defragmentation" to "Fix crash during lua script defrag" Sep 1, 2025
@sundb sundb requested a review from oranagra September 1, 2025 02:48
@oranagra
Member

oranagra commented Sep 1, 2025

so now this PR just contains 2 bug fixes:

  1. don't defrag during blocked scripts, so we don't mess up the LRU update when the script ends.
  2. don't forget to update the list node pointer when it moves.

both issues could have caused crashes, and this small fix should be backported.
please update the top comment to reflect these details.

and IIRC there are other improvements (defragging the list nodes and more), that can be done in another PR..

@sundb
Collaborator Author

sundb commented Sep 1, 2025

Updated the top comment.

@sundb sundb requested a review from Copilot September 2, 2025 02:44
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes two critical crashes that occur during Lua script defragmentation. The primary issue is that active defragmentation can relocate luaScript structures to new memory locations while scripts are executing, causing segmentation faults when accessing the original memory addresses.

  • Restricts defragmentation to only occur during AOF loading to prevent interference with executing scripts
  • Adds callback to update LRU list node pointers when Lua script keys are reallocated during defragmentation
  • Reverts previous temporary defragmentation disable logic during database flush operations

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File               Description
src/server.c       Limits defragmentation to AOF loading context only
src/replication.c  Removes temporary defrag disable logic during replication
src/defrag.c       Adds callback to update LRU node pointers for Lua scripts

@sundb sundb merged commit 8ad5421 into redis:unstable Sep 2, 2025
19 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Redis 8.4 Sep 2, 2025
@sundb
Collaborator Author

sundb commented Sep 3, 2025

@oranagra I found that this PR is not enough: AOF loading may load a long-running Lua script. 😯

@oranagra
Member

oranagra commented Sep 3, 2025

not with newly generated AOF files, in which scripts propagate their effects rather than the scripts themselves.
but you're right, either change the check to verify you're not in a script, or, i think it's better, move that LRU list update to the beginning of the script rather than its end.
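
The second option suggested here, doing the LRU bookkeeping before the script runs, can be sketched like this. It is only an illustration under assumptions: `toyScript`, `toyServer`, and the `toy_*` functions are hypothetical, the LRU is a tiny array of ids instead of a linked list, and the relocation stands in for whatever defrag might do during a long script.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_LRU_CAP 4

/* Hypothetical stand-ins; these are NOT the real Redis structures. */
typedef struct toyScript {
    int id;
} toyScript;

typedef struct toyServer {
    toyScript *cached;       /* owning reference, may be relocated */
    int lru[TOY_LRU_CAP];    /* script ids, least recently used first */
    int lru_len;
} toyServer;

/* Move id to the most-recently-used end. A full list simply drops a new
 * id here; a real LRU would evict the oldest entry instead. */
static void toy_lru_touch(toyServer *s, int id) {
    int i, j = 0;
    for (i = 0; i < s->lru_len; i++)
        if (s->lru[i] != id) s->lru[j++] = s->lru[i];
    if (j < TOY_LRU_CAP) s->lru[j++] = id;
    s->lru_len = j;
}

/* Stand-in for a long script during which the script object is moved. */
static void toy_relocate(toyServer *s) {
    toyScript *moved = malloc(sizeof(*moved));
    assert(moved != NULL);
    memcpy(moved, s->cached, sizeof(*moved));
    free(s->cached);
    s->cached = moved;
}

static void toy_eval(toyServer *s) {
    toyScript *l = s->cached;
    toy_lru_touch(s, l->id);  /* LRU update first, while l is still valid */
    toy_relocate(s);          /* script body runs; l may now dangle */
    /* l is deliberately not used after this point */
}
```

The ordering removes the need to know whether a relocation happened: nothing reads the captured pointer once the script body has started.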

@sundb sundb deleted the fix-lua-defrag branch September 4, 2025 02:56
sundb added a commit that referenced this pull request Sep 7, 2025
This PR fixes two defrag issues.

1. Fix a use-after-free issue caused by updating dictionary keys after a
pubsub channel is reallocated.
This issue was introduced by #13058

2. Fix potential use-after-free for lua during AOF loading with defrag
This issue was introduced by #13058
   This fix follows #14319
This PR updates the LuaScript LRU list before script execution to
prevent accessing a potentially invalidated pointer after long-running
scripts.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 28, 2025
This PR fixes two crashes due to the defragmentation of the Lua script,
which were introduced by redis#13108

1. During long-running Lua script execution, active defragmentation may
be triggered, causing the luaScript structure to be reallocated to a new
memory location, then we access `l->node` (may be reallocated) after
script execution to update the Lua LRU list.
In this PR, we don't defrag during blocked scripts, so we don't mess up
the LRU update when the script ends.
   Note that defrag is now only permitted during loading.
This PR also reverts the changes made by
redis#14274.

2. Forgot to update the Lua LRU list node's value.
Since `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s
key, we also need to update `node->value` when the key is reallocated.
In this PR, after performing defragmentation on a Lua script, if the
script is in the LRU list, its reference in the LRU list will be
unconditionally updated.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 28, 2025
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Sep 29, 2025
sundb added a commit to YaacovHazan/redis that referenced this pull request Sep 30, 2025
sundb added a commit to YaacovHazan/redis that referenced this pull request Sep 30, 2025
sundb added a commit to YaacovHazan/redis that referenced this pull request Sep 30, 2025
@sundb sundb mentioned this pull request Sep 30, 2025