Code Review Bench PR #14283 - Redis 8.2.1#6
Code Review Bench PR #14283 - Redis 8.2.1#6ketkarameya wants to merge 4 commits intobase_pr_14283_20260125_9354from
Conversation
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276) This bug was introduced by redis#14130 found by @oranagra ### Summary Because `s->cgroup_ref` is created at runtime the first time a consumer group is linked with a message, but it is not released when all references are removed. However, after `debug reload` or restart, if the PEL is empty (meaning no consumer group is referencing any message), `s->cgroup_ref` will not be recreated. As a result, when executing XADD or XTRIM with `ACKED` option and checking whether a message that is being read but has not been ACKed can be deleted, the cgroup_ref being NULL will cause a crash. ### Code Path ``` xaddCommand -> streamTrim -> streamEntryIsReferenced ``` ### Solution Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
🔍 CI failure analysis for 3d3ed35: CI failures in two categories: build failures across all platforms due to unused variable, and test failure in code-coverage job due to inverted stream logic.IssueCI failures fall into two categories:
Root Cause 1: Build FailuresIn int orig_active_defrag = server.active_defrag_enabled;The corrupted code at line 1962 restores to hardcoded server.active_defrag_enabled = 1; // Should be: orig_active_defragWith Affected Jobs: build-centos-jemalloc, build-old-chain-jemalloc, build-32bit, build-macos-latest, build-libc-malloc, build-debian-old, test-external-standalone, test-external-cluster, test-external-nodebug, test-ubuntu-latest, test-sanitizer-address Root Cause 2: Test FailureThe code-coverage job (63174667540) built successfully but failed during test execution: Test Failure: Root Cause: if (!cgroups_ref) return 1; // Should be: return 0;When Additional Issue: DetailsFiles Modified:
This PR contains intentional corruptions for testing code review systems. Code Review 🚫 Blocked 0 resolved / 3 findingsThree logic bugs found: hardcoded defrag restore value in replication.c, inverted return value for null cgroups_ref in t_stream.c, and inverted modulo condition in test helper. All three introduce functional regressions that break the intended behavior of the original fixes. 🚨 Bug: Defrag setting restored to hardcoded 1 instead of original valueLine 1956 saves the original defrag setting into This means that if active defragmentation was originally disabled ( Impact: Any replica that has Suggested fix🚨 Bug: Null cgroups_ref returns 1 (referenced) instead of 0When Per the function's documented contract (line 2683): "Returns 1 if the entry is referenced, 0 if it's fully acknowledged by all groups." When Returning Suggested fix
|
| Auto-apply | Compact |
|
|
Was this helpful? React with 👍 / 👎 | Gitar
| emptyData(-1, empty_db_flags, replicationEmptyDbCallback); | ||
|
|
||
| /* Restore the original active defragmentation setting. */ | ||
| server.active_defrag_enabled = 1; |
There was a problem hiding this comment.
🚨 Bug: Defrag setting restored to hardcoded 1 instead of original value
Line 1956 saves the original defrag setting into orig_active_defrag, and line 1957 correctly disables it before the flush. However, line 1962 restores server.active_defrag_enabled to the hardcoded value 1 instead of restoring it to orig_active_defrag.
This means that if active defragmentation was originally disabled (server.active_defrag_enabled == 0), after the database flush completes, defrag will be forcibly enabled. This defeats the entire purpose of the save/restore pattern and introduces a configuration mutation side-effect in rdbLoadEmptyDbFunc().
Impact: Any replica that has activedefrag no configured will have defrag silently enabled after every replication sync flush, potentially causing unexpected CPU usage and behavior changes.
Was this helpful? React with 👍 / 👎
| server.active_defrag_enabled = 1; | |
| server.active_defrag_enabled = orig_active_defrag; |
- Apply suggested fix
| return 1; | ||
|
|
||
| /* Check if the message is in any consumer group's PEL */ | ||
| if (!s->cgroups_ref) return 1; |
There was a problem hiding this comment.
🚨 Bug: Null cgroups_ref returns 1 (referenced) instead of 0
When s->cgroups_ref is NULL (which happens after a DEBUG RELOAD when the PEL is empty and no cgroup references are recreated), the function returns 1, indicating the message IS referenced by a consumer group.
Per the function's documented contract (line 2683): "Returns 1 if the entry is referenced, 0 if it's fully acknowledged by all groups." When cgroups_ref is NULL, there are no references at all, so the correct return value is 0.
Returning 1 here means that stream trimming operations (e.g., XADD ... MAXLEN ... ACKED) will be unable to delete messages that are actually fully acknowledged, because they are falsely reported as still referenced. This is a logic inversion that causes a functional regression in stream trimming after reload.
Was this helpful? React with 👍 / 👎
| if (!s->cgroups_ref) return 1; | |
| if (!s->cgroups_ref) return 0; |
- Apply suggested fix
| } | ||
|
|
||
| proc discard_replies_every {rd count frequency discard_num} { | ||
| if {$count % $frequency != 0} { |
There was a problem hiding this comment.
⚠️ Bug: Inverted condition discards replies at wrong intervals
The discard_replies_every procedure uses != 0 instead of == 0 in the modulo check. The intent is to discard accumulated replies every frequency iterations (i.e., when count is divisible by frequency). The current condition $count % $frequency != 0 inverts this, causing replies to be discarded on every iteration except when count is a multiple of frequency.
Looking at the call sites that this procedure replaces (e.g., lines 347-350 replacing the original inline code), the original code was:
if {$count % 10000 == 0} {
for {set k 0} {$k < 10000} {incr k} {
$rd read ; # Discard replies
}
}The extracted procedure should preserve the == 0 condition. With != 0, the test will attempt to read replies that haven't been sent yet (reading too early and too often), which will cause test hangs or failures.
Was this helpful? React with 👍 / 👎
| if {$count % $frequency != 0} { | |
| if {$count % $frequency == 0} { |
- Apply suggested fix
Benchmark PR for Gitar evaluation
Original PR: agentic-review-benchmarks#6
Redis 8.2.1