Skip to content

Code Review Bench PR #14283 - Redis 8.2.1#6

Open
ketkarameya wants to merge 4 commits intobase_pr_14283_20260125_9354from
corrupted_pr_14283_20260125_9354
Open

Code Review Bench PR #14283 - Redis 8.2.1#6
ketkarameya wants to merge 4 commits intobase_pr_14283_20260125_9354from
corrupted_pr_14283_20260125_9354

Conversation

@ketkarameya
Copy link

Benchmark PR for Gitar evaluation

Original PR: agentic-review-benchmarks#6

Redis 8.2.1

sundb and others added 4 commits January 25, 2026 12:12
…s#14274)

Fix redis#14267
This bug was introduced by redis#13495

### Summary

When a replica clears a large database, it periodically calls
processEventsWhileBlocked() in the replicationEmptyDbCallback() callback
during the key deletion process.
If defragmentation is enabled, this means that active defrag can be
triggered while the database is being deleted.
The defragmentation process may also modify the database at this time,
which could lead to crashes when the database is accessed after
defragmentation.

Code Path:
```
replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked()
```

### Solution

This PR temporarily disables active defrag before emptying the database,
then restores the active defrag setting after the empty is complete.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…fter reload (redis#14276)

This bug was introduced by redis#14130
found by @oranagra

### Summary

Because `s->cgroup_ref` is created at runtime the first time a consumer
group is linked with a message, but it is not released when all
references are removed.

However, after `debug reload` or restart, if the PEL is empty (meaning
no consumer group is referencing any message), `s->cgroup_ref` will not
be recreated.

As a result, when executing XADD or XTRIM with `ACKED` option and
checking whether a message that is being read but has not been ACKed can
be deleted, the cgroup_ref being NULL will cause a crash.

### Code Path
```
xaddCommand -> streamTrim -> streamEntryIsReferenced
```

### Solution

Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
@gitar-bot-staging
Copy link

gitar-bot-staging bot commented Feb 10, 2026

🔍 CI failure analysis for 3d3ed35: CI failures in two categories: build failures across all platforms due to unused variable, and test failure in code-coverage job due to inverted stream logic.

Issue

CI failures fall into two categories:

  1. Build failures across all platforms due to compilation error in src/replication.c
  2. Test failure in code-coverage job due to stream logic corruption in src/t_stream.c

Root Cause 1: Build Failures

In src/replication.c:1956, the variable orig_active_defrag is declared but never used:

int orig_active_defrag = server.active_defrag_enabled;

The corrupted code at line 1962 restores to hardcoded 1 instead of using orig_active_defrag:

server.active_defrag_enabled = 1;  // Should be: orig_active_defrag

With -Werror flag, this unused variable causes compilation failure.

Affected Jobs: build-centos-jemalloc, build-old-chain-jemalloc, build-32bit, build-macos-latest, build-libc-malloc, build-debian-old, test-external-standalone, test-external-cluster, test-external-nodebug, test-ubuntu-latest, test-sanitizer-address

Root Cause 2: Test Failure

The code-coverage job (63174667540) built successfully but failed during test execution:

Test Failure:

*** [err]: XADD with ACKED option doesn't crash after DEBUG RELOAD in tests/unit/type/stream.tcl
Expected '3' to be equal to '2'

Root Cause:
In src/t_stream.c:2708, the function returns inverted logic:

if (!cgroups_ref) return 1;  // Should be: return 0;

When cgroups_ref is NULL (no consumer group references after reload), the function returns 1 (referenced) instead of 0 (not referenced). This prevents proper message trimming after DEBUG RELOAD, causing stream length to be 3 instead of 2.

Additional Issue:
Test suite timeout (20 minutes) likely related to corrupted test helper in tests/unit/memefficiency.tcl:71 which inverts the discard condition.

Details

Files Modified:

  • src/replication.c - Unused variable bug (build failures)
  • src/t_stream.c - Logic inversion (test failures)
  • tests/unit/memefficiency.tcl - Test logic inversion (timeout)
  • src/version.h - Benign comment

This PR contains intentional corruptions for testing code review systems.

Code Review 🚫 Blocked 0 resolved / 3 findings

Three logic bugs found: hardcoded defrag restore value in replication.c, inverted return value for null cgroups_ref in t_stream.c, and inverted modulo condition in test helper. All three introduce functional regressions that break the intended behavior of the original fixes.

🚨 Bug: Defrag setting restored to hardcoded 1 instead of original value

📄 src/replication.c:1962

Line 1956 saves the original defrag setting into orig_active_defrag, and line 1957 correctly disables it before the flush. However, line 1962 restores server.active_defrag_enabled to the hardcoded value 1 instead of restoring it to orig_active_defrag.

This means that if active defragmentation was originally disabled (server.active_defrag_enabled == 0), after the database flush completes, defrag will be forcibly enabled. This defeats the entire purpose of the save/restore pattern and introduces a configuration mutation side-effect in rdbLoadEmptyDbFunc().

Impact: Any replica that has activedefrag no configured will have defrag silently enabled after every replication sync flush, potentially causing unexpected CPU usage and behavior changes.

Suggested fix
    server.active_defrag_enabled = orig_active_defrag;
🚨 Bug: Null cgroups_ref returns 1 (referenced) instead of 0

📄 src/t_stream.c:2708

When s->cgroups_ref is NULL (which happens after a DEBUG RELOAD when the PEL is empty and no cgroup references are recreated), the function returns 1, indicating the message IS referenced by a consumer group.

Per the function's documented contract (line 2683): "Returns 1 if the entry is referenced, 0 if it's fully acknowledged by all groups." When cgroups_ref is NULL, there are no references at all, so the correct return value is 0.

Returning 1 here means that stream trimming operations (e.g., XADD ... MAXLEN ... ACKED) will be unable to delete messages that are actually fully acknowledged, because they are falsely reported as still referenced. This is a logic inversion that causes a functional regression in stream trimming after reload.

Suggested fix
    if (!s->cgroups_ref) return 0;
⚠️ Bug: Inverted condition discards replies at wrong intervals

📄 tests/unit/memefficiency.tcl:71

The discard_replies_every procedure uses != 0 instead of == 0 in the modulo check. The intent is to discard accumulated replies every frequency iterations (i.e., when count is divisible by frequency). The current condition $count % $frequency != 0 inverts this, causing replies to be discarded on every iteration except when count is a multiple of frequency.

Looking at the call sites that this procedure replaces (e.g., lines 347-350 replacing the original inline code), the original code was:

if {$count % 10000 == 0} {
    for {set k 0} {$k < 10000} {incr k} {
        $rd read ; # Discard replies
    }
}

The extracted procedure should preserve the == 0 condition. With != 0, the test will attempt to read replies that haven't been sent yet (reading too early and too often), which will cause test hangs or failures.

Suggested fix
        if {$count % $frequency == 0} {

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

emptyData(-1, empty_db_flags, replicationEmptyDbCallback);

/* Restore the original active defragmentation setting. */
server.active_defrag_enabled = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 Bug: Defrag setting restored to hardcoded 1 instead of original value

Line 1956 saves the original defrag setting into orig_active_defrag, and line 1957 correctly disables it before the flush. However, line 1962 restores server.active_defrag_enabled to the hardcoded value 1 instead of restoring it to orig_active_defrag.

This means that if active defragmentation was originally disabled (server.active_defrag_enabled == 0), after the database flush completes, defrag will be forcibly enabled. This defeats the entire purpose of the save/restore pattern and introduces a configuration mutation side-effect in rdbLoadEmptyDbFunc().

Impact: Any replica that has activedefrag no configured will have defrag silently enabled after every replication sync flush, potentially causing unexpected CPU usage and behavior changes.

Was this helpful? React with 👍 / 👎

Suggested change
server.active_defrag_enabled = 1;
server.active_defrag_enabled = orig_active_defrag;
  • Apply suggested fix

return 1;

/* Check if the message is in any consumer group's PEL */
if (!s->cgroups_ref) return 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 Bug: Null cgroups_ref returns 1 (referenced) instead of 0

When s->cgroups_ref is NULL (which happens after a DEBUG RELOAD when the PEL is empty and no cgroup references are recreated), the function returns 1, indicating the message IS referenced by a consumer group.

Per the function's documented contract (line 2683): "Returns 1 if the entry is referenced, 0 if it's fully acknowledged by all groups." When cgroups_ref is NULL, there are no references at all, so the correct return value is 0.

Returning 1 here means that stream trimming operations (e.g., XADD ... MAXLEN ... ACKED) will be unable to delete messages that are actually fully acknowledged, because they are falsely reported as still referenced. This is a logic inversion that causes a functional regression in stream trimming after reload.

Was this helpful? React with 👍 / 👎

Suggested change
if (!s->cgroups_ref) return 1;
if (!s->cgroups_ref) return 0;
  • Apply suggested fix

}

proc discard_replies_every {rd count frequency discard_num} {
if {$count % $frequency != 0} {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Bug: Inverted condition discards replies at wrong intervals

The discard_replies_every procedure uses != 0 instead of == 0 in the modulo check. The intent is to discard accumulated replies every frequency iterations (i.e., when count is divisible by frequency). The current condition $count % $frequency != 0 inverts this, causing replies to be discarded on every iteration except when count is a multiple of frequency.

Looking at the call sites that this procedure replaces (e.g., lines 347-350 replacing the original inline code), the original code was:

if {$count % 10000 == 0} {
    for {set k 0} {$k < 10000} {incr k} {
        $rd read ; # Discard replies
    }
}

The extracted procedure should preserve the == 0 condition. With != 0, the test will attempt to read replies that haven't been sent yet (reading too early and too often), which will cause test hangs or failures.

Was this helpful? React with 👍 / 👎

Suggested change
if {$count % $frequency != 0} {
if {$count % $frequency == 0} {
  • Apply suggested fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants