Skip to content

Reject corrupt stream RDB with shared NACK across consumers#15081

Merged
sggeorgiev merged 7 commits intoredis:unstablefrom
sggeorgiev:fix-rdb-shared-nack
Apr 23, 2026
Merged

Reject corrupt stream RDB with shared NACK across consumers#15081
sggeorgiev merged 7 commits intoredis:unstablefrom
sggeorgiev:fix-rdb-shared-nack

Conversation

@sggeorgiev
Copy link
Copy Markdown
Collaborator

@sggeorgiev sggeorgiev commented Apr 20, 2026

Summary

Detects and rejects corrupt stream RDB payloads where the same NACK (pending entry) is referenced by more than one consumer, which violates a stream data-structure.

Changes

  • rdbLoadObject (stream consumer PEL loading): Added a guard that checks nack->consumer != NULL before assigning the consumer pointer. When a second consumer's PEL references a NACK that was already claimed by a prior consumer, the loader now reports a corrupt RDB error and aborts instead of silently overwriting the pointer. Without this check, two consumers share the same streamNACK, and freeing the first consumer's PEL leaves the second with a dangling pointer.
  • corrupt-dump.tcl: Added a regression test that crafts a stream with two consumers (consumerA, consumerB) whose PELs both reference the same entry (1-0). The RESTORE command is expected to fail with "Bad data format", and the server must remain responsive (PING succeeds).

Benefits

  • Fail-fast on corrupt data: The invariant violation is caught at load time with a clear diagnostic message rather than manifesting as a crash later during normal operation.
  • Regression coverage: The crafted payload in the test ensures this class of corruption is permanently guarded against.

Note

Medium Risk
Adds stricter validation during stream RDB/RESTORE loading; mistakes could cause previously-loadable (but corrupt) dumps to be rejected, and it touches stream consumer-group persistence logic that can affect load paths.

Overview
Prevents loading corrupt stream consumer-group state where multiple consumers reference the same global PEL streamNACK. During rdbLoadObject stream consumer PEL loading, it now fails fast if a resolved streamNACK already has a consumer set, instead of overwriting the pointer.

Adds an integration regression test that restores a crafted stream payload with a NACK shared by two consumers and asserts RESTORE fails with Bad data format while the server remains responsive.

Reviewed by Cursor Bugbot for commit 3ce48f9. Bugbot is set up for automated code reviews on this repo. Configure here.

@sggeorgiev sggeorgiev requested review from skaslev and sundb April 20, 2026 10:33
@sundb sundb added this to Redis 8.8 Apr 22, 2026
Comment thread tests/integration/corrupt-dump.tcl Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit e9bd224. Configure here.

Comment thread tests/integration/corrupt-dump.tcl
@sundb
Copy link
Copy Markdown
Collaborator

sundb commented Apr 22, 2026

Pls check if we need to backport it to a version older than 8.6.
And it is necessary to confirm whether this corrupted test is applicable to version 8.6, because the rdb version of 8.8 has been increased, and possibly the version of this corrupted data is the latest one.

Comment thread src/rdb.c Outdated
Co-authored-by: debing.sun <debing.sun@redis.com>
@sggeorgiev sggeorgiev merged commit 47c5136 into redis:unstable Apr 23, 2026
18 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in Redis 8.8 Apr 23, 2026
YaacovHazan pushed a commit that referenced this pull request Apr 28, 2026
**Summary**

Detects and rejects corrupt stream RDB payloads where the same NACK
(pending entry) is referenced by more than one consumer, which violates
a stream data-structure.

**Changes**

- **`rdbLoadObject` (stream consumer PEL loading)**: Added a guard that
checks `nack->consumer != NULL` before assigning the consumer pointer.
When a second consumer's PEL references a NACK that was already claimed
by a prior consumer, the loader now reports a corrupt RDB error and
aborts instead of silently overwriting the pointer. Without this check,
two consumers share the same `streamNACK`, and freeing the first
consumer's PEL leaves the second with a dangling pointer.
- **`corrupt-dump.tcl`**: Added a regression test that crafts a stream
with two consumers (`consumerA`, `consumerB`) whose PELs both reference
the same entry (`1-0`). The `RESTORE` command is expected to fail with
`"Bad data format"`, and the server must remain responsive (`PING`
succeeds).

**Benefits**

- **Fail-fast on corrupt data**: The invariant violation is caught at
load time with a clear diagnostic message rather than manifesting as a
crash later during normal operation.
- **Regression coverage**: The crafted payload in the test ensures this
class of corruption is permanently guarded against.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants