Skip to content

Fix decrRefCount on NULL robj on corrupt KEY_META payload#15034

Merged
sundb merged 8 commits intoredis:unstablefrom
sundb:fix-decrrefcount-null
Apr 16, 2026
Merged

Fix decrRefCount on NULL robj on corrupt KEY_META payload#15034
sundb merged 8 commits intoredis:unstablefrom
sundb:fix-decrrefcount-null

Conversation

@sundb
Copy link
Copy Markdown
Collaborator

@sundb sundb commented Apr 13, 2026

Summary

This PR fixes two issues when processing corrupt data in rdbLoadCheckModuleValue():

  1. When handling RDB_MODULE_OPCODE_STRING opcode, rdbGenericLoadStringObject() can return NULL on a corrupt payload. The code called decrRefCount(o) unconditionally without a NULL check, resulting in a NULL pointer dereference crash.

  2. The while loop condition was != RDB_MODULE_OPCODE_EOF, which means a truncated payload (causing rdbLoadLen to return RDB_LENERR) would never exit the loop, since RDB_LENERR != RDB_MODULE_OPCODE_EOF is always true, potentially causing an infinite hang.

Note that the new test in corrupt-dump covers both of these issues.
To ensure that this test could be reproduced on version 8.6, manually modified the payload.

Failed CI: https://github.com/redis/redis/actions/runs/24219825926/job/70708582753

Server crashed in RESTORE with payload: "\xF3\x02\x01\x0E\x00\x54\x23\x3F\xC9\x82\x32\x05\x8D"
valgrind or asan found an issue for payload: "\xF3\x02\x01\x0E\x00\x54\x23\x3F\xC9\x82\x32\x05\x8D"
violating commands:
[err]: Sanitizer error: object.c:604:21: runtime error: member access within null pointer of type 'struct robj'

Note

Low Risk
Changes are limited to RDB-check/corruption paths and add stricter validation/early exits, with minimal impact on normal RDB load/save behavior.

Overview
Improves corruption handling in rdbLoadCheckModuleValue() by detecting RDB_LENERR, rejecting unknown module opcodes, and avoiding decrRefCount() on a NULL string object read; it now supports an option to return NULL on corruption instead of always returning a dummy object.

Updates RDB/key-metadata and redis-check-rdb callers to pass the new null_on_error flag (returning NULL for KEY_META skipping paths, keeping dummy objects for regular check-mode parsing), and adds an integration test covering the corrupt KEY_META RESTORE regression.

Reviewed by Cursor Bugbot for commit 10c03a4. Bugbot is set up for automated code reviews on this repo. Configure here.

@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Apr 13, 2026
@sundb sundb requested a review from moticless April 13, 2026 06:35
@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Apr 13, 2026

🤖 Augment PR Summary

Summary: Hardens RDB check-mode module value parsing against corrupt/truncated payloads by avoiding decrRefCount(NULL) and ensuring parsing terminates on RDB_LENERR instead of hanging.
Tests: Adds an integration test reproducing a fuzzer-found corrupt KEY_META DUMP payload and verifying Redis returns “Bad data format” without crashing.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 0518ea6. Configure here.

Comment thread src/rdb.c Outdated
@moticless
Copy link
Copy Markdown
Collaborator

@sundb, I think that once rdbLoadCheckModuleValue() fail while decoding a module payload, continuing to scan for RDB_MODULE_OPCODE_EOF is best-effort at best. We may already be desynchronized, so “keep going and return dummy” is not really a robust recovery strategy. If we do that, we should also update the callers so they stop parsing immediately on NULL:

@sundb
Copy link
Copy Markdown
Collaborator Author

sundb commented Apr 16, 2026

@sundb, I think that once rdbLoadCheckModuleValue() fail while decoding a module payload, continuing to scan for RDB_MODULE_OPCODE_EOF is best-effort at best. We may already be desynchronized, so “keep going and return dummy” is not really a robust recovery strategy. If we do that, we should also update the callers so they stop parsing immediately on NULL:

@moticless Since this PR needs to be backported, I’d prefer not to make too many changes to the behavior that’s already working correctly in this PR, to avoid introducing a new bug to old versions.

Comment thread src/rdb.c
@sundb sundb merged commit ca6e471 into redis:unstable Apr 16, 2026
23 of 25 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in Redis 8.8 Apr 16, 2026
YaacovHazan pushed a commit that referenced this pull request Apr 28, 2026
## Summary

This PR fixes two issues when processing corrupt data in
rdbLoadCheckModuleValue():

1. When handling `RDB_MODULE_OPCODE_STRING` opcode,
rdbGenericLoadStringObject() can return NULL on a corrupt payload. The
code called decrRefCount(o) unconditionally without a NULL check,
resulting in a NULL pointer dereference crash.

2. The while loop condition was `!= RDB_MODULE_OPCODE_EOF`, which means
a truncated payload (causing rdbLoadLen to return RDB_LENERR) would
never exit the loop, since `RDB_LENERR != RDB_MODULE_OPCODE_EOF` is
always true, potentially causing an infinite hang.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants