in_storage_backlog: fix use-after-free when segregating chunk#10892

Merged
edsiper merged 1 commit into master from storage-backlog-fix
Sep 17, 2025

Conversation

Member

@edsiper edsiper commented Sep 17, 2025

Fixes #10868


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features
    • None.
  • Bug Fixes
    • Improved reliability when handling corrupted or empty backlog chunks.
    • Prevented inconsistencies by ensuring backlog entries are removed before closing/downgrading chunks.
    • Reduced risk of crashes and resource leaks during failure scenarios in storage backlog processing.
  • Refactor
    • Streamlined cleanup and queuing logic within storage backlog processing for greater stability.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

coderabbitai bot commented Sep 17, 2025

Walkthrough

The change updates cb_queue_chunks in plugins/in_storage_backlog/sb.c to consistently use a local pointer (underlying_chunk) for chunk operations and reorders cleanup steps to remove a chunk from segregated backlogs before closing/downgrading it. This applies to corrupted, empty, and mapping/metadata failure paths, as well as the final queueing step.
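
The ordering described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `toy_chunk`, `toy_entry`, `toy_backlog` and every `toy_*` function are hypothetical stand-ins invented for illustration, not the Fluent Bit or Chunk I/O types. The one behavior it assumes from the PR is that removing a chunk from the backlog releases the wrapper entry, so the wrapper must not be dereferenced afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the underlying chunk object. */
struct toy_chunk {
    int id;
};

/* Hypothetical wrapper entry that the backlog owns; it points at a chunk. */
struct toy_entry {
    struct toy_chunk *chunk;
    struct toy_entry *next;
};

/* Hypothetical backlog: a singly linked list plus counters for checking. */
struct toy_backlog {
    struct toy_entry *head;
    int entries_freed;
    int chunks_closed;
};

/* Pushes a new wrapper entry for `c`; returns 0 on success, -1 on failure. */
static int toy_backlog_add(struct toy_backlog *b, struct toy_chunk *c)
{
    struct toy_entry *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    e->chunk = c;
    e->next = b->head;
    b->head = e;
    return 0;
}

/* Unlinks and frees every wrapper entry that points at `chunk`. */
static void toy_backlog_remove(struct toy_backlog *b, struct toy_chunk *chunk)
{
    struct toy_entry **link = &b->head;

    while (*link != NULL) {
        struct toy_entry *e = *link;
        if (e->chunk == chunk) {
            *link = e->next;
            free(e);
            b->entries_freed++;
        }
        else {
            link = &e->next;
        }
    }
}

/* Releases the chunk itself. */
static void toy_chunk_close(struct toy_backlog *b, struct toy_chunk *chunk)
{
    free(chunk);
    b->chunks_closed++;
}

/* Safe ordering: copy the pointer, unlink the wrapper, then close. */
static void toy_discard(struct toy_backlog *b, struct toy_entry *entry)
{
    struct toy_chunk *underlying_chunk = entry->chunk; /* read before entry is freed */

    toy_backlog_remove(b, underlying_chunk); /* frees `entry` */
    toy_chunk_close(b, underlying_chunk);    /* `entry` is not touched again */
}
```

In this model, reading `entry->chunk` after `toy_backlog_remove` would touch freed memory, which is the class of bug the local `underlying_chunk` pointer avoids.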

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Segregated backlog chunk lifecycle handling: plugins/in_storage_backlog/sb.c | Introduces local underlying_chunk (struct cio_chunk *). Replaces direct uses of chunk_instance->chunk. Reorders cleanup: backlog removal precedes close/downgrade. Applies to corrupted, empty, mapping/type read failure, and final queueing paths within cb_queue_chunks. No API/signature changes. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Q as cb_queue_chunks
  participant SB as Segregated Backlog
  participant CIO as cio_chunk (underlying_chunk)

  rect rgba(230,240,255,0.5)
  note over Q: Handling chunk (corrupted/empty/map-fail or final queue)
  Q->>Q: derive underlying_chunk = chunk_instance->chunk
  Q->>SB: remove(underlying_chunk)
  alt corrupted/empty/map-read failure
    Q->>CIO: close(underlying_chunk)
  else final queueing/downgrade
    Q->>CIO: down(underlying_chunk)
  end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nudge the backlog, tidy and neat,
Pluck chunks first, then fold them to sleep.
No ghostly pointers nibble the heap—
A hop, a fix, a safer sweep.
Carrots queued, crashes beat. 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "in_storage_backlog: fix use-after-free when segregating chunk" concisely and accurately describes the main change: addressing a use-after-free in the in_storage_backlog chunk segregation path (cb_queue_chunks). It is specific to the subsystem and clear for reviewers scanning history. |
| Linked Issues Check | ✅ Passed | The changes introduce an underlying_chunk local and reorder operations to remove a chunk from the segregated backlog before closing/downgrading it, which directly addresses the use-after-free reported in issue #10868 and the Valgrind traces pointing at cb_queue_chunks -> sb_remove_chunk_from_segregated_backlog -> cio_chunk_close. The fix is scoped to cb_queue_chunks and preserves public APIs, so it meets the linked issue's coding objective of eliminating the use-after-free. |
| Out of Scope Changes Check | ✅ Passed | The diff is limited to plugins/in_storage_backlog/sb.c (cb_queue_chunks) where only an internal local variable and the order of backlog removal versus chunk close/downgrade were changed; there are no other file edits, signature changes, or new features outside the linked issue's scope. Therefore no out-of-scope changes were detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
plugins/in_storage_backlog/sb.c (2)

564-572: Fixes UAF order; confirm deletion policy for corrupted chunks.
Order is now correct (unlink from backlogs before close). But closing with FLB_FALSE leaves a corrupted, now-unreferenced chunk on disk/stream (not re-segregated), potentially leaking disk space. Consider honoring storage_del_bad_chunks (and error type) here too.

Apply:

-                        sb_remove_chunk_from_segregated_backlogs(underlying_chunk, ctx);
-                        cio_chunk_close(underlying_chunk, FLB_FALSE);
+                        sb_remove_chunk_from_segregated_backlogs(underlying_chunk, ctx);
+                        {
+                            int chunk_error = cio_error_get(underlying_chunk);
+                            int del = (ctx->ins->config->storage_del_bad_chunks == FLB_TRUE) &&
+                                      (chunk_error == CIO_ERR_BAD_FILE_SIZE ||
+                                       chunk_error == CIO_ERR_BAD_LAYOUT);
+                            cio_chunk_close(underlying_chunk, del ? FLB_TRUE : FLB_FALSE);
+                        }
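
The deletion decision in the suggestion above reduces to a small predicate, sketched below in isolation. The enum and function names here (`toy_chunk_error`, `toy_should_delete`) are hypothetical and the enum values are not the real Chunk I/O error codes; only the rule itself (delete when the bad-chunk option is on and the error is a bad file size or bad layout) is taken from the review comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes; the real CIO_ERR_* values are not reproduced here. */
enum toy_chunk_error {
    TOY_ERR_NONE = 0,
    TOY_ERR_BAD_CHECKSUM,
    TOY_ERR_BAD_FILE_SIZE,
    TOY_ERR_BAD_LAYOUT
};

/*
 * Returns true when a corrupted chunk should be deleted on close:
 * the option must be enabled and the error must be one of the two
 * structural corruption kinds named in the review comment.
 */
static bool toy_should_delete(bool del_bad_chunks, enum toy_chunk_error err)
{
    if (!del_bad_chunks) {
        return false;
    }
    return err == TOY_ERR_BAD_FILE_SIZE || err == TOY_ERR_BAD_LAYOUT;
}
```

Keeping the rule in one predicate would let every failure path pass the same delete flag to the close call, instead of hard-coding it per path.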

625-628: Mapping-failure path fixed; minor consistency nit.
Sequence is safe. For consistency with other paths, set underlying_chunk before the down/close and use it for cio_chunk_down too.

-                    cio_chunk_down(chunk_instance->chunk);
+                    underlying_chunk = chunk_instance->chunk;
+                    cio_chunk_down(underlying_chunk);
 ...
-                    underlying_chunk = chunk_instance->chunk;
-                    sb_remove_chunk_from_segregated_backlogs(underlying_chunk, ctx);
-                    cio_chunk_close(underlying_chunk, FLB_TRUE);
+                    sb_remove_chunk_from_segregated_backlogs(underlying_chunk, ctx);
+                    cio_chunk_close(underlying_chunk, FLB_TRUE);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 074e39b and 30a477f.

📒 Files selected for processing (1)
  • plugins/in_storage_backlog/sb.c (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (3)
plugins/in_storage_backlog/sb.c (3)

521-521: Good defensive pointer: prevents UAF.
Using a local underlying_chunk decouples the CIO chunk lifetime from the sb_out_chunk wrapper. LGTM.


593-596: Correct: unlink-before-close removes UAF on metadata errors.
Safe sequencing and use of underlying_chunk look good.


606-609: Correct: unlink-before-close for empty chunks.
This eliminates the UAF reported in #10868 for the empty-path as well. LGTM.

@edsiper edsiper merged commit feeeb9f into master Sep 17, 2025
65 checks passed
@edsiper edsiper deleted the storage-backlog-fix branch September 17, 2025 23:33
sb_remove_chunk_from_segregated_backlogs(chunk_instance->chunk, ctx);
/* This function will indirectly release chunk_instance so it has to be
* called last.
underlying_chunk = chunk_instance->chunk;
Contributor

This indent is wrong/misleading, isn't it?

Development

Successfully merging this pull request may close these issues.

use-after-free in segregated backlog crashing 4.0.8

2 participants