Skip to content

file_server: fix use-after-free in FileStreamer async-file callbacks after request cancel#45153

Merged
ravenblackx merged 4 commits into
envoyproxy:mainfrom
omkhar:fix/file_server-async-uaf
May 21, 2026
Merged

file_server: fix use-after-free in FileStreamer async-file callbacks after request cancel#45153
ravenblackx merged 4 commits into
envoyproxy:mainfrom
omkhar:fix/file_server-async-uaf

Conversation

@omkhar
Copy link
Copy Markdown
Contributor

@omkhar omkhar commented May 19, 2026

Commit Message:
file_server: fix UAF on async-file callback after request cancel

FileStreamer::start() captured [this] into the async-file callbacks
returned by AsyncFileManager::stat() and AsyncFileManager::read().
The async-file manager guarantees the callback fires (even if cancel
was called), so when the FileStreamer is destroyed before the
callback runs the bare this capture becomes dangling and the
callback dereferences freed memory.

This is reachable on any deployment using the file_server filter when
downstream clients can cancel or reset requests while an async-file
operation is in flight.

Specifically, the race is if the callback is placed in a dispatcher queue
by AsyncFileManager before cancelation, then deletion before
the callback reaches the head of the queue.

Solution is to hold a CancelCallback sentinel in FileStreamer and
wrap the async-file callbacks in cancelWrapper.

Additional Description:

Reported privately via GHSA-c4w8-wqj5-4xph; envoy security team
(@wbpcode) asked for this to be fixed in the open given the
file_server filter's WIP status.

Fixes #45152.

Risk Level: Low.
Defensive lifetime guard around two existing async-file callback sites.
No behavior change for the in-flight-completes-before-cancel path. The
cancel-then-callback path previously dereferenced freed memory and now
no-ops.

Testing: bazel test --config=clang --config=clang-asan //test/extensions/filters/http/file_server:file_server_test passes
locally on the patched tree at upstream main. Happy to add a focused
regression test that drives the cancel-during-stat path if maintainers
would like.

Docs Changes: None required.

Release Notes: Added to changelogs/current.yaml under bug_fixes
(area: file_server).

Platform Specific Features: None.

FileStreamer::start() captured [this] into the async-file callbacks
returned by AsyncFileManager::stat() and AsyncFileManager::read().
The async-file manager guarantees the callback fires (even if cancel
was called), so when the FileStreamer is destroyed before the
callback runs the bare 'this' capture becomes dangling and the
callback dereferences freed memory.

This is reachable on any deployment using the file_server filter
when downstream clients can cancel or RST requests while an
async-file operation is in flight.

Hold a std::shared_ptr<bool> alive sentinel in FileStreamer and
capture [this, alive = std::weak_ptr<bool>(alive_)] in the
async-file callbacks. If alive.lock() returns nullptr the callback
short-circuits to a no-op.

Reported privately via GHSA-c4w8-wqj5-4xph; envoy security team
asked for this to be fixed in the open given the file_server
filter's WIP status.

Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
@omkhar omkhar had a problem deploying to external-contributors May 19, 2026 16:34 — with GitHub Actions Error
Copy link
Copy Markdown
Contributor

@ravenblackx ravenblackx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an integration test case that reproduces the crash, before the fix?

(cancel_callback_ should already be called in the abort/destruction path, which should already be making AsyncFileManager not call the callback, so there's a broader problem if this change addresses a real problem. We shouldn't have two sentinels doing the same job.)

Edit: Ah, I see the issue, the callback can be dispatched before it is canceled, here, but then canceled before the dispatched event reaches head of queue, so it also needs the ability to abort if it's canceled while the callback is in the dispatcher's queue.

And it makes sense here to have two sentinels, because one is to abort the file operation if it isn't complete, and the other is to abort the callback if it got queued first.

So since it's racy it's probably quite difficult to repro reliably in an integration test.

Consider using cancelWrapped rather than another custom sentinel, since it was created for exactly this kind of purpose. (Annoying though it is to have to call two cancel-callback functions!)

Comment thread changelogs/current.yaml Outdated
bug_fixes:
# *Changes expected to improve the state of the world and are unlikely to have negative effects*
- area: file_server
change: |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to use new changelog layout - see #45095

/wait

@ravenblackx ravenblackx self-assigned this May 19, 2026
Replace the custom std::shared_ptr<bool> alive_ sentinel introduced in
the first commit with envoy's existing cancelWrapped helper from
source/common/common/cancel_wrapper.h.

ravenblackx pointed out two things:
  1. The bare `this` capture problem the v1 commit solved is real:
     AsyncFileManager can dispatch the completion callback onto the
     owner dispatcher (see source/extensions/common/async_files/
     async_file_manager_thread_pool.cc) and then observe cancellation
     after the dispatch — so the file-side cancel_callback_ alone can't
     stop the queued callback from firing.
  2. Two sentinels are warranted (one to abort the file op if it has
     not completed, one to abort the dispatched callback if it has
     already been queued), but the codebase already provides
     cancelWrapped for exactly that. A custom sentinel here just
     duplicates work without the thread-safety asserts cancel_wrapper
     provides.

This commit:
  - Removes the alive_ shared_ptr and the [this, alive=...] capture
    pattern from every async-file callback.
  - Adds a single Envoy::CancelWrapper::CancelFunction
    cancel_dispatcher_callbacks_ member that is invoked in abort() and
    in the destructor.
  - Wraps each async-file completion lambda in
    Envoy::CancelWrapper::cancelWrapped(lambda, &cancel_dispatcher_callbacks_).
  - Adds //source/common/common:cancel_wrapper_lib to the file_server
    BUILD deps.

The existing AsyncFiles cancel_callback_ continues to short-circuit
the in-flight file operation. The new cancel_dispatcher_callbacks_
short-circuits any completion the dispatcher has already enqueued.

Verified on Linux x86_64: bazel test --config=clang
//test/extensions/filters/http/file_server:file_server_test passes
on a clean envoyproxy/envoy@main with this patch applied.

Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
@omkhar omkhar temporarily deployed to external-contributors May 20, 2026 16:36 — with GitHub Actions Inactive
@omkhar
Copy link
Copy Markdown
Contributor Author

omkhar commented May 20, 2026

Thanks @ravenblackx — that read of the dispatcher-queue race is right, and using cancelWrapped is cleaner than rolling my own.

Pushed eccd1db:

  • Drops std::shared_ptr<bool> alive_ and the [this, alive=...] capture pattern.
  • Adds an Envoy::CancelWrapper::CancelFunction cancel_dispatcher_callbacks_ member; each async-file completion lambda is now wrapped via Envoy::CancelWrapper::cancelWrapped(lambda, &cancel_dispatcher_callbacks_).
  • abort() and ~FileStreamer() invoke cancel_dispatcher_callbacks_() after cancel_callback_(). The existing cancel_callback_ continues to abort the in-flight file op; cancel_dispatcher_callbacks_ short-circuits any completion the dispatcher has already enqueued (the path you cited at async_file_manager_thread_pool.cc:112-120).
  • Adds //source/common/common:cancel_wrapper_lib to the file_server BUILD deps.

Verified on Linux x86_64: bazel test --config=clang //test/extensions/filters/http/file_server:file_server_test passes on a clean envoyproxy/envoy@main with this patch applied.

On the integration test: I agree the race is "quite difficult to repro reliably" — wiring the dispatch-then-cancel sequence deterministically from an integration test would require driving AsyncFileManager state from the test harness. Happy to draft one if you want it before merge, but per your earlier note I think the cancel_wrapper change makes the safety property local enough that a unit test would be sufficient if anything.

Copy link
Copy Markdown
Contributor

@ravenblackx ravenblackx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and I'm okay with not trying to force a regression test around it.

Still needs the changelog updated to the new layout per phlax's comment.

Per envoyproxy#45095 / envoyproxy#45093, changelog entries are now individual RST files
under changelogs/current/<section>/. Move the file_server bug_fix
entry out of the legacy current.yaml and rewrite the prose to reflect
the v2 cancelWrapped framing.

Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
@omkhar omkhar temporarily deployed to external-contributors May 21, 2026 18:51 — with GitHub Actions Inactive
@omkhar
Copy link
Copy Markdown
Contributor Author

omkhar commented May 21, 2026

Addressed @phlax / @ravenblackx — migrated the bug_fix entry to changelogs/current/bug_fixes/file_server__fixed-use-after-free-in-filestreamer-callbacks.rst per #45095 / #45093, and merged upstream/main to clear the conflict. No code changes since eccd1db; the cancelWrapped refactor stands. Ready for re-review when CI clears.

@ravenblackx
Copy link
Copy Markdown
Contributor

/wait

Format pre-check is angry because it wants more hideous indents. You could either run clang-format over the touched files (fastest) or run ./ci/do_ci.sh format fix in the branch, or copy-paste and apply the diff from the CI report.

Fix indentation in startDir() / startFile() callbacks that clang-format
disagreed with after the v2 cancelWrapped refactor (the additional
lambda-wrap level shifted the inner block, but the inner body kept its
old indentation). Pure whitespace; no semantic change.

Signed-off-by: Omkhar Arasaratnam <omkhar@linkedin.com>
@omkhar omkhar deployed to external-contributors May 21, 2026 19:22 — with GitHub Actions Active
@ravenblackx ravenblackx added the backport/review Request to backport to stable releases label May 21, 2026
@ravenblackx
Copy link
Copy Markdown
Contributor

Thanks. I've modified the description to match the cancelWrapped implementation rather than alive_, and am queuing it to merge assuming CI will pass this time.

@ravenblackx ravenblackx enabled auto-merge (squash) May 21, 2026 19:30
@ravenblackx ravenblackx merged commit 9629465 into envoyproxy:main May 21, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/review Request to backport to stable releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

file_server: use-after-free in FileStreamer when async-file callback fires after request cancel

3 participants