killAllExpiredTransactions: use a shared_ptr to manage coro context #212
xiexiaoy merged 1 commit into eloqdata:main from
Conversation
Walkthrough
Replaces raw stack backing and canary-based overflow checks with boost::context::protected_fixedsize_stack; introduces a shared CoroCtx in kill_sessions_local for allocator-enabled continuations and hooks; updates callcc/continuation captures, logging, and lifecycle checks while preserving yield/resume/error-handling semantics.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant Kill as kill_sessions_local
    participant Coro as CoroCtx (salloc + source)
    participant Cont as Continuation
    Caller->>Kill: start kill sessions
    Note right of Kill: create shared CoroCtx with protected_fixedsize_stack
    Kill->>Coro: init yield/resume hooks
    Kill->>Cont: callcc(std::allocator_arg, Coro.salloc, ...)
    Note over Cont,Coro: continuation stores Coro.source and uses Coro.yieldFunc to yield
    Cont-->>Kill: yield (via Coro.yieldFunc)
    Kill->>Cont: resume (via Coro.resumeFunc / longResumeFunc)
    Cont-->>Kill: complete or throw DBException
    Kill-->>Caller: finalize
```
```mermaid
sequenceDiagram
    autonumber
    participant Net as Network
    participant SSM as ServiceStateMachine
    participant Alloc as _salloc (protected_fixedsize_stack)
    participant Src as _source (continuation)
    Net->>SSM: new connection/event
    Note right of SSM: callcc(std::allocator_arg, Alloc, ...) with weak_from_this() guard
    SSM->>Src: enter processing loop
    Src-->>SSM: yield (no explicit stack-overflow checks)
    SSM->>Src: resume on I/O/state change
    alt normal completion
        Src-->>SSM: done
    else error
        Src-->>SSM: error surfaced
    end
    SSM-->>Net: transition/close
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (5)
src/mongo/db/kill_sessions_local.cpp (3)
104-108: Consider removing or adjusting log level for coroutine resume logging
The log message "abortArbitraryTransactionIfExpired call resume." might be too verbose for production at the default log level. Consider using a higher verbosity level (e.g., LOG(3) or LOG(4)) or removing it if it was added for debugging purposes.
```diff
 std::function<void()> resumeTask = [&source, &client] {
-    log() << "abortArbitraryTransactionIfExpired call resume.";
+    LOG(3) << "abortArbitraryTransactionIfExpired call resume.";
     Client::setCurrent(std::move(client));
     source = source.resume();
 };
```
122-126: Consider removing or adjusting log level for coroutine yield logging
Similar to the resume logging, this yield log message appears to be debug information that might be too verbose for production.
```diff
 yieldFunc = [&sink, &client]() {
-    log() << "abortArbitraryTransactionIfExpired call yield.";
+    LOG(3) << "abortArbitraryTransactionIfExpired call yield.";
     client = Client::releaseCurrent();
     sink = sink.resume();
 };
```
117-158: Verify protected stack size is sufficient
The protected_fixedsize_stack is allocated with 3200KB (3.125MB). Ensure this size is adequate for the worst-case stack usage during transaction abort operations, considering any potential deep call stacks or large local variables.
The stack size allocation of 3200KB appears reasonable for typical transaction abort operations. However, you should monitor stack usage in production to ensure this is sufficient, especially under heavy load or with complex transaction scenarios. Consider adding metrics or diagnostics to track actual stack usage if issues arise.
src/mongo/transport/service_state_machine.cpp (1)
554-557: Consider adjusting log level for coroutine yield logging
The yield log message at LOG(3) level seems appropriate for debugging but might still be verbose in some scenarios.
Consider using LOG(4) or LOG(5) for even less verbose output in production:
```diff
 ssm->_coroYield = [ssm = ssm.get(), &sink]() {
-    MONGO_LOG(3) << "call yield";
+    MONGO_LOG(4) << "call yield";
     ssm->_dbClient = Client::releaseCurrent();
     sink = sink.resume();
 };
```
src/mongo/transport/service_state_machine.h (1)
267-267: Extract coroutine stack size into a shared constant
Hardcoding `3200 * 1024` in service_state_machine.h (line 267) and kill_sessions_local.cpp (line 83) leads to duplication. Define `constexpr size_t kCoroutineStackSize = 3200 * 1024;` in a shared header and replace both instances.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/mongo/db/kill_sessions_local.cpp (4 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
src/mongo/db/kill_sessions_local.cpp (1)
src/mongo/db/client.cpp (4)
setCurrent (169-172), setCurrent (169-169), releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (3)
src/mongo/transport/service_state_machine.cpp (1)
548-551: Good: Consistent use of protected stack allocator
The migration to boost::context::protected_fixedsize_stack with the allocator-augmented callcc provides automatic stack overflow protection, eliminating the need for manual overflow checks that were removed throughout the file.
src/mongo/transport/service_state_machine.h (1)
267-269: Good: Clean migration to protected stack allocation
The replacement of manual stack management with boost::context::protected_fixedsize_stack is a significant improvement. This provides automatic stack overflow protection through guard pages, eliminating the need for manual canary checking and making the coroutine implementation more robust and maintainable.
src/mongo/db/kill_sessions_local.cpp (1)
83-84: Drop thread-safety concern: scanSessions is single-threaded
scanSessions acquires _mutex and invokes the callback serially, so salloc and source captured by reference are never accessed concurrently.
c027a9b to ad6e003 (compare)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/mongo/transport/service_state_machine.h (1)
267-270: Make stack allocator const and centralize stack size.
- _salloc can be const; it’s not mutated after construction.
- kCoroStackSize duplicates the literal used elsewhere (kill_sessions_local.cpp). Prefer one shared definition to avoid drift.
Apply:
```diff
- static constexpr size_t kCoroStackSize = 3200 * 1024;
+ static constexpr size_t kCoroStackSize = 3200 * 1024;  // TODO: centralize this constant
  boost::context::continuation _source;
- boost::context::protected_fixedsize_stack _salloc{kCoroStackSize};
+ const boost::context::protected_fixedsize_stack _salloc{kCoroStackSize};
```
Optionally, expose kCoroStackSize in a common header (e.g., mongo/util/coro_stack.h) and include it here and in kill_sessions_local.cpp.
src/mongo/db/kill_sessions_local.cpp (1)
102-104: Avoid hardcoding stack size; reuse a single constant.
3200 * 1024 is duplicated from ServiceStateMachine. Prefer a shared constant to prevent divergence.
Apply:
```diff
- boost::context::protected_fixedsize_stack salloc(3200 * 1024);
+ // Consider centralizing this constant (e.g., mongo/util/coro_stack.h)
+ boost::context::protected_fixedsize_stack salloc(3200 * 1024);
```
If practical, include a shared header and use the same kCoroStackSize.
📒 Files selected for processing (4)
src/mongo/db/kill_sessions_local.cpp (3 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
src/mongo/util/exit.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/mongo/util/exit.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/db/kill_sessions_local.cpp (2)
src/mongo/transport/service_executor_coroutine.cpp (2)
resumeTask (49-54), resumeTask (49-49)
src/mongo/db/client.cpp (4)
setCurrent (169-172), setCurrent (169-169), releaseCurrent (164-167), releaseCurrent (164-164)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (5)
src/mongo/transport/service_state_machine.h (1)
34-34: Allocator include looks correct.
Using boost::context::protected_fixedsize_stack is appropriate for guarded stacks.
src/mongo/transport/service_state_machine.cpp (1)
549-561: Allocator-aware callcc usage LGTM.
Passing std::allocator_arg with _salloc and capturing sink by reference is standard in this pattern. No issues spotted here.
src/mongo/db/kill_sessions_local.cpp (3)
47-48: Allocator header include is appropriate.
Required for protected_fixedsize_stack; looks good.
161-164: Continuation drain loop LGTM.
Draining while (source) ensures completion. Good addition.
102-110: Confirm intent: source/salloc still live inside scanSessions callback.
The PR description mentions moving source and stack out of the scanSessions callback. They’re still defined within the callback (before scheduling task). If the goal was to move them fully outside scanSessions, please clarify or adjust.
187b978 to 59af554 (compare)
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/mongo/transport/service_state_machine.cpp (1)
548-561: Capture ThreadGuard by move, not reference.
Capturing &guard and then moving it is brittle. Capture by move to avoid dangling-reference risks in future edits/refactors.
```diff
- _source = boost::context::callcc(
-     std::allocator_arg,
-     _salloc,
-     [ssm = shared_from_this(),
-      &guard](boost::context::continuation&& sink) {
+ _source = boost::context::callcc(
+     std::allocator_arg,
+     _salloc,
+     [ssm = shared_from_this(),
+      guard = std::move(guard)](boost::context::continuation&& sink) {
          ssm->_coroYield = [ssm = ssm.get(), &sink]() {
              MONGO_LOG(3) << "call yield";
              ssm->_dbClient = Client::releaseCurrent();
              sink = sink.resume();
          };
          ssm->_processMessage(std::move(guard));
          return std::move(sink);
      });
```
src/mongo/transport/service_state_machine.h (2)
34-34: Remove unused/over-specific include.
continuation.hpp suffices; continuation_fcontext.hpp is an internal/impl header and unnecessary here.
```diff
-#include <boost/context/continuation_fcontext.hpp>
```
267-270: Avoid duplicating stack-size literals across modules.
This constant (3200 * 1024) also appears in kill_sessions_local.cpp. Consider centralizing (e.g., a shared constexpr in a common header) to keep the stack size consistent.
src/mongo/db/kill_sessions_local.cpp (3)
46-48: Prefer public continuation header.
Use <boost/context/continuation.hpp> instead of the fcontext-specific header.
```diff
-#include <boost/context/continuation_fcontext.hpp>
-#include <boost/context/protected_fixedsize_stack.hpp>
+#include <boost/context/continuation.hpp>
+#include <boost/context/protected_fixedsize_stack.hpp>
```
102-104: Allocator lifetime: capture by value to reduce lifetime coupling.
task currently captures salloc by reference. Capturing by value avoids accidental lifetime issues if the waiting behavior changes later.
```diff
- boost::context::protected_fixedsize_stack salloc(3200 * 1024);
+ boost::context::protected_fixedsize_stack salloc(3200 * 1024);
  boost::context::continuation source;
```
And update the capture below (see next comment’s diff).
162-166: Task attribution: use a more specific executor task name if available.
kSSMProcessMessage may skew metrics. If the executor exposes a task name for session cleanup/kill or maintenance, prefer that.
Can you check if transport::ServiceExecutorTaskName has a suitable constant (e.g., for session maintenance/cleanup) to replace kSSMProcessMessage?
📒 Files selected for processing (4)
src/mongo/db/kill_sessions_local.cpp (3 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
src/mongo/util/exit.cpp (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
src/mongo/db/kill_sessions_local.cpp (1)
src/mongo/db/client.cpp (4)
setCurrent (169-172), setCurrent (169-169), releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (2)
src/mongo/util/exit.cpp (1)
120-123: LGTM: logging split is fine.
No behavior change; the message formatting remains clear.
src/mongo/db/kill_sessions_local.cpp (1)
115-160: Critical: OperationContext crosses threads; create it on the executor thread.
opCtx originates on the caller’s thread and is used inside the scheduled task on another thread. Construct a fresh OperationContext after Client::setCurrent and use that instead. Also capture salloc by value.
```diff
- auto task =
-     [&salloc, &finished, &mux, &cv, &source, &yieldFunc, opCtx, session, &client] {
+ auto task =
+     [salloc, &finished, &mux, &cv, &source, &yieldFunc, session, &client] {
      Client::setCurrent(std::move(client));
      source = boost::context::callcc(
          std::allocator_arg,
          salloc,
-         [&yieldFunc, opCtx, session, &client](boost::context::continuation&& sink) {
+         [&yieldFunc, session, &client](boost::context::continuation&& sink) {
              yieldFunc = [&sink, &client]() {
                  log() << "abortArbitraryTransactionIfExpired call yield.";
                  client = Client::releaseCurrent();
                  sink = sink.resume();
              };
-             try {
-                 session->abortArbitraryTransactionIfExpired(opCtx);
+             try {
+                 auto localOpCtx = Client::getCurrent()->makeOperationContext();
+                 session->abortArbitraryTransactionIfExpired(localOpCtx.get());
              } catch (const DBException& ex) {
                  const Status& status = ex.toStatus();
                  std::string errmsg = str::stream()
                      << "May have failed to abort expired "
                         "transaction with session id (lsid) '"
                      << session->getSessionId() << "'."
                      << " Caused by: " << status;
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.h (1)
267-270: Optional: make stack size portable/configurable
- Consider using boost::context::stack_traits::default_size() (or max_size()) to derive a portable default, or gate kCoroStackSize behind a server parameter for tuning under heavy connection counts.
- If coroutine mode can be disabled, lazily construct the allocator only when enabled to reduce per-connection footprint.
No behavior change required now.
📒 Files selected for processing (4)
src/mongo/db/kill_sessions_local.cpp (4 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
src/mongo/util/exit.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/mongo/db/kill_sessions_local.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (3)
src/mongo/util/exit.cpp (1)
120-123: LGTM: log message reformat only
No functional change; the split insertions are fine.
src/mongo/transport/service_state_machine.h (1)
34-35: Include boost::context::protected_fixedsize_stack: OK
Header addition is appropriate for the allocator-backed coroutine stack.
src/mongo/transport/service_state_machine.cpp (1)
548-562: Allocator-augmented callcc: verify continuation lifetime
- _coroYield captures sink by reference; ensure it’s only ever invoked while the coroutine frame (and its sink) is active (i.e. never after the coroutine returns or before callcc sets up sink).
- _source.resume() is already gated on CoroStatus::OnGoing; keep tests covering double-resume and migration paths.
- Optional: capture sink by value in a small holder on the coroutine stack and expose a bound yield() method.
59af554 to 87b4ffc (compare)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/mongo/transport/service_state_machine.h (1)
34-35: Remove unused fcontext include
continuation.hpp is sufficient; continuation_fcontext.hpp isn’t needed here.
```diff
-#include <boost/context/continuation_fcontext.hpp>
 #include <boost/context/protected_fixedsize_stack.hpp>
```
src/mongo/db/kill_sessions_local.cpp (1)
161-169: Race/UAF guard: ensure task waits or lifetime is anchored
With the above changes, source/salloc lifetime is anchored; if you choose not to move them out, alternatively keep the task alive until finished to avoid UAF (less preferred).
Alternative minimal change inside task after callcc:
```diff
- // (task returns immediately here)
+ // Keep locals alive until completion if you don't hoist them.
+ {
+     std::unique_lock lk(mux);
+     cv.wait(lk, [&finished]() { return finished; });
+ }
```
Prefer the earlier hoist + single post-callcc resume.
📒 Files selected for processing (4)
src/mongo/db/kill_sessions_local.cpp (2 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
src/mongo/util/exit.cpp (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
src/mongo/db/kill_sessions_local.cpp (2)
src/mongo/transport/service_executor_coroutine.cpp (2)
resumeTask (49-54), resumeTask (49-49)
src/mongo/db/client.cpp (2)
setCurrent (169-172), setCurrent (169-169)
🔇 Additional comments (4)
src/mongo/util/exit.cpp (1)
120-123: LGTM: logging split is benign
Splitting the stream insertion has no behavioral impact; formatting remains intact.
src/mongo/transport/service_state_machine.cpp (1)
548-552: Allocator-arg callcc adoption looks correct
Using protected_fixedsize_stack via allocator_arg is the right move and matches the header change.
src/mongo/db/kill_sessions_local.cpp (1)
120-131: Bug: OperationContext crosses threads
opCtx originates on the caller’s thread but is used inside the scheduled task. Create a fresh OperationContext on the executor thread.
Apply within the callcc body:
```diff
- try {
-     session->abortArbitraryTransactionIfExpired(opCtx);
+ try {
+     auto localOpCtx = Client::getCurrent()->makeOperationContext();
+     session->abortArbitraryTransactionIfExpired(localOpCtx.get());
```
And remove opCtx from all lambda capture lists.
src/mongo/transport/service_state_machine.h (1)
267-270: Fix destruction order: destroy continuation before allocator
Currently _salloc is declared after _source, so it will be destroyed first, risking UB if the continuation references allocator-managed resources. Declare the allocator before the continuation so the continuation is destroyed first.
```diff
- static constexpr size_t kCoroStackSize = 3200 * 1024;
- boost::context::continuation _source;
- boost::context::protected_fixedsize_stack _salloc{kCoroStackSize};
+ static constexpr size_t kCoroStackSize = 3200 * 1024;
+ boost::context::protected_fixedsize_stack _salloc{kCoroStackSize};
+ boost::context::continuation _source;
```
87b4ffc to 32c4fd8 (compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.cpp (1)
548-561: Allocator-aware callcc is correct; capture guard by move, not reference
Avoid capturing ThreadGuard by reference across coroutine boundaries. Capture by move to eliminate lifetime pitfalls.
Apply:
```diff
- _source = boost::context::callcc(
-     std::allocator_arg,
-     _salloc,
-     [ssm = shared_from_this(),
-      &guard](boost::context::continuation&& sink) {
+ _source = boost::context::callcc(
+     std::allocator_arg,
+     _salloc,
+     [ssm = shared_from_this(),
+      guard = std::move(guard)](boost::context::continuation&& sink) mutable {
          ssm->_coroYield = [ssm = ssm.get(), &sink]() {
              MONGO_LOG(3) << "call yield";
              ssm->_dbClient = Client::releaseCurrent();
              sink = sink.resume();
          };
-         ssm->_processMessage(std::move(guard));
+         ssm->_processMessage(std::move(guard));
          return std::move(sink);
      });
```
📒 Files selected for processing (4)
src/mongo/db/kill_sessions_local.cpp (2 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
src/mongo/util/exit.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/mongo/util/exit.cpp
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/db/kill_sessions_local.cpp (2)
src/mongo/transport/service_executor_coroutine.cpp (2)
resumeTask (49-54), resumeTask (49-49)
src/mongo/db/client.cpp (4)
setCurrent (169-172), setCurrent (169-169), releaseCurrent (164-167), releaseCurrent (164-164)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (5)
src/mongo/transport/service_state_machine.h (2)
34-34: Switch to protected_fixedsize_stack: LGTM
Appropriate header for allocator-backed coroutine stacks.
267-269: Fix for destruction order: LGTM (continuation destroyed before allocator)
Declaring _salloc before _source ensures the continuation is destroyed first. This resolves the earlier concern.
src/mongo/db/kill_sessions_local.cpp (3)
47-48: Header include: LGTM
Using boost::context::protected_fixedsize_stack is appropriate here.
102-104: Good: allocator and continuation lifetime moved out of the scheduled task
This prevents UAF of continuation/stack across resume calls.
115-160: Critical: opCtx crosses threads; and missing resume after callcc leads to hang
- opCtx is created on the caller thread but used inside the scheduled task; create it on the executor thread after Client::setCurrent.
- After callcc returns (on initial yield), you must resume once to progress the coroutine, per PR objective; otherwise finished may never be set and the wait deadlocks.
Apply:
```diff
- auto task =
-     [&salloc, &finished, &mux, &cv, source, &yieldFunc, opCtx, session, &client] {
+ auto task =
+     [&salloc, &finished, &mux, &cv, source, &yieldFunc, session, &client] {
      Client::setCurrent(std::move(client));
      *source = boost::context::callcc(
          std::allocator_arg,
          salloc,
-         [&finished, &mux, &cv, &yieldFunc, opCtx, session, &client](
+         [&finished, &mux, &cv, &yieldFunc, session, &client](
              boost::context::continuation&& sink) {
              yieldFunc = [&sink, &client]() {
                  log() << "abortArbitraryTransactionIfExpired call yield.";
                  client = Client::releaseCurrent();
                  sink = sink.resume();
              };
-             std::unique_lock lk(mux);
              try {
-                 session->abortArbitraryTransactionIfExpired(opCtx);
+                 auto localOpCtx = Client::getCurrent()->makeOperationContext();
+                 session->abortArbitraryTransactionIfExpired(localOpCtx.get());
              } catch (const DBException& ex) {
                  const Status& status = ex.toStatus();
                  std::string errmsg = str::stream()
                      << "May have failed to abort expired "
                         "transaction with session id (lsid) '"
                      << session->getSessionId() << "'."
                      << " Caused by: " << status;
                  // LockTimeout errors are expected if we are unable to acquire
                  // an IS lock to clean up transaction cursors. The transaction
                  // abort (and lock resource release) should have succeeded
                  // despite failing to clean up cursors. The cursors will
                  // eventually be cleaned up by the cursor manager. We'll log
                  // such errors at a higher log level for diagnostic purposes in
                  // case something gets stuck.
                  if (ErrorCodes::isShutdownError(status.code()) ||
                      status == ErrorCodes::LockTimeout) {
                      LOG(1) << errmsg;
                  } else {
                      warning() << errmsg;
                  }
              }
+             std::unique_lock lk(mux);
              finished = true;
              client = Client::releaseCurrent();
              cv.notify_one();
              return std::move(sink);
          });
+     // Ensure coroutine progresses past the initial yield.
+     if (!finished) {
+         Client::setCurrent(std::move(client));
+         *source = source->resume();
+     }
  };
```
Optional follow-up (recommended): narrow the mutex scope to only the finished/notify section as shown above to avoid holding mux during the abort.
32c4fd8 to fffb8e8 (compare)
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/mongo/db/kill_sessions_local.cpp (1)
102-104: Stack allocator and continuation lifetime now correctly owned by the outer scope
This fixes prior UAF risks with local stack/continuation inside the scheduled task.
Consider introducing a small constant for the stack size to avoid magic numbers:
```diff
- boost::context::protected_fixedsize_stack salloc(3200 * 1024);
+ constexpr size_t kCoroStackSize = 3200 * 1024;
+ boost::context::protected_fixedsize_stack salloc(kCoroStackSize);
```
📒 Files selected for processing (4)
src/mongo/db/kill_sessions_local.cpp (2 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
src/mongo/util/exit.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/mongo/transport/service_state_machine.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
src/mongo/db/kill_sessions_local.cpp (1)
src/mongo/db/client.cpp (4)
setCurrent (169-172), setCurrent (169-169), releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (9)
src/mongo/util/exit.cpp (1)
120-123: LGTM: purely cosmetic stream split
No behavioral change; safe and clearer.
src/mongo/transport/service_state_machine.h (2)
34-34: LGTM: adopt protected_fixedsize_stack include
Matches allocator-backed coroutine approach across the PR.
267-269: Good: safe destruction order (_source before allocator)
Declaring _salloc before _source ensures the continuation is destroyed first. This addresses the prior concern.
src/mongo/db/kill_sessions_local.cpp (6)
47-48: LGTM: switch to protected_fixedsize_stack include
Consistent with the allocator-backed stack usage.
123-128: LGTM: yield functor swaps Client and yields
Matches the coroutine protocol.
129-139: Guarded completion path looks good (aside from opCtx bug above)
Lock/notify on finished is correct. With the opCtx fix, this block is fine.
155-159: LGTM: completion + wakeup
Sets finished, releases Client, and notifies waiter; correct finalization.
115-121: Bug: OperationContext crosses threads; create it on the executor thread
opCtx originates on the caller’s thread. Using it inside the scheduled task is unsafe. Create a fresh OperationContext after Client::setCurrent on the destination thread and remove opCtx from captures.
Apply:
```diff
- auto task =
-     [&salloc, &finished, &mux, &cv, source, &yieldFunc, opCtx, session, &client] {
+ auto task =
+     [&salloc, &finished, &mux, &cv, source, &yieldFunc, session, &client] {
      Client::setCurrent(std::move(client));
-     *source = boost::context::callcc(
+     *source = boost::context::callcc(
          std::allocator_arg,
          salloc,
-         [&finished, &mux, &cv, source, &yieldFunc, opCtx, session, &client](
+         [&finished, &mux, &cv, source, &yieldFunc, session, &client](
              boost::context::continuation&& sink) {
```
And below in the body:
```diff
- try {
-     session->abortArbitraryTransactionIfExpired(opCtx);
+ try {
+     auto localOpCtx = Client::getCurrent()->makeOperationContext();
+     session->abortArbitraryTransactionIfExpired(localOpCtx.get());
```
105-109: Avoid raw-pointer capture in resumeTask (UAF risk). Change to capture the shared_ptr itself:
```diff
- std::function<void()> resumeTask = [source = source.get(), &client] {
+ std::function<void()> resumeTask = [source, &client] {
      log() << "abortArbitraryTransactionIfExpired call resume.";
      Client::setCurrent(std::move(client));
-     *source = source->resume();
+     (*source) = source->resume();
  };
```
No other resume functors capture raw pointers to the continuation.
fffb8e8 to 1bae001 (compare)
1bae001 to 4c34046 (compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mongo/db/kill_sessions_local.cpp (1)
122-166: Critical: OperationContext crosses threads; create it on the executor thread
opCtx originates on the caller’s thread and is used inside the scheduled task. Create a new OperationContext after Client::setCurrent on the executor thread and pass that to abortArbitraryTransactionIfExpired. Also remove opCtx from lambda captures.
```diff
- auto task = [&finished, &mux, &cv, coroCtx, opCtx, session, &client] {
+ auto task = [&finished, &mux, &cv, coroCtx, session, &client] {
      Client::setCurrent(std::move(client));
      coroCtx->source = boost::context::callcc(
          std::allocator_arg,
          coroCtx->salloc,
-         [&finished, &mux, &cv, coroCtx, opCtx, session, &client](
+         [&finished, &mux, &cv, coroCtx, session, &client](
              boost::context::continuation&& sink) {
              coroCtx->yieldFunc = [&sink, &client]() {
                  log() << "abortArbitraryTransactionIfExpired call yield.";
                  client = Client::releaseCurrent();
                  sink = sink.resume();
              };
              std::unique_lock lk(mux);
              try {
-                 session->abortArbitraryTransactionIfExpired(opCtx);
+                 auto localOpCtx = Client::getCurrent()->makeOperationContext();
+                 session->abortArbitraryTransactionIfExpired(localOpCtx.get());
              } catch (const DBException& ex) {
                  const Status& status = ex.toStatus();
                  std::string errmsg = str::stream()
                      << "May have failed to abort expired "
                         "transaction with session id (lsid) '"
                      << session->getSessionId() << "'."
                      << " Caused by: " << status;
```
🧹 Nitpick comments (1)
src/mongo/db/kill_sessions_local.cpp (1)
76-83: Avoid magic number for stack size in CoroCtx
Define a constant within CoroCtx (or centralize it) instead of repeating 3200 * 1024.
```diff
 struct CoroCtx {
-    boost::context::protected_fixedsize_stack salloc{3200 * 1024};
+    static constexpr size_t kCoroStackSize = 3200 * 1024;
+    boost::context::protected_fixedsize_stack salloc{kCoroStackSize};
     boost::context::continuation source;
     std::function<void()> yieldFunc;
     std::function<void()> resumeFunc;
     std::function<void()> longResumeFunc;
 };
```
📒 Files selected for processing (3)
src/mongo/db/kill_sessions_local.cpp (3 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/mongo/transport/service_state_machine.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
src/mongo/db/kill_sessions_local.cpp (3)
src/mongo/db/service_context.cpp (4)
client (231-239), client (231-231), getGlobalServiceContext (68-71), getGlobalServiceContext (68-68)
src/mongo/db/client.cpp (4)
releaseCurrent (164-167), releaseCurrent (164-164), setCurrent (169-172), setCurrent (169-169)
src/mongo/transport/service_executor_coroutine.cpp (2)
resumeTask (49-54), resumeTask (49-49)
🔇 Additional comments (6)
src/mongo/transport/service_state_machine.h (2)
34-34: LGTM: correct Boost stack header
Including protected_fixedsize_stack aligns with the new allocator-based coro usage.
267-269: Member order is correct (continuation destroyed before allocator)
_salloc is declared before _source, so _source is destroyed first. This fixes the prior destruction-order hazard.
src/mongo/db/kill_sessions_local.cpp (4)
47-48: LGTM: add protected_fixedsize_stack include
Matches allocator-based coroutine changes.
97-97: LGTM: shared ownership for coroutine context
Using shared_ptr for CoroCtx addresses dangling-lifetime risks between scheduler/resumer.
103-106: LGTM: wiring CoroutineFunctors to CoroCtx
Hooking functors to CoroCtx members is correct.
117-121: LGTM: resume functors bound via executor
Assuming the above lifetime fix, this is consistent with the coroutine model.
Please confirm coroutineResumeFunctor/coroutineLongResumeFunctor don’t outlive the enclosing scanSessions callback. If they can, the lifetime fix above is required to avoid UAF.
4c34046 to bff5f1d (compare)
Actionable comments posted: 1
📒 Files selected for processing (3)
src/mongo/db/kill_sessions_local.cpp (3 hunks)
src/mongo/transport/service_state_machine.cpp (1 hunks)
src/mongo/transport/service_state_machine.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/db/kill_sessions_local.cpp (3)
src/mongo/db/service_context.cpp (4)
client (231-239), client (231-231), getGlobalServiceContext (68-71), getGlobalServiceContext (68-68)
src/mongo/db/client.cpp (4)
releaseCurrent (164-167), releaseCurrent (164-164), setCurrent (169-172), setCurrent (169-169)
src/mongo/transport/service_executor_coroutine.cpp (2)
resumeTask (49-54), resumeTask (49-49)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (4)
src/mongo/db/kill_sessions_local.cpp (2)
123-139: Don't reuse the caller's OperationContext on the executor thread. The scheduled task runs on a different thread/group, yet it still captures and uses opCtx from the caller. OperationContext instances are thread-affine; reusing one across threads violates invariants and can trip debug assertions. Create a fresh OperationContext after calling Client::setCurrent on the destination thread and use that in abortArbitraryTransactionIfExpired.

-    auto task = [&finished, &mux, &cv, coroCtx, opCtx, session, &client] {
+    auto task = [&finished, &mux, &cv, coroCtx, session, &client] {
         Client::setCurrent(std::move(client));
         coroCtx->source = boost::context::callcc(
             std::allocator_arg,
             coroCtx->salloc,
-            [&finished, &mux, &cv, coroCtx, opCtx, session, &client](
+            [&finished, &mux, &cv, coroCtx, session, &client](
                 boost::context::continuation&& sink) {
                 coroCtx->yieldFunc = [&sink, &client]() {
                     log() << "abortArbitraryTransactionIfExpired call yield.";
                     client = Client::releaseCurrent();
                     sink = sink.resume();
                 };
                 std::unique_lock lk(mux);
                 try {
-                    session->abortArbitraryTransactionIfExpired(opCtx);
+                    auto localOpCtx = Client::getCurrent()->makeOperationContext();
+                    session->abortArbitraryTransactionIfExpired(localOpCtx.get());
                 } catch (const DBException& ex) {
112-116: Keep the coroutine context alive during async resumes
resumeTask captures coroCtx->source by reference without retaining coroCtx. Once the scanSessions callback unwinds, the shared_ptr can drop to zero before resumeTask runs, leaving the reference dangling and making the next resume a UAF. Capture the shared_ptr itself so it stays alive for the resume path.

-    std::function<void()> resumeTask = [&source = coroCtx->source, &client] {
+    std::function<void()> resumeTask = [coroCtx, &client] {
         log() << "abortArbitraryTransactionIfExpired call resume.";
         Client::setCurrent(std::move(client));
-        source = source.resume();
+        coroCtx->source = coroCtx->source.resume();
     };

src/mongo/transport/service_state_machine.h (2)
35-35: Include for the guarded stack allocator looks correct. Adding <boost/context/protected_fixedsize_stack.hpp> is required for the new allocator.
267-269: Allocator declared ahead of the continuation keeps destruction order safe. Declaring _salloc before _source ensures the continuation is torn down before the allocator, which matches Boost's expectations.
Force-pushed from bff5f1d to 610dc41
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.h (1)
267-269: De-duplicate stack size constant across modules (optional). Coro stack size (3200*1024) is duplicated here and in CoroCtx. Consider a shared constexpr (e.g., a common header) to prevent drift.
📜 Review details
📒 Files selected for processing (3)
- src/mongo/db/kill_sessions_local.cpp (3 hunks)
- src/mongo/transport/service_state_machine.cpp (1 hunks)
- src/mongo/transport/service_state_machine.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
src/mongo/db/kill_sessions_local.cpp (2)
src/mongo/db/service_context.cpp (4)
client (231-239), client (231-231), getGlobalServiceContext (68-71), getGlobalServiceContext (68-68)
src/mongo/db/client.cpp (4)
releaseCurrent (164-167), releaseCurrent (164-164), setCurrent (169-172), setCurrent (169-169)
🔇 Additional comments (8)
src/mongo/db/kill_sessions_local.cpp (5)
47-48: LGTM: switch to protected_fixedsize_stack. Using boost::context::protected_fixedsize_stack is a solid replacement for the custom stack.
76-83: CoroCtx layout is sound. Allocator precedes continuation, ensuring the continuation is destroyed before the allocator.
123-167: Critical: OperationContext used across threads; create it on the executor thread. opCtx originates on the caller's thread but is used inside the scheduled task on a different thread. Construct a fresh OperationContext after Client::setCurrent() on the executor thread and remove opCtx from captures.
Apply:
-    auto task = [&finished, &mux, &cv, coroCtx, opCtx, session, &client] {
+    auto task = [&finished, &mux, &cv, coroCtx, session, &client] {
         Client::setCurrent(std::move(client));
         coroCtx->source = boost::context::callcc(
             std::allocator_arg,
             coroCtx->salloc,
-            [&finished, &mux, &cv, coroCtx, opCtx, session, &client](
+            [&finished, &mux, &cv, coroCtx, session, &client](
                 boost::context::continuation&& sink) {
                 coroCtx->yieldFunc = [&sink, &client]() {
                     log() << "abortArbitraryTransactionIfExpired call yield.";
                     client = Client::releaseCurrent();
                     sink = sink.resume();
                 };
                 std::unique_lock lk(mux);
                 try {
-                    session->abortArbitraryTransactionIfExpired(opCtx);
+                    auto localOpCtx = Client::getCurrent()->makeOperationContext();
+                    session->abortArbitraryTransactionIfExpired(localOpCtx.get());
                 } catch (const DBException& ex) {
                     const Status& status = ex.toStatus();
                     std::string errmsg = str::stream()
                         << "May have failed to abort expired "
                            "transaction with session id (lsid) '"
                         << session->getSessionId() << "'."
                         << " Caused by: " << status;
103-108: LGTM: wiring functors via CoroCtx. Passing addresses of the CoroCtx std::functions into CoroutineFunctors is consistent with how the client expects them.
112-121: Avoid dangling reference: capture CoroCtx in resumeTask. resumeTask captures coroCtx->source by reference. If resume runs after the scheduling scope ends, it risks UAF. Capture coroCtx (shared_ptr) by value and update via the pointer.
Apply:
-    std::function<void()> resumeTask = [&source = coroCtx->source, &client] {
+    std::function<void()> resumeTask = [coroCtx, &client] {
         log() << "abortArbitraryTransactionIfExpired call resume.";
         Client::setCurrent(std::move(client));
-        source = source.resume();
+        coroCtx->source = coroCtx->source.resume();
     };

src/mongo/transport/service_state_machine.h (2)
34-34: LGTM: include protected_fixedsize_stack. Header dependency updated appropriately.
267-269: LGTM: allocator before continuation. Member order ensures the continuation is destroyed before the allocator. Matches best practice and prior guidance.
src/mongo/transport/service_state_machine.cpp (1)
548-566: Break potential UAF in _coroYield by using weak_ptr. Great fix switching to weak_ptr for the callcc capture and using _salloc. However, _coroYield captures a raw SSM pointer (ssm.get()), which can dangle if invoked during teardown. Capture the weak_ptr and lock on use.
Apply:
-            [wssm, &guard](boost::context::continuation&& sink) {
+            [wssm, &guard](boost::context::continuation&& sink) {
                 auto ssm = wssm.lock();
                 if (!ssm) {
                     return std::move(sink);
                 }
-                ssm->_coroYield = [ssm = ssm.get(), &sink]() {
-                    MONGO_LOG(3) << "call yield";
-                    ssm->_dbClient = Client::releaseCurrent();
-                    sink = sink.resume();
-                };
+                ssm->_coroYield = [wssm, &sink]() mutable {
+                    if (auto locked = wssm.lock()) {
+                        MONGO_LOG(3) << "call yield";
+                        locked->_dbClient = Client::releaseCurrent();
+                    }
+                    sink = sink.resume();
+                };
                 ssm->_processMessage(std::move(guard));
                 return std::move(sink);
             });
Force-pushed from 610dc41 to 3d2a701
Actionable comments posted: 1
🧹 Nitpick comments (5)
src/mongo/db/kill_sessions_local.cpp (2)
46-48: Prefer portable Boost.Context include. Use <boost/context/continuation.hpp> instead of the fcontext-specific header for portability and a clearer API surface. Keep the protected_fixedsize_stack include.
-#include <boost/context/continuation_fcontext.hpp>
+#include <boost/context/continuation.hpp>
 #include <boost/context/protected_fixedsize_stack.hpp>
76-83: Optional: move UniqueClient into CoroCtx to avoid a cross-thread reference to a local. Storing ServiceContext::UniqueClient in CoroCtx (and using it in yield/resume) avoids capturing a stack-local reference (&client) across async boundaries.
Example shape:
- Add ServiceContext::UniqueClient client; to CoroCtx.
- Move-assign client into coroCtx->client after releaseCurrent().
- Use coroCtx->client within yield/resume lambdas.
src/mongo/transport/service_state_machine.h (2)
34-34: Use stack_traits to validate the configured stack size. Good switch to protected_fixedsize_stack. Add stack_traits and static_asserts to catch misconfigurations at compile time.
 #include <boost/context/continuation.hpp>
 #include <boost/context/continuation_fcontext.hpp>
+#include <boost/context/stack_traits.hpp>
 #include <boost/context/protected_fixedsize_stack.hpp>

And near the kCoroStackSize definition:
 static constexpr size_t kCoroStackSize = 3200 * 1024;
+static_assert(kCoroStackSize >= boost::context::stack_traits::minimum_size(),
+              "kCoroStackSize below boost minimum");
+static_assert(kCoroStackSize <= boost::context::stack_traits::maximum_size(),
+              "kCoroStackSize above boost maximum");
32-35: Consider removing the unused continuation_fcontext include. If nothing in this header needs continuation_fcontext.hpp, drop it to reduce compile surface area.

-#include <boost/context/continuation_fcontext.hpp>

src/mongo/transport/service_state_machine.cpp (1)
733-742: Proactively release coroutine resources during cleanup. To free the stack pages and drop any pending closures immediately on EndSession, clear the continuation and coroutine functors here.
 void ServiceStateMachine::_cleanupSession(ThreadGuard guard) {
     MONGO_LOG(1) << "ServiceStateMachine::_cleanupSession";
     _state.store(State::Ended);
     _inMessage.reset();
     // By ignoring the return value of Client::releaseCurrent() we destroy the session.
     // _dbClient is now nullptr and _dbClientPtr is invalid and should never be accessed.
     Client::releaseCurrent();
+    // Break any remaining continuation/functor references and release the coroutine stack
+    _source = {};
+    _coroYield = {};
+    _coroResume = {};
+    _coroLongResume = {};
+    _coroMigrateThreadGroup = {};
+    _resumeTask = {};
 }
📜 Review details
📒 Files selected for processing (3)
- src/mongo/db/kill_sessions_local.cpp (3 hunks)
- src/mongo/transport/service_state_machine.cpp (1 hunks)
- src/mongo/transport/service_state_machine.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/mongo/db/kill_sessions_local.cpp (2)
src/mongo/db/service_context.cpp (4)
client (231-239), client (231-231), getGlobalServiceContext (68-71), getGlobalServiceContext (68-68)
src/mongo/db/client.cpp (4)
releaseCurrent (164-167), releaseCurrent (164-164), setCurrent (169-172), setCurrent (169-169)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (2)
releaseCurrent (164-167), releaseCurrent (164-164)
🔇 Additional comments (7)
src/mongo/db/kill_sessions_local.cpp (5)
76-83: Good: centralize coroutine state and stack via CoroCtx. Shared state + protected stack is a solid improvement.
98-98: Good: shared_ptr for CoroCtx lifecycle. This prevents dangling functor pointers.
103-108: Good: bind functors to stable addresses in CoroCtx. Avoids pointers to temporaries.
123-139: Critical: OperationContext crosses threads; create it on the executor thread. opCtx originates on the caller thread; using it inside the scheduled task is unsafe. Construct a fresh OperationContext after Client::setCurrent on the executor thread and remove opCtx from captures.
-    auto task = [&finished, &mux, &cv, coroCtx, opCtx, session, &client] {
+    auto task = [&finished, &mux, &cv, coroCtx, session, &client] {
         Client::setCurrent(std::move(client));
         coroCtx->source = boost::context::callcc(
             std::allocator_arg,
             coroCtx->salloc,
-            [&finished, &mux, &cv, coroCtx, opCtx, session, &client](
+            [&finished, &mux, &cv, coroCtx, session, &client](
                 boost::context::continuation&& sink) {
                 coroCtx->yieldFunc = [&sink, &client]() {
                     log() << "abortArbitraryTransactionIfExpired call yield.";
                     client = Client::releaseCurrent();
                     sink = sink.resume();
                 };
                 std::unique_lock lk(mux);
                 try {
-                    session->abortArbitraryTransactionIfExpired(opCtx);
+                    auto localOpCtx = Client::getCurrent()->makeOperationContext();
+                    session->abortArbitraryTransactionIfExpired(localOpCtx.get());
                 } catch (const DBException& ex) {

Also applies to: 128-139
112-121: Avoid UAF and retain cycles: capture a weak_ptr to CoroCtx in resumeTask. Capturing a reference to coroCtx->source risks UAF if resume is invoked after the scope ends; capturing a shared_ptr can create a retain cycle via coroCtx->resumeFunc. Use weak_ptr.
-    std::function<void()> resumeTask = [&source = coroCtx->source, &client] {
-        log() << "abortArbitraryTransactionIfExpired call resume.";
-        Client::setCurrent(std::move(client));
-        source = source.resume();
-    };
+    auto weakCoroCtx = std::weak_ptr<CoroCtx>(coroCtx);
+    std::function<void()> resumeTask = [weakCoroCtx, &client] {
+        if (auto ctx = weakCoroCtx.lock()) {
+            log() << "abortArbitraryTransactionIfExpired call resume.";
+            Client::setCurrent(std::move(client));
+            ctx->source = ctx->source.resume();
+        }
+    };

src/mongo/transport/service_state_machine.h (1)
267-269: Allocator/continuation destruction order is now correct_salloc declared before _source ensures continuation is destroyed before the allocator. This fixes the previous lifetime risk.
src/mongo/transport/service_state_machine.cpp (1)
548-553: Confirm toolchain supports C++17
weak_from_this() requires C++17 or newer. Verify CMAKE_CXX_STANDARD (or equivalent) is set to ≥17 in your build configuration.
…loqdata#212)
1. killAllExpiredTransactions: use a shared_ptr to manage coro context.
2. Use boost::context::protected_fixedsize_stack to replace the self-implemented stack.