Prevent deadlock on DiagnosticsCollector#3348
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR fixes potential deadlocks in Sequence Diagram(s)sequenceDiagram
participant Caller
participant DiagnosticsCollector
participant BackgroundReader
participant Channel
participant Outputs
Caller->>DiagnosticsCollector: StopAsync()
DiagnosticsCollector->>DiagnosticsCollector: check _readerStarted
alt _readerStarted == false
DiagnosticsCollector->>DiagnosticsCollector: Drain() synchronously
DiagnosticsCollector-->>Caller: return
else _readerStarted == true
DiagnosticsCollector->>BackgroundReader: await _started
BackgroundReader->>Channel: complete reader / finish loop
BackgroundReader->>DiagnosticsCollector: drain queued items
DiagnosticsCollector->>Outputs: write drained diagnostics
DiagnosticsCollector-->>Caller: return
end
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs`:
- Line 33: IsStarted currently returns true just because _started is non-null,
which becomes true when Task.Run returns a canceled Task (StartAsync with a
canceled token) even though the reader never executed; change IsStarted to check
that the background task is actually running by returning _started is not null
&& !_started.IsCompleted (i.e. use _started?.IsCompleted to guard), and ensure
StopAsync/Drain() logic already uses IsStarted so it will call Drain() only when
the background task truly started; update the IsStarted property (and any
dependent callers if necessary) to use this completed-state check rather than
null-only.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b039c96b-e356-4b4d-94c5-e354310c80df
📒 Files selected for processing (4)
src/Elastic.Documentation/Diagnostics/DiagnosticsChannel.cssrc/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cssrc/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cstests/Elastic.Changelog.Tests/Changelogs/DiagnosticsCollectorDisposeTests.cs
| ConcurrentDictionary<string, bool> InUseSubstitutionKeys { get; } | ||
|
|
||
| /// True once StartAsync has been called and a background reader is draining the channel. | ||
| bool IsStarted => true; |
There was a problem hiding this comment.
Can you explain why IsStarted is initialized with true?
I would assume a boolean like this would be initialized with false and only becomes true when something starts and changes it.
There was a problem hiding this comment.
++ its fine for the interface to not have a default implementation.
Mpdreamz
left a comment
There was a problem hiding this comment.
Would it be simpler to gate the write?
public virtual void Write(Diagnostic diagnostic)
{
IncrementSeverityCount(diagnostic);
if (_started)
Channel.Write(diagnostic);
}
There was a problem hiding this comment.
Just some raw thoughts, wdyt @cotti
ultimately, not calling .StartAsync() is not necessary an exception, a collector could be instantiated and disposed and that should be fine too. Can we extend the tests to assert that as well?
| // instead of deadlocking on Channel.Reader.Completion. | ||
| if (_started is null) | ||
| { | ||
| Drain(); |
There was a problem hiding this comment.
I think this potentially problematic in a different way, since nothing is calling waittoread.
Maybe we need a new flag after:
_ = await Channel.WaitToWrite(cancellationToken);
or after while (Channel.Reader.TryRead(out var item)) ?
and base the decision to await Channel.Reader.Completion? on that?
We also need to wait for .NET 11.0 see e.g: dotnet/runtime#123555 to be completely safe.
- Gate Channel.Write on a _readerStarted flag set inside the reader delegate, so writes are dropped when no reader will drain them (per Mpdreamz). - _readerStarted, not _started != null, drives IsStarted; Task.Run with an already-canceled token leaves _started non-null without ever running the delegate (per CodeRabbit). - StopAsync no longer needs a sync-drain branch — gating keeps the channel empty when no reader started. Keep a defensive Drain after awaiting the reader Task to mop up if it exited early via cancellation. - Drop the default IsStarted => true on IDiagnosticsCollector and require implementers to declare it (per reakaleek + Mpdreamz). - Add a regression test for instantiate-and-dispose with no emissions (per Mpdreamz). Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs (1)
101-104:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
_readerStartedstartup race can drop diagnostics and skip shutdown waiting.Between
StartAsyncreturning and the delegate setting_readerStartedon Line 53,StopAsynccan early-return andWritecan drop items even though startup is already in-flight.Suggested minimal fix
public virtual async Task StopAsync(Cancel cancellationToken) { Channel.TryComplete(); - // No reader was ever scheduled, so Write() gated everything out and the - // channel is empty — nothing to await, nothing to drain. - if (!_readerStarted) + if (_started is null) return; try { await _started!; } catch (OperationCanceledException) { // Reader was canceled before its final Drain(); mop up below. } // Defensive: if the reader exited early via cancellation, items may // still be queued. Drain them synchronously so they're not lost. Drain(); } public virtual void Write(Diagnostic diagnostic) { IncrementSeverityCount(diagnostic); - if (_readerStarted) + if (_readerStarted || _started is { IsCompleted: false }) Channel.Write(diagnostic); }Also applies to: 127-128
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs` around lines 101 - 104, Startup race: _readerStarted may be observed false by StopAsync and Write while StartAsync is still in-flight, causing dropped diagnostics and skipped shutdown waits. Fix by ensuring _readerStarted is set before StartAsync returns and before scheduling the reader delegate (or use an atomic/interlocked/volatile write or a TaskCompletionSource that StartAsync awaits and StopAsync/Write await/check) — update the startup path in StartAsync (the code that schedules the reader delegate and currently sets _readerStarted) to set the flag (or complete the TCS) deterministically before returning, and change the StopAsync and Write checks (the same checks referenced at 127-128) to observe that same atomic/awaitable signal so they never early-return while startup is in-flight.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs`:
- Around line 101-104: Startup race: _readerStarted may be observed false by
StopAsync and Write while StartAsync is still in-flight, causing dropped
diagnostics and skipped shutdown waits. Fix by ensuring _readerStarted is set
before StartAsync returns and before scheduling the reader delegate (or use an
atomic/interlocked/volatile write or a TaskCompletionSource that StartAsync
awaits and StopAsync/Write await/check) — update the startup path in StartAsync
(the code that schedules the reader delegate and currently sets _readerStarted)
to set the flag (or complete the TCS) deterministically before returning, and
change the StopAsync and Write checks (the same checks referenced at 127-128) to
observe that same atomic/awaitable signal so they never early-return while
startup is in-flight.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5a79be47-e79d-4ae2-a326-2807def13cf7
📒 Files selected for processing (3)
src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cssrc/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cstests/Elastic.Changelog.Tests/Changelogs/DiagnosticsCollectorDisposeTests.cs
Per review: only flip the flag once the reader has cleared its initial setup, so it strictly means "the delegate executed past the first await". A canceled token or a Task.Run short-circuit leaves the flag false. Co-authored-by: Cursor <cursoragent@cursor.com>
…el_stale Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # src/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cs
The Write gate (`if (_readerStarted) Channel.Write(...)`) had a scheduling race: between StartAsync returning and the Task.Run delegate hitting `await Channel.WaitToWrite`, any emitted diagnostic was silently dropped. This broke the F# authoring suite, which synchronously schedules the reader and emits diagnostics before the reader had a chance to flip the flag. - Write unconditionally — UnboundedChannel never blocks, and items in an undrained channel are GC'd with the collector. - StopAsync now gates on `_started is null` (was StartAsync ever called?) rather than `_readerStarted` (is the reader executing?). Once StartAsync was called with a non-canceled token, awaiting `_started` is always meaningful: it completes when the reader exits after draining the channel. - IsStarted still backed by `_readerStarted` for accurate introspection (execution proof, per earlier review feedback). - Drop the leftover //TODO in DiagnosticsChannel.Write. Co-authored-by: Cursor <cursoragent@cursor.com>
This pull request improves the reliability and safety of the
DiagnosticsCollectorby ensuring diagnostics are properly drained even ifStartAsyncwas never called, preventing deadlocks during disposal or stopping. It also introduces new tests to verify this behavior and adds a guard toWaitForDrainto avoid misuse. Additionally, it makes minor code quality improvements and adds a new property for tracking start state.Diagnostics channel and collector robustness:
DiagnosticsCollector.StopAsyncto synchronously drain diagnostics ifStartAsyncwas never called, ensuring diagnostics are not lost and preventing deadlocks on disposal or stop.IsStartedproperty to bothDiagnosticsCollectorandIDiagnosticsCollectorto track whether a background reader is active. [1] [2]WaitForDrainto throw an exception if called before the collector is started, preventing silent failures or deadlocks.Testing and regression coverage:
DiagnosticsCollectorDisposeTeststo verify that disposing or stopping a collector without starting it does not hang and that diagnostics are still processed. Also tests thatWaitForDrainthrows if misused.Code quality and documentation:
DiagnosticsChannelandDiagnosticsCollectorto clarify channel behavior and method responsibilities.using System.Diagnostics;and correcting method visibility. [1] [2] [3]