Prevent deadlock on DiagnosticsCollector by cotti · Pull Request #3348 · elastic/docs-builder

cotti · 2026-05-19T05:31:38Z

This pull request improves the reliability and safety of the DiagnosticsCollector by ensuring diagnostics are properly drained even if StartAsync was never called, preventing deadlocks during disposal or stopping. It also introduces new tests to verify this behavior and adds a guard to WaitForDrain to avoid misuse. Additionally, it makes minor code quality improvements and adds a new property for tracking start state.

Diagnostics channel and collector robustness:

Modified DiagnosticsCollector.StopAsync to synchronously drain diagnostics if StartAsync was never called, ensuring diagnostics are not lost and preventing deadlocks on disposal or stop.
Added the IsStarted property to both DiagnosticsCollector and IDiagnosticsCollector to track whether a background reader is active.
Updated WaitForDrain to throw an exception if called before the collector is started, preventing silent failures or deadlocks.
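
The behavior described in the list above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual DiagnosticsCollector code: SketchCollector, Handled, the string payload, and the choice to return Reader.Completion from WaitForDrain are all hypothetical stand-ins built directly on System.Threading.Channels.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for DiagnosticsCollector; names and shape are assumptions.
public sealed class SketchCollector
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private Task? _started; // stays null until StartAsync is called

    // Everything the collector has processed, for inspection in this sketch.
    public List<string> Handled { get; } = new();

    public bool IsStarted => _started is not null;

    public Task StartAsync(CancellationToken ct)
    {
        // Schedule the background reader once.
        _started ??= Task.Run(async () =>
        {
            await foreach (var item in _channel.Reader.ReadAllAsync(ct))
                Handled.Add(item);
        }, ct);
        return Task.CompletedTask;
    }

    public void Write(string diagnostic) => _channel.Writer.TryWrite(diagnostic);

    public async Task StopAsync()
    {
        _channel.Writer.TryComplete();
        if (_started is null)
        {
            // No background reader exists, so nothing else will empty the
            // channel. Drain it here instead of awaiting a reader that never ran.
            Drain();
            return;
        }
        await _started;
    }

    public Task WaitForDrain()
    {
        // Guard against misuse: without a reader this wait could never finish.
        if (_started is null)
            throw new InvalidOperationException("WaitForDrain called before StartAsync.");
        return _channel.Reader.Completion;
    }

    private void Drain()
    {
        while (_channel.Reader.TryRead(out var item))
            Handled.Add(item);
    }
}
```

In this sketch, writing two diagnostics and calling StopAsync without StartAsync returns immediately with both items in Handled, while WaitForDrain throws InvalidOperationException.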

Testing and regression coverage:

Added DiagnosticsCollectorDisposeTests to verify that disposing or stopping a collector without starting it does not hang and that diagnostics are still processed. Also tests that WaitForDrain throws if misused.

Code quality and documentation:

Improved comments and documentation in DiagnosticsChannel and DiagnosticsCollector to clarify channel behavior and method responsibilities.
Minor code cleanup, such as removing the unused using System.Diagnostics; directive and correcting method visibility.

coderabbitai · 2026-05-19T05:38:11Z

📥 Commits

Reviewing files that changed from the base of the PR and between 25979cd and 22f8976.

📒 Files selected for processing (4)

src/Elastic.Documentation/Diagnostics/DiagnosticsChannel.cs
src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs
src/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cs
tests/Elastic.Changelog.Tests/Changelogs/DiagnosticsCollectorDisposeTests.cs

Walkthrough

This PR fixes potential deadlocks in DiagnosticsCollector when StopAsync or DisposeAsync are called without StartAsync having been invoked. It introduces an IsStarted property to track initialization and modifies StopAsync to synchronously drain queued diagnostics before awaiting the background task when the reader was never started. The interface adds a guard in WaitForDrain to prevent misuse by throwing InvalidOperationException if called before initialization. Channel semantics are documented, and regression tests verify the fix prevents hangs in edge cases.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant DiagnosticsCollector
  participant BackgroundReader
  participant Channel
  participant Outputs

  Caller->>DiagnosticsCollector: StopAsync()
  DiagnosticsCollector->>DiagnosticsCollector: check _readerStarted
  alt _readerStarted == false
    DiagnosticsCollector->>DiagnosticsCollector: Drain() synchronously
    DiagnosticsCollector-->>Caller: return
  else _readerStarted == true
    DiagnosticsCollector->>BackgroundReader: await _started
    BackgroundReader->>Channel: complete reader / finish loop
    BackgroundReader->>DiagnosticsCollector: drain queued items
    DiagnosticsCollector->>Outputs: write drained diagnostics
    DiagnosticsCollector-->>Caller: return
  end

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Prevent deadlock on DiagnosticsCollector' directly and clearly summarizes the main objective of the changeset—addressing deadlock issues in the DiagnosticsCollector.
Description check	✅ Passed	The description comprehensively explains the changes: preventing deadlocks by draining diagnostics even when StartAsync is not called, adding IsStarted tracking, updating WaitForDrain, and adding regression tests.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs`:
- Line 33: IsStarted currently returns true just because _started is non-null,
which becomes true when Task.Run returns a canceled Task (StartAsync with a
canceled token) even though the reader never executed; change IsStarted to check
that the background task is actually running by returning _started is not null
&& !_started.IsCompleted (i.e. use _started?.IsCompleted to guard), and ensure
StopAsync/Drain() logic already uses IsStarted so it will call Drain() only when
the background task truly started; update the IsStarted property (and any
dependent callers if necessary) to use this completed-state check rather than
null-only.
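
The premise of this comment can be checked in isolation. The sketch below is an illustration only (CanceledStartDemo and its tuple fields are hypothetical names): it calls Task.Run with an already-canceled token and reports whether the returned task is non-null, whether it is canceled, whether it is completed, and whether the delegate ever ran.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; not part of the PR.
public static class CanceledStartDemo
{
    public static (bool NonNull, bool Canceled, bool Completed, bool DelegateRan) Run()
    {
        using var cts = new CancellationTokenSource();
        cts.Cancel(); // the token is canceled before the reader is scheduled

        var delegateRan = false;
        Task? started = Task.Run(async () =>
        {
            delegateRan = true;
            await Task.Yield();
        }, cts.Token);

        // A null check alone would treat this collector as "started",
        // even though the reader delegate was never invoked.
        return (started is not null, started.IsCanceled, started.IsCompleted, delegateRan);
    }
}
```

Here the null check passes while the delegate has not run, which is the case where a null-only IsStarted would be misleading; the `!_started.IsCompleted` guard suggested above would report false.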

📥 Commits

Reviewing files that changed from the base of the PR and between d2ee28b and 026c15d.

📒 Files selected for processing (4)

src/Elastic.Documentation/Diagnostics/DiagnosticsChannel.cs
src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs
src/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cs
tests/Elastic.Changelog.Tests/Changelogs/DiagnosticsCollectorDisposeTests.cs

reakaleek · 2026-05-19T06:15:19Z

 	ConcurrentDictionary<string, bool> InUseSubstitutionKeys { get; }

+	/// True once StartAsync has been called and a background reader is draining the channel.
+	bool IsStarted => true;


Can you explain why IsStarted is initialized with true?

I would assume a boolean like this would be initialized to false and only become true when something starts and changes it.

Also, it's fine for the interface not to have a default implementation.

Mpdreamz

Would it be simpler to gate the write?

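public virtual void Write(Diagnostic diagnostic)
{
	IncrementSeverityCount(diagnostic);
	// Only enqueue when a reader exists to drain the channel; _started is the reader Task.
	if (_started is not null)
		Channel.Write(diagnostic);
}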

Mpdreamz

Just some raw thoughts, wdyt @cotti

Ultimately, not calling .StartAsync() is not necessarily an error: a collector could be instantiated and disposed, and that should be fine too. Can we extend the tests to assert that as well?

Mpdreamz · 2026-05-19T19:27:02Z

+		// instead of deadlocking on Channel.Reader.Completion.
+		if (_started is null)
+		{
+			Drain();


I think this is potentially problematic in a different way, since nothing is calling WaitToRead.

Maybe we need a new flag after:

_ = await Channel.WaitToWrite(cancellationToken);

or after while (Channel.Reader.TryRead(out var item))?

and base the decision to await Channel.Reader.Completion on that?

We also need to wait for .NET 11.0 (see e.g. dotnet/runtime#123555) to be completely safe.
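
For context on why awaiting Channel.Reader.Completion can hang, here is a small sketch using a plain System.Threading.Channels unbounded channel rather than the project's DiagnosticsChannel wrapper (CompletionDemo is a hypothetical name). It samples Reader.Completion.IsCompleted after the writer is completed with one item still queued, and again after the queue has been drained.

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical demo type; not part of the PR.
public static class CompletionDemo
{
    public static (bool BeforeDrain, bool AfterDrain) Run()
    {
        var channel = Channel.CreateUnbounded<string>();
        channel.Writer.TryWrite("queued diagnostic");
        channel.Writer.TryComplete();

        // The writer is complete, but one item is still queued and
        // nothing is reading it.
        var before = channel.Reader.Completion.IsCompleted;

        // Drain synchronously, as a reader (or a fallback Drain()) would.
        while (channel.Reader.TryRead(out _)) { }

        var after = channel.Reader.Completion.IsCompleted;
        return (before, after);
    }
}
```

Completion only finishes once the queued items have been read, so awaiting it with no reader and a non-empty queue would wait indefinitely.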

- Gate Channel.Write on a _readerStarted flag set inside the reader delegate, so writes are dropped when no reader will drain them (per Mpdreamz).
- _readerStarted, not _started != null, drives IsStarted; Task.Run with an already-canceled token leaves _started non-null without ever running the delegate (per CodeRabbit).
- StopAsync no longer needs a sync-drain branch: gating keeps the channel empty when no reader started. Keep a defensive Drain after awaiting the reader Task to mop up if it exited early via cancellation.
- Drop the default IsStarted => true on IDiagnosticsCollector and require implementers to declare it (per reakaleek + Mpdreamz).
- Add a regression test for instantiate-and-dispose with no emissions (per Mpdreamz).

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

♻️ Duplicate comments (1)

src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs (1)

101-104: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_readerStarted startup race can drop diagnostics and skip shutdown waiting.

Between StartAsync returning and the delegate setting _readerStarted on Line 53, StopAsync can early-return and Write can drop items even though startup is already in-flight.

Suggested minimal fix

 public virtual async Task StopAsync(Cancel cancellationToken)
 {
 	Channel.TryComplete();
-	// No reader was ever scheduled, so Write() gated everything out and the
-	// channel is empty — nothing to await, nothing to drain.
-	if (!_readerStarted)
+	if (_started is null)
 		return;
 
 	try
 	{
 		await _started!;
 	}
 	catch (OperationCanceledException)
 	{
 		// Reader was canceled before its final Drain(); mop up below.
 	}
 	// Defensive: if the reader exited early via cancellation, items may
 	// still be queued. Drain them synchronously so they're not lost.
 	Drain();
 }

 public virtual void Write(Diagnostic diagnostic)
 {
 	IncrementSeverityCount(diagnostic);
-	if (_readerStarted)
+	if (_readerStarted || _started is { IsCompleted: false })
 		Channel.Write(diagnostic);
 }

Also applies to: 127-128

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs` around lines
101 - 104, Startup race: _readerStarted may be observed false by StopAsync and
Write while StartAsync is still in-flight, causing dropped diagnostics and
skipped shutdown waits. Fix by ensuring _readerStarted is set before StartAsync
returns and before scheduling the reader delegate (or use an
atomic/interlocked/volatile write or a TaskCompletionSource that StartAsync
awaits and StopAsync/Write await/check) — update the startup path in StartAsync
(the code that schedules the reader delegate and currently sets _readerStarted)
to set the flag (or complete the TCS) deterministically before returning, and
change the StopAsync and Write checks (the same checks referenced at 127-128) to
observe that same atomic/awaitable signal so they never early-return while
startup is in-flight.

📥 Commits

Reviewing files that changed from the base of the PR and between 026c15d and 25979cd.

📒 Files selected for processing (3)

src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs
src/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cs
tests/Elastic.Changelog.Tests/Changelogs/DiagnosticsCollectorDisposeTests.cs

Per review: only flip the flag once the reader has cleared its initial setup, so it strictly means "the delegate executed past the first await". A canceled token or a Task.Run short-circuit leaves the flag false. Co-authored-by: Cursor <cursoragent@cursor.com>

The Write gate (`if (_readerStarted) Channel.Write(...)`) had a scheduling race: between StartAsync returning and the Task.Run delegate hitting `await Channel.WaitToWrite`, any emitted diagnostic was silently dropped. This broke the F# authoring suite, which synchronously schedules the reader and emits diagnostics before the reader had a chance to flip the flag.

- Write unconditionally: UnboundedChannel never blocks, and items in an undrained channel are GC'd with the collector.
- StopAsync now gates on `_started is null` (was StartAsync ever called?) rather than `_readerStarted` (is the reader executing?). Once StartAsync was called with a non-canceled token, awaiting `_started` is always meaningful: it completes when the reader exits after draining the channel.
- IsStarted still backed by `_readerStarted` for accurate introspection (execution proof, per earlier review feedback).
- Drop the leftover //TODO in DiagnosticsChannel.Write.

Co-authored-by: Cursor <cursoragent@cursor.com>
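
The shape this commit message describes might look roughly like the sketch below. It is an approximation under assumptions, not the merged code: FinalShapeSketch, Handled, and the string payload are hypothetical, the reader loop uses the standard WaitToReadAsync rather than the project's channel wrapper, and the flag is set at the top of the delegate for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the final collector shape; names are assumptions.
public sealed class FinalShapeSketch
{
    private readonly Channel<string> _channel = Channel.CreateUnbounded<string>();
    private Task? _started;               // was StartAsync ever called?
    private volatile bool _readerStarted; // did the reader delegate actually run?

    public List<string> Handled { get; } = new();

    // Introspection only: reflects execution of the reader, not scheduling.
    public bool IsStarted => _readerStarted;

    public Task StartAsync(CancellationToken ct)
    {
        _started ??= Task.Run(async () =>
        {
            _readerStarted = true;
            while (await _channel.Reader.WaitToReadAsync(ct))
                Drain();
        }, ct);
        return Task.CompletedTask;
    }

    // Ungated: an unbounded channel accepts the item without blocking, so
    // nothing emitted between StartAsync and the reader's first run is dropped.
    public void Write(string diagnostic) => _channel.Writer.TryWrite(diagnostic);

    public async Task StopAsync()
    {
        _channel.Writer.TryComplete();
        if (_started is null)
            return; // never started: nothing to await, queued items are left to the GC

        try
        {
            await _started;
        }
        catch (OperationCanceledException)
        {
            // Reader was canceled before it could finish; mop up below.
        }
        Drain();
    }

    private void Drain()
    {
        while (_channel.Reader.TryRead(out var item))
            Handled.Add(item);
    }
}
```

With this shape, diagnostics written immediately after StartAsync are still delivered, and StopAsync on a collector that was never started returns without waiting.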

Prevent deadlock on DiagnosticsCollector

026c15d

cotti self-assigned this May 19, 2026

cotti requested a review from a team as a code owner May 19, 2026 05:31

cotti added the fix label May 19, 2026

cotti requested a review from Mpdreamz May 19, 2026 05:31

coderabbitai Bot requested changes May 19, 2026

Comment thread src/Elastic.Documentation/Diagnostics/DiagnosticsCollector.cs Outdated

reakaleek reviewed May 19, 2026

Mpdreamz requested changes May 19, 2026

coderabbitai Bot reviewed May 20, 2026

coderabbitai Bot approved these changes May 20, 2026

cotti and others added 2 commits May 20, 2026 10:44

Merge remote-tracking branch 'origin/main' into fix/diagnostics_chann…

b983a5d

…el_stale Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # src/Elastic.Documentation/Diagnostics/IDiagnosticsCollector.cs

cotti temporarily deployed to integration-tests May 20, 2026 13:49 — with GitHub Actions Inactive

cotti requested a review from Mpdreamz May 20, 2026 13:51

Mpdreamz approved these changes May 20, 2026

cotti temporarily deployed to integration-tests May 20, 2026 14:08 — with GitHub Actions Inactive

cotti enabled auto-merge (squash) May 20, 2026 14:10

cotti merged commit 0464e10 into main May 20, 2026
24 checks passed

cotti deleted the fix/diagnostics_channel_stale branch May 20, 2026 14:20

cotti merged 5 commits into main from fix/diagnostics_channel_stale
