Fix server hang after "Running scheduled diagnostics refresh" by Copilot · Pull Request #3628 · microsoft/typescript-go

Copilot · 2026-04-27T21:29:25Z

Two unbounded waits on the critical path fired by Running scheduled diagnostics refresh can deadlock the server when the client is slow to respond, leaving the LSP completely unresponsive until manually restarted.

`RefreshDiagnostics` — fire and forget

workspace/diagnostic/refresh always returns null; waiting for it is unnecessary
Changed to send the request without registering a response channel or blocking
Late client responses are silently dropped by the read loop (no matching pending entry)
Short-circuits on context cancellation before sending

`updateWatch` — per-request 1s timeout on client requests with rollback

WatchFiles/UnwatchFiles calls hold session.watchesMu while synchronously waiting on client/registerCapability responses
Any client delay blocks all other updateWatch callers behind the mutex, and if requestQueue fills up, the read loop can't deliver responses — creating a self-locking deadlock
Each WatchFiles/UnwatchFiles call gets its own 1s context.WithTimeout so earlier calls don't consume the deadline for later ones
On any WatchFiles failure, ALL newly-acquired watchers in that batch are rolled back (not just the failed one) to keep refcounts clean for retry
Failed watcher IDs are tracked as "pending" in the registry so updateWatches retries them on the next snapshot update, even when the watcher identity hasn't changed

`WatchedFiles.Watchers()` — deduplicated at the source

Watchers() now deduplicates patternsInsideWorkspace and directoriesOutsideWorkspace via slices.Compact(slices.Sorted(...)) before creating FileSystemWatcher objects
This prevents refcount inflation from duplicate patterns, which would otherwise break the rollback path (duplicate keys inflate refcounts but only the first Acquire is tracked for rollback, leaving stale counts that block future retries)

`watchRegistry` — encapsulated watch bookkeeping

Extracted a watchRegistry type into internal/project/watch.go that encapsulates the ref-counted map, mutex, and provides Acquire and Release methods
Added MarkPending, ClearPending, and IsPending methods to track watcher IDs whose registration failed
All methods lock mu internally, fully encapsulating the locking discipline so callers never need to manage the mutex directly
updateWatch now delegates all bookkeeping to the registry instead of manipulating the map directly, making the function easier to follow
updateWatches checks IsPending for all watcher types (config, program files, typings, auto-imports) and retries as fresh adds when the watcher identity is unchanged but the previous registration failed

Tests

Added TestUpdateWatchTimeoutAndRollback in internal/project/watchtimeout_test.go using synctest and projecttestutil to verify timeout, rollback, and retry behavior with real session operations
Uses DidOpenFile to open a file (triggering watches that time out with a mocked slow client), then DidChangeFile with a single character edit (which does not change any watcher identity), followed by GetLanguageService to flush the pending change and trigger updateWatches
Verifies that the pending tracking causes updateWatches to retry the failed registrations and that the retry succeeds, even though the watcher identities are unchanged
Asserts that retry watcher IDs match the initial attempt IDs and that at least one retried watcher succeeds

- Make RefreshDiagnostics fire-and-forget: send the request but don't wait for a response (client always returns null). Any response from the client is silently ignored by the read loop. - Add 1s timeout to updateWatch for WatchFiles/UnwatchFiles client requests to prevent holding watchesMu indefinitely if the client is slow or unresponsive. Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/9d82d127-ea83-455b-bd36-297c9fe864b7 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/9d82d127-ea83-455b-bd36-297c9fe864b7 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot

Pull request overview

Addresses an LSP server hang triggered by “Running scheduled diagnostics refresh” by removing an unnecessary client-response wait and bounding how long watch registration holds a critical mutex.

Changes:

Make RefreshDiagnostics send workspace/diagnostic/refresh as fire-and-forget (no response channel / no wait).
Add a 1s timeout for WatchFiles/UnwatchFiles client requests made while holding session.watchesMu.
Introduce sendClientRequestFireAndForget helper for non-essential client responses.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
internal/project/session.go	Adds a watch request timeout context for client watcher register/unregister calls to avoid indefinite mutex holds.
internal/lsp/server.go	Changes diagnostics refresh to fire-and-forget and adds a helper to send client requests without awaiting responses.

andrewbranch · 2026-04-28T15:55:58Z

@copilot fix review comments

…lock, rollback + retry on WatchFiles failure Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/b1cea66d-c309-4c26-97cc-f815ab98e5fe Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot · 2026-04-28T16:14:15Z

@copilot fix review comments

Addressed all review comments in e2074f9e:

RefreshDiagnostics now checks ctx.Err() before sending
Timeout context is created after acquiring watchesMu
WatchFiles failures roll back session.watches bookkeeping and schedule a single retry after 1s

…apshot updates re-attempt naturally Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/02f6f16f-047c-455f-981d-b83603fc1df8 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/b572b8f8-1888-4b53-a86f-59468b89d099 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot · 2026-04-28T17:55:05Z

@copilot fix review comments

All review comments have been addressed across e2074f9e, ccd56581, and 60619b9d.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

internal/project/session.go:1203

pending watcher IDs are only cleared on successful registration in the newWatcher path. If a watcher times out (MarkPending) and is later removed (updateWatch called with newWatcher == nil), its pending entry is never cleared, so watchRegistry.pending can grow over the lifetime of a session. Consider calling ClearPending for oldWatcher.Watchers().WatcherID when removing/unwatching, or otherwise pruning pending entries when the corresponding watcher is no longer present.

	if oldWatcher != nil {
		w := oldWatcher.Watchers()
		watchers := append(w.WorkspaceWatchers, w.OutsideWorkspaceWatchers...)
		if len(watchers) > 0 {
			var removedIDs []WatcherID
			for _, watcher := range watchers {
				if id, removed := session.watches.Release(watcher); removed {
					removedIDs = append(removedIDs, id)
				}
			}
			for _, id := range removedIDs {
				if err := session.client.UnwatchFiles(watchCtx, id); err != nil {
					errors = append(errors, err)
				} else if logger != nil && newWatcher == nil {
					logger.Log(fmt.Sprintf("Removed watch: %s", id))
				}
			}
		}
	}

Copilot · 2026-04-29T15:46:07Z

+	// Retry config watchers whose IDs didn't change but whose previous registration failed.
+	for path, newEntry := range newSnapshot.ConfigFileRegistry.configs {
+		if oldEntry, ok := oldSnapshot.ConfigFileRegistry.configs[path]; ok {
+			if oldEntry.rootFilesWatch.ID() == newEntry.rootFilesWatch.ID() && s.watches.IsPending(newEntry.rootFilesWatch.ID()) {
+				errors = append(errors, updateWatch(ctx, s, s.logger, nil, newEntry.rootFilesWatch)...)
+			}


The pending retry path calls updateWatch(ctx, s, ..., nil, newWatcher). If the original updateWatch call partially succeeded (some WatchFiles calls returned nil before a later one timed out), the registry will still contain refcounts for the successful watchers. Retrying with oldWatcher==nil will Acquire again and permanently increment those refcounts (no matching Release), which can prevent future UnwatchFiles from ever firing and leak client registrations. To keep refcounts consistent, consider making the failure path roll back the entire parent watcher’s bookkeeping when any error occurs (not just the failed watcher), or implement a retry helper that registers only missing keys without incrementing refcounts for already-tracked entries.

Copilot · 2026-04-29T15:46:08Z

+// watchRegistry tracks the current watch globs and how many individual
+// WatchedFiles reference each glob. It guards concurrent access with a mutex
+// and provides ref-count helpers so callers don't manipulate the map directly.
+// It also tracks which parent watcher IDs have pending (failed) registrations
+// so that updateWatches can retry them even when the watcher identity hasn't changed.
+type watchRegistry struct {
+	mu      sync.Mutex
+	entries map[fileSystemWatcherKey]*fileSystemWatcherValue
+	pending map[WatcherID]struct{}
+}
+
+func newWatchRegistry() *watchRegistry {
+	return &watchRegistry{
+		entries: make(map[fileSystemWatcherKey]*fileSystemWatcherValue),
+		pending: make(map[WatcherID]struct{}),
+	}
+}
+
+// Acquire increments the ref count for a watcher. If this is the first
+// reference (count goes from 0 to 1), it returns true so the caller knows
+// to register the watcher with the client.
+func (r *watchRegistry) Acquire(watcher *lsproto.FileSystemWatcher, id WatcherID) (isNew bool) {
+	key := toFileSystemWatcherKey(watcher)
+	value := r.entries[key]
+	if value == nil {
+		value = &fileSystemWatcherValue{id: id}
+		r.entries[key] = value
+	}
+	value.count++
+	return value.count == 1
+}
+


watchRegistry is documented as guarding concurrent access with its mutex, but Acquire/Release/MarkPending/ClearPending do not lock r.mu (only IsPending does). Since session.go now locks session.watches.mu directly, the locking discipline is split across call sites, which makes it easy to accidentally call these methods without holding the mutex (data race) and undermines the stated encapsulation. Consider either (1) moving the locking inside watchRegistry methods (and removing external r.mu access), or (2) updating the type/method docs and API to make the required locking explicit (e.g., AcquireLocked/ReleaseLocked or a WithLock helper) so future callers can’t misuse it.

Copilot · 2026-04-29T15:46:08Z

+	session.watches.mu.Lock()
+	defer session.watches.mu.Unlock()
+	// Use a timeout for client requests to prevent holding the mutex indefinitely
+	// if the client is slow or unresponsive.
+	watchCtx, watchCancel := context.WithTimeout(ctx, watchRequestTimeout)
+	defer watchCancel()
 	if newWatcher != nil {
 		w := newWatcher.Watchers()
 		watchers := append(w.WorkspaceWatchers, w.OutsideWorkspaceWatchers...)
 		if len(watchers) > 0 {
 			var newWatchers collections.OrderedMap[WatcherID, *lsproto.FileSystemWatcher]
 			for i, watcher := range watchers {
-				key := toFileSystemWatcherKey(watcher)
-				value := session.watches[key]
 				globId := WatcherID(fmt.Sprintf("%s.%d", w.WatcherID, i))
-				if value == nil {
-					value = &fileSystemWatcherValue{id: globId}
-					session.watches[key] = value
-				}
-				value.count++
-				if value.count == 1 {
+				if session.watches.Acquire(watcher, globId) {
 					newWatchers.Set(globId, watcher)
 				}
 			}
 			for id, watcher := range newWatchers.Entries() {
-				if err := session.client.WatchFiles(ctx, id, []*lsproto.FileSystemWatcher{watcher}); err != nil {
+				if err := session.client.WatchFiles(watchCtx, id, []*lsproto.FileSystemWatcher{watcher}); err != nil {
+					session.watches.Release(watcher)
 					errors = append(errors, err)
 				} else if logger != nil {


updateWatch creates a single watchCtx with a 1s deadline and reuses it for all WatchFiles/UnwatchFiles calls in this update. If multiple registrations/unregistrations are needed, later calls may inherit very little time remaining (e.g., first call takes 900ms → second gets ~100ms), causing avoidable timeouts even when the client is responsive. Consider creating a fresh per-request timeout context inside the WatchFiles/UnwatchFiles loops (or otherwise resetting the deadline per call) while still ensuring the mutex is not held indefinitely.

Copilot · 2026-04-29T15:46:08Z

 			for i, watcher := range watchers {
-				key := toFileSystemWatcherKey(watcher)
-				value := session.watches[key]
 				globId := WatcherID(fmt.Sprintf("%s.%d", w.WatcherID, i))
-				if value == nil {
-					value = &fileSystemWatcherValue{id: globId}
-					session.watches[key] = value
-				}
-				value.count++
-				if value.count == 1 {
+				if session.watches.Acquire(watcher, globId) {
 					newWatchers.Set(globId, watcher)
 				}
 			}
 			for id, watcher := range newWatchers.Entries() {
-				if err := session.client.WatchFiles(ctx, id, []*lsproto.FileSystemWatcher{watcher}); err != nil {
+				if err := session.client.WatchFiles(watchCtx, id, []*lsproto.FileSystemWatcher{watcher}); err != nil {
+					session.watches.Release(watcher)
 					errors = append(errors, err)
 				} else if logger != nil {


On WatchFiles error, updateWatch rolls back bookkeeping via session.watches.Release(watcher) exactly once for that watcher. If the input watcher list contains duplicate patterns (same fileSystemWatcherKey), Acquire will increment the refcount multiple times but only one client registration is attempted; a failed registration would then leave the refcount >0, preventing future retries from ever treating that watch as “new”. This is reachable because some glob lists are only sorted (not deduped) before becoming watchers. Consider deduping watcher keys before calling Acquire, or tracking how many increments were applied per key and fully rolling them back on registration failure.

andrewbranch · 2026-04-29T16:04:37Z

@copilot address review comments

…up watchers, rollback all on failure, stronger test assertions Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/81267cbb-20f5-41d0-b92b-9f1b30604e5d Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/81267cbb-20f5-41d0-b92b-9f1b30604e5d Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot · 2026-04-29T16:24:01Z

@copilot address review comments

Addressed all five review comments in 22abe61:

Partial success refcount leak: On any WatchFiles error, ALL newly-acquired watchers are now rolled back (not just the failed one), keeping refcounts clean for retry.
Consistent locking: All watchRegistry methods now consistently require the caller to hold mu. Removed the internal lock from IsPending; callers in updateWatches now lock/unlock around each check.
Per-request timeout: Each WatchFiles/UnwatchFiles call gets its own context.WithTimeout so earlier calls don't consume the deadline for later ones.
Duplicate patterns: Watchers are deduped by key before calling Acquire to prevent refcount inflation.
Stronger test assertions: Test now verifies retry watcher IDs are a subset of the first attempt IDs and that at least one retried watcher succeeded.

andrewbranch · 2026-05-05T20:48:49Z

@copilot I don't really understand the "dedupe watchers by key to prevent refcount inflation from duplicate patterns" issue. If a watcher produced duplicate patterns, previously, the fileSystemWatcherValue corresponding to that pattern would get reffed once for each instance. During deregistration, the ref count would decrement once for each instance. The registrations were already deduped, so why does it matter what the actual ref counts were as long as they were balanced? It looks to me like you've added another unnecessary layer of deduping, but please explain if I'm misunderstanding something.

Copilot · 2026-05-05T20:55:34Z

@copilot I don't really understand the "dedupe watchers by key to prevent refcount inflation from duplicate patterns" issue. If a watcher produced duplicate patterns, previously, the fileSystemWatcher...

You're right that in the normal lifecycle the refcounts are balanced — register and deregister both iterate the same list (including duplicates), so the increments and decrements match.

The issue is specifically with the rollback path. On failure, lines 1184-1186 iterate newWatchers.Entries() to Release, but newWatchers only contains entries where Acquire returned isNew=true (count went 0→1). Without dedup, if key K appears 3 times, Acquire increments count to 3 but only the first is in newWatchers. Rollback releases once (3→2), leaving count at 2. On retry, Acquire sees count >0 and returns isNew=false, so the watcher never gets re-registered.

An alternative would be to track all acquired watchers (including duplicates) and Release each one on rollback. If you prefer that approach I can switch to it — the dedup just seemed simpler and avoids the extra Acquire/Release cycles for duplicates.

andrewbranch · 2026-05-05T21:31:02Z

@copilot understood. Instead, let's move the deduplication into WatchedFiles itself so the session code is simplified.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/dd1001aa-1bbc-4e9f-a0a3-edc2a7e29627 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot · 2026-05-05T21:48:35Z

@copilot understood. Instead, let's move the deduplication into WatchedFiles itself so the session code is simplified.

Moved in ce17015d. Watchers() now deduplicates via slices.Compact(slices.Sorted(...)) on both patternsInsideWorkspace and directoriesOutsideWorkspace before creating FileSystemWatcher objects. The dedup logic in updateWatch is removed.

andrewbranch · 2026-05-05T22:00:01Z

@copilot move mutex locking into watchRegistry methods.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/6ef1f6c5-5c45-4461-bf8c-933c384252fe Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot · 2026-05-05T22:15:36Z

@copilot move mutex locking into watchRegistry methods.

Done in a2d063ec. All watchRegistry methods (Acquire, Release, MarkPending, ClearPending, IsPending) now lock mu internally. Removed all external s.watches.mu.Lock/Unlock calls from session.go.

Initial plan

360eba7

Copilot AI assigned Copilot and andrewbranch Apr 27, 2026

Copilot started work on behalf of andrewbranch April 27, 2026 21:29 View session

Copilot AI linked an issue Apr 27, 2026 that may be closed by this pull request

Server stops responding after Running scheduled diagnostics refresh #3615

Closed

Copilot AI and others added 2 commits April 27, 2026 21:41

address review: extract watchRequestTimeout constant and improve docs

3091afe

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/9d82d127-ea83-455b-bd36-297c9fe864b7 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix server stops responding after running scheduled diagnostics refresh~~ Fix server hang after "Running scheduled diagnostics refresh" Apr 27, 2026

Copilot AI requested a review from andrewbranch April 27, 2026 21:45

Copilot finished work on behalf of andrewbranch April 27, 2026 21:45

andrewbranch marked this pull request as ready for review April 28, 2026 00:44

Copilot AI review requested due to automatic review settings April 28, 2026 00:44

Copilot started reviewing on behalf of andrewbranch April 28, 2026 00:45 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Comment thread internal/project/session.go

Comment thread internal/lsp/server.go

Comment thread internal/lsp/server.go

Comment thread internal/project/session.go Outdated

Copilot started work on behalf of andrewbranch April 28, 2026 15:56 View session

Copilot finished work on behalf of andrewbranch April 28, 2026 16:16

Copilot started work on behalf of andrewbranch April 28, 2026 17:20 View session

Copilot finished work on behalf of andrewbranch April 28, 2026 17:33

andrewbranch reviewed Apr 28, 2026

View reviewed changes

Comment thread internal/project/session.go Outdated

Copilot started work on behalf of andrewbranch April 28, 2026 17:40 View session

refactor: extract watchRegistry type to encapsulate watch bookkeeping

60619b9

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/b572b8f8-1888-4b53-a86f-59468b89d099 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot finished work on behalf of andrewbranch April 28, 2026 17:56

Copilot AI requested a review from andrewbranch April 28, 2026 17:56

andrewbranch reviewed Apr 28, 2026

View reviewed changes

Comment thread internal/project/watch.go Outdated

Copilot started work on behalf of andrewbranch April 28, 2026 18:20 View session

andrewbranch requested a review from Copilot April 29, 2026 15:37

Copilot started reviewing on behalf of andrewbranch April 29, 2026 15:38 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

Copilot started work on behalf of andrewbranch April 29, 2026 16:04 View session

Copilot AI and others added 2 commits April 29, 2026 16:22

Preallocate seen map with capacity len(watchers)

72f2544

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/81267cbb-20f5-41d0-b92b-9f1b30604e5d Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot finished work on behalf of andrewbranch April 29, 2026 16:25

Copilot started work on behalf of andrewbranch May 5, 2026 20:48 View session

Copilot finished work on behalf of andrewbranch May 5, 2026 20:56

Copilot started work on behalf of andrewbranch May 5, 2026 21:31 View session

Move watcher deduplication into WatchedFiles.Watchers()

ce17015

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/dd1001aa-1bbc-4e9f-a0a3-edc2a7e29627 Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot finished work on behalf of andrewbranch May 5, 2026 21:49

Copilot started work on behalf of andrewbranch May 5, 2026 22:00 View session

Move mutex locking into watchRegistry methods

a2d063e

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/6ef1f6c5-5c45-4461-bf8c-933c384252fe Co-authored-by: andrewbranch <3277153+andrewbranch@users.noreply.github.com>

Copilot finished work on behalf of andrewbranch May 5, 2026 22:16

andrewbranch approved these changes May 6, 2026

View reviewed changes

jakebailey approved these changes May 6, 2026

View reviewed changes

andrewbranch enabled auto-merge May 6, 2026 15:42

andrewbranch added this pull request to the merge queue May 6, 2026

Merged via the queue into main with commit 095aa2f May 6, 2026
21 checks passed

andrewbranch deleted the copilot/fix-server-unresponsiveness branch May 6, 2026 16:13

Conversation

Copilot AI commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

RefreshDiagnostics — fire and forget

updateWatch — per-request 1s timeout on client requests with rollback

WatchedFiles.Watchers() — deduplicated at the source

watchRegistry — encapsulated watch bookkeeping

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

andrewbranch commented Apr 28, 2026

Uh oh!

Copilot AI commented Apr 28, 2026

Uh oh!

Uh oh!

Copilot AI commented Apr 28, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

andrewbranch commented Apr 29, 2026

Uh oh!

Copilot AI commented Apr 29, 2026

Uh oh!

andrewbranch commented May 5, 2026

Uh oh!

Copilot AI commented May 5, 2026

Uh oh!

andrewbranch commented May 5, 2026

Uh oh!

Copilot AI commented May 5, 2026

Uh oh!

andrewbranch commented May 5, 2026

Uh oh!

Copilot AI commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Copilot AI commented Apr 27, 2026 •

edited

Loading

`RefreshDiagnostics` — fire and forget

`updateWatch` — per-request 1s timeout on client requests with rollback

`WatchedFiles.Watchers()` — deduplicated at the source

`watchRegistry` — encapsulated watch bookkeeping