Skip to content

Await pending session DB writes on agent host shutdown#311432

Merged
roblourens merged 3 commits intomainfrom
roblou/agents/ci-test-flake-investigation
Apr 20, 2026
Merged

Await pending session DB writes on agent host shutdown#311432
roblourens merged 3 commits intomainfrom
roblou/agents/ci-test-flake-investigation

Conversation

@roblourens
Copy link
Copy Markdown
Member

@roblourens roblourens commented Apr 20, 2026

Fixes a flake in the Protocol WebSocket - Session Config persistence across restarts > persisted config values are restored on subscribe after server restart integration test, where the assertion saw the previous branch: 'main' value instead of the updated branch: 'release'.

Root cause

The agent host server's SIGTERM/SIGINT handler called process.exit(0) synchronously, abandoning any fire-and-forget SQLite writes that were still in flight — configValues, customTitle, isRead/isDone, and per-session diffs. Under CI load the most recent SessionConfigChanged write could lose the race against shutdown, leaving the prior value persisted.

Fix

Track in-flight writes inside SessionDatabase via a _pendingWrites set populated by every public mutating method. The outermost wrap is required so the await this._ensureDb() window is also covered — tracking only the leaf dbRun/dbExec would miss the gap between the method being called and the query actually being queued.

SessionDataService aggregates whenIdle() across all live per-session DBs (tracked via the existing ref-counted collection). The server's shutdown handler now awaits this with a 3s raceTimeout before disposing, so a stuck write cannot hang shutdown indefinitely.

Removes the await timeout(500) hack the test previously needed to mask the race.

Verification

  • 632/632 platform/agentHost tests pass locally.
  • 10/10 stability runs of the previously-flaky test pass with the fix.
  • 5/5 runs fail when the await raceTimeout(...) line is commented out, confirming the test now deterministically catches the race rather than masking it with a sleep.
  • npm run compile-check-ts-native clean.

(Written by Copilot)

skipped in https://github.com/microsoft/vscode/pull/311374/changes

The agent host server's SIGTERM/SIGINT handler called process.exit(0)
synchronously, abandoning any fire-and-forget SQLite writes that were
in flight (configValues, customTitle, isRead/isDone, diffs). Under CI
load this caused the 'Session Config persistence across restarts'
integration test to  the most recent SessionConfigChanged writeflake
could lose the race against shutdown, leaving the previous value
persisted instead.

Track in-flight writes inside SessionDatabase via a _pendingWrites set
populated by every public mutating method (the outermost wrap is
required so the await this._ensureDb() window is also covered).
SessionDataService aggregates whenIdle() across all live per-session
DBs. The server's shutdown handler now awaits this with a 3s
raceTimeout before disposing.

Removes the await timeout(500) hack the test previously needed.

(Written by Copilot)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 20, 2026 18:18
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 20, 2026

Screenshot Changes

Base: 3aeae51b Current: 063b0e92

Changed (1)

chat/aiCustomizations/aiCustomizationManagementEditor/McpBrowseMode/Light
Before After
before after

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a shutdown race in the Agent Host where fire-and-forget SQLite writes could be dropped on SIGTERM/SIGINT, causing flaky restart/persistence integration tests.

Changes:

  • Track in-flight write promises in SessionDatabase and expose whenIdle() to await pending writes.
  • Add ISessionDataService.whenIdle() to aggregate whenIdle() across all live per-session databases.
  • Await whenIdle() (with a 3s raceTimeout) during agent host shutdown; remove the test’s timeout(500) workaround.
Show a summary per file
File Description
src/vs/platform/agentHost/node/sessionDatabase.ts Adds pending-write tracking and whenIdle(); wraps mutating DB methods with _track().
src/vs/platform/agentHost/node/sessionDataService.ts Tracks live DB instances and implements whenIdle() across open databases.
src/vs/platform/agentHost/node/agentHostServerMain.ts Makes shutdown async and awaits DB idleness with a bounded timeout before exiting.
src/vs/platform/agentHost/common/sessionDataService.ts Extends ISessionDatabase/ISessionDataService interfaces with whenIdle().
src/vs/platform/agentHost/test/node/protocol/sessionConfig.integrationTest.ts Removes sleep-based flake workaround; relies on graceful shutdown flush.
src/vs/platform/agentHost/test/node/copilotAgent.test.ts Updates test ISessionDataService stub to include whenIdle().
src/vs/platform/agentHost/test/node/agentService.test.ts Updates session data service mocks to include whenIdle().
src/vs/platform/agentHost/test/common/sessionTestHelpers.ts Updates ISessionDatabase/ISessionDataService test helpers with whenIdle().

Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 2

Comment thread src/vs/platform/agentHost/node/agentHostServerMain.ts
Comment thread src/vs/platform/agentHost/node/sessionDataService.ts Outdated
- Close the WebSocket server before awaiting whenIdle() so no further
  actions can be dispatched during the flush window.
- Simplify SessionDataService.whenIdle(): per-DB whenIdle() already
  drains writes against existing DBs, so the outer loop only needs to
  re-pass when a NEW DB was opened during the await. Comment now
  matches the code.

(Written by Copilot)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@roblourens roblourens marked this pull request as ready for review April 20, 2026 18:31
@roblourens roblourens enabled auto-merge (squash) April 20, 2026 18:31
Yoyokrazy
Yoyokrazy previously approved these changes Apr 20, 2026
@roblourens roblourens marked this pull request as draft April 20, 2026 18:39
auto-merge was automatically disabled April 20, 2026 18:39

Pull request was converted to draft

On Windows, child.kill() (SIGTERM) terminates the process unconditionally
without invoking the SIGTERM  so the in-flight setMetadata writehandler
never reaches SQLite and the second phase sees no persisted config at all.
Closing the child's stdin fires process.stdin.on('end', shutdown) on every
platform, exercising the same graceful flush path.

(Written by Copilot)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@roblourens roblourens marked this pull request as ready for review April 20, 2026 18:59
@roblourens roblourens enabled auto-merge (squash) April 20, 2026 18:59
@roblourens roblourens merged commit a557dbc into main Apr 20, 2026
26 checks passed
@roblourens roblourens deleted the roblou/agents/ci-test-flake-investigation branch April 20, 2026 19:51
@vs-code-engineering vs-code-engineering Bot added this to the 1.118.0 milestone Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants