Cancel outstanding threads prior to closing sqlite connection by crtschin · Pull Request #4886 · haskell/haskell-language-server

crtschin · 2026-04-05T20:13:13Z

From the investigation in #4884 (comment).

I was repeatedly running the ghcide-tests to debug flakiness. In addition to the EOF errors from reads on closed handles, I also encountered segfaults. Investigating the latter via the core dumps showed sqlite functions, suggesting a use-after-free of the sqlite handle.

The problem is that the connections are created and handed to numerous async threads, either to handle requests or via the shake session. These async threads can outlive the worker threads, leading to racy use-after-frees. I solved this by keeping track of live threads and cancelling them, and additionally issuing a shake session shutdown before leaving scope. With these changes, the segfaults disappear.
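The thread-tracking idea can be illustrated with `base` concurrency primitives. This is a minimal sketch, not the HLS implementation: `FakeDb`, `Registry`, `spawnTracked`, `cancelAll`, `withDbAndWorkers`, and `worker` are hypothetical names, and an `IORef Bool` stands in for the sqlite handle so that a late use raises an exception instead of crashing. The key point is that the threads are killed *and awaited* before the handle is closed.

```haskell
import Control.Concurrent
import Control.Exception
import Data.IORef

-- Hypothetical stand-in for a native sqlite handle: True while open.
-- A real handle would not fail cleanly; a use after close could segfault.
newtype FakeDb = FakeDb (IORef Bool)

useDb :: FakeDb -> IO ()
useDb (FakeDb ref) = do
  open <- readIORef ref
  if open then pure () else throwIO (userError "use-after-free")

-- Live worker threads, each paired with an MVar filled when the thread exits.
type Registry = MVar [(ThreadId, MVar ())]

spawnTracked :: Registry -> IO () -> IO ()
spawnTracked reg act = do
  done <- newEmptyMVar
  tid <- forkFinally act (\_ -> putMVar done ())
  modifyMVar_ reg (pure . ((tid, done) :))

-- Kill every tracked thread, then block until each has actually finished.
cancelAll :: Registry -> IO ()
cancelAll reg = do
  ts <- swapMVar reg []
  mapM_ (killThread . fst) ts
  mapM_ (takeMVar . snd) ts

-- Invariant: workers are cancelled (inner release) before the handle is
-- closed (outer release), because nested brackets release innermost first.
withDbAndWorkers :: (FakeDb -> Registry -> IO a) -> IO a
withDbAndWorkers body =
  bracket (FakeDb <$> newIORef True) (\(FakeDb ref) -> writeIORef ref False) $ \db ->
    bracket (newMVar []) cancelAll $ \reg -> body db reg

-- A worker that keeps using the handle, counting any use-after-close failures.
worker :: IORef Int -> FakeDb -> IO ()
worker errs db = loop
  where
    loop = do
      useDb db `catch` countErr
      threadDelay 1000
      loop
    countErr :: IOException -> IO ()
    countErr _ = atomicModifyIORef' errs (\n -> (n + 1, ()))

demo :: IO Int
demo = do
  errs <- newIORef 0
  withDbAndWorkers $ \db reg -> do
    mapM_ (\_ -> spawnTracked reg (worker errs db)) [1 .. 4 :: Int]
    threadDelay 20000
  -- Give any stray thread time to touch the closed handle.
  threadDelay 20000
  readIORef errs
```

`demo` should report zero failures, because every worker has exited before the flag is cleared. Swapping the two `bracket` calls would close the handle first, so a worker could observe the closed handle before it is killed.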

Edit: I removed the cancellation of the threads handling requests and kept only the shake shutdown before leaving scope. This was enough to avoid the segfaults in my case (running a small portion of the test suite 100 times in a row), and it stops the tests on CI from failing.

fendor

Amazing detective work! CI was green on the first attempt. If this fixes the segmentation faults we have been observing on Windows, it would be a huge win!

I think the idea is correct, but the exact location of the shutdown doesn't feel quite right.
Perhaps we need to cancel the worker threads in runWithWorkerThreads? Then a comment could state the invariant that prevents the use-after-free.

soulomoon · 2026-04-08T14:33:45Z

Great detective work @crtschin !

Some threads in runWithWorkerThreads (ShakeRestart and SessionLoader; there are 3 in total now) control the shake session's threads. Only the DB thread is used by the shake session's threads.

If we cancel the shake session before leaving the scope of the body of runWithWorkerThreads, ShakeRestart might still wake up the shake session's threads.

Perhaps a better idea would be to shut down the shake session inside runWithWorkerThreads, as @fendor suggested: right before shutting down the DB thread, but after shutting down the ShakeRestart and SessionLoader threads.

PS. I was wondering why the problem only happens on Windows,
and whether we could make the use-after-free easier to diagnose (e.g. by replacing the handle with something like error "xxx is already freed").
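One way to make a use-after-free fail loudly is to keep the handle in a mutable slot that is emptied on close, so any later use raises a descriptive exception instead of touching freed memory. This is a hypothetical sketch: `RawConn`, `GuardedConn`, `UseAfterClose`, `openGuarded`, `closeGuarded`, and `withConn` are illustrative names, not HLS or sqlite-simple APIs. Note the limitation: the check and the use are not atomic, so a thread that has already passed the check can still race with the close. This is a diagnostic aid, not a fix.

```haskell
import Control.Exception
import Data.IORef

-- Hypothetical stand-in for a native sqlite connection.
newtype RawConn = RawConn Int

-- The slot is emptied on close, so later lookups fail with a clear error.
newtype GuardedConn = GuardedConn (IORef (Maybe RawConn))

newtype UseAfterClose = UseAfterClose String
  deriving Show

instance Exception UseAfterClose

openGuarded :: RawConn -> IO GuardedConn
openGuarded c = GuardedConn <$> newIORef (Just c)

-- Empty the slot first; a real version would then close the native handle.
closeGuarded :: GuardedConn -> IO ()
closeGuarded (GuardedConn ref) = atomicWriteIORef ref Nothing

-- Look up the connection, throwing a named error if it was already closed.
withConn :: String -> GuardedConn -> (RawConn -> IO a) -> IO a
withConn what (GuardedConn ref) k = do
  mc <- readIORef ref
  case mc of
    Just c -> k c
    Nothing -> throwIO (UseAfterClose (what ++ ": connection already closed"))

demo :: IO (Either UseAfterClose Int, Either UseAfterClose Int)
demo = do
  conn <- openGuarded (RawConn 42)
  before <- try (withConn "getRow" conn (\(RawConn n) -> pure n))
  closeGuarded conn
  after <- try (withConn "getRow" conn (\(RawConn n) -> pure n))
  pure (before, after)
```

The first lookup succeeds; the second raises `UseAfterClose "getRow: connection already closed"` rather than crashing.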

I was wrong: SessionLoader is called inside the shake session's threads (rule GhcSession), and SessionLoader calls ShakeRestart, which creates a cyclic dependency. It is tricky. Perhaps do it in stages: first stop SessionLoader's worker and then ShakeRestart's worker, then shut down the shake session, and finally close the queues of SessionLoader and ShakeRestart.
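The proposed ordering can be expressed as a teardown sequence that always runs every step, in order, even if an earlier step throws. This is a sketch only: `orderedTeardown` is a hypothetical helper, and the log strings merely name the stages from the comment above rather than calling real HLS functions.

```haskell
import Control.Exception
import Data.IORef

-- Run teardown steps strictly in list order. If a step throws, the remaining
-- steps still run before an exception propagates.
orderedTeardown :: [IO ()] -> IO ()
orderedTeardown = foldr (\step rest -> step `finally` rest) (pure ())

demo :: IO [String]
demo = do
  logRef <- newIORef []
  let step msg = modifyIORef logRef (++ [msg])
  orderedTeardown
    [ step "stop SessionLoader worker"                    -- no new session loads
    , step "stop ShakeRestart worker"                     -- nothing can restart the session
    , step "shut down shake session"                      -- no rule can reach the DB
    , step "close SessionLoader and ShakeRestart queues"  -- release the queues last
    ]
  readIORef logRef
```

Stopping both controlling workers before the session breaks the cycle: once neither can run, nothing can re-enter the shake session while it is being shut down.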

crtschin · 2026-04-08T19:32:38Z

I was wrong, Sessionloader would be called inside shake session's threads (Rule GhcSession), and Sessionloader would call ShakeRestart, hence it creates cyclic dependencies.

That's fine, no? Following your and @fendor's suggestion, I'm thinking of shutting down the shake session at the following spot:

diff --git a/ghcide/src/Development/IDE/LSP/LanguageServer.hs b/ghcide/src/Development/IDE/LSP/LanguageServer.hs
index 72720e302..40d34d2ca 100644
--- a/ghcide/src/Development/IDE/LSP/LanguageServer.hs
+++ b/ghcide/src/Development/IDE/LSP/LanguageServer.hs
@@ -354,6 +354,7 @@ handleInit lifecycleCtx env (TRequestMessage _ _ m params) = otTracedHandler "In
 runWithWorkerThreads :: Recorder (WithPriority Session.Log) -> FilePath -> (WithHieDb -> ThreadQueue -> IO ()) -> IO ()
 runWithWorkerThreads recorder dbLoc f = evalContT $ do
             (WithHieDbShield hiedb, threadQueue) <- runWithDb recorder dbLoc
+            -- Shutdown session here, note that shutdown happens bottom->up
             sessionRestartTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "RestartTQueue"
             sessionLoaderTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "SessionLoaderTQueue"
             liftIO $ f hiedb (ThreadQueue threadQueue sessionRestartTQueue sessionLoaderTQueue)

So this would run after the restart and loader threads have shut down. Once those threads have shut down, they can no longer access the connection. Restarts also can't occur, because the thread that performs them has finished.
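The comment in the diff ("shutdown happens bottom->up") relies on how `ContT` scopes unwind: resources acquired later in an `evalContT` block are released first. The sketch below models only that ordering. `resource` and `onExit` are hypothetical helpers (not HLS functions), and the names merely mirror the diff; it assumes the `transformers` package for `ContT` and `evalContT`.

```haskell
import Control.Exception (bracket_, finally)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)
import Data.IORef

-- Append a message to the shared log.
say :: IORef [String] -> String -> IO ()
say logRef msg = modifyIORef logRef (++ [msg])

-- Hypothetical helper: hold a named resource until the enclosing ContT block ends.
resource :: IORef [String] -> String -> ContT r IO ()
resource logRef name =
  ContT $ \k -> bracket_ (say logRef ("open " ++ name)) (say logRef ("close " ++ name)) (k ())

-- Hypothetical helper: register a teardown-only action at this point in the block.
onExit :: IORef [String] -> String -> ContT r IO ()
onExit logRef msg = ContT $ \k -> k () `finally` say logRef msg

demo :: IO [String]
demo = do
  logRef <- newIORef []
  evalContT $ do
    resource logRef "db"                     -- acquired first, released last
    onExit logRef "shut down shake session"  -- runs after both queues close
    resource logRef "RestartTQueue"
    resource logRef "SessionLoaderTQueue"    -- acquired last, released first
    liftIO (say logRef "body")
  readIORef logRef
```

The resulting log closes `SessionLoaderTQueue`, then `RestartTQueue`, then runs the session shutdown, and only then closes the database, which is the ordering the placement in the diff depends on.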

crtschin · 2026-04-08T19:39:02Z

PS. I was wondering why the problem only happens on Windows.

Forgot to mention: I'm running into the segfaults locally on my Linux machine when running the following command repeatedly: cabal run ghcide-tests -- -p '/constructor hover (#2904)/'.

I assume it reproduces more often with the above command because I'm running with 6 capabilities, and those particular tests are very quick, leaving a large window between everything being loaded and the threads being torn down.

soulomoon

That's fine, no? Following your and @fendor's suggestion, I'm thinking of shutting down the shake session at the following spot:

Yes, putting it there looks good to me.

I said so

fendor

LGTM, thank you for fixing this incredible strain on our CI!
@soulomoon could you merge this PR once you are happy with it?

soulomoon

Good job, it looks good to me too. @crtschin, would you like to squash and merge?

crtschin · 2026-04-12T11:11:13Z

Squashed my changes, but I can't merge. I don't have write access.

The shake session holds references to the sqlite connection. When the stop signal is given and the scope is exited, the sqlite connections are closed while outstanding threads may still be using them. This leads to use-after-frees. Ensure the session is shut down before leaving the worker scope.

crtschin · 2026-04-12T16:05:47Z

Rebased; rerunning CI before enabling auto-merge again.

crtschin force-pushed the crtschin/fix-sqlite-leak branch 2 times, most recently from 9b542d3 to 054b254, April 7, 2026 21:41

crtschin marked this pull request as ready for review April 7, 2026 23:20

crtschin requested a review from wz1000 as a code owner April 7, 2026 23:20

fendor requested a review from soulomoon April 8, 2026 07:15

fendor reviewed Apr 8, 2026


fendor previously requested changes Apr 8, 2026


soulomoon reviewed Apr 9, 2026


soulomoon reviewed Apr 9, 2026


crtschin mentioned this pull request Apr 9, 2026

Ignore SessionException during test cleanup from listener thread haskell/lsp#637

Closed

fendor approved these changes Apr 10, 2026


soulomoon approved these changes Apr 11, 2026


crtschin force-pushed the crtschin/fix-sqlite-leak branch from b81cfa7 to da12523, April 12, 2026 11:10

crtschin enabled auto-merge (squash) April 12, 2026 15:59

crtschin disabled auto-merge April 12, 2026 16:04

crtschin force-pushed the crtschin/fix-sqlite-leak branch from da12523 to 1057d68, April 12, 2026 16:05

crtschin merged commit 0db92cd into haskell:master Apr 12, 2026
71 of 76 checks passed

crtschin merged 1 commit into haskell:master from crtschin:crtschin/fix-sqlite-leak

crtschin commented Apr 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

fendor left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

soulomoon commented Apr 8, 2026 •

edited

Loading

Uh oh!

crtschin commented Apr 8, 2026 •

edited

Loading

Uh oh!

crtschin commented Apr 8, 2026

Uh oh!

soulomoon left a comment

Uh oh!

Uh oh!

Uh oh!

fendor left a comment

Uh oh!

soulomoon left a comment •

edited

Loading

Uh oh!

crtschin commented Apr 12, 2026

Uh oh!

crtschin commented Apr 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

crtschin commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

fendor left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

soulomoon commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

crtschin commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

crtschin commented Apr 8, 2026

Uh oh!

soulomoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fendor left a comment

Choose a reason for hiding this comment

Uh oh!

soulomoon left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

crtschin commented Apr 12, 2026

Uh oh!

crtschin commented Apr 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

crtschin commented Apr 5, 2026 •

edited

Loading

fendor left a comment •

edited

Loading

soulomoon commented Apr 8, 2026 •

edited

Loading

crtschin commented Apr 8, 2026 •

edited

Loading

soulomoon left a comment •

edited

Loading