Before submitting
- [x] I searched existing issues and did not find a duplicate.
- [x] I included enough detail to reproduce or investigate the problem.
Area
apps/server
Steps to reproduce
1. Install the opencode CLI (/usr/bin/opencode, v1.4.7 on this box). Do not configure an external serverUrl, so T3 Code will spawn its own local OpenCode server for the probe.
2. Leave the default setting providers.opencode.enabled: true.
3. Cold-start T3 Code and measure the gap between Running all migrations... / Migrations ran successfully and Listening on http://... in ~/.t3/userdata/logs/server-child.log.
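For the measurement step, a sketch of computing the gap from the two log lines. The epoch-millisecond timestamp prefix here is a hypothetical stand-in; the actual server-child.log line format is not shown in this report:

```typescript
// Hypothetical log-line format: "<epoch-ms> <message>".
// Computes the Migrations -> Listening gap in seconds.
const sample = [
  "1713600000000 Running all migrations...",
  "1713600000150 Migrations ran successfully",
  "1713600050150 Listening on http://127.0.0.1:3000",
];

function migrationsToListeningGapSeconds(lines: string[]): number | undefined {
  let start: number | undefined;
  for (const line of lines) {
    const ts = Number(line.split(" ", 1)[0]);
    if (line.includes("Migrations ran successfully")) start = ts;
    if (line.includes("Listening on") && start !== undefined) {
      return (ts - start) / 1000;
    }
  }
  return undefined; // server never reached Listening in this log slice
}

console.log(migrationsToListeningGapSeconds(sample)); // → 50
```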
Expected behavior
Backend HTTP readiness (Listening on ...) does not block on any one provider's probe. A slow or hung provider should degrade that provider's status display, not delay every other layer that transitively depends on ProviderRegistry.
Actual behavior
On this install, cold starts routinely block 45-75s waiting for the OpenCode probe. The gap before Listening on ... is dominated by checkOpenCodeProviderStatus, and it is unbounded because runOpenCodeSdk has no timeout.
Measured from one run of apps/server/src/telemetry trace output (server.trace.ndjson) over 56 checkOpenCodeProviderStatus spans during normal use:
| metric | duration |
| --- | --- |
| min | 3.6s |
| p50 | 47s |
| p90 | 55s |
| max | 73s |
Independently verified by toggling the setting. With providers.opencode.enabled: false, cold start drops by 45-70s on the same install:
| Scenario | DB size | opencode.enabled | Migrations -> Listening |
| --- | --- | --- | --- |
| Small DB | 30 MB | true | ~50s+ (varies) |
| Small DB | 30 MB | false | 1.58s |
| Large DB | 584 MB | false | 3.51s |
The DB-size axis is a separate, much smaller (~2s) issue tracked in #2245. The remaining tens of seconds are entirely the OpenCode probe.
Root cause
Three mechanisms compose to make one provider's slow probe a global cold-start blocker:
1. makeManagedServerProvider (apps/server/src/provider/makeManagedServerProvider.ts:140) forks the initial snapshot with Effect.forkScoped, but that fiber is serialized against every subsequent .getSnapshot call via refreshSemaphore.withPermits(1) (makeManagedServerProvider.ts:121). So getSnapshot is effectively synchronous with the initial probe.
2. ProviderRegistryLive calls .getSnapshot on every provider during layer acquisition (apps/server/src/provider/Layers/ProviderRegistry.ts:266), so the Layer body waits on the forked OpenCode probe before it can return.
3. runOpenCodeSdk (apps/server/src/provider/opencodeRuntime.ts:78) wraps the OpenCode SDK calls (client.provider.list(), client.app.agents()) in Effect.tryPromise with no timeout. loadOpenCodeInventory runs them concurrently with { concurrency: "unbounded" }, but either one hanging is enough to hold the probe.
startOpenCodeServerProcess does have a timeout (opencodeRuntime.ts:406, Deferred.await(readyDeferred).pipe(Effect.timeoutOption(timeoutMs))), but that only bounds server startup; once the local server reports ready, the SDK calls against it are unbounded.
Because ProviderRegistryLive is a transitive dependency of the HTTP router layer, the Listening on ... log cannot fire until the probe returns.
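The first two mechanisms can be sketched without Effect. Everything below is an illustrative stand-in, not the project's actual code: a promise chain plays the role of refreshSemaphore.withPermits(1), and a setTimeout stands in for the slow probe.

```typescript
// Illustrative only: a single-permit lock makes the first getSnapshot
// caller wait out the forked initial probe.
type Snapshot = { status: "pending" | "ok" };

class ManagedProvider {
  private snapshot: Snapshot = { status: "pending" };
  private lock: Promise<void> = Promise.resolve();

  // Every refresh AND every read queues behind the same single permit.
  private withPermit<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.lock.then(fn);
    this.lock = run.then(() => undefined, () => undefined);
    return run;
  }

  constructor(probe: () => Promise<Snapshot>) {
    // "Forked" initial probe: runs in the background but holds the permit.
    void this.withPermit(async () => { this.snapshot = await probe(); });
  }

  // Registry-style read: cannot start until the in-flight probe releases.
  getSnapshot(): Promise<Snapshot> {
    return this.withPermit(async () => this.snapshot);
  }
}

async function demo(probeMs = 200): Promise<{ status: string; waitedMs: number }> {
  const probe = () =>
    new Promise<Snapshot>((r) => setTimeout(() => r({ status: "ok" }), probeMs));
  const provider = new ManagedProvider(probe);
  const t0 = Date.now();
  const snap = await provider.getSnapshot(); // blocks ~probeMs, like the registry
  return { status: snap.status, waitedMs: Date.now() - t0 };
}
```

A read-only getSnapshot that skipped the permit would instead return { status: "pending" } immediately, which is the decoupling the report's later suggestions aim at.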
Impact
Major degradation or frequent failure. Cold-start delay is the visible symptom, but the real problem is that any single provider being slow (for any reason: slow disk I/O spawning the process, OpenCode internal slowness, a wedged local server) holds up the whole backend. On this install the user-visible gap before the app becomes usable was 50-75s per launch.
Version or commit
Observed on a local build of main at f6978db6 (the package.json version string still reads 0.0.20, but that commit is post-v0.0.20 and corresponds to the v0.0.21-nightly.20260420.77 release).
This is a regression in the post-v0.0.20 nightly series:
- v0.0.20 stable (tag v0.0.20, commit b2cca674, 2026-04-16) did not have OpenCode provider support at all, so it was not affected.
- feat: add opencode provider support (#1758, commit ce94feee, 2026-04-17) introduced OpenCodeProviderLive using makeManagedServerProvider and runOpenCodeSdk without any timeout around the SDK calls. This commit first shipped in nightly v0.0.21-nightly.20260420.75 (commit 66c326b8).
- Refactor OpenCode lifecycle and structured output handling (#2218, commit 306ec4bb, 2026-04-19) reshaped the lifecycle but did not add a timeout to the probe; shipped in nightly v0.0.21-nightly.20260420.77.
Users who stayed on v0.0.20 stable will not see this; users who updated to any nightly from v0.0.21-nightly.20260420.75 onward and have opencode installed locally will.
Environment
Linux (niri, Wayland), Electron 40.6.0, AppImage. opencode v1.4.7 at /usr/bin/opencode. No external serverUrl configured (local server path). Default timeoutMs for server startup.
Workaround
Disable the OpenCode probe via ~/.t3/userdata/settings.json:
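The relevant fragment:

```json
{ "providers": { "opencode": { "enabled": false } } }
```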
This hits the early-return at apps/server/src/provider/Layers/OpenCodeProvider.ts:308 and skips the probe entirely. Verified end-to-end: cold start drops from 50-75s to ~1.6s on the same DB. The OpenCode provider then shows as disabled in the UI.
Possible directions
A few ideas, though I'll leave the decisions to the maintainers:
1. Bound checkOpenCodeProviderStatus with a timeout (e.g. a few seconds total, covering version + inventory). On timeout, fall through to the existing fallback(...) path the function already uses for errors, and surface the timeout in the probe status. This is the smallest fix and localizes the bound to the provider.
2. Add a timeout inside runOpenCodeSdk itself (or a timeoutOption-based wrapper around each SDK call) so client.provider.list() and client.app.agents() cannot hang indefinitely. Similar bound, different seam.
3. Make ProviderRegistryLive layer acquisition non-blocking on the first probe: have loadProviders consume the already-persisted initialSnapshot (the "pending" stub from makePendingOpenCodeProvider already exists for this) and let streamChanges publish the real snapshot when the forked probe finishes. That decouples every provider's slowness from Listening.
4. Drop the refreshSemaphore serialization between the forked initial probe and .getSnapshot, or have .getSnapshot return the current cached snapshot without waiting for an in-flight refresh. Today a caller asking "what's the current snapshot?" during startup is forced to wait for a refresh it didn't request.
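The first two directions share one shape: bound the call, then degrade instead of hanging. A plain-TypeScript sketch; withTimeout, checkStatus, listProviders, and the 3s default are hypothetical stand-ins (the real fix would presumably use Effect.timeoutOption around the runOpenCodeSdk calls):

```typescript
// Resolve to undefined if the promise does not settle within ms.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), ms);
  });
  return Promise.race([p, deadline]).finally(() => clearTimeout(timer));
}

type ProbeStatus =
  | { kind: "ok"; providers: string[] }
  | { kind: "timed-out" };

// listProviders stands in for client.provider.list(); on timeout we return
// a degraded status instead of holding backend readiness.
async function checkStatus(
  listProviders: () => Promise<string[]>,
  timeoutMs = 3000,
): Promise<ProbeStatus> {
  const providers = await withTimeout(listProviders(), timeoutMs);
  return providers === undefined
    ? { kind: "timed-out" }
    : { kind: "ok", providers };
}
```

Rejections still propagate to the caller, matching the existing fallback(...) error path the report describes; only the hang case changes.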
Related
[Bug]: ~2s cold start from decoding archived+deleted threads (linear) #2245 — linear-in-history DB decode at cold start (getSnapshot full-table decode). Separate, much smaller (~2s) contributor that was initially conflated with this one. Fixing that issue alone would leave the 45-75s probe blocker in place; fixing this issue alone would leave the ~2s DB cost.