
[Bug]: Regression: OpenCode probe blocks cold start 45-75s (no timeout) #2248

@mwolson

Description


Before submitting

  • I searched existing issues and did not find a duplicate.
  • I included enough detail to reproduce or investigate the problem.

Area

apps/server

Steps to reproduce

  1. Install the opencode CLI (/usr/bin/opencode, v1.4.7 on this box). Do not configure an external serverUrl, so T3 Code will spawn its own local OpenCode server for the probe.
  2. Leave the default setting providers.opencode.enabled: true.
  3. Cold-start T3 Code and measure the gap between Running all migrations... / Migrations ran successfully and Listening on http://... in ~/.t3/userdata/logs/server-child.log.

Expected behavior

Backend HTTP readiness (Listening on ...) does not block on any one provider's probe. A slow or hung provider should degrade that provider's status display, not delay every other layer that transitively depends on ProviderRegistry.

Actual behavior

On this install, cold starts routinely block 45-75s waiting for the OpenCode probe. The gap before Listening on ... is dominated by checkOpenCodeProviderStatus, and it is unbounded because runOpenCodeSdk has no timeout.

Measured from the apps/server/src/telemetry trace output (server.trace.ndjson), across 56 checkOpenCodeProviderStatus spans during normal use:

    metric  duration
    min     3.6s
    p50     47s
    p90     55s
    max     73s
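For reference, the summary above is just order statistics over the span durations. A minimal sketch of that computation (the ndjson parsing and span field names are omitted here, since the trace schema isn't part of this report; `summarize` and `percentile` are illustrative helpers, not codebase functions):

```typescript
// Nearest-rank percentile over a sorted array of durations (ms).
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(sortedMs.length - 1, Math.floor((p / 100) * sortedMs.length));
  return sortedMs[idx];
}

// Reduce a list of span durations to the min/p50/p90/max shape shown above.
function summarize(durationsMs: number[]) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  return {
    min: sorted[0],
    p50: percentile(sorted, 50),
    p90: percentile(sorted, 90),
    max: sorted[sorted.length - 1],
  };
}
```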

Independently verified by toggling the setting. With providers.opencode.enabled: false, cold start drops by 45-70s on the same install:

    Scenario  DB size  opencode.enabled  Migrations -> Listening
    Small DB  30 MB    true              ~50s+ (varies)
    Small DB  30 MB    false             1.58s
    Large DB  584 MB   false             3.51s

The DB-size axis is a separate, much smaller (~2s) issue tracked in #2245. The remaining tens of seconds are entirely the OpenCode probe.

Root cause

Three mechanisms compose to make one provider's slow probe a global cold-start blocker:

  1. makeManagedServerProvider (apps/server/src/provider/makeManagedServerProvider.ts:140) forks the initial snapshot with Effect.forkScoped, but that fiber is serialized against every subsequent .getSnapshot call via refreshSemaphore.withPermits(1) (makeManagedServerProvider.ts:121). So getSnapshot is effectively synchronous with the initial probe:

    const applySnapshot = (nextSettings, options?) =>
      refreshSemaphore.withPermits(1)(applySnapshotBase(nextSettings, options));
    // ...
    yield* applySnapshot(initialSettings, { forceRefresh: true }).pipe(
      Effect.ignoreCause({ log: true }),
      Effect.forkScoped,
    );
    // ...
    return {
      getSnapshot: input.getSettings.pipe(Effect.flatMap(applySnapshot), ...),
      ...
    };
  2. ProviderRegistryLive calls .getSnapshot on every provider during layer acquisition (apps/server/src/provider/Layers/ProviderRegistry.ts:266), so the Layer body waits on the forked OpenCode probe before it can return:

    yield* loadProviders(providerSources).pipe(
      Effect.flatMap((providers) => upsertProviders(providers, { publish: false })),
    );
  3. runOpenCodeSdk (apps/server/src/provider/opencodeRuntime.ts:78) wraps the OpenCode SDK calls (client.provider.list(), client.app.agents()) in Effect.tryPromise with no timeout. loadOpenCodeInventory runs them concurrently with { concurrency: "unbounded" }, but either one hanging is enough to hold the probe:

    export const runOpenCodeSdk = <A>(operation, fn) =>
      Effect.tryPromise({ try: fn, catch: ... })
        .pipe(Effect.withSpan(`opencode.${operation}`));

startOpenCodeServerProcess does have a timeout (opencodeRuntime.ts:406, Deferred.await(readyDeferred).pipe(Effect.timeoutOption(timeoutMs))), but that only bounds server startup; once the local server reports ready, the SDK calls against it are unbounded.

Because ProviderRegistryLive is a transitive dependency of the HTTP router layer, the Listening on ... log cannot fire until the probe returns.
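The blocking in mechanism 1 can be reproduced outside Effect. A minimal plain-Promise sketch of a one-permit lock serializing a forked slow probe against a later snapshot read (the `Mutex` class, `demo`, and the 200ms delay are illustrative stand-ins, not codebase types):

```typescript
// One-permit lock, analogous to refreshSemaphore.withPermits(1):
// every task queues behind whatever is already holding the permit.
class Mutex {
  private tail: Promise<void> = Promise.resolve();
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain alive even if a task rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function demo(): Promise<number> {
  const refreshMutex = new Mutex();
  // "Forked" initial probe: fire-and-forget, but it holds the permit for 200ms.
  void refreshMutex.run(() => sleep(200));
  const t0 = Date.now();
  // A later getSnapshot-style call is queued behind the probe...
  await refreshMutex.run(async () => "snapshot");
  // ...so its latency is roughly the probe's full duration.
  return Date.now() - t0;
}
```

Swap the 200ms for a 45-75s probe and this is the cold-start gap: the fork does not help, because the very next getSnapshot waits behind the same permit.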

Impact

Major degradation or frequent failure. Cold-start delay is the visible symptom, but the real problem is that any single provider being slow (for any reason: slow disk I/O spawning the process, OpenCode internal slowness, a wedged local server) holds up the whole backend. On this install the user-visible gap before the app becomes usable was 50-75s per launch.

Version or commit

Observed on a local build of main at f6978db6 (the package.json version string still reads 0.0.20, but that commit is post-v0.0.20 and corresponds to the v0.0.21-nightly.20260420.77 release).

This is a regression in the post-v0.0.20 nightly series:

  • v0.0.20 stable (tag v0.0.20, commit b2cca674, 2026-04-16) did not have OpenCode provider support at all, so it was not affected.
  • feat: add opencode provider support (#1758, commit ce94feee, 2026-04-17) introduced OpenCodeProviderLive using makeManagedServerProvider and runOpenCodeSdk without any timeout around the SDK calls. This commit first shipped in nightly v0.0.21-nightly.20260420.75 (commit 66c326b8).
  • Refactor OpenCode lifecycle and structured output handling (#2218, commit 306ec4bb, 2026-04-19) reshaped the lifecycle but did not add a timeout to the probe; shipped in nightly v0.0.21-nightly.20260420.77.

Users who stayed on v0.0.20 stable will not see this; users who updated to any nightly from v0.0.21-nightly.20260420.75 onward and have opencode installed locally will.

Environment

Linux (niri, Wayland), Electron 40.6.0, AppImage. opencode v1.4.7 at /usr/bin/opencode. No external serverUrl configured (local server path). Default timeoutMs for server startup.

Workaround

Disable the OpenCode probe via ~/.t3/userdata/settings.json:

{
  "providers": {
    "opencode": {
      "enabled": false
    }
  }
}

This hits the early-return at apps/server/src/provider/Layers/OpenCodeProvider.ts:308 and skips the probe entirely. Verified end-to-end: cold start drops from 50-75s to ~1.6s on the same DB. The OpenCode provider then shows as disabled in the UI.

Possible directions

A few ideas, though I'll likely leave the decision to the maintainers:

  1. Bound checkOpenCodeProviderStatus with a timeout (e.g. a few seconds total, covering version + inventory). On timeout, fall through to the existing fallback(...) path the function already uses for errors, and surface the timeout in the probe status. This is the smallest fix and localizes the bound to the provider.
  2. Add a timeout inside runOpenCodeSdk itself (or a timeoutOption-based wrapper around each SDK call) so client.provider.list() and client.app.agents() cannot hang indefinitely. Similar bound, different seam.
  3. Make ProviderRegistryLive layer acquisition non-blocking on the first probe: have loadProviders consume the already-persisted initialSnapshot (the "pending" stub from makePendingOpenCodeProvider already exists for this) and let streamChanges publish the real snapshot when the forked probe finishes. That decouples every provider's slowness from Listening.
  4. Drop the refreshSemaphore serialization between the forked initial probe and .getSnapshot, or have .getSnapshot return the current cached snapshot without waiting for an in-flight refresh. Today a caller asking "what's the current snapshot?" during startup is forced to wait for a refresh it didn't request.
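To illustrate direction 2, here is a hedged plain-Promise sketch of bounding an individual SDK call; `withTimeout`, `SdkTimeoutError`, and the 5s default are hypothetical names I made up for this issue, not the codebase's API. In Effect terms this would roughly correspond to piping Effect.timeout (or Effect.timeoutFail, to surface a typed error) onto runOpenCodeSdk:

```typescript
// Hypothetical default bound covering a single SDK call.
const SDK_CALL_TIMEOUT_MS = 5_000;

class SdkTimeoutError extends Error {
  constructor(readonly operation: string, timeoutMs: number) {
    super(`opencode.${operation} timed out after ${timeoutMs}ms`);
  }
}

// Race the SDK call against a timer so a wedged local server rejects
// instead of hanging the probe (and, transitively, layer acquisition).
function withTimeout<A>(
  operation: string,
  fn: () => Promise<A>,
  timeoutMs: number = SDK_CALL_TIMEOUT_MS,
): Promise<A> {
  return new Promise<A>((resolve, reject) => {
    const timer = setTimeout(() => reject(new SdkTimeoutError(operation, timeoutMs)), timeoutMs);
    fn().then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}
```

On rejection the caller would take the same fallback(...) path checkOpenCodeProviderStatus already uses for SDK errors, so a timeout degrades the provider's status instead of cold start.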
