Skip to content

[Bug]: ~2s cold start from decoding archived+deleted threads (linear) #2245

@mwolson

Description

@mwolson

Before submitting

  • I searched existing issues and did not find a duplicate.
  • I included enough detail to reproduce or investigate the problem.

Area

apps/server

Steps to reproduce

  1. Use T3 Code long enough that the projection tables accumulate hundreds of threads and 10^5+ activities. Archiving and deleting threads in the UI does not shrink them (see Root cause).
  2. Close the app and cold-start.
  3. Measure the gap between Running all migrations... / Migrations ran successfully and Listening on http://... in ~/.t3/userdata/logs/server-child.log.

Expected behavior

Cold-start cost scales with the data the user can actually see in the UI (active, non-archived, non-deleted threads), not with total all-time history.

Actual behavior

Cost scales linearly with total projection row count, including threads the user has archived and threads the user has deleted from the UI.

Measured on one install (apps/server / current main at f6978db6), with an unrelated blocking provider-probe isolated out:

Scenario DB size Active threads (UI-visible) Threads in DB Activities in DB Migrations -> Listening
Original (archived + soft-deleted still present) 584 MB 4 324 149,220 ~3.5s
After pruning archived AND soft-deleted threads and their rows 30 MB 4 4 5,493 ~1.6s

Per-row decode cost is modest (roughly 13us per activity on this box), but there is no upper bound: cost grows monotonically with history and with every thread the user "deletes" from the UI.

Two amplifiers make this worse than it needs to be:

  • projection_threads.deleted_at IS NOT NULL rows (soft-deleted from the UI) are still fully decoded at every cold start. On this install, 163 of 167 remaining threads after the archive prune were soft-deleted, and those 163 accounted for 92% of the remaining activity rows (62,233 of 67,726) and 98% of the remaining messages (9,210 of 9,444). The UI only surfaced the other 4.
  • projection_threads.archived_at IS NOT NULL rows are also fully decoded at startup, even though the archive view is gated behind a user action.

Root cause

makeOrchestrationEngine calls projectionSnapshotQuery.getSnapshot() during layer acquisition (apps/server/src/orchestration/Layers/OrchestrationEngine.ts:273). getSnapshot (apps/server/src/orchestration/Layers/ProjectionSnapshotQuery.ts:667) opens a single transaction and issues SELECT * FROM ... with ORDER BY against every projection table, then Effect-Schema-decodes every row into the in-memory read model:

  • projection_thread_activities
  • projection_thread_messages
  • projection_thread_proposed_plans
  • projection_thread_sessions
  • projection_turns
  • plus projects, pending approvals, projection state.

OrchestrationEngineLive is a transitive dependency of the routes layer, so the Listening on ... log cannot fire until this decode returns. That makes the linear-in-history cost directly observable as a cold-start delay.

There is no filter for archived or soft-deleted threads, so both categories get decoded every cold start despite being invisible in the UI.

Impact

Minor bug or occasional failure. The isolated cost on a 584 MB projection DB is only a couple of seconds here, but it is unbounded with history, it includes data the user thinks is deleted, and it compounds with any other blocking startup work.

Version or commit

0.0.20 (local build of main at f6978db6). The code path is in main.

Environment

Linux (niri, Wayland), Electron 40.6.0, AppImage. Node-variant sqlite client via @effect/sql-sqlite-*. DB (pre-prune): 584MB, 204,517 orchestration_events, 149,220 projection_thread_activities.

Workaround

There is no runtime workaround that lets startup scan less data without modifying persisted data. For manually pruning archived and soft-deleted threads and their traces from the SQLite file, the script below worked on one install. Quit T3 Code before running; it takes a .backup next to the live DB first, refuses to run if the DB is held open, and reports row counts before/after plus the size delta.

#!/bin/bash
set -euo pipefail

DB_DEFAULT="$HOME/.t3/userdata/state.sqlite"
DB="${1:-$DB_DEFAULT}"

for cmd in sqlite3 numfmt fuser; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "Missing dependency: $cmd" >&2
        exit 1
    fi
done

if [[ ! -f "$DB" ]]; then
    echo "DB not found: $DB" >&2
    exit 1
fi

if fuser "$DB" >/dev/null 2>&1; then
    echo "Refusing: $DB is open by another process. Quit t3code first." >&2
    exit 1
fi

if ! sqlite3 "$DB" "BEGIN IMMEDIATE; ROLLBACK;" >/dev/null 2>&1; then
    echo "DB is locked. Something else is holding $DB open." >&2
    exit 1
fi

TS="$(date +%Y%m%d-%H%M%S)"
BACKUP="${DB}.prearchiveprune-${TS}"
if [[ -e "$BACKUP" ]]; then
    echo "Refusing to overwrite existing backup: $BACKUP" >&2
    exit 1
fi

echo "Backing up: $BACKUP"
sqlite3 "$DB" ".backup '$BACKUP'"

ARCHIVED_COUNT="$(sqlite3 "$DB" "SELECT COUNT(*) FROM projection_threads WHERE archived_at IS NOT NULL;")"
DELETED_COUNT="$(sqlite3 "$DB" "SELECT COUNT(*) FROM projection_threads WHERE deleted_at IS NOT NULL;")"
echo "Archived threads to prune:     $ARCHIVED_COUNT"
echo "Soft-deleted threads to prune: $DELETED_COUNT"
TOTAL=$((ARCHIVED_COUNT + DELETED_COUNT))
if [[ "$TOTAL" == "0" ]]; then
    echo "Nothing to do."
    exit 0
fi

SIZE_BEFORE="$(stat -c%s "$DB")"

sqlite3 "$DB" <<'SQL'
PRAGMA foreign_keys = OFF;
BEGIN IMMEDIATE;

CREATE TEMP TABLE archived_ids AS
    SELECT thread_id FROM projection_threads
    WHERE (archived_at IS NOT NULL OR deleted_at IS NOT NULL);

DELETE FROM projection_thread_activities
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM projection_thread_messages
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM projection_thread_proposed_plans
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM projection_thread_sessions
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM projection_turns
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM projection_pending_approvals
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM checkpoint_diff_blobs
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM provider_session_runtime
    WHERE thread_id IN (SELECT thread_id FROM archived_ids);

DELETE FROM orchestration_events
    WHERE aggregate_kind = 'thread'
      AND stream_id IN (SELECT thread_id FROM archived_ids);
DELETE FROM orchestration_command_receipts
    WHERE aggregate_kind = 'thread'
      AND aggregate_id IN (SELECT thread_id FROM archived_ids);

DELETE FROM projection_threads
    WHERE (archived_at IS NOT NULL OR deleted_at IS NOT NULL);

COMMIT;
SQL

echo "Checkpointing WAL and vacuuming..."
sqlite3 "$DB" "PRAGMA wal_checkpoint(TRUNCATE); VACUUM; PRAGMA integrity_check;"

SIZE_AFTER="$(stat -c%s "$DB")"
echo "DB size before: $(numfmt --to=iec --suffix=B "$SIZE_BEFORE")"
echo "DB size after:  $(numfmt --to=iec --suffix=B "$SIZE_AFTER")"
echo "Backup kept at: $BACKUP"

Possible directions

A few ideas, though this is best left to the maintainers since there is nuance to tuning the read model in a way that matches T3 Code's performance goals.

  1. Physically prune (or at least stop decoding at cold start) rows for threads with deleted_at IS NOT NULL. Soft-deletes do not need to be part of the startup read model; the user has already decided those threads are gone.
  2. Skip archived threads at startup and load them lazily when the archive view is opened. The boundary is already persisted on projection_threads.archived_at (migration 017).
  3. Cap the per-thread window loaded at startup (latest N activities/messages per thread) so cost becomes O(threads), not O(history).
  4. Persist the read-model snapshot and incrementally apply events since, so cold starts do not re-decode the full history.
  5. Independently, decouple desktop window creation from the HTTP-ready signal so any slow startup step does not keep the window from existing.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions