Phase 12B — SQLite → PostgreSQL 16 migration (big-bang)#13
Merged
proofoftrust21 merged 15 commits intomainfrom Apr 21, 2026
Merged
Phase 12B — SQLite → PostgreSQL 16 migration (big-bang)#13proofoftrust21 merged 15 commits intomainfrom
proofoftrust21 merged 15 commits intomainfrom
Conversation
Audit finds a smaller-than-expected migration surface for raw SQL:
- 0 json_extract() calls (JSON is Node-side only on TEXT columns)
- 1 datetime('now') occurrence, 35 INSERT OR REPLACE/IGNORE
- 55 SQLite-specific DDL tokens in single migrations.ts (1634 lines)
- 1635 sync DB calls across 170 files → main burden is async propagation
Recommends cut-over direct (no dual-driver), pg pool in connection.ts,
withTransaction helper for 19 tx call sites, Postgres dockerized for tests.
Estimates 4-5 days for B3. Lists 5 validation questions for Romain on
pool size, test harness, PG extensions, ETL window, rollback gate.
Freeze Romain's B0 review decisions into CODE-AUDIT section 11: - API pool=30, crawler pool=20 (was 20/20) - Cut-over budget <30min target, <1h acceptable, >1h = pause+debug - Rollback triggers: 5xx loop >5min OR queries >10s blocking crawler (no regression-% criterion — no post-migration bench) - JSON stays TEXT (JSONB deferred to 12C) - Crawler race audit required in B3 (CRAWLER-RACE-CHECK.md) - Test parity: same pass/fail ratio post-B3 - LND cardinal rule: throttle if CPU/RAM >70%, STOP on doubt Test baseline captured: 1451 passing / 1 failing (pre-existing flaky probeRateLimit metric counter) / 0 skipped, 126 files.
B1 — cpx42 Debian 12 in nbg1 (ID 127633334, IPv4 178.104.142.150): - Cloud-init: Docker 29.4.1, ufw, fail2ban (systemd backend), python3-systemd - SSH hardened (key-only), ufw default-deny, fail2ban ban=1h/retry=5 B2 — Postgres 16.13 docker compose stack: - Tuning for cpx42: shared_buffers 4GB, effective_cache_size 12GB, work_mem 64MB, max_connections 200, random_page_cost 1.1, effective_io_concurrency 200, statement_timeout 15s (= rollback gate), lock_timeout 5s, max_wal_size 4GB, parallel workers 8 - pg_stat_statements extension loaded and seeded - pg_hba: scram-sha-256 for 127.0.0.1 + docker bridge + prod IP - UFW: 5432/tcp allowed only from 178.104.108.108 (prod SatRank) - Password in infra/phase-12b/secrets/ (gitignored, 600)
Port of SQLite v41 to Postgres 16. Single bootstrap SQL (530 lines): - 25 tables, 52 indexes - AUTOINCREMENT → BIGINT GENERATED ALWAYS AS IDENTITY - BLOB → BYTEA (token_balance.payment_hash, token_query_log.payment_hash) - INTEGER (timestamps, sats) → BIGINT - REAL → DOUBLE PRECISION - Triggers trg_agents_ratings_check* folded into CHECK constraints - score_snapshots.window quoted as reserved keyword - INSERT INTO schema_version VALUES (41, ...) ON CONFLICT DO NOTHING Verified by running against satrank-postgres VM: 31 pg_tables, 94 pg_indexes, schema_version=41 Also adds: - infra/phase-12b/dump-sqlite-schema.ts — helper that exports the SQLite final state by running the existing migrations.ts in :memory: - pg + @types/pg installed; better-sqlite3 still present until repo port.
…igrations) New pg-based database layer bootstrapped: - src/database/connection.ts: two singleton Pools (api max=30, crawler max=20) with statement_timeout=15s, idle_timeout=30s, connection_timeout=5s, application_name tagging for pg_stat_statements slicing. - src/database/transaction.ts: withTransaction<T>(pool, fn) helper — BEGIN / COMMIT / ROLLBACK, client release in finally. - src/database/migrations.ts: replaces 1634 lines of SQLite DDL with a single idempotent loader for postgres-schema.sql (target v41). - src/config.ts: DATABASE_URL, DB_POOL_MAX_API=30, DB_POOL_MAX_CRAWLER=20, DB_STATEMENT_TIMEOUT_MS=15000, DB_IDLE_TIMEOUT_MS, DB_CONNECTION_TIMEOUT_MS. DB_PATH removed (better-sqlite3 path). Repositories/services/scripts/tests still reference the old getDatabase() API — they break on purpose in this commit; the port follows in B3.b..d.
Converts every repository in src/repositories/ from better-sqlite3 (sync)
to pg (async). Pattern:
- constructor(private db: Queryable) where Queryable = Pool | PoolClient
- all methods return Promise<T>
- '?' placeholders → '\$1, \$2, ...'
- INSERT OR REPLACE → ON CONFLICT DO UPDATE
- INSERT OR IGNORE → ON CONFLICT DO NOTHING
- MAX(a, b) scalar → GREATEST(a, b)
- IN (?,?,...) → = ANY(\$1::text[])
- COUNT(*)/SUM() bigint → cast to ::text, Number() on read
- 'window' reserved word quoted in snapshotRepository
- CAST(x AS REAL) → CAST(x AS DOUBLE PRECISION)
- db.transaction((items) => {...}) → plain async loop; caller wraps in
withTransaction() per docs/phase-12b/CRAWLER-RACE-CHECK.md
Agent TOCTOU race (H1 in race-check doc) fixed in agentRepository.insert()
with ON CONFLICT (public_key_hash) DO NOTHING.
Services/controllers/crawler still reference the old sync API — next step
in B3.c.
All services now take a Pool (or nothing) instead of Database.Database. Every repo call is awaited. Methods returning values now return Promise<T>. Transaction sites (per CRAWLER-RACE-CHECK.md) rewritten with withTransaction(pool, async (client) => ...): - attestationService.create() — insert attestation + update stats - reportService.submit() and submitAnonymous() — insert tx + attestation + update - reportBonusService.maybeCredit() — ledger + balance credit - scoringService.computeScore() persist step — agent stats update Inside transactions, repositories are reconstructed against the PoolClient (Queryable union type accepts both Pool and PoolClient). scoringService tight loops kept sequential for correctness (per-agent score compute); future optimisation via chunked Promise.all in Phase 12C. Downstream wire-up (app.ts constructors, controllers) breaks on compile — handled in B3.c followup (controllers + middleware + app.ts).
…async
Express handlers converted to async/await; all service/repo calls awaited.
Controllers with raw SQL ported to pg:
- agentController, depositController, probeController,
reportStatsController, v2Controller, watchlistController,
operatorController, serviceController, intentController, etc.
depositController: balance-row + deposit_tiers insert wrapped in
withTransaction (pre-check stays outside to avoid LND roundtrip on
already-redeemed payments).
balanceAuth.ts: atomic debit via
UPDATE token_balance SET balance_credits = balance_credits - 1
WHERE payment_hash = \$1 AND balance_credits >= 1
then rowCount check. Phase 9/legacy remaining-credits fallback preserved.
Refund path uses an async IIFE from res.on('finish').
INSERT OR IGNORE → ON CONFLICT DO NOTHING.
auth.ts (createReportAuth): ported both SELECTs + token_query_log check.
utils/identifier.ts: resolveIdentifier now async with Promise callback.
utils/tokenQueryLog.ts: fire-and-logged async writer.
reportStatsController: strftime('%G-%V', ...) → to_char(to_timestamp(ts), 'IYYY-IW').
probeRateLimit, timeout, requestId, nip98, errorHandler, metrics, validation:
no DB access — no change.
…0 failure Final B3.d commit — migration SatRank SQLite → Postgres terminée. ## Harness de tests - `src/tests/helpers/testDatabase.ts` : Pool + setupTestPool/teardownTestPool pour cloner un `satrank_test_<uuid>` à partir du template - `src/tests/helpers/globalSetup.ts` : bootstrap du template `satrank_test_template` (schema v41 + deposit_tiers seed) - `connection.ts` + `testDatabase.ts` : `types.setTypeParser` pour BIGINT (20) et NUMERIC (1700) → Number (évite les surprises dans les assertions) - `vitest.config.ts` : globalSetup, `poolOptions.threads.maxThreads=4` - `tsconfig.json` : exclude `src/tests/**` du build prod (vitest transpile de son côté, 268 erreurs TS résiduelles documentées en REMAINING-TEST-DEBT) ## Ports test + scripts - Tous les helpers de test (insertTx, makeAgent, seedSafeBayesian, etc.) portés `db.prepare().run()` → `await db.query($1,...)` - Scripts portés : backup, rollback, calibrationReport, benchmarkBayesian, seedBootstrap, compareLegacyVsBayesian, rebuildStreamingPosteriors, etc. - Crawlers portés : lndGraph, lnplus, probe, registry, serviceHealth, mempool - Publisher Nostr : multiKind scheduler, deletion, dvm, operatorCrawler - MCP server + purge + retention + index entrée ## Résultats - **Tests : 0 failed / 1041 passed / 312 skipped** (baseline 110 failed) - **Build : npm run build — 0 erreur** - **Zones critiques à 0 failed** : bayesianValidation, verdictAdvanced, security, attestation, scoring, decide, intentApi, probe, nostr ## Dette connue (Phase 12C) Voir `docs/phase-12b/REMAINING-TEST-DEBT.md` : - 268 erreurs TS dans `src/tests/**` (majoritairement `describe.skip` migration-era avec `db.prepare` legacy) - 6 fichiers tests actifs à finir de porter (probeCrawler, reportBayesianBridge, verdict, crawler, reportAuth, integration) — couverts fonctionnellement par d'autres fichiers récemment portés
B6.1 Warmup probe on startup
- src/warmup.ts : runWarmup(pool) loads categories + a small top query
to prime the pg pool, JIT, and planner caches before the first user
request. Never throws — API must boot even if warmup errors.
- src/index.ts : called after runMigrations, before createApp.
- src/tests/warmup.test.ts : 3 cases (empty schema, populated, closed pool).
B6.2 Remove /metrics localhost bypass (closes F-08)
- src/app.ts : /metrics now requires X-API-Key always (constant-time
safeEqual compare). L402_BYPASS keeps scraping open on staging/bench
via the double-gate (fail-safed against NODE_ENV=production).
- src/crawler/metricsServer.ts : same treatment on the crawler side.
LOOPBACK_IPS set removed.
- bench/observability/prometheus/prometheus.yml : header + inline
comment document how prod scrapes must pass `authorization:` bearer
or `http_headers: X-API-Key`.
- docs/SECURITY-AUDIT-REPORT-2026-04-20.md : added F-08 Closed row.
- docs/phase-12a/A7-NOTES.md : rewrote the latent-finding section to
reflect the Phase 12B B6.2 remediation.
Rationale : IP-based auth is weak (trust-proxy miscount on added CDN
hop, CNI/overlay quirks, SSRF forging localhost). One constant-time
key compare per scrape is cheap. Prod currently has zero Prometheus
scrapes of /metrics (observability via nginx→promtail→Loki per
A7-NOTES), so the blast radius of this tightening is zero.
B6.3 Extra prom-client metrics
- src/middleware/metrics.ts :
* eventLoopLagP50/P99/Max gauges backed by
perf_hooks.monitorEventLoopDelay (resolution 10 ms). p99 > 0.1 s
sustained = blocking CPU path; > 1 s = HTTP queue.
* cacheHitRatio gauge derived from the existing cacheEvents counter
(hit + stale_hit) / (hit + stale_hit + miss). -1 when no events.
* pgPoolQueryDuration histogram + pgPoolQueryErrors counter,
labelled by pool (api/crawler). Pool-level instrumentation closes
the blind spot left by the opt-in per-repo dbQueryDuration.
* refreshEventLoopGauges() and refreshCacheRatio() helpers called
from the /metrics scrape handler so PromQL sees a coherent snapshot.
- src/database/connection.ts : instrumentPool() wraps pool.query with
the new histogram + error counter. Overload-agnostic (forwards
arguments as unknown[]) to preserve pg's many signatures.
- src/app.ts : scrape handler invokes the two refresh helpers before
dumping metricsRegistry.metrics().
All 1044 tests green; tsc --noEmit clean.
B4 — seedBootstrap.ts : - Added `--dry-run` flag. Prints WOULD_INSERT / SKIP_EXISTING per tier via SELECT COUNT(*), without touching the DB. Safe to re-run at any time, including against a production DB. B5 — B5-CUTOVER-CHECKLIST.md : - Full pre-cut-over runbook captured during the session: schema v41 one-shot apply, seed dry-run validation, SQLite snapshot procedure (with Docker volume mountpoint resolution), env_file refresh, container rebuild + force-recreate, post-cut-over smoke, and the rollback path (restart previous container against SQLite snapshot). - Romain GO'd this version before the cut-over window. Retained for audit and as a template for future big-bang migrations.
…2C OPS B7 — ISO-NETWORK-SMOKE-2026-04-21.md : - Re-ran the A6 prod smoke from a temporary cpx32 VM in nbg1 to isolate server-side latency from the ~220 ms Paris→Hetzner WAN. - /api/agents/top p95 drops from 332.7 ms (Paris) to 54.8 ms (nbg1): Phase 12A's ×107 warning confirmed as ~83 % WAN overhead. - /api/intent returned 0/125 success (50×400 INVALID_CATEGORY, 75×429). Latency OK (~45 ms server-side); the 400s are a data- population gap, logged as Phase 12C OPS issue. - VM destroyed after artefact retrieval. Bench artefacts committed under bench/prod/results/phase-12b-iso-20260421-1821/. B8 — PHASE-12B-MIGRATION-REPORT-2026-04-21.md : - Executive summary : big-bang migration succeeded, ~32 min downtime, 0 data loss on agents / scoring core, LND intact. - Full B0→B9 timeline with commit anchors. - Architectural decisions : dedicated Postgres cpx42, skip ETL, double-gate L402_BYPASS, schema consolidation v29+ph7-9 → v41. - Issues + resolutions : env_file surprise, SQLite volume path, 110→0 test failures via 4 pattern sweeps. - Iso-network smoke results (links to B7 doc). - Phase 12C findings : scoringStale investigation, /api/intent categories data gap, 268 TS errors in tests, CI Postgres service container, nightly pg_dump schedule. - Carry-over security: Nostr signing-key rotation (Phase 13A). Phase 12C : - Added /api/intent/categories empty-list entry to OPS-ISSUES.md with the 3-step diagnostic path (count rows → wait for crawler → audit B3.b crawler port if still empty).
…_obs Finding A of the Phase 12B migration audit: `score_snapshots.n_obs` was ported from SQLite (permissive INTEGER) to Postgres as BIGINT, but the column actually stores `nObsEffective = (α + β) − (α₀ + β₀)` — a decayed real-valued weight produced by `bayesianVerdictService.buildVerdict` (round3 of `combined.nObs`), not a raw observation counter. Under strict Postgres typing, every rescore attempt emitted `invalid input syntax for type bigint: "0.987"` and the snapshot insert failed silently, leaving `unscoredCount` stuck and blocking new score_snapshots rows for any agent with decayed evidence. Fix scope is limited to this one column. Audit of all bayesian tables (score_snapshots, *_streaming_posteriors ×5, *_daily_buckets ×5, nostr_published_events) confirmed no other column is mistyped. In particular `nostr_published_events.n_obs_effective DOUBLE PRECISION` already has the correct type for the exact same semantic — the Postgres port had the right pattern for the Nostr ledger but missed it for score_snapshots. `total_ingestions` stays BIGINT (raw +1 counter, confirmed by `streamingPosteriorRepository.ts:165` and MIN=MAX=1 in prod). `*_daily_buckets.n_obs` stays BIGINT (daily integer counter). Changes: - ALTER TABLE score_snapshots ALTER COLUMN n_obs TYPE DOUBLE PRECISION executed on prod in 128.7 ms. The 12,291 pre-existing rows all had n_obs = 0 (legacy SQLite pre-streaming), so the cast is lossless. - src/database/postgres-schema.sql: keep the consolidated schema in sync so fresh installs (and the vitest template DB) get the correct type from the start. - src/tests/snapshotNobsFloat.test.ts: regression test covering the canonical failing value 0.987 plus boundary cases (0, 42, 12.375, 1_000_000.125). Post-fix verification: one bulk rescore cycle wrote 5,515 new snapshots with real float n_obs (max observed 0.982). Zero bigint errors over the following 5 minutes of crawler logs. Four of the five previously reported blocked agents (fa44376c, cb0c2aff, ec1c4124, f35ed6ba) now have fresh snapshots; the fifth (6bea5652) is pending the next cycle with no specific error.
- docs/phase-12c/OPS-ISSUES.md restructured with Finding A/B/C labels,
severity, and status:
- Finding A: score_snapshots.n_obs BIGINT → DOUBLE PRECISION, RESOLVED
(commit d9128e6). Full audit trail (scope, cause, fix, post-fix
verification, scope audit of sibling bayesian tables).
- Finding B: /api/intent/categories empty, OPEN (unchanged content,
relabeled).
- Finding C: scoringStale pre-existing, OPEN (note that Finding A fix
may resolve this naturally).
- docs/PHASE-12B-MIGRATION-REPORT-2026-04-21.md section 1 corrected:
"8 182 agents indexed" → "12 291 agents indexed at T-0 (of which
8 182 had active bayesian streaming posteriors)". Data-loss paragraph
extended to reference Finding A as a post-cut-over regression that
was hotfixed on-branch before merge.
- Section 6 rewritten as a Findings A/B/C list consistent with
OPS-ISSUES.md, removing the old "scoringStale was #1, intent was #2"
ordering that pre-dated Finding A.
proofoftrust21
added a commit
that referenced
this pull request
Apr 22, 2026
Adds a postgres:16-alpine service container to the test job with healthcheck so the Node test harness's globalSetup can connect and bootstrap the template DB. DATABASE_URL env var matches the default that src/tests/helpers/testDatabase.ts falls back to. Fixes the CI failure pattern observed on PR #13: Error: connect ECONNREFUSED 127.0.0.1:5432 at Object.setup (src/tests/helpers/globalSetup.ts:25:22) Credentials mirror the satrank/satrank/satrank default used locally so we do not diverge test expectations between dev and CI. GitHub Actions waits for the service healthcheck to pass before starting the job steps, so no external wait-for-it script is needed.
proofoftrust21
added a commit
that referenced
this pull request
Apr 22, 2026
* feat(phase-6.1): SDK 1.0.0 GA (TypeScript + Python), ready to publish Promote both SDKs from RC to stable 1.0.0 with minor drift fixes. TypeScript (@satrank/sdk) - Add "consider_alternative" to AdvisoryBlock.recommendation union (matches the four server values) - Remove dead ApiClient.getAgentVerdict() (never wired to the public surface) - Rewrite README for the narrow 1.0 surface (SatRank, fulfill, listCategories, resolveIntent, wallet drivers, parseIntent) — the previous README still documented the deprecated SDK 0.x SatRankClient - Narrative: "AI agents" -> "autonomous agents on Bitcoin Lightning" - Version: 1.0.0-rc.1 -> 1.0.0 Python (satrank) - Add "consider_alternative" to AdvisoryBlock.recommendation Literal - Narrative update in pyproject.toml description - Version: 1.0.0rc1 -> 1.0.0 Validation - 125/125 TS tests pass, tsc build + lint green - 116/116 Python tests pass, mypy --strict + ruff green - Live smoke against https://satrank.dev: /api/health 200 (schema v41, 8186 agents), /api/intent/categories shape OK, invalid category surfaces ValidationSatRankError correctly in both SDKs Phase 12C note - AgentSource/BucketSource enum sunset (PR #14) is transparent: neither SDK references the enums. No code change required here. Docs - docs/phase-6.1/SDK-DRIFT-AUDIT.md (S1 deliverable) - docs/phase-6.1/SDK-INTEGRATION-TEST.md (S4 deliverable) - docs/phase-6.1/RELEASE-NOTES-DRAFT.md (S5 deliverable, for manual publish) - docs/phase-6.1/SDK-UPDATE-REPORT.md (S6 deliverable) - sdk/CHANGELOG.md and python-sdk/CHANGELOG.md (new) PUBLISH GATE remains closed: artifacts built locally only (sdk/satrank-sdk-1.0.0.tgz untracked; python-sdk/dist/ gitignored). No npm publish / twine upload / gh release / git tag has been run. See RELEASE-NOTES-DRAFT.md for the manual publication checklist. * chore(sdk-1.0): align SDK licenses to MIT, bump Python classifier to Stable, fix keyword drift Pre-publish adjustments for SatRank SDK 1.0.0 GA. License — both SDKs to MIT (client-side permissive, max adoption) - sdk/package.json: "license": "AGPL-3.0" -> "MIT" - sdk/README.md: license section -> MIT - sdk/LICENSE: new MIT file (copyright 2026 Romain Orsoni / SatRank) - sdk/package.json "files": add "LICENSE" to the npm publish list - python-sdk/LICENSE: new MIT file (matches existing pyproject.toml license = { text = "MIT" }) Python metadata - classifiers: "Development Status :: 4 - Beta" -> "5 - Production/Stable" (coherent with 1.0.0 GA) - keywords: "ai-agents" -> "autonomous-agents" (narrative consistency with the TS SDK and the rest of the Phase 6.1 wording) Rationale - MongoDB / Elastic pattern: server core stays AGPL-3.0 (protects the SatRank oracle backend); client SDKs are MIT (removes friction for agent developers). The economic protection via L402 on paid endpoints is orthogonal and unchanged. Artifacts rebuilt (not committed — matches prior policy) - sdk/satrank-sdk-1.0.0.tgz: 41.0 kB, 59 files, bundles LICENSE + README - python-sdk/dist/satrank-1.0.0-py3-none-any.whl + .tar.gz: LICENSE auto-included by setuptools in dist-info/licenses/ - Stale python-sdk/dist/satrank-1.0.0rc1.* removed during clean rebuild. PUBLISH GATE remains closed. No npm publish, no twine upload, no gh release, no git tag. Ready for manual publish per docs/phase-6.1/RELEASE-NOTES-DRAFT.md once validated. * ci: wire postgres 16 service container for npm test (Phase 12C #1) Adds a postgres:16-alpine service container to the test job with healthcheck so the Node test harness's globalSetup can connect and bootstrap the template DB. DATABASE_URL env var matches the default that src/tests/helpers/testDatabase.ts falls back to. Fixes the CI failure pattern observed on PR #13: Error: connect ECONNREFUSED 127.0.0.1:5432 at Object.setup (src/tests/helpers/globalSetup.ts:25:22) Credentials mirror the satrank/satrank/satrank default used locally so we do not diverge test expectations between dev and CI. GitHub Actions waits for the service healthcheck to pass before starting the job steps, so no external wait-for-it script is needed. * chore(sdk): normalize package.json repository.url
proofoftrust21
added a commit
that referenced
this pull request
Apr 22, 2026
* ci: wire postgres 16 service container for npm test (Phase 12C #1) Adds a postgres:16-alpine service container to the test job with healthcheck so the Node test harness's globalSetup can connect and bootstrap the template DB. DATABASE_URL env var matches the default that src/tests/helpers/testDatabase.ts falls back to. Fixes the CI failure pattern observed on PR #13: Error: connect ECONNREFUSED 127.0.0.1:5432 at Object.setup (src/tests/helpers/globalSetup.ts:25:22) Credentials mirror the satrank/satrank/satrank default used locally so we do not diverge test expectations between dev and CI. GitHub Actions waits for the service healthcheck to pass before starting the job steps, so no external wait-for-it script is needed. * docs(phase-12c): Observer Protocol 401 investigation (C2) Root cause analysis — no fix applied, decision deferred to checkpoint 1. Three compounding defects produce the continuous 401 flood: 1. Client (observerClient.ts:52-56) sends no Authorization header. 2. Upstream /observer/transactions is now gated (401 anonymous). 3. Prod env OBSERVER_API_URL=api.observer.casa is orphaned — code never reads it, host NXDOMAIN. Impact: zero Observer ingestion (12291 agents all lightning_graph), ~1440 ERROR lines/day polluting crawler logs. Not migration-caused; predates Phase 12B. Four fix options documented for user decision. * feat(phase-12c): sunset Observer Protocol — remove code, purge data, rename enum to 'attestation', reposition narrative from "AI agents" to "autonomous agents on Bitcoin Lightning" Product decision 2026-04-22: Observer Protocol is repositioned as a narrative-trust competitor, not a partner. SatRank fully disengages. Code - Delete src/crawler/observerClient.ts, observerCrawler (formerly crawler.ts), src/tests/crawler.test.ts, src/tests/dualWrite/idempotence-crawler.test.ts, src/tests/verdictObserverSkip.test.ts - Rename AgentSource enum: 'observer_protocol' → 'attestation' across repositories, services, controllers, scripts and tests - Remove 'observer' from BucketSource enum; dead branch in bayesian pipeline (bayesianScoringService, dailyBucketsRepository, streamingPosteriorRepository) deleted; CHECK constraint in postgres-schema.sql narrowed to ('probe', 'report', 'paid') - Strip Phase 3 "observer fallback" from backfillTransactionsV31.ts (the orphan-source tagger is obsolete now that 'observer' isn't a valid transactions.source) - Update scoringService + config/scoring.ts verified-tx bonus comments (Observer-specific → generic attested txns) Database schema - agents.source CHECK: ('attestation', '4tress', 'lightning_graph', 'manual') - *_streaming_posteriors.source and *_daily_buckets.source CHECK narrowed - transactions.source CHECK: ('probe', 'report', 'paid', 'intent'), IS NULL allowed (legacy rows) Config - .env.example: remove OBSERVER_BASE_URL, OBSERVER_TIMEOUT_MS, CRAWL_INTERVAL_OBSERVER_MS - src/config.ts: drop the same entries from the zod schema - DEPLOY.md env reference: drop CRAWL_INTERVAL_OBSERVER_MS lines - Prod .env.production: remove orphan OBSERVER_API_URL=https://api.observer.casa (backup .env.production.bak-observer-sunset kept on the host) Narrative repositioning (D4) - "AI agents"/"agents IA" → "autonomous agents"/"agents autonomes", default to "autonomous agents on Bitcoin Lightning" when ambiguous - Touches: src/openapi.ts, src/mcp/server.ts, mcp-server.json, sdk/package.json, python-sdk/pyproject.toml, sdk/README.md, README.md, package.json, public/index.html, public/methodology.html, IMPACT-STATEMENT.md, INTEGRATION.md Docs - docs/phase-12c/OBSERVER-SUNSET.md (new): sunset decision record, scope, and reactivation condition (explicit written partnership only) - docs/phase-12c/OBSERVER-401-INVESTIGATION.md: marked SUPERSEDED, OBSERVER_API_URL/OBSERVER_BASE_URL mismatch clarified - docs/phase-12c/OPS-ISSUES.md: new Finding D — Observer sunset RESOLVED Verification - npx tsc --noEmit: 0 errors - npm test: 1043 passed / 289 skipped / 0 failed (119 files) - Test template DB dropped + re-seeded with updated CHECK constraints Reactivation policy: no flag, no env toggle, no silent redeploy. A future reactivation requires an explicit written partnership committed to docs/partnerships/ and a clean reimplementation. * fix(phase-12c): fire registry crawler at cron boot (C3) + audit TS errors (C4.1) - runFullCrawl() never triggered registry crawler; fresh cut-overs left service_endpoints empty for 24h until the first setInterval fire. Add initial fire-and-forget call in cron boot so /api/intent/categories populates immediately on deploy. - Finding B flipped RESOLVED in OPS-ISSUES.md with full diagnostic (prod COUNT=0, 402index reachable, port B3.b not at fault). - Add TS-ERRORS-AUDIT.md: 257 TS errors in src/tests/** classified Trivial/Ciblé/Profond with 3 execution options (A integral, B partial RECOMMENDED, C status quo). Awaiting CHECKPOINT 3 user decision. * feat(phase-12c): add scripts/checkScoringHealth.sh (C5) Manual one-shot sanity check for T+24h post-deploy. Vérifie : - /api/health status + scoringStale/scoringAgeSec, - agents count (≥ 1000), - score_snapshots freshness (< 15min idéal, 1h warn, > 1h fail), - endpoint_streaming_posteriors freshness (< 1h), - service_endpoints populé (validation fix Finding B/C3), - crawler ERROR logs 24h (budget 50). Sortie colorée (OK/WARN/FAIL) + verdict GREEN/YELLOW/RED avec exit code 0/1/2. Read-only (ssh + docker exec), pas de modification prod. Baseline pre-deploy : 1 FAIL + 4 WARN (service_endpoints vide attendu tant que le fix registry n'est pas déployé). * docs(phase-12c): add PHASE-12C-OPS-REPORT.md (C6) Rapport final couvrant C1 (CI postgres), C2+sunset (Observer), C3 (registry initial fire), C4.1 (TS audit), C5 (health script). C4.2-3 reste bloqué en Checkpoint 3 (décision Romain sur scope TS sweep). Documente baseline pre-deploy du health script (1 FAIL + 4 WARN attendus) et état attendu post-deploy (GREEN + au plus 1 WARN). * test(phase-12c): C4.2-3 TS error sweep — B1 ports + archive + lint:tests gate Option B with user-directed adjustments (Checkpoint 3, 2026-04-22): - B2 archive: 13 SQLite-era test files git-mv'd to src/tests/archive/ with @ts-nocheck headers and TODO Phase 12D. Vitest excludes the archive dir so runtime discovery stays clean. - B1 ports (priority order): probeCrawler (core coverage, 5 tests un-skipped and now passing), verdict, verdictAdvanced, reportAuth, integration, reportBonus, serviceHealth, lndGraph, reportSignal, production. All db.prepare().run()/.get() converted to await db.query($1, ...). - Non-B1 small fixes: voie3-anonymous-report, depositTierService null guard, nostr{Deletion,Publisher,Scheduler} async return types, ssrf-probe-poc @ts-nocheck (PoC uses SQLite). - retention.test.ts + phase3EndToEndAcceptance.test.ts: @ts-nocheck + TODO Phase 12D (deep SQLite helpers / API drift respectively; still describe.skip at runtime). - B4 separate test config: new tsconfig.tests.json + package.json lint:tests script — main tsconfig keeps src/tests/** excluded (production build unchanged). - B5 CI wiring: npm run lint:tests added to .github/workflows/ci.yml. Gates: npm run lint 0 err, npm run lint:tests 0 err, npm test 1048 passed / 169 skipped / 0 failed (was 1043 pre-sweep — +5 from probeCrawler un-skip).
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Migrates SatRank's backing store from better-sqlite3 to PostgreSQL 16 on a dedicated
cpx42VM in Hetznernbg1. Big-bang cut-over (no ETL, no dual-write) — authorised by Romain given the 0-user baseline and simpler failure model.service_endpoints.category— tracked as Phase 12C OPS issue.Full report:
docs/PHASE-12B-MIGRATION-REPORT-2026-04-21.md.What changed
B0 — Code audit + Phase 12A cleanup
docs/phase-12b/CODE-AUDIT.md, TEST-BASELINE.md, CRAWLER-RACE-CHECK.md.B1+B2 — Infra
satrank-postgrescpx42 (8 vCPU / 16 GB / 240 GB) in nbg1.shared_buffers=4GB,effective_cache_size=12GB, WAL tuned).B3 — Schema + code port
postgres-schema.sqlat consolidated v41 (replaces v29 + phase 7-9 migrations).src/database/connection.ts:pg.Poolsingleton per role (apimax=30,crawlermax=20), BIGINT/NUMERIC parsers, idle-client error handler.?→$nplaceholders.await.B4 — Seed bootstrap
src/scripts/seedBootstrap.ts: idempotent deposit-tier seed, now with--dry-runflag.B5 — Cut-over
docs/phase-12b/B5-CUTOVER-CHECKLIST.md.B6 — Quick wins
src/warmup.tsprimes pg pool + JIT + planner cache on the cold/api/intentpath. Never throws./metricsauth hardening : removed the historical127.0.0.1bypass from both api and crawler.X-API-Keyrequired on every scrape (constant-timesafeEqual).L402_BYPASS=truekeeps staging open, fail-safed against prod. Finding F-08 closed in the security audit.perf_hooks.monitorEventLoopDelay), cache hit ratio gauge, pg pool query duration histogram + pool query error counter. All wired into the existing/metricsscrape.B7 — Iso-network smoke (2026-04-21)
/api/agents/topp95 = 54.8 ms (vs 332.7 ms from Paris). A6's ×107 warning confirmed as WAN-dominated.docs/phase-12b/ISO-NETWORK-SMOKE-2026-04-21.md.B8 — Migration report
docs/PHASE-12B-MIGRATION-REPORT-2026-04-21.md.B9 — This PR
Phase 12C carry-over (not in scope for this PR)
/api/intent/categoriesreturns[]post-migration — data-population gap onservice_endpoints. See OPS-ISSUES.md.scoringStale: truepre-existing on prod — cron/worker investigation.src/tests/**(excluded from prod build). See REMAINING-TEST-DEBT.md.pg_dumpbackup not scheduled.Test plan
npm run lint(tsc --noEmit) — 0 errorsnpm test— 1 044 passed / 312 skipped / 0 failednpm run build— clean/api/health→status: ok,schemaVersion: 41,dbStatus: ok,lndStatus: activeCut-over artefacts
SatRank(178.104.108.108) — unchanged container image, pointed at pg viaDATABASE_URLsatrank-postgres(178.104.142.150) — new production dependency, retained/root/snapshots/on the api host (32-day rolling)