
Handover prep: cleanup, signed CI/CD, runbook, god-file split, ENV.md, admin lockdown #23

Merged: TeoSlayer merged 12 commits into main from feat/site-search-and-submit on Jul 1, 2026


@TeoSlayer (Contributor)

Prepares cosift for a clean, confidentiality-preserving handover to an outside agency. Tree is green on build/vet/test/gofmt/deadcode.

Highlights

  • Cleanup (c77c106): remove tool-confirmed dead code (duplicate hnsw_writer.go, dead Federation config, 5 dead fields), fix the one go vet copylocks finding, fix the failing TestRunAnswerEvalBadCorpus, dedup the vector dot product, and drop 7 orphaned v0 systemd units.
  • Signed pull-based CI/CD (f100bae): deploy.yml builds a signed (sha256 + minisign) arm64 release; a box-side self-updater verifies + atomically swaps + health-gates + auto-rolls-back — keeps the SSH key out of CI. Fixes the stale Makefile module path so version stamping works.
  • HANDOVER.md (68d5a85): secret-ownership matrix, confidentiality cutover, deploy model, principal-passthrough design, control-surface reference.
  • God-file split (ce64cb0): pebble_serve.go (~8.1k lines) and main.go (~6.5k lines) → 14 new logical files, with main.go keeping main() and the shared CLI helpers. Declarations are AST-verified byte-identical, so this is a pure-motion no-op.
  • CI gofmt gate (e3bb6a7) + ENV.md (d527465, 64 vars) + admin-token wiring for cosift-compact.sh (b215334).
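Version stamping like this usually relies on the Go linker's -X flag, which sets a string variable only when the import path it names matches the package that declares the variable; the linker silently ignores a path that does not exist, which is how a stale module path can leave the default in place without any build error. A minimal sketch, assuming a hypothetical `version` variable in package main (the real variable name, package, and Makefile line are not shown in the PR):

```go
package main

import "fmt"

// version defaults to "dev" and is meant to be overwritten at link time:
//
//	go build -ldflags "-X main.version=v1.2.3" .
//
// For a variable in another package, -X needs that package's full import
// path (module path included). A wrong module path in the Makefile does not
// fail the build; "dev" simply stays in place.
var version = "dev"

// versionString is a hypothetical helper a `version` subcommand could print.
func versionString() string {
	return fmt.Sprintf("cosift %s", version)
}
```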

Operational (done on the box, not in this diff)

  • Public /admin/* + /debug/* leak closed at Caddy (was serving raw queries + IPs unauthenticated).
  • Raw query logging disabled + existing log archived off the readable path (confidentiality).
  • OpenAI key removed (prod uses local models).
  • peer_auth_token set (defense-in-depth).

See docs/HANDOVER.md for the remaining cutover checklist (minisign keypair, passthrough host, agency account model).

teovl and others added 10 commits June 17, 2026 15:30
Allows answer-eval to run against local inference endpoints (vLLM,
Ollama-compatible) without an OpenAI key. -embed-url overrides the
embedding client URL; -chat-url overrides both the synth and judge
chat client URL.

API-key guard relaxed: only errors when all of embed-url, chat-url,
and OPENAI_API_KEY are absent.
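The flag overrides and the relaxed guard above can be sketched as two small functions. Only -embed-url, -chat-url, and OPENAI_API_KEY come from the commit; the type and function names below are hypothetical, not cosift's code:

```go
package main

import "errors"

// endpoints holds the base URLs answer-eval would talk to (hypothetical type).
type endpoints struct {
	Embed, Synth, Judge string
}

var errNoBackend = errors.New("answer-eval: set -embed-url, -chat-url, or OPENAI_API_KEY")

// checkBackends applies the relaxed guard: it only fails when all three
// possible sources of an inference backend are absent.
func checkBackends(embedURL, chatURL, apiKey string) error {
	if embedURL == "" && chatURL == "" && apiKey == "" {
		return errNoBackend
	}
	return nil
}

// resolveEndpoints starts from a default base URL and applies the overrides:
// -embed-url replaces only the embedding client, while -chat-url replaces
// both the synth and the judge chat clients.
func resolveEndpoints(embedURL, chatURL, defaultURL string) endpoints {
	e := endpoints{Embed: defaultURL, Synth: defaultURL, Judge: defaultURL}
	if embedURL != "" {
		e.Embed = embedURL
	}
	if chatURL != "" {
		e.Synth, e.Judge = chatURL, chatURL
	}
	return e
}
```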
…vesters

Search quality:
- per-host inverted-index partition (SearchInHost) + backfill — site= O(site_docs)
- BM25 MaxScore pre-check; zero-overlap site boost; rerank cap for site=
- dynamic runtime allowlist (/admin/allow-domain) for organic HN/Reddit growth
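The per-host partition idea can be illustrated with a toy in-memory index. Postings are keyed by host before term, so a site= lookup only reads that host's posting lists and its cost scales with the host's documents rather than the whole corpus. This is a hypothetical sketch: cosift's SearchInHost is not shown and presumably runs over its on-disk index with BM25 scoring, neither of which is modeled here.

```go
package main

import "sort"

// hostIndex is a hypothetical stand-in for a per-host partition:
// host -> term -> doc IDs.
type hostIndex struct {
	postings map[string]map[string][]int
}

func newHostIndex() *hostIndex {
	return &hostIndex{postings: make(map[string]map[string][]int)}
}

// add records docID under each of its terms, inside the host's partition.
func (ix *hostIndex) add(host string, docID int, terms []string) {
	byTerm, ok := ix.postings[host]
	if !ok {
		byTerm = make(map[string][]int)
		ix.postings[host] = byTerm
	}
	for _, t := range terms {
		byTerm[t] = append(byTerm[t], docID)
	}
}

// searchInHost returns, in ascending order, the docs on host that contain
// every query term. Other hosts' postings are never touched.
func (ix *hostIndex) searchInHost(host string, terms []string) []int {
	byTerm := ix.postings[host]
	if byTerm == nil || len(terms) == 0 {
		return nil
	}
	matched := make(map[int]int) // doc ID -> number of query terms it contains
	for _, t := range terms {
		seen := make(map[int]bool)
		for _, id := range byTerm[t] {
			if !seen[id] {
				seen[id] = true
				matched[id]++
			}
		}
	}
	var out []int
	for id, n := range matched {
		if n == len(terms) {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}
```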

New capabilities:
- /find: live resource federation (HuggingFace + GitHub + PyPI) with LLM planner
  — fixes known-item lookup the indexed corpus is weak at
- query logging (qlog middleware → JSONL, X-Cosift-Query-Id) + /admin/query-log
- feedback loop: POST /feedback {query_id,rating,reason} + /admin/feedback?summary=1
  (reward/penalize signal joined to the query log), per-client rate limited

Ingest:
- WET bulk import skips host-partition writes; crawler bulk-index fast path
- testdata/eval/queries-realworld.json — representative intent-spanning eval set

Tests: host_partition, dynamic_allowlist, querylog/tailLines, inject.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pulled from the GH200 so the whole crawl/index content pipeline is reproducible
from git, not only on the box. 12 harvesters, 22 systemd units, 3 source lists,
README documenting sources/cadence/env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pre-handover cleanup; tree now green on build/vet/test/gofmt/deadcode.

- gofmt 6 files; fix TestRunAnswerEvalBadCorpus (self-contained tempdir fixture)
- fix go vet copylocks at pebble_serve.go (pass index by param, drop tmp := *s)
- delete duplicate internal/index/hnsw_writer.go + test (unused in prod)
- remove dead Federation config, 5 dead struct fields, openaiMock.SetEmbedDim
- simplify 2 unparam signatures; dedup vector dot (export index.Dot, drop cosineUnit)
- remove 7 orphaned v0 systemd units in scripts/ (superseded by deploy/systemd/)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- fix Makefile ldflags module path (calinteodor -> pilot-protocol); version stamping now lands
- ci.yml: add go vet + linux/arm64 cross-compile build
- deploy.yml: build signed arm64 release (sha256 + minisign) on tag
- box self-updater (deploy/scripts + deploy/systemd/cosift-self-update.*): sha256+minisign verify, atomic swap + rollback, health-gated; keeps SSH key out of CI

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eploy model)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pure-motion, same-package split for navigability/editability. No behavior
change: every top-level declaration (and its doc comments / //go:embed
directives) was moved verbatim; only per-file import blocks were recomputed
by goimports. Verified byte-identical via a gofmt-normalized decl-multiset
diff. All gates green: go build, go vet, go test (all packages), gofmt -l
(empty), deadcode -test (clean, no new flags).

pebble_serve.go (~8095 lines) -> 7 files:
  serve_setup.go    runPebbleServe, in-process crawl bootstrap, pebbleHTTP
                    type, rate-limit/count middleware, statusCapturingWriter,
                    embedded static assets + their handlers
  serve_search.go   /search /find_similar /query /contents handlers, scatter +
                    gateways, full retrieval pipeline (retrieve*, MMR, RRF,
                    decay, expansion, site boosts)
  serve_answer.go   /answer /research synthesis, SSE, chat/rerank, prompts
  serve_admin.go    PQ encode/train, checkpoint, HNSW compact, eval-quick,
                    host/embed backfill, embed-lite truncation
  serve_crawl.go    crawl-enqueue/now, allow-domain, sitemap/rss/wet import,
                    site-submit/pack, frontier ops, lane helpers
  serve_stats.go    /healthz /stats /metrics /sla /domains /queue /verify
  serve_helpers.go  retrieval filters, site scopes, host/date helpers,
                    writeJSON/writeProblem, pebble verify/info commands

main.go (~6517 lines) -> kept main(), the dispatch switch, usage, version,
usageError, and shared CLI helpers (chunkerWith, resolveAPIKey(s), firstEnv,
contains, truncate, truncStr, abs); subcommands moved to:
  cmd_query.go        query/search/research/answer/find-similar/contents
                      clients + render/SSE consumers
  cmd_admin.go        admin client + stats/status-file/crawl-status
  cmd_eval.go         eval/answer-eval*/bench* + retriever adapters + judge
  cmd_crawl.go        crawl/refresh-due/check-robots/crawl-errors
  cmd_index.go        ingest/export/reembed/compact-index/migrate-to-pebble
  cmd_serve.go        runServe + hnswPassageWriter + authStatus
  cmd_maintenance.go  gc/init/doctor/frontier-trim/seed-tranco/outcomes

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
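The verification described in this commit, comparing the multiset of gofmt-normalized top-level declarations before and after the move, can be sketched with go/parser and go/format. This is a simplified, hypothetical reconstruction: it ignores comments, whereas the commit says doc comments and //go:embed directives were covered too, and it skips import declarations because those were recomputed per file.

```go
package main

import (
	"bytes"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"sort"
)

// declMultiset parses each named source and returns the gofmt-normalized
// text of every non-import top-level declaration, sorted for comparison.
func declMultiset(srcs map[string]string) ([]string, error) {
	fset := token.NewFileSet()
	var out []string
	for name, src := range srcs {
		f, err := parser.ParseFile(fset, name, src, 0) // mode 0: comments dropped
		if err != nil {
			return nil, err
		}
		for _, d := range f.Decls {
			if gd, ok := d.(*ast.GenDecl); ok && gd.Tok == token.IMPORT {
				continue // imports are recomputed per file, so ignore them
			}
			var buf bytes.Buffer
			if err := format.Node(&buf, fset, d); err != nil {
				return nil, err
			}
			out = append(out, buf.String())
		}
	}
	sort.Strings(out)
	return out, nil
}

// sameDecls reports whether two file sets declare exactly the same things,
// regardless of which file each declaration lives in or its order.
func sameDecls(before, after map[string]string) (bool, error) {
	a, err := declMultiset(before)
	if err != nil {
		return false, err
	}
	b, err := declMultiset(after)
	if err != nil {
		return false, err
	}
	if len(a) != len(b) {
		return false, nil
	}
	for i := range a {
		if a[i] != b[i] {
			return false, nil
		}
	}
	return true, nil
}
```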
Fail CI if any file isn't gofmt-clean, so the pristine tree can't regress.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
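A gate like this typically runs `gofmt -l` and fails on non-empty output; the commit does not show the exact CI step. The same check expressed in Go, as a hedged sketch using go/format.Source (which formats source in canonical gofmt style); the helper names are hypothetical:

```go
package main

import (
	"bytes"
	"go/format"
	"sort"
)

// isGofmtClean reports whether src is already in canonical gofmt form.
func isGofmtClean(src []byte) (bool, error) {
	out, err := format.Source(src)
	if err != nil {
		return false, err // a file that does not parse cannot be clean
	}
	return bytes.Equal(out, src), nil
}

// unformatted lists, sorted, the files that would appear in `gofmt -l`
// output (unparsable files are listed too). Files are passed in memory to
// keep the sketch small.
func unformatted(files map[string][]byte) []string {
	var bad []string
	for name, src := range files {
		if ok, err := isGofmtClean(src); err != nil || !ok {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}
```

CI would fail the job whenever `unformatted` returns a non-empty list.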
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-sitemap comment

- cosift-compact.sh now reads cluster.peer_auth_token from cosift.json and sends it on /admin/hnsw-compact, so it keeps working once the admin token is set (all other harvesters already read the token dynamically via token())
- fix crawler.go comment: maybeAutoSitemap is gated by the crawler.auto_sitemap config field, not a (nonexistent) COSIFT_AUTO_SITEMAP env var

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
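The compact-script change amounts to two steps: read cluster.peer_auth_token from cosift.json, then attach it to the /admin/hnsw-compact call. The script itself is shell; the Go rendering below is only an illustration, and it assumes the call is a POST carrying an `Authorization: Bearer` header (the PR names neither the method nor the header). An empty token leaves the header off, so the call still works on a box where the token has not been set yet.

```go
package main

import (
	"encoding/json"
	"net/http"
	"os"
)

// cosiftConfig models only the field this sketch needs from cosift.json.
type cosiftConfig struct {
	Cluster struct {
		PeerAuthToken string `json:"peer_auth_token"`
	} `json:"cluster"`
}

// peerToken extracts cluster.peer_auth_token from raw cosift.json bytes.
// It returns "" without error when the token is not configured.
func peerToken(raw []byte) (string, error) {
	var cfg cosiftConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	return cfg.Cluster.PeerAuthToken, nil
}

// tokenFromFile reads cosift.json from a caller-supplied path.
func tokenFromFile(path string) (string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return peerToken(raw)
}

// compactRequest builds the admin call; method and header are assumptions.
func compactRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/admin/hnsw-compact", nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}
```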
teovl and others added 2 commits July 1, 2026 17:19
…iddleware

The /admin/* handlers each inlined the peer-token check, and handleDomainsAudit was missing it — so GET /admin/domains-audit answered without a token (publicly blocked by Caddy, but a real gap). Add an awrap = count+rateLimit+requireAdmin wrapper and route all 24 /admin/* registrations through it, so the token gate is enforced at the mux level regardless of any per-handler check. Verified: /admin/query-log and /admin/crawl-now already 401 without a token; this closes domains-audit and any future miss.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TeoSlayer enabled auto-merge July 1, 2026 14:24
TeoSlayer merged commit 5fe282b into main Jul 1, 2026
4 checks passed