Attachments stage 2: server-side expiry sweep #219
Merged
Text extraction (PDF/txt/md uploads) fails from the browser with "Network error contacting Venice: Failed to fetch" - a CORS/network-layer rejection of the direct call to /augment/text-parser, even though the chat/image endpoints work from the same host + key. Pre-existing (extraction runs before any storage write; the attachments-storage migration is downstream), surfaced now because non-image upload was never exercised live. Flesh out the text-parser edge-function sub-plan with the full diagnosis, the two call sites that must move server-side (chat attachments in Chat.svelte; Library uploads in documents.ts), the current extractText shape, the target /text-parser route, and the large-file escape-hatch wrinkle. The server-side proxy is the fix regardless of the exact browser-side cause.
Replace the (now-inert) browser attachment_expiry worker with a server-side sweep that actually deletes bucket objects - the thing SQL can't do.
- schema.sql: two service-definer, service-role-only RPCs - list_expirable_attachments (live + thread dormant p_days, bounded, FOR UPDATE SKIP LOCKED) and mark_attachments_expired (null storage_path + stamp expired_at). Plus nak_trigger_attachment_expiry + an hourly pg_cron job, same Vault-secret custody + local-stack guards as the embed backfill.
- expire-attachments edge function: a standalone function (NOT a venice route - expiry never calls Venice, only Storage), service-role gated. Its deps wire the RPCs + storage.remove into the pure runExpiry orchestration.
- _shared/expire-attachments.ts: I/O-free drain loop (batch -> delete -> mark, until short batch / row cap / time budget). No per-row claim - delete + mark are idempotent, so overlapping ticks can't corrupt.
- deploy.yml: deploy the new function alongside venice.
Browser-worker removal lands next. Deno: 5 new offline tests pass, handler type-checks. Cron/Storage round-trip can't be exercised here; verify after deploy (uploaded objects should disappear 30 days after a thread goes quiet).
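The drain loop described in this commit (batch -> delete -> mark, stopping on a short batch, the row cap, or the time budget) can be sketched roughly as below. This is a minimal illustration, not the code in _shared/expire-attachments.ts: the `ExpiryDeps` and `ExpiryLimits` shapes, the field names, and the return value are all assumptions made for the example. Only the name `runExpiry` and the overall order of operations come from the commit message.

```typescript
// Hypothetical row shape; the real RPC result columns are not shown in the PR.
interface ExpirableRow {
  id: string;
  storagePath: string;
}

// Hypothetical dependency bundle. Injecting these keeps the loop I/O-free.
interface ExpiryDeps {
  listBatch(limit: number): Promise<ExpirableRow[]>; // would wrap list_expirable_attachments
  removeObjects(paths: string[]): Promise<void>; // would wrap storage.remove
  markExpired(ids: string[]): Promise<void>; // would wrap mark_attachments_expired
  now(): number; // injected clock, in milliseconds
}

// Hypothetical limits; the actual caps used by the function are not stated.
interface ExpiryLimits {
  batchSize: number;
  maxRows: number;
  timeBudgetMs: number;
}

type StopReason = "short_batch" | "row_cap" | "time_budget";

interface ExpiryResult {
  processed: number;
  batches: number;
  stopped: StopReason;
}

async function runExpiry(deps: ExpiryDeps, limits: ExpiryLimits): Promise<ExpiryResult> {
  if (limits.batchSize < 1) throw new Error("batchSize must be at least 1");
  const startedAt = deps.now();
  let processed = 0;
  let batches = 0;

  while (true) {
    if (processed >= limits.maxRows) return { processed, batches, stopped: "row_cap" };
    if (deps.now() - startedAt >= limits.timeBudgetMs) {
      return { processed, batches, stopped: "time_budget" };
    }

    // Never ask for more rows than the remaining cap allows.
    const want = Math.min(limits.batchSize, limits.maxRows - processed);
    const rows = await deps.listBatch(want);

    if (rows.length > 0) {
      // Delete first, then mark: if the delete throws, the rows stay live
      // and are picked up again by a later tick.
      await deps.removeObjects(rows.map((r) => r.storagePath));
      await deps.markExpired(rows.map((r) => r.id));
      processed += rows.length;
      batches += 1;
    }

    // A batch smaller than requested means nothing is left to drain.
    if (rows.length < want) return { processed, batches, stopped: "short_batch" };
  }
}
```

Because the sketch marks rows only after the delete succeeds, and both steps can be repeated safely, a tick that dies between the two steps simply leaves the rows for the next run. That is the property the commit relies on when it skips a per-row claim.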
Update the migration plan + attachments banner to reflect the server-side expiry sweep landing (expire-attachments function + cron + RPCs), and record the browser-worker retirement as the remaining cleanup (Stage 2b) - left in place because it's inert post-Stage-1.
SYNOPSIS
Attachments Stage 2: a server-side expiry sweep that deletes bucket objects on a schedule. Plus a doc capturing the text-parser CORS bug for the edge-function project.
PURPOSE
Stage 1 moved attachment bytes into the attachments bucket but left expiry inert (the old browser worker's RPC now matches zero rows), so uploaded objects never get reclaimed. Storage objects are real cost; cleanup must run server-side, not depend on an open tab.
DESCRIPTION
expire-attachments edge function - standalone, not a route on venice (expiry only touches Storage, never calls Venice; also keeps it clear of the in-flight edge-function migration). Service-role gated, mirroring the backfill's JWT-role auth. Deployed via its own line in deploy.yml (the workflow previously deployed only venice).
Schema - two service-definer, service-role-only RPCs: list_expirable_attachments (live + owning thread dormant p_days, bounded, FOR UPDATE SKIP LOCKED) and mark_attachments_expired (null storage_path + stamp expired_at). The function does the storage.remove between them. Plus nak_trigger_attachment_expiry + an hourly pg_cron job, same Vault-secret custody + local-stack guards as the embed backfill.
_shared/expire-attachments.ts - I/O-free runExpiry drain loop (batch -> delete -> mark, until short batch / row cap / time budget). No per-row claim: delete + mark are idempotent, so overlapping ticks can't corrupt.
Deferred (next commits): retiring the now-inert browser attachment_expiry supervisor unit + expire_old_attachments RPC (cleanup; harmless meanwhile), then the data-column collapse.
Also in this branch: a doc-only commit fleshing out text-parser.md with the CORS diagnosis for the other edge-function session (text extraction broken from the browser - pre-existing, unrelated to storage). Rides along since it's a migration note.
Verified: 5 new Deno tests (33 total green), handler deno check passes. Not verifiable from the cloud env: the cron + Storage round-trip - confirm post-deploy that an object disappears ~30 days after its thread goes dormant.
Generated by Claude Code
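For readers unfamiliar with the "service-role gated" check mentioned in the description, a minimal sketch of a JWT-role gate follows. It is an illustration under stated assumptions, not the handler in this PR: the helper names `decodeJwtPayload` and `isServiceRole` are hypothetical, the claim is assumed to be called `role` with the value `service_role`, and signature verification is assumed to happen upstream (for example at the platform gateway), so this code only reads the claim.

```typescript
// Decode the payload segment of a JWT WITHOUT verifying its signature.
// Assumption: the signature has already been verified before this runs.
function decodeJwtPayload(token: string): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    // Convert base64url to standard base64 and restore padding.
    const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
    const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
    const parsed: unknown = JSON.parse(atob(padded));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
    return parsed as Record<string, unknown>;
  } catch {
    // Malformed base64 or JSON: treat as unauthenticated.
    return null;
  }
}

// Returns true only when the Authorization header carries a bearer token
// whose (assumed) `role` claim equals "service_role".
function isServiceRole(authorization: string | null): boolean {
  if (!authorization) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authorization.trim());
  if (!match) return false;
  const payload = decodeJwtPayload(match[1]);
  return payload?.role === "service_role";
}
```

A handler would typically call `isServiceRole(req.headers.get("Authorization"))` first and return 401 or 403 on `false`, before touching Storage or the RPCs.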