feat(cron): logger + per-step debug instrumentation#32
Merged
Conversation
src/lib/log.ts ships a tiny Logger interface plus a console-backed provider that emits one JSON line per call via console.log. Vercel auto-parses JSON in stdout into searchable fields, so this is the platform's idiomatic shape — no external deps. Cron routes bind a child logger with their event name. Library code (stats.ts, ingest.ts, sync-geoip.ts) holds a module-level child and emits log.debug between every await — every SQL query, blob get/put, MaxMind fetch, tar extract, per-blob ingest fetch.
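The shape described above can be sketched as follows. This is a hypothetical reconstruction, not the actual src/lib/log.ts: the interface, provider, and helper names are assumptions, but it shows the one-JSON-line-per-call contract and the child-logger binding.

```typescript
// Minimal Logger interface plus a console-backed provider: each call emits
// exactly one JSON object via console.log, which Vercel parses from stdout
// into searchable fields. No external dependencies.
type Level = "debug" | "info" | "warn" | "error";

interface Logger {
  debug(msg: string, fields?: Record<string, unknown>): void;
  info(msg: string, fields?: Record<string, unknown>): void;
  warn(msg: string, fields?: Record<string, unknown>): void;
  error(msg: string, fields?: Record<string, unknown>): void;
  child(bindings: Record<string, unknown>): Logger;
}

const ORDER: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };

function makeLogger(min: Level, bindings: Record<string, unknown> = {}): Logger {
  const emit = (level: Level, msg: string, fields?: Record<string, unknown>) => {
    if (ORDER[level] < ORDER[min]) return;
    // One JSON line per call; bound fields and per-call fields are merged in.
    console.log(JSON.stringify({ level, msg, ...bindings, ...fields }));
  };
  return {
    debug: (m, f) => emit("debug", m, f),
    info: (m, f) => emit("info", m, f),
    warn: (m, f) => emit("warn", m, f),
    error: (m, f) => emit("error", m, f),
    // A cron route binds a child with its event name; library modules hold
    // their own module-level child the same way.
    child: (more) => makeLogger(min, { ...bindings, ...more }),
  };
}

const log = makeLogger("debug").child({ event: "tripwire-build-stats" });
log.debug("stats.query_start");
```

The real provider may differ in detail; the point is the single-JSON-line shape and that child loggers only accumulate bindings.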
VERCEL_ENV=production keeps the existing info default. Preview, dev, and local all default to debug, so per-step cron traces show up on preview deploys without a manual env var. LOG_LEVEL still overrides.
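The defaulting rule reduces to a few lines. A sketch under the assumption that the function lives alongside the logger (the name is illustrative):

```typescript
type Level = "debug" | "info" | "warn" | "error";

// LOG_LEVEL always wins when set; otherwise production stays at "info"
// and preview / dev / local all default to "debug".
function defaultLevel(env: { LOG_LEVEL?: string; VERCEL_ENV?: string }): Level {
  if (env.LOG_LEVEL) return env.LOG_LEVEL as Level; // explicit override
  return env.VERCEL_ENV === "production" ? "info" : "debug";
}
```

Real code would likely validate the LOG_LEVEL value instead of casting it.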
The chunk-loop helper in src/lib/blob-stream.ts was a workaround for a Vercel Fluid Compute (Node.js) hang. vercel.json now pins bunVersion: "1.x" and the chunk loop hangs on Bun instead — confirmed on the build-stats cron at the asn.stream_to_buffer_start step. The Response().arrayBuffer() / .text() pattern is what the scripts have been using directly for months and what the helper's own comment said works on Bun. Inline it at the three call sites and delete the helper. One drain pattern, used everywhere.
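The drain pattern that gets inlined can be sketched in a few lines; the function name here is illustrative, not the repo's:

```typescript
// Drain a body stream by handing it to a fresh Response and letting
// .arrayBuffer() consume it, instead of a manual reader.read() chunk loop.
// Same code path on Node and Bun; this is the shape the scripts were
// already using directly.
async function streamToBuffer(stream: ReadableStream<Uint8Array>): Promise<Buffer> {
  const ab = await new Response(stream).arrayBuffer();
  return Buffer.from(ab);
}
```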
The Vercel Blob SDK's get() returns response.body as a stream and lets the underlying Response go out of scope. Under Bun on Vercel the body stream then never reaches EOF for large bodies — neither the chunk-loop nor Response.arrayBuffer() drain the 12MB mmdb. Switch to head() for the URL (small metadata, no body-stream issue) plus a direct fetch with the bearer token. The response stays in scope across arrayBuffer(), and the read uses Next's data cache via next.tags=[ASN_BLOB_TAG]. tripwire-asn-update calls revalidateTag after a successful upload, so subsequent build-stats runs serve the mmdb from cache and only refetch when the data actually changed.
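A hedged sketch of that read path. In the app this would use head() from @vercel/blob and Next's patched fetch with next: { tags: [ASN_BLOB_TAG] }; both are injected here so the shape is visible without the SDK, and the pathname and tag value are assumptions:

```typescript
type HeadFn = (pathname: string) => Promise<{ url: string }>;

const ASN_BLOB_TAG = "tripwire-asn"; // assumed value, not the repo's constant

async function loadAsnMmdb(
  headFn: HeadFn,        // stands in for @vercel/blob head(): metadata only, no body stream
  fetchFn: typeof fetch, // stands in for Next's fetch; real call adds next: { tags: [ASN_BLOB_TAG] }
  token: string,
): Promise<Uint8Array> {
  const meta = await headFn("asn/GeoLite2-ASN.mmdb"); // assumed pathname
  const res = await fetchFn(meta.url, {
    headers: { authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`mmdb fetch failed: ${res.status}`);
  // The Response stays in scope across arrayBuffer(), so the body stream
  // drains to EOF even for the 12MB mmdb under Bun.
  return new Uint8Array(await res.arrayBuffer());
}
```

With the tag attached, Next's data cache serves the buffer until tripwire-asn-update revalidates the tag after an upload.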
Same root cause as the ASN cron fix: @vercel/blob's get() returns response.body and lets the Response go out of scope, which under Bun on Vercel can leave the body stream stuck waiting for EOF — that's the most likely source of the ingest cron's flakiness on big runs. ingest.ts: per-event JSON read uses fetch with cache: "no-store" since each event is read exactly once. aggregates.ts: page-side loader uses head() + fetch with the STATS_BLOB_TAG cache tag. The build-stats cron already invalidates that tag after publishing, so warm pages flip to fresh aggregates without polling. The 2-min module-level singleton stays in front as an instance-local burst absorber. Also moves the literal "tripwire-aggregates" tag string into aggregate-shape.ts so producer (cron) and consumer (page) import the same constant.
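The shared-constant move can be sketched like this; the file layout follows the description, and everything outside the one exported literal is shown as comments because the exact call sites are not in this diff:

```typescript
// aggregate-shape.ts (sketch): the tag literal lives in one place, and both
// the cron (producer) and the page loader (consumer) import it, so the two
// sides can never drift apart.
export const STATS_BLOB_TAG = "tripwire-aggregates";

// Producer (build-stats cron), after publishing new aggregates:
//   revalidateTag(STATS_BLOB_TAG);
//
// Consumer (page-side loader), tagging its read into Next's data cache:
//   fetch(metaUrl, { next: { tags: [STATS_BLOB_TAG] } });
//
// Per-event ingest reads skip the cache entirely, since each event JSON
// is read exactly once:
//   fetch(eventUrl, { cache: "no-store" });
```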
The ingest cron was hanging at list.page_start with no progress: same root cause as the get() and head() drain hangs. requestApi inside @vercel/blob ends with await apiResponse.json(), which under Bun on Vercel leaves the body stream stuck waiting for EOF after the SDK's internal Response goes out of scope. Inline a direct fetch against the public list endpoint with the same auth + x-api-version headers the SDK sends. The Response stays in scope across .json() and the call completes.
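The inlined list call, sketched. The endpoint URL, the response shape, and the x-api-version value are assumptions here (the real code mirrors whatever the SDK sends); fetch is injected so the sketch is testable. The point is that the Response is still referenced when .json() runs, so the body drains to EOF on Bun:

```typescript
interface ListPage {
  blobs: { url: string; pathname: string }[];
  cursor?: string;
  hasMore: boolean;
}

async function listPage(
  prefix: string,
  cursor: string | undefined,
  token: string,
  fetchFn: typeof fetch = fetch,
): Promise<ListPage> {
  const url = new URL("https://blob.vercel-storage.com"); // assumed endpoint
  url.searchParams.set("prefix", prefix);
  if (cursor) url.searchParams.set("cursor", cursor);
  const res = await fetchFn(url, {
    headers: {
      authorization: `Bearer ${token}`,
      "x-api-version": "7", // assumed: match whatever version the SDK pins
    },
  });
  if (!res.ok) throw new Error(`list failed: ${res.status}`);
  // res is still in scope here, so .json() completes instead of hanging.
  return (await res.json()) as ListPage;
}
```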
The cron was listing the entire events/ prefix every 5 minutes, which grows linearly with the lifetime event count. Switch to listing today + yesterday by exact UTC-date prefix (events/<YYYY-MM-DD>/), bounded by INGEST_WINDOW_DAYS=2. Events older than the window are not auto-ingested by the cron path; the CLI script (scripts/tripwire/ingest-events.ts) still walks the full events/ prefix and can backfill manually if a longer outage happens.
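Building the bounded prefix set is a small pure function; the helper name is an assumption, but the prefix shape and the 2-day default come straight from the description:

```typescript
// Exact UTC-date prefixes for the last `windowDays` days, newest first.
// With the default of 2 (INGEST_WINDOW_DAYS), that is today + yesterday,
// so each cron run lists a bounded slice instead of the whole events/ prefix.
function ingestPrefixes(now: Date, windowDays = 2): string[] {
  const prefixes: string[] = [];
  for (let i = 0; i < windowDays; i++) {
    const d = new Date(now.getTime() - i * 24 * 60 * 60 * 1000);
    prefixes.push(`events/${d.toISOString().slice(0, 10)}/`);
  }
  return prefixes;
}
```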
Summary

Replaces the hand-rolled makeCronLogger with a pino singleton (src/lib/log.ts). Level comes from the LOG_LEVEL env at startup; pretty-printed via pino-pretty in dev (or LOG_PRETTY=1), raw JSON in production.

Each cron route binds a child logger with its event: name. Library code (stats.ts, ingest.ts, sync-geoip.ts) owns its own module-level child and emits log.debug between every await — every SQL query, blob get/put, streamToBuffer, Reader.openBuffer, MaxMind fetch, tar extract, per-blob ingest fetch. ingestNewEvents keeps its onProgress callback so the existing CLI script continues to work unchanged.

Diagnostic intent

Open for the preview deploy. Set LOG_LEVEL=debug on the preview env, curl each cron endpoint with the bearer, and the runtime logs will show exactly which step is the last one before the production hang. Patches to address whatever the trace reveals will land on this branch before merge.

Local baseline (against prod blob + prod Neon): tripwire-build-stats, tripwire-ingest, tripwire-asn-update.