
fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect #3087

Merged
koala73 merged 2 commits into `main` from `fix/runseed-recordcount-fallback`
Apr 14, 2026

Conversation

@koala73
Owner

@koala73 koala73 commented Apr 14, 2026

Why this PR?

Health dashboard 2026-04-14 08:44 UTC reported 18 EMPTY_DATA CRITs in /api/health. After cross-referencing 12 Railway seeder logs, 16 of them are phantoms: the seeders ran on schedule, wrote payloads to Redis, and emitted `Verified: data present in Redis` immediately after the `seed_complete` line that reported `recordCount:0`.

Smoking-gun signature (BLS-Series example):
```
[BLS-Series] {"event":"seed_complete","recordCount":0,"durationMs":3925,"payloadBytes":6093}
[BLS-Series] Verified: data present in Redis
```

VPD-Tracker is the most striking: 3 MB of payload, count reported as 0.

Root cause

`scripts/_seed-utils.mjs` `runSeed()` auto-detects `recordCount` from a hardcoded list of payload shapes:

```js
Array.isArray(data) ? data.length
: (topicArticleCount
?? data?.predictions?.length
?? data?.events?.length ?? data?.earthquakes?.length ?? data?.outages?.length
?? data?.fireDetections?.length ?? data?.anomalies?.length ?? data?.threats?.length
?? data?.quotes?.length ?? data?.stablecoins?.length
?? data?.cables?.length ?? 0);
```

If a seeder publishes a custom shape (`{score, inputs}` for fear-greed, `{geopolitical, tech}` for prediction-markets, `{primaryTitle, ...}` per-topic for insights, etc.) AND doesn't pass `opts.recordCount`, the chain falls through to 0. seed-meta is written with `{fetchedAt, recordCount: 0}`. health.js reads this and flips to EMPTY_DATA.

Audit: of 19 failing-health seeders, only 2 pass `opts.recordCount` to `runSeed` (`seed-spr-policies`, `seed-owid-energy-mix`). The other 17 rely on auto-detect.
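
The fall-through can be reproduced in isolation. The sketch below copies the auto-detect chain quoted above into a hypothetical helper, `legacyAutoDetect`. The name and the standalone signature are assumptions for illustration (in the repo this logic sits inline in `runSeed()`), and the sample payload values are made up.

```javascript
// Hypothetical standalone copy of the pre-fix auto-detect chain.
// `legacyAutoDetect` is an illustrative name, not a function from the repo.
function legacyAutoDetect(data, topicArticleCount) {
  return Array.isArray(data) ? data.length
    : (topicArticleCount
      ?? data?.predictions?.length
      ?? data?.events?.length ?? data?.earthquakes?.length ?? data?.outages?.length
      ?? data?.fireDetections?.length ?? data?.anomalies?.length ?? data?.threats?.length
      ?? data?.quotes?.length ?? data?.stablecoins?.length
      ?? data?.cables?.length ?? 0);
}

// Known shape: one of the hardcoded keys matches, so the count is correct.
const known = legacyAutoDetect({ earthquakes: [{ id: 'a' }, { id: 'b' }] });

// Custom shape (made-up values): fully populated, but no known key matches.
const custom = { score: 42, inputs: { vix: 18.3 } };
const customCount = legacyAutoDetect(custom);
const customBytes = Buffer.byteLength(JSON.stringify(custom));

// customCount is 0 even though customBytes is greater than 0:
// the same recordCount:0 / payloadBytes>0 signature seen in the logs.
console.log({ known, customCount, customBytes });
```

The trailing `?? 0` is what makes "no key matched" indistinguishable from "a key matched and the list was empty".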

Fix

Add a final `payloadBytes > 0 → 1` fallback to the resolution chain. When triggered, `console.warn` names the seeder so the author can add an explicit `opts.recordCount` for accurate dashboards.

Also extracted the resolution logic into a pure exported `computeRecordCount()` function so it can be unit-tested without a real Redis connection.

Resolution order (unchanged for existing callers):

  1. `opts.recordCount` (function or number) — explicit declaration wins
  2. Auto-detect from known shape
  3. NEW: `payloadBytes > 0` → 1 + warn
  4. 0

Explicit `opts.recordCount: 0` still wins (test covers it) — for cases like `seed-owid-energy-mix` which deliberately reports 0.
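
The resolution order above can be sketched as a pure function. This is a minimal illustration, not the merged code: the single-object parameter and its field names are assumptions (only `opts.recordCount`, `payloadBytes`, `topicArticleCount`, and the `onPhantomFallback` callback are named in this PR), and the sample payloads are made up.

```javascript
// Sketch of the four-step resolution order. The parameter object is an
// assumed signature; the real computeRecordCount() may differ.
function computeRecordCount({ opts = {}, data, payloadBytes = 0, topicArticleCount, onPhantomFallback } = {}) {
  // 1. Explicit declaration wins, including an explicit 0.
  if (opts.recordCount != null) {
    return typeof opts.recordCount === 'function' ? opts.recordCount(data) : opts.recordCount;
  }

  // 2. Auto-detect from a known shape. There is no trailing `?? 0` here, so
  //    "no match" stays undefined and differs from a genuinely empty list.
  const detectedFromShape = Array.isArray(data) ? data.length
    : (topicArticleCount
      ?? data?.predictions?.length
      ?? data?.events?.length ?? data?.earthquakes?.length ?? data?.outages?.length
      ?? data?.fireDetections?.length ?? data?.anomalies?.length ?? data?.threats?.length
      ?? data?.quotes?.length ?? data?.stablecoins?.length
      ?? data?.cables?.length);
  if (detectedFromShape != null) return detectedFromShape;

  // 3. Unknown shape, but bytes were written: report 1 and notify the caller.
  if (payloadBytes > 0) {
    onPhantomFallback?.(payloadBytes);
    return 1;
  }

  // 4. Nothing detected and nothing written.
  return 0;
}

// Usage with made-up payloads; warnings are collected instead of logged.
const warnings = [];
const warn = (bytes) => warnings.push(`payloadBytes=${bytes}`);

const fallback = computeRecordCount({ data: { score: 42, inputs: {} }, payloadBytes: 6093, onPhantomFallback: warn });
const explicitZero = computeRecordCount({ opts: { recordCount: 0 }, data: { score: 1 }, payloadBytes: 6093, onPhantomFallback: warn });
const viaFunction = computeRecordCount({ opts: { recordCount: (d) => Object.keys(d).length }, data: { geopolitical: [], tech: [] } });
const knownShape = computeRecordCount({ data: { quotes: [1, 2, 3] }, payloadBytes: 100, onPhantomFallback: warn });

console.log({ fallback, explicitZero, viaFunction, knownShape, warnings });
```

Passing the warning as a callback rather than calling `console.warn` inside the function keeps it free of side effects, which is what makes it testable without Redis.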

Files

  • `scripts/_seed-utils.mjs` — extract `computeRecordCount()`, wire fallback into `runSeed`
  • `tests/seed-utils.test.mjs` — 11 new test cases

Effect

  • Clears 16 phantom CRITs on the next bundle cycle (one cron tick per affected seeder).
  • Per-seeder `console.warn` will surface in logs so we know which seeders still need explicit `opts.recordCount` for accurate dashboards.
  • One genuine intermittent (`unrestEvents` — ACLED quiet periods) is unchanged; that hits the SKIPPED-validation path which deliberately writes count=0.
  • `goldExtended` and `sprPolicies` are NOT covered by this PR — those are real bugs (dead `.then()` block in seed-commodity-quotes; missing Railway runner). Separate PRs incoming.

Testing

  • `node --test tests/seed-utils.test.mjs` → 18/18 (11 new + 7 existing)
  • `node --test tests/seed-utils-empty-data-failure.test.mjs` → 2/2
  • `npm run typecheck` → clean

Post-Deploy Monitoring & Validation

  • Logs: Watch the next 1-2 bundle cycles on Railway (`seed-bundle-macro`, `seed-bundle-health`, `seed-bundle-energy-sources`, `seed-bundle-ecb-eu`, plus standalone services). Seeders that previously logged `recordCount:0, payloadBytes:>0` will now log `recordCount:1` AND a one-time `[recordCount] auto-detect did not match a known shape (payloadBytes=N); falling back to 1. Add opts.recordCount to : for accurate health metrics.` warning.
  • Health endpoint: `curl -sL https://worldmonitor.app/api/health | jq '.summary'` — `crit` count should drop from 18 to ~5 within 1 hour (only the genuine cases remain: `unrestEvents` intermittent + `goldExtended`/`sprPolicies` until separate PRs land).
  • Failure signal / rollback: if a seeder that previously reported a meaningful recordCount now reports 1 (regression), check whether its known-shape detection broke. Revert is one-line. No data is at risk — this only affects metadata write.
  • Validation window: 1 hour post-deploy.
  • Owner: @koala73
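
For the log-watching step, the phantom signature can be checked mechanically. A minimal sketch, assuming log lines keep the `[Name] {json}` form of the BLS-Series example above; `findPhantomEmpties` is a hypothetical helper, and the `Example-*` sample lines are made up for illustration.

```javascript
// Hypothetical helper: flags seed_complete lines that report zero records
// despite a non-empty payload.
function findPhantomEmpties(logText) {
  const phantoms = [];
  for (const line of logText.split('\n')) {
    // Expect "[Seeder-Name] {json}"; skip anything else (e.g. "Verified: ..." lines).
    const match = line.match(/^\[([^\]]+)\]\s+(\{.*\})\s*$/);
    if (!match) continue;

    let entry;
    try {
      entry = JSON.parse(match[2]);
    } catch {
      continue; // not valid JSON, ignore the line
    }

    if (entry.event === 'seed_complete' && entry.recordCount === 0 && entry.payloadBytes > 0) {
      phantoms.push({ seeder: match[1], payloadBytes: entry.payloadBytes });
    }
  }
  return phantoms;
}

const sample = [
  '[BLS-Series] {"event":"seed_complete","recordCount":0,"durationMs":3925,"payloadBytes":6093}',
  '[BLS-Series] Verified: data present in Redis',
  '[Example-Known-Shape] {"event":"seed_complete","recordCount":12,"durationMs":800,"payloadBytes":2048}',
  '[Example-Empty] {"event":"seed_complete","recordCount":0,"durationMs":500,"payloadBytes":0}',
].join('\n');

const phantoms = findPhantomEmpties(sample);
console.log(phantoms);
```

Run against pre-deploy logs, only the BLS-Series line is flagged. After the deploy, affected seeders should log `recordCount:1` instead, so the same scan over fresh logs would be expected to come back empty.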

Related

  • Sibling PRs: PR-B (seed-commodity-quotes afterPublish fix → goldExtended) and PR-C (wire seed-spr-policies into bundle) — incoming.
  • Skill: `seed-recordcount-autodetect-phantom-empty` documents the diagnosis methodology.

…-detect

Phantom EMPTY_DATA in /api/health: 16 of 21 failing health checks were
caused by seeders publishing custom payload shapes without passing
opts.recordCount. The auto-detect chain in runSeed only matches a hardcoded
list of shapes; anything else falls through to recordCount=0 and triggers
EMPTY_DATA in /api/health even though the payload is fully populated and
verified in Redis.

Smoking-gun log signature from Railway 2026-04-14:
  [BLS-Series] recordCount:0, payloadBytes:6093, Verified: data present
  [VPD-Tracker] recordCount:0, payloadBytes:3068853, Verified: data present
  [Disease-Outbreaks] recordCount:0, payloadBytes:92684, Verified: data present

Fix:
- Extract recordCount logic into pure exported computeRecordCount() for
  unit testability.
- Add payloadBytes>0 → 1 fallback at the end of the resolution chain. When
  triggered, console.warn names the seeder so the author can add an
  explicit opts.recordCount for accurate dashboards.
- Resolution order unchanged for existing callers: opts.recordCount wins,
  then known-shape auto-detect, then the new payloadBytes fallback, then 0.
  Explicit opts.recordCount=0 still wins (test covers it).

Effect: clears 16 phantom CRITs on the next bundle cycle. Per-seeder warns
will surface in logs so we can add accurate opts.recordCount in follow-up.

Tests: 11 new computeRecordCount cases (opts precedence, auto-detect shapes,
fallback behavior, no-spurious-warn, explicit-zero precedence).
seed-utils.test.mjs 18/18 + seed-utils-empty-data-failure.test.mjs 2/2 +
typecheck clean.
@vercel

vercel Bot commented Apr 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| worldmonitor | Ready | Preview, Comment | Apr 14, 2026 9:27am |


@greptile-apps
Contributor

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR fixes 16 phantom EMPTY_DATA health alerts by adding a payloadBytes > 0 → 1 fallback to runSeed()'s recordCount resolution and extracting that logic into a pure, unit-testable computeRecordCount() function. The root cause (seeders with custom data shapes falling through to recordCount: 0) and the fix are clearly diagnosed, the resolution order is preserved for all existing callers, and the console.warn on fallback gives authors a clear signal to add an explicit opts.recordCount.

Confidence Score: 5/5

  • Safe to merge — fix is correct, well-scoped, and all remaining findings are minor style/test suggestions.
  • No P0 or P1 issues found. The resolution chain preserves all existing behavior (explicit 0 wins, known shapes still detected correctly), the fallback is a provably-safe metadata-only write, and 18/18 tests pass. Both findings are P2: a harmless test function mutation and a missing edge-case test.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/_seed-utils.mjs | Extracts computeRecordCount() as a pure exported function and wires a payloadBytes>0 → 1 fallback into runSeed(); logic is correct and the onPhantomFallback callback pattern keeps the function testable without Redis |
| tests/seed-utils.test.mjs | 11 new computeRecordCount test cases covering explicit opts, shape auto-detect, fallback, and explicit-zero precedence; missing coverage for the empty-known-shape + payloadBytes>0 edge case (empty array should NOT trigger the fallback) |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[computeRecordCount called] --> B{opts.recordCount != null?}
    B -- Yes --> C{typeof === 'function'?}
    C -- Yes --> D[return opts.recordCount data]
    C -- No --> E[return opts.recordCount]
    B -- No --> F{Array.isArray data?}
    F -- Yes --> G[detectedFromShape = data.length]
    F -- No --> H{topicArticleCount ?? predictions.length\n?? events.length ?? earthquakes.length\n?? outages.length ?? fireDetections.length\n?? anomalies.length ?? threats.length\n?? quotes.length ?? stablecoins.length\n?? cables.length}
    H -- resolved --> I[detectedFromShape = value]
    H -- all undefined --> J[detectedFromShape = undefined]
    G --> K{detectedFromShape != null?}
    I --> K
    K -- Yes --> L[return detectedFromShape]
    K -- No / undefined --> M{payloadBytes > 0?}
    J --> M
    M -- Yes --> N[onPhantomFallback warn\nreturn 1]
    M -- No --> O[return 0]

    style N fill:#f9c,stroke:#c66
    style O fill:#fcc,stroke:#c66
    style D fill:#cfc,stroke:#6c6
    style E fill:#cfc,stroke:#6c6
    style L fill:#cfc,stroke:#6c6

Reviews (1): Last reviewed commit: "fix(seed-utils): payloadBytes>0 fallback..." | Re-trigger Greptile

Comment thread tests/seed-utils.test.mjs Outdated
);
});

it.each = undefined; // node:test doesn't have it.each; explicit cases below
Contributor


P2 Unnecessary mutation of imported test function

`it.each = undefined` assigns a property directly onto the `it` function object imported from node:test. The import refers to the same function object the runner exports (not a copy), so the assignment mutates that shared object and could affect any code that inspects `it.each`. A comment alone expresses the intent without touching the runner:

Suggested change
- it.each = undefined; // node:test doesn't have it.each; explicit cases below
+ // Note: node:test does not provide it.each explicit cases below

Comment thread tests/seed-utils.test.mjs
… empty-known-shape edge case

Greptile review on PR #3087 caught two minor test issues:

1. `it.each = undefined` mutated the imported `it` function (ES module
   live binding). Replaced with a plain comment.

2. Missing edge case: `data: { events: [] }` with payloadBytes > 0 should
   NOT trigger the payloadBytes fallback because detectedFromShape resolves
   to a real 0 (not undefined). Without this guard, a future regression
   could collapse the !=null check and silently mask genuine empty
   upstream cycles as "1 record". Test added.

Tests: 19/19 (was 18). No production code change.
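
The regression that the second test guards against is easiest to see side by side. In the sketch below, both function names are hypothetical and reduce the fallback step to two inputs: `resolveStrict` keeps the `!= null` check, while `resolveCollapsed` shows what a truthiness check would do to an empty known shape.

```javascript
// Hypothetical reductions of the fallback step, for illustration only.

// Keeps the `!= null` check: a detected 0 is a real answer.
function resolveStrict(detectedFromShape, payloadBytes) {
  if (detectedFromShape != null) return detectedFromShape;
  return payloadBytes > 0 ? 1 : 0;
}

// Deliberately buggy variant: 0 is falsy, so an empty known shape is
// treated like "no shape matched" and falls into the payloadBytes fallback.
function resolveCollapsed(detectedFromShape, payloadBytes) {
  if (detectedFromShape) return detectedFromShape;
  return payloadBytes > 0 ? 1 : 0;
}

// A quiet upstream cycle: the known key is present but the list is empty.
const data = { events: [] };
const detected = data.events?.length; // 0, not undefined
const payloadBytes = Buffer.byteLength(JSON.stringify(data)); // non-zero: the JSON wrapper itself has bytes

const strict = resolveStrict(detected, payloadBytes);       // stays 0: genuinely empty
const collapsed = resolveCollapsed(detected, payloadBytes); // becomes 1: emptiness is masked

console.log({ detected, payloadBytes, strict, collapsed });
```

Both variants agree when no shape matches (`undefined` with a non-empty payload yields 1), which is why only the empty-known-shape case exposes the difference.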
@koala73 koala73 merged commit 5610368 into main Apr 14, 2026
9 checks passed
@koala73 koala73 deleted the fix/runseed-recordcount-fallback branch April 14, 2026 09:28