fix(seed-utils): payloadBytes>0 fallback for runSeed recordCount auto-detect #3087
…-detect

Phantom EMPTY_DATA in /api/health: 16 of 21 failing health checks were caused by seeders publishing custom payload shapes without passing opts.recordCount. The auto-detect chain in runSeed only matches a hardcoded list of shapes; anything else falls through to recordCount=0 and triggers EMPTY_DATA in /api/health even though the payload is fully populated and verified in Redis.

Smoking-gun log signature from Railway 2026-04-14:

[BLS-Series] recordCount:0, payloadBytes:6093, Verified: data present
[VPD-Tracker] recordCount:0, payloadBytes:3068853, Verified: data present
[Disease-Outbreaks] recordCount:0, payloadBytes:92684, Verified: data present

Fix:
- Extract recordCount logic into pure exported computeRecordCount() for unit testability.
- Add payloadBytes>0 → 1 fallback at the end of the resolution chain. When triggered, console.warn names the seeder so the author can add an explicit opts.recordCount for accurate dashboards.
- Resolution order unchanged for existing callers: opts.recordCount wins, then known-shape auto-detect, then the new payloadBytes fallback, then 0. Explicit opts.recordCount=0 still wins (test covers it).

Effect: clears 16 phantom CRITs on the next bundle cycle. Per-seeder warns will surface in logs so we can add accurate opts.recordCount in a follow-up.

Tests: 11 new computeRecordCount cases (opts precedence, auto-detect shapes, fallback behavior, no-spurious-warn, explicit-zero precedence). seed-utils.test.mjs 18/18 + seed-utils-empty-data-failure.test.mjs 2/2 + typecheck clean.
Greptile Summary
This PR fixes 16 phantom …
Confidence Score: 5/5
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[computeRecordCount called] --> B{opts.recordCount != null?}
B -- Yes --> C{typeof === 'function'?}
C -- Yes --> D["return opts.recordCount(data)"]
C -- No --> E[return opts.recordCount]
B -- No --> F{Array.isArray data?}
F -- Yes --> G[detectedFromShape = data.length]
F -- No --> H{topicArticleCount ?? predictions.length\n?? events.length ?? earthquakes.length\n?? outages.length ?? fireDetections.length\n?? anomalies.length ?? threats.length\n?? quotes.length ?? stablecoins.length\n?? cables.length}
H -- resolved --> I[detectedFromShape = value]
H -- all undefined --> J[detectedFromShape = undefined]
G --> K{detectedFromShape != null?}
I --> K
K -- Yes --> L[return detectedFromShape]
K -- No / undefined --> M{payloadBytes > 0?}
J --> M
M -- Yes --> N[onPhantomFallback warn\nreturn 1]
M -- No --> O[return 0]
style N fill:#f9c,stroke:#c66
style O fill:#fcc,stroke:#c66
style D fill:#cfc,stroke:#6c6
style E fill:#cfc,stroke:#6c6
style L fill:#cfc,stroke:#6c6
```
Reviews (1): Last reviewed commit: "fix(seed-utils): payloadBytes>0 fallback..."
```js
it.each = undefined; // node:test doesn't have it.each; explicit cases below
```
Unnecessary mutation of imported test function
`it.each = undefined` assigns a property directly onto the `it` function object imported from `node:test`. Since ES module imports are live bindings (not copies), this mutates the actual function object exported by `node:test`, and could technically affect any other code that imports `it` and inspects `it.each`. A comment alone expresses the intent without touching the runner:
```diff
- it.each = undefined; // node:test doesn't have it.each; explicit cases below
+ // Note: node:test does not provide it.each — explicit cases below
```
… empty-known-shape edge case

Greptile review on PR #3087 caught two minor test issues:

1. `it.each = undefined` mutated the imported `it` function (ES module live binding). Replaced with a plain comment.
2. Missing edge case: `data: { events: [] }` with payloadBytes > 0 should NOT trigger the payloadBytes fallback because detectedFromShape resolves to a real 0 (not undefined). Without this guard, a future regression could collapse the `!= null` check and silently mask genuine empty upstream cycles as "1 record". Test added.

Tests: 19/19 (was 18). No production code change.
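The regression that the new `{ events: [] }` test guards against is easy to reproduce in isolation. In this sketch, two hypothetical resolvers (`resolveWithNullCheck` and `resolveWithTruthyCheck`, not functions from the repo) differ only in how they decide whether auto-detect found a count. Note that even an empty known shape serializes to a non-zero number of bytes, so `payloadBytes > 0` alone cannot distinguish "empty" from "unknown shape".

```javascript
// Hypothetical resolvers: only the "did auto-detect find a count?" test differs.
function resolveWithNullCheck(detected, payloadBytes) {
  if (detected != null) return detected; // keeps a genuine 0
  return payloadBytes > 0 ? 1 : 0;
}

function resolveWithTruthyCheck(detected, payloadBytes) {
  if (detected) return detected; // regression: 0 is treated as "not found"
  return payloadBytes > 0 ? 1 : 0;
}

const payload = { events: [] };
// '{"events":[]}' is 13 ASCII characters, so length equals byte count here.
const payloadBytes = JSON.stringify(payload).length;
const detected = payload.events?.length; // 0: known shape, genuinely empty

console.log(resolveWithNullCheck(detected, payloadBytes));   // 0: empty cycle surfaces
console.log(resolveWithTruthyCheck(detected, payloadBytes)); // 1: masked as "1 record"
```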
Why this PR?
The health dashboard at 2026-04-14 08:44 UTC reported 18 EMPTY_DATA CRITs in /api/health. Cross-referencing 12 Railway seeder logs showed that 16 of those CRITs are phantoms: the seeders ran on schedule, wrote payloads to Redis, and emitted `Verified: data present in Redis` right alongside the line that reported `recordCount:0`.
Smoking-gun signature (BLS-Series example):
```
[BLS-Series] {"event":"seed_complete","recordCount":0,"durationMs":3925,"payloadBytes":6093}
[BLS-Series] Verified: data present in Redis
```
VPD-Tracker is the most striking: 3 MB of payload, count reported as 0.
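The phantom signature can be flagged mechanically when cross-referencing logs. A minimal sketch, assuming each `seed_complete` event is printed as JSON after a `[Seeder-Name] ` prefix and the verification message appears on its own line, as in the BLS-Series excerpt above. `findPhantomEmpty` is a hypothetical helper, not part of the repo.

```javascript
// Hypothetical helper: flags seeders whose seed_complete event reports
// recordCount:0 while payloadBytes > 0 and Redis verification succeeded.
// Assumes the "[Name] {json}" / "[Name] Verified: ..." line format shown above.
function findPhantomEmpty(logText) {
  const completes = new Map(); // seeder name -> parsed seed_complete event
  const verified = new Set();  // seeders that logged "Verified: data present"

  for (const line of logText.split('\n')) {
    const match = line.match(/^\[([^\]]+)\]\s+(.*)$/);
    if (!match) continue;
    const [, name, rest] = match;

    if (rest.startsWith('{')) {
      try {
        const event = JSON.parse(rest);
        if (event.event === 'seed_complete') completes.set(name, event);
      } catch {
        // Not valid JSON; ignore the line.
      }
    } else if (rest.startsWith('Verified: data present')) {
      verified.add(name);
    }
  }

  const phantoms = [];
  for (const [name, event] of completes) {
    if (event.recordCount === 0 && event.payloadBytes > 0 && verified.has(name)) {
      phantoms.push({ name, payloadBytes: event.payloadBytes });
    }
  }
  return phantoms;
}

const sample = [
  '[BLS-Series] {"event":"seed_complete","recordCount":0,"durationMs":3925,"payloadBytes":6093}',
  '[BLS-Series] Verified: data present in Redis',
].join('\n');

console.log(findPhantomEmpty(sample)); // [ { name: 'BLS-Series', payloadBytes: 6093 } ]
```

A genuine empty cycle (recordCount 0 and payloadBytes 0) is not flagged, so the helper separates phantoms from real outages.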
Root cause
`scripts/_seed-utils.mjs` `runSeed()` auto-detects `recordCount` from a hardcoded list of payload shapes:
```js
Array.isArray(data) ? data.length
: (topicArticleCount
?? data?.predictions?.length
?? data?.events?.length ?? data?.earthquakes?.length ?? data?.outages?.length
?? data?.fireDetections?.length ?? data?.anomalies?.length ?? data?.threats?.length
?? data?.quotes?.length ?? data?.stablecoins?.length
?? data?.cables?.length ?? 0);
```
If a seeder publishes a custom shape (`{score, inputs}` for fear-greed, `{geopolitical, tech}` for prediction-markets, `{primaryTitle, ...}` per-topic for insights, etc.) AND doesn't pass `opts.recordCount`, the chain falls through to 0. seed-meta is written with `{fetchedAt, recordCount: 0}`. health.js reads this and flips to EMPTY_DATA.
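To make the fall-through concrete, here is the pre-fix auto-detect expression from `runSeed()` wrapped in a function. The wrapper name `legacyAutoDetect` and its parameters are illustrative, and the payloads are simplified stand-ins for the fear-greed and prediction-markets shapes named above.

```javascript
// Illustrative wrapper around the pre-fix auto-detect expression quoted above.
function legacyAutoDetect(data, topicArticleCount) {
  return Array.isArray(data) ? data.length
    : (topicArticleCount
      ?? data?.predictions?.length
      ?? data?.events?.length ?? data?.earthquakes?.length ?? data?.outages?.length
      ?? data?.fireDetections?.length ?? data?.anomalies?.length ?? data?.threats?.length
      ?? data?.quotes?.length ?? data?.stablecoins?.length
      ?? data?.cables?.length ?? 0);
}

// Known shapes resolve to a real count.
console.log(legacyAutoDetect({ events: [{}, {}, {}] })); // 3
console.log(legacyAutoDetect([1, 2]));                    // 2

// Custom shapes (simplified stand-ins) fall through to the trailing `?? 0`,
// even though the payload is clearly non-empty.
console.log(legacyAutoDetect({ score: 42, inputs: { vix: 18 } })); // 0
console.log(legacyAutoDetect({ geopolitical: [{}], tech: [{}] })); // 0
```

The trailing `?? 0` is what turns "shape not recognized" into the same value as "shape recognized and empty", which is why health.js cannot tell the two apart.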
Audit: of 19 failing-health seeders, only 2 pass `opts.recordCount` to `runSeed` (`seed-spr-policies`, `seed-owid-energy-mix`). The other 17 rely on auto-detect.
Fix
Add a final `payloadBytes > 0 → 1` fallback to the resolution chain. When triggered, `console.warn` names the seeder so the author can add an explicit `opts.recordCount` for accurate dashboards.
Also extracted the resolution logic into a pure exported `computeRecordCount()` function so it can be unit-tested without a real Redis connection.
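Resolution order (unchanged for existing callers):
1. `opts.recordCount` (a number, or a function called with `data`)
2. Known-shape auto-detect
3. New: `payloadBytes > 0` → 1, with a `console.warn` naming the seeder
4. 0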
Explicit `opts.recordCount: 0` still wins (test covers it) — for cases like `seed-owid-energy-mix` which deliberately reports 0.
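A minimal sketch of the resolution order this PR describes, modeled on the Greptile flowchart. The options-object signature, the parameter names, and the `onPhantomFallback` callback shape are assumptions; the real `computeRecordCount()` in `scripts/_seed-utils.mjs` may be structured differently.

```javascript
// Sketch only: parameter names and signature are assumptions, not the repo's API.
function computeRecordCount({
  opts,
  data,
  topicArticleCount,
  payloadBytes = 0,
  onPhantomFallback = () => {},
} = {}) {
  const explicit = opts?.recordCount;

  // 1. An explicit opts.recordCount wins, including an explicit 0.
  //    A function receives the payload and computes the count itself.
  if (explicit != null) {
    return typeof explicit === 'function' ? explicit(data) : explicit;
  }

  // 2. Known-shape auto-detect. There is no trailing `?? 0`, so an
  //    unrecognized shape stays undefined instead of collapsing to 0.
  const detectedFromShape = Array.isArray(data)
    ? data.length
    : (topicArticleCount
      ?? data?.predictions?.length
      ?? data?.events?.length ?? data?.earthquakes?.length ?? data?.outages?.length
      ?? data?.fireDetections?.length ?? data?.anomalies?.length ?? data?.threats?.length
      ?? data?.quotes?.length ?? data?.stablecoins?.length
      ?? data?.cables?.length);

  // `!= null` keeps a genuine 0 from a known-but-empty shape.
  if (detectedFromShape != null) return detectedFromShape;

  // 3. Unknown shape but a non-empty payload: report 1 and let the
  //    caller warn so the seeder author can add opts.recordCount.
  if (payloadBytes > 0) {
    onPhantomFallback();
    return 1;
  }

  // 4. Nothing to go on.
  return 0;
}

// Usage (seeder name and payload are illustrative).
const warned = [];
const warnFor = (seeder) => () => warned.push(seeder); // stands in for console.warn

computeRecordCount({ opts: { recordCount: 0 }, payloadBytes: 500 });   // 0: explicit zero wins
computeRecordCount({ data: { events: [{}, {}] }, payloadBytes: 500 }); // 2: known shape
computeRecordCount({ data: { events: [] }, payloadBytes: 13 });        // 0: real empty, no fallback
computeRecordCount({
  data: { score: 42, inputs: {} },
  payloadBytes: 120,
  onPhantomFallback: warnFor('fear-greed'),
}); // 1, and warned is ['fear-greed']
```

Passing the warning in as a callback is one way to keep the function pure, so a test can assert "no spurious warn" without stubbing `console`.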
Files
Effect
Testing
Post-Deploy Monitoring & Validation
Related