Skip to content

fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate#3082

Merged
koala73 merged 2 commits into
mainfrom
fix/fuel-prices-resilience
Apr 14, 2026
Merged

fix(fuel-prices): resilient seeder — proxy, retry, stale-carry-forward, strict gate#3082
koala73 merged 2 commits into
mainfrom
fix/fuel-prices-resilience

Conversation

@koala73
Copy link
Copy Markdown
Owner

@koala73 koala73 commented Apr 14, 2026

Depends on #3078 (uses the emptyDataIsFailure: true opt-in added there).

Why

Railway log 2026-04-07 showed seed-fuel-prices running with 4 of 7 sources failing:

[err] [NZ] fetchNewZealand error: HTTP 403
[err] [BR] gas CSV failed: fetch failed
[err] [BR] dsl CSV failed: fetch failed
[err] [MX] fetchMexico error: fetch failed

The seeder still published 30 countries — with Brazil, Mexico, and New Zealand silently disappearing from the UI. The prior validator was countries.length >= 1, MAX_DROP_PCT=50 only warned, and :prev got rotated anyway which would have poisoned next week's WoW calc. Not up to our standards.

Root causes

Issue Fix
fetchWithProxyFallback needed PROXY_URL set but never logged whether it was Startup [PROXY] configured/NOT configured diagnostic line
Direct-first order wasted time failing on known-blocked hosts New fetchWithProxyPreferred (proxy-first) for NZ/BR/MX
No retry anywhere — one transient glitch = source dropped withFuelRetry(label, fn) — 3 attempts, 1.5s/3s backoff
MX api.datos.gob.mx/v2/precio.gasolina.publico went unresponsive (IPv4 connect hangs globally — verified from residential IPs) Swap to CRE publicacionexterna.azurewebsites.net/publicaciones/prices XML; parses 13k regular + 10k diesel stations
Failed-source countries silently dropped Stale-carry-forward from :prev with stale: true + original observedAt
:prev rotated after partial runs → poisons WoW for a failed source's countries Only rotate :prev when ALL sources succeeded
"Cheapest gasoline" could be a week-old stale-carried price Rank against freshCountries only
WoW computed against stale-carried entries = 0% vs self Skip WoW for stale entries
validateFn: countries.length >= 1 accepted anything >= 25 AND US+GB+MY present as FRESH (not stale-carried)
Validation-fail path refreshed seed-meta and blocked retries emptyDataIsFailure: true (from #3078)

Testability

Wrapped imperative body in async function main() with isMain guard (per memory rule feedback_seed_isMain_guard.md) so test imports don't trigger Redis calls. Exports parseCREStationPrices + validateFuel.

Test plan

  • node --test tests/seed-fuel-prices.test.mjs — 9/9 pass (XML parse, range filter, validator contract for missing/stale critical)
  • npm run test:data — 5183/5183 pass
  • npm run typecheck / typecheck:api / lint / lint:md / version:check
  • Local smoke run: [MX] 13429 stations, [GB] [MY] [US] [EU] 27 countries all working via proxy-fallback
  • Post-merge: verify next Railway seed-fuel-prices run logs [PROXY] configured and returns all 7 sources or stale-carries the failed ones
  • Post-merge: verify first partial-failure run logs [:prev] Skipping rotation and the :prev key retains the previous fresh snapshot

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
worldmonitor Ready Ready Preview, Comment Apr 14, 2026 5:16am

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR hardens the fuel-price seeder against the four failure modes logged on Railway 2026-04-07: proxy-first fetching for NZ/BR/MX, a withFuelRetry wrapper (3 attempts, 1.5s/3s linear backoff), stale-carry-forward from :prev when a source fails, selective :prev rotation (only on fully-clean runs to protect WoW integrity), a stricter validateFuel gate (≥25 countries + US/GB/MY present as fresh), and the MX data source swapped from the unresponsive datos.gob.mx API to the CRE Azure XML feed. The isMain guard and exported helpers (parseCREStationPrices, validateFuel) are also added so the test file can import without triggering Redis calls.

Confidence Score: 5/5

  • Safe to merge — all findings are P2 style/maintenance concerns with no correctness impact in the current state.
  • The stale-carry-forward logic, selective :prev rotation, and stricter validation gate are all correct. The three findings are P2: a DRY issue where fetchMexico duplicates parseCREStationPrices inline (identical logic today, but the test suite exercises the exported copy rather than the production path), an unindented main() body, and a minor comment inaccuracy on backoff timing. None affect runtime correctness.
  • scripts/seed-fuel-prices.mjs — specifically the relationship between parseCREStationPrices (exported, tested) and the inline duplicate inside fetchMexico (production path).

Important Files Changed

Filename Overview
scripts/seed-fuel-prices.mjs Core seeder rewrite: adds proxy-first fetching for NZ/BR/MX, per-source retry, stale-carry-forward from :prev, strict validateFuel gate (≥25 countries + US/GB/MY fresh), and isMain guard. Main finding: parseCREStationPrices is exported for testing but fetchMexico duplicates the logic inline instead of calling it — the test suite doesn't cover the production code path. main() body also lacks indentation.
tests/seed-fuel-prices.test.mjs New test file — 9 tests covering parseCREStationPrices (extract, range filter, empty XML) and validateFuel (count floor, missing critical, stale-critical, healthy, mixed-stale, null inputs). All contracts are well-specified; only gap is that the tested parseCREStationPrices path is not the one actually called by fetchMexico.

Sequence Diagram

sequenceDiagram
    participant Cron as Railway Cron
    participant Main as main()
    participant Sources as Fetch Sources (×7)
    participant Proxy as Proxy (fetchWithProxyPreferred)
    participant Redis as Redis (Upstash)

    Cron->>Main: seed-fuel-prices.mjs
    Main->>Redis: readSeedSnapshot(:prev)
    Redis-->>Main: prevSnapshot (or null)
    Main->>Sources: Promise.allSettled([MY, MX, US, EU, BR, NZ, GB])
    Note over Sources,Proxy: NZ/BR/MX use fetchWithProxyPreferred<br/>+ withFuelRetry (3 attempts, 1.5s/3s)
    Sources->>Proxy: proxy-first fetch
    Proxy-->>Sources: response or fallback to direct
    Sources-->>Main: fulfilled/rejected results

    alt Some sources failed
        Main->>Redis: lookup failedSources in prevSnapshot
        Redis-->>Main: prev entries for failed-source countries
        Note over Main: mark as stale:true, preserve observedAt
    end

    Note over Main: Skip WoW for stale entries<br/>Rank cheapest/expensive from freshCountries only

    Main->>Main: "validateFuel(data)<br/>(≥25 countries + US/GB/MY fresh)"

    alt validateFuel passes
        Main->>Redis: SET canonical key
        alt allSourcesFresh AND wowAvailable
            Main->>Redis: SET :prev key (rotate snapshot)
        else partial failure
            Note over Main: [:prev] Skipping rotation
        end
        Main->>Redis: writeFreshnessMetadata (seed-meta)
    else validateFuel fails + emptyDataIsFailure
        Main->>Redis: extendExistingTtl (no seed-meta refresh)
    end
Loading

Reviews (1): Last reviewed commit: "fix(fuel-prices): resilient seeder — pro..." | Re-trigger Greptile

Comment on lines +604 to +609
export function parseCREStationPrices(xml) {
const re = (type) => new RegExp(`<gas_price\\s+type="${type}">([\\d.]+)</gas_price>`, 'g');
const collect = (type) => [...xml.matchAll(re(type))].map(m => parseFloat(m[1]))
.filter(v => Number.isFinite(v) && v > 5 && v < 100);
return { regular: collect('regular'), diesel: collect('diesel') };
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 parseCREStationPrices not called from fetchMexico

The exported function duplicates the inline parsing logic inside fetchMexico (lines ~231–236) rather than calling it. The comment says "Used by fetchMexico" but fetchMexico owns its own re/collect closures with the same regex and 5 < v < 100 filter. This means the unit tests in seed-fuel-prices.test.mjs validate the exported helper, not the actual production code path. If the price bounds or regex are adjusted in one copy, the other will silently diverge.

Suggested change
export function parseCREStationPrices(xml) {
const re = (type) => new RegExp(`<gas_price\\s+type="${type}">([\\d.]+)</gas_price>`, 'g');
const collect = (type) => [...xml.matchAll(re(type))].map(m => parseFloat(m[1]))
.filter(v => Number.isFinite(v) && v > 5 && v < 100);
return { regular: collect('regular'), diesel: collect('diesel') };
}
export function parseCREStationPrices(xml) {
const re = (type) => new RegExp(`<gas_price\\s+type="${type}">([\\d.]+)</gas_price>`, 'g');
const collect = (type) => [...xml.matchAll(re(type))].map(m => parseFloat(m[1]))
.filter(v => Number.isFinite(v) && v > 5 && v < 100);
return { regular: collect('regular'), diesel: collect('diesel') };
}

And in fetchMexico, replace the duplicated block with:

const { regular, diesel } = parseCREStationPrices(xml);


async function sleep(ms) { return new Promise(r => setTimeout(r, ms)); }

// Retry wrapper: 3 attempts, 1.5s/3s/4.5s backoff. Use for all upstream calls.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Backoff comment lists a delay that never fires

The comment says "1.5s/3s/4.5s backoff" but with tries = 3, the if (i < tries) guard means the loop only sleeps between attempts 1→2 (1500ms) and 2→3 (3000ms). The 4.5s sleep (i=3, delay=4500ms) is never reached because after the third attempt fails the function throws immediately.

Suggested change
// Retry wrapper: 3 attempts, 1.5s/3s/4.5s backoff. Use for all upstream calls.
// Retry wrapper: 3 attempts, 1.5s/3s backoff. Use for all upstream calls.
async function withFuelRetry(label, fn, { tries = 3 } = {}) {

Comment on lines +621 to 622
async function main() {
const prevSnapshot = await readSeedSnapshot(`${CANONICAL_KEY}:prev`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 main() body has no indentation

The entire ~220-line function body sits at column 0, making it visually indistinguishable from module-level code. The wrapping was done without re-indenting the existing block. While JavaScript doesn't require indentation, this makes it hard to spot the function boundary and to reason about scope (e.g., staleCarried, failedSources, countryMap are local to main() but look global).

Consider re-indenting the body with two spaces to match the rest of the file's style.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@koala73
Copy link
Copy Markdown
Owner Author

koala73 commented Apr 14, 2026

Addressed both P1s in b899faa.

On the stale-carry-forward freshness bug: agreed — inserting stale:true rows into the published payload without proto/panel support is worse than the original "country vanishes for a week" behavior. Ripped the carry-forward logic out entirely.

On the validator masking outages: tightened the gate to reject ANY partial failure:

  • countries.length >= 30 (was 25; with carry-forward gone, 27 EU + 6 non-EU is the healthy baseline)
  • US + GB + MY present
  • failedSources.length === 0

New failure model: partial run → validator rejects → atomicPublish skips → emptyDataIsFailure: true skips seed-meta refresh → cache TTL (10 days) serves last healthy snapshot → health.js flips to STALE_SEED once maxStaleMin elapses. Visible signal, no silent degradation.

Dead code removed: SOURCE_COUNTRY_CODES, staleCarried, freshCountryCount, the stale-WoW skip.

Tests updated to cover the failedSources gate. 10/10 pass, full suite 5184/5184.

koala73 added 2 commits April 14, 2026 09:12
…d, strict gate

Addresses 2026-04-07 run where 4 of 7 sources failed (NZ 403, BR/MX fetch failed)
and the seeder silently published 30 countries with Brazil/Mexico/NZ vanishing
from the UI.

- Startup proxy diagnostic so PROXY_URL misconfigs are immediately visible.
- New fetchWithProxyPreferred (proxy-first, direct fallback) + withFuelRetry
  (3 attempts, backoff) wrapping NZ/BR/MX upstream calls.
- Swap MX from dead datos.gob.mx to CRE publicacionexterna XML (13k stations).
- Stale-carry-forward failed sources from :prev snapshot (stale: true) instead
  of dropping countries; fresh-only ranking; skip WoW for stale entries.
- Gate :prev rotation on all-sources-succeeded so partial runs don't poison
  next week's WoW.
- Strict validateFn: >=25 countries AND US+GB+MY fresh. Prior gate was >=1.
- emptyDataIsFailure: true so validation fail doesnt refresh seed-meta.
- Wrap imperative body in main() + isMain guard; export parseCREStationPrices
  and validateFuel; 9 new unit tests.
…view)

Reviewer flagged two P1s on the prior commit:

1. stale-carry-forward inserted stale: true rows into the published payload,
   but the proto schema and panel have no staleness render path. Users would
   see week-old BR/MX/NZ prices as current. Resilience turned into a
   freshness bug.
2. Validator counted stale-carried entries toward the floor. US/GB/MY fresh
   + 22 stale still passed, refreshing seed-meta.fetchedAt and leaving health
   operationally healthy indefinitely. Hid the outage.

Fix: remove stale-carry-forward entirely. Tighten validator to require
countries.length >= 30, US+GB+MY present, and failedSources.length === 0.
Partial-failure runs now rejected → 10-day cache TTL serves last healthy
snapshot → health STALE_SEED after maxStaleMin. Correct, visible signal.

Drops dead code: SOURCE_COUNTRY_CODES, staleCarried/freshCountries, stale
WoW skip. Tests updated for the failedSources gate.
@koala73 koala73 force-pushed the fix/fuel-prices-resilience branch from b899faa to 7614c3b Compare April 14, 2026 05:14
@koala73 koala73 merged commit 7e7ca70 into main Apr 14, 2026
9 of 10 checks passed
@koala73 koala73 deleted the fix/fuel-prices-resilience branch April 14, 2026 05:16
koala73 added a commit that referenced this pull request Apr 14, 2026
Brazil gov.br is structurally unreachable from Railway IPs:
- Decodo proxy 403s all .gov.br CONNECTs by policy
- Direct fetch fails undici TLS handshake from Railway egress

After PR #3082 tightened the publish gate to require zero failed sources,
every run exits 1 -> Railway "Deployment crashed" banner + STALE_SEED.

Add TOLERATED_FAILURES = {'Brazil'}; validateFuel ignores tolerated names
when checking failedSources. Critical regions (US/GB/MY) and the >=30
country floor still gate publish. Brazil's outage stays visible via the
existing [FRESHNESS] log.
koala73 added a commit that referenced this pull request Apr 14, 2026
…oop (#3085)

* fix(fuel-prices): tolerate Brazil ANP failure to stop Railway crash-loop

Brazil gov.br is structurally unreachable from Railway IPs:
- Decodo proxy 403s all .gov.br CONNECTs by policy
- Direct fetch fails undici TLS handshake from Railway egress

After PR #3082 tightened the publish gate to require zero failed sources,
every run exits 1 -> Railway "Deployment crashed" banner + STALE_SEED.

Add TOLERATED_FAILURES = {'Brazil'}; validateFuel ignores tolerated names
when checking failedSources. Critical regions (US/GB/MY) and the >=30
country floor still gate publish. Brazil's outage stays visible via the
existing [FRESHNESS] log.

* fix(fuel-prices): rotate :prev on tolerated-only failures to keep WoW fresh

Reviewer catch: after tolerating Brazil, allSourcesFresh stays false forever
→ :prev never rotates → panel's WoW stretches into 2-week, 3-week, ... deltas
for every non-Brazil country while still labeled 'week-over-week'.

Gate :prev rotation on untolerated failures only. Tolerated sources are
absent from the snapshot entirely, so rotating is safe (no stale-self-
compare poisoning next week).

* fix(fuel-prices): distinguish tolerated vs untolerated sources in [DEGRADED] log

Greptile P2: the [DEGRADED] message said 'publish will be rejected' even
when only tolerated sources (Brazil) failed — confusing for operators
watching Railway logs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant