fix(relay): prevent Polymarket OOM crash via request deduplication #513
Concurrent Polymarket requests for the same cache key each fired independent https.get() calls. With 12 categories × multiple clients, 740 requests piled up in 10s, all buffering response bodies → 4.1GB heap → OOM crash on Railway. Fix: in-flight promise map deduplicates concurrent requests to the same cache key. 429/error responses are negative-cached for 30s to prevent retry storms.
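The in-flight deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the relay's actual code: `inflight`, `dedupedFetch`, and the `fetchUpstream` callback are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch of in-flight request coalescing (names are assumptions).
const inflight = new Map(); // cacheKey -> Promise for the pending upstream fetch

function dedupedFetch(cacheKey, fetchUpstream) {
  const pending = inflight.get(cacheKey);
  if (pending) return pending; // join the request that is already in flight

  const promise = Promise.resolve()
    .then(() => fetchUpstream(cacheKey))
    // Clear the slot whether the fetch succeeded or failed, so the map
    // returns to size 0 once nothing is pending.
    .finally(() => inflight.delete(cacheKey));

  inflight.set(cacheKey, promise);
  return promise;
}
```

With this shape, any number of concurrent callers for one cache key share a single upstream request and a single buffered response body, which is what bounds heap growth under load.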
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 884fe3d8d8
    if (response.statusCode !== 200) {
      console.error(`[Relay] Polymarket upstream ${response.statusCode}`);
      response.resume();
      polymarketCache.set(cacheKey, { data: '[]', timestamp: Date.now() - POLYMARKET_CACHE_TTL_MS + POLYMARKET_NEG_TTL_MS });
Avoid replacing stale cache with empty payload
When the upstream call fails, this line unconditionally overwrites polymarketCache with [], so any previously cached market data is lost for subsequent requests. In the failure path, only the request already holding cached can return stale data; the next request hits this new entry and serves an empty list instead, even though stale data existed moments earlier. This creates user-visible data dropouts during transient 429/5xx periods.
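One possible way to address this, sketched under assumptions: on failure, keep whatever payload was previously cached and record the retry-suppression window in a separate field instead of overwriting the data. The names `cache`, `NEG_TTL_MS`, `recordUpstreamFailure`, and the `retryAfter` field are hypothetical and do not come from the PR.

```javascript
// Hypothetical sketch: preserve stale data when the upstream call fails.
const cache = new Map(); // cacheKey -> { data, timestamp, retryAfter }
const NEG_TTL_MS = 30_000; // assumed 30s retry-suppression window

function recordUpstreamFailure(cacheKey, now = Date.now()) {
  const previous = cache.get(cacheKey);
  const entry = {
    // Keep the last good payload if there is one; fall back to an empty list.
    data: previous ? previous.data : '[]',
    // Leave the original fetch time untouched so staleness stays measurable.
    timestamp: previous ? previous.timestamp : 0,
    // Do not retry upstream before this time.
    retryAfter: now + NEG_TTL_MS,
  };
  cache.set(cacheKey, entry);
  return entry;
}
```

Requests arriving during the suppression window would then be served the stale payload rather than an empty list.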
    if (response.statusCode !== 200) {
      console.error(`[Relay] Polymarket upstream ${response.statusCode}`);
      response.resume();
      polymarketCache.set(cacheKey, { data: '[]', timestamp: Date.now() - POLYMARKET_CACHE_TTL_MS + POLYMARKET_NEG_TTL_MS });
Prevent public 120s caching of negative-cache responses
Encoding negative cache by backdating timestamp makes the entry look like a normal cache hit, so handlePolymarketRequest serves it with Cache-Control: public, max-age=120. Although the in-process entry is intended to last 30s, downstream CDN/browser caches can retain the empty [] response for 120s, extending outage impact beyond the intended retry-suppression window.
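A sketch of one way to keep negative entries out of shared caches, assuming each cache entry carries an explicit `negative` flag rather than a backdated timestamp. The `negative` field and `cacheControlFor` helper are hypothetical, and `no-store` is only one reasonable choice; a short `max-age` matching the 30s window would be another.

```javascript
// Hypothetical sketch: choose the Cache-Control header per entry type.
function cacheControlFor(entry) {
  // Negative (failure) entries must not be retained by CDNs or browsers.
  if (entry.negative) return 'no-store';
  // Normal hits keep the 120s public caching mentioned in the review.
  return 'public, max-age=120';
}
```

The handler would then pass the result to `res.setHeader('Cache-Control', ...)` instead of using a fixed value for every cache hit.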
Summary
Railway relay crashed with OOM (4.1GB heap) due to Polymarket request flooding:
- Concurrent requests for the same cache key each fired an independent https.get() to Gamma API. With 12 categories × multiple clients, 740 requests piled up in 10s, all buffering response bodies simultaneously.
- An in-flight promise map (polymarketInflight) coalesces concurrent requests for the same cache key into a single upstream fetch. 429/error responses are negative-cached for 30s to prevent retry storms.

Changes
- scripts/ais-relay.cjs: extracted fetchPolymarketUpstream() returning a Promise, added polymarketInflight Map for dedup, added 30s negative cache on failure

Test plan
- polymarketInflight: 0 at rest