Skip to content

Improve OIDC error diagnostics for CDN/WAF blocks#39

Merged
sverdlov93 merged 3 commits intomainfrom
fix/oidc-waf-error-diagnostics
Apr 3, 2026
Merged

Improve OIDC error diagnostics for CDN/WAF blocks#39
sverdlov93 merged 3 commits intomainfrom
fix/oidc-waf-error-diagnostics

Conversation

@sverdlov93
Copy link
Copy Markdown
Collaborator

Overview

When a CDN/WAF (e.g. CloudFront) intercepts and blocks the OIDC authentication request with a 403, the response body is an HTML error page — not JSON. Previously, fly-action logged this as OIDC failed 403, body: {}, which gave operators zero insight into why authentication failed. This made debugging WAF IP allowlist issues extremely painful.

What changed

Improved the error handling in authenticateOidc (the first and most critical HTTP call to Fly) to provide actionable diagnostics:

  1. Detect non-JSON responses — When JSON.parse fails on the response body, the raw body (truncated to 500 chars) is logged instead of an empty {}. This surfaces the actual CloudFront/WAF HTML error page.

  2. Log the blocking entity — The server or x-cache response header is logged on failure, making it immediately clear whether CloudFront, nginx, or the Fly service itself returned the error.

  3. Actionable hint for 403 + non-JSON — A specific error message suggests WAF IP allowlisting as the likely cause and points to the exact config location (saas-infrastructure/AWS/CDN-DIRECT/override/) where IPs are managed.

Why only oidc.ts?

fly-action makes two HTTP calls to Fly:

Call File Needs this fix?
/ci/start-oidc oidc.ts Yes — first contact with Fly, most likely to hit WAF blocks from new/unallowlisted runner IPs
/ci/end post.ts No — uses the tenant URL returned by successful OIDC, already has truncate(body) error logging, and never runs if OIDC fails

Context

This fix was prompted by investigating a staging CI failure where JFROG/ascii-frog release workflows failed with OIDC failed 403, body: {}. The root cause was a GHES self-hosted runner scale set (devf-dind-arm-scale-set-jlxw2) whose NAT IP wasn't in the staging WAF allowlist. The failure was deterministic per runner but invisible due to the unhelpful error message.

Example: Before vs After

Before:

Error: OIDC failed 403, body: {}

After:

Error: OIDC failed 403 from server: CloudFront
Error: Response is not JSON (likely a CDN/WAF error page). Raw body: <!DOCTYPE html><html>...403 Forbidden...</html>
Error: Hint: a 403 with a non-JSON body usually means a CDN/WAF (e.g. CloudFront) blocked the request. Check that this runner's IP is in the WAF allowlist. See: saas-infrastructure/AWS/CDN-DIRECT/override/ for IP configuration.

@sverdlov93 sverdlov93 added the bug Something isn't working label Apr 3, 2026
@sverdlov93 sverdlov93 self-assigned this Apr 3, 2026
@sverdlov93 sverdlov93 added the bug Something isn't working label Apr 3, 2026
When a CDN/WAF (e.g. CloudFront) blocks OIDC requests with a 403,
the response body is HTML — not JSON. Previously this was logged as
`body: {}` which gave no insight into the failure cause.

Now on OIDC failure:
- Detect non-JSON responses and log the raw (truncated) body
- Log the `server` / `x-cache` header to identify the blocking entity
- For 403 + non-JSON, print a hint about WAF IP allowlisting

Only oidc.ts needs this — post.ts (ci/end) uses the tenant URL from
a successful OIDC response and already has adequate error logging.
@sverdlov93 sverdlov93 force-pushed the fix/oidc-waf-error-diagnostics branch from 38a5512 to 84584a5 Compare April 3, 2026 15:19
Frogbot v2.32.0 (PR jfrog/frogbot#1262) now verifies that the
workflow file contains 'environment: frogbot' before running
scan-pull-request. Without it, all PR scans fail immediately.
- frogbot.yml: split into two jobs so environment: frogbot only
  gates PR scans (push-to-main no longer requires reviewer approval)
- unit-tests.yml: remove deprecated 'url' input (tenant resolved via OIDC)
- release.yml: add explicit permissions block (contents: write)
- deps: fix brace-expansion DoS vulnerability (npm audit fix)
- deps: bump @typescript-eslint, eslint, @types/node to latest
@sverdlov93 sverdlov93 merged commit a8a67ba into main Apr 3, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant