Skip to content

fix(security): carry SAML pending-session id in RelayState#28760

Merged
harshach merged 4 commits into
mainfrom
harshach/saml-relaystate-session-fix
Jun 6, 2026
Merged

fix(security): carry SAML pending-session id in RelayState#28760
harshach merged 4 commits into
mainfrom
harshach/saml-relaystate-session-fix

Conversation

@harshach
Copy link
Copy Markdown
Collaborator

@harshach harshach commented Jun 5, 2026

Describe your changes:

Fixes #28763

SAML SSO logins fail with 401 "No pending session" on the IdP callback because that callback is a cross-site POST, so the browser drops the SameSite=Lax OM_SESSION cookie that the server-side pending session (introduced by #26314) depends on. I carry the pending-session id in the SAML RelayState instead — a POST-body field immune to SameSite/transport rules, already used by the MCP-over-SAML flow — falling back to the cookie for backward compatibility. This is the second of the two SAML regressions from #26314; the redirect-allowlist one is fixed separately in #28695.

Type of change:

  • Bug fix

High-level design:

  • handleLogin now passes the pending-session id as the SAML RelayState (auth.login(pendingSession.getId())).
  • handleCallback resolves the pending session from RelayState first, via a new cookie-free SessionService.getPendingSessionById, and only falls back to the OM_SESSION cookie when RelayState is absent/stale — so same-site deployments and in-flight logins keep working.
  • The pending id is single-use (a fresh active id is minted at activation) and its redirect URI is re-validated against the allowlist, so carrying it in RelayState is safe; a forged RelayState cannot escalate (the signed assertion governs identity).
  • A new keycloak-azure-saml-crosssite nightly variant fronts the IdP on 127.0.0.1 while OM stays on localhost — a different site (SameSite ignores port), so the callback POST is genuinely cross-site and the cookie drops. The existing localhost-only job is same-site and structurally cannot reproduce the bug.

Tests:

Use cases covered

  • SAML login via an external IdP whose callback POST is cross-site (browser drops the Lax OM_SESSION cookie) now completes instead of failing with "No pending session".
  • Same-site / in-flight logins still resolve through the cookie fallback.

Unit tests

  • Added.
  • SamlAuthServletHandlerTest (13/13 pass): handleLogin carries the pending id in RelayState; resolvePendingSession prefers RelayState (cookie-free) and falls back to the cookie.
  • SessionServiceTest (28/28 pass): getPendingSessionById resolves a PENDING session by id alone (no cookie/request) and rejects active/expired/unknown/malformed ids.

Backend integration tests

  • Not applicable (no API endpoint changes).

Ingestion integration tests

  • Not applicable (no ingestion changes).

Playwright (UI) tests

  • Added — keycloak-azure-saml-crosssite cross-site nightly variant (playwright-sso-login-nightly.yml, playwright/utils/sso-providers/index.ts).

Manual testing performed

  1. mvn -pl openmetadata-service test -Dtest=SamlAuthServletHandlerTest,SessionServiceTest — all pass.
  2. mvn -pl openmetadata-service spotless:check and actionlint on the workflow — clean.

UI screen recording / screenshots:

Not applicable — no UI changes; the only change under the UI tree is a Playwright test-config registration.

Checklist:

  • I have read the CONTRIBUTING document.

  • My PR title is Fixes <issue-number>: <short explanation>

  • My PR is linked to a GitHub issue via Fixes #28763 above.

  • I have commented on my code, particularly in hard-to-understand areas.

  • I have added tests (unit / Playwright as applicable) and listed them above.

  • I have added a test that covers the exact scenario we are fixing.

The SAML callback from the IdP is a cross-site POST, so the SameSite=Lax
OM_SESSION cookie set at login is dropped by the browser. The server-side
pending session introduced by #26314 can no longer be found on that callback,
and login fails with "No pending session" for any real external IdP on
non-secure transport. The localhost-only SSO nightly cannot reproduce this
because a co-located IdP is same-site.

Carry the pending-session id in the SAML RelayState (POST body, immune to
SameSite/transport rules), the same mechanism MCP-over-SAML already uses:
- handleLogin now issues auth.login(pendingSession.getId())
- handleCallback resolves the pending session from RelayState first via the new
  cookie-free SessionService.getPendingSessionById, falling back to the cookie
  for backward compatibility

Add a cross-site (127.0.0.1 IdP <-> localhost SP) nightly variant that drops
the Lax cookie so this regression cannot return. Unit tests cover the RelayState
wiring (SamlAuthServletHandlerTest) and the cookie-free lookup (SessionServiceTest).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@harshach harshach requested review from a team, akash-jain-10 and tutte as code owners June 5, 2026 13:42
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 63%
63.37% (67212/106057) 44.29% (36972/83474) 46.7% (11125/23818)

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔴 Playwright Results — 1 failure(s), 9 flaky

✅ 4272 passed · ❌ 1 failed · 🟡 9 flaky · ⏭️ 88 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 301 0 0 4
🔴 Shard 2 803 1 0 9
🟡 Shard 3 805 0 2 8
🟡 Shard 4 852 0 3 12
🟡 Shard 5 720 0 1 47
🟡 Shard 6 791 0 3 8

Genuine Failures (failed on all attempts)

Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2)
Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoHaveText�[2m(�[22m�[32mexpected�[39m�[2m)�[22m failed

Locator:  locator('[data-row-key*="StatusBadgeTerm1780701248381"]').locator('.status-badge')
Expected: �[32m"In Review"�[39m
Received: �[31m"Draft"�[39m
Timeout:  15000ms

Call log:
�[2m  - Expect "toHaveText" with timeout 15000ms�[22m
�[2m  - waiting for locator('[data-row-key*="StatusBadgeTerm1780701248381"]').locator('.status-badge')�[22m
�[2m    19 × locator resolved to <div class="status-badge pending" data-testid=""PW%'065dfd1c.Bravebc3cdfb7".StatusBadgeTerm1780701248381-status">…</div>�[22m
�[2m       - unexpected value "Draft"�[22m

🟡 9 flaky test(s) (passed on retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/ExploreAggregationCountsMatching.spec.ts › should verify left panel counts and tab search results for normal search (shard 3, 1 retry)
  • Flow/NotificationAlerts.spec.ts › Multiple Filters Alert (shard 4, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Date (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Login.spec.ts › Refresh should work (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Adds a mock SAML 2.0 Identity Provider that mints signed SAML Responses which
OpenMetadata's onelogin Service Provider accepts, as the reusable building block
for SAML SSO integration tests. It is a pure assertion factory (no HTTP server),
since a non-browser test drives the ACS POST itself and simulates the cross-site
cookie drop by controlling the cookie jar.

Responses are signed with onelogin's own Util.addSign (RSA-SHA256) so there is no
algorithm/canonicalization mismatch with the validating SP. MockSamlIdpTest runs
the real SamlResponse.isValid path that SamlAuthServletHandler.handleCallback
uses, proving the mock's signed responses are accepted and that unsigned and
tampered responses are rejected. The keypair under test/resources/saml is a
self-signed test fixture only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… accessor

Address review on samlAuthCallback(): hoist the "/auth/callback" route literal
into AUTH_CALLBACK_PATH and extract a shared samlSp() accessor so samlAuthCallback()
and samlSpCallback() no longer duplicate the SamlConfiguration/getSp() null-guard.
Both stay single-return. Behavior-preserving; SamlAuthServletHandlerTest (15) green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 5, 2026

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 5, 2026

@harshach harshach merged commit 308db4a into main Jun 6, 2026
72 of 73 checks passed
@harshach harshach deleted the harshach/saml-relaystate-session-fix branch June 6, 2026 17:43
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jun 6, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Implements a cookie-free SAML pending-session lookup using RelayState to resolve cross-site authentication failures. The implementation addresses potential security concerns by validating the relay state and refactoring redundant SP configuration logic.

✅ 2 resolved
Security: RelayState pending-id is attacker-supplied, not browser-bound

📄 openmetadata-service/src/main/java/org/openmetadata/service/security/auth/SamlAuthServletHandler.java:377-387 📄 openmetadata-service/src/main/java/org/openmetadata/service/security/auth/SamlAuthServletHandler.java:345-358
resolvePendingSession now trusts the SAML RelayState (an unauthenticated, attacker-controllable POST field) to select the pending session, preferring it over the OM_SESSION cookie (SamlAuthServletHandler.java:377-387). Previously the pending session was bound to the victim's own browser via the cookie, which provided implicit login-CSRF protection. With RelayState carrying the id, an attacker who can drive an IdP assertion to the ACS could substitute their own pending-session id (and thus its redirect URI) into another user's callback.

The practical impact is mitigated and the change is reasonable: identity comes from the signed SAML assertion (the attacker cannot forge who is authenticated), the pending id is single-use with a fresh active id minted at activation (no session fixation), and the redirect URI is re-validated against the trusted allowlist before the token is appended (handleCallback line 350). So token exfiltration is only possible to already-allowlisted URIs.

This is informational rather than a defect: the residual risk is bounded by the strictness of trustedSamlRedirects(). Worth confirming that the allowlist is restrictive (only first-party OM frontends, no wildcards/open hosts), since that allowlist is now the sole control preventing a forged RelayState from redirecting a freshly minted token to an attacker-influenced destination.

Quality: Magic string and duplicated SP-config lookup in samlAuthCallback

📄 openmetadata-service/src/main/java/org/openmetadata/service/security/auth/SamlAuthServletHandler.java:656-670
The new samlAuthCallback() hardcodes the route literal "/auth/callback" (a magic string) and re-derives authConfig.getSamlConfiguration().getSp() with the same null-guard logic already present in samlSpCallback() (lines 683-689). Per the project standards (no magic strings, extract duplicated logic), consider hoisting the route into a named constant (e.g. AUTH_CALLBACK_PATH) and extracting a small samlSp() helper that both methods reuse. This is non-functional; behavior is correct and the derived origin is only added to the trusted-redirect allowlist, so it cannot loosen existing validation.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SAML SSO login fails with "No pending session" on cross-site IdP callback (regression from #26314)

1 participant