feat(backend): SAML SSO backend (schema, helpers, routes, admin CRUD)#1396
feat(backend): SAML SSO backend (schema, helpers, routes, admin CRUD)#1396BilalG1 wants to merge 20 commits intopr/saml-mock-idpfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Adds three tables to back per-user SAML accounts and the in-flight AuthnRequest temp store: - ProjectUserSamlAccount (mirrors ProjectUserOAuthAccount): one row per (tenancy, samlConnectionId, NameID). The unique constraint on (tenancyId, samlConnectionId, nameId) is what enforces multi-tenant connection isolation at the DB level — the same NameID from a different connection is treated as a distinct identity. - SamlAuthMethod (mirrors OAuthAuthMethod): connects an AuthMethod to a ProjectUserSamlAccount via composite FK. - SamlOuterInfo (mirrors OAuthOuterInfo): keyed by AuthnRequest ID so the ACS handler can look up the original context when the IdP POSTs the assertion back via the browser. ID is TEXT (not UUID) because SAML AuthnRequest IDs are XML xs:ID strings. Per-connection config (entity ID, IdP cert, ACS URL, attribute mapping, domain) is intentionally NOT a Prisma model — it lives in tenancy.config.auth.saml.connections JSON, matching how OAuth provider config (clientId/clientSecret) is stored.
Splits the email-merge strategy out of oauth.tsx into a small shared external-auth.tsx so the upcoming SAML ACS handler can reuse the same contact-channel lookup + link_method/raise_error/allow_duplicates switch without duplicating it. Also adds saml-account.tsx with the SAML-side parallel of OAuth's findExisting / link / create user-linking helpers, operating on ProjectUserSamlAccount and SamlAuthMethod. Each helper is keyed by (tenancyId, samlConnectionId, nameId), so a NameID arriving from a different connection is treated as a separate identity — connection isolation is enforced at the DB level. Schema strategy fallback: handleSamlEmailMergeStrategy reads tenancy.config.auth.saml.accountMergeStrategy if set, otherwise falls back to the OAuth strategy. The SAML config field will be added with the project config schema work. Adds @xmldom/xmldom and xpath as direct backend deps for the upcoming SAML protocol wrapper (currently transitive through @node-saml/node-saml).
Three modules under apps/backend/src/saml/:
- saml.tsx — buildSamlClient (per-connection SAML instance), build
AuthnRequestUrl (returns URL + extracted requestId for replay
protection), parseAndVerifyAssertion (signature + audience + clock-skew
+ InResponseTo are all enforced by node-saml), getSpMetadataXml.
Defines SamlConnectionConfig locally so the wrapper doesn't depend on
the project-config schema work.
- metadata-parser.tsx — pulls entityId, ssoUrl, and the signing X509
certificate out of pasted IdP metadata XML. Uses xmldom + xpath rather
than regex so it handles attribute-order variations across IdPs.
- discovery.tsx — email-domain to connection lookup for the
signInWithSso({ email }) flow. Iterates the project's connections and
returns the first whose `domain` matches.
The clock-skew tolerance is set to 60s, matching the e2e test matrix
item #16. The 'wantAssertionsSigned: true' default means an unsigned
assertion is rejected even if the response itself is signed — which is
the safer default per OWASP SAML guidance.
Two of the four planned SAML routes — the public-fetchable / read-only ones with no OAuth2-server integration: - GET /api/v1/auth/saml/metadata/[connection_id]?project_id=... SP metadata XML for the IdP admin to paste into their IdP console. Includes the project_id query param because connection IDs alone don't identify a tenancy (config lives in JSON, not a Prisma table). - GET /api/v1/auth/saml/discover?email=...&project_id=... Email-domain → connection lookup for the SDK's signInWithSso flow. Returns 404 (not 200 with null) when no connection matches so the SDK can fall back to other sign-in methods on status alone. login + acs routes are the next chunk. They need to mirror the OAuth callback's oauthServer.authorize integration so the customer's app receives a Stack Auth OAuth code on success — that's a meaningful copy-from-pattern job and is left for the next commit so it can be reviewed and tested in isolation.
Adds tenancy.config.auth.saml — mirrors the auth.oauth shape:
- branchAuthSchema gains saml.{accountMergeStrategy, connections}
with non-sensitive per-connection fields (displayName, allowSignIn,
domain). domain feeds /auth/saml/discover.
- environmentConfigSchema extends saml.connections with IdP-side
fields (idpEntityId, idpSsoUrl, idpCertificate, attributeMapping).
These belong at the environment level — different per IdP deployment
even though the cert is technically a public key — same way
oauth.providers splits clientId/clientSecret out of branch config.
- Defaults block adds an empty saml block; per-connection defaults set
allowSignIn=true and a placeholder displayName so partial configs
validate cleanly.
Also drops the temporary unknown-cast workaround in saml-account.tsx
(handleSamlEmailMergeStrategy) and updates the metadata + discover
routes to construct SamlConnectionConfig from the typed config record
(injecting the connection ID since it's stored as the record key).
Adds matching coverage in schema-fuzzer.test.ts so the fuzzed config
shape includes a sample SAML connection.
Two routes that complete the SAML SP-initiated round trip: - GET /api/v1/auth/saml/login/[connection_id] Receives the same Stack Auth OAuth client params as /auth/oauth/authorize (client_id, redirect_uri, scope, state, etc.), builds an AuthnRequest, persists the OAuth context + AuthnRequest ID in SamlOuterInfo, sets a CSRF cookie keyed to the request ID, and redirects to the IdP. Honors stack_response_mode=json so the SDK can intercept programmatically. V1 scope: SP-initiated only, no signed AuthnRequests, no link/upgrade flow. - POST /api/v1/auth/saml/acs/[connection_id] Receives the IdP's POST. Parses InResponseTo from the response WITHOUT verifying the signature, looks up SamlOuterInfo to recover tenancy/connection (this is necessary because the connection ID alone doesn't index a tenancy in the JSON-config storage model). Validates CSRF cookie, then runs node-saml's full validatePostResponseAsync (signature + audience + clock skew + InResponseTo). Defense-in-depth re-checks InResponseTo and cross-connection mismatch (the latter handles 'assertion sent to the wrong ACS endpoint' forgery, e2e test #10). On success, runs find-existing / link / create via the saml-account.tsx helpers, then hands off to oauthServer.authorize so Stack Auth issues a customer-facing OAuth code (mirrors the oauth/callback pattern). Deletes SamlOuterInfo at the end for replay protection. Adds extractInResponseTo helper to saml/saml.tsx for the pre-validation parse described above. Routes typecheck and lint clean. Runtime untested — needs the e2e test matrix (task #15) to exercise the round-trip end-to-end against the mock IdP.
Three endpoints under /api/v1/saml-connections — admin-only thin REST wrappers around the JSON-config storage so the dashboard SSO pages don't compose key paths manually: - GET /saml-connections list all (omits cert) - POST /saml-connections upsert by id - DELETE /saml-connections delete by id - GET /saml-connections/[id] full detail (includes cert) User accounts linked via a deleted connection remain in the DB; they just become unable to sign in until a connection with the same id is recreated. (Dashboard delete UX should warn on this.) Underlying storage is the same overrideEnvironmentConfigOverride / resetEnvironmentConfigOverrideKeys flow used by the seed script and the e2e tests, so behavior is identical across all surfaces.
Two bugs surfaced when running the SAML e2e suite against the live backend (in a separate PR): 1. Routes accessed `tenancy.config.auth.saml.connections[id].field` without first checking that the entry exists. With strict null checks off, TS types this as always-defined and the route 500'd with a TypeError on missing connections instead of returning 404. Add an explicit `id in connections` guard at the top of each route (login, acs, metadata). 2. SAML responses signed at the Response element (samlify default, also what Okta + Azure AD emit) failed verification because the backend was configured with wantAssertionsSigned=true, wantAuthnResponseSigned=false — i.e. demanded an Assertion-level signature. Per SAML 2.0 §4.1.4.2 either is valid. Flip to wantAuthnResponseSigned=true so we accept what real-world IdPs actually send.
e60f439 to
2d1db7a
Compare
Discover is the entry point for the SDK's signInWithSso({ email }) flow.
Previously it returned every connection whose domain matched, including
ones the project admin had disabled (`allowSignIn: false`). The SDK
would then send the user through /auth/saml/login, which intentionally
403s for disabled connections — so disabling a connection was a sharp
UX cliff: domain match → branded "Sign in with Acme SSO" CTA → 403.
Treat disabled connections as if they didn't exist for discovery
purposes. Direct sign-in via signInWithSaml({ connectionId }) is still
gated separately in the login route, which is the right place for
"intentional, explicit access by ID."
The implementation in client-app-impl.ts already defines signInWithSaml, signInWithSso, and getSamlConnectionForEmail, but the public StackClientApp interface in template/.../client-app.ts didn't list them. TypeScript users got no autocomplete and no type-checking for the methods, and the regenerated js/react/stack packages inherited the gap. Add the three signatures to the interface near signInWithOAuth. Run \`pnpm -w run generate-sdks\` to propagate to the downstream packages (js/react/stack). Also add a saml-sso → Building2 mapping to the docs APP_ICONS map (broke when the type added "saml-sso" to AppId). Tests: - New regression test in discover.test.ts confirming the new allowSignIn=false filter (#1396 fix) returns 404, so a disabled connection no longer leaks into the signInWithSso flow.
…hardening - Pin SP origin to NEXT_PUBLIC_STACK_API_URL across metadata, login, and ACS so SP entityId / audience match the value the IdP signs assertions against. Login no longer derives baseUrl from `redirect_uri`; metadata no longer derives it from `req.url`. - Discover and metadata accept an optional `branch_id` query param so non-default branches resolve the same tenancy login uses (login parses branch from `client_id` = `projectId#branchId`). - linkSamlAccountToUser wraps both writes in retryTransaction so a partial failure can't leave a sign-in-enabled SamlAccount with no AuthMethod. - saml-connections POST writes the whole nested record entry on first create (dotting into a missing parent is silently dropped by the config layer); updates keep dot-paths so optional fields aren't wiped. - Restrict NameID-as-email fallback to `nameid-format:emailAddress` so opaque persistent NameIDs don't become verified user emails. - Reject POST-only IdP metadata with a clear V1 error (we only emit HTTP-Redirect AuthnRequests). - ACS re-checks `allowSignIn === false` to close the 10-minute mid-flow disable window.
Greptile SummaryThis PR adds the full SAML 2.0 SP-initiated SSO backend: three Prisma models (
Confidence Score: 3/5Safe to merge for greenfield SAML deployments, but contains a latent P1 that will surface in production when IdPs change NameIDs One P1 defect (NameID change → 500 + user lockout) pulls the score to a ceiling of 4; multiple prior P1s addressed in this revision plus a large new untested surface area warrants a 3 apps/backend/src/lib/saml-account.tsx (linkSamlAccountToUser unique constraint), apps/backend/src/app/api/latest/auth/saml/acs/[connection_id]/route.tsx (sign-up rule tagging) Important Files Changed
Sequence DiagramsequenceDiagram
participant SDK as Client SDK
participant Login as /auth/saml/login
participant DB as SamlOuterInfo table
participant IdP as Identity Provider
participant ACS as /auth/saml/acs
participant UserDB as User/Account DB
participant OAuthSrv as oauthServer
SDK->>Login: GET login params
Login->>DB: INSERT pending row (requestId, TTL)
Login->>Login: set CSRF cookie SameSite=None in prod
Login-->>SDK: redirect to IdP
SDK->>IdP: AuthnRequest
IdP-->>SDK: POST SAMLResponse back to ACS
SDK->>ACS: POST SAMLResponse + CSRF cookie
ACS->>DB: atomic DELETE row by InResponseTo
ACS->>ACS: verify CSRF + expiry + connection match
ACS->>ACS: parseAndVerifyAssertion sig audience clock skew
ACS->>UserDB: findExistingSamlAccount
alt account found
UserDB-->>ACS: existing userId
else no account - email merge
ACS->>UserDB: linkSamlAccountToUser OR createSamlUserAndAccount
end
ACS->>OAuthSrv: authorize issue OAuth code for SDK
OAuthSrv-->>ACS: redirect to redirect_uri with code
ACS-->>SDK: 303 to redirect_uri
|
…ion lookups - ACS now deletes the SamlOuterInfo row up-front (right after extracting InResponseTo) and runs the rest of the flow against the consumed record. This closes the replay race where two concurrent SAMLResponse POSTs could both pass a findUnique check before either reached the post-success delete and each issue an OAuth code. If downstream verification fails, the client must restart the flow. - Replace `id in tenancy.config.auth.saml.connections` with `has(...)` from utils/objects across all six SAML route call sites. The `in` operator walks the prototype chain, so `__proto__` (which the body schema's regex actually allows) would pass the guard and surface internal config paths in errors.
Mirrors linkSamlAccountToUser. Without a transaction, a failure on the ProjectUserSamlAccount create left an orphaned bare AuthMethod tied to the freshly-created user. The parameter type also claimed PrismaClientTransaction while the ACS caller passes a regular client.
- Reject duplicate-domain SAML connections in admin POST so /auth/saml/discover stays deterministic; correct the discovery doc comment that claimed a non-existent DB invariant.
- Set SameSite=None+Secure on stack-saml-inner cookie in non-dev so the IdP cross-site POST to ACS doesn't drop it; keep Lax in dev where Secure isn't available over HTTP.
- Pass idpIssuer to node-saml so assertions issued under a different IdP entity (even with a matching signing cert) are rejected.
- Tighten the ACS body schema to { SAMLResponse, RelayState? } and wrap base64/XML decode failures as BadRequest.
- Make ProjectUserSamlAccount.projectUserId NOT NULL — no code path created orphan rows; drop the runtime ?? throwAssertion fallback and the dead getProjectUserIdFromSamlAccount helper.
|
closing for now, handling this post launch |
Second of 3 stacked PRs adding SAML 2.0 SSO. Stacked on: #1395 (mock IdP). Stacked above: #1397 (client + tests).
What this PR adds
The backend half of SAML SSO. Purely server-side — client SDK methods, dashboard page, demo page, and e2e tests live in the next PR (which depends on this one).
Database
Connection config storage
Per-connection config (entity ID, IdP cert, ACS URL, attribute mapping, domain) is stored as JSON in `tenancy.config.auth.saml.connections` — same pattern as `auth.oauth.providers`. No dedicated SamlConnection Prisma model. Connection-isolation invariant lives at the per-user table level.
Schema
Helpers
Protocol wrapper
Routes
Under `apps/backend/src/app/api/latest/auth/saml/`:
ACS hands off to `oauthServer.authorize()` so Stack Auth issues a customer-facing OAuth code (Stack Auth itself acts as an OAuth2 provider to the customer's SDK — same pattern as the existing OAuth callback handler at `oauth/callback/[provider_id]`).
Admin CRUD
Out-of-scope (planned follow-ups)
Test plan