feat(sso): self-service SSO config with DNS-verified domains#13507
feat(sso): self-service SSO config with DNS-verified domains#13507marksalpeter merged 21 commits intomainfrom
Conversation
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fication Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
… support endpoint Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f660efd to
8846d48
Compare
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rise baseUrl Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…form Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… provider Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
…etes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… discovery Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…exists Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@claude review |
| // Microsoft's documented multi-tenant tenantId values. Discovery for these | ||
| // returns `issuer` with the literal `{tenantid}` placeholder string instead | ||
| // of the configured tenantId — the actual tenant is bound at token-issuance | ||
| // time per user's home tenant. NextAuth's AzureADProvider handles this at | ||
| // sign-in, so save-time we compare against the placeholder rather than the | ||
| // configured value to avoid blocking legitimate multi-tenant configurations. | ||
| const AZURE_AD_MULTI_TENANT = new Set(["common", "organizations", "consumers"]); | ||
| const AZURE_AD_TENANT_PLACEHOLDER = "{tenantid}"; | ||
|
|
||
| function expectedReturnedIssuer( | ||
| payload: SsoProviderSchema, | ||
| trimmedIssuer: string, | ||
| ): string { | ||
| if ( | ||
| payload.authProvider === "azure-ad" && | ||
| payload.authConfig && | ||
| AZURE_AD_MULTI_TENANT.has(payload.authConfig.tenantId) | ||
| ) { | ||
| return trimmedIssuer.replace( | ||
| `/${payload.authConfig.tenantId}/`, | ||
| `/${AZURE_AD_TENANT_PLACEHOLDER}/`, | ||
| ); | ||
| } | ||
| return trimmedIssuer; | ||
| } |
There was a problem hiding this comment.
🟡 The new AZURE_AD_MULTI_TENANT set at validateSsoConfig.ts:59 includes "consumers" alongside "common" and "organizations", but Microsoft only emits the {tenantid} placeholder in the discovery doc for the two genuinely multi-tenant endpoints — https://login.microsoftonline.com/consumers/v2.0/.well-known/openid-configuration returns the literal MSA tenant GUID 9188040d-6c67-4c5b-b112-36a304b66dad, because consumers is a single-tenant alias for the fixed personal-accounts tenant. So a save with tenantId: "consumers" always fails the issuer-equality check at line 148 with PRECONDITION_FAILED. Niche case, but easy to fix: drop consumers from the set and special-case it to compare against the literal MSA GUID (or map consumers → 9188040d-6c67-4c5b-b112-36a304b66dad in expectedReturnedIssuer).
Extended reasoning...
What the bug is
The newly-added AZURE_AD_MULTI_TENANT set at web/src/ee/features/multi-tenant-sso/validateSsoConfig.ts:59 lumps three tenantId values together:
const AZURE_AD_MULTI_TENANT = new Set(["common", "organizations", "consumers"]);The accompanying comment claims discovery for all three "returns issuer with the literal {tenantid} placeholder string instead of the configured tenantId." That assumption is correct for common and organizations, but not for consumers. Microsoft documents consumers as a single-tenant alias for the Microsoft Personal Account (MSA) tenant — it is not multi-tenant; the tenant is fixed and known. Its discovery doc therefore reports the actual MSA GUID, not the placeholder.
Empirical verification
The discovery endpoints behave as follows (each verifier independently confirmed against the live Microsoft endpoints):
GET https://login.microsoftonline.com/common/v2.0/.well-known/openid-configuration→"issuer": "https://login.microsoftonline.com/{tenantid}/v2.0"(placeholder)GET https://login.microsoftonline.com/organizations/v2.0/.well-known/openid-configuration→"issuer": "https://login.microsoftonline.com/{tenantid}/v2.0"(placeholder)GET https://login.microsoftonline.com/consumers/v2.0/.well-known/openid-configuration→"issuer": "https://login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0"(literal MSA GUID)
Step-by-step proof
- Org admin opens Configure SSO for a verified domain, picks Azure AD / Entra ID, enters
tenantId: "consumers"(a documented Microsoft value, intended to scope the app to personal Microsoft accounts only). SsoProviderSchema.safeParsesucceeds —consumerspasses.min(1).validateSsoConfigruns.discoveryIssuerForbuildshttps://login.microsoftonline.com/consumers/v2.0.fetch(".../consumers/v2.0/.well-known/openid-configuration")returns 200 withissuer = "https://login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0".expectedReturnedIssuer(lines 62-77) seesconsumersis inAZURE_AD_MULTI_TENANTand substitutes it with the placeholder, producinghttps://login.microsoftonline.com/{tenantid}/v2.0.- The equality check at line 148 (
returnedIssuer !== expectedIssuer) compareshttps://login.microsoftonline.com/9188040d-.../v2.0againsthttps://login.microsoftonline.com/{tenantid}/v2.0— mismatch — and throwsPRECONDITION_FAILEDwithreported issuer "..." but we expected "...". - The save is blocked. Since the legacy
createNewSsoConfigHandlernow also callsvalidateSsoConfig(per this PR), even the support-engineer backdoor is blocked for this configuration.
Why existing safeguards do not prevent it
- The schema only enforces
.min(1)ontenantId, so the string"consumers"passes. expectedReturnedIssueractively mapsconsumersto the placeholder, which is the wrong direction — Microsoft sends the GUID, not the placeholder, for this endpoint.- The new
ssoConfigRouter.servertest.tstests the multi-tenant placeholder path withtenantId: "common"only; theconsumersvariant is not exercised.
Impact
Bounded but real. Configuring an org-level SSO restricted to MSA personal accounts is unusual (per-domain SSO doesnt naturally pair with personal-only sign-in), and the admin can work around by entering the literal GUID 9188040d-6c67-4c5b-b112-36a304b66dad. But the codes claim about Microsofts behavior is empirically wrong for consumers, so the comment and the set are out of sync with reality, and the legacy support path regresses in the same spot.
Fix
Drop consumers from AZURE_AD_MULTI_TENANT and either (a) map consumers to the literal GUID before the equality check, or (b) special-case consumers to compare against https://login.microsoftonline.com/9188040d-6c67-4c5b-b112-36a304b66dad/v2.0. Option (a) keeps the placeholder-substitution path for the genuinely multi-tenant case and adds a single mapping for consumers:
const AZURE_AD_MULTI_TENANT = new Set(["common", "organizations"]);
const MSA_TENANT_GUID = "9188040d-6c67-4c5b-b112-36a304b66dad";
function expectedReturnedIssuer(payload, trimmedIssuer) {
if (payload.authProvider !== "azure-ad" || !payload.authConfig) return trimmedIssuer;
const t = payload.authConfig.tenantId;
if (AZURE_AD_MULTI_TENANT.has(t)) {
return trimmedIssuer.replace(`/${t}/`, `/${AZURE_AD_TENANT_PLACEHOLDER}/`);
}
if (t === "consumers") {
return trimmedIssuer.replace(`/consumers/`, `/${MSA_TENANT_GUID}/`);
}
return trimmedIssuer;
}A short test mirroring the existing tenantId: "common" case but with consumers and the literal MSA-GUID issuer in the mocked discovery doc would lock the fix in.
There was a problem hiding this comment.
Hmm. This is so specific. I will have to deep dive potentially.
| // (enforced via a partial unique index added in the migration, since | ||
| // Prisma cannot express partial uniqueness in the schema DSL). | ||
| @@unique([organizationId, domain]) | ||
| @@index([organizationId]) |
There was a problem hiding this comment.
This should be redundant as it's already covered by the unique index above.
There was a problem hiding this comment.
I might change this if I have time. Its a small table, so not a huge deal, but you're right about the double index. My mistake.
|
|
||
| // Domain ownership claim for an organization, verified via DNS TXT record. | ||
| // Required before an SsoConfig can be created for that domain. | ||
| model VerifiedDomain { |
There was a problem hiding this comment.
Would it be possible to have a "loose" FK relationship to the ssoConfigs? That way, if an organization is deleted, we could delete the associated domain enforcement as well.
There was a problem hiding this comment.
Cascading deletes for SsoConfig would be nice, but its unclear how to implement, so I'm sidestepping the issue for now.
|
|
||
| // Pending claims are shareable across orgs; only verified claims are | ||
| // exclusive. Block create only when another org has already verified | ||
| // the domain — otherwise a hobby-plan org could squat a global slot |
There was a problem hiding this comment.
This should only be available on cloud:team and cloud:enterprise given the entitlement. IMO still a valid check and assumption!
| // Race protection: two parallel creates from the same org or two orgs | ||
| // racing each other after a verified row was just created. The | ||
| // (organizationId, domain) unique index and the partial index on | ||
| // verified rows both surface as Prisma P2002. |
There was a problem hiding this comment.
You could also consider a prisma transcation here with Serializable isolation. IMO both should be valid to protect against a race condition here.
Summary
Org admins on Langfuse Cloud have no self-service path to configure SSO — the only path today is a Langfuse support engineer hitting
/api/auth/add-sso-configwith theADMIN_API_KEY, and the Org Settings → SSO tab is a placeholder that opens the support drawer.To achieve this we:
VerifiedDomainmodel + tRPC router so orgs DNS-verify domain ownership (TXT record) before any SSO config can be created for that domain — closes the cross-tenant hijacking footgun.ssoConfigtRPC router (get/save/delete) that gates writes on a verified domain for the caller's org, encryptsclientSecretat rest, and emits before/after audit-log entries with the secret masked.saveagainst the IdP's OIDC discovery doc — fetches<issuer>/.well-known/openid-configuration(or the equivalent for Azure AD / Google), validates the required fields and that the returnedissuermatches what was configured. Catches mistyped issuer URLs and unreachable IdPs at save time instead of locking out users at first sign-in.issuervalidation to requirehttps://across every provider (also covers GitHub Enterprise'sbaseUrl) — catches a real footgun where Auth0/etc. issuers without a scheme broke at sign-in time instead of save time.Note: GitHub and GitHub Enterprise are OAuth2-only with no
.well-knownendpoint, so the discovery pre-flight is skipped for those — misconfiguration there still surfaces at first sign-in. Cross-pod SSO config cache propagation lag (the "active within 1 hour" copy in the confirmation dialog) is tracked in LFE-9662.Fixes LFE-9126
Verification
pnpm --filter web run typecheckpnpm --filter web run lintpnpm --filter web run test -- verifiedDomainRouter.servertest(15/15)pnpm --filter web run test -- ssoConfigRouter.servertest(22/22, including 8 new discovery tests)Test plan
_langfuse-verification) and copyable token → admin adds DNS record → Verify → row marks Verified.clientSecretalways required → Save replaces atomically.acme.us.auth0.com(missinghttps://) at validation time, not at sign-in.organization:update) gets FORBIDDEN on every mutation.Disclaimer: Experimental PR review
Greptile Summary
This PR introduces self-service SSO configuration for org admins on Langfuse Cloud, gated behind DNS-verified domain ownership. The implementation adds a
VerifiedDomainmodel/router, a newssoConfigRouter(get/save/delete) withclientSecretencryption and OIDC discovery pre-flight, and replaces the placeholder SSO settings page with a full domain-list + config form UI across 13 providers.VerifiedDomaintable and tRPC router let org admins prove domain ownership via a DNS TXT record before any SSO config can be written; the Prisma migration adds appropriate unique and FK indexes.ssoConfigRouter.savecorrectly requires a verified-domain claim, pre-flights the IdP's OIDC discovery document, encryptsclientSecretat rest, and emits masked audit logs; the delete handler's missingverifiedAtguard and the orphaned-SsoConfigproblem onVerifiedDomaindeletion were flagged in a prior review cycle.customOIDC provider form path is silently broken:CustomProviderSchemarequiresauthConfig.namebutbuildSsoPayloadnever includes it, so the Zod parse inhandleConfirmfails andform.setErrormaps the error to a non-existent field — the user receives no feedback and the mutation is never called.Confidence Score: 4/5
Safe to merge once the Custom OIDC form path is fixed — without it, custom provider saves silently fail in the UI with no user feedback.
The
customOIDC provider form never collects or sends theauthConfig.namefield thatCustomProviderSchemarequires; the resulting Zod validation failure inhandleConfirmsets an error on a non-existent form field, silently swallowing the failure. This makes custom provider configuration completely non-functional through the UI. All other providers work correctly, the server-side security model is sound, encryption and audit logging are well-implemented, and the DNS-verification flow is solid.web/src/ee/features/sso-settings/components/SSOSettings.tsx— thebuildSsoPayloadfunction and surrounding form code need anamefield added for thecustomprovider case.Security Review
validateSsoConfig.ts):fetchis called withredirect: \"follow\"against a user-supplied OIDC issuer URL. A redirect chain to an internal address (e.g. instance-metadata endpoints) will be silently followed; the issuer-equality check prevents body exfiltration but internal reachability is still leaked via error message differences. Switching toredirect: \"error\"closes this with no functional downside.Important Files Changed
buildSsoPayloadomits requirednamefield for thecustomprovider, causing silent save failures;allowDangerousEmailAccountLinkingis hardcoded totrue(flagged previously).savecorrectly gates onverifiedAtand encryptsclientSecret;deleteonly requires anyVerifiedDomainclaim (not verified) — already flagged in prior review;maskAuthConfigconsistently strips secret before returning or logging.createand no entitlement guard on mutations flagged;deleteleaves associatedSsoConfigas an orphan (already flagged).redirect: "follow"allows the server to follow user-supplied redirects to internal endpoints — recommendredirect: "error".verified_domainstable with appropriate unique index ondomain,organization_idFK with CASCADE delete, and index onorganization_id; no issues.oidcIssuerZod validator enforcinghttps://prefix across all provider schemas; all 13 providers correctly defined; no issues.dns.Resolverusing Cloudflare/Google public resolvers; timeout and retry configured; no issues.Sequence Diagram
sequenceDiagram participant Admin participant UI participant verifiedDomainRouter participant ssoConfigRouter participant validateSsoConfig participant DNS participant IdP Admin->>UI: Add domain UI->>verifiedDomainRouter: "create({ orgId, domain })" verifiedDomainRouter-->>UI: "{ recordHost, recordValue }" Admin->>DNS: Add TXT record Admin->>UI: Click Verify UI->>verifiedDomainRouter: "verify({ orgId, id })" verifiedDomainRouter->>DNS: resolveTxtFresh(_langfuse-verification.domain) DNS-->>verifiedDomainRouter: TXT records verifiedDomainRouter-->>UI: "{ verifiedAt }" Admin->>UI: Configure SSO (provider + credentials) UI->>ssoConfigRouter: "save({ orgId, payload })" ssoConfigRouter->>ssoConfigRouter: "check verifiedAt != null" ssoConfigRouter->>validateSsoConfig: validateSsoConfig(payload) validateSsoConfig->>IdP: GET issuer/.well-known/openid-configuration IdP-->>validateSsoConfig: discovery doc validateSsoConfig-->>ssoConfigRouter: ok / throws PRECONDITION_FAILED ssoConfigRouter->>ssoConfigRouter: encrypt(clientSecret) ssoConfigRouter->>ssoConfigRouter: upsert SsoConfig + auditLog ssoConfigRouter-->>UI: saved row (secret stripped)Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "feat(sso): pre-flight OIDC discovery on ..." | Re-trigger Greptile