SAML SLO support #45379

andreas-kupries
Contributor

@andreas-kupries andreas-kupries commented May 6, 2024

Issue:

See #38494

This work is co-dependent on the UI work tracked at rancher/dashboard#10941

Problem

SURE 3572

Solution

  1. Extended AuthConfig, SamlConfig with the proposed flags about SLO (supported, enabled, forced).
    1. Based on the CRD setup the supported flag might be nonsense.
    2. As in, cannot be set into the initial AuthConfig CR instances. UI may have to simply know that only the SAML providers support SLO, and none of the others.
  2. New structures SamlConfigLogoutInput, and ...Output. Same fields as the known SamlConfigTest... structures. Hold the request/response data from/to the UI for the logoutAll action (see below).
  3. The tokens API should export a new action logoutAll.
  4. Basic implementation of the logout flow. Compiles, untested.
  5. Linkage between token manager and SAML to invoke the flow from the frontend.

KNOWN ISSUES: Does not guard against a call of regular logout when SLO is forced.
Does guard against forced-but-not-enabled, and against a call to logout-all when not enabled.

Testing

Engineering Testing

Manual Testing

Automated Testing

  • Test types added/modified:
    • Unit
    • Integration (Go Framework)
    • Integration (v2prov Framework)
    • Validation (Go Framework)
    • Other - Explain: EXPLAIN
    • None
    • REMOVE NOT APPLICABLE BULLET POINTS ABOVE
  • If "None" - Reason: EXPLAIN THE REASON
  • If "None" - GH Issue/PR: LINK TO GH ISSUE/PR TO ADD TESTS

Summary: TODO

QA Testing Considerations

Regressions Considerations

TODO

Existing / newly added automated tests that provide evidence there are no regressions:

  • TODO

@andreas-kupries andreas-kupries changed the title from "Sure 3572 saml single logout" to "Sure 3572 SAML SLO" May 6, 2024
@andreas-kupries andreas-kupries changed the title from "Sure 3572 SAML SLO" to "SAML SLO support" May 6, 2024
@andreas-kupries andreas-kupries self-assigned this May 6, 2024
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 6788bc7 to 2d218c6 Compare May 6, 2024 12:55
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 4f7f7a6 to cbb7eda Compare June 4, 2024 13:07
@andreas-kupries
Contributor Author

andreas-kupries commented Jun 12, 2024

Status report after a few days of working on implementing the thing:

The main visualization of the necessary workflow is taken from
http://docs.oasis-open.org/security/saml/Post2.0/sstc-saml-tech-overview-2.0-cd-02_html_m50a2ba3e.gif
The same is described at https://xacmlinfo.org/2013/06/28/how-saml2-single-logout-works/

My main takeaway from both is that the SP which initiated the SLO is not notified by the IdP that the user was logged out.

Okta seems to follow that principle, based on what I experienced with it today. That is, I see a plain POST to Rancher's .../saml/slo endpoint, and since I can see it in the browser's debug console I believe it was indeed done by redirection from the initial norman action (logoutAll). Unfortunately this redirect appears to have lost all the cookies used by the SAML code to store the flow state. I am unsure whether that is because the response contained in the redirect is a failure, not a success, although I cannot see how the browser would know that. The only thing I can think of is that Okta unsets these, maybe all, cookies. The failure is not due to a bogus user reference. Looking at the Okta events, the logout request is recorded as such, and Okta recognizes the user. It claims to have found an Invalid signature.

That is weird because the signature on the logout request should be generated through the same code as for SSO, using the same cert/key, and SSO is, well, successful. Given my "should", there is some doubt about that, mainly because a different key or the like is the only way I can see this failure happening. Yet the code is still the same and the service provider name is the same, so the same SP structures and the same key data.

Also, Keycloak looks to be ok with the signature of the request, its issue looks to be somewhere else.

So, going back to KC, which I struggled with before trialing Okta today: it seems that this IdP wants to notify the initiating SP as well, possibly directly from itself. The initial failures I saw with it said Unable to finish logout because XXX endpoint is not set. With an endpoint set, the remaining issues I have with KC look to me like me not properly processing/responding to that notification.

It should be noted that the XXX endpoint is generally needed, namely to handle IdP initiated logouts, i.e. when Rancher and other apps share the IdP and users, and a user in the other apps performs a SLO.

Thus, if I can implement this endpoint properly then it should be possible to use KC completely.


In terms of the flow image at the beginning, for KC I am stuck implementing steps 3/4,
whereas for Okta I reach step 5 (with 3/4 skipped), but it is a failure response, not the expected success.


This is the state of the work as of commit 582687e.

Thanks to @aalves08 for providing access to a remote Okta IdP to work with.

@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 582687e to 11ba86a Compare June 13, 2024 09:09
@richard-cox
Member

@andreas-kupries At some point next week could you provide a linux/amd64 image of the changes in this PR? I'll be taking over from Alex whilst he's away and noticed he was using arm64.

@andreas-kupries
Contributor Author

Managed a full logout with KC now.

That said, after that the UI does a number of things I was unable to track, and then landed on

https://tagetarl:8005/auth/login?err=An%20error%20occurred%20logging%20in.%20Please%20try%20again.

in the end. In other words, there may still be something more to do UI side.

Side note: I have to keep the haering SAML package. The logout response is deflate-compressed. haering handles that ok. crewjam does not try to decompress, tries to read the compressed string as XML and then fails with a bad utf-8 error.
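
For context on the decompression issue: in the SAML 2.0 HTTP-Redirect binding the XML message is compressed with raw DEFLATE and then base64-encoded, so a receiver has to reverse both steps before parsing XML. The following is a minimal sketch using only the Go standard library. It is not the code of either SAML package mentioned above; the function names and the placeholder XML are assumptions made for illustration, and URL-decoding of the query parameter is assumed to have happened already.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/base64"
	"fmt"
	"io"
)

// inflateSAMLMessage (hypothetical name) reverses the redirect-binding
// encoding: base64 decode first, then raw DEFLATE decompression.
func inflateSAMLMessage(encoded string) ([]byte, error) {
	compressed, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("base64 decode: %w", err)
	}
	r := flate.NewReader(bytes.NewReader(compressed))
	defer r.Close()
	xml, err := io.ReadAll(r)
	if err != nil {
		return nil, fmt.Errorf("inflate: %w", err)
	}
	return xml, nil
}

// deflateSAMLMessage (hypothetical name) is the inverse operation.
// It is used here only to produce an input for the demonstration.
func deflateSAMLMessage(xml []byte) (string, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return "", err
	}
	if _, err := w.Write(xml); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	// Placeholder payload, not a valid SAML response.
	encoded, err := deflateSAMLMessage([]byte("<LogoutResponse/>"))
	if err != nil {
		panic(err)
	}
	xml, err := inflateSAMLMessage(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(xml))
}
```

Feeding the still-compressed bytes straight into an XML parser, as described above, would typically surface as an encoding error rather than as a clear "not decompressed" message.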

@andreas-kupries
Contributor Author

andreas-kupries commented Jun 26, 2024

Status report: SLO works for Keycloak and OKTA now.

Contributor

@crobby crobby left a comment


General comment. There seems to be quite a number of debug logs. Our debug logs overall in Rancher are super dense.
The answer may be, "yes", and that is fine, but have you considered removing any of these that might produce lots of output that isn't very high-value?

@slickwarren slickwarren removed their request for review June 26, 2024 17:16
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 1eced75 to 2a3bf9b Compare June 27, 2024 07:49
@andreas-kupries
Contributor Author

General comment. There seems to be quite a number of debug logs.

I tend towards a higher amount of logging by default.

Our debug logs overall in Rancher are super dense. The answer may be, "yes",
and that is fine, but have you considered removing any of these that might
produce lots of output that isn't very high-value?

Weeded a bit, result is in commit [2a3bf9b]

pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/tokens/manager.go (resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/providers/saml/saml_provider.go (outdated, resolved)
pkg/auth/providers/saml/saml_provider.go (resolved)
pkg/auth/providers/saml/saml_client.go (outdated, resolved)
pkg/auth/providers/saml/saml_client.go (outdated, resolved)
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from d96fb79 to 720a90b Compare June 28, 2024 08:12
@crobby crobby self-requested a review June 28, 2024 10:31
@samjustus
Collaborator

for posterity: security needs to finish pentesting before we can merge

@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch 5 times, most recently from 76abf31 to 1494d1b Compare July 19, 2024 09:06
@andreas-kupries
Contributor Author

The security review has completed OK and is closed.
I have the go-ahead to merge.

This is deferred until 2.10 is open, i.e. 2.9 is released.

@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch 4 times, most recently from 78b71d6 to ba44227 Compare July 25, 2024 11:19
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch 5 times, most recently from 2081bdb to b9addbc Compare August 5, 2024 08:21
@andreas-kupries andreas-kupries changed the base branch from release/v2.9 to main August 5, 2024 08:40
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch 2 times, most recently from 60560ae to 52e39cb Compare August 6, 2024 07:49
@samjustus samjustus removed the request for review from raulcabello August 6, 2024 13:22
@andreas-kupries
Contributor Author

Relevant flaky test: #42248

…supported, enabled, forced.

added structures for logout request and response.
regenerated code and yaml
side work: documented nature of InitializeSamlServiceProvider
global logout interceptor callback
linked saml logout handler/backend with token manager frontend, via the interceptor callback
added guards against UI misbehaviour
register the new structures with the norman framework to enable serialization to and from json
added handling of SLO responses.
inject slo support flag into the initial authconfigs
fix: extended flow state storage with ability to set the cookie path.
hardwired acs path is no good when redirecting to the slo endpoint.
set proper cookie paths wherever we have state setup
added request signing - applies only to logout requests
(i.e. auth requests are not signed, as before)
(because we apparently use the redirect binding despite the code saying POST)
chore: log cleanup, removed some, made some official
fix: comment typo in go.mod
redirected crewjam/saml to our rancher/saml for decompression fix.
fix: missing/different generated files
fix KC logout issues with local crewjam patch.
fix missing handling of detached signatures on responses
import crewjam fix providing proper detached sig on logout requests.
KC still ok.
OKTA still fails, but different - issuer mismatch, not invalid sig
address comment, drop todo note, crewjam/saml is forked, and fork used
address comment - reduce saml logging
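
The commit notes above mention that a hardwired ACS path is no good when redirecting to the SLO endpoint. A minimal sketch of why the cookie Path attribute matters, using the standard library's cookie jar as a stand-in for a browser: a cookie scoped to one endpoint path is not sent to a sibling endpoint. The host, cookie name, and paths below are hypothetical placeholders, not Rancher's actual values.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// cookiesSentTo stores one cookie with the given Path attribute and
// reports how many cookies the jar would send to requestPath on the
// same (hypothetical) host.
func cookiesSentTo(cookiePath, requestPath string) int {
	jar, err := cookiejar.New(nil)
	if err != nil {
		panic(err)
	}
	origin := &url.URL{Scheme: "https", Host: "rancher.example.com", Path: "/"}
	jar.SetCookies(origin, []*http.Cookie{
		{Name: "saml_state", Value: "opaque", Path: cookiePath},
	})
	target := &url.URL{Scheme: "https", Host: "rancher.example.com", Path: requestPath}
	return len(jar.Cookies(target))
}

func main() {
	const (
		acs  = "/v1-saml/example/saml/acs" // hypothetical ACS endpoint
		slo  = "/v1-saml/example/saml/slo" // hypothetical SLO endpoint
		both = "/v1-saml/example/saml"     // common parent of both
	)
	// Cookie scoped to the ACS path: not visible at the SLO endpoint.
	fmt.Println(cookiesSentTo(acs, slo)) // prints 0
	// Cookie scoped to the shared parent path: visible at the SLO endpoint.
	fmt.Println(cookiesSentTo(both, slo)) // prints 1
}
```

This is consistent with the earlier observation that the flow-state cookies appeared to be lost on the redirect to the SLO endpoint, although the sketch only demonstrates the general path-matching rule.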

Apply suggestions from code review
Co-authored-by: Paulo Gomes <paulo.gomes.uk@gmail.com>

address comments

Apply suggestions from code review
Co-authored-by: Paulo Gomes <paulo.gomes.uk@gmail.com>

fixup of partial change for principled extension of redirect url with error information
old code still needed for bad case (unparseable url)

implemented todo: Assert "action == logout" && !sloForced, guard against UI misbehaviour.
IOW, the UI will get an error now if it tries to perform a regular logout while the provider is configured for forced SLO, i.e. logout all as the only allowed method.
…Sprintf does not support error-wrapping directive %w`
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 341f6de to 3f0c1b7 Compare August 8, 2024 10:54
Contributor

@bigkevmcd bigkevmcd left a comment


I'm fine with this change, and I've been able to test it against Keycloak (with Backchannel logout URL).

But there is some tidying that could be done to make it easier for the next person to work on this?

pkg/apis/management.cattle.io/v3/authn_types.go (outdated, resolved)
pkg/apis/management.cattle.io/v3/authn_types.go (outdated, resolved)
pkg/auth/providers/saml/saml_provider.go (outdated, resolved)
pkg/auth/providers/saml/saml_provider.go (outdated, resolved)
pkg/auth/tokens/manager.go (resolved)
pkg/auth/tokens/manager.go (resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/data/authconfig_data.go (outdated, resolved)
@andreas-kupries andreas-kupries merged commit cc769e8 into rancher:main Aug 12, 2024
12 checks passed
@andreas-kupries andreas-kupries deleted the ak-38494-sure-3572-saml-single-logout branch August 12, 2024 12:29
@andreas-kupries
Contributor Author

andreas-kupries commented Aug 15, 2024

Validation

Root Cause

A change request to support logging a user out of the session held by the configured external auth provider (EAP), and thus out of all applications, instead of just out of Rancher itself. Without this, when logging back into Rancher the still-open session in the EAP allowed for a quick login, without having to run through full authentication again. This confused several users, who expected to fully re-authenticate, despite the notification on regular logout that the EAP session may be retained.

What was fixed, or what changes have occurred

Several, but not all, EAPs now support a LogoutAll option and action.
The supporting EAPs are the SAML variants on offer.

Note that the following changes are in the Dashboard, not in the Backend.

Supporting LogoutAll means that when such an EAP is configured and activated the Admin
can configure the checkboxes

  • LogoutAllEnabled and
  • LogoutAllForced.

Checking LogoutAllEnabled causes the UI to offer the user the choice between regular logout and logout all.
Additionally checking LogoutAllForced causes the UI to no longer offer regular logout, only logout all.
As that is then the sole option, no actual choice is presented to the user. Logout is logout all.
Note that Forced cannot be checked if Enabled is not checked.

The backend sees these configuration flags as well and will react with errors should the dashboard try to

  • invoke logout all when not enabled.
  • invoke logout when logout all is forced.

Areas or cases that should be tested

  • Dashboard
    • EAP configuration
      • Allow configuration of logout all for SAML providers.
      • Do not allow configuration of logout all for any other providers.
    • Logout
      • Use plain logout, with no choice, when logout all is not enabled (SAML) or not supported (any other provider).
      • Offer choice (dialog, IIRC) between logout and logout all when logout all is enabled.
      • Use logout all, with no choice, when logout all is enabled and forced.

What areas could experience regressions

  • Dashboard logout.
  • EAP configuration.

Are the repro steps accurate/minimal?
