SAML SLO support #45379

Open
wants to merge 1 commit into
base: release/v2.9

andreas-kupries
Contributor

@andreas-kupries andreas-kupries commented May 6, 2024

Issue:

Fix #38494

This work is co-dependent on the UI work tracked at rancher/dashboard#10941

Problem

SURE 3572

Solution

PR holds work checkpoints at the moment. Not in a merge-able state.

  1. Extended AuthConfig, SamlConfig with the proposed flags about SLO (supported, enabled, forced).
    1. Based on the CRD setup the supported flag might be nonsense.
    2. As in, cannot be set into the initial AuthConfig CR instances. UI may have to simply know that only the SAML providers support SLO, and none of the others.
  2. New structures SamlConfigLogoutInput, and ...Output. Same fields as the known SamlConfigTest... structures. Hold the request/response data from/to the UI for the logoutAll action (see below).
  3. The tokens API should export a new action logoutAll.
  4. Basic implementation of the logout flow. Compiles, untested.
  5. Linkage between token manager and SAML to invoke the flow from the frontend.

KNOWN ISSUES: Does not guard against a call to the regular logout when SLO is forced.
Does guard against forced-but-not-enabled, and against a call to logoutAll when SLO is not enabled.
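
For illustration, the guards described above could be sketched roughly as follows. This is a minimal sketch, not the PR's code: the type `sloFlags`, the function `checkAction`, and the error values are hypothetical names, and the third guard (regular logout while SLO is forced) is the one listed as a known issue, so it is not present at this checkpoint.

```go
package main

import (
	"errors"
	"fmt"
)

// sloFlags is a hypothetical stand-in for the three SLO flags on the
// auth config; the real field names in the PR may differ.
type sloFlags struct {
	Supported bool // provider type can do SLO at all (SAML only)
	Enabled   bool // SLO has been switched on
	Forced    bool // logoutAll is the only permitted way to log out
}

var (
	errForcedNotEnabled  = errors.New("SLO is forced but not enabled")
	errLogoutAllDisabled = errors.New("logoutAll requested but SLO is not enabled")
	errLogoutWhileForced = errors.New("regular logout requested but SLO is forced")
)

// checkAction validates a token action ("logout" or "logoutAll")
// against the configured flags and returns nil when it is allowed.
func checkAction(f sloFlags, action string) error {
	if f.Forced && !f.Enabled {
		return errForcedNotEnabled
	}
	switch action {
	case "logoutAll":
		if !f.Enabled {
			return errLogoutAllDisabled
		}
	case "logout":
		if f.Forced {
			return errLogoutWhileForced
		}
	default:
		return fmt.Errorf("unknown action %q", action)
	}
	return nil
}

func main() {
	forced := sloFlags{Supported: true, Enabled: true, Forced: true}
	fmt.Println(checkAction(forced, "logout"))
	fmt.Println(checkAction(forced, "logoutAll"))
}
```

Keeping all three checks in one function means the token manager only needs a single call before dispatching either action.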

Testing

Engineering Testing

Manual Testing

Automated Testing

  • Test types added/modified:
    • Unit
    • Integration (Go Framework)
    • Integration (v2prov Framework)
    • Validation (Go Framework)
    • Other - Explain: EXPLAIN
    • None
    • REMOVE NOT APPLICABLE BULLET POINTS ABOVE
  • If "None" - Reason: EXPLAIN THE REASON
  • If "None" - GH Issue/PR: LINK TO GH ISSUE/PR TO ADD TESTS

Summary: TODO

QA Testing Considerations

Regressions Considerations

TODO

Existing / newly added automated tests that provide evidence there are no regressions:

  • TODO

@andreas-kupries andreas-kupries changed the title Sure 3572 saml single logout Sure 3572 SAML SLO May 6, 2024
@andreas-kupries andreas-kupries changed the title Sure 3572 SAML SLO SAML SLO support May 6, 2024
@andreas-kupries andreas-kupries self-assigned this May 6, 2024
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 6788bc7 to 2d218c6 Compare May 6, 2024 12:55
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 4f7f7a6 to cbb7eda Compare June 4, 2024 13:07
@andreas-kupries
Contributor Author

andreas-kupries commented Jun 12, 2024

Status report after a few days of working on implementing the thing:

The main visualization of the necessary workflow is taken from
http://docs.oasis-open.org/security/saml/Post2.0/sstc-saml-tech-overview-2.0-cd-02_html_m50a2ba3e.gif
The same is described at https://xacmlinfo.org/2013/06/28/how-saml2-single-logout-works/

The main take I get from both is that the SP who initiated the SLO is not notified by the IdP that the user was logged out.

Okta seems to follow that principle, based on what I experienced with it today. I.e. I see a plain POST to Rancher's .../saml/slo endpoint, and based on the fact that I can see it in the browser's debug console I believe that this was indeed done by redirection from the initial norman action (logoutAll). Unfortunately this redirect looks to have lost all the cookies used by the SAML code to store the flow state. I am unsure if that may be because the response contained in the redirect is a failure, not a success, although I cannot see how the browser would know that. The only thing I can think of is that Okta unsets these, maybe all, cookies. The failure is not due to a bogus user reference. Looking at the Okta events, the logout request is recorded as such, and it recognizes the user. It claims to have found an Invalid signature.

That is weird because the signature on the logout request should be generated through the same code as for SSO, and use the same cert/key, and SSO is, well, successful. Given my "should", there is some doubt about that, mainly because a different key or the like is the only way I can see this failure happening. The code is still the same, the service provider name is the same, so same SP structures, same key data.

Also, Keycloak looks to be ok with the signature of the request; its issue looks to be somewhere else.

So, going back to KC, which I struggled with before trialing Okta today, it seems that this IdP wants to notify the initiating SP also, possibly directly from itself. The initial failures I saw with it said Unable to finish logout because XXX endpoint is not set. With an endpoint set for it, the other issues I seem to have with KC look to me like me not properly processing/responding to that notification.

It should be noted that the XXX endpoint is generally needed, namely to handle IdP initiated logouts, i.e. when Rancher and other apps share the IdP and users, and a user in the other apps performs a SLO.

Thus, if I can implement this endpoint properly then it should be possible to use KC completely.


In terms of the flow image at the beginning, for KC I am stuck implementing steps 3/4,
whereas for Okta I reach step 5 (with 3/4 skipped), but it is a failure response, not the expected success.


This is the state of the work as of commit 582687e.

Thanks to @aalves08 for providing access to a remote Okta IdP to work with.

@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 582687e to 11ba86a Compare June 13, 2024 09:09
@richard-cox
Member

@andreas-kupries At some point next week could you provide a linux/amd64 image of the changes in this PR? I'll be taking over from Alex whilst he's away and noticed he was using arm64.

@andreas-kupries
Contributor Author

Managed a full logout with KC now.

That said, after that the UI does a number of things I was unable to track, and then landed on

https://tagetarl:8005/auth/login?err=An%20error%20occurred%20logging%20in.%20Please%20try%20again.

in the end. In other words, there may still be something more to do UI side.

Side note: I have to keep the haering SAML package. The logout response is deflate-compressed. haering handles that ok. crewjam does not try to decompress; it tries to read the compressed string as XML and then fails with a bad UTF-8 error.
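
To make the compression point concrete: in the HTTP-Redirect binding the message is raw-DEFLATE compressed and then base64-encoded, so a receiver has to reverse both steps before handing the bytes to an XML parser. A minimal sketch using only the Go standard library follows. `deflateEncode`, `inflateDecode`, and the 1 MiB cap are assumptions for illustration and do not reflect how either SAML package implements this.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/base64"
	"fmt"
	"io"
)

// maxMessageSize caps the inflated size to avoid decompression bombs.
// The value is an arbitrary choice for this sketch.
const maxMessageSize = 1 << 20

// deflateEncode compresses with raw DEFLATE, then base64-encodes.
func deflateEncode(xml []byte) (string, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return "", err
	}
	if _, err := w.Write(xml); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// inflateDecode reverses deflateEncode: base64-decode, then inflate.
// The input is the parameter value after URL-decoding.
func inflateDecode(param string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(param)
	if err != nil {
		return nil, fmt.Errorf("base64: %w", err)
	}
	r := flate.NewReader(bytes.NewReader(raw))
	defer r.Close()
	return io.ReadAll(io.LimitReader(r, maxMessageSize))
}

func main() {
	enc, err := deflateEncode([]byte("<LogoutResponse/>"))
	if err != nil {
		panic(err)
	}
	out, err := inflateDecode(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Skipping the inflate step and parsing the base64-decoded bytes directly would hand binary data to the XML parser, which matches the bad UTF-8 symptom described above.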

@andreas-kupries
Contributor Author

andreas-kupries commented Jun 26, 2024

Status report: SLO works for Keycloak and OKTA now.

Contributor

@crobby crobby left a comment

General comment. There seems to be quite a number of debug logs. Our debug logs overall in Rancher are super dense.
The answer may be, "yes", and that is fine, but have you considered removing any of these that might produce lots of output that isn't very high-value?

@slickwarren slickwarren removed their request for review June 26, 2024 17:16
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 1eced75 to 2a3bf9b Compare June 27, 2024 07:49
@andreas-kupries
Contributor Author

> General comment. There seems to be quite a number of debug logs.

I tend towards a higher amount of logging by default.

> Our debug logs overall in Rancher are super dense. The answer may be, "yes",
> and that is fine, but have you considered removing any of these that might
> produce lots of output that isn't very high-value?

Weeded a bit, result is in commit [2a3bf9b]

pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/tokens/manager.go (resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/tokens/manager.go (outdated, resolved)
pkg/auth/providers/saml/saml_provider.go (outdated, resolved)
pkg/auth/providers/saml/saml_provider.go (outdated, resolved)
pkg/auth/providers/saml/saml_client.go (outdated, resolved)
pkg/auth/providers/saml/saml_client.go (outdated, resolved)
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from d96fb79 to 720a90b Compare June 28, 2024 08:12
@crobby crobby self-requested a review June 28, 2024 10:31
@@ -6,8 +6,9 @@ toolchain go1.22.3

replace (
github.com/containerd/containerd => github.com/containerd/containerd v1.6.27 // for compatibilty with docker 20.10.x
github.com/crewjam/saml => github.com/rancher/saml v0.4.14-rancher3
Member

What is our long term plan regarding this fork, please? Was this validated with the Architecture team? Just asking to understand our goals and if we plan to eventually go back to the original upstream version or not.

I see in our fork that we added some fixes and open (not merged) upstream PRs. Are we working with upstream to get those merged?

My concern is that this is a very sensitive library in terms of security, and it already had 5 CVEs - https://github.com/crewjam/saml/security . By going with the fork we might lose track of future CVE fixes.

Contributor Author

> Are we working with upstream to get those merged?

Both PRs for the fixes we use look to have been ignored for more than a year.
And the project has lots of other open PRs which are not attended to.
It feels as if the entire project has fallen out of maintenance.
As such I am unsure how we could work with upstream to get these merged.

> What is our long term plan regarding this fork, please?

Personally I would push for us to search for a better maintained replacement.

> Was this validated with the Architecture team?

Who should I talk to ?

Member

@mattfarina WDYT about the situation described by Andreas, please?

Contributor

@andreas-kupries I have a couple questions:

  1. Do we have a documented list of changes we are carrying and the reason for each of them? I ask because 1) I am looking to understand the situation and 2) I've run into problems with other forks where we had lost that information and no longer knew.
  2. Do we know of any alternative saml packages that are well maintained and stable? I suspect the answer is "no" because we have not looked but I wanted to ask. Just in case.

Contributor Author

@andreas-kupries andreas-kupries Jun 28, 2024

There is currently no documented list of changes, except through the commit messages in the fork.
I am ok with writing such a list. Where should it go ? (Just) in the forked repo, or elsewhere too ?

I know of https://pkg.go.dev/github.com/russellhaering/gosaml2 because during the search for the OKTA/KC issues it was used in part. The parts were removed again when the first crewjam patch was added, fixing the mishandling of compressed data. I did not research deeper into suitability at that point because SLO was scheduled for 2.9 at the time and I wanted it working, not go down another rabbit hole of changing the supporting package.

Some other packages I saw were explicitly marked as experimental, incomplete, etc.

Member

So just for my own understanding, https://pkg.go.dev/github.com/russellhaering/gosaml2 doesn't contain all the code and fixes that we need without having the need to still fork it too, right?

Note that our fork https://github.com/rancher/saml doesn't have main as the default branch; instead it points to v0.0.1-rancher1, which can be confusing. Would be good to update the default branch if possible, please.

Contributor Author

> https://pkg.go.dev/github.com/russellhaering/gosaml2

This package came up during work here as a possible alternative to crewjam/saml, which I did not investigate deeply enough to know if it provides everything we need, or not. I saw a fork of crewjam/saml as the less risky path than having to replace it with something completely new.

> Would be good to update the default branch if possible, please.

Checking, I find that I do not have the admin permissions necessary to change the default path. Who should I talk to to get the change done ?

Member

Thanks for the explanation, Andreas.

> I saw a fork of crewjam/saml as the less risky path than having to replace it with something completely new.

Putting this into perspective, it does make sense.

The security implications, ownership and maintainability of such forks, this one and others, will eventually be addressed in the Engineering Handbook (CC @mattfarina).

> Checking, I find that I do not have the admin permissions necessary to change the default path. Who should I talk to to get the change done ?

You must open a GH issue for EIO.

Contributor Author

Ok.

@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 37152f4 to 033aaa4 Compare July 1, 2024 07:29
…supported, enabled, forced.

added structures for logout request and response.
regenerated code and yaml
side work: documented nature of InitializeSamlServiceProvider
global logout interceptor callback
linked saml logout handler/backend with token manager frontend, via the interceptor callback
added guards against UI misbehaviour
register the new structures with the norman framework to enable serialization to and from json
added handling of SLO responses.
inject slo support flag into the initial authconfigs
fix: extended flow state storage with ability to set the cookie path.
hardwired acs path is no good when redirecting to the slo endpoint.
set proper cookie paths wherever we have state setup
added request signing - applies only to logout requests
(i.e. auth requests are not signed, as before)
(because we apparently use the redirect binding despite the code saying POST)
chore: log cleanup, removed some, made some official
fix: comment typo in go.mod
redirected crewjam/saml to our rancher/saml for decompression fix.
fix: missing/different generated files
fix KC logout issues with local crewjam patch.
fix missing handling of detached signatures on responses
import crewjam fix providing proper detached sig on logout requests.
KC still ok.
OKTA still fails, but different - issuer mismatch, not invalid sig
address comment, drop todo note, crewjam/saml is forked, and fork used
address comment - reduce saml logging

Apply suggestions from code review
Co-authored-by: Paulo Gomes <paulo.gomes.uk@gmail.com>

address comments

Apply suggestions from code review
Co-authored-by: Paulo Gomes <paulo.gomes.uk@gmail.com>

fixup of partial change for principled extension of redirect url with error information
old code still needed for bad case (unparseable url)

implemented todo: Assert "action == logout" && !sloForced, guard against UI misbehaviour.
IOW, the UI will get an error now if it tries to perform a regular logout while the provider is configured for forced SLO, i.e. logout all as the only allowed method.
@andreas-kupries andreas-kupries force-pushed the ak-38494-sure-3572-saml-single-logout branch from 033aaa4 to b88fb90 Compare July 2, 2024 07:55
Development

Successfully merging this pull request may close these issues.

[RFE] SAML Single Logout not implemented
6 participants