-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAML SLO support #45379
base: release/v2.9
Are you sure you want to change the base?
SAML SLO support #45379
Conversation
6788bc7
to
2d218c6
Compare
4f7f7a6
to
cbb7eda
Compare
Status report after a few days of working on implementing the thing: The main visualization of the necessary workflow is taken from The main take I get from both is that the SP who initiated the SLO is not notified by the IdP that the user was logged out. Okta seems to follow that principle, based on what I experienced with it today. I.e. I see a plain POST to Rancher's That is weird because the signature on the logout request should be generated through the same code as for SSO, and use the same cert/key, and SSO is, well, successful. Given my Also, Keycloak looks to be ok with the signature of the request, its issue looks to be somewhere else. So, going back to KC, which I struggled with before trialing Okta today it seems that this IdP wants to notify the initiating SP also, possibly directly from itself. The initial failures I saw with it said It should be noted that the XXX endpoint is generally needed, namely to handle IdP initiated logouts, i.e. when Rancher and other apps share the IdP and users, and a user in the other apps performs a SLO. Thus, if I can implement this endpoint properly then it should be possible to use KC completely. In terms of the flow image at the beginning, for KC I am stuck in implementing the steps 3/4, This is the state of the work as of commit 582687e. Thanks to @aalves08 for providing access to a remote Okta IdP to work with. |
582687e
to
11ba86a
Compare
@andreas-kupries At some point next week could you provide an linux/amd64 image of the changes in this PR? I'll be taking over from Alex whilst he's away and noticed he was using arm64 |
Managed a full logout with KC now. That said, after that the UI does a number of things I was unable to track, and then landed on
in the end. In other words, there may still be something more to do UI side. Side note: I have to keep the haering SAML package. The logout response is deflate-compressed. haering handles that ok. crewjam does not try to decompress, tries to read the compressed string as XML and then fails with a bad utf-8 error. |
adc8666
to
9559ede
Compare
Status report: SLO works for Keycloak and OKTA now.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment. There seems to be quite a number of debug logs. Our debug logs overall in Rancher are super dense.
The answer may be, "yes", and that is fine, but have you considered removing any of these that might produce lots of output that isn't very high-value?
1eced75
to
2a3bf9b
Compare
I tend towards higher amount of logging by default.
Weeded a bit, result is in commit [2a3bf9b] |
d96fb79
to
720a90b
Compare
@@ -6,8 +6,9 @@ toolchain go1.22.3 | |||
|
|||
replace ( | |||
github.com/containerd/containerd => github.com/containerd/containerd v1.6.27 // for compatibilty with docker 20.10.x | |||
github.com/crewjam/saml => github.com/rancher/saml v0.4.14-rancher3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is our long term plan regarding this fork, please? Was this validated with the Architecture team? Just asking to understand our goals and if we plan to eventually go back to the original upstream version or not.
I see in our fork that we added some fixes and open (not merged) upstream PRs. Are we working with upstream to get those merged?
My concern is that this is a very sensitive library in terms of security, and it already had 5 CVEs - https://github.com/crewjam/saml/security . By going with the fork we might lose track of future CVE fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we working with upstream to get those merged?
The PRs for the fixes we use look to have been ignored for more than a year, both.
And the project has lots of other open PRs which are not attended to.
It feels as if the entire project has fallen out of maintenance.
As such I am unsure how we could work with upstream to get these merged.
What is our long term plan regarding this fork, please?
Personally I would push for us to search for a better maintained replacement.
Was this validated with the Architecture team?
Who should I talk to ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattfarina WDYT about the situation described by Andreas, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreas-kupries I have a couple questions:
- Do we have a documented list of changes we are carrying and the reason for each of them? I ask because 1) I am looking to understand the situation and 2) I've run into problems with other forks where we had lost that information and no longer knew.
- Do we know of any alternative saml packages that are well maintained and stable? I suspect the answer is "no" because we have not looked but I wanted to ask. Just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is currently no documented list of changes, except through the commit messages in the fork.
I am ok with writing such a list. Where should it go ? (Just) in the forked repo, or elsewhere too ?
I know of https://pkg.go.dev/github.com/russellhaering/gosaml2 because during the search for the OKTA/KC issues it was used in part. The parts were removed again when the first crewjam patch was added, fixing the mishandling of compressed data. I did not research deeper into suitability at that point because SLO was scheduled for 2.9 at the time and I wanted it working, not go down another rabbit hole of changing the supporting package.
Some other packages I saw were explicitly marked as experimental, incomplete, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just for my own understanding, https://pkg.go.dev/github.com/russellhaering/gosaml2 doesn't contain all the code and fixes that we need without having the need to still fork it too, right?
Note that our fork https://github.com/rancher/saml doesn't have main
as the default branch, instead if points to v0.0.1-rancher1
which can be confusing. Would be good to update the default branch if possible, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This package came up during work here as a possible alternative to crewjam/saml
, which I did not investigate deeply enough to know if it provides everything we need, or not. I saw a fork of crewjam/saml
as the less risky path than having to replace it with something completely new.
Would be good to update the default branch if possible, please.
Checking, I find that I do not have the admin permissions necessary to change the default path. Who should I talk to get the change done ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, Andreas.
I saw a fork of crewjam/saml as the less risky path than having to replace it with something completely new.
Putting this into perspective, it does make sense.
The security implications, ownership and maintainability of such forks, that and others, will eventually be addressed in the Engineering Handbook (CC @mattfarina).
Checking, I find that I do not have the admin permissions necessary to change the default path. Who should I talk to get the change done ?
You must open a GH issue for EIO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
37152f4
to
033aaa4
Compare
…supported, enabled, forced. added structures for logout request and response. regenerated code and yaml side work: documented nature of InitializeSamlServiceProvider global logout interceptor callback linked saml logout handler/backend with token manager frontend, via the interceptor callback added guards against UI misbehaviour register the new structures with the norman frame work to enable serialization to and from json added handling of SLO responses. inject slo support flag into the initial authconfigs fix: extended flow state storage with ability to set the cookie path. hardwired acs path is no good when redirecting to the slo endpoint. set proper cookie paths wherever we have state setup added request signing - applies only to logout requests (i.e. auth requests are not signed, as before) (because we apparently use the redirect binding despite the code saying POST) chore: log cleanup, removed some, made some official fix: comment typo in go.mod redirected crewjam/saml to our rancher/saml for decompression fix. fix: missing/different generated files fix KC logout issues with local crewjam patch. fix missing handling of detached signatures on responses import crewjam fix providing proper detached sig on logout requests. KC still ok. OKTA still fails, but different - issuer mismatch, not invalid sig address comment, drop todo note, crewjam/saml is forked, and fork used address comment - reduce saml logging Apply suggestions from code review Co-authored-by: Paulo Gomes <paulo.gomes.uk@gmail.com> address comments Apply suggestions from code review Co-authored-by: Paulo Gomes <paulo.gomes.uk@gmail.com> fixup of partial change for principled extension of redirect url with error information old code still needed for bad case (unparseable url) implemented todo: Assert "action == logout" && !sloForced, guard against UI misbehaviour. IOW, the UI will get an error now if it tries to perform a regular logout while the provider is configured for forced SLO, i.e. logout all as the only allowed method.
033aaa4
to
b88fb90
Compare
Issue:
Fix #38494
This work is co-dependent on the UI work tracked at rancher/dashboard#10941
Problem
SURE 3572
Solution
PR holds work checkpoints at the moment. Not in a merge-able state.
AuthConfig
,SamlConfig
with the proposed flags about SLO (supported
,enabled
,forced
).supported
flag might be nonsense.SamlConfigLogoutInput
, and...Output
. Same fields as the knownSamlConfigTest...
structures. Hold the request/response data from/to the UI for thelogoutAll
action (see below).tokens
API should export a new actionlogoutAll
.KNOWN ISSUES: Does not guard against call of regular logout when SLO is forced.
Does guard against forced but not enabled, and call to logout-all when not enabled.
Testing
Engineering Testing
Manual Testing
Automated Testing
Summary: TODO
QA Testing Considerations
Regressions Considerations
TODO
Existing / newly added automated tests that provide evidence there are no regressions: