Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow code flow callback to be direct to Vault #130

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DrDaveD
Copy link

@DrDaveD DrDaveD commented Aug 13, 2020

Overview

This adds a new role option 'callback_mode=direct' which enables the callback from the Authorization Server to be direct to Vault instead of to the client. This allows clients from multiple users to share a machine because they do not need to share a port to listen on, and it also makes for easier management of firewalls, etc, because only the Vault server needs to be configured to accept connections from the Authorization Server instead of every client.

This has a better user experience than that proposed by pr #125 because the user does not need to intervene to copy anything back to the client. This implementation is incompatible with the OIDC device flow implementation in pr #122, but it solves a lot of the same problems as the device flow and introduces an API that could be easily extended to include a device flow (which still has some advantages) by adding a callback mode of 'device'.

Design of Change

When callback_mode=direct is set, the 'oidc/auth_url' client API returns additional parameters 'state' and 'poll_interval'. The client is then expected to call a new API 'oidc/poll' (instead of 'oidc/callback') and try again every poll_interval seconds while the response is an http 400 error 'authorization_pending'. When the Authorization Server instead calls the 'oidc/callback' api, the response is in html because it goes to the user's web browser, and the authorization information is stored in the state entry until the next call to oidc/poll.

The cli also has a new option 'callbackmode=direct' to apply different defaults for the redirect_uri parameter, based on the $VAULT_ADDR environment variable. That is a convenience and is not strictly necessary in order to make it work. When there is a 'state' in the response to the oidc/auth_url API, instead of starting a listener the client calls back to oidc/poll every poll_interval seconds.

cli_responses.go is renamed to html_responses.go because it's not used exclusively for cli (and in fact it already wasn't).

It might be a good idea to add a poll_interval role option as well, to change the default from 5 seconds to the administrator's choice.

Related Issues/Pull Requests

#103, #122, #125

Test results

ok  	github.com/hashicorp/vault-plugin-auth-jwt	4.028s

Contributor Checklist

[ ] Add relevant docs to upstream Vault repository, or sufficient reasoning why docs won’t be added yet
I will modify the relevant docs if this PR is acceptable
[x] Add output for any tests not ran in CI to the PR description (eg, acceptance tests)
[x] Backwards compatible

@ssosik
Copy link

ssosik commented Aug 19, 2020

@DrDaveD - I've been following along with all of your efforts to add "headless" support for OIDC Auth, thanks. I like this approach better than the prior implementations. Do you have a sense of how likely it is that this PR will be merged into the master branch? Is there anything I could help with to move this along?

@DrDaveD
Copy link
Author

DrDaveD commented Aug 19, 2020

@ssosik It's mostly up to @kalafut but I expect it would be helpful if you would test it & review the code for quality.

@ssosik
Copy link

ssosik commented Aug 19, 2020

@DrDaveD I'll run this through some tests over the next few days.

@pbohman
Copy link

pbohman commented Aug 20, 2020

This workflow is convenient, but is there anything in the OIDC/OAuth spec that specifies how the /poll API on vault server should be implemented? How is the CLI authenticated by vault server?

I don't think the state and nonce parameters were intended to be used outside of the protocol for authentication. There is a risk that these parameters are leaked as they are carried in query parameters to the authorization server and poll/ endpoint and may be logged by proxies or present in browser history.

@DrDaveD
Copy link
Author

DrDaveD commented Aug 20, 2020

This workflow is convenient, but is there anything in the OIDC/OAuth spec that specifies how the /poll API on vault server should be implemented?

No, but I wouldn't expect it to because that is implementation dependent. Most implementations do not split the OIDC client into two parts as Vault does with itself and its own client. This poll API was inspired by the Oauth device flow API.

How is the CLI authenticated by vault server?

The state code returned from the original request makes sure that the same client requesting the authorization in the first place is the one doing the poll. As an added precaution the client may send its own client_nonce. I thought about making it required, but I thought the state code would be sufficient.

I don't think the state and nonce parameters were intended to be used outside of the protocol for authentication. There is a risk that these parameters are leaked as they are carried in query parameters to the authorization server and poll/ endpoint and may be logged by proxies or present in browser history.

First, the nonce from the Authorization Server is not passed to the client; there's a separate client_nonce. If there's a concern about the state leaking, I will make the client_nonce be required for the direct callback mode. Would that answer your concern @pbohman ?

@DrDaveD
Copy link
Author

DrDaveD commented Aug 20, 2020

This workflow is convenient, but is there anything in the OIDC/OAuth spec that specifies how the /poll API on vault server should be implemented?

No, but I wouldn't expect it to because that is implementation dependent. Most implementations do not split the OIDC client into two parts as Vault does with itself and its own client. This poll API was inspired by the Oauth device flow API.

Instead of adding a separate "poll" API I probably could overload the "callback" API to be called by both the Vault client and the Authorization Server and it could tie the two together. I thought that could get too confusing especially when the device flow is added on top of it, but if it is preferred I could try it.

@pbohman
Copy link

pbohman commented Aug 21, 2020

If there's a concern about the state leaking, I will make the client_nonce be required for the direct callback mode. Would that answer your concern @pbohman ?

Not completely. (1) Due to our deployment environment, a TLS proxy intercepts and logs the URI and query parameter of all vault server requests. That would potentially leak the client_nonce and state. We would need to send both in a Post body, similar to the poll API of the device flow. (2) State should be deleted or invalidated in vault server's callback handler so that future replays of state are invalid. Your PR removes the call to delete the state in direct mode so that it can be retrieved during the poll API. The WIP draft for OAuth 2.0 Security Best Current Practice states

The "state" value SHOULD be invalidated by the client after its first use at the redirection endpoint.

It also seems like we should rename the client_nonce or add a new parameter "callback_challenge" to better convey its purpose since a nonce doesn't normally imply confidentiality, IMO.

This is out of scope for your PR, so feel free to ignore, but in response to:

Most implementations do not split the OIDC client into two parts as Vault does with itself and its own client

What would they do in this case? It seems like a common use case to have a CLI tool and API and need some form of OIDC authentication. Would it make more sense if the vault CLI were the sole OIDC client and retrieved ID and refresh tokens from the authorization server directly? In this case vault server would use JWT auth to verify the ID tokens as JWTs.AFAICT, this is how the k8s CLI and API server interact when configured with OIDC authentication.

@DrDaveD
Copy link
Author

DrDaveD commented Aug 21, 2020

If there's a concern about the state leaking, I will make the client_nonce be required for the direct callback mode. Would that answer your concern @pbohman ?

Not completely. (1) Due to our deployment environment, a TLS proxy intercepts and logs the URI and query parameter of all vault server requests. That would potentially leak the client_nonce and state. We would need to send both in a Post body, similar to the poll API of the device flow.

I hope you've got that proxy well secured! Doesn't a TLS proxy basically need to be as trusted as an end server, since it terminates the encrypted connection?

Nevertheless, I've got no problem with changing the poll API to use POST instead of GET. In fact I found the GET to be a little painful when I updated my python client that uses this (although it wasn't really too bad once I figured out what the problem was).

(2) State should be deleted or invalidated in vault server's callback handler so that future replays of state are invalid. Your PR removes the call to delete the state in direct mode so that it can be retrieved during the poll API. The WIP draft for OAuth 2.0 Security Best Current Practice states

The "state" value SHOULD be invalidated by the client after its first use at the redirection endpoint.

That statement is actually in section 4.2 "Credential Leakage via Referer Headers". I am not convinced that's applicable here.

In any case, I can easily add a check to pathCallback to prevent the state from being re-used if state.auth is not nil. That will have the same effect, leaving the state only available for use by the next poll, and it will be a lot simpler than having to use a different state code between the vault server and its client.

So if I do all 3 changes (require client_nonce, change poll from GET to POST, and invalidate the state in callback after it is called) will that answer your concerns?

It also seems like we should rename the client_nonce or add a new parameter "callback_challenge" to better convey its purpose since a nonce doesn't normally imply confidentiality, IMO.

I disagree with that. It seems to me to fit very well into what a nonce should be according to the Wikipedia definition of cryptographic nonce. Reusing the same name makes the api simpler.

This is out of scope for your PR, so feel free to ignore, but in response to:

Most implementations do not split the OIDC client into two parts as Vault does with itself and its own client

What would they do in this case? It seems like a common use case to have a CLI tool and API and need some form of OIDC authentication.

It may be common, but I don't think it was much on the mind of the OIDC designers. It was designed for the web browser world.

Would it make more sense if the vault CLI were the sole OIDC client and retrieved ID and refresh tokens from the authorization server directly? In this case vault server would use JWT auth to verify the ID tokens as JWTs.AFAICT, this is how the k8s CLI and API server interact when configured with OIDC authentication.

That would not make any sense the way I want to use it, because then the client secret would need to be known by every vault client. I expect to have a lot of vault clients using the same vault server and client id/secret.

@pbohman
Copy link

pbohman commented Aug 22, 2020

So if I do all 3 changes (require client_nonce, change poll from GET to POST, and invalidate the state in callback after it is called) will that answer your concerns?

Yes, excluding the custom protocol which seems unavoidable with this design.

That would not make any sense the way I want to use it, because then the client secret would need to be known by every vault client. I expect to have a lot of vault clients using the same vault server and client id/secret.

Using a public client seems like a viable solution, if your IdP supports public clients. There is some prior art here with kubernetes cli/server. The approach I described above would also enable the use of OIDC refresh tokens you mentioned in 101 and 119.

@DrDaveD
Copy link
Author

DrDaveD commented Aug 28, 2020

So if I do all 3 changes (require client_nonce, change poll from GET to POST, and invalidate the state in callback after it is called) will that answer your concerns?

Yes

Ok I made these changes in the last 2 commits, with one caveat: the server API accepts either POST (with options in the incoming body) or PUT (with options encoded in the url) for the UpdateOperation, but the vault Logical client-side api only supplies a function for PUT so the vault login cli is using that. Do you have your own client, writing to the API, or are you using the vault login cli?

@DrDaveD
Copy link
Author

DrDaveD commented Sep 14, 2020

@kalafut What do you think of this PR?

@kalafut
Copy link
Contributor

kalafut commented Sep 24, 2020

Hi @DrDaveD. I'm gonna take a look at these callback PRs (including from @broamski) soon!

@DrDaveD
Copy link
Author

DrDaveD commented Nov 23, 2020

@kalafut Have you made any progress at looking at this?

@evilmog
Copy link

evilmog commented Dec 1, 2020

I am also going to need this functionality, my organization restricts http:// callbacks and only supports https, so unless we can enable https on the callback I can't get OIDC to work for CLI. It would be much better for us to basically go into a direct mode and pass the callback directly to vault through our HAProxy

@pentiumoverdrive
Copy link

If this wouldn't force me to run a graphical browser over x forwarding on our jumphosts just to sign in, I would be eternally grateful!

@DrDaveD
Copy link
Author

DrDaveD commented Dec 8, 2020

Eternally grateful? That's a long time!

This does still require a web browser, but it does not need to connect to the localhost of the vault client so yes, you should be able to use the browser on your desktop/laptop instead of forwarding the web browser over x.

@kalafut
Copy link
Contributor

kalafut commented Dec 8, 2020

@DrDaveD (and others) Quick update... we're looking at some abstractions for OIDC and similar auth logic since these methods have a need in HashiCorp beyond Vault. That work is getting much of the attention ATM, but I do want to let you know that Device Flow, and the goals in this PR, are part of the use cases we're considering as we work on that abstraction.

@DrDaveD
Copy link
Author

DrDaveD commented Mar 22, 2021

Updated for version 0.9.2

@DrDaveD
Copy link
Author

DrDaveD commented Jun 17, 2021

Updated for version 0.9.4

@DrDaveD
Copy link
Author

DrDaveD commented Aug 5, 2021

Force-pushed after updating to the master branch at the time of the 0.10.1 tag, and squashing to a single commit.

@DrDaveD
Copy link
Author

DrDaveD commented Nov 3, 2021

Force-pushed after updating for 0.11.1

@markafarrell
Copy link
Contributor

Is there any progress here on this being merged? The concept seems really promising and useful.

@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@DrDaveD
Copy link
Author

DrDaveD commented Mar 23, 2022

Force-pushed after updating for 0.12.1

@DrDaveD
Copy link
Author

DrDaveD commented May 23, 2022

Rebased on main after #204 merged

@DrDaveD
Copy link
Author

DrDaveD commented Jan 12, 2023

Rebased onto main after the commits of October 12, 2022

@jameshartig
Copy link

@kalafut any update on this? Our OIDC provider won't let us have non-https redirect URIs so this will be crucial for us to be able to do OIDC in Vault.

@kalafut
Copy link
Contributor

kalafut commented Mar 31, 2023

@kalafut any update on this? Our OIDC provider won't let us have non-https redirect URIs so this will be crucial for us to be able to do OIDC in Vault.

Sorry, I’m no longer at HashiCorp. cc @calvn

@jameshartig
Copy link

@DrDaveD it looks like this MR along with the latest release (v1.13.1) of vault breaks authentication:
image
Looks like the UI is making a request to /v1/auth/oidc/oidc/auth_url with the body: {"role":null,"redirect_uri":"https://<HOSTNAME>/ui/vault/auth/oidc/oidc/callback"} which returns {"errors":["missing client_nonce"]}. This might be related to hashicorp/vault#19541?

path_oidc.go Outdated Show resolved Hide resolved
@jameshartig
Copy link

jameshartig commented Apr 6, 2023

@DrDaveD Is there a reason why callback_mode is a parameter on the role itself and not a parameter in the call to auth_url? Why can't we reuse the same role across the UI and CLI and just use direct mode for the CLI and client mode for the UI? Currently the MR would force us to make multiple roles with the only difference being the callback_mode.

There's also a disconnect between the CLI and the role. If you specify don't specify anything in the CLI but the role has direct then there's a mismatch. If we have to specify it on the role for some reason then the CLI should default to what the role has.

Finally, should the UI prevent you from logging into a "callback_mode=direct" role? More reason to decouple it from the role itself.

@DrDaveD
Copy link
Author

DrDaveD commented Apr 12, 2023

@jameshartig I was away last week and have been trying to look into your questions this week but am having trouble getting vault-1.13.1 compiled. It keeps having various problems with the nodejs code. I'll keep trying. It might be better to discuss on an issue at https://github.com/DrDaveD/vault-plugin-auth-jwt than here. Certainly if you come up with any code changes that work better for you than the current PR (that's still compatible with this one) you could submit a PR there against my branch.

I never really considered that someone might want to use this with oidc authentication on the vault ui, I would have to try it.

@jameshartig
Copy link

@DrDaveD

I never really considered that someone might want to use this with oidc authentication on the vault ui, I would have to try it.

I don't want to use the new flow with the vault UI, but I would like to have a single oidc role and have it work with the CLI and the UI which means the callback_mode cannot be set on the role itself and should instead come from the caller. My issue is that once I merged your changes and built it, it broke the vault UI because it's not checking for empty string in various spots and existing roles that existed before your MR don't have a callback_mode set. Eliminating the callback_mode on the role would fix that issue too.

@DrDaveD
Copy link
Author

DrDaveD commented Apr 21, 2023

Is there a reason why callback_mode is a parameter on the role itself and not a parameter in the call to auth_url

It hadn't occurred to me. It would require more knowledge in the client. I suppose it could be done both ways, set a default in the role and optionally override in the auth_url call.

@DrDaveD DrDaveD force-pushed the code-flow-to-server branch 3 times, most recently from ddca7d2 to ac6ab28 Compare May 2, 2023 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants