Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to extract additional token in generic OAuth2 authenticator #487

Open
michaeljfazio opened this issue Feb 21, 2022 · 9 comments
Open

Comments

@michaeljfazio
Copy link

michaeljfazio commented Feb 21, 2022

Proposed change

I would like to extract an OIDC ID token in auth_state_hook. This seems like it should be straight forward when using the generic authenticator however it is not OIDC friendly. Specifically, it does not provide an easy way to extract the id_token.

Alternative options

Option 1: Allow users to specify additional tokens to be extracted into auth state (e.g. c.LocalGenericOAuthenticator.extra_tokens['id_token']
Option 2. Automatically detect if the token response contains an "id_token" and add it to the auth_state implicitly.

Who would use this feature?

In my case this is useful so that I can take the id_token and use it to register federated users in an identity pool before converting it to temporary AWS credentials server side using something like this auth hook:

async def pre_spawn_hook(spawner: Spawner):
    try:
        auth_state = await spawner.user.get_auth_state()

        id_token = auth_state['id_token']
        decoded = jwt.decode(id_token, options={"verify_signature": False}, audience=c.LocalGenericOIDCAuthenticator.client_id)
        cognito_provider = decoded['iss'][8:]
        identity_pool_id = os.environ.get('IDENTITY_POOL_ID')

        cognito = boto3.client('cognito-identity')
        logins = { cognito_provider: id_token }

        identity_id = cognito.get_id(IdentityPoolId=identity_pool_id, Logins=logins)['IdentityId']
        credentials = cognito.get_credentials_for_identity(IdentityId=identity_id, Logins=logins)['Credentials']

        spawner.environment['AWS_ACCESS_KEY_ID'] = credentials['AccessKeyId']
        spawner.environment['AWS_SECRET_ACCESS_KEY'] = credentials['SecretKey']
        spawner.environment['AWS_SESSION_TOKEN'] = credentials['SessionToken']
    except:
        spawner.log.error(traceback.format_exc())
@welcome
Copy link

welcome bot commented Feb 21, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@manics
Copy link
Member

manics commented Feb 21, 2022

Do you think we could add the entire token_response as a field of auth_state?

@staticmethod
def _create_auth_state(token_response, user_data_response):
access_token = token_response['access_token']
refresh_token = token_response.get('refresh_token', None)
scope = token_response.get('scope', '')
if isinstance(scope, str):
scope = scope.split(' ')
return {
'access_token': access_token,
'refresh_token': refresh_token,
'oauth_user': user_data_response,
'scope': scope,
}

@michaeljfazio
Copy link
Author

That seems sensible also. Certainly simplifies things.

@mahic
Copy link

mahic commented Jun 29, 2023

@manics @michaeljfazio The OAuthenticator hasn't had any releases since 15.1. There has been a major refactoring in the subsystem since the last release, and to me it seems as some of the refactorings now include the id_token as well?

@GeorgianaElena @consideRatio can we expect some releases of this soon? It would help us out a lot :-)

@GeorgianaElena
Copy link
Member

@mahic, the new release is being tracked in #585 🚀

@benjimin
Copy link
Contributor

benjimin commented Dec 6, 2023

I second this feature request.

For context, OIDC is the widely-used authentication standard built on top of the OAuth authorisation standard, and OIDC offers two separate avenues for retrieving information regarding the user: an ID token and a UserInfo endpoint. The process is:

  1. The JupyterHub hub redirects the user agent (web browser) to the OpenID provider's authorisation (login) endpoint.
  2. The identity provider authenticates the user and redirects their agent back to the hub with an opaque code.
  3. The hub directly contacts the identity provider's token endpoint, and exchanges that code for a bundle of tokens, which will include both the access token and the ID token. (These are each formatted as JWT tokens, that is, as base64url-encoded and cryptographically-signed JSON-formatted collections of claims.)
  4. The hub directly contacts the identity provider's UserInfo endpoint, and exchanges that access token for a simple JSON-formatted collection of claims.

I think JupyterHub should give admins the choice of whether to use claims from the UserInfo endpoint (step 4) versus from the ID token (step 3). In the latter choice, the hub could entirely skip the final network request, but would instead be responsible for decoding (and validating) the ID token to extract its claims.

Some identity providers will supply different sets of claims via each of these two avenues, for example, AWS Cognito currently provides cognito:groups (i.e. the list of groups that the user is a member of) as a claim in both of the tokens but not in the JSON from the UserInfo endpoint. (Note that JupyterHub admins commonly desire to customise profile lists according group membership; an example is even suggested in the Z2JH docs. GenericOAuthenticator also expressly facilitates using group membership to decide login and admin privileges. Also AWS is a popular infrastructure provider.)

Currently GenericOAuthenticator discards the ID token and loads only the UserInfo response into the auth_state (in the "oauth_user" aka "user_info" field, from where it is used e.g. to extract the claim_groups_key field). I think an option to instead parse the ID token should be obligatory for general OIDC support (e.g. #254) and would better support deployments in AWS (see #708 (comment)). While copying the raw tokens into auth_state could be a useful start, I think this library should definitely handle extracting claims from the ID token itself (to minimise complexity of downstream customisation code and encourage security best practices).

@minrk
Copy link
Member

minrk commented Dec 7, 2023

Thanks for the detailed writeup! I think including the id token makes sense, especially for OIDC, and further reinforces that having an actual OIDCAuthenticator which can make the appropriate assumptions that generic OAuth can't is a thing we should have (#254).

@nocnokneo
Copy link

I agree with @benjimin's proposal. There could be a few options exposed as a user_info_source config:

  • USERINFO_ENDPOINT_ONLY (default for backward compatibility)
  • ID_TOKEN_ONLY
  • PREFER_USERINFO_ENDPOINT
  • PREFER_ID_TOKEN

Where the last two options merge the results from the userinfo endpoint and the ID token payload.

@benjimin
Copy link
Contributor

benjimin commented Feb 6, 2024

I've created a PR (#725) proposing that if oauthenticator.userdata_url = None (note this is not the default and so conveys an unambiguous choice by the admin to not use any userinfo endpoint) then the ID Token is used, as a direct substitute for a response from a userinfo endpoint (the user_info dict), thus going on to populate claims into the user model stored in auth_state.

Note, It looks like it had already been possible (as of v16 as alluded by @GeorgianaElena) for spawner hooks etc to access the raw id token via auth_state, but additional custom code is required to extract the token claims that way. Also, the identity provider has until now been required to support an OIDC-style userinfo endpoint, because the fields used internally by JupyterHub (such as username and group membership) weren't able to be populated from an id token.

Not wanting to overcook things in absence of use-cases, this PR doesn't introduce any novel merge functionality (apologies @nocnokneo), and tries to keep the configuration complexity to a minimum. It certainly doesn't introduce full OIDC support (discovery mechanisms etc) like #254.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants