[Globus] Remove the need for globus_sdk as a python dependency #337

Merged: 1 commit, Jun 11, 2020
47 changes: 24 additions & 23 deletions docs/source/getting-started.rst
@@ -388,12 +388,11 @@ Set the above settings in your ``jupyterhub_config``:
.. code:: python

# Tell JupyterHub to create system accounts
from oauthenticator.globus import LocalGlobusOAuthenticator
c.JupyterHub.authenticator_class = LocalGlobusOAuthenticator
c.LocalGlobusOAuthenticator.enable_auth_state = True
c.LocalGlobusOAuthenticator.oauth_callback_url = 'https://[your-host]/hub/oauth_callback'
c.LocalGlobusOAuthenticator.client_id = '[your app client id]'
c.LocalGlobusOAuthenticator.client_secret = '[your app client secret]'
from oauthenticator.globus import GlobusOAuthenticator
c.JupyterHub.authenticator_class = GlobusOAuthenticator
c.GlobusOAuthenticator.oauth_callback_url = 'https://[your-host]/hub/oauth_callback'
c.GlobusOAuthenticator.client_id = '[your app client id]'
c.GlobusOAuthenticator.client_secret = '[your app client secret]'
minrk marked this conversation as resolved.

Alternatively you can set env variables for the following:
``OAUTH_CALLBACK_URL``, ``OAUTH_CLIENT_ID``, and
@@ -406,13 +405,6 @@ settings related to User Identity, Transfer, and additional security.
User Identity
~~~~~~~~~~~~~

By default, all users are restricted to their *Globus IDs*
(example@globusid.org) with the default Jupyterhub config:

.. code:: python

c.GlobusOAuthenticator.identity_provider = 'globusid.org'

If you want to use a *Linked Identity* such as
``malcolm@universityofindependence.edu``, go to your `App Developer
page <http://developers.globus.org>`__ and set *Required Identity
@@ -421,12 +413,22 @@ in the config:

.. code:: python

c.GlobusOAuthenticator.identity_provider = 'universityofindependence.edu'
c.GlobusOAuthenticator.identity_provider = 'uchicago.edu'

**Pitfall**: Don't set 'Required Identity Provider' on pre-existing apps!
Previous user login consents will be tied to the identity users initially used
to login, and will continue to be tied to that identity after changing this
setting. Create a new Globus App with your preferred 'Required Identity Provider'
to avoid this problem.

Globus Scopes and Transfer
~~~~~~~~~~~~~~~~~~~~~~~~~~

The default configuration will automatically setup user environments
The following shows how to get tokens into user Notebooks. `You can see how users
use tokens here <https://github.com/globus/globus-jupyter-notebooks/blob/master/JupyterHub_Integration.ipynb>`__.
If you want a demonstration, you can visit `The Jupyter Globus Demo Server <https://jupyter.demo.globus.org>`__.

The default server configuration will automatically setup user environments
with tokens, allowing them to start up python notebooks and initiate
Globus Transfers. If you want to transfer data onto your JupyterHub
server, it’s suggested you install `Globus Connect
@@ -436,21 +438,20 @@ other behavior, you can modify the defaults below:

.. code:: python

# Allow Refresh Tokens in user notebooks. Disallow these for increased security,
# allow them for better usability.
c.LocalGlobusOAuthenticator.allow_refresh_tokens = True
# Allow saving user tokens to the database
c.GlobusOAuthenticator.enable_auth_state = True
# Default scopes are below if unspecified. Add a custom transfer server if you have one.
c.LocalGlobusOAuthenticator.scope = ['openid', 'profile', 'urn:globus:auth:scope:transfer.api.globus.org:all']
c.GlobusOAuthenticator.scope = ['openid', 'profile', 'urn:globus:auth:scope:transfer.api.globus.org:all']
Review comment (Member): 👍

# Default tokens excluded from being passed into the spawner environment
c.LocalGlobusOAuthenticator.exclude_tokens = ['auth.globus.org']
c.GlobusOAuthenticator.exclude_tokens = ['auth.globus.org']
# If the JupyterHub server is an endpoint, for convenience the endpoint id can be
# set here. It will show up in the notebook kernel for all users as 'GLOBUS_LOCAL_ENDPOINT'.
c.LocalGlobusOAuthenticator.globus_local_endpoint = '<Your Local JupyterHub UUID>'
c.GlobusOAuthenticator.globus_local_endpoint = '<Your Local JupyterHub UUID>'
# Set a custom logout URL for your identity provider
c.LocalGlobusOAuthenticator.logout_redirect_url = 'https://auth.globus.org/v2/web/logout'
c.GlobusOAuthenticator.logout_redirect_url = 'https://globus.org/logout'
# For added security, revoke all service tokens when users logout. (Note: users must start
# a new server to get fresh tokens, logging out does not shut it down by default)
c.LocalGlobusOAuthenticator.revoke_tokens_on_logout = False
c.GlobusOAuthenticator.revoke_tokens_on_logout = False

If you only want to authenticate users with their Globus IDs but don’t
want to allow them to do transfers, you can remove
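Note on reading the tokens from a notebook: the documentation change above talks about getting tokens into user notebooks, and `pre_spawn_start` in `oauthenticator/globus.py` (later in this diff) passes them as a base64-encoded pickle in the `GLOBUS_DATA` environment variable. A minimal sketch of decoding that variable is below. The helper name `load_globus_data` and the sample data are hypothetical, and the exact shape of the decoded object is an assumption, because the lines that build `state` are not shown in this diff.

```python
import base64
import os
import pickle


def load_globus_data(environ=None):
    """Decode GLOBUS_DATA as encoded by pre_spawn_start (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    encoded = environ.get('GLOBUS_DATA')
    if encoded is None:
        return None  # the variable is not set in this environment
    # Unpickling is only reasonable here because the Hub produced the value.
    return pickle.loads(base64.b64decode(encoded))


# Round trip with made-up data, mirroring the encoding used in pre_spawn_start:
# base64.b64encode(pickle.dumps(state)).decode('utf-8')
example_state = {
    'tokens': {'transfer.api.globus.org': {'access_token': 'not-a-real-token'}}
}
example_env = {
    'GLOBUS_DATA': base64.b64encode(pickle.dumps(example_state)).decode('utf-8')
}
decoded = load_globus_data(example_env)
```

In a real notebook the call would simply be `load_globus_data()`, which falls back to `os.environ`.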
172 changes: 103 additions & 69 deletions oauthenticator/globus.py
@@ -3,30 +3,22 @@
"""
import os
import pickle
import json
import base64
import urllib

from tornado import web
from tornado.auth import OAuth2Mixin
from tornado.web import HTTPError
from tornado.httpclient import HTTPRequest, AsyncHTTPClient

from traitlets import List, Unicode, Bool, default

from jupyterhub.handlers import LogoutHandler
from jupyterhub.auth import LocalAuthenticator
from jupyterhub.utils import url_path_join
from jupyterhub.auth import LocalAuthenticator

from .oauth2 import OAuthenticator


try:
import globus_sdk
except:
raise ImportError(
'globus_sdk is not installed, please run '
'`pip install oauthenticator[globus]` for using Globus oauth.'
)


class GlobusLogoutHandler(LogoutHandler):
"""
Handle custom logout URLs and token revocation. If a custom logout url
@@ -36,27 +28,36 @@ class GlobusLogoutHandler(LogoutHandler):
"""

async def get(self):
# Ensure self.handle_logout() is called before self.default_handle_logout()
# If default_handle_logout() is called first, the user session is popped and
# it's no longer possible to call get_auth_state() to revoke tokens.
# See https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/handlers/login.py # noqa
await self.handle_logout()
await self.default_handle_logout()
if self.authenticator.logout_redirect_url:
await self.default_handle_logout()
await self.handle_logout()
# super().get() will attempt to render a logout page. Make sure we
# return after the redirect to avoid exceptions.
self.redirect(self.authenticator.logout_redirect_url)
else:
await super().get()
return
await super().get()

async def handle_logout(self):
"""Overridden method for custom logout functionality. Should be called by
Jupyterhub on logout just before destroying the users session to log them out."""
if self.current_user and self.authenticator.revoke_tokens_on_logout:
await self.clear_tokens(self.current_user)

async def clear_tokens(self, user):
"""Revoke and clear user tokens from the database"""
state = await user.get_auth_state()
if state:
self.authenticator.revoke_service_tokens(state.get('tokens'))
await self.authenticator.revoke_service_tokens(state.get('tokens'))
self.log.info(
'Logout: Revoked tokens for user "{}" services: {}'.format(
user.name, ','.join(state['tokens'].keys())
)
)
state['tokens'] = ''
state['tokens'] = {}
Review comment (Contributor Author):

This is a minor change, unrelated to removing the Globus SDK. Unit testing caught that it's possible to get tripped up by the type changing from dict to str, although in practice I think it would be difficult to get into a state where this causes an error. This change keeps the types consistent, just in case.
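Note on the type change: a small sketch of the failure mode the comment describes. The function `describe_services` is hypothetical. It only mirrors the `','.join(state['tokens'].keys())` expression from the log message in `clear_tokens`, and the token value is made up. Whether the old code could actually reach this path in a running Hub is, as the comment says, doubtful.

```python
def describe_services(state):
    """Join the service names the way the clear_tokens log message does."""
    return ','.join(state['tokens'].keys())


state = {'tokens': {'transfer.api.globus.org': {'access_token': 'not-a-real-token'}}}
first_logout = describe_services(state)

# New behaviour: cleared tokens stay a dict, so a repeated call still works.
state['tokens'] = {}
second_logout = describe_services(state)

# Old behaviour: cleared tokens became a str, which has no .keys() method.
state['tokens'] = ''
try:
    describe_services(state)
    old_behaviour_error = None
except AttributeError as exc:
    old_behaviour_error = exc
```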

await user.save_auth_state(state)


@@ -67,10 +68,26 @@ class GlobusOAuthenticator(OAuthenticator):
login_service = 'Globus'
logout_handler = GlobusLogoutHandler

@default("userdata_url")
def _userdata_url_default(self):
return "https://auth.globus.org/v2/oauth2/userinfo"

@default("authorize_url")
def _authorize_url_default(self):
return "https://auth.globus.org/v2/oauth2/authorize"

@default("revocation_url")
def _revocation_url_default(self):
return "https://auth.globus.org/v2/oauth2/token/revoke"

revocation_url = Unicode(
help="Globus URL to revoke live tokens."
).tag(config=True)

@default("token_url")
def _token_url_default(self):
return "https://auth.globus.org/v2/oauth2/token"

identity_provider = Unicode(
help="""Restrict which institution a user
can use to login (GlobusID, University of Hogwarts, etc.). This should
@@ -79,7 +96,7 @@ def _authorize_url_default(self):
).tag(config=True)

def _identity_provider_default(self):
return os.getenv('IDENTITY_PROVIDER', 'globusid.org')
return os.getenv('IDENTITY_PROVIDER', '')
Review comment (Contributor Author):

This is another change unrelated to streamlining dependencies, but I think it's preferable. The previous value of globusid.org essentially locks the default installation down to an identity provider almost nobody uses. Setting this to the empty string allows the user to log in with any IdP initially.
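Note on the new default: the restriction in `authenticate()` is written as `if self.identity_provider and domain != self.identity_provider`, so an empty string switches the check off. A minimal sketch of that condition in isolation follows. The function name `identity_allowed` is hypothetical, and the example address is the one used in the documentation.

```python
def identity_allowed(preferred_username, identity_provider=''):
    """Return True if a login passes the identity_provider restriction."""
    username, domain = preferred_username.split('@', 1)
    # An empty identity_provider disables the restriction entirely.
    return not identity_provider or domain == identity_provider


# New default (empty string): any identity provider is accepted.
with_new_default = identity_allowed('malcolm@universityofindependence.edu')

# Old default ('globusid.org'): the same user would have been rejected.
with_old_default = identity_allowed(
    'malcolm@universityofindependence.edu', 'globusid.org'
)
```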


exclude_tokens = List(
help="""Exclude tokens from being passed into user environments
@@ -96,16 +113,6 @@ def _scope_default(self):
'urn:globus:auth:scope:transfer.api.globus.org:all',
]

allow_refresh_tokens = Bool(
help="""Allow users to have Refresh Tokens. If Refresh Tokens are not
allowed, users must use regular Access Tokens which will expire after
a set time. Set to False for increased security, True for increased
convenience."""
).tag(config=True)

def _allow_refresh_tokens_default(self):
return True

Review comment (Contributor Author):

I think the last refactor pulled out the code that could request refresh tokens, so this feature is no longer usable. In order to request refresh tokens, an additional parameter needs to be sent through the first leg of the OAuth flow.
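Note on the missing parameter: the comment does not name it. To the best of our knowledge, Globus Auth uses `access_type=offline` on the authorize request for this purpose, but treat that as an assumption to check against the Globus Auth documentation. The sketch below only builds an authorize URL. It is not part of this PR, the function name `build_authorize_url` is hypothetical, and a real login would also send a `state` parameter.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = 'https://auth.globus.org/v2/oauth2/authorize'


def build_authorize_url(client_id, redirect_uri, scopes, refresh_tokens=False):
    """Build the first-leg URL, optionally asking for refresh tokens."""
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'scope': ' '.join(scopes),
        'response_type': 'code',
        # Assumption: 'offline' is how Globus Auth is asked for refresh tokens.
        'access_type': 'offline' if refresh_tokens else 'online',
    }
    return '{}?{}'.format(AUTHORIZE_URL, urlencode(params))


url = build_authorize_url(
    '[your app client id]',
    'https://[your-host]/hub/oauth_callback',
    ['openid', 'profile', 'urn:globus:auth:scope:transfer.api.globus.org:all'],
    refresh_tokens=True,
)
query = parse_qs(urlparse(url).query)
```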

globus_local_endpoint = Unicode(
help="""If Jupyterhub is also a Globus
endpoint, its endpoint id can be specified here."""
@@ -139,31 +146,35 @@ async def pre_spawn_start(self, user, spawner):
globus_data = base64.b64encode(pickle.dumps(state))
spawner.environment['GLOBUS_DATA'] = globus_data.decode('utf-8')

def globus_portal_client(self):
return globus_sdk.ConfidentialAppAuthClient(self.client_id, self.client_secret)

async def authenticate(self, handler, data=None):
"""
Authenticate with globus.org. Usernames (and therefore Jupyterhub
accounts) will correspond to a Globus User ID, so foouser@globusid.org
will have the 'foouser' account in Jupyterhub.
"""
code = handler.get_argument("code")
redirect_uri = self.get_callback_url(self)

client = self.globus_portal_client()
client.oauth2_start_flow(
redirect_uri,
requested_scopes=' '.join(self.scope),
refresh_tokens=self.allow_refresh_tokens,
# Complete login and exchange the code for tokens.
http_client = AsyncHTTPClient()
params = dict(
redirect_uri=self.get_callback_url(handler),
code=handler.get_argument("code"),
grant_type='authorization_code',
)
req = HTTPRequest(self.token_url, method="POST",
headers=self.get_client_credential_headers(),
body=urllib.parse.urlencode(params),
)
# Doing the code for token for id_token exchange
tokens = client.oauth2_exchange_code_for_tokens(code)
id_token = tokens.decode_id_token(client)
token_response = await http_client.fetch(req)
token_json = json.loads(token_response.body.decode('utf8', 'replace'))

# Fetch user info at Globus's oauth2/userinfo/ HTTP endpoint to get the username
user_headers = self.get_default_headers()
user_headers['Authorization'] = 'Bearer {}'.format(token_json['access_token'])
req = HTTPRequest(self.userdata_url, method='GET', headers=user_headers)
user_resp = await http_client.fetch(req)
user_json = json.loads(user_resp.body.decode('utf8', 'replace'))
# It's possible for identity provider domains to be namespaced
# https://docs.globus.org/api/auth/specification/#identity_provider_namespaces # noqa
username, domain = id_token.get('preferred_username').split('@', 1)

username, domain = user_json.get('preferred_username').split('@', 1)
if self.identity_provider and domain != self.identity_provider:
raise HTTPError(
403,
@@ -174,44 +185,67 @@ async def authenticate(self, handler, data=None):
'globus.org/app/account',
),
)

# Each token should have these attributes. Resource server is optional,
# and likely won't be present.
token_attrs = ['expires_in', 'resource_server', 'scope',
'token_type', 'refresh_token', 'access_token']
# The Auth Token is a bit special, it comes back at the top level with the
# id token. The id token has some useful information in it, but nothing that
# can't be retrieved with an Auth token.
# Repackage the Auth token into a dict that looks like the other tokens
auth_token_dict = {attr_name: token_json.get(attr_name) for attr_name in token_attrs}
# Make sure only the essentials make it into tokens. Other items, such as 'state' are
# not needed after authentication and can be discarded.
other_tokens = [{attr_name: token_dict.get(attr_name) for attr_name in token_attrs}
for token_dict in token_json['other_tokens']]
tokens = other_tokens + [auth_token_dict]
# historically, tokens have been organized by resource server for convenience.
# If multiple scopes are requested from the same resource server, they will be
# combined into a single token from Globus Auth.
by_resource_server = {
token_dict['resource_server']: token_dict
for token_dict in tokens
if token_dict['resource_server'] not in self.exclude_tokens
}
return {
'name': username,
'auth_state': {
'client_id': self.client_id,
'tokens': {
tok: v
for tok, v in tokens.by_resource_server.items()
if tok not in self.exclude_tokens
},
'tokens': by_resource_server,
Review comment (Contributor Author):

The whole authenticate() method is very similar to the GenericOAuthenticator; the only functional difference is the shape of the tokens it returns. When the admin sets custom scopes, including scopes for third-party resource servers secured with Globus Auth, the access/refresh tokens will come through the other_tokens key. If the admin didn't enable auth_state, this method behaves much the same as the GenericOAuthenticator.
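Note on the token shape: the repackaging step in `authenticate()` can be sketched as a standalone function. The sample response is made up and only illustrates the layout described in the code comments, with the Auth token at the top level and the rest under `other_tokens`. The function name `tokens_by_resource_server` is hypothetical, and unlike the diff it tolerates a missing `other_tokens` key.

```python
TOKEN_ATTRS = ['expires_in', 'resource_server', 'scope',
               'token_type', 'refresh_token', 'access_token']


def tokens_by_resource_server(token_json, exclude_tokens=('auth.globus.org',)):
    """Keep only the essential attributes and key tokens by resource server."""
    auth_token = {attr: token_json.get(attr) for attr in TOKEN_ATTRS}
    other_tokens = [
        {attr: tok.get(attr) for attr in TOKEN_ATTRS}
        for tok in token_json.get('other_tokens', [])
    ]
    return {
        tok['resource_server']: tok
        for tok in other_tokens + [auth_token]
        if tok['resource_server'] not in exclude_tokens
    }


# Made-up token response; every value is a placeholder.
sample_response = {
    'access_token': 'auth-token-placeholder',
    'resource_server': 'auth.globus.org',
    'scope': 'openid profile',
    'token_type': 'Bearer',
    'expires_in': 172800,
    'id_token': 'id-token-placeholder',
    'state': 'placeholder',
    'other_tokens': [{
        'access_token': 'transfer-token-placeholder',
        'resource_server': 'transfer.api.globus.org',
        'scope': 'urn:globus:auth:scope:transfer.api.globus.org:all',
        'token_type': 'Bearer',
        'expires_in': 172800,
        'state': 'placeholder',
    }],
}
tokens = tokens_by_resource_server(sample_response)
```

With the default `exclude_tokens`, only the transfer token survives, and attributes absent from the response, such as `refresh_token`, come back as `None`.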

},
}

def revoke_service_tokens(self, services):
def get_default_headers(self):
return {"Accept": "application/json", "User-Agent": "JupyterHub"}

def get_client_credential_headers(self):
headers = self.get_default_headers()
b64key = base64.b64encode(
bytes("{}:{}".format(self.client_id, self.client_secret), "utf8")
)
headers["Authorization"] = "Basic {}".format(b64key.decode("utf8"))
return headers

async def revoke_service_tokens(self, services):
"""Revoke live Globus access and refresh tokens. Revoking inert or
non-existent tokens does nothing. Services are defined by dicts
returned by tokens.by_resource_server, for example:
services = { 'transfer.api.globus.org': {'access_token': 'token'}, ...
<Additional services>...
}
"""
client = self.globus_portal_client()
for service_data in services.values():
client.oauth2_revoke_token(service_data['access_token'])
client.oauth2_revoke_token(service_data['refresh_token'])

def get_callback_url(self, handler=None):
"""
Getting the configured callback url
"""
if self.oauth_callback_url is None:
raise HTTPError(
500,
'No callback url provided. '
'Please configure by adding '
'c.GlobusOAuthenticator.oauth_callback_url '
'to the config',
)
return self.oauth_callback_url
access_tokens = [token_dict.get('access_token') for token_dict in services.values()]
refresh_tokens = [token_dict.get('refresh_token') for token_dict in services.values()]
all_tokens = [tok for tok in access_tokens + refresh_tokens if tok is not None]
http_client = AsyncHTTPClient()
for token in all_tokens:
req = HTTPRequest(self.revocation_url,
method="POST",
headers=self.get_client_credential_headers(),
body=urllib.parse.urlencode({'token': token}),
)
await http_client.fetch(req)

def logout_url(self, base_url):
return url_path_join(base_url, 'logout')
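Note on revocation without the SDK: the new `revoke_service_tokens` sends one POST per token to `revocation_url`, authenticated with HTTP Basic credentials built from the client id and secret. The sketch below reproduces only the header and body construction so it can run without network access. The function names and sample values are hypothetical.

```python
import base64
from urllib.parse import urlencode


def client_credential_headers(client_id, client_secret):
    """Build the Basic auth headers the same way the authenticator does."""
    b64key = base64.b64encode(
        '{}:{}'.format(client_id, client_secret).encode('utf8')
    )
    return {
        'Accept': 'application/json',
        'User-Agent': 'JupyterHub',
        'Authorization': 'Basic {}'.format(b64key.decode('utf8')),
    }


def revocation_bodies(services):
    """Return one form-encoded body per access or refresh token present."""
    access = [tok.get('access_token') for tok in services.values()]
    refresh = [tok.get('refresh_token') for tok in services.values()]
    return [urlencode({'token': tok}) for tok in access + refresh if tok is not None]


headers = client_credential_headers('example-client-id', 'example-secret')
bodies = revocation_bodies({
    'transfer.api.globus.org': {'access_token': 'a', 'refresh_token': 'r'},
    'example.resource.server': {'access_token': 'b'},
})
```

Skipping `None` values is what lets the same loop handle services that were issued without a refresh token.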