[WIP] support authentication #596

Closed
wants to merge 21 commits into from

bitnik (Collaborator) commented Jun 29, 2018

Hi, over the last few days I have been working on adding optional JupyterHub authentication for Binder. First of all, the comments on #323 helped a lot, thanks!

It is not complete yet, and I won't be available for the next 2 weeks, so I thought I would open this PR and get some feedback on whether I am heading in the right direction. It would be great if you could review my changes.

Current situation:

  • If auth.custom.className is set to "nullauthenticator.NullAuthenticator" (the default configuration), authentication is not enabled and BinderHub works with fake users as before.
  • If auth is set to anything other than NullAuthenticator, authentication is enabled, and a user who opens the Binder page is redirected to the JupyterHub login page. But after the user logs in, there is a "Too many redirections" error. I think this is because HubAuthenticated.get_current_user can't get the user and redirects back to the login page, and the login page redirects back to the Binder page (because the user is logged in).

Sorry if I forgot to mention something in this PR. I had less time to work on this today than planned.

minrk (Member) commented Jun 29, 2018

This is a really great start, thank you!

I think the main thing missing to get the basics started for the auth to work is setting up the oauth client configuration for the Binder service, as illustrated in this example. We'll also want to make sure the environment variables defined here are available to the binder process

@bitnik bitnik force-pushed the auth branch 3 times, most recently from 4ec7b29 to b5bc39b Compare July 26, 2018 14:19
class Launcher(LoggingConfigurable):
    """Object for encapsulating launching an image for a user"""

    hub_api_token = Unicode(help="The API token for the Hub")
    hub_url = Unicode(help="The URL of the Hub")
    hub_api_url = Unicode(help="The URL of the Hub API")
Member

What is the thinking behind introducing this new parameter when we already have hub_url? Would be good to add a comment to explain it for future readers. (If this is just a temporary thing and part of the WIP ignore this comment :) )

Collaborator Author

yes, that was unnecessary. i removed it in the next commit. thanks :)

@bitnik bitnik force-pushed the auth branch 12 times, most recently from 92bfe11 to 197a04a Compare August 2, 2018 11:53
bitnik (Collaborator, Author) commented Aug 3, 2018

@minrk firstly thank you for the feedback and links.

I tested the current state with NullAuthenticator (no authentication), DummyAuthenticator, ShibbolethAuthenticator and GitHubOAuthenticator and authentication for binder worked. But there are still some problems and restrictions:

  • I had to use the user id instead of the user name in pod and PVC names. Launcher.username_from_repo is used to create server names, and it restricts them to at most 30 characters. But the user name limit is 255 characters, so username + servername can easily exceed the 63-character limit on pod and PVC names.
  • Another problem is that KubeSpawner._expand_user_properties makes the username safe but not the server name, so some repos produced invalid pod and PVC names. PR to fix this: "safe server name and check if events enabled when event_reflector is called" (kubespawner#226).
  • When auth is enabled, the jupyterhub-singleuser command is used to launch user servers, which requires JupyterHub to be installed in user images.

The extra configuration I used to enable auth with github:

jupyterhub:
  cull:
    users: false
  hub:
    baseUrl: /jupyter/
    extraConfig:
      binder: |
        from kubespawner import KubeSpawner
        class BinderSpawner(KubeSpawner):
            def start(self):
                if 'image' in self.user_options:
                    # binder service sets the image spec via user options
                    self.image_spec = self.user_options['image']
                    # to disable pvc creation
                    # self.storage_pvc_ensure = False
                    # self.volumes = []
                    # self.volume_mounts = []
                return super().start()
        c.JupyterHub.spawner_class = BinderSpawner
        c.JupyterHub.allow_named_servers = True
    services:
      binder:
        url: "/base_url/services/service_name"
        oauth_client_id: "binder-oauth-client-test"

  auth:
    type: github
    github:
      clientId: "client id from GitHub"
      clientSecret: "client secret from GitHub"
      callbackUrl: "host/jupyter/hub/oauth_callback"

  singleuser:
    cmd: jupyterhub-singleuser
    storage:
      type: dynamic
      dynamic:
        pvcNameTemplate: claim-{userid}{servername}
        volumeNameTemplate: volume-{userid}{servername}
      homeMountPath: /home/pv

baseUrl: /jupyter/services/binder/

As you can see in the config, persistent storage is activated (#377) and mounted at /home/pv.

With this configuration, users can launch servers under /jupyter/services/binder/ (with unique server names) and launch a server with the name '' under /jupyter/hub/home. In the future I would like to use KubeSpawner.profile_list to list a user's launched servers (Spawner table) under /hub/home and manage them. But that will be a new issue.

@bitnik bitnik force-pushed the auth branch 5 times, most recently from 24e946a to 630e55d Compare August 7, 2018 14:24
bitnik (Collaborator, Author) commented Aug 7, 2018

Before commit bfaf51d, named servers were required when authentication was enabled. With bfaf51d, it is possible to have authentication enabled without named servers (this means each logged-in user can only launch 1 server at a time, and they can stop/start this server in JupyterHub home).

bitnik (Collaborator, Author) commented Aug 14, 2018

I am done for now and it would be nice to have a review :)

I think my previous comments did not explain things well, so here I try to explain the status again.

Currently it is possible to have BinderHub running in 3 ways:

1. Without authentication:

This is the default configuration. BinderHub works this way if .Values.jupyterhub.auth.custom.className is set to NullAuthenticator (which is the default). For each launch, BinderHub creates a temporary user and starts a server for that user.

2. With authentication and without named servers:

BinderHub is configured this way when .Values.jupyterhub.auth.custom.className is set to something other than NullAuthenticator. In this mode BinderHub limits each authenticated user to one running server at a time. When a user already has a running server, BinderHub returns an error.

This way requires:

  • running BinderHub under <jupyterhub_base_url>/services/binder/, because the jupyterhub-services cookie, which the service uses for authentication, is available under <jupyterhub_base_url>/services/
  • setting jupyterhub.cull.users to False

When authentication is enabled, using the jupyterhub-singleuser command to start user notebook servers is not required, but it is better: the notebook server is then aware of JupyterHub, so JupyterHub can start/stop it with the default JupyterHub home actions, and users don't need a token to reach a running server (if logged out). But using this command requires 2 things:

  • JupyterHub must be installed in user images, and repo2docker does not install it by default. But .Values.build.appendix can be used to install it in each build.

  • another BinderSpawner must be used, e.g.:

from kubespawner import KubeSpawner
class BinderSpawner(KubeSpawner):
    def start(self):
        if 'image' in self.user_options:
            # binder service sets the image spec via user options
            self.image_spec = self.user_options['image']
        return super().start()

3. With authentication and with named servers:

This goes one step beyond way 2. Authenticated users can launch multiple servers through the binder service. BinderHub generates a unique server name with the Launcher.username_from_repo method. This way requires enabling named servers (.Values.jupyterhub.hub.allowNamedServers). But JupyterHub home doesn't have a UI to manage these named servers yet.
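
The three ways described above can be summarized as a small decision function. This is only a sketch of the rules as stated in this comment, not the code in the PR: `LaunchMode` and `launch_mode` are hypothetical names, and the two arguments stand for the values of auth.custom.className and hub.allowNamedServers.

```python
from enum import Enum

# Default authenticator class name, as given in the PR description.
NULL_AUTHENTICATOR = "nullauthenticator.NullAuthenticator"


class LaunchMode(Enum):
    """Hypothetical labels for the three ways of running BinderHub."""
    ANONYMOUS = "temporary user per launch"
    SINGLE_SERVER = "one server per authenticated user"
    NAMED_SERVERS = "multiple named servers per authenticated user"


def launch_mode(authenticator_class: str, allow_named_servers: bool) -> LaunchMode:
    """Map the two settings onto way 1, 2 or 3 from the description above."""
    if authenticator_class == NULL_AUTHENTICATOR:
        # Way 1: no authentication, whatever the named-server setting is.
        return LaunchMode.ANONYMOUS
    if allow_named_servers:
        # Way 3: authentication plus named servers.
        return LaunchMode.NAMED_SERVERS
    # Way 2: authentication, one server per user at a time.
    return LaunchMode.SINGLE_SERVER


# "some.other.Authenticator" is a placeholder for any non-null authenticator.
print(launch_mode(NULL_AUTHENTICATOR, False).value)
print(launch_mode("some.other.Authenticator", True).value)
```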

Some further comments:

  • When 2 or 3 is used, persistent storage can be activated. But then the question is where to mount it.

  • When 2 or 3 is used, the state field of the Spawner table and the KubeSpawner.profile_list feature can be used to list a user's already launched images in JupyterHub home, so the user can spawn one of these options.

  • Soon I also want to try moving the binder form into JupyterHub home and test it.

I would really like to hear your comments!

minrk (Member) left a comment

This is really great! I left some suggestions inline, but in general the design looks right to me.

Mainly I think the separate use_oauth config is unnecessary, since auth_enabled seems to never be true without it.

I would love to get some tests for the authenticated cases. Do you think you are up for that?

    If authentication is not enabled, this decorator doesn't do anything.
    """
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
Member

Rather than reimplementing @web.authenticated, we can define a .get_current_user that returns 'anonymous' if auth is not enabled.
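
A minimal sketch of this suggestion, written with stand-in classes so it runs without Tornado or JupyterHub installed. `HubAuthStub` stands in for whatever base class normally resolves the Hub user, and `BinderHandler` and its constructor arguments are hypothetical. The point is that Tornado's @web.authenticated only checks whether the current user is truthy, so returning 'anonymous' lets every request through when auth is off.

```python
class HubAuthStub:
    """Stand-in for the Hub-aware base class; returns the Hub user or None."""

    def __init__(self, settings, hub_user=None):
        self.settings = settings
        self.hub_user = hub_user

    def get_current_user(self):
        return self.hub_user


class BinderHandler(HubAuthStub):
    """Hypothetical handler that short-circuits when auth is disabled."""

    def get_current_user(self):
        if not self.settings["auth_enabled"]:
            # Any truthy value passes an "is the user logged in?" check.
            return "anonymous"
        return super().get_current_user()


print(BinderHandler({"auth_enabled": False}).get_current_user())
print(BinderHandler({"auth_enabled": True}, hub_user={"name": "alice"}).get_current_user())
```

With auth enabled and no Hub user, the method returns None, which is the case where the stock decorator would redirect to the login page.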

    def initialize(self):
        super().initialize()
        if self.settings['auth_enabled'] and self.settings['use_oauth']:
            self.hub_auth_class = HubOAuth
Member

Shouldn't this be self.hub_auth = HubOAuth(), instantiating the object instead of setting the class?

@@ -7,3 +7,4 @@ pytest-tornado
requests
ruamel.yaml>=0.15
https://github.com/jupyterhub/chartpress/archive/271c75e.tar.gz
jupyterhub==0.9.2
Member

I don't think we want to pin jupyterhub in dev-requirements.

start the new server for the logged in user.""",
config=True)

use_oauth = Bool(
Member

It seems we don't need to have separate auth_enabled and use_oauth flags. We can have a single auth_enabled flag that sets up HubOAuth, perhaps? Or even run with a jupyterhub_service flag that will load the full service config (URL, etc.)?

Collaborator Author

but don't we need 2 flags to distinguish if an OAuthenticator or another authenticator (e.g. ldapauthenticator, jhub_shibboleth_auth) is used?

Collaborator Author

I mean that when I use jhub_shibboleth_auth, auth_enabled is True but use_oauth is False. And then HubAuth is used (not HubOAuth).

# check if user has a running server ('')
user_data = await self.get_user_data(username)
if server_name in user_data['servers']:
    raise web.HTTPError(500, "User %s already has a running server." % username)
Member

The error code here ought to be 409 if a server is requested but can't be started because it's running.
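
As a rough sketch of the suggested change, the check could raise 409 (Conflict) instead of 500. The `HTTPError` class below is a small stand-in with the same (status_code, log_message) shape as tornado.web.HTTPError so the example runs on its own; `ensure_server_not_running` is a hypothetical helper name, and `user_data` is assumed to have the 'servers' mapping used in the snippet above.

```python
class HTTPError(Exception):
    """Stand-in for tornado.web.HTTPError: (status_code, log_message)."""

    def __init__(self, status_code, log_message=None):
        super().__init__(log_message)
        self.status_code = status_code
        self.log_message = log_message


def ensure_server_not_running(user_data, server_name, username):
    """Raise 409 Conflict if the requested server already exists for the user."""
    if server_name in user_data["servers"]:
        raise HTTPError(409, "User %s already has a running server." % username)


# Hypothetical user data: the default server ('') is already running.
user_data = {"servers": {"": {"ready": True}}}
try:
    ensure_server_not_running(user_data, "", "alice")
except HTTPError as e:
    print(e.status_code, e.log_message)
```

A 4xx code tells the client that the request conflicts with the current state, whereas 500 would suggest a server-side failure.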

define get_current_user rather than reimplementing authenticated.
instantiate the object if oauth is used.
raise 409 if user already has a running server.
dont pin jhub in dev-requirements.
bitnik (Collaborator, Author) commented Aug 20, 2018

Thanks a lot for the review. I made some changes according to your comments.

And yes, I would really like to add tests for authentication. But first I have to learn how to do that. Could you help me with this? I will start by going through the existing tests once again and making changes if necessary.

for nbviewer, test for loading page is updated and cases added.
nbviewer url generation moved into handler and there it is easier to handle different cases.
bitnik (Collaborator, Author) commented Aug 28, 2018

Sorry, the last commit e1272af actually doesn't do anything for authentication. I was going through the existing tests and ended up making all these changes. If you wish, I can move this commit to another PR.

In order to test some cases for nbviewer I used the berndweiss/gesis-meta-analysis-2018 repo. If you want, I can replace it with https://github.com/binderhub-ci-repos/requirements, but then some changes are needed in that repo.

jhamman commented Aug 28, 2018

@bitnik - thanks for working on this. I'm really excited to try it out so let us know when you think it's ready for some beta testing.

bitnik (Collaborator, Author) commented Aug 30, 2018

@minrk I started writing tests for auth. Could you check b82ec56 to see whether I started correctly?

The only problem I have while running the auth tests locally is that HubAuth.api_token is not filled from the JUPYTERHUB_API_TOKEN env variable by default (https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/services/auth.py#L172). To make it work, I had to add a dynamic default value generator for api_token to jupyterhub/services/auth.py:

    @default('api_token')
    def _default_api_token(self):
        return os.getenv('JUPYTERHUB_API_TOKEN', '')
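
For readers unfamiliar with the pattern, here is a plain-Python sketch of the same idea: a value that falls back to the environment only when nothing was passed explicitly, plus one way a test can set the variable temporarily. `TokenHolder` is a hypothetical stand-in and not part of JupyterHub; only `os.getenv` and `unittest.mock.patch.dict` are real APIs here.

```python
import os
from unittest import mock


class TokenHolder:
    """Hypothetical stand-in for an object with a lazily computed api_token."""

    def __init__(self, api_token=None):
        self._api_token = api_token

    @property
    def api_token(self):
        # Fall back to the environment only when no token was passed explicitly.
        if self._api_token is None:
            self._api_token = os.getenv("JUPYTERHUB_API_TOKEN", "")
        return self._api_token


# In a test, the variable can be set just for the duration of a block.
with mock.patch.dict(os.environ, {"JUPYTERHUB_API_TOKEN": "test-token"}):
    print(TokenHolder().api_token)  # read from the patched environment

print(TokenHolder(api_token="abc").api_token)  # explicit value wins
```

Passing the token explicitly in tests avoids depending on the environment at all, which may be simpler than patching the library.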

bitnik (Collaborator, Author) commented Aug 30, 2018

@jhamman I think you can already test the current status. I am testing it on our staging server. Here is the configuration I used last time. (Don't forget to set .Values.jupyterhub.hub.services.binder.oauth_client_id if you use OAuth.)

bitnik (Collaborator, Author) commented Sep 11, 2018

@minrk when can you review here? I really want to finish this asap :)

bitnik (Collaborator, Author) commented Sep 13, 2018

this pr contains many changes unrelated to authentication. i am going to restructure it into smaller ones. sorry that i didn't have time to do it before.

@bitnik bitnik mentioned this pull request Sep 18, 2018
bitnik (Collaborator, Author) commented Sep 18, 2018

moved to #666.

@bitnik bitnik closed this Sep 18, 2018
@bitnik bitnik deleted the auth branch October 12, 2018 05:41