[WIP] support authentication #596

bitnik · 2018-06-29T11:24:32Z

Hi, over the last few days I have been working on adding optional JupyterHub authentication to BinderHub. First of all, the comments on #323 helped a lot, thanks!

It is not complete yet, and I won't be available for the next 2 weeks, so I thought I would open this PR and get some feedback on whether I am heading in the right direction. It would be great if you could review my changes.

Current situation:

- If auth.custom.className is set to "nullauthenticator.NullAuthenticator" (the default configuration), authentication is not enabled and BinderHub works with fake users as before.
- If it is set to anything other than NullAuthenticator, authentication is enabled, and a user who visits the binder page is redirected to the JupyterHub login page. After the user logs in, however, the browser reports a "too many redirects" error. I think this is because HubAuthenticated.get_current_user cannot get the user and redirects back to the login page, and the login page redirects back to the binder page (because the user is logged in).

Sorry if I forgot to mention something in this PR. I had less time than planned to work on this today.

minrk · 2018-06-29T12:56:29Z

This is a really great start, thank you!

I think the main thing missing to get the basics of auth working is setting up the oauth client configuration for the Binder service, as illustrated in this example. We'll also want to make sure the environment variables defined here are available to the binder process.
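To make the second point concrete, the binder process could check at startup that the variables JupyterHub passes to a service are present. This is only a minimal sketch: the list of names is an assumption based on the variables JupyterHub documents for services (`JUPYTERHUB_CLIENT_ID` only matters when OAuth is used), and `missing_service_env` is a hypothetical helper, not part of BinderHub.

```python
import os

# Variables a Hub-managed service typically receives (assumed list;
# check the JupyterHub services documentation for your version).
REQUIRED_SERVICE_ENV = (
    "JUPYTERHUB_API_TOKEN",
    "JUPYTERHUB_API_URL",
    "JUPYTERHUB_SERVICE_PREFIX",
    "JUPYTERHUB_CLIENT_ID",
)


def missing_service_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SERVICE_ENV if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_env()
    if missing:
        print("missing service environment variables:", ", ".join(missing))
```

Failing early with a list of missing names is usually easier to debug than a redirect loop later on.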

betatim · 2018-07-27T06:29:26Z

binderhub/launcher.py

 class Launcher(LoggingConfigurable):
    """Object for encapsulating launching an image for a user"""

    hub_api_token = Unicode(help="The API token for the Hub")
    hub_url = Unicode(help="The URL of the Hub")
+    hub_api_url = Unicode(help="The URL of the Hub API")


What is the thinking behind introducing this new parameter when we already have hub_url? Would be good to add a comment to explain it for future readers. (If this is just a temporary thing and part of the WIP ignore this comment :) )

yes, that was unnecessary. i removed it in the next commit. thanks :)

bitnik · 2018-08-03T07:07:03Z

@minrk first of all, thank you for the feedback and the links.

I tested the current state with NullAuthenticator (no authentication), DummyAuthenticator, ShibbolethAuthenticator and GitHubOAuthenticator, and authentication for binder worked with each of them. But there are still some problems and restrictions:

- I had to use the user id instead of the user name in pod and PVC names. Launcher.username_from_repo is used to create server names and restricts them to at most 30 characters, but user names can be up to 255 characters long. That means username + server name usually exceeds the 63-character limit for pod and PVC names.
- Another problem is that KubeSpawner._expand_user_properties makes the username safe but not the server name, so some repos produced invalid pod and PVC names. PR to fix this: "safe server name and check if events enabled when event_reflector is called" (kubespawner#226).
- When auth is enabled, the jupyterhub-singleuser command is used to launch user servers, and this requires JupyterHub to be installed in user images.
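The length problem in the first point can be checked with simple arithmetic. The sketch below assumes a pod name of the form `<prefix>-<user><server>`; the `jupyter` prefix and the helper names are illustrative assumptions, not KubeSpawner's actual template code.

```python
K8S_NAME_LIMIT = 63  # the pod/PVC name limit quoted above


def pod_name(user_part, server_name, prefix="jupyter"):
    """Build a pod name in the (assumed) form <prefix>-<user><server>."""
    return f"{prefix}-{user_part}{server_name}"


def fits_limit(name):
    """True if the name is within the 63-character limit."""
    return len(name) <= K8S_NAME_LIMIT


server = "s" * 30          # server names are capped at 30 characters
long_username = "u" * 40   # user names may be much longer (up to 255)
user_id = "12345"          # numeric ids stay short

print(fits_limit(pod_name(long_username, server)))  # False: 8 + 40 + 30 = 78
print(fits_limit(pod_name(user_id, server)))        # True: 8 + 5 + 30 = 43
```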

The extra configuration I used to enable auth with github:

jupyterhub:
  cull:
    users: false
  hub:
    baseUrl: /jupyter/
    extraConfig:
      binder: |
        from kubespawner import KubeSpawner
        class BinderSpawner(KubeSpawner):
            def start(self):
                if 'image' in self.user_options:
                    # the binder service sets the image spec via user options
                    self.image_spec = self.user_options['image']
                    # to disable PVC creation, uncomment the following lines
                    # self.storage_pvc_ensure = False
                    # self.volumes = []
                    # self.volume_mounts = []
                return super().start()
        c.JupyterHub.spawner_class = BinderSpawner
        c.JupyterHub.allow_named_servers = True
    services:
      binder:
        url: "/base_url/services/service_name"
        oauth_client_id: "binder-oauth-client-test"

  auth:
    type: github
    github:
      clientId: "client id from GitHub"
      clientSecret: "client secret from GitHub"
      callbackUrl: "host/jupyter/hub/oauth_callback"

  singleuser:
    cmd: jupyterhub-singleuser
    storage:
      type: dynamic
      dynamic:
        pvcNameTemplate: claim-{userid}{servername}
        volumeNameTemplate: volume-{userid}{servername}
      homeMountPath: /home/pv

baseUrl: /jupyter/services/binder/

As you can see in the config, persistent storage is activated (#377) and mounted at /home/pv.

With this configuration a user can launch servers under /jupyter/services/binder/ (with unique server names) and also launch a server with the name '' under /jupyter/hub/home. In the future I would like to use KubeSpawner.profile_list to list a user's launched servers (from the Spawner table) under /hub/home and manage them there. But that will be a new issue.

bitnik · 2018-08-07T14:41:39Z

Before commit bfaf51d, named servers were required when authentication was enabled. With bfaf51d, it is possible to enable authentication without named servers (this means each logged-in user can only launch 1 server at a time, and they can stop/start this server in JupyterHub home).

bitnik · 2018-08-14T13:56:09Z

I am done for now and it would be nice to have a review :)

I think my previous comments did not explain things well, so here I try to explain the status again.

Currently it is possible to have BinderHub running in 3 ways:

1. Without authentication:

This is the default configuration. BinderHub works this way if .Values.jupyterhub.auth.custom.className is set to NullAuthenticator (which is the default). For each launch, BinderHub creates a temporary user and starts a server for that user.

2. With authentication and without named servers:

BinderHub is configured this way when .Values.jupyterhub.auth.custom.className is set to something other than NullAuthenticator. In this mode BinderHub limits each authenticated user to one server at a time. When the user already has a running server, BinderHub returns an error.

This mode requires:

- running BinderHub under <jupyterhub_base_url>/services/binder/, because the jupyterhub-services cookie, which the service uses for authentication, is only available under <jupyterhub_base_url>/services/
- setting jupyterhub.cull.users to False
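The first requirement follows from how browsers scope cookies by path. The sketch below implements the path-match rule from RFC 6265 (section 5.1.4); the `/jupyter/` base URL is taken from the example configuration earlier in this thread, and the assumption is that the services cookie is scoped to `<base_url>services/`.

```python
def cookie_path_matches(cookie_path, request_path):
    """Path-match rule from RFC 6265, section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # request_path is strictly longer here, so the index is valid
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


# Assumed example: the Hub runs under /jupyter/, so the services cookie
# is scoped to /jupyter/services/.
cookie_path = "/jupyter/services/"

print(cookie_path_matches(cookie_path, "/jupyter/services/binder/"))  # True
print(cookie_path_matches(cookie_path, "/binder/"))                   # False
```

A BinderHub served from `/binder/` would therefore never receive the cookie and could not identify the user.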

When authentication is enabled, it is not required to use the jupyterhub-singleuser command to start user notebook servers, but it is better to do so: the notebook server is then aware of JupyterHub, so JupyterHub can start/stop it with the default JupyterHub home actions, and users don't need a token to reach a running server (if they have logged out). But using this command requires 2 things:

- having JupyterHub installed in user images, which repo2docker does not do by default. But .Values.build.appendix can be used to install it in each build.
- using another BinderSpawner, e.g.:

from kubespawner import KubeSpawner

class BinderSpawner(KubeSpawner):
    def start(self):
        # the binder service passes the image to launch via user options
        if 'image' in self.user_options:
            self.image_spec = self.user_options['image']
        return super().start()

3. With authentication and with named servers:

This goes one step further than mode 2. Authenticated users can launch multiple servers through the binder service. BinderHub generates a unique server name using the Launcher.username_from_repo method. This mode requires enabling named servers (.Values.jupyterhub.hub.allowNamedServers). But JupyterHub home doesn't have a UI to manage these named servers yet.
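As a rough illustration of how a short, unique server name could be derived from a repo, here is a minimal sketch. `server_name_from_repo` is a hypothetical helper written for this example; it is not the actual implementation of Launcher.username_from_repo, and the 30-character cap is simply the limit mentioned earlier in this thread.

```python
import re
import secrets


def server_name_from_repo(repo, max_length=30, suffix_length=8):
    """Hypothetical helper: short, unique, DNS-safe name derived from a repo."""
    # keep lowercase letters and digits, collapse everything else to "-"
    slug = re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
    # random hex suffix makes repeated launches of the same repo unique
    suffix = secrets.token_hex(suffix_length // 2)
    # leave room for "-" plus the suffix; fall back if nothing is left
    prefix = slug[: max_length - suffix_length - 1].rstrip("-") or "repo"
    return f"{prefix}-{suffix}"


name = server_name_from_repo("binder-examples/requirements")
print(name, len(name))
```

Keeping the name short matters because it ends up in pod and PVC names together with the user part.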

Some further comments:

- When mode 2 or 3 is used, persistent storage can be activated. But then the question is where to mount it.
- When mode 2 or 3 is used, the state field of the Spawner table and the KubeSpawner.profile_list feature can be used to list the user's already launched images in JupyterHub home, and the user can spawn one of these options.
- Soon I also want to try moving the binder form into JupyterHub home and test it.

I would really like to hear your comments!

minrk

This is really great! I left some suggestions inline, but in general the design looks right to me.

Mainly I think the separate use_oauth config is unnecessary, since auth_enabled seems to never be true without it.

I would love to get some tests for the authenticated cases. Do you think you are up for that?

minrk · 2018-08-17T13:41:18Z

binderhub/base.py

+    If authentication in not enabled, this decorator doesn't do anything.
+    """
+    @functools.wraps(method)
+    def wrapper(self, *args, **kwargs):


Rather than reimplementing @web.authenticated, we can define a .get_current_user that returns 'anonymous' if auth is not enabled.
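A minimal sketch of that idea, using a stand-in class so it runs without Tornado or JupyterHub installed. `HubAuthenticatedStub` is an assumption that only mimics the shape of `HubAuthenticated`; in real code the handler would inherit from the JupyterHub mixin and a Tornado `RequestHandler`.

```python
class HubAuthenticatedStub:
    """Stand-in for jupyterhub.services.auth.HubAuthenticated (assumption)."""

    hub_user = None  # the real mixin would look the user up via the Hub

    def get_current_user(self):
        return self.hub_user


class BaseHandler(HubAuthenticatedStub):
    def __init__(self, settings, hub_user=None):
        self.settings = settings
        self.hub_user = hub_user

    def get_current_user(self):
        if not self.settings["auth_enabled"]:
            # any truthy value satisfies @web.authenticated
            return "anonymous"
        return super().get_current_user()


print(BaseHandler({"auth_enabled": False}).get_current_user())  # anonymous
```

With this in place the stock `@web.authenticated` decorator can be used unchanged: it only redirects to login when `current_user` is falsy.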

minrk · 2018-08-17T13:42:21Z

binderhub/base.py

+    def initialize(self):
+        super().initialize()
+        if self.settings['auth_enabled'] and self.settings['use_oauth']:
+            self.hub_auth_class = HubOAuth


Shouldn't this be self.hub_auth = HubOAuth(), instantiating the object instead of setting the class?

minrk · 2018-08-17T13:44:51Z

dev-requirements.txt

@@ -7,3 +7,4 @@ pytest-tornado
 requests
 ruamel.yaml>=0.15
 https://github.com/jupyterhub/chartpress/archive/271c75e.tar.gz
+jupyterhub==0.9.2


I don't think we want to pin jupyterhub in dev-requirements.

minrk · 2018-08-17T13:46:18Z

binderhub/app.py

+        start the new server for the logged in user.""",
+        config=True)
+
+    use_oauth = Bool(


It seems we don't need to have separate auth_enabled and use_oauth flags. We can have a single auth_enabled flag that sets up HubOAuth, perhaps? Or even run with a jupyterhub_service flag that will load the full service config (URL, etc.)?

but don't we need 2 flags to distinguish if an OAuthenticator or another authenticator (e.g. ldapauthenticator, jhub_shibboleth_auth) is used?

I mean that when I use jhub_shibboleth_auth, auth_enabled is True but use_oauth is False. And then HubAuth is used (not HubOAuth).

minrk · 2018-08-17T13:48:08Z

binderhub/launcher.py

+            # check if user have a running server ('')
+            user_data = await self.get_user_data(username)
+            if server_name in user_data['servers']:
+                raise web.HTTPError(500, "User %s already has a running server." % username)


The error code here ought to be 409 if a server is requested but can't be started because it's running.
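A minimal sketch of the check with the suggested status code. The `HTTPError` class here is a stand-in for `tornado.web.HTTPError` so the example runs on its own, and `check_server_not_running` is a hypothetical helper name.

```python
class HTTPError(Exception):
    """Minimal stand-in for tornado.web.HTTPError."""

    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code


def check_server_not_running(username, server_name, user_data):
    """Raise 409 Conflict if the requested server already exists."""
    if server_name in user_data["servers"]:
        raise HTTPError(409, "User %s already has a running server." % username)


# '' is the name of the default (unnamed) server
try:
    check_server_not_running("alice", "", {"servers": {"": {}}})
except HTTPError as e:
    print(e.status_code, e)  # 409 User alice already has a running server.
```

409 tells the client that the request conflicts with the current state of the resource, whereas 500 would suggest a bug on the server.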

bitnik · 2018-08-20T11:49:00Z

thanks a lot for the review. I made some changes according to your comments.

And yes, I would really like to add tests for authentication. But first I have to learn how to do that. Could you help me with this? I will start by going through the existing tests once again and making changes where necessary.

bitnik · 2018-08-28T07:48:39Z

sorry, the last commit e1272af actually doesn't do anything for authentication. i was going through the existing tests and ended up making all these changes. if you wish, i can move this commit to another PR.

in order to test some cases for nbviewer i used the berndweiss/gesis-meta-analysis-2018 repo. if you want, i can replace it with https://github.com/binderhub-ci-repos/requirements but then some changes are needed in that repo.

jhamman · 2018-08-28T21:13:38Z

@bitnik - thanks for working on this. I'm really excited to try it out so let us know when you think it's ready for some beta testing.

bitnik · 2018-08-30T09:40:23Z

@minrk i started writing tests for auth. could you check b82ec56 if I started correctly?

The only problem I have while running the auth tests locally is that HubAuth.api_token is not populated from the JUPYTERHUB_API_TOKEN env variable by default (https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/services/auth.py#L172). To make it work, I had to add a dynamic default value generator for api_token to jupyterhub/services/auth.py:

    @default('api_token')
    def _default_api_token(self):
        return os.getenv('JUPYTERHUB_API_TOKEN', '')

bitnik · 2018-08-30T09:44:50Z

@jhamman I think you can already test the current status. I am testing it on our staging server. Here is the configuration I used last time. (don't forget to set .Values.jupyterhub.hub.services.binder.oauth_client_id if you use oauth)

bitnik · 2018-09-11T09:06:22Z

@minrk when can you review here? I really want to finish this asap :)

bitnik · 2018-09-13T06:28:46Z

this pr contains many changes unrelated to authentication. i am going to restructure it into smaller ones. sorry that i didn't have time to do it before.

bitnik · 2018-09-18T13:30:36Z

moved to #666.

bitnik force-pushed the auth branch 3 times, most recently from 4ec7b29 to b5bc39b Compare July 26, 2018 14:19

betatim reviewed Jul 27, 2018

bitnik force-pushed the auth branch 12 times, most recently from 92bfe11 to 197a04a Compare August 2, 2018 11:53

bitnik force-pushed the auth branch 5 times, most recently from 24e946a to 630e55d Compare August 7, 2018 14:24

bitnik force-pushed the auth branch from 4492603 to e4fb9b7 Compare August 14, 2018 06:59

bitnik added 5 commits August 14, 2018 09:54

support authentication

1b1396b

add env variables. jhub 0.9.1. allow all successfully identified users.

2844bd1

reduce code, no auth for metrics and small fix for BinderSpawner

8561fa2

raise exception if last api call fails. otherwise it returns None, th…

f535573

…is causes error and user gets `internal server error`

shorter server name => shorter pod and pvc names. no need to generate…

e51374d

… token if auth is enabled. add more comments.

bitnik added 4 commits August 14, 2018 09:54

authentication without named servers

def7244

is jupyterhub-singleuser really required?

cb3aa17

jupyterhub 0.9.2

c9f73df

some minor changes

cf9ef37

bitnik force-pushed the auth branch from e4fb9b7 to cf9ef37 Compare August 14, 2018 13:32

minrk reviewed Aug 17, 2018

changes according to minrk's reviews:

3029d9a

define get_current_user rather than reimplementing authenticated. instantiate the object if oauth is used. raise 409 if user already has a running server. dont pin jhub in dev-requirements.

bitnik force-pushed the auth branch from 00a6f93 to 3029d9a Compare August 20, 2018 11:21

Merge branch 'master' into auth

0cd9114

fix: import gen in builder

966ce2f

bitnik force-pushed the auth branch from 9a42d11 to 966ce2f Compare August 21, 2018 07:50

updates and small fixes in index.js.

e1272af

for nbviewer, test for loading page is updated and cases added. nbviewer url generation moved into handler and there it is easier to handle different cases.

fix: dont pass event data to updateUrlDiv

0c4af26

bitnik force-pushed the auth branch from da9689e to ab190ee Compare August 30, 2018 09:24

jhamman mentioned this pull request Aug 31, 2018

add authentication to binder.pangeo.io pangeo-data/pangeo-binder#6

Open

wip: test for auth

b82ec56

bitnik force-pushed the auth branch from a787f4c to b82ec56 Compare September 11, 2018 08:41

Merge branch 'master' into auth

c482612

bitnik mentioned this pull request Sep 18, 2018

support authentication #666

Merged

bitnik closed this Sep 18, 2018

bitnik deleted the auth branch October 12, 2018 05:41
