WIP: implement services API #705

Merged
merged 9 commits into jupyterhub:master from minrk:actual-services on Sep 7, 2016

Conversation

@minrk
Member

minrk commented Aug 27, 2016

cc @willingc @jhamrick @ellisonbg

  • make the basics work
  • update cull-idle example to run as a managed service
  • implement proxied endpoints
  • tests!
  • example for formgrader service as external with proxied endpoint
  • API endpoint for services to retrieve their own info
  • work with subdomains

For now, ignore the implementation because it's mostly notes at this point. The main thing to look at is the services.md description of capabilities and how to specify services in config.

The gist:

  • a service is something that talks to the Hub API, so it always needs an API token.
  • a service can be managed (local process, started by the Hub) or
  • external: not managed by the Hub, only informing the Hub that it exists, up to you to start it, keep it running, etc.
  • a service MAY have one HTTP(S) endpoint that will be proxied at /services/service-name
  • managed services get environment variables that tell them about their context (the assigned API token, Hub URLs, etc.)
  • managed services can generate a new API token on start
  • external services must specify their API token in config
  • cull-idle would logically be a managed script with no http endpoint, approximately:
```python
{
    'name': 'cull-idle',
    'command': '/path/to/cull-idle',
    'admin': True,
}
```

And that's it.

  • formgrader would want an http endpoint. To run formgrader as an external service:
```python
{
    'name': 'formgrader',
    'api_token': 'added-to-db-on-start',
    'url': 'http://host:port',
}
```

where the Hub would assume that another system, such as docker, systemd, or a human, is keeping formgrader running.

  • starting services is part of starting JupyterHub
  • Services cannot be added at runtime
  • Managed services will always be subprocesses of the Hub; if you want to run services in docker, etc., use external services. The Spawner abstractions will not be applied to managed services.
  • All services will share the same cookie (no protection of services from each other)
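Putting the bullets above together, a sketch of what the proposed config could look like in `jupyterhub_config.py` (the dict keys follow the spec described here; names and paths are illustrative, not a final API):

```python
# Sketch of the proposed services config; keys follow the spec above,
# names and paths are illustrative.
services = [
    {
        # managed: the Hub starts this subprocess and generates
        # a fresh API token for it on start
        'name': 'cull-idle',
        'command': '/path/to/cull-idle',
        'admin': True,
    },
    {
        # external: docker/systemd/a human keeps this running;
        # the API token must be specified so the Hub can add it
        # to its database
        'name': 'formgrader',
        'api_token': 'added-to-db-on-start',
        'url': 'http://host:port',
    },
]

# In a real jupyterhub_config.py, `c` is injected by the Hub:
# c.JupyterHub.services = services
```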
@willingc


Contributor

willingc commented Aug 27, 2016

@minrk Great start and thanks for sharing.

When I get some time at the end of the weekend or Monday morning, I'll take a pass at editing the Services specification document and submit a PR against your branch.

As an FYI, I spent time earlier in the week editing docstrings for jupyterhub/services/auth.py since they flow into the API docs. Not yet ready for a PR.

**Managed** services are services that the Hub starts and is responsible for.
These can only be local subprocesses of the Hub.
While there are similarities with notebook Spawners,
there are no plans to support the same spawning abstractions as notebook


@jhamrick

jhamrick Aug 27, 2016

Contributor

Grammar/missing a period?


@minrk

minrk Aug 27, 2016

Member

Yup, thanks.

JUPYTERHUB_SERVICE_NAME: the name of the service ('cull-idle' above)
JUPYTERHUB_API_TOKEN: API token assigned to the service
JUPYTERHUB_API_URL: URL for the JupyterHub API (http://127.0.0.1:8080/hub/api)
JUPYTERHUB_BASE_URL: Base URL of the Hub (https://mydomain/)


@jhamrick

jhamrick Aug 27, 2016

Contributor

I assume this includes the port as well?


@minrk

minrk Aug 27, 2016

Member

yes, ip:port, host:port, host should all work here.
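To make the variables concrete, here is a minimal sketch of how a managed service might use them to call the Hub REST API (stdlib only; `/users` is an existing Hub API endpoint; error handling elided):

```python
# Sketch: a managed service consuming the environment variables the
# Hub injects on start, to make an authenticated Hub API request.
import json
import os
from urllib.request import Request, urlopen

def hub_api_request(path):
    """Build an authenticated request from the injected environment."""
    api_url = os.environ['JUPYTERHUB_API_URL']  # e.g. http://127.0.0.1:8080/hub/api
    token = os.environ['JUPYTERHUB_API_TOKEN']
    return Request(api_url + path,
                   headers={'Authorization': 'token %s' % token})

def list_users():
    """Fetch the user list -- only works when run under a live Hub."""
    with urlopen(hub_api_request('/users')) as resp:
        return json.loads(resp.read().decode('utf8'))
```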

If you want to run these services in docker or other environments,
you can register it as an external service below.
A managed service is characterized by the `command` specified for launching the service.


@jhamrick

jhamrick Aug 27, 2016

Contributor

Will managed services also be monitored and restarted if they exit?


@minrk

minrk Aug 27, 2016

Member

yes, I'll add a note to this effect

## External services
You can use your own service management tools, such as docker or systemd, to manage JupyterHub services.
These are not subprocesses of the Hub, and you must tell JupyterHub what API token the service is using to perform its API requests.


@jhamrick

jhamrick Aug 27, 2016

Contributor

Does each service need a unique API token? What would happen if two services are started with the same API token?


@minrk

minrk Aug 27, 2016

Member

Yes, each service needs its own token because the requests are authenticated based on the originating service/user. Services are peers to Users in this way - each request is performed 'by' a user or service, so each token maps to a single user or service.
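Since each token maps to exactly one service or user, every external service needs its own freshly minted token. One simple way to generate them (a workflow suggestion, not part of this PR):

```python
# Mint a distinct random token per external service. Tokens must
# never be shared, since each one identifies a single service (or
# user) in the Hub's database.
import binascii
import os

def new_service_token(nbytes=32):
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')
```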

An example of an externally managed service with admin access and running its own web server:
```python
c.JupyterHub.services = [
```


@jhamrick

jhamrick Aug 27, 2016

Contributor

For external services, it might be nice if there was a JupyterHub API endpoint they could hit that would tell them the information that JupyterHub knows about -- for example, it could include most of the environment variables that you pass to managed services, like:

  • JUPYTERHUB_SERVICE_NAME
  • JUPYTERHUB_SERVICE_PATH
  • JUPYTERHUB_BASE_URL

Presumably this wouldn't need to include the API token or API url since those would be needed to make this call in the first place.

The reason why I think this might be useful is that then I don't need to hardcode in a bunch of configuration options for my services -- those options can be loaded dynamically from JupyterHub itself. For example, the formgrader has its own configuration options for specifying the base hub URL, which makes configuration a pain since you have to remember to set them both in the JupyterHub config as well as the nbgrader config.


@minrk

minrk Aug 27, 2016

Member

Good call, I'll add an API endpoint where you can fetch this.

@jhamrick


Contributor

jhamrick commented Aug 27, 2016

🎉 🍰

This looks awesome! I left a few questions/comments in the design doc -- mostly clarification things, though I had one suggestion for an API endpoint to include that services could query.

@minrk


Member

minrk commented Aug 27, 2016

Thanks for the feedback, @jhamrick! I've added a few todo items. I know I said at the dev meeting that I expected to have this ready to go for you by Fall semester, but it's been a much crazier summer of conferences and workshops than I expected, so sorry about that. I'm sprinting at EuroSciPy right now, which is the last such thing I have for a couple of months, so this can be my focus for the next while.

basic implementation of managed services
- managed services are automatically restarted
- proxied services not there yet
@gen.coroutine
def delete_service(self, service, client=None):
"""Remove a service's server to the proxy table."""


@willingc

willingc Aug 27, 2016

Contributor

to -> from

@minrk minrk force-pushed the minrk:actual-services branch from 88a9355 to 140c4f2 Aug 27, 2016

@jhamrick


Contributor

jhamrick commented Aug 27, 2016

No worries, I've been super busy myself this summer so I'm not sure if I would have had time to incorporate the changes into nbgrader anyway. Once I'm back in Berkeley (tomorrow!) I should have more time to get it integrated, so your timing is actually perfect 😄

These can only be local subprocesses of the Hub.
While there are similarities with notebook Spawners,
there are no plans to support the same spawning abstractions as notebook
If you want to run these services in docker or other environments,


@ssanderson

ssanderson Aug 27, 2016

Contributor

The main use-case we've had at Q for something like services is for one-per-hub containers that do things like run pgbouncer or maintain data volumes that are shared with user notebook containers.

Am I right in thinking that these uses wouldn't be well-suited for services as they're outlined here? It sounds like I basically want a managed service that isn't a simple local process (I could shell out to docker run I suppose, but then I'd have to keep the container open in interactive mode or the process would exit immediately), which is explicitly noted as being unsupported here.


@minrk

minrk Aug 29, 2016

Member

I think you are right, but I'll clarify a few things to be sure.

There are two use cases I intend to cover here:

  1. simple subprocesses run locally, managed by the Hub
  2. externally managed services, where this code only helps with:
    1. setting up API tokens (e.g. cull-idle, HubAuth)
    2. setting up proxy forwarding for web services (e.g. formgrader)

If what you are after is keeping an external service up while the Hub is up, then probably adding it one level up at the docker-compose (or similar) level, rather than internal to JupyterHub, makes the most sense to me.


@ssanderson

ssanderson Aug 29, 2016

Contributor

We've so far found it useful to have singleton spawning in the hub itself, since that lets it participate in the hub config system. This makes it easy, for example, to run only sub-parts of our full hub setup (e.g. not using pgbouncer when running locally, or not creating and destroying heavyweight containers between development runs).

Currently what we're doing is overriding the hub's setup and teardown logic to first invoke our own setup code before continuing. There are a couple things we're hooking this way, but the singleton container code looks like this:

    @gen.coroutine
    def init_spawners(self):
        self.init_singleton_containers()
        yield super().init_spawners()

    def init_singleton_containers(self):
        """
        Initialize singleton containers and remove any single-user containers
        that are left over from the last run of the hub.
        """
        log = self.log
        log.info("Initializing singleton containers.")

        self.cleanup_single_user()

        # The client instantiated in each container doesn't survive getting
        # copied from the config file, so share our client among
        # the singleton containers.
        for container in self.singleton_containers:
            log.info("Starting [%s]" % container.name)
            instances = container.instances()
            # Remove any existing instances of this container before starting.
            if instances and self.clean_singletons:
                log.warn("Found instance of [%s]. Purging." % container.name)
                container.purge()
            elif not instances:
                container.run()

    def cleanup_containers(self):
        """
        Clean up singleton containers and any single-user containers that
        haven't been removed yet.
        """
        self.log.info("Cleaning up singleton containers.")
        if self.research_config.clean_singletons:
            for container in self.singleton_containers:
                self.log.info("Stopping singleton: [%s]" % container.name)
                container.purge(stop_first=True, remove_volumes=True)

        self.cleanup_single_user()

This has worked well enough for us so far, so I'm not super concerned that services won't support our use-case.

I do, however, think there's a useful class of services which are conceptually like local processes but require some other startup mechanism. Another example of this might be the ability to set up new PeriodicCallbacks to start and stop with the hub (we do this to make our hubs send heartbeats to an external monitoring system).


@minrk

minrk Aug 30, 2016

Member

Would a generic extension mechanism be sufficient? For the most part, I would like to discourage people adding code that runs in the Hub process itself, but that doesn't mean I need to make it impossible.

I could start simple with two hooks:

  1. setup, fires after all Hub initialization, before starting the main loop
  2. teardown, fires during cleanup

Alternately, I could allow passing a Service object, in addition to the current dict spec, which would let you define your own start/stop behavior (pretty much equivalent to above). This might have the benefit of being more clearly not a general extension mechanism.


@ssanderson

ssanderson Aug 30, 2016

Contributor

For generic setup and teardown hooks, do you think there's a benefit of having configurable extension points vs telling advanced users to subclass and override JupyterHub.initialize and JupyterHub.cleanup? Flask, for example, explicitly encourages advanced users to override documented APIs on their main application class.

I think the main downside to a subclass/extend strategy for us would be that you have to write your own entrypoint to use a subclass, since the default jupyterhub cli calls JupyterHub.launch_instance directly. (I'm running from my own entrypoint on Quantopian, which is easy enough if you know where to look.)

I'd be 👍 on supporting passing instances of Service to set up services that don't fall cleanly into the existing buckets of managed/non-managed. If that were the API though, seems a little odd to also support passing dictionaries, since those are presumably getting turned into Service objects by the hub anyway.

What would you think about making Service an interface (consisting, say, of start, stop, and close), and defining two concrete implementations LocalProcessService and ExternalService, which would express the two currently-targeted cases, but would leave room for enterprising third-party developers to write, say, LocalDockerService or PeriodicalCallbackService? This would also make it easier to write services as jupyterhub plugins, since you could just tell users to put something like the following in their jupyterhub config:
```python
from my_cool_module import MyCoolService

c = get_config()
c.JupyterHub.services = MyCoolService(...)
```


@minrk

minrk Aug 31, 2016

Member

do you think there's a benefit of having configurable extension points vs telling advanced users to subclass and override JupyterHub.initialize and JupyterHub.cleanup?

Not really, and subclass makes it a bit clearer who the appropriate audience for such customization is, which I like. I should document that, though, so it's not quite at the "if you know where to look" level.

If that were the API though, seems a little odd to also support passing dictionaries, since those are presumably getting turned into Service objects by the hub anyway.

I agree that it's weird. I wanted to do something a bit more declarative so there are fewer black boxes to deal with, and the capabilities are explicitly not another arbitrary scope-expansion point, but if I end up allowing an instance, I would probably make it always an instance as you suggest, instead of sometimes a dict and sometimes an instance.

A service is, by definition here, something that has API access to the Hub, so it will go in the DB and get an API token. That might be one point against using services as the extension point for things that don't actually want that.

@minrk minrk force-pushed the minrk:actual-services branch from 20f17ac to 583a57d Sep 1, 2016

add services to the proxy
and start test coverage

@minrk minrk force-pushed the minrk:actual-services branch from 583a57d to f97d32c Sep 1, 2016

minrk added some commits Sep 1, 2016

support services subdomain
- all services are on the 'services' domain, share the same cookie
remove redundant user_url utility
public_url works for users now
@minrk


Member

minrk commented Sep 2, 2016

It works! @jhamrick I've got everything working for services short of the formgrader-as-external-service example.

@ssanderson I'm going to stop short of trying out the instance API in this PR, but I'll take a look at it next week, and will make another PR if it seems better than the dicts here.

@willingc


Contributor

willingc commented Sep 2, 2016

@minrk Awesome! Do you want to merge and then I can iterate the docs and docstrings to be end-user friendly in a separate PR (which is probably less confusing than doing it here)?

@ssanderson


Contributor

ssanderson commented Sep 2, 2016

Sounds reasonable to me.


@jhamrick


Contributor

jhamrick commented Sep 6, 2016

@minrk awesome. I would maybe say go ahead and merge this, and I will work on updating the formgrader (maybe tonight?). If I run into any issues we can iterate on master (unless you'd prefer to get that working first before merging).

def admin_or_self(method):
"""Decorator for restricting access to either the target service or admin"""
def m(self, name):


@willingc

willingc Sep 6, 2016

Contributor

Do we want a more descriptive name than 'm'?

@@ -291,6 +292,15 @@ def _subdomain_host_changed(self, name, old, new):
# if not specified, assume https: You have to be really explicit about HTTP!
self.subdomain_host = 'https://' + new
domain = Unicode(
help="domain name without proto,port"


@willingc

willingc Sep 6, 2016

Contributor

Are we set on using 'proto'? My first reaction was "is this a typo for proxy?" Since it's only in 3 or so places in the code base, could we change it to 'protocol'?

@@ -222,9 +267,22 @@ def check_routes(self, user_dict, routes=None):
# catch filter bug, either in sqlalchemy or my understanding of its behavior


@willingc

willingc Sep 6, 2016

Contributor

Hmm... I wouldn't hold up the PR for this but it would be nice to catch why this is true occasionally.


@minrk

minrk Sep 7, 2016

Member

This hasn't changed, it was introduced some time ago. I still don't understand how it was happening, but I've never seen it personally.

@willingc


Contributor

willingc commented Sep 6, 2016

+1 on merging. I left some small comments on the PR. Once it is merged, I will flesh out the user docs and docstrings for the API.

@minrk


Member

minrk commented Sep 7, 2016

@willingc thanks, I think I addressed your comments.

@jhamrick merging now, thanks!

@minrk minrk merged commit 8ca321e into jupyterhub:master Sep 7, 2016

2 checks passed

ci/circleci Your tests passed on CircleCI!
continuous-integration/travis-ci/pr The Travis CI build passed

@minrk minrk deleted the minrk:actual-services branch Sep 7, 2016

@willingc


Contributor

willingc commented Sep 7, 2016

Thanks @minrk 🌅

@willingc willingc referenced this pull request Sep 8, 2016

Closed

Add Services #357

@minrk minrk modified the milestone: 0.7 Nov 11, 2016

chicocvenancio pushed a commit to chicocvenancio/jupyterhub that referenced this pull request May 6, 2018
