Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add REST API for managing services at runtime #4381

Merged
merged 26 commits into from
Aug 9, 2023

Conversation

trungleduc
Copy link
Contributor

@trungleduc trungleduc commented Mar 1, 2023

Proposed change

This PR allows users with authorized scope to create and remove services dynamically at runtime. Since we will have two ways to define a service (via the config file and via REST API), a column is added to the services table to distinguish the origin of services.

New features

New scope

A new scope admin:services is added, this scope is required to be able to modify the services at runtime.

Add a new service via REST API

To add a new service, send a POST request to this endpoint

POST /hub/api/services/:servicename

Required scope: admin:services
Payload: The payload should contain the definition of the service to be created. The endpoint supports the same properties as services defined in the config file.
Possible responses

  • 201 Created: The service and related objects are created (and started in the case of a Hub-managed one) successfully.
  • 400 Bad Request: The payload is invalid or JupyterHub can not create the service.
  • 409 Conflict: The service with the same name already exists.

Remove an existing service via REST API

To remove a new service, send a DELETE request to this endpoint

DELETE /hub/api/services/:servicename

Required scope: admin:services
Payload: None
Possible responses

  • 200 OK: The service and related objects are removed (and stopped in case of a Hub-managed one) successfully.
  • 400 Bad Request: JupyterHub can not remove the service.
  • 404 Not Found: The requested service does not exist.
  • 405 Not Allowed: The requested service is created from the config file, it can not be removed at runtime.

Todo

  • Update the service database schema
    • Distinguish services registered at runtime from services that came from config.
    • Ensure the content parity between the services defined from the config file and from the database.
  • Logic to reconcile runtime changes from config changes at startup
    • Remove services from the db when removed from config
    • Leave services created at runtime alone during startup
  • Initialize services with config defined in the database.
    • Create services from orm.Service
  • Add scope for managing services by REST API.
  • Add handlers for service CRUD operations.
  • Logic to handle non-managed/managed services at run time.
    • Prevent changing services via the API that were registered via config
    • Update health check logic for services added at runtime
    • Start/stop managed services on adding/removing services
    • Add servers, oauth clients, API tokens to the database on adding services.
    • Remove servers, oauth clients, API tokens after removing services
    • Update the oauth_no_confirm_list list
  • Update documentation.
  • Update test.
  • Review the necessity of Expose oauth_no_confirm_list setting #4371

Reference

#4370
https://discourse.jupyter.org/t/is-there-a-facility-for-dynamically-adding-services-to-the-hub/15352/3

@trungleduc trungleduc force-pushed the service-api branch 4 times, most recently from 879b48f to 959414b Compare March 7, 2023 14:47
@minrk
Copy link
Member

minrk commented Mar 7, 2023

Thanks for opening this! Looking great so far, and I think this same simple .from_config approach will be extendable to roles in #3980

@trungleduc
Copy link
Contributor Author

trungleduc commented Mar 7, 2023

Hi @minrk,
I'm having an issue with the database test, especially the migration from the 2.1.1 version
(test_db.py::test_upgrade[2.1.1]).
The problem comes from using the ORM classes inside the migration script. In the 651f5419b74d_api_token_scopes.py script, we invoke the scopes.access_scopes function which then will use the orm.Service class.

for oauth_client in db.query(orm.OAuthClient):
allowed_scopes = set(roles.roles_to_scopes(oauth_client.allowed_roles))
allowed_scopes.update(scopes.access_scopes(oauth_client))
oauth_client.allowed_scopes = sorted(allowed_scopes)

Since I added a new column (from_config) to the orm.Service class but we're migrating from the 2.1.1 version, the database does not contain this column and it causes the failed test with the following error

 sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: services.from_config 

I can make the test pass by replacing the scopes.access_scopes call with a local version defined inside the migration script without using orm.Service, e.g.

def access_scopes(oauth_client: orm.OAuthClient, db: Session):
    """Return scope(s) required to access an oauth client
    This is a clone of `scopes.access_scopes` without using
    the `orm.Service`
    """
    scopes = set()
    if oauth_client.identifier == 'jupyterhub':
        return frozenset()
    spawner = oauth_client.spawner
    if spawner:
        scopes.add(f'access:servers!server={spawner.user.name}/{spawner.name}')
    else:
        statement = f"SELECT * FROM services WHERE oauth_client_id = '{oauth_client.identifier}'"
        service = db.execute(text(statement)).fetchall()
        if len(service) > 0:
            scopes.add(f'access:services!service={service[0].name}')
    return frozenset(scopes)

What do you think about this solution?

@minrk
Copy link
Member

minrk commented Mar 8, 2023

@trungleduc that solution in the migration script sounds reasonable!

Really, the migration steps shouldn't import anything from 'current' jupyterhub, which can change, they should bundle copies from their appropriate versions. It was short-sighted of me to do it that way, and we're seeing here exactly why!

if 'services' in tables:
op.add_column(
'services',
sa.Column('from_config', sa.Boolean, nullable=True, default=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a situation where NULL is required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, this is my leftover after debugging the migration script issue.

@trungleduc trungleduc force-pushed the service-api branch 3 times, most recently from 2ac5960 to d604a00 Compare March 10, 2023 00:17
@trungleduc trungleduc marked this pull request as ready for review March 10, 2023 22:46
@trungleduc trungleduc changed the title Add REST API for managing services at run time Add REST API for managing services at runtime Mar 13, 2023
@trungleduc
Copy link
Contributor Author

The check links failed test is not related to this PR.

@trungleduc trungleduc requested review from minrk and manics and removed request for minrk and manics March 13, 2023 16:07
@trungleduc trungleduc force-pushed the service-api branch 2 times, most recently from 9caed92 to b77ea67 Compare March 20, 2023 20:51
@trungleduc
Copy link
Contributor Author

@minrk @manics What do you think about this implementation? Is there anything else I should beware of?

@minrk
Copy link
Member

minrk commented Mar 30, 2023

This is looking good! I'm focusing on getting 4.0 out first before landing another big change, then I'll come back for a more detailed review.

One challenge we need to consider: privilege escalation. Since a user can easily use the credentials of a service they register, they must not be able to create a service with permissions they don't already have.

Here's something that shouldn't be possible:

  1. a user has permission admin:services to create services, but they don't have admin:users
  2. they create a service with permission admin:users

And we should make sure that's tested.

We have a utility for this that we already use in token permission assignment that can probably be used. I'm not sure if it should be used as-is, or if it needs to be modified a bit.

@trungleduc
Copy link
Contributor Author

Thank @minrk for the remark, I'll update the PR to take it into account.

@martinRenou
Copy link

@minrk Friendly ping :)

It would be great if you could make another round of review when you have time, thanks!

@minrk
Copy link
Member

minrk commented Aug 1, 2023

Coming back from vacation, will have a look soon.

@martinRenou
Copy link

Thanks!

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally had a chance to look through, thanks @trungleduc! I think this is almost there. I realize it's been a long time, so let me know if you'd like me to finish this up.

"""

if domain is None:
if self.domain:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not clarifying domain is defined as domain of the hub, while self.domain is also the domain of the hub, so these two variables should have the same value. I don't think there's a reason to have these args, and instead always take the if domain is None: path for domain here and host below.

# Do nothing if the config file tries to modify a API-base service
# or vice versa.
if orm_service.from_config != from_config:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be right for the Hub to halt. Invalid configuration errors like this should usually prevent loading the configuration. This method should not return without error if it did nothing, it should be up to the caller to catch that error, if it should at all.

self.service_tokens[service.api_token] = service.name
elif service.managed:
# generate new token
# TODO: revoke old tokens?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that in its own PR. It's finicky because we want to revoke the previous token, not all tokens associated with the service, which means we have to unambiguously identify when that happens.

elif parsed.scheme == 'http':
port = 80
elif parsed.scheme == 'https':
port = 443
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid raising an exception in this function to keep Jupyterhub from crashing at startup because of badly configured services.

I don't think we should do that. If config is invalid and not loaded, it's a lot more surprising to see deployment and everything succeed and your config just have no effect than it is to see an error.

Given how JupyterHub is usually deployed, logs don't tend to get seen until after a problem is noticed by real users. So admin errors like invalid service config should probably prevent startup, because otherwise bad config is more likely to go unnoticed long enough to affect users.

jupyterhub/app.py Outdated Show resolved Hide resolved
@trungleduc
Copy link
Contributor Author

Thank @minrk for your suggestion! And indeed it would be great if you can help me to push this PR over the line.

leave service check always running, since it doesn't cost anything to call an empty function once a minute
no chance for undefined port
rather than checking columns in the db

makes things more explicit
- cleanup services after each test
- more fixtures for services
@minrk minrk merged commit f215324 into jupyterhub:main Aug 9, 2023
19 checks passed
@welcome
Copy link

welcome bot commented Aug 9, 2023

Congrats on your first merged pull request in this project! 🎉
congrats
Thank you for contributing, we are very proud of you! ❤️

@minrk
Copy link
Member

minrk commented Aug 9, 2023

Great! Applied my review and all looks good. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants