Stop and restart Jupyter Servers while maintaining state #4857

Closed
karlschriek opened this issue Mar 12, 2020 · 13 comments

@karlschriek

karlschriek commented Mar 12, 2020

/kind feature

Why you need this feature:
I support data science teams with sophisticated Machine Learning toolsets and in particular have several teams right now who are interested in adopting Kubeflow largely because of the Jupyter Notebook functionality.

There is a lot to like about what is on there at the moment: you can customize the image you want to use, you can set the computing resources you need, you can attach volumes and configurations etc.

There are, however, some limitations, most notably that you cannot stop a Jupyter Server and restart it the next day, just continuing where you left off. Sure, you can create a new server and reattach the PV that you worked on, but that will only restore the notebooks you created. Anything else you did (such as installing additional Python packages, changing Jupyter's theme, setting git credentials etc.) will be lost.

Most users will balk at this. In this case it would be much simpler for them to just start their own compute instance and run Jupyter from there, happily installing and changing stuff as they go along, and then just shutting the instance down at the end of the day.

Describe the solution you'd like:
Quite simply a "shut down server" button in the central dashboard that scales the stateful set for the server down to zero and a "start server" button that scales it back up again. I don't work with stateful sets that often, so I will admit that I am not certain of the complexity involved here.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

| Label | Probability |
| --- | --- |
| kind/feature | 0.93 |


@kubeflow-bot kubeflow-bot added this to To Do in Needs Triage Mar 12, 2020
@yanniszark
Contributor

/cc @kimwnasptd
/area jupyter
/priority p1

@k8s-ci-robot k8s-ci-robot added area/jupyter Issues related to Jupyter priority/p1 labels Mar 12, 2020
@kubeflow-bot kubeflow-bot removed this from To Do in Needs Triage Mar 12, 2020
@karlschriek
Author

As a workaround for now, the following kubectl commands can be used to trigger stop/restart:

export NAMESPACE="my-namespace"
export NOTEBOOK="my-notebook-server"
# shut down
kubectl annotate notebook/${NOTEBOOK} kubeflow-resource-stopped="true" -n ${NAMESPACE}
# restart
kubectl annotate notebook/${NOTEBOOK} kubeflow-resource-stopped- -n ${NAMESPACE}

@jlewi
Contributor

jlewi commented Apr 20, 2020

@karlschriek how do those annotations work?

> Anything else you did (such as installing additional Python packages, changing Jupyter's theme, setting git credentials etc.) will be lost.

/home/jovyan should be on the PVC. So any git credentials or pip packages installed in /home/jovyan won't be lost if the user restarts the notebook using the same PVC.
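
To illustrate the point above, here is a minimal sketch of installing packages so that they land under the home directory. It assumes the image's `pip` supports the standard `--user` flag and that the per-user install location resolves to a path under `/home/jovyan` (for example `~/.local`), neither of which is confirmed in this thread. The helper name `pip_install_persistent` and the `PIP` override are hypothetical and used only for this example.

```shell
# Hypothetical helper: install packages into the per-user site directory,
# which by default lives under $HOME (assumed here to be /home/jovyan on the PVC).
# PIP can be overridden (e.g. PIP=echo) to preview the command without running it.
pip_install_persistent() {
  "${PIP:-pip}" install --user "$@"
}
```

For example, `pip_install_persistent requests` would run `pip install --user requests`. Packages installed system-wide (outside the home directory) would presumably still be lost when the server is recreated.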

@karlschriek
Author

As far as I can gather, the annotations trigger a culling service that looks for pods to shut down (presumably it scales the StatefulSet down to zero). So in principle the functionality to stop/start a server already exists; it just needs to be exposed to users in the dashboard.
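
One way to check this guess is to read back both the annotation and the replica count after stopping a server. The sketch below assumes the controller creates a StatefulSet with the same name as the Notebook resource, which this thread does not confirm. The helper name `notebook_state` and the `KUBECTL` override are hypothetical.

```shell
# Hypothetical helper: print the stop annotation, then the StatefulSet replica count.
# Assumes the StatefulSet shares the Notebook's name (not confirmed in this thread).
# $1 = namespace, $2 = notebook name.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
notebook_state() {
  # Prints nothing if the annotation is absent.
  "${KUBECTL:-kubectl}" get "notebook/$2" -n "$1" \
    -o 'jsonpath={.metadata.annotations.kubeflow-resource-stopped}'
  echo
  "${KUBECTL:-kubectl}" get "statefulset/$2" -n "$1" -o 'jsonpath={.spec.replicas}'
  echo
}
```

Calling `notebook_state my-namespace my-notebook-server` before and after applying the annotation should show whether the replica count actually drops to zero.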

True, /home/jovyan will be on the PVC, and thinking a bit more about it, that might be sufficient as long as it is clear to users that they cannot (should not) change any system-wide settings and expect them to still be there the next day. But this probably runs a bit counter to what many users would expect.

This wouldn't matter if we use a stop/restart approach rather than a delete/create (with re-attach) approach since the state would in that case be saved entirely. This also seems to be a much more intuitive approach to me as opposed to asking users to remember which PVC they used and attaching it to a new server every time.

@Jeffwan
Member

Jeffwan commented Aug 11, 2020

I think we can either improve the culling feature in Jupyter or support a new button to scale the StatefulSet to 0.

Actually, both can be supported for different use cases. We should also surface the notebook status in the UI.

@swiftdiaries
Member

How would replacing the controller's backend with a Knative Service work?
We can get scale-to-zero functionality from Knative Serving. Or is that over-engineering for this feature?

@wdhorton
Contributor

wdhorton commented Aug 12, 2020

If this works:

export NAMESPACE="my-namespace"
export NOTEBOOK="my-notebook-server"
# shut down
kubectl annotate notebook/${NOTEBOOK} kubeflow-resource-stopped="true" -n ${NAMESPACE}
# restart
kubectl annotate notebook/${NOTEBOOK} kubeflow-resource-stopped- -n ${NAMESPACE}

it seems like we could get to this functionality with these changes:

  1. some API that could be called to make these annotation changes
  2. a button on the notebook dashboard that calls the API
  3. a display on the dashboard that turns these annotations into a user-visible status

I know there are probably a lot of weird edge cases, but it seems like those are all achievable steps without having to do major re-architecting?
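
The first and third steps listed above could be sketched roughly as follows. This is not the implementation from any Kubeflow PR. It only wraps the annotation commands quoted above, and it assumes that the mere presence of the `kubeflow-resource-stopped` annotation means "stopped" (suggested by the remove-to-restart command, but not confirmed here). The names `set_notebook_stopped` and `status_from_annotation`, and the `KUBECTL` override, are hypothetical.

```shell
# Step 1 (sketch): one entry point that adds or removes the stop annotation.
# $1 = namespace, $2 = notebook name, $3 = "true" to stop, anything else to restart.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
set_notebook_stopped() {
  if [ "$3" = "true" ]; then
    "${KUBECTL:-kubectl}" annotate "notebook/$2" kubeflow-resource-stopped="true" -n "$1"
  else
    # A trailing "-" after the key removes the annotation.
    "${KUBECTL:-kubectl}" annotate "notebook/$2" kubeflow-resource-stopped- -n "$1"
  fi
}

# Step 3 (sketch): map the annotation value to a user-visible status.
# Assumption: any non-empty value means the server is stopped.
status_from_annotation() {
  if [ -n "$1" ]; then
    echo "Stopped"
  else
    echo "Running"
  fi
}
```

A dashboard backend would call something like `set_notebook_stopped` from its button handler (step 2) and feed the annotation value it reads from the Notebook resource into `status_from_annotation`.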

@thesuperzapper
Member

For those watching, this feature has been implemented in PR #5280

@stale

stale bot commented Dec 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Dec 6, 2020
@thesuperzapper
Member

I am gonna bump this, because we have implemented it, but it's not merged into master yet.

@stale stale bot removed the lifecycle/stale label Dec 6, 2020
@DavidSpek DavidSpek added this to In progress in Notebooks WG Dec 16, 2020
@DavidSpek
Contributor

Closing this issue as the feature to start/stop a notebook server is integrated into the Jupyter Web App that is being released with 1.3. The issue regarding the persistence of packages that users install through pip does remain. However, I have included a small section about this in the README for the new notebook images.
/close

@google-oss-robot

@DavidSpek: Closing this issue.

Notebooks WG automation moved this from In progress to Done Mar 20, 2021