Fault tolerant storage for Jupterhub #19

jlewi · 2017-12-07T20:30:59Z

Jupyter pods are storing data in the pod volume. So if the pod dies you would lose any notebook/file edits.

We should be using a fault tolerant volume so that if the pod dies we don't lose our data.

foxish · 2017-12-08T01:37:06Z

As we discussed - this should just need additional config if we want PV per pod. If we want something that also allows sharing like NFS, we may need to add the necessary config to run that.

cc/ @yuvipanda for thoughts on sharing between notebooks.

jlewi · 2017-12-08T03:43:35Z

I'd be happy with whatever's easiest. Do you have a pointer to the config that need's to change?

I'd like to start using Jupterlab on Kubeflow to write examples for Kubeflow. But I don't want to do that until we have fault tolerant storage.

foxish · 2017-12-08T03:56:41Z

The quickest thing may be using a PV per Jupyter pod. It needs adding a spawner option.

c.KubeSpawner.user_storage_pvc_ensure = True

It may need a storage class also to be set through c.KubeSpawner.user_storage_class if it doesn't auto-provision a PV automagically already in the environment it's being run in. There's a sample config here.

yuvipanda · 2017-12-08T04:57:31Z

Note that for this and other JupyterHub config things, we have a helm chart in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/tree/master/jupyterhub that's very actively maintained and heavily used. I'd love to re-use that here than build from scratch.

yuvipanda · 2017-12-08T04:57:47Z

See docs about chart at z2jh.jupyter.org

yuvipanda · 2017-12-08T05:04:48Z

To be clear, I'm happy for this to be using KubeSpawner + Vanillay JupyterHub too! But we've put in a bunch of effort into the helm chart to provide easy OOTB solutions to things like:

Updating config of hub without disrupting users
Automatic HTTPS
Multiple authenticators
Load tests (http://github.com/yuvipanda/jupyterhub-loadtest) to validate performance at scale, etc

That we want to make sure you can re-use all that work, and we can re-use any improvements you make to the hub deployment without having to re-invent it. I understand that not everyone wants to use helm, but I do want to try find a path to not having y'all duplicate all our work there...

yuvipanda · 2017-12-08T05:08:38Z

As examples of other projects that include the JupyterHub helm chart as a dependency and build on it, see http://github.com/jupyterhub/binderhub. As examples of direct deployments that use JupyterHub and other charts, see http://github.com/jupyterhub/mybinder.org-deploy/ or http://github.com/berkeley-dsep-infra/datahub/ or http://github.com/berkeley-dsep-infra/data8xhub or http://github.com/yuvipanda/paws :)

foxish · 2017-12-08T05:09:47Z

Good point, and I totally agree. But I do think we need to come up with a cohesive solution here suitable for hub, and the other components in this repo. Having different deployment mechanisms for different parts of this effort will lead to more confusion. So, as soon as we get to that unified deployment solution, we can try match upstream as much as possible.

…

On Dec 7, 2017 11:04 PM, "Yuvi Panda" ***@***.***> wrote: To be clear, I'm happy for this to be using KubeSpawner + Vanillay JupyterHub too! But we've put in a bunch of effort into the helm chart to provide easy OOTB solutions to things like: 1. Updating config of hub without disrupting users 2. Automatic HTTPS 3. Multiple authenticators 4. Load tests (http://github.com/yuvipanda/jupyterhub-loadtest <https://github.com/yuvipanda/jupyterhub-loadtest>) to validate performance at scale, etc That we want to make sure you can re-use all that work, and we can re-use any improvements you make to the hub deployment without having to re-invent it. I understand that not everyone wants to use helm, but I do want to try find a path to not having y'all duplicate all our work there... — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#19 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3U5xGWAYVDtpnwBkrxBE4XJCpJ23Qqks5s-MNxgaJpZM4Q6Jlz> .

yuvipanda · 2017-12-08T05:12:56Z

I agree too! The way I'd do it would be to make a Helm Chart for the TF controller and one for the Model Server, and KubeFlow then just configures them together (similar to the http://github.com/jupyterhub/mybinder.org-deploy/ pattern). IMO that's much more user friendly than having users edit kubernets objects directly and then apply them... I agree that using Helm for JupyterHub but something else for the other tools isn't a valid long term strategy.

I'm happy to take an initial shot at doing this if you'd like, which can easily be discarded too without hurting any of my feelings :)

yuvipanda · 2017-12-08T05:15:20Z

In the meantime I've also created #22 which provides persistent storage for each user with the current setup :)

foxish · 2017-12-08T05:24:16Z

@jlewi and I also had discussed ksonnet as one potential mechanism here and I'm not sure where the others stand. Another thought is that we could be more opinionated here in our config and possibly have less knobs since we are targeting a very specific ML use-case as opposed to the upstream project and chart which are aiming at a broader use-case. Happy to discuss this more. Perhaps we should have a "how do we manage config" issue, enumerate the options, pros and cons and go from there.

…

On Dec 7, 2017 11:12 PM, "Yuvi Panda" ***@***.***> wrote: I agree too! The way I'd do it would be to make a Helm Chart for the TF controller and one for the Model Server, and KubeFlow then just configures them together (similar to the http://github.com/jupyterhub/ mybinder.org-deploy/ <https://github.com/jupyterhub/mybinder.org-deploy/> pattern). IMO that's much more user friendly than having users edit kubernets objects directly and then apply them... I agree that using Helm for JupyterHub but something else for the other tools isn't a valid long term strategy. I'm happy to take an initial shot at doing this if you'd like, which can easily be discarded too without hurting any of my feelings :) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#19 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3U517hkwOEK4zpIAAUvsMST-XwxEBSks5s-MVZgaJpZM4Q6Jlz> .

yuvipanda · 2017-12-08T05:26:50Z

That sounds like a good idea! ksonnet looks cool too! I am not attached to helm particularly, only against having end users directly edit kubernetes object specifications. 99% of the work that needs to happen to fully support this is in kubespawner anyway, so it's not a very big deal!

+1 on opening another issue. This discussion is already pretty off-topic for this issue :)

jlewi · 2017-12-08T17:44:39Z

@foxish So I modified the JupterHub config jlewi#2 the PVC and PV are created. But it doesn't look tike the pod running Jupyter for me is attaching it as a volume

    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: no-api-access-please
      readOnly: true

clkao · 2018-01-28T11:21:11Z

This issue is about user-pod storage persistence, related to #145. However the data used by the hub itself is also not persistent. IIRC the z2jh helm chart has hub pvc as well.

While we were adding pvc for every jupyter instance, we didn't mount it anywhere. Let's mount it to work dir, as I assume this is dir where users will likely put their notebooks. This will ensure that work will be retained even if pod dies. Fix #19 Fix #22

Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com>

Add e2e tests for the Notebook Controllers

foxish mentioned this issue Dec 8, 2017

Tooling to manage configuration and deployment #23

Closed

jlewi added this to the Kubecon Europe milestone Feb 21, 2018

inc0 mentioned this issue Feb 21, 2018

Add mount of pvc #270

Merged

jlewi closed this as completed in #270 Feb 23, 2018

kimwnasptd pushed a commit to arrikto/kubeflow that referenced this issue Mar 5, 2019

Add Yang to code search team. (kubeflow#19)

efa7ec9

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021

update packages (kubeflow#19)

293c6ba

Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com>

VaishnaviHire pushed a commit to VaishnaviHire/kubeflow that referenced this issue Jul 22, 2022

Merge pull request kubeflow#19 from VaishnaviHire/kfnbc-e2e

25b2fd4

Add e2e tests for the Notebook Controllers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fault tolerant storage for Jupterhub #19

Fault tolerant storage for Jupterhub #19

jlewi commented Dec 7, 2017

foxish commented Dec 8, 2017

jlewi commented Dec 8, 2017

foxish commented Dec 8, 2017 •

edited

Loading

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

foxish commented Dec 8, 2017 via email

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

foxish commented Dec 8, 2017 via email

yuvipanda commented Dec 8, 2017

jlewi commented Dec 8, 2017

clkao commented Jan 28, 2018

Fault tolerant storage for Jupterhub #19

Fault tolerant storage for Jupterhub #19

Comments

jlewi commented Dec 7, 2017

foxish commented Dec 8, 2017

jlewi commented Dec 8, 2017

foxish commented Dec 8, 2017 • edited Loading

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

foxish commented Dec 8, 2017 via email

yuvipanda commented Dec 8, 2017

yuvipanda commented Dec 8, 2017

foxish commented Dec 8, 2017 via email

yuvipanda commented Dec 8, 2017

jlewi commented Dec 8, 2017

clkao commented Jan 28, 2018

foxish commented Dec 8, 2017 •

edited

Loading