The need for persistence #1003

Open
betatim opened this issue Nov 17, 2019 · 15 comments

Comments

@betatim
Member

betatim commented Nov 17, 2019

This issue is about working on making it easier to save a currently running Binder session as well as restoring/restarting a new Binder session from that state at a later point in time.

Right now, when a user's Binder session times out they lose their work. This fits the "Binder sessions are ephemeral" goal, but a way to save and restore your work would be a great feature to have, in particular for public deployments like mybinder.org where the timeouts are fairly short.

Below are some ideas that have been discussed previously, with some pros and cons. This issue is about collecting additional ideas with their pros & cons as well as zeroing in on something simple that we can implement and test drive. It "only" has to be better than what exists now to get my support (the perfect being the enemy of the good etc). We can go for more ambitious solutions in a second iteration.

Show time till timeout

The idea is to display a countdown in the UI that lets users know how long they have left until things time out. It would also give feedback about which actions reset the timer, as people would be able to see it reset. It sounds like it should be simple to implement but I don't know if that is actually true. Can the UI access a lower bound on how much time is left and notice that it has been reset?

If we can access or compute how long is left this would be a nice first solution that would hopefully be simple to implement as a Jupyter notebook and Lab extension. For other UIs it would be more tricky to do.

Upload pod state to a blob store

We could execute a script via the preStop lifecycle hook of Kubernetes. This script could then upload the state of the home directory (/home/jovyan) to a blob store. We'd need to find a way to tell users where to download this blob after the binder has timed out. It is also not clear that the time window the preStop script has is long enough to upload everything, or how a user would resume from such a download.

"Save as" uses notebook state in the browser

The state of a notebook that is open is available to the browser even after the server has gone away because the state of the notebook is only stored there. This means we could have a notebook extension that lets users save/download the notebook they are looking at even after the server has gone away. I am not sure what would need to happen or where to start. Drawbacks include that it would only cover Jupyter frontends and data files would be lost.
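As a starting point, here is a rough sketch of the download step. It assumes the classic notebook frontend, where `Jupyter.notebook.toJSON()` returns the notebook document held in the browser; the helper names (`downloadName`, `serializeNotebook`, `downloadNotebook`) are made up for illustration and are not part of any existing extension.

```javascript
// Hypothetical helper: derive a download file name from the notebook path.
function downloadName(notebookPath) {
  const base = notebookPath.split("/").pop() || "notebook";
  return base.endsWith(".ipynb") ? base : base + ".ipynb";
}

// Hypothetical helper: serialise the in-memory notebook document to text.
function serializeNotebook(nbJson) {
  return JSON.stringify(nbJson, null, 1);
}

// Browser-only part: builds the file entirely client side, so it does not
// need the notebook server to still be reachable.
function downloadNotebook(nbJson, notebookPath) {
  const blob = new Blob([serializeNotebook(nbJson)], { type: "application/json" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = downloadName(notebookPath);
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// In the classic notebook frontend (assumption: the `Jupyter` global exists):
// downloadNotebook(Jupyter.notebook.toJSON(), Jupyter.notebook.notebook_path);
```

As noted above, this only covers the notebook document itself; data files in the session would still be lost.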

I will keep adding to this thread over time but please do post your own ideas and thoughts on any of these. I will try and dig out the relevant issues for the ideas that have been previously discussed so that we can pick up things from those discussions.

@manics
Member

manics commented Nov 18, 2019

Another idea: use the browser's localStorage. One downside is that it is scoped to the domain, not to the notebook or URL, so we need to be careful about automatically restoring state. We could perhaps make it a prompt ("Do you want to restore your previous state?").

@betatim
Member Author

betatim commented Nov 18, 2019

You'd store the current notebook state in local storage and then when someone opens the same binder (or notebook) we'd ask the user if they want to restore from the local state? Would this be something to implement as a notebook extension (via some JS)?

I like the idea, the hardest part could be recognising that a notebook is the same.

@manics
Member

manics commented Nov 19, 2019

Maybe this should be split into two topics:

  • what should be stored, e.g. all of home, just the state of all notebooks, or only files which are managed by the notebook ContentsManager?
  • how should it be stored?

@betatim
Member Author

betatim commented Nov 19, 2019

I'd keep it as one issue to collect all the ideas and their pros&cons. Then if there is consensus on what to start with make a new issue to implement this.

At least for me what to store, for how long and where are parts of the trade-offs we can make. "Everything on my personal dropbox" being maybe one end of the extreme and "nothing, nowhere, never" at the other end.

My guess would be that by being able to recover the currently open-in-the-tab notebook we'd already make a lot of people happy. Without any special upload functionality or anything. Just don't know where to get started on that (I assume we need some JS code for this which makes it a notebook extension?)

@manics
Member

manics commented Nov 19, 2019

Quick proof-of-concept (tested on Firefox):

  1. Load a repository in binder, open a notebook, make some changes, save it.
  2. Open your browser's JavaScript console for that page
  3. Paste this into the JS console and run it:
Jupyter.contents.get(Jupyter.notebook.notebook_path, {type: "notebook", content: true}).then(function(value) {
    console.log(value);
    localStorage.setItem(Jupyter.notebook.notebook_path, JSON.stringify(value));
}, function(value) { alert("Failed to get notebook"); } )

It should save your current notebook into localStorage.

  1. Load a new instance of the same binder repository, you must be on the same domain (if the federator directs you to a different binderhub this won't work). Open the same notebook.
  2. Open your browser's JavaScript console for that page
  3. Paste this into the JS console and run it:
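// getItem returns null if nothing was stored for this path, so check before restoring.
var loaded = JSON.parse(localStorage.getItem(Jupyter.notebook.notebook_path));
if (loaded) { Jupyter.notebook.fromJSON(loaded); } else { alert("No saved notebook found"); }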

If it works you should see your notebook with previous changes!

I think https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API would be a better option than localStorage though. It's more complicated, but it's designed for storing much larger amounts of data.

As you've already mentioned the biggest problem is finding a repository identifier when storing the notebook.

@betatim
Member Author

betatim commented Nov 20, 2019

I am unreasonably excited by this :) Is this the moment where we make a new issue for "Browser based storage of notebooks"? (I'd say yes)

What do you think of the following: we hook into the "save" event of each notebook and store stuff to "browser storage" (IndexedDB or localStorage or ...).

I could see two alternatives for letting users get their stuff:

  • We also install a handler that when people try to navigate away/close the tab we show a popup "are you sure?" (the kind you get with unsaved changes in a form on other pages) and somehow tell people they can get their notebook by clicking somewhere.
  • We detect that the kernel has become disconnected and show a popup with a button to download the notebook as ipynb. I think "kernel unreachable" means that the user lost internet connectivity or the binder has been shut down, and in both cases we would want to save the notebook. Are there more cases where this happens?
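For the first alternative, a minimal sketch of the "are you sure?" handler could look like the following. The `hasUnsavedWork` callback is hypothetical and would be supplied by the extension; `target` would be `window` in a real page. Browsers show their own generic text in this prompt and ignore custom messages, so the hint about where to get the notebook would have to be shown elsewhere in the page.

```javascript
// Install a leave-page prompt on a window-like target.
// Returns a function that removes the handler again.
function installLeaveWarning(target, hasUnsavedWork) {
  function onBeforeUnload(event) {
    if (!hasUnsavedWork()) {
      return; // nothing to lose, let the navigation proceed silently
    }
    event.preventDefault();
    // Kept for older browsers that only look at returnValue.
    event.returnValue = true;
  }
  target.addEventListener("beforeunload", onBeforeUnload);
  return () => target.removeEventListener("beforeunload", onBeforeUnload);
}

// In a page (assumption: `notebookIsDirty` is provided by the extension):
// const uninstall = installLeaveWarning(window, notebookIsDirty);
```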

In order to offer the user a "want to restore from browser storage?" option when they open a notebook it would be nice if we had a unique ID in the notebook metadata. Maybe a (big) random number that is written when the notebook is first created. You could then use that as key. I will create an issue in the notebook format repo to see if we can start a discussion on this.
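To make the ID idea concrete, here is a small sketch that writes a random ID into the notebook metadata the first time it sees a notebook and reuses it afterwards. The metadata key `restore_id` is purely hypothetical (nothing in the notebook format defines it), and `Math.random` is used only to keep the example short; a real implementation would probably want a proper UUID.

```javascript
// Generate a random hex string; 16 bytes gives 32 hex characters.
function randomHexId(bytes = 16) {
  let id = "";
  for (let i = 0; i < bytes; i++) {
    id += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return id;
}

// Return the notebook's ID, creating and storing one if it has none yet.
// `restore_id` is a made-up metadata key used for illustration only.
function ensureNotebookId(nbJson, makeId = randomHexId) {
  nbJson.metadata = nbJson.metadata || {};
  if (!nbJson.metadata.restore_id) {
    nbJson.metadata.restore_id = makeId();
  }
  return nbJson.metadata.restore_id;
}
```

The returned ID could then serve as the browser storage key, so that the "restore?" prompt only appears for a notebook that was actually saved before.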

@manics
Member

manics commented Nov 20, 2019

Is this the moment where we make a new issue for "Browser based storage of notebooks"?

Sounds good to me!

In order to offer the user a "want to restore from browser storage?" option when they open a notebook it would be nice if we had a unique ID in the notebook metadata

I think a repository identifier would be useful alongside a notebook UUID:

  • works with existing repos and notebooks that don't have an embedded UUID
  • there are other use-cases that require the original repo: Linking back to the source of a binder #674 (looks like there were several PRs, I can't work out whether it's working or not though)
  • if we have "Browser based storage of notebooks" you could imagine a new page https://mybinder.org/localstorage that lists all notebooks in local storage. If the metadata contains the original repo identifier (e.g. the launch URL) you could click on the notebook and have it "just work".
  • It works for non-notebooks too. IndexedDB might support the "Everything on my personal dropbox" option!
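A listing page like the one imagined above could be sketched roughly as follows. Everything about the storage layout here is an assumption: the `binder-nb:` key prefix and the `launchUrl` and `path` fields are invented for the example. The function takes any object with the Web Storage interface (`length`, `key`, `getItem`), so in a page it would be called with `localStorage`.

```javascript
const PREFIX = "binder-nb:"; // hypothetical key prefix for saved notebooks

// Collect the saved notebooks found in a Storage-like object.
function listStoredNotebooks(storage) {
  const entries = [];
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (!key || !key.startsWith(PREFIX)) {
      continue; // the storage is shared by the whole domain, skip foreign keys
    }
    try {
      const record = JSON.parse(storage.getItem(key));
      entries.push({ key, launchUrl: record.launchUrl, path: record.path });
    } catch (err) {
      // Ignore entries that are not valid records.
    }
  }
  return entries;
}

// In a page: listStoredNotebooks(localStorage).forEach(renderRow);
// (`renderRow` is a placeholder for whatever builds the list UI.)
```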

@betatim
Member Author

betatim commented Nov 21, 2019

There are now REF_URL and REPO_URL on mybinder.org (via jupyterhub/mybinder.org-deploy#1202) that let you know which repo you are in and the info to start a new binder again. So we could store that together with the notebook path.
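A sketch of what storing the repo information together with the notebook path might look like. It assumes the values of REPO_URL and REF_URL have somehow been made available to the frontend (they live in the container, so an extension would have to pass them through); the record layout and function names are invented for illustration.

```javascript
// Build a storage key from the repo and the notebook path.
// JSON-encoding the pair avoids ambiguity between the two parts.
function storageKey(repoUrl, notebookPath) {
  return JSON.stringify([repoUrl, notebookPath]);
}

// Bundle the notebook with what is needed to relaunch the same binder.
function buildRecord(repoUrl, refUrl, notebookPath, notebookModel, now = new Date()) {
  return {
    key: storageKey(repoUrl, notebookPath),
    value: JSON.stringify({
      repoUrl,
      refUrl,
      path: notebookPath,
      savedAt: now.toISOString(),
      notebook: notebookModel,
    }),
  };
}

// In the browser:
// const record = buildRecord(repoUrl, refUrl, path, model);
// localStorage.setItem(record.key, record.value);
```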

What do you think of starting with just notebooks in the browser storage? It feels like if we include arbitrary files we need to figure out something to prevent (very) large files filling up the browser storage/need a good UI to let people inspect/manage it.
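If we later want a guard against large items even for notebooks, a minimal version could look like this. The size budget is an arbitrary number picked for the example, not a measured browser limit, and `storage` is any object with a `setItem` method such as `localStorage`.

```javascript
const MAX_CHARS = 1000000; // arbitrary illustrative budget, not a real quota

// Store the item only if it is within budget; report whether it was stored.
function storeIfSmall(storage, key, serialized, maxChars = MAX_CHARS) {
  if (serialized.length > maxChars) {
    return false;
  }
  try {
    storage.setItem(key, serialized);
    return true;
  } catch (err) {
    // setItem throws when the browser's own quota is exceeded.
    return false;
  }
}
```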

The domain thing is a shame but as long as we start with the "store in your browser" feature being an optional extra I feel like we can get started instead of having to find a solution to this from day one.

@betatim
Member Author

betatim commented Nov 21, 2019

To show the time left until the timeout we can call curl -H "Authorization: bearer $JPY_API_TOKEN" http://hub:8081/hub/api/users/$JUPYTERHUB_USER from inside the container/notebook. The response tells us the last activity timestamp, from which we can start a countdown of ~8min or some such. It is a lower bound because each kernel first has to time out etc. So this is only a "simple fix".
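The countdown arithmetic could be sketched like this. It assumes the timestamp arrives as an ISO 8601 string (for instance in a `last_activity` field of the response, which is an assumption here) and that the timeout is passed in by whoever configures the extension; the ~8 minutes above would be 480 seconds.

```javascript
// Lower bound on the seconds left before the session may be culled.
// Returns null if the timestamp cannot be parsed.
function secondsUntilTimeout(lastActivityIso, timeoutSeconds, nowMs = Date.now()) {
  const lastMs = Date.parse(lastActivityIso);
  if (Number.isNaN(lastMs)) {
    return null;
  }
  const elapsedSeconds = (nowMs - lastMs) / 1000;
  return Math.max(0, Math.round(timeoutSeconds - elapsedSeconds));
}

// Format a number of seconds as m:ss for display in the UI.
function formatCountdown(seconds) {
  const minutes = Math.floor(seconds / 60);
  const rest = seconds % 60;
  return `${minutes}:${String(rest).padStart(2, "0")}`;
}
```

Polling the endpoint now and then and recomputing would also make resets visible: whenever the timestamp moves forward, the countdown jumps back up.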

@ivan-gomes
Contributor

Is there any way we could do this extensibly - Contents API comes to mind - such that one could implement persistence with a blob store like S3 in addition to browser to stretch beyond a single browser and computer?

@betatim
Member Author

betatim commented Nov 21, 2019

Sounds interesting. Could you re-post/link your comment in #1007 @ivan-gomes ? Using remote storage like S3 sounds interesting but like something I'd postpone until we have a first version that works with users. Trying to figure out auth and such could be tricky :-/

I don't see this being implemented as a contents API storage (I think there are already options there to use S3 in https://github.com/nteract/bookstore?). We'd use the Contents API to get the contents of the notebook (maybe). Anyway, something for the other issue :)

@consideRatio
Member

Questions

Is it possible, within for example JupyterLab, to download an open notebook from the UI even though we have been disconnected from the server? Is there such a JupyterLab extension already? It sounds like something useful to create totally separately from creating it specifically for mybinder.org or similar.

@manics
Member

manics commented Dec 3, 2019

https://github.com/manics/jupyter-offlinenotebook lets you download an open notebook on any system running jupyter-notebook. Only the local-storage and binder links are restricted to BinderHub, though the local storage restriction could be relaxed to work on any system.

PRs adding JupyterLab support would be very welcome 😀

@manics
Member

manics commented Dec 9, 2019

@almereyda

After jupyterlab/jupyterlab#5382 (comment) had been merged, there is a SharedNotebook object now that one could use to persist a notebook, too.

The server doesn't yet use the implemented SharedNotebook because it is written in JavaScript. However, we are making good progress on the Yjs-Python port, which would allow the kernel to write messages directly to the shared notebook.
