Store Y updates #12600

davidbrochart · 2022-05-19T22:20:30Z

References

Code changes

The first time a document is accessed from the front-end, it is read from the source file and a Y document is created in the back-end (thus no history of changes).
The next time a document is accessed from a front-end, it is synced with the Y document, either from memory if the room was not deleted, or from file otherwise. When the room was deleted, the Y updates were saved to a file. Thus, the history of changes is always available, at least if using a YStore that persists between JupyterLab sessions. By default, an SQLiteYStore is used and does persist.
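
The access flow described above can be sketched as follows. This is only an illustration built on Python's standard sqlite3 module: ToyUpdateStore, its table layout, and open_document are hypothetical names, not the ypy-websocket SQLiteYStore API, and the "updates" are opaque byte strings rather than real Y updates.

```python
import sqlite3
from contextlib import closing
from typing import Dict, List, Tuple


class ToyUpdateStore:
    """Hypothetical append-only update store for one document (not the real YStore API)."""

    def __init__(self, db_path: str, doc_path: str) -> None:
        self.db_path = db_path    # database shared by all documents
        self.doc_path = doc_path  # identifies this document inside the database
        with closing(sqlite3.connect(self.db_path)) as db:
            db.execute("CREATE TABLE IF NOT EXISTS updates (path TEXT, data BLOB)")
            db.commit()

    def write(self, update: bytes) -> None:
        with closing(sqlite3.connect(self.db_path)) as db:
            db.execute("INSERT INTO updates VALUES (?, ?)", (self.doc_path, update))
            db.commit()

    def read(self) -> List[bytes]:
        with closing(sqlite3.connect(self.db_path)) as db:
            rows = db.execute(
                "SELECT data FROM updates WHERE path = ? ORDER BY rowid",
                (self.doc_path,),
            ).fetchall()
        return [bytes(row[0]) for row in rows]


def open_document(
    store: ToyUpdateStore, rooms: Dict[str, List[bytes]], source_text: str
) -> Tuple[str, List[bytes]]:
    """Return where the content came from ("memory", "store" or "file") and its updates."""
    if store.doc_path in rooms:
        # The room still exists: sync from the in-memory document.
        return "memory", rooms[store.doc_path]
    updates = store.read()
    if not updates:
        # First access: no history yet, so seed the document from the source file.
        # Recording the initial content as the first update is an assumption of this sketch.
        updates = [source_text.encode()]
        store.write(updates[0])
        origin = "file"
    else:
        # The room was deleted: rebuild the document from the persisted updates.
        origin = "store"
    rooms[store.doc_path] = updates
    return origin, updates
```

In this sketch, clearing the rooms dictionary and reopening the document returns the persisted updates, which mirrors the case where the room was deleted but the store survived.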

User-facing changes

None at the moment, but later we can implement a timeline of the document changes in the front-end, and potentially restore the document at a point in time.

Backwards-incompatible changes

None.

jupyterlab-probot · 2022-05-19T22:20:31Z

Thanks for making a pull request to jupyterlab!
To try out this branch on binder, follow this link:

davidbrochart · 2022-05-24T07:48:42Z

This is ready for review.

jtpio · 2022-05-30T16:12:35Z

Thanks @davidbrochart, this looks good and would indeed be useful for debugging.

jtpio · 2022-05-30T16:12:58Z

Looks like this could be merged in its current state so it could be tested, without blocking on #12614?

davidbrochart · 2022-05-30T16:14:51Z

Yes I also think that #12614 is quite independent. I'm in favor of merging it.

jtpio · 2022-05-31T12:13:05Z

least during the life cycle of the Jupyter server

Curious of where this logic lives. Is it in SQLiteYStore? Looking at the diff the database does not seem to be modified or deleted on server shutdown?

davidbrochart · 2022-05-31T12:30:43Z

That was when we used a TempFileYStore, but now we have a default SQLiteYStore which persists between JupyterLab sessions.

jtpio · 2022-06-03T07:22:48Z

OK I see you updated the description in the top-level comment, thanks 👍

jtpio

Thanks this looks good.

Since this will automatically create a new .jupyter_ystore.db file and users might wonder what this is about, maybe we could mention that in the RTC documentation?

https://jupyterlab.readthedocs.io/en/latest/user/rtc.html

davidbrochart · 2022-06-03T12:08:23Z

Thanks @jtpio, I added a note about this file and also updated the documentation with auto-save.

ellisonbg · 2022-06-07T01:20:57Z

@davidbrochart what would it look like if we merge this PR now and then later get it to work with the document ID service described in #12614? I would love to see this move forward in the meantime, but want to understand the migration path.

ellisonbg · 2022-06-07T01:22:09Z

jupyterlab/handlers/ydoc_handler.py

+
+
+class JupyterSQLiteYStore(SQLiteYStore):
+    db_path = ".jupyter_ystore.db"


Where is this file located? Is there one per directory? One per server? What are the tradeoffs of the two options?

This file is located in the directory where JupyterLab (i.e. the server) is run. There is one file for the whole duration of the server, and if the server is run again in the same directory, the file is reused, so there is persistence between "JupyterLab sessions".
Having one file per directory could be interesting because we wouldn't identify a document with its path, but only its name. That means if JupyterLab is launched from another directory, we are still able to easily find the Y store for any opened document. With the current approach, the (relative) path to a document is encoded in the database, so changing the JupyterLab directory would create a new Y store and updates from the previous session would be lost.
Thanks for raising that, @ellisonbg. I think we should have one SQLiteYStore per directory. What do you think?
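
To make the "server relocation" problem concrete, here is a small sketch of the two keying schemes discussed above. The function names and example paths are hypothetical, and the real store may derive its key differently; the point is only that a key relative to the server directory changes when the server is started elsewhere, while a per-directory key does not.

```python
from pathlib import PurePosixPath
from typing import Tuple


def relative_key(server_dir: str, document: str) -> str:
    """Current approach: key a document by its path relative to the server directory."""
    return PurePosixPath(document).relative_to(PurePosixPath(server_dir)).as_posix()


def per_directory_key(document: str) -> Tuple[str, str]:
    """Alternative: one store per directory, keyed by the document's file name only."""
    path = PurePosixPath(document)
    return path.parent.as_posix(), path.name


doc = "/home/user/project/notebooks/a.ipynb"  # hypothetical document

# The same document gets a different key depending on where the server was started,
# so updates recorded in a previous session are no longer found.
key_from_project = relative_key("/home/user/project", doc)  # "notebooks/a.ipynb"
key_from_home = relative_key("/home/user", doc)             # "project/notebooks/a.ipynb"

# With one store per directory, the store location and key depend only on the document.
store_dir, name = per_directory_key(doc)
```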

davidbrochart · 2022-06-07T07:07:07Z

@davidbrochart what would it look like if we merge this PR now and then later get it to work with the document ID service described in #12614? I would love to see this move forward in the meantime, but want to understand the migration path.

Maybe we could have only one database for storing the Y updates and the document IDs? Actually, the database for storing the document IDs would suffer the same issues of "server relocation", so having one database per directory as discussed would be relevant for the document ID database too. So why not one database per directory for storing every document state, be it Y updates or document ID or comments?
This would make getting the document for a given ID more complicated though, because we would need to walk through all the sub-directories to find the ID.
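
The lookup cost mentioned above can be illustrated with a short sketch. It assumes, hypothetically, that every directory holds its own .jupyter_ystore.db file containing an ids table with doc_id and name columns; neither that placement nor that schema exists in this PR.

```python
import os
import sqlite3
from contextlib import closing
from typing import Optional

DB_NAME = ".jupyter_ystore.db"  # file name from this PR; per-directory placement is hypothetical


def find_document(root_dir: str, doc_id: str) -> Optional[str]:
    """Walk every sub-directory and return the path of the document with this ID, if any."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if DB_NAME not in filenames:
            continue  # this directory has no store
        with closing(sqlite3.connect(os.path.join(dirpath, DB_NAME))) as db:
            # Hypothetical schema: ids(doc_id TEXT, name TEXT)
            row = db.execute(
                "SELECT name FROM ids WHERE doc_id = ?", (doc_id,)
            ).fetchone()
        if row is not None:
            return os.path.join(dirpath, row[0])
    return None
```

Every lookup may open one database per directory, so the cost grows with the size of the tree, whereas a single database under the root directory answers the same question with one query.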

echarles · 2022-06-07T07:13:04Z

Whatever implementation is defined by the platform administrator (File, SQLite...), I am thinking that the storage would cover the complete directory tree under the defined root_dir. So if the server is started with root_dir == /foo/bar with SQLite, there should be a single database serving all the updates of all the documents under /foo/bar.

Having that same persistence system serve the ID mapping could be useful. What if the platform administrator wants to store Y.js updates in files and IDs in SQLite?

davidbrochart · 2022-06-07T07:20:57Z

Having that same persistence system serve the ID mapping could be useful. What if the platform administrator wants to store Y.js updates in files and IDs in SQLite?

We currently have the flexibility to store Y updates in files. That flexibility could be kept in ypy-websocket (which is independent of Jupyter), while Jupyter would require updates to be stored in its own storage type.

fcollonval · 2022-06-07T07:54:21Z

jupyterlab/handlers/ydoc_handler.py

 RENAME_SESSION = 127


+class JupyterTempFileYStore(TempFileYStore):
+    prefix_dir = "jupyter_ystore_"


Why do you use class variables? You do not control their life cycle. It will be better to pass a dictionary of configurable kwargs to the constructor.

The problem is that a new Y store is instantiated for each document, identifying the document with its path. But you want to store the document updates to some "common root": the database for an SQLiteYStore, or a directory prefix for a TempFileYStore. So you cannot pass this common root in the constructor, hence the class variable.
There is no issue with the class variable life cycle if you create a new class for your store, which is what the docstrings suggest here and there.
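
The pattern described in this reply can be sketched as follows. ToyStore and JupyterToyStore are hypothetical stand-ins, not the ypy-websocket classes; the sketch only shows why a class attribute works when the code that instantiates the store knows the document path but not the common root.

```python
from typing import Type


class ToyStore:
    """Hypothetical base store: one instance per document, identified by its path."""

    db_path = "toy_store.db"  # common root shared by every instance of this class

    def __init__(self, path: str) -> None:
        self.path = path  # the only per-instance information: the document path

    def location(self) -> str:
        return f"{self.db_path}::{self.path}"


class JupyterToyStore(ToyStore):
    # Overriding the class attribute in a subclass configures the common root
    # without touching the base class or the constructor signature.
    db_path = ".jupyter_ystore.db"


def make_store(store_class: Type[ToyStore], doc_path: str) -> ToyStore:
    # The caller creates one store per document and passes only the document path.
    return store_class(doc_path)


first = make_store(JupyterToyStore, "a.ipynb")
second = make_store(JupyterToyStore, "sub/b.ipynb")
```

Both instances write to the same database, and the base class keeps its own default, which is the life-cycle guarantee the subclass approach relies on.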

fcollonval · 2022-06-07T08:02:00Z

Whatever implementation is defined by the platform administrator (File, SQLite...), I am thinking that the storage would cover the complete directory tree under the defined root_dir. So if the server is started with root_dir == /foo/bar with SQLite, there should be a single database serving all the updates of all the documents under /foo/bar.

It will also reduce the number of files created that will crowd the user filesystem.

Or, if we store it for each directory, it could be stored under the checkpoint directory (as this is kind of an enhanced version of checkpoints). That way users already know that it is related to Jupyter, it is probably already ignored by their version control system, and it can be removed if they need to clean things up.

davidbrochart · 2022-06-13T21:29:19Z

I feel that the question of where to store the updates could be handled in a separate issue/PR, and that we should merge this PR.
Let's not forget that it solves a bug where content is duplicated when a user disconnects for more than one minute (by default).

davidbrochart · 2022-06-14T07:25:28Z

Also, we have been using YStores to debug RTC issues, by "replaying" changes to a document, and it appears to be a very valuable tool.
A JupyterLab extension to navigate through a document history using its YStore could be nice to have.

davidbrochart · 2022-07-12T06:36:33Z

This PR was moved to jupyterlab/jupyter-collaboration#2.

github-actions bot assigned davidbrochart May 19, 2022

davidbrochart mentioned this pull request May 19, 2022

Save document changes to disk in RTC #12596

Closed

davidbrochart force-pushed the save_updates branch from ddc4d97 to 1b48c64 Compare May 20, 2022 08:31

davidbrochart marked this pull request as draft May 20, 2022 10:05

davidbrochart force-pushed the save_updates branch from 1b48c64 to f5aa74a Compare May 23, 2022 16:38

davidbrochart marked this pull request as ready for review May 23, 2022 16:58

davidbrochart force-pushed the save_updates branch from f5aa74a to 8ce826b Compare May 24, 2022 13:15

davidbrochart changed the title ~~Save Y updates to sidecar file~~ Store Y updates May 27, 2022

jtpio added the enhancement label May 30, 2022

jtpio added this to the 4.0 milestone May 30, 2022

jtpio reviewed Jun 3, 2022

github-actions bot added the documentation label Jun 3, 2022

ellisonbg reviewed Jun 7, 2022

fcollonval reviewed Jun 7, 2022

davidbrochart mentioned this pull request Jul 6, 2022

Store Y updates jupyter-server/jupyverse#190

Merged

davidbrochart force-pushed the save_updates branch from b8a67ce to f0024ad Compare July 7, 2022 07:03

davidbrochart added 9 commits July 7, 2022 10:38

Save Y updates to sidecar file

a102dec

Use temporary directory for saving Y updates

2146231

Use ypy-websocket >=0.1.9

c153d0f

Use ypy-websocket >=0.1.10

ee4b84c

Handle out-of-sync YStore/source file

3bf919d

Make SQLiteYStore the default

9941f01

Use ypy-websocket >=0.1.12

c8262a0

Update RTC documentation

625c882

Bump ypy-websocket to v0.1.13

e699418

davidbrochart force-pushed the save_updates branch from f0024ad to e699418 Compare July 7, 2022 08:42

echarles mentioned this pull request Jul 10, 2022

Move JupyterLab's YDocWebSocketHandler here jupyter-server/jupyter_server#901

Closed

davidbrochart mentioned this pull request Jul 10, 2022

Save Y updates using YStore jupyterlab/jupyter-collaboration#2

Merged

davidbrochart closed this Jul 12, 2022

davidbrochart mentioned this pull request Jul 25, 2022

Store Y updates #12852

Merged

github-actions bot locked as resolved and limited conversation to collaborators Jul 13, 2023
