New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store Y updates #12600
Store Y updates #12600
Conversation
Thanks for making a pull request to jupyterlab! |
ddc4d97
to
1b48c64
Compare
1b48c64
to
f5aa74a
Compare
This is ready for review. |
f5aa74a
to
8ce826b
Compare
Thanks @davidbrochart this looks good, and would be indeed be useful for debugging. |
Looks like this could be merged in its current state so it could be tested, without blocking on #12614? |
Yes I also think that #12614 is quite independent. I'm in favor of merging it. |
Curious of where this logic lives. Is it in |
That was when we used a TempFileYStore, but now we have a default SQLiteYStore which persists between JupyterLab sessions. |
OK I see you updated the description in the top-level comment, thanks 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this looks good.
Since this will automatically create a new .jupyter_ystore.db
file and users might wonder what this is about, maybe we could mention that in the RTC documentation?
Thanks @jtpio, I added a note about this file and also updated the documentation with auto-save. |
@davidbrochart what would it look like if we merge this PR now and then later get it to work with the document ID service described in #12614. I would love to see this move forward in the meantime, but want to understand the migration path. |
|
||
|
||
class JupyterSQLiteYStore(SQLiteYStore): | ||
db_path = ".jupyter_ystore.db" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this file located? Is there one per directory? One per server? What are the tradeoffs of the two options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is located in the directory where JupyterLab (i.e. the server) is run. There is one file for the whole duration of the server, and if the server is run again in the same directory, the file is reused, so there is persistence between "JupyterLab sessions".
Having one file per directory could be interesting because we wouldn't identify a document with its path, but only its name. That means if JupyterLab is launched from another directory, we are still able to easily find the Y store for any opened document. With the current approach, the (relative) path to a document is encoded in the database, so changing the JupyterLab directory would create a new Y store and updates from the previous session would be lost.
Thanks for raising that up @ellisonbg, I think we should have one SQLiteYStore
per directory, what do you think?
Maybe we could have only one database for storing the Y updates and the document IDs? Actually, the database for storing the document IDs would suffer the same issues of "server relocation", so having one database per directory as discussed would be relevant for the document ID database too. So why not one database per directory for storing every document state, be it Y updates or document ID or comments? |
Whatever implementation is defined by the platform administrator (File, SQLite...), I am thinking that the storage would cover the complete arborescence under the defined Having that same persistence system to serve the ID mapping could be useful. What if the platform administrator is willing to configure with file for Y.js updates and SQLite for IDs? |
We currently have the flexibility to store Y updates into files, but it could be kept in ypy-websocket (which is independent of Jupyter), while Jupyter would require them to be stored in its own storage type. |
RENAME_SESSION = 127 | ||
|
||
|
||
class JupyterTempFileYStore(TempFileYStore): | ||
prefix_dir = "jupyter_ystore_" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use class variables? You do not control their life cycle. It will be better to pass a dictionary of configurable kwargs to the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that a new Y store is instantiated for each document, identifying the document with its path. But you want to store the document updates to some "common root": the database for an SQLiteYStore, or a directory prefix for a TempFileYStore. So you cannot pass this common root in the constructor, hence the class variable.
There is not issue of class variable life cycle if you create a new class for your store, which is what the docstrings suggest here and there.
It will also reduce the number of files created that will crowd the user filesystem. Or if we store it for each directory, it could be stored it under the checkpoint directory (as this is kind of an enhanced version of checkpoints). So that users already know that this is related to Jupyter, it is probably already ignored by their Version Control System and it may be removed if they need to clean thing up. |
I feel that the question of where to store the updates could be handled in a separate issue/PR, and that we should merge this PR. |
Also, we have been using YStores to debug RTC issues, by "replaying" changes to a document, and it appears to be a very valuable tool. |
b8a67ce
to
f0024ad
Compare
f0024ad
to
e699418
Compare
This PR was moved to jupyterlab/jupyter-collaboration#2. |
References
Fixes #12596.
Code changes
The first time a document is accessed from the front-end, it is read from the source file and a Y document is created in the back-end (thus no history of changes).
The next time a document is accessed from a front-end, it is synced with the Y document, either from memory if the room was not deleted, or from file otherwise. When the room was deleted, the Y updates were saved to a file. Thus, the history of changes is always available, at least if using a YStore that persists between JupyterLab sessions. By default, an
SQLiteYStore
is used and does persist.User-facing changes
None at the moment, but later we can implement a timeline of the document changes in the front-end, and potentially restore the document at a point in time.
Backwards-incompatible changes
None.