Add a public API for getting a read-only view of the shared model #275

krassowski · 2024-04-01T13:32:25Z

Closes #270

Adds a get_document(path, content_type, file_format) method to the YDocExtension.

The returned object is a fork of the original document, so any modifications made to it are not propagated back to the original.
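
To illustrate what "fork" means here, the sketch below uses a toy document class rather than the real shared-document library. `ToyDoc` and `fork` are hypothetical names invented for this example; only the copy-through-an-update pattern mirrors the description above.

```python
class ToyDoc:
    """Hypothetical stand-in for a shared document: an ordered log of edits."""

    def __init__(self):
        self._ops = []

    def insert(self, text):
        # Record one local edit.
        self._ops.append(text)

    def get_update(self):
        # Serialise the whole state as an immutable update.
        return tuple(self._ops)

    def apply_update(self, update):
        # Replay the operations carried by the update.
        self._ops.extend(update)

    def content(self):
        return "".join(self._ops)


def fork(doc):
    """Copy the current state of ``doc`` into a fresh, independent document."""
    copy = ToyDoc()
    copy.apply_update(doc.get_update())
    return copy


root = ToyDoc()
root.insert("print('hello')")

snapshot = fork(root)
snapshot.insert("  # edited in the fork")  # stays local to the fork
root.insert("\nprint('world')")            # made after the fork was taken

print(root.content())
print(snapshot.content())
```

In this toy model the fork never sees edits made to the original after it was created, and the original never sees edits made to the fork.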

github-actions · 2024-04-01T13:32:42Z

👈 Launch a Binder on branch krassowski/jupyter_collaboration/public-api-for-getting-document

davidbrochart · 2024-04-01T17:31:39Z

jupyter_collaboration/app.py

+            fork_ydoc = Doc()
+            fork_ydoc.apply_update(update)
+
+            return YDOCS.get(content_type, YDOCS["file"])(fork_ydoc)


So this will be a snapshot of the document at the time this method is called, but don't you want a live document, with all future updates applied?

krassowski · Apr 1, 2024

Good point. For my use case a snapshot is enough, but maybe a proper view that receives future updates would make more sense for others?

davidbrochart · Apr 1, 2024

This view is what we call a fork in the suggestion system (see here). The fork is a copy of the root, just like you did, but it is continuously rebased on the root, so that it gets all the live updates from the root. On the other hand, the fork's updates are not applied to the root right away; only when the suggestion is accepted do we merge it back into the root.
I don't know if you would be fine with having a live fork, or whether you strictly want a snapshot. In the latter case, I'm wondering why you still need a shared document, rather than just converting it to a plain Python object?
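
As a rough illustration of the live-fork idea described above, here is a toy model built on plain Python lists. `Root`, `LiveFork`, and `accept` are hypothetical names, not the suggestion system's API, and "rebasing" is simplified to layering pending edits on top of the root's current state.

```python
class Root:
    """Hypothetical root document that notifies observers of every edit."""

    def __init__(self):
        self.cells = []
        self._observers = []

    def observe(self, callback):
        self._observers.append(callback)

    def append(self, cell):
        self.cells.append(cell)
        for callback in self._observers:
            callback(cell)


class LiveFork:
    """Fork that follows the root but keeps its own edits as pending suggestions."""

    def __init__(self, root):
        self._root = root
        self._base = list(root.cells)    # copy of the root at fork time
        self._pending = []               # local edits, not yet in the root
        root.observe(self._base.append)  # keep receiving the root's updates

    def append(self, cell):
        self._pending.append(cell)

    @property
    def cells(self):
        # Root state first, local suggestions layered on top.
        return self._base + self._pending

    def accept(self):
        # Merge the suggestions back into the root, then forget them locally.
        pending, self._pending = self._pending, []
        for cell in pending:
            self._root.append(cell)


root = Root()
root.append("a = 1")

fork = LiveFork(root)
fork.append("b = 2")  # a suggestion, visible only in the fork
root.append("c = 3")  # a live update, visible in both

before_accept = (list(root.cells), fork.cells)
fork.accept()         # now the suggestion reaches the root
```

Before `accept()` the root holds only its own two cells while the fork shows all three; afterwards both hold the same three cells.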

krassowski · Apr 2, 2024

> I don't know if you would be fine with having a live fork.

Maybe.

I do not strictly want a snapshot. I am indifferent to that, as long as it does not force me to wait for the lock/update/etc.

> In the latter case, I'm wondering why you still need a shared document, rather than just converting it to a plain Python object?

Converting a notebook to a plain Python object takes 4 times longer (#270 (comment)). 30 ms on a notebook with 100 empty cells makes me worried that this would be unsuitable for the completions use case, especially when notebooks have large visualisations in their outputs. Often we will only want the last/next ~10 cells, so we would never need to convert the full notebook to Python.

davidbrochart · Apr 2, 2024

You could also directly connect to the shared document (not a copy), and convert a subset of it in the frontend. For instance, get the content of the last 10 cells.
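
A minimal sketch of the "convert only a subset" idea, using a toy cell class instead of the real shared model. `LazyCell`, `to_py`, and `last_cells` are hypothetical names; the counter only exists to show how many cells were actually converted.

```python
class LazyCell:
    """Hypothetical shared cell whose conversion to a plain dict is the costly step."""

    conversions = 0  # counts how many cells were actually converted

    def __init__(self, source):
        self._source = source

    def to_py(self):
        LazyCell.conversions += 1
        return {"cell_type": "code", "source": self._source}


def last_cells(cells, n=10):
    """Convert only the final ``n`` cells to plain Python objects."""
    if n <= 0:
        return []
    return [cell.to_py() for cell in cells[-n:]]


notebook = [LazyCell(f"x = {i}") for i in range(100)]
context = last_cells(notebook, n=10)

print(len(context), LazyCell.conversions)
```

The resulting dictionaries are plain Python objects, so changing them does not affect the cells they were converted from.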

krassowski · Apr 11, 2024

This is exactly what I am doing in jupyterlab/jupyter-ai#708 and what I plan to use this PR for. The question is whether we want to return:

- a frozen read-only model copy
- a live read-only model
- a live read-write model

Again, for my use case this is really indifferent. From the point of view of API design, I thought that starting with the smallest possible API surface (a frozen model copy) and then expanding it in a backward-compatible way (adding live updates, adding write access) seemed like an easy decision (well, maybe that is not fully backward compatible, but it would not be breaking for most use cases).

Currently this PR returns a frozen (but not read-only) copy of the model. I believe I can make it read-only by monkey-patching some methods on the Model instance after creating the copy, so that they raise an exception when a write is attempted.
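
The monkey-patching approach mentioned above could look roughly like the following. This is a sketch on a toy class, not on the real model: `ToyModel`, its method names, and `make_read_only` are assumptions made for illustration.

```python
class ToyModel:
    """Hypothetical model with one read method and two write methods."""

    def __init__(self, source=""):
        self._source = source

    def get(self):
        return self._source

    def set(self, value):
        self._source = value

    def append(self, value):
        self._source += value


def make_read_only(model, write_methods=("set", "append")):
    """Shadow the write methods on this one instance so that they raise."""

    def refuse(*args, **kwargs):
        raise RuntimeError("this model is read-only")

    for name in write_methods:
        # An instance attribute is found before the method defined on the class.
        setattr(model, name, refuse)
    return model


frozen = make_read_only(ToyModel("a = 1"))
try:
    frozen.set("a = 2")
except RuntimeError as error:
    print(error)
```

Only the patched instance is affected; other instances of the class stay writable. Patching like this guards the listed methods only, so any write path that is not listed, or that reaches the underlying state directly, would still succeed.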

What is the best next step/solution in your opinion?

davidbrochart · Apr 11, 2024

Maybe I'm missing something, but I don't understand the benefit of having a frozen copy of the model. Memory-wise, it will be bigger than converting the live model to a Python object.
I think get_document() should just return the shared model, not a copy of it. Then a subset of it can be converted to a Python object, which can be considered read-only as it won't affect the original shared model.

krassowski · Apr 11, 2024

> but I don't understand the benefit of having a frozen copy of the model

There is none, other than that it is a simple way of ensuring that other extensions do not start writing to the shared model via this new public API.

> I think get_document() should just return the shared model, not a copy of it.

I am happy with that.

davidbrochart · Apr 11, 2024

What do you think about adding a parameter get_document(copy=True)? In the future we could also add live=True, which would correspond to the use-case we have in the suggestion system.

krassowski · Apr 11, 2024

Sounds fine to me, done in: 3a8ba7d

krassowski · 2024-04-12T08:44:32Z

@davidbrochart should we merge and iterate or do you have any further suggestions to address here first?

davidbrochart

Let's merge, thanks Mike!

krassowski added the enhancement (New feature or request) label Apr 1, 2024

davidbrochart reviewed Apr 1, 2024

davidbrochart requested changes Apr 11, 2024

krassowski and others added 4 commits April 11, 2024 17:32

Implement get_document() API

fc85cdb

Return a live copy

651084b

Add copy argument

6cd09ef

Update the docstring to reflect the copy arg

a87e20e

Co-authored-by: David Brochart <david.brochart@gmail.com>

krassowski force-pushed the public-api-for-getting-document branch from 24d90cc to a87e20e Compare April 11, 2024 16:32

krassowski requested a review from davidbrochart April 11, 2024 17:40

krassowski mentioned this pull request Apr 12, 2024

Support server-side execution #279

Merged

davidbrochart approved these changes Apr 12, 2024

davidbrochart merged commit cf774a4 into jupyterlab:main Apr 12, 2024
20 checks passed
