Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creates a new FileLoader class to separate the logic of watching files #121

Merged
merged 25 commits into from
Apr 11, 2023

Conversation

hbcarlos
Copy link
Member

@hbcarlos hbcarlos commented Mar 14, 2023

Creates a new FileLoader class to separate the logic of watching files from the WebSocketHandler.

Code changes:

  • Split classes in different files
  • Creates a new FileLoader class to centralize loading, saving, and watching files.
    Multiple rooms can get an instance of this class and use it to access the content safely since we can create a single lock by file to ensure two rooms are not accessing the same file simultaneously.
  • Updates the DocumentRoom to use the FileLoader.

TODO:

  • Create an abstract class for FileLoader to extend from.
    • Some extensions (like jupytercad) may want their own loader.
    • Allow registering new file loaders.
  • Add documentation

@hbcarlos hbcarlos added the enhancement New feature or request label Mar 14, 2023
@hbcarlos hbcarlos self-assigned this Mar 14, 2023
@github-actions
Copy link
Contributor

Binder 👈 Launch a Binder on branch hbcarlos/jupyter_collaboration/refactor

@codecov-commenter
Copy link

codecov-commenter commented Mar 14, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (eb8456d) 0.00% compared to head (ac92018) 0.00%.

Additional details and impacted files
@@          Coverage Diff           @@
##            main    #121    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files          3       7     +4     
  Lines        297     418   +121     
======================================
- Misses       297     418   +121     
Impacted Files Coverage Δ
jupyter_collaboration/app.py 0.00% <0.00%> (ø)
jupyter_collaboration/handlers.py 0.00% <0.00%> (ø)
jupyter_collaboration/loaders.py 0.00% <0.00%> (ø)
jupyter_collaboration/rooms.py 0.00% <0.00%> (ø)
jupyter_collaboration/stores.py 0.00% <0.00%> (ø)
jupyter_collaboration/utils.py 0.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@hbcarlos
Copy link
Member Author

It would be great to have a way to swap loaders so extensions can register new document types and the logic to load those documents. For example, JupyterCAD registers a new document type that uses the FreeCAD format. This format is a compressed folder with multiple XML files. Jupyter Server needs to learn how to open those documents, and the contents service needs a way to plug new loaders.

I have looked into the contents service of Jupyter Server, trying to find the best solution for this problem. Currently, the ContentsManager only allows registering pre/post save hooks. We could use these hooks to load the content of a specific file type, but we can not ensure that the hook we register will be executed the last one. In addition, there are only hooks for saving the file. There are no hooks for loading the file.

I'm not trying to solve the problem in this PR, but I want to redesign the architecture of jupyter_collaboration mindful of this problem to solve it in a follow-up PR.

At the moment, I see two options:
1 - Create a new file loader manager that allows registering loaders depending on the file types, and create the default FileLoader using Jupyter Server's content manager as we currently do.
2 - Create a new content manager based on the default content manager that allows registering loaders for each file type.

There is a caveat with the second option. If jupyter_collaboration is installed with another extension that also swaps the default content manager, it may not work.

@hbcarlos
Copy link
Member Author

hbcarlos commented Mar 14, 2023

Hi @Zsailer. I'm pinging you since your thoughts here would be appreciated, and you might be interested in the comment I posted above.

I will try to join the jupyter server team meeting this week to discuss this topic. Jupyter Server's collaborators should be aware of and participate in the decisions taken here in the jupyter_collaboration extension since this involves some parts of the jupyter server.

Thanks in advance for your comments and your time!

@Zsailer
Copy link
Member

Zsailer commented Mar 16, 2023

Thanks for pinging me here, @hbcarlos. I'll give this a more thorough review over the next couple of days.

I don't know if I'll make it to tomorrow's Jupyter Server meeting—I have a family commitment that conflicts with part of the meeting. If I don't see you then, let's connect over gitter and maybe set up a time to meet separately?

@hbcarlos
Copy link
Member Author

Thanks, @Zsailer!

If I don't see you then, let's connect over gitter and maybe set up a time to meet separately?

It's fine, we can continue the discussion here and discuss also it at next week's meeting.

@hbcarlos
Copy link
Member Author

With the current implementation, the content is not synced between different rooms. It doesn't sync because there is only one property, _last_modified, for each file. If room A saves to disk, we update the property _last_modified. Next time we load the content, we will not notify every other room because, for the file, the content is up to date. See:
https://github.com/jupyterlab/jupyter_collaboration/blob/b1aedb92da354cf6c9df8979755145e8a3e952d7/jupyter_collaboration/loaders.py#L89

We could change this logic and keep a timestamp by room, allowing to sync between rooms. However, I prefer not to sync between rooms for now. If we sync the content between rooms, each time the content changes in one room, on every other room, we will create a Y update replacing the entire content. This could increase memory usage, so I would like to experiment before allowing rooms to sync.

@davidbrochart
Copy link
Collaborator

With the current implementation, the content is not synced between different rooms.

You mean with this PR? Before this PR, the content is synced between rooms.

@davidbrochart
Copy link
Collaborator

If we sync the content between rooms, each time the content changes in one room, on every other room, we will create a Y update replacing the entire content.

See jupyter-server/jupyter_ydoc#15.

@hbcarlos
Copy link
Member Author

You mean with this PR?

Yes, I meant in this PR.

@davidbrochart
Copy link
Collaborator

So you mean that this PR breaks document synchronization between rooms?

Copy link
Member

@fcollonval fcollonval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hbcarlos

Here are some suggestions. It will be good to add doc string to document the code and unit tests to at least test for the correctness of the instantiation and clean actions.

jupyter_collaboration/handlers.py Outdated Show resolved Hide resolved
jupyter_collaboration/handlers.py Show resolved Hide resolved
jupyter_collaboration/handlers.py Outdated Show resolved Hide resolved

def check_origin(self, origin):
return True

@classmethod
def clean_up(cls):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another proof that the architecture is not yet great - but let put it aside for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm doing my best, but there are already a lot of changes in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to open an issue, and I'll work on it in another PR

jupyter_collaboration/loaders.py Show resolved Hide resolved
jupyter_collaboration/loaders.py Outdated Show resolved Hide resolved
jupyter_collaboration/loaders.py Outdated Show resolved Hide resolved
jupyter_collaboration/rooms.py Show resolved Hide resolved
jupyter_collaboration/rooms.py Show resolved Hide resolved
prefix_dir = "jupyter_ystore_"


class SQLiteYStoreMetaclass(type(LoggingConfigurable), type(_SQLiteYStore)): # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment why we need this complex code based on metaclass rather than the classical direct inheritance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved it to a new file. I'm not sure why we need it. I have not looked into the stores yet.

@davidbrochart Could you open a PR adding documentation?

@davidbrochart
Copy link
Collaborator

So you mean that this PR breaks document synchronization between rooms?

I've not been following all the commits, has this been addressed?

@hbcarlos
Copy link
Member Author

hbcarlos commented Apr 4, 2023

I've not been following all the commits, has this been addressed?

I'm not restoring it. This PR removes the synchronization between rooms in favor of simplicity and stability. I'll add it back once the extension is stable and we have a pleasant RTC experience.

@davidbrochart
Copy link
Collaborator

I think it is a big regression.
So what is the user experience when a notebook is opened as a notebook document and as a JSON document, for instance?
What happens to one document when the user makes a change to the other?

@hbcarlos
Copy link
Member Author

hbcarlos commented Apr 4, 2023

Users are notified that they might lose changes if they open the same document with multiple views.

If you edit in one room, it won't sync with the other one. The last edited room is the one saved to disk.

@davidbrochart
Copy link
Collaborator

I think it needs to be discussed, as it is a significant regression and it could lead to a lot of user frustration.
What about changes that are made on the backend, like checking out a git branch that would change the open file? Is the change reflected on the frontend, or is that also not supported anymore?

@hbcarlos
Copy link
Member Author

hbcarlos commented Apr 4, 2023

What about changes that are made on the backend, like checking out a git branch that would change the open file? Is the change reflected on the frontend, or is that also not supported anymore?

That is supported

@davidbrochart
Copy link
Collaborator

So there is something I don't understand. If the frontend reacts to file changes made in the backend, and you have one user with a notebook opened as a notebook document, and another user with the same notebook opened as a JSON document, what is the behavior?
Users won't be warned because they individually only have one view of the notebook, right?

@davidbrochart
Copy link
Collaborator

I just tried this PR and I'm seeing weird behaviors while synchronizing rooms, like a notebook being reverted to a previous state:

room_sync.mp4

@hbcarlos
Copy link
Member Author

Yes, because I prioritize what is in the disk instead of automatically overwriting it whenever a change occurs.

I'm not in favor of the synchronization between rooms or the auto-save. It is unpredictable, and I'm sure it is related to some of the issues we are having.

@davidbrochart
Copy link
Collaborator

It is unclear to me what you do in this PR regarding room synchronization. Commit 1ee3848 says "Fixes sync between rooms", but it doesn't seem to work. Can you clarify what is fixed?

Copy link
Member

@fcollonval fcollonval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hbcarlos

Let's move forward with this one.

@fcollonval fcollonval merged commit ba6d756 into jupyterlab:main Apr 11, 2023
17 checks passed
@davidbrochart
Copy link
Collaborator

@fcollonval I was not done testing this PR, and there was still pending questions.

@hbcarlos hbcarlos deleted the refactor branch April 11, 2023 14:34
@fcollonval
Copy link
Member

The synchronization has been fixed.

We can address following question in follow-up. But we should move forward to unlock the file loader.

@davidbrochart
Copy link
Collaborator

The synchronization has been fixed.

It's hard to tell, I think we should give more time for reviewers to get their questions answered before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants