Fix race condition issues on slower FS #564
Conversation
Let me know if you generally like this. Not sure why it can't find
Nice work. I like the thinking behind this PR. We will need a better description of the error/issue this is trying to solve (e.g. a specimen of the actual error message; it might be nicest if that was written down in a separate issue). I'd also like to better understand what locking (if any) the git CLI itself does. Ideally, this PR should also include a test to prove that it fixes the problem it's trying to solve. One idea for the test: explicitly lock a resource for ~5 seconds, then immediately start another operation that would create a race condition if locking were not present.
Git locking: https://docs.microsoft.com/en-us/azure/devops/repos/git/git-index-lock?view=azure-devops
Waiting on locks: https://stackoverflow.com/a/36364687

So we could improve this by telling execute which commands care about the lock file (e.g. I don't believe status does), but I think that's an over-optimization at this point, and error-prone.
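The per-command idea could be sketched like this. Everything here is an assumption for illustration: the subcommand list is a guess, the helper name is made up, and whether each git subcommand actually takes `index.lock` would need verifying.

```python
# Illustrative only: git subcommands assumed to take .git/index.lock.
# This set is a guess for the sketch, not a verified list.
LOCKING_SUBCOMMANDS = {"add", "commit", "checkout", "merge", "rebase", "reset", "pull", "stash"}

def needs_index_lock(cmd):
    """Given an argv like ["git", "commit", "-m", "msg"], decide whether
    execution should first wait on .git/index.lock."""
    return len(cmd) > 1 and cmd[0] == "git" and cmd[1] in LOCKING_SUBCOMMANDS
```

As noted above, classifying commands this way is easy to get wrong, which is the argument for just waiting on the lock for every command.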
I think #555 is the same issue. Symptoms are at least
I'll add one
@telamonian do you have an example where execute is tested?
Hmm, not off the top of my head. Frederic's the expert, since he wrote execute and most of the related code (and the related tests). @fcollonval Any ideas?
I thought of a potential concern: should read-only ops (in particular
Per my comment:
Yes,
@telamonian let me know what you'd like me to do here to get this merged so that it's included in 0.10.
Hey, sorry for the delay. But you could test your nice feature by directly calling the
Testing is a bit more complicated than what I said. But that commit shows how to start achieving it: fcollonval@6ede2a8
One additional comment: the argument signature changed:

```diff
- cwd: "Optional[str]" = None,
+ cwd: str,
```
I'm targeting this for the next release (which should be 0.20.0). Once this PR is pulled in, we'll also backport it to (jlab 1.0 compatible) 0.11.0.
This adds locking before executing commands. While this may add locking in some cases that are not a race (i.e. things that do not need to acquire index.lock), this solves issues related to using this with a git directory on NFS.
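A minimal sketch of the approach described above, assuming the lock of interest is `.git/index.lock`; function names, timings, and the timeout behavior are illustrative, not the PR's actual code.

```python
import subprocess
import time
from pathlib import Path

def wait_for_index_lock(repo_path, timeout=5.0, poll_interval=0.1):
    """Block until .git/index.lock disappears; raise if it outlives `timeout`."""
    lock = Path(repo_path) / ".git" / "index.lock"
    deadline = time.monotonic() + timeout
    while lock.exists():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{lock} still present after {timeout}s")
        time.sleep(poll_interval)

def execute(cmd, cwd):
    """Run a git command only after any index lock has cleared (sketch)."""
    wait_for_index_lock(cwd)
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
```

On NFS, stale lock visibility means the wait may be the common case rather than the exception; the timeout guards against a genuinely abandoned lock file.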
@fcollonval this should be all set now. I'll admit the unit test is mediocre because we don't have integration testing. It does prove that if a lock exists, execution will sleep, and it simulates a command failing when a lock exists.
There is no need to backport this to 0.11 for me at this time. I'm hoping to be on 2.x in the near future, but I'll let you know if something changes. If you want to backport anyway, it may be a good idea. Also, let me know if you want me to deal with some commands not touching the index.lock and we know it. |
Thanks for the update and the test. Here is a proposal for a better, less-mocked test. The idea is simply to mock sleep to check the lock status and remove the lock file.
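The shape of such a test might look like this; the `wait_for_index_lock` helper is a stand-in for the extension's actual lock-handling code, not a copy of it.

```python
import tempfile
import time
from pathlib import Path
from unittest.mock import patch

def wait_for_index_lock(repo_path, poll_interval=0.1):
    """Stand-in for the extension's lock-wait loop (illustrative)."""
    lock = Path(repo_path) / ".git" / "index.lock"
    while lock.exists():
        time.sleep(poll_interval)

def test_waits_until_lock_removed():
    repo = Path(tempfile.mkdtemp())
    lock = repo / ".git" / "index.lock"
    lock.parent.mkdir()
    lock.touch()

    # Mocking sleep keeps the test fast and lets us both observe that the
    # wait loop ran and release the lock from within it.
    with patch("time.sleep", side_effect=lambda _: lock.unlink()) as mock_sleep:
        wait_for_index_lock(str(repo))

    assert mock_sleep.call_count == 1
    assert not lock.exists()
```

Compared to mocking the whole lock check, this only fakes the sleep, so the real "does the lock file exist?" logic is still exercised.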
Co-Authored-By: Frédéric Collonval <fcollonval@gmail.com>
Thanks @mlucool for this PR and the discussion.
I won't merge it immediately to let @telamonian have a look.
It is not clear to me that we should continually retry until command success in the event that an `index.lock` file is present.

The main issue I see with this PR is the possibility of a race condition. We don't "lock" the UI to prevent further commands, and, from my reading, we run the risk that the pending command could be executed after a subsequent command (e.g., another commit). If we want to add retry logic, we should presumably lock the UI to prevent further interactions. IMO, that is not particularly desirable compared to pushing the retry to the user, which is exactly what happens when using Git at the command line.
We only loop on waiting for the lock, not rerunning the command
This enforces that each command waits for the `index.lock` file to be released. The reason for this commit has less to do with needing retry, and more that some of the actions from the UI require multiple commands to be sent (e.g. a
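The multi-command case can be sketched as follows; the `commit_all` action and its exact command sequence are hypothetical, not the extension's real code.

```python
import subprocess

def commit_all(repo, message, run=subprocess.run):
    """Hypothetical UI action that issues two git commands back to back.

    On a slow filesystem, the index.lock from the first command may still be
    visible when the second starts, so each command should first wait for the
    lock to clear rather than failing immediately.
    """
    run(["git", "add", "-A"], cwd=repo, check=True)
    run(["git", "commit", "-m", message], cwd=repo, check=True)
```

Without per-command waiting, the second command can fail with git's "Unable to create '…/index.lock': File exists" error even though no other user is touching the repository.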
@mlucool Yes, you are correct from my reading.

However, another concern is what happens if a user shuts down JupyterLab before an `index.lock` file is released. Meaning, is it possible that, e.g., one or more pending commands (e.g., one or more commit commands for "saving" changes) could fail to be executed in the event that JupyterLab closes during the interim, while waiting for an `index.lock` file to be removed?

If, instead of locking on the backend, where the locking behavior is hidden from the user, the extension erred immediately and displayed an error modal to the user, recommending manually retrying an action, then, IMO, while arguably more laborious, expectations could be better managed. In short, I am largely concerned about all the ways that this change could cause the extension to fail. Are there edge cases that we are missing?

Lastly, I am leery about hardcoding intervals for wait times. Wait times would seem to be highly particular to a user's environment and the relevant repository. For example, one of my side projects is rather large, and I frequently encounter the presence of an `index.lock` file. Personally, I don't have a good sense as to what could be considered a reasonable wait time.
I thought extension-added routes and interactions are not done via a kernel? All actions that take more than one command which affects
You are right; we should let users change these. There is no "right" amount of time to wait across very different environments. I think we pick a good default in terms of human timeframes and then let advanced users change it. Should we both
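One way the tunables could be exposed is sketched below; the class name, field names, and defaults are all made up for illustration, not the extension's actual configuration API.

```python
from dataclasses import dataclass

@dataclass
class LockWaitConfig:
    """Hypothetical knobs for lock waiting; not the extension's real API."""
    poll_interval: float = 0.1  # seconds between checks for index.lock
    max_wait: float = 5.0       # total seconds to wait before surfacing an error
```

Advanced users on very large repositories or slow filesystems could raise `max_wait`, while the defaults stay tuned to human timeframes for typical setups.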
It seems to make sense that if we are allowing a long-running process to take place on the server, we should make this clear to the user. So we should add some way for them to see what actions are being run on the server, or block the UI while an action is being run. If that could be done after this PR is merged, instead of in this PR itself, that seems fine as well.
This already happens for at least some commands (a popup with a spinner). I agree that we should have feedback for all commands to make it clearer to a user that something is being done. FWIW, in practice, this wait is still on a subhuman scale for the cases I tested (fast network, NFS, smallish repo). Any case that would be on a human scale (e.g. the very large repos that @kgryte pointed out) would either fail (e.g.
@telamonian You have any thoughts on this? Proposal:
The need for the latter is to prevent/dissuade the user from closing the JupyterLab server before pending commands have had a chance to complete (e.g., before an
I'm happy with @kgryte's proposal if that means @telamonian / @fcollonval will accept this PR. It's been stuck for a while, and it would be good to wrap up the work here.
@meeseeksdev backport to 0.11.x |
Backport PR #564 on branch 0.11.x (Fix race condition issues on slower FS)
This adds locking before executing commands. While this may add locking in some cases that are not a race (i.e. things that do not need to acquire index.lock), this solves issues related to using this with a git directory on NFS.