@pmrowla
Contributor

@pmrowla pmrowla commented Apr 14, 2022

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Closes #5739

  • exp show no longer requires acquiring the repo lock
  • Add internal use --no-fetch option to disable fetching of exp refs for temp/queue experiments
    • vscode extension triggers exp show on any changes to refs/exps/..., so exp show will end up triggering repeated calls to itself when fetching into refs/exps
    • After this change vscode can wait for the refs to be updated by the completed exp run and then only call exp show --no-fetch once at that time (so the infinite recursive calls should be resolved)

Note that using --no-fetch disables active ref update for checkpoint runs in vscode. Currently in DVC there is no way to automate updating these refs, so we implemented manual updates on any CLI exp show call. For CLI users there is no difference between automating it and doing it on exp show; either way the exp show UI will reflect the proper current checkpoint status.

Based on the discussion with @mattseddon this was not a formally supported use case in the vscode extension anyways, so this behavior change should not make a major difference right now. After the dvc-task queueing changes, we will be able to properly implement automatically fetching those refs when needed in DVC, and at that point we should end up with parity between the dvc CLI and vscode wrt checkpoint updating.

One potential temporary workaround to allow the checkpoint updating would be for vscode to schedule a periodic exp show call (without --no-fetch) to force refreshing everything. The scheduled call would end up triggering the file watcher when it updates the checkpoint exp refs, but as long as the file watcher is set up so that it only triggers exp show --no-fetch, it would still avoid the infinite recursive calls.
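
To illustrate why this workaround stays bounded, here is a toy model in Python. It is only a sketch and not DVC or vscode-dvc code: the `simulate` function and its parameters are made up for illustration, and it assumes the worst case where every fetching exp show call updates refs/exps and therefore wakes the file watcher.

```python
from collections import deque


def simulate(watcher_uses_no_fetch: bool, max_calls: int = 20) -> int:
    """Count the exp show calls caused by one scheduled refresh.

    Toy model (assumption): a fetching `exp show` always writes to
    refs/exps and so wakes the file watcher, while a `--no-fetch`
    call leaves the refs alone.
    """
    # Each entry is one pending `exp show` call; True means --no-fetch.
    pending = deque([False])  # the scheduled call fetches
    calls = 0
    while pending and calls < max_calls:
        no_fetch = pending.popleft()
        calls += 1
        if not no_fetch:
            # Refs were updated, so the watcher queues another call.
            pending.append(watcher_uses_no_fetch)
    return calls


print(simulate(watcher_uses_no_fetch=True))   # 2
print(simulate(watcher_uses_no_fetch=False))  # 20 (stops at the safety cap)
```

With the watcher restricted to --no-fetch, one scheduled refresh costs at most one extra call; without it, the loop only ends because of the artificial `max_calls` cap.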

@pmrowla pmrowla self-assigned this Apr 14, 2022
@pmrowla pmrowla added the product: VSCode (Integration with VSCode extension), A: experiments (Related to dvc exp), and bugfix (fixes bug) labels Apr 14, 2022
@pmrowla
Contributor Author

pmrowla commented Apr 14, 2022

@iterative/vs-code I'm not 100% sure this will work, but I think it should solve the worst of the remaining exp show/lock related issues? Please test w/this branch and let me know whether or not it will work for you.

In summary, the minimum changes needed on the vscode side w/this PR would be:

  • use --no-fetch in addition to the existing args used when calling exp show via the file watcher

Optionally:

  • schedule periodic exp show calls without --no-fetch if the "live" checkpoint updating feature is needed
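
A rough sketch of what that could look like on the caller's side, written in Python purely for illustration (the extension itself is not Python). The `exp_show_args` helper and its `from_file_watcher` parameter are hypothetical, and `--show-json` is only assumed to be among the existing args.

```python
from typing import List


def exp_show_args(from_file_watcher: bool) -> List[str]:
    """Build the dvc arguments for one exp show call.

    Hypothetical helper: watcher-triggered calls get --no-fetch so they
    never write to refs/exps; scheduled refreshes keep fetching.
    """
    args = ["exp", "show", "--show-json"]  # assumed existing args
    if from_file_watcher:
        args.append("--no-fetch")
    return args


print(exp_show_args(from_file_watcher=True))
print(exp_show_args(from_file_watcher=False))
```

The first call is what the file watcher would run; the second is what an optional periodic refresh would run.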

@pmrowla
Contributor Author

pmrowla commented Apr 19, 2022

Discussed this with the rest of the core team and we think we can actually just completely drop the repo lock usage in exp show right now (rather than acquiring it for collecting workspace state), so this PR will need another update for that.

@mattseddon
Contributor

Thanks @pmrowla I'll wait for the update.

@pmrowla pmrowla changed the title from "[WIP] exp show: only acquire repo lock for specific operations" to "[WIP] exp show: lockless behavior" Apr 20, 2022
@pmrowla
Contributor Author

pmrowla commented Apr 20, 2022

@mattseddon please try the latest version of the PR

@mattseddon
Contributor

My first impression is that this definitely fixes an issue with running experiments in the workspace. Previously I would be able to get to 3/4 runs in the demo before I would run into a lock issue where I could no longer generate experiments. This is where I got up to with this branch before I said "that's enough":

image

The plan to call exp show without --no-fetch if the watched path is inside .dvc/tmp/exps (to cater for running experiments from the queue) actually works better than expected. The watcher has a debounce of 200ms meaning that we don't get the second --no-fetch call after the initial exp show --show-json one.
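
For anyone following along, the idea can be sketched roughly like this. It is a Python approximation for illustration only, not the extension's code: `needs_fetch`, `plan_calls`, and the example paths are made up, and it assumes the watcher reports paths relative to the project root and groups events that arrive within 200ms of each other into one call.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

QUEUE_DIR = PurePosixPath(".dvc/tmp/exps")
DEBOUNCE_SECONDS = 0.2  # the 200ms debounce mentioned above


def needs_fetch(changed_path: str) -> bool:
    """True if the changed path is .dvc/tmp/exps or lives inside it."""
    path = PurePosixPath(changed_path)
    return path == QUEUE_DIR or QUEUE_DIR in path.parents


def args_for(paths: List[str]) -> List[str]:
    """One exp show call per burst; fetch only if a queue path changed."""
    args = ["exp", "show", "--show-json"]
    if not any(needs_fetch(p) for p in paths):
        args.append("--no-fetch")
    return args


def plan_calls(events: Iterable[Tuple[float, str]],
               window: float = DEBOUNCE_SECONDS) -> List[List[str]]:
    """Group time-sorted (seconds, path) events into debounced bursts."""
    calls: List[List[str]] = []
    burst: List[str] = []
    last_time = None
    for timestamp, path in events:
        if last_time is not None and timestamp - last_time > window:
            # Quiet period exceeded the window: flush the previous burst.
            calls.append(args_for(burst))
            burst = []
        burst.append(path)
        last_time = timestamp
    if burst:
        calls.append(args_for(burst))
    return calls


events = [
    (0.00, ".dvc/tmp/exps/run/abc"),         # made-up queue path
    (0.05, ".git/refs/exps/ab/cdef/exp-1"),  # made-up ref path
    (1.00, ".git/refs/exps/ab/cdef/exp-1"),
]
for call in plan_calls(events):
    print(call)
```

In this model the first two events collapse into a single fetching call, and the later ref change on its own produces a --no-fetch call.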

One thing that I started running into was the branch name switching from the current one back to main. This is going to be an issue for us. Would be good to bump the priority of #6051 because the toggle messes with our experiment selection and looks pretty bad when you're observing:

Screen.Recording.2022-04-21.at.12.57.16.pm.mov

Note: We might be able to mitigate this on our end if we have to.

Would it be possible to remove the rwlock from run as well or would that be far more difficult? I'm still seeing this whilst running an experiment (from status):

ERROR: '/vscode-dvc/demo/model.pt' is busy, it is being blocked by:
  (PID 71127): /vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/__main__.py exp run --reset

If there are no processes with such PIDs, you can manually remove '.dvc/tmp/rwlock' and try again.

Would another way around be to have the new status bypass/not require the rwlock? Both plots diff and exp show run fine whilst an experiment is running. We have asked for the new command to be lockless so if that works please disregard the above.

Lastly, the version of DVC that this branch produces is 2.6.5... 😕

I very much appreciate the work that you've done on this. It is really going to help the release effort. Thank you. Let me know if there is anything you need from me. Please LMK when it's been released so I can update the extension and bump the min required version of DVC.

@mattseddon
Contributor

πŸ™πŸ»

@pmrowla
Contributor Author

pmrowla commented Apr 21, 2022

> Would another way around be to have the new status bypass/not require the rwlock? Both plots diff and exp show run fine whilst an experiment is running. We have asked for the new command to be lockless so if that works please disregard the above.

@skshetry can confirm but afaik the current plan on the DVC side is that the new command will be lockless, but we won't be changing the lock behavior for the existing status implementation.

@pmrowla pmrowla marked this pull request as ready for review April 21, 2022 05:16
@pmrowla pmrowla requested a review from a team as a code owner April 21, 2022 05:16
@pmrowla pmrowla requested a review from karajan1001 April 21, 2022 05:16
@pmrowla pmrowla changed the title from "[WIP] exp show: lockless behavior" to "exp show: lockless behavior" Apr 21, 2022
@pmrowla pmrowla merged commit 444bf27 into treeverse:main Apr 21, 2022
@pmrowla pmrowla deleted the exp-vscode-lock branch April 21, 2022 06:46
@pmrowla
Contributor Author

pmrowla commented Apr 28, 2022

> Lastly, the version of DVC that this branch produces is 2.6.5... 😕

I think this is just a side effect of how we use setuptools-scm to generate dev version strings; afaik it will just generate a version based on the most recent tag you have available. If you git fetch --tags from upstream DVC it should generate a newer version string.


Development

Successfully merging this pull request may close these issues.

exp show: Allow running while DVC is locked

2 participants