cloud versioning: pushing to a worktree remote hangs #8836
Comments
@dberenbaum could you maybe share the profile? I have been testing with a larger number of files and, for me, it's taking quite some time in building the data index (Line 112 in dd2d2dc).
It's slow because we have to walk the worktree remote so we can do the diff to see what needs to be pushed and what needs to be deleted in the remote. We probably need to display some kind of message while it is building the index, but we can't use a regular progress bar because we don't know how many files there will be in the remote.
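The diff itself is cheap once both indexes exist; the expensive part is enumerating the remote. A minimal sketch of the idea, with illustrative names rather than DVC's actual code:

```python
def diff(local_index, remote_index):
    """Compare {path: metadata} maps and decide what to transfer.

    Illustrative only: building remote_index is the slow part, since it
    means listing every object in the worktree remote up front.
    """
    to_push = [p for p, meta in local_index.items()
               if remote_index.get(p) != meta]
    to_delete = [p for p in remote_index if p not in local_index]
    return to_push, to_delete
```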
Compare the video above to this one, which pushes a dataset ~3x bigger: Screen.Recording.2023-01-19.at.2.46.36.PM.mov
In the first video, I've checked that the "hanging" lasts ~5 minutes, while it takes almost no time in the second video. The only difference I can tell is that there are more previous versions of objects in the first video. Why should that matter here if we are only checking the current versions?
It's because in S3 you have to list all object versions if you want any versioning information at all.
🤔 Why isn't it a problem for version_aware remotes? Don't we need version info there also? And do we need the versions when pushing to a worktree remote? If we just want to check the current version, can we use etags?
Edit: There also might be workarounds like https://stackoverflow.com/a/72281706
@daavoo For some reason I'm getting an error trying to view this, but here's the JSON file: |
I think it's because it's too big. On my computer Firefox breaks, but Chrome manages to render it.
Wow, I have never seen this 😅 The profile is completely blank inside: Grabacion.de.pantalla.2023-01-20.a.las.15.00.40.mov
Anyhow, it looks like you ran before the changes made in #8842. Could you rerun with the latest version?
head-object only works for individual files, and it's what we already do for querying an individual object.
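As a hedged boto3 sketch of that per-object query (the bucket and key are made-up names):

```python
import boto3

s3 = boto3.client("s3")

# One request per known key: the response carries the ETag and, on a
# versioning-enabled bucket, the VersionId of the latest version.
resp = s3.head_object(Bucket="my-bucket", Key="cats-dogs/data/0001.jpg")
print(resp["ETag"], resp.get("VersionId"))
```

This is why it can't replace the prefix listing: HEAD only answers for keys you already know about.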
This could work, but will probably require some adjustment in dvc-data because we will essentially need to mix non-versioned filesystem calls with versioned ones. The issue is that when listing a prefix (which is the only way we can find files we don't know about), you can either do a plain listing (which returns etags but no version IDs) or list all object versions. So to get the "etags only" listing from S3 we need a non-versioned s3fs instance (to use for the listing calls). Also, this is only a problem on S3. The azure and gcs APIs are designed properly so that you can list only the latest versions of a prefix and get a response that includes version IDs for those latest versions. (Using the etag method won't improve performance vs using IDs on azure or gcs.)
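To make the trade-off concrete, here is a hedged boto3 sketch of the two listing calls (bucket and prefix are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Plain listing: one entry per current object, ETags included,
# but no version IDs at all.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket", Prefix="data/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["ETag"])

# Versioned listing: returns every version of every key under the
# prefix, so it slows down as the bucket accumulates history.
paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket="my-bucket", Prefix="data/"):
    for ver in page.get("Versions", []):
        if ver["IsLatest"]:
            print(ver["Key"], ver["VersionId"])
```

The second call's cost grows with the total number of stored versions, which matches the observation above that pushes hang longer when there are more previous versions of objects.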
Edit: addressed above by @pmrowla
I've narrowed down the problem a little -- it only happens on the initial push/when there is no existing cloud metadata.
On the latest version,
#8842 seems to have fixed the "hanging" issue and made worktree and version_aware remotes perform similarly. However, it also slowed down the overall operation, spending a lot of time on "Updating meta for new files": Screen.Recording.2023-01-20.at.10.41.40.AM.mov
I'm assuming it's related to listing all object versions, but I'm not clear why it's so much worse in the new version. Here are results that I think you can reproduce more or less by:
Before #8842:
After #8842:
@dberenbaum, can you try running
Anyway, it seems like it's calling
@dberenbaum is your actual real-world
The "updating meta" behavior is the same as it was before; the only difference is that it gets a separate progress bar now. Previously it was included in the overall progress.
edit: actually I see the issue, the info calls were previously batched and are done sequentially now, will send a PR with fixes
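For illustration, a minimal sketch of the batching idea (a hypothetical helper, not the actual PR): issue the per-file info requests through a thread pool instead of one at a time.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_meta(fs, paths, jobs=16):
    # fs.info() stands in for whatever per-object metadata request the
    # remote filesystem makes; running the calls concurrently amortizes
    # the round-trip latency instead of paying it once per file.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(zip(paths, pool.map(fs.info, paths)))
```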
The updating meta issue should be fixed with
Thanks everyone for your help solving this so quickly! Looks good now, closing.
Bug Report
Description
When I push to a worktree remote, I often get stuck in this state for minutes (I have not left it hanging long enough to see if it eventually completes):
Reproduce
I'm not yet sure how to create a minimal example that reproduces it, but it happens often when testing. Here are the steps I have taken to reproduce it:
1. dvc get git@github.com:iterative/dataset-registry.git use-cases/cats-dogs to get some data.
2. dvc add cats-dogs to track the data.
3. dvc push to that worktree remote.

And here's a video of it:
Screen.Recording.2023-01-18.at.9.01.20.AM.mov
Output of dvc doctor: