import: pull performance #10059
Comments
For the record: Can reproduce even with
The main issue is that we don't use md5 provided by the fs (e.g. dvcfs), which results in needless hash recomputing. We can just use tried-and-tested `hash_file` here for now. Fixes iterative/dvc#10059
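To illustrate the idea of reusing a hash the filesystem already reports, here is a minimal standalone sketch. It is not DVC's `hash_file` and does not use any DVC API: `md5_for`, `compute_md5`, and the `"md5"` key in the `info` dict are hypothetical names chosen for the example, on the assumption that the filesystem exposes a precomputed md5 in per-file metadata.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # read files in 1 MiB chunks


def compute_md5(path):
    """Hash the file contents from scratch (the slow path)."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_for(path, info):
    """Return (md5, was_recomputed).

    `info` stands in for per-file metadata from the filesystem.
    The "md5" key is an assumption made for this sketch.
    """
    known = info.get("md5")
    if known:
        # The filesystem already knows the hash, so skip reading the file.
        return known, False
    return compute_md5(path), True


def demo():
    """Hash one temporary file with and without filesystem-provided md5."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        path = tmp.name
    try:
        with_meta = md5_for(path, {"md5": "5d41402abc4b2a76b9719d911017c592"})
        without_meta = md5_for(path, {})
    finally:
        os.unlink(path)
    return with_meta, without_meta
```

In the first call the file is never read, which is the saving described above. For a large imported dataset, the second path would mean reading every file just to recompute a value that was already available.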
@efiop Does the example above work for you? I'm seeing it get a little further but still get stuck on fetching:
I've modified it to work with dvc-bench to make it quicker for me, but looks like I might've missed something. Let me try again.
So I was testing with a slightly different setup, in the sense that the dataset in the data registry (not dvc-bench but a derived local one) was a new one with
@efiop Any status update on this?
We've discussed this, but for the record: the only thing left here is cross-hash compatibility, which I'm in no rush to implement, as I'm still waiting for user feedback on whether this was enough to fix it for them or not (can't find a link yet, but will post if I do find it).
Copied from Slack
I’m able to reproduce it using the AWS sandbox:
This pulls data imported from git@github.com:dberenbaum/coco-sample.git. When pulling directly from the source repo, it starts to pull fast, but pulling from download-dvc-dir gets stuck here for a long time: