-
Notifications
You must be signed in to change notification settings - Fork 1.2k
update: only update the missing hashes with --to-remote #5773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@isidentical Cool optimization! 🔥 We've talked about getting rid of |
49f2229
to
5f9b078
Compare
e2d1692
to
420404e
Compare
Benchmark results: master, this branch. The |
91beae2
to
db3e79b
Compare
4705d25
to
64145cd
Compare
dvc/objects/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you could executor.map
, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to manually unpack some attributes of entry (entry.path_info
, entry.fs
etc) + add **kwargs
to the calls. In theory we can do something like executor.map(lambda entry: odb.add(...), ...)
, what would be the advantage with map though? It would just block the progress bar if there is some big file in the middle, which might not be the best UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seemed nicer and shorter. E.g. see how we wrap the worker in tqdm in dvc/objects/stage.py, where we use it in map, so it doesn't block the pbar.
64145cd
to
df3f0e5
Compare
@isidentical Btw, very interesting failures on windows. Looks like we have the same objects (e.g. two files with the same content) saved in parallel, which results in that move exception. We need to simply ignore it (somewhere in EDIT: that will fix #4992 |
41d5990
to
1a2cf91
Compare
1a2cf91
to
adf14ef
Compare
adf14ef
to
66228ac
Compare
Co-authored-by: Ruslan Kuprieiev <kupruser@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! 🔥 As we've discussed above, there are a few things like using verify
in odb.add
and putting stage()&save() logic somewhere in one method, but let's deal with those on top.
dvc update --to-remote
now only updates the missing cache files instead of syncing all over, as a pre-requisite this patch also mergestransfer()
logic withstage()
and introduces thestage(upload=...)
option for transferring to the remote cache.Fixes #4992, part of #5768