Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

odb: use regular upload() when the source filesystem is local #6365

Merged
merged 2 commits into from Jul 27, 2021

Conversation

isidentical
Copy link
Contributor

In the old remote logic, for transfering to remotes we used upload() instead of upload_fobj() which on some backends defaulted to upload_fobj and on some others used the native transfer libraries logic. For some backends like oss / ssh etc. this is faster than the upload_fobj, so this patch adds an extra check to the ODB to match against whether the source file system is local or not. If it is the local filesystem, we can safely use _upload() if not we can still fallback to _upload_fobj().

@isidentical isidentical added the optimize Optimizes DVC label Jul 27, 2021
@isidentical isidentical self-assigned this Jul 27, 2021
@isidentical isidentical added this to In progress in DVC 13 July - 26 July 2021 via automation Jul 27, 2021
Comment on lines +88 to +90
self.fs.upload(from_info, to_info)
elif isinstance(self.fs, LocalFileSystem):
from_fs.download_file(from_info, to_info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, should both of these use _file methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical I see that we use fs.upload but fs.download_file. Wondering if they both should be upload/download or upload_file/download_file (likely the latter if we take fsspec's get/put_file future in the context, but maybe I'm missing something important here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have upload_file(), so the upload() is put_file. And download_file() is the get_file(). The reason that I didn't use download() is that, it has an extra check against whether the source path is a directory, which is an waste of API call for this case where we know we are adding a single entity (file).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thank you! I guess we'll switch to get_file/put_file when reducing fsspec_wrapper in the future.

@pmrowla
Copy link
Contributor

pmrowla commented Jul 27, 2021

LGTM

@isidentical isidentical marked this pull request as ready for review July 27, 2021 12:53
@isidentical isidentical requested a review from a team as a code owner July 27, 2021 12:53
@efiop efiop merged commit 4a843db into master Jul 27, 2021
DVC 13 July - 26 July 2021 automation moved this from In progress to Done Jul 27, 2021
@efiop efiop deleted the use-upload-when-the-source-fs-local branch July 27, 2021 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimize Optimizes DVC
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants