objects: migrate remote push/pull to objects.transfer #6308
Merged
Changes from 21 commits
22 commits
60a055e objects: migrate remote status to objects.status (pmrowla)
9383fd2 data_cloud: use objects.status (pmrowla)
87c8947 objects: migrate remote._process to objects.transfer (pmrowla)
9ac7e97 data_cloud: use objects.transfer (pmrowla)
885dad7 objects: handle memfs staging objects in transfer (pmrowla)
de8dae4 fetch/import: use unified transfer() (pmrowla)
13e9345 update get_remote and transfer exception usage (pmrowla)
be9e9fb update tests (pmrowla)
214a9e7 objects.transfer: use src ODB verify rules after xfer (pmrowla)
4bbd6af odb: move index from remote into odb (pmrowla)
b6b655b objects.transfer: skip src status check when possible (pmrowla)
2713584 update tests (pmrowla)
00743d9 remove dvc.remote (pmrowla)
1479b8a odb.index: migrate to diskcache (pmrowla)
5f8b895 objects: fix status/transfer optimization (pmrowla)
84bc217 update index usage (pmrowla)
9a8686d state: remove dead sqlite related code (pmrowla)
aef3b83 update cloud tests (pmrowla)
be33b6b drop unnecessary remove in tests (pmrowla)
857fe5b use abstract base class in odb index (pmrowla)
905a580 status: load trees from local cache instead of remote odb when possible (pmrowla)
99b1654 handle status when dir cache is explicitly removed (pmrowla)
Btw, discussed with @isidentical that for some filesystems (like hdfs and the future ssh), `upload_fobj` down below will no longer be atomic, so we might need to use a temporary path here and then just `rename` into place. (There is an option to wrap fs calls to make them atomic, but that is error-prone and ugly.)

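As a rough illustration of the "temporary path, then rename" idea at the ODB level, here is a minimal sketch. `DictFS`, `add_atomic`, and `FailingReader` are hypothetical names invented for this example; they are not DVC or fsspec APIs, and the in-memory filesystem only mimics a backend whose upload is not atomic.

```python
import io
import uuid


class DictFS:
    """Hypothetical in-memory filesystem whose upload_fobj is NOT atomic."""

    def __init__(self):
        self.files = {}

    def upload_fobj(self, fobj, path, chunk_size=4):
        # Data becomes visible at `path` chunk by chunk, so a failure
        # midway leaves a partial file behind.
        self.files[path] = b""
        while True:
            chunk = fobj.read(chunk_size)
            if not chunk:
                break
            self.files[path] += chunk

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)

    def remove(self, path):
        self.files.pop(path, None)

    def exists(self, path):
        return path in self.files


def add_atomic(fs, fobj, dest):
    """Upload to a temporary path, then rename into place (assumed ODB-level helper)."""
    tmp = f"{dest}.{uuid.uuid4().hex}.tmp"
    try:
        fs.upload_fobj(fobj, tmp)
        fs.rename(tmp, dest)
    except BaseException:
        fs.remove(tmp)  # never leave a partial object behind
        raise


class FailingReader(io.BytesIO):
    """Raises after the first read, to simulate an interrupted upload."""

    def read(self, size=-1):
        if self.tell() > 0:
            raise OSError("connection lost")
        return super().read(size)
```

With this shape, an interrupted upload only ever touches the temporary path, so readers never observe a partial object at `dest`. It does rely on the filesystem offering a rename operation.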
Seems to me like this might still need to be handled at the fs level: uploading to a temp path and renaming at the ODB level won't work for all of our filesystems (HTTP doesn't support move/rename).

@pmrowla That atomicity is not something that fs should care about when uploading/downloading; this is an odb-level behavior.
Are operations already atomic there? Or does it just not support rename at all anywhere?

In the HTTP case, it's atomic since the full POST/PUT request wouldn't be completed, so the server should drop whatever was partially uploaded. And yeah, we don't support rename/move/copy at all since there's no HTTP method for that operation (unless you're using an extension built on top of HTTP like WebDAV).
It seems to me that both `_upload` and `_upload_fobj` should work the same way, and should both guarantee atomicity at the fs level, like how in localfs we do the explicit upload to tempfile and rename for both `_upload` and `_upload_fobj`.

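The localfs approach mentioned above (write to a tempfile, then rename, for both the path-based and the file-object-based upload) could look roughly like the sketch below. This is not DVC's actual code: `_atomic_write`, `upload_path`, and `upload_fobj` are names assumed for illustration, and the atomicity of `os.replace` is assumed to hold only when the temporary file and the destination live on the same filesystem.

```python
import os
import shutil
import tempfile


def _atomic_write(dest, write):
    """Run write(fobj) against a temp file next to dest, then rename it into place."""
    dirname = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dirname, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_fobj:
            write(tmp_fobj)
        os.replace(tmp, dest)  # rename over dest; tmp and dest share a directory
    except BaseException:
        # Clean up the partial temp file so nothing half-written remains.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def upload_path(src_path, dest):
    """Path-based upload (stand-in for the thread's _upload)."""
    with open(src_path, "rb") as src:
        _atomic_write(dest, lambda out: shutil.copyfileobj(src, out))


def upload_fobj(fobj, dest):
    """File-object upload (stand-in for the thread's _upload_fobj)."""
    _atomic_write(dest, lambda out: shutil.copyfileobj(fobj, out))
```

Routing both entry points through one helper is what makes them "work the same way": the atomicity guarantee lives in a single place instead of being reimplemented per method.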
@pmrowla Thanks for clarifying! `_upload_fobj` is temporary until the fsspec migration is complete and we can use `put/get[_file]` directly.
fs atomicity is unlikely to be guaranteed by all filesystems and might actually be undesirable in some use cases outside dvc (e.g. you might want to upload as much of a file as you can, or you might not care about atomicity so you might not want to waste an API call for `rename`), so it seems like it could be more robust if we do that in our odb layer (or fs wrapper after all?) for now.
Clearly, it seems like it would be useful to have the knowledge about whether or not particular fs operations are atomic so that we could waste the fewest API calls possible, so maybe our `fsspec_wrapper` is indeed a pretty good place for it for now. Similar to how, IIRC, in `C` libraries you have `atomic_*` functions, we could have something like `put_file` and `atomic_put_file`, or `atomic=True`, or something. Maybe this could be useful for `fsspec` in general as well, not quite sure right now.

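One possible shape for the `atomic=True` idea is sketched below. Everything here is hypothetical: `MemFS`, `AtomicHTTPLikeFS`, `FSWrapper`, and the `atomic_put` flag are invented for illustration and are not part of fsspec or DVC. The sketch assumes each filesystem can declare whether its upload is already atomic, so the wrapper pays for the extra rename call only when it is both requested and needed.

```python
import uuid


class MemFS:
    """Hypothetical stand-in for a wrapped filesystem (not a real fsspec class)."""

    # Assumption: each filesystem declares whether its put_file is already atomic.
    atomic_put = False

    def __init__(self):
        self.files = {}
        self.calls = []  # record of API calls, to show the cost of atomicity

    def put_file(self, data, path):
        self.calls.append(("put_file", path))
        self.files[path] = data

    def mv(self, src, dst):
        self.calls.append(("mv", src, dst))
        self.files[dst] = self.files.pop(src)


class AtomicHTTPLikeFS(MemFS):
    """Models a filesystem whose uploads are already atomic (as described for HTTP)."""

    atomic_put = True


class FSWrapper:
    """Hypothetical wrapper exposing an opt-in atomic upload."""

    def __init__(self, fs):
        self.fs = fs

    def put_file(self, data, path, atomic=False):
        # Skip the temp-path dance when atomicity is not requested,
        # or when the underlying filesystem already provides it.
        if not atomic or self.fs.atomic_put:
            self.fs.put_file(data, path)
            return
        tmp = f"{path}.{uuid.uuid4().hex}.tmp"
        self.fs.put_file(data, tmp)
        self.fs.mv(tmp, path)
```

Callers that do not care about atomicity keep the single-call path, while the ODB layer could pass `atomic=True` and let the wrapper decide whether a rename is actually required.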
Doesn't look like this PR is changing the old behaviour, so probably not worth blocking it because of it, but we'll def need to keep this in mind for the followups.