remote: should DVC prevent external cache overlap default remote? #3703

@jorgeorpinel

A possible misunderstanding from https://dvc.org/doc/user-guide/managing-external-data#examples is that actual remote storage can serve as the external cache for your project. If you don't notice that those examples use paths/directories, you may end up setting up a remote (perhaps even the default one) in the root of an S3 bucket or other cloud storage, and then also set up that same remote as the external cache.

This would be a contradictory setup, and I'm not even sure whether it could cause problems for DVC commands (for example, dvc push would never push anything, since the data is already stored there). Basically, you would not have a backup of the project cache because of a bad setup.

The question or suggestion is: should DVC detect this situation and prevent users from having such a config? If so, it's a bug as it's possible to do this now.
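
To make the suggestion concrete, here is a rough sketch of what such a check might look like. It is not DVC code: `locations_overlap` and `_split` are hypothetical helpers, and the URLs in the usage note are made-up examples. It also assumes that a cache nested inside the remote (or the reverse) should count as an overlap, which goes a bit further than the identical-location case described above.

```python
from urllib.parse import urlparse


def _split(url):
    """Return (scheme, netloc, path parts) for a storage URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Drop empty segments so "s3://bucket" and "s3://bucket/" compare equal.
    parts = tuple(p for p in parsed.path.split("/") if p)
    return parsed.scheme, parsed.netloc, parts


def locations_overlap(cache_url, remote_url):
    """True if both URLs name the same location or one contains the other."""
    c_scheme, c_host, c_parts = _split(cache_url)
    r_scheme, r_host, r_parts = _split(remote_url)
    # Different scheme or bucket/host: the locations cannot overlap.
    if (c_scheme, c_host) != (r_scheme, r_host):
        return False
    # Compare the common prefix; equal prefixes mean same or nested locations.
    shorter = min(len(c_parts), len(r_parts))
    return c_parts[:shorter] == r_parts[:shorter]
```

With this sketch, `locations_overlap("s3://mybucket", "s3://mybucket/")` would flag the setup described above, while separate directories such as `s3://mybucket/cache` and `s3://mybucket/remote` would pass.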

Context: https://discordapp.com/channels/485586884165107732/485596304961962003/704739550483710032


Also consider updating the command references and/or the external data guide to mention how certain commands that deal with remote data work with external outputs. For example, dvc add uses the ETag instead of an MD5 hash, I believe; get/import don't work with external data; etc.

Labels: bug, discussion
