
dvc: support http(s) as an external dependency #1146

efiop opened this issue Sep 21, 2018 · 2 comments



commented Sep 21, 2018


$ dvc import data


$ dvc run -d -o data wget

@efiop efiop added the enhancement label Sep 21, 2018

@efiop efiop added this to the Queue milestone Sep 21, 2018

@efiop efiop self-assigned this Sep 21, 2018

@efiop efiop removed their assignment Nov 2, 2018

@mroutis mroutis self-assigned this Nov 9, 2018



commented Nov 9, 2018

@efiop, I didn't understand the second case.
When a dependency is specified with the http protocol, should we make a GET request and cache the content? (If so, should we cache only the response body, or the headers as well?)


Member Author

commented Nov 9, 2018

No, it means that we should get the ETag (if it is available; if not, throw an error) and save it instead of the md5, the same way it is done in the s3 and gs drivers. No downloading should be done on the dvc side. Notice that in the second example the user downloads the file themselves and saves it into the data file, which is then cached by dvc. Maybe that example is a bit confusing, sorry. Imagine you have a script that works with the remote file without downloading it (e.g. because it is way too big); then you can use dvc run -d -o processed python so that your stage gets reproduced whenever the remote file changes.
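The ETag approach described in the comment can be sketched in Python, the language dvc is written in. This is a minimal, standard-library-only sketch under stated assumptions, not dvc's actual driver code: `fetch_etag`, `etag_from_headers`, `dependency_changed` and `MissingETagError` are hypothetical names, and it assumes the server answers HEAD requests with an ETag header.

```python
import urllib.request
from typing import Mapping, Optional


class MissingETagError(Exception):
    """Hypothetical error raised when a URL reports no ETag."""


def etag_from_headers(url: str, headers: Mapping[str, str]) -> str:
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    etag = normalized.get("etag")
    if not etag:
        # Per the proposal: no ETag means an error, not a fallback to hashing content.
        raise MissingETagError(f"{url} did not return an ETag header")
    return etag


def fetch_etag(url: str, timeout: float = 10.0) -> str:
    # Hypothetical helper: a HEAD request transfers only headers,
    # so the (possibly huge) remote file is never downloaded.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return etag_from_headers(url, dict(response.headers.items()))


def dependency_changed(saved_etag: Optional[str], current_etag: str) -> bool:
    # The stored ETag plays the role of the md5: a stage would be
    # reproduced when the saved value differs from the current one.
    return saved_etag != current_etag
```

A HEAD request keeps the "no downloading on the dvc side" requirement. One caveat worth noting: weak ETags (prefixed with `W/`) only promise semantic equivalence, so treating every ETag as a content checksum is itself an assumption.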
