Google Drive download helper #136

Closed
heinrichreimer opened this issue Feb 26, 2021 · 5 comments

Comments

@heinrichreimer

Often datasets are distributed on Google Drive.
That's an issue because Google requires a confirmation step before downloading large files (i.e., files it doesn't scan for malware).
Transformers.jl already has a custom fetch_method implementation for that case.
So I wonder if it might be worth including that helper method in DataDeps.jl, possibly integrating it without having to use fetch_method at all.

@oxinabox
Owner

oxinabox commented Feb 26, 2021

Yes, downloading things from Google Drive is a thing people do.

Embeddings.jl uses GoogleDrive.jl similarly.
I think it is broadly similar to the code that is inside Transformers.jl
https://github.com/JuliaText/Embeddings.jl/blob/306c04bead62b32873dedbc2609c74c4ca34306b/src/Paragram.jl#L31

I don't see any reason to have it in this package.
More useful to have it in another suitable package (like GoogleDrive.jl, or some new package if you want to start from scratch) that can do this and likely more (e.g. writing).
Then those can work with DataDeps.jl.

That could look like AWSS3.jl, which provides the S3Path type.
It works with DataDeps without needing to specify fetch_method because it overloads Base.basename and Base.download.
These two things are all that is required to work with DataDeps without a fetch method:

"""
The more important feature is that this works for anything that has overloaded
`Base.basename` and `Base.download`, e.g. [`AWSS3.S3Path`](https://github.com/JuliaCloud/AWSS3.jl).
While this doesn't work for all transport mechanisms (so some datadeps will still need a custom `fetch_method`),
it works for many.
"""
function fetch_base(remote_path, local_dir)
    localpath = joinpath(local_dir, basename(remote_path))
    # Download using whichever `Base.download` method is defined for this type.
    Base.download(remote_path, localpath)
    # Return the local path as a plain String.
    return string(localpath)
end
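
To make the two-overload contract concrete, here is a minimal sketch that runs without any network access. `LocalCopy` and `fetch_generic` are hypothetical names made up for illustration (they are not part of DataDeps.jl or AWSS3.jl); `fetch_generic` just mirrors the shape of the `fetch_base` function quoted above, and the "remote" is simply a file on disk.

```julia
# Hypothetical stand-in for a remote resource; not part of any package.
struct LocalCopy
    source::String  # path of an existing file that plays the role of the remote
end

# The two overloads that the generic fetch path relies on.
Base.basename(r::LocalCopy) = basename(r.source)
Base.download(r::LocalCopy, localpath::AbstractString) = cp(r.source, localpath; force=true)

# Same shape as the `fetch_base` quoted above, renamed to avoid any clash.
function fetch_generic(remote_path, local_dir)
    localpath = joinpath(local_dir, basename(remote_path))
    Base.download(remote_path, localpath)  # dispatches on the type of remote_path
    return string(localpath)
end

# Usage: "fetch" a temporary file into a fresh temporary directory.
src = joinpath(mktempdir(), "data.txt")
write(src, "hello")
fetched = fetch_generic(LocalCopy(src), mktempdir())
```

Nothing in `fetch_generic` knows about `LocalCopy`; dispatch on the first argument picks the right `basename` and `download` methods, which is why no custom `fetch_method` would be needed for such a type.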

More broadly: It would be really cool if someone overloaded the FilePathsBase API for Google Drive.

The other reason I wouldn't want it here is that I don't want to take on dependencies, nor do I want to take on the maintenance burden.

@heinrichreimer
Author

So possibly a GoogleDriveFile("0B9w48e1rj-MOLVdZRzFfTlNsem8") struct that overloads Base.basename and Base.download could be added to GoogleDrive.jl?
If that would then work out-of-the-box with DataDeps.jl, I would agree that should be the preferred way.
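
A rough sketch of what such a struct might look like, under several assumptions: the `GoogleDriveFile` type, its `filename` field, and the `download_url` helper are all hypothetical (GoogleDrive.jl may not define anything like them), and the `uc?export=download&id=...` URL form is assumed to be accepted for publicly shared files. The sketch does not handle the confirmation page for large files, which is the hard part this issue is about.

```julia
using Downloads  # standard library since Julia 1.6

# Hypothetical type; not an existing GoogleDrive.jl API.
struct GoogleDriveFile
    id::String        # file ID taken from the sharing link
    filename::String  # name to save under; assumed to be supplied by the caller
end

# Assumed direct-download URL form for a publicly shared file.
download_url(f::GoogleDriveFile) =
    "https://drive.google.com/uc?export=download&id=" * f.id

# The two overloads DataDeps needs in order to work without a fetch_method.
Base.basename(f::GoogleDriveFile) = f.filename

function Base.download(f::GoogleDriveFile, localpath::AbstractString)
    # NOTE: no handling of the large-file confirmation step; for such files
    # this would likely save an HTML page instead of the data.
    return Downloads.download(download_url(f), localpath)
end

# "dataset.zip" is a made-up file name for illustration.
f = GoogleDriveFile("0B9w48e1rj-MOLVdZRzFfTlNsem8", "dataset.zip")
```

The file name is passed explicitly here because `basename` is called before anything is downloaded, so a struct holding only the ID would need an extra request to find out the name. A real implementation could choose differently.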

@oxinabox
Owner

oxinabox commented Mar 3, 2021

yeah that would be great

@ggebbie

ggebbie commented Dec 2, 2021

Thank you for discussing the issue when downloading large files from Google Drive. I also think it would be really cool to add this code to GoogleDrive.jl as I can't even download a 43 MB file without virus scanner interference. I have looked at the suggestions above, but I didn't immediately grasp how to do this coding myself.

@heinrichreimer
Author

Closing due to inactivity.
