improve error handling for GDrive downloads #5704
CI failures summary and remediations (Dr. CI): as of commit a6cdc84, there are no failures yet. This comment was automatically generated by Dr. CI.
Conflicts: torchvision/datasets/utils.py
Thanks @pmeier.
> The most common failure case is GDrive returning an unknown API response as HTML
Shouldn't we then check the HTML code of the reply from GDrive instead of downloading it?
Unfortunately, GDrive mostly just returns
If you mean checking against the HTML regex right after the files are downloaded, then yeah, I feel like that is where the check needs to be.
Thanks, Philip.
Summary:
* improve error handling for GDrive downloads
* perform HTML check regardless of MD5 check

Reviewed By: NicolasHug
Differential Revision: D36760932
fbshipit-source-id: 1cad96e1505f88f6945c048d9c3e0fbe1ccfd00f
We have plenty of reports that downloading datasets hosted on GDrive does not work as expected:
* `zipfile.BadZipFile` exception when downloading WIDERFace (traced to `download_file_from_google_drive` function) #5615
* `Caltech101`, `Caltech256` downloads are broken due to Google Drive redirect "scan for viruses" popup #5716

Although most of them are closed, they still receive another comment from time to time when a user stumbles upon "the same issue".
The errors are non-descriptive in most cases. The problem is that we don't check the MD5 sum after the download and naively write the response from GDrive to disk. In contrast, `download_url` does perform such a check (`torchvision/datasets/utils.py`, lines 150 to 152 at aa21197).
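For illustration, a post-download MD5 check of the kind described for `download_url` could look roughly like the following. This is a minimal sketch assuming the expected MD5 hex digest is known for each file; `calculate_md5` and `verify_download` are hypothetical names used here and are not meant to reproduce torchvision's actual helpers.

```python
import hashlib
import os


def calculate_md5(fpath: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(fpath, "rb") as fh:
        # iter() with a sentinel keeps reading until fh.read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_download(fpath: str, md5: str) -> None:
    """Hypothetical post-download check: raise a descriptive error on an MD5 mismatch."""
    if not os.path.isfile(fpath):
        raise RuntimeError(f"Expected a downloaded file at {fpath}, but none was found.")
    actual = calculate_md5(fpath)
    if actual != md5:
        raise RuntimeError(
            f"The MD5 checksum of the downloaded file {fpath} ({actual}) does not match "
            f"the one on record ({md5}). Please delete the file and try again."
        )
```

Reading in fixed-size chunks means even multi-gigabyte dataset archives can be hashed without loading them into memory; a mismatch then surfaces as an explicit checksum error instead of a later `zipfile.BadZipFile`.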
The most common failure case is GDrive returning an unknown API response as HTML. This PR adds an MD5 check after the download, along with an additional HTML check, to make the error message more descriptive.
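A rough sketch of such an HTML check follows, under stated assumptions: an unhandled GDrive response is assumed to be a small UTF-8 text file, the 10 KiB size cutoff and the tag regex are illustrative heuristics, and `looks_like_html_response` / `check_for_html_response` are hypothetical names rather than torchvision's API. Because it needs no checksum, a check like this can run whether or not an MD5 is on record, in line with the commit summary's "perform HTML check regardless of MD5 check".

```python
import os
import re

# Assumption: an unhandled GDrive API response is a small HTML page, so only files
# below this size are inspected. The 10 KiB threshold is illustrative.
_MAX_HTML_RESPONSE_SIZE = 10 * 1024

# Heuristic for HTML: a tag such as <html>, </div>, <a href="..."> or <br/>.
_HTML_TAG_PATTERN = re.compile(r"</?[a-z][a-z0-9-]*(?:\s[^>]*)?/?>", re.IGNORECASE)


def looks_like_html_response(fpath: str) -> bool:
    """Return True if the file is small, decodes as UTF-8 text, and contains HTML tags."""
    if os.path.getsize(fpath) >= _MAX_HTML_RESPONSE_SIZE:
        return False
    try:
        with open(fpath, encoding="utf-8") as fh:
            text = fh.read()
    except UnicodeDecodeError:
        # Payloads that are not valid UTF-8 (e.g. most binary archives) are treated as non-HTML.
        return False
    return _HTML_TAG_PATTERN.search(text) is not None


def check_for_html_response(fpath: str) -> None:
    """Hypothetical check: raise a descriptive error if the download looks like an HTML page."""
    if looks_like_html_response(fpath):
        raise RuntimeError(
            f"The file {fpath} appears to contain HTML. This most likely means GDrive "
            "returned an unhandled API response instead of the requested file."
        )
```

Restricting the check to small files that decode as text keeps it cheap and skips genuine binary payloads such as ZIP archives. The regex is only a heuristic, so plain text files that happen to contain tag-like snippets may be flagged as well.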