This repository has been archived by the owner on Jul 22, 2021. It is now read-only.

large file download fails with OverflowError #30

Open
rupertlevene opened this issue Feb 11, 2015 · 8 comments

@rupertlevene

On my 32-bit linux machine, files over 2GB fail to download. Memory usage while my test script runs gets very high, suggesting the entire download is being cached in memory; I think the download should be streamed to disk instead.
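The streaming behaviour being requested can be sketched in isolation. This is a minimal, hypothetical helper (the name `stream_to_file` is illustrative, not PyDrive API): copying a response in fixed-size chunks bounds peak memory at the chunk size, instead of buffering the whole body into one string, which is what overflows on a 32-bit build.

```python
import io

def stream_to_file(source, dest, chunk_size=1024 * 1024):
    """Copy a file-like source to dest in chunks, never holding more
    than chunk_size bytes in memory at once. Returns total bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        dest.write(chunk)
        total += len(chunk)
    return total
```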

To use the script, upload a large file called bigvid.avi to google drive and put client_secrets.json in the working directory.

$ ./test.py 
bigvid.avi
Traceback (most recent call last):
  File "./test.py", line 17, in <module>
    f.GetContentFile('/tmp/bigvid-from-pydrive.avi')
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 167, in GetContentFile
    self.FetchContent(mimetype)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 36, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 198, in FetchContent
    self.content = io.BytesIO(self._DownloadFromUrl(download_url))
  File "/usr/local/lib/python2.7/dist-packages/pydrive/auth.py", line 54, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 313, in _DownloadFromUrl
    resp, content = self.auth.service._http.request(url)
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 547, in new_request
    redirections, connection_type)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1318, in _conn_request
    content = response.read()
  File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 624, in _read_chunked
    return ''.join(value)
OverflowError: join() result is too long for a Python string
$ cat test.py
#!/usr/bin/env python

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth=GoogleAuth()
if not gauth.LoadCredentialsFile("auth.txt") :
    gauth.CommandLineAuth()
    gauth.SaveCredentialsFile("auth.txt")

drive=GoogleDrive(gauth)

filelist=drive.ListFile({'q': "title='bigvid.avi'"}).GetList()
for f in filelist:
    print f['title'];
    f.GetContentFile('/tmp/bigvid-from-pydrive.avi')

$ ls -l /tmp/big*
ls: cannot access /tmp/big*: No such file or directory
@aliafshar
Contributor

This uses the google-api-python-client under the hood, and that is where the bug is. However, I am really sorry about this: dumping the entire download into memory without streaming shows appalling forethought.

@Fjodor42
Contributor

Fjodor42 commented Feb 3, 2016

You might want to have a look at #27

@Fjodor42
Contributor

Fjodor42 commented Feb 17, 2016

I don't have a 32-bit system handy for testing, but could you report whether replacing

filelist=drive.ListFile({'q': "title='bigvid.avi'"}).GetList()
for f in filelist:
    print f['title'];
    f.GetContentFile('/tmp/bigvid-from-pydrive.avi')

with

local_file = io.FileIO('/tmp/bigvid-from-pydrive.avi', mode='wb')
for f in filelist:
    print f['title']
    file_id = f.metadata.get('id')
    request = drive.auth.service.files().get_media(fileId=file_id)
    downloader = MediaIoBaseDownload(local_file, request, chunksize=2048*1024)

    done = False
    while done is False:
        status, done = downloader.next_chunk()
local_file.close()

works (you'll probably need an `import io` and a `from apiclient.http import MediaIoBaseDownload` somewhere)?

Inasmuch as it seems to download a 4 GB file of random data, without any serious memory use, on my machine, I posit the dreaded "works on my machine", but mine is a 64-bit one.

If it does work, I think I can cook up a way for PyDrive to decide to do this for files over a certain size; I would then want to open a feature request to solicit input on what that limit should be, and on whether the limit should equal the chunk size.
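The size-threshold idea in the last paragraph could be sketched as follows. Everything here is a hypothetical illustration, not PyDrive API: the names `choose_download_strategy` and `STREAM_THRESHOLD`, and the 100 MB cutoff, are assumptions standing in for whatever the feature request would settle on.

```python
STREAM_THRESHOLD = 100 * 1024 * 1024   # 100 MB cutoff; the open question above
DEFAULT_CHUNK_SIZE = 20 * 1024 * 1024  # 20 MB chunks (10x the 2 MB in the snippet)

def choose_download_strategy(file_size):
    """Pick chunked streaming for large files, plain in-memory
    download otherwise. Returns ('stream', chunk_size) or ('memory', None)."""
    if file_size >= STREAM_THRESHOLD:
        return ('stream', DEFAULT_CHUNK_SIZE)
    return ('memory', None)
```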

@rupertlevene
Author

Thanks, this works!

(I upped the chunk size by a factor of 10 to save time; otherwise it was rather slow.)


@RNabel
Collaborator

RNabel commented Jun 8, 2016

@rupertlevene This should be resolved now. Post here if you are still encountering this issue.

@RNabel RNabel closed this as completed Jun 8, 2016
@RNabel RNabel reopened this Jun 9, 2016
@RNabel
Collaborator

RNabel commented Jun 9, 2016

Reopening, as @Fjodor42 points out, and #62 references, there is no verification of this being resolved.

@RNabel RNabel added the bug label Jun 15, 2016
@RNabel RNabel added this to the Future Improvements milestone Oct 23, 2016
@smichaud

smichaud commented Jun 9, 2020


Thank you for the solution.
Side note:

import io
from googleapiclient.http import MediaIoBaseDownload

@shcheklein
Collaborator

@smichaud btw, GetContentFile has been rewritten (among other fixes and improvements) in iterative/PyDrive2, a maintained fork. It uses MediaIoBaseDownload internally and should work out of the box. Here is an example of how it is used in DVC:

https://github.com/iterative/dvc/blob/b57077af11ae287941b4d2939071fda2ad01f483/dvc/remote/gdrive.py#L376
