Uploading a huge file #27

Closed
ima-tech opened this Issue Dec 4, 2014 · 13 comments

ima-tech commented Dec 4, 2014

Currently, I am using PyDrive to upload my backups to Google Drive.

Is there anything special to do with this library to upload a huge file (around 5 GB) to Google Drive? The Google Drive API documentation says that we must use a resumable upload: https://developers.google.com/drive/web/manage-uploads

My problem is that when I try to send a huge file, the script executes without any errors, but the upload does not succeed. However, if I do this with a small file of around 100 MB, everything works perfectly fine...

My code is the following:

import json
import os

import httplib2
from oauth2client.client import SignedJwtAssertionCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

def upload(self, backupFile, backupFileName):
    # __location__ is defined elsewhere in the module
    with open(os.path.join(__location__, 'client_secrets.json')) as json_data:
        data = json.load(json_data)

    # Email of the Service Account
    SERVICE_ACCOUNT_EMAIL = data['client_email']

    # Path to the Service Account's Private Key file
    SERVICE_ACCOUNT_PKCS12_FILE_PATH = os.path.join(__location__, 'key.p12')

    with open(SERVICE_ACCOUNT_PKCS12_FILE_PATH, 'rb') as f:
        key = f.read()

    credentials = SignedJwtAssertionCredentials(
        SERVICE_ACCOUNT_EMAIL, key,
        scope='https://www.googleapis.com/auth/drive', sub='email')
    http = httplib2.Http()
    credentials.authorize(http)

    gauth = GoogleAuth()
    gauth.credentials = credentials

    drive = GoogleDrive(gauth)

    # Create a GoogleDriveFile instance in the target backup folder
    file1 = drive.CreateFile({'title': backupFileName,
                              'parents': [{'id': '0B7FoN03AUUdZVlNETEtWLS1VTzQ'}]})

    file1.SetContentFile(backupFile)
    file1.Upload()

When I try to send a large file, no errors are returned whatsoever; the Python script simply ends without anything shown...

sagiegurari commented Mar 19, 2015

I face the same issue, but when catching the exception in Python, I get an OverflowError.
Full stack trace (I'm using a Raspberry Pi, which runs a 32-bit OS, and the file is over 4 GB, which might be the issue):

Traceback (most recent call last):
  File "./drive_upload.py", line 33, in <module>
    backupFile.Upload()
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 225, in Upload
    self._FilesInsert(param=param)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/auth.py", line 54, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 241, in _FilesInsert
    metadata = self.auth.service.files().insert(**param).execute()
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 750, in method
    payload = media_upload.getbytes(0, media_upload.size())
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 357, in getbytes
    return self._fd.read(length)
OverflowError: Python int too large to convert to C long
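
For context, an illustration of my own (not from the thread): on a 32-bit Python 2 build, asking file.read() for more bytes than fit in a C long raises exactly this error, which is what happens when the whole payload is read in one call ('huge.bin' is a placeholder for any file over 4 GB):

fd = open('huge.bin', 'rb')
# On 32-bit Python 2 the read length is converted to a C long, so any value
# above 2**31 - 1 raises: OverflowError: Python int too large to convert to C long
fd.read(5 * 1024 ** 3)
fd.close()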

sagiegurari commented Mar 20, 2015

Created an issue in the Google API Python client repo: google/google-api-python-client#74


Lyrrad commented Mar 29, 2015

The issue is that PyDrive tries to load the entire file into memory before uploading, which is very inefficient for larger files. It's presumably a bug in PyDrive, since it claims the Upload() method picks the most efficient way to send the file.

Resumable, chunked uploads are supported by the underlying client, though you'd need to use the apiclient directly. You can look at the Python code samples for the alternate way of doing this with the apiclient:
https://developers.google.com/drive/v2/reference/files/insert

You can also change the chunk size of the upload by setting it in the parameters of the MediaFileUpload, like so:
media_body = MediaFileUpload(file_path, mimetype=mime_type, chunksize=1024*1024, resumable=True)
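
For reference, a minimal sketch of that direct apiclient route, assuming a service object already built with apiclient's discovery for Drive v2 over an authorized http; file_path, mime_type, and the title are placeholders:

from apiclient.http import MediaFileUpload

media_body = MediaFileUpload(file_path, mimetype=mime_type,
                             chunksize=1024 * 1024, resumable=True)
request = service.files().insert(body={'title': 'backup.tar'},
                                 media_body=media_body)

response = None
while response is None:
    # next_chunk() sends one ~1 MB chunk per call, so the whole file is
    # never held in memory at once.
    status, response = request.next_chunk()
    if status:
        print "Uploaded %d%%" % int(status.progress() * 100)
print "Upload complete: %s" % response['id']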


sagiegurari commented Mar 29, 2015

Thanks.
PyDrive hasn't been updated for a year, so I guess I will have to bypass it to use streaming.


sagiegurari commented Mar 30, 2015

I verified, and the issue is definitely in PyDrive.
The Google API Python client works well with big files (I tested with a 4 GB file).
I hope someone will fix this lib, but until then I have switched to the google-api-python-client project instead.


audax commented Jun 29, 2015

This bug now comes up in duplicity, which uses PyDrive. I can't upload the archive with all the signatures because it is 4.3 GB. The problem is that it uses StringIO to buffer the large file instead of streaming it.

Here is the full traceback:

 File "/usr/lib/python2.7/site-packages/duplicity/backend.py", line 365, in inner_retry
   return fn(self, *args)
 File "/usr/lib/python2.7/site-packages/duplicity/backend.py", line 531, in move
   self.__do_put(source_path, remote_filename)
 File "/usr/lib/python2.7/site-packages/duplicity/backend.py", line 501, in __do_put
   self.backend._put(source_path, remote_filename)
 File "/usr/lib/python2.7/site-packages/duplicity/backends/pydrivebackend.py", line 98, in _put
   drive_file.Upload()
 File "/usr/lib/python2.7/site-packages/pydrive/files.py", line 225, in Upload
   self._FilesInsert(param=param)
 File "/usr/lib/python2.7/site-packages/pydrive/auth.py", line 54, in _decorated
   return decoratee(self, *args, **kwargs)
 File "/usr/lib/python2.7/site-packages/pydrive/files.py", line 241, in _FilesInsert
   metadata = self.auth.service.files().insert(**param).execute()
 File "/usr/lib/python2.7/site-packages/googleapiclient/discovery.py", line 758, in method
   g.flatten(msgRoot, unixfrom=False)
 File "/usr/lib/python2.7/email/generator.py", line 83, in flatten
   self._write(msg)
 File "/usr/lib/python2.7/email/generator.py", line 108, in _write
   self._dispatch(msg)
 File "/usr/lib/python2.7/email/generator.py", line 134, in _dispatch
   meth(msg)
 File "/usr/lib/python2.7/email/generator.py", line 203, in _handle_multipart
   g.flatten(part, unixfrom=False)
 File "/usr/lib/python2.7/email/generator.py", line 83, in flatten
   self._write(msg)
 File "/usr/lib/python2.7/email/generator.py", line 108, in _write
   self._dispatch(msg)
 File "/usr/lib/python2.7/email/generator.py", line 134, in _dispatch
   meth(msg)
 File "/usr/lib/python2.7/email/generator.py", line 180, in _handle_text
   self._fp.write(payload)
OverflowError: length too large

I am using this branch of duplicity for the PyDrive integration: https://code.launchpad.net/~ed.so/duplicity/gdocs.pydrive

I guess for now I just have to split the backup somehow...


sagiegurari commented Jun 30, 2015

@audax check out google/google-api-python-client#74.
Basically, the Google API client supports really big files with chunked upload just fine.
This library, however, does not, and I don't think you should use it for big files.
Since the Google API client uses HTTPS, I'm not sure why one would use a library for encrypting the data on top of the SSL already in use (I have to admit, I didn't read too much about this duplicity lib).


audax commented Jun 30, 2015

Duplicity is a backup program which stores the backup encrypted with GPG; that's what the additional layer is about. It already supports PyDrive, so if this problem annoys me enough, I may just fix it. It shouldn't be too hard to enable chunked upload without changing the API.


Fjodor42 (Contributor) commented Jun 30, 2015

audax: If you'd care to fix it, I'd be much obliged.

There is an old duplicity bug/feature request somewhere to limit/split the sigtar file according to the same rules as the ordinary split files, but I don't think any progress was made in that regard.

On my own end, I forwent incremental backups for some time, resulting in the sigtar becoming big enough to trigger the problem. When I get a faster connection, enabling me to do a full backup again, I assume the sigtar will be even bigger, triggering the problem again.

My initial, full, backup was from the times of the gdocs backend...


audax commented Jul 1, 2015

It will take a while until I start hacking on PyDrive, though; I am currently a bit busy. A fix for the old duplicity bug would be nice, but I guess that's a much harder problem than implementing streaming in this small lib here.

For the moment I have switched to rsync and just back up my server to my home server instead of Google Drive…


Fjodor42 (Contributor) commented Feb 3, 2016

Bump, anyone?


audax commented Feb 4, 2016

I won't fix the bug; I switched to a local NAS in the living room instead of using PyDrive…


Fjodor42 added a commit to Fjodor42/PyDrive that referenced this issue Feb 16, 2016

gsuitedevs#27
gsuitedevs#55

Reading the code for apiclient.http.MediaIoBaseUpload at
https://github.com/google/google-api-python-client/blob/master/googleapiclient/http.py ,
it would seem that simply setting resumable=True should be enough to let it
select a chunk size for itself, thus precluding loading the entire file into
memory.
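
A hypothetical sketch of the idea in that commit message (not the actual patch): with resumable=True on the media object, execute() streams the upload in chunks instead of reading the whole file into memory first. backup_path, the title, and the mimetype are placeholders, and service is assumed to be an authorized Drive v2 apiclient service:

from apiclient.http import MediaIoBaseUpload

fd = open(backup_path, 'rb')
media_body = MediaIoBaseUpload(fd, mimetype='application/octet-stream',
                               resumable=True)
# execute() loops over the resumable chunks internally.
service.files().insert(body={'title': 'backup.tar'},
                       media_body=media_body).execute()
fd.close()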

untzag referenced this issue in wright-group/PyCMDS Mar 8, 2016: google drive upload memory overflow #41 (Closed)

RNabel (Collaborator) commented Jun 8, 2016

The patch #56 has been merged, closing this issue.

