Uploading a huge file #27

Closed
ima-tech opened this issue Dec 4, 2014 · 13 comments

Comments

@ghost commented Dec 4, 2014

Currently, I am using PyDrive to upload my backup to Google Drive.

Is there anything special I need to do with this library to upload a huge file (around 5 GB) to Google Drive? The Google Drive API documentation says that we must use a resumable upload: https://developers.google.com/drive/web/manage-uploads

My problem is that when I try to send a huge file, the script simply finishes without any errors. However, if I do the same with a small file of around 100 MB, everything works perfectly fine...

My code is the following:

import json
import os

import httplib2
from oauth2client.client import SignedJwtAssertionCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

def upload(self, backupFile, backupFileName):
    # Read the service account details from client_secrets.json
    json_data = open(os.path.join(__location__, 'client_secrets.json'))
    data = json.load(json_data)

    # Email of the Service Account
    SERVICE_ACCOUNT_EMAIL = data['client_email']

    # Path to the Service Account's private key file
    SERVICE_ACCOUNT_PKCS12_FILE_PATH = os.path.join(__location__, 'key.p12')

    f = open(SERVICE_ACCOUNT_PKCS12_FILE_PATH, 'rb')
    key = f.read()
    f.close()

    credentials = SignedJwtAssertionCredentials(
        SERVICE_ACCOUNT_EMAIL, key,
        scope='https://www.googleapis.com/auth/drive', sub='email')
    http = httplib2.Http()
    credentials.authorize(http)

    gauth = GoogleAuth()
    gauth.credentials = credentials

    drive = GoogleDrive(gauth)

    # Create a GoogleDriveFile with the backup's title and parent folder
    file1 = drive.CreateFile({'title': backupFileName,
                              'parents': [{'id': '0B7FoN03AUUdZVlNETEtWLS1VTzQ'}]})

    file1.SetContentFile(backupFile)
    file1.Upload()

When I try to send a large file, no errors are returned whatsoever. The Python script simply ends without printing anything...

@sagiegurari commented Mar 19, 2015

I face the same issue, but when I catch the exception in Python, I get an OverflowError.
Full stack trace (I'm using a Raspberry Pi with a 32-bit OS and the file is over 4 GB, which might be the issue):

Traceback (most recent call last):
  File "./drive_upload.py", line 33, in <module>
    backupFile.Upload()
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 225, in Upload
    self._FilesInsert(param=param)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/auth.py", line 54, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pydrive/files.py", line 241, in _FilesInsert
    metadata = self.auth.service.files().insert(**param).execute()
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 750, in method
    payload = media_upload.getbytes(0, media_upload.size())
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 357, in getbytes
    return self._fd.read(length)
OverflowError: Python int too large to convert to C long

@sagiegurari commented Mar 20, 2015

Created an issue in the Google APIs Python client repo: googleapis/google-api-python-client#74

@Lyrrad commented Mar 29, 2015

The issue is that PyDrive tries to load the entire file into memory before uploading, which is very inefficient for larger files. It's presumably a bug in PyDrive, since it claims that the Upload() method picks the most efficient way to send the file.

Resumable, chunked uploads are supported by the underlying client, though you'd need to use the apiclient directly. You can look at the Python code samples for the alternate way of doing it with the apiclient:
https://developers.google.com/drive/v2/reference/files/insert

You can also change the chunk size of the upload by setting it in the parameters of the MediaFileUpload, like so:
media_body = MediaFileUpload(file_path, mimetype=mime_type, chunksize=1024*1024, resumable=True)
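
For reference, here is a minimal sketch of such a chunked, resumable upload done directly with google-api-python-client (Drive v2), bypassing PyDrive. The drive_service argument is assumed to be an already-authorized service object (e.g. from build('drive', 'v2', http=http)); the function and parameter names are just placeholders:

from googleapiclient.http import MediaFileUpload

def upload_resumable(drive_service, file_path, title, parent_id):
    # Stream the file in 1 MB chunks instead of reading it all into memory.
    media_body = MediaFileUpload(file_path,
                                 mimetype='application/octet-stream',
                                 chunksize=1024 * 1024,
                                 resumable=True)
    body = {'title': title, 'parents': [{'id': parent_id}]}
    request = drive_service.files().insert(body=body, media_body=media_body)
    response = None
    while response is None:
        # next_chunk() sends one chunk per call and reports progress in between.
        status, response = request.next_chunk()
        if status:
            print("Uploaded %d%%" % int(status.progress() * 100))
    return response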

@sagiegurari commented Mar 29, 2015

Thanks. PyDrive hasn't been updated for a year, so I guess I'll have to bypass it and use the streaming upload directly.

@sagiegurari commented Mar 30, 2015

I verified that the issue is definitely in PyDrive.
The Google APIs Python client works fine with big files (I tested with 4 GB).
I hope someone fixes this library, but until then I've switched to the Google API client project instead.

@audax commented Jun 29, 2015

This bug now comes up in duplicity, which uses PyDrive. I can't upload the archive with all the signatures because it is 4.3 GB. The problem is that it uses StringIO to buffer the large file and doesn't stream it.

Here is the full traceback:

 File "/usr/lib/python2.7/site-packages/duplicity/backend.py", line 365, in inner_retry
   return fn(self, *args)
 File "/usr/lib/python2.7/site-packages/duplicity/backend.py", line 531, in move
   self.__do_put(source_path, remote_filename)
 File "/usr/lib/python2.7/site-packages/duplicity/backend.py", line 501, in __do_put
   self.backend._put(source_path, remote_filename)
 File "/usr/lib/python2.7/site-packages/duplicity/backends/pydrivebackend.py", line 98, in _put
   drive_file.Upload()
 File "/usr/lib/python2.7/site-packages/pydrive/files.py", line 225, in Upload
   self._FilesInsert(param=param)
 File "/usr/lib/python2.7/site-packages/pydrive/auth.py", line 54, in _decorated
   return decoratee(self, *args, **kwargs)
 File "/usr/lib/python2.7/site-packages/pydrive/files.py", line 241, in _FilesInsert
   metadata = self.auth.service.files().insert(**param).execute()
 File "/usr/lib/python2.7/site-packages/googleapiclient/discovery.py", line 758, in method
   g.flatten(msgRoot, unixfrom=False)
 File "/usr/lib/python2.7/email/generator.py", line 83, in flatten
   self._write(msg)
 File "/usr/lib/python2.7/email/generator.py", line 108, in _write
   self._dispatch(msg)
 File "/usr/lib/python2.7/email/generator.py", line 134, in _dispatch
   meth(msg)
 File "/usr/lib/python2.7/email/generator.py", line 203, in _handle_multipart
   g.flatten(part, unixfrom=False)
 File "/usr/lib/python2.7/email/generator.py", line 83, in flatten
   self._write(msg)
 File "/usr/lib/python2.7/email/generator.py", line 108, in _write
   self._dispatch(msg)
 File "/usr/lib/python2.7/email/generator.py", line 134, in _dispatch
   meth(msg)
 File "/usr/lib/python2.7/email/generator.py", line 180, in _handle_text
   self._fp.write(payload)
OverflowError: length too large

I am using this branch of duplicity for the PyDrive integration: https://code.launchpad.net/~ed.so/duplicity/gdocs.pydrive

I guess for now I just have to split the backup somehow...

@sagiegurari commented Jun 30, 2015

@audax check out googleapis/google-api-python-client#74.
Basically, the Google API client supports really big files with chunked upload just fine.
This library, however, does not, and I don't think you should use it for big files.
Since the Google API client uses HTTPS, I'm not sure why you'd use an extra library to encrypt the data on top of the SSL that's already in use (I have to admit I didn't read much about this duplicity library).

@audax commented Jun 30, 2015

Duplicity is a backup program that saves the backup encrypted with GPG, so that's what the additional layer is about. It already supports PyDrive, so if this problem annoys me enough, I may just fix it. It shouldn't be too hard to enable chunked uploads without changing the API.

@Fjodor42 (Contributor) commented Jun 30, 2015

audax: If you'd care to fix it, I'd be much obliged.

There is an old duplicity bug/feature request somewhere to limit/split the sigtar file according to the same rules as the ordinary split files, but I don't think any progress has been made in that regard.

On my own end, I went without incremental backups for some time, so the sigtar grew big enough to trigger the problem. When I get a faster connection and can do a full backup again, I expect the sigtar will be even bigger, triggering the problem again.

My initial, full backup dates from the days of the gdocs backend...

@audax commented Jul 1, 2015

It will take a while until I can start hacking on PyDrive, though; I'm currently a bit busy. A fix for the old duplicity bug would be nice too, but I guess that's a much harder problem than implementing streaming in this small library.

For the moment I've switched to rsync and just back up my server to my home server instead of Google Drive…

@Fjodor42 (Contributor) commented Feb 3, 2016

Bump, anyone?

@audax commented Feb 4, 2016

I won't fix the bug; I've switched to a local NAS in the living room instead of using PyDrive…

Fjodor42 added a commit to Fjodor42/PyDrive that referenced this issue Feb 16, 2016

googleworkspace#55

Reading the code for apiclient.http.MediaIoBaseUpload at https://github.com/google/google-api-python-client/blob/master/googleapiclient/http.py, it would seem that simply setting resumable=True should be enough to let it select a chunk size for itself, thus precluding loading the entire file into memory.
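
In other words, the change would roughly amount to building the media body with resumable=True. A rough sketch of the idea, not the actual patch; the helper name and arguments here are purely illustrative:

from googleapiclient.http import MediaIoBaseUpload

def build_media_body(fd, mimetype):
    # With resumable=True the client library picks its own chunk size and
    # streams the file, instead of calling getbytes() on the whole content.
    return MediaIoBaseUpload(fd, mimetype=mimetype, resumable=True)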

@RNabel (Collaborator) commented Jun 8, 2016

The patch #56 has been merged, closing this issue.
