
Implement multi-part uploads for large files #142

Closed
GoogleCodeExporter opened this issue Mar 21, 2015 · 4 comments

@GoogleCodeExporter

For files with size greater than 20MB, split the file into 10MB chunks and 
upload serially (for now).

Use AWS multi-part upload protocol.

Hopefully this will alleviate some of the issues when trying to upload large 
files.
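
For reference, a minimal sketch (not the actual s3fs code) of the chunking logic described above, in C. upload_part() here is a hypothetical stand-in for the signed AWS "Upload Part" request; the real protocol also requires Initiate Multipart Upload and Complete Multipart Upload calls, which are omitted.

#include <stdio.h>
#include <stdlib.h>

#define MULTIPART_THRESHOLD (20 * 1024 * 1024)  /* 20MB: switch to multipart */
#define PART_SIZE           (10 * 1024 * 1024)  /* 10MB per part */

/* Hypothetical stand-in for the signed AWS "Upload Part" request. */
int upload_part(int part_number, const char *buf, size_t len)
{
    printf("uploading part %d (%zu bytes)\n", part_number, len);
    (void)buf;
    return 0;
}

/* Read the file in 10MB pieces and upload the parts serially. */
int upload_file_in_parts(const char *path, long long file_size)
{
    if (file_size <= MULTIPART_THRESHOLD)
        return -1;                      /* small files keep the single PUT */

    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;

    char *buf = malloc(PART_SIZE);
    if (!buf) {
        fclose(fp);
        return -1;
    }

    int part = 1;
    size_t n;
    while ((n = fread(buf, 1, PART_SIZE, fp)) > 0) {
        if (upload_part(part++, buf, n) != 0) {
            free(buf);
            fclose(fp);
            return -1;
        }
    }

    free(buf);
    fclose(fp);
    return 0;
}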

Original issue reported on code.google.com by dmoore4...@gmail.com on 24 Dec 2010 at 10:34

@GoogleCodeExporter

Issue 97 has been merged into this issue.

Original comment by dmoore4...@gmail.com on 27 Dec 2010 at 8:58

@GoogleCodeExporter

Issue 30 has been merged into this issue.

Original comment by dmoore4...@gmail.com on 27 Dec 2010 at 11:55

@GoogleCodeExporter

Just committed r297 as a checkpoint.

Multipart upload is written and operational and has undergone various testing. 
The last big test was an rsync of a 1GB file. Using the standard U.S. region 
bucket, this had issues at the end of the rsync: all parts got uploaded, but 
the final mtime/chmod that rsync does caused a hang.

Repeated on a US-west bucket and things went well:

> rsync -av --progress --stats --whole-file --inplace 1G.bin uswest.suncup.org/
sending incremental file list
1G.bin
  1073741824 100%   18.95MB/s    0:00:54 (xfer#1, to-check=0/1)

Number of files: 1
Number of files transferred: 1
Total file size: 1073741824 bytes
Total transferred file size: 1073741824 bytes
Literal data: 1073741824 bytes
Matched data: 0 bytes
File list size: 45
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 1073872989
Total bytes received: 31

sent 1073872989 bytes  received 31 bytes  183677.93 bytes/sec
total size is 1073741824  speedup is 1.00


However, copies and rsyncs of smaller files (<500MB) worked just fine.

More testing is needed and there are a few issues to take care of before 
calling this one good (e.g. code cleanup, some more error checking, a compile 
warning, etc.).

I did an MD5 comparison of a 400MB file that I uploaded and then downloaded 
elsewhere -- the sums matched.

Changing the read_write_timeout option helps too for large files. It seems that 
when the multipart upload is complete, the Amazon server needs some time to 
assemble the file. Increasing the timeout resolved the curl timeout function 
issue.
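
In case it helps anyone reading along, the following is only a guess at how such a timeout maps onto libcurl: the low-speed options abort a transfer that stalls for too long, so raising the time window gives Amazon room to assemble the completed multipart upload. The exact wiring inside s3fs may differ.

#include <curl/curl.h>

/* Apply a larger read/write timeout to the curl handle doing the upload.
 * The transfer is aborted only if fewer than 1 byte/sec moves for
 * timeout_seconds in a row. */
void apply_readwrite_timeout(CURL *curl, long timeout_seconds)
{
    curl_easy_setopt(curl, CURLOPT_LOW_SPEED_LIMIT, 1L);
    curl_easy_setopt(curl, CURLOPT_LOW_SPEED_TIME, timeout_seconds);
}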

Max file size is now ~2GB, as getting over 2^31 bytes causes some datatype 
issues - there are some alternate functions to try. Right now, if you try to 
upload a file bigger than this, you'll get a "not supported" error.
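
For the curious, the 2^31 limit is the classic 32-bit offset problem. Below is a sketch of one of the "alternate functions" routes (64-bit off_t with fseeko()/ftello() instead of fseek()/ftell()); whether s3fs ends up using exactly this approach is an open question.

#define _FILE_OFFSET_BITS 64    /* make off_t 64-bit even on 32-bit builds */
#define _POSIX_C_SOURCE 200809L /* expose fseeko()/ftello() */
#include <stdio.h>
#include <sys/types.h>

/* Return a file's size as a 64-bit off_t; fseek()/ftell() use long,
 * which overflows past 2^31 - 1 on 32-bit systems. */
off_t file_size_64(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;

    off_t size = -1;
    if (fseeko(fp, 0, SEEK_END) == 0)
        size = ftello(fp);

    fclose(fp);
    return size;
}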

If anyone is interested in testing this, please svn update, compile, install 
and test. Your feedback will be much appreciated.

Original comment by dmoore4...@gmail.com on 28 Dec 2010 at 4:32

@GoogleCodeExporter

r298 fixes this one

Original comment by dmoore4...@gmail.com on 30 Dec 2010 at 3:56

  • Changed state: Fixed
