
support creating files bigger than 50GB #20

Closed
kahing opened this issue Oct 21, 2015 · 9 comments

Comments

@kahing (Owner) commented Oct 21, 2015

Right now each MPU part is fixed at 5MB, and since S3 has a maximum of 10,000 parts per multipart upload, we cannot create files bigger than 50GB. We could automatically adjust the part size (e.g. first 100 parts at 5MB, then 50MB, etc.) to support bigger files.
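
For reference, a quick back-of-the-envelope check of that 50GB ceiling (a standalone sketch, not goofys code):

```go
package main

import "fmt"

func main() {
	const partSizeMB = 5   // fixed size of each MPU part
	const maxParts = 10000 // S3's per-upload part limit
	fmt.Printf("max file size: %d MB (~%d GB)\n",
		partSizeMB*maxParts, partSizeMB*maxParts/1000) // 50000 MB (~50 GB)
}
```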

kahing changed the title from "support files bigger than 50GB" to "support creating files bigger than 50GB" on Dec 1, 2015
@jindov commented Dec 17, 2015

Is it supported now? I'm still stuck at the 10,000-part limit with a 70GB file.
PS: I've already pulled the latest version from git.

@kahing (Owner) commented Dec 17, 2015

Not yet. You can try manually increasing the part size from 5MB to something larger.
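
At the time there was no flag for this, so it meant changing a constant in the source and rebuilding. A hypothetical example of the kind of one-line edit involved (the constant name here is made up, not goofys's actual identifier):

```go
package fs

// Hypothetical constant name (not goofys's actual identifier): raising the
// fixed MPU part size from 5MB to 50MB lifts the 10,000-part ceiling from
// ~50GB to ~500GB.
const mpuPartSize = 50 * 1024 * 1024 // was: 5 * 1024 * 1024
```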

@jindov commented Dec 17, 2015

Thank you, but goofys does not provide any option to increase the part size. Do I have to edit the source code?

kahing added a commit that referenced this issue Dec 18, 2015
@kahing (Owner) commented Dec 18, 2015

I've lifted the limit to 100GB for now. The proper fix will come later.

@jindov commented Mar 24, 2016

Do we have any option to set the part size manually yet?

@schelhorn commented
Implementing larger part sizes (preferably automatically based on the file size) would be highly appreciated here as well; we are storing genomics files up to 500GB.

@kahing (Owner) commented Jun 24, 2016

The difficulty is that when you write to a new file, we don't know how large it's going to be. I will do some sort of staggered part sizing as mentioned in #20 (comment).
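
A minimal sketch of what staggered part sizing could look like, assuming the size is chosen purely by part number; the thresholds below mirror the figures kahing reports later in this thread, not necessarily the committed code:

```go
package fs

const mb = 1024 * 1024

// partSizeFor returns the size to use for a given 1-based MPU part
// number under a staggered schedule: small parts first so small files
// stay cheap, larger parts later so big files fit under 10,000 parts.
func partSizeFor(partNum int) int64 {
	switch {
	case partNum <= 1000:
		return 5 * mb
	case partNum <= 2000:
		return 25 * mb
	default:
		return 125 * mb
	}
}
```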

kahing added a commit that referenced this issue Jul 29, 2016
this is in preparation for using different mpu part sizes so we can write larger files

refs #20
kahing closed this as completed in 11a561b on Jul 29, 2016
@kahing (Owner) commented Jul 29, 2016

@jindov @schelhorn could either of you give the latest revision a try? You will need at least 125MB of free memory, since goofys buffers each part in memory.

This is not release quality just yet: I removed some gating code, which means it is possible (although not likely) for goofys to flush thousands of parts concurrently, which is probably not what we want.
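
For context, gating like this is typically a counting semaphore bounding in-flight part uploads. A generic sketch of the idea (not the actual goofys code; the limit of 16 is arbitrary):

```go
package main

import "sync"

func main() {
	sem := make(chan struct{}, 16) // allow at most 16 part uploads in flight
	var wg sync.WaitGroup
	for part := 1; part <= 10000; part++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot before flushing
			defer func() { <-sem }() // release it when the upload finishes
			// ... issue the S3 UploadPart call for part n here ...
		}(part)
	}
	wg.Wait()
}
```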

kahing added a commit that referenced this issue Aug 21, 2016
@kahing (Owner) commented Aug 21, 2016

With the last fix I successfully wrote 900GB:

$ dd if=/dev/zero of=local-mnt/900GB bs=1M oflag=nocache count=900000
900000+0 records in
900000+0 records out
943718400000 bytes (944 GB) copied, 4834.32 s, 195 MB/s

The test was done on a hi1.4xlarge, which has 10GigE networking. goofys now supports writing files of up to 5MB × 1000 + 25MB × 1000 + 125MB × 8000 = 1,030,000MB ≈ 1005GB.
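
The 1005 figure works out when you sum the schedule in MB and divide by 1024. A quick check:

```go
package main

import "fmt"

func main() {
	// Staggered schedule: 1000 parts at 5MB, 1000 at 25MB, 8000 at 125MB.
	totalMB := 5*1000 + 25*1000 + 125*8000
	fmt.Printf("%d MB = %d GB\n", totalMB, totalMB/1024) // 1030000 MB = 1005 GB
}
```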

kahing added a commit that referenced this issue Aug 21, 2016