
Estimate disk space requirement #1000

Closed
zhuzhuo opened this issue Jul 12, 2018 · 1 comment

zhuzhuo commented Jul 12, 2018

Hello,

I'm also having the issue of not enough disk space. I'm working on a plant genome of ~600 Mb at 7X coverage (more data is being produced), and 7 TB of /scratch disk space was used up.

I have read issues #587 and #703 and saw Sergey's answers. I wonder if it is possible to estimate disk usage in advance. Right now I'm running several tests with different numbers of input reads and monitoring disk usage, but I want to ask the question here to see if there is a better way.

Thank you very much,
Zhu
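
For the test-and-monitor approach described in the comment above, here is a minimal sketch of a disk-usage logger; the work directory path and polling interval are placeholders, not anything provided by Canu.

```python
#!/usr/bin/env python3
"""Poll the size of an assembly work directory and log it over time.

A rough monitoring sketch; pass the work directory as the first argument.
"""
import os
import sys
import time

def dir_size_bytes(path):
    """Walk `path` and sum the apparent size of every regular file."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                total += os.path.getsize(fp)
            except OSError:
                pass  # file may vanish between listing and stat
    return total

def main():
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    interval = 600  # seconds between samples (placeholder)
    while True:
        gib = dir_size_bytes(work_dir) / 2**30
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}\t{gib:.1f} GiB", flush=True)
        time.sleep(interval)

if __name__ == "__main__":
    main()
```

Comparing the logged curves across runs with different read subsets gives a feel for how usage scales with input size.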

brianwalenz (Member) commented

7x really isn't enough to do anything useful, and it could end up using more space than a higher-coverage run would, since the repeats won't be masked out as heavily. It would be better to estimate based on the output of a few jobs from the full read set, then adjust parameters if needed.

Space usage depends heavily on genome properties (repeats, duplications, ploidy) and read length/quality. Plants, unfortunately, seem to want a lot of space. 10 TB is a reasonable guess. I've seen it as bad as 20 TB, but I forget what the genome was.
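
A rough sketch of the "run a few jobs from the full read set, then extrapolate" idea suggested above; the output paths and total job count passed on the command line are hypothetical placeholders from your own run, not a Canu interface, and the real peak will be higher once stores and later stages are included.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope disk estimate from a few finished overlap jobs.

Usage: estimate_space.py <finished job outputs...> <total job count>
"""
import os
import sys

def size_gib(path):
    """Apparent size of a single output file, in GiB."""
    return os.path.getsize(path) / 2**30

def main():
    if len(sys.argv) < 3:
        sys.exit("usage: estimate_space.py <job outputs...> <total jobs>")
    outputs = sys.argv[1:-1]        # outputs of jobs that already finished
    total_jobs = int(sys.argv[-1])  # jobs the full run will eventually launch

    per_job = [size_gib(p) for p in outputs]
    mean = sum(per_job) / len(per_job)
    print(f"finished jobs : {len(per_job)}")
    print(f"mean per job  : {mean:.2f} GiB")
    print(f"extrapolated  : {mean * total_jobs:.0f} GiB for {total_jobs} jobs")
    # Treat this as a lower bound: sorted stores, intermediate copies,
    # and later assembly stages add on top of the raw overlap output.

if __name__ == "__main__":
    main()
```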

@brianwalenz brianwalenz self-assigned this Aug 8, 2018
@skoren skoren closed this as completed Aug 24, 2018