
Estimate disk space requirement #1000

Closed
zhuzhuo opened this issue Jul 12, 2018 · 1 comment

zhuzhuo commented Jul 12, 2018

Hello,

I'm also having the issue of not enough disk space. I'm working on a plant genome of ~600 Mb at 7X coverage (more data is being produced), and 7 TB of /scratch disk space was used up.

I have read issues #587 and #703 and saw Sergey's answers. I wonder if it is possible to estimate disk usage in advance. Right now I'm running several tests with different numbers of input reads and monitoring disk usage, but I want to ask the question here to see if there is a better way.

Thank you very much,
Zhu
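
For the test-and-monitor approach described in the comment above, here is a minimal sketch of a disk-usage logger; the work directory path and polling interval are placeholders, not anything provided by Canu.

```python
#!/usr/bin/env python3
"""Poll the size of an assembly work directory and log it over time.

A rough monitoring sketch; pass the work directory as the first argument.
"""
import os
import sys
import time

def dir_size_bytes(path):
    """Walk `path` and sum the apparent size of every regular file."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                total += os.path.getsize(fp)
            except OSError:
                pass  # file may vanish between listing and stat
    return total

def main():
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    interval = 600  # seconds between samples (placeholder)
    while True:
        gib = dir_size_bytes(work_dir) / 2**30
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}\t{gib:.1f} GiB", flush=True)
        time.sleep(interval)

if __name__ == "__main__":
    main()
```

Comparing the logged curves across runs with different read subsets gives a feel for how usage scales with input size.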

brianwalenz (Member) commented

7x really isn't enough to do anything useful, and it could end up using more space than a higher-coverage run would, since the repeats won't be masked out as heavily. It would be better to estimate based on the output of a few jobs from the full read set, then adjust parameters if needed.

Space usage depends heavily on genome properties (repeats, duplications, ploidy) and read length/quality. Plants, unfortunately, seem to want a lot of space. 10 TB is a reasonable guess. I've seen it as bad as 20 TB, but I forget what the genome was.
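
A rough sketch of the "run a few jobs from the full read set, then extrapolate" idea suggested above; the output paths and total job count passed on the command line are hypothetical placeholders from your own run, not a Canu interface, and the real peak will be higher once stores and later stages are included.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope disk estimate from a few finished overlap jobs.

Usage: estimate_space.py <finished job outputs...> <total job count>
"""
import os
import sys

def size_gib(path):
    """Apparent size of a single output file, in GiB."""
    return os.path.getsize(path) / 2**30

def main():
    if len(sys.argv) < 3:
        sys.exit("usage: estimate_space.py <job outputs...> <total jobs>")
    outputs = sys.argv[1:-1]        # outputs of jobs that already finished
    total_jobs = int(sys.argv[-1])  # jobs the full run will eventually launch

    per_job = [size_gib(p) for p in outputs]
    mean = sum(per_job) / len(per_job)
    print(f"finished jobs : {len(per_job)}")
    print(f"mean per job  : {mean:.2f} GiB")
    print(f"extrapolated  : {mean * total_jobs:.0f} GiB for {total_jobs} jobs")
    # Treat this as a lower bound: sorted stores, intermediate copies,
    # and later assembly stages add on top of the raw overlap output.

if __name__ == "__main__":
    main()
```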

@brianwalenz brianwalenz self-assigned this Aug 8, 2018
@skoren skoren closed this as completed Aug 24, 2018