
Improvements relating to the use of SRA data #179

Closed
louiejtaylor opened this issue Nov 29, 2018 · 2 comments
louiejtaylor commented Nov 29, 2018

I've been stress-testing Sunbeam with two fairly large datasets from SRA; this issue is a reminder for me to migrate some of the solutions @ArwaAbbas and I used in hisss, and hopefully make running on SRA data seamless.

Every so often, downloading a sample from NCBI fails. Most of the time this failure is stochastic, so re-running the grabseqs or fasterq-dump command succeeds. The problem is exacerbated when you request many samples at once (e.g. on an HPC cluster) and is likely due to an undocumented rate limit on NCBI's end. For example, in the 300-sample dataset I'm working on, 15 samples failed on the first try (but all succeeded simply by running sunbeam run again).

The way Arwa and I solved this in hisss was to pass the --restart-times parameter to tell Snakemake to retry failed rules, which would look something like this in Sunbeam:

 sunbeam run --restart-times 3 --configfile sunbeam_config.yml all_decontam

I remember trying to wait progressively longer after failed downloads using the attempt variable in Snakemake (Ctrl+F "attempt" in https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources) but never could get that to work. However, passing --restart-times 3 and sleeping for 100 s after a failed download solved all of the problems Arwa and I were having with SRA's fragility.
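As a rough sketch of the workaround described above (hypothetical helper names, not Sunbeam's actual code), the effect of --restart-times plus a post-failure sleep can be expressed as a retry wrapper that waits progressively longer after each failure:

```python
import time

def download_with_retries(fetch, retries=3, base_delay=100):
    """Call fetch() until it succeeds, sleeping progressively longer
    after each failure (delay = base_delay * attempt), roughly what
    --restart-times 3 plus a post-failure sleep accomplishes."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # give up after the final attempt
            time.sleep(base_delay * attempt)  # back off before retrying
```

Here fetch would wrap the actual grabseqs or fasterq-dump invocation (e.g. via subprocess.run with check=True), and base_delay=100 matches the 100 s sleep that worked for us in hisss.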

This may just be material for an FAQ entry or a point in the documentation, but I would bet there's something we can do in Sunbeam to mitigate this.

@louiejtaylor louiejtaylor self-assigned this Nov 29, 2018
louiejtaylor (Member Author) commented:

Come to think of it, I do have a retries parameter in grabseqs...maybe that's where this should be solved. It makes sense to squash problems as far upstream as possible!

Aaanyway. TL;DR, sunbeam get works well so far, need to make downloading a bit more robust.

louiejtaylor (Member Author) commented:

This should be fixed now, although I'll hold off on closing this issue until I do another stress test with another large dataset (and bump the grabseqs version in environment.yaml).
