
Improvements relating to the use of SRA data #179

Closed
louiejtaylor opened this issue Nov 29, 2018 · 2 comments
louiejtaylor commented Nov 29, 2018

I've been stress-testing Sunbeam with two fairly large datasets from SRA; this issue is a reminder for me to migrate some of the solutions @ArwaAbbas and I used in hisss, and hopefully make running on SRA data seamless.

Every so often, downloading a sample from NCBI fails. Most of the time this failure is stochastic, so re-running the grabseqs or fasterq-dump command succeeds. The problem is exacerbated when you request many samples at once (e.g. on an HPC cluster) and is likely due to an undocumented rate limit on NCBI's end. For example, in the 300-sample dataset I'm working on, 15 samples failed on the first try (but all succeeded simply by running sunbeam run again).

The way Arwa and I solved this in hisss was to pass the --restart-times parameter to tell Snakemake to retry failed rules, which would look something like this in Sunbeam:

 sunbeam run --restart-times 3 --configfile sunbeam_config.yml all_decontam

I remember trying to wait progressively longer after failed downloads using the attempt variable in Snakemake (Ctrl+F "attempt" in https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources) but never could get that to work. However, passing --restart-times 3 and sleeping for 100 s after a failed download solved all of the problems Arwa and I were having with SRA's fragility.
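As a rough sketch of the workaround described above (hypothetical helper names, not Sunbeam's actual code), the effect of --restart-times plus a post-failure sleep can be expressed as a retry wrapper that waits progressively longer after each failure:

```python
import time

def download_with_retries(fetch, retries=3, base_delay=100):
    """Call fetch() until it succeeds, sleeping progressively longer
    after each failure (delay = base_delay * attempt), roughly what
    --restart-times 3 plus a post-failure sleep accomplishes."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # give up after the final attempt
            time.sleep(base_delay * attempt)  # back off before retrying
```

Here fetch would wrap the actual grabseqs or fasterq-dump invocation (e.g. via subprocess.run with check=True), and base_delay=100 matches the 100 s sleep that worked for us in hisss.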

This may just be material for an FAQ entry or a point in the documentation, but I would bet there's something we can do in Sunbeam to mitigate this.

@louiejtaylor louiejtaylor self-assigned this Nov 29, 2018
louiejtaylor (Member Author) commented:

Come to think of it, I do have a retries parameter in grabseqs...maybe that's where this should be solved. It makes sense to squash problems as far upstream as possible!

Aaanyway. TL;DR, sunbeam get works well so far, need to make downloading a bit more robust.

louiejtaylor (Member Author) commented:

This should be fixed now, although I'll hold off on closing this issue until I do another stress test with another large dataset (and bump the grabseqs version in environment.yaml).
