I've been stress-testing Sunbeam with two fairly large datasets from SRA--this issue is a reminder for me to migrate some of the solutions @ArwaAbbas and I used in hisss and hopefully make running on SRA data seamless.
Every so often, downloading a sample from NCBI fails. Most of the time, though, the failure is transient, so re-running the `grabseqs` or `fasterq-dump` command succeeds. The problem is exacerbated when you request a lot of samples at once (like on an HPC or something) and is likely some sort of undocumented rate limit on NCBI's end. For example, in the 300-sample dataset I'm working on, 15 samples failed on the first try (but all succeeded just by running `sunbeam run` again).
The way Arwa and I solved this in hisss was to pass the `--restart-times` parameter to tell Snakemake to retry failed rules, which would look something like this in Sunbeam:
```
sunbeam run --restart-times 3 --configfile sunbeam_config.yml all_decontam
```
I remember trying to wait progressively longer after failed downloads using the `attempt` variable in Snakemake (Ctrl+F "attempt" in https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources) but could never get that to work--however, passing `--restart-times 3` and sleeping for 100s after a failed download solved all of the problems Arwa and I were having with SRA's fragility.
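For reference, the retry-and-sleep behavior described above can be sketched as a small standalone wrapper (a minimal illustration, not Sunbeam's or hisss's actual code; `download_with_retries`, its parameters, and the `flaky` demo are all hypothetical placeholders for whatever calls `grabseqs`/`fasterq-dump` underneath):

```python
import time


def download_with_retries(download, accession, retries=3, wait=100):
    """Retry a flaky download, sleeping between failed attempts.

    `download` is any callable that raises on failure (e.g. a wrapper
    around fasterq-dump or grabseqs). Returns the download's result,
    or re-raises after the final attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return download(accession)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the error
            time.sleep(wait)  # back off before the next attempt


# Demo: a download that fails twice, then succeeds on attempt 3
calls = {"n": 0}

def flaky(acc):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient NCBI error")
    return f"{acc}.fastq"

print(download_with_retries(flaky, "SRR000001", retries=3, wait=0))
# prints "SRR000001.fastq"
```

A fixed sleep is the simplest version of this; the Snakemake `attempt` variable is meant to let the wait grow with each retry instead.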
This may just be material for a question in the FAQ/point in the documentation, but I would bet that there's something we can do in Sunbeam to mitigate this.
Come to think of it, I do have a retries parameter in grabseqs...maybe that's where this should be solved. It makes sense to squash problems as far upstream as possible!
Aaanyway. TL;DR, `sunbeam get` works well so far; downloading just needs to be a bit more robust.
This should be fixed now, although I'll hold off closing this issue until I do another stress test with another large dataset (and bump the grabseqs version in `environment.yaml`).