support aspera download #68

Closed
atongsa opened this issue Jan 19, 2022 · 8 comments
Labels
enhancement Improvement for existing functionality

@atongsa

atongsa commented Jan 19, 2022

Description of feature

Downloading from ENA links via Aspera may be faster than via FTP.

@atongsa atongsa added the enhancement Improvement for existing functionality label Jan 19, 2022
@Midnighter
Contributor

Hi @atongsa,

I haven't yet come across Aspera in a Docker image, which is a hard requirement for nf-core pipelines. If you find a solution for that, it would be possible to integrate it. Do you have any benchmarks comparing the speed, though, or is it just an idea that Aspera would be faster? I think Aspera is also no longer officially supported by NCBI, although it still works, of course.

@atongsa
Author

atongsa commented Jan 19, 2022

Sorry, it's just an idea; I haven't tested it.

@asuragan

asuragan commented Feb 3, 2022

Aspera is not officially supported by NCBI because prefetch and fasterq-dump additionally download the index files needed to decompress the *.sra into *.fastq. These index files are saved in {sra-toolkit folder}/refseq.
By contrast, the NCBI API only gives us a link to an *.sra file, which we then have to decompress with fasterq-dump anyway. There is no difference in speed: with ascp, curl, wget, etc. we download the *.sra from AWS, and prefetch also downloads *.sra files from AWS. https://github.com/ncbi/sra-tools/wiki/Avoid-using-ascp-directly-for-downloads

When we download the *.fastq.gz files from ENA, however, the download speed via FTP is very limited, whereas Aspera gives a much higher speed.
For example, I was looking at the execution_report of a test run on AWS: https://nf-co.re/fetchngs/results#fetchngs/results-2d593fb504caf65301c78b8076272f895e364cd7/pipeline_info/execution_report_2021-09-15_16-37-52.html
The NFCORE_FETCHNGS:FETCHNGS:SRA_FASTQ_FTP process took 5m 39s to download 46 MB via FTP, which means that the FTP download speed on the AWS server was about 150 KB/s.
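A quick sanity check of that estimate, as a sketch: it assumes decimal units (1 MB = 1000 KB) and counts the whole 5m 39s of process time as transfer time, so the result is a lower bound on the real FTP rate. `rate_kb_per_s` is a hypothetical helper name, not part of fetchngs.

```bash
#!/usr/bin/env bash
# Sanity check of the FTP rate quoted above: 46 MB in 5m 39s of process time.
# Assumptions: decimal units (1 MB = 1000 KB), and the whole process time is
# counted as transfer time, so the result is a lower bound on the real rate.

# Hypothetical helper: average rate in KB/s, rounded to the nearest integer
# using shell integer arithmetic.
rate_kb_per_s() {
  local mb="$1" secs="$2"
  echo $(( (mb * 1000 + secs / 2) / secs ))
}

elapsed=$(( 5 * 60 + 39 ))    # 339 seconds
rate_kb_per_s 46 "$elapsed"   # prints 136
```

About 136 KB/s on average, which is in the same ballpark as the ~150 KB/s estimate; any startup overhead inside the process time would push the actual transfer rate a little higher.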
As for Aspera in Docker, Aspera has its own image on Docker Hub: https://hub.docker.com/r/ibmcom/aspera-cli/

@Midnighter
Contributor

Sounds promising.

@Midnighter
Contributor

Hmm, would either of you be interested in contributing this feature? I'm afraid I won't have time to work on this myself.

@davetang

davetang commented Sep 4, 2023

I'm interested in contributing but will need some guidance. The Docker Hub link above for Aspera no longer exists, so I wrote my own Dockerfile and pushed it to Docker Hub.

You can use it by providing the download link and where to output the file.

```bash
docker run --rm -u "$(id -u):$(id -g)" -v "$(pwd):$(pwd)" -w "$(pwd)" davetang/aspera_connect:4.2.6.393 era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR390/SRR390728/SRR390728_1.fastq.gz .
```

Using Aspera Connect is substantially faster than any other method on my home Internet connection. For example:

```
SRR292241_1.fastq.gz 100%  369MB  238Mb/s    00:13
Completed: 378151K bytes transferred in 13 seconds
 (227709K bits/sec), in 1 file.
SRR292241_2.fastq.gz 100%  376MB  272Mb/s    00:13
Completed: 385855K bytes transferred in 14 seconds
 (218815K bits/sec), in 1 file.
SRR390728_1.fastq.gz 100%   96MB  265Mb/s    00:05
Completed: 99006K bytes transferred in 5 seconds
 (141136K bits/sec), in 1 file.
SRR390728_2.fastq.gz 100%   97MB  261Mb/s    00:07
Completed: 99628K bytes transferred in 7 seconds
 (105342K bits/sec), in 1 file.
```
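To compare these runs with the FTP figure, the `Completed:` lines can be summed into one aggregate rate. This is a minimal sketch assuming the exact ascp wording in the log above; `aggregate_rate` is a hypothetical helper, and "K" is whatever unit ascp prints (here it appears to be KiB, since 378151K / 1024 ≈ 369MB).

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the "Completed:" lines that ascp prints and report
# the aggregate rate. Assumes the exact wording shown in the log above:
#   Completed: <N>K bytes transferred in <S> seconds
# Reads the log on stdin; K is whatever unit ascp prints.
aggregate_rate() {
  awk '
    /^Completed:/ {
      kb += $2 + 0   # "378151K" -> 378151 (awk keeps the numeric prefix)
      s  += $6       # seconds field
    }
    END {
      if (s > 0) printf "%d KB in %d s = %d KB/s\n", kb, s, kb / s
    }
  '
}

# Usage (hypothetical file name): aggregate_rate < ascp.log
```

Fed the four `Completed:` lines above, it prints `962640 KB in 39 s = 24683 KB/s`, more than a hundred times the ~150 KB/s FTP figure from the AWS test run. The two measurements come from different machines and networks, and the per-file times are rounded to whole seconds, so this is only a rough contrast.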

There's a tool called ffq, which you may already be aware of, that can be used to generate download links. I made a request for ffq to support Aspera Connect but it was ignored. I wrote a script (that needs a bit more work) to generate Aspera Connect download links from ffq.
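Going by the paths in the docker run example above, an ENA FTP URL and its Aspera source appear to share the same `vol1/...` path, so the conversion can be a simple string rewrite. A minimal sketch under that assumption; `ftp_to_aspera` is a hypothetical helper, not part of ffq or the script mentioned here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn an ENA FTP URL (as listed by ffq or ENA) into the
# Aspera source used in the docker run example above. Assumes the Aspera
# server exposes the same vol1/... path as ftp.sra.ebi.ac.uk.
ftp_to_aspera() {
  local url="$1"
  url="${url#*://}"                        # drop an optional ftp:// or https:// scheme
  local path="${url#ftp.sra.ebi.ac.uk/}"   # drop the FTP host
  if [ "$path" = "$url" ]; then
    echo "unexpected host in: $1" >&2
    return 1
  fi
  printf 'era-fasp@fasp.sra.ebi.ac.uk:%s\n' "$path"
}

# Example (not run here):
#   url=ftp.sra.ebi.ac.uk/vol1/fastq/SRR390/SRR390728/SRR390728_1.fastq.gz
#   docker run --rm -u "$(id -u):$(id -g)" -v "$(pwd):$(pwd)" -w "$(pwd)" \
#     davetang/aspera_connect:4.2.6.393 "$(ftp_to_aspera "$url")" .
```

Rejecting unknown hosts keeps the helper from silently producing a bogus Aspera path for URLs that do not come from ENA's FTP server.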

Anyway, please let me know how I can contribute! I've just recently been learning about Nextflow and just joined the nf-core Slack workspace.

@Midnighter
Contributor

Hi @davetang,

Thank you for speaking up. That seems really cool.

For Aspera, I wonder whether it would be possible to create a bioconda recipe. That would give us a conda package, and BioContainers would then be built for it, which supports both Docker and Singularity.

We are aware of ffq, but I also haven't been 100% happy with it so far. I made a prototype rework of it that is much faster and can output the S3 links: https://github.com/Midnighter/ffqf. So maybe I should work on emitting Aspera links, too, and finish up this tool?

Anyway, I think creating a bioconda recipe would be a good starting point, if the license of aspera allows this.

@davetang

davetang commented Sep 4, 2023

I have never built a Conda package before but I've always wanted to find out how.

I just found https://anaconda.org/hcc/aspera-cli and will check it out first.

5 participants