HowTo: Access SRA Data

vartanianmh edited this page Dec 1, 2016 · 7 revisions

The SRA Toolkit provides tools to download and access SRA data.

At the risk of starting this page on a negative note: please do not download SRA data with generic tools such as ftp or wget. Doing so can leave you with incomplete files and complicates problem diagnosis.

The supported way to download SRA data is the prefetch tool included in the SRA Toolkit. Data may also be downloaded on demand (see our Wiki page) over HTTPS. Which method to use depends on your circumstances and, in some cases, on how much of an SRA file you will actually use.

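| feature             | prefetch | on-demand | wget | ascp |
|---------------------|----------|-----------|------|------|
| supports Aspera     | yes      | no        | no   | yes  |
| supports HTTPS      | yes      | yes       | yes  | no   |
| partial download    | no       | yes       | no   | no   |
| VDB name resolution | yes      | yes       | no   | no   |
| VDB cache           | yes      | yes       | no   | no   |
| dbGaP authorization | yes      | yes       | no   | no   |
| Kart files          | yes      | no        | no   | no   |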
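
To illustrate the "partial download" row for on-demand access, here is a minimal sketch that reads only the first few spots of a run by handing the accession straight to fastq-dump. It assumes fastq-dump from the SRA Toolkit is on your PATH; the function name peek_spots and the FASTQ_DUMP override variable are hypothetical and exist only for this example.

```shell
# Sketch: print the first N spots of a run without downloading the whole file.
# FASTQ_DUMP is a hypothetical override so a different binary path can be used.
peek_spots() {
    accession=$1
    count=${2:-5}
    # -X stops after the given spot id; -Z writes the output to stdout.
    "${FASTQ_DUMP:-fastq-dump}" -X "$count" -Z "$accession"
}
```

For example, `peek_spots SRR1482462 5` should stream five spots to the terminal, provided the accession can be resolved from your environment.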

As an example of prefetch usage:

$ prefetch SRR1482462
Maximum file size download limit is 20,971,520KB

2015-02-19T13:20:06 prefetch.2.4.4: 1) Downloading 'SRR1482462'...
2015-02-19T13:20:06 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:32 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:32 prefetch.2.4.4: 1) 'SRR1482462' was downloaded successfully
2015-02-19T13:20:35 prefetch.2.4.4: 'SRR1482462' has 22 dependencies
2015-02-19T13:20:36 prefetch.2.4.4: 2) Downloading 'ncbi-acc:NC_000067.5?vdb-ctx=refseq'...
2015-02-19T13:20:36 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:41 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:41 prefetch.2.4.4: 2) 'ncbi-acc:NC_000067.5?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:41 prefetch.2.4.4: 3) Downloading 'ncbi-acc:NC_000068.6?vdb-ctx=refseq'...
2015-02-19T13:20:41 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:46 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:46 prefetch.2.4.4: 3) 'ncbi-acc:NC_000068.6?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:46 prefetch.2.4.4: 4) Downloading 'ncbi-acc:NC_000069.5?vdb-ctx=refseq'...
2015-02-19T13:20:46 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:51 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:51 prefetch.2.4.4: 4) 'ncbi-acc:NC_000069.5?vdb-ctx=refseq' was downloaded successfully
...
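
If you save this output to a file, a short script can summarize it. The sketch below assumes the log wording matches the prefetch 2.4.4 lines shown above (other versions may phrase them differently); the function name summarize_prefetch_log is hypothetical.

```shell
# Sketch: summarize a saved prefetch log (wording assumed to match the 2.4.4 output above).
summarize_prefetch_log() {
    log=$1
    # Each completed item produces one "was downloaded successfully" line.
    ok=$(grep -c 'was downloaded successfully' "$log" || true)
    # The "has N dependencies" line reports how many reference sequences the run needs.
    deps=$(sed -n 's/.* has \([0-9][0-9]*\) dependencies.*/\1/p' "$log" | head -n 1)
    echo "downloaded=$ok dependencies=${deps:-unknown}"
}
```

One way to capture the log is `prefetch SRR1482462 2>&1 | tee prefetch.log`, followed by `summarize_prefetch_log prefetch.log`. For a complete run you would expect the downloaded count to be the dependency count plus one for the run itself.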

As can be seen from the output above, prefetch performs several steps:

  1. check the size of the file being downloaded
    If the file exceeds the default limit (20,971,520KB in the output above), prefetch must be given a higher limit with --max-size (the value is in KB), e.g.:
    $ prefetch --max-size 100000000 SRR1482462

  2. download the requested file
    The file is downloaded using Aspera if available on your system, or HTTPS otherwise.

  3. put the file into its proper place
    The file is downloaded into your designated cache area. This permits VDB name resolution to work as designed.

  4. recursively download missing external reference sequences
    Most SRA files require additional sequence files in order to reconstruct the original reads. prefetch ensures that you download not only the main file but also all of its dependencies.

  5. access dbGaP encrypted data
    In addition to performing all of the steps above, prefetch uses the download and decryption keys that have been added to the SRA Toolkit configuration to obtain authorization for the download. (N.B. To access dbGaP data, you must first change directory, or "cd", to the dbGaP project's workspace.)
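
The default limit reported in the output above, 20,971,520KB, equals 20 × 1024 × 1024, i.e. 20 GiB expressed in KB. The sketch below does the same conversion for an arbitrary limit and passes the result to --max-size. It assumes --max-size is counted in units of 1024 bytes, as that default suggests; the function names and the PREFETCH override variable are hypothetical.

```shell
# Sketch: convert a limit in GiB to the KB value passed to --max-size.
gib_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical wrapper: download one accession with a limit given in GiB.
# PREFETCH can point at a specific prefetch binary; it defaults to "prefetch" on PATH.
prefetch_with_limit() {
    limit_gib=$1
    accession=$2
    "${PREFETCH:-prefetch}" --max-size "$(gib_to_kb "$limit_gib")" "$accession"
}
```

For instance, `gib_to_kb 20` prints 20971520, matching the default shown earlier, and `prefetch_with_limit 100 SRR1482462` would request a limit of about 100 GiB.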

prefetch will also operate on existing, previously downloaded files to recursively download any missing external reference sequences.
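
As a rough sketch of that use, the function below re-runs prefetch on every .sra file in a directory so that any missing reference sequences are fetched. It assumes prefetch accepts the path of a local SRA file as its argument; the function name check_local_runs and the PREFETCH override variable are hypothetical.

```shell
# Sketch: re-run prefetch on each local .sra file to fetch missing references.
check_local_runs() {
    dir=$1
    for f in "$dir"/*.sra; do
        # Skip the unexpanded pattern when the directory holds no .sra files.
        [ -e "$f" ] || continue
        # Report failures on stderr but keep going with the remaining files.
        "${PREFETCH:-prefetch}" "$f" || echo "failed: $f" >&2
    done
}
```

Point it at the directory where your downloaded runs live, for example `check_local_runs /path/to/your/sra/files` (a placeholder path).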