HowTo: Access SRA Data
The SRA Toolkit provides tools to download and access SRA data.
At the risk of starting this page off on a negative note, please do not download data using generic tools such as ftp, wget, etc. Doing so can create incomplete images and complicate problem diagnosis.
The supported means of downloading SRA data is to use the prefetch tool included in the SRA Toolkit. Data may also be downloaded on demand (see our Wiki page) over HTTPS. The decision of which method to use depends upon your circumstances and, in some cases, the amount of data you will actually use from an SRA file.
|VDB name resolution||yes||yes||no||no|
As an example of prefetch in use:
$ prefetch SRR1482462
Maximum file size download limit is 20,971,520KB
2015-02-19T13:20:06 prefetch.2.4.4: 1) Downloading 'SRR1482462'...
2015-02-19T13:20:06 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:32 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:32 prefetch.2.4.4: 1) 'SRR1482462' was downloaded successfully
2015-02-19T13:20:35 prefetch.2.4.4: 'SRR1482462' has 22 dependencies
2015-02-19T13:20:36 prefetch.2.4.4: 2) Downloading 'ncbi-acc:NC_000067.5?vdb-ctx=refseq'...
2015-02-19T13:20:36 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:41 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:41 prefetch.2.4.4: 2) 'ncbi-acc:NC_000067.5?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:41 prefetch.2.4.4: 3) Downloading 'ncbi-acc:NC_000068.6?vdb-ctx=refseq'...
2015-02-19T13:20:41 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:46 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:46 prefetch.2.4.4: 3) 'ncbi-acc:NC_000068.6?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:46 prefetch.2.4.4: 4) Downloading 'ncbi-acc:NC_000069.5?vdb-ctx=refseq'...
2015-02-19T13:20:46 prefetch.2.4.4: Downloading via fasp...
2015-02-19T13:20:51 prefetch.2.4.4: fasp download succeed
2015-02-19T13:20:51 prefetch.2.4.4: 4) 'ncbi-acc:NC_000069.5?vdb-ctx=refseq' was downloaded successfully
...
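When a run has many dependencies, it can help to pull just the completed items out of a saved log. The following is a minimal sketch, not part of the SRA Toolkit: list_downloaded is a hypothetical helper, and it assumes the "was downloaded successfully" message format shown for prefetch 2.4.4 above, which other versions may not share.

```shell
# Hypothetical helper (not an SRA Toolkit command).
# Reads prefetch log text on stdin and prints, one per line, the name of
# every item reported as "was downloaded successfully".
# Assumes the message format of the prefetch 2.4.4 log shown above.
list_downloaded() {
    sed -n "s/.*) '\(.*\)' was downloaded successfully.*/\1/p"
}
```

Assuming the log was saved to a file (here the hypothetical name prefetch.log), usage would be `list_downloaded < prefetch.log`.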
As the example output shows, prefetch performs several steps:
check the size of the file being downloaded
If the file is very large, prefetch must be given a higher download limit, e.g.:
$ prefetch --max-size 100000000 SRR1482462
download the requested file
The file is downloaded using Aspera if available on your system, or HTTPS otherwise.
put the file into its proper place
The file is downloaded into your designated cache area. This permits VDB name resolution to work as designed.
recursively download missing external reference sequences
Most SRA files require additional sequence files in order to reconstruct original reads. prefetch ensures that you download not only the main file but all of its dependencies.
access dbGaP encrypted data
prefetch will make use of download and decryption keys that have been added to the SRA Toolkit configuration to obtain authorization for the download, in addition to performing all of the steps above. (N.B. In order to access dbGaP data, you will need to change directory ("cd") to the dbGaP project's workspace.)
prefetch will also operate on existing, previously downloaded files to recursively download any missing external reference sequences.
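Because a single prefetch invocation performs all of the steps listed above, fetching several runs can be a simple loop. The sketch below is one possible wrapper, not an official script: fetch_all and the PREFETCH override variable are hypothetical names introduced for illustration, and the --max-size value is simply the one used in the example earlier on this page.

```shell
# Hypothetical wrapper around prefetch (not part of the SRA Toolkit).
# PREFETCH is an illustrative override so the loop can be dry-run,
# e.g. with PREFETCH=echo; by default the real prefetch tool is used.
PREFETCH="${PREFETCH:-prefetch}"

# Fetch every accession given as an argument.
# Continues past failures and returns non-zero if any download failed.
fetch_all() {
    failed=0
    for acc in "$@"; do
        # The limit below is copied from the example above; adjust as needed.
        if ! "$PREFETCH" --max-size 100000000 "$acc"; then
            echo "prefetch failed for $acc" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `fetch_all SRR1482462` downloads that run and its dependencies, while setting `PREFETCH=echo` first prints the arguments that would be passed instead of downloading anything.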