
Getting BLAST databases

Christiam Camacho edited this page Dec 19, 2018 · 15 revisions

How you access BLAST databases in the cloud depends on where the databases come from (NCBI or the user) and on the environment in which BLAST runs (a VM/AMI or a Docker container).

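| Environment | Origin: NCBI | Origin: User-provided |
|---|---|---|
| VM/AMI | Pre-configured in /blast/blastdb | Build your own using makeblastdb4cloud or makeblastdb; install in /blast/blastdb_custom |
| Docker | Obtain via the data provisioning steps in the Docker image documentation; install in /blast/blastdb_custom | Same as the VM/AMI cell above |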

NCBI provided BLAST databases

The BLAST databases available in Google Cloud Storage (GCS) can be found here.

VM/AMI

The BLAST databases listed above are pre-configured and available in /blast/blastdb. Any user-provided BLAST databases can be installed in /blast/blastdb_custom for easy integration with the BLAST tools on that VM/AMI.
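As an illustration of this two-directory layout, the following Python sketch checks /blast/blastdb and then /blast/blastdb_custom for a database by name. It assumes the usual BLAST+ file naming (an alias file such as nt.nal, or an index file such as swissprot.pin, possibly with a .00 volume number); the function name find_blastdb is hypothetical and not part of any NCBI tool.

```python
from pathlib import Path
from typing import Iterable, Optional

# Directory layout described above: NCBI databases first, then user-provided ones.
DEFAULT_SEARCH_PATH = ("/blast/blastdb", "/blast/blastdb_custom")

# Assumed file-name conventions: an alias file (.nal/.pal) or a first-volume
# index file (.nin/.pin, or .00.nin/.00.pin for multi-volume databases).
_SUFFIXES = (".nal", ".pal", ".nin", ".pin", ".00.nin", ".00.pin")


def find_blastdb(name: str, search_path: Iterable[str] = DEFAULT_SEARCH_PATH) -> Optional[Path]:
    """Return the first directory in search_path that appears to hold database `name`."""
    for directory in search_path:
        base = Path(directory)
        # A database "exists" here if any of the expected alias/index files is present.
        if any((base / f"{name}{suffix}").is_file() for suffix in _SUFFIXES):
            return base
    return None
```

For example, find_blastdb("swissprot") would return Path("/blast/blastdb") if that database's files were present there, and None if neither directory appears to contain it.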

Docker

Please see data provisioning in our Docker image documentation.
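The Docker image documentation remains the authoritative reference. As a rough sketch, assuming the ncbi/blast image on Docker Hub and host directories ~/blastdb and ~/blastdb_custom (both hypothetical placeholders), the following Python function builds a docker run command that mounts the two database directories, read-only, at the container paths used on this page.

```python
import os
from typing import List, Sequence

# Hypothetical host directories; adjust to wherever the databases live on your machine.
HOST_NCBI_DBS = os.path.expanduser("~/blastdb")
HOST_CUSTOM_DBS = os.path.expanduser("~/blastdb_custom")


def docker_blast_command(
    blast_args: Sequence[str],
    image: str = "ncbi/blast",
    ncbi_dbs: str = HOST_NCBI_DBS,
    custom_dbs: str = HOST_CUSTOM_DBS,
) -> List[str]:
    """Build a `docker run` argv that mounts both database directories read-only."""
    return [
        "docker", "run", "--rm",
        # NCBI-provided databases at /blast/blastdb inside the container.
        "-v", f"{ncbi_dbs}:/blast/blastdb:ro",
        # User-provided databases at /blast/blastdb_custom inside the container.
        "-v", f"{custom_dbs}:/blast/blastdb_custom:ro",
        image,
        *blast_args,
    ]
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute it; for instance, docker_blast_command(["blastdbcmd", "-list", "/blast/blastdb_custom"]) asks blastdbcmd to list the user-provided databases visible inside the container.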

End user provided BLAST databases

If you want to BLAST against your own BLAST databases (in any environment), build them using either makeblastdb4cloud or makeblastdb, and make them available in /blast/blastdb_custom.
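A minimal Python sketch of building a database into /blast/blastdb_custom by calling makeblastdb through subprocess. It uses only the standard -in, -dbtype, -out and -title options; makeblastdb4cloud is not covered, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

CUSTOM_DB_DIR = Path("/blast/blastdb_custom")


def makeblastdb_command(fasta: str, db_name: str, dbtype: str, out_dir: Path = CUSTOM_DB_DIR) -> List[str]:
    """Return the makeblastdb argv that writes database `db_name` into out_dir."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return [
        "makeblastdb",
        "-in", fasta,                       # input FASTA file
        "-dbtype", dbtype,                  # nucleotide or protein database
        "-out", str(out_dir / db_name),     # base path for the database files
        "-title", db_name,
    ]


def build_custom_db(fasta: str, db_name: str, dbtype: str, out_dir: Path = CUSTOM_DB_DIR) -> None:
    """Run makeblastdb; raises CalledProcessError if it exits non-zero."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(makeblastdb_command(fasta, db_name, dbtype, out_dir), check=True)
```

Assuming makeblastdb is on the PATH and the directory is writable, build_custom_db("my_proteins.fa", "my_proteins", "prot") would write files such as my_proteins.pin into /blast/blastdb_custom.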

remote-fuser

The BLAST VM/AMI comes pre-configured with BLAST databases retrieved by this program. It is configured to download the most recent version of these databases at the time of the first search; the databases are not updated automatically when newer versions become available.

remote-fuser downloads the BLAST database over the network, so the first BLAST search against a new database will be slow. remote-fuser then caches the database on disk, so subsequent searches are faster.
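The slow-first-search, fast-later-searches behavior is that of a read-through cache. The following Python class is a toy model of that idea only, not remote-fuser's implementation: fetch stands in for a network download and is a hypothetical placeholder. It also mirrors the caveat about updates: once a file is cached, nothing checks for a newer remote version.

```python
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Toy model of lazy download plus on-disk caching (not remote-fuser's code).

    `fetch` is a hypothetical callable that downloads one file's bytes; it is only
    called the first time a given file is requested.
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def read(self, name: str) -> bytes:
        cached = self.cache_dir / name
        if not cached.exists():
            # First access: slow network download, then persist to disk.
            cached.write_bytes(self.fetch(name))
        # Later accesses are served from the local copy; no freshness check is made.
        return cached.read_bytes()
```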

remote-fuser is part of the NCBI SRA Toolkit. You can download and install remote-fuser following these instructions or via a Docker container.