Installation

Robert Blackwell edited this page Oct 17, 2023 · 13 revisions

We recommend using virtualenv when possible, especially when dealing with distributed systems such as SLURM.

Virtual environment

If you are using conda, make sure to exit your environment first via conda deactivate. While you could install DeepBLAST / TM-vec within a conda environment, we have run into problems when running distributed training, so use it at your own risk.

You can create a new virtual environment via

python3 -m venv tmvec

This will create a folder called tmvec, and all packages will be installed there. The environment folder can be placed anywhere on your system (which is very useful on distributed systems). You can activate your environment via source tmvec/bin/activate
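The steps above (create, activate, verify) can be run end to end as follows; the name tmvec and its location are arbitrary choices:

```shell
# create the environment in a folder called "tmvec"
python3 -m venv tmvec

# activate it; your shell prompt should change to show (tmvec)
source tmvec/bin/activate

# confirm that python now resolves inside the environment
command -v python
```

The last command should print a path ending in tmvec/bin/python; if it doesn't, the activation did not take effect in your current shell.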

CPU installation

Once the virtualenv is created, you can activate your environment and install everything as follows.

For more details on PyTorch versions, see the PyTorch installation instructions.

You will then need to install faiss:

pip install faiss-cpu

Then, the latest versions of DeepBLAST and TMvec can be installed as follows.

pip install tm-vec
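After the CPU install, a quick sanity check from Python can confirm that everything is importable. This is a sketch: it assumes the import names torch, faiss, and tm_vec, which may differ between package versions.

```python
# report which of the expected packages are importable in this environment
import importlib

status = {}
for pkg in ("torch", "faiss", "tm_vec"):  # tm_vec is an assumed module name
    try:
        importlib.import_module(pkg)
        status[pkg] = "OK"
    except ImportError:
        status[pkg] = "missing"

print(status)  # every entry should read "OK" after a successful install
```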

To install the development version, run the following command:

pip install git+https://github.com/tymor22/tm-vec.git

Because DeepBLAST is a dependency of TM-vec, installing TM-vec will automatically install DeepBLAST.

GPU installation

If you have a GPU available, you can take advantage of accelerated database building, search and alignment.

This can be done as follows (shown here for CUDA 11.8). You can change the URL below to reflect your CUDA toolkit version (cu118 for CUDA 11.8, cu121 for CUDA 12.1). Don't supply a version greater than your installed CUDA toolkit version, though!

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
pip3 install faiss-gpu

Then DeepBLAST / TM-vec can both be installed via pip install tm-vec
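Once installed, you can check from Python whether PyTorch actually sees the GPU. This is a minimal sketch: it only reports CUDA availability through PyTorch and does not validate the faiss-gpu install.

```python
# verify that the CUDA build of PyTorch can see a device
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    print("CUDA available:", cuda_ok)
    if cuda_ok:
        print("Device:", torch.cuda.get_device_name(0))
except ImportError:
    cuda_ok = None
    print("torch is not installed")
```

If this prints False even though a GPU is present, the usual culprit is a CPU-only torch wheel; reinstall with the matching --extra-index-url shown above.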

For more information on other CUDA versions, see the PyTorch installation documentation.

DeepBLAST uses Numba to compute alignments on the GPU, and Numba can be finicky regarding the GPU setup. Sometimes it is sufficient to use the locally installed cudatoolkit. If your compute cluster has it installed, it may just be a matter of loading the right modules. The command that we used on our SLURM cluster was module load gcc cudnn cuda, but this may vary depending on the cluster.

If you are installing this on your local machine, you may need to set up some paths yourself, namely by manually installing the NVIDIA drivers and the cudatoolkit. On Ubuntu, the cuda-toolkit can be installed via

sudo apt-get install nvidia-cuda-toolkit

However, this isn't always enough, since Numba only searches a set of default paths for the cuda-toolkit, so you may need to override CUDA_HOME. For instance, on my Ubuntu machine, I ran

export CUDA_HOME=/usr/local/cuda-11.3
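After setting CUDA_HOME, you can ask Numba directly whether it can find a usable CUDA setup. This is a sketch; it requires numba, which DeepBLAST uses for its GPU alignments.

```python
# check whether Numba can locate the CUDA toolkit and a GPU
try:
    from numba import cuda
    numba_cuda_ok = cuda.is_available()
    print("Numba CUDA available:", numba_cuda_ok)
except ImportError:
    numba_cuda_ok = None
    print("numba is not installed")
```

If this prints False while torch.cuda.is_available() is True, the problem is usually the toolkit path, which is exactly what the CUDA_HOME override above addresses.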

Other notes

We have only tested DeepBLAST / TM-vec on Linux machines, so there is no guarantee that they will work on Windows or macOS. Furthermore, these models are large and will require >12GB of GPU memory (we've tested with 24GB-80GB of GPU RAM for training and inference). If you don't have these kinds of GPUs, you can still run DeepBLAST / TM-vec on the CPU, but expect a >10x increase in runtime.