# Conda

There are many bioinformatics software applications available for Linux, and we have used several of them in this course (e.g. `samtools`, `bcftools`, `iqtree`, `roary`). Many of these applications are complex and difficult to install, often because they depend on other software applications and packages. This is complicated further when two applications require different versions of the same dependency. Consider `srst2`, which depends on Python 2.7, and `ARIBA`, which requires Python 3. How can you install both `srst2` and `ARIBA` on your computer and manage the conflicting Python dependencies?

One solution is to use a software package manager to help manage software installations and their dependencies. A package manager is a tool that automates the process of installing, updating, and removing software packages on your computer. One of the most common package managers is `conda`. Here we will demonstrate how to install `conda`, create `conda` environments and install bioinformatics software with `conda`.

## Installing conda

A software distribution is a collection of software packages that are pre-built and pre-configured for use on a specific computer system. There are two available distributions of conda:

* __miniconda__: a basic, lightweight installation (mostly just conda and Python)
* __anaconda__: a larger distribution with more pre-installed packages (includes conda, Python and 250+ automatically installed, open-source scientific packages and their dependencies)

Let's check if conda is installed:

In [None]:
conda --version

In previous tutorials conda was installed for the Linux user `manager`. As we have switched to a different user on the system (`bioinf`) you should see a message indicating that conda is not installed.

In [None]:
Command 'conda' not found, did you mean:

Try: sudo apt install <deb name>
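
In a script it can be more convenient to test for conda directly than to read the error message. The following is a minimal sketch, assuming a POSIX-compatible shell; `have_command` is a hypothetical helper name and not part of conda.

```shell
# Report whether a command is available on the PATH.
# `have_command` is a hypothetical helper name, not a conda feature.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command conda; then
    echo "conda found: $(conda --version)"
else
    echo "conda not found on PATH"
fi
```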

Now let's install _miniconda_, a lightweight distribution of conda. Detailed instructions for installing conda on a Linux system can be found at [https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html). 

In summary you need to run the following commands:

1. Set up a directory in your home directory for miniconda:

In [None]:
mkdir -p ~/miniconda3

2. Download the miniconda installation script and store in a file called `miniconda.sh` in the `~/miniconda3` directory:

In [None]:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
-O ~/miniconda3/miniconda.sh

3. Run the installation script:

In [None]:
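bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

Here `-b`, `-u` and `-p` are options passed to the script `miniconda.sh`. `-b` runs the install in batch mode (without manual intervention), `-u` allows the installer to use a directory that already exists (we created `~/miniconda3` in step 1) and `-p` specifies where on the computer to put the miniconda installation, in this case `~/miniconda3`. You can list the contents of the installation directory to check that the files are in place: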

In [None]:
ls ~/miniconda3

4. Remove the installation script `~/miniconda3/miniconda.sh` as it is no longer required:

In [None]:
rm ~/miniconda3/miniconda.sh

5. Initialize your newly-installed Miniconda:

In [None]:
~/miniconda3/bin/conda init bash

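Close and reopen your terminal (or run `source ~/.bashrc`) so that the changes made by `conda init` take effect, then check that conda is installed: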

In [None]:
conda --version

If installed correctly, you should also see (_base_) at the start of your prompt in your terminal.
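
The installation steps above could be collected into a single function that does nothing when conda is already present. This is only a sketch under stated assumptions: `install_miniconda` is a hypothetical helper name, and the `-u` flag is included so that the installer accepts the directory created by `mkdir`.

```shell
# Install Miniconda into a prefix, skipping the work if conda is already there.
# `install_miniconda` is a hypothetical helper name; the commands inside are
# the ones used in steps 1-5 above.
install_miniconda() {
    local prefix="${1:-$HOME/miniconda3}"

    if [ -x "$prefix/bin/conda" ]; then
        echo "conda already installed in $prefix"
        return 0
    fi

    mkdir -p "$prefix"
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
        -O "$prefix/miniconda.sh" || return 1
    # -b: batch mode, -u: accept the existing directory, -p: install prefix
    bash "$prefix/miniconda.sh" -b -u -p "$prefix" || return 1
    rm "$prefix/miniconda.sh"
    "$prefix/bin/conda" init bash
}

# Example call (commented out so that nothing is downloaded by accident):
# install_miniconda ~/miniconda3
```

Because of the check at the top, calling the function a second time with the same prefix should only print a message and return.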

__Note:__ If you are installing conda on your own computer, the instructions may vary depending on what operating system you are using. Detailed installation instructions can be found at [https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)

## Conda channels

There are many software packages available via conda, and they are made available to users via channels. Conda channels are the online locations where packages are stored; they serve as the base for hosting and managing packages. Examples of conda channels are `defaults` and `conda-forge`, and many bioinformatics software packages are available in the `bioconda` channel.

By default, conda will search the `defaults` channel for a software package.

In [None]:
conda install bwa

Conda fails to install `bwa` as it is not available in the `defaults` channel.

If you want to install a software package from a specific channel you can specify the channel with the `-c` option of the `conda install` command. For example, to install bwa from the `bioconda` channel:

In [None]:
conda install -c bioconda bwa

Alternatively you can configure and modify which channels are automatically searched by conda using the commands:

In [None]:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Each `--add` places the new channel at the top of the channel list, so the channel added last has the highest priority. After these commands `conda` gives the highest priority to the `conda-forge` channel, followed by the `bioconda` channel, followed by the `defaults` channel.
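
To confirm the resulting order, the configured channels can be printed with `conda config --show channels`, which lists them from highest to lowest priority. The sketch below also extracts the first entry; `top_channel` is a hypothetical helper that assumes the usual YAML-style list output (`  - name` per line).

```shell
# Print the highest-priority channel from `conda config --show channels` output.
# `top_channel` is a hypothetical helper; it reads the YAML-style list on stdin
# and prints the first "- name" entry it finds.
top_channel() {
    awk '$1 == "-" { print $2; exit }'
}

if command -v conda >/dev/null 2>&1; then
    conda config --show channels               # full list, highest priority first
    conda config --show channels | top_channel # only the first entry
else
    echo "conda not found on PATH"
fi
```

With the three `conda config --add` commands above, the channel added last is expected to be the one printed by `top_channel`.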

Try installing another bioinformatics software package:

In [None]:
conda install mafft

More information about Bioconda can be found at [https://bioconda.github.io/](https://bioconda.github.io/)

## Installing software

Some software packages have dependencies in common, whereas others have dependencies that conflict with each other. One way to manage this is to install each workflow, or even each individual software version, in its own environment.

You can create a conda environment with:

In [None]:
conda create -n samtools-1.17 samtools=1.17

This will create a conda environment (think of it as a box) called `samtools-1.17` and put `samtools` version 1.17 and all its dependencies in it. You can name the environment anything, but it is good practice to use a combination of the software name and version.

Once the environment is created you can access the software by activating the environment that contains the software:

In [None]:
conda activate samtools-1.17

The start of your terminal prompt will change from (base) to (samtools-1.17).

Check that `samtools` is installed:

In [None]:
samtools -h

To deactivate the environment and go back to the base environment:

In [None]:
conda deactivate

To see the list of available environments:

In [None]:
conda info --envs

To search for a particular environment:

In [None]:
conda info --envs | grep -i samtools
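
Note that `grep -i samtools` matches any environment whose name or path contains "samtools". If an exact name match is needed, for example in a script, the first column of the listing can be compared instead. A minimal sketch, where `env_in_list` is a hypothetical helper and the environment name is assumed to be the first column of each line:

```shell
# Succeed only if an environment with exactly this name is listed.
# `env_in_list` is a hypothetical helper; it reads `conda info --envs`
# output on stdin and compares the first column with the requested name.
env_in_list() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    if conda info --envs | env_in_list samtools-1.17; then
        echo "environment samtools-1.17 exists"
    else
        echo "environment samtools-1.17 not found"
    fi
fi
```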

To list all the software packages installed in a specific environment:

In [None]:
conda list -n samtools-1.17

## Keeping conda up to date
Conda and the software it installs are updated regularly, so it is good practice to keep everything up to date. To update all software in the base conda environment you can use:

In [None]:
conda update --all
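
Before applying a large update it can be useful to preview what would change. The sketch below uses the `--dry-run` option of `conda update`, which reports the planned changes without installing anything; `preview_base_update` is a hypothetical wrapper name.

```shell
# Show what `conda update --all` would change in the base environment,
# without installing anything. `preview_base_update` is a hypothetical name.
preview_base_update() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    conda update -n base --all --dry-run
}

# Example call:
# preview_base_update
```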

## Cheatsheet

A list of useful conda commands can be found at [https://docs.conda.io/projects/conda/en/stable/_downloads/843d9e0198f2a193a3484886fa28163c/conda-cheatsheet.pdf](https://docs.conda.io/projects/conda/en/stable/_downloads/843d9e0198f2a193a3484886fa28163c/conda-cheatsheet.pdf)

## Exercises

1. List all the software packages installed in the base environment 
2. List all the software packages installed in the samtools-1.17 environment
3. Install the ARIBA software package in a new environment
4. What version of Python was installed with ARIBA? Is this different to the version installed in the base environment?
5. How many software packages were installed with ARIBA?
6. In the ARIBA environment, what channel was ARIBA installed from?
7. In the ARIBA environment, what channel was Python installed from?
8. How many conda environments have been created?

Now move on to the next part of the tutorial [Nextflow](nextflow.ipynb)