# A Taste of Conda
## Bioinformatics Coffee Hour - April 14, 2020
#### Your Host: Nathan Weeks

## Problems
#### How do I install scientific software on _INSERT LINUX SERVER OR CLUSTER NAME HERE_ as an unprivileged user?
##### (and how do I get the same software on my workstation?)
#### How do I create a reproducible software environment for data analysis?

### Possible solutions
* [Environment modules](https://docs.rc.fas.harvard.edu/kb/modules-intro/)
  - (+) Maintained by FAS RC staff
  - (+) Easy to use
  - (-) New / updated software requests: submit ticket (wait...)
  - (-) Not reproducible outside of Cannon (mostly)
* Language-specific package managers (e.g., [pip](https://pip.pypa.io/) for Python)
* [Singularity containers](https://docs.rc.fas.harvard.edu/kb/singularity-on-the-cluster/) (future topic...)

### Conda is...
> Package, dependency and environment management for any language--Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN, and more. (source: https://docs.conda.io)

### Where can I get conda... for my workstation?
* [Anaconda Python](https://www.anaconda.com/distribution/)
  - 100s of bundled scientific packages
  - Anaconda Navigator (GUI for launching apps & installing packages)
![Anaconda Navigator](https://docs.anaconda.com/_images/nav-defaults.png)

* [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
  - Minimal python+conda environment
  - Can be installed in your home directory on a Linux cluster (not necessary on Cannon...)

#### Cannon
Find the latest version with `module-query Anaconda3`, or by searching the [FAS RC Application Portal](https://portal.rc.fas.harvard.edu/p3/build-reports/Anaconda3):
```
$ module load Anaconda3/2019.10
```

*For this lesson, we'll use the conda installation provided with this notebook environment.*

## Channel
* Package set maintained by an organization
* Main ones a computational biologist will use:


#### 1. [defaults](https://anaconda.org/anaconda/repo)
  - Maintained by Anaconda, Inc.

#### 2. [conda-forge](https://conda-forge.org/)
  - Large community-led package collection
  - *Well-curated, up-to-date*

#### 3. [bioconda](https://bioconda.github.io)
  - Specializes in bioinformatics software

## Where to find Conda packages
- [anaconda.org](https://anaconda.org/)
- [Bioconda recipe index](https://bioconda.github.io/conda-recipe_index.html)

## An interactive walkthrough...

---

#### Getting help
Invoke `conda` with the `-h` or `--help` option to display a list of subcommands.
- The same option displays usage for an individual subcommand (e.g., `conda list -h`)

In [None]:
conda --help

## Setting conda channels for bioinformatics

bioconda packages may have dependencies on packages in the *conda-forge* and *defaults* channels.
We can specify channels to search as command-line arguments for conda operations, e.g.:

`conda search -c conda-forge -c bioconda bwa`

However, it is convenient to configure a default list of channels.
Per [the bioconda documentation](https://bioconda.github.io/user/install.html#set-up-channels):

In [None]:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
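
Each `conda config --add` call prepends to the channel list, so the channel added last (here *conda-forge*) ends up with the highest priority. The following is a minimal sketch for checking the resulting order without touching your real `~/.condarc`. It assumes `conda` is on your PATH; the temporary file location is arbitrary and chosen only for illustration.

```shell
# Write to a throwaway config file instead of ~/.condarc (path is illustrative).
condarc="$(mktemp -d)/condarc"

if command -v conda >/dev/null 2>&1; then
    # --add prepends, so the channel added last is listed (and searched) first.
    for channel in defaults bioconda conda-forge; do
        conda config --file "$condarc" --add channels "$channel"
    done
    # Show the resulting channel list, highest priority first.
    cat "$condarc"
else
    echo "conda not found; load or install it first" >&2
fi
```

If conda is available, the printed list should show *conda-forge* first and *defaults* last.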

---
*Pro tip:* Set `channel_priority` to `strict` to [speed up conda package searches](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/conda-performance.html#set-strict-channel-priority)

In [None]:
conda config --set channel_priority strict

*Note:* strict channel priority will be the [default in conda 5.0](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html#strict-channel-priority)

We can verify the list of channels:

In [None]:
conda info

#### Searching for packages with conda

Use `conda search` to search for packages by name.

E.g., to search for the package `bwa` (exact match):

In [None]:
conda search bwa

The `*` character can be used as a wildcard.

E.g., to search for all packages *beginning* with the string "bwa":

In [None]:
conda search 'bwa*'
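
The quotes around `'bwa*'` matter: an unquoted `*` is subject to shell filename expansion, so conda might receive a file name instead of the pattern. The sketch below shows the difference without involving conda at all; the file name `bwa-notes.txt` and the scratch directory are hypothetical.

```shell
# Scratch directory containing one file whose name starts with "bwa" (hypothetical name).
workdir="$(mktemp -d)"
touch "$workdir/bwa-notes.txt"

# Unquoted: the shell replaces the pattern with the matching file name.
unquoted="$(cd "$workdir" && echo bwa*)"
# Quoted: the pattern is passed through unchanged, which is what conda needs to see.
quoted="$(cd "$workdir" && echo 'bwa*')"

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```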

### Environments

Conda packages can be installed into separate *environments* (directories containing separate sets of conda packages).

conda is installed into a "base" environment.

In [None]:
conda env list

#### Listing packages installed in an environment
We can see what packages are already installed in our current environment using `conda list`

In [None]:
conda list

*Never install packages into the base environment*. Always create a new environment.
- base environment is read-only on Cannon

#### Creating a new environment

In [None]:
conda create -h

The `-n/--name ENVIRONMENT` option creates a "named" conda environment in your "envs" directory (by default `${HOME}/.conda/envs`).

Let's create an environment called *bwa*:

In [None]:
conda create -y -n bwa

Verify the new environment was created:

In [None]:
conda env list

---

*Pro tip (Jupyter notebooks & conda via environment module)*: activate the _base_ environment before issuing any subsequent `conda activate` / `conda deactivate` commands ([gory details...](https://github.com/conda/conda/issues/7980))

In [None]:
source activate

*Note: currently slightly-incompatible with Jupyter notebooks; will cause subsequent commands to display error exit status `: 1`*

---

Before we activate the new *bwa* environment, let's check our PATH environment variable (the list of directories your shell searches for commands):

In [None]:
echo ${PATH}

Activate the new *bwa* environment.
Notice that the `/srv/conda/envs/bwa/bin` directory is prepended to your PATH.
In addition, in an interactive shell, the shell prompt would be prefixed with `(bwa)`.

In [None]:
conda activate bwa
echo ${PATH}
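
Prepending matters because the shell uses the first matching executable it finds on PATH. The following sketch imitates that effect with a stand-in directory and a fake `bwa` script; both are hypothetical and unrelated to the real environment, and the PATH change is confined to a subshell.

```shell
# Stand-in for an environment's bin directory (hypothetical path under a temp dir).
env_bin="$(mktemp -d)/envs/bwa/bin"
mkdir -p "$env_bin"

# Fake "bwa" executable so the lookup has something to find.
printf '#!/bin/sh\necho "bwa from the environment"\n' > "$env_bin/bwa"
chmod +x "$env_bin/bwa"

# Prepend the directory inside a subshell only, then ask which bwa would run.
found="$(PATH="$env_bin:$PATH"; command -v bwa)"
echo "$found"
```

Because the stand-in directory comes first, `command -v` reports the copy inside it, even if another `bwa` exists later on PATH.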

In [None]:
conda install -y bwa

In addition to named environments, we can also create an environment in an arbitrary directory using the `-p PATH` option. This can be useful for installing software in a shared directory that is accessible by your lab/group.

Suppose we want to install samtools at `/srv/shiny-server/sample-apps/samtools` (*Note: this directory is just for illustration*).
Furthermore, suppose we need an old version of samtools (0.1.19).
We'll select the version using the `=` operator.



In [None]:
conda create -y -p /srv/shiny-server/sample-apps/samtools samtools=0.1.19 
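
Before creating an environment, it can help to preview which builds a version spec would match. As I understand the conda package-specification docs, `=` is a fuzzy match (`0.1.19` and any `0.1.19.*`), `==` is an exact match, and comparison operators such as `>=` must be quoted so the shell does not treat `>` as a redirection. This is a hedged sketch: it needs network access and configured channels, and the `>=1.9` spec is only an illustration.

```shell
# Try a few version specs; the last one is illustrative only.
checked=0
for spec in 'samtools=0.1.19' 'samtools==0.1.19' 'samtools>=1.9'; do
    if command -v conda >/dev/null 2>&1; then
        # A failed search (no match, no network) should not abort the loop.
        conda search "$spec" || echo "no result for $spec" >&2
    else
        echo "conda not found; would run: conda search '$spec'"
    fi
    checked=$((checked + 1))
done
```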

Note the directory structure in `/srv/shiny-server/sample-apps/samtools`:

In [None]:
ls /srv/shiny-server/sample-apps/samtools

`conda list -p` treats that directory as a conda environment, and lists installed packages:

In [None]:
conda list -p /srv/shiny-server/sample-apps/samtools

### Nested (aka "stacked") environments

Normally, when a conda environment is activated, it replaces the previous environment.
We can use the `--stack` option to instead nest the environment so that we have access to packages in both.

In [None]:
conda activate --stack /srv/shiny-server/sample-apps/samtools

`conda env list` marks only the most recently activated environment (the top of the environment "stack") with `*`:

In [None]:
conda env list

But both `samtools` and `bwa` are in our PATH:

In [None]:
type samtools; type bwa

To deactivate the current environment, use `conda deactivate`.

After this is executed, we're back to the *bwa* environment (only).
If executed a second time, we would be back to the *base* environment.

In [None]:
conda deactivate
conda env list

### Sharing environments

OK, we used *bwa* to cure COVID-19. Way to go!

Now for the important part (in academia): publish our work.

We'll use `conda env export` to export our current conda environment (*bwa*) to a [YAML](https://en.wikipedia.org/wiki/YAML) file that records the environment and can be used to recreate it:

In [None]:
conda env export

Redirect this output to a file to save it (let's call the file `environment.yml`):

In [None]:
conda env export > environment.yml
cat environment.yml
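
The exported file pins every package, including build strings, so it is fairly long. For orientation, here is a hand-written sketch of the minimal structure an environment file has: a name, a channel list, and dependencies. The contents are illustrative, not the actual export output, and the file goes to a scratch directory so it cannot overwrite the real `environment.yml`.

```shell
# Write an illustrative, hand-written environment file to a scratch location.
envfile="$(mktemp -d)/environment.yml"
cat > "$envfile" <<'EOF'
name: bwa
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bwa
EOF

# Count the packages requested under "dependencies:".
n_deps="$(sed -n '/^dependencies:/,$p' "$envfile" | grep -c '^  - ')"
echo "environment '$(sed -n 's/^name: //p' "$envfile")' requests $n_deps package(s)"
```

A short file like this leaves version selection to the solver, which trades exact reproducibility for portability. In recent conda versions, `conda env export --from-history` should produce a similarly compact file that lists only the packages you explicitly requested.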

To show that this suffices to recreate the *bwa* environment, we'll delete our current bwa environment... 

In [None]:
conda deactivate
conda env remove -n bwa
conda env list

...and recreate the environment using `conda env create`:

In [None]:
conda env create -f environment.yml
conda env list

And there's our *bwa* environment:

In [None]:
conda list -n bwa

### Your turn!
Search for a package in the [Bioconda recipe index](https://bioconda.github.io/conda-recipe_index.html), on [Anaconda.org](https://anaconda.org), or using `conda search`, and try to install it into a new environment:

In [None]:
conda create -y -n myenv PACKAGE1 [PACKAGE2...]

In [None]:
conda activate myenv