# Setting up the conda environment

In this notebook I will set up the `conda` environment for the project. It is mostly a run-of-the-mill Python `scanpy` environment with extra packages for integration and cross-species comparisons thrown in.

### Install conda

(I assume this is already taken care of)

### Create environment

There are two rules for choosing a name. First, and most important, the name should help you remember what the environment was meant to achieve. You will definitely need this when you look back at your conda environments three months after finishing a project. Second, and this is more of a personal preference, the name should be easy and fast to type. If you switch between different conda environments a lot, you will appreciate their names being short.

```
> conda create -n ascc23
> conda activate ascc23
```

### Install python

Our environment is first and foremost a Python environment, so we should install Python first.

```
> conda install python=3.9
```

### Install single-cell basics

We will now install `scanpy`, the most important package in the Python single-cell analysis ecosystem. Scanpy has a lot of important dependencies, so by installing it first we will get them in the appropriate versions.

```
> conda install -c conda-forge scanpy python-igraph leidenalg
```
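
A quick way to confirm that this step worked is to ask the interpreter of the activated environment whether it can locate the three packages. The snippet below is only a sketch and not part of the original setup; the helper name `find_missing` is made up for illustration. Note that the conda package `python-igraph` is imported under the name `igraph`.

```python
from importlib.util import find_spec


def find_missing(modules):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in modules if find_spec(name) is None]


# These are import names, not conda package names:
# python-igraph is imported as igraph.
missing = find_missing(["scanpy", "igraph", "leidenalg"])
print("missing:", missing if missing else "none")
```

Running this with the environment's `python` should report nothing missing; anything listed points at a package that needs to be reinstalled.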

### Install jupyterlab & plotting

Plotting is one area where R is much better than Python, though the situation has improved a lot in recent years. Scanpy comes with `matplotlib-base`, but we might need the full suite. Plus, `seaborn` has some excellent out-of-the-box plots that will be very useful once we get to integration and cross-species comparisons.

```
> conda install -c conda-forge jupyterlab matplotlib seaborn
```

### Install R interface

```
> conda install -c conda-forge rpy2
> conda install -c bioconda anndata2ri
```

### Install integration algorithms, dependencies

```
> conda install -c conda-forge pandas matplotlib datashader bokeh holoviews colorcet scikit-image r-igraph
> conda install -c conda-forge python-annoy pybind11 dill fast-histogram umap-learn
> conda install -c bioconda bbknn
```

### Other, not on `conda`

Harmony performs integration on the PCA embedding:
```
> pip install harmonypy
```

Scanorama, a manifold stitching approach:
```
> pip install scanorama
```

PyLiger, the Python implementation of the LIGER NMF-based integration algorithm:
```
> pip install hnswlib
> pip install pyliger
```

SAMap, a tool for cross-species comparison using a BLAST graph:
```
> pip install hnswlib
> pip install sam-algorithm
> pip install https://github.com/galicae/SAMap/archive/master.zip
```

Scrublet, a doublet detection algorithm:
```
> pip install scrublet
```

PHATE, a visualisation algorithm:
```
> pip install phate
```

PHATE installation will change the `pandas` version, because of some old dependencies. We will manually revert that:

```
> pip uninstall pandas
> conda install -c conda-forge pandas==2.0.1
```
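
To check that the revert took effect, one option is to read the installed version from the package metadata and compare it with the pin. This is a minimal sketch; `installed_version` is a hypothetical helper, and the expected value simply repeats the version used in the conda command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


expected = "2.0.1"  # the version pinned in the conda command above
found = installed_version("pandas")
if found == expected:
    print("pandas is at the pinned version", expected)
else:
    print(f"pandas is {found!r}, expected {expected!r}")
```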

PhenoGraph, a clustering algorithm:
```
> pip install PhenoGraph
```
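
Because this environment mixes `conda` and `pip`, it can help to see which tool installed what. The sketch below reads the `INSTALLER` record that Python packaging tools usually leave in a distribution's metadata; it is an illustration, not part of the original setup. The helper name `installer_of` is made up, and I am assuming that conda-built packages typically record `conda` there, which may not hold for every package.

```python
from importlib.metadata import PackageNotFoundError, distribution


def installer_of(dist_name):
    """Return the recorded installer, 'unknown' if none is recorded,
    or None if the distribution is not installed at all."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    recorded = dist.read_text("INSTALLER")  # None when the file does not exist
    return recorded.strip() if recorded else "unknown"


# Distribution names as used in the pip commands above, plus pandas.
names = ["harmonypy", "scanorama", "pyliger", "sam-algorithm",
         "scrublet", "phate", "PhenoGraph", "pandas"]
for name in names:
    print(f"{name:15s} {installer_of(name)}")
```

A `None` next to a name means the package is not installed in the current environment.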

### R dependencies

Start an R session inside the conda environment:

```
> conda activate ascc23
> R
```
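
If a system-wide R is also installed, it is worth making sure that the `R` found on the `PATH` is the one living inside the conda environment, so that packages end up in the right library. The following is a rough sketch under that assumption; `is_inside` is a hypothetical helper, and `CONDA_PREFIX` is the variable that `conda activate` sets to the active environment's directory.

```python
import os
import shutil
from pathlib import Path


def is_inside(path, prefix):
    """Return True if `path` lies under the directory `prefix`."""
    if path is None or prefix is None:
        return False
    try:
        Path(path).resolve().relative_to(Path(prefix).resolve())
        return True
    except ValueError:
        # relative_to raises ValueError when path is not below prefix
        return False


r_path = shutil.which("R")                   # first R executable on the PATH, if any
env_prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
print("R executable:", r_path)
print("belongs to the active conda environment:", is_inside(r_path, env_prefix))
```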

Inside the R session, type and execute the following lines:
```
> install.packages(c("Seurat", "SoupX", "BiocManager"))
> BiocManager::install(c("scDblFinder", "scater"))
```