<div style="border: 2px solid #8A9AD0; margin: 1em 0.2em; padding: 0.5em;">

# Conda Environments For Software Development

by [The Carpentries](https://training.galaxyproject.org/hall-of-fame/carpentries/), [Helena Rasche](https://training.galaxyproject.org/hall-of-fame/hexylena/)

CC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)

**Questions**

- What are Conda environments in software development and why should you use them?
- How can we manage Conda environments and external (third-party) libraries via Conda?

**Objectives**

- Set up a Conda environment for our software project using <code style="color: inherit">conda</code>.
- Run our software from the command line.

**Time Estimation: 30M**
</div>


<p>Conda environments, like Python virtual environments, allow you to easily manage your installed packages and prevent conflicts between different projects’ dependencies. This tutorial follows the same structure as the virtualenv tutorial, but uses conda.</p>
<blockquote class="comment" style="border: 2px solid #ffecc1; margin: 1em 0.2em">
<div class="box-title comment-title" id="comment"><i class="far fa-comment-dots" aria-hidden="true" ></i> Comment</div>
<p>This tutorial is significantly based on <a href="https://carpentries.org">the Carpentries</a> lesson <a href="https://carpentries-incubator.github.io/python-intermediate-development/">“Intermediate Research Software Development”</a>.</p>
</blockquote>
<p>If you are working with a Python project, you will often see something like
the following two lines somewhere near the top.</p>
<div class="language-python highlighter-rouge"><div><pre style="color: inherit; background: transparent"><code style="color: inherit"><span class="kn">from</span> <span class="n">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
</code></pre></div></div>
<p>This means that our code requires two <em>external libraries</em> (also called third-party packages or dependencies) -
<code style="color: inherit">numpy</code> and <code style="color: inherit">matplotlib</code>.</p>
<p>Python applications often use external libraries that don’t come as part of the standard Python distribution. This means
that you will have to use a <em>package manager</em> tool to install them on your system.</p>
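<p>Before installing anything, it can help to check which of the required libraries the current interpreter can already find. The following is a minimal sketch using only the standard library; the helper name <code style="color: inherit">missing_modules</code> is made up for this illustration.</p>

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


# The two third-party libraries imported in the example above.
required = ["numpy", "matplotlib"]
print("Missing modules:", missing_modules(required))
```

<p>An empty list means both libraries are already importable; any names that are printed still need to be installed with a package manager.</p>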
<p>Applications will also sometimes need a
specific version of an external library (e.g. because they require that a particular
bug has been fixed in a newer version of the library), or a specific version of Python interpreter.
This means that each Python application you work with may require a different setup and a set of dependencies so it
is important to be able to keep these configurations separate to avoid confusion between projects.
The solution for this problem is to create a self-contained
<em>environment</em> per project, which contains a particular version of Python installation plus a number of
additional external libraries.</p>
<p>If you see something like</p>
<div class="language-python highlighter-rouge"><div><pre style="color: inherit; background: transparent"><code style="color: inherit"><span class="kn">import</span> <span class="n">pysam</span>
</code></pre></div></div>
<p>You know you’ll need additional packages installed on your system, as <code style="color: inherit">pysam</code> relies on <a href="https://github.com/samtools/htslib">htslib, a C library for working with HTS data</a>. This usually means installing additional non-Python software that is not always available from within Python’s packaging ecosystem.</p>
<p>Conda environments go beyond virtual environments, and make it easier to develop, run, test and share code with others. In this tutorial, we learn how
to set up an environment to develop our code and manage our external dependencies.</p>
<blockquote class="agenda" style="border: 2px solid #86D486;display: none; margin: 1em 0.2em">
<div class="box-title agenda-title" id="agenda">Agenda</div>
<p>In this tutorial, we will cover:</p>
<ol id="markdown-toc">
<li><a href="#conda-environments" id="markdown-toc-conda-environments">Conda Environments</a></li>
</ol>
</blockquote>
<h2 id="conda-environments">Conda Environments</h2>
<p>So what exactly are conda environments, and why use them?</p>
<p>A conda environment is an <strong>isolated working copy</strong> of specific versions of
one or more packages and all of their dependencies.</p>
<p>In practice, an environment is simply a <em>directory with a particular structure</em>. It holds (or links to) the packages installed in it, which lets multiple packages, or different versions of the same external library, coexist side by side on your machine, with only one version selected for each project. This allows you to work on
a particular project without worrying about affecting other projects on your
machine.</p>
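<p>Because an environment is just a directory, you can inspect it from Python. The sketch below relies on the fact that conda keeps its package records in a <code style="color: inherit">conda-meta</code> folder inside each environment; the helper name <code style="color: inherit">looks_like_conda_env</code> is made up, and the check is a heuristic rather than conda’s own logic.</p>

```python
import sys
from pathlib import Path


def looks_like_conda_env(prefix):
    """Heuristic: conda environments keep package records in a conda-meta folder."""
    return (Path(prefix) / "conda-meta").is_dir()


# sys.prefix is the root directory of the running Python installation.
print("Interpreter prefix:", sys.prefix)
print("Looks like a conda environment:", looks_like_conda_env(sys.prefix))
```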
<p>As more external libraries are added to your project over time, you can add them to
its specific environment and avoid a great deal of confusion by having separate (smaller) environments
for each project rather than one huge global environment with potential package version clashes. Another big motivator
for using environments is that they make sharing your code with others much easier (as we will see shortly).
Here are some typical scenarios where the usage of environments is highly recommended (almost unavoidable):</p>
<ul>
<li>You have two dependencies with conflicting dependencies! You cannot install
the specific version of software X alongside software Y, as they both depend
on different versions of a dependency, that cannot co-exist. This is solved by
having different environments for both.</li>
<li>You have an older project that only works under Python 2. You do not have the time to migrate the project to Python 3
or it may not even be possible as some of the third party dependencies are not available under Python 3. You have to
start another project under Python 3. The best way to do this on a single machine is to set up two separate Python
environments.</li>
<li>One of your Python 3 projects is locked to use a particular older version of a third party dependency. You cannot use the
latest version of the
dependency as it breaks things in your project. In a separate branch of your project, you want to try and fix problems
introduced by the new version of the dependency without affecting the working version of your project. You need to set up
a separate conda environment for your branch to ‘isolate’ your code while testing the new feature.</li>
</ul>
<p>You do not have to worry too much about specific versions of external libraries that your project depends on most of the time.
Conda environments enable you to always use the latest available version without specifying it explicitly.
They also enable you to use a specific older version of a package for your project, should you need to.</p>
<blockquote class="tip" style="border: 2px solid #FFE19E; margin: 1em 0.2em">
<div class="box-title tip-title" id="tip-a-specific-package-version-is-only-ever-installed-once"><button class="gtn-boxify-button tip" type="button" aria-controls="tip-a-specific-package-version-is-only-ever-installed-once" aria-expanded="true"><i class="far fa-lightbulb" aria-hidden="true" ></i> Tip: A Specific Package Version is Only Ever Installed Once<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>Note that you will not have separate package installations for each of your projects - they will only
ever be installed once on your system (in <code style="color: inherit">&#36;CONDA/pkgs</code>) but will be referenced from different environments.</p>
</blockquote>
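<p>The tip above can be illustrated with hard links, which is how conda typically places files from its package cache into environments when the filesystem allows it (otherwise it falls back to copying). This is a sketch with a made-up directory layout and file name, not conda’s actual code.</p>

```python
import os
import tempfile


def share_storage(path_a, path_b):
    """True when both paths refer to the same file on disk."""
    return os.path.samefile(path_a, path_b)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one package cache and two environments.
    cached = os.path.join(root, "pkgs", "libexample.so")
    env_a = os.path.join(root, "envs", "a", "libexample.so")
    env_b = os.path.join(root, "envs", "b", "libexample.so")
    for path in (cached, env_a, env_b):
        os.makedirs(os.path.dirname(path), exist_ok=True)

    with open(cached, "w") as handle:
        handle.write("package contents")

    # Each environment receives a hard link rather than a copy.
    os.link(cached, env_a)
    os.link(cached, env_b)

    print(share_storage(env_a, env_b))  # do both names refer to one file?
    print(os.stat(cached).st_nlink)     # how many names that file has
```

<p>Both environments see the file, but its contents are stored only once.</p>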
<h3 id="managing-conda-environments">Managing Conda Environments</h3>
<p>There are several commonly used command line tools for managing environments:</p>
<ul>
<li><code style="color: inherit">homebrew</code>, historically used on macOS to manage packages.</li>
<li><code style="color: inherit">nix</code>, which has a steep learning curve but allows you to declare the state of your entire system.</li>
<li><code style="color: inherit">conda</code>, a package and environment management system (also included as part of the Anaconda Python distribution often used by the scientific community).</li>
<li><code style="color: inherit">docker</code> and <code style="color: inherit">singularity</code>, which are somewhat similar to other environment managers, as they provide isolated images containing software and its dependencies.</li>
<li>Other, language-specific managers.</li>
</ul>
<p>While there are pros and cons to each of the above, all will do the job of managing
environments for you, and which one you choose may be a matter of personal preference. The Galaxy project is heavily invested in the Conda ecosystem and recommends it as an entry point, as it is the most generally useful and convenient. <a href="http://bioconda.github.io/">The BioConda ecosystem</a> provides a very large number of packages for bioinformatics-specific purposes, which makes Conda a good choice in general.</p>
<h3 id="managing-packages">Managing Packages</h3>
<p>Part of managing your (virtual) working environment involves installing, updating and removing external packages
on your system. The Conda command (<code class="language-plaintext highlighter-rouge">conda</code>) is most commonly used for this - it interacts with
one or more Conda repositories (e.g. Conda Forge, BioConda, etc.) and obtains the packages from them.</p>
<blockquote class="tip" style="border: 2px solid #FFE19E; margin: 1em 0.2em">
<div class="box-title tip-title" id="tip-a-note-on-anaconda-and-code-style-quot-color-inherit-quot-conda-code"><button class="gtn-boxify-button tip" type="button" aria-controls="tip-a-note-on-anaconda-and-code-style-quot-color-inherit-quot-conda-code" aria-expanded="true"><i class="far fa-lightbulb" aria-hidden="true" ></i> Tip: A Note on Anaconda and <code style=&quot;color: inherit&quot;>conda</code><span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>Anaconda is an open source Python
distribution commonly used for scientific programming - it conveniently installs Python, the package and environment manager <code style="color: inherit">conda</code>, and a
number of commonly used scientific computing packages so you do not have to obtain them separately.
<code style="color: inherit">conda</code> is an independent command line tool (also available separately from the Anaconda distribution) with dual functionality: (1) it is a package manager that helps you find packages in
remote package repositories and install them on your system, and (2) it is also a virtual environment manager. So, you can use <code style="color: inherit">conda</code> for both tasks instead of using <code style="color: inherit">venv</code> and <code style="color: inherit">pip</code>.</p>
</blockquote>
<p><code style="color: inherit">venv</code> and <code style="color: inherit">pip</code> are considered the <em>de facto</em> standards for environment and package management for Python 3.
However, the advantage of using Anaconda and <code style="color: inherit">conda</code> is that you get (most of the) packages needed for
scientific code development included with the distribution. If you are only collaborating with others who are also using
Anaconda, you may find that <code style="color: inherit">conda</code> satisfies all your needs.</p>
<p>It is good, however, to be aware of all these tools (<code class="language-plaintext highlighter-rouge">pip</code>, <code style="color: inherit">venv</code>, <code style="color: inherit">pyenv</code>, etc.),
and use them accordingly. As you become more familiar with them you will realise that equivalent tools work in a similar
way even though the command syntax may be different (and that there are equivalent tools for other programming languages
too to which your knowledge can be ported).</p>
<figure id="figure-1" style="max-width: 90%; margin:auto;"><img src="../../images/xkcd/python_environment.png" alt="Python environment hell XKCD comic  showing boxes like pip, easy_install, homebrew 2.7, anaconda, homebrew 3.6, /usr/local/Cellar, ~/python/, and a chaotic mess of arrows moving between them all. At the bottom is the text: My python environment has become so degraded that my laptop has been declared a superfund site. (A superfund site is generally an environmental disaster area.). " width="492" height="487" loading="lazy" /><figcaption><span class="figcaption-prefix"><strong>Figure 1</strong>:</span> Python Environment Hell from XKCD 1987 (CC-BY-NC 2.5)</figcaption></figure>
<p>Let us have a look at how we can create and manage environments and their packages from the command line using <code style="color: inherit">conda</code>.</p>
<h3 id="instaling-miniconda">Installing Miniconda</h3>
<p>We will use Miniconda, a commonly used minimal conda installer, in place of the full Anaconda distribution, which is larger and slower to download.</p>
<blockquote class="hands_on" style="border: 2px solid #dfe5f9; margin: 1em 0.2em">
<div class="box-title hands-on-title" id="hands-on-installing-conda-via-miniconda"><i class="fas fa-pencil-alt" aria-hidden="true" ></i> Hands-on: Installing Conda via Miniconda</div>
<ol>
<li>Go to the <a href="https://docs.conda.io/en/latest/miniconda.html">Miniconda installation page</a> and find the appropriate installer for your system.</li>
<li>Download and run the script.</li>
<li>You will probably need to close and reopen your terminal.</li>
<li>Check that you can run the <code style="color: inherit">conda</code> command, otherwise something may have gone wrong.</li>
</ol>
</blockquote>
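<p>The last step of the hands-on (checking that <code style="color: inherit">conda</code> can be run) can also be done from Python. This is a minimal sketch; the helper name <code style="color: inherit">conda_version</code> is made up, and it only assumes that <code style="color: inherit">conda --version</code> prints a version string.</p>

```python
import shutil
import subprocess


def conda_version(executable="conda"):
    """Return the output of `conda --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or result.stderr.strip()


print(conda_version())
```

<p>If this prints <code style="color: inherit">None</code>, the installer has probably not updated your <code style="color: inherit">PATH</code> yet, or the terminal or kernel needs to be restarted.</p>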
<p><em>If</em> you’re running on Linux <em>and</em> following this tutorial via a Jupyter/CoCalc notebook, and you agree to the <a href="https://legal.anaconda.com/policies/en/?name=terms-of-service">Anaconda terms of service</a>, you can simply run the following cell:</p>


In [None]:
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b

<p>Now you can add it to your <code style="color: inherit">~/.bashrc</code> which will cause Conda to be automatically loaded whenever you open a terminal:</p>
<div class="language-plaintext highlighter-rouge"><div><pre style="color: inherit; background: transparent"><code style="color: inherit">~/miniconda3/bin/conda init bash
</code></pre></div></div>
<blockquote class="tip" style="border: 2px solid #FFE19E; margin: 1em 0.2em">
<div class="box-title tip-title" id="tip-reopen-shell-restart-jupyter-kernel"><button class="gtn-boxify-button tip" type="button" aria-controls="tip-reopen-shell-restart-jupyter-kernel" aria-expanded="true"><i class="far fa-lightbulb" aria-hidden="true" ></i> Tip: Reopen Shell / Restart Jupyter Kernel<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>Here you will need to restart your kernel, or if you’re in a desktop environment, restart your terminal.</p>
</blockquote>
<h3 id="installing-our-first-package">Installing our First Package</h3>
<p>Let’s install our first package, the new <code style="color: inherit">libmamba</code> solver for Conda, as an example of how to install a package. A side benefit is that it will significantly speed up your package installations!</p>


In [None]:
conda install -y -q conda-libmamba-solver=22.8

<blockquote class="tip" style="border: 2px solid #FFE19E; margin: 1em 0.2em">
<div class="box-title tip-title" id="tip-does-it-get-stuck"><button class="gtn-boxify-button tip" type="button" aria-controls="tip-does-it-get-stuck" aria-expanded="true"><i class="far fa-lightbulb" aria-hidden="true" ></i> Tip: Does it get stuck?<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>We have sometimes seen this step get “stuck”: it finishes executing the transaction but then hangs, despite having installed the software successfully. If this happens, you can restart the kernel.</p>
</blockquote>
<p>Here we see a few things:</p>
<ul>
<li><code style="color: inherit">-y</code> - installs without asking questions like “do you want to do this”. Generally people don’t use this, but in a Notebook environment it’s a bit nicer.</li>
<li><code style="color: inherit">-q</code> - quiet installation; by default conda prints a <em>lot</em> of progress update messages.</li>
<li><code style="color: inherit">conda-libmamba-solver=22.8</code>, the package and version of that package that we wish to install.</li>
</ul>
<p>We’ll now configure conda to use the <code style="color: inherit">libmamba</code> solver by default:</p>


In [None]:
conda config --set experimental_solver libmamba

<p>While we’re at it, let’s configure Conda to use the same default repositories as Galaxy:</p>


In [None]:
conda config --add channels bioconda
conda config --add channels conda-forge

<p>This will give us access to the vast repositories of BioConda (bioinformatics software) and Conda Forge (languages and libraries).</p>
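<p>The order of the two commands above matters, because <code style="color: inherit">conda config --add</code> puts the new channel at the front of the list, and channels earlier in the list take priority. The sketch below mimics that behaviour on a plain Python list; the starting configuration of just <code style="color: inherit">defaults</code> is an assumption about a fresh Miniconda install.</p>

```python
def add_channel(channels, name):
    """Mimic `conda config --add channels NAME`: the new channel goes to the front."""
    return [name] + [channel for channel in channels if channel != name]


channels = ["defaults"]  # assumed starting configuration
channels = add_channel(channels, "bioconda")
channels = add_channel(channels, "conda-forge")
print(channels)  # ['conda-forge', 'bioconda', 'defaults']
```

<p>Adding <code style="color: inherit">conda-forge</code> last therefore gives it the highest priority, with <code style="color: inherit">bioconda</code> next.</p>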
<h3 id="creating-a-new-environment">Creating a new Environment</h3>
<p>Creating a new environment is done by executing the following command:</p>


In [None]:
conda create -y -n my-env

<p>where <code style="color: inherit">my-env</code> is any arbitrary name for this Conda environment. Environment names are global, so pick something meaningful when you create one!</p>
<p>For our project, let’s create an environment called <code class="language-plaintext highlighter-rouge">hts</code>:</p>


In [None]:
conda create -y -n hts

<p>You can list all of the created environments with</p>


In [None]:
conda env list

<p>You’ll notice that there is a <code style="color: inherit">base</code> environment created by default, where you can install packages and play around with Conda. If at all possible, we recommend against installing things into the <code style="color: inherit">base</code> environment. Instead, create a new environment for each tool you need to install.</p>
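<p>If you want to work with the list of environments from a script, <code style="color: inherit">conda env list</code> also accepts a <code style="color: inherit">--json</code> flag. The following sketch assumes the JSON output contains an <code style="color: inherit">envs</code> key holding a list of environment directories; the helper names are made up for this illustration.</p>

```python
import json
import shutil
import subprocess
from pathlib import Path


def env_names(env_list_json):
    """Extract directory names from the JSON printed by `conda env list --json`."""
    return [Path(prefix).name for prefix in json.loads(env_list_json)["envs"]]


def list_conda_envs():
    """Ask conda for its environments; returns [] when conda is not on PATH."""
    conda = shutil.which("conda")
    if conda is None:
        return []
    result = subprocess.run(
        [conda, "env", "list", "--json"], capture_output=True, text=True, check=True
    )
    return env_names(result.stdout)
```

<p>Note that with this approach the <code style="color: inherit">base</code> environment appears under the name of its installation directory (for example <code style="color: inherit">miniconda3</code>), since only the last path component is kept.</p>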
<blockquote class="tip" style="border: 2px solid #FFE19E; margin: 1em 0.2em">
<div class="box-title tip-title" id="tip-why-separate-environments"><button class="gtn-boxify-button tip" type="button" aria-controls="tip-why-separate-environments" aria-expanded="true"><i class="far fa-lightbulb" aria-hidden="true" ></i> Tip: Why separate environments?<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>Conda’s package resolution takes into account every other package installed in an environment. Especially if you use R packages, this can result in environments taking an increasing amount of time to install new packages and resolve all of the dependencies.</p>
<p>By using small, isolated environments, you keep package resolution fast.</p>
</blockquote>
<p>Once you’ve created an environment, you will need to activate it:</p>


In [None]:
conda activate hts

<p>Activating the environment will change your command line’s prompt to show what environment
you are currently using (indicated by its name in round brackets at the start of the prompt),
and modify the environment so that any packages you install will be available on the CLI.</p>
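<p>Activation works largely through environment variables: conda sets <code style="color: inherit">CONDA_DEFAULT_ENV</code> and <code style="color: inherit">CONDA_PREFIX</code> and puts the environment’s executables at the front of your <code style="color: inherit">PATH</code>. A small sketch for checking this from Python follows; the helper name <code style="color: inherit">active_conda_env</code> is made up.</p>

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None."""
    environ = os.environ if environ is None else environ
    name = environ.get("CONDA_DEFAULT_ENV")
    prefix = environ.get("CONDA_PREFIX")
    if not name or not prefix:
        return None
    return name, prefix


print(active_conda_env())
```

<p>Run inside the activated <code style="color: inherit">hts</code> environment, this should report that name together with the environment’s directory.</p>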
<p>When you’re done working on your project, you can exit the environment with:</p>


In [None]:
conda deactivate

<p>If you have just run <code style="color: inherit">deactivate</code>, make sure you reactivate the environment so it is ready for the next part:</p>


In [None]:
conda activate hts

<h3 id="installing-external-libraries-in-an-environment">Installing External Libraries in an Environment</h3>
<p>We noticed earlier that our code depends on two <em>external libraries</em> - <code style="color: inherit">numpy</code> and <code style="color: inherit">matplotlib</code> as well as <code style="color: inherit">pysam</code> which depends on <code style="color: inherit">htslib</code>. In order for the code to run on your machine, you need to
install these dependencies into your environment.</p>
<p>To install the latest version of a package with <code style="color: inherit">conda</code> you use conda’s <code style="color: inherit">install</code> command and specify the package’s name, e.g.:</p>


In [None]:
conda install -y -q python=3 numpy matplotlib pysam

<p>Note that we needed to pick a version of Python to use. Here we specify <code style="color: inherit">python=3</code>, meaning “any Python version that starts with 3”, so conda won’t use Python 2.7 or a future Python 4.</p>
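<p>The “starts with” behaviour of a single <code style="color: inherit">=</code> can be sketched as a comparison of whole dot-separated version components. This is a simplified illustration with a made-up function name; conda’s real matcher handles many more cases (build strings, comparison operators, and so on).</p>

```python
def matches_fuzzy(spec_version, candidate):
    """Simplified single-`=` match: compare whole dot-separated components."""
    wanted = spec_version.split(".")
    return candidate.split(".")[: len(wanted)] == wanted


for version in ["2.7.18", "3.9.7", "3.11.4", "4.0.0"]:
    print(version, matches_fuzzy("3", version))
```

<p>Comparing whole components, rather than raw string prefixes, is what keeps a spec of <code style="color: inherit">3</code> from matching a version such as <code style="color: inherit">31.0</code>.</p>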
<p>If you run the <code style="color: inherit">conda install</code> command on a package that is already installed, <code style="color: inherit">conda</code> will notice this and do nothing.</p>
<p>To install a specific version of a package give the package name followed by <code style="color: inherit">=</code> and the version number, e.g.
<code class="language-plaintext highlighter-rouge">conda install numpy=1.21.1</code>.</p>
<p>To specify a minimum version of a package, you can
do <code style="color: inherit">conda install 'numpy&gt;=1.20'</code>.</p>
<p>To upgrade a package to the latest version, use e.g. <code style="color: inherit">conda update numpy</code>. If the package is already at the latest version compatible with the rest of the environment, conda will leave it unchanged.</p>
<p>To display information about the current environment:</p>


In [None]:
conda info

<p>To display information about a particular package installed in your current environment:</p>


In [None]:
conda list python

<p>To list all packages installed in your current environment (including any installed with <code style="color: inherit">pip</code>):</p>


In [None]:
conda list

<p>To uninstall a package installed in the environment do: <code style="color: inherit">conda remove package-name</code>.
You can also supply a list of packages to uninstall at the same time.</p>
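<p>From inside Python you can run a similar check with the standard library’s <code style="color: inherit">importlib.metadata</code> (Python 3.8 or newer). It only sees Python distributions, so non-Python conda packages such as <code style="color: inherit">htslib</code> will not appear. The helper name <code style="color: inherit">installed_version</code> is made up for this sketch.</p>

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a Python distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


for name in ["numpy", "matplotlib", "pysam"]:
    print(name, installed_version(name))
```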
<h3 id="exportingimporting-an-environment-with-conda">Exporting/Importing an Environment with <code style="color: inherit">conda</code></h3>
<p>You are collaborating on a project with a team so, naturally, you will want to share your environment with your
collaborators so they can easily ‘clone’ your software project with all of its dependencies and everyone
can replicate equivalent environments on their machines. <code style="color: inherit">conda</code> has a handy way of exporting,
saving and sharing environments.</p>
<p>To export your active environment, use the <code style="color: inherit">conda env export</code> command to
produce a list of packages installed in the environment.
A common convention is to put this list in an <code style="color: inherit">environment.yml</code> file:</p>


In [None]:
conda env export > environment.yml
cat environment.yml

<p>The first of the above commands will create an <code style="color: inherit">environment.yml</code> file in your current directory.
The <code style="color: inherit">environment.yml</code> file can then be committed to a version control system and
get shipped as part of your software and shared with collaborators and/or users. They can then replicate your environment and
install all the necessary packages from the project root as follows:</p>


In [None]:
conda env create -y -f environment.yml

<p>The name is bundled directly into the environment file, so someone else re-creating
this environment from the YAML file will also be able to <code style="color: inherit">conda activate hts</code>
afterwards. If you want it under a different name, you can use the <code style="color: inherit">-n</code> flag to
supply your own name.</p>
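<p>To make the structure of such a file concrete, here is a sketch that builds the text of a minimal <code style="color: inherit">environment.yml</code> with the three usual top-level keys (<code style="color: inherit">name</code>, <code style="color: inherit">channels</code>, <code style="color: inherit">dependencies</code>). The function name is made up, and a real export will typically pin exact versions and list many more packages than shown here.</p>

```python
def render_environment_yml(name, channels, dependencies):
    """Build the text of a minimal environment.yml from plain Python lists."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


print(render_environment_yml(
    name="hts",
    channels=["conda-forge", "bioconda"],
    dependencies=["python=3", "numpy", "matplotlib", "pysam"],
))
```

<p>The <code style="color: inherit">name</code> line is what allows a collaborator to activate the environment under the same name after creating it.</p>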
<p>As your project grows, you may need to update your environment for a variety of reasons. For example, one of your project’s dependencies has
just released a new version (dependency version number update), you need an additional package for data analysis
(adding a new dependency), or you have found a better package and no longer need the older package (adding a new and
removing an old dependency). What you need to do in this case (apart from installing the new packages and removing the
packages that are no longer needed from your environment) is update the contents of the <code style="color: inherit">environment.yml</code> file
accordingly by re-issuing the <code style="color: inherit">conda env export</code> command, and propagate the updated <code style="color: inherit">environment.yml</code> file to your collaborators
via your code sharing platform (e.g. GitHub).</p>
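<p>When reviewing such an update, it can be useful to see exactly which dependencies were added or removed. The sketch below compares two dependency lists; the function name and the example package <code style="color: inherit">seaborn</code> are hypothetical and only serve as an illustration.</p>

```python
def dependency_changes(old, new):
    """Return (added, removed) package specs between two dependency lists."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


before = ["python=3", "numpy", "matplotlib", "pysam"]
after = ["python=3", "numpy", "pysam", "seaborn"]  # hypothetical update
added, removed = dependency_changes(before, after)
print("added:", added)      # added: ['seaborn']
print("removed:", removed)  # removed: ['matplotlib']
```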
<blockquote class="tip" style="border: 2px solid #FFE19E; margin: 1em 0.2em">
<div class="box-title tip-title" id="tip-official-documentation"><button class="gtn-boxify-button tip" type="button" aria-controls="tip-official-documentation" aria-expanded="true"><i class="far fa-lightbulb" aria-hidden="true" ></i> Tip: Official Documentation<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>For a full list of options and commands, consult the <a href="https://docs.conda.io/projects/conda/en/latest/commands.html">official <code style="color: inherit">conda</code> documentation</a>.</p>
</blockquote>


# Key Points

- Environments keep Python versions and dependencies required by different projects separate.
- An environment is itself a directory structure of software and libraries.
- Use `conda create -n <name>` to create and manage environments.
- Use `conda install` to install and manage additional external (third-party) libraries.
- Conda allows you to declare all dependencies for a project in a separate file (by convention called `environment.yml`) which can be shared with collaborators/users and used to replicate an environment.
- Use `conda env export > environment.yml` to take a snapshot of your project's dependencies.
- Use `conda env create -f environment.yml` to replicate someone else's environment on your machine from the `environment.yml` file.

# Congratulations on successfully completing this tutorial!

Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-conda/tutorial.html#feedback) and check there for further resources!
