# Conda

When working in data science, artificial intelligence, deep learning, or whatever you want to call it, you will end up with several projects. Some of them may need, for example, CUDA 11.6 and others CUDA 11.8. In those cases my advice is: never fight with CUDA, it always wins.

It is therefore best to create a separate environment for each project. That way you install what you need in each environment rather than globally, and you avoid incompatibilities between library versions.

To create environments, Python ships with `venv`, its built-in virtual environment tool. But I recommend using `conda` instead because, apart from creating virtual environments, it is also a package manager, and in my experience a better one than `pip`.

This is not a post explaining `conda`, so you will not find how to install it or how to use it. It is a post about the advantages of using `conda`, and also of using `mamba` (which we will explain later).

This notebook has been automatically translated to make it accessible to more people, please let me know if you see any typos.

I will create three different conda environments, one will be called `pip_env`, one `conda_env` and one `mamba_env`.
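Since the cells below are tagged with comments such as `# pip_env`, it can help to confirm which environment the interpreter is really running in. The following is a minimal sketch, assuming the environment was activated with `conda activate` (which sets the `CONDA_DEFAULT_ENV` variable); the helper name `active_environment` is my own.

```python
import os
import sys


def active_environment() -> dict:
    """Report which environment the running interpreter belongs to."""
    return {
        # conda sets CONDA_DEFAULT_ENV on activation; it is None outside conda
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # sys.prefix is the root directory of the interpreter in use
        "prefix": sys.prefix,
    }


info = active_environment()
print(info)
```

In a notebook, the kernel's interpreter and the shell used by `!` commands are not necessarily the same, so checking `sys.prefix` is the safer of the two signals.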

## Conda vs PIP

### pip_env

I will create a new environment called `pip_env`.

In [None]:
!conda create -n pip_env

In the `pip_env` environment I will install `pandas`.

In [1]:
# pip_env
!pip install pandas

Collecting pandas
  Using cached pandas-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Installing collected packages: pandas
Successfully installed pandas-2.0.1


Since `pandas` depends on `numpy`, `pip` installs it as well, in version `1.24.3`. But if for whatever reason we need `numpy` in version `1.19`, trying to install it gives an error:

In [2]:
# pip_env
!pip install numpy==1.19.0

Collecting numpy==1.19.0
  Using cached numpy-1.19.0.zip (7.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: numpy
  Building wheel for numpy (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for numpy (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [1113 lines of output]
      Running from numpy source directory.
      Cythonizing sources
      numpy/random/_bounded_integers.pxd.in has not changed
      numpy/random/_bounded_integers.pyx.in has not changed
      numpy/random/_philox.pyx has not changed
      numpy/random/_mt19937.pyx has not changed
      numpy/random/_sfc64.pyx has not changed

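The install failed. The log shows why: `pip` downloaded the source archive `numpy-1.19.0.zip` and tried to build a wheel from it, and that build did not succeed. If we check which version of `numpy` we have, we see that we are still on `1.24.3`.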

In [3]:
# pip_env
import numpy as np
np.__version__

'1.24.3'

And we check which version of `pandas` we have:

In [4]:
# pip_env
import pandas as pd
pd.__version__

'2.0.1'
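Importing a package just to read `__version__` works, but the same information can be read without importing anything. Here is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is my own, and it returns `None` for packages that are not installed in the current environment.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("numpy", "pandas"):
    print(name, installed_version(name))
```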

### conda_env

To resolve this conflict we can use `conda`. I create a new environment called `conda_env`.

In [None]:
!conda create -n conda_env

Now we tell it that we want to install `numpy` in version `1.19` together with `pandas`, and `conda` will look for a combination that works:

In [8]:
# conda_env
!conda install -y numpy=1.19 pandas

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/wallabot/miniconda3/envs/conda_env

  added / updated specs:
    - numpy=1.19
    - pandas


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2023.01.10 |       h06a4308_0         120 KB
    certifi-2021.5.30          |   py36h06a4308_0         139 KB
    intel-openmp-2022.1.0      |    h9e868ea_3769         4.5 MB
    mkl-2020.2                 |              256       138.3 MB
    mkl-service-2.3.0          |   py36he8ac12f_0          52 KB
    mkl_fft-1.3.0              |   py36h54f3939_0         170 KB
    mkl_random-1.1.1           |   py36h0573a6f_0         327 KB
    numpy-1.19.2               |   py3

It seems to have succeeded, let's see

In [1]:
# conda_env
import numpy as np
np.__version__

'1.19.2'

In [2]:
# conda_env
import pandas as pd
pd.__version__

'1.1.5'

We can see that it was able to install both, although to resolve the conflict it installed `pandas` in version `1.1.5` instead of the `2.0.1` that `pip` had installed. The `py36` build strings in the package plan show that it also settled on Python 3.6 to make everything fit.
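What `conda` did here can be pictured as a search over version combinations. The sketch below is only a toy brute-force illustration of that idea, not conda's actual solver; the compatibility table is simplified and hypothetical, loosely modelled on the versions seen above.

```python
from itertools import product

# Hypothetical, simplified metadata: each pandas version lists the
# minimum numpy version it accepts. This is not the real package metadata.
PANDAS_MIN_NUMPY = {
    (2, 0, 1): (1, 21, 0),
    (1, 1, 5): (1, 15, 4),
}
NUMPY_VERSIONS = [(1, 24, 3), (1, 19, 2)]


def solve(numpy_pin=None):
    """Return the newest (pandas, numpy) pair satisfying all constraints."""
    candidates = []
    for pandas_v, numpy_v in product(PANDAS_MIN_NUMPY, NUMPY_VERSIONS):
        # The user's pin, e.g. (1, 19), must match the leading components
        if numpy_pin is not None and numpy_v[:len(numpy_pin)] != numpy_pin:
            continue
        # pandas declares a lower bound on numpy
        if numpy_v < PANDAS_MIN_NUMPY[pandas_v]:
            continue
        candidates.append((pandas_v, numpy_v))
    # Prefer the newest pandas, then the newest numpy
    return max(candidates) if candidates else None


print(solve())                   # no pin: newest versions of both
print(solve(numpy_pin=(1, 19)))  # the pin forces an older pandas
```

Without a pin the toy solver keeps the newest pair; with `numpy` pinned to `1.19` it falls back to the older `pandas`, which mirrors the trade-off `conda` made.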

## Mamba vs conda

Now that we have seen that `conda` is better at resolving conflicts, let's look at the difference between using `mamba` and `conda`. As we have seen, `conda` is very good at resolving conflicts, but it is slow at installing packages, since the dependencies are installed in series, one after the other. With `mamba` we get the same benefits as `conda`, but the dependencies are installed in parallel, making use of the cores we have in our machine.
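To get a feeling for why working in parallel helps, here is a toy sketch. It is not how `mamba` is implemented: the package names are placeholders and `time.sleep` stands in for a download, which is an assumption made only to keep the example self-contained.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PACKAGES = ["pkg_a", "pkg_b", "pkg_c", "pkg_d"]  # placeholder names


def fake_download(name: str, seconds: float = 0.1) -> str:
    """Stand-in for a network download: it just waits."""
    time.sleep(seconds)
    return name


def run_serial() -> float:
    """Fetch the packages one after the other and return the elapsed time."""
    start = time.perf_counter()
    for name in PACKAGES:
        fake_download(name)
    return time.perf_counter() - start


def run_parallel() -> float:
    """Fetch all packages at once with a thread pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(PACKAGES)) as pool:
        list(pool.map(fake_download, PACKAGES))
    return time.perf_counter() - start


serial_s = run_serial()
parallel_s = run_parallel()
print(f"serial: {serial_s:.2f}s, parallel: {parallel_s:.2f}s")
```

The serial run takes roughly the sum of the waits, while the parallel run takes roughly as long as the slowest single one.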

### conda_env

Let's stay in the `conda_env` environment and see how long it takes to install `pytorch`. By putting `time` before a command we can see how long it takes to run

In [2]:
# conda_env
!time conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/wallabot/miniconda3/envs/conda_env

  added / updated specs:
    - pytorch
    - pytorch-cuda=11.8
    - torchaudio
    - torchvision


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bzip2-1.0.8                |       h7b6447c_0          78 KB
    cuda-cudart-11.8.89        |                0         197 KB  nvidia
    cuda-cupti-11.8.87         |                0        25.3 MB  nvidia
    cuda-libraries-11.8.0      |                0           1 KB  nvidia
    cuda-nvrtc-11.8.89         |                0        1

It took 294.42 seconds, almost 5 minutes.
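The shell's `time` prefix is the quickest way to measure a command. If you prefer to collect timings from Python, a rough equivalent could look like the sketch below. The helper name `time_command` is my own, and the command being timed is a harmless placeholder rather than a real `conda install`.

```python
import subprocess
import sys
import time


def time_command(args: list) -> tuple:
    """Run a command and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(args, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# Placeholder command; swap in e.g. ["conda", "install", "-y", "pytorch"]
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.2f} s (exit code {code})")
```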

### mamba_env

Now we are going to install `pytorch` again, this time with `mamba`. First we create an environment called `mamba_env`.

In [None]:
!conda create -n mamba_env

To install `mamba`, download it from [mambaforge](https://github.com/conda-forge/miniforge#mambaforge) and install it.

Now we reinstall `pytorch` in `mamba_env`.

In [1]:
# mamba_env
!time mamba install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia


                  __    __    __    __
                 /  \  /  \  /  \  /  \
                /    \/    \/    \/    \
███████████████/  /██/  /██/  /██/  /████████████████████████
              /  / \   / \   / \   / \  \____
             /  /   \_/   \_/   \_/   \    o \__,
            / _/                       \_____/  `
            |/
        ███╗   ███╗ █████╗ ███╗   ███╗██████╗  █████╗
        ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗
        ██╔████╔██║███████║██╔████╔██║██████╔╝███████║
        ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║
        ██║ ╚═╝ ██║██║  ██║██║ ╚═╝ ██║██████╔╝██║  ██║
        ╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝  ╚═╝

        mamba (1.3.1) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

█████████████████████████████████████████████████████████████


Looking for: ['pytorch', 'torchvision', 'torchaudio', 'pytorch-cuda=11.8']

This time it took 121.61 seconds, about 2 minutes. That is less than half the time `conda` needed.
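Putting the two measurements side by side makes the difference concrete. This is plain arithmetic on the two timings reported above, nothing more; a single run on one machine is not a benchmark.

```python
conda_seconds = 294.42   # measured above with conda
mamba_seconds = 121.61   # measured above with mamba

speedup = conda_seconds / mamba_seconds
saved_minutes = (conda_seconds - mamba_seconds) / 60

print(f"speedup: {speedup:.2f}x")               # speedup: 2.42x
print(f"time saved: {saved_minutes:.1f} min")   # time saved: 2.9 min
```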

## Create an environment from a file

We may want to create an environment from a fixed list of packages. To do so, we can pass conda a file that describes them. We create a file called `environment.yml` with content like the following:

```yml
name: environment_from_file
channels:
  - defaults
  - conda-forge
  - pytorch
  - nvidia
dependencies:
    - python=3.11
    - cudatoolkit=11.8
    - pytorch=2.2.1
    - torchaudio
    - torchvision
    - pip
    - pip:
        - transformers
```

As we can see, the file gives the name of the environment, the channels to use, the packages (with their versions) to install through conda, and the packages to install through pip. Now we tell conda to create the environment from this file.

```bash
conda env create -f environment.yml
```

We create the file

In [1]:
!touch environment.yml \
&& echo "name: environment_from_file" >> environment.yml \
&& echo "channels:" >> environment.yml \
&& echo "  - defaults" >> environment.yml \
&& echo "  - conda-forge" >> environment.yml \
&& echo "  - pytorch" >> environment.yml \
&& echo "  - nvidia" >> environment.yml \
&& echo "dependencies:" >> environment.yml \
&& echo "    - python=3.11" >> environment.yml \
&& echo "    - cudatoolkit=11.8" >> environment.yml \
&& echo "    - pytorch=2.2.1" >> environment.yml \
&& echo "    - torchaudio" >> environment.yml \
&& echo "    - torchvision" >> environment.yml \
&& echo "    - pip" >> environment.yml \
&& echo "    - pip:" >> environment.yml \
&& echo "        - transformers" >> environment.yml
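The chain of `echo` commands works, but because `>>` appends, running the cell twice leaves duplicated content in the file. As an alternative, here is a sketch that writes the same content as the YAML shown earlier from Python, overwriting any previous copy. The helper name `write_environment_file` is my own, and the temporary directory is used only so the example does not touch your working directory.

```python
import tempfile
from pathlib import Path

ENVIRONMENT_YML = """\
name: environment_from_file
channels:
  - defaults
  - conda-forge
  - pytorch
  - nvidia
dependencies:
    - python=3.11
    - cudatoolkit=11.8
    - pytorch=2.2.1
    - torchaudio
    - torchvision
    - pip
    - pip:
        - transformers
"""


def write_environment_file(directory: str) -> Path:
    """Write environment.yml into `directory`, replacing any old copy."""
    path = Path(directory) / "environment.yml"
    path.write_text(ENVIRONMENT_YML, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_environment_file(tmp).read_text(encoding="utf-8")

print(written.splitlines()[0])
```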

Now that we have the file we can create the custom environment

In [2]:
!conda env create -f environment.yml

Retrieving notices: ...working... done
Channels:
 - defaults
 - conda-forge
 - pytorch
 - nvidia
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 23.11.0
    latest version: 24.1.2

Please update conda by running

    $ conda update -n base -c conda-forge conda



Downloading and Extracting Packages:
pytorch-2.2.1        | 1.35 GB   |                                       |   0% 
cudatoolkit-11.8.0   | 630.7 MB  |                                       |   0% 
libcublas-12.1.0.26  | 329.0 MB  |                                       |   0% 
libcusparse-12.0.2.5 | 163.0 MB  |                                       |   0% 
libnpp-12.0.2.50     | 139.8 MB  |                                       |   0% 
libcufft-11.0.2.4    | 102.9 MB  |                                       |   0% 
libcusolver-11.4.4.5 | 98.3 MB   |                                       |   0% 

## Install packages from a file

Another thing we can do is keep a list of the packages we want and install them all at once. To do this we create a file called `requirements.txt` with one package spec per line, like this:

```txt
pandas==2.2.1
matplotlib==3.8.3
```

And now we tell conda to install those packages for us:

```bash
conda install --file requirements.txt
```

In [3]:
!touch requirements.txt \
&& echo "pandas==2.2.1" >> requirements.txt \
&& echo "matplotlib==3.8.3" >> requirements.txt
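Before handing the file to conda, it can be handy to check what it actually pins. The sketch below parses only the simple `name==version` form used here, not the full conda spec syntax; the helper name `parse_requirements` is my own.

```python
def parse_requirements(text: str) -> dict:
    """Map package name -> pinned version for lines like 'pandas==2.2.1'."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        # Lines without '==' are kept with no pinned version
        pins[name.strip()] = version.strip() if sep else None
    return pins


requirements_txt = "pandas==2.2.1\nmatplotlib==3.8.3\n"
pins = parse_requirements(requirements_txt)
print(pins)
```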

Now that we have the file we install the packages

In [5]:
!conda install --file requirements.txt

Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package pandas-1.3.3-py37h40f5888_0 requires python >=3.7,<3.8.0a0, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ pandas 1.3.3  is installable with the potential options
│  ├─ pandas 1.3.3 would require
│  │  └─ python >=3.7,<3.8.0a0 , which can be installed;
│  ├─ pandas 1.3.3 would require
│  │  └─ python >=3.8,<3.9.0a0 , which can be installed;
│  └─ pandas 1.3.3 would require
│     └─ python >=3.9,<3.10.0a0 , which can be installed;
└─ pin-1 is not installable because it requires
   └─ python 3.11.* , which conflicts with any installable versions previously reported.

Pins seem to be involved in the conflict. Currently pinned specs:
 - python 3.1