**Slide 1**

## Managing scientific Python computing environments
#### By Raphael Luthi 
##### Head of data science at Groupe Mutuel

**Slide 2**

### **How much time** have you spent **worrying** about Python environments?

**TOO MUCH!**

I'll take you through **my pain-free workflow** to help you finally **tame these bloody environments**.

**Slide 3**

### Let's start with some definitions: 

- **virtualenv**: an environment manager for Python
- **pip**: the most popular package manager for Python
- **PyPI**: the Python Package Index, a repository of Python packages

- **conda**: an open-source package and environment manager that runs on Windows, macOS, and Linux
- **Anaconda repository**: a repository of 8000 open-source packages compiled for all major operating systems
- **Anaconda company**: the company that maintains conda and owns the Anaconda repository
- **Miniconda**: a minimal installer for conda

- **conda-forge**: a community-led package repository for the conda package manager

- **mamba**: a faster reimplementation of the conda package manager in C++

**Slide 4**

### pip vs. Anaconda
*Google search trends tell the story:*

<img src="images/im4.png" width="800"/>

- *Anaconda* is searched slightly less often than *pip* (search frequency is not a perfect metric, but it gives an idea)
- Over time, both go up and down together
- Each has found its niche and is good at what it specialises in

**pip and Anaconda are not in competition!**

**Slide 5**

### How do you choose? 

**pip/virtualenv** is:
- (+) de facto standard for packaging Python code
- (+) lightweight

BUT:
- (-) reproducibility can be hard to achieve, especially cross-platform
- (-) no optimisation of the full dependency graph (pip only resolves Python packages)

**conda** is:
- (+) very popular among the scientific computing community
- (+) can optimise complex dependency graphs and make reproducible cross-platform environments
- (+) handles multiple Python versions and non-Python binaries (e.g. the compiled libraries that numpy and scipy depend on)

BUT:
- (-) Anaconda (or Miniconda) must be installed first
- (-) complex dependency graphs can be extremely slow to solve

**mamba** is:
- (+) like conda but fast!

BUT:
- (-) not widely adopted

**Good news: today pip, conda and mamba work GREAT together!**

**Slide 6**

### My pain-free workflow

#### 1. Setting things up

1. If missing, install [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
1. Always make sure conda is up to date: `conda update -n base conda`
1. Speed things up with the experimental [libmamba](https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community) solver:
   - `conda install -n base conda-libmamba-solver`
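
The three setup steps can be collected into one small bash function. This is a minimal sketch, assuming bash and a Miniconda install whose `conda` executable is on `PATH`; the function name `setup_base` is hypothetical, and `-y` only skips the confirmation prompts.

```bash
#!/usr/bin/env bash
# Minimal sketch of step 1 (function name is hypothetical).
# Assumes bash and that Miniconda's `conda` is on PATH.

setup_base() {
    # Stop early with a clear message if conda is not installed
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found: install Miniconda first" >&2
        return 1
    fi
    # Keep conda itself up to date in the base environment
    conda update -y -n base conda || return 1
    # Install the libmamba solver plugin into the base environment
    conda install -y -n base conda-libmamba-solver
}
```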

#### 2. Creating a new conda environment 
**Create** a new conda environment with all the project's dependencies:

```bash
conda create -n painless_env -c conda-forge --strict-channel-priority -v \
    --experimental-solver libmamba \
    pandas pandera pandas-profiling sweetviz lux-api \
    black flake8 pre-commit nbqa jupyterlab \
    pandas-vet flake8-builtins bandit flake8-markdown \
    pep8-naming flake8-bugbear flake8-variables-names isort
```
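
For larger projects it can help to keep the package list in one editable place instead of a long backslash-continued command. The sketch below assumes bash; `PACKAGES` and `create_env` are hypothetical names, the packages are the ones from the command above, and the solver flag is left out so you can add whichever one your conda version accepts.

```bash
#!/usr/bin/env bash
# Sketch of step 2 (names are hypothetical). Assumes `conda` is on PATH.

# Project dependencies, grouped as in the command above
PACKAGES=(
    pandas pandera pandas-profiling sweetviz lux-api
    black flake8 pre-commit nbqa jupyterlab
    pandas-vet flake8-builtins bandit flake8-markdown
    pep8-naming flake8-bugbear flake8-variables-names isort
)

create_env() {
    local env_name="$1"
    if [[ -z "$env_name" ]]; then
        echo "usage: create_env ENV_NAME" >&2
        return 2
    fi
    # Each array element is passed as a separate argument to conda
    conda create -n "$env_name" -c conda-forge \
        --strict-channel-priority "${PACKAGES[@]}"
}
```

Calling `create_env painless_env` runs the same create command; conda still shows the plan and asks for confirmation before installing. Using an array also avoids the classic breakage where a stray space after a trailing `\` splits the command in two.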

**Activate**:
```bash
conda activate painless_env
```

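With `--experimental-solver libmamba`, the dependency graph is solved **much faster!** In newer conda releases the flag is `--solver libmamba`, and since conda 23.10 libmamba is the default solver.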

More info: https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community

#### 3. Installing additional libraries
**Additional conda libraries**:   
```bash
conda install -c conda-forge -n painless_env tabulate
```

**Additional pip libraries** (inside the activated environment):
```bash
python -m pip install dice
```

**Local package in editable mode** (from the project root, inside the activated environment):
```bash
python -m pip install -e .
```
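
`python -m pip` installs into whichever `python` comes first on `PATH`, so a pip install run outside the activated environment silently lands somewhere else. A small guard can catch that. This is a sketch assuming bash and that the environment was activated by name, in which case `conda activate` sets `CONDA_DEFAULT_ENV` to that name; `pip_install_into` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch (helper name is hypothetical): only run pip inside the expected env.
# Relies on CONDA_DEFAULT_ENV, which `conda activate NAME` sets to NAME.

pip_install_into() {
    local env_name="$1"
    shift
    if [[ "${CONDA_DEFAULT_ENV:-}" != "$env_name" ]]; then
        echo "active env is '${CONDA_DEFAULT_ENV:-none}', expected '$env_name'" >&2
        return 1
    fi
    # `python -m pip` uses the pip that belongs to the active interpreter
    python -m pip install "$@"
}
```

For example, `pip_install_into painless_env dice`. Without activating anything, `conda run -n painless_env python -m pip install dice` targets the environment directly instead.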

#### 4. Exporting and reproducing the environment
**Exporting** the active environment's configuration:
```bash
conda env export | grep -v "^prefix: " > environment.yml
```
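
The export can be wrapped so that a failed `conda env export` never overwrites a good `environment.yml`. A minimal sketch, assuming bash and an activated environment; `strip_prefix` and `export_env` are hypothetical names, and exporting to a temporary file first is a design choice, not something conda requires.

```bash
#!/usr/bin/env bash
# Sketch of step 4 (function names are hypothetical).

# Drop the machine-specific `prefix:` line (an absolute path on this machine)
strip_prefix() {
    grep -v '^prefix: '
}

# Export the active environment to a YAML file (default: environment.yml)
export_env() {
    local out="${1:-environment.yml}" tmp
    tmp="$(mktemp)" || return 1
    # Export first; only filter and write the target if conda succeeded
    if conda env export > "$tmp"; then
        strip_prefix < "$tmp" > "$out"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "conda env export failed; $out left unchanged" >&2
        return 1
    fi
}
```

`conda env export --from-history` is an alternative that records only the packages you explicitly asked for; it tends to travel better across operating systems, but it pins fewer versions and leaves out pip-installed packages.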

**Deterministic recreation** of the environment (on the same operating system, since the exported build strings are platform-specific):
```bash
conda env create -f environment.yml
```
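
`conda env create` is meant for a fresh environment. When `environment.yml` changes later (e.g. after a `git pull`), `conda env update --prune` brings an existing environment in line with the file instead. The sketch below picks between the two; it assumes bash and a top-level `name:` entry in the file, and `env_exists`/`sync_env` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch (function names are hypothetical): create the environment from
# environment.yml, or update it in place if it already exists.

# Succeeds if `conda env list` shows an environment with this name
env_exists() {
    local env_name="$1"
    conda env list | awk -v name="$env_name" \
        '$1 == name { found = 1 } END { exit !found }'
}

sync_env() {
    local file="${1:-environment.yml}" env_name
    # Read the environment name from the file's top-level `name:` entry
    env_name="$(awk '/^name:/ { print $2; exit }' "$file")"
    if [[ -z "$env_name" ]]; then
        echo "no 'name:' entry found in $file" >&2
        return 1
    fi
    if env_exists "$env_name"; then
        # --prune removes packages that are no longer listed in the file
        conda env update -n "$env_name" -f "$file" --prune
    else
        conda env create -f "$file"
    fi
}
```

Running `sync_env` from the project root after each change to the versioned file keeps every checkout on the same environment.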

**Tips for reproducibility**:
- The YAML file might become massive, but that's okay!
- The YAML file should be versioned (`git add environment.yml`) and distributed with the project so that all users and developers work with the same environment.

**Slide 7**

### Change in Anaconda terms of service

**Be aware**: since March 2022, an enterprise licence is required for commercial use of Anaconda's services

*Non-commercial* use is defined as:
- individual hobbyists
- students
- universities
- non-profit organisations
- businesses with fewer than 200 employees

More info:
- [Anaconda Commercial Edition FAQ](https://www.anaconda.com/blog/anaconda-commercial-edition-faq)
- [Anaconda pricing](https://www.anaconda.com/pricing)
- [Anaconda terms of service](https://www.anaconda.com/terms-of-service)

**Slide 8**

# Thanks for listening! 

Here is the [link](https://github.com/rluthi/PyData_London_2022/blob/master/taming_python_environments.md) to this presentation

Happy to connect: 
* github.com/rluthi
* linkedin.com/in/raphaelluthi

SHOUTOUT to [calmcode.io](https://calmcode.io) (my favourite resource for any advanced Python/data stuff!)
