## Package management in Python
#### Background information
* The standard package manager for Python is pip (Pip Installs Packages).
* The standard central repository for Python packages is PyPI (the Python Package Index).

### What's wrong with just using one Python installation for everything?
#### What if I want to use different versions of Python for different projects?
Well, you could install another version of Python and add it to your `PATH`.

Then what happens when you run `pip`? You can't be sure without more information. You'd have to know which Python comes first on your `PATH`, and to be certain you might have to spell out the full path to the interpreter you want (for example, by running pip as `/full/path/to/python -m pip`).
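One way to take the guesswork out is to ask Python itself. The sketch below reports which interpreter is running and what `python` and `pip` resolve to on the current `PATH`. The function names `describe_interpreter` and `pip_command` are illustrative only and not part of any library.

```python
import shutil
import sys


def describe_interpreter():
    """Report the running interpreter and what PATH would pick for python/pip."""
    return {
        "running": sys.executable,  # the interpreter executing this code
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_on_path": shutil.which("python"),  # None if not found
        "pip_on_path": shutil.which("pip"),  # may belong to a different Python
    }


def pip_command(*args):
    """Build a pip invocation that is tied to the running interpreter."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
    print(" ".join(pip_command("--version")))
```

If `running` and `python_on_path` disagree, a bare `pip install` may well be installing into a different Python than the one running your code.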

#### What if I have two projects that depend on conflicting versions of the same package?
This is a big one: Python can't handle this on its own. If ProjectA requires basicpackage 2.0 and ProjectB requires basicpackage 3.0, you're out of luck: a single Python installation can hold only one version of a given package in its `site-packages`.
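To make the conflict concrete, here is a minimal sketch that takes each project's pinned versions and lists the packages that cannot share one installation. The project and package names come from the example above; `find_conflicts` is a hypothetical helper, not a pip or conda feature.

```python
from collections import defaultdict


def find_conflicts(pins_by_project):
    """Return packages that different projects pin to different versions.

    pins_by_project maps project name -> {package name: pinned version}.
    The result maps package name -> {version: [projects that need it]}.
    """
    versions = defaultdict(dict)
    for project, pins in pins_by_project.items():
        for package, version in pins.items():
            versions[package].setdefault(version, []).append(project)
    # A package is a problem only if more than one version is demanded.
    return {pkg: by_version for pkg, by_version in versions.items() if len(by_version) > 1}


if __name__ == "__main__":
    pins = {
        "ProjectA": {"basicpackage": "2.0"},
        "ProjectB": {"basicpackage": "3.0"},
    }
    print(find_conflicts(pins))
```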

#### How do I document dependencies for other users?
You can look through every file for its import statements. Or you can fail upwards: run the code, see which import fails, install that package, and repeat until the software works.
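The "look through every file" approach can at least be automated. The following is a rough sketch using the standard library's `ast` module; `top_level_imports` is an illustrative name, and the approach only sees static `import` statements.

```python
import ast


def top_level_imports(source):
    """Return the sorted top-level module names imported by a piece of source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" depends on the top-level package "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import helpers"
            modules.add(node.module.split(".")[0])
    return sorted(modules)


if __name__ == "__main__":
    example = "import os\nimport numpy as np\nfrom sklearn.linear_model import LinearRegression\n"
    print(top_level_imports(example))
```

Even then the result is incomplete: it records no versions, and import names are not always the names you install (`sklearn` is installed as `scikit-learn`).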

_What if there were a better way?_

## Enter the Virtual Environment!
#### What is a virtual environment?
* A virtual environment is an easy way to create a fresh, isolated Python installation for your project. Each one can have its own packages with whatever versions it needs.


#### Why should I use one?
It is Best Practice™ to have a separate virtual environment for every project you work on because:
* It makes it trivial to guarantee that every place you use Python (your terminal, your IDE's debugger and terminal, etc.) is using the same Python installation.
* It isolates dependencies between projects, which prevents version conflicts.
* It makes environments and imports easy to document and easy to replicate.

#### How does it work?
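* A virtual environment creates a new Python installation directory.
* In its `bin/` subdirectory (`Scripts\` on Windows), it has links to, or copies of, the binaries of another, "base" Python installation. (The standard `venv` module uses symlinks or copies; conda hard-links package files from its package cache where the filesystem allows.)
* It points to its *own* `site-packages` or equivalent directories where packages can be installed.
* When we activate a virtual environment, it changes your environment variables (chiefly by putting the environment's `bin/` directory first on your `PATH`) so that `python` resolves to the environment's Python.
* When we deactivate it, everything goes back to how it was before.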
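These mechanics can be observed with the standard library's `venv` module. The sketch below creates a throwaway environment in a temporary directory and reads back two of the pieces described above: the pointer to the base installation (stored as `home` in `pyvenv.cfg`) and the environment's own `site-packages`. It uses `venv` as a stand-in for the general idea; a conda environment is laid out differently. `inspect_new_venv` is an illustrative name.

```python
import tempfile
import venv
from pathlib import Path


def inspect_new_venv():
    """Create a temporary virtual environment and report how it is wired up."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"
        venv.create(env_dir, with_pip=False)  # skip pip to keep this quick

        # pyvenv.cfg is a small "key = value" file written by venv
        config = {}
        for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
            key, sep, value = line.partition("=")
            if sep:
                config[key.strip()] = value.strip()

        # Each environment gets its own site-packages directory
        site_packages = [
            path.relative_to(env_dir).as_posix()
            for path in env_dir.rglob("site-packages")
        ]
        return config, site_packages


if __name__ == "__main__":
    config, site_packages = inspect_new_venv()
    print("base installation:", config.get("home"))
    print("site-packages:", site_packages)
```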

There are several tools for creating Python virtual environments (the built-in `venv` module is good and easy to use), but my favorite is...
## Conda
* Conda is an open-source Python version manager *and* package manager *and* environment manager that runs on Windows, macOS and Linux.
* Conda easily creates, saves, loads and switches between environments on your local computer.
* It was created for Python data scientists (and that's where it is most popular), but in principle it can package and distribute software for any language (whereas pip and PyPI are only for Python packages).
* It has stricter dependency conflict resolution than pip has traditionally had:
    - Pip, particularly before version 20.3 made its current resolver the default, can install a package that breaks your code (or, worse, silently makes it impossible to replicate your results).
    - Conda will find a way to choose compatible versions or tell you if you've asked for the impossible.
* Some of conda's data science libraries are built against the Intel Math Kernel Library (MKL) for faster numerical work such as model training.
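To illustrate what "choose compatible versions or tell you it's impossible" means, here is a toy brute-force resolver. It is only a sketch of the idea and not how conda's solver actually works; the package index and the names `liba` and `libb` are made up for the example.

```python
from itertools import product


def resolve(index):
    """Pick one version per package so that every dependency constraint holds.

    index maps package -> {version: {dependency: set of allowed versions}}.
    Versions are tried in the order listed, so list the preferred ones first.
    Returns {package: version}, or None if no combination is consistent.
    """
    names = list(index)
    for combo in product(*(index[name] for name in names)):
        chosen = dict(zip(names, combo))
        consistent = all(
            chosen[dep] in allowed
            for name, version in chosen.items()
            for dep, allowed in index[name][version].items()
        )
        if consistent:
            return chosen
    return None


if __name__ == "__main__":
    index = {
        "basicpackage": {"3.0": {}, "2.0": {}},
        "liba": {"1.0": {"basicpackage": {"2.0"}}},
        "libb": {"1.0": {"basicpackage": {"3.0"}}, "0.9": {"basicpackage": {"2.0"}}},
    }
    print(resolve(index))
```

Here the newest `libb` cannot coexist with `liba`, so the resolver falls back to `libb` 0.9 rather than installing something broken. Remove that older version from the index and the only honest answer is `None`.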

## Sounds great, how do I get started?

## Creating an Environment
Typical fully-loaded example:
```
conda create -n my-new-env python=3.6 scipy=0.15.0 astroid babel
```
* `conda create -n [name]` is the base command.
* `-n` specifies the name of the new environment.
    * If you prefer, you can instead specify a path for the environment with `-p /prefix/for/env/location`.
* `python=3.6` pins the Python version. Specifying one is optional.
* `scipy=0.15.0` installs that specific version of scipy into the environment, downloading it if you don't already have it.
* `astroid babel` installs the latest versions of astroid and babel that are compatible with all the other dependencies.
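If you create environments from a script, the same command can be assembled programmatically. This is a minimal sketch that assumes `conda` is on your `PATH`; `conda_create_command` and `create_env` are hypothetical helper names, and `--yes` is added so conda does not stop to ask for confirmation.

```python
import shutil
import subprocess


def conda_create_command(name, specs, assume_yes=True):
    """Build the argument list for `conda create -n NAME SPEC...`."""
    command = ["conda", "create", "-n", name]
    if assume_yes:
        command.append("--yes")  # skip the interactive confirmation prompt
    command.extend(specs)
    return command


def create_env(name, specs):
    """Run the command, failing early if conda is not installed."""
    conda = shutil.which("conda")
    if conda is None:
        raise RuntimeError("conda was not found on PATH")
    arguments = conda_create_command(name, specs)[1:]
    subprocess.run([conda, *arguments], check=True)


if __name__ == "__main__":
    specs = ["python=3.6", "scipy=0.15.0", "astroid", "babel"]
    print(" ".join(conda_create_command("my-new-env", specs)))
```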

## Activating and Deactivating an Environment

![Switching from Conda's base environment, to my default environment, to the environment I created to make this presentation](activate-envs.png "Switching Between Conda Environments")

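No more worrying about which Python installation to use: switch environments with `conda activate my-new-env` (and leave again with `conda deactivate`), then just run `python my_script.py` and the Python of your active environment will be used to run it.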
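A script can also check for itself that it is running in the environment you intended. Conda sets the `CONDA_DEFAULT_ENV` environment variable when an environment is activated; the sketch below reads it. The helper names and the environment name `my-new-env` are just for illustration.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if there is none."""
    return environ.get("CONDA_DEFAULT_ENV")


def require_env(expected, environ=os.environ):
    """Raise an error unless the expected conda environment is active."""
    active = active_conda_env(environ)
    if active != expected:
        raise RuntimeError(
            f"expected conda env {expected!r}, but {active!r} is active "
            f"(interpreter: {sys.executable})"
        )


if __name__ == "__main__":
    print("active environment:", active_conda_env())
    print("interpreter:", sys.executable)
```

Calling `require_env("my-new-env")` at the top of a script turns "wrong environment" from a confusing import error into an explicit message.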

## Installing More Packages
* Find available versions with:

```
conda search scikit-learn | tail

scikit-learn                  0.23.1  py38h58f5ce4_0  pkgs/main
scikit-learn                  0.23.1  py38h603561c_0  pkgs/main
scikit-learn                  0.23.2  py36h959d312_0  pkgs/main
scikit-learn                  0.23.2  py37h959d312_0  pkgs/main
scikit-learn                  0.23.2  py38h959d312_0  pkgs/main
scikit-learn                  0.23.2  py39hb2f4e1b_0  pkgs/main
scikit-learn                  0.24.1  py36hb2f4e1b_0  pkgs/main
scikit-learn                  0.24.1  py37hb2f4e1b_0  pkgs/main
scikit-learn                  0.24.1  py38hb2f4e1b_0  pkgs/main
scikit-learn                  0.24.1  py39hb2f4e1b_0  pkgs/main
```


* With your environment activated, run:
```
conda install scikit-learn=0.24.1
```
* Specifying a version is optional.
* Despite the warnings about pip's dependency resolution issues above, you can still use pip from within a conda environment, so you're not limited to only Anaconda packages.
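After installing, you can confirm from Python which version actually ended up in the active environment. This sketch uses the standard library's `importlib.metadata` (Python 3.8+) and assumes the package ships standard metadata, which is typical whether it was installed by conda or by pip. `installed_version` is an illustrative name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Use the name you install (scikit-learn), not the name you import (sklearn)
    print("scikit-learn:", installed_version("scikit-learn"))
```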


## Documentation
![image.png](drake_requirements.jpeg)
* When your code is ready, run:
    ```
    conda env export > environment.yml
    ```
* Include the `environment.yml` file in your git repository for future users of your code.
* To create a new environment from an environment.yml file:
    ```
    conda env create -n my-env-from-file -f environment.yml
    ```
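For orientation, an `environment.yml` file has three main keys: `name`, `channels`, and `dependencies`. The sketch below renders a minimal one by hand so you can see the shape. A real `conda env export` writes more detail (for example build strings), and `render_environment_yml` is a hypothetical helper, not a conda API.

```python
def render_environment_yml(name, channels, dependencies):
    """Render a minimal environment.yml as text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {spec}" for spec in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml(
        "my-new-env",
        ["defaults"],
        ["python=3.6", "scikit-learn=0.24.1"],
    ))
```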

## Delivering Environments
Conda-pack is a command-line tool that archives a conda environment, including all the binaries of the packages installed in it. This is useful when you want to reproduce an environment on a machine with limited or no internet access.
* Requirements: the source and target machines must run the same OS type (Mac/Linux/Windows). The steps below also assume both have a Miniconda install.
* Activate the base environment so that the package will be available to all sub-environments, then install conda-pack with:
    ```
    conda install -c conda-forge conda-pack
    ```
    * `-c conda-forge` specifies that the package comes from the conda-forge channel instead of the main Anaconda repository.
    
* Pack the environment with
```
conda pack -n my-env-for-customer
```
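* Deliver the `.tar.gz` file to the target computer. In the target's miniconda `envs` directory, create a new directory for your new env and extract the archive into it. Then activate the new environment, run `conda-unpack` to fix up its path prefixes, and deactivate it:
```
cd /dir/to/miniconda3/envs
mkdir -p my_env
# assumes the archive was copied into this directory
tar -xzf my-env-for-customer.tar.gz -C my_env
source my_env/bin/activate
conda-unpack
source my_env/bin/deactivate
```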
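Before shipping an archive somewhere with no internet access, a quick sanity check can save a round trip. The sketch below uses the standard library's `tarfile` to confirm that an archive contains a given file, such as the `bin/activate` script used in the steps above. `archive_contains` is an illustrative helper and is not part of conda-pack.

```python
import tarfile
from pathlib import PurePosixPath


def archive_contains(archive_path, wanted):
    """Return True if the tar archive has a member at the given relative path."""
    wanted = PurePosixPath(wanted)
    # "r:*" lets tarfile detect the compression (gzip, bz2, ...) automatically
    with tarfile.open(archive_path, "r:*") as archive:
        # PurePosixPath normalizes names, so "./bin/activate" matches "bin/activate"
        return any(PurePosixPath(name) == wanted for name in archive.getnames())
```

For example, `archive_contains("my-env-for-customer.tar.gz", "bin/activate")` should be true for an archive packed on Mac or Linux.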

#### FAQ: I've heard of conda, Anaconda, and Miniconda.  What's the difference?
* Conda is the package and environment manager software.
* Anaconda and Miniconda are both distributions. Each ships conda plus a Python installation that serves as the base environment.
* Miniconda comes with a minimal Python installation and the conda package and environment manager.
* Anaconda adds a large bundle of data science packages (numpy, scipy, pandas, etc.), about 160 of them, which is also available as the `anaconda` metapackage.
* If you have Miniconda, you can install all the Anaconda packages with
    ```
    conda install anaconda
    ```