<a href="https://colab.research.google.com/github/rzl-ds/gu511/blob/master/006_environments_1_anaconda.ipynb" target="_parent">
    <img src="https://colab.research.google.com/assets/colab-badge.svg"/>
</a>

# environment management: `anaconda`

## wait, what class is this?

why are we talking about environments?

<br><div align="center"><img src="https://news-media.stanford.edu/wp-content/uploads/2016/11/10165436/environment_GettyImages-501231894.jpg" width="800px"></div>

in the computer science world, the phrase "environment" is often thrown around with slightly ambiguous meaning. in the broadest sense, it can be the "computing" environment or the "operating" environment -- the combination of hardware and software that a user interacts with; the whole enchilada.

in discussions about specific applications and for certain programming languages, it can be filtered down to the "runtime" environment -- the relevant aspects of the hardware (from that application's point of view) and the codebase which defines that application or language

generally speaking, when I talk about the **environment** I'm focusing on the software (literal files, on your computer's disk) that define how *something* behaves. for example...

## your `python` environment

your `python` environment is the tools and packages available to you for use within the `python` programming language, and the way those tools and packages behave. this is completely determined by the literal files defining the `python` language on your computer

### current system `python` environments

let's do a quick `python` version check:

on your `ec2` instance, what `python` version do you have installed?

```sh
python --version
```

```sh
# grrrrr.......
python3 --version
```

In [None]:
%%bash
python --version

different versions of `python` (and different versions of installed packages) correspond to different files defining the language's behavior and thus different levels of compatibility. personally, I think knowing that these files exist is among the more important pieces of information in my `python` learning.

***the way that the code you wrote behaves depends on these files***

recall that the `bash` command `which` will tell us the path of the executable that will actually be called when we type in a command

```sh
which python3
```

In [None]:
%%bash
which python3

your out-of-the-box `ec2` instances will likely return `/usr/bin/python3`. so when you type `python3` on the command line, you will actually call the executable file `/usr/bin/python3`.
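the same lookup can be done from inside `python`. this is a minimal sketch using only the standard library: `shutil.which` mimics what the shell does with `PATH`, and `sys.executable` reports the interpreter that is actually running the code (the two can differ, e.g. inside a notebook).

```python
import shutil
import sys

# path the shell would resolve for the `python3` command (None if it is not on PATH)
on_path = shutil.which("python3")

# path of the interpreter that is actually running this code
running = sys.executable

print("python3 on PATH:    ", on_path)
print("running interpreter:", running)
print("version:            ", ".".join(map(str, sys.version_info[:3])))
```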

the same sort of thing is going on for the individual `python` modules we import. most modules have a special ("dunder") attribute `__file__` which holds the path to the file used to define that module (modules compiled directly into the interpreter, like `sys`, are the exception):

In [None]:
import os
os.__file__
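a small sketch of the same idea for a few modules at once. the helper name `module_file` is made up for this example; it returns `None` for modules that are built into the interpreter and therefore have no file on disk.

```python
import json
import os
import sys


def module_file(module):
    """Return the file a module was loaded from, or None if it has no __file__."""
    return getattr(module, "__file__", None)


for mod in (os, json, sys):
    print(f"{mod.__name__:>4}: {module_file(mod)}")
```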

let's look at that file!

```sh
# for you, it is:
less /usr/lib/python3.6/os.py

# for me, right now, it'll be different -- hence the craziness below. sorry!
```

In [None]:
%%bash
OS_FILE=$(python -c "import os; print(os.__file__)")
cat "$OS_FILE"

if you change that file, or your friend (who is running your code) doesn't have that same file, the code that uses `os` will behave differently.

the same caveat goes for every file or environment variable used by your python process on any machine. this collection of files defines what is often called the "`python` environment", and it can be different on any system. `sudo apt install` could totally change it.
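to make "this collection of files" a little more concrete, here is a rough sketch that gathers a few of the things that pin down where `python` looks for its files. the function name `environment_snapshot` and the choice of fields are mine, not a standard; it is nowhere near a complete description of an environment.

```python
import os
import sys


def environment_snapshot():
    """Collect a few values that determine which files python will use."""
    return {
        "executable": sys.executable,                 # the interpreter binary
        "prefix": sys.prefix,                         # root directory of this installation
        "version": sys.version.split()[0],            # e.g. "3.x.y"
        "sys_path": list(sys.path),                   # directories searched on import
        "PYTHONPATH": os.environ.get("PYTHONPATH"),   # optional user override
    }


snapshot = environment_snapshot()
for key, value in snapshot.items():
    print(f"{key}: {value}")
```

run this on two different machines (or before and after a `sudo apt install`) and compare the output to see how far apart two "environments" can be.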

yikes!

in the real world, the implication is immediate: if one of my programs only works with version 1.2 of a library, another only works with version 2.1, and the `GOVERNMENT AGENCY NAME REDACTED` sysad just installed version 1.0 (and *that* took two years), this will probably be a problem.

It would be nice if this problem were solved...

### virtual environments

"virtual environments" are ways of isolating out the contents (the files) of libraries you're installing.

this is something you've probably (*kind of*) done in `R` without knowing it. if you've ever tried installing a package but didn't have admin rights, the `R` interpreter prompts you to see if there's some other place you'd like to install things (usually in your home directory).

that is a system-level isolation of the files you want to install. When the interpreter is told to load a package, it looks first for your local copy to see if you have anything spicy, and then it checks for a global copy, and then it cries.

so, generalize that idea: let's make *multiple* separate environments (collections of files defining how our `python` code behaves).

we can generalize this beyond just "global" and "user" (as with `R`), even creating a separate environment for each process or code base.

on a very basic level, all we're doing here is re-installing packages into a special sub-directory somewhere on the machine, and then making sure (through environment variables like the `PATH` variable) that the `python` we run is the one that looks in that sub-directory to find them.

we're tricking `python` into doing the right thing. and `python` is cool about it; once it realizes it's been tricked it's not even mad or anything, it knows it was all a bit of a goof and what's more, we all actually really had a great time and made some good memories.
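the `PATH` trick can be sketched in a few lines. this example makes a throwaway directory with a fake executable in it (the command name `gu511_demo_tool` is invented for the demo) and shows that putting that directory at the front of `PATH` changes what a lookup finds. it assumes a unix-like system.

```python
import os
import shutil
import stat
import tempfile

TOOL = "gu511_demo_tool"  # hypothetical command name, invented for this demo

with tempfile.TemporaryDirectory() as env_bin:
    # create a tiny executable script inside our stand-in "environment" directory
    tool_path = os.path.join(env_bin, TOOL)
    with open(tool_path, "w") as f:
        f.write("#!/bin/sh\necho 'hello from the isolated copy'\n")
    os.chmod(tool_path, os.stat(tool_path).st_mode | stat.S_IXUSR)

    original_path = os.environ.get("PATH", "")

    # with the normal PATH, nothing is found
    before = shutil.which(TOOL, path=original_path)

    # with our directory placed first, the lookup finds our copy
    patched_path = env_bin + os.pathsep + original_path
    after = shutil.which(TOOL, path=patched_path)

print("before:", before)
print("after: ", after)
```

activating a virtual environment does essentially this for the directory that holds that environment's `python` executable.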

oftentimes finished `python` projects will ship with a `requirements.txt` file, which lists each `python` package which should be installed and the exact version that it was tested against. the expectation is that the project will be run on a system with those same packages and versions.

the "virtual environment" is an isolated set of packages that will meet that requirement.
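as a rough illustration of "meeting that requirement", the sketch below compares exact `name==version` pins against what is installed in the running environment, using the standard library's `importlib.metadata` (python 3.8+). the function name `check_pins` and the example pins are hypothetical, and anything that is not a simple `==` pin is skipped.

```python
from importlib import metadata


def check_pins(requirement_lines):
    """Compare `name==version` pins against the installed versions.

    Returns {name: (wanted, installed)}; installed is None if the package
    is missing. Blank lines, comments, and non-`==` lines are skipped.
    """
    report = {}
    for line in requirement_lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


# hypothetical pins, for illustration only
pins = [
    "# example requirements",
    "pandas==1.0.0",
    "definitely-not-a-real-package-xyz==0.1",
]
for name, (wanted, installed) in check_pins(pins).items():
    status = "ok" if wanted == installed else "MISMATCH"
    print(f"{name}: wanted {wanted}, installed {installed} -> {status}")
```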

the original way of creating a virtual environment was the python utility `virtualenv`, which is awesome and worth checking out. that being said, it's not what I'll recommend.

**<div align="center">what are your questions so far?</div>**

## generalizing virtual environments: `conda`

`conda` is the package and environment manager that comes with `anaconda`, a *distribution* of python. it takes the virtual environment concept above and adds a special wrinkle: while most virtual environment managers only allow you to install different versions of `python` *packages*, `conda` also allows you to install different versions of `python` *itself*.

this should help you deal with any `python2` vs. `python3` problems you may experience.

so, let's go ahead and install `conda`, create a virtual environment, and install something.

*note: I would recommend you install `conda` on both your laptop and your `ec2` instance, but we will **require** you to install it on your `ec2` instance (it's part of the homework), so you may want to use that instance to do all of this right now*

### installing `conda`

the full `anaconda` distribution, by default, comes with many of the most commonly downloaded `python` packages. This is great because it gives you a pretty solid working base without any modification, *BUT* given our time and bandwidth limits, I'm going to recommend you install the `miniconda` version (the bare bones) and install packages *as needed* instead of up front.

+ [`anaconda`](https://www.continuum.io/downloads): a big installation, which will take a few minutes, and pre-installs several of the "must haves" (and maybe more)
+ [`miniconda`](https://conda.io/miniconda.html): a bare-bones implementation of the above for the *discerning* gentleprogrammer

click on that `miniconda` link (https://conda.io/miniconda.html) and find the 64-bit linux installer. the installer url we will use is

https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

**<div align="center">mini exercise: everyone installs `conda`</div>**

```sh
cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# when prompted, we do the following:
# press ENTER to read the license
#     press `d` to scroll *d*own
# yes: approve the license
# ENTER: we are okay with this location
# yes: run conda init so that your PATH *always* includes conda
```

then log out and back in and run

```sh
rm ~/Miniconda3-latest-Linux-x86_64.sh
conda update conda
```

note: the download link for the miniconda bash script *could change*! update it by actually going to [the miniconda website](https://conda.io/miniconda.html)

+ go to [the miniconda website](https://conda.io/miniconda.html) to get the bash script name
    + we are looking at the 64-bit linux installer
+ download that bash script to your `ec2` server and run it

```sh
cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# when prompted, we do the following:
# press ENTER to read the license
#     press `d` to scroll *d*own
# yes: approve the license
# ENTER: we are okay with this location
# yes: allow your path to be updated to *always* include conda

# log out and back in first so the updated PATH takes effect, then:
conda update conda
```

recall that we previously called

```sh
which python3
```

and got `/usr/bin/python3`, and we also checked the file path to the `os` package (from within a `python` shell):

```python
import os
os.__file__
```

what do we get now, after installing `conda`?

*everything* the `conda` command creates or installs is put into one and only one directory. "uninstalling" `conda` is equivalent to simply deleting that directory.
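one way to answer "what do we get now?" from inside `python` is sketched below. it relies on a heuristic rather than an official api: `conda` keeps its package records in a `conda-meta` folder inside each environment's directory, so the presence of that folder under `sys.prefix` suggests the running interpreter belongs to `conda`. the function name `looks_like_conda` is made up for this example.

```python
import os
import sys


def looks_like_conda(prefix=sys.prefix):
    """Heuristic: conda-managed environments contain a `conda-meta` directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


print("interpreter:  ", sys.executable)
print("os module:    ", os.__file__)
print("conda-managed:", looks_like_conda())
```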

take a step back and think about the **python environments** you have now:

1. our vanilla `ubuntu` `python` installation (came with the `ec2` instance)
1. this new `anaconda`-created environment
    + this environment is called the `anaconda` `base` environment
    
try the command

```sh
conda env list
```
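if you ever need that list inside a script, `conda env list` accepts a `--json` flag. the sketch below assumes the json output has an `envs` key holding the environment paths, and simply returns an empty list when `conda` is not on the `PATH`. the function name `conda_envs` is mine.

```python
import json
import shutil
import subprocess


def conda_envs():
    """Return the environment directories conda knows about ([] if conda is missing)."""
    if shutil.which("conda") is None:
        return []
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    # assumption: the output is a json object with an "envs" list of paths
    return json.loads(result.stdout).get("envs", [])


for path in conda_envs():
    print(path)
```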

why stop at only two environments?

we can use the `conda` command to *create* new environments as well. let's try that right now:

```sh
conda create -n l33tmode python=3
```

this will use `conda` to create a new environment named "`l33tmode`" with `python` version 3 installed.

`conda create` creates a new environment inside of a new folder under the `envs` sub-directory in that main `conda` directory, and installs all of our required packages there.

as the little dialog will state after you create the environment, you have to "activate" that environment if you want to use it. You have to do this any time you want to use a virtual environment.

what we're *actually* doing here is updating the `PATH` environment variable to "point" `python` to our newly created set of files. Now, when we wish to use `python`, we will be using our specialized, isolated versions.

So let's do that:

```sh
conda activate l33tmode
```

This should have made our terminal prompt 10 times l33t3r. To verify that we're now looking at different files:

```sh
which python3
```
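the same check can be made from `python`. `conda activate` sets the environment variables `CONDA_DEFAULT_ENV` (the environment's name) and `CONDA_PREFIX` (its directory), so a small sketch like this one can report which environment is active and whether the running interpreter actually lives in it:

```python
import os
import sys

# variables that `conda activate` sets in the shell (unset if nothing is active)
active_name = os.environ.get("CONDA_DEFAULT_ENV")
active_prefix = os.environ.get("CONDA_PREFIX")

if active_prefix is None:
    print("no conda environment appears to be active")
else:
    # compare the running interpreter's prefix to the activated environment
    same = os.path.realpath(sys.prefix) == os.path.realpath(active_prefix)
    print(f"active environment: {active_name} ({active_prefix})")
    print("this interpreter lives in it:", same)
```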

and now, let's install something fun:

```sh
conda install ipython pandas
```

and then try it out

```sh
ipython
```

this should open a fancier python interpreter (`ipython`). inside, run

```python
import pandas as pd

pd.__version__
```

## freezing and sharing environments

one of the purposes of working with a `python` environment manager like `conda` is to let us install whatever we want. the *reason* we want that ability is so we can make sure our code behaves the same way no matter what computer we run it on

if we want to do that, we need to be able to

+ **specify** what our environment is when our code is working, and
+ **recreate** that environment in other places

`conda` can help us do both of these things easily

### specify and recreate with `conda env export`

there are two ways to specify the contents of a `conda` environment. first, we can do it in a `conda`-specific way:

```sh
# create an environment yaml file
conda env export > environment.yml

# look at the contents
cat environment.yml
```

this `environment.yml` file can be sent to other users or re-used by you on future `ec2` instances to create a new but completely identical environment:

```sh
conda env create -f environment.yml
```

*note: this will depend on the OS, so you will need to make tweaks if you are sharing between e.g. linux and mac OS environments*

### specify and recreate with `conda list -e`

a nearly equivalent option for doing the above is to run

```sh
# create an environments txt file
conda list -e > spec-file.txt

# look at the contents
cat spec-file.txt
```

you can now create a new environment from this file with the command

```sh
conda create --name myenv --file spec-file.txt
```

*note: this will also depend on the OS*

the differences between these two are minor: basically,

+ the `environment.yml` file hard-codes the name of the environment whereas the `spec-file.txt` doesn't
+ the `environment.yml` file includes non-`conda` packages installed via `pip` and `conda` channel information, whereas the `spec-file.txt` doesn't (at least by default)

beyond that, they're basically interchangeable
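to get a feel for the `spec-file.txt` format, here is a rough sketch that reads `conda list -e` style lines (`name=version=build`, with `#` comment lines) and rewrites them as `name==version` pins. the function name `spec_to_pins` and the example lines are made up, and the result is only an approximation of a `requirements.txt`: `conda` package names don't always match the names `pip` uses, and some `conda` packages (like `python` itself) aren't `pip` packages at all.

```python
def spec_to_pins(spec_lines):
    """Turn `name=version=build` lines into `name==version` strings."""
    pins = []
    for line in spec_lines:
        line = line.strip()
        # skip blank lines and the `#` header comments conda writes
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            pins.append(f"{parts[0]}=={parts[1]}")
    return pins


# hypothetical spec-file contents, for illustration only
example = [
    "# platform: linux-64",
    "pandas=1.0.3=py38h0573a6f_0",
    "python=3.8.2=hcf32534_0",
]
print(spec_to_pins(example))
```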

### specify and recreate with `pip freeze`

the `environment.yml` file you create above can be read by `conda`, but not by other `python` virtual environment or package managers. there is a format for specifying packages to install that is much more broadly recognized in the `python` world -- a `requirements.txt` file. this is the sort of file you could use to install all packages using the basic `pip` package installer, for example.

to create a `requirements.txt` file, you can simply execute

```sh
pip freeze > requirements.txt

# look at the contents
cat requirements.txt
```

you can use this on any system which has `pip` installed to install the listed packages into the active environment with

```sh
pip install -r requirements.txt
```
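for the curious, the heart of what `pip freeze` reports can be approximated with the standard library's `importlib.metadata` (python 3.8+). this is only a sketch, not a replacement: the real command handles details such as editable installs that this ignores. the function name `freeze` is mine.

```python
from importlib import metadata


def freeze():
    """Return sorted `name==version` strings for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for pin in freeze():
    print(pin)
```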

<div align="center"><img src="https://i.ytimg.com/vi/BX1EIlwtQvU/maxresdefault.jpg" width="800px"></div>

# END OF LECTURE

next lecture: [environment management pt. 2: `docker`](006_environments_2_docker.ipynb)