# Python Environments

Welcome to the "Python Environments" unit! In this unit, we will cover:
- Environments 101
- Environment Managers for Python
- Collaboration with Environments

In [None]:
import pandas as pd
import numpy as np

## Environments 101

Imagine the following scenarios:

1. You just developed an awesome model, but when you send it to your colleague, it crashes
2. You receive a brand new laptop, and it takes you 2 days to configure your Python to work as before
3. You need to develop locally and deploy your application somewhere else

What do they all have in common? That's right, **they could all be solved by using environments**.

<div class="alert alert-info">
    &#9432; <strong>Environments</strong> are collections of packages, dependencies, libraries, and all the software you need to run code. 
</div>

Applications will often use packages that don't come as part of the standard library (i.e. vanilla Python), and they sometimes require a specific version of a library. This means it may not be possible for one Python installation to meet the requirements of every application. Also, packages evolve over time, and their changes may break your code if you are not careful.

Seems complicated? Let's look at a real-life example.

### Natural Environments


<p float="left">
    <img src=media/beach-water.jpg alt="Photo by Mink Mingle on Unsplash" width=200>
    <img src=media/forest.jpg alt="Photo by Federico Bottos on Unsplash" width=200>
</p>

Let's say we have two natural environments: a beach and a forest, that are far apart. The beach has water, sand, seashells; while the forest has plenty of wood, leaves and grass available.

**Environments are great because we know what surrounds us**, what we can (and cannot) use. 
We can build a nice sand sculpture on the beach, using the water and the sand. And we can build a nice picnic table in the forest, using the wood.
But we cannot craft boats on the beach, because we don't have the wood to do it. Nor can we craft seashell necklaces in the forest, because we lack the specific resources in that environment. 

So, we need to move stuff around. We can plant trees near the beach to have the wood for the canoes. And we can move some seashells back to the forest to decorate our picnic table.

The wood for the canoe is a **dependency**: you need (i.e. depend on) wood to build your boat. Without wood, there is no canoe.

<img src=media/beach-trees.jpg title="Photo by Matthew Brodeur on Unsplash" width=300 />

Something can have more than one dependency: if instead of a canoe we want to build a sailboat, we will need tree leaves in addition to the wood to make the sails.

**Different environments are useful for different stuff**: a boat is awesome to have on a beach but it is useless in the forest.

### Coding Environments

Now let's go back to the code. 

Some packages work as useful resources: we'll need scikit-learn to build ML models, pandas to handle data, etc.

In [None]:
# Typical DS import
import pandas as pd
from sklearn.linear_model import LinearRegression

Instead of the natural environments, we can create our own **development environments**: virtual habitats where we can do stuff and build applications **IF** we have the resources to do so.


<div class="alert alert-info">
    &#9432; An <strong>environment</strong> should be self-contained: everything inside it depends only on things that are within the same environment.
</div>

<img src=media/envsAB.png width=400>

Just like in the natural environments, resources (i.e. packages) that exist only in one environment will not (and should not) be available outside that environment's scope. We can run `pandas` and `scikit-learn` code in `EnvironmentA` but we cannot plot graphs with `altair`, because this package is only available in `EnvironmentB`.

The usefulness will be clearer in later stages. For now, just remember: **everything to its environment**.
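
As a minimal sketch of what "available in this environment" means in practice, the cell below asks Python whether a module can be found by the interpreter that is currently running. The helper name `is_available` is hypothetical (it is not part of any library), and note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec


def is_available(module_name: str) -> bool:
    """Return True if the top-level module can be imported in the current environment."""
    return find_spec(module_name) is not None


# Module names for the packages shown in the EnvironmentA / EnvironmentB picture
for name in ["pandas", "sklearn", "altair"]:
    status = "available" if is_available(name) else "missing"
    print(f"{name}: {status}")
```

The result depends entirely on which environment runs the cell, which is the whole point: the same code can report different packages in different environments.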

## Environment Managers for Python



We leverage **Environment Managers** to handle the heavy lifting of dealing with virtual environments. They allow us to:

1. Install, upgrade, downgrade and delete **packages**;
2. Resolve **dependencies**, i.e. determine which version of each dependency to install;
3. **Reproduce** environments, and run projects with different dependencies on the same machine.

We will focus on [Conda](https://docs.conda.io/en/latest/), an open source environment management system that runs on multiple operating systems (Windows, macOS, Linux) for any language (e.g. Python, R, Julia).

For completeness, there are other alternatives you can also explore on your own like [venv](https://docs.python.org/3/library/venv.html), [Virtualenv](https://virtualenv.pypa.io/en/latest/), [Pipenv](https://pypi.org/project/pipenv/) or [Poetry](https://python-poetry.org/docs/).

Since notebooks are not very useful for running the next set of commands, **let's switch to a terminal**! 

You should have something available like the screen below, a command line interface where you can run text commands.

![terminal](media/terminal.png)

### Verify installation

Make sure conda is installed and running. You should get the version number of the installed conda, like "conda 4.12.0".

```
$ conda --version
```


![conda version](media/conda-version.png)
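
If you prefer to run this check from Python (for example, from a notebook cell), the sketch below does roughly the same thing. It assumes the `conda` executable is on your `PATH`; the function name `conda_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def conda_version() -> Optional[str]:
    """Return the output of `conda --version`, or None if conda is not on PATH."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None
    result = subprocess.run(
        [conda_path, "--version"], capture_output=True, text=True, check=False
    )
    # Some older conda releases printed the version to stderr instead of stdout
    return (result.stdout or result.stderr).strip() or None


print(conda_version())
```

A result of `None` only means conda was not found on the `PATH` of the process running the code. It may still be installed but not initialised for that shell.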

### Create an environment

The first step is to establish a new environment. Separate environments contain files, packages and their dependencies that don't interact with the other environments.

```
$ conda create --name awesome python=3.8
```

Notice we **explicitly specified which Python we want** (3.8), despite the fact that **newer versions already exist** (3.10). This is very useful if you want to develop something to run on other machines, which typically run older and more stable (production-grade) versions than the laptop you use for experiments.
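
Once the environment is active, a quick way to confirm the pin took effect is to compare the running interpreter against the expected version. This is only a sketch: the `EXPECTED` constant and the `matches_pin` helper are hypothetical names, and the check looks at the major and minor numbers only, which is what `python=3.8` pins.

```python
import sys

EXPECTED = (3, 8)  # assumption: mirrors the `python=3.8` pin used above


def matches_pin(version_info, expected=EXPECTED) -> bool:
    """Compare only the (major, minor) part of a version tuple against the pin."""
    return tuple(version_info[:2]) == tuple(expected)


print("Running Python", sys.version.split()[0])
print("Matches the pin:", matches_pin(sys.version_info))
```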

Let's check everything went well and that our environment is ready to use.

We can list all the environments available and see if our *awesome* environment exists.

`$ conda info --envs`

![](media/conda-envs.png)

Notice we have 3 environments available: the conda default `base`, `awesome` for this specific unit and `data-academy` for the whole set of workshops.

Now that we have our environment installed, we should `activate` it. 

`$ conda activate awesome`

This is essentially saying to conda *"Jump into my awesome virtual environment so that every action that impacts my environment doesn't affect the rest of the world!"*

# <span style="color:red"> "WHICH PYTHON" DID NOT WORK....??</span>

After you activate the environment, you will see the active environment displayed in (parentheses).

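To be extra sure, you can check which Python you are using with `$ which python` (on Windows, where `which` is usually not available, try `where python` instead). Notice that our Python comes from the `awesome` folder under the conda environments folder.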

![conda activate](media/conda-activate.png)
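
A shell-independent way to run the same check is to ask Python itself, which works the same on Windows, macOS and Linux. The sketch below assumes you start Python from the terminal where the environment is active. The helper name `describe_interpreter` is hypothetical, while `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation.

```python
import os
import sys


def describe_interpreter() -> dict:
    """Collect where the running Python lives and which conda env is active."""
    return {
        # Full path of the interpreter executing this code
        "executable": sys.executable,
        # Root folder of the environment that interpreter belongs to
        "prefix": sys.prefix,
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

With `awesome` active, both paths should point inside the `awesome` folder and `conda_env` should read `awesome`.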

Remember that when you are done with your environment (no más!), it's good practice to `deactivate` it.

`$ conda deactivate`

![conda deactivate](media/conda-deactivate.png)

Notice how the (parentheses) disappear, which means the `awesome` environment is no longer active.

### Installing a package

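Let's start developing our data application. Like any good data scientist, we will use `pandas` for data manipulation and `scikit-learn` for general machine learning. So far, our `awesome` environment only contains Python itself (that is all we asked for when creating it), so we need to install these packages. Make sure the environment is active first.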

`$ conda install pandas scikit-learn`


# <span style="color:red"> WHY DO I HAVE TO INSTALL PANDAS AND SCIKIT-LEARN IF I ALREADY INSTALLED THEM WHEN I DEFINED THE ENVIRONMENT?....??</span>

![](media/conda-install.png)

After that, we can check if everything went well and import our packages.

![pd version](media/pd-version.png)
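
The same check can be done without importing the packages, by reading the installed metadata. This is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution names as used by the installers (not the import names)
for dist in ["pandas", "scikit-learn"]:
    print(dist, "->", installed_version(dist) or "not installed")
```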

### Create from `environment.yml`

An awesome feature of conda is that we can specify everything we need in a single [YAML](https://en.wikipedia.org/wiki/YAML) file, called `environment.yml`. This file is easy to share and maintain at the top level of the repository (just like the one we are using!) to let everyone know what they'll need to run the application.

It is a good practice to have a single environment file per repository, that contains all the necessary dependencies to run your code. If you have multiple environments, it is also feasible to have a `prod.yml` and a `dev.yml` for the production and development machines, respectively.

```yaml
# environment.yml
name: awesome
dependencies:
  - python=3.8
  - pandas
  - scikit-learn
```

After we specify everything we need, we can create the environment with:

`$ conda env create -f environment.yml`

## More on Conda

### conda base

The default environment you get when you install conda is called `base`, and it works just like any other conda environment.

<div class="alert alert-warning">
    ⚠️ <strong>Warning:</strong> Don't put all your programs in the base environment!
</div>

Think of `base` like your home ground: it's the place you start from, from where you build. Anytime you mess up and need to restart, you return to `base`. **So keep your house clean to guarantee you can return safely.**

Anytime you want to develop, you should move to some other environment (creating a new one if needed).

### conda install vs pip

A lot of Python programmers have used `pip`, the Package Installer for Python, at some point in their lives. While `pip` and `conda` seem similar, they have key differences that are worth mentioning:

1. `conda` is language agnostic, while `pip` supports only Python libraries
2. `conda` installs Python as a package and typically includes `pip` as just another package
3. `pip` installs from [PyPI](https://pypi.org/), `conda` supports multiple channels hosted on [Anaconda](https://anaconda.org/)
4. `conda` allows for portability, reproducibility and consistent configurations with a single `environment.yml`
5. `pip` allows for traceability with a `requirements.txt`

So, it is preferable that we use **`conda` first** for all the Python and non-Python dependencies we need (e.g. `gcc`). But because not all packages may be available in `conda`, `pip` can be used as a last resort (and only as a last resort!). The **workflow** should follow something like this:

1. Create a conda environment
2. `conda install` as many packages as you can
3. If a package is not found in the default conda channel, search alternative channels (e.g. conda-forge)
4. If you REALLY need to, `pip install` the rest of the dependencies
5. Track the dependencies using an `environment.yml` for conda and `requirements.txt` for pip
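
One alternative to keeping a separate `requirements.txt` is the nested `pip:` section that conda environment files support, which keeps both lists in one place. The sketch below renders such a file as plain text. `render_environment_yml` is a hypothetical helper, and `some-pip-only-package` is a placeholder name rather than a real recommendation.

```python
from typing import List


def render_environment_yml(name: str, conda_deps: List[str], pip_deps: List[str]) -> str:
    """Build the text of an environment.yml with an optional nested pip section."""
    lines = [f"name: {name}", "dependencies:"]
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        # Listing pip explicitly as a conda dependency is recommended
        # whenever the file contains a nested pip section
        if "pip" not in conda_deps:
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"      - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


print(render_environment_yml(
    name="awesome",
    conda_deps=["python=3.8", "pandas", "scikit-learn"],
    pip_deps=["some-pip-only-package"],  # hypothetical package name
))
```

Writing the returned text to `environment.yml` gives a file that `conda env create -f environment.yml` can consume, with the conda packages installed first and the pip ones afterwards.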


### Workflow

As you can imagine, coding in Python with other people and without environments is AWESOME! (You got the irony, right?!)

Seasoned Python developers are full of nightmares about broken dependencies, code that works locally but breaks on another machine, and all those sweet examples that can cost you long hours of work (and years of life in stress). There is no need for that, so let's talk about how we can implement simple workflows.

<img src=media/dont-need-envs.png alt="you need environments" width=300>

A first requirement is that **everyone can install conda environments** on their machine, and on the machine where you want to deploy your application. Conda is far from the perfect manager for production-grade deployments, but it's a start.

Now that we have the environment manager ready, let's think about how to start developing. There are important questions you should ask yourself before writing even a line of code.

- Are you going to build on top of code from others? **Ask for the requirements.**
- Do you have any **restrictions** about the machine where you are deploying the code?
- What is the meaning of life? 


If conda cannot be installed on the target machine, we can still use **conda** locally to **replicate the environment of that machine** and (fingers crossed) hope that the code we develop in that replicated environment will also work directly on our target machine.

> `conda` doesn't work only as a creator, but also as a replicator of environments
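
To replicate a machine's environment, you first need to know what is installed there. As a rough sketch, assuming Python can be run on the target machine, the cell below lists every installed distribution with its exact version. `snapshot_packages` is a hypothetical helper, and the output uses pip's `name==version` pin format (conda pins use a single `=`).

```python
from importlib.metadata import distributions


def snapshot_packages():
    """Return sorted 'name==version' strings for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Show the first few pins; the full list can be saved and used as a reference
for pin in snapshot_packages()[:10]:
    print(pin)
```

This only captures Python packages visible to the interpreter that runs it. System libraries and the Python version itself still have to be matched separately.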

## Recap
Congratulations! You made it all the way through the "Python Environments" unit! Right now, you should have a clear idea of:

1. What **environments** are, virtual or real
2. Why **versions** are important
3. How to develop with **`conda`** and its caveats
4. **Collaborating** with others using environments