# Pandas, Conda and Jupyter

* [Short story of these tools](#Short-story-of-these-tools)
* [How to use them together](#How-to-use-them-together)
* [How to use conda to manage your envs](#How-to-use-conda-to-manage-your-envs)

## Short story of these tools

### [Pandas](https://pandas.pydata.org)

Pandas is a Python library started by Wes McKinney in 2008.

The library was designed specifically to work with time series, mostly because it was built to analyze financial data.

Today Pandas is the de facto standard for working with data in Python.

If you have some experience with **R**, you will feel at home: Pandas brings the *dataframe* experience to Python.

### [Conda](https://www.conda.org)

Conda is a package manager developed by Anaconda Inc. (formerly Continuum Analytics).
Conda is the company's attempt to solve some of the issues you run into with Python package managers, specifically with **pip** (do you know what PIP stands for?).

It is not limited to Python and Python libraries (it can also install non-Python software), and it comes with an environment manager, similar to Python's virtual environments.

On Anaconda's website you will find two ways to download `conda`:
- Anaconda
- Miniconda

Anaconda, apart from being the current name of the company, is also the name of a software distribution from Anaconda Inc. It contains conda, Python and a huge collection of libraries used for analytics and research.

Miniconda, on the other hand, is just conda plus a Python interpreter, so it is much leaner.

Conda is the de facto standard when you are working with Python and data: it makes the painful routine of dealing with dependencies much easier, especially when they are not just Python packages.

### [Jupyter](https://www.jupyter.org)

Jupyter is the new name of what was once called the IPython Notebook.
The name has two purposes:
- to underline the strong connection with science, and with the first `notebook`, written by Galileo Galilei, in which data and analysis are put together: in that first `notebook` Galileo was studying Jupiter
- to highlight the strong connection with three languages: Julia, Python and R

Jupyter today is a big collection of projects, including Jupyter Notebook, JupyterHub, the new JupyterLab and many others.

Jupyter Notebook brings an enhanced REPL experience to your browser: you type Python commands in a web page and Jupyter Notebook takes care of executing them.
Under the hood this involves three pieces: the Jupyter Notebook server, a communication protocol and a kernel (generally IPython) that actually runs the code.

![jupyter architecture](../images/notebook_components.png)

## How to use them together

Let's reformulate this question a little bit: **Why should you use them?**

You can use Python with an editor, install the dependencies using pip, an exe installer or apt.

**And Excel is still widely used... so?**

Well, the Stack Overflow blog published some very interesting posts about the growth of Python:

* [The Incredible Growth of Python](https://stackoverflow.blog/2017/09/06/incredible-growth-python/)
* [Why is Python Growing So Quickly?](https://stackoverflow.blog/2017/09/14/python-growing-quickly/)
* [A Tale of Two Industries: How Programming Languages Differ Between Wealthy and Developing Countries](https://stackoverflow.blog/2017/08/29/tale-two-industries-programming-languages-differ-wealthy-developing-countries/)

The main point is that Python is growing and every day more companies add Python to their stack.

## How to use conda to manage your envs

We want a clean environment, so we are going to use Miniconda. First, check that conda is installed and look at the commands it offers:

```bash
conda --help
```
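Before going further it can help to check that `conda` is actually on your `PATH` (after installing Miniconda you may need to open a new terminal). A minimal sketch; `have` is a hypothetical helper name, not a conda command:

```bash
#!/usr/bin/env bash
# have: hypothetical helper, succeeds if the given command is found on the PATH
have() {
  command -v "$1" >/dev/null 2>&1
}

if have conda; then
  conda --version    # prints the installed conda version
else
  echo "conda not found on PATH: install Miniconda first" >&2
fi
```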

```bash
# create a new env called `pandas-on-jupyter` based on Python 3
conda create -n pandas-on-jupyter python=3
```

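```bash
# to activate the env (conda >= 4.4; older versions used `source activate pandas-on-jupyter`)
conda activate pandas-on-jupyter
```

```bash
# to deactivate the env (older conda versions used `source deactivate`)
conda deactivate
```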
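To check which env is active, you can look at the `CONDA_DEFAULT_ENV` environment variable, which conda sets on activation. A small sketch; `current_env` is a hypothetical helper name. With recent conda versions, deactivating an env may drop you back into `base` rather than into no env at all:

```bash
#!/usr/bin/env bash
# current_env: hypothetical helper that prints the active conda env name,
# read from CONDA_DEFAULT_ENV (set by conda on activation), or "none" if unset
current_env() {
  echo "${CONDA_DEFAULT_ENV:-none}"
}

current_env   # e.g. "pandas-on-jupyter" after activating it
```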

If you try to start Jupyter now, it fails, because the new env does not contain Jupyter yet:

```bash
$ jupyter notebook
Error executing Jupyter command 'notebook': [Errno 2] No such file or directory
```
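This error comes from the way the `jupyter` command works: `jupyter notebook` looks for a separate `jupyter-notebook` executable on your `PATH`. The `jupyter` launcher that was found (possibly one from another env or a system install) could not find it, because the fresh env does not have one. A sketch of that check, with `has_jupyter_subcommand` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# has_jupyter_subcommand: hypothetical helper. `jupyter <sub>` runs an
# executable named `jupyter-<sub>`, so we look for that name on the PATH.
has_jupyter_subcommand() {
  command -v "jupyter-$1" >/dev/null 2>&1
}

if has_jupyter_subcommand notebook; then
  echo "jupyter notebook is available"
else
  echo "jupyter-notebook not found: install jupyter in this env"
fi
```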

```bash
# install jupyter in your current env
conda install -c conda-forge jupyter
```

```bash
# add more packages
conda install -yc conda-forge numpy pandas scikit-learn matplotlib
```
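A quick way to check that the packages actually landed in the active env is to try importing them. This is a sketch: `check_modules` is a hypothetical helper, and the `PYTHON` variable lets you override the interpreter (it defaults to `python`). Note that scikit-learn is imported as `sklearn`:

```bash
#!/usr/bin/env bash
# check_modules: hypothetical helper that tries to import each Python module
# with the env's interpreter and reports "ok" or "missing" for each one.
check_modules() {
  local py="${PYTHON:-python}"
  local mod
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "$mod: ok"
    else
      echo "$mod: missing"
    fi
  done
}

# scikit-learn's import name is `sklearn`
check_modules numpy pandas sklearn matplotlib
```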

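Check which packages are installed: `conda list` shows everything in the active env (including packages installed with pip), while `pip list` shows the Python packages as seen by pip.

```bash
conda list
```

```bash
pip list
```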

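`conda info` summarizes your setup. This is an example output from a macOS machine running conda 4.3.21:

```bash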
$ conda info
               platform : osx-64
          conda version : 4.3.21
       conda is private : False
      conda-env version : 4.3.21
    conda-build version : not installed
         python version : 3.6.0.final.0
       requests version : 2.12.4
       root environment : /Users/christianbarra/miniconda3  (writable)
    default environment : /Users/christianbarra/miniconda3/envs/pandas-on-jupyter
       envs directories : /Users/christianbarra/miniconda3/envs
                          /Users/christianbarra/.conda/envs
          package cache : /Users/christianbarra/miniconda3/pkgs
                          /Users/christianbarra/.conda/pkgs
           channel URLs : https://repo.continuum.io/pkgs/free/osx-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/r/osx-64
                          https://repo.continuum.io/pkgs/r/noarch
                          https://repo.continuum.io/pkgs/pro/osx-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : None
             netrc file : /Users/christianbarra/.netrc
           offline mode : False
             user-agent : conda/4.3.21 requests/2.12.4 CPython/3.6.0 Darwin/16.7.0 OSX/10.12.6
                UID:GID : 501:20
```
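The `platform` line in this output is worth noticing, because it matters when you share an env. If you want to pick a single field out of this output in a script, a small awk filter is enough. `info_field` is a hypothetical helper; conda also offers `conda info --json` for machine-readable output:

```bash
#!/usr/bin/env bash
# info_field: hypothetical helper that reads `conda info`-style "key : value"
# lines from stdin and prints the value for the requested key.
info_field() {
  awk -F' : ' -v key="$1" '
    { k = $1; gsub(/^ +| +$/, "", k) }          # trim spaces around the key
    k == key { v = $2; gsub(/^ +| +$/, "", v); print v; exit }
  '
}

# Usage: conda info | info_field platform   # prints e.g. osx-64
```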

### How to share your env?

With conda this is very easy, but there is a general pitfall: **the exported files are specific to the platform you created them on** (for example `osx-64` vs `linux-64`).

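```bash
# export the active env (name, channels and exact dependencies) as YAML;
# recreate it later with `conda env create -f environment.yml`
conda env export > environment.yml
```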

```bash
# save the installed packages, one `name=version=build` per line, to a spec file
conda list -e > spec-file.txt
cat spec-file.txt
```
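Because of the platform pitfall, check the `# platform:` header that `conda list -e` writes at the top of the spec file before you recreate an env on another machine. A minimal sketch; `spec_platform` and `spec_packages` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# spec_platform: hypothetical helper, prints the platform recorded in the
# "# platform: ..." header line of a spec file written by `conda list -e`.
spec_platform() {
  sed -n 's/^# platform: //p' "$1"
}

# spec_packages: hypothetical helper, prints only the package names,
# skipping comment/blank lines and dropping the "=version=build" suffix.
spec_packages() {
  grep -v -e '^#' -e '^$' "$1" | cut -d= -f1
}

# Usage:
#   spec_platform spec-file.txt    # e.g. osx-64; compare with `conda info`
#   spec_packages spec-file.txt
```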

```bash
conda deactivate  # `source deactivate` on conda < 4.4
```

Get a list of all our envs:

```bash
conda env list
```

Delete an env:

```bash
conda env remove -n pandas-on-jupyter
```
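`conda env remove` complains if the env does not exist, so in a script you may want to check `conda env list` first. This sketch assumes the usual layout of that output: comment lines starting with `#`, then one env per line with its name in the first column. `env_listed` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# env_listed: hypothetical helper. Reads `conda env list` output on stdin and
# succeeds if an env with the given name appears in the first column.
env_listed() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage:
#   if conda env list | env_listed pandas-on-jupyter; then
#     conda env remove -n pandas-on-jupyter
#   fi
```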

Recreate the env from scratch, using `spec-file.txt`:

```bash
conda create --name pandas-on-jupyter --file spec-file.txt
```

Ready to move to the next chapter!