# Jupyter Notebooks

## Major features
* Online environment for running snippets
* Combine hypertext, code and charts on the same page
* Perfect for sharing snippets to teach something to others
* No need for a local environment with all the languages and libs installed
 
## Language support

Multiple languages are supported through the concept of kernels: interpreters that execute tiny scripts one by one, on demand, while maintaining a runtime environment. Basically a REPL that's called from the web UI. The list currently includes:
* Python
* R
* F# (on Azure Notebooks)
* Julia/Scala/etc.
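
To make the kernel idea concrete, here is a minimal sketch of a REPL-like object that runs snippets one by one while keeping state between them. `ToyKernel` is a made-up name for illustration and is not part of Jupyter; real kernels run as a separate process and talk to the web UI over a messaging protocol.

```python
import contextlib
import io


class ToyKernel:
    """Hypothetical stand-in for a kernel: one shared namespace, many snippets."""

    def __init__(self):
        # This dict plays the role of the kernel's runtime environment.
        self.namespace = {}

    def execute(self, source):
        """Run one snippet and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return buffer.getvalue()


kernel = ToyKernel()
kernel.execute("x = 21")                  # first "cell" defines a variable
output = kernel.execute("print(x * 2)")   # second "cell" still sees it
print(output)
```

The second snippet can use `x` only because both snippets share the same namespace, which is what "maintaining a runtime environment" means in practice.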

## Data Science / Machine Learning

Python and R are also popular for data science and machine learning, so people made sure they integrate well with Jupyter Notebooks. This means that many objects render nicely on the Notebook UI:

* Pandas DataFrames are rendered as tables
* matplotlib charts are rendered as inline pictures
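
The nice rendering relies on a convention: when the result of a cell has a `_repr_html_` method, the notebook displays the HTML it returns instead of the plain text representation. The sketch below calls that method directly on a small DataFrame, and also defines a toy class (`Temperature`, a name invented for this example) that opts into the same mechanism.

```python
import pandas as pd


class Temperature:
    """Toy class (hypothetical) that renders itself as HTML in a notebook."""

    def __init__(self, celsius):
        self.celsius = celsius

    def _repr_html_(self):
        # The notebook UI shows this markup instead of the default repr.
        return f"<b>{self.celsius:.1f} &deg;C</b>"


frame = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 27]})

# This is the markup the notebook would render as a table.
html = frame._repr_html_()
print(html[:60])

print(Temperature(21.5)._repr_html_())
```

In a notebook you would simply end a cell with `frame` or `Temperature(21.5)`; calling `_repr_html_()` by hand is only done here to show what happens behind the scenes.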

# Scientific Python

For machine learning, 3 types of libraries always pop up:
* Data Analysis: These are libraries that can load data from various sources, do various transformations, and compute basic statistics. Best Python example: [Pandas](http://pandas.pydata.org/)
* Machine Learning: These libraries implement machine learning algorithms. Best Python example: [scikit-learn](http://scikit-learn.org/)
* Charting: Render various graphs and plots of our data. Best Python example: [matplotlib](http://matplotlib.org/)



# Example 1: Titanic dataset

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
# Make charts a bit prettier
plt.style.use('ggplot')

In [None]:
titanic = pd.read_csv('https://dl.dropboxusercontent.com/u/116126/datasets/titanic/train.csv', sep = ',')

In [None]:
# What are the dimensions
titanic.shape

In [None]:
# What are the column names
titanic.columns

In [None]:
# What do the first few rows look like
titanic.head()

In [None]:
# Let's clean up the data a bit
city_names =  {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"} 
titanic["EmbarkedCode"] = titanic["Embarked"]
titanic["Embarked"] = titanic["EmbarkedCode"].apply(lambda value: city_names.get(value)) 
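
To see what this lookup does to values that have no entry in the dictionary, here is a small sketch on made-up data (the `codes` series is invented for the example, it is not taken from the dataset). It also shows `Series.map`, which accepts the dictionary directly and may be a slightly shorter way to write the same step.

```python
import pandas as pd

city_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Toy data: the last two entries have no matching key in city_names.
codes = pd.Series(["C", "S", "Q", "X", None])

# Same approach as the cell above: look up each value with dict.get.
via_apply = codes.apply(lambda value: city_names.get(value))

# Alternative: let pandas do the lookup; unknown codes become missing values.
via_map = codes.map(city_names)

print(via_map.tolist()[:3])
print(via_map.isna().tolist())
```

Either way, unknown or missing codes end up as missing values rather than raising an error, so it is worth checking for them afterwards with `isna()`.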

In [None]:
# Check if it worked
titanic.head()

In [None]:
# Tell matplotlib to render graphs inside this notebook
%matplotlib inline

In [None]:
# Let's create a contingency table
pd.crosstab(titanic.Pclass, titanic.Survived, margins = True) 

In [None]:
# Let's do the same, but as fractions of all passengers
pd.crosstab(titanic.Pclass, titanic.Survived, margins = True) / len(titanic)
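
Because the real table depends on the downloaded file, here is the same idea on six made-up passengers (the `toy` frame is invented for this example), small enough to check by hand. The last step uses `normalize="index"`, an option of `pd.crosstab` that is not used above, to get the survival rate within each class instead of fractions of the whole.

```python
import pandas as pd

# Six invented passengers, just enough to verify the numbers by eye.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Counts per (class, survived) pair, with "All" totals in the margins.
counts = pd.crosstab(toy.Pclass, toy.Survived, margins=True)

# Fractions of all passengers: divide every cell by the number of rows.
fractions = counts / len(toy)

# Fractions within each class: every row sums to 1.
per_class = pd.crosstab(toy.Pclass, toy.Survived, normalize="index")

print(counts)
print(fractions)
print(per_class)
```

Dividing by the total answers "what share of all passengers were third-class non-survivors?", while the per-class version answers "what share of third-class passengers survived?", which is usually the more interesting question.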

In [None]:
# Let's create a stacked bar chart for sex vs. survivability 
titanic.groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"].plot(kind="bar", stacked=True)


In [None]:
# Do the same graph, but only for people aged 18 or older
titanic[titanic.Age >= 18].groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"].plot(kind="bar", stacked=True)
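
The `groupby(...).count().unstack(...)` chain is doing the reshaping that the stacked bar chart needs: one row per sex, one column per survival outcome. The sketch below runs it on five invented passengers (the `toy` frame is an assumption for illustration) and, as a possible alternative, uses `size()` with `fill_value=0`.

```python
import pandas as pd

# Five invented passengers; note that no female passenger has Survived == 0.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Sex":         ["female", "female", "male", "male", "male"],
    "Survived":    [1, 1, 0, 0, 1],
})

# Same chain as above, without the plot: count rows per (Sex, Survived) pair,
# then pivot the Survived level into columns.
wide = toy.groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"]

# Alternative: size() counts rows directly, and fill_value replaces the
# missing (female, 0) combination with 0 instead of NaN.
by_size = toy.groupby(["Sex", "Survived"]).size().unstack("Survived", fill_value=0)

print(wide)
print(by_size)
```

One difference to keep in mind: `count()` counts non-null values per column, so picking a column with missing values would undercount, whereas `size()` always counts rows.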

# Example 2: Video Game Sales

In [None]:
games = pd.read_csv("https://dl.dropboxusercontent.com/u/116126/datasets/videogames/data.csv", sep = ",")

In [None]:
games.head()

In [None]:
# Total sales per region and average critic score for each publisher
by_publisher = games.groupby("Publisher").agg({"NA_Sales": "sum",
                                               "EU_Sales": "sum",
                                               "JP_Sales": "sum",
                                               "Global_Sales": "sum",
                                               "Critic_Score": "mean"})
by_publisher.head()
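
Here is the same per-column aggregation on a tiny invented table (the `toy_games` frame and its values are made up for the example), so the result can be checked by hand. It uses the string names `"sum"` and `"mean"`, which select pandas' built-in aggregations.

```python
import pandas as pd

# Three invented games; one critic score is missing on purpose.
toy_games = pd.DataFrame({
    "Publisher":    ["A", "A", "B"],
    "Global_Sales": [1.0, 2.0, 4.0],
    "Critic_Score": [80.0, None, 70.0],
})

# One aggregation per column: total sales, average score.
summary = toy_games.groupby("Publisher").agg({"Global_Sales": "sum",
                                              "Critic_Score": "mean"})
print(summary)
```

Publisher A ends up with an average score of 80 rather than a missing value, because the mean skips missing entries by default.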

In [None]:
top_publishers = by_publisher.sort_values("Global_Sales", ascending = False)[0:15][["NA_Sales", "EU_Sales", "JP_Sales"]]
top_publishers

In [None]:
top_publishers.plot(kind="bar", figsize=(12,5))

In [None]:
# And again, as a stacked bar chart
top_publishers.plot(kind="bar", stacked = True, figsize=(12,5))

# Running Jupyter Notebooks locally with Docker

## Install Docker, either natively or with docker machine

If you're running Linux, macOS or Windows 10, you can get native Docker at [docker.com](https://www.docker.com/products/docker).

If you're running Windows 8, you need [Docker Toolbox](https://www.docker.com/products/docker-toolbox) instead.

## Open a docker console to verify that docker is running

Open Docker Quickstart Terminal and run the following: 

```
$ docker ps
CONTAINER ID        IMAGE           COMMAND          CREATED          STATUS           PORTS            NAMES
```

You're probably seeing an empty list. This is OK: docker is running, you just don't have any containers running yet.


## Choose an image from the official Jupyter Docker Hub account
1. Go here: https://hub.docker.com/u/jupyter/
2. Pick one of the images named *-notebook. For example, for python+scikit-learn+matplotlib, pick jupyter/scipy-notebook

## Create a new container

```
$ docker run -p 8888:8888 -v /home/jovyan/work --name jupyternb jupyter/scipy-notebook start-notebook.sh --NotebookApp.token=''
```

A bit about what this does:

* `docker run` is used to run a new container
* `-p 8888:8888` tells docker to map the port 8888 from the container to the host machine (or docker-machine vm)
* `-v /home/jovyan/work` tells docker to create a volume for the directory where the notebooks are stored, so they live outside the container's own filesystem. Without this, all work would be lost when the container is removed.
* `--name jupyternb` specifies the name of the container. Without it, docker will generate a random name
* `jupyter/scipy-notebook` is the name of the image from the docker hub to run
* `start-notebook.sh --NotebookApp.token=''`: totally optional, but this is specific to the Jupyter Notebook Docker image and tells it to disable authentication. Otherwise, you would have to get the initial configuration token from the docker logs.

## Accessing the container

If using docker native, the app will be available at http://localhost:8888. 

If using docker-machine, you'll need to find out its IP first using `docker-machine inspect default | grep IPAddress`; it is usually 192.168.99.100. The app will then be available at e.g. http://192.168.99.100:8888.
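
If you'd rather check from a script than from the browser, the sketch below builds the address and tries to open it using only the Python standard library. The helper names `jupyter_url` and `is_reachable` are invented for this example, and the defaults simply mirror the host and port used above.

```python
import urllib.error
import urllib.request


def jupyter_url(host="localhost", port=8888):
    """Build the notebook address for docker native or a docker-machine IP."""
    return f"http://{host}:{port}"


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at url within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Nothing listening, wrong IP, or the container is not started yet.
        return False


print(jupyter_url())
print(jupyter_url("192.168.99.100"))
```

Calling `is_reachable(jupyter_url())` on the host should return `True` once the container is up; if it returns `False`, `docker ps` is a good first place to look.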

## Starting the container

If the container is stopped (e.g. after a reboot), it can be started with:

```
$ docker start jupyternb
```

## Good Luck!

You should now be able to upload or create notebooks, as well as datasets that can be loaded from the notebooks.

**Note**: All work will be persisted to the docker volume, but you are encouraged to keep your files separately anyway. They can be downloaded by choosing File > Download as > Notebook from the menu.