# Improving Your Workflow with Jupyter Notebooks

A talk by **Shahzeb Khan**.

Part of **Dr. Crivelli**'s Lab.

UC Davis Computer Science Student. 

Email: mnkhan@ucdavis.edu

🚨 
Slides and code: **[tiny.cc/jupyter-cori](http://tiny.cc/jupyter-cori)**

## Why Jupyter:

### Jupyter makes your life easier.

#### Code → Git → Your team

#### Jupyter → Git → Your team

1. Faster prototyping
2. More interactive
3. Perfect for documenting code

#### Optimized workflow + Good tooling = Better collaboration and better results!

## Code vs. Jupyter

![code vs jupyter](images/code-and-jupyter.png "Code vs. Jupyter")

#### Two different types of cells:

1. Code cells (execute Python)
2. Markdown cells

In [2]:
```python
some_data = [1, 2, 3]
print(some_data)
```

```
[1, 2, 3]
```


I am a markdown cell. I can make my text look ***real cool***. 

In [5]:
```python
%%latex
$y=mx+b$
```

$y=mx+b$

# Getting set up on CORI

### Goals:

1. Create a Python environment
2. Connect to a Jupyter Notebook
3. Bring in our own packages

### SSH:


```
ssh -L 9998:localhost:9998 -l <username> cori.nersc.gov
```

This forwards our local port `9998` to port `9998` on the remote machine, so a notebook server running on CORI is reachable at `localhost:9998` in your own browser.
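To confirm the tunnel is up before opening a browser, a quick check from your own machine can help. The following is a minimal sketch, not part of the original talk: the helper name `port_is_open` is hypothetical, and it only tests that something accepts TCP connections on the forwarded port, not that it is actually Jupyter.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 9998 is the local end of the SSH tunnel set up above.
print(port_is_open("localhost", 9998))
```

If this prints `False` while the SSH session is open, the notebook server on the remote side is probably not running yet.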

### Get the environment ready:

After SSH-ing into CORI, run the following commands one by one.

```shell
module load python/3.6-anaconda-4.4
conda create --name myenv # Run me to create a new environment
source activate myenv
conda install plotly jupyter 
pip install matplotlib
jupyter notebook --no-browser --port 9998
```

# Our Notebook:

- Generate some data
- Visualize it
- Interactive visualization
- TensorFlow + TensorBoard
- Show some tips and tricks (`magic`)

## Conclusion:

Jupyter Notebook file ≠ a Python file.
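One way to see the difference: an `.ipynb` file is a JSON document that stores cells, outputs, and metadata, not plain source code. The sketch below builds a tiny hand-written notebook in the version 4 layout and pulls out only the code cells. The notebook contents and the helper name `code_cells` are illustrative assumptions, not taken from the talk's repository.

```python
import json

# A minimal, hand-written notebook structure (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["some_data = [1, 2, 3]\n", "print(some_data)"],
        },
    ],
}

# This string is what would actually be stored on disk in the .ipynb file.
raw = json.dumps(notebook)


def code_cells(ipynb_text):
    """Return the source of every code cell in a notebook's JSON text."""
    nb = json.loads(ipynb_text)
    return ["".join(cell["source"]) for cell in nb["cells"] if cell["cell_type"] == "code"]


for source in code_cells(raw):
    print(source)
```

Because the Python lives inside JSON strings, running `python notebook.ipynb` will not work, and Git diffs of notebooks include output and metadata changes as well as code changes.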

You'll be more productive if you use this tool well.

Some things to keep in mind: 

#### Preamble for dependencies:

In [17]:
```python
# BAD:
import numpy as np
my_data = np.array([1,2,3])
import tensorflow as tf
import numpy.random as rnd
import csv
random_int = rnd.randint(1)
```

In [16]:
```python
# GOOD:
import csv

import numpy as np
import numpy.random as rnd
import tensorflow as tf

# Global variables:
my_data = np.array([1,2,3])
random_int = rnd.randint(1)
```

#### Define functions:

Don't keep things in the global scope.

In [19]:
```python
# BAD:
with open("snippets/actors.csv") as csvfile:
    reader = csv.reader(csvfile)
    next(reader)
    data = [r for r in reader]
```

In [21]:
```python
# GOOD:
def load_actors(file_name="snippets/actors.csv"):
    "Load the actors from the .csv file and return them as a list of rows"
    with open(file_name, newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # Skip the first row
        return [r for r in reader]
```

In [24]:
```python
data = load_actors()
data
```

```
[['leo dicaprio', '43'], ['angelina jolie', '43'], ['brad pitt', '54']]
```

#### Use Pickle:

`pickle` is a built-in module that is great for saving large data arrays (or almost any other Python object) to disk, so you don't have to recompute them every time the notebook restarts.

In [25]:
```python
import pickle as pkl

def save_pickle_file(name, data):
    "Save out the data as a pickle file"
    with open("output/{}.pkl".format(name), "wb") as pickle_file:
        pkl.dump(data, pickle_file)

def load_pickle_file(name):
    "Load the pickle file and return its contents as a NumPy array"
    with open("output/{}.pkl".format(name), "rb") as pickle_file:
        return np.array(pkl.load(pickle_file))
```

In [26]:
```python
save_pickle_file("actors_data", data)
```

In [31]:
```python
actors_data = load_pickle_file("actors_data")
actors_data
```

```
array([['leo dicaprio', '43'],
       ['angelina jolie', '43'],
       ['brad pitt', '54']], dtype='<U14')
```
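The helpers above assume an `output/` directory already exists. As a hedged variant, the sketch below creates the parent directory on demand and checks that the data survives a round trip. The function names `save_pickle` and `load_pickle` and the temporary directory are assumptions made for illustration, not part of the original notebook.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(path, obj):
    """Write obj to path, creating parent directories if they are missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as pickle_file:
        pickle.dump(obj, pickle_file)


def load_pickle(path):
    """Read back whatever object was stored at path."""
    with Path(path).open("rb") as pickle_file:
        return pickle.load(pickle_file)


actors = [["leo dicaprio", "43"], ["angelina jolie", "43"], ["brad pitt", "54"]]

# Use a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output" / "actors_data.pkl"
    save_pickle(target, actors)
    restored = load_pickle(target)

print(restored == actors)
```

Only unpickle files you created or trust: loading a pickle can execute arbitrary code.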

## Helpful tools:

1. Use [NBViewer](https://nbviewer.org/) to view your Jupyter notebooks directly from a Git repository.
2. Use [Plot.ly](https://plot.ly/python/ipython-notebook-tutorial/) for interactive graphs.
3. Use [Raw Git](http://rawgit.com/) to host .html files.

### Thank you

## 🚨 tiny.cc/jupyter-cori

Any questions?