# Jupyter notebooks tutorial

Hi, I'm a notebook.

Jupyter notebooks are a fantastic tool to interactively develop and document your data projects. A notebook integrates code, the output of your code and explanatory text - like this one here - into a single document. And because it is part of the open source project Jupyter, it's free.

For any questions not covered in this tutorial, you will find the documentation for Jupyter here: https://jupyter.readthedocs.io/en/latest/

## Installation

This tutorial assumes that you managed to install and run Jupyter lab. Here's the short version.

A typical way to install Jupyter lab, especially on Windows, is to install [Anaconda](https://anaconda.org/). Anaconda is a widely used distribution of Python and includes many frequently used libraries for your convenience.

Another way to install Jupyter lab, especially if you are used to managing your own venv (virtual environments) and installing your own packages, is to use pip:

```
pip3 install jupyterlab
```

Depending on your installation method, you run Jupyter lab by clicking on the icon provided by Anaconda or run `jupyter lab` from the console. If it does not open automatically, you can find your Jupyter lab in your web browser using http://localhost:8888/lab.

## Cells

One of the basic concepts of a Jupyter notebook is the cell - the little building block of your document. You can add new cells before or after existing ones and even drag around existing cells to reorder them.

Cells can be either **Code**, **Markdown**, or **Raw** cells. 

* Code - A cell that holds your source code (e.g. Python) and when run adds the output of the code to your document below the cell.
* Markdown - A cell that contains text that is formatted using Markdown and displays its output in-place when it is run. 
* Raw - A cell that is just that, raw text that will not be changed in any way when the cell is run. 

## Wait, run cells?

Use the mouse to double-click on this text. Notice it gets highlighted by a blue bar on the left side and turns into an editor field. In case you have an older version of Jupyter, the blue bar may be a different colour (e.g. green).

You can run the cell by pressing the *Play* icon in the toolbar above or press `Ctrl + Enter`. Since it is a Markdown cell, it will be displayed as nicely formatted text again.

More on running Code cells a little bit further down.

## Markdown - a side note

All the documentation in a notebook is formatted using Markdown. Markdown is a very lightweight way to add formatting to plain text.

If you double-click into any of the cells of this notebook, you will see how the formatting works. So if you want to know how a nice, big headline ends up being a headline, double-click in this cell and note the '##' at the beginning of the line. This is a level 2 headline. 

Go explore, or read about the formatting options in the [Jupyter notebook documentation](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) or in the original Markdown specification here: https://daringfireball.net/projects/markdown/.

## Hello World

Let's start with the classic hello world example. Select the next cell and run it.

In [None]:
print('Hello World!')

That worked nicely. As expected it runs our little Python command and prints 'Hello World!'. 

Notice how Code cells have a little gray label `[ ]:` next to them.

When you run the cell, it changes to `[1]:` indicating that this is the first cell that has been run in the notebook. Select the above cell and run it again. The label changes to `[2]:`. This is really useful later on, because it tells us what has been executed in which order. 

While a cell is still running, the label is displayed as `[*]`. Let's try that out. Select the next cell and run it. It will not produce any output, but takes 5 seconds to run.

In [None]:
import time
time.sleep(5)

## Where does the output come from? 

Our first example printed "Hello World!" whereas the second example did not print anything or return any value.

In general, the output comes from anything that is specifically printed by your code, as well as the value of the last line in the cell. Let's test that.

In [None]:
x = 'C'
print('A') # specifically printed by the code
print('B')
x          # value of the last line
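The rule above can be imitated in plain Python. This is only a rough sketch of the idea, not how the Jupyter Python kernel is actually implemented, and the helper name `run_cell_sketch` is made up for illustration. It runs all statements of a cell, captures whatever is printed, and evaluates the last line separately if that line is an expression.

```python
import ast
import contextlib
import io


def run_cell_sketch(source):
    """Hypothetical helper: return (printed text, value of the last line)."""
    namespace = {}
    tree = ast.parse(source)
    last_value = None
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # If the cell ends with a bare expression, split it off ...
        last = None
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last = tree.body.pop()
        # ... run everything before it as ordinary statements ...
        exec(compile(tree, "<cell>", "exec"), namespace)
        # ... and evaluate the final expression to get its value.
        if last is not None:
            expression = ast.Expression(body=last.value)
            last_value = eval(compile(expression, "<cell>", "eval"), namespace)
    return buffer.getvalue(), last_value


printed, value = run_cell_sketch("x = 'C'\nprint('A')\nprint('B')\nx")
print(repr(printed))  # 'A\nB\n'
print(repr(value))    # 'C'
```

A cell whose last line is an assignment or a call that returns `None` would yield no value here, which matches what we saw with `time.sleep(5)`.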

### Handling large outputs 

Let's make a mess and run the next cell. 

1. Click on the output cell below and then click on the blue bar to the left. This collapses the output and shows three dots (...) instead. Click on the three dots to show the output again.

2. Right-click on the output cell below and select "Enable Scrolling for outputs". This turns the output cell into a smaller view with a scrollbar on the right hand side. Right-click again and switch to "Disable Scrolling for outputs" to change to the normal behaviour.

In [None]:
for i in range(500):
    print(2**i)

## Keyboard shortcuts

We already used `Ctrl + Enter` to run a cell. Jupyter notebooks offer many other shortcuts to quickly navigate and edit. You can find some of these shortcuts in the menus in your Jupyter Lab, e.g. the Edit menu. Here are some of my favorites.

* `ENTER` change to edit mode to edit the content of a cell
* `ESC` change to command mode which allows many other shortcuts 

In command mode: 

* `A` insert a cell above the currently selected cell
* `B` insert a cell below the currently selected cell
* `Arrow up/down` scroll up or down one cell
* `M` change cell to a Markdown cell
* `Y` change cell to a Code cell
* `R` change cell to a Raw cell
* `D, D` pressing D twice deletes the currently selected cell
* `Z` undo the deletion (or any other cell operation)
* `SHIFT + Arrow up/down` select multiple cells at once
* `SHIFT + M` merge multiple selected cells into one cell
* `CTRL + SHIFT + -` split the currently selected cell into two cells at your cursor

In edit mode:

* `TAB` select one or more lines and indent them
* `SHIFT + TAB` select one or more lines and outdent them

## A small data example

Before we move on to more advanced topics in Jupyter, let's look at a short example of how you would actually work with data using Python. Let's start by importing a few libraries.

In [None]:
import numpy as np
import pandas as pd

Next, we need some data. You could load it from an Excel file, a csv file or a database. In this case we generate a sample of random numbers.

In [None]:
df = pd.DataFrame(columns=list('A'))
df['A'] = np.random.randn(1000)
df.head(5)

Here we filled a Pandas dataframe with a single column labelled 'A', containing a random sample of size 1000 from the standard normal distribution. We can use Pandas' built-in functions to calculate some statistics.

In [None]:
df.describe()
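To get a feeling for what these statistics mean, here is a small sketch that computes a few of them by hand using only the Python standard library. It draws its own sample with a fixed seed (an assumption made purely so the result is repeatable), so the numbers will differ from the ones in your dataframe, and it leaves out the quartiles.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
sample = [random.gauss(0, 1) for _ in range(1000)]

summary = {
    "count": len(sample),
    "mean": statistics.mean(sample),
    "std": statistics.stdev(sample),  # sample standard deviation
    "min": min(sample),
    "max": max(sample),
}

for name, value in summary.items():
    print(f"{name:>5}: {value:.3f}")
```

For a sample from the standard normal distribution, the mean should come out close to 0 and the standard deviation close to 1.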

A picture is worth a thousand words.

In [None]:
df.hist()

There is a detailed example of how to work with Pandas and your data in a separate tutorial. For now the message is that it's easy to read data into your notebook using Pandas, work with and analyse the data as well as present it using text and diagrams.

## Kernels

A kernel is the "computational engine" that executes the code contained in your notebook. When you run a code cell, its content is executed by the kernel and the output is returned and displayed in your document. The kernel persists as you work in the notebook. That's why we can still use the variable ```x``` that we introduced a few cells above.

In [None]:
print(x)

The kernel also determines which programming language you can use in Code cells. This notebook uses the default [Python](https://python.org) kernel, but kernels supporting [R](https://www.r-project.org) or many [others](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) are available. 

### Order of computation 

Usually, a notebook is read top to bottom. It is however also common to go back and forth while working on your data. That's where the labels next to your Code cells come in handy. The kernel keeps track of the state of your computations - however it does not care about the actual order in the document, just about the order in which cells are executed.

Here's an example: Let's say we have 2 variables ```a``` and ```b```.

In [None]:
a = 1
b = a

Now ```a``` and ```b``` should be the same.

In [None]:
print(a==b)

Good, that works as expected. Now let's change ```a```.

In [None]:
a = a + 1

Go back, select the cell with ```print(a==b)``` and run it again. Observe what happens and continue here.

In [None]:
print(a,b)

Looking at the order of how we executed that makes perfect sense. ```a``` and ```b``` start out as 1, thus are equal. We add 1 to ```a```, thus ```a``` is 2 and ```b``` is 1. If we go back and test equality again, it's false.
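If you want to replay this outside a notebook, the following sketch imitates it in plain Python. It is an illustration under simple assumptions: each "cell" is just a string of source code, and an ordinary dictionary (here called `kernel_state`, a made-up name) stands in for the state the kernel keeps.

```python
# Each "cell" is a snippet of source code.
cells = {
    "assign": "a = 1\nb = a",
    "compare": "equal = (a == b)",
    "increment": "a = a + 1",
}

kernel_state = {}  # stands in for the kernel's memory
results = []

# Execution order: assign, compare, increment, then compare again.
for name in ["assign", "compare", "increment", "compare"]:
    exec(cells[name], kernel_state)
    if name == "compare":
        results.append(kernel_state["equal"])

print(results)                               # [True, False]
print(kernel_state["a"], kernel_state["b"])  # 2 1
```

The same `compare` cell gives two different answers, because only the order of execution matters, not the position of the cell in the document.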

### Kernel commands

Sometimes, it may be necessary to reset the computations to ensure a clean flow. There are several ways to do so in the top menu bar under "Kernel".

* Restart: restart the kernel and clear the internal state, i.e. all the variables etc. (shortcut: ```0, 0```)
* Restart & Clear Output: same as above, and remove all the output below the cells
* Restart & Run All: same as above, and will run all the cells from top to bottom

If your computations are ever stuck, you can also interrupt the kernel.

* Interrupt: stop the current execution (shortcut: ```I, I```, press i twice)

In [None]:
time.sleep(500)

Do you really want to wait 500 seconds? Interrupt the kernel.

## Getting help

The ```Help``` menu provides useful links to the online documentation of Jupyter, but also to common libraries like NumPy, Pandas or Matplotlib. 

In addition, you can get help on a lot of things by prepending them with ```?```, for example on a Python method.

In [None]:
?str.lstrip
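The ```?``` prefix only works inside Jupyter (or IPython). In a plain Python script or console you can get at the same docstring with the standard library, as this small sketch shows. The built-in `help(str.lstrip)` prints a similar text interactively.

```python
import inspect

# Fetch the documentation string of the method as cleaned-up text.
doc = inspect.getdoc(str.lstrip)
print(doc)
```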

## Sharing notebooks

Notebooks are self-contained (unless you run/load external scripts or data into them). Thus, the easiest way to share your work is simply using the notebook file (.ipynb) itself.

For those who don't use Jupyter, you can convert the notebook to an HTML file using File > Export Notebook As... > Export Notebook to HTML. Anyone with a web browser will be able to look at this file.

Alternatively you can upload your notebook to a [GitHub](https://github.com) or [GitLab](https://gitlab.com) repository, both of which display notebooks properly. See this example. 

## LaTeX support

LaTeX is often used in technical or scientific documents. It's a way of formatting and typesetting documents. It is also good at displaying mathematical formulas. Thanks to [MathJax](https://mathjax.org), it's possible to use LaTeX in the Markdown cells of your notebook.

Here's how:

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$$

Or in a sentence: $c = \sqrt{a^2+b^2}$ ... see?

You can tell Jupyter that a cell contains only LaTeX.

In [None]:
%%latex
\begin{equation*}
    \left(\sum_{k=1}^n a_k b_k \right)^2 
    \leq 
    \left(\sum_{k=1}^n a_k^2 \right) 
    \left(\sum_{k=1}^n b_k^2 \right)
\end{equation*}

Finally, you can use Python to display LaTeX.

In [None]:
from IPython.display import display, Math, Latex

display(Math(r'c = \sqrt{a^2+b^2}'))

## Magic commands

In addition to the full range of Python commands, Jupyter notebooks provide a set of so-called *magic commands*. There are line magic commands which use a single ```%``` as prefix and work on a single line of input, and cell magic commands which use ```%%``` as prefix and work on the whole cell. Without telling you, we used one of the latter earlier in this tutorial (```%%latex```).

The list of magic commands depends on the Kernel you use for your notebook. Let's see what the Python kernel offers us.

In [None]:
%lsmagic

That's a lot. There is an in-built help system to get a short explanation for a magic command.

In [None]:
?lsmagic

### Matplotlib inline

If you use matplotlib to do your plotting, the following magic command allows you to display the graphs and diagrams directly inline in the Jupyter notebook. Just add the following line to your notebook whenever you import matplotlib.

In [None]:
%matplotlib inline

### Notebook state

There are a few magic commands that allow you to view the computations and the variables of your notebook.

You can look at the history of inputs that have been run in the current session.

In [None]:
%history -l 3

To save space, we printed only the last 3 inputs that have been run. Of course you can print the whole history.

```%history```

You can also add line numbers...

```%history -n```

Let's have a look at the variables that are stored in the session state.

In [None]:
%who

Only show variables of a certain type, e.g. Pandas data frames (which are of type 'DataFrame'):

In [None]:
type(df)

In [None]:
%who DataFrame

The magic command ```%whos``` works the same as ```%who```, but provides additional information on the variables. Here is an example showing only data frames (for brevity).

In [None]:
%whos DataFrame

Finally, it's possible to get the current variables as a list.

In [None]:
%who_ls

### Timing code execution

Sometimes it's good to know how long code runs. Jupyter Python notebooks provide you with several ways to do so.

* ```%time```: gives you information about the time taken by a single run of the code.
* ```%timeit```: uses the Python timeit module, which runs a statement many times (the number of runs is chosen automatically) and reports the average time taken.


Measure the time the execution of a single command takes.

In [None]:
%time time.sleep(1)

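Note that CPU time refers to the time the processor actually spent executing your code, whereas wall time refers to the real time that elapsed. While the code sleeps, the wall time keeps ticking but almost no CPU time is used.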
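The difference between the two can also be measured in plain Python. The sketch below uses two standard library clocks: `time.perf_counter()` for wall time and `time.process_time()` for the CPU time of the current process. The sleep duration of 0.2 seconds is an arbitrary choice for the illustration.

```python
import time

wall_start = time.perf_counter()  # wall-clock timer
cpu_start = time.process_time()   # CPU time used by this process

time.sleep(0.2)                   # waiting, not computing

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall: {wall_elapsed:.3f} s, cpu: {cpu_elapsed:.3f} s")
```

The wall time should come out at roughly 0.2 seconds, while the CPU time stays close to zero because sleeping does not keep the processor busy.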

Measure how long the execution takes averaged over multiple runs.

In [None]:
%timeit squares = [n ** 2 for n in range(1000)]

Note that ```%timeit``` automatically limits the number of runs depending on the execution time of a single run. You can try that out if you change the above command to ```range(10000)```,  re-run the cell and see how the number of iterations/loops changes.  You can also set the number of iterations with a parameter ```-n```.

In [None]:
%timeit -n 100 squares = [n ** 2 for n in range(1000)]
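Outside a notebook, the standard library module `timeit` offers something comparable. The following is a sketch of the two variants, an automatically chosen loop count and a fixed one. Note that this shows the plain `timeit` module. The details of how the magic command picks its loop count and reports results may differ.

```python
import timeit

timer = timeit.Timer("[n ** 2 for n in range(1000)]")

# autorange() increases the loop count until the total run takes
# at least 0.2 seconds, then returns (loops, total time in seconds).
loops, total_seconds = timer.autorange()
print(f"{loops} loops, {total_seconds / loops * 1e6:.1f} us per loop")

# A fixed loop count, comparable to passing -n 100.
fixed_total = timer.timeit(number=100)
print(f"100 loops, {fixed_total / 100 * 1e6:.1f} us per loop")
```

A slower statement reaches the 0.2 seconds with fewer loops, which is why the loop count drops when you make the range larger.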

Measure the execution time of a whole cell.

In [None]:
%%timeit
squares = []
for n in range(1000):
    squares.append(n ** 2)

That's interesting. Although the two are functionally the same, the list comprehension typically runs faster than the explicit for loop.

## Running code from outside the notebook

Let's say you have written a python script called ```helpers.py``` with some reusable functions that you want to use in this notebook. 

In [None]:
from helpers import hello
hello()

Ok, so far so good. If you are working on your Python script and in the notebook at the same time, you will notice that changes in the Python script are not picked up by the notebook. You would have to restart the kernel every time you change something in the script.

If you want to try that out, open the file ```helpers.py``` and change the greeting from English 'Hello' to the Spanish 'Hola'. Save the file and run the above cell again. Nothing changes.

To change that behaviour, activate auto reloading in this notebook. (Run the next cell)

In [None]:
# load the autoreload extension
%load_ext autoreload
# Set extension to reload modules every time before executing code
%autoreload 2

Next time you change the file ```helpers.py``` and save it, that change will be loaded into the notebook. Change the greeting from the Spanish 'Hola' to the German 'Hallo'. Run the cell with ```hello()``` again and see what happens.
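Why was the change not picked up in the first place? Python only executes a module's file on the first import and then keeps the loaded module in memory. The sketch below shows this with the standard library function `importlib.reload`. It is a simplified illustration, not the way the autoreload extension is implemented, and the module name `demo_helpers` and the temporary directory are made up for the example.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of cached bytecode files

with tempfile.TemporaryDirectory() as tmp:
    module_file = pathlib.Path(tmp) / "demo_helpers.py"  # hypothetical module
    module_file.write_text("def hello():\n    return 'Hello'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_helpers = importlib.import_module("demo_helpers")
        before = demo_helpers.hello()

        # Edit the script on disk, as you would in an editor.
        module_file.write_text("def hello():\n    return 'Hola'\n")
        stale = demo_helpers.hello()    # the loaded module is unchanged

        importlib.reload(demo_helpers)  # execute the file's source again
        after = demo_helpers.hello()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_helpers", None)

print(before, stale, after)  # Hello Hello Hola
```

With ```%autoreload 2``` you don't have to call a reload yourself, because modules are reloaded before your code is executed.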

### Loading code into the notebook

As an alternative to running code from outside the notebook, you can also load code into a cell in your notebook. 

Like this:

In [None]:
# %load ./helpers.py
def hello(name='world'):
    """
    Method to print a greeting, defaulting to 'Hello, world!'
    """
    print('Hello, {}!'.format(name))
    
def goodbye():
    """
    Method to print a 'Goodbye!'
    """
    print('Goodbye!')

Running the cell with ```%load``` just exchanged the content of the cell with the content of the script. You now have to run the cell again to actually execute it. Once you have done this, you should be able to run the cell below with the newly defined function ```goodbye()```.

In [None]:
goodbye()

### Writing files out

Just so you have heard of it: it is not only possible to load a file as content into a cell, but also possible to write out the content of a cell into a file. Let's pretend we have written a wildly useful function in our notebook that we want to reuse later. One way to do so is to create a new Python script and add the function to it manually. Another could be to use ```%%writefile```:

* ```%%writefile file_name```: write the content of the cell to a file called file_name (will overwrite existing file)
* ```%%writefile -a file_name```: append the content of the cell to the existing file called file_name

In [None]:
%%writefile test.py
def magic_number():
    """
    Method to return the magic number.
    """
    return 42


Change it to ```%%writefile -a```, run the cell again and open test.py to see what happened.
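The difference between overwriting and appending is the same as the one between the file modes `"w"` and `"a"` in plain Python. Here is a small sketch of that analogy. It writes into a temporary directory (an assumption made so that the example does not touch your own test.py) and counts how often the function definition ends up in the file.

```python
import pathlib
import tempfile

cell_source = (
    "def magic_number():\n"
    '    """Return the magic number."""\n'
    "    return 42\n"
)

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "test.py"

    # Mode "w" replaces whatever was in the file before,
    # so writing twice still leaves one copy.
    for _ in range(2):
        with open(target, "w") as handle:
            handle.write(cell_source)
    after_overwrite = target.read_text().count("def magic_number")

    # Mode "a" adds to the end of the existing file.
    with open(target, "a") as handle:
        handle.write(cell_source)
    after_append = target.read_text().count("def magic_number")

print(after_overwrite, after_append)  # 1 2
```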

## Working with Shell commands

This section assumes that you are on macOS, Linux or another Unix-like system, and are somewhat familiar with using a shell. If so, you can access your system's command line from within the notebook.

The magic command is the exclamation mark ```!```. Here are some examples.

In [None]:
!echo "Hello, World!"

Conveniently enough, it's also possible to combine ```!``` shell commands with comments in your source code.

In [None]:
!pwd # what is the current work directory?

In [None]:
!ls # list the files in the current directory

You can save the output of a shell command to a Python list (or, to be more precise, to an object of type ```IPython.utils.text.SList```).

In [None]:
directory = !pwd
print(directory)

In [None]:
type(directory)

You can pass Python variables into the shell. The curly brackets ```{variable}``` contain the variable name which is replaced with the variable content upon execution of the shell command. An alternative way to pass a variable is to prefix it with ```$``` in the shell command. Both ways are illustrated in the example below.

In [None]:
greeting = "Hello"
target = "Python + Shell"
!echo {greeting}, $target
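The ```!``` syntax is specific to Jupyter. In an ordinary Python script, a comparable result can be sketched with the standard library module `subprocess`. This assumes, like the rest of this section, a Unix-like system where an `echo` program is available.

```python
import subprocess

greeting = "Hello"
target = "Python + Shell"

# The arguments are passed as a list, so no shell quoting is needed.
completed = subprocess.run(
    ["echo", f"{greeting},", target],
    capture_output=True,
    text=True,
    check=True,
)

# Split the captured output into a list of lines.
lines = completed.stdout.splitlines()
print(lines)  # ['Hello, Python + Shell']
```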

Environment variables of your notebook and your shell can be read and set as well. 

In [None]:
?env
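In plain Python, environment variables are available through `os.environ`. The following sketch sets a variable and shows that a process started afterwards can see it. The variable name `DEMO_GREETING` is made up for this example.

```python
import os
import subprocess
import sys

os.environ["DEMO_GREETING"] = "Hallo"  # hypothetical variable name

# A child process inherits the environment of the process that starts it.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_GREETING'])"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()

print(os.environ.get("DEMO_GREETING"), seen_by_child)  # Hallo Hallo
```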

## Working with pip within the notebook

In [None]:
%pip show jupyter

Running pip from the notebook may be useful if you need to install a certain Python package into your work environment for the Jupyter notebooks. 

Note: If you are using Anaconda, then the usual way to install a new package is the conda package manager.

---

**Copyright 2019 by Daniel Paarmann**

Licensed under the Apache License, Version 2.0 (the "License"); 
you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software 
distributed under the License is distributed on an "AS IS" BASIS, 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and 
limitations under the License.