<img src="imgs/header.png" width="100%">

---------------

 


# Jupyter Quickstart
This tutorial very briefly introduces the Jupyter Notebook environment. The most important things to know:
* code cells are grey and you can select them by clicking on them
* pressing SHIFT-ENTER runs the selected code in that cell.

### Cells

The notebook is made up of **cells**. A cell is a snippet of code or text. You can break up code into cells as you wish. Each cell with code in it can be run on its own.

* Click on a cell to highlight it (you will see a highlight box appear around it)
* Press SHIFT-ENTER to run the code (hold SHIFT, then press ENTER).
<br><br>

<font color="green"> **1) Try this on the cell below:** </font>


In [1]:
print("Hello, world")

Hello, world


After selecting the cell above and pressing SHIFT-ENTER, you should see:

    In [1]: print("Hello, world")
    Hello, world

The line underneath is the **output** of the code you ran. All the code in the cell is run when you press SHIFT-ENTER. 

---
### Editing
You can edit a code cell just by clicking on it, and editing the text. Press SHIFT-ENTER to run it again. 

<font color="green"> **2) Edit the code in the cell above to print out a different message, then run it again.** </font> 

### Text cells
The instructions you are reading are also in cells, but these are **text cells**.
If you *double*-click on these text cells you can edit them as well. Press SHIFT-ENTER to "format" the cell.

The text inside these boxes is [Markdown](http://daringfireball.net/projects/markdown/syntax). 

Jupyter will also recognise LaTeX formulae if you surround them with dollar signs:
`$ x = x + 1 $` becomes $ x = x + 1 $.
And you can make display equations using double dollar signs:
`$$ \sum_{i=0}^{N} i^{\alpha_i} + \beta_i $$` becomes
$$ \sum_{i=0}^{N} i^{\alpha_i} + \beta_i $$

<font color="green"> **3) Double-click on this cell. It will turn gray and the font will change. Edit this text to say something different. Then press SHIFT-ENTER to format it.** </font>

### Navigation and editing

* Editing *inside* cells works like a normal text form on a web page. 

<font color="green">
**4) Try typing `print("Hello Jupyter")` in the box below, then press SHIFT-ENTER to run it.** </font>

### Copy and paste

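* You can cut, copy and paste entire cells using the toolbar or the `Edit` menu. From the keyboard, press `ESC` first (so you are no longer typing inside a cell), then use `X` (cut), `C` (copy) and `V` (paste). While typing inside a cell, `CTRL-X`, `CTRL-C` and `CTRL-V` cut, copy and paste text as usual.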

<font color="green"> **5) Try copying the cell below that says "copy me!" and paste it twice below.** </font>

<h1> Copy me! </h1>

### Adding new cells
* You can insert new cells (e.g. to write code in) by using `Insert/Cell Above` or `Insert/Cell Below` from the menu. Or use the + button on the toolbar. 

<font color="green"> **6) Insert a cell below, and enter `print("hello from a new cell")`** </font>

Sometimes you want to add a text cell, to write notes, for example. To do this, add a cell as before, *then* go to the `Cell` menu and select `Cell Type/Markdown`. To change the cell type back to code, go to `Cell/Cell Type/Code`.

**<font color="green"> 7) Insert a text cell below. Put the text `# Title` in the cell. Notice that when you press SHIFT-ENTER the text will become a title line. This is caused by the leading # symbol. </font>**

### Saving

* Remember to **save** regularly! Press `CTRL-S` to save, or use the menu option `File/Save and Checkpoint`

### Making copies of notebooks

* You can make a copy of your work using `File/Make a Copy` (e.g. if you want to change something but have a "good" version to go back to). 

<font color="green">  **9) Make a copy of this notebook. You can change the copy's name using the menu option `File/Rename...`. Then save it.** </font>

### Undo

* You can undo typing inside cells (`Edit/Undo` or `CTRL-Z`) but be aware that undo works independently inside each cell (i.e. each cell maintains its own undo history).


-----
### Stopping and restarting
You can make Python run for a very long time, for example by creating an infinite loop. No other cells can run while it is busy.

If this happens, you will see an asterisk `In [*]` beside the cell that is running, and the circle next to the kernel name (for example `Python 3`) at the top of the screen will be filled in while the process is busy. To stop it, press the stop (interrupt) button on the toolbar, or use the menu option `Kernel/Interrupt`.

<font color="green"> **10) Run the cell below. It will run forever. Look for the asterisk, and then stop the cell.** </font>

In [2]:
# infinite loop
while True:
    pass

KeyboardInterrupt: 

### Variables
When you run a cell, it changes the Python state. For example if you assign a value to a variable, the next cell you run will know about that value. 

<font color="green"> **11) There are two cells below. Run them both.**</font>

In [None]:
# greeting 1
greeting = "Hello, world"

In [None]:
# print greeting
# Python remembers the value of greeting from the cell that last assigned it.
print(greeting)

<font color="green"> **12) Now run the cell below (`# greeting 2`), then run the `print greeting` cell again. Notice that the output changes. Try running the first greeting cell again.** </font>

Python runs each cell when you tell it, and remembers the values you set. The order the cells appear in the notebook isn't important -- it is the order in which you **run** the cells.

In [None]:
# greeting 2
greeting = "Is this thing on?"

Occasionally you might want to reset the Python state back to what it was before any code ran -- to forget any variables you have stored. 

<br><font color="green"> **13) Press the restart button on the toolbar, or go to the menu option `Kernel/Restart`. The contents of the notebook will not change. Now try running the `print greeting` cell above. You will get an error, because Python has forgotten the value. Run the `greeting 1` cell and then run the `print greeting` cell.** </font>
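One way to picture this is to treat the kernel as a single shared dictionary of variables. Every cell you run reads from it and writes to it, and a restart empties it. The sketch below imitates that with Python's built-in `exec`. The helpers `run_cell` and `restart` are hypothetical and only illustrate the idea; they are not how Jupyter is actually implemented.

```python
# Toy model of a notebook kernel: every "cell" runs in one shared namespace.
# run_cell and restart are hypothetical helpers; real Jupyter kernels are far more involved.
kernel_namespace = {}

def run_cell(source):
    """Execute a string of Python source in the shared namespace, like running a cell."""
    exec(source, kernel_namespace)

def restart():
    """Forget every variable, like Kernel/Restart."""
    kernel_namespace.clear()

run_cell('greeting = "Hello, world"')        # the "greeting 1" cell
run_cell('greeting = "Is this thing on?"')   # the "greeting 2" cell
run_cell('print(greeting)')                   # prints: Is this thing on?

run_cell('greeting = "Hello, world"')        # run "greeting 1" again
run_cell('print(greeting)')                   # prints: Hello, world

restart()
try:
    run_cell('print(greeting)')
except NameError as err:
    print("After restart:", err)             # greeting has been forgotten
```

What matters is the order in which `run_cell` is called. Where the source strings sit in the file makes no difference.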


## A little more advanced
Let's compute the length of a path in the [Collatz](http://en.wikipedia.org/wiki/Collatz_conjecture) sequence. That is, we take a number $n$ and halve it if $n \equiv 0 \mod 2$, otherwise replace it with $3n+1$. We count the number of steps until we reach 1. 
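Written as a formula, a single step maps $n$ to $f(n)$. For example, starting from $n = 6$ takes 8 steps to reach 1:

$$ f(n) = \begin{cases} n/2 & \text{if } n \text{ is even} \\ 3n+1 & \text{if } n \text{ is odd} \end{cases} $$

$$ 6 \to 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1 $$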

Remember that to execute the cell, and thus to define the function, select the cell and press SHIFT-ENTER.

In [5]:
def collatz(n, steps=0):
    """Compute the number of steps to reach 1 in the Collatz sequence.
    Note the use of triple quotes to specify a docstring.
    
    Also note the use of a default parameter (steps) to count the number 
    of recursive calls."""
    if n==1:
        return steps
    if n%2==0:
        return collatz(n // 2, steps+1)  # // keeps n an integer (n/2 gives a float in Python 3)
    else:
        return collatz(3*n+1, steps+1)
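The recursive version makes one nested call per step, so a very long sequence could in principle hit Python's recursion limit (1000 by default in CPython). The numbers used in this notebook stay well below that. Below is a sketch of an equivalent loop-based version that also returns the whole sequence rather than just the count. `collatz_path` is a hypothetical name and is not part of the original notebook.

```python
def collatz_path(n):
    """Return the Collatz sequence from n down to 1 (hypothetical helper).

    Uses a loop instead of recursion, so it is not limited by the recursion depth.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    path = [n]
    while n != 1:
        # halve even numbers, map odd numbers to 3n + 1
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        path.append(n)
    return path

print(collatz_path(6))             # [6, 3, 10, 5, 16, 8, 4, 2, 1]
print(len(collatz_path(331)) - 1)  # 24 steps, matching collatz(331)
```

The number of steps is one less than the length of the path, because the path includes the starting value.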

Now try it in the cell below (e.g. entering *collatz(331)* and pressing SHIFT-ENTER should print 24).

In [6]:
collatz(331)

24

OK, let's plot the graph of this function for various $n$. We'll use numpy to manipulate vectors of values, and matplotlib to plot the graph. First we must import them. In future workbooks this will already be done at the start, but we do it explicitly here for clarity.

In [7]:
import numpy as np # np is the conventional short name for numpy
import matplotlib.pyplot as plt # and plt is the conventional name for matplotlib.pyplot
import seaborn as sns # seaborn builds on matplotlib
sns.set_theme() # restyle matplotlib plots with seaborn's default theme

Now we create an array of the integers 1 to 499 and plot it. Note the use of `np.arange` to create the array (the stop value, 500, is excluded), and the list comprehension `[collatz(n) for n in ns]`, which applies the `collatz` function to each element of `ns`.
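Here is a tiny version of the same two steps, small enough to check by eye. `square` is a hypothetical stand-in for `collatz`, used so that the sketch runs on its own.

```python
import numpy as np

ns = np.arange(1, 6)              # start (1) is included, stop (6) is excluded
print(ns.tolist())                # [1, 2, 3, 4, 5]

def square(n):                    # hypothetical stand-in for collatz
    return n * n

# the list comprehension calls square once per element, then np.array packs the results
squares = np.array([square(n) for n in ns])
print(squares.tolist())           # [1, 4, 9, 16, 25]
```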


In [8]:
ns = np.arange(1, 500)   # generate n = [1, 2, 3, ..., 499]
collatzed = np.array([collatz(n) for n in ns]) # apply collatz(n) to each value and put it in a numpy array
plt.plot(ns, collatzed)  # plot the result

[<matplotlib.lines.Line2D at 0xbebf4a8>]

If you hit SHIFT-ENTER on the above, you should see a plot. 

It looks pretty noisy; perhaps there is some periodic structure. We can use the FFT to look for it. `np.fft.fft()` computes the discrete [Fourier transform](http://en.wikipedia.org/wiki/Fourier_transform). We can compute the magnitude spectrum $|X_k|$ by taking the absolute value, and then discard the second half, which mirrors the first half for a real-valued input:

In [None]:
fftd = np.fft.fft(collatzed)
# take absolute value
real_magnitude = np.abs(fftd)
# trim off symmetric part (note the slice syntax)
real_magnitude = real_magnitude[1:len(fftd)//2] # note that we drop the 0th element (DC)
fig = plt.figure() # make a new figure
ax = fig.add_subplot(111) # this just creates a new single blank axis
ax.plot(real_magnitude)   # and plots onto it (we can create multi-panel plots using add_subplot)
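Why is it safe to throw away half of the spectrum? For a real-valued input of length $N$, the FFT coefficients satisfy $X_{N-k} = \overline{X_k}$, so $|X_{N-k}| = |X_k|$ and the upper half mirrors the lower half. The check below uses an arbitrary made-up signal. It confirms the symmetry numerically, shows that bin 0 (DC) is simply the sum of the samples, and shows that `np.fft.rfft` returns just the non-redundant bins.

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # any real-valued signal
X = np.fft.fft(x)
magnitude = np.abs(X)
N = len(x)

# Bin 0 (DC) is the sum of the samples.
print(np.isclose(X[0], x.sum()))                          # True

# For real input, bins k and N-k have equal magnitude, so bins 1..N-1 read the
# same forwards and backwards.
print(np.allclose(magnitude[1:], magnitude[1:][::-1]))    # True

# np.fft.rfft computes only bins 0..N//2, the non-redundant half.
print(np.allclose(np.fft.rfft(x), X[:N // 2 + 1]))        # True
```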

That looks pretty unstructured. Let's try more numbers, and make the fft plot a function we can reuse later.

In [None]:
ns = np.arange(1,10000)
collatzed = [collatz(n) for n in ns]


In [None]:
def fft_plot(x):
    """Plot the magnitude spectrum of x, showing only the real, positive-frequency 
    portion, and excluding component 0 (DC). """
    fftd = np.fft.fft(x)
    # get absolute (magnitude spectrum)
    real_magnitude = np.abs(fftd)
    # chop off symmetric part
    real_magnitude = real_magnitude[1:len(fftd)//2]
    fig = plt.figure() # make a new figure
    ax = fig.add_subplot(111)
    ax.plot(real_magnitude)

In [None]:
fft_plot(collatzed)

Some interesting structure, with big spikes, but the frequency approach isn't revealing much. Let's investigate the distribution of the variable. We can get a histogram with `plt.hist()`.

In [None]:
# density=True rescales the bars so that their total area is 1 (a probability density)
plt.hist(collatzed, bins=50, density=True);
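With the density option (older matplotlib versions called it `normed`), the *areas* of the bars sum to 1. The heights themselves do not. `np.histogram` accepts the same `bins` and `density` arguments, so you can use it to check this directly. The sketch below uses a made-up exponential sample instead of `collatzed` so that it runs on its own.

```python
import numpy as np

# Made-up sample data; any 1-D array behaves the same way.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)

heights, edges = np.histogram(data, bins=50, density=True)
widths = np.diff(edges)  # width of each bin

# Area of each bar = height x width; the areas add up to 1.
total_area = np.sum(heights * widths)
print(total_area)        # 1.0, up to floating-point rounding
print(np.sum(heights))   # generally not 1
```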

OK, this is more interesting. Let's show a normal fit to the distribution, using maximum likelihood estimation. `scipy.stats` has the tools we need to do this.

In [9]:
mean, std = np.mean(collatzed), np.std(collatzed)
import scipy.stats as stats # we must import scipy.stats, as we've not used it yet

# np.linspace() linearly spaces points on a range: here 200 points spanning the distribution
pdf_range = np.linspace(np.min(collatzed), np.max(collatzed), 200)

# scipy.stats has many distribution functions, including normal (norm)
pdf = stats.norm.pdf(pdf_range, mean, std)
plt.hist(collatzed, bins=50, density=True)
plt.plot(pdf_range, pdf, 'g', linewidth=3) # plot using thick green line 

[<matplotlib.lines.Line2D at 0xac49278>]
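For a normal distribution, the maximum likelihood estimates are the sample mean and the *uncorrected* standard deviation, which divides by $n$ rather than $n-1$. That is what `np.std` computes by default. `scipy.stats.norm.fit` performs the maximum likelihood fit itself, and on a made-up sample you can confirm that it agrees with the two numbers used above.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=50.0, scale=10.0, size=500)  # made-up data for illustration

mle_loc, mle_scale = stats.norm.fit(sample)  # maximum likelihood fit of a normal

print(np.isclose(mle_loc, np.mean(sample)))    # True: MLE mean = sample mean
print(np.isclose(mle_scale, np.std(sample)))   # True: MLE std uses ddof=0
```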

This distribution is clearly non-Gaussian, but let's test to make sure. `scipy.stats` provides many statistical tools, including normality tests. `scipy.stats.normaltest` implements D’Agostino and Pearson’s omnibus test, which combines a test of skewness with a test of kurtosis.

Note: to see the documentation for `normaltest`, try clicking at the end of `normaltest` and pressing SHIFT-TAB to see the tooltip. Click the ^ symbol to bring up the full help in a pane below. This works for any function.


In [None]:
import scipy.stats as stats # already imported above; repeated so this cell can also run on its own
k2, p = stats.normaltest(collatzed)
print(p) # p-value for the null hypothesis that the data are normal; p<0.05 suggests they are not
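The "combination" works like this: `normaltest` squares and adds the z-scores from `stats.skewtest` and `stats.kurtosistest`, then compares the total against a chi-squared distribution with 2 degrees of freedom. The sketch checks this on a made-up, clearly skewed sample, for which the p-value comes out far below 0.05.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)
skewed = rng.exponential(scale=1.0, size=500)  # made-up, clearly non-normal data

k2, p = stats.normaltest(skewed)

# The omnibus statistic is the sum of the squared z-scores of the two component tests.
z_skew, _ = stats.skewtest(skewed)
z_kurt, _ = stats.kurtosistest(skewed)
print(np.isclose(k2, z_skew**2 + z_kurt**2))  # True

print(p < 0.05)  # True: reject the hypothesis that the sample is normal
```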

We can safely assume this distribution is non-Gaussian. 

As an additional measure, we can plot a [Q-Q plot](http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot), showing the quantiles of the `collatzed` distribution against the quantiles of a normal distribution. If the distribution is normal, the points fall on a straight line.

The Q-Q plot is a very useful way of eyeballing distribution fits.

`scipy.stats.probplot()` does the job easily:

In [None]:
plt.figure() # don't plot on the same axis as the previous plot
qq = stats.probplot(collatzed, dist="norm", plot=plt) # note the use of "norm" to specify the test distribution


The "wobbliness" of the points, which bend away from the straight reference line, indicates that the normal distribution is not a good match.
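To put a number on the wobbliness, `stats.probplot` also fits a least-squares line through the Q-Q points and returns its correlation coefficient `r`. The closer `r` is to 1, the straighter the plot. The sketch below compares made-up normal and exponential samples.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)
normal_data = rng.normal(size=1000)
skewed_data = rng.exponential(size=1000)

# Without plot=..., probplot returns ((theoretical_q, ordered_values), (slope, intercept, r)).
_, (_, _, r_normal) = stats.probplot(normal_data, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_data, dist="norm")

print(round(r_normal, 3))  # very close to 1: the points lie near a straight line
print(round(r_skewed, 3))  # noticeably lower: the points bend away from the line
```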