<p style="float:right">
<img src="images/logos/cu.png" style="display:inline" />
<img src="images/logos/cires.png" style="display:inline" />
<img src="images/logos/nasa.png" style="display:inline" />
<img src="images/logos/nsidc_daac.png" style="display:inline" />
</p>

# Python, Jupyter & pandas: Module 1

## Introduction and background

### Jupyter

- Jupyter evolved from the IPython project
    - Extends IPython to support "over 40" programming languages -- not just Python
    - Notebooks provide interactive Python sessions with facilities for code editing, text display and data visualization
      - Similar to Mathematica (or Matlab's Live Editor) notebooks
      - Web browser interface (or console, emacs integration, etc.)
- Browser environment
    - Home tab
        - _Files_
        - _Running_
        - _Clusters_
    - Notebook tab
        - Cells
            - **Shift + Return to run a cell, or use "play" button in toolbar.**
            - _Cell_ menu: Run all, all above, etc.
              - Note that evaluation will stop on any errors!
            - Cut/copy/paste/move cells
            - When to create new cells
            - Clearing output: _Cell_ > _Current Outputs_ > _Clear_ vs _Cell_ > _All Output_ > _Clear_
            - Restarting kernel
              - Can also shut down kernels from the _Running_ menu on the home tab.
            - Keyboard shortcuts
        - Text formatting with Markdown
            - You're looking at it!
            - Double-click a rendered Markdown cell to see the source
        - LaTeX support
            - Use in Markdown cells, e.g.:
$$e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$$

#### Magics

The `lsmagic` magic shows all the line and cell magics available:

In [None]:
%lsmagic

The `ls` magic, for example, shows the contents of the current directory, just like the shell's `ls` command.

In [None]:
%ls

The `timeit` line magic executes its argument (a Python statement) several times and reports a summary of the timings: older IPython versions report the best of three runs, while newer versions report the mean and standard deviation over the runs. (Notice the asterisk in `In [*]` when a cell is busy running. It does not make sense to run another cell whose output depends on a busy cell.)

In [None]:
import time
%timeit time.sleep(1)
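
Magics are only available inside IPython and Jupyter. As a rough sketch of the same idea in a plain Python script, the standard library's `timeit` module can be used directly. The `number` and `repeat` values below are chosen for illustration and are not the defaults the magic uses.

```python
import time
import timeit

# Take three measurements, running the statement once per measurement.
# A short sleep stands in for the code being timed.
timings = timeit.repeat(lambda: time.sleep(0.01), number=1, repeat=3)

# Report the fastest measurement, in the spirit of "best of three".
best = min(timings)
print(f"best of {len(timings)}: {best:.4f} s")
```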

The `time` cell magic executes all the Python statements in the cell once and reports the execution time for the entire cell.

In [None]:
%%time
time.sleep(1)
time.sleep(2)

There are cell magics to execute cells in several non-Python languages -- for example, in Ruby...

In [None]:
%%ruby
3.times { puts 'hello' }

...or in bash:

In [None]:
%%bash
whoami

### Python

- First released in 1991 and actively developed since then
- Extremely successful and popular, obviously
- But, as an interpreted language, relatively slow vs e.g. C or Fortran
- Also, Python's `list` object (similar to a vector) can be awkward in numerical contexts:

In [None]:
v = [1.0, 2.0, 3.0, 4.0]
v * 3
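
To make the awkwardness concrete, here is a small sketch contrasting what `*` does on a list with what a numerical user probably wants. The names `repeated` and `scaled` are illustrative only.

```python
v = [1.0, 2.0, 3.0, 4.0]

# On a list, * repeats the sequence; it does not multiply the elements.
repeated = v * 3
print(len(repeated))

# Element-wise multiplication needs an explicit loop or comprehension.
scaled = [x * 3 for x in v]
print(scaled)
```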

### NumPy

- Python's applicability to problems in science was bolstered by NumPy
    - Released in 2006
    - Technically part of the larger SciPy ecosystem, but can be installed independently

In [None]:
import numpy as np

- NumPy provides support for large, multidimensional arrays / matrices with natural semantics...

_**Note:** In Python, calling `print()` on an object prints its string value; Jupyter (like the Python REPL) automatically displays the value of the expression on the last line of a cell. This works **only** for the last line of a cell; elsewhere, we must use `print()`._

In [None]:
a = np.array([1.0, 2.0, 3.0, 4.0])
a * 3

... and functions useful for working with such objects...

In [None]:
reshaped = a.reshape(2, 2)
reshaped

...and taking slices:

In [None]:
a[1:3] # lower bound is inclusive, upper is exclusive
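
Slicing extends naturally to more than one dimension, with one slice per axis. The sketch below also illustrates that a basic slice is a view onto the original data rather than a copy. The arrays `grid` and `b` are made up for this example.

```python
import numpy as np

grid = np.array([[1.0, 2.0],
                 [3.0, 4.0]])

first_row = grid[0, :]    # row 0, all columns
second_col = grid[:, 1]   # all rows, column 1
print(first_row, second_col)

# A basic slice is a view: writing through it changes the original array.
b = np.array([10, 20, 30, 40])
window = b[1:3]
window[0] = 99
print(b)
```

When an independent copy is needed, calling `.copy()` on the slice avoids modifying the original.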

We can use Python's `type()` function to look at the type of any object:

In [None]:
type(a)

- `ndarray` has [lots of powerful functions](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.ndarray.html) and NumPy as a whole provides a [wide range of numerical routines](https://docs.scipy.org/doc/numpy/reference/routines.html) in areas such as linear algebra, finance, logic, trigonometry, etc. built to work on NumPy data structures.

- In addition to functionality, NumPy provides better performance by
    - Expressing `ndarray` as a type-homogeneous, densely-packed memory representation vs `list`'s dynamic array of references to arbitrary objects
    - Implementing underlying routines in C or Fortran, with convenient Python wrappers
    - Reusing well-tuned libraries like BLAS for linear algebra
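
One way to see the performance difference is to time the same computation on a `list` and on an `ndarray`. This is a rough sketch rather than a careful benchmark: the array size, the repeat count and the function names are arbitrary choices, and the timings will vary by machine.

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))
array = np.arange(n, dtype=np.int64)  # explicit 64-bit integers to avoid overflow

def python_sum_of_squares():
    # Pure Python: the interpreter handles one element at a time.
    return sum(x * x for x in values)

def numpy_sum_of_squares():
    # NumPy: the multiply and the sum each run as a single compiled loop.
    return int(np.sum(array * array))

list_time = timeit.timeit(python_sum_of_squares, number=10)
array_time = timeit.timeit(numpy_sum_of_squares, number=10)
print(f"list:    {list_time:.4f} s")
print(f"ndarray: {array_time:.4f} s")
```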

### pandas

- pandas builds on NumPy and adds higher-level data-manipulation capabilities.

In [None]:
import pandas as pd

Just as `ndarray` is NumPy's essential data structure, [`Series`](http://pandas.pydata.org/pandas-docs/stable/api.html#series) and [`DataFrame`](http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe) are the essential data structures of pandas.

`Series`: _"One-dimensional ndarray with axis labels (including time series)."_

In [None]:
series_a = pd.Series([11, 10.5, 13.2, 12, 10, 9.3, 10.1], name='elapsed')
print(series_a)
series_b = pd.Series([115, 100, 125, np.nan, 103, 83, 102], name='counts')
print(series_b)

Series often ultimately represent columns (or rows) in a data frame, and have indices (the position of the element in the series, by default) that are used when concatenating multiple series into a data frame.
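
The role of the index is easier to see when two series do not share the same labels. In the sketch below, the series `left` and `right` and their string labels are invented for illustration. Concatenating along `axis=1` lines values up by label and fills the gaps with `NaN`.

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'], name='left')
right = pd.Series([10.0, 30.0, 40.0], index=['a', 'c', 'd'], name='right')

# Values are matched by index label, not by position.
aligned = pd.concat([left, right], axis=1)
print(aligned)
```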

`DataFrame`: _"Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns)."_

In [None]:
df = pd.concat([series_a, series_b], axis=1)
df

We can filter our data by selecting only rows that match some criteria:

In [None]:
df[df['elapsed'] >= 10]
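
The expression inside the brackets is itself a series of booleans, often called a mask. As a sketch, the example below rebuilds the same data so it can run on its own, shows the mask, and combines two conditions. The second condition on `counts` is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'elapsed': [11, 10.5, 13.2, 12, 10, 9.3, 10.1],
    'counts': [115, 100, 125, np.nan, 103, 83, 102],
})

# The comparison yields one True/False value per row.
mask = df['elapsed'] >= 10
print(mask)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
both = df[(df['elapsed'] >= 10) & (df['counts'] > 100)]
print(both)
```

Note that a comparison against `NaN` is always false, so the row with the missing count is excluded from `both`.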

Or remove bad (`NaN` = "Not a Number") data:

In [None]:
df.dropna()
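
`dropna()` returns a new data frame and leaves the original untouched. The sketch below rebuilds the same data so it can run on its own and shows this, along with one possible alternative: filling the gap instead of dropping the row. Using the column mean as the fill value is only an illustration, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'elapsed': [11, 10.5, 13.2, 12, 10, 9.3, 10.1],
    'counts': [115, 100, 125, np.nan, 103, 83, 102],
})

# dropna() returns a copy without the incomplete rows; df itself is unchanged.
cleaned = df.dropna()
print(len(df), len(cleaned))

# Alternatively, replace missing counts with the mean of the observed counts.
filled = df.fillna({'counts': df['counts'].mean()})
print(filled)
```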

pandas has lots of support for dates and periods.  Here's a simple example of creating a date range.

In [None]:
dates = pd.date_range(start='1999-01-01', periods=15, freq='D')
dates
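
A date range becomes more useful as the index of a series or data frame. In the sketch below, the series `daily` and its values are made up for illustration. It shows selecting a span of days by date string.

```python
import pandas as pd

dates = pd.date_range(start='1999-01-01', periods=15, freq='D')

# Use the dates as the index of a series of made-up daily values.
daily = pd.Series(range(15), index=dates, name='value')

# Label-based slices on a date index include both endpoints.
window = daily.loc['1999-01-03':'1999-01-05']
print(window)
```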

A great beginner tutorial for pandas is [10 minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html).

In Module 2, we'll obtain data over the internet from an OPeNDAP server and do some basic data inspection.