# The Scientific Python Ecosystem

The Scientific Python Ecosystem is a robust collection of packages that provide functionality for everything from simple numeric arrays to sophisticated machine learning algorithms. In this notebook, we'll introduce the core scientific Python packages and some important terminology.

![](images/stack.png)

### Outline
- Python
- NumPy
- SciPy
- Pandas

### Tutorial Duration
10 minutes

### Going Further

This notebook is just meant to make sure we all have the same base terminology before moving on to the fun `xarray` and `dask`. If you are new to Python or just want to brush up, you may be interested in the following online resources:

- Scientific Python Lectures: http://scipy-lectures.org/
- Numpy Tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
- Scipy Tutorial: https://docs.scipy.org/doc/scipy/reference/tutorial/index.html
- Pandas Tutorials: https://pandas.pydata.org/pandas-docs/stable/tutorials.html

## Python built-ins

In [1]:
# data types
x = 4
type(x)

int

In [2]:
pi = 3.14
type(pi)

float

In [3]:
name = 'my string'
type(name)

str

In [4]:
# data structures / objects

my_list = [2, 4, 10]  # a list

my_list[2]  # access by position

10

In [5]:
my_dict = {'pi': 3.14, 'd': 4}  # a dictionary


my_dict['pi']  # access by key

3.14
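The two containers above behave differently in a few ways that matter later on. The following is a minimal sketch, reusing the same `my_list` and `my_dict` values; the extra entries (`12` and the `'e'` key) are made up here purely for illustration.

```python
# Lists are ordered and zero-indexed, so position 2 is the third element.
my_list = [2, 4, 10]
first = my_list[0]    # first element
last = my_list[-1]    # negative positions count from the end
my_list.append(12)    # lists are mutable and can grow in place

# Dictionaries map keys to values; new keys are added by assignment.
my_dict = {'pi': 3.14, 'd': 4}
my_dict['e'] = 2.718
keys = list(my_dict)  # iterating a dict yields its keys in insertion order

print(first, last, my_list, keys)
```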

## NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Numpy Documentation: https://docs.scipy.org/doc/numpy/
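As a rough illustration of the capabilities listed above, the sketch below touches the array object, the random number tools, linear algebra, and the Fourier transform in a few lines. The variable names, the seed, and the choice of a 3x3 system are assumptions made for this example only.

```python
import numpy as np

# Random numbers: a seeded generator gives repeatable values.
rng = np.random.default_rng(seed=0)

# N-dimensional array: a 3x3 matrix; adding 3 on the diagonal keeps it
# well conditioned so the linear solve below is numerically safe.
mat = rng.random((3, 3)) + 3.0 * np.eye(3)

# Linear algebra: solve mat @ sol = rhs and check the residual.
rhs = np.ones(3)
sol = np.linalg.solve(mat, rhs)
residual = np.abs(mat @ sol - rhs).max()

# Fourier transform: a forward and inverse FFT should round-trip the signal.
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False))
spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum).real

print(mat.shape, residual, np.allclose(recovered, signal))
```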

In [6]:
import numpy as np

In [7]:
x = np.zeros(shape=(4, 5))
x

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [8]:
y = x + 4
y

array([[4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4.],
       [4., 4., 4., 4., 4.]])

In [9]:
# random numbers
z = np.random.random(x.shape)
z

array([[0.17670524, 0.45706153, 0.37068002, 0.60829251, 0.89041037],
       [0.51646878, 0.61218885, 0.39475621, 0.12381254, 0.58774489],
       [0.76973108, 0.83818912, 0.97771392, 0.56292154, 0.79302364],
       [0.37730033, 0.91355679, 0.29141734, 0.85639295, 0.03420799]])

In [10]:
# aggregations
z_sum = z.sum(axis=1)
z_sum

array([2.50314967, 2.23497127, 3.94157929, 2.4728754 ])

In [11]:
# broadcasting
y.transpose() * z_sum

array([[10.01259869,  8.93988507, 15.76631717,  9.8915016 ],
       [10.01259869,  8.93988507, 15.76631717,  9.8915016 ],
       [10.01259869,  8.93988507, 15.76631717,  9.8915016 ],
       [10.01259869,  8.93988507, 15.76631717,  9.8915016 ],
       [10.01259869,  8.93988507, 15.76631717,  9.8915016 ]])
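The broadcasting cell works because NumPy compares shapes from the trailing axis backwards: `y.transpose()` has shape `(5, 4)` and `z_sum` has shape `(4,)`, so the last axes match and `z_sum` is repeated along the first axis. The sketch below reproduces this with a stand-in for `z_sum` (the values `1, 2, 3, 4` are an assumption chosen so the result is easy to read) and shows an alternative that avoids the transpose.

```python
import numpy as np

y = np.full((4, 5), 4.0)        # same shape and values as y above
row_sums = np.arange(1.0, 5.0)  # stand-in for z_sum, shape (4,)

# (5, 4) * (4,): the trailing axes match, so this broadcasts.
out = y.transpose() * row_sums

# (4, 5) * (4,): the trailing axes are 5 and 4, so this fails.
mismatch = None
try:
    y * row_sums
except ValueError as err:
    mismatch = str(err)

# Adding a trailing axis gives (4, 5) * (4, 1), which broadcasts
# without transposing y.
out_no_transpose = y * row_sums[:, np.newaxis]

print(out.shape, out_no_transpose.shape)
```

The two results hold the same numbers; one is simply the transpose of the other.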

In [12]:
# slicing
z[2:4, ::2]  # rows 2 and 3 (the stop index 4 is excluded), every second column

array([[0.76973108, 0.97771392, 0.79302364],
       [0.37730033, 0.29141734, 0.03420799]])

In [None]:
# data types

xi = np.array([1, 2, 3], dtype=int)  # integer (the np.int alias was removed in NumPy 1.24)
xi.dtype

In [None]:
xf = np.array([1, 2, 3], dtype=float)  # float (the np.float alias was removed in NumPy 1.24)
xf.dtype
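Every element of an array shares one data type, and that type can change when arrays are converted or combined. A minimal sketch, assuming explicit 64-bit types so the result does not depend on the platform default:

```python
import numpy as np

xi = np.array([1, 2, 3], dtype=np.int64)

# astype returns a converted copy; the original array is unchanged.
xf = xi.astype(np.float64)

# Mixing an integer array with a Python float promotes the result to float.
mixed = xi + 0.5

# Converting floats to integers truncates toward zero rather than rounding.
truncated = np.array([1.7, -1.7]).astype(np.int64)

print(xi.dtype, xf.dtype, mixed.dtype, truncated)
```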

In [None]:
# universal functions (ufuncs, e.g. sin, cos, exp, etc)
np.sin(z_sum)


## SciPy

SciPy is a collection of mathematical algorithms and convenience functions built on the Numpy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. SciPy includes a number of subpackages covering different scientific computing domains:

| Subpackage | Description|
| ------| ------|
| cluster |	Clustering algorithms|
| constants |	Physical and mathematical constants|
| fftpack |	Fast Fourier Transform routines|
| integrate |	Integration and ordinary differential equation solvers|
| interpolate |	Interpolation and smoothing splines|
| io |	Input and Output|
| linalg |	Linear algebra|
| ndimage |	N-dimensional image processing|
| odr |	Orthogonal distance regression|
| optimize |	Optimization and root-finding routines|
| signal |	Signal processing|
| sparse |	Sparse matrices and associated routines|
| spatial |	Spatial data structures and algorithms|
| special |	Special functions|
| stats |	Statistical distributions and functions|

Because SciPy is built directly on NumPy, we'll skip any examples for now. The SciPy API is well documented, with examples of how to use specific subpackages.
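As a pointer into that documentation, here is a minimal sketch of two subpackages from the table, `integrate` and `optimize`. The integrand and the quadratic are arbitrary illustrations chosen here because their exact answers (2 and the square root of 2) are known.

```python
import numpy as np
from scipy import integrate, optimize

# integrate: definite integral of sin(x) over [0, pi].
# quad returns the estimate and an estimate of the absolute error.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# optimize: find the root of x**2 - 2 inside the bracket [0, 2].
root = optimize.brentq(lambda x: x**2 - 2.0, 0.0, 2.0)

print(area, root)
```

Both functions accept plain Python callables and NumPy ufuncs alike, which is typical of how SciPy builds on NumPy.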

SciPy Documentation: 

## Pandas

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.

Pandas Documentation: http://pandas.pydata.org/pandas-docs/stable/

In [None]:
import pandas as pd

In [None]:
# This data can also be loaded from the statsmodels package
# import statsmodels.api as sm
# co2 = sm.datasets.co2.load_pandas().data

co2 = pd.read_csv('./data/co2.csv', index_col=0, parse_dates=True)

In [None]:
# co2 is a pandas.DataFrame
co2.head()  # head returns the first five rows by default

In [None]:
# The pandas DataFrame is made up of an index
co2.index

In [None]:
# and 0 or more columns (in this case just 1 - co2)
# Each column is a pandas.Series
co2['co2'].head()  


In [None]:
# label-based slicing (both endpoints are included)
co2.loc['1990-01-01':'1990-02-14']
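The cells above depend on the `co2.csv` file, so the following sketch rebuilds the same structure from scratch: an index, one column, and a label-based slice. The dates and values are hypothetical and are not taken from the real CO2 record.

```python
import pandas as pd

# Hypothetical values on a daily DatetimeIndex (not the real co2 data).
index = pd.date_range('2000-01-01', periods=4, freq='D')
df = pd.DataFrame({'co2': [369.1, 369.3, 369.2, 369.5]}, index=index)

# Selecting a column returns a Series that shares the DataFrame's index.
col = df['co2']

# Label-based slices include both endpoints, unlike positional slices.
subset = df.loc['2000-01-02':'2000-01-03']

print(type(col).__name__, len(subset))
```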

In [None]:
# aggregations just like in numpy
co2.mean(axis=0)

In [None]:
# advanced grouping/resampling

# here we'll calculate the annual average timeseries of co2 concentrations
co2_as = co2.resample('YS').mean()  # YS labels each year by its start ('AS' is the older alias, deprecated in pandas 2.2)

co2_as.head()

In [None]:
# we can also quickly calculate the monthly climatology

co2_climatology = co2.groupby(co2.index.month).mean()
co2_climatology
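To see what `resample` and `groupby` return without the data file, here is a minimal sketch on a synthetic series. The linear trend plus seasonal sine wave is an assumption made for illustration and is not a model of the real CO2 record.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering two full years: trend plus seasonal cycle.
index = pd.date_range('2000-01-01', '2001-12-31', freq='D')
days = np.arange(len(index))
day_of_year = index.dayofyear.to_numpy()
values = 370.0 + 0.005 * days + 3.0 * np.sin(2.0 * np.pi * day_of_year / 365.25)
series = pd.Series(values, index=index, name='co2')

# resample groups by consecutive time bins: one mean per calendar year,
# labelled with the first day of that year.
annual = series.resample('YS').mean()

# groupby on the month number pools the same month across all years,
# giving 12 values indexed 1..12 (a monthly climatology).
climatology = series.groupby(series.index.month).mean()

print(annual)
print(climatology.shape)
```

The difference is that `resample` keeps the time axis (two rows here, one per year), while grouping by `index.month` collapses it into twelve rows regardless of how many years the data spans.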

In [None]:
%matplotlib inline

# and even plot that using pandas and matplotlib
co2_climatology.plot()