#Introduction to Scientific Python 
##*Lecture 5: An introduction to Numpy and Scipy*

Luke de Oliveira (lukedeo@stanford.edu)

January 26th, 2016

##Overview
Last time, we talked about classes, OOP, and file read/writes using Python.

###IPython notebooks
Technically, you'll be able to follow along on your computer today while we learn! If you don't have a working python installation, don't worry! You will still be able to follow along. 

You can download the notebook directory from [this link](https://github.com/icme/cme193/archive/master.zip). Make sure you unzip the zip file! Alternatively, you can *clone* the repository by typing 

```
git clone https://github.com/icme/cme193
```

in your terminal in the folder you want this to live.

Now, go to your command line and `cd` to where this is. From your command line (after you install IPython via `sudo pip install ipython`) you can now type `ipython notebook`, and your browser will open up with your IPython session!

##Homework

Let's talk about the homework! I hope I can clarify a few things today. There will be office hours tonight for in-depth questions.

Let's say we want to implement a class for rational numbers. Reminder, for $a \in \mathbb{Q}$, $a = \frac{p}{q}$ with $p,q \in \mathbb{Z}$ and $q\neq 0$. What is the minimal specification for this in a class?

* We need a numerator
* We need a denominator
* We need to be able to simplify...
* We need to check our conditions...

In [1]:
class Rational:
    def __init__(self, p, q=1):
        
        assert(q != 0)
        assert(isinstance(p, int))
        assert(isinstance(q, int))
        
        self.p = p
        self.q = q

This is great! But something is missing -- we still need to simplify! We'll use the Euclidean algorithm to simplify fractions.

In [2]:
# Let's implement the Euclidean Alg.
def gcd(a, b): 
    if b == 0:
        return a 
    else:
        return gcd(b, a%b)

print gcd(35, 49) # 7

7


In [3]:
# let's add this to our class! Let's also add a __str__ method

class Rational(object):
    def __init__(self, p, q=1):
        
        # usually we would use exceptions, but we 
        # haven't learned about them yet!
        assert(q != 0)
        assert(isinstance(p, int))
        assert(isinstance(q, int))
        
        g = gcd(p, q)
        
        # g divides both p and q, so integer division is exact
        self.p = p // g
        self.q = q // g

    def __str__(self):
        return '{} / {}'.format(self.p, self.q)

In [4]:
# Let's see if these behave as we expect...
a = Rational(6, 4)
b = Rational(3, 2)

print 'a = {}'.format(a)
print 'b = {}'.format(b)

a = 3 / 2
b = 3 / 2


This is great! As part of your homework assignment, you'll be extending the `Rational` class to be able to handle operators and other fun stuff!

A weird thing you may have noticed in your homework -- for every operator you implement, for example `__add__`, I'm also asking you to implement this thing called `__radd__`. What is this?

Well, it is right-hand-side (R.H.S.) addition! Let's say you're a Python interpreter, and you read left to right. Also, let's say you've **already implemented `__add__`**:

```python
>>> a, b = Rational(3, 2), 5
>>> print a.__class__.__name__
Rational
>>> print b.__class__.__name__
int
>>> print a + b
13 / 2
>>> print b + a
TypeError: unsupported operand type(s) for +: 'int' and 'Rational'
```

Why is this the case?
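One way to see what is going on: when Python evaluates `b + a`, it first tries `b.__add__(a)`. An `int` does not know about our class, so that call gives up by returning `NotImplemented`, and Python then looks for `a.__radd__(b)`. The sketch below uses a hypothetical toy class `Meters` (made up for illustration, not part of the homework) to show the mechanism. It uses `print(...)` with a single argument so it runs on both Python 2 and 3.

```python
class Meters(object):
    """Hypothetical toy class, used only to illustrate __add__ / __radd__."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # handles Meters + Meters and Meters + number
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # let Python try the other operand

    def __radd__(self, other):
        # called for `other + self` when type(other) cannot handle a Meters;
        # addition commutes here, so we simply reuse __add__
        return self.__add__(other)

    def __str__(self):
        return '{} m'.format(self.value)


left = Meters(3) + 5   # calls Meters.__add__
right = 5 + Meters(3)  # int.__add__ gives up, so Meters.__radd__ runs
print(left)
print(right)
```

Without `__radd__`, the second addition would raise the same kind of `TypeError` as in the session above.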

## Numpy

Ok, let's shift gears! Let's talk numpy. 

* Fundamental package for scientific computing with Python
* N-dimensional array object
* Linear algebra, Fourier transform, random number capabilities
* Building block for other packages (e.g. Scipy, scikit-learn)
* Open source, huge dev community!

###Installation
If you installed Python with `anaconda`, you should already have numpy installed. To test if you have numpy already, go to your terminal or command prompt and type:

```bash
python -c 'import numpy'
```

If this does nothing, congrats! You have numpy. If the output looks something like this:

```bash
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named numpy
```

Then you don't...


To install numpy, simply go to your terminal and type 

```bash
pip install numpy
```

You may need to type 

```bash
sudo pip install numpy
```

and type your computer password if you get an error that says "blah blah permission denied blah blah".

###Why numpy?

A very common question people ask is "why can't I just use lists for math?"

Any ideas?

Here are a few reasons why not:

* Real vectors can be big!
* How to handle $n$ dimensions? With nested lists, nothing enforces a consistent shape.
* How about very sparse data?
* *abstraction*! Something like $A = U\Sigma V^T$ is common enough that we want to encapsulate that.
* Speed
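To make a couple of these points concrete, here is a small sketch comparing plain lists with numpy arrays. The variable names are made up for illustration.

```python
import numpy as np

xs = [1, 2, 3]
arr = np.array(xs)

list_times_two = xs * 2     # list repetition, not arithmetic
array_times_two = arr * 2   # elementwise multiplication

print(list_times_two)
print(array_times_two)

# nested lists may be ragged; nothing enforces a rectangular shape
ragged = [[1, 2, 3], [4, 5]]
row_lengths = [len(row) for row in ragged]
print(row_lengths)

# a numpy array always carries an explicit shape
M = np.array([[1, 2, 3], [4, 5, 6]])
print(M.shape)
```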

###A quick lesson on `import`ing in Python

There are 3 basic ways to import a package in Python.

* `from numpy import linspace`
* `import numpy as np`
* `import numpy`
    
Let's say you know that numpy has the function `linspace`. Here is how you access that function in each scenario:

* `linspace(...)`
* `np.linspace(...)`
* `numpy.linspace(...)`
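As a quick check, the three import styles can live side by side, and each name points at the same function object. This is only a sketch; in practice you would pick one style (most numpy code uses `import numpy as np`).

```python
from numpy import linspace
import numpy as np
import numpy

# all three names refer to the very same function object
same_function = (linspace is np.linspace) and (np.linspace is numpy.linspace)
print(same_function)

# 5 evenly spaced points from 0 to 1, endpoints included
points = np.linspace(0, 1, 5)
print(points)
```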

Hurray! 

In [13]:
# -- let's jump in! The first thing to do is import numpy.
import numpy as np

# -- let's make a function for separating output...
def linebreak():
    print '-' * 20

In [14]:
A = np.array([[1, 2, 3], [4, 5, 6]]) 
print A
linebreak()
Af = np.array([[1, 2, 3], [4, 5, 6]], float)
print Af

[[1 2 3]
 [4 5 6]]
--------------------
[[ 1.  2.  3.]
 [ 4.  5.  6.]]


In [17]:
# -- numpy provides many ways to create arrays subject to mathematical constraints
print np.arange(0, 1, 0.2)
linebreak()

print np.linspace(0, 2*np.pi, 4)
linebreak()
# -- a matrix of zeros
A = np.zeros((2,3))
print A
linebreak()
print A.shape ## a tuple!

[ 0.   0.2  0.4  0.6  0.8]
--------------------
[ 0.          2.0943951   4.1887902   6.28318531]
--------------------
[[ 0.  0.  0.]
 [ 0.  0.  0.]]
--------------------
(2, 3)


In [18]:
# -- numpy provides routines for random array creation
print np.random.random((2,3))
linebreak()

a = np.random.normal(loc=1.0, scale=2.0, size=(2,2))
print a
linebreak()

# -- we can serialize!
np.savetxt("a_out.txt", a)
b = np.loadtxt("a_out.txt")
print b

[[ 0.38732245  0.70620747  0.82777124]
 [ 0.28384959  0.8485654   0.30733713]]
--------------------
[[ 0.7516955   1.91584949]
 [-4.68465101  3.33771411]]
--------------------
[[ 0.7516955   1.91584949]
 [-4.68465101  3.33771411]]


In [19]:
# -- numpy arrays are mutable
A = np.zeros((2, 2))
C = A
C[0, 0] = 1

# -- what will this be?
print A 

[[ 1.  0.]
 [ 0.  0.]]
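The assignment `C = A` does not copy anything; both names refer to the same array. If an independent array is wanted, one option is the `copy` method. A minimal sketch (the names `alias` and `independent` are made up for illustration):

```python
import numpy as np

A = np.zeros((2, 2))
alias = A                # a second name for the same array
independent = A.copy()   # a new array with its own data

alias[0, 0] = 1          # visible through A
independent[1, 1] = 5    # not visible through A

print(A)
print(alias is A)
print(independent is A)
```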


In [21]:
# -- arrays are extremely flexible...
a = np.arange(10)
print 'a =', a

a = a.reshape((2,5))
print '\nafter reshape, a =', a

print '\na.ndim =', a.ndim
print '\na.shape =', a.shape
print '\na.size =', a.size
print '\na.T =', a.T
print '\na.dtype =', a.dtype

a = [0 1 2 3 4 5 6 7 8 9]

after reshape, a = [[0 1 2 3 4]
 [5 6 7 8 9]]

a.ndim = 2

a.shape = (2, 5)

a.size = 10

a.T = [[0 5]
 [1 6]
 [2 7]
 [3 8]
 [4 9]]

a.dtype = int64


In [22]:
# -- just like your rational class, arrays have overloaded math operators
a = np.arange(4)
print 'a = ', a

b = np.array([2, 3, 2, 4])
print 'b =', b

print 'a * b =', a * b 
print 'b - a =', b - a  
c = [2, 3, 4, 5]
print 'c =', c
print 'a * c =', a * c 
# if we want, we can also use +=, -=, *=, etc

a =  [0 1 2 3]
b = [2 3 2 4]
a * b = [ 0  3  4 12]
b - a = [2 2 0 1]
c = [2, 3, 4, 5]
a * c = [ 0  3  8 15]


####Array Broadcasting

When operating on two arrays, numpy compares their shapes dimension by dimension, starting from the trailing (last) one. Two dimensions are compatible when

* They are of equal size
* One of them is 1

What does this look like in a picture?

![bc](./img/broadcasting.png)
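The same rule can be tried out directly. In this sketch, a column of shape `(3, 1)` and a row of shape `(1, 4)` broadcast to a `(3, 4)` result, while shapes `(3,)` and `(4,)` are rejected.

```python
import numpy as np

col = np.arange(3).reshape((3, 1))   # shape (3, 1)
row = np.arange(4).reshape((1, 4))   # shape (1, 4)

# each size-1 dimension is stretched to match the other array
table = col + row                    # shape (3, 4)
print(table.shape)
print(table)

# shapes (3,) and (4,) are incompatible: 3 != 4 and neither is 1
try:
    np.arange(3) + np.arange(4)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```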

In [25]:
# -- array broadcasting also works with scalars
# This also allows us to add a constant to a 
# matrix or multiply a matrix by a constant

A = np.ones((3,3))
print 3 * A - 1

[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]


In [24]:
# -- numpy gives us vector ops
u = [1, 2, 3]
v = [1, 1, 1]

print 'np.inner(u, v) =', np.inner(u, v)

print 'np.outer(u, v) =\n', np.outer(u, v)

print 'np.dot(u, v) =', np.dot(u, v)


np.inner(u, v) = 6
np.outer(u, v) =
[[1 1 1]
 [2 2 2]
 [3 3 3]]
np.dot(u, v) = 6


####More matrix operations

In [26]:
# first, some matrices
A = np.ones((3, 2))
print 'A.T =\n', A.T
B = np.ones((2, 3))
print 'B =\n', B

A.T =
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
B =
[[ 1.  1.  1.]
 [ 1.  1.  1.]]


In [27]:
# -- are these all valid?

print 'np.dot(A, B) =\n', np.dot(A, B)

print 'np.dot(B, A) =\n', np.dot(B, A)

print 'np.dot(B.T, A.T) =\n', np.dot(B.T, A.T)

print 'np.dot(A, B.T) =\n', np.dot(A, B.T)


np.dot(A, B) =
[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]
np.dot(B, A) =
[[ 3.  3.]
 [ 3.  3.]]
np.dot(B.T, A.T) =
[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]
np.dot(A, B.T) =


ValueError: shapes (3,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)

In [29]:
# -- let's see what operations we can do across the axes of a matrix
a = np.random.random((2,3))
print 'a =\n', a

print '\na.sum() =', a.sum()

print '\na.sum(axis=0) =', a.sum(axis=0)

print '\na.cumsum() =', a.cumsum()

print '\na.cumsum(axis=1) =', a.cumsum(axis=1)

print '\na.min() =', a.min()

print '\na.max(axis=0) =', a.max(axis=0)


a =
[[ 0.83481011  0.79204333  0.81680118]
 [ 0.7732253   0.57648099  0.00180735]]

a.sum() = 3.79516825472

a.sum(axis=0) = [ 1.60803541  1.36852432  0.81860853]

a.cumsum() = [ 0.83481011  1.62685343  2.44365461  3.21687991  3.7933609   3.79516825]

a.cumsum(axis=1) = [[ 0.83481011  1.62685343  2.44365461]
 [ 0.7732253   1.34970629  1.35151365]]

a.min() = 0.00180735257307

a.max(axis=0) = [ 0.83481011  0.79204333  0.81680118]


In [30]:
# -- arrays are like lists, they can be sliced!

a = np.random.random((4,5))
print 'a =\n', a
print '\na[2, :] =', a[2, :]
# third row, all columns
print '\na[1:3] =', a[1:3]
# 2nd, 3rd row, all columns
print '\na[:, 2:4] =', a[:, 2:4]
# all rows, columns 3 and 4

a =
[[ 0.35570616  0.56789007  0.49257881  0.69755026  0.81386473]
 [ 0.07385069  0.7266599   0.00784031  0.76184057  0.73782388]
 [ 0.43143641  0.90700812  0.0404472   0.63831358  0.0964778 ]
 [ 0.27114795  0.13442414  0.61816229  0.78268455  0.36967036]]

a[2, :] = [ 0.43143641  0.90700812  0.0404472   0.63831358  0.0964778 ]

a[1:3] = [[ 0.07385069  0.7266599   0.00784031  0.76184057  0.73782388]
 [ 0.43143641  0.90700812  0.0404472   0.63831358  0.0964778 ]]

a[:, 2:4] = [[ 0.49257881  0.69755026]
 [ 0.00784031  0.76184057]
 [ 0.0404472   0.63831358]
 [ 0.61816229  0.78268455]]


####Iterating
Iterating over multidimensional arrays is done with respect to the first axis: `for row in A`

One can loop over all elements with `for element in A.flat`
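A short sketch of both styles of iteration, on a small made-up array:

```python
import numpy as np

A = np.arange(6).reshape((2, 3))

# iterating over A yields its rows (slices along the first axis)
row_sums = [int(row.sum()) for row in A]
print(row_sums)

# A.flat visits every element in row-major order
elements = [int(x) for x in A.flat]
print(elements)
```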

####Reshaping

Reshape using `reshape`. Total size must remain the same. For example, `a = np.arange(10).reshape((2,5))`.
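To illustrate the size constraint, the sketch below reshapes 10 elements into `(2, 5)` and then attempts `(3, 4)`, which numpy rejects with a `ValueError` because 12 is not 10.

```python
import numpy as np

a = np.arange(10)
b = a.reshape((2, 5))    # 2 * 5 == 10, so this works
print(b.shape)

try:
    a.reshape((3, 4))    # 3 * 4 == 12 != 10
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```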

####Linear Algebra

Start with `import numpy as np` and `import numpy.linalg as la`. Array construction and stacking helpers live in the top-level `numpy` namespace; decompositions and solvers live in `numpy.linalg`.

* `np.eye(3)`, Identity matrix
* `np.trace(A)`, Trace
* `np.column_stack((A,B))`, Stack column wise
* `np.vstack((A,B,A))`, Stack row wise
* `la.qr`, Computes the QR decomposition
* `la.cholesky`, Computes the Cholesky decomposition
* `la.inv(A)`, Inverse
* `la.solve(A,b)`, Solves $Ax = b$ for square, full-rank $A$
* `la.lstsq(A,b)`, Solves $\arg\min_x \|Ax-b\|_2$
* `la.eig(A)`, Eigenvalue decomposition
* `la.eigh(A)`, Eigenvalue decomposition for symmetric or Hermitian matrices
* `la.eigvals(A)`, Computes eigenvalues.
* `la.svd(A, full_matrices)`, Singular value decomposition
* `la.pinv(A)`, Computes pseudo-inverse of A
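A few of these routines in action, as a sketch on a small symmetric matrix chosen for illustration. Note that `np.eye` comes from the top-level `numpy` namespace, while `solve`, `inv`, `eigh` and `svd` come from `numpy.linalg`.

```python
import numpy as np
import numpy.linalg as la

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])    # symmetric, full rank
b = np.array([1.0, 2.0])

x = la.solve(A, b)            # solves A x = b
print(np.allclose(np.dot(A, x), b))

A_inv = la.inv(A)
print(np.allclose(np.dot(A, A_inv), np.eye(2)))

# eigh is meant for symmetric / Hermitian input
eigenvalues, eigenvectors = la.eigh(A)
print(eigenvalues)

# thin SVD: A = U * diag(s) * Vt
U, s, Vt = la.svd(A, full_matrices=False)
print(np.allclose(np.dot(U * s, Vt), A))
```

For solving a single system, `la.solve(A, b)` is usually preferable to forming `la.inv(A)` and multiplying.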

####Random Numbers

Start with `import numpy.random as rng`

* `rng.rand(d0,d1,...,dn)`, Random values in a given shape
* `rng.randn(d0, d1, ...,dn)`, Random standard normal
* `rng.randint(lo, hi, size)`, Random integers `[lo, hi)`
* `rng.choice(a, size, replace, p)`, Sample from `a` (with or without replacement, optionally with probabilities `p`)
* `rng.shuffle(a)`, Permutation (in-place)
* `rng.permutation(a)`, Permutation (new array)
* Also, have parameterized distributions: `beta`, `binomial`, `chisquare`, `exponential`, `dirichlet`, `gamma`, `laplace`, `lognormal`, `pareto`, `poisson`, `power`...
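A short sketch of a few of these calls. The seed value and the sample sizes are arbitrary choices for illustration; seeding only makes reruns repeatable.

```python
import numpy as np
import numpy.random as rng

rng.seed(0)  # fix the seed so that reruns give the same draws

uniform = rng.rand(2, 3)                   # uniform on [0, 1), shape (2, 3)
normal = rng.randn(4)                      # standard normal, shape (4,)
dice = rng.randint(1, 7, size=10)          # integers in [1, 7), i.e. 1..6
picks = rng.choice([10, 20, 30], size=5, replace=True, p=[0.5, 0.25, 0.25])
shuffled = rng.permutation(np.arange(5))   # new array, input untouched

print(uniform.shape)
print(dice.min() >= 1 and dice.max() <= 6)
print(sorted(shuffled.tolist()))
```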

###SciPy

SciPy is a library of algorithms and mathematical tools built to work with NumPy arrays.

* linear algebra - `scipy.linalg`
* statistics - `scipy.stats`
* optimization - `scipy.optimize`
* sparse matrices - `scipy.sparse`
* signal processing - `scipy.signal`
* etc.


`scipy.linalg` is slightly different from `numpy.linalg`: SciPy is *always* built with BLAS/LAPACK support, so it could be faster in many cases!


####Optimization

* General purpose minimization: CG, BFGS, least-squares
* Constrained minimization; non-negative least-squares
* Minimize using simulated annealing
* Scalar function minimization
* Root finding
* Check gradient function
* Line search
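As a sketch of two items from this list, the example below minimizes a simple quadratic with BFGS and finds a scalar root with `brentq`. The objective `f` and the bracketing interval are made up for illustration.

```python
import numpy as np
from scipy import optimize


def f(x):
    # simple convex bowl with its minimum at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


# no gradient is supplied, so BFGS approximates it numerically
result = optimize.minimize(f, x0=np.zeros(2), method='BFGS')
print(result.x)

# scalar root finding: cos(x) - x changes sign on [0, 1]
root = optimize.brentq(lambda x: np.cos(x) - x, 0.0, 1.0)
print(root)
```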

####Statistics

* Mean, median, mode, variance, kurtosis
* Pearson correlation coefficient
* Hypothesis tests (t-test, Wilcoxon signed-rank test, Kolmogorov-Smirnov)
* Gaussian kernel density estimation
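A small sketch touching three of these items on synthetic data. The data, the seed and the names are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

gen = np.random.RandomState(0)   # seeded generator for repeatability
x = gen.normal(loc=0.0, scale=1.0, size=200)
y = 2.0 * x + gen.normal(scale=0.1, size=200)   # strongly correlated with x

# Pearson correlation coefficient and its p-value
r, p_value = stats.pearsonr(x, y)
print(r)

# two-sample t-test: do x and a shifted copy share the same mean?
shifted = x + 5.0
t_stat, t_p = stats.ttest_ind(x, shifted)
print(t_p)

# Gaussian kernel density estimate, evaluated at a few points
kde = stats.gaussian_kde(x)
density = kde(np.array([-1.0, 0.0, 1.0]))
print(density)
```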

####Matrices (sparse)

* Sparse matrix classes: CSC, CSR, etc.
* Functions to build sparse matrices
* `sparse.linalg` module for sparse linear algebra
* `sparse.csgraph` for sparse graph routines

####IO

`scipy.io` provides methods for loading and saving data:

* Matlab files
* Matrix Market files (sparse matrices)
* `.wav` files
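As a sketch tying the sparse matrix classes to Matrix Market files, the example below builds a small CSR matrix, multiplies it by a dense vector, and round-trips it through a `.mtx` file using `scipy.io.mmwrite` and `scipy.io.mmread`. The matrix entries and the temporary file name are made up for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

# 3x3 matrix with four nonzeros, given as (values, (rows, cols))
values = np.array([1.0, 2.0, 3.0, 4.0])
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 1, 0, 2])
S = sparse.csr_matrix((values, (rows, cols)), shape=(3, 3))

print(S.nnz)                  # number of stored entries
print(S.dot(np.ones(3)))      # sparse matrix times dense vector

# round trip through a Matrix Market file in a temporary directory
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'S.mtx')
io.mmwrite(path, S)
S_loaded = io.mmread(path).tocsr()
print(np.allclose(S_loaded.toarray(), S.toarray()))
```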
