### Python NumPy

Jay Urbain, PhD

6/26/2018

NumPy forms the basis of SciPy statistical operations, Pandas dataframes, scikit-learn, and many deep learning frameworks.   
- Central object: NumPy array 
- Not like a Java or C++ array 
- Like a vector/matrix 
- Can add, subtract, multiply, etc. 
- Optimized for speed 
- Built-in matrix operations: product, inverse, determinant, solving linear systems of equations 

Prerequisites: 
- Vectors and matrices  
- Linear algebra basics 
- Basic Python  

References:  
http://www.numpy.org/  
https://docs.scipy.org/doc/numpy/user/quickstart.html     
https://jakevdp.github.io/PythonDataScienceHandbook/   
http://shop.oreilly.com/product/0636920033400.do  

NumPy and Pandas are fundamental tools in data science for loading, storing, and manipulating in-memory data in Python. 

The topic is very broad: datasets can come from a wide range of sources and a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else. Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

#### Import numpy

In [1]:
import numpy as np
np.__version__

'1.14.3'

#### Getting help

? - intro and overview of IPython's features  
?? - source  
%quickref - quick reference to magic commands  
help - Python's help system  
object? - details about 'object'; use 'object??' for extra information  
help(object) - e.g., help(math.sqrt)

In [2]:
np?

In [3]:
%timeit np.random

64.7 ns ± 6.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [4]:
import numpy as np
help(np.random)

Help on package numpy.random in numpy:

NAME
    numpy.random

DESCRIPTION
    Random Number Generation
    
    Utility functions
    random_sample        Uniformly distributed floats over ``[0, 1)``.
    random               Alias for `random_sample`.
    bytes                Uniformly distributed random bytes.
    random_integers      Uniformly distributed integers in a given range.
    permutation          Randomly permute a sequence / generate a random sequence.
    shuffle              Randomly permute a sequence in place.
    seed                 Seed the random number generator.
    choice               Random sample from 1-D array.
    
    
    Compatibility functions
    rand                 Uniformly distributed values.
    randn                Normally distributed values.
    ranf                 Uniformly distributed floating point numbers.
    randint              Uniformly distributed integers in a given range.
    
    Univariate distributions
    beta                 

#### Basics

The main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called `axes`.

For example, the coordinates of a point in 3D space, [1, 2, 1], form an array with one axis. That axis has 3 elements in it, so we say it has a length of 3. In the example below, the array has 2 axes. The first axis has a length of 2, the second axis has a length of 3.

[[ 1., 0., 0.],
 [ 0., 1., 2.]]
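
As a quick check of the axis terminology above, the following sketch builds both arrays and prints their number of axes and their shapes. The variable names `point` and `table` are chosen here for illustration only.

```python
import numpy as np

# One axis of length 3
point = np.array([1, 2, 1])

# Two axes: the first has length 2, the second has length 3
table = np.array([[1., 0., 0.],
                  [0., 1., 2.]])

print(point.ndim, point.shape)  # 1 (3,)
print(table.ndim, table.shape)  # 2 (2, 3)
```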

NumPy’s array class is called `ndarray`. Note that `numpy.array` is not the same as the Python standard library class `array.array`, which only handles one-dimensional arrays and offers less functionality. 

Important attributes:

- ndarray.ndim - the number of axes (dimensions) of the array.  
- ndarray.shape - the dimensions of the array.  
- ndarray.size - the total number of elements of the array.  
- ndarray.dtype - an object describing the type of the elements in the array, e.g., numpy.int32, numpy.int16, and numpy.float64.  
- ndarray.itemsize - the size in bytes of each element of the array.   
- ndarray.data - the buffer containing the actual elements of the array.   


#### Basic examples

In [5]:
import numpy as np
a = np.arange(15).reshape(3, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [6]:
len(a)

3

In [7]:
a.shape

(3, 5)

In [8]:
len(a[1])

5

In [9]:
a.ndim

2

In [10]:
a.dtype.name

'int64'

In [11]:
# number of bytes per item
a.itemsize

8

In [12]:
a.size

15

In [13]:
type(a)

numpy.ndarray

In [14]:
b = np.array([6, 7, 8])
b

array([6, 7, 8])

In [15]:
type(b)

numpy.ndarray


#### Creating Arrays

Create an array from a Python list or tuple.

In [16]:
mylist = [1, 2, 3]
x = np.array(mylist)
x

array([1, 2, 3])

In [17]:
x.dtype

dtype('int64')

In [18]:
b = np.array([1.2, 3.5, 5.1])
b.dtype

dtype('float64')

Pass in a list of lists to create a multidimensional array.

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

In [19]:
m = np.array([[7, 8, 9], [10, 11, 12]])
m

array([[ 7,  8,  9],
       [10, 11, 12]])
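
The same rule extends one level further: a sequence of sequences of sequences gives a three-dimensional array. A minimal sketch, with values chosen only for illustration:

```python
import numpy as np

# Two 2x2 blocks nested inside an outer list -> three axes
t = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])

print(t.ndim)   # 3
print(t.shape)  # (2, 2, 2)
```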

<br>
Use the `shape` attribute to find the dimensions of the array: (rows, columns).

In [20]:
m.shape

(2, 3)

The type of the array can also be explicitly specified at creation time:

In [21]:
c = np.array( [ [1,2], [3,4] ], dtype=complex )
c

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

<br>
`arange` returns evenly spaced values within a given interval.

In [22]:
n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30
n

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

<br>
`reshape` returns an array with the same data with a new shape.

In [23]:
n = n.reshape(3, 5) # reshape array to be 3x5
n

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

<br>
`linspace` returns evenly spaced numbers over a specified interval.

In [24]:
o = np.linspace(0, 4, 9) # return 9 evenly spaced values from 0 to 4
o

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

<br>
`resize` changes the shape and size of an array in place.

In [25]:
o.resize(3, 3)
o

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])
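
The difference between `reshape` and `resize` is easy to miss, so here is a small sketch contrasting them. The array names are illustrative; the point is that `reshape` returns a new array object and leaves the original's shape alone, while the `resize` method modifies the array itself and returns `None`.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)    # returns a reshaped array; a keeps its shape
print(a.shape)         # (6,)
print(b.shape)         # (2, 3)

c = np.arange(6)
result = c.resize(2, 3)  # changes c in place and returns None
print(result)            # None
print(c.shape)           # (2, 3)
```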

#### Lists versus Numpy Arrays

![Array Memory Layout](img/array_vs_list.png)

In [26]:
import numpy as np

# Python list
L = [1,2,3]

# Numpy array
A = np.array([1,2,3])

print("List:")
for i in L:
    print(i)
print(type(L)) 

print("Array:")
for i in A:
    print(i)   
print(type(A)) 

List:
1
2
3
<class 'list'>
Array:
1
2
3
<class 'numpy.ndarray'>


In [27]:
# Append elements to a list
L2 = []
print("List:")
for i in L:
    L2.append(i + i)
    
print("L2: ", L2)

# Adding 2 numpy arrays
print("A+A: ", A+A)

# Elementwise multiplication in numpy arrays
print("2*A: ", 2*A)

# be careful: multiplying a list by an integer repeats (concatenates) it
print("3*L: ", 3*L)

A.min()


List:
L2:  [2, 4, 6]
A+A:  [2 4 6]
2*A:  [2 4 6]
3*L:  [1, 2, 3, 1, 2, 3, 1, 2, 3]


1

In [28]:
# This will cause an ERROR - ndarray has no append method
A.append([1,2,3])

AttributeError: 'numpy.ndarray' object has no attribute 'append'

Appending numpy arrays

In [None]:
np.append(A, np.array([1,2,3]))
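
Note that `np.append` does not modify its input; it returns a new array. The sketch below illustrates this with throwaway values.

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.append(A, [4, 5])  # builds and returns a new array

print(A)  # [1 2 3]  (unchanged)
print(B)  # [1 2 3 4 5]
```

Because every call copies the data, appending inside a loop tends to be slow; collecting values in a Python list and converting once with `np.array` is usually a better fit.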

In [None]:
# Appending a list to a list
L = L + [4,5]
L

In [None]:
L2 = []
print("L^2:")
for i in L:
    L2.append(i ** 2)
L2

In [None]:
# Elementwise operations on list - no go
L**2

In [None]:
# Elementwise operations on list - go slow
L2=[]
for i in range(len(L)):
    L2.append(L[i]**2)
print(L2)

In [None]:
# Elementwise operations on list 
# List comprehension - you're still going slow, but you're more "Pythonic"
L2= [i**2 for i in L]
print(L2)

#### Question

Write a list comprehension to square each element in the list L.

In [None]:
L = [1, 2, 3, 4, 5]

# Your work here



#### Question

Write a list comprehension to square each element in the following ndarray.

In [None]:
A = np.array([1, 2, 3, 4, 5])

# Your work here
A**2


#### Question

A list comprehension over an ndarray is not very useful. Square each element of the ndarray using a vectorized operation.


In [None]:
A = np.array([1, 2, 3, 4, 5])

# Your work here



#### Performance

Square random numbers in a list vs. an ndarray.

In [None]:
import numpy as np
from time import perf_counter
import random

# create a list of random integers
L = [random.randint(0,100) for i in range(10000)]

# time squaring each element of the list
t0 = perf_counter()
L2 = [i**2 for i in L]
dt1 = perf_counter() - t0

# time squaring each element of the array
A = np.array(L)
t0 = perf_counter()
A2 = A**2
dt2 = perf_counter() - t0

print("dt1 / dt2:", dt1 / dt2)
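
A single timing of such a short operation is noisy. As a hedged alternative, the sketch below uses the standard library `timeit` module to repeat each operation 100 times; the names `values`, `arr`, `list_time`, and `array_time` are illustrative, and the measured ratio will vary from machine to machine.

```python
import timeit
import numpy as np

values = list(range(10000))
arr = np.array(values)

# Total time for 100 runs of each approach
list_time = timeit.timeit(lambda: [v**2 for v in values], number=100)
array_time = timeit.timeit(lambda: arr**2, number=100)

print("list / array time ratio:", list_time / array_time)
```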


#### Initializing arrays

Often, the elements of an array are originally unknown, but its size is known. 

The function `zeros` creates an array full of zeros, the function `ones` creates an array full of ones, and the function `empty` creates an array whose content is uninitialized: it holds whatever values happen to be in memory. By default, the dtype of the created array is float64.

<br>
`ones` returns a new array of given shape and type, filled with ones.

In [None]:
np.ones((3, 2))

`zeros` returns a new array of given shape and type, filled with zeros.

In [None]:
np.zeros((2, 3))

`empty` creates an array without initializing its entries; the initial content is arbitrary and depends on the state of the memory.

In [None]:
np.empty( (3,5) )     
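
The sketch below illustrates the default dtype and the optional `dtype` argument of these functions. It also fills the `empty` array before reading from it, since its initial values are not meaningful. Variable names are illustrative.

```python
import numpy as np

z = np.zeros((2, 3))                 # default dtype is float64
i = np.ones((2, 2), dtype=np.int32)  # request a specific dtype
e = np.empty((3, 5))                 # contents are arbitrary until assigned
e[:] = 0.0                           # assign every element before using it

print(z.dtype)  # float64
print(i.dtype)  # int32
print(e.sum())  # 0.0
```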

<br>
`eye` returns a 2-D identity matrix with ones on the diagonal and zeros elsewhere.

In [None]:
np.eye(10)

<br>
`diag` extracts a diagonal or constructs a diagonal array.

In [None]:
y = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(y)
np.diag(y)
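
The example above shows the "extract" direction. As a small sketch of the "construct" direction, passing a 1-D array to `np.diag` builds a square array with those values on the diagonal:

```python
import numpy as np

y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

d = np.diag(y)  # 2-D input: extract the diagonal
D = np.diag(d)  # 1-D input: construct a diagonal array

print(d)  # [1 5 9]
print(D)
```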

<br>
Create an array from a repeated list (or see `np.tile`).

In [None]:
np.array([1, 2, 3] * 3)

<br>
Repeat elements of an array using `repeat`.

In [None]:
np.repeat([1, 2, 3], 3)
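
To make the difference concrete, this sketch puts `np.tile` next to `np.repeat`: `tile` repeats the whole sequence, while `repeat` repeats each element in turn.

```python
import numpy as np

tiled = np.tile([1, 2, 3], 3)       # whole sequence, three times
repeated = np.repeat([1, 2, 3], 3)  # each element, three times

print(tiled)     # [1 2 3 1 2 3 1 2 3]
print(repeated)  # [1 1 1 2 2 2 3 3 3]
```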

#### Combining Arrays

In [None]:
a = np.arange(0, 12).reshape((3, 4))
a

<br>
Use `vstack` to stack arrays in sequence vertically (row wise).

In [None]:
np.vstack([a, 2*a])

<br>
Use `hstack` to stack arrays in sequence horizontally (column wise).

In [None]:
np.hstack([a, 2*a])
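
For 2-D arrays like these, `vstack` and `hstack` give the same result as `np.concatenate` along axis 0 and axis 1 respectively. A short sketch to confirm this, using the same array as above:

```python
import numpy as np

a = np.arange(0, 12).reshape((3, 4))

v = np.concatenate([a, 2*a], axis=0)  # stack rows, like vstack
h = np.concatenate([a, 2*a], axis=1)  # stack columns, like hstack

print(v.shape)  # (6, 4)
print(h.shape)  # (3, 8)
print(np.array_equal(v, np.vstack([a, 2*a])))  # True
print(np.array_equal(h, np.hstack([a, 2*a])))  # True
```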

#### Operations

Use `+`, `-`, `*`, `/` and `**` to perform element-wise addition, subtraction, multiplication, division and power.

In [None]:
x = np.array([1,2,3])
y = np.array([4,5,6])

print(x + y) # elementwise addition     [1 2 3] + [4 5 6] = [5  7  9]
print(x - y) # elementwise subtraction  [1 2 3] - [4 5 6] = [-3 -3 -3]

In [None]:
print(x * y) # elementwise multiplication  [1 2 3] * [4 5 6] = [4  10  18]
print(x / y) # elementwise division        [1 2 3] / [4 5 6] = [0.25  0.4  0.5]

In [None]:
print(x**2) # elementwise power  [1 2 3] ^2 =  [1 4 9]

<br>
**Dot Product:**  

$\begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3$

In [None]:
x.dot(y) # dot product  1*4 + 2*5 + 3*6
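
The dot product of two 1-D arrays can be written in several equivalent ways. The sketch below compares the method form with `np.dot`, the `@` operator, and the explicit sum of element-wise products from the formula above.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x.dot(y))       # 32
print(np.dot(x, y))   # 32
print(x @ y)          # 32
print((x * y).sum())  # 32  (1*4 + 2*5 + 3*6)
```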

Transposing arrays. Transposing permutes the dimensions of the array.

#### Speed comparison

dt1 = time using slow loop-based dot product  
dt2 = time using fast vector-based dot product

In [None]:
import numpy as np
from time import perf_counter

a = np.random.randn(10000)
b = np.random.randn(10000)

# slow: accumulate the dot product in a Python loop
t0 = perf_counter()
result = 0
for e, f in zip(a, b):
    result += e*f
dt1 = perf_counter() - t0

# fast: vectorized dot product
t0 = perf_counter()
a.dot(b)
dt2 = perf_counter() - t0

print("dt1 / dt2:", dt1 / dt2)

#### Some matrix ops


The shape of array `z` is `(2,3)` before transposing.

In [None]:
z = np.arange(6).reshape((2,3))
z

In [None]:
z.shape

<br>
Use `.T` to get the transpose.

In [None]:
z.T

<br>
The number of rows has swapped with the number of columns.

In [None]:
z.T.shape

<br>
Use `.dtype` to see the data type of the elements in the array.

In [None]:
z.dtype

<br>
Use `.astype` to cast to a specific type.

In [None]:
z = z.astype('f')
z.dtype

In [None]:
z = z.astype(np.float32)
z.dtype
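
One thing to watch when casting: converting floats to an integer type drops the fractional part (truncation toward zero) rather than rounding. `astype` also returns a new array and leaves the original untouched. A small sketch with illustrative values:

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])
i = f.astype(np.int64)  # fractional parts are discarded

print(i)  # [ 1 -1  2]
print(f)  # [ 1.7 -1.7  2.5]  (original unchanged)
```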


#### Math Functions

NumPy has many built-in math functions that can be performed on arrays.

In [None]:
a = np.array([-4, -2, 1, 3, 5])
a

In [None]:
a.sum()

In [None]:
a.max()

In [None]:
a.min()

In [None]:
a.mean()

In [None]:
a.std()

<br>
`argmax` and `argmin` return the index of the maximum and minimum values in the array.

In [None]:
a.argmax()

In [None]:
a.argmin()

#### Indexing / Slicing

In [None]:
s = np.arange(13)**2
s

Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

In [None]:
s[0], s[4], s[-1]

<br>
Use `:` to indicate a range. `array[start:stop]`


Leaving `start` or `stop` empty will default to the beginning/end of the array.

In [None]:
s[1:5]

Use negatives to count from the back.

In [None]:
s[-4:]

A second `:` can be used to indicate step-size. `array[start:stop:stepsize]`

Here we start at the 5th element from the end and count backwards by 2 until the beginning of the array is reached.

In [None]:
s[-5::-2]
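
To see which elements the negative-step slice picks, the sketch below applies the same slice to the index positions. With 13 elements, index `-5` is position 8, and stepping by `-2` visits positions 8, 6, 4, 2, 0.

```python
import numpy as np

s = np.arange(13)**2
positions = np.arange(13)[-5::-2]  # which indices the slice visits

print(positions)   # [8 6 4 2 0]
print(s[-5::-2])   # [64 36 16  4  0]
```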

Let's look at a multidimensional array.

In [None]:
r = np.arange(36)
r.resize((6, 6))
r

Use bracket notation to slice: `array[row, column]`

In [None]:
r[2, 2]

And use : to select a range of rows or columns

In [None]:
r[3, 3:6]

Select all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

In [None]:
r[:2, :-1]

This is a slice of the last row, and only every other element.

In [None]:
r[-1, ::2]

We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see `np.where`)

In [None]:
r[r > 30]

In [None]:
r > 30

Here we are assigning all values in the array that are greater than 30 to the value of 30.

In [None]:
r[r > 30] = 30
r
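
As a sketch of the `np.where` alternative mentioned above: with three arguments it builds a new array and leaves the original unchanged, and with one argument it returns the indices where the condition holds. The name `capped` is illustrative.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

# Three-argument form: take 30 where r > 30, otherwise keep r
capped = np.where(r > 30, 30, r)

# One-argument form: row and column indices where the condition is true
rows, cols = np.where(r > 30)

print(capped.max())  # 30
print(r.max())       # 35  (r itself is not modified)
print(rows, cols)    # [5 5 5 5 5] [1 2 3 4 5]
```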

#### Copying Data

Be careful with copying and modifying arrays in NumPy!


`r2` is a slice of `r`

In [None]:
r

In [None]:
r2 = r[:3,:3]
r2

Set this slice's values to zero ([:] selects the entire array)

In [None]:
r2[:] = 0
r2

`r` has also been changed!

In [None]:
r

To avoid this, use `r.copy()` to create a copy that will not affect the original array.

In [None]:
r_copy = r.copy()
r_copy

Now when r_copy is modified, r will not be changed.

In [None]:
r_copy[:] = 10
print(r_copy, '\n')
print(r)
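
One way to check whether two arrays share the same underlying data is `np.shares_memory`. The sketch below uses a fresh array so it runs on its own; `view` and `copied` are illustrative names.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)
view = r[:3, :3]           # basic slicing gives a view into r
copied = r[:3, :3].copy()  # an independent copy of the same values

print(np.shares_memory(r, view))    # True
print(np.shares_memory(r, copied))  # False

view[:] = 0          # writes through to r
print(r[0, 1])       # 0
print(copied[0, 1])  # 1  (the copy keeps the old value)
```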

#### Iterating Over Arrays

Let's create a new 4 by 3 array of random numbers 0-9.

In [None]:
test = np.random.randint(0, 10, (4,3))
test

Iterate by row:

In [None]:
for row in test:
    print(row)

Iterate by index:

In [None]:
for i in range(len(test)):
    print(test[i])

Iterate by row and index:

In [None]:
for i, row in enumerate(test):
    print('row', i, 'is', row)

Use `zip` to iterate over multiple iterables.

In [None]:
test2 = test**2
test2

In [None]:
for i, j in zip(test, test2):
    print(i,'+',j,'=',i+j)

#### More matrix operations


In [None]:
A = np.array([[1,2],[3,4]])
A

In [None]:
Ainv = np.linalg.inv(A)
Ainv

In [None]:
Ainv.dot(A)

In [None]:
A.dot(Ainv)
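
Because of floating-point round-off, these products may differ from the identity matrix by tiny amounts, so an exact `==` comparison can fail. A hedged way to check the result is `np.allclose`, which compares within a tolerance:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
Ainv = np.linalg.inv(A)

# Compare against the 2x2 identity within floating-point tolerance
print(np.allclose(A.dot(Ainv), np.eye(2)))  # True
print(np.allclose(Ainv.dot(A), np.eye(2)))  # True
```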

In [None]:
np.diag(A)

#### Outer Product and Inner Product

An outer product is the tensor product of two vectors, a special case of the Kronecker product of matrices. The outer product of two coordinate vectors $\mathbf{u}$ and $\mathbf{v}$, denoted $\mathbf{u} \otimes \mathbf{v}$, is a matrix $\mathbf{w}$ such that $w_{ij} = u_i v_j$. The outer product for general tensors is also called the tensor product.

<img src="img/outerproduct.svg">

The outer product shows up in the calculation of covariance.

$\Sigma= E[(x-\mu)(x-\mu)^T] \approx\dfrac{1}{N-1}\sum_{n=1}^{N}(x_n-\bar{x})(x_n-\bar{x})^T$

Note: For 1-D vectors, the inner product is the same as the dot product.

$C=\sum_i (A_i*B_i)$

In [None]:
a = np.array([1,2])
a

In [None]:
b = np.array([3,4])
b

In [None]:
np.outer(a,b)

In [None]:
np.inner(a,b)

In [None]:
a.dot(b)

In [None]:
# Return the sum along diagonals of the array.
np.diag(A).sum()

In [None]:
# Return the sum along diagonals of the array.
np.trace(A)

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, $X = [x_1, x_2, ... x_N]^T$, then the covariance matrix element $C_{ij}$ is the covariance of $x_i$ and $x_j$. The element  $C_{ii}$ is the variance of $x_i$.

$cov(X,Y) = E[(X-E[X])(Y-E[Y])]$


In [None]:
X = np.random.randn(10,10)
cov = np.cov(X)
cov
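
To connect `np.cov` back to the outer-product formula for $\Sigma$, the sketch below computes the sample covariance as a sum of outer products and compares it with `np.cov`. It assumes the default `np.cov` layout, where each row is a variable and each column is an observation; the sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(3, 50)    # 3 variables (rows), 50 observations (columns)

mean = X.mean(axis=1)   # mean of each variable
N = X.shape[1]

# Sum of outer products of the centered observations, divided by N - 1
cov_manual = np.zeros((3, 3))
for n in range(N):
    d = X[:, n] - mean
    cov_manual += np.outer(d, d)
cov_manual /= (N - 1)

print(np.allclose(cov_manual, np.cov(X)))  # True
```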

#### Eigenvalues, eigenvectors

`eigenvalues, eigenvectors = np.linalg.eigh(M)`

`eigh` is for symmetric and Hermitian matrices.

Symmetric means $A=A^T$

Hermitian means $A=A^H$ (a complex square matrix that is equal to its own conjugate transpose)

$A^H$ = conjugate transpose of $A$

In [None]:
np.linalg.eigh(cov)
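
A small sketch of how to read the result of `eigh`, using a hand-picked symmetric matrix `S` (an illustrative choice, not from the data above). The eigenvalues come back in ascending order, and the eigenvectors are the columns of the second return value, so $S v = \lambda v$ can be checked for all of them at once.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric: S equals S.T

w, V = np.linalg.eigh(S)    # w: eigenvalues, V: eigenvectors as columns

print(w)  # eigenvalues in ascending order, approximately [1. 3.]

# Column j of S.dot(V) should equal w[j] times column j of V
print(np.allclose(S.dot(V), V * w))  # True
```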

#### Solving a linear system

Problem: $Ax = b$

Solution: $A^{-1}Ax=x=A^{-1}b$

System of $D$ equations and $D$ unknowns.

$A$ is $D \times D$; assume it is invertible. We have everything we need to solve this problem.  
- Matrix inverse  
- Matrix multiplication (dot)

In [None]:
A

In [None]:
b=np.array([1,2])
b

In [None]:
x=np.linalg.inv(A).dot(b)
x

In [None]:
# more efficient and more accurate to use solve()
x=np.linalg.solve(A, b)
x
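
Either way, the solution can be checked by substituting it back into $Ax = b$. The sketch below does this for the same small system, comparing within floating-point tolerance; `x_inv` and `x_solve` are illustrative names.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([1, 2])

x_inv = np.linalg.inv(A).dot(b)  # via the explicit inverse
x_solve = np.linalg.solve(A, b)  # via solve()

print(np.allclose(x_inv, x_solve))       # True
print(np.allclose(A.dot(x_solve), b))    # True: the solution satisfies Ax = b
```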

#### Question:

The admission fee at a small fair is \$1.50 for children and \$4.00 for adults. On a certain day, 2200 people enter the fair and \$5050 is collected. How many children and how many adults attended?

Let:  
$x_1$=number of children, $x_2$= number of adults.  
$x_1 + x_2 = 2200$  
$1.5 x_1 + 4 x_2 = 5050$  

In [None]:
A = np.array([[1,1],[1.5,4]])
A

In [None]:
b= np.array([2200, 5050])
b

In [None]:
# Your work here




#### Loading data

In [None]:
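import numpy as np

# Read the CSV line by line, converting each field to float
X = []
with open("data/data_2d.csv") as f:
    for line in f:
        row = line.strip().split(',')
        sample = list(map(float, row))
        X.append(sample)
X = np.array(X)
X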
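
For purely numeric, comma-separated data, `np.loadtxt` can do the parsing in one call. The sketch below reads from an in-memory string so that it runs on its own; the contents of `csv_text` are hypothetical stand-in data, not the contents of `data/data_2d.csv`.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file: two rows, three numeric columns
csv_text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"

X = np.loadtxt(io.StringIO(csv_text), delimiter=",")

print(X.shape)  # (2, 3)
print(X.dtype)  # float64
```

Passing a file path instead of the `StringIO` object works the same way, assuming the file has no header row and contains only numbers.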