# Numpy Arrays for Vectorized Operations

## Preparation

- Bring your laptop
- Download and install [Anaconda](https://www.continuum.io/downloads) (Python 3.5 version)
- Open Jupyter Notebook
- Open the notebook you saved last time
    - or create a new notebook: upper-right corner $\rightarrow$ New $\rightarrow$ Notebooks $\rightarrow$ Python
- Be ready to type in code!

## What is Numpy?

- The fundamental package for scientific computing with Python.
- Among other things, it contains:
    - A powerful N-dimensional array object
    - Sophisticated (broadcasting) functions
    - Tools for integrating C/C++ and Fortran code
    - Useful linear algebra, Fourier transform, and random number capabilities
    
Reference: [Numpy official website](http://www.numpy.org/)
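To make the "broadcasting" item concrete, here is a minimal sketch (the arrays `col` and `row` are made up for illustration). When two arrays have compatible shapes, NumPy stretches them to a common shape instead of requiring an explicit loop:

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# Broadcasting stretches both operands to the common shape (3, 4)
table = col + row
print(table)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]
```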

## Contents

1. Vectorized Operations
2. Numeric Functions
3. Set Operations
4. Generate Arrays
5. 2D Array (Matrix)
6. Axis-wise Functions
7. Linear Algebra
8. Reshape and Concatenate
9. Missing Values and Infinity

## Numpy Array (1): Vectorized Operations

In [1]:
import numpy as np
x = np.array([-1, 0, 2, 5])
x

array([-1,  0,  2,  5])

In [2]:
x + [1, 2, 3, 4]

array([0, 2, 5, 9])
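The point of vectorization is that one array expression replaces an explicit Python loop. The sketch below (variable names are illustrative) checks that `x + [1, 2, 3, 4]` matches the loop it replaces, and shows that plain Python lists behave differently: `+` concatenates them.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
other = [1, 2, 3, 4]

# Vectorized: a single expression adds element by element
vectorized = x + other

# The explicit Python loop that vectorization replaces
looped = np.array([a + b for a, b in zip(x, other)])

# With plain lists, + concatenates instead of adding
as_lists = [-1, 0, 2, 5] + other

print(vectorized)   # [0 2 5 9]
print(looped)       # [0 2 5 9]
print(as_lists)     # [-1, 0, 2, 5, 1, 2, 3, 4]
```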

In [3]:
x * 3

array([-3,  0,  6, 15])

In [4]:
# Index and slice like a list (-N denotes the Nth element counted from the end)
x[:-1]

array([-1,  0,  2])

In [5]:
# Index and slice by boolean condition
x[x > 0]

array([2, 5])
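Boolean indexing works because `x > 0` is itself an array of `True`/`False` values. A short sketch of inspecting the mask and combining conditions; note that NumPy uses `&` and `|` (not `and`/`or`) and that each condition needs its own parentheses:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

mask = x > 0                      # element-wise comparison -> boolean array
print(mask)                       # [False False  True  True]

# Combine conditions with & (and) and | (or); parentheses are required
both = x[(x >= 0) & (x < 5)]      # [0 2]
either = x[(x < 0) | (x > 4)]     # [-1  5]
```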

In [6]:
# shape gives the size along every dimension; x is 1-dimensional with 4 elements
x.shape

(4,)
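For comparison, here is a sketch of `shape` alongside the related attributes `ndim` (number of dimensions) and `size` (total number of elements), using the 1-D `x` and the 2-D `y` that appears in section (5):

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(x.shape, x.ndim, x.size)   # (4,) 1 4
print(y.shape, y.ndim, y.size)   # (3, 4) 2 12
print(x.dtype.kind)              # i (signed integer; the exact width depends on the platform)
```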

## Numpy Array (2): Numeric Functions

In [7]:
# Maximum and minimum
x.max(), x.min()

(5, -1)

In [8]:
# The index of maximum and minimum
x.argmax(), x.argmin()

(3, 0)

In [9]:
# Sum, mean, variance and standard deviation
x.sum(), x.mean(), x.var(), x.std()

(6, 1.5, 5.25, 2.2912878474779199)
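As a worked check of these numbers: the mean is 6 / 4 = 1.5, the squared deviations from it are 6.25, 2.25, 0.25 and 12.25 (sum 21), and `x.var()` divides by N = 4 to give 5.25, so `x.std()` is √5.25 ≈ 2.2913. The sketch below reproduces this by hand and shows the `ddof` parameter, which switches to the sample variance (dividing by N − 1):

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

mean = x.sum() / x.size                      # 6 / 4 = 1.5
pop_var = ((x - mean) ** 2).sum() / x.size   # 21 / 4 = 5.25, same as x.var()
sample_var = x.var(ddof=1)                   # 21 / 3 = 7.0, divides by N - 1
std = np.sqrt(pop_var)                       # same as x.std()
```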

In [10]:
# Cumulative sum
np.cumsum(x)

array([-1, -1,  1,  6])

In [11]:
# Difference
np.diff(x)

array([1, 2, 3])
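`np.diff(x)` computes `x[1:] - x[:-1]`, so it nearly undoes `np.cumsum`: only the first element is lost. A small sketch that rebuilds `x` from its first element and its differences:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

c = np.cumsum(x)        # [-1 -1  1  6]
d = np.diff(x)          # [1 2 3], i.e. x[1:] - x[:-1]

# diff undoes cumsum, except that the first element is dropped
print(np.diff(c))       # [0 2 5], equal to x[1:]

# Rebuild x from its first element plus the running total of differences
rebuilt = np.concatenate([x[:1], x[0] + np.cumsum(d)])
print(rebuilt)          # [-1  0  2  5]
```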

## How to get help

- From function help: ? + function name
    - ?np.diff
    - ?da.head
- From autocomplete with *Tab*:
    - np. + *Tab*
    - (a numpy array). + *Tab*
- From Google: your question (what you want to do) + numpy
    - upper triangular matrix numpy
    - convolution numpy
- From the official website: http://www.numpy.org/
- From an online course: https://www.datacamp.com/
- From a book: [Python for Data Analysis](http://www3.canisius.edu/~yany/python/Python4DataAnalysis.pdf), Wes McKinney
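The `?` prefix and *Tab* completion are Jupyter (IPython) features. In a plain Python script, the built-in `help()` and `dir()` give roughly the same information; a minimal sketch:

```python
import numpy as np

# Plain-Python counterpart of "?np.diff": prints the function's docstring
help(np.diff)

# Plain-Python counterpart of "np. + Tab": attribute names containing "diff"
matches = [name for name in dir(np) if "diff" in name]
print(matches)   # includes 'diff' and 'setdiff1d'
```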

## Numpy Array (3): Set Operations

In [12]:
# Find the unique elements and sort them
z = np.unique([1, 2, 1, 0, 5, 5])
z

array([0, 1, 2, 5])

In [13]:
# Set union
np.union1d(x, z)

array([-1,  0,  1,  2,  5])

In [14]:
# Set intersect
np.intersect1d(x, z)

array([0, 2, 5])

In [15]:
# Set difference: values in x but not in z
np.setdiff1d(x, z)

array([-1])

In [16]:
# Symmetric difference: values in exactly one of x and z
np.setxor1d(x, z)

array([-1,  1])
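These set functions are related to one another: the symmetric difference is the union minus the intersection. The sketch below checks that identity and shows `np.isin`, which returns a boolean mask instead of a new sorted set, so it keeps the original order of `x`:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
z = np.unique([1, 2, 1, 0, 5, 5])    # [0 1 2 5]

# Symmetric difference = union minus intersection
xor = np.setdiff1d(np.union1d(x, z), np.intersect1d(x, z))
print(xor)                            # [-1  1], same as np.setxor1d(x, z)

# np.isin marks which elements of x appear in z
not_in_z = x[~np.isin(x, z)]
print(not_in_z)                       # [-1], same as np.setdiff1d(x, z) here
```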

## Numpy Array (4): Generate Arrays

In [17]:
# Generate an array of all ones with shape [rows, columns] = [2, 3]
np.ones([2, 3])

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [18]:
# Repeat each element 3 times
np.repeat(x, 3)

array([-1, -1, -1,  0,  0,  0,  2,  2,  2,  5,  5,  5])

In [19]:
# Generate a sequence from 4 up to (but not including) 20 with step size 2
np.arange(4, 20, 2)

array([ 4,  6,  8, 10, 12, 14, 16, 18])
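A few related constructors, sketched side by side. `np.linspace` takes the number of points instead of a step and includes the end point by default; `np.zeros` and `np.full` follow the same shape convention as `np.ones`; and `np.tile` repeats the whole array, in contrast to `np.repeat`, which repeats each element:

```python
import numpy as np

a = np.arange(4, 20, 2)              # [ 4  6  8 10 12 14 16 18], stop value excluded
b = np.linspace(4, 18, 8)            # 8 evenly spaced floats from 4 to 18 inclusive

zeros = np.zeros([2, 3])             # 2x3 array of 0.0
sevens = np.full([2, 3], 7)          # 2x3 array filled with 7
tiled = np.tile([-1, 0, 2, 5], 2)    # [-1  0  2  5 -1  0  2  5]
```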

In [20]:
# Draw a uniform random sample of length 5 from the interval [2, 3)
np.random.seed(1)
np.random.uniform(low = 2, high = 3, size = 5)

array([ 2.417022  ,  2.72032449,  2.00011437,  2.30233257,  2.14675589])

In [21]:
# Random sample from an array (with replacement by default)
np.random.choice(x, size = 10)

array([-1,  0, -1,  5,  0, -1,  2,  0,  2, -1])
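Newer NumPy versions (1.17 and later) also offer a generator object created with `np.random.default_rng`, which keeps its seed local instead of setting global state. The sketch below assumes such a version; note that its numbers will not match the `np.random.seed(1)` output above, because the underlying algorithm differs. It also shows `replace=False`, which forbids repeats:

```python
import numpy as np

rng = np.random.default_rng(1)          # generator with its own seed
u = rng.uniform(low=2, high=3, size=5)  # 5 floats in [2, 3)
s = rng.choice([-1, 0, 2, 5], size=10)  # with replacement (the default)
t = rng.choice([-1, 0, 2, 5], size=4, replace=False)  # no repeats: a shuffle
```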

## Numpy Array (5): 2D Array (Matrix)

In [22]:
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [23]:
# Select the element at row 2, column 0 (indices start at 0)
y[2,0]

9

In [24]:
# Slice the sub-matrix of rows 0 to 1 and columns 1 to 2 (the end index is excluded)
y[0:2, 1:3]

array([[2, 3],
       [6, 7]])

In [25]:
# Slice the sub-matrix of all rows and columns 2 to the last
y[:, 2:]

array([[ 3,  4],
       [ 7,  8],
       [11, 12]])
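One detail to watch when slicing: a basic slice such as `y[0:2, 1:3]` is a *view* that shares memory with `y`, so writing into it changes `y` as well. A short sketch, using `.copy()` when an independent sub-matrix is wanted:

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

sub = y[0:2, 1:3]            # a view into y
sub[0, 0] = 100
print(y[0, 1])               # 100: y changed too

safe = y[0:2, 1:3].copy()    # an independent copy
safe[0, 0] = -1
print(y[0, 1])               # still 100
```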

## Numpy Array (6): Axis-wise Functions

In [26]:
# axis = 0: sum along the rows, giving one total per column
y.sum(axis = 0)

array([15, 18, 21, 24])

In [27]:
# axis = 1: sum along the columns, giving one total per row
y.sum(axis = 1)

array([10, 26, 42])

In [28]:
# default: sum all entries
y.sum()

78

In [29]:
# Whether each row has at least one entry satisfying the condition
(y > 5).any(axis = 1)

array([False,  True,  True], dtype=bool)

In [30]:
# Whether every entry in each row satisfies the condition
(y > 5).all(axis = 1)

array([False, False,  True], dtype=bool)
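One way to remember the `axis` argument: it names the dimension that gets collapsed. The sketch below rebuilds both sums from plain row operations and applies the same `axis` argument to other reductions:

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# axis=0 collapses the rows: same as adding the row vectors together
col_sums = y[0] + y[1] + y[2]                     # [15 18 21 24]

# axis=1 collapses the columns: one total per row
row_sums = np.array([row.sum() for row in y])     # [10 26 42]

# The same axis argument works for other reductions
print(y.mean(axis=0))   # [5. 6. 7. 8.]
print(y.max(axis=1))    # [ 4  8 12]
```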

## Numpy Array (7): Linear Algebra

In [31]:
# Transpose
y.T

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [32]:
# Element-wise product: x is broadcast and multiplied with each row of y
y * x

array([[-1,  0,  6, 20],
       [-5,  0, 14, 40],
       [-9,  0, 22, 60]])

In [33]:
# Matrix-vector (dot) product: each row of y dotted with x
y.dot(x)

array([25, 49, 73])
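The two previous cells are closely related: the matrix-vector product is the element-wise product `y * x` summed along each row. The sketch below verifies this and shows the `@` operator (available since Python 3.5), which is equivalent to `.dot` for these shapes:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(y @ x)                   # [25 49 73]

# Each entry is a row of y times x, element by element, then summed
manual = (y * x).sum(axis=1)   # [25 49 73]

# Shapes must line up: (3, 4) @ (4, 3) -> (3, 3)
print((y @ y.T).shape)         # (3, 3)
```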

In [34]:
# Eigendecomposition of y y' (y times its transpose); returns (eigenvalues, eigenvectors as columns)
y2 = y.dot(y.T)
np.linalg.eig(y2)

(array([  6.47032607e+02,   2.96739296e+00,   4.87870499e-14]),
 array([[-0.20673589, -0.88915331,  0.40824829],
        [-0.51828874, -0.25438183, -0.81649658],
        [-0.82984158,  0.38038964,  0.40824829]]))
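Because `y y'` is symmetric, `np.linalg.eigh` (a substitute for `eig` that is specialized to symmetric matrices) is a natural alternative: it returns real eigenvalues sorted in ascending order. The sketch checks the defining property A v = λ v for each pair. The tiny third eigenvalue above (about 4.9e-14) is zero up to rounding: the last row of `y` equals twice the second row minus the first, so `y` has rank 2.

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y2 = y @ y.T                       # symmetric 3x3 matrix

# eigh: real eigenvalues in ascending order, eigenvectors as columns
vals, vecs = np.linalg.eigh(y2)

# Each column v satisfies y2 @ v == lambda * v, up to rounding error
for lam, v in zip(vals, vecs.T):
    assert np.allclose(y2 @ v, lam * v)

print(np.linalg.matrix_rank(y))    # 2
```

The eigenvalues also add up to the trace of `y2` (30 + 174 + 446 = 650), which is a quick sanity check on the output.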

## Numpy Array (8): Reshape and Concatenate

In [35]:
# Reshape an array to another shape with the same number of elements (3 x 4 = 2 x 6)
y.reshape([2, 6])

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [36]:
# Add a dimension to a 1d-array with np.newaxis, turning it into a 4 x 1 column
x[:, np.newaxis]

array([[-1],
       [ 0],
       [ 2],
       [ 5]])

In [37]:
# Concatenate two 2d-arrays along columns (axis = 1)
np.concatenate([y, y], axis = 1)

array([[ 1,  2,  3,  4,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  5,  6,  7,  8],
       [ 9, 10, 11, 12,  9, 10, 11, 12]])

In [38]:
# Concatenate 2d-array y with 1d-array x; wrapping x in a list ([x]) first makes it a 1 x 4 row
np.concatenate([y, [x]])

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [-1,  0,  2,  5]])
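NumPy also provides shorthand helpers for these operations. `np.vstack` promotes a 1-D array to a row automatically, `np.hstack` joins 2-D arrays along columns, and passing `-1` to `reshape` lets NumPy infer that dimension from the total number of elements. A brief sketch:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

stacked = np.vstack([y, x])    # same as np.concatenate([y, [x]]), shape (4, 4)
wide = np.hstack([y, y])       # same as np.concatenate([y, y], axis=1), shape (3, 8)

flat = y.reshape(-1)           # 1-D with all 12 elements
auto = y.reshape(4, -1)        # NumPy infers the 3: shape (4, 3)
```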

## Numpy Array (9): Missing Values and Infinity

- np.nan: missing value
- np.inf: (positive) infinity

In [39]:
e = np.array([np.nan, np.inf, -np.inf, 2, 3])
e

array([ nan,  inf, -inf,   2.,   3.])

In [40]:
# inf and -inf are different values; to find both, use np.isinf()
e == np.inf

array([False,  True, False, False, False], dtype=bool)

In [41]:
np.isinf(e)

array([False,  True,  True, False, False], dtype=bool)

In [42]:
# nan is not equal to anything, not even itself, so "== np.nan" finds nothing; use np.isnan() instead
e == np.nan

array([False, False, False, False, False], dtype=bool)

In [43]:
np.isnan(e)

array([ True, False, False, False, False], dtype=bool)
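In practice, missing and infinite values matter most when computing statistics, because a single `nan` spoils an ordinary reduction. A sketch of two common workarounds: filtering with `np.isfinite` (which drops `nan`, `inf` and `-inf` at once), or using the nan-aware reductions such as `np.nanmean`, which skip `nan` only:

```python
import numpy as np

e = np.array([np.nan, np.inf, -np.inf, 2, 3])

print(np.nan == np.nan)        # False
overall = e.mean()             # nan: one missing value spoils the result

# Keep only ordinary numbers
finite = e[np.isfinite(e)]     # [2. 3.]
print(finite.mean())           # 2.5

# nan-aware reductions ignore nan
partial = np.array([np.nan, 2, 3])
print(np.nanmean(partial))     # 2.5
print(np.nanmax(partial))      # 3.0
```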

In [None]:
'''
The Blessings of Wisdom 


    When he made firm the skies above,

        when he established the fountains of the deep,

    when he assigned to the sea its limit,

        so that the waters might not transgress his command,

    when he marked out the foundations of the earth,

        then I was beside him, like a master workman,

    and I was daily his delight,

        rejoicing before him always,

    rejoicing in his inhabited world

        and delighting in the children of man.


(Proverbs 8:28-31 ESV)
'''