# Numpy Arrays for Vectorized Operations

## Preparation

- Download and Install [Anaconda](https://www.continuum.io/downloads) (Python 3 version)
- Run this command in a new shell: `jupyter lab`
- If you run Jupyter Lab on a Linux server through SSH, open a new local terminal:
    - `ssh -L 8000:localhost:(jupyter port, default is 8888) (user)@(server address)`
    - after logging in, run `jupyter notebook list`, then check and copy the token of your Jupyter server
    - open http://localhost:8000/ in a browser and enter the token
- Create a new notebook: upper-left area -> "+" (New Launcher) -> Notebook -> Python 3
    - Create a new cell above / below the current cell: `A` / `B`
    - Delete the current cell: `D D`
    - Run the current cell: `Shift + Enter`
    - Other notebook/cell operations are available in the toolbar under the notebook tab

## What is Numpy

- A fundamental package for scientific computing with Python.
- It contains, among other things:
    - A powerful N-dimensional array object
    - Sophisticated (broadcasting) functions
    - Tools for integrating C/C++ and Fortran code
    - Useful linear algebra, Fourier transform, and random number capabilities
    
Reference: [Numpy official website](http://www.numpy.org/)

## Contents

1. Vectorized Operations
2. Numeric Functions
3. Set Operations
4. Generate Arrays
5. 2D Array (Matrix)
6. Axis-wise Functions
7. Linear Algebra
8. Reshape and Concatenate
9. Missing Values and Infinity

## Numpy Array (1): Vectorized Operations

In [1]:
import numpy as np
x = np.array([-1, 0, 2, 5])
x

array([-1,  0,  2,  5])

In [2]:
x + [1, 2, 3, 4]

array([0, 2, 5, 9])

In [3]:
x * 3

array([-3,  0,  6, 15])

In [4]:
# Index and slice like a list (-N denotes the Nth index counted from the tail)
x[:-1]

array([-1,  0,  2])

In [5]:
# Index with a list of positions (repeats are allowed)
x[[1, 1, 3, 0]]

array([ 0,  0,  5, -1])

In [6]:
# Index and slice by boolean condition
x[x > 0]

array([2, 5])
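
Boolean conditions can also be combined before indexing. The following is a minimal sketch, reusing the same `x`; the variable names `between`, `outside` and `positions` are only illustrative.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

# Combine conditions element-wise with & (and), | (or), ~ (not).
# Parentheses are required because & and | bind tighter than comparisons.
between = x[(x >= 0) & (x < 5)]
outside = x[(x < 0) | (x > 2)]

# np.where returns the positions where the condition holds.
positions = np.where(x > 0)[0]

print(between)    # [0 2]
print(outside)    # [-1  5]
print(positions)  # [2 3]
```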

In [7]:
# shape is a tuple with one entry per dimension; a 1-D array of length 4 has shape (4,)
x.shape

(4,)

In [8]:
# Element type (dtype) of the array; true division produces floats
x.dtype, (x / 2).dtype

(dtype('int64'), dtype('float64'))
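
To make the idea of "vectorized" concrete, here is a small sketch comparing an explicit Python loop with the equivalent array expression. The formula `3 * x + 1` is an arbitrary example, not something prescribed by NumPy.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

# Explicit Python loop: one multiplication and one addition per element.
looped = np.array([3 * value + 1 for value in x])

# Vectorized expression: the same arithmetic applied to the whole array at once.
vectorized = 3 * x + 1

print(vectorized)                          # [-2  1  7 16]
print(np.array_equal(looped, vectorized))  # True
```

Both give the same result; the vectorized form is shorter and avoids the per-element Python overhead.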

## Numpy Array (2): Numeric Functions

In [9]:
# Maximum and minimum
x.max(), x.min()

(5, -1)

In [10]:
# The index of maximum and minimum
x.argmax(), x.argmin()

(3, 0)

In [11]:
# Sum, mean, variance and standard deviation
x.sum(), x.mean(), x.var(), x.std()

(6, 1.5, 5.25, 2.2912878474779199)
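
Note that `var()` and `std()` divide by N by default (population variance). A short sketch of how the `ddof` argument switches to the sample variance, which divides by N - 1:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

population_var = x.var()    # divides by N = 4
sample_var = x.var(ddof=1)  # divides by N - ddof = 3

print(population_var)  # 5.25
print(sample_var)      # 7.0
```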

In [12]:
# Cumulative sum
np.cumsum(x)

array([-1, -1,  1,  6])

In [13]:
# Differences between consecutive elements
np.diff(x)

array([1, 2, 3])
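
`np.cumsum` and `np.diff` undo each other up to the starting value. The sketch below rebuilds `x` from its differences; the names `steps` and `rebuilt` are only illustrative.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

steps = np.diff(x)  # [1 2 3]

# Accumulate the steps and add back the first element to recover x.
rebuilt = x[0] + np.concatenate(([0], np.cumsum(steps)))

print(rebuilt)  # [-1  0  2  5]
```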

## How to get help

- From function help in IPython/Jupyter: `?` + function name
    - `?np.diff`
    - `?da.head`
- From autocompletion with *Tab*:
    - `np.` + *Tab*
    - `(a numpy array).` + *Tab*
- From Google: Your Question (what you want to do) + numpy
    - upper triangular matrix numpy
    - convolution numpy
- From Official Website: http://www.numpy.org/
- From Online Course: https://www.datacamp.com/
- From Book: [Python for Data Analysis](http://www3.canisius.edu/~yany/python/Python4DataAnalysis.pdf), Wes McKinney
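
The `?` syntax only works inside IPython or Jupyter. In a plain Python session, the built-in `help()` and the `__doc__` attribute give the same documentation. A minimal sketch:

```python
import numpy as np

# help(np.diff) prints the full documentation in any Python session.
# The docstring itself is also available as a plain string;
# here we keep only its first line as a short summary.
summary = np.diff.__doc__.strip().splitlines()[0]
print(summary)
```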

## Numpy Array (3): Set Operations

In [14]:
# Find the unique elements and return them sorted
z = np.unique([1, 2, 1, 0, 5, 5])
z

array([0, 1, 2, 5])

In [15]:
# Set union
np.union1d(x, z)

array([-1,  0,  1,  2,  5])

In [16]:
# Set intersection
np.intersect1d(x, z)

array([0, 2, 5])

In [17]:
# Set difference: values in x that are not in z
np.setdiff1d(x, z)

array([-1])

In [18]:
# Symmetric difference: values in exactly one of x and z
np.setxor1d(x, z)

array([-1,  1])
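
The set functions above are related: the symmetric difference is the union minus the intersection. The sketch below checks this for the same `x` and `z`, and also shows `np.isin`, which returns a membership mask instead of a set.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
z = np.unique([1, 2, 1, 0, 5, 5])

union = np.union1d(x, z)
common = np.intersect1d(x, z)

# Union minus intersection equals the symmetric difference.
sym_diff = np.setdiff1d(union, common)
print(sym_diff)                                     # [-1  1]
print(np.array_equal(sym_diff, np.setxor1d(x, z)))  # True

# Element-wise membership test: is each value of x contained in z?
mask = np.isin(x, z)
print(mask)  # [False  True  True  True]
```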

## Numpy Array (4): Generate Arrays

In [19]:
# Generate an array of ones with shape (rows, columns) = (2, 3)
np.ones([2, 3])

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [20]:
# Repeat each value of the array 3 times
np.repeat(x, 3)

array([-1, -1, -1,  0,  0,  0,  2,  2,  2,  5,  5,  5])

In [21]:
# Generate a sequence from 4 to 20 (right end not included) with step size 2
np.arange(4, 20, 2)

array([ 4,  6,  8, 10, 12, 14, 16, 18])

In [22]:
# Generate 5 evenly spaced values from 1 to 10 (both ends included)
np.linspace(1, 10, 5)

array([  1.  ,   3.25,   5.5 ,   7.75,  10.  ])
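
Unlike `np.arange`, `np.linspace` includes the right end by default. A short sketch of two optional arguments that make the spacing explicit: `retstep=True` also returns the step size, and `endpoint=False` excludes the right end.

```python
import numpy as np

# Return the spacing together with the points.
points, step = np.linspace(1, 10, 5, retstep=True)
print(step)  # 2.25

# Exclude the right end: 5 values spaced by (10 - 1) / 5 = 1.8.
half_open = np.linspace(1, 10, 5, endpoint=False)
print(half_open)  # [1.  2.8 4.6 6.4 8.2]
```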

In [23]:
# Generate a Normal(2, 1) random sample of length 5 (the seed makes it reproducible)
np.random.seed(1)
np.random.normal(loc=2, scale=1, size=5)

array([ 3.62434536,  1.38824359,  1.47182825,  0.92703138,  2.86540763])

In [24]:
# Sample a random 3 x 4 array from the values of x (with replacement)
np.random.choice(x, size=(3, 4))

array([[ 2,  0,  2, -1],
       [ 5, -1,  2, -1],
       [ 0,  2,  2, -1]])
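
NumPy 1.17 and later also provide a `Generator` interface through `np.random.default_rng`. The sketch below repeats the two sampling steps with it; the seed value `1` is arbitrary, and the numbers drawn differ from those produced by `np.random.seed(1)` above.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

# A generator object carries its own state instead of a global seed.
rng = np.random.default_rng(1)

sample = rng.normal(loc=2, scale=1, size=5)
picks = rng.choice(x, size=(3, 4))               # with replacement
distinct = rng.choice(x, size=3, replace=False)  # without replacement

# Two generators created with the same seed produce the same stream.
first = np.random.default_rng(1).normal(size=3)
second = np.random.default_rng(1).normal(size=3)
print(np.array_equal(first, second))  # True
```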

## Numpy Array (5): 2D Array (Matrix)

In [25]:
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [26]:
# Select by row 2 and column 0
y[2,0]

9

In [27]:
# Slice a sub-matrix: rows 0 to 1 and columns 1 to 2 (the end index is excluded)
y[0:2, 1:3]

array([[2, 3],
       [6, 7]])

In [28]:
# Slice a sub-matrix: all rows, and columns from 2 to the last
y[:, 2:]

array([[ 3,  4],
       [ 7,  8],
       [11, 12]])
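
One detail worth knowing: a slice is a view of the original array, not a copy. The sketch below works on a fresh copy of the matrix (named `m` here so that `y` in the notebook stays unchanged) and shows that writing into a slice changes the source, while `.copy()` gives independent data.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# A slice shares memory with the array it was taken from.
block = m[0:2, 1:3]
block[0, 0] = 100
print(m[0, 1])                     # 100
print(np.shares_memory(block, m))  # True

# An explicit copy is independent of the original.
independent = m[:, 2:].copy()
independent[0, 0] = -1
print(m[0, 2])                     # 3
```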

In [29]:
# Insert a row of 10s at position 2
np.insert(y, 2, 10, axis=0)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [10, 10, 10, 10],
       [ 9, 10, 11, 12]])

In [30]:
# Delete column 1
np.delete(y, 1, axis=1)

array([[ 1,  3,  4],
       [ 5,  7,  8],
       [ 9, 11, 12]])
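
Both `np.insert` and `np.delete` return new arrays and leave their input untouched, so the result has to be assigned to keep it. A short sketch checking the shapes:

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

with_row = np.insert(y, 2, 10, axis=0)
without_col = np.delete(y, 1, axis=1)

# y itself is unchanged; only the returned arrays differ in shape.
print(y.shape, with_row.shape, without_col.shape)  # (3, 4) (4, 4) (3, 3)
```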

## Numpy Array (6): Axis-wise Functions

In [31]:
# axis = 0: sum over the rows (one total per column)
y.sum(axis = 0)

array([15, 18, 21, 24])

In [32]:
# axis = 1: sum over the columns (one total per row)
y.sum(axis = 1)

array([10, 26, 42])

In [33]:
# default: sum all entries
y.sum()

78
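
A way to remember the `axis` argument: the axis you name is the one that disappears from the shape. The sketch below checks this on the same matrix and uses `keepdims=True`, which keeps the reduced axis with length 1 so the result can be combined with the original array again.

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(y.shape)                              # (3, 4)
print(y.sum(axis=0).shape)                  # (4,)   axis 0 (rows) is reduced
print(y.sum(axis=1).shape)                  # (3,)   axis 1 (columns) is reduced
print(y.sum(axis=1, keepdims=True).shape)   # (3, 1)

# Subtract each row's mean from that row.
centered = y - y.mean(axis=1, keepdims=True)
print(centered[0])  # [-1.5 -0.5  0.5  1.5]
```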

In [34]:
# Whether each row has any entry satisfying the condition
(y > 5).any(axis = 1)

array([False,  True,  True], dtype=bool)

In [35]:
# Numerical operations can broadcast along an axis
100 * x + y

array([[-99,   2, 203, 504],
       [-95,   6, 207, 508],
       [-91,  10, 211, 512]])
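
Broadcasting compares shapes from the last dimension backwards, so a length-4 vector matches the 4 columns of `y`. A length-3 vector does not, unless it is turned into a column first. The sketch below illustrates this; `offsets` is a hypothetical per-row value introduced only for the example.

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
offsets = np.array([10, 20, 30])  # hypothetical: one value per row of y

# Shapes (3,) and (3, 4) do not align on the last dimension (3 vs 4).
try:
    offsets + y
    aligned = True
except ValueError:
    aligned = False
print(aligned)  # False

# As a column of shape (3, 1), the offsets broadcast across each row.
shifted = offsets[:, np.newaxis] + y
print(shifted)
# [[11 12 13 14]
#  [25 26 27 28]
#  [39 40 41 42]]
```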

In [36]:
# outer operation of numpy ufuncs
np.add.outer(x, x)

array([[-2, -1,  1,  4],
       [-1,  0,  2,  5],
       [ 1,  2,  4,  7],
       [ 4,  5,  7, 10]])
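
The outer operation can be read as broadcasting a column against a row. A small sketch checking this equivalence for addition and multiplication:

```python
import numpy as np

x = np.array([-1, 0, 2, 5])

by_outer = np.add.outer(x, x)

# A (4, 1) column plus a (1, 4) row broadcasts to a (4, 4) table.
by_broadcast = x[:, np.newaxis] + x[np.newaxis, :]
print(np.array_equal(by_outer, by_broadcast))  # True

# The same holds for other ufuncs, e.g. the outer product.
print(np.array_equal(np.multiply.outer(x, x), np.outer(x, x)))  # True
```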

## Numpy Array (7): Linear Algebra

In [37]:
# Transpose
y.T

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [38]:
# Element-wise product of each row of y with x (broadcasting)
y * x

array([[-1,  0,  6, 20],
       [-5,  0, 14, 40],
       [-9,  0, 22, 60]])

In [39]:
# Matrix-vector product
y.dot(x)

array([25, 49, 73])

In [40]:
# Matrix multiplication operator "@" (available since Python 3.5)
y @ x

array([25, 49, 73])
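
Unlike `*`, the `@` operator follows the shape rules of matrix multiplication: the inner dimensions must agree. The sketch below shows a few shape combinations with the same `x` and `y`.

```python
import numpy as np

x = np.array([-1, 0, 2, 5])
y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# (3, 4) @ (4,) -> (3,): one dot product per row of y.
print((y @ x).shape)    # (3,)

# (3, 4) @ (4, 3) -> (3, 3): inner dimensions match.
print((y @ y.T).shape)  # (3, 3)

# (4,) @ (4,) -> scalar inner product.
print(x @ x)            # 30

# (3, 4) @ (3, 4) fails because the inner dimensions 4 and 3 differ.
try:
    y @ y
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
print(shapes_compatible)  # False
```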

In [41]:
# Eigendecomposition of y y' (returns eigenvalues and eigenvectors)
y2 = y.dot(y.T)
np.linalg.eig(y2)

(array([  6.47032607e+02,   2.96739296e+00,   4.87870499e-14]),
 array([[-0.20673589, -0.88915331,  0.40824829],
        [-0.51828874, -0.25438183, -0.81649658],
        [-0.82984158,  0.38038964,  0.40824829]]))
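
Since `y2` is symmetric, `np.linalg.eigh` is an alternative worth knowing: it is designed for symmetric matrices and returns the eigenvalues in ascending order. The sketch below checks the decomposition by rebuilding `y2`. It also relates the near-zero eigenvalue above to the rank of `y`, which is 2 because consecutive rows differ by the same vector.

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y2 = y @ y.T

# Eigenvalues in ascending order, eigenvectors as columns.
values, vectors = np.linalg.eigh(y2)

# Rebuild y2 from V diag(w) V'.
rebuilt = vectors @ np.diag(values) @ vectors.T
print(np.allclose(rebuilt, y2))  # True

# y has only 2 linearly independent rows, so one eigenvalue is numerically zero.
print(np.linalg.matrix_rank(y))  # 2
```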

## Numpy Array (8): Reshape and Concatenate

In [42]:
# Reshape an array to another shape
y.reshape([2, 6])

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])
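
When reshaping, one dimension can be given as `-1` and NumPy infers it from the total number of elements. A brief sketch, including the error raised when the sizes cannot match:

```python
import numpy as np

y = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# 12 elements in 2 rows -> 6 columns are inferred.
two_rows = y.reshape(2, -1)
print(two_rows.shape)  # (2, 6)

# A single -1 flattens the array to 1-D.
flat = y.reshape(-1)
print(flat.shape)      # (12,)

# 12 elements cannot be arranged in 5 rows.
try:
    y.reshape(5, -1)
    reshaped = True
except ValueError:
    reshaped = False
print(reshaped)        # False
```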

In [43]:
# Extract the diagonal of a matrix (2-D input), or build a diagonal matrix from a vector (1-D input)
print(np.diag(y))
print(np.diag(x))

[ 1  6 11]
[[-1  0  0  0]
 [ 0  0  0  0]
 [ 0  0  2  0]
 [ 0  0  0  5]]


In [44]:
# Extend the dimension of 1d-array with np.newaxis
x[:, np.newaxis]

array([[-1],
       [ 0],
       [ 2],
       [ 5]])

In [45]:
# Stack arrays horizontally
np.hstack([y, y])

array([[ 1,  2,  3,  4,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  5,  6,  7,  8],
       [ 9, 10, 11, 12,  9, 10, 11, 12]])

In [46]:
# Stack arrays vertically
np.vstack([y, x])

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [-1,  0,  2,  5]])

In [47]:
# split an array into several near-equal-length arrays
np.array_split(range(10), 3)

[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
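
For comparison, `np.split` requires an exact equal division and raises an error otherwise, which is why `np.array_split` is used above. A minimal sketch:

```python
import numpy as np

values = np.arange(10)

# array_split tolerates uneven lengths: the first chunks get the extra items.
sizes = [len(part) for part in np.array_split(values, 3)]
print(sizes)  # [4, 3, 3]

# split insists on equal lengths; 10 items cannot be divided into 3 equal parts.
try:
    np.split(values, 3)
    equal_division = True
except ValueError:
    equal_division = False
print(equal_division)  # False

# With 5 parts the division is exact, so split works.
print(len(np.split(values, 5)))  # 5
```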

## Numpy Array (9): Missing Values and Infinity

- `np.nan`: "not a number", used to mark missing values
- `np.inf`: positive infinity (`-np.inf` is negative infinity)

In [48]:
e = np.array([np.nan, np.inf, -np.inf, 2, 3])
e

array([ nan,  inf, -inf,   2.,   3.])

In [49]:
# Mathematical operations on nan/inf values
1 - 2 * e

array([ nan, -inf,  inf,  -3.,  -5.])

In [50]:
# inf and -inf are different values; to find both we use np.isinf()
e == np.inf

array([False,  True, False, False, False], dtype=bool)

In [51]:
np.isinf(e)

array([False,  True,  True, False, False], dtype=bool)

In [52]:
# We cannot find nan value by "== np.nan", instead we use np.isnan()
e == np.nan

array([False, False, False, False, False], dtype=bool)

In [53]:
np.isnan(e)

array([ True, False, False, False, False], dtype=bool)

In [54]:
# Select only finite values
e[np.isfinite(e)]

array([ 2.,  3.])
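
Two follow-up points, shown as a sketch. First, `nan` is not equal to itself, which is why `== np.nan` never matches. Second, ordinary reductions propagate `nan`, while the `nan`-aware variants such as `np.nanmean` and `np.nansum` skip it. The array `measurements` is a hypothetical example with one missing value.

```python
import numpy as np

# nan compares unequal to everything, including itself.
print(np.nan == np.nan)  # False

measurements = np.array([np.nan, 2.0, 3.0])  # hypothetical data

# A single nan makes the ordinary mean nan as well.
print(np.isnan(np.mean(measurements)))  # True

# The nan-aware functions ignore missing values.
clean_mean = np.nanmean(measurements)
clean_sum = np.nansum(measurements)
print(clean_mean)  # 2.5
print(clean_sum)   # 5.0
```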