# Chapter 04: NumPy Basics: Arrays and Vectorized Computation

## Initial Setup

In [0]:
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Why NumPy?


NumPy provides:

- **ndarray**, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

In [0]:
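# Example - Performance of NumPy vs basic Python list
import numpy as np

basicList = list(range(1000000))
numpyList = np.arange(1000000)

# Double every element: a list needs an explicit loop (basicList*2 would only repeat the list)
%time for _ in range(10): [x * 2 for x in basicList]
# The ndarray performs the same multiplication element-wise in compiled code
%time for _ in range(10): numpyList*2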

## 4.1 The NumPy ndarray: A Multidimensional Array Object

In [0]:
import numpy as np

### Creating ndarrays

- Using `numpy.array()` function
- The `numpy.array()` function accepts any sequence-like object and produces a new NumPy array.

In [0]:
#Example - Creating 1-dimensional array
temp = [1,2.3,4,5,66]

arr = np.array(temp)
arr
# array([ 1. ,  2.3,  4. ,  5. , 66. ])

array([ 1. ,  2.3,  4. ,  5. , 66. ])

- Nested sequences will be converted into a multidimensional array

In [0]:
# Example - Creating 2-dimensional array
temp = [
    [2,3,4,5],
    [6,7,8,9]
]

arr = np.array(temp)
print('Array: \n{}'.format(arr))
# Array: 
# [[2 3 4 5]
#  [6 7 8 9]]

print('Number of dimensions: {}'.format(arr.ndim))
# Number of dimensions: 2

print('Shape: {}'.format(arr.shape))
# Shape: (2, 4)

Array: 
[[2 3 4 5]
 [6 7 8 9]]
Number of dimensions: 2
Shape: (2, 4)


- Unless ***explicitly specified***, `numpy.array()` tries to infer a good data type for the array that it creates.
- There are many other functions for creating new arrays, such as `numpy.zeros()`, `numpy.empty()`, etc.

<h3><b>IMPORTANT NOTE</b>:</h3> 

> It's not safe to **assume** that `numpy.empty()` will return an array of all zeros. In some cases, it may return uninitialized "**garbage**" values.

- Some standard array creation functions

| Function | Description |
|----------|-------------|
| array | Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype; copies the input data by default|
| asarray | Convert input to ndarray, but do not copy if the input is already an ndarray| 
| arange | Like the built-in range but returns an ndarray instead of a list|
| ones, <br>ones_like | Produce an array of all 1s with the given shape and dtype; <br>ones_like takes another array and produces a ones array of the same shape and dtype| 
| zeros, <br>zeros_like | Like ones and ones_like but producing arrays of 0s instead| 
| empty, empty_like | Create new arrays by allocating new memory, but do not populate with any values like ones and zeros| 
| full, <br>full_like | Produce an array of the given shape and dtype with all values set to the indicated “fill value” <br> full_like takes another array and produces a filled array of the same shape and dtype| 
| eye, identity | Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere)| 
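
The sketch below exercises a few of the creation functions from the table. The variable names (`zeros`, `ones_int`, `source`, `scratch`, and so on) are made up for illustration, and the shapes and fill values are arbitrary.

```python
import numpy as np

zeros = np.zeros((2, 3))               # 2x3 array of 0.0; float64 unless a dtype is given
ones_int = np.ones(4, dtype=np.int32)  # dtype specified explicitly
filled = np.full((2, 2), 7.5)          # every element set to the fill value
identity = np.eye(3)                   # 3x3 identity matrix
evens = np.arange(0, 10, 2)            # like range(0, 10, 2), but an ndarray

source = np.array([1, 2, 3])
copied = np.array(source)      # array() copies its input by default
aliased = np.asarray(source)   # asarray() returns the same ndarray, no copy
source[0] = 99
print(copied[0], aliased[0])   # 1 99

scratch = np.empty((2, 2))     # contents are arbitrary until assigned
scratch[:] = 0.0               # initialize explicitly before reading
```

Because `asarray` does not copy an existing ndarray, the change to `source` is visible through `aliased` but not through `copied`.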

### Data Types for ndarrays

- The data type or `dtype` is a special object containing the information (or metadata) the ndarray needs to interpret a chunk of memory as a particular type of data.
- The `dtype` is a source of NumPy's flexibility for interacting with data coming from other systems. In most cases, dtypes map directly onto an underlying disk or memory representation, which makes it easy to read and write binary streams of data to disk and also to connect to code written in a low-level language like C or Fortran.

- NumPy data types

| Type | Type code | Description |
|------|-----------|-------------|
| int8, uint8 | i1, u1 | Signed and unsigned 8-bit (1 byte) integer types| 
| int16, uint16 | i2, u2 | Signed and unsigned 16-bit integer types| 
| int32, uint32 | i4, u4|  Signed and unsigned 32-bit integer types| 
| int64, uint64 | i8, u8|  Signed and unsigned 64-bit integer types| 
| float16 | f2 | Half-precision floating point| 
| float32 | f4 or f|  Standard single-precision floating point; compatible with C float| 
| float64 | f8 or d|  Standard double-precision floating point; compatible with C double and Python float object| 
| float128 | f16 or g|  Extended-precision floating point (availability is platform dependent)| 
| complex64, <br>complex128, <br>complex256|  c8, c16, c32| Complex numbers represented by two 32-bit, 64-bit, or 128-bit floats, respectively| 
| bool | ? | Boolean type storing True and False values| 
| object | O | Python object type; a value can be any Python object| 
| string_ | S|  Fixed-length ASCII string type (1 byte per character); for example, to create a string dtype with length 10, use 'S10'| 
| unicode_ | U | Fixed-length Unicode type (number of bytes platform specific); same specification semantics as string_ (e.g., 'U10')| 


<h3><b>IMPORTANT NOTE:</b></h3>

> - Be cautious when using the `numpy.string_` type: string data in NumPy is **fixed size** and may **truncate input** without warning.
> - If casting with `astype` fails for some reason (for example, a string that cannot be converted to a number), a **`ValueError`** is raised.
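
A minimal sketch of the casting behaviour described in the notes above, using `ndarray.astype`. The sample values and variable names are invented for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)   # astype always returns a new array
print(floats.dtype)                # float64

# Casting floats to integers drops the fractional part
truncated = np.array([3.7, -1.2, 10.9]).astype(np.int32)
print(truncated.tolist())          # [3, -1, 10]

# Strings that represent numbers can be cast to a numeric dtype
numeric = np.array(['1.25', '-9.6', '42']).astype(float)

# A fixed-length byte-string dtype silently truncates longer input
short = np.array(['numpy', 'py'], dtype='S3')
print(short.tolist())              # [b'num', b'py']

# A cast that cannot succeed raises ValueError
try:
    np.array(['1.5', 'oops']).astype(float)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print('cast failed:', exc)
```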

### Arithmetic with NumPy Arrays

- NumPy arrays allow users to express batch operations on data without writing any `for` loops. This is called **vectorization**.
- Any arithmetic operation between **equal-size** arrays is applied element-wise.
- Comparisons between arrays of the **same size** yield boolean arrays.

In [0]:
arr01 = np.array([
                  [1,2,3],
                  [3,4,6]
])

arr02 = np.array([
                  [2,2,2],
                  [3,3,3]
])

print('{}'.format(arr01*arr02))
# [[ 2  4  6]
#  [ 9 12 18]]

print('{}'.format(arr01>arr02))
# [[False False  True]
#  [False  True  True]]

[[ 2  4  6]
 [ 9 12 18]]
[[False False  True]
 [False  True  True]]


### Basic Indexing and Slicing

<h4><b>IMPORTANT NOTE:</b></h4>

> Array slices in NumPy are **views** on the original array. This means that the data **is not copied**, and any **modification** to the view will be **reflected** in the source array.

In [0]:
#Example - Modify array slice
arr = np.array([1,2,3,4,5])
print('arr: {}'.format(arr))
# arr: [1 2 3 4 5]

temp = arr[1:4]
print('temp: {}'.format(temp))
# temp: [2 3 4]

temp[1] = 999
print('arr: {}'.format(arr))
# arr: [  1   2 999   4   5]
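
When an independent copy is needed instead of a view, the slice can be copied explicitly with `.copy()`. The sketch below contrasts the two; the names `view` and `snapshot` are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]             # a view: shares memory with arr
snapshot = arr[1:4].copy()  # an independent copy of the same elements

view[0] = -1                # writes through to arr
snapshot[1] = 999           # does not touch arr

print(arr.tolist())         # [1, -1, 3, 4, 5]
print(np.shares_memory(arr, view), np.shares_memory(arr, snapshot))  # True False
```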

In [0]:
# Example - Access ndarray value
arr2d = np.array([
                [1,2,3],
                [4,5,6],
                [7,8,9]
])

print(arr2d[0][2]) # Result 3
# equivalent
print(arr2d[0, 2]) # Result 3

- Indexing with slices

In [0]:
# Example - Slice ndarray
arr2d = np.array([
                [1,2,3],
                [4,5,6],
                [7,8,9]
])

print(arr2d[:2], end='\n\n') # select the first two rows of arr2d
# [[1 2 3]
#  [4 5 6]]

print(arr2d[:2, 1:], end='\n\n') # select the first two rows of arr2d, skipping the first column
# [[2 3]
#  [5 6]]

print(arr2d[1, :2], end='\n\n') # select the second row of arr2d and take its first two values
# [4 5]

print(arr2d[:2, 2], end='\n\n') # select the first two rows of arr2d and take the third column
# [3 6]

- Boolean indexing

In [0]:
# Example - Boolean selection
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

data = np.random.randn(7,4)

print(data, end='\n\n')
print(names, end='\n\n')


print(names == 'Bob', end='\n\n')
#[ True False False  True False False False]

# Passing boolean array as argument
# It will select the rows at index 0 and index 3, where the boolean array is True
print(data[names == 'Bob'], end='\n\n')

In [0]:
# Example - Boolean selection & using ~
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

data = np.random.randn(7,4)

print(data, end='\n\n')
print(names, end='\n\n')


print(~(names == 'Bob'), end='\n\n') # We can replace == with != instead of using ~
#[False  True  True False  True  True  True]

# Passing boolean array as argument
# It will select the rows at index 1, 2, 4, 5 and 6, where the boolean array is True
print(data[~(names == 'Bob')], end='\n\n')

In [0]:
# Example - Boolean selection & combining multiple boolean conditions
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

data = np.random.randn(7,4)

print(data, end='\n\n')
print(names, end='\n\n')

conditions = (names == 'Bob') | (names == 'Will')

print(conditions, end='\n\n')
#[ True False  True  True  True False False]

# Passing boolean array as argument
# It will select the rows at index 0, 2, 3 and 4, where the boolean array is True
print(data[conditions], end='\n\n')

In [0]:
# Example - Set new value for array based on conditions
data = np.random.randn(7,4)

print(data, end='\n\n')

data[data < 0] = 0 # Set all elements with a value < 0 to 0
print(data, end='\n\n')

<h4><b>IMPORTANT NOTE:</b></h4>

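> - The boolean array must have the same length as the array axis it indexes. Older NumPy versions did not always fail on a mismatch, so take care with this feature; current versions raise an `IndexError`.
> - The Python keywords `and` and `or` **do not work** with boolean arrays. Use **&** and **|** instead.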
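
The following sketch illustrates both notes on a small, made-up integer array. The exact error messages depend on the installed NumPy version, so they are printed but not reproduced here.

```python
import numpy as np

values = np.array([-2, 0, 3, 7, 10])

# Combine conditions element-wise with & and |; the parentheses are required
inside = (values > 0) & (values < 10)
outside = (values <= 0) | (values >= 10)
print(values[inside].tolist())    # [3, 7]
print(values[outside].tolist())   # [-2, 0, 10]

# The keyword `and` asks for a single truth value, which an array cannot provide
try:
    (values > 0) and (values < 10)
    keyword_error = None
except ValueError as exc:
    keyword_error = exc
    print('ValueError:', exc)

# A mask whose length differs from the indexed axis is rejected by current NumPy releases
try:
    values[np.array([True, False])]
except IndexError as exc:
    print('IndexError:', exc)
```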

- Fancy Indexing

In [0]:
# Example - Select a list of rows using a 1-dimensional list/array of integers
arr2d = np.empty((6,4))

for i in range(6):
    arr2d[i] = i
    
print(arr2d, end='\n\n')
# [[0. 0. 0. 0.]
#  [1. 1. 1. 1.]
#  [2. 2. 2. 2.]
#  [3. 3. 3. 3.]
#  [4. 4. 4. 4.]
#  [5. 5. 5. 5.]]

subArray = arr2d[[4,2,5]]

print(subArray, end='\n\n')
# [[4. 4. 4. 4.]
#  [2. 2. 2. 2.]
#  [5. 5. 5. 5.]]

In [0]:
# Example - Select individual elements using two 1-dimensional lists/arrays of integers (row indices and column indices)
arr2d = np.arange(32).reshape((8,4))

print(arr2d, end='\n\n')
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]
#  [24 25 26 27]
#  [28 29 30 31]]

subArray = arr2d[[1,5,7,2], 
                 [0,3,2,1]]
# Select elements (1,0), (5,3), (7,2) and (2,1) in arr2d

print(subArray, end='\n\n')
# [ 4 23 30  9]
# Explanation:
# The first pair in [[1,5,7,2], 
#                    [0,3,2,1]]
# is 1 and 0
# First, we select the row with index = 1 which is [ 4  5  6  7] in arr2d
# then we choose the column with index = 0 and we have 4

- Transposing Arrays and Swapping Axes

In [0]:
# Example - Transpose array <numpy.array>.T
arr = np.arange(15).reshape((3,5))

print(arr, end='\n\n')
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]]

print(arr.T, end='\n\n')
# [[ 0  5 10]
#  [ 1  6 11]
#  [ 2  7 12]
#  [ 3  8 13]
#  [ 4  9 14]]

In [0]:
# Example - Transpose arrays with 3 or more dimensions
arr = np.arange(16).reshape((2,2,4))

print(arr, end='\n\n')
# [[[ 0  1  2  3]
#   [ 4  5  6  7]]
#  [[ 8  9 10 11]
#   [12 13 14 15]]]

print(arr.transpose((1,0,2)), end='\n\n')
# [[[ 0  1  2  3]
#   [ 8  9 10 11]]
#  [[ 4  5  6  7]
#   [12 13 14 15]]]

In [0]:
# Example - Swapping Axes
arr = np.arange(16).reshape((2,2,4))

print(arr, end='\n\n')
# [[[ 0  1  2  3]
#   [ 4  5  6  7]]
#  [[ 8  9 10 11]
#   [12 13 14 15]]]

print(arr.swapaxes(1,2), end='\n\n')
# [[[ 0  4]
#   [ 1  5]
#   [ 2  6]
#   [ 3  7]]
#  [[ 8 12]
#   [ 9 13]
#   [10 14]
#   [11 15]]]
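
Two points worth checking on a small case: transposing returns a view (no data is copied), and the transpose is commonly used in matrix computations such as the inner product `arr.T` times `arr`. The sketch below uses a made-up 3x2 array; the name `gram` is illustrative.

```python
import numpy as np

arr = np.arange(6).reshape((3, 2))
# [[0 1]
#  [2 3]
#  [4 5]]

# Inner matrix product X^T X, computed with np.dot
gram = np.dot(arr.T, arr)
print(gram.tolist())        # [[20, 26], [26, 35]]

# The transpose is a view on the same data, so a write shows up in arr
transposed = arr.T
transposed[0, 0] = 100
print(arr[0, 0])            # 100
print(np.shares_memory(arr, transposed))  # True
```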

## 4.2 Universal Functions: Fast Element-Wise Array Functions

### What is a Universal Function?

- A universal function, or *ufunc*, is a function that performs element-wise operations on data in ndarrays.

### Unary universal functions

| Function | Description |
|----------|-------------|
| abs, fabs| Compute the absolute value element-wise for integer, floating-point, or complex values| 
| sqrt | Compute the square root of each element (equivalent to arr ** 0.5)| 
| square | Compute the square of each element (equivalent to arr ** 2)| 
| exp | Compute the exponential e^x of each element| 
| log, log10, <br>log2, log1p| Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively| 
| sign | Compute the sign of each element: 1 (positive), 0 (zero), or –1 (negative)| 
| ceil | Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that number)| 
| floor | Compute the floor of each element (i.e., the largest integer less than or equal to each element)| 
| rint | Round elements to the nearest integer, preserving the dtype| 
| modf | Return the fractional and integral parts of the array as two separate arrays| 
| isnan | Return boolean array indicating whether each value is NaN (Not a Number)| 
| isfinite, isinf|  Return boolean array indicating whether each element is finite (non-inf, non-NaN) or infinite, respectively| 
| cos, cosh, sin, <br>sinh, tan, tanh| Regular and hyperbolic trigonometric functions| 
| arccos, arccosh, <br> arcsin, arcsinh, <br> arctan, arctanh | Inverse trigonometric functions| 
| logical_not | Compute truth value of not x element-wise (equivalent to ~arr)| 
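
A short sketch of a few unary ufuncs from the table, applied to small hand-picked arrays so the results are easy to verify by eye. The variable names are illustrative.

```python
import numpy as np

arr = np.array([0.0, 1.0, 4.0, 9.0])
roots = np.sqrt(arr)                # element-wise square root
print(roots.tolist())               # [0.0, 1.0, 2.0, 3.0]

# modf returns two arrays: fractional parts and integral parts (both keep the sign)
mixed = np.array([3.5, -2.25, 7.0])
frac, whole = np.modf(mixed)
print(frac.tolist(), whole.tolist())  # [0.5, -0.25, 0.0] [3.0, -2.0, 7.0]

# isnan yields a boolean array
has_nan = np.isnan(np.array([1.0, np.nan]))
print(has_nan.tolist())             # [False, True]
```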


### Binary universal functions

| Function | Description |
|----------|-------------|
| add | Add corresponding elements in arrays| 
| subtract | Subtract elements in second array from first array| 
| multiply | Multiply array elements| 
| divide, floor_divide | Divide or floor divide (truncating the remainder)| 
| power | Raise elements in first array to powers indicated in second array| 
| maximum, fmax | Element-wise maximum; fmax ignores NaN| 
| minimum, fmin | Element-wise minimum; fmin ignores NaN| 
| mod | Element-wise modulus (remainder of division)| 
| copysign | Copy sign of values in second argument to values in first argument| 
| greater, greater_equal, <br>less, less_equal, <br> equal, not_equal | Perform element-wise comparison, yielding boolean array (equivalent to infix operators >, >=, <, <=, ==, !=)| 
| logical_and, <br>logical_or, <br>logical_xor | Compute element-wise truth value of logical operation (equivalent to infix operators &, \|, ^)| 
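
Binary ufuncs take two arrays and return one. The sketch below, on invented data, also shows the difference between `maximum` and `fmax` when a NaN is present.

```python
import numpy as np

x = np.array([1.0, 5.0, np.nan])
y = np.array([4.0, 2.0, 3.0])

total = np.add(x, y)              # same result as x + y
larger = np.maximum(x, y)         # NaN propagates: the last element is nan
larger_skip_nan = np.fmax(x, y)   # fmax ignores NaN where the other value is valid
is_greater = np.greater(x, y)     # same result as x > y

print(larger_skip_nan.tolist())   # [4.0, 5.0, 3.0]
print(is_greater.tolist())        # [False, True, False]
```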

## 4.3 Array-Oriented Programming with Arrays

### Expressing Conditional Logic as Array Operations

- The `numpy.where` function is a vectorized version of the ternary expression `x if condition else y`.
- It can be used to produce a new array of values based on another array.

In [0]:
# Example - Using numpy.where vs Using typical condition expression
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Using typical expression
result01 = [(x if c else y) for x,y,c in zip(xarr,yarr,cond)]
print(result01) # result01 is a list
# [1.1, 2.2, 1.3, 1.4, 2.5]

# Using numpy.where
result02 = np.where(cond, xarr, yarr)
print(result02) # result02 is a numpy.array
# [1.1 2.2 1.3 1.4 2.5]

[1.1, 2.2, 1.3, 1.4, 2.5]
[1.1 2.2 1.3 1.4 2.5]
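
The second and third arguments of `numpy.where` do not have to be arrays; either one can be a scalar. A small sketch with made-up values:

```python
import numpy as np

arr = np.array([[1.5, -0.3],
                [-2.0, 0.7]])

# Replace positives with 2 and everything else with -2
signs = np.where(arr > 0, 2, -2)
print(signs.tolist())     # [[2, -2], [-2, 2]]

# Replace only the positives with 2.0 and keep the other values from arr
capped = np.where(arr > 0, 2.0, arr)
print(capped.tolist())    # [[2.0, -0.3], [-2.0, 2.0]]
```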


### Mathematical and Statistical Methods

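- A set of mathematical functions that compute statistics about an entire array, or about the data along an axis, is accessible as methods of the array class. Most are also available as top-level NumPy functions (for example, `arr.mean()` and `np.mean(arr)`).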

In [0]:
# Example - Using statistical method numpy.mean()
arr = np.random.randn(5,4)

print(arr)
print('Mean: {}'.format(arr.mean()))
print('Mean: {}'.format(np.mean(arr)))
print('Mean of axis 1: {}'.format(arr.mean(axis=1)))

- Basic array statistical methods  

| Method | Description |
|--------|-------------|
| sum | Sum of all the elements in the array or along an axis; zero-length arrays have sum 0|
| mean | Arithmetic mean; zero-length arrays have NaN mean|
| std, var | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (default denominator n) |
| min, max | Minimum and maximum |
| argmin, argmax | Indices of minimum and maximum elements |
| cumsum | Cumulative sum of elements starting from 0 |
| cumprod | Cumulative product of elements starting from 1 |
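
To make the `axis` argument concrete, the sketch below applies several of these methods to a small deterministic array (in place of random data, so the results can be checked by hand).

```python
import numpy as np

arr = np.arange(9).reshape((3, 3))
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

print(arr.sum())                     # 36
print(arr.sum(axis=0).tolist())      # column sums: [9, 12, 15]
print(arr.mean(axis=1).tolist())     # row means: [1.0, 4.0, 7.0]
print(arr.argmax())                  # 8 (index into the flattened array)

# Cumulative methods return an array of the same shape
print(arr.cumsum(axis=0).tolist())   # [[0, 1, 2], [3, 5, 7], [9, 12, 15]]
print(arr.cumprod(axis=1).tolist())  # [[0, 0, 0], [3, 12, 60], [6, 42, 336]]
```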

### Methods for Boolean Arrays

- Boolean values are coerced to 1 (True) or 0 (False).
- `sum()` can be used to count the True values in a boolean array.
- `any()` can be used to test whether one or more values in an array is True.
- `all()` can be used to check if every value is True.

In [0]:
# Example - Using methods for Boolean Arrays

# Using sum() to count True values
arr = np.random.randn(100)
print('There are {} positive values.'.format((arr > 0).sum()))


# Using any() & all()
bools = np.array([False, True, True, False])
print('Is there any True value? {}'.format('Yes' if bools.any() else 'No'))
# True => Yes
print('Are all values True? {}'.format('Yes' if bools.all() else 'No'))
# False => No

There are 47 positive values.
Is there any True value? Yes
Are all values True? No


### Unique and Other Set Logic

- Array set operations (1D-Array)

| Method | Description |
|--------|-------------|
| unique(x) | Compute the sorted, unique elements in x |
| intersect1d(x,y)| Compute the sorted, common elements in x and y |
| union1d(x,y) | Compute the sorted union of elements |
| in1d(x, y) | Compute a boolean array indicating whether each element of x is contained in y (newer NumPy versions recommend `isin` instead) |
| setdiff1d(x,y) | Set difference, elements in x that are not in y |
| setxor1d(x,y) | Set symmetric difference; elements that are in either of the arrays, but not both|
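
A brief sketch of the set operations on made-up data. It uses `np.isin` for the membership test, which is an assumption about the installed version: `isin` is the replacement that recent NumPy releases recommend over `in1d`.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
unique_names = np.unique(names)
print(unique_names.tolist())     # ['Bob', 'Joe', 'Will']

values = np.array([6, 0, 0, 3, 2, 5, 6])
membership = np.isin(values, [2, 3, 6])   # one boolean per element of values
print(membership.tolist())       # [True, False, False, True, True, False, True]

left = [1, 2, 3, 4]
right = [3, 4, 5]
print(np.intersect1d(left, right).tolist())  # [3, 4]
print(np.union1d(left, right).tolist())      # [1, 2, 3, 4, 5]
print(np.setdiff1d(left, right).tolist())    # [1, 2]
print(np.setxor1d(left, right).tolist())     # [1, 2, 5]
```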

## 4.4 File Input and Output with Arrays (skipped)

## 4.5 Linear Algebra

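- Commonly used linear algebra functions (`diag`, `dot`, and `trace` are available in the top-level `numpy` namespace; the others are in `numpy.linalg`)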

| Function | Description |
|----------|-------------|
| diag | Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
| dot | Matrix multiplication |
| trace | Compute the sum of the diagonal elements |
| det | Compute the matrix determinant |
| eig | Compute the eigenvalues and eigenvectors of a square matrix |
| inv | Compute the inverse of a square matrix |
| pinv | Compute the Moore-Penrose pseudo-inverse of a matrix |
| qr | Compute the QR decomposition |
| svd | Compute the singular value decomposition |
| solve | Solve the linear system Ax = b for x, where A is a square matrix|
| lstsq | Compute the least-squares solution to Ax = b|
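
As a small worked case, the sketch below solves a 2x2 system and checks the inverse. The matrix `A` and vector `b` are invented for illustration; results are compared with `np.allclose` because floating-point output is rarely exact.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Solve A x = b directly (preferable to computing the inverse first)
x = np.linalg.solve(A, b)
print(np.allclose(x, [0.8, 1.4]))            # True

# The inverse multiplied by A gives the identity, up to rounding
A_inv = np.linalg.inv(A)
print(np.allclose(np.dot(A, A_inv), np.eye(2)))  # True

det = np.linalg.det(A)     # 2*3 - 1*1 = 5, up to rounding
tr = np.trace(A)           # 2 + 3 = 5
print(np.isclose(det, 5.0), tr)
```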

## 4.6 Pseudorandom Number Generation

- Partial list of `numpy.random` functions

| Function | Description |
|----------|-------------|
| seed | Seed the random number generator |
| permutation | Return a random permutation of a sequence, or return a permuted range|
| shuffle | Randomly permute a sequence in-place|
| rand | Draw samples from a uniform distribution |
| randint | Draw random integers from a given low-to-high range |
| randn | Draw samples from a normal distribution with mean 0 and standard deviation 1|
| binomial | Draw samples from a binomial distribution|
| normal | Draw samples from a normal (Gaussian) distribution |
| beta | Draw samples from a beta distribution |
| chisquare | Draw samples from a chi-square distribution |
| gamma | Draw samples from a gamma distribution |
| uniform | Draw samples from a uniform `[ 0 , 1 )` distribution |
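
A minimal sketch of seeding and a few of the functions listed above. The seed value 1234 is arbitrary; no specific random values are shown because they depend on the generator, but re-seeding with the same value reproduces the same draws.

```python
import numpy as np

# Seeding makes the sequence of draws reproducible
np.random.seed(1234)
first = np.random.randn(3)
np.random.seed(1234)
second = np.random.randn(3)
print(np.array_equal(first, second))   # True

dice = np.random.randint(1, 7, size=10)      # integers in [1, 7), i.e. 1 to 6
order = np.random.permutation(5)             # a shuffled copy of range(5)
unit = np.random.uniform(size=4)             # floats in [0, 1)

deck = np.arange(5)
np.random.shuffle(deck)                      # shuffles in place and returns None
```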