# NumPy

by Hanzholah Shobri, 22 April 2022

This is a personal notebook I use to learn NumPy, one of the core scientific computing libraries for Python. See more at [this repository](https://github.com/hanzholahs/AI-Engineering-Bootcamp).

To help me learn, I use the [NumPy Documentation](https://numpy.org/doc/stable/user/whatisnumpy.html) as my reference. Some of the notes here are copied from that website, either verbatim or summarised/paraphrased.

## What is NumPy? 

It is the fundamental package for scientific computing in Python that provides:
* a multidimensional array object
* various derived objects (such as masked arrays and matrices)
* an assortment of routines for fast operations

At the core of the NumPy package is the `ndarray` object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance.

NumPy arrays vs. standard Python sequences:
* NumPy arrays have a **fixed size** at creation, unlike Python lists, which can grow dynamically. Changing the size of an `ndarray` creates a new array and deletes the original.
* The elements in a NumPy array are all required to be of the **same data type**, and thus will be the same size in memory.
* NumPy arrays facilitate **advanced mathematical and other operations on large amounts of data**.
* A plethora of scientific and mathematical Python-based packages **use NumPy arrays behind the scenes**.
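A minimal sketch of the first two points, using made-up values: mixing integers and a float in one list makes NumPy pick a single common dtype (`float64`) for every element, and `np.append` does not grow an array in place but returns a new, larger array.

```python
import numpy as np

# One dtype for all elements: the ints are upcast to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64
print(mixed.itemsize)    # 8 (bytes per element)
print(mixed.nbytes)      # 24 (3 elements * 8 bytes)

# Fixed size: np.append builds a new array and leaves the original alone
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a.size, b.size)    # 3 4
```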

## Why is NumPy Fast?

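NumPy is fast because of *vectorization*: looping, indexing, and arithmetic over the array elements take place behind the scenes in optimized, pre-compiled C code rather than in explicit Python loops. Vectorized code has several advantages:
* More concise and easier-to-read code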
* Fewer lines -> fewer bugs
* Closely resembles standard mathematical notation
* More "Pythonic" code
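To illustrate the difference, here is a minimal sketch (the lists `a` and `b` are made-up sample data) that multiplies two sequences element by element, first with an explicit Python loop and then with a single vectorized NumPy expression whose loop runs in compiled code.

```python
import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Pure Python: an explicit loop over pairs of elements
c_loop = [x * y for x, y in zip(a, b)]

# NumPy: one vectorized expression; the element-wise loop runs in C
c_vec = np.array(a) * np.array(b)

print(c_loop)            # [10, 40, 90, 160]
print(c_vec.tolist())    # [10, 40, 90, 160]
```

The vectorized line also reads like the mathematical statement "c = a * b", which is the readability point made in the list above.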

## Using NumPy

In [4]:
# Import NumPy
import numpy as np

## Creating NumPy Arrays

### From Python Lists

* `np.array`: create an array from a Python list (or a nested list for more dimensions)

In [9]:
# Create a 1-dimensional array
a1 = np.array([1,2,3,4])
a1

array([1, 2, 3, 4])

In [10]:
# Create a 2-dimensional array
a2 = np.array([[4,5,6], [7,6,5]])
a2

array([[4, 5, 6],
       [7, 6, 5]])

In [11]:
# Both a1 and a2 are ndarrays (N-dimensional arrays)
type(a1), type(a2)

(numpy.ndarray, numpy.ndarray)

### Other ways to create NumPy Arrays

* `np.zeros`: create a zero vector or null vector (an array filled with 0's)
* `np.ones`: create a ones vector (an array filled with 1's)
* `np.arange`: create an array of evenly spaced values in the half-open interval `[start, stop)`, separated by a given step
* `np.linspace`: create an array of a given number of evenly spaced values over an interval (the stop value is included by default)
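A short sketch contrasting `np.arange` and `np.linspace` (the values are arbitrary). `arange` fixes the step and excludes `stop`, while `linspace` fixes the number of points and includes `stop` unless `endpoint=False` is passed. For non-integer steps, NumPy's documentation suggests `linspace`, since floating-point rounding can make the length of an `arange` result hard to predict. The last lines show that `np.zeros` (and likewise `np.ones`) also accepts a shape tuple.

```python
import numpy as np

# arange: values in [start, stop) separated by a fixed step (stop excluded)
r = np.arange(0, 1, 0.25)                      # 0.0, 0.25, 0.5, 0.75

# linspace: a fixed number of points; stop is included by default
l = np.linspace(0, 1, 5)                       # 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False drops stop, giving the same points as arange here
l_open = np.linspace(0, 1, 4, endpoint=False)  # 0.0, 0.25, 0.5, 0.75

# zeros and ones also accept a shape tuple for multi-dimensional arrays
z = np.zeros((2, 3))
print(z.shape)                                 # (2, 3)
```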

In [13]:
# Create a zero vector (null vector)
np.zeros(2)

array([0., 0.])

In [14]:
# Create a vector of ones
np.ones(4)

array([1., 1., 1., 1.])

In [18]:
# Create an array based on a range of numbers
np.arange(4)

array([0, 1, 2, 3])

In [30]:
# Create an array based on a range of numbers - set start, stop, step
np.arange(2, 15, 2) # start: 2, stop: 15 (excluded), step size: 2

array([ 2,  4,  6,  8, 10, 12, 14])

In [24]:
# Create an array with values spaced linearly in an interval
np.linspace(0.5, 5.5, 11) # start: 0.5, stop: 5.5, num of elements: 11

array([0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5])

In [26]:
# Create an array with a specified data type - integer
np.ones(5, dtype = np.int64)

array([1, 1, 1, 1, 1], dtype=int64)

In [27]:
# Create an array with a specified data type - boolean (the built-in bool works in every NumPy version, unlike the np.bool alias)
np.ones(5, dtype = bool)

array([ True,  True,  True,  True,  True])

### Random Generators

In [265]:
# Generate random uniformly distributed numbers in the interval [0, 1)
np.random.rand(10)

array([0.49240254, 0.93582955, 0.37131731, 0.71535544, 0.48317579,
       0.29660975, 0.82853202, 0.07792746, 0.78178141, 0.18672644])

In [266]:
# Generate a 10x2 array of uniformly distributed random numbers in [0, 1)
np.random.rand(10, 2)

array([[0.6665787 , 0.83603671],
       [0.66187907, 0.69660588],
       [0.31879362, 0.07194154],
       [0.95814609, 0.04493414],
       [0.68831959, 0.36200273],
       [0.08392974, 0.15957744],
       [0.07589289, 0.77975243],
       [0.64313626, 0.7784971 ],
       [0.712102  , 0.31802608],
       [0.39527979, 0.48403187]])

In [261]:
# Generate random normally distributed numbers with mean = 0 and std = 1
np.random.randn(10)

array([-0.24447873,  1.50944412,  1.77257884, -0.69968959, -0.46959337,
       -1.75668419, -1.4337014 , -1.12168122,  0.75906261, -0.12599449])

In [263]:
# Generate random normally distributed numbers then compute the mean
np.random.randn(10**5).mean()

0.0016435119803080527

In [264]:
# Generate random normally distributed numbers then compute the std
np.random.randn(10**5).std()

0.9999220856520181
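The `np.random.rand`/`np.random.randn` functions used above belong to NumPy's legacy random API; the NumPy documentation recommends the `Generator` interface, created with `np.random.default_rng`, for new code. A minimal sketch (the seed value 42 is arbitrary) that mirrors the calls above and shows how a seed makes results reproducible:

```python
import numpy as np

# A seeded Generator makes the random numbers reproducible
rng = np.random.default_rng(seed=42)

u = rng.random(10)            # 10 uniform numbers in [0, 1), like np.random.rand(10)
u2 = rng.random((10, 2))      # shape is passed as a tuple, unlike rand(10, 2)
z = rng.standard_normal(10)   # mean 0, std 1, like np.random.randn(10)

# A new generator with the same seed replays the same sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(u, rng_again.random(10)))   # True
```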

## Working with Array Structures 

* `ndarray.ndim`: number of dimensions
* `ndarray.size`: total number of elements
* `ndarray.shape`: number of elements along each dimension
* `ndarray.reshape`: give an array a new shape without changing its data (the total number of elements must stay the same)
* `ndarray.transpose`: permute the axes of an array; with no arguments, the order of the axes is reversed (for a 2-dimensional array, this is the usual matrix transpose)
* `ndarray.T`: shorthand for `ndarray.transpose()`
* `ndarray.flatten`: return a copy of an n-dimensional array collapsed into a 1-dimensional array

Here, `ndarray` stands for a NumPy array object (e.g., an array created with the `np.array` function).

In [279]:
# Create more arrays to work with
a3 = np.array([4,5,8,2,8,9,1,3,4,5,2,8,5,6,8])
a4 = np.array([[[2,4,2,4], [0,2,3,7]],
               [[7,5,3,2], [2,6,4,2]],
               [[4,7,2,4], [5,6,3,4]]])

In [99]:
# Find the number of dimensions of a3 and a4
print(a3.ndim)
print(a4.ndim)

1
3


In [100]:
# Find the number of elements in a3 and a4
print(a3.size)
print(a4.size)

15
24


In [101]:
# Find the shape (number of elements along each dimension) of a3 and a4
print(a3.shape)
print(a4.shape)

(15,)
(3, 2, 4)


In [102]:
# Reshape a3 (from 1-dimensional array to 2-dimensional array)
a3.reshape(3, 5)

array([[4, 5, 8, 2, 8],
       [9, 1, 3, 4, 5],
       [2, 8, 5, 6, 8]])

In [267]:
# Reshape a3 (another way using np.reshape)
np.reshape(a3, (5, 3))

array([[4, 5, 8],
       [2, 8, 9],
       [1, 3, 4],
       [5, 2, 8],
       [5, 6, 8]])

In [297]:
# Transpose a4 (the axes are reversed, so the shape (3, 2, 4) becomes (4, 2, 3))
a4.transpose()

array([[[2, 7, 4],
        [0, 2, 5]],

       [[4, 5, 7],
        [2, 6, 6]],

       [[2, 3, 2],
        [3, 4, 3]],

       [[4, 2, 4],
        [7, 2, 4]]])

In [292]:
# Transpose the first 2-dimensional slice of a4 using the .T shorthand
a4[0].T

array([[2, 0],
       [4, 2],
       [2, 3],
       [4, 7]])

In [298]:
# Flatten a4 into a 1-dimensional array
a4.flatten()

array([2, 4, 2, 4, 0, 2, 3, 7, 7, 5, 3, 2, 2, 6, 4, 2, 4, 7, 2, 4, 5, 6,
       3, 4])
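Two details not covered above, sketched with the same `a4`: one dimension passed to `reshape` may be `-1`, in which case NumPy infers it from the total size, and `transpose` accepts an explicit axis order instead of simply reversing the axes. The last lines show that `flatten` returns a copy, so changing the result leaves `a4` untouched.

```python
import numpy as np

a4 = np.array([[[2, 4, 2, 4], [0, 2, 3, 7]],
               [[7, 5, 3, 2], [2, 6, 4, 2]],
               [[4, 7, 2, 4], [5, 6, 3, 4]]])

# One dimension may be -1; NumPy infers it from the size (24 / 6 = 4)
print(a4.reshape(6, -1).shape)        # (6, 4)

# No arguments: the axes are reversed, (3, 2, 4) -> (4, 2, 3)
print(a4.transpose().shape)           # (4, 2, 3)
# Explicit order: new axis i is old axis axes[i], (3, 2, 4) -> (2, 3, 4)
print(a4.transpose(1, 0, 2).shape)    # (2, 3, 4)

# flatten() returns a copy, so editing it does not change a4
flat = a4.flatten()
flat[0] = 99
print(a4[0, 0, 0])                    # 2
```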

## Working with Basic Functions and Methods

* `np.sort`: return a sorted copy of an array
* `np.concatenate`: join a sequence of arrays along an existing axis
* `np.unique`: find the unique elements of an array (returned in sorted order)
* `np.flip`: reverse the order of elements along an axis (along all axes if none is given)

In [104]:
# Sort array a3 (np.sort returns a sorted copy, so reassign a3 to keep the result)
print("before\t:", a3)
a3 = np.sort(a3)
print("after\t:", a3)

before	: [4 5 8 2 8 9 1 3 4 5 2 8 5 6 8]
after	: [1 2 2 3 4 4 5 5 5 6 8 8 8 8 9]


In [90]:
# Concatenate two 1-dimensional arrays
np.concatenate((a1, a3))

array([1, 2, 3, 4, 1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 8, 8, 8, 8, 9])

In [91]:
# Concatenate two 2-dimensional arrays (along axis 0, the default)
np.concatenate((a2, np.array([[10, 11, 12], [20, 21, 22]])))

array([[ 4,  5,  6],
       [ 7,  6,  5],
       [10, 11, 12],
       [20, 21, 22]])

In [274]:
# Find unique values in a3
np.unique(a3)

array([1, 2, 3, 4, 5, 6, 8, 9])

In [277]:
# Find unique values in a4
np.unique(a4)

array([0, 2, 3, 4, 5, 6, 7])

In [305]:
# Reverse the elements within a3 
np.flip(a3)

array([9, 8, 8, 8, 8, 6, 5, 5, 5, 4, 4, 3, 2, 2, 1])

In [315]:
# Reverse the elements within a4 (along all axes, since no axis is given)
np.flip(a4)

array([[[4, 3, 6, 5],
        [4, 2, 7, 4]],

       [[2, 4, 6, 2],
        [2, 3, 5, 7]],

       [[7, 3, 2, 0],
        [4, 2, 4, 2]]])

In [310]:
# Reverse the elements within a4 along axis 0
np.flip(a4, axis = 0)

array([[[4, 7, 2, 4],
        [5, 6, 3, 4]],

       [[7, 5, 3, 2],
        [2, 6, 4, 2]],

       [[2, 4, 2, 4],
        [0, 2, 3, 7]]])

In [314]:
# Reverse the elements within a4 along axis 1
np.flip(a4, axis = 1)

array([[[0, 2, 3, 7],
        [2, 4, 2, 4]],

       [[2, 6, 4, 2],
        [7, 5, 3, 2]],

       [[5, 6, 3, 4],
        [4, 7, 2, 4]]])
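A few optional arguments worth knowing, sketched with the same values as `a2` and the original unsorted `a3`: `axis` controls the direction of `np.concatenate`, `return_counts=True` makes `np.unique` also report how often each value occurs, and the `ndarray.sort` method sorts in place, unlike `np.sort`.

```python
import numpy as np

a2 = np.array([[4, 5, 6], [7, 6, 5]])
extra = np.array([[10, 11, 12], [20, 21, 22]])

# axis=0 (the default) stacks the rows; axis=1 joins the arrays side by side
rows = np.concatenate((a2, extra), axis=0)   # shape (4, 3)
cols = np.concatenate((a2, extra), axis=1)   # shape (2, 6)

# return_counts=True also returns the number of occurrences of each value
a3 = np.array([4, 5, 8, 2, 8, 9, 1, 3, 4, 5, 2, 8, 5, 6, 8])
values, counts = np.unique(a3, return_counts=True)
# values -> [1 2 3 4 5 6 8 9]
# counts -> [1 2 1 2 3 1 4 1]

# np.sort returns a sorted copy; the ndarray.sort method sorts in place
x = np.array([3, 1, 2])
y = np.sort(x)   # y is sorted, x is still [3 1 2]
x.sort()         # now x itself is sorted
```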

## Indexing and Slicing NumPy Arrays

In [112]:
# Create more arrays to work with
a5 = np.array([4,1,3,0,5,5,8,2,8,9,2,8,5,6,8])
a6 = np.array([[[2,4,6,5], [9,2,3,7]],
               [[4,1,2,4], [5,6,3,4]],
               [[7,5,2,7], [1,6,3,5]]])

### Subset Elements from 1-dimensional Arrays

In [119]:
# Subset elements from an array - the seventh element (indexing starts from zero)
a5[6]

8

In [114]:
# Subset elements from an array - fourth to eighth elements (the stop index 8 is excluded)
a5[3:8]

array([0, 5, 5, 8, 2])

In [115]:
# Subset elements from an array - from the sixth to the last element
a5[5:]

array([5, 8, 2, 8, 9, 2, 8, 5, 6, 8])

In [118]:
# Subset elements from an array - the last 5 elements
a5[-5:]

array([2, 8, 5, 6, 8])

In [121]:
# Subset elements using a logical statement - elements less than 3
a5[a5 < 3]

array([1, 0, 2, 2])

In [124]:
# Subset elements using a logical statement - elements that are even
a5[a5 % 2 == 0]

array([4, 0, 8, 2, 8, 2, 8, 6, 8])

In [130]:
# Subset elements using a logical statement - elements between 3 and 8 (inclusive)
a5[(2 < a5) & (a5 <= 8)]

array([4, 3, 5, 5, 8, 8, 8, 5, 6, 8])

In [131]:
# Subset elements using a logical statement - elements that are even or 3
a5[(a5 % 2 == 0) | (a5 == 3)]

array([4, 3, 0, 8, 2, 8, 2, 8, 6, 8])
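Two related tools, sketched on the same `a5`: `np.nonzero` turns a boolean mask into the indices where it is `True`, and `np.where(cond, x, y)` picks from `x` where the condition holds and from `y` elsewhere. The `try` block shows why the examples above combine conditions with `&` and `|` plus parentheses: Python's `and`/`or` try to reduce a whole array to a single `True`/`False`, which raises a `ValueError`.

```python
import numpy as np

a5 = np.array([4, 1, 3, 0, 5, 5, 8, 2, 8, 9, 2, 8, 5, 6, 8])

mask = a5 < 3            # boolean array with the same shape as a5
idx = np.nonzero(mask)   # tuple of index arrays, one per dimension
# idx[0] -> [ 1  3  7 10]

# Python's `and` needs a single truth value, so it fails on arrays
raised = False
try:
    (a5 > 2) and (a5 <= 8)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# np.where(cond, x, y): cap every value above 5 at 5, keep the rest
capped = np.where(a5 > 5, 5, a5)
```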

### Subset Elements from n-dimensional Arrays (n > 1)

In [247]:
# Subset the first element
a6[0]

array([[2, 4, 6, 5],
       [9, 2, 3, 7]])

In [248]:
# Subset the first and second elements
a6[0:2]

array([[[2, 4, 6, 5],
        [9, 2, 3, 7]],

       [[4, 1, 2, 4],
        [5, 6, 3, 4]]])

In [250]:
# Subset the first row of each element, from the second element to the last
a6[1:, 0]

array([[4, 1, 2, 4],
       [7, 5, 2, 7]])

In [251]:
# Subset the third element, then its second row
a6[2][1]

array([1, 6, 3, 5])

In [252]:
# Subset the third element, then its second row, then the third value (same as a6[2, 1, 2])
a6[2][1][2]

3
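Chained indexing such as `a6[2][1][2]` creates an intermediate array object at every step. A single tuple index, sketched below on the same `a6`, reaches the same element in one step and, unlike chaining, can also slice several axes at once.

```python
import numpy as np

a6 = np.array([[[2, 4, 6, 5], [9, 2, 3, 7]],
               [[4, 1, 2, 4], [5, 6, 3, 4]],
               [[7, 5, 2, 7], [1, 6, 3, 5]]])

# A tuple index gives the same element as chained indexing
print(a6[2, 1, 2], a6[2][1][2])   # 3 3

# Slicing several axes at once: every element, every row, first column
first_cols = a6[:, :, 0]          # shape (3, 2)
# [[2 9]
#  [4 5]
#  [7 1]]

# Negative indices count from the end on any axis: last row, last column
last_vals = a6[:, -1, -1]         # [7 4 5]
```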

## Using `view` and `copy`

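Be careful! Assigning an existing array to a new variable does not copy it; it only creates another name for the same array. Any in-place change made through one name is reflected in the other.

Note that plain assignment is not a view: `b = a` creates no new object at all (`b is a` is `True`). A *view*, created with `a.view()` or by basic slicing such as `a[6:9]`, is a new array object that shares the original's data; the NumPy documentation also calls this a *shallow copy*.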

In [153]:
print("Before")

a = np.array([3,5,2,6,3,5,2,3,4,3,4,5,4])
print("a:", a)

b = a
print("b:", b)

a[3] = 100
print("\nAfter")

print("a:", a)
print("b:", b)
print()

Before
a: [3 5 2 6 3 5 2 3 4 3 4 5 4]
b: [3 5 2 6 3 5 2 3 4 3 4 5 4]

After
a: [  3   5   2 100   3   5   2   3   4   3   4   5   4]
b: [  3   5   2 100   3   5   2   3   4   3   4   5   4]



Views are an important concept for writing operations that save memory and run faster, because a view does not copy the data.

Since a view shares its data with the original array, modifying the view also modifies the original. Basic slicing returns a view, so this also applies to a subset of an array:

In [158]:
c = a[6:9]
print(c)

c[0] = 1001
c[1] = 1001
c[2] = 1001
print(a)
print(b)
print(c)
print()

[2 3 4]
[   3    5    2  100    3    5 1001 1001 1001    3    4    5    4]
[   3    5    2  100    3    5 1001 1001 1001    3    4    5    4]
[1001 1001 1001]



The `copy` method creates a new array with its own copy of the data (a *deep copy*), so modifying the copy does not affect the original.

In [162]:
d = a.copy()
print(d)

d[0] = 1001
print(a)
print(d)

[   3    5    2  100    3    5 1001 1001 1001    3    4    5    4]
[   3    5    2  100    3    5 1001 1001 1001    3    4    5    4]
[1001    5    2  100    3    5 1001 1001 1001    3    4    5    4]
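To check which case applies, `np.shares_memory` and the `base` attribute are handy. A minimal sketch on a fresh, made-up array: plain assignment gives the same object, basic slicing gives a view, while `copy()`, integer-list ("fancy") indexing, and boolean-mask indexing such as `a5[a5 < 3]` all return copies.

```python
import numpy as np

a = np.arange(10)

b = a              # plain assignment: the very same object
v = a[2:5]         # basic slicing: a new array object sharing a's data
c = a.copy()       # copy: a new array with its own data
f = a[[2, 3, 4]]   # fancy (integer-list) indexing returns a copy
m = a[a > 5]       # boolean-mask indexing also returns a copy

print(b is a)                   # True
print(v is a, v.base is a)      # False True
print(np.shares_memory(a, v))   # True
print(np.shares_memory(a, c))   # False
print(np.shares_memory(a, f))   # False
print(np.shares_memory(a, m))   # False
```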


## Array Operations and Functions

### Mathematical Operations

In [216]:
arr0 = np.zeros(10)
arr1 = np.ones(10)
arrA = np.arange(1, 11)
arrB = np.linspace(10, 100, 10)
arrC = np.array([6,3,6,8,5,5,6,8,4,2])

In [189]:
# Basic mathematical operations
arr1 * arrB, arr0 + arrA, arrC - arrB, arrB / arrA

(array([ 10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.]),
 array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 array([ -4., -17., -24., -32., -45., -55., -64., -72., -86., -98.]),
 array([10., 10., 10., 10., 10., 10., 10., 10., 10., 10.]))

In [192]:
# Combining several operations
arr1 * arrB + arrC / arrA

array([ 16.        ,  21.5       ,  32.        ,  42.        ,
        51.        ,  60.83333333,  70.85714286,  81.        ,
        90.44444444, 100.2       ])

In [303]:
# Subtract the reversed array from arrC, element by element
arrC - np.flip(arrC)

array([ 4, -1, -2,  2,  0,  0, -2,  2,  1, -4])

### Broadcasting

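Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes, provided their dimensions are compatible: comparing the shapes from the trailing dimension backwards, two dimensions are compatible when they are equal or one of them is 1. The simplest case is an operation between an array and a scalar, where the scalar is stretched to the shape of the array.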

In [193]:
arrA + 100

array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [197]:
arrB / 10

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

In [198]:
arrC * 5.12

array([30.72, 15.36, 30.72, 40.96, 25.6 , 25.6 , 30.72, 40.96, 20.48,
       10.24])
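The scalar examples above are the simplest case. A minimal sketch of the general rule with made-up arrays: a `(3, 1)` column and a `(4,)` row are compatible, so both are stretched to a `(3, 4)` result, whereas shapes `(3,)` and `(4,)` are not, and NumPy raises a `ValueError`.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# Trailing dims 1 vs 4: col is stretched along its columns;
# row has fewer dims, so it is treated as (1, 4) and stretched down
table = col + row                   # shape (3, 4)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

# Shapes (3,) and (4,): 3 != 4 and neither is 1, so broadcasting fails
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("ValueError:", err)
```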

### Aggregation functions

An aggregation function summarises the elements of an array into a single value (or, with the `axis` argument, into one value per row or column). Common aggregation methods include `min`, `max`, `sum`, `mean`, and `std`.

In [201]:
arrC.sum()

53

In [202]:
arrB.min()

10.0

In [203]:
arrB.max()

100.0

In [206]:
(arrC * 15.25).mean()

80.825

In [209]:
(arrA * 13).std()

37.339657202497186

In [223]:
# Aggregate along axis 0 (down each column): one mean per column
arrD = np.array([[0,2,3,6],
                 [7,4,3,6],
                 [2,3,6,9]])
arrD.mean(axis = 0)

array([3., 3., 4., 7.])

In [220]:
# Aggregate along axis 1 (across each row): one mean per row
arrD.mean(axis = 1)

array([2.75, 5.  , 5.  ])

In [222]:
# Aggregate a single row: the mean of the first row
arrD[0].mean()

2.75

In [224]:
# Aggregate a single row: the maximum of the first row
arrD[0].max()

6
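Aggregation and broadcasting combine naturally. In this sketch, `keepdims=True` keeps the reduced axis with length 1, so the row means (shape `(3, 1)`) broadcast back against `arrD` to center every row on zero. It also shows the `ddof` argument of `std`: by default NumPy divides by `n` (the population standard deviation, as in the `(arrA * 13).std()` example), and `ddof=1` divides by `n - 1` for the sample standard deviation.

```python
import numpy as np

arrD = np.array([[0, 2, 3, 6],
                 [7, 4, 3, 6],
                 [2, 3, 6, 9]])

# keepdims=True keeps the reduced axis: row_means has shape (3, 1)
row_means = arrD.mean(axis=1, keepdims=True)

# Broadcasting subtracts each row's mean from every element in that row,
# so each row of `centered` has mean 0
centered = arrD - row_means

# std() divides by n by default (ddof=0); ddof=1 divides by n - 1
x = np.arange(1, 11)
pop_std = x.std()              # sqrt(82.5 / 10), about 2.872
sample_std = x.std(ddof=1)     # sqrt(82.5 / 9), about 3.028
```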

## Working with a Mathematical Formula: Mean Squared Error

[Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. The formula for calculating the Mean Squared Error is

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 $$

The formula can be rewritten as a function that operates on NumPy arrays.

In [364]:
# Define MSE function
def MSE(prediction, actual):
    """Calculates the Mean Squared Error given the actual and predicted values"""
    # np.mean implements the 1/n * sum(...) of the formula
    return np.mean(np.square(prediction - actual))

In [354]:
# Create dummy actual response data (integers from 0 to 99)
actual = np.int64(np.random.rand(10**5) * 100)
actual[11:40]

array([87,  4,  9, 29, 92, 24,  4, 31, 29, 71, 78, 80, 82,  5,  1, 72, 50,
       56, 96, 37, 14, 12, 81, 34, 51, 19, 75,  4, 34], dtype=int64)

In [355]:
# Create normally distributed noise data (truncated to integers)
noise = np.int64(np.random.randn(10**5) * 1.5)
noise[11:40]

array([ 0,  0, -2,  0,  0,  0,  0, -1, -2,  0,  0,  0,  0,  0,  0,  2,  1,
       -2, -1,  4,  0, -1,  2, -1,  0,  1,  0,  1,  2], dtype=int64)

In [356]:
# Create dummy prediction data by adding noise to the actual responses
prediction = actual + noise
prediction[11:40]

array([87,  4,  7, 29, 92, 24,  4, 30, 27, 71, 78, 80, 82,  5,  1, 74, 51,
       54, 95, 41, 14, 11, 83, 33, 51, 20, 75,  5, 36], dtype=int64)

In [344]:
# Calculate MSE
MSE(prediction, actual)

1.32699

In [362]:
# Create dummy prediction data with noisier (more widely spread) errors
another_noise = np.int64(np.random.randn(10**5) * 5.5)
another_prediction = actual + another_noise

another_noise[11:40]

array([  1,  -1, -10,   1, -10,  -5,  -2,  -2,   1,   7,  -3,   5,  -2,
        -1,  -9,   1,  -9,  10, -10,  -7,  -2,   1,   2,   0,   7,  10,
         1,   1,   0], dtype=int64)

In [363]:
# Calculate new MSE
MSE(another_prediction, actual) # expected to be larger, because the noise is more spread out

26.58496
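A hand-checkable sanity test of the formula, as a minimal sketch: it defines its own small `mse` helper (a hypothetical name, separate from the notebook's `MSE`) and applies it to four made-up values where the arithmetic is easy to follow.

```python
import numpy as np

def mse(prediction, actual):
    """Mean of squared errors, i.e. (1/n) * sum((y_hat - y)^2)."""
    return np.mean(np.square(prediction - actual))

y_hat = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical predictions
y = np.array([2.5, 0.0, 2.0, 8.0])        # hypothetical actual values

# Errors: 0.5, -0.5, 0.0, -1.0 -> squares: 0.25, 0.25, 0.0, 1.0
# Sum = 1.5 and n = 4, so MSE = 1.5 / 4 = 0.375
print(mse(y_hat, y))   # 0.375
```

Because every term is squared, errors of opposite sign cannot cancel out, and larger errors weigh disproportionately more, which is why the noisier predictions above give a larger MSE.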

## More Things to Do

* [Universal Functions](https://numpy.org/doc/stable/user/basics.ufuncs.html) Official Documentation
* Additional [exercises](https://aaltoscicomp.github.io/python-for-scicomp/numpy/) by Aalto Scientific Computing
* [Real Python Tutorial](https://realpython.com/numpy-tutorial/) by Ryan Palo