# Advanced NumPy

In [2]:
import numpy as np

## 1. Universal functions

- Python is a dynamically typed language: the data type of a variable is not declared when the variable is created, and is only determined at run time.

```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

```python
# Python code
result = 0
for i in range(100):
    result += i
```

- Because of this, any type of data can be assigned to a variable in Python. A Python object therefore holds more than just the raw value at a position in memory: it also carries extra information such as its type.

![cint_vs_pyint](img/cint_vs_pyint.png)

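- A Python list can contain elements of different types, so each element must carry its own type and other information. The elements are therefore separate objects, which are stored in non-contiguous blocks of memory. A NumPy array, by contrast, stores elements of a single type in one contiguous block.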

![array_vs_list](img/array_vs_list.png)

- A **vectorized operation** is an operation written once for the whole array; NumPy then applies it to each element in compiled code instead of in a Python loop.
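
The points above can be sketched in a few lines. The list contents and array values below are arbitrary choices made for illustration; the snippet only shows the difference in storage and in how the operation is written, and does not measure speed.

```python
import numpy as np

# A list can mix types: every element is a separate Python object with its own type.
mixed = [1, "two", 3.0]
element_types = [type(item).__name__ for item in mixed]
print(element_types)              # ['int', 'str', 'float']

# A NumPy array has a single dtype and one contiguous data buffer.
values = np.array([1, 2, 4, 5, 10], dtype=np.int64)
print(values.dtype)               # int64
print(values.nbytes)              # 40 (5 elements * 8 bytes)
print(values.flags["C_CONTIGUOUS"])  # True

# Explicit Python loop: one interpreter step per element.
looped = np.empty(len(values))
for i in range(len(values)):
    looped[i] = 1.0 / values[i]

# Vectorized form: the same computation expressed on the whole array.
vectorized = 1.0 / values
print(np.allclose(looped, vectorized))  # True
```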

### What are universal functions?

- **Universal Functions (ufuncs)** in NumPy are functions that operate element-wise on arrays, meaning they perform an operation on each element of an array, producing an output array with the same shape. Ufuncs are designed for efficient and vectorized operations, which are much faster than using Python loops to perform the same computations.
- **unary ufuncs** operate on a single input, and **binary ufuncs** operate on two inputs.
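
As a quick check of the unary/binary distinction, a ufunc object reports how many inputs it takes through its `nin` attribute. This is a minimal sketch; the array `x` is just an example value.

```python
import numpy as np

# Every ufunc is an instance of np.ufunc.
print(isinstance(np.add, np.ufunc))   # True

# nin is the number of input arguments the ufunc expects.
print(np.negative.nin)   # 1 -> unary
print(np.add.nin)        # 2 -> binary

# An arithmetic operator on an array is shorthand for the matching ufunc.
x = np.arange(4)
same = np.array_equal(x + 5, np.add(x, 5))
print(same)              # True
```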

In [3]:
# Binary ufuncs

x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)    # np.add(x, 5)
print("x - 5 =", x - 5)    # np.subtract(x, 5)
print("x * 2 =", x * 2)    # np.multiply(x, 2)
print("x / 2 =", x / 2)    # np.divide(x, 2)
print("x // 2 =", x // 2)  # np.floor_divide(x, 2)

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


In [4]:
# Unary ufunc (negation), plus two more binary ufuncs

print("-x     = ", -x)     # np.negative(x)
print("x ** 2 = ", x ** 2) # np.power(x, 2)
print("x % 2  = ", x % 2)  # np.mod(x, 2)

-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In [6]:
# Absolute
x = np.array((-1,-2,3))
print(np.absolute(x))
print(np.abs(x))

[1 2 3]
[1 2 3]


In [7]:
# Trigonometric functions

theta = np.linspace(0, np.pi, 3)

print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

x = [-1, 0, 1]
print("x         = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

theta      =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
x         =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]
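
Values such as `1.2246468e-16` for `sin(pi)` in the output above are floating-point rounding errors, not exact zeros. One way to compare such results, sketched here with NumPy's default tolerances, is `np.isclose` / `np.allclose`:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
sin_theta = np.sin(theta)

# Exact comparison fails because sin(pi) is a tiny non-zero number.
exactly_zero = bool(sin_theta[-1] == 0.0)
print(exactly_zero)                        # False

# Tolerance-based comparison treats it as zero.
close_to_zero = bool(np.isclose(sin_theta[-1], 0.0))
print(close_to_zero)                       # True
print(np.allclose(sin_theta, [0.0, 1.0, 0.0]))  # True
```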


In [8]:
# Exponential
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]


In [9]:
# Logarithm

x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


In [11]:
a = np.array((1,2,3))
b = np.array((2,3,4))
a + b

array([3, 5, 7])

## 2. Broadcasting

**Broadcasting** is a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) to arrays of different shapes.

In [12]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

In [13]:
print(np.arange(3) + 5)
print(np.ones((3,3)) + np.arange(3))
print(np.arange(3).reshape((3,1)) + np.arange(3))

[5 6 7]
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
[[0 1 2]
 [1 2 3]
 [2 3 4]]


![broadcasting](img/02.05-broadcasting.png)

Rules of broadcasting

1. If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its left side.

2. If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other array.

3. If the sizes disagree in any dimension and neither size is equal to 1, an error is raised.
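
The three rules can be checked on shapes alone, without building any arrays. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the shapes are chosen as small illustrative cases.

```python
import numpy as np

# Rules 1 and 2: (3,) is padded to (1, 3), then stretched to (2, 3).
shape_a = np.broadcast_shapes((2, 3), (3,))
print(shape_a)   # (2, 3)

# Both operands are stretched: (3, 1) and (1, 3) become (3, 3).
shape_b = np.broadcast_shapes((3, 1), (3,))
print(shape_b)   # (3, 3)

# Rule 3: (3, 2) against (1, 3) leaves a 2-versus-3 mismatch, so an error is raised.
mismatch_raised = False
try:
    np.broadcast_shapes((3, 2), (3,))
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```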

Example 1:

In [14]:
matrix = np.ones((2, 3))
array = np.arange(3)

print(matrix.shape)
print(array.shape)

(2, 3)
(3,)


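Rule 1 pads the shape of `array` with a one on the left:

matrix.shape -> (2, 3) <br>
array.shape -> (1, 3)

Rule 2 stretches the first dimension of `array` to match:

matrix.shape -> (2, 3) <br>
array.shape -> (2, 3)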

In [15]:
print(matrix, end="\n\n")
print(array,end="\n\n")
print(matrix + array)

[[1. 1. 1.]
 [1. 1. 1.]]

[0 1 2]

[[1. 2. 3.]
 [1. 2. 3.]]


Example 2:

In [16]:
matrix = np.arange(3).reshape((3, 1))
array = np.arange(3)

print(matrix.shape)
print(array.shape)

(3, 1)
(3,)


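Rule 1 pads the shape of `array` with a one on the left:

matrix.shape -> (3, 1) <br>
array.shape -> (1, 3)

Rule 2 stretches every dimension of size 1, in both arrays:

matrix.shape -> (3, 3) <br>
array.shape -> (3, 3)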

In [17]:
print(matrix, end="\n\n")
print(array,end="\n\n")
print(matrix + array)

[[0]
 [1]
 [2]]

[0 1 2]

[[0 1 2]
 [1 2 3]
 [2 3 4]]


Example 3:

In [18]:
matrix = np.ones((3, 2))
array = np.arange(3)

print(matrix.shape)
print(array.shape)

(3, 2)
(3,)


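Rule 1 pads the shape of `array` with a one on the left:

matrix.shape -> (3, 2) <br>
array.shape -> (1, 3)

Rule 2 stretches the first dimension of `array` to match:

matrix.shape -> (3, 2) <br>
array.shape -> (3, 3)

The last dimensions still disagree (2 versus 3) and neither is 1, so by rule 3 the operation fails.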

In [19]:
print(matrix, end="\n\n")
print(array,end="\n\n")
print(matrix + array)

[[1. 1.]
 [1. 1.]
 [1. 1.]]

[0 1 2]



ValueError: operands could not be broadcast together with shapes (3,2) (3,) 

In [20]:
# solution: reshape array
matrix = np.ones((3, 2))
array = np.arange(3)
array = array[:, np.newaxis]

print(matrix.shape)
print(array.shape)

matrix + array

(3, 2)
(3, 1)


array([[1., 1.],
       [2., 2.],
       [3., 3.]])

**EXERCISE**

Use NumPy to implement the following metrics:

1. Mean Square Error (MSE)

![mse](img/mse.png)

In [43]:
def mse(target, prediction):
    return ((target - prediction)**2).sum() / target.shape[0]

target = np.random.randint(low=0, high=2, size=(3,))
prediction = np.random.random(size=target.shape)
print(f"Target:\n{target}")
print(f"Prediction:\n{prediction}")
print(f"MSE: {mse(target, prediction)}")

Target:
[0 1 0]
Prediction:
[0.64052957 0.28886529 0.15752402]
MSE: 0.31360150752545196


2. Root Mean Square Error (RMSE)

![rmse](img/rmse.png)

In [22]:
def rmse(target, prediction):
    # YOUR CODE
    return np.sqrt(((target - prediction)**2).mean())

target = np.random.randint(low=0, high=2, size=(3,))
prediction = np.random.random(size=target.shape)
print(f"Target:\n{target}")
print(f"Prediction:\n{prediction}")
print(f"RMSE: {rmse(target, prediction)}")

Target:
[1 1 0]
Prediction:
[0.49566378 0.75199102 0.6079301 ]
RMSE: 0.47799668448589133


3. Mean Absolute Error (MAE)

![mae](img/mae.png)

In [44]:
def mae(target, prediction):
    # YOUR CODE
    return (np.abs(target - prediction)).mean()

target = np.random.randint(low=0, high=2, size=(3,))
prediction = np.random.random(size=target.shape)
print(f"Target:\n{target}")
print(f"Prediction:\n{prediction}")
print(f"MAE: {mae(target, prediction)}")

Target:
[1 1 1]
Prediction:
[0.92080389 0.82102796 0.49565699]
MAE: 0.2541703842865706


4. Update the `mse()` function so that it can return the MSE along a given axis.

In [46]:
def mse(target, prediction, axis=None):
    return ((target - prediction)**2).mean(axis=axis)

target = np.random.randint(low=0, high=2, size=(3,5))
prediction = np.random.random(size=target.shape)
print(f"Target:\n{target}")
print(f"Prediction:\n{prediction}")

mse(target, prediction, axis=1)

Target:
[[1 1 0 0 0]
 [0 0 1 0 1]
 [1 1 0 1 1]]
Prediction:
[[0.72619128 0.38969538 0.95980508 0.61971765 0.3257616 ]
 [0.99119271 0.00290491 0.67212439 0.81054461 0.45272384]
 [0.36864076 0.95301339 0.6847201  0.43645813 0.5585598 ]]


array([0.37176786, 0.40929352, 0.27642255])
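
Because the cells above use random inputs, their outputs are hard to verify by hand. The sketch below repeats the axis-aware `mse()` on small fixed arrays (values chosen only for illustration) so that each result can be checked with pencil and paper.

```python
import numpy as np

def mse(target, prediction, axis=None):
    return ((target - prediction) ** 2).mean(axis=axis)

target = np.array([[0.0, 1.0],
                   [1.0, 1.0]])
prediction = np.array([[0.5, 0.5],
                       [1.0, 0.0]])
# Squared errors: [[0.25, 0.25],
#                  [0.00, 1.00]]

overall = mse(target, prediction)              # mean of all four values: 0.375
per_column = mse(target, prediction, axis=0)   # [0.125, 0.625]
per_row = mse(target, prediction, axis=1)      # [0.25, 0.5]

# RMSE is the square root of MSE, so it can reuse the same function.
rmse_per_row = np.sqrt(per_row)

print(overall, per_column, per_row, rmse_per_row)
```

With `axis=0` the mean collapses the rows and leaves one value per column; with `axis=1` it collapses the columns and leaves one value per row.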

## 3. Boolean Masking

### 3.1. Boolean Array

NumPy also implements comparison operators such as `<` (less than) and `>` (greater than) as element-wise ufuncs.

In [47]:
my_array = np.array((1,2,3,4,5,6))
print(my_array < 3)
print(np.less(my_array, 3))

[ True  True False False False False]
[ True  True False False False False]


| Operator | Equivalent ufunc |
| -------- | -----------------|
| ==       | np.equal         |
| <        | np.less          |
| >        | np.greater       |
| !=       | np.not_equal     |
| <=       | np.less_equal    |
| >=       | np.greater_equal |
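
The operators in the table also work between two arrays of the same shape, not only between an array and a scalar. A small sketch with made-up values:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 1, 4])

equal = (a == b)
less_equal = (a <= b)
print(equal)        # [False  True False  True]
print(less_equal)   # [ True  True False  True]

# The operator form and the ufunc form give the same Boolean array.
same = np.array_equal(a != b, np.not_equal(a, b))
print(same)         # True
```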


#### Practical uses of Boolean Array

- Count elements based on a condition

In [48]:
# 1D array
my_array = np.arange(10)
print(f"my_array: {my_array}")

less_than_4 = (my_array < 4)
print(f"Boolean array of elements less than 4: {less_than_4}")
# Summing a Boolean array counts its True values (False -> 0, True -> 1)
print(f"How many elements less than 4? {less_than_4.sum()}")

my_array: [0 1 2 3 4 5 6 7 8 9]
Boolean array of elements less than 4: [ True  True  True  True False False False False False False]
How many elements less than 4? 4


In [49]:
# 2D array
my_array = np.arange(12).reshape((3,4))
print(f"my_array:\n{my_array}")

bool_even = (my_array % 2 == 0)
print(f"Boolean array of even elements:\n{bool_even}")

n_even = bool_even.sum(axis=1)
print(f"How many even elements in each row: {n_even}")

my_array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Boolean array of even elements:
[[ True False  True False]
 [ True False  True False]
 [ True False  True False]]
How many even elements in each row: [2 2 2]


- Check whether all or any of the elements in an array satisfy a condition with `np.all()` and `np.any()`

In [51]:
my_array = np.arange(10)
print(f"my_array:\n{my_array}")

print("Are all elements non-negative?", np.all(my_array >= 0))
print("Is any element larger than 8?", np.any(my_array > 8))

my_array:
[0 1 2 3 4 5 6 7 8 9]
Are all elements non-negative? True
Is any element larger than 8? True
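
Like `sum()`, both `np.all()` and `np.any()` accept an `axis` argument, so the check can be made per row or per column. A minimal sketch on a hand-written 2D array:

```python
import numpy as np

grid = np.array([[1,  2, 3],
                 [4, -5, 6]])

# Is every element in each row positive?
rows_all_positive = np.all(grid > 0, axis=1)
print(rows_all_positive)      # [ True False]

# Does each column contain at least one negative element?
cols_any_negative = np.any(grid < 0, axis=0)
print(cols_any_negative)      # [False  True False]

# The method form without an axis checks the whole array.
everything_positive = bool((grid > 0).all())
print(everything_positive)    # False
```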


### 3.2. Boolean operators

You can use bitwise logic operators to combine conditions.

| Operator | Equivalent ufunc |
| -------- | -----------------|
| &        | np.bitwise_and   |
| \|        | np.bitwise_or    |
| ^        | np.bitwise_xor   |
| ~        | np.bitwise_not   |

In [52]:
my_array = np.arange(20)
(my_array > 15) & (my_array %3 == 0)

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
        True, False])

In [53]:
my_array = np.arange(20)
(my_array %2 == 0) | (my_array %3 == 0)

array([ True, False,  True,  True,  True, False,  True, False,  True,
        True,  True, False,  True, False,  True,  True,  True, False,
        True, False])

#### Using keywords (`and`, `or`, `not`) vs operators (`&`, `|`, `~`)

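**Keywords** (`and`, `or`, `not`) evaluate the truth of an entire object as a single value, which is ambiguous for an array with more than one element and raises a `ValueError`. **Operators** (`&`, `|`, `~`) act on the individual bits of each element, so on Boolean arrays they combine conditions element-wise.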

In [56]:
my_array = np.arange(10)
(my_array > 4) & (my_array < 8)
#(my_array > 4) and (my_array < 8) # ValueError

array([False, False, False, False, False,  True,  True,  True, False,
       False])
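
The difference between the keyword and the operator can be made visible by catching the error instead of commenting the line out. This is a small sketch; the variable names `keyword_worked` and `combined` are introduced here for illustration only.

```python
import numpy as np

my_array = np.arange(10)

# `and` asks for the truth value of a whole array, which is ambiguous.
try:
    (my_array > 4) and (my_array < 8)
    keyword_worked = True
except ValueError as err:
    keyword_worked = False
    print("ValueError:", err)

# `&` combines the two Boolean arrays element by element.
combined = (my_array > 4) & (my_array < 8)
print(my_array[combined])     # [5 6 7]

# np.logical_and gives the same result for Boolean inputs.
same = np.array_equal(combined, np.logical_and(my_array > 4, my_array < 8))
print(same)                   # True
```

The parentheses around each comparison matter: `&` binds more tightly than `>` and `<`, so `my_array > 4 & my_array < 8` would be parsed differently.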

### 3.3. Boolean Masking

Boolean masking uses a Boolean (True/False) array as an index to select the elements of an array that satisfy a condition. The selected elements can be extracted into a new array, or modified in place.

In [58]:
my_array = np.arange(10)
#mask = my_array > 5
#filtered = my_array[mask]
filtered = my_array[my_array > 5]

print(f"my_array\n{my_array}")
# print(f"mask\n{mask}")
print(f"filtered\n{filtered}")

my_array
[0 1 2 3 4 5 6 7 8 9]
filtered
[6 7 8 9]


In [None]:
# Inplace modification
img = (np.random.random(size=(8, 8)) * 255).astype(np.uint8)
print(f"original img\n{img}")

img[img < 100] = 255
print(f"new img\n{img}")
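
When the original array should be kept, `np.where` offers a possible non-in-place alternative to mask assignment. The sketch below uses a tiny fixed "image" rather than random data so the result is reproducible; the values are arbitrary.

```python
import numpy as np

img = np.array([[ 10, 200],
                [150,  90]], dtype=np.uint8)

# Where the condition holds take 255, elsewhere keep the original pixel.
brightened = np.where(img < 100, 255, img)

print(brightened)   # pixels below 100 replaced by 255
print(img)          # the original array is unchanged
```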

## 4. Fancy Indexing

- With 1D arrays

In [59]:
my_array = np.arange(1, 12)
my_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [60]:
i = np.array([0,2,3,5])
my_array[i]

array([1, 3, 4, 6])

In [61]:
# The shape of the output is the shape of the indices, not the original array
i = np.array([0,2,3,5]).reshape((2,2))
my_array[i]

array([[1, 3],
       [4, 6]])

- With 2D arrays

In [62]:
my_array = np.arange(1,13).reshape(3,4)
my_array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [63]:
row = np.array([0,2])
col = np.array([1,2])
my_array[row, col]

array([ 2, 11])

In [64]:
row = np.array([0,2])
col = np.array([1,2])
my_array[row[:, np.newaxis], col]

array([[ 2,  3],
       [10, 11]])
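
Two further points about fancy indexing are sketched below, reusing the same `my_array`, `row`, and `col` values as above: `np.ix_` builds the same broadcastable index pair as `row[:, np.newaxis], col`, and the result of fancy indexing is a copy, whereas assignment through a fancy index modifies the original array.

```python
import numpy as np

my_array = np.arange(1, 13).reshape(3, 4)
row = np.array([0, 2])
col = np.array([1, 2])

# np.ix_ selects the full rows-by-columns block.
block = my_array[np.ix_(row, col)]
same_block = np.array_equal(block, my_array[row[:, np.newaxis], col])
print(block)        # [[ 2  3]
                    #  [10 11]]

# The result is a copy: changing it leaves my_array untouched.
block[0, 0] = -1
print(my_array[0, 1])   # 2

# Assigning through a fancy index writes into my_array itself,
# here at positions (0, 1) and (2, 2).
my_array[row, col] = 0
print(my_array)
```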

**EXERCISE**

1. Create a binary image as a NumPy array with the values shown in `img/smile1.png`

In [71]:
# YOUR CODE
arr = np.zeros(shape=(6,6))
arr[1,1] = 1
arr

array([[0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

2. Reverse the colors of the image (black -> white, white -> black).
    - Create a mask from a copy of the image
    - Using the mask, set the 0 values of the image to 1 and the 1 values to 0.

In [72]:
# YOUR CODE
mask = arr.copy()
arr[mask == 0] = 1
arr[mask == 1] = 0

arr

array([[1., 1., 1., 1., 1., 1.],
       [1., 0., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])
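
For a strictly binary image, the same inversion can also be written without a mask. The sketch below reuses the single-pixel placeholder array from the cell above (not the actual pattern in `img/smile1.png`, which is not reproduced here) and compares two possible one-line alternatives.

```python
import numpy as np

arr = np.zeros((6, 6))
arr[1, 1] = 1

# Arithmetic inversion: 0 -> 1 and 1 -> 0.
inverted = 1 - arr

# Logical inversion, cast back to the original dtype.
also_inverted = np.logical_not(arr).astype(arr.dtype)

print(np.array_equal(inverted, also_inverted))   # True
print(inverted[1, 1], inverted[0, 0])            # 0.0 1.0
```

Both forms assume the array contains only 0 and 1; the mask-based solution makes that assumption explicit.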

# REFERENCES

1. [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html)