***

**PCA - Test Notebook**

This notebook serves as a template and test bed for implementing Principal Component Analysis (PCA).

***

- **Required Libraries:**

In [1]:
import cv2
import numpy as np

In [2]:
# Load an image for testing purposes.
test_img = cv2.imread("../data/fire_dataset/fire_images/fire.1.png")
test_img.shape

(460, 860, 3)

In [None]:
# Visualize the image.
cv2.imshow("Test Image", test_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

In [4]:
# Get the channels of the image.
b, g, r = cv2.split(test_img)

# Check the dimensions of the channels.
print("- Red Channel Dimensions:", r.shape)
print("- Green Channel Dimensions:", g.shape)
print("- Blue Channel Dimensions:", b.shape)

- Red Channel Dimensions: (460, 860)
- Green Channel Dimensions: (460, 860)
- Blue Channel Dimensions: (460, 860)


In [5]:
# Scale each channel so that every value lies in the range [0, 1].
# Pixel values are 8-bit, so the maximum possible value is 255.
r_norm = r / 255
g_norm = g / 255
b_norm = b / 255

In [6]:
# "Flatten" the channels.
r_norm = r_norm.reshape([-1])
g_norm = g_norm.reshape([-1])
b_norm = b_norm.reshape([-1])

# Stack the individual arrays to a single array.
flat_rgb = np.vstack([r_norm, g_norm, b_norm])
flat_rgb.shape

(3, 395600)

- To perform a Principal Component Analysis (PCA), we must first center the data.
- This is done by subtracting the mean $\mu$.

$$x' = x - \mu$$

- We consider each pixel as a sample, living in a 3-channel (3D) space.
- This means we have a $3 \times n$ array:
  - The rows are the dimensions.
  - The columns are the pixels.
- The mean is calculated per channel, i.e. along each row and over all pixels (`axis = 1`).
- We do not need to standardize because all channels are already on the same scale, $[0, 1]$.
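
- As a minimal sketch of the axis convention, the toy array below (a made-up stand-in for `flat_rgb`, with 3 channels and 4 pixels) is centered the same way. After centering, every row should average to zero.

```python
import numpy as np

# Toy stand-in for flat_rgb: 3 channels (rows) x 4 pixels (columns).
toy = np.array([[0.0, 0.2, 0.4, 0.6],
                [1.0, 1.0, 1.0, 1.0],
                [0.1, 0.3, 0.1, 0.3]])

# axis=1 averages over the pixels of each channel;
# keepdims=True keeps the shape (3, 1) so it broadcasts against (3, 4).
mu = np.mean(toy, axis=1, keepdims=True)
toy_centered = toy - mu

# Each row of the centered data should now have (numerically) zero mean.
row_means = toy_centered.mean(axis=1)
print(mu.shape, np.allclose(row_means, 0.0))
```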

In [7]:
# Center the data.
x_center = flat_rgb - np.mean(flat_rgb, axis = 1, keepdims = True)
x_center.shape

(3, 395600)

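- Compute the covariance matrix of the centered data $X$ (with $N$ the number of pixels) as follows:

$$C = \dfrac{1}{N} X \: X^T$$

- We get a $D \times D$ matrix (in this case, $3 \times 3$).
- Note that `np.cov` divides by $N - 1$ by default; pass `bias = True` to divide by $N$ as in the formula. With $N = 395600$ the difference is negligible.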
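
- The relation between the formula and `np.cov` can be checked on random data (a stand-in for `x_center`; the names below are illustrative only). This sketch assumes rows are variables, which is the `np.cov` default.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for the flattened image: 3 channels x 1000 pixels.
x = rng.random((3, 1000))
x_c = x - x.mean(axis=1, keepdims=True)
n = x_c.shape[1]

# The formula from the text: C = (1 / N) X X^T.
cov_manual = (x_c @ x_c.T) / n

# np.cov divides by N - 1 by default and by N when bias=True.
cov_biased = np.cov(x_c, bias=True)
cov_default = np.cov(x_c)

print(np.allclose(cov_manual, cov_biased))
print(np.allclose(cov_default, cov_manual * n / (n - 1)))
```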

In [8]:
cov = np.cov(x_center)
cov.shape

(3, 3)

- Now, we must perform the eigendecomposition of the covariance matrix.
- We get $3$ eigenvalues associated with the $3$ corresponding eigenvectors.

In [9]:
eigenvalues, eigenvectors = np.linalg.eig(cov)

- `np.linalg.eig` does not guarantee that the eigenvalues (and their eigenvectors) are returned in any particular order.
- We must sort them manually.
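
- As an alternative sketch (not what this notebook uses): the covariance matrix is symmetric, so `np.linalg.eigh` also applies. It returns real eigenvalues in ascending order, so reversing them gives the descending order. The random data below is only a stand-in for the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for the centered data: 3 channels x 500 pixels.
x = rng.random((3, 500))
cov_demo = np.cov(x)

# eigh is meant for symmetric matrices; eigenvalues come back ascending.
vals, vecs = np.linalg.eigh(cov_demo)

# Reverse both to get descending eigenvalues; eigenvectors are the columns.
vals_desc = vals[::-1]
vecs_desc = vecs[:, ::-1]

# Sanity check: C v = lambda v for every column.
residual = cov_demo @ vecs_desc - vecs_desc * vals_desc
print(np.allclose(residual, 0.0))
```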

In [10]:
# Sort the eigenvalues and eigenvectors by decreasing eigenvalue (variance explained).
# Negating the eigenvalues makes np.argsort return descending order.
idx = np.argsort(-eigenvalues)
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:,idx]

In [11]:
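# "Flatten" the eigenvector matrix for the model.
# The eigenvectors are the columns, so this row-major flattening interleaves their components.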
eigenvectors = eigenvectors.reshape([-1])
eigenvectors.shape

(9,)

***

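- The working hypothesis is that the eigenvectors act as a compact *summary* of the colors in the image.
- Each eigenvector is a direction in RGB space, that is, a combination of the three channels rather than a summary of a single channel. There are three of them because the data has three dimensions.
- The eigenvectors describe how the pixel colors vary: after sorting, the first one points along the direction of greatest variance.
- If fire images have a characteristic color distribution, the model should be able to differentiate between a *fire* and a *non-fire* image using the eigenvectors.
- In that case we would not need the pixels anymore. We could give the 9 flattened eigenvector components directly to the model.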
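
- The steps of this notebook can be collected into one helper, sketched below. The function name `pca_feature_vector` is hypothetical, and a random array stands in for an image loaded with `cv2.imread` (assumed to be in BGR channel order, as OpenCV returns it).

```python
import numpy as np

def pca_feature_vector(img_bgr):
    """Return the 9 flattened eigenvector components of an H x W x 3 uint8 image.

    Hypothetical helper that mirrors the cells above; not a library function.
    """
    # Reorder BGR -> RGB, scale to [0, 1], and reshape to (3, n_pixels).
    flat_rgb = (img_bgr[..., ::-1] / 255).reshape(-1, 3).T

    # Center each channel and compute the 3 x 3 covariance matrix.
    x_center = flat_rgb - flat_rgb.mean(axis=1, keepdims=True)
    cov = np.cov(x_center)

    # Eigendecomposition, sorted by decreasing eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eig(cov)
    idx = np.argsort(-eigenvalues)
    eigenvectors = eigenvectors[:, idx]

    # Row-major flattening, as in the notebook.
    return eigenvectors.reshape(-1)

# Random stand-in for a small image.
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(8, 10, 3), dtype=np.uint8)
features = pca_feature_vector(fake_img)
print(features.shape)
```

- One caveat of this design: an eigenvector is only defined up to its sign, so two similar images can yield feature vectors that differ by a sign flip in some components.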