<a href="https://colab.research.google.com/github/SzymonNowakowski/Machine-Learning-2024/blob/master/Lab01_PCA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab1 - PCA
### Author: Szymon Nowakowski

In this lab, we will perform *exploratory data analysis* (EDA) with a focus on understanding complex data through Principal Component Analysis (PCA).

# Motivation

Let $X$ be an $n \times k$ **data** matrix whose $n$ rows are data points and whose $k$ columns are variables. Let's further assume it is **centered**, i.e., each column has zero mean:

$$
X =
\begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1k} \\
x_{21} & x_{22} & \dots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nk}
\end{bmatrix}
$$

## Inner Product as Projection Length

When we take the inner product of a vector $x$ with a normalized (unit-length) vector $w$, we are effectively measuring the "shadow", or projection, of $x$ onto $w$. The inner product is a scalar value. This scalar is the signed length of $x$ along the direction of $w$, that is, how far $x$ extends in the $w$ direction. So the inner product $x \cdot w$ gives the projection length.

![inner product is a projection length](https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/inner_product_as_projection_length.png?raw=1)
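As a minimal numerical sketch of this idea, the snippet below projects a single vector onto a unit direction. The specific numbers and the names `direction`, `projection_vector`, and `residual` are illustrative assumptions, not part of the lab data.

```python
import numpy as np

# Illustrative values only (assumed for this sketch)
x = np.array([3.0, 4.0])            # the vector we want to project
direction = np.array([1.0, 1.0])    # an arbitrary, not yet normalized direction
w = direction / np.linalg.norm(direction)  # normalize so that ||w|| = 1

# The inner product with a unit vector is the signed projection length
projection_length = x @ w

# The "shadow" of x on the line spanned by w, and what is left over
projection_vector = projection_length * w
residual = x - projection_vector    # orthogonal to w

print("projection length:", projection_length)
print("residual . w:", residual @ w)
```

The residual is orthogonal to $w$, which is what makes $x \cdot w$ a projection length rather than just an arbitrary scalar.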

The product $X w$ generalizes this concept, projecting all rows (all data points) of $X$ onto the line defined by $w$.
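The sketch below shows that $X w$ is simply the vector of row-by-row inner products. The random data, the seed, and the names `X_raw`, `scores`, and `row_by_row` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, assumed for reproducibility
X_raw = rng.normal(size=(5, 3))         # toy data: n = 5 points, k = 3 variables
X = X_raw - X_raw.mean(axis=0)          # center each column

w = np.array([1.0, 2.0, 2.0]) / 3.0     # unit vector, since sqrt(1 + 4 + 4) = 3

# One matrix-vector product projects every data point at once
scores = X @ w                          # shape (n,)

# The same thing, computed one row (one data point) at a time
row_by_row = np.array([row @ w for row in X])

print(scores.shape)
print(np.allclose(scores, row_by_row))
```

Because $X$ is centered, the projected values also have zero mean, which is convenient when their variance is computed later.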





To calculate the sample covariance matrix $C$ from this centered matrix $X$, we use the formula:

$$
C = \frac{1}{n - 1} X^T X
$$
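A short check of this formula against NumPy's built-in estimator might look as follows. The synthetic data and the names `X_raw` and `C_numpy` are assumptions for this sketch; `np.cov` uses the $n - 1$ denominator by default.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, assumed for reproducibility
X_raw = rng.normal(size=(100, 4))        # toy data: n = 100, k = 4
X = X_raw - X_raw.mean(axis=0)           # center each column

n = X.shape[0]
C = X.T @ X / (n - 1)                    # covariance from the centered matrix

# np.cov centers the data itself; rowvar=False means columns are variables
C_numpy = np.cov(X_raw, rowvar=False)

print(C.shape)
print(np.allclose(C, C_numpy))
```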

$C$ is the $k \times k$ sample covariance matrix. Since $C$ is symmetric, it can be decomposed as $C = V \Lambda V^T$, where
- $V$ is an orthogonal $k \times k$ matrix whose columns are orthonormal eigenvectors of $C$, and
- $\Lambda$ is a diagonal $k \times k$ matrix containing the corresponding eigenvalues of $C$.

Because the columns of $V$ are orthonormal, we have $V^T V = I_k$, where $I_k$ is the $k \times k$ identity matrix.
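The decomposition can be checked numerically. The sketch below uses `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order; the toy data and the names `eigenvalues`, `Lambda`, and `reconstructed` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, assumed for reproducibility
X_raw = rng.normal(size=(100, 4))        # toy data: n = 100, k = 4
X = X_raw - X_raw.mean(axis=0)           # center each column
C = X.T @ X / (X.shape[0] - 1)           # sample covariance matrix

# eigh returns eigenvalues (ascending) and eigenvectors as columns of V
eigenvalues, V = np.linalg.eigh(C)
Lambda = np.diag(eigenvalues)

reconstructed = V @ Lambda @ V.T         # should reproduce C

print(np.allclose(reconstructed, C))     # C = V Lambda V^T
print(np.allclose(V.T @ V, np.eye(4)))   # V^T V = I_k
```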

Our goal is to find a vector $w$ that maximizes the variance of the projection $X w$:

$$
\text{Var}(X w) \longrightarrow \max_{\|w\| = 1}
$$

We restrict $w$ to normalized vectors because otherwise the variance could be made arbitrarily large simply by scaling $w$.
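To make the objective concrete, the sketch below evaluates the variance of a projection for one random unit vector. For centered $X$ this variance equals the quadratic form $w^T C w$, and doubling $w$ quadruples it, which is why the norm constraint is needed. The synthetic data, the column scales, and the helper `projection_variance` are hypothetical, introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                       # fixed seed (assumed)
# Toy data with different spreads per column (assumed scales)
X_raw = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.5])
X = X_raw - X_raw.mean(axis=0)                       # center each column
C = X.T @ X / (X.shape[0] - 1)                       # sample covariance matrix


def projection_variance(X, w):
    """Sample variance (n - 1 denominator) of the projected data X w."""
    return np.var(X @ w, ddof=1)


w = rng.normal(size=3)
w /= np.linalg.norm(w)                               # enforce ||w|| = 1

var_unit = projection_variance(X, w)
var_quadratic_form = w @ C @ w                       # same value via C
var_scaled = projection_variance(X, 2.0 * w)         # unnormalized direction

print(np.isclose(var_unit, var_quadratic_form))
print(np.isclose(var_scaled, 4.0 * var_unit))
```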

In [None]:

# Importing necessary libraries for data manipulation, visualization, and PCA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Setting visualization style
sns.set(style="whitegrid")
