# Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a **dimensionality reduction technique** used in data analysis and machine learning to simplify datasets while retaining as much variability (information) as possible. It is particularly useful when dealing with high-dimensional data, where visualisation and computation can become challenging.

---

## Key Concepts of PCA

1. **Dimensionality Reduction**:
   PCA transforms a dataset with $n$ features (dimensions) into a smaller number $k$ of dimensions, where $k < n$. These new dimensions are called **principal components**.

2. **Variance Maximisation**:
   PCA finds the directions (principal components) along which the data varies most. The first principal component (PC1) captures the most variance, the second (PC2) captures the most of the remaining variance subject to being orthogonal to PC1, and so on.

3. **Uncorrelated Features**:
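   The principal component directions are orthogonal to one another, and the data projected onto them (the component scores) are **uncorrelated**. This avoids redundancy: no component repeats linear information already captured by another.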

4. **Linear Transformation**:
   PCA is a linear transformation method—it projects data into a new coordinate system defined by the principal components.
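The following derivation sketch shows why variance maximisation leads to the eigenvectors used in the steps below, and why the resulting components are uncorrelated. Here $X$ denotes the centred data matrix with one row per sample, $C$ its covariance matrix, and $w$ a candidate direction of unit length. The variance of the projected data $Xw$ is $w^\top C w$, so PC1 solves

$$
\max_{w} \; w^\top C w \quad \text{subject to} \quad w^\top w = 1 .
$$

Setting the gradient of the Lagrangian to zero gives

$$
\mathcal{L}(w, \lambda) = w^\top C w - \lambda \,(w^\top w - 1), \qquad
\nabla_w \mathcal{L} = 2Cw - 2\lambda w = 0 \;\Longrightarrow\; Cw = \lambda w .
$$

So every stationary direction is an eigenvector of $C$, and the variance along it is $w^\top C w = \lambda\, w^\top w = \lambda$: the eigenvector with the largest eigenvalue is PC1. Adding the constraint that $w$ be orthogonal to the components already found yields the remaining eigenvectors in order of decreasing eigenvalue. Because $C$ is symmetric, its eigenvectors can be chosen orthogonal, so for $i \neq j$ the covariance between the scores $Xw_i$ and $Xw_j$ is

$$
w_i^\top C\, w_j = \lambda_j \, w_i^\top w_j = 0 ,
$$

which is why the principal components are uncorrelated.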

---

## Steps of PCA

1. **Standardise the Data**:
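   PCA is sensitive to the scale of the features: a feature measured in larger units has a larger variance and will dominate the leading components. Standardise the data so that each feature has a mean of 0 and a standard deviation of 1. Centring to mean 0 is always required; scaling to unit standard deviation is the usual choice when features are measured in different units.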

2. **Compute the Covariance Matrix**:
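   The covariance matrix captures the relationships between the features: entry $(i, j)$ measures how much features $i$ and $j$ vary together, and the diagonal holds each feature's variance. For a standardised data matrix $X$ with $m$ samples (rows) and $n$ features (columns), it is the $n \times n$ matrix

   $$
   C = \frac{1}{m - 1} X^\top X
   $$

   Some texts use $\frac{1}{m}$ instead; the constant factor rescales the eigenvalues but leaves the eigenvectors unchanged.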

3. **Compute the Eigenvalues and Eigenvectors**:
   The eigenvalues and eigenvectors of the covariance matrix determine the principal components. 
   - **Eigenvectors** represent the directions of the principal components.
   - **Eigenvalues** represent the amount of variance explained by each principal component.

4. **Sort and Select Principal Components**:
   Sort the eigenvalues in descending order and keep the eigenvectors corresponding to the $k$ largest eigenvalues.

5. **Project the Data**:
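   Transform the standardised data into the new space using the selected eigenvectors. This gives the reduced data:
   $$
   Z = X W
   $$
   where $X$ is the standardised data from step 1 (one row per sample), $W$ is the $n \times k$ matrix whose columns are the selected eigenvectors, and each row of $Z$ holds the $k$ principal component scores of one sample.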
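The five steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca` is hypothetical, it assumes no feature is constant (standardisation would otherwise divide by zero), and it uses the $\frac{1}{m-1}$ covariance normalisation for $m$ samples.

```python
import numpy as np


def pca(X, k):
    """Hypothetical minimal PCA following the five steps.

    X: array of shape (m, n), m samples and n features (no constant features).
    k: number of principal components to keep (k <= n).
    Returns (Z, W, eigenvalues): Z has shape (m, k), W has shape (n, k),
    and eigenvalues holds all n eigenvalues in descending order.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]

    # 1. Standardise: zero mean and unit sample standard deviation per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # 2. Covariance matrix of the standardised data, shape (n, n).
    C = X_std.T @ X_std / (m - 1)

    # 3. Eigen-decomposition; eigh is meant for symmetric matrices and
    #    returns real eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(C)

    # 4. Re-sort in descending order; the top k eigenvectors become columns of W.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    W = eigenvectors[:, order[:k]]

    # 5. Project the standardised data: Z = X W.
    Z = X_std @ W
    return Z, W, eigenvalues


# Usage on random (hypothetical) data: 200 samples, 5 features, keep 2 components.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
Z_demo, W_demo, ev_demo = pca(X_demo, k=2)
print(Z_demo.shape)  # (200, 2)
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because the covariance matrix is symmetric. Libraries such as scikit-learn compute PCA from a singular value decomposition of the centred data instead, which avoids forming the covariance matrix explicitly; note that scikit-learn's `PCA` centres but does not scale the features.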

---

## Example
Suppose you have a dataset with two features, $X_1$ and $X_2$:
- PCA will find the directions (principal components) that best describe the spread of the data.
- If the data forms an elongated cluster along a diagonal, the first principal component points along that diagonal.
- The second principal component will be orthogonal to the first, capturing less variance.
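The two-feature case can be checked numerically. The sketch below generates hypothetical, positively correlated data forming an elongated cluster, standardises it, and finds the principal directions with NumPy.

```python
import numpy as np

# Hypothetical data: x2 is x1 plus a little noise, giving an elongated diagonal cluster.
rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = x1 + 0.3 * rng.normal(size=500)
X = np.column_stack([x1, x2])

# Standardise (sample std, ddof=1), so the covariance matrix equals the correlation matrix.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(X_std, rowvar=False)

# Eigen-decomposition, re-sorted so the largest eigenvalue comes first.
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

pc1, pc2 = eigenvectors[:, 0], eigenvectors[:, 1]
print("PC1 direction:", pc1)
print("PC2 direction:", pc2)
print("Variance along PC1 and PC2:", eigenvalues)
```

For two standardised features with correlation $r > 0$, the covariance matrix is $\begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$, whose eigenvectors are $\pm(1, 1)/\sqrt{2}$ and $\pm(1, -1)/\sqrt{2}$ with eigenvalues $1 + r$ and $1 - r$. PC1 is therefore the diagonal whatever the value of $r$; the sign of each eigenvector is arbitrary.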

---

## Applications of PCA

- **Data Visualisation**: Reducing high-dimensional data to 2D or 3D for plotting.
- **Noise Reduction**: Removing less significant components can denoise the data.
- **Feature Reduction**: Simplify models by working with fewer features.
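- **Compression**: Store $k$ component scores per sample instead of $n$ original features, often with little loss of information when the discarded components carry little variance.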
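Noise reduction and compression both rely on mapping the $k$ scores back to the original feature space. A minimal sketch, assuming the reconstruction $\hat{X} = Z W^\top$ in standardised units followed by undoing the standardisation; the function name `reconstruct` and the synthetic data are hypothetical.

```python
import numpy as np


def reconstruct(X, k):
    """Hypothetical helper: project X onto its top-k principal components
    and map back to the original feature space (original units)."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    X_std = (X - mean) / std

    # Principal directions from the covariance eigen-decomposition.
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
    W = eigenvectors[:, np.argsort(eigenvalues)[::-1][:k]]

    Z = X_std @ W               # compressed representation, shape (m, k)
    X_std_hat = Z @ W.T         # back to n standardised features
    return X_std_hat * std + mean  # undo the standardisation


# Hypothetical data: rank-2 structure in 6 features plus a little noise.
rng = np.random.default_rng(1)
signal = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6))
X = signal + 0.05 * rng.normal(size=(300, 6))
for k in range(1, 7):
    err = np.mean((X - reconstruct(X, k)) ** 2)
    print(f"k={k}: mean squared reconstruction error {err:.5f}")
```

The error shrinks as $k$ grows and vanishes at $k = n$, where $W$ is a full orthogonal matrix. Keeping only the leading components stores an $m \times k$ score matrix plus $W$, the means and the standard deviations, instead of the full $m \times n$ data.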

---

## Limitations

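1. **Linear Assumptions**: PCA can only capture linear structure. When features are related nonlinearly (for example, data lying along a curve), it may need many components to describe structure that is essentially low-dimensional.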
2. **Interpretability**: The principal components are combinations of original features, making them harder to interpret.
3. **Loss of Information**: Discarding components discards the variance they carry (the sum of their eigenvalues); the loss is zero only when all discarded eigenvalues are zero.
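The information lost can be quantified with the explained variance ratio, each eigenvalue divided by the sum of all eigenvalues. The sketch below is illustrative only: the helper names are hypothetical, and the 95% threshold is a common rule of thumb rather than a fixed standard.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component,
    in descending order, computed from standardised data."""
    X = np.asarray(X, dtype=float)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(np.cov(X_std, rowvar=False))[::-1]
    return eigenvalues / eigenvalues.sum()


def components_for(X, threshold=0.95):
    """Smallest k whose leading components retain at least `threshold`
    of the total variance (0.95 is an illustrative default)."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))  # guard against rounding when threshold is 1.0


# Hypothetical data: 8 features driven by 3 underlying factors plus noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3)) @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(400, 8))
ratios = explained_variance_ratio(X)
k = components_for(X)
print("explained variance ratios:", np.round(ratios, 3))
print(f"k={k} keeps {np.cumsum(ratios)[k - 1]:.1%} of the variance")
```

The fraction lost is one minus the cumulative ratio at the chosen $k$.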

---

In summary, PCA is a powerful tool for simplifying datasets and uncovering the key patterns in high-dimensional data.

