# PCA Basic Example - MA2003B Multivariate Statistics Course

This notebook demonstrates the fundamental concepts of Principal Component Analysis (PCA) using a simple 3×2 dataset. PCA is a dimensionality reduction technique that identifies the principal components (directions of maximum variance) in the data.

## Learning Objectives:
- Understand how PCA transforms correlated variables into uncorrelated principal components
- Interpret eigenvalues and explained variance ratios
- See how PCA rotates the coordinate system to align with data variance

**Data**: Simple 3 observations × 2 variables matrix to illustrate core concepts

**Expected Output**:
- PC1 captures most variance (horizontal spread)
- PC2 captures remaining variance (vertical spread)
- Transformed data shows uncorrelated coordinates

In [1]:
# Import Required Libraries
import numpy as np
from sklearn.decomposition import PCA

In [2]:
# Create Sample Data
# A small bivariate dataset (columns x1 and x2); in this sample the two
# variables have zero covariance, so the points do not form a diagonal trend
X = np.array(
    [
        [5, 3],  # Highest x1, high x2
        [3, 1],  # Middle x1, lowest x2
        [1, 3],  # Lowest x1, high x2
    ]
)

In [3]:
# Display Original Data
print("Original Data Matrix:")
print("Observations (rows) × Variables (columns)")
print(X)

Original Data Matrix:
Observations (rows) × Variables (columns)
[[5 3]
 [3 1]
 [1 3]]
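
Before fitting PCA, it can help to look at the sample covariance matrix, since PCA is an eigen-decomposition of exactly that matrix. The following is a minimal sketch using `np.cov`; the names `mean` and `S` are illustrative and not part of the original notebook.

```python
import numpy as np

X = np.array([[5, 3], [3, 1], [1, 3]])

# Column means (one per variable)
mean = X.mean(axis=0)

# Sample covariance matrix: rowvar=False treats columns as variables,
# and the default normalization divides by n - 1
S = np.cov(X, rowvar=False)

print("Column means:", mean)
print("Sample covariance matrix:")
print(S)
```

For this dataset the off-diagonal entries of `S` are zero, which means the two variables are already uncorrelated in the sample, and the diagonal entries are the variances of each variable.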


In [4]:
# Initialize PCA
# With n_components left unset, PCA keeps min(n_samples, n_features) components (2 here)
pca = PCA()

In [5]:
# Fit and Transform Data
# fit_transform() centers the data and rotates it to align with principal components
X_transformed = pca.fit_transform(X)

In [6]:
# Extract PCA Results
# Eigenvalues of the sample covariance matrix: the variance along each PC
eigenvalues = pca.explained_variance_
# components_ stores one PC per row; transpose so each column is a PC direction
eigenvectors = pca.components_.T
# Each eigenvalue divided by the total variance
variance_ratio = pca.explained_variance_ratio_

In [7]:
# Display Eigenvalues and Variance Ratios
print("PCA Results:")
print("=" * 50)
print(f"Eigenvalues: {eigenvalues}")
print(f"Explained variance ratio (PC1): {variance_ratio[0]:.3f}")
print(f"Explained variance ratio (PC2): {variance_ratio[1]:.3f}")

PCA Results:
==================================================
Eigenvalues: [4.         1.33333333]
Explained variance ratio (PC1): 0.750
Explained variance ratio (PC2): 0.250
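
The eigenvalues reported by scikit-learn can be cross-checked by decomposing the sample covariance matrix directly. This is a sketch with NumPy only, assuming the same `X` as above; the variable names (`eigvals`, `eigvecs`, `ratio`) are illustrative.

```python
import numpy as np

X = np.array([[5, 3], [3, 1], [1, 3]])
S = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(S)

# Reorder to descending so the first entry corresponds to PC1
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Explained variance ratio: each eigenvalue over the total variance
ratio = eigvals / eigvals.sum()

print("Eigenvalues:", eigvals)
print("Explained variance ratio:", ratio)
```

The eigenvalues and ratios should agree with `pca.explained_variance_` and `pca.explained_variance_ratio_`. Eigenvectors are only defined up to sign, so the columns of `eigvecs` may differ in sign from the scikit-learn result.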


In [8]:
# Display Principal Component Directions
print("\nPrincipal Component Directions (Eigenvectors):")
print("These show how original variables combine to form PCs")
print(eigenvectors)


Principal Component Directions (Eigenvectors):
These show how original variables combine to form PCs
[[1. 0.]
 [0. 1.]]


In [9]:
# Display Transformed Data
print("\nTransformed Data (Principal Component Scores):")
print("Original data in the new coordinate system")
print("Columns are uncorrelated with zero mean")
print(X_transformed)


Transformed Data (Principal Component Scores):
Original data in the new coordinate system
Columns are uncorrelated with zero mean
[[ 2.          0.66666667]
 [-0.         -1.33333333]
 [-2.          0.66666667]]
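
The scores can also be reproduced by hand, which makes the two steps inside `fit_transform()` explicit: subtract the column means, then project onto the component directions. The sketch below assumes the default `PCA()` settings (no whitening); `manual` is an illustrative name.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[5, 3], [3, 1], [1, 3]])

pca = PCA()
scores = pca.fit_transform(X)

# Manual projection: center with the fitted means, then project onto
# the component directions (rows of components_)
manual = (X - pca.mean_) @ pca.components_.T

print("Matches fit_transform:", np.allclose(scores, manual))
print("Column means of scores:", scores.mean(axis=0))
print("Covariance of scores:")
print(np.cov(scores, rowvar=False))
```

The covariance matrix of the scores is diagonal, with the eigenvalues on the diagonal, which is what "uncorrelated with zero mean" means in practice.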


## Interpretation

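- **PC1** lies along the first variable's axis (direction `[1, 0]`) and captures 75% of the total variance (eigenvalue 4)
- **PC2** lies along the second variable's axis (direction `[0, 1]`) and captures the remaining 25% (eigenvalue 1.33)
- The principal component scores are **uncorrelated**. In this dataset the original variables already have zero sample covariance, so the eigenvector matrix is the identity and the transformation only centers the data
- In general, PCA rotates the coordinate system to align with the directions of maximum variance; when the original variables are correlated, each principal component is a linear combination of them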
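
To see a rotation that is not axis-aligned, the same steps can be applied to data whose variables are correlated. The sketch below uses a hypothetical dataset `X_corr` that is not part of the original notebook, and performs PCA by hand with NumPy (center, eigen-decompose the covariance matrix, project).

```python
import numpy as np

# Hypothetical correlated data, chosen only for illustration
X_corr = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 2.5], [4.0, 4.0]])

# Step 1: center each variable
centered = X_corr - X_corr.mean(axis=0)

# Step 2: eigen-decompose the sample covariance matrix
S_before = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S_before)
order = np.argsort(eigvals)[::-1]  # sort PCs by descending variance
eigvecs = eigvecs[:, order]

# Step 3: project the centered data onto the eigenvectors
scores = centered @ eigvecs
S_after = np.cov(scores, rowvar=False)

print("Covariance before PCA:")
print(S_before)
print("PC directions (columns):")
print(eigvecs)
print("Covariance after PCA:")
print(S_after)
```

Here the covariance before PCA has a nonzero off-diagonal entry, the PC directions mix both variables, and the covariance of the scores is diagonal up to floating-point error.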