# **Problem Statement**  
## **11. Implement PCA (Principal Component Analysis) using NumPy.**

Implement Principal Component Analysis (PCA) from scratch using NumPy to reduce the dimensionality of a dataset while preserving maximum variance.

### Constraints & Example Inputs/Outputs

### Constraints
- Use NumPy only (no sklearn PCA)
- Input data must be numeric
- Dataset shape: (n_samples, n_features)
- Number of components k ≤ n_features

### Example Input:
```python
X = [[2.5, 2.4],
     [0.5, 0.7],
     [2.2, 2.9],
     [1.9, 2.2]]
k = 1
```

### Expected Output:
- Reduced dataset with shape (n_samples, k)
- Principal components (eigenvectors) with shape (n_features, k)
- Explained variance (the top k eigenvalues)

### Solution Approach

### What is PCA?
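PCA is an unsupervised dimensionality reduction technique that:
- Finds orthogonal directions (principal components) in feature space
- Orders them by the variance of the data along each direction
- Projects the data onto the top k directions to get a lower-dimensional representation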
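
One common way to state these points formally, assuming a mean-centered data matrix $X_c$ of shape (n_samples, n_features); the symbols $C$, $w_1$, $W_k$ and $Z$ are introduced here only for illustration:

$$
C = \frac{1}{n-1} X_c^\top X_c, \qquad
w_1 = \arg\max_{\lVert w \rVert = 1} w^\top C \, w, \qquad
Z = X_c W_k
$$

Here $n$ is n_samples, $w_1$ is the first principal component (the eigenvector of $C$ with the largest eigenvalue $\lambda_1$, which equals the variance of the data along $w_1$), and $W_k$ holds the top k eigenvectors as columns.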

### Mathematical Steps
- Center the data by subtracting each feature's mean (and scale to unit variance when features use different units)
- Compute covariance matrix
- Compute eigenvalues & eigenvectors
- Sort by descending eigenvalues
- Select top k eigenvectors
- Project data onto new space
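
Whether to scale features before PCA is a design choice. The following is a minimal sketch of z-score standardization, assuming the sample standard deviation (`ddof=1`) and a hypothetical helper name `standardize`; constant features are left at zero rather than divided by zero.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Return z-scored data plus the mean and scale used (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    # Assumption: treat near-constant features as scale 1 to avoid division by zero
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std, mean, std

X_example = np.array([[2.5, 2.4, 7.0],
                      [0.5, 0.7, 7.0],
                      [2.2, 2.9, 7.0],
                      [1.9, 2.2, 7.0]])
Z, mean, std = standardize(X_example)
print(Z.mean(axis=0))          # per-feature means, close to zero
print(Z.std(axis=0, ddof=1))   # unit spread for the non-constant features
```

The standardized array can then be passed to either PCA function in place of the raw data.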

### Solution Code

In [1]:
# Approach 1: Brute-force PCA (covariance matrix + eigendecomposition)
import numpy as np

def pca_bruteforce(X, k):
    # Step 1: Mean centering
    mean = np.mean(X, axis=0)
    X_centered = X - mean

    # Step 2: Covariance matrix
    covariance_matrix = np.cov(X_centered.T)

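    # Step 3: Eigendecomposition
    # (the covariance matrix is symmetric, so np.linalg.eigh also applies
    # and guarantees real eigenvalues; eigenvector signs may then differ)
    eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)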

    # Step 4: Sort eigenvalues and eigenvectors
    sorted_indices = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[sorted_indices]
    eigenvectors = eigenvectors[:, sorted_indices]

    # Step 5: Select top k eigenvectors
    principal_components = eigenvectors[:, :k]

    # Step 6: Project data
    X_reduced = np.dot(X_centered, principal_components)

    return X_reduced, eigenvalues[:k], principal_components
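
The eigenvalues are the variances along each component, so they can be turned into explained-variance ratios and used to pick k. This is a minimal sketch, assuming hypothetical helpers `explained_variance_ratio` and `choose_k` and an arbitrary 95% threshold:

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each component, in descending order."""
    X = np.asarray(X, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]      # eigvalsh returns ascending order
    eigenvalues = np.clip(eigenvalues, 0.0, None)    # guard against tiny negative round-off
    return eigenvalues / eigenvalues.sum()

def choose_k(X, threshold=0.95):
    """Smallest k whose cumulative explained-variance ratio reaches the threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, cumulative.size)

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
print(explained_variance_ratio(X))
print(choose_k(X, threshold=0.95))
```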


### Alternative Solution

In [3]:
# Approach 2: Optimized PCA (SVD-based)
# SVD works on the centered data directly, without forming the covariance
# matrix, which is numerically more stable.

def pca_optimized(X, k):
    # Mean centering
    X_centered = X - np.mean(X, axis=0)

    # Singular Value Decomposition
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

    # Select top k components
    components = Vt[:k].T

    # Project data
    X_reduced = np.dot(X_centered, components)

    # Explained variance
    explained_variance = (S ** 2) / (len(X) - 1)

    return X_reduced, explained_variance[:k], components
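
The explained-variance line relies on the identity that the squared singular values of the centered data, divided by n_samples - 1, equal the eigenvalues of the covariance matrix. A small sketch that checks this numerically, assuming random test data chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # illustrative data, not from the original
X_centered = X - X.mean(axis=0)

# Eigenvalues of the covariance matrix, sorted in descending order
eigenvalues = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]

# Singular values of the centered data (already in descending order)
S = np.linalg.svd(X_centered, compute_uv=False)
variance_from_svd = S ** 2 / (X.shape[0] - 1)

print(np.allclose(eigenvalues, variance_from_svd))
```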


### Alternative Approaches

### Brute Force Alternatives
- Manual covariance + eigen decomposition
- Gram matrix based PCA

### Optimized Alternatives
- SVD-based PCA (used in sklearn)
- Incremental PCA (large datasets)
- Kernel PCA (non-linear)
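
As an illustration of the Gram matrix option, here is a minimal sketch that decomposes the (n_samples, n_samples) matrix instead of the covariance matrix, which can be cheaper when n_samples is much smaller than n_features. The function name `pca_gram` is hypothetical, and the sketch assumes the top k eigenvalues are strictly positive (k no larger than the rank of the centered data).

```python
import numpy as np

def pca_gram(X, k):
    """PCA via the Gram matrix of the centered data (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]

    G = Xc @ Xc.T                            # (n_samples, n_samples) Gram matrix
    vals, vecs = np.linalg.eigh(G)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]

    # Map Gram eigenvectors back to feature space: v_i = Xc^T u_i / sqrt(lambda_i)
    components = Xc.T @ vecs / np.sqrt(vals)
    X_reduced = vecs * np.sqrt(vals)         # same as Xc @ components
    explained_variance = vals / (n - 1)
    return X_reduced, explained_variance, components

rng = np.random.default_rng(0)
X_wide = rng.normal(size=(6, 10))            # fewer samples than features
X_reduced, variance, components = pca_gram(X_wide, k=3)
print(X_reduced.shape, components.shape)
```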

### Test Case

In [4]:
# Test Case 1: Sample 2D Dataset

X = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
    [2.3, 2.7]
])

X_reduced, eigenvalues, components = pca_bruteforce(X, k=1)

print("Reduced Data:\n", X_reduced)
print("Eigenvalues:\n", eigenvalues)
print("Principal Components:\n", components)


Reduced Data:
 [[ 0.35725229]
 [-2.26208195]
 [ 0.48967143]
 [-0.21285398]
 [ 1.2056734 ]
 [ 0.42233881]]
Eigenvalues:
 [1.43234945]
Principal Components:
 [[0.71824807]
 [0.69578712]]


In [5]:
# Test Case 2: Optimized PCA

X_reduced_opt, variance, components_opt = pca_optimized(X, k=1)

print("Reduced Data:\n", X_reduced_opt)
print("Explained Variance:\n", variance)
print("Principal Components:\n", components_opt)


Reduced Data:
 [[-0.35725229]
 [ 2.26208195]
 [-0.48967143]
 [ 0.21285398]
 [-1.2056734 ]
 [-0.42233881]]
Explained Variance:
 [1.43234945]
Principal Components:
 [[-0.71824807]
 [-0.69578712]]
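
The two approaches return components and projections with opposite signs. Both are valid, because an eigenvector is only defined up to sign. The sketch below shows one possible way to make results comparable: flip each component so that its largest-magnitude entry is positive. The helper name `align_signs` and this convention are assumptions for illustration, not necessarily what any library does.

```python
import numpy as np

def align_signs(components, X_reduced):
    """Flip each component (and its projection) so its largest-magnitude entry is positive."""
    components = np.asarray(components, dtype=float)
    X_reduced = np.asarray(X_reduced, dtype=float)
    rows = np.argmax(np.abs(components), axis=0)
    signs = np.sign(components[rows, np.arange(components.shape[1])])
    signs[signs == 0] = 1.0
    return components * signs, X_reduced * signs

# Values copied from the two outputs above (first two rows of each projection)
comp_eig = np.array([[0.71824807], [0.69578712]])
comp_svd = -comp_eig
proj_eig = np.array([[0.35725229], [-2.26208195]])
proj_svd = -proj_eig

c1, z1 = align_signs(comp_eig, proj_eig)
c2, z2 = align_signs(comp_svd, proj_svd)
print(np.allclose(c1, c2), np.allclose(z1, z2))
```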


In [6]:
# Test Case 3: Higher Dimensional Data

np.random.seed(42)
X_high_dim = np.random.rand(100, 5)

X_reduced, _, _ = pca_optimized(X_high_dim, k=2)

print("Original shape:", X_high_dim.shape)
print("Reduced shape:", X_reduced.shape)


Original shape: (100, 5)
Reduced shape: (100, 2)
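
A shape check alone does not show how much information the reduction keeps. A minimal sketch of mapping the reduced data back to the original space and measuring the reconstruction error, assuming a hypothetical helper `pca_reconstruct` that repeats the SVD steps so it is self-contained:

```python
import numpy as np

def pca_reconstruct(X, k):
    """Project onto the top k components, map back, and return the mean squared error."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    X_centered = X - mean
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k].T                       # (n_features, k)
    X_reduced = X_centered @ components         # (n_samples, k)
    X_hat = X_reduced @ components.T + mean     # back in the original feature space
    mse = np.mean((X - X_hat) ** 2)
    return X_hat, mse

rng = np.random.default_rng(42)
X_demo = rng.random((100, 5))                   # illustrative data
for k in (1, 2, 5):
    _, mse = pca_reconstruct(X_demo, k)
    print(k, mse)
```

The error shrinks as k grows and is zero up to round-off when k equals n_features.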


## Expected Outputs

✔ Reduced dimensionality

✔ Maximum variance preserved

✔ Eigenvalues sorted in descending order

✔ Projection matches sklearn PCA up to the sign of each component

✔ Stable numerical results using SVD


## Complexity Analysis

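### Brute Force PCA
- Time: O(n_samples × n_features² + n_features³) (covariance matrix, then eigendecomposition)
- Space: O(n_samples × n_features + n_features²)

### Optimized (SVD) PCA
- Time: O(n_samples × n_features × min(n_samples, n_features))
- Space: O(n_samples × n_features)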
