# Principal Component Analysis (PCA)

## Problem Type
**Principal Component Analysis (PCA)** is an **unsupervised** technique (it never uses labels) that is primarily used for:
- **Dimensionality Reduction**, either:
  - as a preprocessing step before **supervised** learning
  - for **unsupervised** exploratory data analysis
- **Feature Extraction** and **Visualization**

### How PCA Works
- **Linear transformation:**
  - Transforms the original correlated features into a smaller set of uncorrelated features called principal components.
- **Maximizes variance:**
  - Each principal component captures as much of the remaining variance as possible while staying orthogonal to the components before it.
  - As a result, the first principal component captures the highest variance, the second captures the next highest, and so on.
- **Orthogonal components:**
  - Principal components are orthogonal to each other, ensuring no redundancy in the new feature set.
- **Eigenvectors and Eigenvalues:**
  - PCA identifies the eigenvectors (directions) and eigenvalues (variance along each direction) of the covariance matrix of the centered data.
  - The eigenvectors define the principal components, and each eigenvalue gives the amount of variance captured by its component.
- **Dimensionality reduction:**
  - By selecting the top `k` principal components, PCA reduces the dimensionality of the dataset while retaining most of the variance.
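
The steps above can be sketched directly with NumPy. This is a minimal illustration on hypothetical toy data (the names `X_toy`, `mixing`, and `k` are assumptions made for the example), not how scikit-learn computes PCA internally, which relies on SVD-based solvers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 correlated features built from 2 latent factors
latent = rng.normal(size=(200, 2))
mixing = np.array([[2.0, 0.5, 1.0], [0.0, 1.0, -1.0]])
X_toy = latent @ mixing + 0.1 * rng.normal(size=(200, 3))

# 1. Center the data (PCA works on the covariance of centered features)
X_centered = X_toy - X_toy.mean(axis=0)

# 2. Covariance matrix of the features (columns)
cov = np.cov(X_centered, rowvar=False)

# 3. Eigendecomposition; eigh suits symmetric matrices and returns ascending eigenvalues
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Keep the top k eigenvectors and project the data onto them
k = 2
components = eigvecs[:, :k]
Z = X_centered @ components

explained_ratio = eigvals / eigvals.sum()
print("Explained variance ratio:", explained_ratio)
print("Projected shape:", Z.shape)
```

The projected columns of `Z` are uncorrelated, and their variances equal the top `k` eigenvalues, which is the "maximizes variance" and "orthogonal components" behavior described above.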

### Key Tuning Parameters
- **`n_components`:**
  - **Description:** Number of principal components to keep.
  - **Impact:** Lowering `n_components` reduces the dataset's dimensionality but discards the variance carried by the dropped components.
  - **Default:** `None` (all components are kept).
- **`svd_solver`:**
  - **Description:** Algorithm used to compute the principal components (`auto`, `full`, `arpack`, `randomized`).
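  - **Impact:** Determines how the decomposition is computed. `randomized` is usually faster on large datasets when only a few components are needed; `arpack` requires `n_components` to be strictly smaller than `min(n_samples, n_features)`.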
  - **Default:** `auto`.
- **`whiten`:**
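  - **Description:** If `True`, the projected data is divided by the square root of each component's explained variance.
  - **Impact:** Gives every principal component unit variance while keeping the components uncorrelated. This discards the relative variance scale of the components but can help downstream estimators that assume isotropic inputs.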
  - **Default:** `False`.
- **`tol`:**
  - **Description:** Tolerance for the singular values computed by the `arpack` solver.
  - **Impact:** Controls the precision of the iterative `arpack` solver; it has no effect on the other solvers.
  - **Default:** `0.0`.
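
The effect of `n_components` and `whiten` can be checked with a short sketch. It assumes the standardized Iris data used later in this notebook; the variable names (`X_std`, `pca_95`, `pca_white`) are chosen for illustration. Passing a float between 0 and 1 as `n_components` asks scikit-learn to keep the smallest number of components whose cumulative explained variance exceeds that fraction; `svd_solver="full"` is set explicitly because that mode depends on a full decomposition.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

# Float n_components: keep enough components to explain at least 95% of the variance
pca_95 = PCA(n_components=0.95, svd_solver="full")
X_95 = pca_95.fit_transform(X_std)
print("Components kept for 95% variance:", pca_95.n_components_)

# whiten=True rescales each projected component to unit variance
pca_white = PCA(n_components=2, whiten=True)
X_white = pca_white.fit_transform(X_std)
print("Per-component variance after whitening:", X_white.var(axis=0, ddof=1))
```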

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Reduces dimensionality, simplifying models            | May lose interpretability by transforming features into principal components |
| Captures the most variance in the data                | Assumes linearity, which might not capture complex relationships |
| Helps mitigate multicollinearity in features          | Sensitive to feature scale, so features usually need standardizing first |
| Improves computational efficiency for large datasets  | May discard smaller, but potentially important, variance when reducing dimensions |
| Can be used as a preprocessing step before modeling   | Not ideal for non-Gaussian distributions or non-linear relationships |
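
The sensitivity to feature scale can be seen in a small experiment. This sketch uses hypothetical synthetic data (`X_mixed`, two independent features whose standard deviations differ by a factor of 1000) and compares the explained variance ratio with and without standardization.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: two independent features on very different scales
X_mixed = np.column_stack(
    [rng.normal(0, 1, 500), rng.normal(0, 1000, 500)]
)

# Without scaling, the large-scale feature dominates the first component
ratio_raw = PCA(n_components=2).fit(X_mixed).explained_variance_ratio_

# After standardization, both features contribute on an equal footing
X_mixed_std = StandardScaler().fit_transform(X_mixed)
ratio_scaled = PCA(n_components=2).fit(X_mixed_std).explained_variance_ratio_

print("Unscaled:", ratio_raw)
print("Standardized:", ratio_scaled)
```

On the unscaled data almost all variance lands on the first component purely because of units, which is why the notebook standardizes features before fitting PCA.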

### Evaluation Metrics
- **Explained Variance Ratio:**
  - **Description:** Proportion of variance captured by each principal component.
  - **Good Value:** The ratios sum to 1 across all components, so a large share concentrated in the first few components indicates effective dimensionality reduction.
  - **Bad Value:** Ratios spread almost evenly across components suggest that no small subset captures most of the variance.
- **Cumulative Explained Variance:**
  - **Description:** Cumulative sum of the explained variance ratios.
  - **Good Value:** A common rule of thumb is to keep enough components to reach about 95% cumulative explained variance, though the right threshold depends on the task.
  - **Bad Value:** If nearly all components are needed to reach the threshold, PCA is not reducing dimensionality effectively.
- **Reconstruction Error:**
  - **Description:** Difference between the original data and the data reconstructed from the selected principal components.
  - **Good Value:** Lower values indicate better retention of original data information.
  - **Bad Value:** High values suggest significant information loss due to dimensionality reduction.
- **Scree Plot:**
  - **Description:** Visual representation of eigenvalues, showing the amount of variance explained by each component.
  - **Good Value:** A sharp drop followed by a plateau suggests that only the first few components are needed.
  - **Bad Value:** A gradual decline suggests that more components are required to capture variance.
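
Reconstruction error and the cumulative-variance threshold can both be computed in a few lines. The sketch below assumes the standardized Iris data and uses mean squared error as the reconstruction measure; that choice, and the names `errors` and `k_95`, are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_iris().data)

# Reconstruction error (MSE) for each possible number of components
errors = []
for k in range(1, X_std.shape[1] + 1):
    pca_k = PCA(n_components=k).fit(X_std)
    X_rec = pca_k.inverse_transform(pca_k.transform(X_std))
    errors.append(np.mean((X_std - X_rec) ** 2))
print("Reconstruction MSE by k:", errors)

# Smallest k whose cumulative explained variance reaches 95%
cumulative = np.cumsum(PCA().fit(X_std).explained_variance_ratio_)
k_95 = int(np.argmax(cumulative >= 0.95)) + 1
print("Components needed for 95% variance:", k_95)
```

The error shrinks as `k` grows and is essentially zero once every component is kept, since the transformation is then just a rotation of the data.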



```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
```

```python
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels (not used by PCA, which is unsupervised)

# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```


```python
# Apply PCA, keeping all 4 components so the full variance spectrum can be inspected
pca = PCA(n_components=4, svd_solver="auto", whiten=False, tol=0.0)
X_pca = pca.fit_transform(X_scaled)

# Explained variance ratio per component and its running total
explained_variance_ratio = pca.explained_variance_ratio_
cumulative_explained_variance = np.cumsum(explained_variance_ratio)

# Print explained variance ratios
print("Explained Variance Ratio:", explained_variance_ratio)
print("Cumulative Explained Variance:", cumulative_explained_variance)
```

```python
# Scree plot: per-component and cumulative explained variance
component_numbers = np.arange(1, len(explained_variance_ratio) + 1)

plt.figure(figsize=(10, 6))
plt.plot(
    component_numbers,
    explained_variance_ratio,
    "o-",
    label="Explained Variance Ratio",
)
plt.plot(
    component_numbers,
    cumulative_explained_variance,
    "o-",
    label="Cumulative Explained Variance",
)
plt.axhline(y=0.95, color="r", linestyle="--", label="95% Threshold")
plt.xticks(component_numbers)  # Components are counted, so use integer ticks
plt.title("Scree Plot")
plt.xlabel("Principal Component")
plt.ylabel("Variance Explained")
plt.legend()
plt.grid(True)
plt.show()
```