# PCA (Principal Component Analysis)

### What PCA is:
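#### A linear dimensionality-reduction method: it replaces the original features with a smaller set of new, uncorrelated features (principal components). Each component is a linear combination of the original features, and the components are ordered so that the first captures the most variance in the data, the second the most of what remains, and so on.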
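To make "creating new features" concrete, here is a minimal NumPy sketch of the classic formulation: center the data, compute the covariance matrix, take its eigenvectors sorted by eigenvalue, and project the data onto the top ones. The helper `pca_from_scratch` is hypothetical and written only for illustration. scikit-learn's `PCA` uses its own internal solvers, so this is a sketch of the idea, not its implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


def pca_from_scratch(X, n_components):
    """Hypothetical helper for illustration (not a scikit-learn API):
    PCA via eigendecomposition of the covariance matrix."""
    # Center each feature at zero mean
    X_centered = X - X.mean(axis=0)
    # Feature-by-feature covariance matrix (4 x 4 for iris)
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the direction with the largest variance comes first
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Keep the top eigenvectors and project the centered data onto them
    components = eigvecs[:, :n_components]
    X_proj = X_centered @ components
    # Share of total variance captured by each kept component
    ratio = eigvals[:n_components] / eigvals.sum()
    return X_proj, ratio


X_scaled = StandardScaler().fit_transform(load_iris().data)
X_proj, ratio = pca_from_scratch(X_scaled, n_components=2)
print("From scratch:", ratio)
print("scikit-learn:", PCA(n_components=2).fit(X_scaled).explained_variance_ratio_)
```

Eigenvectors are only defined up to sign, so a column of `X_proj` may come out flipped relative to scikit-learn's output; the explained variance ratios should agree.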

### Why it is used:
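#### To compress data into fewer features, filter out low-variance directions that often carry mostly noise, speed up downstream models, and project high-dimensional data into 2 or 3 dimensions so it can be plotted.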
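One way to act on "make data simpler" is to let PCA choose how many components to keep. A sketch, assuming the same standardized iris data used in this notebook and a 95% variance target picked only for illustration: passing a float between 0 and 1 as `n_components` keeps the fewest components whose cumulative explained variance exceeds that fraction, and `inverse_transform` maps the reduced data back to the original feature space.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_iris().data)

# A float in (0, 1) tells PCA to keep the smallest number of components
# whose cumulative explained variance exceeds that fraction (0.95 is an
# illustrative choice, not a recommended default)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print("Components kept:", pca.n_components_)

# Map the reduced data back to the original 4-feature space; the
# discarded low-variance directions are lost, so this is lossy
X_reconstructed = pca.inverse_transform(X_reduced)
mse = np.mean((X_scaled - X_reconstructed) ** 2)
print("Reconstruction MSE:", mse)
```

For standardized data, the reconstruction MSE works out to the fraction of variance that was dropped (about 4% here), which is why discarding the last components costs so little.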

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import pandas as pd

# Load a sample dataset: 150 iris flowers, 4 numeric features, 3 classes
data = load_iris()
X = data.data
y = data.target

# Standardize the features (zero mean, unit variance). PCA is driven by
# variance, so features with larger numeric ranges would otherwise dominate
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Run PCA and reduce to 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Put results in a DataFrame alongside the class labels
df = pd.DataFrame({
    "PC1": X_pca[:, 0],
    "PC2": X_pca[:, 1],
    "target": y
})

print(df.head())

# Print the fraction of total variance captured by each component
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Output:

```
        PC1       PC2  target
0 -2.264703  0.480027       0
1 -2.080961 -0.674134       0
2 -2.364229 -0.341908       0
3 -2.299384 -0.597395       0
4 -2.389842  0.646835       0
Explained variance ratio: [0.72962445 0.22850762]
```
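The two ratios sum to about 0.958, so the 2-D projection keeps roughly 96% of the variance of the standardized data. To see what each component is made of, a short follow-up sketch (same setup as above; tabulating loadings this way is one common choice, not the only one) puts `pca.components_` into a DataFrame labelled with the original feature names.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = load_iris()
X_scaled = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X_scaled)

# Each row of components_ is one principal component (a unit vector);
# each column is the weight (loading) of an original feature in it
loadings = pd.DataFrame(
    pca.components_,
    index=["PC1", "PC2"],
    columns=data.feature_names,
)
print(loadings.round(3))

# Cumulative share of variance retained by the first k components
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative)
```

Features with large absolute loadings on a component drive that component; the sign of a whole row is arbitrary and can flip without changing its meaning.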
