# Applying PCA to a Dataset Using scikit-learn

### Objective
Gain practical experience in applying PCA using Python.

### Content
- Introduction to the scikit-learn library for PCA
- Loading a dataset (e.g., Iris dataset)
- Performing PCA with scikit-learn
- Visualizing the principal components using matplotlib or seaborn

### Instructions
To get the most out of this exercise, start with the empty notebook and try to solve each step using only what you have learned. If you get stuck, consult the filled notebook for proposed solutions.


## Step 1: Import the necessary libraries
We will use `numpy`, `pandas`, `scikit-learn`, `matplotlib`, and `seaborn` for this activity.


In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns

# set_theme is the current name for this call; sns.set is kept as an alias
sns.set_theme(style="whitegrid")


## Step 2: Load the dataset
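We will use the Iris dataset, a classic machine learning dataset with 150 samples, four numeric features (sepal and petal length and width, in centimetres), and three species as the target.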


In [None]:
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
df.head()


## Step 3: Standardize the data
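PCA finds the directions of greatest variance, so a feature measured on a larger numeric scale would dominate the components. To prevent this, we standardize each feature to zero mean and unit variance before applying PCA.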
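
To make the effect of scale concrete, here is a minimal sketch that is not part of the original exercise. It assumes a hypothetical unit change (sepal length expressed in millimetres instead of centimetres) and compares PCA on the raw values with PCA on standardized values. The variable names `X_mm`, `pca_raw`, `ratio_cm`, and `ratio_mm` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # 150 samples, 4 features, all in centimetres

# Hypothetical rescaling for illustration: sepal length (column 0) in millimetres
X_mm = X.copy()
X_mm[:, 0] *= 10

# Without standardization, the rescaled feature carries the largest weight in PC1
pca_raw = PCA(n_components=2).fit(X_mm)
top_feature_raw = int(np.argmax(np.abs(pca_raw.components_[0])))

# With standardization, the unit change no longer affects the result
ratio_cm = PCA(n_components=2).fit(
    StandardScaler().fit_transform(X)).explained_variance_ratio_
ratio_mm = PCA(n_components=2).fit(
    StandardScaler().fit_transform(X_mm)).explained_variance_ratio_

print("Index of feature with largest PC1 weight (raw, mm):", top_feature_raw)
print("Standardized results match:", np.allclose(ratio_cm, ratio_mm))
```

After standardization the choice of units has no influence on the components, which is the property the exercise relies on.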


In [None]:
from sklearn.preprocessing import StandardScaler

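features = iris.feature_names
# Extract the feature columns as a NumPy array (the target column is excluded)
x = df.loc[:, features].values
# Rescale each feature to zero mean and unit variance
x = StandardScaler().fit_transform(x)
# Rebuild a DataFrame so the standardized values can be inspected with the target
df_standardized = pd.DataFrame(x, columns=features)
df_standardized['target'] = df['target']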
df_standardized.head()


## Step 4: Perform PCA
We will use PCA to project the four standardized features onto two principal components.


In [None]:
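# Keep only the first two principal components
pca = PCA(n_components=2)
# Fit on the standardized features and project the samples onto the components
principal_components = pca.fit_transform(x)
df_pca = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])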
df_pca['target'] = df['target']
df_pca.head()
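
A natural follow-up, sketched here as an optional check rather than a required part of the exercise, is to ask how much of the original variance the two components retain. The fitted `PCA` object exposes this through its `explained_variance_ratio_` attribute. The block below refits the model so that it runs on its own; the names `ratios` and `retained` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same preprocessing as in the steps above: standardize, then fit a 2-component PCA
x = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(x)

# Share of the total variance captured by each component, in decreasing order
ratios = pca.explained_variance_ratio_
retained = float(ratios.sum())

print("Per-component share of variance:", np.round(ratios, 3))
print("Total variance retained:", round(retained, 3))
```

If the retained share were low, the 2D plot in the next step would hide much of the structure in the data, and keeping more components would be worth considering.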


## Step 5: Visualize the principal components
We will use a scatter plot of PC1 against PC2, colored by target class, to visualize the principal components.


In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(x='PC1', y='PC2', hue='target', data=df_pca, palette='Set1')
plt.title('PCA of Iris Dataset')
plt.show()
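
To interpret the axes of the plot, it can help to look at how each original feature contributes to each component. The following is a minimal sketch, assuming the same standardization and two-component PCA as above; the `loadings` table is an illustrative name and is built from the `components_` attribute of the fitted model.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(x)

# components_ has shape (n_components, n_features); transpose it so that
# each row is an original feature and each column is a principal component
loadings = pd.DataFrame(
    pca.components_.T,
    index=iris.feature_names,
    columns=['PC1', 'PC2'],
)
print(loadings.round(3))
```

Features with large absolute weights in a column are the ones that mostly determine a sample's position along that axis. The sign of a whole column is arbitrary, so only relative signs within a column are meaningful.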
