Sumeet Shinde T116

Aim: Principal Component Analysis (PCA)
1. Perform PCA on a dataset to reduce dimensionality.
2. Evaluate the explained variance and select the appropriate number of principal components.
3. Visualize the data in the reduced-dimensional space.

About the Dataset: social_ads.csv

The social_ads.csv dataset records users' demographics and whether they purchased a product after seeing an online advertisement. It is commonly used for classification and marketing prediction tasks.

1. Data Includes:

The dataset consists of 3 columns:

Age – numerical feature representing the user’s age

EstimatedSalary – numerical feature showing the user's annual income

Purchased – target variable indicating whether the user bought the product\
0 = Not Purchased\
1 = Purchased

2. Purpose of the Dataset:
This dataset is used to analyze:\
how age and salary influence purchasing decisions\
customer segmentation\
marketing strategy effectiveness\
prediction of buying behavior

3. Why this dataset is suitable for PCA, Kernel PCA, and LDA:
Contains numeric features (Age, EstimatedSalary) → supports PCA\
May contain non-linear structure that a linear projection misses → motivates Kernel PCA\
Has a class label (Purchased) → required for LDA, which is a supervised method

### Reducing Features Using Principal Components

In [1]:
# Load libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the data
df = pd.read_csv("social_ads.csv")

# Select numeric features
X = df[["Age", "EstimatedSalary"]]

# Standardize the feature matrix
X_scaled = StandardScaler().fit_transform(X)

# Create PCA to retain 99% variance
pca = PCA(n_components=0.99, whiten=True)

# Conduct PCA
X_pca = pca.fit_transform(X_scaled)

# Show results
print("PCA:")
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_pca.shape[1])


PCA:
Original number of features: 2
Reduced number of features: 2
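
The output above shows that both components were kept. One way to see why: for two standardized features with Pearson correlation r, the correlation matrix has eigenvalues 1 + |r| and 1 - |r|, so the first principal component explains (1 + |r|) / 2 of the variance. The sketch below works through that arithmetic in plain Python. The helper names and the toy age and salary values are hypothetical and are not taken from social_ads.csv. The selection rule assumes the smallest number of components whose cumulative ratio exceeds the threshold, which is how `PCA(n_components=0.99)` is documented to behave.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def explained_variance_ratios(r):
    """Explained variance ratios of PCA on two standardized features."""
    return [(1 + abs(r)) / 2, (1 - abs(r)) / 2]

def components_needed(ratios, threshold):
    """Smallest number of components whose cumulative ratio exceeds threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return k
    return len(ratios)

# Hypothetical toy values, for illustration only
ages = [19, 35, 26, 27, 47, 58]
salaries = [19000, 20000, 43000, 57000, 25000, 144000]

r = pearson_r(ages, salaries)
ratios = explained_variance_ratios(r)
print("Correlation:", round(r, 3))
print("Explained variance ratios:", [round(v, 3) for v in ratios])
print("Components needed for 99%:", components_needed(ratios, 0.99))
```

Under this rule, a single component reaches the 99% threshold only when |r| exceeds 0.98. With just two features, the threshold will therefore usually keep both, as it did here.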


### Reducing Features When Data Is Linearly Inseparable

In [2]:
# Load libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

# Load the data
df = pd.read_csv("social_ads.csv")

# Select numeric features
X = df[["Age", "EstimatedSalary"]]

# Standardize features
X_scaled = StandardScaler().fit_transform(X)

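# Apply Kernel PCA with an RBF kernel; gamma controls the kernel width,
# and n_components=1 keeps only the leading component
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
X_kpca = kpca.fit_transform(X_scaled)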

# Show results
print("Kernel PCA:")
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_kpca.shape[1])


Kernel PCA:
Original number of features: 2
Reduced number of features: 1
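
To give a feel for what `gamma=15` means in the cell above, here is a minimal sketch of the RBF kernel, k(x, y) = exp(-gamma * ||x - y||^2), evaluated at a few distances. The `rbf_kernel` helper below is a hand-written illustration rather than the scikit-learn function of the same name, and the sample points are hypothetical. The comparison value of 0.5 corresponds to 1 / n_features, the value scikit-learn falls back to when `gamma` is left unset.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel value between two points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

origin = (0.0, 0.0)

# Distances are in standard-deviation units because the features are standardized
for distance in (0.1, 0.5, 1.0):
    point = (distance, 0.0)
    narrow = rbf_kernel(origin, point, gamma=15)
    wide = rbf_kernel(origin, point, gamma=0.5)
    print(f"distance={distance}: gamma=15 -> {narrow:.6f}, gamma=0.5 -> {wide:.6f}")
```

A large gamma makes the similarity fall off very quickly, so only near neighbours influence each other. Whether 15 is a good choice for this dataset would need to be checked, for example by comparing downstream classifier performance across several gamma values.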


### Reducing Features by Maximizing Class Separability

In [3]:
# Load libraries
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Load the data
df = pd.read_csv("social_ads.csv")

# Select features and target
X = df[["Age", "EstimatedSalary"]]
y = df["Purchased"]

# Create LDA to reduce to 1 feature
lda = LinearDiscriminantAnalysis(n_components=1)

# Conduct LDA (fit on the labels, then project the features)
X_lda = lda.fit_transform(X, y)

# Show results
print("LDA:")
print("Original number of features:", X.shape[1])
print("Reduced number of features:", X_lda.shape[1])
print("Explained variance ratio:", lda.explained_variance_ratio_)


LDA:
Original number of features: 2
Reduced number of features: 1
Explained variance ratio: [1.]
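
The explained variance ratio of `[1.]` follows from the setup: LDA can produce at most min(n_classes - 1, n_features) discriminants, and with two classes that is a single axis, which by construction accounts for all of the between-class variance. For two classes, that axis is Fisher's direction w = S_w^{-1} (m_1 - m_0), where S_w is the within-class scatter matrix. The sketch below computes it in plain Python. The function names and the toy points are hypothetical (standardized-looking age and salary pairs, not rows from social_ads.csv), and this is an illustration of the idea rather than scikit-learn's implementation.

```python
def mean_vector(rows):
    """Column-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_matrix(rows, mean):
    """2x2 scatter matrix: sum of outer products of the centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Fisher discriminant direction w = S_w^{-1} (m1 - m0) for two classes."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(rows, w):
    """Project 2-D points onto the direction w, giving one value per point."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]

# Hypothetical toy points, for illustration only
not_purchased = [(-1.2, -0.8), (-0.5, -1.1), (-0.9, 0.2), (0.1, -0.6)]
purchased = [(0.8, 1.0), (1.3, 0.4), (0.6, 1.5), (1.1, -0.2)]

w = fisher_direction(not_purchased, purchased)
z0 = project(not_purchased, w)
z1 = project(purchased, w)
print("Direction:", [round(v, 3) for v in w])
print("Projected class 0:", [round(v, 2) for v in z0])
print("Projected class 1:", [round(v, 2) for v in z1])
```

On these toy points the two classes end up on opposite sides of the one-dimensional axis. That separation is the property LDA optimizes, in contrast to PCA, which ignores the labels.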
