#### 1. What is K-Nearest Neighbors (KNN) and how does it work in both classification and regression problems?
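
As a starting point for this question, the sketch below shows the core mechanism in plain Python: find the k closest training points, then take a majority vote of their labels (classification) or the mean of their target values (regression). The toy data and the helper names `k_nearest`, `knn_classify`, and `knn_regress` are made up for illustration; this is not how scikit-learn implements KNN internally.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return the indices of the k training points closest to `query` (Euclidean)."""
    distances = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in distances[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the labels of the k nearest points."""
    neighbors = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the target values of the k nearest points."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / k

# Hypothetical toy data: two features per point, two well-separated groups
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [10.0, 12.0, 11.0, 50.0, 55.0, 52.0]

predicted_class = knn_classify(X, labels, (1.2, 1.4), k=3)
predicted_value = knn_regress(X, targets, (6.4, 6.2), k=3)
print(predicted_class, predicted_value)
```

The only difference between the two tasks is the aggregation step; the neighbor search is shared.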


#### 2. What is the Curse of Dimensionality and how does it affect KNN performance?
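
One symptom that matters for KNN is distance concentration: as the number of dimensions grows, the nearest and farthest points from a query end up at nearly the same distance, so "nearest" carries less information. The sketch below measures this on uniformly random points. The function name `relative_contrast`, the sample size, and the chosen dimensions are arbitrary illustrative choices.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """(farthest - nearest) / nearest distance from one random query point."""
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = [rng.random() for _ in range(n_dims)]
    distances = [math.dist(p, query) for p in points]
    return (max(distances) - min(distances)) / min(distances)

rng = random.Random(0)  # fixed seed so the run is repeatable
contrast = {d: relative_contrast(200, d, rng) for d in (2, 10, 100, 500)}

for d, value in contrast.items():
    print(f"{d:>4} dims: relative contrast = {value:.2f}")
```

A relative contrast close to zero means every point is almost equally far away, which is the regime where KNN struggles without dimensionality reduction.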


#### 3. What is Principal Component Analysis (PCA)? How is it different from feature selection?
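
The contrast can be seen directly on the Wine dataset used later in this notebook. Feature selection keeps a subset of the original columns unchanged, while PCA builds new features that are weighted combinations of all columns. The sketch below uses `SelectKBest` with `f_classif` as one example of a selection method; that choice, and keeping 2 features, are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)
y = wine.target

# Feature selection: keep 2 of the original 13 columns, values unchanged
selector = SelectKBest(score_func=f_classif, k=2).fit(X_scaled, y)
kept = selector.get_support(indices=True)
X_selected = selector.transform(X_scaled)
print("Selected original features:", [wine.feature_names[i] for i in kept])

# PCA: build 2 new features, each a weighted mix of the 13 original columns
pca = PCA(n_components=2).fit(X_scaled)
X_pca = pca.transform(X_scaled)
print("Weights of PC1 over the original features:")
print(np.round(pca.components_[0], 3))
```

The selected columns remain directly interpretable as original measurements, whereas each principal component has to be interpreted through its weights.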


#### 4. What are eigenvalues and eigenvectors in PCA, and why are they important?
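
A small NumPy sketch can make the roles concrete: the eigenvectors of the covariance matrix give the principal directions, and each eigenvalue is the variance of the data along its eigenvector. The synthetic data below is hypothetical, and the sketch computes PCA by eigendecomposition for clarity; scikit-learn's `PCA` arrives at equivalent results through a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 features, the first two strongly correlated
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]          # re-sort: largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of total variance captured by each principal direction
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Project onto the top 2 eigenvectors to get the first 2 principal components
projected = X_centered @ eigenvectors[:, :2]

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Explained variance ratio:", np.round(explained_variance_ratio, 3))
```

Ranking eigenvectors by eigenvalue is what lets PCA keep the directions that carry the most variance and drop the rest.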

#### 5. How do KNN and PCA complement each other when applied in a single pipeline?
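
One possible way to combine the two is a scikit-learn pipeline in which scaling, PCA, and KNN are chained, so that the scaler and PCA are refit on the training folds only during cross-validation. The sketch below uses the Wine dataset; `n_components=5` and 5-fold cross-validation are arbitrary illustrative choices, not tuned values.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scaling -> PCA -> KNN, fitted as one estimator
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),                    # illustrative, not tuned
    KNeighborsClassifier(n_neighbors=5),
)

# Each fold fits the scaler and PCA on its own training split only
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

Wrapping the steps in a pipeline avoids leaking test-fold statistics into the scaler or the PCA projection.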

#### 6. Train a KNN Classifier on the Wine dataset with and without feature scaling. Compare model accuracy in both cases.

In [1]:
# Import libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# -------------------------
# 1. KNN without Scaling
# -------------------------
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred_no_scaling = knn.predict(X_test)
acc_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# -------------------------
# 2. KNN with Scaling
# -------------------------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = knn_scaled.predict(X_test_scaled)
acc_scaled = accuracy_score(y_test, y_pred_scaled)

# -------------------------
# Results
# -------------------------
print("KNN Accuracy without Scaling:", acc_no_scaling)
print("KNN Accuracy with Scaling   :", acc_scaled)


KNN Accuracy without Scaling: 0.7222222222222222
KNN Accuracy with Scaling   : 0.9444444444444444


#### 7. Train a PCA model on the Wine dataset and print the explained variance ratio of each principal component.

In [2]:
# Import libraries
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Standardize features before PCA
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA()
pca.fit(X_scaled)

# Print explained variance ratio
explained_variance = pca.explained_variance_ratio_

print("Explained Variance Ratio of Each Principal Component:")
for i, ratio in enumerate(explained_variance, start=1):
    print(f"PC{i}: {ratio:.4f}")


Explained Variance Ratio of Each Principal Component:
PC1: 0.3620
PC2: 0.1921
PC3: 0.1112
PC4: 0.0707
PC5: 0.0656
PC6: 0.0494
PC7: 0.0424
PC8: 0.0268
PC9: 0.0222
PC10: 0.0193
PC11: 0.0174
PC12: 0.0130
PC13: 0.0080
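
A common follow-up is to accumulate these ratios and see how many leading components are needed to reach a given share of the total variance. The sketch below reuses the ratios printed above, rounded to four decimals as shown; the helper name `components_needed` and the thresholds are illustrative choices.

```python
from itertools import accumulate

# Explained variance ratios copied from the output above (rounded to 4 decimals)
ratios = [0.3620, 0.1921, 0.1112, 0.0707, 0.0656, 0.0494, 0.0424,
          0.0268, 0.0222, 0.0193, 0.0174, 0.0130, 0.0080]

cumulative = list(accumulate(ratios))

def components_needed(threshold):
    """Smallest number of leading components whose cumulative ratio reaches `threshold`."""
    for n, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return n
    return len(cumulative)

for threshold in (0.80, 0.90, 0.95):
    print(f"{threshold:.0%} of variance: {components_needed(threshold)} components")
```

With these values, 5 components reach 80%, 8 reach 90%, and 10 reach 95% of the variance.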


#### 8. Train a KNN Classifier on the PCA-transformed dataset (retain top 2 components). Compare the accuracy with the original dataset.


In [3]:
# Import libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# -------------------------
# 1. KNN on Original Scaled Data
# -------------------------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_original = KNeighborsClassifier(n_neighbors=5)
knn_original.fit(X_train_scaled, y_train)
y_pred_original = knn_original.predict(X_test_scaled)
acc_original = accuracy_score(y_test, y_pred_original)

# -------------------------
# 2. KNN on PCA-Transformed Data (Top 2 Components)
# -------------------------
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
acc_pca = accuracy_score(y_test, y_pred_pca)

# -------------------------
# Results
# -------------------------
print("KNN Accuracy on Original Scaled Data:", acc_original)
print("KNN Accuracy on PCA (2 components):  ", acc_pca)


KNN Accuracy on Original Scaled Data: 0.9444444444444444
KNN Accuracy on PCA (2 components):   0.9444444444444444


#### 9. Train a KNN Classifier with different distance metrics (euclidean, manhattan) on the scaled Wine dataset and compare the results.

In [4]:
# Import libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# -------------------------
# 1. KNN with Euclidean Distance
# -------------------------
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_euclidean.fit(X_train_scaled, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test_scaled)
acc_euclidean = accuracy_score(y_test, y_pred_euclidean)

# -------------------------
# 2. KNN with Manhattan Distance
# -------------------------
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn_manhattan.fit(X_train_scaled, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test_scaled)
acc_manhattan = accuracy_score(y_test, y_pred_manhattan)

# -------------------------
# Results
# -------------------------
print("KNN Accuracy (Euclidean):", acc_euclidean)
print("KNN Accuracy (Manhattan):", acc_manhattan)


KNN Accuracy (Euclidean): 0.9444444444444444
KNN Accuracy (Manhattan): 0.9814814814814815
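
For reference, both metrics are special cases of the Minkowski distance: order p=2 gives Euclidean and p=1 gives Manhattan. The plain-Python sketch below shows the difference on a single pair of points; the helper `minkowski` is written here for illustration and is not the scikit-learn implementation.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
euclidean = minkowski(a, b, p=2)   # straight-line distance
manhattan = minkowski(a, b, p=1)   # sum of absolute coordinate differences
print(euclidean, manhattan)
```

Because squaring emphasizes large per-feature differences, Euclidean distance reacts more strongly to a single feature that differs a lot, while Manhattan distance weights all per-feature differences linearly.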


#### 10. You are working with a high-dimensional gene expression dataset to classify patients with different types of cancer.

Due to the large number of features and the small number of samples, traditional models overfit.
Explain how you would:
- Use PCA to reduce dimensionality
- Decide how many components to keep
- Use KNN for classification post-dimensionality reduction
- Evaluate the model
- Justify this pipeline to your stakeholders as a robust solution for real-world biomedical data
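
The steps listed above can be sketched as a single scikit-learn pipeline, under some loud assumptions: the data is a synthetic stand-in generated with `make_classification` (no real gene expression data is used), the candidate variance thresholds and neighbor counts are illustrative, and balanced accuracy is just one reasonable scoring choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for gene expression data: few samples, many features
X, y = make_classification(
    n_samples=90, n_features=1000, n_informative=20,
    n_classes=3, random_state=0,
)

# Scaling and PCA live inside the pipeline so they are refit on each training fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(svd_solver="full")),
    ("knn", KNeighborsClassifier()),
])

# A float n_components keeps as many components as needed to reach
# that fraction of explained variance; the candidates are illustrative
param_grid = {
    "pca__n_components": [0.80, 0.90, 0.95],
    "knn__n_neighbors": [3, 5, 7],
}

# Stratified folds preserve class proportions, which matters with few samples
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="balanced_accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV balanced accuracy:", round(search.best_score_, 3))
```

The reported best score is selected on the same folds used for tuning, so it tends to be optimistic; a held-out test set or nested cross-validation would give a fairer estimate to present to stakeholders.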