## 1. What is K-Nearest Neighbors (KNN) and how does it work?
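KNN is a supervised learning algorithm that makes predictions from the K training points closest to a query point under a chosen distance metric. For classification it returns the majority class among those neighbors; for regression it returns the average of their target values. It has no training phase beyond storing the training data, so the work happens at prediction time.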
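The voting procedure can be sketched in a few lines of NumPy. This is a minimal illustration rather than how scikit-learn implements it: the function name `knn_predict` and the toy arrays `X_toy` and `y_toy` are hypothetical, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labeled 0 and 1.
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # query near the first cluster -> 0
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0]), k=3))  # query near the second cluster -> 1
```

For regression, the vote would be replaced by the mean of the neighbors' target values.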

## 2. What is the Curse of Dimensionality?
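The curse of dimensionality refers to the problems that arise as the number of features grows: the data become sparse, and distances between points tend to concentrate, so a point's nearest and farthest neighbors end up almost equally far away. Because KNN relies on distance to find relevant neighbors, its effectiveness drops in high-dimensional spaces.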
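Distance concentration can be observed with a small simulation. The sketch below assumes points drawn uniformly from the unit hypercube; the helper name `relative_contrast` and the sample sizes are illustrative choices, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(n_dims, n_points=500):
    """Return (farthest - nearest) / nearest distance from one query to random points."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()


# The contrast should shrink as the dimension grows.
for d in (2, 10, 100, 1000):
    print(f"{d:>4} dimensions: relative contrast = {relative_contrast(d):.3f}")
```

A relative contrast near zero means the nearest neighbor is barely closer than the farthest one, which is the situation in which a nearest-neighbor vote carries little information.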

## 3. What is PCA and how is it different from feature selection?
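PCA (Principal Component Analysis) transforms the original features into new, uncorrelated features called principal components. Each component is a linear combination of the original features, and the components are ordered by the variance they capture. Feature selection instead keeps a subset of the original features unchanged, so the retained features remain directly interpretable.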
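The difference shows up in what each method returns. The sketch below uses the Wine dataset and, as an assumed example of feature selection, scikit-learn's `SelectKBest` with the `f_classif` score; the original text does not name a specific selection method, and the choice of three outputs is arbitrary.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_scaled = StandardScaler().fit_transform(data.data)

# PCA: each output column is a linear combination of all 13 original features.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)
print("PCA output shape:", X_pca.shape)
print("Loadings shape (components x original features):", pca.components_.shape)

# Feature selection: each output column is one of the original features, unchanged.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X_scaled, data.target)
kept = [data.feature_names[i] for i in selector.get_support(indices=True)]
print("Selected output shape:", X_selected.shape)
print("Original features kept:", kept)
```

Both outputs have three columns, but only the selected columns can be traced back to a single named measurement.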

## 4. Eigenvalues and Eigenvectors in PCA
In PCA, the eigenvectors of the data's covariance matrix give the principal directions, and the corresponding eigenvalues quantify the variance captured along each direction.
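This relationship can be checked directly with an eigendecomposition of a covariance matrix. The sketch below uses hypothetical 2-D synthetic data (stretched along one axis and rotated by 45 degrees) so the expected principal direction is known in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data stretched along one axis, then rotated by 45 degrees.
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
angle = np.pi / 4
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
X = raw @ rotation.T

# Center the data and form the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]   # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # columns are principal directions

explained_ratio = eigenvalues / eigenvalues.sum()
print("Eigenvalues (variance per direction):", eigenvalues)
print("First principal direction:", eigenvectors[:, 0])
print("Explained variance ratio:", explained_ratio)

# Projecting onto the eigenvectors gives the principal component scores.
scores = X_centered @ eigenvectors
```

The sample variance of each column of `scores` equals the matching eigenvalue, and the first direction should lie close to the 45-degree diagonal (up to sign).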

## 5. How do KNN and PCA complement each other?
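PCA reduces dimensionality and can filter out noise, which makes KNN's distance computations cheaper and often more meaningful. As a result, KNN on PCA-transformed data can be faster and, in many cases, more accurate.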
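One way to combine the two is to chain scaling, PCA, and KNN in a scikit-learn pipeline and evaluate it with cross-validation. This is a sketch under assumed settings: two components, five neighbors, and five folds are illustrative values, not tuned ones.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Chaining the steps means scaling and PCA are refit on each training fold only.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),                  # assumed value, chosen for illustration
    KNeighborsClassifier(n_neighbors=5),  # assumed value, chosen for illustration
)

scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Keeping the preprocessing inside the pipeline avoids fitting the scaler or PCA on rows that are later used for evaluation.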

## 6. KNN with and without Feature Scaling (Wine Dataset)

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Wine dataset and hold out 30% of the samples for testing.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: KNN on the raw, unscaled features.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
pred1 = knn.predict(X_test)

# Standardize with statistics from the training split only, then reuse them on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Refit the same classifier on the standardized features.
knn.fit(X_train_scaled, y_train)
pred2 = knn.predict(X_test_scaled)

print("Accuracy without scaling:", accuracy_score(y_test, pred1))
print("Accuracy with scaling:", accuracy_score(y_test, pred2))
```


```text
Accuracy without scaling: 0.7407407407407407
Accuracy with scaling: 0.9629629629629629
```
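A likely reason for the gap is that the Wine features are measured on very different scales, so unscaled Euclidean distances are dominated by the features with the largest spread. The sketch below simply lists the three features with the largest standard deviation; it is an illustration of the scale mismatch, not a full diagnosis of the accuracy difference.

```python
from sklearn.datasets import load_wine

data = load_wine()

# Standard deviation of each raw feature, largest first.
stds = data.data.std(axis=0)
ranked = sorted(zip(data.feature_names, stds), key=lambda pair: pair[1], reverse=True)

for name, std in ranked[:3]:
    print(f"{name:<20} std = {std:.2f}")
```

After standardization every feature has unit variance, so each one contributes on a comparable scale to the distance.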


## 7. PCA Explained Variance Ratio

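```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_wine

# Standardize the features so each contributes equally to the variance.
X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# With no n_components argument, PCA keeps all 13 components.
pca = PCA()
pca.fit(X_scaled)

# Fraction of the total variance captured by each component, in descending order.
print(pca.explained_variance_ratio_)
```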


```text
[0.36198848 0.1920749  0.11123631 0.0706903  0.06563294 0.04935823
 0.04238679 0.02680749 0.02222153 0.01930019 0.01736836 0.01298233
 0.00795215]
```
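A common follow-up is to accumulate these ratios and read off how many components are needed to reach a target share of the variance. The sketch below reuses the ratios printed above; the thresholds of 80%, 90%, and 95% are hypothetical targets, not values from the original notebook.

```python
import numpy as np

# Explained variance ratios copied from the output above.
ratios = np.array([0.36198848, 0.1920749, 0.11123631, 0.0706903, 0.06563294,
                   0.04935823, 0.04238679, 0.02680749, 0.02222153, 0.01930019,
                   0.01736836, 0.01298233, 0.00795215])

cumulative = np.cumsum(ratios)

# Hypothetical thresholds; pick whatever fraction of variance the task requires.
for threshold in (0.80, 0.90, 0.95):
    # First position where the running total reaches the threshold (1-based count).
    n_components = int(np.argmax(cumulative >= threshold)) + 1
    print(f"{threshold:.0%} of variance -> {n_components} components")

print(f"Top 2 components capture {cumulative[1]:.1%} of the variance")
```

With these ratios the loop reports 5, 8, and 10 components for the three thresholds, and the first two components together account for 55.4% of the variance.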


## 8. KNN on PCA-transformed Data (Top 2 Components)

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_wine

# Note: scaling and PCA are fit on the full dataset before the split,
# so the test accuracy may be slightly optimistic.
X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Project the standardized data onto the top 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.3, random_state=42)

# Train and evaluate KNN in the 2-D PCA space.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print("Accuracy after PCA:", accuracy_score(y_test, pred))
```


```text
Accuracy after PCA: 0.9814814814814815
```
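The cell above fits the scaler and PCA on all samples before splitting, which lets the test rows influence the preprocessing. A stricter variant, sketched below with the same split settings, fits both steps on the training split only. Its accuracy is not reported here and may differ from the figure above.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Split first so the test rows never influence the scaler or the PCA fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the preprocessing on the training split only.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))

# Apply the fitted transforms to both splits.
X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_pca, y_train)
accuracy = accuracy_score(y_test, knn.predict(X_test_pca))
print("Accuracy with train-only preprocessing:", accuracy)
```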


## 9. KNN with Different Distance Metrics

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Note: the scaler is fit on the full dataset before the split,
# so the test accuracy may be slightly optimistic.
X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

# Same K, two different distance metrics.
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric='manhattan')

knn_euclidean.fit(X_train, y_train)
knn_manhattan.fit(X_train, y_train)

print("Euclidean Accuracy:", accuracy_score(y_test, knn_euclidean.predict(X_test)))
print("Manhattan Accuracy:", accuracy_score(y_test, knn_manhattan.predict(X_test)))
```


```text
Euclidean Accuracy: 0.9629629629629629
Manhattan Accuracy: 0.9629629629629629
```
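The two metrics differ only in how they aggregate per-feature differences. The sketch below computes both by hand for a pair of hypothetical vectors, chosen so the arithmetic is easy to verify.

```python
import numpy as np

# Hypothetical feature vectors, chosen so the arithmetic is easy to check by hand.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

diff = a - b
euclidean = np.sqrt(np.sum(diff ** 2))   # sqrt(3^2 + 4^2 + 0^2)
manhattan = np.sum(np.abs(diff))         # |3| + |4| + |0|

print("Euclidean distance:", euclidean)  # 5.0
print("Manhattan distance:", manhattan)  # 7.0
```

Both are special cases of the Minkowski distance, with `p=2` for Euclidean and `p=1` for Manhattan. `KNeighborsClassifier` uses `metric='minkowski'` with `p=2` by default, so the Euclidean model above matches the default behavior.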
