#### KNN Performance & Hyperparameter Tuning on a Large Dataset
Objective: evaluate how K-Nearest Neighbors (KNN) performs on a large, 20-feature synthetic dataset, and measure how tuning three hyperparameters (number of neighbors, distance metric, and weighting strategy) affects model accuracy.

Generated a synthetic classification dataset with 50,000 samples and 20 features to simulate real-world complexity.

In [5]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

# Generate large synthetic dataset
X, y = make_classification(
    n_samples=50000,
    n_features=20,
    n_informative=5,
    n_redundant=15,
    random_state=42
)
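A quick sanity check on the generated data can be sketched as follows. It regenerates the dataset with the same arguments and reports its shape and class balance. The use of `np.bincount` and the name `class_counts` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same arguments as the generation cell above
X, y = make_classification(
    n_samples=50000,
    n_features=20,
    n_informative=5,
    n_redundant=15,
    random_state=42,
)

# Confirm the dataset dimensions
print(X.shape)  # (50000, 20)

# Count samples per class (hypothetical check, not in the original notebook)
class_counts = np.bincount(y)
print(class_counts)
```

Roughly balanced classes matter here because plain accuracy, the metric used throughout, can be misleading on heavily imbalanced data.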

Split the data into training and test sets (80/20), then trained a baseline KNN classifier (n_neighbors=5) and evaluated its accuracy on the test set.

In [6]:

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [7]:

# Baseline KNN model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
baseline_accuracy = accuracy_score(y_test, y_pred)
print(f"Baseline Accuracy: {baseline_accuracy}")

Baseline Accuracy: 0.9612
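For intuition about what the baseline does at prediction time, the sketch below reproduces a KNN prediction by hand on a tiny, made-up one-dimensional dataset. The arrays `X_toy` and `y_toy` and the query point are hypothetical and chosen only for illustration; the majority vote shown matches the default uniform weighting.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical dataset: two well-separated groups on a line
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_knn = KNeighborsClassifier(n_neighbors=3)
toy_knn.fit(X_toy, y_toy)

# Look up the 3 nearest training points for a query sample
query = np.array([[1.5]])
distances, indices = toy_knn.kneighbors(query)

# With uniform weights, the prediction is the majority label among neighbors
neighbor_labels = y_toy[indices[0]]
manual_vote = np.bincount(neighbor_labels).argmax()
predicted = toy_knn.predict(query)[0]

print(neighbor_labels, manual_vote, predicted)
```

The manual vote and `predict` agree because, with uniform weights, `predict` is this same neighbor lookup followed by a majority vote.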


Applied GridSearchCV to tune key KNN hyperparameters:

1. Number of neighbors (n_neighbors)
2. Distance metric (euclidean, manhattan)
3. Weighting strategy (uniform, distance)

Retrained the model using the best hyperparameters and compared performance.

In [8]:

# Hyperparameter tuning
param_grid = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}

grid_search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)
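Beyond the single best combination, the fitted search object records a cross-validated score for every candidate. The sketch below lists the top-ranked combinations from `cv_results_`. To keep it quick to run, it assumes a smaller 2,000-sample dataset (`X_small`, `y_small`) instead of the full 50,000 samples, so its scores will not match the notebook's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Smaller hypothetical dataset so the sketch runs quickly
X_small, y_small = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=5,
    n_redundant=15,
    random_state=42,
)

# Same grid as in the notebook: 4 x 2 x 2 = 16 combinations
param_grid = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan'],
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
search.fit(X_small, y_small)

# Sort candidates by rank (1 = best mean cross-validated accuracy)
results = search.cv_results_
order = np.argsort(results['rank_test_score'])

for i in order[:5]:
    mean = results['mean_test_score'][i]
    std = results['std_test_score'][i]
    print(results['params'][i], f"mean={mean:.4f}", f"std={std:.4f}")
```

Comparing the mean scores against their standard deviations across folds indicates whether the top candidates are meaningfully different or effectively tied.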

In [9]:

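# Best model: retrain on the full training set with the best hyperparameters.
# Note: GridSearchCV refits by default (refit=True), so an equivalent fitted
# model is also available as grid_search.best_estimator_.
best_params = grid_search.best_params_
best_knn = KNeighborsClassifier(**best_params)
best_knn.fit(X_train, y_train)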

y_pred_best = best_knn.predict(X_test)
best_accuracy = accuracy_score(y_test, y_pred_best)

print(f"Best Hyperparameters: {best_params}")
print(f"Tuned Model Accuracy: {best_accuracy}")

Best Hyperparameters: {'metric': 'manhattan', 'n_neighbors': 9, 'weights': 'distance'}
Tuned Model Accuracy: 0.9621
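To put the gap between the two scores in context, the sketch below converts it into a count of test predictions and compares it with the binomial standard error of an accuracy estimate on a test set of this size. It assumes the 10,000-sample test set implied by test_size=0.2 on 50,000 samples and treats predictions as independent. It is a rough back-of-the-envelope check, not a formal significance test.

```python
import math

# Test set size implied by test_size=0.2 on 50,000 samples
n_test = 10000

# Accuracies reported in the notebook outputs
baseline_accuracy = 0.9612
best_accuracy = 0.9621

# Number of additional correct predictions after tuning
extra_correct = round((best_accuracy - baseline_accuracy) * n_test)

# Binomial standard error of an accuracy estimate at this sample size
std_error = math.sqrt(baseline_accuracy * (1 - baseline_accuracy) / n_test)

print(f"Extra correct predictions: {extra_correct}")
print(f"Accuracy gain: {best_accuracy - baseline_accuracy:.4f}")
print(f"Approx. standard error: {std_error:.4f}")
```

Under these assumptions the gain (about 0.0009) is smaller than one standard error (about 0.0019), so the two models are hard to distinguish on this test set alone.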


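This experiment shows that KNN reaches high accuracy on this large synthetic dataset even with default settings (0.9612). Grid search over n_neighbors, metric, and weights raised test accuracy only slightly, to 0.9621, a gain of 0.09 percentage points (9 additional correct predictions on the 10,000-sample test set). Because the selected n_neighbors=9 is the largest value in the grid, larger values may be worth testing. Systematic tuning is still worthwhile in real-world scenarios: it verifies whether the defaults are already near-optimal instead of assuming so.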