## Manual Hyperparameter Tuning for a k-NN Classifier
In this exercise, we will manually tune a k-NN classifier (covered in Chapter 7, The Generalization of Machine Learning Models). Our goal is to predict whether a breast tumor is malignant or benign based on cell measurements taken from the affected breast tissue sample.

In [1]:
from sklearn import neighbors, datasets, model_selection



In [4]:
# dataset
cancer = datasets.load_breast_cancer()

# target
y = cancer.target

# features
X = cancer.data
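Before scoring, it helps to know which label counts as "positive". The sketch below inspects the dataset. In scikit-learn's copy, label 0 is malignant and label 1 is benign. The `'precision'` scorer uses `precision_score` with its default `pos_label=1`, so the precision computed in this exercise is precision on the benign class.

```python
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()

# Class names are indexed by label: 0 -> 'malignant', 1 -> 'benign'
print(cancer.target_names)         # ['malignant' 'benign']
print(cancer.data.shape)           # (569, 30): 569 samples, 30 features
print(np.bincount(cancer.target))  # [212 357]: malignant vs. benign counts
```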

In [3]:
# initialize knn
knn = neighbors.KNeighborsClassifier()

Pass this classifier to 10-fold cross-validation (cv), computing the precision score for each fold. Assume that maximizing precision (the proportion of true positives among all positive predictions) is the primary objective of this exercise.
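As a worked example of the definition, precision is TP / (TP + FP). The labels below are hypothetical toy values, chosen only so the arithmetic is easy to follow. The hand count is checked against scikit-learn's `precision_score`.

```python
from sklearn.metrics import precision_score

# Hypothetical toy labels, with 1 as the positive class
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

# Count true positives and false positives by hand
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
manual = tp / (tp + fp)

print(tp, fp, manual)                   # 3 1 0.75
print(precision_score(y_true, y_pred))  # 0.75
```

The missed positive at index 3 is a false negative. It lowers recall but does not affect precision.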

In [5]:
# 10 folds, scored on precision
cv = model_selection.cross_val_score(knn, X, y, cv=10, scoring='precision')

In [6]:
# precision scores
print(cv)

[0.91666667 0.85       0.91666667 0.94736842 0.94594595 0.94444444
 0.97222222 0.92105263 0.96969697 0.97142857]
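To make the cross-validation step less of a black box, here is a sketch of what `cross_val_score` does for this case. For a classifier with an integer `cv`, scikit-learn splits the data with `StratifiedKFold` (no shuffling). It then fits a fresh copy of the estimator on nine folds and scores the held-out fold. The manual loop should reproduce the ten scores above.

```python
import numpy as np
from sklearn import datasets, model_selection, neighbors
from sklearn.base import clone
from sklearn.metrics import precision_score

cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target
knn = neighbors.KNeighborsClassifier()

# For a classifier, cv=10 means StratifiedKFold with 10 splits (no shuffling)
skf = model_selection.StratifiedKFold(n_splits=10)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    model = clone(knn)                      # fresh, unfitted copy per fold
    model.fit(X[train_idx], y[train_idx])   # train on the other 9 folds
    y_pred = model.predict(X[test_idx])     # predict the held-out fold
    fold_scores.append(precision_score(y[test_idx], y_pred))

fold_scores = np.array(fold_scores)
reference = model_selection.cross_val_score(knn, X, y, cv=10, scoring='precision')
print(np.allclose(fold_scores, reference))  # True

# Mean and spread across folds
print(f"{reference.mean():.2f} +/- {reference.std():.2f}")
```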


Calculate and print the mean precision score across all folds. This single number summarizes the model's overall precision.

In [10]:
# average over all folds
print(round(cv.mean(), 2))

0.94


In [11]:
knn.get_params()

{'algorithm': 'auto',
 'leaf_size': 30,
 'metric': 'minkowski',
 'metric_params': None,
 'n_jobs': None,
 'n_neighbors': 5,
 'p': 2,
 'weights': 'uniform'}
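`get_params()` shows that the default is `n_neighbors=5`. One way to try a different value without building a new object is `set_params`, which updates hyperparameters in place and returns the estimator. The sketch below shows this as an alternative to re-instantiating the classifier, which is what the next cell does.

```python
from sklearn import neighbors

knn = neighbors.KNeighborsClassifier()
print(knn.get_params()['n_neighbors'])  # 5 (the default)

# set_params updates hyperparameters in place and returns the estimator
knn.set_params(n_neighbors=15)
print(knn.get_params()['n_neighbors'])  # 15
```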

Run everything again, this time setting the hyperparameter k (`n_neighbors`) to 15. The result is actually marginally worse (0.93 versus 0.94, one percentage point lower):

In [13]:
# k=15
knn = neighbors.KNeighborsClassifier(n_neighbors=15)
cv = model_selection.cross_val_score(knn, X, y, cv=10, scoring='precision')
print(round(cv.mean(), 2))

0.93


In [14]:
def evaluate_knn(k):
    # 10-fold cross-validated mean precision for a k-NN classifier with k neighbors
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    cv = model_selection.cross_val_score(knn, X, y, cv=10, scoring='precision')
    print(round(cv.mean(), 2))

evaluate_knn(7)
evaluate_knn(3)
evaluate_knn(1)

0.93
0.93
0.92
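Calling `evaluate_knn` by hand for each k works, but a loop makes the manual search systematic. The sketch below uses a hypothetical helper, `mean_precision`, that returns the score instead of printing it. It sweeps odd values of k and reports the best one. Odd k is used because, with uniform weights, it rules out tied votes in a two-class problem.

```python
from sklearn import datasets, model_selection, neighbors

cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

def mean_precision(k, weights='uniform'):
    """Hypothetical helper: return the 10-fold mean precision for a given k."""
    knn = neighbors.KNeighborsClassifier(n_neighbors=k, weights=weights)
    scores = model_selection.cross_val_score(knn, X, y, cv=10, scoring='precision')
    return scores.mean()

# Odd values of k avoid tied votes in a two-class problem with uniform weights
results = {k: mean_precision(k) for k in range(1, 22, 2)}
for k, score in results.items():
    print(f"k={k:2d}  precision={score:.3f}")

best_k = max(results, key=results.get)
print("best k:", best_k)
```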


In [16]:
knn.get_params()

{'algorithm': 'auto',
 'leaf_size': 30,
 'metric': 'minkowski',
 'metric_params': None,
 'n_jobs': None,
 'n_neighbors': 15,
 'p': 2,
 'weights': 'uniform'}
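The output still reports `n_neighbors: 15`, even though `evaluate_knn` was just called with k = 7, 3 and 1. The reason is that the `knn` created inside the function is a local variable. The global `knn` from the k=15 cell is left unchanged. The minimal sketch below reproduces this scoping behavior.

```python
from sklearn import neighbors

# Global estimator, as created in the k=15 cell
knn = neighbors.KNeighborsClassifier(n_neighbors=15)

def evaluate_knn(k):
    # This 'knn' is a new local variable; it does not replace the global one
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    return knn.get_params()['n_neighbors']

local_k = evaluate_knn(3)
print(local_k)                          # 3
print(knn.get_params()['n_neighbors'])  # 15
```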

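Run the code again with k=5 and `weights='distance'`, so that each neighbor's vote is weighted by the inverse of its distance (closer neighbors count more):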

In [19]:
def evaluate_knn(k):
    # Same evaluation, but neighbors vote with weight 1/distance instead of equally
    knn = neighbors.KNeighborsClassifier(n_neighbors=k, weights='distance')
    cv = model_selection.cross_val_score(knn, X, y, cv=10, scoring='precision')
    print(round(cv.mean(), 2))

evaluate_knn(5)

0.93
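To see how `weights='distance'` can change a prediction, the sketch below uses hypothetical one-dimensional toy data. With k=3, the query point at 1.9 has two class-0 neighbors and one very close class-1 neighbor. A uniform vote picks class 0. An inverse-distance vote gives the nearby class-1 point a weight of about 10, so it picks class 1. The last lines redo the weighted vote by hand from `kneighbors`.

```python
import numpy as np
from sklearn import neighbors

# Hypothetical 1-D toy data: two class-0 points and two class-1 points
X_toy = np.array([[0.0], [1.0], [2.0], [10.0]])
y_toy = np.array([0, 0, 1, 1])
query = np.array([[1.9]])

uniform = neighbors.KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X_toy, y_toy)
distance = neighbors.KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X_toy, y_toy)
print(uniform.predict(query), distance.predict(query))  # [0] [1]

# The same distance-weighted vote by hand: each neighbor votes with weight 1/d
dist, idx = distance.kneighbors(query)
weights = 1.0 / dist[0]
votes = np.bincount(y_toy[idx[0]], weights=weights, minlength=2)
print(votes)  # total weight per class; class 1 wins
```

On the breast cancer data, the same change leaves k=5 at 0.93, so distance weighting does not help precision here.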
