# 6. 📈 KNN- Final
Exported from Filament on Thu, 17 Mar 2022 19:30:45 GMT

---

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# scikit-learn imports
from sklearn import datasets
import sklearn.metrics as sm
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
df = pd.read_csv('knn_mushrooms.csv') # reading in the cleaned and encoded mushroom data

These are all the features with importance above zero:

gill_size, cap_surface_grooves, odor_almond, odor_anise, odor_none, stalk_root_bulbous, stalk_root_club, stalk_surface_below_ring_scaly, spore_print_color_green, population_clustered, habitat_woods

So let's just use these for our refined KNN model and see what results we get.

## ✅ KNN Final

In [None]:
# train/test split using the refined list of features

feature_cols = ['gill_size', 'cap_surface_grooves', 'odor_almond', 'odor_anise', 'odor_none',
    'stalk_root_bulbous', 'stalk_root_club', 'stalk_surface_below_ring_scaly',
    'spore_print_color_green', 'population_clustered', 'habitat_woods']
y = df['class']
X = df[feature_cols]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=124, stratify=y)

In [None]:
# remembering to scale our data again

scaler = MinMaxScaler()

scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
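
The fit-on-train, transform-both pattern above can also be bundled into a single estimator. Here is a minimal sketch, assuming synthetic stand-in data from `make_classification` rather than the mushroom CSV; the names `X_demo`, `y_demo`, `pipe` and `demo_accuracy` are hypothetical, and the classifier is left at its default settings as a placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: not the mushroom dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=11, random_state=124)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=124, stratify=y_demo)

# The pipeline fits the scaler on the training rows only,
# then reuses those min/max values whenever it predicts or scores
pipe = make_pipeline(MinMaxScaler(), KNeighborsClassifier())
pipe.fit(X_tr, y_tr)

demo_accuracy = pipe.score(X_te, y_te)
print(f"demo accuracy: {demo_accuracy:.3f}")
```

Keeping the scaler inside the pipeline makes it harder to accidentally fit it on the test rows.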

In [None]:
# creating that great visual to display our mean error rate

errors = []

for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    errors.append(np.mean(pred_i != y_test))  # share of misclassified test rows for this k

plt.figure(figsize=(12, 6))
plt.plot(range(1, 40), errors, color='black', linestyle='dashed',
         marker='o', markerfacecolor='grey', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K Value')
plt.ylabel('Mean Error')
plt.show()
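
The loop above scores every k against the test set, so the test set ends up helping to choose the hyperparameter. One possible alternative, sketched here on synthetic stand-in data, is to estimate the error for each k with cross-validation on the training rows only. The 5-fold setting and the names `X_demo`, `y_demo`, `cv_errors` and `best_k` are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: the real X_train / y_train would go here)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=11, n_informative=6, random_state=124)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=124, stratify=y_demo)

k_values = list(range(1, 40))
cv_errors = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    # mean accuracy over 5 folds of the training data; the test rows are never touched
    scores = cross_val_score(knn, X_tr, y_tr, cv=5, scoring='accuracy')
    cv_errors.append(1 - scores.mean())

# k with the lowest cross-validated error (ties resolve to the smallest k)
best_k = k_values[int(np.argmin(cv_errors))]
print(f"best k: {best_k}, cv error: {min(cv_errors):.3f}")
```

The held-out test set can then be used once, at the end, to report how the chosen k performs.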

In [None]:
# now we can see the best neighbour parameter is 4, so let's try this and go from there

classifier = KNeighborsClassifier(n_neighbors=4)
classifier.fit(X_train, y_train)

In [None]:
# so here we have it- creating the prediction, and running the confusion matrix and report

y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
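
To read the false negatives off the matrix without counting cells by eye, the four entries can be unpacked by name. This is a small sketch on toy labels; it assumes the class is encoded as 1 = poisonous and 0 = edible, which may not match the actual encoding in `df['class']`, and `y_true_demo` / `y_pred_demo` are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (assumption: 1 = poisonous, 0 = edible)
y_true_demo = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred_demo = [0, 1, 1, 0, 1, 0, 1, 0]

# With labels=[0, 1], rows are the true class and columns the predicted class,
# so ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1]).ravel()

# Under the assumed encoding, a false negative is a poisonous mushroom predicted as edible
print(f"false negatives: {fn}, false positives: {fp}")
```

Under that encoding the false negatives are the costly mistakes, since they are poisonous mushrooms labelled as safe to eat.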

## ✅ Conclusion:

Compared with the original KNN model that used all the features, this refined KNN model has one more false negative. More importantly, we can see that once again there are 2 false negatives! So unfortunately, the KNN method didn't improve our model either. In that case, let's stick with the final decision tree model: it's far easier to interpret, and we can see the full breakdown from node to node of how our data is being classified.

So if you ever find yourself lost and foraging for mushrooms- just remember to differentiate the edible from the poisonous using these features: its gill size, whether the cap surface has grooves, whether it smells of almond, anise or nothing at all, whether it's got a bulbous or club-shaped stalk root, if the stalk surface below the ring is scaly, if the spore print is green, if it's growing in a clustered population and if you're in the woods! Good luck- and remember there's still a slim chance you could get it wrong, so at the very least make sure it looks like it's worth it!