# Learning KNN Training and Testing Using the Hold-Out Method

# The code below demonstrates how a KNN model is trained and tested under the hold-out method. For better understanding, the code builds and tests a KNN model first on a data set bundled with scikit-learn and then on a user-specific data set.


# 1. Loading libraries

In [1]:
import pandas as pd
from sklearn import datasets  # built-in data sets bundled with scikit-learn
from sklearn.neighbors import KNeighborsClassifier  # KNN classifier
from sklearn import metrics  # accuracy, confusion matrix, classification report
from sklearn.model_selection import train_test_split  # splits the data for hold-out

# 2. Loading data set

In [2]:
# Loading the wine data set with load_wine()

dataset_wine = datasets.load_wine()



# 3. Creating Hold-out Environment

In [3]:
# Creating hold-out environment: 70% of the samples for training, 30% for testing

winedata_train, winedata_test, winetarget_train, winetarget_test = train_test_split(dataset_wine.data, dataset_wine.target, test_size=0.3)
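
`train_test_split` shuffles the rows before splitting, so each run produces a different partition unless a seed is fixed. The sketch below shows the array sizes that result from the 70/30 split used above. The `random_state=42` and `stratify` arguments are illustrative additions that are not part of the original cell, and the `X_`/`y_` variable names are hypothetical.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()

# Assumption: random_state and stratify are added here for illustration only.
# random_state fixes the shuffle so the split is reproducible;
# stratify keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42, stratify=wine.target
)

print(wine.data.shape)  # (178, 13): 178 samples, 13 features
print(X_train.shape)    # (124, 13): about 70% of the rows for training
print(X_test.shape)     # (54, 13): about 30% of the rows for testing
```

With a fixed seed, rerunning the cell gives the same partition, which makes accuracy figures comparable across runs.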

# 4.  Building KNN model 

We will use the training set created in step 3 for model training (learning). The supervised model we build here is a KNN classifier, which requires the user to specify K, the number of nearest neighbors consulted when classifying a data point. The user can try different values of K and compare the performance of the classifier. If the value of K is not specified, the model uses the default value of 5.

In [4]:
# Create a KNN classifier. KNeighborsClassifier takes the value of K as n_neighbors.

KNNnmodel_1 = KNeighborsClassifier(n_neighbors = 4)  # K is set to 4

# Train the model using the training set

KNNfitted_1 = KNNnmodel_1.fit(winedata_train, winetarget_train)
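
As noted in this step, different values of K can be tried to see how the classifier behaves. A minimal sketch of such a comparison follows. The candidate values of K, the fixed `random_state=0`, and the variable names are assumptions made for illustration and are not taken from the original notebook.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # seed is an assumption
)

# With no argument, scikit-learn uses K = 5.
print(KNeighborsClassifier().n_neighbors)  # 5

# Fit one model per candidate K and record its hold-out accuracy.
accuracy_by_k = {}
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    accuracy_by_k[k] = metrics.accuracy_score(y_test, model.predict(X_test))

for k, acc in accuracy_by_k.items():
    print(f"K = {k}: accuracy = {acc:.2f}")
```

Picking K by its test-set accuracy makes that accuracy an optimistic estimate, so a separate validation split is commonly used for the selection.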

# 5. Testing trained KNN model on the test set 

In [5]:

# Predict the response on the test data set

KNN_predictions_1 = KNNfitted_1.predict(winedata_test)
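
To make the prediction step less of a black box, the sketch below reproduces it by hand: for each test point it looks up the K nearest training points and takes a majority vote over their class labels. It assumes the default uniform weighting and K = 4 as in the cell above. The fixed `random_state=0` and all variable names are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # seed is an assumption
)

model = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)

# Indices (into the training set) of the 4 nearest neighbors of each test point.
_, neighbor_idx = model.kneighbors(X_test)

# Look up the neighbors' class labels and take a majority vote per test point.
# np.bincount(...).argmax() returns the smallest label when votes are tied.
neighbor_labels = y_train[neighbor_idx]
manual_votes = np.array([np.bincount(row).argmax() for row in neighbor_labels])

print(neighbor_labels[0], "->", manual_votes[0])
```

Under these assumptions the hand-computed votes should agree with `model.predict(X_test)`.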

# 6. Evaluating the performance of the model 

In [6]:
# Computing model accuracy
print("Accuracy:", round(metrics.accuracy_score(winetarget_test, KNN_predictions_1), 2) * 100, "%")
print("---------------")

# Printing confusion matrix (rows: true classes, columns: predicted classes)
print("Confusion matrix")
print("---------------")
print(metrics.confusion_matrix(winetarget_test, KNN_predictions_1))

# Detailed classification report with per-class precision, recall and F1-score
target_names = ['class 0', 'class 1', 'class 2']
print("---------------")
print("Classification report", metrics.classification_report(winetarget_test, KNN_predictions_1, target_names=target_names))

Accuracy: 76.0 %
---------------
Confusion matrix
---------------
[[16  0  4]
 [ 1 17  2]
 [ 1  5  8]]
---------------
Classification report              precision    recall  f1-score   support

    class 0       0.89      0.80      0.84        20
    class 1       0.77      0.85      0.81        20
    class 2       0.57      0.57      0.57        14

avg / total       0.76      0.76      0.76        54
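
The figures in the report can be traced back to the confusion matrix. The sketch below recomputes accuracy and the per-class precision, recall and F1-score from the matrix printed above, using only NumPy. The variable names are hypothetical.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = np.array([[16, 0, 4],
               [1, 17, 2],
               [1, 5, 8]])

correct = np.diag(cm)                 # correctly classified samples per class
accuracy = correct.sum() / cm.sum()   # 41 / 54
recall = correct / cm.sum(axis=1)     # divide by row totals (true counts)
precision = correct / cm.sum(axis=0)  # divide by column totals (predicted counts)
f1 = 2 * precision * recall / (precision + recall)

print(round(float(accuracy), 2))  # 0.76
print(np.round(precision, 2))     # [0.89 0.77 0.57]
print(np.round(recall, 2))        # [0.8  0.85 0.57]
print(np.round(f1, 2))            # [0.84 0.81 0.57]
```

For example, class 2 has 8 correct predictions out of 14 true samples (recall 0.57) and out of 14 samples predicted as class 2 (precision 0.57).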



# KNN on a user-specific data set

# 1. Loading data set

In [7]:
# Loading the data set from the local machine. The data set is used for predicting liver disorder.
    
My_dataset = pd.read_csv('/Users/sakshibabbar/Documents/ML/Supervised learning/datasets/liver_dataset.csv')

# 2. Dividing the data set into indicator (feature) variables and the target variable

In [8]:
# My_data contains the first six columns (the indicator features) of every data point in My_dataset
My_data = My_dataset.iloc[:,0:6].values 

# My_target contains the class label, the 7th column of My_dataset, of every data point

My_target=My_dataset.iloc[:,6].values 
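
The liver CSV file is not available here, so the sketch below shows the same `iloc` slicing on a tiny made-up table with the same layout: six feature columns followed by one class column. The column names and values are hypothetical and serve only to illustrate how the slice boundaries work.

```python
import io
import pandas as pd

# Hypothetical stand-in for the liver CSV: six feature columns, then the class.
# Column names and values are made up for illustration.
csv_text = """f1,f2,f3,f4,f5,f6,label
85,92,45,27,31,0.0,1
85,64,59,32,23,0.0,2
86,54,33,16,54,0.0,2
"""
demo = pd.read_csv(io.StringIO(csv_text))

features = demo.iloc[:, 0:6].values  # columns 0-5; the end index 6 is excluded
target = demo.iloc[:, 6].values      # column 6 only, as a 1-D array

print(features.shape)  # (3, 6)
print(target)          # [1 2 2]
```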

# 3. Creating Hold-out environment for the data set from the step above

In [9]:
# The pair of arrays liverdata_train and livertarget_train will be used for learning the supervised model,
# whereas liverdata_test and livertarget_test will be used for model testing.

liverdata_train, liverdata_test, livertarget_train, livertarget_test = train_test_split(My_data, My_target, test_size=0.3)



# 4.  Building KNN model

In [19]:
# Create a KNN classifier. KNeighborsClassifier takes the value of K as n_neighbors.

KNNnmodel_2 = KNeighborsClassifier(n_neighbors = 13)  # K is set to 13

# Train the model using the training set

KNNfitted_2 = KNNnmodel_2.fit(liverdata_train, livertarget_train)

# 5. Testing trained KNN model on the test set

In [20]:

# Predict the response on the test data set

KNN_predictions_2 = KNNfitted_2.predict(liverdata_test)

# 6. Evaluating the performance of the model

In [21]:
# Computing model accuracy
print("Accuracy:", round(metrics.accuracy_score(livertarget_test, KNN_predictions_2), 2) * 100, "%")
print("---------------")

# Printing confusion matrix (rows: true classes, columns: predicted classes)
print("Confusion matrix")
print("---------------")
print(metrics.confusion_matrix(livertarget_test, KNN_predictions_2))

# User-specific target names, listed in the sorted order of the class labels
target_names = ['disorder', 'nodisorder']

# Detailed classification report with per-class precision, recall and F1-score
print("---------------")
print("Classification report", metrics.classification_report(livertarget_test, KNN_predictions_2, target_names=target_names))

Accuracy: 65.0 %
---------------
Confusion matrix
---------------
[[24 17]
 [19 44]]
---------------
Classification report              precision    recall  f1-score   support

   disorder       0.56      0.59      0.57        41
 nodisorder       0.72      0.70      0.71        63

avg / total       0.66      0.65      0.66       104
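
The `avg / total` row appears to be a support-weighted average of the per-class scores, so the larger `nodisorder` class counts for more. The sketch below checks this against the confusion matrix printed above. The variable names are hypothetical.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = np.array([[24, 17],
               [19, 44]])

support = cm.sum(axis=1)  # true samples per class: [41 63]
precision = np.diag(cm) / cm.sum(axis=0)
recall = np.diag(cm) / support
f1 = 2 * precision * recall / (precision + recall)

# Each class contributes in proportion to its support.
weights = support / support.sum()
print(round(float(precision @ weights), 2))  # 0.66
print(round(float(recall @ weights), 2))     # 0.65
print(round(float(f1 @ weights), 2))         # 0.66
```

The weighted recall equals the overall accuracy (68 correct out of 104), which is why both are reported as 0.65.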

