**KNN Classifier**

The k-nearest neighbors (k-NN) algorithm is a non-parametric supervised machine learning algorithm used for classification and regression. In the context of text classification, an object is assigned the class that wins a majority vote among its k nearest neighbors in the training data.
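To make the voting rule concrete, here is a minimal from-scratch sketch of k-NN classification. It is not how sklearn implements the algorithm; the function name `knn_predict` and the toy points are hypothetical, and Euclidean distance is assumed.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors wins the vote
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in 2-D
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ['A', 'A', 'A', 'B', 'B', 'B']

prediction = knn_predict(points, labels, query=(4, 4), k=3)
print(prediction)  # B
```

The query (4, 4) lies next to the second cluster, so all three nearest neighbors carry the label B. An odd k is a common choice for two-class problems because it avoids tied votes.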

Here is sample code for training, testing, and evaluating a KNN classification model using sklearn.

In [15]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

Get labeled data

In [16]:
# First Feature
weather = ['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
# Second Feature
temp = ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
# Label or target variable
play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

Encode the categorical data as integers

In [17]:
le = preprocessing.LabelEncoder()

weather_encoded = le.fit_transform(weather)
print('Weather Encoded', weather_encoded)

temp_encoded = le.fit_transform(temp)
print('Temperature Encoded', temp_encoded)

label = le.fit_transform(play)
print('Label Encoded', label)

Weather Encoded [2 2 0 1 1 1 0 2 2 1 2 0 0 1]
Temperature Encoded [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
Label Encoded [0 0 1 1 1 0 1 0 1 1 1 1 1 0]
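The integer codes above follow the sorted order of the original strings. A small sketch of how the mapping can be inspected and reversed is shown below; it assumes a separate encoder per column (the name `weather_le` is hypothetical), because an encoder that is refit on another column only remembers the classes from its last fit.

```python
from sklearn import preprocessing

# Shortened sample of the weather column above
weather = ['Sunny', 'Overcast', 'Rainy', 'Sunny']

# A dedicated encoder for this column (hypothetical name)
weather_le = preprocessing.LabelEncoder()
weather_codes = weather_le.fit_transform(weather)

# classes_ holds the original labels in sorted order; the index is the code
mapping = {str(name): code for code, name in enumerate(weather_le.classes_)}
print(mapping)  # {'Overcast': 0, 'Rainy': 1, 'Sunny': 2}

# inverse_transform turns the codes back into the original strings
decoded = [str(name) for name in weather_le.inverse_transform(weather_codes)]
print(decoded)  # ['Sunny', 'Overcast', 'Rainy', 'Sunny']
```

This matches the output above, where Sunny is encoded as 2, Overcast as 0, and Rainy as 1.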


Create feature set

In [18]:
features = list(zip(weather_encoded, temp_encoded))
print('Features', features)

Features [(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]


Split the data into training and test sets

In [23]:
# Split into train and test sets at a 70:30 ratio (no random_state is set, so the split differs between runs)
X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.3)
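Because the split above is random, the predictions and accuracy change from run to run. The sketch below shows one way to make the split repeatable; the seed value 42 and the use of `stratify` are assumptions added for illustration and are not part of the original notebook.

```python
from sklearn.model_selection import train_test_split

# Encoded features and labels copied from the outputs above
features = [(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0),
            (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]
label = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

# random_state=42 is an arbitrary seed; stratify keeps the class ratio
# similar in the train and test parts
split_a = train_test_split(features, label, test_size=0.3,
                           random_state=42, stratify=label)
split_b = train_test_split(features, label, test_size=0.3,
                           random_state=42, stratify=label)

X_train, X_test, y_train, y_test = split_a
print(len(X_train), len(X_test))  # 9 5
print(split_a == split_b)         # True: same seed, same split
```

With 14 samples and `test_size=0.3`, sklearn rounds the test set up to 5 samples, which is why five predictions appear in the test output.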

Train the classifier

In [20]:
# Use k = 3 with the default Minkowski distance metric (p=2, i.e. Euclidean distance)
model = KNeighborsClassifier(n_neighbors=3)
# Training the classifier
model.fit(X_train, y_train)

Test the classifier

In [21]:
# Testing the classifier
y_pred = model.predict(X_test)
print('Predicted', y_pred)
print('Actual data', y_test)

Predicted [1 1 1 0 1]
Actual data [0 0 1 1 1]


Evaluate

In [22]:
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy', accuracy)

Accuracy 0.4
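The reported accuracy can be checked by hand: it is the fraction of predictions that match the actual labels. The sketch below recomputes it from the predicted and actual values printed above; the variable names `matches` and `manual_accuracy` are introduced only for this illustration.

```python
from sklearn.metrics import accuracy_score

# Values copied from the test output above
y_pred = [1, 1, 1, 0, 1]
y_test = [0, 0, 1, 1, 1]

# Count positions where the prediction equals the actual label
matches = sum(p == t for p, t in zip(y_pred, y_test))
manual_accuracy = matches / len(y_test)
print(matches, manual_accuracy)  # 2 0.4

# accuracy_score computes the same ratio
sklearn_accuracy = accuracy_score(y_test, y_pred)
```

Only the third and fifth predictions are correct, so the accuracy is 2 / 5 = 0.4. With a test set of five samples, a single prediction changes the score by 0.2, so this figure says little about the model's real quality.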


**Requirements**

1. Using the sklearn library, perform classification on the Iris dataset.

2. Split the samples into 70% for training and 30% for validation.

3. Using standard functions, compute the F1-score and accuracy of the model on both the training and validation sets.
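One possible way to approach these requirements is sketched below. It is a starting point under stated assumptions, not a reference solution: the choice of k = 3, the seed 42, the stratified split, and macro-averaged F1 are all assumptions, since the requirements do not fix them.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris features (150 x 4) and class labels (3 classes)
X, y = load_iris(return_X_y=True)

# 70% training, 30% validation; seed and stratification are assumptions
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# k = 3 is an assumed value, mirroring the example above
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Score the model on both parts of the data
scores = {}
for name, X_part, y_part in [('training', X_train, y_train),
                             ('validation', X_val, y_val)]:
    y_pred = model.predict(X_part)
    scores[name] = {
        'accuracy': accuracy_score(y_part, y_pred),
        # Iris has three classes, so f1_score needs an averaging mode
        'f1_macro': f1_score(y_part, y_pred, average='macro'),
    }
    print(name, scores[name])
```

Comparing the training and validation scores gives a rough indication of overfitting: a training score far above the validation score suggests that k is too small for the data.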