# KNN

Importing required Python modules
---------------------------------

In [None]:
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn import metrics
from sklearn.preprocessing import normalize, scale
import numpy as np
import pandas as pd

The following libraries are used:
* **Pandas**: an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
* **NumPy**: the fundamental package for scientific computing with Python.
* **Matplotlib**: a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments.
* **Sklearn** (scikit-learn): a machine learning library that features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

Retrieving the dataset
----------------------

In [None]:
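# Load the dataset; the file has no header row, so columns get integer labels
data = pd.read_csv('heart.csv', header=None)

# read_csv already returns a DataFrame, so this wrap does not change the data
df = pd.DataFrame(data)

# Features: take the first five columns, then drop columns 1 and 2,
# which leaves original columns 0, 3 and 4
x = df.iloc[:, 0:5]
x = x.drop(x.columns[1:3], axis=1)
# Standardize each feature to zero mean and unit variance;
# the new DataFrame relabels the columns as 0, 1 and 2
x = pd.DataFrame(scale(x))

# Labels: column 13, shifted down by one
y = df.iloc[:, 13]
y = y - 1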

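1. The dataset is read from `heart.csv`.
2. The imported dataset is wrapped in a pandas DataFrame (`pd.read_csv` already returns one, so this step is redundant).
3. Attributes (`x`) are taken from columns 0, 3 and 4 and standardized; labels (`y`) are taken from column 13 and shifted down by one.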
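
The `scale` call above standardizes each feature column to zero mean and unit variance. As a minimal from-scratch sketch of that transformation (the helper `standardize` and the sample values are hypothetical, and this is not scikit-learn's implementation), one column could be rescaled like this:

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale a list of numbers to zero mean and unit population standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)
    # Assumption: a constant column is only centered, to avoid dividing by zero.
    if sigma == 0:
        return [value - mu for value in column]
    return [(value - mu) / sigma for value in column]


# Hypothetical feature values, not taken from heart.csv.
ages = [40.0, 50.0, 60.0, 70.0]
scaled_ages = standardize(ages)
print(scaled_ages)
```

Standardizing matters for nearest-neighbour methods because distances are otherwise dominated by whichever feature has the largest numeric range.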

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4)

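The data is split so that 40% of the samples are held out for testing (`test_size=0.4`) and the remaining 60% are used for training.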
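
To illustrate what a 0.4 hold-out split does, here is a rough sketch using only the standard library. The function `split_indices` and its `seed` parameter are hypothetical names chosen for this example, and the shuffling is a simplification rather than scikit-learn's actual procedure:

```python
import math
import random


def split_indices(n_samples, test_size=0.4, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    # A seeded generator makes the split repeatable from run to run.
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_samples)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10, test_size=0.4)
print(len(train_idx), len(test_idx))
```

In the notebook itself, passing `random_state` to `train_test_split` serves the same purpose as `seed` here: without it, each run produces a different split and a different testing accuracy.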

Plotting the dataset
--------------------

In [None]:
fig = plt.figure()
# Left panel of a 1x2 grid: second and third scaled features, colored by true label
ax1 = fig.add_subplot(1, 2, 1)
ax1.scatter(x[1], x[2], c=y)
ax1.set_title("Original Data")

Matplotlib is used to plot the second and third scaled features (original columns 3 and 4) against each other, with each point colored by its true label.

Learning from the data
----------------------

In [None]:
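model = KNeighborsClassifier(n_neighbors=5)

# 10-fold cross-validated accuracy on the full dataset
scores = cross_val_score(model, x, y, scoring='accuracy', cv=10)
print("10-Fold Accuracy : ", scores.mean()*100)

# Fit on the training split and score on the held-out test split
model.fit(x_train, y_train)
print("Testing Accuracy : ", model.score(x_test, y_test)*100)
# Predict a label for every sample, training and test alike
predicted = model.predict(x)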


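Here **model** is an instance of the `KNeighborsClassifier` class from `sklearn.neighbors`, configured to vote among the 5 nearest neighbours. 10-fold cross-validation on the full dataset is used to verify the accuracy measured on the held-out test split.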
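
For intuition, the sketch below shows a k-nearest-neighbours prediction and a k-fold accuracy estimate written from scratch. It assumes Euclidean distance and an unweighted majority vote (the `KNeighborsClassifier` defaults) and uses plain contiguous folds, whereas scikit-learn stratifies the folds for classifiers. The helper names and the toy points are hypothetical:

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def k_fold_accuracy(xs, ys, n_folds, k):
    """Mean accuracy over contiguous, unshuffled folds (a simplification)."""
    fold_size = len(xs) // n_folds
    accuracies = []
    for fold in range(n_folds):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test_x, test_y = xs[start:stop], ys[start:stop]
        # Everything outside the current fold is used for training.
        rest_x, rest_y = xs[:start] + xs[stop:], ys[:start] + ys[stop:]
        hits = sum(
            knn_predict(rest_x, rest_y, q, k) == t for q, t in zip(test_x, test_y)
        )
        accuracies.append(hits / len(test_x))
    return sum(accuracies) / n_folds


# Hypothetical 2-D points forming two well-separated groups.
toy_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_x, toy_y, (0.05, 0.1), k=3))
print(knn_predict(toy_x, toy_y, (1.0, 0.95), k=3))
print(k_fold_accuracy(toy_x, toy_y, n_folds=3, k=1))
```

Each fold is scored by a model that never saw that fold's samples, which is why the cross-validated figure is a fairer estimate than accuracy measured on the training data.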

In [None]:
# Right panel: the same two features, colored by the predicted label
ax2 = fig.add_subplot(1, 2, 2)
ax2.scatter(x[1], x[2], c=predicted)
ax2.set_title("KNearestNeighbours")

The same two features are plotted again, this time colored by the labels the model predicts.

In [None]:
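# Confusion matrix over all samples, expressed as fractions of the dataset
cm = metrics.confusion_matrix(y, predicted)
print(cm / len(y))
# Per-class precision, recall and F1-score
print(metrics.classification_report(y, predicted))

# Display both scatter plots
plt.show()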

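The confusion matrix, normalized by the number of samples, is computed to evaluate the accuracy of the classification, and a text report shows the main classification metrics (precision, recall and F1-score). Because `predicted` covers the whole dataset, including the samples the model was trained on, these figures are optimistic compared with the testing accuracy.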
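
As a rough sketch of what these metrics mean, the confusion matrix and the precision and recall derived from it can be computed by hand. The functions `count_confusions` and `precision_recall` and the labels below are hypothetical; the row/column layout (rows are true labels, columns are predicted labels) follows the convention of `metrics.confusion_matrix`:

```python
def count_confusions(y_true, y_pred, n_classes=2):
    """Count matrix with rows indexed by true label and columns by predicted label."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def precision_recall(matrix, label):
    """Precision and recall for one class, read off the confusion matrix."""
    true_pos = matrix[label][label]
    predicted_pos = sum(row[label] for row in matrix)  # column sum
    actual_pos = sum(matrix[label])                    # row sum
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical true and predicted labels, not results from heart.csv.
true_labels = [0, 0, 0, 1, 1, 1, 1, 0]
predicted_labels = [0, 0, 1, 1, 1, 0, 1, 0]

counts = count_confusions(true_labels, predicted_labels)
fractions = [[c / len(true_labels) for c in row] for row in counts]
print(counts)
print(fractions)
print(precision_recall(counts, label=1))
```

The diagonal of the matrix holds the correctly classified samples, so the sum of the diagonal of `fractions` is the overall accuracy.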