Experiment No: 5
Objective: Implement the K-Nearest Neighbors (KNN) algorithm on the breast cancer dataset. The notebook includes:
1. Importing the necessary libraries (pandas, NumPy, and scikit-learn).
2. Loading the breast cancer dataset from scikit-learn.
3. Displaying the feature names and target names in the dataset.
4. Splitting the data into training and test sets.
5. Implementing KNN from scratch.
6. Implementing KNN with scikit-learn's built-in KNeighborsClassifier and comparing the accuracy of the two.

In [1]:
import pandas as pd
import numpy as np

In [2]:
from sklearn.datasets import load_breast_cancer
ds = load_breast_cancer()

In [5]:
print(f'Feature names in the dataset are:\n{ds.feature_names}')

Feature names in the dataset are:
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']


In [7]:
print(f'Target names in the dataset are:\n{ds.target_names}')

Target names in the dataset are:
['malignant' 'benign']


In [14]:
print(f'First five feature columns of the dataset:\n{ds.data[:,0:5]}')

First five feature columns of the dataset:
[[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01]
 [2.057e+01 1.777e+01 1.329e+02 1.326e+03 8.474e-02]
 [1.969e+01 2.125e+01 1.300e+02 1.203e+03 1.096e-01]
 ...
 [1.660e+01 2.808e+01 1.083e+02 8.581e+02 8.455e-02]
 [2.060e+01 2.933e+01 1.401e+02 1.265e+03 1.178e-01]
 [7.760e+00 2.454e+01 4.792e+01 1.810e+02 5.263e-02]]


In [13]:
print(f'First five target values in the dataset:\n{ds.target[0:5]}')

First five target values in the dataset:
[0 0 0 0 0]


In [49]:
from sklearn.model_selection import train_test_split

In [50]:
X_train, X_test, y_train, y_test = train_test_split(ds.data, ds.target, random_state=42, test_size=0.2)
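
The split above holds out 20% of the rows for testing. The idea can be sketched with NumPy alone, as below. This is only an illustration of a shuffled hold-out split, not scikit-learn's implementation: holdout_split and its seed parameter are hypothetical names, the shuffle order will differ from the one produced with random_state=42, and the 569-row count assumes the standard scikit-learn copy of the dataset.

```python
import math

import numpy as np


def holdout_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle the row indices, then cut off a test set."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Round the test-set size up so a small dataset still gets a test row.
    n_test = math.ceil(test_size * len(X))
    # Seeded generator so the same split is reproduced on every run.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Stand-in data with the same shape as ds.data (569 rows, 30 features).
demo_X = np.arange(569 * 30, dtype=float).reshape(569, 30)
demo_y = np.arange(569) % 2
demo_split = holdout_split(demo_X, demo_y)
print([part.shape for part in demo_split])
```

Under these assumptions the sketch yields 455 training rows and 114 test rows, which is consistent with an accuracy whose denominator is 114.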

In [63]:
X_train[0,:],X_test[0,:]

(array([9.029e+00, 1.733e+01, 5.879e+01, 2.505e+02, 1.066e-01, 1.413e-01,
        3.130e-01, 4.375e-02, 2.111e-01, 8.046e-02, 3.274e-01, 1.194e+00,
        1.885e+00, 1.767e+01, 9.549e-03, 8.606e-02, 3.038e-01, 3.322e-02,
        4.197e-02, 9.559e-03, 1.031e+01, 2.265e+01, 6.550e+01, 3.247e+02,
        1.482e-01, 4.365e-01, 1.252e+00, 1.750e-01, 4.228e-01, 1.175e-01]),
 array([1.247e+01, 1.860e+01, 8.109e+01, 4.819e+02, 9.965e-02, 1.058e-01,
        8.005e-02, 3.821e-02, 1.925e-01, 6.373e-02, 3.961e-01, 1.044e+00,
        2.497e+00, 3.029e+01, 6.953e-03, 1.911e-02, 2.701e-02, 1.037e-02,
        1.782e-02, 3.586e-03, 1.497e+01, 2.464e+01, 9.605e+01, 6.779e+02,
        1.426e-01, 2.378e-01, 2.671e-01, 1.015e-01, 3.014e-01, 8.750e-02]))

In [51]:
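from collections import Counter

def predict_one(x_train, y_train, x_test, k):
    # Squared Euclidean distance from x_test to every training row.
    # Skipping the square root does not change the neighbour ranking.
    distances = []
    for i in range(len(x_train)):
        distance = ((x_train[i, :] - x_test) ** 2).sum()
        distances.append([distance, i])
    distances = sorted(distances)

    # Collect the labels of the k nearest training rows.
    targets = []
    for i in range(k):
        index = distances[i][1]
        targets.append(y_train[index])
    # Majority vote among the k labels.
    return Counter(targets).most_common(1)[0][0]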
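
The last line of predict_one is the voting step. A minimal sketch of that step in isolation is given below; majority_label is a hypothetical name used only for this illustration. It also shows why an odd k is convenient for a two-class problem: the vote cannot tie.

```python
from collections import Counter


def majority_label(labels):
    """Hypothetical helper isolating the voting step of predict_one."""
    # most_common(1) returns a list holding one (label, count) pair.
    return Counter(labels).most_common(1)[0][0]


# Seven neighbours, two classes: one class must hold at least four votes.
print(majority_label([1, 0, 1, 1, 0, 1, 0]))  # 1

# With an even k the vote can tie; on Python 3.7+ the tied label that was
# seen first (here, the label of the nearest neighbour) is returned.
print(majority_label([0, 1, 1, 0]))  # 0
```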

In [52]:
def predict(x_train,y_train,x_test_data,k):
    predictions = []
    for x_test in x_test_data:
        predictions.append(predict_one(x_train,y_train,x_test,k))
    return predictions
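
The two loops above compute one distance at a time. Assuming the distance matrix fits in memory, the same idea can be sketched with NumPy broadcasting, as below. predict_vectorized is a hypothetical name, and the voting uses np.bincount, which assumes non-negative integer labels and, on a tie, picks the smallest label rather than the first one seen. With two classes and an odd k no tie can occur, so that difference does not arise here.

```python
import numpy as np


def predict_vectorized(x_train, y_train, x_test_data, k):
    """Hypothetical NumPy variant of predict(); assumes integer labels >= 0."""
    x_train = np.asarray(x_train, dtype=float)
    x_test_data = np.asarray(x_test_data, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distance between every test row and every training
    # row, shape (n_test, n_train), built by broadcasting.
    diff = x_test_data[:, None, :] - x_train[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    # Indices of the k closest training rows for each test row.
    nearest = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    # Majority vote over the neighbours' labels.
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


# Toy data: two well-separated clusters, labelled 0 and 1.
toy_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
toy_y = [0, 0, 0, 1, 1, 1]
print(predict_vectorized(toy_X, toy_y, [[0.2, 0.1], [5.5, 5.2]], k=3))
```

On the toy data the first query point falls in the cluster labelled 0 and the second in the cluster labelled 1.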

In [53]:
predictions = predict(X_train, y_train, X_test, 7)

In [54]:
from sklearn.metrics import accuracy_score

In [55]:
accuracy_score(y_test,predictions)

0.956140350877193
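
The accuracy is the fraction of test rows whose predicted label equals the true label. A minimal sketch of that calculation follows; accuracy is a hypothetical helper, not the scikit-learn function. The 109-out-of-114 figure is inferred from the score above and assumes the test set holds 114 rows (20% of the standard 569-row dataset, rounded up).

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of predictions equal to the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Toy check: three of the four labels agree.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75

# 109 correct predictions out of 114 reproduce the score reported above.
print(109 / 114)
```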

In [56]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=7)

In [57]:
model.fit(X_train,y_train)

In [58]:
predictions = model.predict(X_test)

In [59]:
accuracy_score(y_test,predictions)

0.956140350877193