#  Classification Using KNN (Without Scaling)

<b> [Breast Cancer Diagnostic] </b>

There are two main classifications of tumors. One is known as benign and the other as malignant. A benign tumor is a tumor that does not invade its surrounding tissue or spread around the body. A malignant tumor is a tumor that may invade its surrounding tissue or spread around the body.

Our goal is to train a KNN classification model that can predict whether a tumor is benign (B) or malignant (M).

Attribute Information:
<br>1) ID number 
<br>2) Diagnosis (M = malignant, B = benign) 
<br>3-32) Ten real-valued features are computed for each cell nucleus: 
<br>a) radius (mean of distances from center to points on the perimeter) 
<br>b) texture (standard deviation of gray-scale values) 
<br>c) perimeter 
<br>d) area 
<br>e) smoothness (local variation in radius lengths) 
<br>f) compactness (perimeter^2 / area - 1.0) 
<br>g) concavity (severity of concave portions of the contour) 
<br>h) concave points (number of concave portions of the contour) 
<br>i) symmetry 
<br>j) fractal dimension ("coastline approximation" - 1)

**`'Diagnosis'`** column is the **Dependent Variable or target column** because we want our algorithm to predict this class.

Columns **`3-32`** are your **Features or Independent Variables**, which will help you predict the Benign/Malignant class. Column 1 (the ID number) only identifies the record, so it is not used as a feature. Changing the value of a feature can change the predicted diagnosis.

## Read the CSV file  ../Data/Breast_Cancer_Diagnostic.csv

In [None]:
# Importing required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Loading the dataset into a pandas DataFrame.
#df = pd.read_csv('../Data/Breast_Cancer_Diagnostic.csv')

# the code is changed to read data from a github url
url='https://raw.githubusercontent.com/sujitcl/code/main/Data/Breast_Cancer_Diagnostic.csv'

df=pd.read_csv(url)


In [None]:
# Retain the 10 features and the target variable.
df = df[['radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean','diagnosis']]

In [None]:
# Check for nulls.
df.columns[df.isnull().any()]

Index([], dtype='object')

## Create the Dataframe of features (X) and the target (Y) variables

In [None]:
# Load the features to a variable X
# X is created by simply dropping the diagnosis column and retaining all others
X = df.drop('diagnosis', axis = 1)

# Load the target variable to y
y = df['diagnosis']

## Split Test Train

**> Train-Test split -** We split the data into two parts, the train set and the test set. A 70-30 split is a common choice, but the ratio is up to you. We then build our function f(x) (the model) using the train set and check how well it does on the test set.
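The split used in this notebook is purely random. As an optional variant (not what the notebook does), `train_test_split` also accepts a `stratify` argument that keeps the class proportions similar in both parts, which can help when one class is rarer than the other. The sketch below uses made-up toy data (`X_toy`, `y_toy`) only to show the call.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Made-up toy data: 20 samples with a 14/6 class imbalance.
X_toy = [[i] for i in range(20)]
y_toy = ["B"] * 14 + ["M"] * 6

# stratify=y_toy asks for roughly the same B/M ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=1, stratify=y_toy
)

print(len(X_tr), len(X_te))
print(Counter(y_tr), Counter(y_te))
```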

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

## Create an Instance of the classifier and train it.

In [None]:
# Let's create an instance for the KNN model and then train it with the training set.
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)

# Train the model using the training sets. This does nothing other than store the points in 
# some internal data structure.
model.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
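To make the idea behind the classifier concrete, here is a minimal from-scratch sketch of a KNN prediction: measure the distance from the query to every stored training point, keep the K closest, and take a majority vote. The function `knn_predict` and the toy points are hypothetical and for illustration only; scikit-learn's implementation is more elaborate (for example, it may use tree structures to speed up the neighbor search).

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training point.
    distances = [(dist(p, query), label) for p, label in zip(train_points, train_labels)]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up two-feature points, for illustration only.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(points, labels, (1.1, 1.0)))  # B
print(knn_predict(points, labels, (5.1, 5.0)))  # M
```

Because the vote depends on raw Euclidean distances, features with large numeric ranges (such as area) can dominate features with small ranges (such as smoothness) when the data is not scaled.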

## Get the Predictions

In [None]:
# Getting predictions from the model 
y_test_hat = model.predict(X_test)

# Compare the predicted values with the actuals.
Results = pd.DataFrame({'Actual': y_test, 'Predictions': y_test_hat})

Results.head(5)

| | Actual | Predictions |
|---|---|---|
| 421 | B | B |
| 47 | M | B |
| 292 | B | B |
| 186 | M | M |
| 414 | M | M |


### The confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix, recall_score, precision_score
 
cm = confusion_matrix(y_test, y_test_hat)
print(cm)

[[98 10]
 [13 50]]
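In the matrix returned by `confusion_matrix`, rows are the actual classes and columns are the predicted classes, with labels in sorted order by default (B first, then M). The sketch below tallies such a matrix by hand for the five rows shown by `Results.head(5)`; the helper `tally_confusion` is hypothetical and only illustrates the layout.

```python
def tally_confusion(actual, predicted, labels):
    """Return a nested list where rows are actual classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# The five rows shown by Results.head(5).
actual = ["B", "M", "B", "M", "M"]
predicted = ["B", "B", "B", "M", "M"]

toy_cm = tally_confusion(actual, predicted, labels=["B", "M"])
print(toy_cm)  # [[2, 0], [1, 2]]
```

The single entry in row M, column B is the malignant case (index 47) that was predicted as benign, that is, a false negative when M is the positive class.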


In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_test_hat))

              precision    recall  f1-score   support

           B       0.88      0.91      0.89       108
           M       0.83      0.79      0.81        63

    accuracy                           0.87       171
   macro avg       0.86      0.85      0.85       171
weighted avg       0.86      0.87      0.86       171



In [None]:
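# Assigning variables for convenience.
# Rows are actual classes and columns are predicted classes, in sorted label
# order (B, M), so malignant (M) is treated as the positive class.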
TN = cm[0][0]
FP = cm[0][1]
FN = cm[1][0]
TP = cm[1][1]

recall = TP / float(FN + TP)
print("recall:", recall)

precision = TP / float(TP + FP)
print("precision:", precision)

specificity = TN / (TN + FP)
print("specificity:", specificity)

recall: 0.7936507936507936
precision: 0.8333333333333334
specificity: 0.9074074074074074
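As a cross-check, the accuracy and the F1 score for class M in the classification report can be recomputed from the same four counts. This is a small sketch that hard-codes the values from the K=3 confusion matrix printed above.

```python
# Values copied from the K=3 confusion matrix printed above.
TN, FP, FN, TP = 98, 10, 13, 50

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.87
print(f"f1 (M):   {f1:.2f}")        # f1 (M):   0.81
```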


### Repeat of KNN with K=7

In [None]:
model = KNeighborsClassifier(n_neighbors=7)

# Train the model using the training sets. 
model.fit(X_train, y_train)

# Getting predictions from the model 
y_test_hat = model.predict(X_test)

cm = confusion_matrix(y_test, y_test_hat)
print(cm)



[[100   8]
 [ 14  49]]


In [None]:

print(classification_report(y_test, y_test_hat))

              precision    recall  f1-score   support

           B       0.88      0.93      0.90       108
           M       0.86      0.78      0.82        63

    accuracy                           0.87       171
   macro avg       0.87      0.85      0.86       171
weighted avg       0.87      0.87      0.87       171



In [None]:
# Assigning variables for convenience (M is the positive class, as before).
TN = cm[0][0]
FP = cm[0][1]
FN = cm[1][0]
TP = cm[1][1]

recall = TP / float(FN + TP)
print("recall:", recall)

precision = TP / float(TP + FP)
print("precision:", precision)

specificity = TN / (TN + FP)
print("specificity:", specificity)

recall: 0.7777777777777778
precision: 0.8596491228070176
specificity: 0.9259259259259259
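Comparing K=3 and K=7 by hand generalizes naturally to a loop over several values of K. The sketch below uses synthetic data from `make_classification` as a stand-in so that it runs on its own; this data is an assumption for the example, not the cancer dataset. In this notebook you would use `X_train`, `X_test`, `y_train`, and `y_test` instead and pass `pos_label='M'` to `recall_score`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption for this sketch, not the cancer dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=1)

recall_by_k = {}
for k in (3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # Recall of the positive class (label 1 here; 'M' in the notebook).
    recall_by_k[k] = recall_score(y_te, knn.predict(X_te))

for k, value in recall_by_k.items():
    print(f"K={k}: recall={value:.3f}")
```

Note that choosing K by its score on the test set tends to give an optimistic estimate; a separate validation set or cross-validation is the usual way to pick K.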
