<a href="https://www.kaggle.com/code/swapnilshivpuje/knn-breast-cancer-dataset-0-1?scriptVersionId=143381116" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Breast Cancer Wisconsin (Diagnostic) Data Set
##### Predict whether the cancer is benign or malignant

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
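
As a quick sanity check on formula (f), the sketch below evaluates compactness for a perfect circle, where the value should come out as 4π − 1 whatever the radius. The function name `compactness` is made up for this illustration and is not part of the dataset or of any library.

```python
import math

def compactness(perimeter: float, area: float) -> float:
    """Compactness as defined in the attribute list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0

# By the isoperimetric inequality a circle gives the smallest possible value.
r = 5.0
circle_value = compactness(2 * math.pi * r, math.pi * r ** 2)
print(round(circle_value, 3))  # 4*pi - 1, about 11.566

# A unit square for comparison: perimeter 4, area 1.
square_value = compactness(4.0, 1.0)
print(square_value)  # 15.0
```

Less regular outlines push the value up, which is presumably why the feature is useful for describing irregular nuclei.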

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
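
The field numbering can be reproduced with a short loop. This is only a sketch: the labels below are illustrative and are not the exact column names of the CSV file.

```python
# The ten base measurements, in the order listed above.
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
statistics_order = ["mean", "SE", "worst"]

# Fields 1 and 2 are the ID and the diagnosis, so features start at field 3.
field_names = {1: "ID", 2: "diagnosis"}
field_no = 3
for stat in statistics_order:
    for feature in base_features:
        field_names[field_no] = f"{feature} ({stat})"
        field_no += 1

for n in (3, 13, 23):
    print(n, field_names[n])
```

The loop ends at field 32, which matches the 2 + 30 fields described above.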

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
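
Because the classes are not balanced, it helps to know what accuracy a trivial model would reach. The sketch below computes the accuracy of always predicting the majority class, using only the counts stated above; the variable names are my own.

```python
benign, malignant = 357, 212
total = benign + malignant

# Accuracy of a classifier that always predicts the majority class (benign).
baseline_accuracy = benign / total
print(total, round(baseline_accuracy, 3))  # 569 0.627
```

Any model trained on the full data set should clearly beat this figure to be considered useful.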

## 1. Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 2. Reading CSV File

In [None]:
df = pd.read_csv('/kaggle/input/breast-cancer-wisconsin-data/data.csv')
df.head()

## 3. Data Analysis

In [None]:
df.isna().sum()

In [None]:
df.describe()

In [None]:
# corr must be called; numeric_only skips the text column 'diagnosis'
df.corr(numeric_only=True)

In [None]:
temp = df.drop(columns=['diagnosis'])

In [None]:
correlation_matrix = temp.corr()
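
By default `DataFrame.corr()` computes the Pearson correlation for each pair of columns. The from-scratch sketch below shows what a single entry of the matrix means; the helper `pearson` and the toy values are my own and are not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy radii and the matching circle perimeters (a perfectly linear relation).
radii = [1.0, 2.0, 3.0, 4.0]
perimeters = [2 * math.pi * r for r in radii]
print(round(pearson(radii, perimeters), 6))  # 1.0
```

Geometrically related features such as radius and perimeter can therefore be expected to show up as strongly correlated in the plots that follow.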

In [None]:
plt.figure(figsize=(10, 8))  # Adjust the figure size as needed
plt.imshow(correlation_matrix, cmap='coolwarm', interpolation='nearest')
plt.colorbar(label='Correlation Coefficient')
plt.title('Correlation Matrix')
plt.xticks(range(len(correlation_matrix.columns)), correlation_matrix.columns, rotation=90)
plt.yticks(range(len(correlation_matrix.columns)), correlation_matrix.columns)
plt.show()


In [None]:
plt.figure(figsize=(20, 14))  # Adjust the figure size as needed
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

In [None]:
df['diagnosis'].value_counts().plot(kind="pie", autopct="%1.1f%%")

In [None]:
print("We have", df.diagnosis.nunique(), "output classes to predict, namely M and B")

## 4. Data Preprocessing

In [None]:
# 'id' and 'Unnamed: 32' are not useful for training the model, so we drop them
df.drop(columns = ['id','Unnamed: 32'],inplace=True)
df.head()

In [None]:
X = df.iloc[:,1:].values
y = df.iloc[:,0].values

In [None]:
y.shape

In [None]:
X.shape

## 5. Splitting Dataset

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)
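
To make the effect of `test_size=0.2` concrete, here is a minimal sketch of a seeded shuffle split written with the standard library only. It is not scikit-learn's implementation, so the selected rows will differ from those picked with `random_state=2`; the helper name `shuffle_split` is hypothetical. The test share is rounded up, which to my understanding is also what `train_test_split` does.

```python
import math
import random

def shuffle_split(n_samples, test_size=0.2, seed=2):
    """Return (train_idx, test_idx) after a seeded shuffle of row indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_size)  # test share is rounded up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffle_split(569)
print(len(train_idx), len(test_idx))  # 455 114
```

With 569 rows this leaves 455 samples for training and 114 for testing. Since the classes are imbalanced, passing `stratify=y` to `train_test_split` may be worth considering so that both parts keep a similar class ratio.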

## 6. Scaler

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
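
The cell above fits the scaler on the training data and only transforms the test data. The sketch below illustrates that idea for a single feature with plain Python: the mean and (population) standard deviation are learned from the training values and then reused for the test values. The helper names and the toy numbers are assumptions for illustration.

```python
from statistics import mean, pstdev

def fit_standardizer(column):
    """Learn the mean and population standard deviation of one feature."""
    return mean(column), pstdev(column)

def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma using previously learned statistics."""
    return [(x - mu) / sigma for x in column]

# Toy values for a single feature (not taken from the dataset).
toy_train = [10.0, 12.0, 14.0, 16.0, 18.0]
toy_test = [14.0, 20.0]

mu, sigma = fit_standardizer(toy_train)          # statistics from training data only
train_scaled = standardize(toy_train, mu, sigma)
test_scaled = standardize(toy_test, mu, sigma)   # reused, never refit on test data
print(mu, round(sigma, 3), [round(z, 3) for z in test_scaled])
```

Scaling matters for KNN in particular because the distance computation would otherwise be dominated by features with large numeric ranges, such as area.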

## 7. Implementing KNN Model

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train,y_train)
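
For intuition, this is roughly what a KNN classifier with uniform weights and Euclidean distance does at prediction time. The sketch is a from-scratch illustration, not scikit-learn's implementation (which also uses faster neighbor search structures); `knn_predict` and the toy points are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy, already-scaled 2-D points (not taken from the dataset).
toy_points = [(-1.0, -1.2), (-0.8, -0.9), (-1.1, -0.7),
              (1.0, 1.1), (0.9, 1.3), (1.2, 0.8)]
toy_labels = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(toy_points, toy_labels, (0.7, 0.9)))    # M
print(knn_predict(toy_points, toy_labels, (-0.9, -1.0)))  # B
```

An odd `k` such as 3 avoids tied votes when there are only two classes.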

## 8. Prediction

In [None]:
y_pred = knn.predict(X_test)

## 9. Finding the Best K Value

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
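
Accuracy is the share of correct predictions, but for a diagnosis task it can hide how many malignant cases were missed. The sketch below computes both numbers by hand. The labels are hypothetical and are not results from this notebook.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for illustration only.
toy_actual    = ["M", "M", "M", "B", "B", "B", "B", "B"]
toy_predicted = ["M", "M", "B", "B", "B", "B", "B", "M"]

counts = confusion_counts(toy_actual, toy_predicted)
accuracy = (counts["M", "M"] + counts["B", "B"]) / len(toy_actual)
missed_malignant = counts["M", "B"]  # malignant cases predicted as benign

print(accuracy, missed_malignant)  # 0.75 1
```

In scikit-learn, `confusion_matrix` and `classification_report` from `sklearn.metrics` provide the same breakdown for `y_test` and `y_pred`.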

In [None]:
scores = []

for i in range(1,16):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    y_pred = knn.predict(X_test)
    scores.append(accuracy_score(y_test,y_pred))

In [None]:
scores

In [None]:
plt.plot(range(1,16),scores)
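
Instead of reading the best value off the plot, it can be picked programmatically. The sketch below uses a short list of hypothetical accuracies, not the values produced by this notebook; with the real `scores` list the same expression applies over `range(1, 16)`.

```python
# Hypothetical accuracies for k = 1..5, for illustration only.
toy_scores = [0.93, 0.95, 0.97, 0.96, 0.97]
k_values = range(1, len(toy_scores) + 1)

# max() returns the first maximum, so ties resolve to the smallest k.
best_k = max(k_values, key=lambda k: toy_scores[k - 1])
print(best_k, toy_scores[best_k - 1])  # 3 0.97
```

Note that choosing `k` by test-set accuracy makes the reported accuracy optimistic; a separate validation split or cross-validation (for example `cross_val_score` on the training data) is the usual way around that.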

## 10. Model Accuracy Score

In [None]:
# Note: y_pred holds the predictions from the last loop iteration above (k=15)
data = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
data

## -DONE- 
*If you have any suggestions or changes, let me know in the comments.*