---

<h1 style="text-align: center;font-size: 40px;">Breast Cancer Prediction</h1>
<h1 style="text-align: center;font-size: 30px;">(Malignant or Benign)</h1>

---

><h3>Dataset Information:</h3>

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
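
The 30 feature columns follow a regular naming pattern: each of the ten base measurements appears once per statistic. A minimal sketch of how the names can be generated, assuming the `_mean`, `_se` and `_worst` suffixes used in the Kaggle CSV (the exact spelling of some base names, such as `concave points`, is an assumption):

```python
# Ten base measurements computed for each cell nucleus (spelling assumed
# to match the column prefixes in the Kaggle CSV).
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Three statistics are reported per measurement.
stats = ["mean", "se", "worst"]

# Fields 3-12 hold the means, 13-22 the standard errors, 23-32 the "worst" values.
feature_columns = [f"{name}_{stat}" for stat in stats for name in base_features]

print(len(feature_columns))
print(feature_columns[0], feature_columns[10], feature_columns[20])
```

Positions 0, 10 and 20 of the list correspond to fields 3, 13 and 23 in the description above.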

>## Import necessary Libraries & read the Data

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.express as px
import plotly.io as pio

pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)
%matplotlib inline

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [None]:
df = pd.read_csv('/kaggle/input/breast-cancer-wisconsin-data/data.csv')
df.head()

In [None]:
df.info()

> Checking whether there are any null values

In [None]:
n = msno.bar(df,color='gold')

In [None]:
df.isnull().sum()

>## Dropping unnecessary Columns

In [None]:
df.drop(['id','Unnamed: 32'],axis=1,inplace=True)
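
Before dropping a column it is worth confirming that it really carries no information. The sketch below uses a small toy frame (illustrative values, not the real file) and assumes that `Unnamed: 32` is entirely empty, as the null check suggests:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (illustrative values only).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "radius_mean": [17.99, 20.57, 19.69],
    "Unnamed: 32": [np.nan, np.nan, np.nan],
})

# Columns in which every value is missing.
empty_cols = [c for c in toy.columns if toy[c].isnull().all()]
print(empty_cols)

# Drop the identifier and any all-empty columns without mutating `toy`.
cleaned = toy.drop(columns=["id"] + empty_cols)
print(cleaned.columns.tolist())
```

Returning a new frame instead of using `inplace=True` keeps the cell safe to re-run.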

> ## Visualizations

In [None]:
plt.rcParams['figure.figsize']=(10,8)
plt.style.use("classic")
color = ['yellowgreen','gold']
df['diagnosis'].value_counts().plot.bar(color=color)

In [None]:
plt.rcParams['figure.figsize']=(10,8)
plt.style.use("classic")
color = ['yellowgreen','gold']
counts = df['diagnosis'].value_counts()
# value_counts() sorts by frequency, so take the legend labels from its index
labels = [{'M':'Malignant','B':'Benign'}[c] for c in counts.index]
counts.plot.pie(colors=color,explode=(0,0.08),startangle=50,shadow=True,autopct="%0.1f%%")
plt.legend(labels,loc='best')
plt.axis('on');

In [None]:
pio.templates.default = 'plotly_dark'
def create_hist(xval,color):
    fig = px.histogram(df,x=xval,color=color,title=xval,color_discrete_sequence = ['yellowgreen','gold'],width=600,height=300)
    fig.show()

In [None]:
create_hist('radius_mean','diagnosis')
create_hist('texture_mean','diagnosis')
create_hist('perimeter_mean','diagnosis')
create_hist('area_mean','diagnosis')
create_hist('smoothness_mean','diagnosis')


In [None]:
pio.templates.default = 'plotly_dark'
def create_scatter(xval,yval):
    fig = px.scatter(df,x=xval,y=yval,color='diagnosis',title =xval +" "+"vs"+" "+ yval, color_discrete_sequence = ['yellowgreen','gold'],width=600,height=300)
    fig.show()
    
create_scatter('radius_mean','texture_mean')
create_scatter('texture_mean','perimeter_mean')
create_scatter('perimeter_mean','area_mean')
create_scatter('area_mean','smoothness_mean')
create_scatter('smoothness_mean','compactness_mean')


>Except for texture_mean vs. perimeter_mean, the plotted feature pairs appear positively correlated with each other. As the feature values increase, the diagnosis tends to shift from benign to malignant.
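
Part of this positive correlation is geometric: for a roughly circular nucleus, perimeter and area are both functions of the radius. A small synthetic sketch (random radii, not the real data) illustrates how strongly such derived quantities correlate:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic radii; the range is an arbitrary choice for illustration.
radius = rng.uniform(5.0, 25.0, size=500)
perimeter = 2 * np.pi * radius        # circle perimeter
area = np.pi * radius ** 2            # circle area

# Pearson correlation matrix of the three quantities (rows are variables).
corr = np.corrcoef([radius, perimeter, area])
print(np.round(corr, 3))
```

On the real data, the same check would be `df[['radius_mean','perimeter_mean','area_mean']].corr()`.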

>Worst Cases

In [None]:
create_scatter('radius_worst','texture_worst')
create_scatter('texture_worst','perimeter_worst')
create_scatter('perimeter_worst','area_worst')
create_scatter('area_worst','smoothness_worst')
create_scatter('smoothness_worst','compactness_worst')


>Comparing 3 variables

In [None]:
pio.templates.default = 'plotly_dark'
def create_3dscatter(xval,yval,zval):
    fig = px.scatter_3d(df,x=xval,y=yval,z=zval,color='diagnosis',title =xval +" "+"vs"+" "+ yval+" "+"vs"+" "+ zval, color_discrete_sequence = ['yellowgreen','gold','lightcoral'])
    fig.show()
    
create_3dscatter('radius_worst','texture_worst','perimeter_worst')
create_3dscatter('area_worst','smoothness_worst','compactness_worst')

>Splitting the dataset into train and test sets

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
x = df.drop('diagnosis',axis=1)
y = df['diagnosis']
le = LabelEncoder()
y = le.fit_transform(y)
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=0,stratify=y)

In [None]:
x_train.shape,x_test.shape
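
With 569 samples and `test_size=0.2`, the split yields 455 training and 114 test rows, and `stratify=y` keeps the benign/malignant ratio nearly identical in both parts. A sketch with placeholder features and labels that match the stated class distribution (357 benign, 212 malignant):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 569 rows, 30 features, labels 0 = benign, 1 = malignant.
X_demo = np.zeros((569, 30))
y_demo = np.array([0] * 357 + [1] * 212)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
# Fraction of malignant samples overall, in the training part, and in the test part.
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```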

>Data Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
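
The scaler is fitted on the training split only, and the same training statistics are then applied to the test split, which avoids leaking test information into preprocessing. A NumPy sketch of the standardization that `StandardScaler` performs with its default settings (random data used as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=3.0, size=(455, 30))  # stand-in for x_train
test = rng.normal(loc=10.0, scale=3.0, size=(114, 30))   # stand-in for x_test

# Statistics come from the training split only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print(train_scaled.mean(axis=0)[:3].round(6))
print(train_scaled.std(axis=0)[:3].round(6))
```

The scaled test columns will generally not have exactly zero mean or unit variance, which is expected.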

>Import necessary Libraries for Building CNN

In [None]:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D,MaxPool1D,Flatten,Dense,Dropout,BatchNormalization
from tensorflow.keras.optimizers import Adam

>Conv1D layers expect 3-dimensional input (samples, steps, channels), so I'm going to reshape the data into 3 dimensions

In [None]:
x_train = x_train.reshape(-1,30,1)
x_test = x_test.reshape(-1,30,1)

In [None]:
x_train.shape
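
Here each of the 30 features is treated as one step with a single channel. A sketch of the reshape on stand-in data, letting NumPy infer the number of rows instead of hard-coding it:

```python
import numpy as np

# Stand-in for the scaled training matrix (455 samples, 30 features).
flat = np.arange(455 * 30, dtype=float).reshape(455, 30)

# -1 lets NumPy infer the number of samples; the trailing 1 is the channel axis.
cube = flat.reshape(-1, flat.shape[1], 1)

print(cube.shape)
```

The values themselves are unchanged; only a trailing axis of length 1 is added.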

>Building CNN

In [None]:
epochs=60  # matches the epochs value passed to model.fit
model = Sequential()
model.add(Conv1D(filters=32,kernel_size=2,activation='relu',input_shape=(30,1)))
model.add(BatchNormalization())
model.add(Dropout(0.2))

model.add(Conv1D(filters=64,kernel_size=2,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1,activation='sigmoid'))

In [None]:
model.summary()
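
The numbers reported by `model.summary()` can be checked by hand. The sketch below recomputes output lengths and parameter counts for the layers defined above, assuming Keras defaults (`padding='valid'`, stride 1, bias terms enabled, four parameters per channel in `BatchNormalization`):

```python
def conv1d_out_len(steps, kernel_size):
    """Output length of a 'valid' Conv1D with stride 1."""
    return steps - kernel_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

steps, channels = 30, 1

p_conv1 = conv1d_params(channels, 32, 2)   # 96
steps = conv1d_out_len(steps, 2)           # 29
p_bn1 = 4 * 32                             # gamma, beta, moving mean, moving variance

p_conv2 = conv1d_params(32, 64, 2)         # 4160
steps = conv1d_out_len(steps, 2)           # 28
p_bn2 = 4 * 64

flat_units = steps * 64                    # 1792 values after Flatten
p_dense1 = flat_units * 64 + 64            # 114752
p_out = 64 * 1 + 1                         # 65

total = p_conv1 + p_bn1 + p_conv2 + p_bn2 + p_dense1 + p_out
print(flat_units, total)
```

Almost all parameters sit in the first `Dense` layer, which is one reason the heavy dropout around it is a reasonable choice for a dataset of this size.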

>Compiling the Model

In [None]:
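# `lr` is deprecated in recent Keras releases; `learning_rate` is the supported argument name
model.compile(optimizer=Adam(learning_rate=0.00005),loss='binary_crossentropy',metrics=['accuracy'])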
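
`binary_crossentropy` averages the negative log-likelihood of the true label under the predicted probability. A NumPy sketch of that formula (the clipping constant `eps` is an assumption here, added to avoid `log(0)`):

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps guards against log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

# Confident, correct predictions give a small loss ...
print(binary_crossentropy([1, 0], [0.9, 0.1]))
# ... while confident, wrong predictions are penalized heavily.
print(binary_crossentropy([1, 0], [0.1, 0.9]))
```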

>Train the Model

In [None]:
history = model.fit(x_train,y_train,epochs=60,validation_data=(x_test,y_test),verbose=1)

In [None]:
history.history

>Plotting Learning Curve

In [None]:
epoch_range = range(1,len(history.history['accuracy'])+1)
plt.plot(epoch_range,history.history['accuracy'])
plt.plot(epoch_range,history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel("Epochs")
plt.ylabel('Accuracy')
plt.legend(['Train','Val'],loc='upper left')
plt.show()

epoch_range = range(1,len(history.history['loss'])+1)
plt.plot(epoch_range,history.history['loss'])
plt.plot(epoch_range,history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel("Epochs")
plt.ylabel('Loss')
plt.legend(['Train','Val'],loc='upper left')
plt.show()
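
`history.history` is a plain dictionary of per-epoch lists, so the epoch with the lowest validation loss can be read off directly. A sketch using made-up values in place of a real training run:

```python
# Hypothetical per-epoch values; a real run would supply history.history.
demo_history = {
    "loss": [0.70, 0.52, 0.41, 0.35, 0.31],
    "val_loss": [0.66, 0.50, 0.43, 0.44, 0.47],
}

val_loss = demo_history["val_loss"]
# Index of the smallest validation loss, converted to a 1-based epoch number.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(best_epoch, val_loss[best_epoch - 1])
```

A validation loss that rises while the training loss keeps falling, as in the made-up values here, is the usual sign of overfitting.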

>Plotting Confusion Matrix

In [None]:
from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import accuracy_score,confusion_matrix
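# predict_classes was removed in recent TensorFlow releases; threshold the sigmoid output instead
y_pred = (model.predict(x_test) > 0.5).astype("int32").ravel()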
accuracy_score(y_test,y_pred)
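
Because the output layer is a single sigmoid unit, `model.predict` returns probabilities of shape `(n, 1)`. Class labels are obtained by thresholding; 0.5 is the conventional cut-off and is assumed here. A sketch with made-up probabilities:

```python
import numpy as np

# Stand-in for model.predict(x_test): one probability per sample, shape (n, 1).
probs = np.array([[0.03], [0.91], [0.48], [0.50], [0.77]])

# Threshold at 0.5 and flatten to a 1-D label vector.
labels = (probs > 0.5).astype("int32").ravel()

print(labels)
```

Lowering the threshold would trade more false alarms for fewer missed malignant cases, which may be preferable in a screening setting.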

In [None]:
mat = confusion_matrix(y_test,y_pred)
# LabelEncoder sorts classes alphabetically: B (benign) -> 0, M (malignant) -> 1
classes_name=['Benign','Malignant']
plot_confusion_matrix(mat,figsize=(10,8),class_names=classes_name,show_normed=True)
plt.xticks(rotation=0);
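
`LabelEncoder` assigns integer codes in sorted order of the class labels, so `B` becomes 0 and `M` becomes 1, and the rows and columns of the confusion matrix follow that order. A NumPy sketch with made-up labels that builds the matrix by hand and derives sensitivity (recall for the malignant class):

```python
import numpy as np

y_true_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])  # 0 = benign, 1 = malignant
y_pred_demo = np.array([0, 0, 1, 1, 1, 0, 1, 0])

# Rows are true classes, columns are predicted classes.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true_demo, y_pred_demo):
    cm[t, p] += 1

tn, fp, fn, tp = cm.ravel()
sensitivity = tp / (tp + fn)  # share of malignant cases that were detected

print(cm)
print(sensitivity)
```

For a diagnostic task, sensitivity is often more informative than plain accuracy, since a missed malignant case is usually the costlier error.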