# **About the dataset**[<a href="https://www.kaggle.com/uciml/breast-cancer-wisconsin-data">src</a>]

### **Description** 
This is a copy of the UCI ML Breast Cancer Wisconsin (Diagnostic) dataset. https://goo.gl/U2Uwz2

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The separating plane was obtained using the Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.

The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:

    ftp ftp.cs.wisc.edu
    cd math-prog/cpo-dataset/machine-learn/WDBC/

#### **References:**

- W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
- O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
- W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.



#### **Attribute Information:**

1. ID number
2. Diagnosis (M = malignant, B = benign)
3. Ten real-valued features are computed for each cell nucleus:
  - radius (mean of distances from center to points on the perimeter)
  - texture (standard deviation of gray-scale values)
  - perimeter
  - area
  - smoothness (local variation in radius lengths)
  - compactness (perimeter^2 / area - 1.0)
  - concavity (severity of concave portions of the contour)
  - concave points (number of concave portions of the contour)
  - symmetry
  - fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
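
The field numbering described above can be reproduced with a short sketch. The base names follow the `*_mean` column names used later in this notebook; the `_se` and `_worst` suffixes are assumed to follow the same pattern.

```python
# Base feature names follow the *_mean columns used later in this notebook;
# the "_se" and "_worst" suffixes are an assumption based on that pattern.
base_features = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
statistics = ["mean", "se", "worst"]

# Statistic-major order: all means first, then all standard errors,
# then all "worst" values, giving 10 x 3 = 30 feature columns.
feature_columns = [f"{name}_{stat}" for stat in statistics for name in base_features]

# Fields are 1-indexed; field 1 is the ID and field 2 is the diagnosis,
# so the first feature column is field 3.
field_number = {col: i + 3 for i, col in enumerate(feature_columns)}

print(len(feature_columns))
print(field_number["radius_mean"], field_number["radius_se"], field_number["radius_worst"])
```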

# **Steps Involved**
1. Getting familiar with the dataset
2. Data Preprocessing
3. EDA
4. Data Preparation
5. Model Creation/Evaluation

# **1. Getting Familiar with the Dataset**

In [None]:
# Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import warnings

%matplotlib inline

pd.set_option("display.max_rows", None,"display.max_columns", None)
warnings.simplefilter(action='ignore')
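# The 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6; use whichever is available
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')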

In [None]:
df = pd.read_csv('../input/breast-cancer-wisconsin-data/data.csv')
df.drop("Unnamed: 32",axis=1,inplace=True)
df.head()

In [None]:
df.info()

In [None]:
df.describe().transpose()

# **2. Data Preprocessing**

In [None]:
# Dropping "id" column
df.drop("id",axis=1,inplace=True)

In [None]:
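# Encoding the target: "M" (malignant) -> 0 and "B" (benign) -> 1
# (assigning the result back avoids the chained inplace replace, which newer pandas versions warn about)
df["diagnosis"] = df["diagnosis"].map({"M": 0, "B": 1})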

In [None]:
df.head()

# **3. EDA**

In [None]:
plt.figure(figsize=(20,18))
sns.heatmap(df.corr(),annot=True,linewidths=1,cmap="YlGnBu")
plt.show()

In [None]:
num_mean = ['diagnosis','radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean']
df[num_mean].corr()['diagnosis'].sort_values()
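
Because malignant is encoded as 0 and benign as 1, a strongly negative correlation means that larger feature values go with malignant cases. The following is a minimal sketch of ranking features by the absolute value of their correlation with the target. The function name and the toy data are hypothetical and are not taken from the dataset.

```python
import numpy as np

def rank_by_abs_correlation(features, target):
    """Return (name, r) pairs sorted by |Pearson r| with the target, strongest first."""
    target = np.asarray(target, dtype=float)
    scores = {}
    for name, values in features.items():
        r = np.corrcoef(np.asarray(values, dtype=float), target)[0, 1]
        scores[name] = float(r)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy data (hypothetical): target 0 = malignant, 1 = benign
toy_target = [0, 0, 0, 1, 1, 1]
toy_features = {
    "strong_negative": [9.0, 8.0, 9.5, 2.0, 1.0, 1.5],  # large values when target is 0
    "weak": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],              # unrelated to the target
}

ranking = rank_by_abs_correlation(toy_features, toy_target)
for name, r in ranking:
    print(f"{name}: {r:+.3f}")
```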

In [None]:
top_corr = ['diagnosis','concave points_mean','perimeter_mean','radius_mean','area_mean','concavity_mean','compactness_mean','texture_mean']

sns.pairplot(df[top_corr],hue='diagnosis')
plt.show()

# **4. Data Preparation**

## **a) Train-Test Split**

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop('diagnosis',axis=1)
y = df['diagnosis']

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25,random_state=101)

## **b) Feature Scaling**

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
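
The scaler is fitted on the training data only and then reused on the test data. A minimal NumPy sketch of the same idea follows; the helper names and toy values are hypothetical, and this is not how `MinMaxScaler` is implemented internally.

```python
import numpy as np

def fit_min_max(train):
    """Learn per-column minimum and range from the training data only."""
    train = np.asarray(train, dtype=float)
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics learned from the training set."""
    return (np.asarray(data, dtype=float) - col_min) / col_range

toy_train = np.array([[10.0, 100.0], [20.0, 300.0], [15.0, 200.0]])
toy_test = np.array([[25.0, 150.0]])

col_min, col_range = fit_min_max(toy_train)
train_scaled = apply_min_max(toy_train, col_min, col_range)
test_scaled = apply_min_max(toy_test, col_min, col_range)

print(train_scaled)
print(test_scaled)
```

Test values may fall outside the [0, 1] range when they lie beyond the training minimum or maximum, which is expected when the statistics come from the training set alone.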

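**Remark:** The test set will also be used as the validation set during training, so the validation metrics reported below are not an independent estimate of generalization.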
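
One alternative, not used in this notebook, would be to hold out a separate validation set so that the test set is touched only once at the end. The sketch below splits row indices three ways with NumPy. The function name, the 15% validation fraction and the row count of 569 are assumptions for illustration.

```python
import numpy as np

def three_way_split_indices(n_samples, val_fraction=0.15, test_fraction=0.25, seed=101):
    """Shuffle row indices and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    n_val = int(round(n_samples * val_fraction))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx

# 569 is assumed here as the number of rows in the dataset
train_idx, val_idx, test_idx = three_way_split_indices(569)
print(len(train_idx), len(val_idx), len(test_idx))
```

The resulting index arrays could then be passed to `df.iloc[...]` to build the three subsets.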

# **5. Model Creation/Evaluation**

## **a) Model Creation**

In [None]:
# Importing keras related libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
# Shape of input i.e. a 1D array having 30 features
X_train_scaled.shape[1:]

Generally, all layers in Keras need to know the shape of their inputs in order to be able to create their weights.
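
To see why the input shape matters, note that a fully connected layer holds one weight per (input, unit) pair plus one bias per unit. The sketch below counts parameters for the layer sizes used in this notebook (30 inputs, then Dense layers of 30, 30, 15 and 1 units). The helper name is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_units = [30, 30, 15, 1]  # Dense layer sizes used in this notebook
n_inputs = 30                  # number of input features

param_counts = []
for units in layer_units:
    param_counts.append(dense_param_count(n_inputs, units))
    n_inputs = units  # the output of this layer feeds the next one

total_params = sum(param_counts)
print(param_counts, total_params)
```

These per-layer counts should agree with the "Param #" column printed by `model.summary()`.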

In [None]:
model = Sequential()

# Adding input layer
model.add(Dense(30,activation='relu',input_shape=X_train_scaled.shape[1:],name="input"))
# Adding hidden layers
model.add(Dense(30,activation='relu',name="hidden_1"))
model.add(Dense(15,activation='relu',name="hidden_2"))
# Adding output layer and since this is a binary classification we are using "sigmoid" activation function
model.add(Dense(1,activation='sigmoid',name="output"))

model.summary()

In [None]:
# metrics is passed as a list, which newer Keras versions require
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
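
The `binary_crossentropy` loss and the `accuracy` metric named in the compile call can be written out in NumPy as a rough sketch. The clipping constant `eps` and the 0.5 threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into (0, 1) so they can be read as probabilities."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; eps clipping avoids log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob, dtype=float) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly right and confident
uncertain = [0.5, 0.5, 0.5, 0.5]   # no information

print(binary_crossentropy(labels, confident), accuracy(labels, confident))
print(binary_crossentropy(labels, uncertain))
```

Confident, correct predictions give a loss close to 0, while predicting 0.5 everywhere gives a loss of ln 2.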

## **b) Model Training**

In [None]:
%%time
model.fit(x=X_train_scaled,
          y=y_train,
          validation_data=(X_test_scaled,y_test),
          batch_size=128,epochs=500)

In [None]:
pd.DataFrame(model.history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.show()

**Inferences:**
<br>It's clear that training the model for 500 epochs has led to overfitting. Of the various ways to tackle overfitting, two of the most commonly used techniques are:
- Dropout Layer
- Early Stopping

**Reference**: https://www.kdnuggets.com/2019/12/5-techniques-prevent-overfitting-neural-networks.html

### **i) Using Dropout layer to deal with Overfitting**

In [None]:
from tensorflow.keras.layers import Dropout

In [None]:
model = Sequential()

# Adding input layer
model.add(Dense(30,activation='relu',input_shape=X_train_scaled.shape[1:],name="input"))
model.add(Dropout(0.5))
# Adding hidden layers
model.add(Dense(30,activation='relu',name="hidden_1"))
model.add(Dropout(0.5))
model.add(Dense(15,activation='relu',name="hidden_2"))
model.add(Dropout(0.5))
# Adding output layer and since this is a binary classification we are using "sigmoid" activation function
model.add(Dense(1,activation='sigmoid',name="output"))

model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
%%time
model.fit(x=X_train_scaled,
          y=y_train,
          validation_data=(X_test_scaled,y_test),
          batch_size=128,epochs=500,verbose=0)

In [None]:
pd.DataFrame(model.history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.show()
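
As a rough illustration of what a `Dropout(0.5)` layer does during training, the sketch below zeroes a random half of the activations and rescales the rest by `1 / (1 - rate)`, so the expected activation stays the same. At inference time the inputs pass through unchanged. This is a NumPy sketch with a hypothetical function name, not the Keras implementation.

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of activations and rescale the survivors."""
    activations = np.asarray(activations, dtype=float)
    if not training or rate == 0.0:
        return activations  # dropout is a no-op at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 30))

dropped = inverted_dropout(activations, rate=0.5, rng=rng)
unchanged = inverted_dropout(activations, rate=0.5, rng=rng, training=False)

print(np.unique(dropped))  # surviving units are scaled up, dropped ones are zero
print(dropped.mean())      # stays close to the original mean of 1.0
```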

### **ii) Using Early Stopping to deal with Overfitting**

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
model = Sequential()

# Adding input layer
model.add(Dense(30,activation='relu',input_shape=X_train_scaled.shape[1:],name="input"))
# Adding hidden layers
model.add(Dense(30,activation='relu',name="hidden_1"))
model.add(Dense(15,activation='relu',name="hidden_2"))
# Adding output layer and since this is a binary classification we are using "sigmoid" activation function
model.add(Dense(1,activation='sigmoid',name="output"))

model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)

In [None]:
%%time
model.fit(x=X_train_scaled,
          y=y_train,
          validation_data=(X_test_scaled,y_test),
          batch_size=128,epochs=500,
          callbacks=[early_stop],
          verbose=0)

In [None]:
pd.DataFrame(model.history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.show()
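
The patience rule behind early stopping can be sketched in plain Python: training stops once the monitored loss has failed to improve for `patience` consecutive epochs. The function name and the loss values are hypothetical, and the sketch assumes any decrease counts as an improvement.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ends at the last epoch
    return len(val_losses) - 1, best_epoch

toy_val_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.7]
stop_epoch, best_epoch = early_stopping_epoch(toy_val_losses, patience=3)
print(stop_epoch, best_epoch)
```

Note that the stop epoch is later than the best epoch. Keras's `EarlyStopping` keeps the weights from the final epoch unless `restore_best_weights=True` is passed, so it may be worth considering that option here.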

## **c) Model Evaluation**

In [None]:
from sklearn.metrics import classification_report,confusion_matrix

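# predict_classes was removed in TensorFlow 2.6; threshold the sigmoid output at 0.5 instead
predictions = (model.predict(X_test_scaled) > 0.5).astype("int32")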
print(classification_report(y_test,predictions),"\n\n")
print(confusion_matrix(y_test,predictions))
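
When reading the report, keep in mind that class 0 is malignant and class 1 is benign in this notebook's encoding, so the recall of class 0 shows how many malignant cases were caught. The sketch below computes a confusion matrix and per-class precision and recall by hand, with rows as true labels and columns as predicted labels. The helper names and toy labels are hypothetical.

```python
def binary_confusion(y_true, y_pred):
    """2x2 counts with rows = true class and columns = predicted class."""
    counts = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

def recall_for_class(counts, cls):
    """Share of true `cls` samples that were predicted as `cls`."""
    row_total = sum(counts[cls])
    return counts[cls][cls] / row_total if row_total else 0.0

def precision_for_class(counts, cls):
    """Share of samples predicted as `cls` that truly are `cls`."""
    col_total = counts[0][cls] + counts[1][cls]
    return counts[cls][cls] / col_total if col_total else 0.0

# Toy labels (hypothetical): 0 = malignant, 1 = benign
toy_true = [0, 0, 0, 1, 1, 1, 1, 1]
toy_pred = [0, 0, 1, 1, 1, 1, 1, 0]

counts = binary_confusion(toy_true, toy_pred)
print(counts)
print(recall_for_class(counts, 0), precision_for_class(counts, 0))
```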