# Breast Cancer Detector using Deep Learning

## Import the libraries

In [1]:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


## Import the dataset

In [2]:
dataset = pd.read_csv('dataset.csv')
# Features: every column except the first and the last
X = dataset.iloc[:, 1:-1].values
# Target: the last column
y = dataset.iloc[:, -1].values

## Handle the missing values

We replace all missing values of a column (feature) with the most common value of that column.

In [3]:
from sklearn.impute import SimpleImputer
X = SimpleImputer(missing_values = np.nan, strategy = 'most_frequent').fit_transform(X)
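
To make the `most_frequent` strategy concrete, here is a minimal NumPy sketch of the same idea on a small made-up array. The helper `impute_most_frequent` and the `toy` data are illustrative assumptions, not part of the notebook or of scikit-learn.

```python
import numpy as np

def impute_most_frequent(X):
    """Replace NaNs in each column with that column's most frequent observed value."""
    X = np.array(X, dtype=float)  # copy, so the caller's array is left untouched
    for j in range(X.shape[1]):
        col = X[:, j]                       # view into X, so writes update X
        observed = col[~np.isnan(col)]
        if observed.size == 0:
            continue                        # nothing to learn from an all-NaN column
        values, counts = np.unique(observed, return_counts=True)
        # np.unique returns sorted values, so ties resolve to the smallest value
        col[np.isnan(col)] = values[np.argmax(counts)]
    return X

# Hypothetical data: one missing entry in each column
toy = np.array([[1.0, 5.0],
                [1.0, np.nan],
                [np.nan, 3.0],
                [2.0, 3.0]])
filled = impute_most_frequent(toy)
print(filled)
```

In the first column the most frequent observed value is 1, and in the second it is 3, so those are the values written into the gaps.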

## Encode the dependent variable

We use a one-hot encoder on the dependent variable so that the model treats the values 2 and 4 as two class labels (2 for benign, 4 for malignant) rather than as numeric quantities.

In [4]:
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categories = 'auto')
y = onehotencoder.fit_transform(y.reshape(-1, 1)).toarray()
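
As a rough illustration of what the encoded target looks like, the sketch below one-hot encodes a handful of made-up labels with plain NumPy. The `labels` array is hypothetical. The column order (class 2 first, class 4 second) assumes the categories are sorted, which is what `categories = 'auto'` does.

```python
import numpy as np

labels = np.array([2, 4, 4, 2, 2])     # hypothetical class labels
classes = np.unique(labels)            # sorted categories: [2, 4]

# Compare every label against every category: one column per class
one_hot = (labels.reshape(-1, 1) == classes).astype(float)
print(one_hot)

# argmax over the columns maps each row back to its original label
recovered = classes[one_hot.argmax(axis=1)]
print(recovered)
```

Column 0 flags class 2 (benign) and column 1 flags class 4 (malignant).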

## Split the dataset into the Training set and Test set

We use 15% of the dataset for the Test set and the remaining 85% for the Training set.

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.15, random_state = 0)

## Apply feature scaling

We apply feature scaling so that features with large values do not dominate features with small values. The scaler is fitted on the Training set only and then applied unchanged to the Test set.

In [6]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
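
The standardization step can be sketched by hand as follows. The `train` and `test` arrays are made-up values chosen so that the two features differ in scale by a factor of 100. The point is that the mean and standard deviation come from the training data alone.

```python
import numpy as np

# Hypothetical data: the second feature is 100x larger than the first
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# Statistics are estimated on the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)   # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

After scaling, both training columns have mean 0 and unit variance, so neither feature dominates purely because of its units.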

## Initialize and fit the ANN to the Training set

In our ANN, we create 2 hidden layers with 5 nodes each. Each hidden node uses the Rectified Linear Unit (ReLU) activation function, and the initial weights are drawn from a uniform distribution. The output layer has two nodes, one per class, with Softmax as the activation function. We use categorical cross-entropy as the loss (cost) function, together with the "Adam" optimizer and the "Accuracy" metric.

*Initialize the ANN:*

In [7]:
classifier = Sequential()

*Add the input layer and the first hidden layer:*

In [8]:
classifier.add(Dense(units = 5, input_dim = 9, activation = 'relu', kernel_initializer = 'uniform'))

*Add the second hidden layer:*

In [9]:
classifier.add(Dense(units = 5, activation = 'relu', kernel_initializer = 'uniform'))

*Add the output layer:*

In [10]:
classifier.add(Dense(units = 2, activation = 'softmax', kernel_initializer = 'uniform'))
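
To see what this 9-5-5-2 architecture amounts to, the following NumPy sketch counts its trainable parameters and runs one forward pass. It is an illustration only: the random weights stand in for trained ones, the uniform range is an arbitrary choice, and the input samples are made up.

```python
import numpy as np

layer_sizes = [9, 5, 5, 2]   # inputs, hidden 1, hidden 2, outputs
pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))

# Each Dense layer has an (n_in x n_out) weight matrix plus one bias per unit
params_per_layer = [n_in * n_out + n_out for n_in, n_out in pairs]
total_params = sum(params_per_layer)
print(params_per_layer, total_params)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Illustrative random weights (small uniform range chosen arbitrarily)
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.05, 0.05, size=(n_in, n_out)) for n_in, n_out in pairs]
biases = [np.zeros(n_out) for _, n_out in pairs]

def forward(x):
    h = relu(x @ weights[0] + biases[0])         # first hidden layer
    h = relu(h @ weights[1] + biases[1])         # second hidden layer
    return softmax(h @ weights[2] + biases[2])   # two class probabilities

probs = forward(rng.normal(size=(4, 9)))   # 4 hypothetical, already-scaled samples
print(probs)
```

The three layers hold 50, 30, and 12 parameters, 92 in total, and every output row sums to 1 because of the softmax.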

*Compile the ANN:*

In [11]:
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
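
For intuition about the loss, here is a small hand-rolled version of categorical cross-entropy on made-up predictions. The function below is a sketch of the formula, not Keras's implementation, and the clipping constant `eps` is an assumption added to avoid taking `log(0)`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Per-sample loss: minus the log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # guard against log(0)
    return -np.sum(y_true * np.log(y_prob), axis=1)

# Hypothetical one-hot targets and softmax outputs
y_true = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
y_prob = np.array([[0.9, 0.1],
                   [0.4, 0.6]])

losses = categorical_crossentropy(y_true, y_prob)
mean_loss = losses.mean()
print(losses, mean_loss)
```

The confident, correct first prediction is penalized less than the hesitant second one, which is the behavior the optimizer exploits during training.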

*Fit the ANN to the Training set (batch_size = 1 and number of epochs = 10):*

In [12]:
classifier.fit(X_train, y_train, batch_size = 1, epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x2641f1fe518>

## Predict the Test set results

We set the decision threshold to 0.5: if the predicted probability that the cell is benign is greater than or equal to 0.5, the cell is predicted to be benign. Otherwise, it is predicted to be malignant.

In [13]:
y_pred = (classifier.predict(X_test) >= 0.5)
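
Because the network outputs two probabilities per sample, the comparison above yields a boolean array with two columns. The sketch below uses made-up probabilities, with column 0 assumed to be the benign class. It shows that taking `argmax` of the thresholded array gives the same class as taking `argmax` of the probabilities, including the tie at exactly 0.5.

```python
import numpy as np

# Hypothetical softmax outputs: [P(benign), P(malignant)]
probs = np.array([[0.92, 0.08],
                  [0.30, 0.70],
                  [0.50, 0.50]])

thresholded = probs >= 0.5                   # same comparison as in the cell above
by_threshold = thresholded.argmax(axis=1)    # index of the first True in each row
by_argmax = probs.argmax(axis=1)             # index of the largest probability

print(thresholded)
print(by_threshold, by_argmax)
```

In the tied row both columns are `True`, and `argmax` returns the first index, so the sample is labeled benign, consistent with the "greater than or equal to 0.5" rule.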

## Make the Confusion Matrix

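The confusion matrix counts, for each true class, how many Test set samples were assigned to each predicted class, so it shows which kinds of errors the model makes.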

In [14]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
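
As a sketch of how an accuracy figure follows from a confusion matrix, the example below builds one by hand for six made-up predictions and divides the diagonal (correct predictions) by the total. The helper `confusion` and the toy labels are illustrative assumptions. Rows are true classes and columns are predicted classes, which matches the scikit-learn convention.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes=2):
    """Count samples per (true class, predicted class) pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1   # rows: true class, columns: predicted class
    return cm

# Hypothetical class indices (0 = benign, 1 = malignant)
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])

cm_toy = confusion(y_true, y_pred)
accuracy = np.trace(cm_toy) / cm_toy.sum()   # correct predictions / all predictions
print(cm_toy)
print(accuracy)
```

Here four of the six predictions lie on the diagonal, giving an accuracy of 4/6.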

After repeating the experiment with different numbers of epochs, we obtain the following results:

<center>
    
| Number of epochs | Mean accuracy (%) |
|:---------------:|:---------------:|
|10|97.81|
|20|97.98|
|50|98.15|
|100|98.48|

</center>
