# Breast Cancer Classification with Deep Learning

This notebook uses PyTorch to train a neural network on the Breast Cancer Wisconsin (Diagnostic) [dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29) from the UCI Machine Learning Repository to predict whether a tumor is benign or malignant. The network is deliberately minimal: a single linear layer trained with a binary cross-entropy loss, which is equivalent to logistic regression.

The dataset consists of features that describe characteristics of the cell
nuclei present in a digitised image. These features are defined as follows:

1. **Radius:** the average distance from the center of the nucleus to each of the boundary points<br>
2. **Texture:** the standard deviation of the gray-scale values, where the gray-scale value represents the intensity of the shade of gray in each pixel of the image<br>
3. **Perimeter:** the total distance of the boundary of the cell nucleus<br>
4. **Area:** the number of pixels in the interior of the boundary plus one-half of the pixels on the perimeter, which corrects for the error caused by digitisation<br>
5. **Smoothness:** the difference between the length of a radial line and the mean length of the two radial lines surrounding it, i.e. the local variation in radius lengths<br>
6. **Compactness:** the perimeter and area are combined into a single measure of how compact the cell nucleus is (in the UCI dataset description, perimeter² / area - 1.0)<br>
7. **Concavity:** the severity of concave portions of the contour; a high concavity means that the boundary of the cell nucleus has indentations and is therefore rough rather than smooth<br>
8. **Concave points:** the number of concave portions of the contour of the cell nucleus<br>
9. **Symmetry:** first, the longest line from boundary point to boundary point through the center of the nucleus is found; then the relative difference in length between the lines perpendicular to it, running from it to the boundary in both directions, is measured. Care is needed for nuclei where the longest line cuts through the boundary because of concavity<br>
10. **Fractal dimension:** approximated by the "coastline approximation". The perimeter L(s) of the nucleus is measured with measuring sticks of different lengths s; as s increases, the measured "coastline" becomes shorter because the measurement loses precision. Plotting log(L(s)) against log(s) gives a roughly straight line with a negative slope, from which the fractal dimension D is approximated, and the reported feature is D - 1.
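The coastline approximation in item 10 can be sketched numerically. This is a minimal illustration, not the dataset authors' code: it assumes Richardson's relation L(s) ∝ s^(1 - D), under which the slope of log(L(s)) against log(s) is 1 - D, so the negated slope is directly D - 1. The perimeter measurements are synthetic, generated for a hypothetical boundary with D = 1.2 rather than measured on a real nucleus.

```python
import numpy as np

def coastline_feature(stick_lengths, perimeters):
    """Estimate D - 1 from perimeters measured with sticks of different lengths.

    Fits a straight line to log(L(s)) versus log(s); under the assumed
    Richardson relation L(s) ~ s**(1 - D), that slope equals 1 - D.
    """
    slope, _intercept = np.polyfit(np.log(stick_lengths), np.log(perimeters), 1)
    return -slope  # negated slope = D - 1

# synthetic perimeters for a hypothetical boundary with D = 1.2
D_true = 1.2
s = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
L = 100.0 * s ** (1.0 - D_true)  # longer sticks give a shorter measured perimeter

feature = coastline_feature(s, L)
print(f"estimated D - 1 = {feature:.3f}")  # estimated D - 1 = 0.200
```

Real boundaries give only an approximately straight log-log line, so the fitted slope is an estimate rather than an exact value.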

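The dataset has 30 features rather than 10 because each feature is reported three times per image: as the mean, the standard error (SE) and the "worst" value (the mean of the three largest values), e.g. radius_mean, radius_se and radius_worst. The task itself is a straightforward binary classification: researchers collected these features from patients with (malignant) or without (benign) breast cancer.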
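The 30 columns follow from crossing the 10 base features with the three statistics. The sketch below builds the names as `<feature>_<statistic>`, which is assumed to match the Kaggle data.csv loaded later (including the space in the concave points columns); treat the list as illustrative rather than as a substitute for `data.columns`.

```python
# the 10 base nucleus features (naming assumed to follow the Kaggle data.csv)
base_features = [
    'radius', 'texture', 'perimeter', 'area', 'smoothness',
    'compactness', 'concavity', 'concave points', 'symmetry',
    'fractal_dimension',
]
# each feature is summarised three ways over all nuclei in an image
statistics = ['mean', 'se', 'worst']

# statistics vary slowest: all *_mean columns, then *_se, then *_worst
columns = [f'{feature}_{stat}' for stat in statistics for feature in base_features]

print(len(columns))   # 30
print(columns[:3])    # ['radius_mean', 'texture_mean', 'perimeter_mean']
print(columns[-1])    # fractal_dimension_worst
```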

In [None]:
# import necessary libraries
import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# silence library warnings to keep the notebook output tidy
warnings.filterwarnings('ignore')

In [None]:
# list the files available under the Kaggle input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
%matplotlib inline

In [None]:
# load the dataset
data = pd.read_csv('../input/breast-cancer-wisconsin-data/data.csv')

In [None]:
# check for basic information
data.info()

In [None]:
# show the first few records
data.head()

In [None]:
# check for any missing values
data.isnull().sum()

In [None]:
# diagnosis distribution
data['diagnosis'].value_counts()

The target is the diagnosis column (M = malignant, B = benign), so let's encode the strings as 1 and 0.

In [None]:
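# encode the target: malignant -> 1, benign -> 0
# (assigning the result back avoids chained inplace=True calls, which newer pandas versions no longer apply)
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})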

The next step is to prepare the training and test sets. The id column and the "Unnamed: 32" column, which contains only missing values, won't help with the prediction, so we drop them.

In [None]:
Y = data['diagnosis'].to_numpy()

In [None]:
X = data.drop(['id', 'diagnosis', 'Unnamed: 32'], axis = 1)

In [None]:
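# split the dataset into the training set and test set
# stratify keeps the benign/malignant ratio equal in both splits; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3, stratify = Y, random_state = 42)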

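Feature scaling matters here because the features have very different ranges (area values run into the hundreds and thousands, while smoothness values are below 1). Standardising each feature to zero mean and unit variance keeps features with large values from dominating the others and helps gradient-based training converge. The scaler is fit on the training set only and then applied unchanged to the test set.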
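StandardScaler standardises each column as z = (x - μ) / σ, where μ and σ are the column mean and population standard deviation of the data it was fit on. The NumPy sketch below reproduces that transform on a small made-up array with one area-like and one smoothness-like column; estimating μ and σ on the training split and reusing them for the test split mirrors the fit_transform / transform pair in the next cell.

```python
import numpy as np

# made-up values: one "area-like" and one "smoothness-like" column
X_train = np.array([[500.0, 0.08],
                    [700.0, 0.10],
                    [900.0, 0.12]])
X_test = np.array([[800.0, 0.09]])

# statistics are estimated on the training split only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # ddof=0, the population standard deviation

X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma  # reuse the training mu and sigma

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
print(X_test_scaled)
```

After scaling, both columns live on the same scale even though their raw values differ by four orders of magnitude.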

In [None]:
# feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

With the features standardised, the next step is to create the model, train it on the training data and then evaluate it on the test data.

In [None]:
# create the model: a single linear layer mapping the 30 features to one logit
model = torch.nn.Linear(X_train.shape[1], 1)

The model has no sigmoid output layer. Instead, it is trained with BCEWithLogitsLoss, which applies the sigmoid and the binary cross-entropy in a single operation. According to the PyTorch documentation, this is more numerically stable than a plain Sigmoid followed by BCELoss, because combining the two operations lets it use the log-sum-exp trick.
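To see why combining the two steps matters, the sketch below computes the binary cross-entropy for a single logit z and label y in two ways with NumPy (an illustration, not PyTorch's own implementation): naively as -[y log σ(z) + (1 - y) log(1 - σ(z))], and in the rearranged form max(z, 0) - z·y + log(1 + e^(-|z|)), which never takes the logarithm of zero. For a large logit, σ(z) rounds to exactly 1.0 in float64, so the naive form returns infinity while the rearranged form stays finite.

```python
import numpy as np

def bce_naive(z, y):
    # sigmoid first, then log: breaks down once sigmoid(z) rounds to 0 or 1
    p = 1.0 / (1.0 + np.exp(-z))
    with np.errstate(divide='ignore'):
        return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def bce_with_logits(z, y):
    # algebraically equal rewrite that never takes the log of 0
    return np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))

# moderate logit: both forms agree
print(bce_naive(2.0, 1.0), bce_with_logits(2.0, 1.0))

# large logit with label 0: sigmoid(100) == 1.0 in float64
print(bce_naive(100.0, 0.0))        # inf
print(bce_with_logits(100.0, 0.0))  # 100.0
```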

In [None]:
# convert the NumPy arrays to float32 tensors, the dtype PyTorch layers use by default
X_train = torch.from_numpy(X_train.astype(np.float32))
X_test = torch.from_numpy(X_test.astype(np.float32))

In [None]:
# reshape the labels to (n, 1) so they match the shape of the model output
y_train = torch.from_numpy(y_train).float().reshape(-1, 1)
y_test = torch.from_numpy(y_test).float().reshape(-1, 1)

Next, we define the loss function and the optimiser (Adam with a learning rate of 0.0007), then iterate over the training data to train the model.

In [None]:
def configure_loss_function():
    return torch.nn.BCEWithLogitsLoss()

In [None]:
# use Adam optimiser for gradient descent
def configure_optimizer(model):
    return torch.optim.Adam(model.parameters(), lr = 0.0007)
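Adam adapts the step size of each parameter using running averages of the gradient and of its square. The NumPy sketch below follows the published Adam update rule with PyTorch's default betas (0.9, 0.999) and eps (1e-8) and the lr = 0.0007 used above; it minimises a toy quadratic rather than the notebook's loss and is not the code inside torch.optim.Adam.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.0007, beta1=0.9, beta2=0.999, eps=1e-8):
    # exponential moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction: m and v start at zero, so early averages are too small
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # per-parameter step, roughly lr in size while the gradient keeps its sign
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy objective f(theta) = sum((theta - 3)**2), minimised at theta = 3
theta = np.zeros(2)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)

print(theta)  # has moved from 0 toward 3, but not all the way
```

Because each step is roughly lr in size, 2000 steps at lr = 0.0007 can move a parameter only by about 1.4 here, which shows how the learning rate and the number of epochs interact.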

In [None]:
# define the loss function to compare the output with the target
criterion = configure_loss_function()
optimizer = configure_optimizer(model)

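With the model, loss function and optimiser defined, it's time to train. Each epoch is one full-batch gradient step over the whole training set, after which the test loss is recorded for monitoring.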
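For a single linear layer followed by a sigmoid and the mean binary cross-entropy (i.e. logistic regression), the gradients have a closed form: ∂L/∂w = Xᵀ(σ(Xw + b) - y) / n and ∂L/∂b = mean(σ(Xw + b) - y). The NumPy sketch below runs that full-batch loop by hand on synthetic data, using plain gradient descent instead of Adam to keep it short; it illustrates what loss.backward() and optimizer.step() accomplish in the training loop, not how PyTorch implements them.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic, roughly separable data: 200 samples, 3 standardised features
n, d = 200, 3
X = rng.standard_normal((n, d))
true_w = np.array([2.0, -1.0, 0.5])
y = (X @ true_w + 0.3 * rng.standard_normal(n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(z, y):
    # mean binary cross-entropy computed from logits (numerically stable form)
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

w = np.zeros(d)
b = 0.0
lr = 0.1
losses = []
for epoch in range(500):
    z = X @ w + b                 # forward pass over the full training set
    losses.append(bce(z, y))
    error = sigmoid(z) - y        # n times dL/dz for each sample
    grad_w = X.T @ error / n      # gradient with respect to the weights
    grad_b = error.mean()         # gradient with respect to the bias
    w -= lr * grad_w              # plain gradient-descent update
    b -= lr * grad_b

accuracy = np.mean((X @ w + b > 0) == (y == 1))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy = {accuracy:.2f}")
```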

In [None]:
# run the model
epochs = 2000
# initialise arrays that record the train and test loss at every epoch
train_losses = np.zeros(epochs)
test_losses = np.zeros(epochs)

for epoch in range(epochs):
    # forward pass on the full training set (full-batch gradient descent)
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    # clear old gradients from the last step
    optimizer.zero_grad()
    # compute the gradients necessary to adjust the weights
    loss.backward()
    # update the weights of the model
    optimizer.step()

    # evaluate on the test set without tracking gradients
    with torch.no_grad():
        outputs_test = model(X_test)
        loss_test = criterion(outputs_test, y_test)

    train_losses[epoch] = loss.item()
    test_losses[epoch] = loss_test.item()

    if (epoch + 1) % 50 == 0:
        print(f'Epoch {epoch + 1}/{epochs}, training loss = {loss.item():.4f}, test loss = {loss_test.item():.4f}')

The model is now trained, so let's plot the training and test loss curves and then compute the accuracy.

In [None]:
# visualise the test and train loss
plt.plot(train_losses, label = 'train loss')
plt.plot(test_losses, label = 'test loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy loss')
plt.legend()
plt.title('Model Loss')
plt.show()

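Next, compute the accuracy on the training and test sets. Only forward passes are needed, so they run inside torch.no_grad(), and a logit above 0 is counted as a malignant prediction.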
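Thresholding the logits at 0 is the same as thresholding the predicted probabilities at 0.5, because σ(0) = 0.5 and the sigmoid is strictly increasing. The small NumPy check below illustrates this on made-up logits and labels shaped (n, 1) like the notebook's tensors, and computes accuracy as the fraction of predictions that match the labels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up logits and labels, shaped (n, 1) like the notebook's tensors
logits = np.array([[-3.0], [-0.2], [0.4], [2.5], [1.1]])
labels = np.array([[0.0], [0.0], [1.0], [1.0], [0.0]])

pred_from_logits = logits > 0
pred_from_probs = sigmoid(logits) > 0.5
print(np.array_equal(pred_from_logits, pred_from_probs))  # True

# True/False compare equal to 1.0/0.0 element-wise
accuracy = np.mean(labels == pred_from_logits)
print(accuracy)  # 0.8
```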

In [None]:
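# accuracy on the training and test sets; no gradients are needed for evaluation
with torch.no_grad():
    # a logit above 0 means a predicted probability above 0.5, i.e. malignant
    output_train = model(X_train)
    output_train = (output_train.numpy() > 0)

    train_acc = np.mean(y_train.numpy() == output_train)

    output_test = model(X_test)
    output_test = (output_test.numpy() > 0)

    test_acc = np.mean(y_test.numpy() == output_test)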

In [None]:
print ('Train accuracy is: ' + str(train_acc))

In [None]:
print ('Test accuracy is: ' + str(test_acc))