# Predicting Breast Cancer Using Artificial Neural Networks (ANN) Model

## Content
1. Healthcare problem/question of interest and objective in building an ANN model
2. Import required libraries and load data
3. Explore and preprocess the data
4. Build a classifying model using ANNs
5. Evaluate the model and predict results
6. Reference

## 1. Healthcare problem/question of interest and objective in building an ANN model

* Problem/question:  Breast cancer: how can a tumour be classified by type (benign or malignant) using historical data?
* Objective:         Build a simple ANN model to predict whether the cancer is benign or malignant

## 2. Import required libraries and load data

In [None]:
# Libraries for exploring and preprocessing data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Libraries for building an ANN model
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Load the breast cancer data
data = pd.read_csv('P:/Learning/Big Data and Machine Learning Resources/Deep Learning/Intro to Keras with breast cancer data[ANN]_Kaggle/data.csv')
del data['Unnamed: 32']

## 3. Explore and preprocess the data

In [3]:
data.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [None]:
# Summary statistics for the numeric columns (pandas DataFrames have no summary() method)
data.describe()

Notes on the Breast Cancer Wisconsin (Diagnostic) Data Set
    * Columns contain data computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.
    * Columns describe characteristics of the cell nuclei present in the image.
    * Description of columns:
            - ID number (field 1)
            - Diagnosis (field 2; M = malignant, B = benign)
            - Fields 3-32 hold the ten features listed below, each summarized in three ways
            - Radius (mean of distances from center to points on the perimeter)
            - Texture (standard deviation of gray-scale values)
            - Perimeter
            - Area
            - Smoothness (local variation in radius lengths)
            - Compactness (perimeter^2 / area - 1.0)
            - Concavity (severity of concave portions of the contour)
            - Concave points (number of concave portions of the contour)
            - Symmetry
            - Fractal dimension ("coastline approximation" - 1)

    * The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius.
    * All feature values are recorded with four significant digits.
    * Missing attribute values: none
    * Class distribution: 357 benign, 212 malignant
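
The naming scheme described in these notes can be sketched in a few lines. This is a minimal illustration, assuming the `_mean` and `_worst` suffixes visible in the `data.head()` output; the `_se` suffix is an assumption, since those columns are hidden behind the `...` there. `FEATURES`, `STATS` and `expected_columns` are names made up for this sketch.

```python
# Ten base features measured on each cell nucleus (names follow the notes above)
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
# Three summary statistics per feature; the "se" suffix is an assumption
STATS = ["mean", "se", "worst"]

# Columns are grouped by statistic: all means, then all SEs, then all worsts
expected_columns = [f"{feature}_{stat}" for stat in STATS for feature in FEATURES]

print(len(expected_columns))  # 10 features x 3 statistics
print(expected_columns[0], expected_columns[10], expected_columns[20])
```

Positions 0, 10 and 20 of this list correspond to fields 3, 13 and 23 in the notes, because fields 1 and 2 hold the ID and the diagnosis.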

In [4]:
X = data.iloc[:, 2:].values
y = data.iloc[:, 1].values

# Encode the diagnosis labels using LabelEncoder (classes are sorted, so B -> 0 and M -> 1)
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# Split the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1, random_state = 0)

# Scale features/columns in the dataset
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
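
The scaling step fits on the training set only and then reuses those statistics on the test set, so no information from the test set leaks into preprocessing. Below is a minimal pure-Python sketch of that idea for a single feature; it is not the scikit-learn implementation, the helper names `fit_scaler` and `apply_scaler` are hypothetical, and the numbers are made up.

```python
from statistics import fmean, pstdev

def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return fmean(train_values), pstdev(train_values)

def apply_scaler(values, mean, std):
    """Standardise values using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]

# Made-up values for a single feature
train_feature = [2.0, 4.0, 6.0]
test_feature = [4.0, 8.0]

mean, std = fit_scaler(train_feature)
train_scaled = apply_scaler(train_feature, mean, std)
test_scaled = apply_scaler(test_feature, mean, std)  # reuses the training statistics

print(train_scaled)  # zero mean, unit variance
print(test_scaled)   # not necessarily zero mean: scaled relative to the training data
```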

## 4. Build a classifying model using ANNs

In [4]:
# Initialise the ANN
classifier = Sequential()

In [5]:
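# Add the input layer and the first hidden layer
classifier.add(Dense(units=16, kernel_initializer='random_uniform', activation='relu', input_dim=30))

# Add dropout to prevent overfitting
classifier.add(Dropout(rate=0.1))

Notes:
- input_dim - number of input features, i.e. the 30 feature columns of the dataset

- units (`output_dim` in older Keras versions) - number of outputs to be fed to the next layer, if any

- activation - activation function, which is ReLU in this case

- kernel_initializer (`init` in older Keras versions) - how the layer's weights are initialised

- rate (`p` in older Keras versions) - fraction of the layer's inputs that Dropout randomly sets to zero during training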
 
The **ReLU** function is f(x)=max(0,x). Usually this is applied element-wise to the output of some other function, such as a matrix-vector product. In MLPs, rectifier units typically replace all other activation functions except perhaps the readout layer, although they can be mixed with other activations. One way ReLUs improve neural networks is by speeding up training. The gradient computation is very simple (either 0 or 1 depending on the sign of x). The computational step of a ReLU is also easy: any negative elements are set to 0.0, with no exponentials and no multiplication or division operations. Gradients of logistic and hyperbolic tangent networks are smaller than the positive portion of the ReLU, so the positive portion is updated more rapidly as training progresses. However, this comes at a cost. The 0 gradient on the left-hand side has its own problem, called "dead neurons," in which a gradient update sets the incoming values to a ReLU such that the output is always zero; modified ReLU units such as ELU or Leaky ReLU can mitigate this. Source: [StackExchange](https://stats.stackexchange.com/questions/226923/why-do-we-use-relu-in-neural-networks-and-how-do-we-use-it)
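
To make the gradient comparison concrete, here is a small pure-Python sketch (independent of Keras, with function names chosen for this illustration) of ReLU, its gradient, and the gradient of the logistic sigmoid.

```python
import math

def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient is 1 for positive inputs and 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; it peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-2.0, 0.5, 3.0):
    print(x, relu(x), relu_grad(x), round(sigmoid_grad(x), 4))
```

For positive inputs the ReLU gradient stays at 1 while the sigmoid gradient never exceeds 0.25; for negative inputs the ReLU gradient is exactly 0, which is the source of the "dead neuron" issue.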

In [6]:
# Add the second hidden layer (current Keras argument names: units, kernel_initializer)
classifier.add(Dense(units=16, kernel_initializer='random_uniform', activation='relu'))

# Add dropout to prevent overfitting
classifier.add(Dropout(rate=0.1))

In [7]:
# Add the output layer
classifier.add(Dense(units=1, kernel_initializer='random_uniform', activation='sigmoid'))

Notes:
- units (`output_dim` in older Keras versions) is 1 as we want only 1 output from the final layer.

- The sigmoid function is used for classification problems with 2 possible results. (The softmax function is used when there are 3 or more classes.)
<img src="https://cdn-images-1.medium.com/max/1000/1*Xu7B5y9gp0iL5ooBj7LtWw.png">
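
As a rough illustration of the difference, the sketch below implements both functions in plain Python. The raw scores are hypothetical values, not outputs of the model above.

```python
import math

def sigmoid(z):
    """Map one raw score to a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map several raw scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.2                              # hypothetical raw output of the last layer
p_malignant = sigmoid(z)
print(p_malignant, p_malignant > 0.5)  # class decision at the 0.5 threshold

probs = softmax([2.0, 1.0, 0.1])     # hypothetical scores for three classes
print(probs, sum(probs))

# With two classes, softmax over [z, 0] gives the same value as sigmoid(z)
print(softmax([z, 0.0])[0])
```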

In [8]:
# Compile the ANN model
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Notes:
- The optimizer is Adam, an adaptive variant of stochastic gradient descent.

- `binary_crossentropy` is the loss function used, which suits a two-class problem with a single sigmoid output.

- Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value. A perfect model would have a log loss of 0. [More about this](http://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html)
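
The example in the note above can be checked numerically. This is a minimal sketch of the per-observation formula, not the Keras implementation (which also averages over the batch and clips probabilities); `binary_cross_entropy` is a name chosen for this illustration.

```python
import math

def binary_cross_entropy(y_true, p_pred):
    """Log loss for one observation; y_true is 0 or 1, p_pred is strictly between 0 and 1."""
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1.0 - p_pred))

print(binary_cross_entropy(1, 0.012))  # confident and wrong: about 4.42
print(binary_cross_entropy(1, 0.98))   # confident and right: about 0.02
```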

In [9]:
# Fit the ANN to the Training set
classifier.fit(X_train, y_train, batch_size=100, epochs=150)

Note:
- Batch size defines the number of samples that are propagated through the network before each weight update.

- An epoch is one complete pass through all the training data.
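
To see how these two settings interact, the arithmetic below estimates the number of weight updates in this run. It assumes 569 rows in total (357 benign plus 212 malignant, from the dataset notes) and that a 10% test split rounds up to 57 samples, which matches the 57 used in the accuracy calculation in section 5.

```python
import math

n_samples = 357 + 212                  # class distribution from the dataset notes
n_test = math.ceil(0.1 * n_samples)    # assumed rounding for test_size=0.1
n_train = n_samples - n_test

batch_size, epochs = 100, 150
batches_per_epoch = math.ceil(n_train / batch_size)  # the last batch is smaller
total_updates = batches_per_epoch * epochs

print(n_train, n_test, batches_per_epoch, total_updates)
```

Under these assumptions there are 512 training samples, so each epoch consists of 6 batches (five of 100 samples and one of 12) and training performs 900 weight updates in total.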

## 5. Evaluate the model and predict results

In [10]:
# Predict the Test set results (threshold the sigmoid output at 0.5)
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

In [11]:
# Make the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

In [12]:
# Accuracy = correct predictions (the diagonal) / all test samples
print("Our accuracy is {}%".format(((cm[0][0] + cm[1][1]) / cm.sum()) * 100))

In [13]:
# Visualize the results
sns.heatmap(cm, annot=True)
plt.savefig('h.png')
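
Accuracy alone can hide the kind of error that matters most in a diagnostic setting, so it may help to read other rates off the same matrix. The sketch below uses a made-up 2x2 confusion matrix (these counts are illustrative and are NOT this model's results) and assumes the scikit-learn layout, where rows are true labels and columns are predictions, with 0 = benign and 1 = malignant because `LabelEncoder` sorts the labels alphabetically.

```python
# Hypothetical confusion matrix for 57 test samples (NOT the model's results).
# Rows are true labels, columns are predictions; 0 = benign, 1 = malignant.
cm_example = [[34, 1],
              [2, 20]]

tn, fp = cm_example[0]
fn, tp = cm_example[1]
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
sensitivity = tp / (tp + fn)   # share of malignant cases that were detected
specificity = tn / (tn + fp)   # share of benign cases correctly identified

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```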

## 6. Reference

As indicated in the Readme file:

* Data used in this project come from:
    - Breast Cancer Wisconsin (Diagnostic) Data Set available at:
    - https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

* The ML code is adapted from:
    - https://www.kaggle.com/thebrownviking20/intro-to-keras-with-breast-cancer-data-ann