## Breast Cancer Detection
#### (Data Source - University of Wisconsin Hospital, Madison)
For this assignment, I used the TFLearn library, which is built on top of TensorFlow. It provides a higher-level API to TensorFlow to speed up experimentation.

In [1]:
import tensorflow as tf
import tflearn
import numpy as np

In [2]:
# Seed NumPy's random number generator so the data shuffle below is reproducible.
np.random.seed(0)

In [3]:
# Read the comma-separated data; missing values (marked "?") are filled with 1
data = np.genfromtxt("breast_cancer.data", delimiter=",", missing_values="?", filling_values=1.)
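The `missing_values`/`filling_values` arguments matter because missing entries in the file are written as `?` rather than as numbers. The sketch below uses two made-up rows in the same 11-column layout (they are illustrative, not copied from the real file). It shows that without these arguments NumPy parses `?` as `nan`, while the settings used above replace it with 1.0.

```python
import io
import numpy as np

# Two illustrative rows in the same layout as breast_cancer.data:
# sample ID, nine attributes, class (2 or 4). The second row has '?'
# in attribute 7 (column index 6).
rows = "1000025,5,1,1,1,2,1,3,1,1,2\n1057013,8,4,5,1,2,?,7,3,1,4\n"

# Default parsing: '?' cannot be converted to a float, so it becomes nan.
raw = np.genfromtxt(io.StringIO(rows), delimiter=",")

# The notebook's settings: '?' is treated as missing and filled with 1.0.
data = np.genfromtxt(io.StringIO(rows), delimiter=",",
                     missing_values="?", filling_values=1.)

print(np.isnan(raw[1, 6]))   # True
print(data[1, 6])            # 1.0
print(data.shape)            # (2, 11)
```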

#### Data preparation
Each record in the data has 10 attributes plus the class attribute. Attribute 1 is the sample code number (an ID), so only attributes 2 through 10 are used to represent instances. Each instance belongs to one of 2 classes: benign or malignant.

class 2 = benign

class 4 = malignant

First, we split the attributes into inputs (X) and outputs (Y).

We also convert Y into a binary (one-hot) matrix with one column per class, which is the form expected by the two-node softmax output layer and the categorical cross-entropy loss.

The expected transformation is therefore:

2 = [1 0]

4 = [0 1]

In [4]:
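# Separate the inputs X: drop column 0 (sample ID) and the last column (class)
dataX = data[:, 1:-1]

# Separate the labels Y (last column: 2 or 4)
pre_dataY = data[:, -1]

# Convert Y to one-hot form: 2 -> [1 0], 4 -> [0 1]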
dataY = np.zeros((pre_dataY.size, 2))

for i in range(len(pre_dataY)):
    if pre_dataY[i] == 2:
        dataY[i][0] = 1
    else:
        dataY[i][1] = 1
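The loop above can also be written without an explicit Python loop. This is a minimal sketch, not the notebook's code: it maps class 2 to index 0 and any other value to index 1 (the same rule as the `if`/`else`), then picks the matching rows of a 2×2 identity matrix. The short label array is made up for illustration.

```python
import numpy as np

# Illustrative class column using the dataset's codes: 2 = benign, 4 = malignant.
pre_dataY = np.array([2., 4., 4., 2., 2.])

# Same rule as the loop: class 2 -> index 0, anything else -> index 1.
class_index = (pre_dataY != 2).astype(int)

# Row i of np.eye(2) is the one-hot vector for index i,
# so 2 -> [1 0] and 4 -> [0 1].
dataY = np.eye(2)[class_index]

print(dataY)
```

For a dataset of a few hundred rows either version is fast; the vectorized form mainly avoids index bookkeeping.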

It's good practice to shuffle the data to remove any ordering bias before training the model. Use the same permutation for both X and Y so that each input row stays paired with its label.

In [5]:
# Shuffle X and Y with the same permutation
permutation = np.random.permutation(dataX.shape[0])
dataX = dataX[permutation]
dataY = dataY[permutation]
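A quick way to convince yourself that indexing both arrays with one permutation keeps the pairs aligned is to use toy data where each label records the index of its input row. The arrays here are made up for illustration; the last check also shows that re-seeding NumPy reproduces the same shuffle.

```python
import numpy as np

np.random.seed(0)

# Toy data: label i belongs to input row i.
dataX = np.arange(10).reshape(5, 2)   # rows [0 1], [2 3], ..., [8 9]
dataY = np.arange(5)

# One permutation applied to both arrays keeps every (x, y) pair together.
permutation = np.random.permutation(dataX.shape[0])
shuffledX = dataX[permutation]
shuffledY = dataY[permutation]

# Row k of shuffledX came from original row shuffledY[k].
print(np.array_equal(shuffledX, dataX[shuffledY]))   # True

# Re-seeding gives the same permutation, so the shuffle is reproducible.
np.random.seed(0)
print(np.array_equal(permutation, np.random.permutation(5)))   # True
```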

There are 9 attributes (2 through 10) that are input to the network; therefore, the input layer has 9 nodes.

The expected output is in one-hot form with one column per class; therefore, the output layer has 2 nodes.

Choosing hyperparameters is more art than science, so we keep tuning them until the desired accuracy is reached.

In [6]:
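# Network sizes and hyperparameters
nInput = 9          # one input node per attribute (2 through 10)
nHidden = 25        # nodes in the hidden layer
nOutput = 2         # one output node per class
alpha = 0.01        # learning rate for the Adam optimizer
nEpochs = 100       # passes over the training set
testSplit = 0.15    # fraction of the data held out for testing
batchSize = 64      # samples per gradient update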

#### Setting up layers and network
input_layer: creates a placeholder for input data with 9 nodes (one per attribute).

hidden_layer: creates a fully connected layer of 25 nodes, so every node in input_layer is connected to every node in the hidden layer. The layer creates and learns its own weights and biases; the pre-defined `sigmoid` activation is then applied to each node's weighted sum.

output_layer: creates an output layer of 2 nodes, each fully connected to every node in the hidden layer. The pre-defined `softmax` activation turns the two outputs into class probabilities that sum to 1.

network: `tflearn.regression` defines how the model is trained. It computes the loss (categorical cross-entropy), backpropagates it, and updates the weights with the chosen optimizer (Adam, with learning rate `alpha`).

model: `tflearn.DNN` wraps the network into a trainable model that provides `fit`, `evaluate` and `predict`.
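To make the layer descriptions concrete, here is a NumPy-only sketch of the forward pass and of the standard categorical cross-entropy loss. It is an illustration, not TFLearn's implementation: the weights are random (TFLearn uses its own initializer), TFLearn's loss may differ in clipping details, the helper names `forward` and `categorical_crossentropy` are hypothetical, and the Adam update step is omitted. It also counts the trainable parameters of the 9-25-2 network.

```python
import numpy as np

rng = np.random.default_rng(0)
nInput, nHidden, nOutput = 9, 25, 2

# Illustrative random parameters (not TFLearn's initializer).
W1 = rng.normal(scale=0.1, size=(nInput, nHidden))
b1 = np.zeros(nHidden)
W2 = rng.normal(scale=0.1, size=(nHidden, nOutput))
b2 = np.zeros(nOutput)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the row max improves numerical stability without changing the result.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    hidden = sigmoid(X @ W1 + b1)        # (batch, 25): weighted sum, then sigmoid
    return softmax(hidden @ W2 + b2)     # (batch, 2): class probabilities

def categorical_crossentropy(probs, Y, eps=1e-12):
    # Batch mean of -sum_k y_k * log(p_k); clipping avoids log(0).
    return -np.mean(np.sum(Y * np.log(np.clip(probs, eps, 1.0)), axis=1))

# A hypothetical batch of 4 samples with attribute values in 1..10.
X = rng.integers(1, 11, size=(4, nInput)).astype(float)
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

probs = forward(X)
loss = categorical_crossentropy(probs, Y)

# Weights plus biases: 9*25 + 25 + 25*2 + 2 = 302 trainable parameters.
n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, round(float(loss), 4), n_params)
```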

In [7]:
input_layer = tflearn.input_data(shape=[None, nInput])
hidden_layer = tflearn.fully_connected(input_layer, nHidden, activation="sigmoid")
output_layer = tflearn.fully_connected(hidden_layer, nOutput, activation="softmax")

network = tflearn.regression(output_layer, optimizer="adam", loss="categorical_crossentropy",
                               learning_rate=alpha, batch_size=batchSize)

model = tflearn.DNN(network)



#### Split data into train and test

In [8]:
# Number of data points held out for testing
num_test = int(testSplit * len(data))

# Split data into train and test: the last num_test rows form the test set
trainX = dataX[:-num_test]
testX = dataX[-num_test:]

trainY = dataY[:-num_test]
testY = dataY[-num_test:]
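The split sizes and the number of training steps in the log can be checked by hand. This sketch assumes the file holds the 699 records of the UCI release, which is consistent with the 595 training samples the log reports, and assumes `fit()` reuses the `batch_size=64` set in `tflearn.regression`. It also shows one edge case of the `[:-num_test]` slicing used above.

```python
import numpy as np

# Assumption: 699 rows in breast_cancer.data (the UCI release size).
n_rows = 699
testSplit = 0.15
batchSize = 64
nEpochs = 100

num_test = int(testSplit * n_rows)      # int(104.85) -> 104
num_train = n_rows - num_test           # 699 - 104 = 595

# Each epoch: 9 full batches of 64 plus one batch of 19 (595 = 9 * 64 + 19).
steps_per_epoch = -(-num_train // batchSize)   # ceiling division -> 10
total_steps = steps_per_epoch * nEpochs        # 10 * 100 = 1000

print(num_test, num_train, steps_per_epoch, total_steps)

# Edge case: -0 == 0, so if num_test were 0, x[:-num_test] would be empty.
# Slicing with an explicit end index avoids that.
a = np.arange(5)
k = 0
print(a[:-k].size, a[:len(a) - k].size)    # 0 5
```

The 10 steps per epoch over 100 epochs match the final "Training Step: 1000" line in the log.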

#### Train the model and evaluate

In [9]:
model.fit(trainX, trainY, n_epoch=nEpochs, show_metric=True)

print("Final Accuracy:", model.evaluate(testX, testY))

Training Step: 999  | total loss: 0.10862 | time: 0.023s
| Adam | epoch: 100 | loss: 0.10862 - acc: 0.9637 -- iter: 576/595
Training Step: 1000  | total loss: 0.10620 | time: 0.026s
| Adam | epoch: 100 | loss: 0.10620 - acc: 0.9642 -- iter: 595/595
--
('Final Accuracy:', [0.9807692170143127])
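Assuming 104 test samples, the reported test accuracy of 0.9807692 corresponds to 102 correct predictions (102 / 104 ≈ 0.9807692). Accuracy alone does not say which class the remaining errors fall into, and in this task a malignant case predicted as benign matters more than the reverse. The sketch below computes a per-class confusion matrix from softmax outputs and one-hot labels. The function name `evaluate_predictions` is hypothetical, and in the notebook `probs` would presumably come from `np.asarray(model.predict(testX))`; here it is demonstrated on made-up arrays.

```python
import numpy as np

def evaluate_predictions(probs, Y):
    """Accuracy and 2x2 confusion matrix from class probabilities and one-hot labels.

    Rows are true classes, columns are predicted classes,
    index 0 = benign (class 2), index 1 = malignant (class 4).
    """
    pred = np.argmax(probs, axis=1)
    true = np.argmax(Y, axis=1)
    confusion = np.zeros((2, 2), dtype=int)
    np.add.at(confusion, (true, pred), 1)   # count each (true, predicted) pair
    accuracy = np.mean(pred == true)
    return accuracy, confusion

# The reported accuracy as a fraction of an assumed 104 test samples.
print(round(102 / 104, 7))   # 0.9807692

# Made-up example: the last sample is benign but predicted malignant.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]])
Y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
acc, cm = evaluate_predictions(probs, Y)
print(acc)   # 0.75
print(cm)
```

The off-diagonal cell `cm[1, 0]` counts malignant cases predicted as benign.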
