
##  TensorFlow Example

In this example, we are going to look at a data set with measurements taken on several tumors.  For the given data set, we also have the labels or classifications of each tumor (e.g., benign or malignant).

We want to build a TensorFlow model that we can use to predict the label for an unclassified tumor.


#### Load Keras Packages


In [None]:
from keras.models import Sequential 
from keras.layers import Dense, Dropout 
from keras.utils import to_categorical 
from keras.optimizers import SGD


####  Read in Data

In [None]:
import numpy as np
data_file = 'cancer_data.csv'
target_file = 'cancer_target.csv'
cancer_data = np.loadtxt(data_file, dtype=float, delimiter=',')
cancer_target = np.loadtxt(target_file, dtype=float, delimiter=',')

import pandas as pd
print("Data size:  ", cancer_data.shape)
print("Data Summary:")
print(pd.DataFrame(cancer_data).describe())
print("Classification Summary: ")
print(pd.DataFrame(cancer_target).describe())

******************************************
#### Decision Point
******************************************
This is quite a small data set: only 569 observations.  Let's split it so that only 15% of the observations are used in the test data set.  That leaves 85% for the training data.
******************************************

####  Split data into training and testing sets

In [None]:
from sklearn import model_selection
test_size = 0.15 
seed = 7   # The seed is only for reproducible results
# train_test_split returns the two data splits first, then the two target splits
train_data, test_data, train_target, test_target = model_selection.train_test_split(
    cancer_data, cancer_target, test_size=test_size, random_state=seed)


#### Pre-Process the Data

In [None]:
from sklearn.preprocessing import StandardScaler 
scaler = StandardScaler()
# Fit only to the training data 
scaler.fit(train_data)

# Now apply the transformations to the data: 
x_train = scaler.transform(train_data) 
x_test = scaler.transform(test_data)

# Convert the classes to one-hot vectors
y_train = to_categorical(train_target, num_classes=2) 
y_test = to_categorical(test_target, num_classes=2)



******************************************
#### Decision Point
******************************************
Again, because the data set is small, let's use only one hidden layer. 
We'd like the number of nodes to be between 2 (the number of classifications, i.e., malignant or benign) and 30 (the number of columns in the data).  To keep things simple, I'm just going to take the average of those two numbers:  $\frac{2+30}{2}=16$

******************************************

####  Define the Model

In [None]:
model = Sequential() 
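# In the first layer, you must specify the expected input data shape:
# here, 30-dimensional vectors (input_dim=30).
# Note: this first Dense layer is itself a layer of 30 trainable units
# that reads the 30 input features; it is not a separate input layer.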
model.add(Dense(30, activation='relu', input_dim=30)) 
model.add(Dropout(0.5)) 
model.add(Dense(16, activation='relu')) 
model.add(Dropout(0.5))  
model.add(Dense(2, activation='softmax')) 
# summary() prints the table itself and returns None, so no print() is needed
model.summary()
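
The parameter counts that the summary reports can be checked by hand. The sketch below is only an illustration: `dense_params` is a hypothetical helper (not part of Keras), and it assumes each Dense layer holds one weight per input-unit pair plus one bias per unit, with Dropout adding no parameters.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a fully connected layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths as defined in the model above: 30 input features feed
# Dense(30), then Dense(16), then Dense(2). Dropout layers are skipped
# because they have no trainable parameters.
widths = [30, 30, 16, 2]
per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)
```

If the totals do not match the summary on your machine, the layer definitions were probably changed since the cell was last run.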


******************************************
#### Decision Point
******************************************
At this point, we need to decide on the optimizer function and loss function that we want to use.

Because our model is shallow (only one hidden layer), Stochastic Gradient Descent (SGD) is a reasonable choice.  To keep it simple, I will choose a learning rate of 0.01 (arbitrary), and I'm setting the nesterov parameter.  Nesterov momentum evaluates the gradient at a look-ahead position, which lets the descent move quickly at first and slow down as it approaches a minimum value.  Note that this option only changes the update when the optimizer's momentum is greater than 0, and the code below leaves momentum at its default of 0.
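
To make the role of the nesterov parameter concrete, here is a minimal sketch in plain Python. It follows the update rule as I understand it from the Keras SGD documentation; `sgd_step` and `minimize` are hypothetical helpers written for this illustration, and the toy objective $f(w) = w^2$ is an assumption chosen only because its gradient is easy to write down.

```python
def sgd_step(w, velocity, grad, lr=0.01, momentum=0.0, nesterov=False):
    """One SGD update with optional (Nesterov) momentum."""
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # Look-ahead form: apply the momentum term once more on top of the gradient step
        w = w + momentum * velocity - lr * grad
    else:
        w = w + velocity
    return w, velocity

def minimize(momentum, nesterov, steps=50, lr=0.01):
    """Minimize f(w) = w**2 starting from w = 1.0 and return the final w."""
    w, velocity = 1.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w, velocity = sgd_step(w, velocity, grad, lr, momentum, nesterov)
    return w

plain = minimize(momentum=0.0, nesterov=False)
nesterov_no_momentum = minimize(momentum=0.0, nesterov=True)
nesterov_with_momentum = minimize(momentum=0.9, nesterov=True)
print(plain, nesterov_no_momentum, nesterov_with_momentum)
```

Under this rule, the first two runs end at the same point, because with zero momentum the Nesterov term vanishes. The third run, with a momentum of 0.9, gets much closer to the minimum in the same number of steps.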

For the loss function, I am choosing _binary_crossentropy_ (binary because we have two classifications, and crossentropy because this is a classification model).
******************************************
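As a rough illustration of what the cross-entropy loss measures, the sketch below computes it by hand for a single one-hot label. The function here is a hypothetical stand-in written for this example, not the Keras implementation, and the probabilities are made-up values.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)

# One-hot label [0, 1] with a confident, correct output ...
good = binary_crossentropy([0.0, 1.0], [0.1, 0.9])
# ... and the same label with a confident, wrong output.
bad = binary_crossentropy([0.0, 1.0], [0.9, 0.1])
print(good, bad)
```

The loss is small when the predicted probabilities agree with the label and grows quickly when the model is confidently wrong, which is the behavior we want the optimizer to push against.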

####  Configure the Learning Process

In [None]:

# learning_rate replaces the deprecated lr argument in current Keras releases
sgd = SGD(learning_rate=0.01, nesterov=True) 
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])



******************************************
#### Decision Point
******************************************
Our final decisions deal with how we want the training data fed through the model during the fitting process.


Once again, we have a small data set, so I'm going to let the batch size be all of the training data.  I don't know how many times I should run the forward pass and backpropagation.  For now, I'm going to choose 20 epochs just to see how accurate my results can be. 
******************************************
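One consequence of this choice is worth spelling out: with the batch size equal to the whole training set, the weights are updated only once per epoch. The sketch below counts the updates. It assumes the test count is rounded up when the 15% split is taken, and the batch size of 32 is a hypothetical value used only for comparison; `updates_per_epoch` is a helper written for this illustration.

```python
import math

n_samples = 569          # observations in the data set
test_fraction = 0.15     # share held out for testing

# Assumption: the test count is rounded up and the remainder is used for training.
n_test = math.ceil(n_samples * test_fraction)
n_train = n_samples - n_test

def updates_per_epoch(n_rows, batch_size):
    """Number of gradient updates in one pass over n_rows training rows."""
    return math.ceil(n_rows / batch_size)

num_epochs = 20
full_batch_updates = updates_per_epoch(n_train, n_train) * num_epochs
mini_batch_updates = updates_per_epoch(n_train, 32) * num_epochs
print(n_train, full_batch_updates, mini_batch_updates)
```

With a full batch, 20 epochs means only 20 weight updates in total, which is worth keeping in mind when judging the accuracy after training.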

####  Fit the Model to the Data

In [None]:

b_size = x_train.shape[0]
num_epochs = 20

model.fit(x_train, y_train, epochs=num_epochs, batch_size=b_size)


####  Apply Model to Test Data

In [None]:
predictions = np.argmax(model.predict(x_test), axis=-1)

#### Evaluate and Display the Results

In [None]:
score = model.evaluate(x_test, y_test, batch_size=b_size) 
print('\nAccuracy:  %.3f' % score[1])
from sklearn.metrics import confusion_matrix 
print(confusion_matrix(test_target, predictions))



#### Analysis of Results

These results are okay, but not great.  I am especially concerned that the accuracy at the last epoch is so low.  (Mine was only about 65%, but yours may vary due to randomness and how many times the cells were run.)  So we want to know whether there is anything we can tweak that will make the results better.
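
Accuracy alone can hide which class the model gets wrong, so it can help to read the confusion matrix row by row. The sketch below is a minimal example: `summarize` is a hypothetical helper, and the counts in `example_cm` are invented for illustration only, not results from this notebook. It assumes rows are true classes and columns are predicted classes, which is the layout scikit-learn's `confusion_matrix` returns.

```python
def summarize(cm):
    """Accuracy and per-class recall from a 2x2 confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    (c00, c01), (c10, c11) = cm
    total = c00 + c01 + c10 + c11
    accuracy = (c00 + c11) / total
    recall_class0 = c00 / (c00 + c01)  # share of true class 0 predicted as class 0
    recall_class1 = c11 / (c10 + c11)  # share of true class 1 predicted as class 1
    return accuracy, recall_class0, recall_class1

# Hypothetical counts for illustration only; substitute the matrix printed above.
example_cm = [[30, 10], [5, 55]]
accuracy, recall0, recall1 = summarize(example_cm)
print(accuracy, recall0, recall1)
```

A large gap between the two recall values would suggest that the model favors one class, which is something a single accuracy number does not show.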


#### Hands-on Activities

Try the suggested changes (tweaks), and rerun all of the cells to ensure that all of the variables are cleared between runs.

1. Tweak #1:
The first thing I would try is increasing the number of epochs.  SGD is known to be slow to arrive at a good solution.  Let's increase the number of epochs to 100 and rerun the training process.  Was there an improvement?
2. Tweak #2:
What if I added another hidden layer?  Let's try a second hidden layer (between the first hidden layer and the output layer), and give it 8 nodes.  Did that make the results better or worse?
3. Tweak #3:
Let's go back to just one hidden layer.  What happens if we increase the number of nodes from 16 to 28?  (I just chose a number between the number of nodes at the input layer and at the hidden layer.)
4. Tweak #4:
I liked the results that I saw on the last tweak.  Let's increase the number of nodes on the hidden layer to twice the number in the input layer (i.e., 60).  What does that do?
5. Tweak #5:
What should we do next?