<a href="https://colab.research.google.com/github/hikmatfarhat-ndu/CSC645/blob/master/shallow_tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Using TensorFlow to model the shallow network
In this exercise we use TensorFlow to rebuild the shallow network that we previously trained from first
principles to classify the Sonar data.


In [1]:
import tensorflow as tf
import numpy as np


## The Data

The data is the mines vs. rocks sonar dataset from https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks). It is in CSV format and very small, so download it to your computer. Next we upload it to Colab and read it with the pandas package.


### Upload to Colab


In [2]:
from google.colab import files
file=files.upload()

Saving sonar.csv to sonar.csv


### Read using Pandas

In [3]:
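import pandas as pd
# read_csv treats the first row as column names by default; if your copy of
# sonar.csv has no header row, pass header=None so the first sample is kept
df=pd.read_csv("sonar.csv")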


### Preprocessing the data

We need to perform several operations on the data before we use it.
1. The data is sorted by label (all samples of one class followed by all samples of the other), so we shuffle it using NumPy.
2. We need to split it into training and test sets.
3. Make sure that the data is of type "float32" rather than NumPy's default "float64". The TensorFlow variables defined below are float32, and TensorFlow does not silently mix the two types in operations such as `tf.matmul`.
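
The three steps above can be sketched on a small synthetic table. This is only an illustration: the array sizes, the seed, the 80/20 split and every name ending in `_demo` are assumptions, and it shuffles with a shared permutation rather than the in-place `np.random.shuffle` used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, only for reproducibility

# Synthetic stand-in for the sonar table: 10 samples, 3 features, sorted labels.
features = rng.random((10, 3))            # float64 by default
labels = np.array(["M"] * 5 + ["R"] * 5)

# One permutation applied to both arrays keeps each label with its sample.
perm = rng.permutation(len(features))
X_demo = features[perm].astype("float32")
Y_demo = (labels[perm] == "M").astype("float32").reshape(-1, 1)

# Hold out the last 20% of the shuffled rows as a test set.
n_train = int(0.8 * len(X_demo))
x_train_demo, x_test_demo = X_demo[:n_train], X_demo[n_train:]
y_train_demo, y_test_demo = Y_demo[:n_train], Y_demo[n_train:]

print(features.dtype, "->", X_demo.dtype)
print(x_train_demo.shape, x_test_demo.shape)
```

Shuffling before splitting matters because a sorted file would otherwise put only one class in the test set.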

In [4]:
# values of the pandas data frame as a NumPy array
m=df.values
# randomize (shuffle) the data
np.random.shuffle(m)
# Each row has 61 entries, 60 for data and the last one is the label "M" or "R"
# X contains all the data
X=m[:,0:60].astype("float32")
# Y contains all the labels
Y=m[:,60]
# convert the labels: "M"->1 and "R"->0
Y=np.array([1.0 if i=='M' else 0.0 for i in Y])
Y=Y.reshape((len(Y),1))
Y=Y.astype("float32")

# split the data and labels into training and test sets
x_train=X[0:175]
x_test=X[175:208]
y_train=Y[0:175]
y_test=Y[175:208]



### Important Note
TensorFlow stacks the samples row-wise (one sample per row), whereas we stacked them column-wise
when we implemented gradient descent ourselves. We need to keep that in mind.
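
To make the two conventions concrete, here is a minimal NumPy sketch showing that they give the same numbers up to a transpose. The sizes and names (`X_rows`, `W_cols`, and so on) are illustrative assumptions, not variables from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_in, n_out = 5, 4, 3                  # 5 samples, 4 inputs, 3 units (illustrative)

X_rows = rng.random((m, n_in))            # row-wise: one sample per row
W_rows = rng.random((n_in, n_out))        # weights shaped (inputs, units)

# Column-wise convention: one sample per column, weights shaped (units, inputs).
X_cols = X_rows.T
W_cols = W_rows.T

Z_rows = X_rows @ W_rows                  # shape (m, n_out)
Z_cols = W_cols @ X_cols                  # shape (n_out, m)

print(Z_rows.shape, Z_cols.shape)
print(np.allclose(Z_rows, Z_cols.T))      # same values, transposed layout
```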

### Defining the parameters


In [5]:
learning_rate = 3
nb_iterations = 2500

# Network Parameters
n_h = 64 # number of neurons in the hidden layer
n_x = X.shape[1] # number of input features
n_y = Y.shape[1] # number of neurons in the output layer


### Initialization
The forward propagation phase is the same as when we did this exercise from first principles, but since TensorFlow stacks the data row-wise, the order of the matrix products is different from what we are used to.
Let $W^0$, $b^0$ and $W^1$, $b^1$ be the weights and biases of the first and second layers respectively. Forward propagation is then given by
\begin{align*}
Z^1&=X\cdot W^0+b^0\\
A^1 &=\sigma(Z^1)\\
Z^2 &=A^1\cdot W^1+b^1\\
A^2 &=\sigma(Z^2)
\end{align*}
Compare the above with the equations in the previous exercise. For more details consult the lecture [backpropagation](https://github.com/hikmatfarhat-ndu/CSC645/blob/master/lectures/csc645-lecture-backprop.pdf).
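
The four equations above can be traced with plain NumPy to check the shapes at each stage. This is a sketch on random data: the `_demo` arrays are placeholders, not the trained parameters, and the weight scale is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n_x_demo, n_h_demo, n_y_demo = 175, 60, 64, 1    # sizes used in this notebook

X_demo = rng.random((m, n_x_demo))
W0_demo = rng.normal(size=(n_x_demo, n_h_demo)) * 0.05
b0_demo = np.zeros(n_h_demo)
W1_demo = rng.normal(size=(n_h_demo, n_y_demo)) * 0.05
b1_demo = np.zeros(n_y_demo)

Z1_demo = X_demo @ W0_demo + b0_demo      # (m, n_h); the bias broadcasts over rows
A1_demo = sigmoid(Z1_demo)                # (m, n_h)
Z2_demo = A1_demo @ W1_demo + b1_demo     # (m, n_y)
A2_demo = sigmoid(Z2_demo)                # (m, n_y), values strictly between 0 and 1

print(Z1_demo.shape, A1_demo.shape, Z2_demo.shape, A2_demo.shape)
```

Each bias is a one-dimensional vector with one entry per neuron, so adding it to a matrix with one sample per row applies the same bias to every sample.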

According to the above equations, we have to define the TensorFlow variables that will hold the weights and biases.
The biases are initialized to zero and the weights are initialized randomly.

In [6]:

initializer = tf.initializers.RandomNormal()

W0=tf.Variable(initializer([n_x,n_h]),trainable=True,dtype=tf.float32)
W1=tf.Variable(initializer([n_h,n_y]),trainable=True,dtype=tf.float32)

b0=tf.Variable(tf.zeros([n_h]))            #biases of the first layer
b1=tf.Variable(tf.zeros([n_y]))            #biases of the second layer
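
For comparison, a rough NumPy equivalent of this initialization might look as follows. The standard deviation of 0.05 is an assumption, chosen to mirror what the Keras documentation lists as the `RandomNormal` default, and the `_np` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x_np, n_h_np, n_y_np = 60, 64, 1        # same sizes as in this notebook

# Weights: zero-mean Gaussian noise (stddev 0.05 is an assumed value).
W0_np = rng.normal(0.0, 0.05, size=(n_x_np, n_h_np)).astype("float32")
W1_np = rng.normal(0.0, 0.05, size=(n_h_np, n_y_np)).astype("float32")

# Biases: one entry per neuron, all zeros.
b0_np = np.zeros(n_h_np, dtype="float32")
b1_np = np.zeros(n_y_np, dtype="float32")

print(W0_np.shape, W1_np.shape, b0_np.shape, b1_np.shape)
print(W0_np.dtype, float(W0_np.std()))
```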


### Defining the model
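Our model has two layers. The function "model" below returns the output of our model for a given input. Note that since TensorFlow uses the first index for the samples, the matrix product has a different order. The function returns the pre-activation $Z^2$ (the logit) rather than $A^2$; the final sigmoid is applied inside the loss and in the prediction function.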

In [7]:
def model(input):
    # Hidden fully connected layer with n_h neurons (pre-activation Z^1)
    layer_1 = tf.add(tf.matmul(input, W0), b0)
    # Output layer with a single neuron; returns the logit Z^2 without the final sigmoid
    out_layer = tf.matmul(tf.sigmoid(layer_1), W1) + b1
    return out_layer

Once the model is defined, the remaining code is similar to our previous exercise. We define the loss
as the average cross-entropy, but this time, since this is binary classification, we use the sigmoid instead
of the softmax function. Our optimizer then uses gradient descent to minimize the loss.

In [8]:

# Define the loss: mean sigmoid cross-entropy computed from the logits
def loss(pred,label):
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=label))
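
To see what this loss computes, the sketch below evaluates the per-sample formula given in the TensorFlow documentation for `sigmoid_cross_entropy_with_logits`, namely max(x, 0) - x*z + log(1 + exp(-|x|)) for logit x and label z, and compares it with the textbook cross-entropy. The helper names and the sample values are illustrative assumptions.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Elementwise loss in the stable form max(x, 0) - x*z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype="float64")
    z = np.asarray(labels, dtype="float64")
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

def naive_cross_entropy(logits, labels):
    """Textbook form -(z*log(a) + (1-z)*log(1-a)) with a = sigmoid(x)."""
    a = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype="float64")))
    z = np.asarray(labels, dtype="float64")
    return -(z * np.log(a) + (1 - z) * np.log(1 - a))

logits_demo = np.array([[-2.0], [0.0], [3.0]])
labels_demo = np.array([[0.0], [1.0], [1.0]])

stable = sigmoid_cross_entropy(logits_demo, labels_demo)
naive = naive_cross_entropy(logits_demo, labels_demo)

print(np.allclose(stable, naive))         # both forms agree on moderate logits
print(stable.mean())                      # the averaged loss
```

The stable form stays finite for very large logits, where the textbook form would take the logarithm of zero. This is why the model returns logits and leaves the sigmoid to the loss.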


## Prediction

Given an input X (and the current values of the parameters we are learning), this function predicts whether each sample is a rock (0) or a mine (1).

In [9]:
def prediction(X):
  a=tf.math.sigmoid(model(X))
  return tf.cast((a>0.5),tf.int32)
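
Since the sigmoid is increasing and equals 0.5 at zero, thresholding the probability at 0.5 is the same as checking the sign of the logit. A small NumPy check of that equivalence, with made-up logits:

```python
import numpy as np

logits_demo = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
probs_demo = 1.0 / (1.0 + np.exp(-logits_demo))

via_sigmoid = (probs_demo > 0.5).astype("int32")   # rule used by prediction()
via_logit = (logits_demo > 0.0).astype("int32")    # equivalent rule on the logit

print(via_sigmoid, via_logit)
```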

The model is now defined, so we can run our computation. Recall that the model depends on the parameters $W^0,W^1,b^0,b^1$, which change at every step, so the gradient changes as well. The train function performs a **single** training step.

In [10]:
optimizer=tf.optimizers.SGD(learning_rate)

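def train(data,labels):
  # record the forward pass so the loss can be differentiated
  with tf.GradientTape() as tape:
    diff=loss(model(data),labels)

  # gradients of the loss with respect to every parameter
  grad=tape.gradient(diff,[W0,W1,b0,b1])
  # one gradient descent update
  optimizer.apply_gradients( zip( grad , [W0,W1,b0,b1] ) )
  # number of correct predictions, computed with the updated parameters
  pT=tf.transpose(prediction(data))
  correct=np.squeeze(np.dot(pT,labels)+np.dot(1-pT,1-labels))
  return diff,correct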
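
A single gradient descent step moves each parameter against its gradient, scaled by the learning rate. The sketch below does this by hand in NumPy for a simplified one-layer sigmoid model with a hand-derived gradient. It is not the two-layer network or the tape-based method used above; the data, the step size and the `_demo` names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_cross_entropy(w, b, X, Y):
    a = sigmoid(X @ w + b)
    return float(np.mean(-(Y * np.log(a) + (1.0 - Y) * np.log(1.0 - a))))

rng = np.random.default_rng(0)
X_demo = rng.random((8, 3))                          # 8 synthetic samples, 3 features
Y_demo = (X_demo[:, :1] > 0.5).astype("float64")     # synthetic labels, shape (8, 1)

w_demo = np.zeros((3, 1))
b_demo = np.zeros(1)
lr_demo = 0.5                                        # illustrative step size

loss_before = mean_cross_entropy(w_demo, b_demo, X_demo, Y_demo)

# Gradients of the mean cross-entropy for a single sigmoid layer.
a_demo = sigmoid(X_demo @ w_demo + b_demo)
grad_w = X_demo.T @ (a_demo - Y_demo) / len(X_demo)
grad_b = np.mean(a_demo - Y_demo, axis=0)

# One gradient descent update: parameter <- parameter - learning_rate * gradient.
w_demo = w_demo - lr_demo * grad_w
b_demo = b_demo - lr_demo * grad_b

loss_after = mean_cross_entropy(w_demo, b_demo, X_demo, Y_demo)
print(loss_before, loss_after)
```

In the notebook, `tape.gradient` plays the role of the hand-derived gradients and `optimizer.apply_gradients` performs the update.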

### Training Loop

In [11]:
for i in range(nb_iterations):
    cost,corr=train(x_train,y_train)
    if i%500==0:
        print("cost={},accuracy={}/{}".format(cost,corr,x_train.shape[0]))


cost=0.6938693523406982,accuracy=96.0/175
cost=0.2810496687889099,accuracy=147.0/175
cost=0.13114233314990997,accuracy=167.0/175
cost=0.04477405920624733,accuracy=173.0/175
cost=0.032321758568286896,accuracy=175.0/175


### Accuracy

In [12]:
pT=tf.transpose(prediction(x_test))
print(np.dot(pT,y_test))
print(np.dot(1-pT,1-y_test))
correct=np.dot(pT,y_test)+np.dot(1-pT,1-y_test)
accuracy=100*float(np.squeeze(correct))/float(y_test.shape[0])
print("Accuracy="+str(accuracy))

[[13.]]
[[12.]]
Accuracy=78.125
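
The two dot products printed above count the correctly predicted mines (prediction 1, label 1) and the correctly predicted rocks (prediction 0, label 0). A small NumPy sketch with made-up predictions and labels shows that their sum equals the number of matching entries:

```python
import numpy as np

pred_demo = np.array([[1], [0], [1], [1], [0]], dtype="int32")
labels_demo = np.array([[1.0], [0.0], [0.0], [1.0], [1.0]], dtype="float32")

pT_demo = pred_demo.T                                   # shape (1, 5)
true_pos = np.dot(pT_demo, labels_demo)                 # predicted 1 and label 1
true_neg = np.dot(1 - pT_demo, 1 - labels_demo)         # predicted 0 and label 0
correct_demo = float(np.squeeze(true_pos + true_neg))

# Direct count of matching entries, for comparison.
direct_demo = int(np.sum(pred_demo == labels_demo))

print(true_pos, true_neg, correct_demo, direct_demo)
```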
