# Feedforward Neural Network with Tensorflow #

We have built a Neural Network from scratch in Julia. Going through that process is crucial for understanding how Neural Networks learn and the mathematics that makes it possible.

Several tools available today make the building of Neural Networks an easy task, where the math becomes second nature, and the emergent properties of the Neural Network, such as its architecture, become the main focus.

[Tensorflow](https://www.tensorflow.org/) is such a tool. This notebook will guide you through the steps necessary to build, with Tensorflow, a Feedforward Neural Network identical to the last one we built in Julia.

In [8]:
# import necessary libraries
import numpy as np
import scipy.io
import tensorflow as tf

In [9]:
# let's reload the music dataset we used previously
Iowa_MIS_dataset = scipy.io.loadmat('Iowa_MIS_dataset.mat')
data = Iowa_MIS_dataset['dat_all']

Remember: the variable `data` is of size [660,88201], and it is already shuffled (it's always good to double check, just to make sure). The number of rows tells you the number of datapoints in the dataset, and the number of columns minus 1 tells you the dimensionality of the dataset. The rightmost column in the matrix contains the labels for all datapoints. Possible labels are:

0 'Bass'

1 'Bassoon'

2 'Cello'

3 'Clarinet'

4 'Flute'

5 'Guitar'

6 'Horn'

7 'Sax'

8 'Trombone'

9 'Trumpet'

10 'Viola'

11 'Violin'
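To make the layout concrete, here is a minimal sketch that uses a small made-up array in place of the real dataset. The names `LABEL_NAMES` and `toy_data` are hypothetical and only serve this illustration; the label names are copied from the list above.

```python
import numpy as np

# Label names in index order, copied from the list above.
LABEL_NAMES = ['Bass', 'Bassoon', 'Cello', 'Clarinet', 'Flute', 'Guitar',
               'Horn', 'Sax', 'Trombone', 'Trumpet', 'Viola', 'Violin']

# Hypothetical stand-in for `data`: 5 datapoints, 4 features each,
# with the label stored in the rightmost column.
rng = np.random.RandomState(0)
toy_features = rng.rand(5, 4)
toy_labels = np.array([0, 11, 4, 4, 7])
toy_data = np.hstack([toy_features, toy_labels.reshape(-1, 1)])

n_points = toy_data.shape[0]             # number of rows = number of datapoints
n_dims = toy_data.shape[1] - 1           # number of columns minus 1 = dimensionality
label_ids = toy_data[:, -1].astype(int)  # rightmost column holds the labels
label_names = [LABEL_NAMES[i] for i in label_ids]
```

The same three lines (`shape[0]`, `shape[1] - 1`, and the `[:, -1]` slice) should apply unchanged to the real `data` array.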

Now separate the datapoints into training (~80% of the data), validation (~10%), and test sets (~10%).

In [10]:
# general data parameters
N = data.shape[0]
D = data.shape[1]-1
C = 12

# split into training, validation, and test sets
perc_tr = 0.8
perc_vl = 0.1
perc_ts = 0.1

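# randomly shuffle the rows of the data (just to make sure)
# np.random.permutation returns a shuffled copy, so the result must be assigned back
data = np.random.permutation(data)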

# separate into training, validation, and test sets
# data
x_tr = data[:int(N*perc_tr),0:-1]
x_vl = data[int(N*perc_tr):int(N*perc_tr+N*perc_vl),0:-1]
x_ts = data[-int(N*perc_ts):,0:-1]
# labels
y_tr = data[:int(N*perc_tr),-1].reshape(int(N*perc_tr),1)
y_vl = data[int(N*perc_tr):int(N*perc_tr+N*perc_vl),-1].reshape(int(N*perc_vl),1)
y_ts = data[-int(N*perc_ts):,-1].reshape(int(N*perc_ts),1)

# we won't use the variable `data` after this point.
del data
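As a sanity check on the split, the following sketch computes the three subset sizes for 660 rows and confirms that every row lands in exactly one subset. The helper `split_sizes` is hypothetical, and unlike the cell above it gives the test set whatever rows remain after the training and validation sets, which is an assumption made here so that no row can be dropped by rounding.

```python
import numpy as np

def split_sizes(n, perc_tr=0.8, perc_vl=0.1):
    """Return (n_tr, n_vl, n_ts); the test set takes whatever remains."""
    n_tr = int(n * perc_tr)
    n_vl = int(n * perc_vl)
    n_ts = n - n_tr - n_vl
    return n_tr, n_vl, n_ts

n_tr, n_vl, n_ts = split_sizes(660)

# Toy index array standing in for the rows of `data`.
rows = np.arange(660)
tr_rows = rows[:n_tr]
vl_rows = rows[n_tr:n_tr + n_vl]
ts_rows = rows[n_tr + n_vl:]

# Every row should appear in exactly one of the three subsets.
all_rows = np.concatenate([tr_rows, vl_rows, ts_rows])
split_is_complete = np.array_equal(all_rows, rows)
```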

Tensorflow will expect the data labels to be one-hot encodings with a $1$ at the index corresponding to the correct label. The following snippet should fix this for you:

In [11]:
temp = np.zeros((y_tr.shape[0],C))
temp[np.arange(y_tr.shape[0]),y_tr.astype(int)[:,0]] = 1
y_tr = temp
temp = np.zeros((y_vl.shape[0],C))
temp[np.arange(y_vl.shape[0]),y_vl.astype(int)[:,0]] = 1
y_vl = temp
temp = np.zeros((y_ts.shape[0],C))
temp[np.arange(y_ts.shape[0]),y_ts.astype(int)[:,0]] = 1
y_ts = temp
del temp
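The same indexing trick can be wrapped in a small function and checked on a toy label column. This is only a sketch: `one_hot` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn a column of integer-valued labels into one-hot rows."""
    labels = np.asarray(labels).astype(int).ravel()
    encoded = np.zeros((labels.shape[0], n_classes))
    # put a 1 in row i at the column given by the i-th label
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded

# Toy labels stored as an (N, 1) float column, like y_tr before encoding.
toy_labels = np.array([[0.], [2.], [1.]])
toy_encoded = one_hot(toy_labels, 3)

# argmax along each row recovers the original integer labels
recovered = np.argmax(toy_encoded, axis=1)
```

Taking `np.argmax` along each row undoes the encoding, which is how the accuracy is computed later in this notebook.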

Now that we have the data loaded as numpy arrays, we will build the feedforward Neural Network.

Remember that the expression describing the forward pass is:

$$\hat{y} = softmax(~\sigma(x~W^{(h)})~W^{(o)})$$

And that the training objective is to minimize the negative log-probability assigned to the correct class (the cross-entropy loss):

$$J = -\log(\hat{y}_{correct})$$
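Before moving to Tensorflow, the forward pass can be sketched in plain NumPy. The loss used here is the cross-entropy, that is, the mean negative log-probability of the correct class, which is what the `softmax_cross_entropy_with_logits` call used later in this notebook computes. All sizes and the random toy data are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    # subtract the row-wise maximum for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x, W_h, W_o):
    # y_hat = softmax(sigmoid(x W_h) W_o)
    return softmax(np.dot(sigmoid(np.dot(x, W_h)), W_o))

def cross_entropy(y_hat, y_onehot):
    # probability that the network assigned to the correct class of each example
    p_correct = (y_hat * y_onehot).sum(axis=1)
    return -np.mean(np.log(p_correct))

# Toy problem: 8 examples, 20 features, 128 hidden units, 12 classes.
rng = np.random.RandomState(0)
n, d, h, c = 8, 20, 128, 12
x = rng.randn(n, d)
y_onehot = np.eye(c)[rng.randint(0, c, size=n)]
W_h = 0.01 * rng.randn(d, h)
W_o = 0.01 * rng.randn(h, c)

y_hat = forward(x, W_h, W_o)
loss = cross_entropy(y_hat, y_onehot)
```

With weights this small the predicted distribution is nearly uniform, so the loss should come out close to $\ln(12) \approx 2.485$.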

Let's start by defining all the general parameters that we will need:

In [12]:
# general parameters
N = x_tr.shape[0] # number of training examples
D = x_tr.shape[1] # dimensionality of the data
C = y_tr.shape[1] # number of unique labels in the dataset

# hyperparameters
H = 128 # number of hidden units. In general try to stick to a power of 2
lr = 0.01 # the learning rate (previously referred to in the notes as alpha)

Now we will have to initialize the tensorflow "variables", and the tensorflow "placeholders".

The "variables" are the ones that tensorflow will be able to learn, given a neural network architecture and an objective function. 

Will our training data be a "variable" or a "placeholder"? What about the weights?

After that, we will build and train the model.

In [7]:
# let's initialize the weights
W_h = tf.Variable(tf.random_normal((D,H), stddev = 0.01)) # mean=0.0
W_o = tf.Variable(tf.random_normal((H,C), stddev = 0.01)) # mean=0.0

# We now initialize the placeholders for our data. At this point we won't be passing any data in yet
# Tensorflow builds a graph with the information that we are giving it, so it just allocates graph space
# for the data that we will pass in
# we will have to tell it the datatype ("float"), and the shape of the data
X = tf.placeholder("float", shape=[None,D])
y = tf.placeholder("float", shape=[None,C])

# we now do the forward pass until we obtain the scores
h = tf.nn.sigmoid(tf.matmul(X,W_h))
scores = tf.matmul(h, W_o)

# now that we have the scores, we can turn that into probabilities
y_hat = tf.nn.softmax(scores)
# and see which of the class indices received the largest value
max_y_index = tf.argmax(y_hat, axis = 1)

# now we have to do the backpropagation, which in tensorflow will be extremely easy.
# 1st we compute the cross entropy loss using tensorflow's softmax_cross_entropy_with_logits function
loss_f = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=scores))
# 2nd we ask tensorflow to compute the gradients with respect to all the tensorflow "variables"
# which for us are the trainable weights. We will have to tell it what the objective is, and it will
# also need a "rate" (lr) with which to proceed toward that objective using Gradient Descent.
GD_step = tf.train.GradientDescentOptimizer(lr).minimize(loss_f)

# everything we have done so far has set up the tensorflow graph, but will not make the neural
# network learn. For learning to take place, we need to initialize a tensorflow session and initialize 
# all the variables
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# we now ask tensorflow to run actual data through the graph.
# The data must be passed in using the feed_dict argument.

# for example, if I want to obtain the initial loss before doing any training:
loss = sess.run(loss_f, feed_dict={X: x_tr, y: y_tr})
print "The initial loss is: ", loss

# If what you want is to train the network using all the training data, then you have to ask:
sess.run(GD_step, feed_dict={X: x_tr, y: y_tr})

# you can loop over this to train over more than one epoch.
nepochs = 15
for i in range(nepochs):
    sess.run(GD_step, feed_dict={X: x_tr, y: y_tr})

# If you want to obtain the accuracy of the network on the training set:
predicted_labels = sess.run(max_y_index, feed_dict={X: x_tr, y: y_tr})
tr_acc = np.mean(predicted_labels == np.argmax(y_tr, axis = 1))
loss = sess.run(loss_f, feed_dict={X: x_tr, y: y_tr})
print "The final training loss is: ", loss
print "The accuracy on the training set is: ", tr_acc

# If you want to obtain the accuracy of the network on the validation set:
predicted_labels = sess.run(max_y_index, feed_dict={X: x_vl, y: y_vl})
vl_acc = np.mean(predicted_labels == np.argmax(y_vl, axis = 1))
print "The accuracy on the validation set is: ", vl_acc
                 
sess.close()                 

The initial loss is:  2.48516
The final training loss is:  2.48272
The accuracy on the training set is:  0.123106060606
The accuracy on the validation set is:  0.0757575757576
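To put these numbers in context, it helps to compare them with what a network that always outputs the uniform distribution over the 12 classes would score. The sketch below computes that chance-level loss and accuracy on a balanced toy label set; the `accuracy` helper and the toy labels are assumptions made for this illustration.

```python
import numpy as np

def accuracy(y_hat, y_onehot):
    """Fraction of rows where the most probable class matches the label."""
    return np.mean(np.argmax(y_hat, axis=1) == np.argmax(y_onehot, axis=1))

C = 12
chance_loss = np.log(C)   # cross-entropy of a uniform prediction
chance_acc = 1.0 / C      # expected accuracy of a constant guess on balanced labels

# A "network" that outputs the uniform distribution for 24 balanced toy examples.
y_onehot = np.eye(C)[np.arange(24) % C]
uniform = np.full((24, C), 1.0 / C)
uniform_loss = -np.mean(np.log((uniform * y_onehot).sum(axis=1)))
uniform_acc = accuracy(uniform, y_onehot)
```

Here `chance_loss` is about 2.485 and `chance_acc` is about 0.083. The losses and accuracies printed above sit close to these values, which suggests that the few full-batch gradient steps taken at this learning rate have barely moved the network away from its initial, nearly uniform predictions.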


We can make this network a little better in the following ways:

1. Updating the forward pass so that now we have "biases" in addition to the weights (why would this be helpful?)

$$\hat{y} = softmax(~\sigma(x~W^{(h)} + b^{(h)})~W^{(o)} + b^{(o)})$$

2. Showing the network not all data at once, but iterating over randomly-generated mini-batches.

3. Can you think of other ways?

Implement at least the first two points above, and any other ideas you can think of to improve this NN.
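As a starting hint for the mini-batch point, one possible way to generate random mini-batches is sketched below in plain NumPy. The generator `iterate_minibatches` and the toy arrays are hypothetical, and this is only one of several reasonable designs.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield (x_batch, y_batch) pairs that together cover the data once, in random order."""
    order = rng.permutation(x.shape[0])
    for start in range(0, x.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

# Toy data: 10 examples with 2 features; the label of row i is i.
rng = np.random.RandomState(0)
x_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

batches = list(iterate_minibatches(x_toy, y_toy, batch_size=4, rng=rng))
batch_sizes = [xb.shape[0] for xb, yb in batches]
```

The last batch is smaller when the batch size does not divide the number of examples. In the training loop, each `(x_batch, y_batch)` pair could be passed through `feed_dict` to `sess.run(GD_step, ...)` in place of the full training set, calling the generator again at the start of every epoch so that the batches are reshuffled.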

In [None]:
# your code here: