# Day 3: Multi-Layer Perceptrons

In this lab, we aim to classify images of human faces as male or female. We have at our disposal a dataset of 200 images of celebrity faces and their associated labels (0 for male and 1 for female).

We use a face detector from the dlib library to estimate the location of 68 (x, y) coordinates that map to specific facial regions. The image below visualizes what each of these coordinates maps to:


![](face_feature_extraction.png)

We then use TensorFlow to define and train a multi-layer perceptron (MLP) graph that classifies images using the features visualized above. Using the code below, try the following changes:
    
1 - Change the complexity of the 2-layer MLP by increasing or decreasing the number of neurons in each layer.

2 - Try increasing the number of layers used. This should increase the "depth" of the MLP. To do so, you must change the definition of the MLP function in "multilayer_perceptron" and the weights/biases allocation function "allocate_weights_and_biases".

3 - Try changing how the weights and biases are initialized. What would happen if you initialized all parameters to zero?

4 - Try increasing or decreasing the learning rate and the number of training epochs. How does this affect the "fitting" to the training data?
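
As a hint for question 3, here is a minimal NumPy sketch (not the lab's TensorFlow graph) of a one-hidden-layer sigmoid MLP with a softmax cross-entropy loss. The helper `first_layer_gradient` and the toy sizes are assumptions made for illustration. With every parameter at zero, the output weights pass no error signal back, so the first-layer gradient is exactly zero on the first step, and because all hidden units start identical they keep receiving identical updates.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_layer_gradient(W1, b1, W2, b2, x, y_onehot):
    """Gradient of the mean softmax cross-entropy with respect to W1."""
    h = sigmoid(x @ W1 + b1)                      # hidden activations
    logits = h @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    d_logits = (probs - y_onehot) / len(x)        # gradient at the output
    d_hidden = (d_logits @ W2.T) * h * (1.0 - h)  # back through the sigmoid
    return x.T @ d_hidden

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3                             # toy sizes, not the lab's
x = rng.normal(size=(8, n_in))
y_onehot = np.eye(2)[rng.integers(0, 2, size=8)]

# every parameter set to zero
grad_zero = first_layer_gradient(
    np.zeros((n_in, n_hidden)), np.zeros(n_hidden),
    np.zeros((n_hidden, 2)), np.zeros(2), x, y_onehot)

# small random weights, similar in spirit to stddev = 0.01 in the lab
grad_random = first_layer_gradient(
    rng.normal(scale=0.01, size=(n_in, n_hidden)), np.zeros(n_hidden),
    rng.normal(scale=0.01, size=(n_hidden, 2)), np.zeros(2), x, y_onehot)

print("zero init gives an all-zero gradient:", np.allclose(grad_zero, 0.0))
print("random init gives identical columns:",
      np.allclose(grad_random[:, 0], grad_random[:, 1]))
```

Random initialization breaks this symmetry: each hidden unit gets a different gradient and can learn a different feature.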


### Import APIs to be used 

In [None]:
import tensorflow as tf
from AMLS_Week7 import lab3_data as import_data
import numpy as np

### Load CelebA data and create train and test splits (Train: 100 examples, Test: 100 examples)

In [None]:
def get_data():
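    X, y = import_data.extract_features_labels()
    # one-hot encode the labels: column 0 is y, column 1 is 1 - y
    Y = np.array([y, -(y - 1)]).T
    # first 100 examples for training, the remaining ones for testing
    tr_X = X[:100] ; tr_Y = Y[:100]
    te_X = X[100:] ; te_Y = Y[100:]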

    return tr_X, tr_Y, te_X, te_Y
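
The expression `np.array([y, -(y - 1)]).T` in `get_data` turns the 0/1 labels into two-column one-hot rows. The short sketch below shows this on a made-up label vector (the four values are illustrative, not taken from the dataset).

```python
import numpy as np

# illustrative labels only: 0 = male, 1 = female
y = np.array([0, 1, 1, 0])

# column 0 holds y, column 1 holds 1 - y
Y = np.array([y, -(y - 1)]).T
print(Y)

# the index of the "hot" column is therefore 1 - y, not y
print(np.argmax(Y, axis=1))
```

Note that with this encoding, output index 0 corresponds to label 1 (female) and output index 1 to label 0 (male).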

### Allocate memory for weights and biases for all MLP layers
You can try changing the number of neurons to increase or decrease the complexity of the MLP.
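
To get a feel for how the number of neurons drives model size, the sketch below counts the learnable parameters of a fully connected network from its layer sizes. The helper `count_mlp_parameters` is a hypothetical name introduced here. The default sizes mirror the shapes used in this lab (136 inputs, two hidden layers of 2048, 2 outputs).

```python
def count_mlp_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total

lab_default = count_mlp_parameters([68 * 2, 2048, 2048, 2])
smaller = count_mlp_parameters([68 * 2, 64, 64, 2])

print(lab_default)  # 4481026
print(smaller)      # 13058
```

With only 100 training examples, the default network has far more parameters than data points, which is worth keeping in mind when comparing training and test accuracy.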

In [None]:
def allocate_weights_and_biases():

    # define the number of neurons in each hidden layer ..
    n_hidden_1 = 2048  # 1st layer number of neurons
    n_hidden_2 = 2048  # 2nd layer number of neurons

    # inputs placeholders
    X = tf.placeholder("float", [None, 68, 2])
    Y = tf.placeholder("float", [None, 2])  # 2 output classes
    
    # flatten image features into one vector (i.e. reshape image feature matrix into a vector)
    images_flat = tf.contrib.layers.flatten(X)  
    
    # weights and biases are initialized from a normal distribution with a specified standard deviation stddev
    stddev = 0.01
    
    # define trainable variables for weights and biases in the graph
    weights = {
        'hidden_layer1': tf.Variable(tf.random_normal([68 * 2, n_hidden_1], stddev=stddev)),
        'hidden_layer2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], stddev=stddev)),
        'out': tf.Variable(tf.random_normal([n_hidden_2, 2], stddev=stddev))
    }

    biases = {
        'bias_layer1': tf.Variable(tf.random_normal([n_hidden_1], stddev=stddev)),
        'bias_layer2': tf.Variable(tf.random_normal([n_hidden_2], stddev=stddev)),
        'out': tf.Variable(tf.random_normal([2], stddev=stddev))
    }
    
    return weights, biases, X, Y, images_flat
    

### Define how the weights and biases are used for inferring classes from inputs (i.e. define MLP function)

You can add more layers to the MLP to fit more complicated functions. Adding more layers requires more learnable weights and biases, which need to be defined in "allocate_weights_and_biases" first.

In [None]:
# Create model
def multilayer_perceptron():
        
    weights, biases, X, Y, images_flat = allocate_weights_and_biases()

    # Hidden fully connected layer 1
    layer_1 = tf.add(tf.matmul(images_flat, weights['hidden_layer1']), biases['bias_layer1'])
    layer_1 = tf.sigmoid(layer_1)

    # Hidden fully connected layer 2
    layer_2 = tf.add(tf.matmul(layer_1, weights['hidden_layer2']), biases['bias_layer2'])
    layer_2 = tf.sigmoid(layer_2)
    
    # Output fully connected layer
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']

    return out_layer, X, Y
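
To make the tensor shapes in the forward pass concrete, here is a NumPy sketch of the same computation: flatten, two sigmoid hidden layers, and a linear output layer. It is an illustration and not the TensorFlow graph itself. The names `mlp_forward` and `params`, and the small hidden size, are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(landmarks, params):
    """Forward pass of a 2-hidden-layer MLP, returning raw logits."""
    flat = landmarks.reshape(len(landmarks), -1)      # (n, 68, 2) -> (n, 136)
    h1 = sigmoid(flat @ params["W1"] + params["b1"])  # (n, n_hidden)
    h2 = sigmoid(h1 @ params["W2"] + params["b2"])    # (n, n_hidden)
    return h2 @ params["W_out"] + params["b_out"]     # (n, 2)

rng = np.random.default_rng(0)
n_hidden = 16   # illustrative; the lab uses 2048
stddev = 0.01
params = {
    "W1": rng.normal(scale=stddev, size=(68 * 2, n_hidden)),
    "b1": rng.normal(scale=stddev, size=n_hidden),
    "W2": rng.normal(scale=stddev, size=(n_hidden, n_hidden)),
    "b2": rng.normal(scale=stddev, size=n_hidden),
    "W_out": rng.normal(scale=stddev, size=(n_hidden, 2)),
    "b_out": rng.normal(scale=stddev, size=2),
}

batch = rng.normal(size=(5, 68, 2))  # 5 fake faces, 68 (x, y) landmarks each
logits = mlp_forward(batch, params)
print(logits.shape)  # (5, 2)
```

Adding a third hidden layer in this sketch would mean one more weight matrix, one more bias vector, and one more `sigmoid(...)` line, which mirrors the two functions that must change in the lab code.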



### Define graph training operation
The loss function (i.e. the value to minimize) is defined as the cross entropy between the predicted classes and the class ground truth. The train operation is then included within the graph as a weight/bias update operation.

Try changing the learning rate. How would setting a low or high learning rate affect the "fitting" to the training set?
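
The loss described above can be written out by hand. The sketch below is a NumPy version of mean softmax cross-entropy on made-up logits, intended only to show what the loss computes. The function name `softmax_cross_entropy` is an assumption, and the code does not reproduce TensorFlow's internal implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels_onehot):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(labels_onehot * log_probs).sum(axis=1)
    return per_example.mean()

# two made-up examples: one confident and correct, one undecided
logits = np.array([[2.0, 0.0],
                   [0.0, 0.0]])
labels = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

loss = softmax_cross_entropy(logits, labels)
print(round(loss, 3))  # about 0.41
```

The undecided example contributes log(2), roughly 0.693, which is the loss of a two-class model that always predicts 50/50. A training loss that stays near this value suggests the network has not started fitting the data.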

In [None]:
# learning parameters
learning_rate = 1e-5
training_epochs = 500

# display training accuracy every ..
display_accuracy_step = 10

    
training_images, training_labels, test_images, test_labels = get_data()
logits, X, Y = multilayer_perceptron()

# define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)

# define training graph operation
train_op = optimizer.minimize(loss_op)

# graph operation to initialize all variables
init_op = tf.global_variables_initializer()

### Run graph for specified number of epochs.

After the graph is defined, different operations in the graph can be run by passing them to the sess.run() function.
A session is a wrapper for running graphs. Outputs can also be retrieved from the graph by including them in the list of operations passed to sess.run().

In [None]:
with tf.Session() as sess:

        # run graph weights/biases initialization op
        sess.run(init_op)

        # -- Define the accuracy operation once, so the graph does not grow inside the loop -- #

        # apply softmax to output logits
        pred = tf.nn.softmax(logits)

        # derive inferred classes as the class with the highest predicted probability
        correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))

        # calculate accuracy
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        # begin training loop ..
        for epoch in range(training_epochs):
            # complete code below
            # run optimization operation (backprop) and cost operation (to get loss value)
            _, cost = sess.run([train_op, loss_op], feed_dict={X: ...,
                                                               Y: ...})

            # Display logs per epoch step
            print("Epoch:", '%04d' % (epoch + 1), "cost={:.9f}".format(cost))

            if epoch % display_accuracy_step == 0:
                # evaluate training accuracy
                print("Accuracy: {:.3f}".format(accuracy.eval({X: training_images, Y: training_labels})))

        print("Optimization Finished!")

        # -- Run test operation -- #
            
        # complete code below
        # run test accuracy operation ..
        print("Test Accuracy:", accuracy.eval({X: ..., Y: ...}))
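
The accuracy operation compares the arg-max of the predictions with the arg-max of the one-hot labels. The sketch below shows the same idea in NumPy on made-up values. `accuracy_from_logits` is a hypothetical helper, not part of the lab code.

```python
import numpy as np

def accuracy_from_logits(logits, labels_onehot):
    """Fraction of rows where the predicted class matches the label."""
    # softmax is monotonic, so arg-max of the logits equals arg-max of softmax
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(labels_onehot, axis=1)
    return float(np.mean(predicted == actual))

# made-up logits and labels for four examples
logits = np.array([[2.0, -1.0],
                   [0.1, 0.3],
                   [1.5, 1.0],
                   [-0.5, 0.2]])
labels = np.array([[1, 0],
                   [0, 1],
                   [0, 1],
                   [0, 1]])

acc = accuracy_from_logits(logits, labels)
print(acc)  # 0.75
```

Comparing this number on the training split and on the test split is a simple way to see whether a change to the network improved generalization or only the fit to the training data.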

