The dataset used in this project is MNIST dataset, you can download it by using built-in tensorflow functions.


For more information about Highway Networks read:

[Highway Networks](https://arxiv.org/pdf/1505.00387.pdf)

[This blog post](https://medium.com/jim-fleming/highway-networks-with-tensorflow-1e6dfa667daa)

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

### Step 1. Load the dataset

In [2]:
mnist_data = input_data.read_data_sets("MNIST_data", one_hot=True)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


### Step 2. Define helper functions

In [3]:
def weights_init(shape):
    '''
    Weights initialization helper function.
    
    Input(s): shape - Type: int list, Example: [5, 5, 32, 32], This parameter is used to define dimensions of weights tensor
    
    Output: tensor of weights in shape defined with the input to this function
    '''
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

In [4]:
def bias_init(shape, bias_value=0.05):
    '''
    Bias initialization helper function.
    
    Input(s): shape - Type: int list, Example: [32], This parameter is used to define dimensions of bias tensor.
              bias_value - Type: float number, Example: 0.01, This parameter is set to be value of bias tensor.
    
    Output: tensor of biases in shape defined with the input to this function
    '''
    return tf.Variable(tf.constant(bias_value, shape=shape))

In [5]:
def convd2_custom(input, filter_size, number_of_channels, number_of_filters, max_pool=False, padding='SAME', 
                activation=tf.nn.relu):
    '''
    This function is used to define a convolutional layer for a network,
    
    Input(s): input - this is input into convolutional layer (Previous layer or an image)
              filter_size - also called kernel size, kernel is moved (convolved) across an image. Example: 3
              number_of_channels - how many channels the input tensor has
              number_of_filters - this is hyperparameter, and this will set one of dimensions of the output tensor from 
                                  this layer. Note: this number will be number_of_channels for the layer after this one
              max_pool - if this is True, output tensor will be 2x smaller in size. Max pool is there to decrease spartial 
                        dimensions of our output tensor, so computation is less expensive.
              padding - the way that we pad input tensor with zeros ("SAME" or "VALID")
              activation - the non-linear function used at this layer.
              
              
    Output: Convolutional layer with input parameters.
    '''
    weights = weights_init([filter_size, filter_size, number_of_channels, number_of_filters])
    bias = bias_init([number_of_filters])
    
    layer = tf.nn.conv2d(input, filter=weights, strides=[1, 1, 1, 1], padding=padding) + bias
    
    layer = activation(layer)
    
    if max_pool:
        layer = tf.nn.max_pool(layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        
    return layer

In [6]:
def dense_custom(input, input_shape, output_shape, activation=tf.nn.relu, dropout=None):
    '''
    This function is used to define a fully connected layer for a network,
    
    Input(s): input - this is input into fully connected (Dense) layer (Previous layer or an image)
              input_size - how many neurons/features the input tensor has. Example: input.shape[1]
              output_shape - how many neurons this layer will have
              activation - the non-linear function used at this layer.    
              dropout - the regularization method used to prevent overfitting. The way it works, we randomly turn off
                        some neurons in this layer
              
    Output: fully connected layer with input parameters.
    '''
    weights = weights_init([input_shape, output_shape])
    bias = bias_init([output_shape])
    
    layer = tf.matmul(input, weights) + bias
    
    if activation != None:
        layer = activation(layer)
    
    if dropout != None:
        layer = tf.nn.dropout(layer, dropout)
        
    return layer

In [7]:
def flatten(layer):
    '''
    This method is used to convert convolutional output (4 dimensional tensor) into 2 dimensional tensor.
    
    Input(s): layer - the output from last conv layer in your network (4d tensor)
    
    Output(s): reshaped - reshaped layer, 2 dimensional matrix
               elements_num - number of features for this layer
    '''
    shape = layer.get_shape()
    
    elements_num = shape[1:4].num_elements()
    
    reshaped = tf.reshape(layer, [-1, elements_num])
    return reshaped, elements_num

The highway layer what we want are two “gates” that control the flow of information. The “transform” gate controls how much of the activation we pass through and the “carry” gate controls how much of the unmodified input we pass through.

The formula for the highway layer:
![](formula_highway_layer.png?raw=true)

In [8]:
def highway_conv2d(input, filter_size, number_of_channels, number_of_filters, max_pool=False, padding='SAME', 
                activation=tf.nn.relu, transf_bias_value=-1.0):
    
    '''
    This function is used to define a highway convolutional layer for a network,
    
    Input(s): input - this is input into highway convolutional layer (Previous layer or an image)
              filter_size - also called kernel size, kernel is moved (convolved) across an image. Example: 3
              number_of_channels - how many channels the input tensor has
              number_of_filters - this is hyperparameter, and this will set one of dimensions of the output tensor from 
                                  this layer. Note: this number will be number_of_channels for the layer after this one
              max_pool - if this is True, output tensor will be 2x smaller in size. Max pool is there to decrease spartial 
                        dimensions of our output tensor, so computation is less expensive.
              padding - the way that we pad input tensor with zeros ("SAME" or "VALID")
              activation - the non-linear function used at this layer.
              transf_bias_value - Float number -  Bias value in Transform gate
              
    Output: Highway Convolutional layer with input parameters.
    '''
    
    weights = weights_init([filter_size, filter_size, number_of_channels, number_of_filters])
    bias = bias_init([number_of_filters])
    
    layer_H = tf.nn.conv2d(input, filter=weights, strides=[1, 1, 1, 1], padding=padding) + bias
    layer_H = activation(layer_H)
    
    weights_transform = weights_init([filter_size, filter_size, number_of_channels, number_of_filters])
    bias_transform = bias_init([number_of_filters], bias_value=transf_bias_value)
    
    gate_T = tf.nn.conv2d(input, filter=weights_transform, strides=[1, 1, 1, 1], padding=padding) + bias_transform
    gate_T = tf.nn.sigmoid(gate_T)
    
    gate_C = tf.subtract(1.0, gate_T)
    
    y = tf.add(tf.multiply(layer_H, gate_T), tf.multiply(input, gate_C), name="conv2d_highway")
    return y

### Step 3. Highway Deep Conv network

In [9]:
#Inputs to the Highway DC network
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1], name='inputs')
targets = tf.placeholder(tf.float32, [None, 10], name='inputs')

In [21]:
#Defining the network
layer_1 = convd2_custom(inputs, 5, 1, 32)
layer_2 = highway_conv2d(layer_1, 3, 32, 32)
layer_3 = highway_conv2d(layer_2, 3, 32, 32, max_pool=True)
drop = tf.nn.dropout(layer_3, 0.8)

layer_4 = highway_conv2d(drop, 3, 32, 32)
layer_5 = highway_conv2d(layer_4, 3, 32, 32)
layer_6 = highway_conv2d(layer_5, 3, 32, 32, max_pool=True)
drop = tf.nn.dropout(layer_6, 0.6)

layer_7 = highway_conv2d(drop, 3, 32, 32)
layer_8 = highway_conv2d(layer_7, 3, 32, 32)
layer_9 = highway_conv2d(layer_8, 3, 32, 32, max_pool=True)
drop = tf.nn.dropout(layer_9, 0.4)

layer_10 = highway_conv2d(drop, 3, 32, 32)
layer_11 = highway_conv2d(layer_10, 3, 32, 32)
layer_12 = highway_conv2d(layer_11, 3, 32, 32, max_pool=True)
drop = tf.nn.dropout(layer_12, 0.2)

flat, num_elements = flatten(drop)
fc_1 = dense_custom(flat, num_elements, 512, activation=tf.nn.relu)
output = dense_custom(fc_1, 512, 10, activation=None)

In [22]:
# Loss function and optimizer for the network
cost = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=targets))
optimizer = tf.train.AdamOptimizer(0.01).minimize(cost)

In [23]:
#Testing accuracy of the network
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(output), 1), tf.argmax(targets, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

### Step 4. Train, test functions for the model

In [32]:
import time
epochs = 20
batch_size = 64
def optimize():
    
    for i in range(epochs):
        epoch_cost = []
        epoch_time = time.time()
        for ii in range(mnist_data.train.num_examples//batch_size):
            batch = mnist_data.train.next_batch(batch_size)
            imgs = batch[0].reshape((-1, 28, 28, 1))
            labs = batch[1]
            
            c, _ = session.run([cost, optimizer], feed_dict={inputs:imgs, targets:labs})

            epoch_cost.append(c)
        print("Epoch: {}/{}".format(i+1, epochs), " | Current loss: {}".format(np.mean(epoch_cost)),
             "  |  Epoch time: {:.2f}s".format(time.time() - epoch_time))

In [44]:
batch_size_validate = 1000
def validate_model():
    accuracy_per_batch = []
    for ii in range(mnist_data.validation.num_examples//batch_size_validate):
        batch = mnist_data.validation.next_batch(batch_size_validate)
        imgs = batch[0].reshape((-1, 28, 28, 1))
        labs = batch[1]

        accuracy_per_batch.append(session.run(accuracy, feed_dict={inputs:imgs, targets:labs}))

    print("Validation accuracy {}".format(accuracy_per_batch))
    print("Validation accuracy average: {}".format(np.mean(accuracy_per_batch)))

In [43]:
batch_size_test = 1000
def test_model():
    accuracy_per_batch = []
    for ii in range(mnist_data.test.num_examples//batch_size_test):
        batch = mnist_data.test.next_batch(batch_size_test)
        imgs = batch[0].reshape((-1, 28, 28, 1))
        labs = batch[1]

        accuracy_per_batch.append(session.run(accuracy, feed_dict={inputs:imgs, targets:labs}))

    print("Test accuracy {}".format(accuracy_per_batch))
    print("Test accuracy average: {}".format(np.mean(accuracy_per_batch)))

### Step 5. Training and testint the Highway DC network

In [27]:
session = tf.Session()

In [28]:
session.run(tf.global_variables_initializer())

In [33]:
optimize()

Epoch: 1/20  | Current loss: 9.693116188049316   |  Epoch time: 59.19s
Epoch: 2/20  | Current loss: 8.4194917678833   |  Epoch time: 59.06s
Epoch: 3/20  | Current loss: 7.4638142585754395   |  Epoch time: 59.09s
Epoch: 4/20  | Current loss: 6.606746673583984   |  Epoch time: 59.44s
Epoch: 5/20  | Current loss: 6.440258979797363   |  Epoch time: 58.99s
Epoch: 6/20  | Current loss: 6.112809658050537   |  Epoch time: 59.10s
Epoch: 7/20  | Current loss: 5.248502254486084   |  Epoch time: 59.00s
Epoch: 8/20  | Current loss: 9.59684944152832   |  Epoch time: 59.01s
Epoch: 9/20  | Current loss: 5.251861095428467   |  Epoch time: 59.17s
Epoch: 10/20  | Current loss: 9.391960144042969   |  Epoch time: 59.26s
Epoch: 11/20  | Current loss: 12.334983825683594   |  Epoch time: 59.29s
Epoch: 12/20  | Current loss: 24.280094146728516   |  Epoch time: 59.21s
Epoch: 13/20  | Current loss: 12.361469268798828   |  Epoch time: 59.22s
Epoch: 14/20  | Current loss: 8.058774948120117   |  Epoch time: 59.14s


In [45]:
test_model()

Test accuracy [0.93900001, 0.93000007, 0.93300003, 0.93599999, 0.93199998, 0.92500001, 0.926, 0.94300008, 0.93400002, 0.94300002]
Test accuracy average: 0.9340999722480774


In [46]:
validate_model()

Validation accuracy [0.92700011, 0.93400002, 0.92400002, 0.93000007, 0.93000007]
Validation accuracy average: 0.9290000796318054


In [None]:
# session.close()
#Close the session after testing the network