### Problem 1. 
Please find 2 files from Google’s tutorials sets. I used file mnist2.py in class yesterday and for preparation of my notes. If you read the file carefully you will see that you can run it in at least two modes. The way it is setup now it selects one learning rate and one particular neural network architecture and generates TensorBoard graph in a particular directory. One problem with this script is that its accuracy is surprisingly low. Such complex architecture and so many lines of code and we get 70% or lower accuracy. We expected more from Convolutional Neural Networks.  File cnn_mnist.py is practically the same, at least it does all the same things, creates the same architecture, sets the same or similar parameters, but does much better job. Its accuracy is in high 90%-s. Run two files compare results and then fix the first file (mnist2.py) based on what you saw in file cnn_mnist.py. Capture the Accuracy and Cross Entropy (summary) graphs from the corrected version of mnist2.py and provide working and fixed version of that file. Please describe in detail experiments you undertook and fixes you made. (45%)

![](img/original-TensorBoard.png)

![](img/original-TensorBoard-2.png)

#### Since we are benchmarking with cnn_mnist.py program, choose the parameters similar to that program. So we can compare apples to apples.
Iterations: 500 <br>
Learning rate: .005<br>
use_two_fc: True<br>
use_two_conv: True<br>
![](img/lr005_steps500.png)

#### Bias values for both conv_layer and fc_layer have been set to constant 0.1. This could cause problems changing these values to tf.zeros for conv_layer and  truncated_normal fc_layer, that is what's been used in cnn_mnist as well.
```python
## For conv_layer()
b = tf.Variable(tf.zeros([size_out], dtype=tf.float32), name="B")

## For fc_layer()
b = tf.Variable(tf.truncated_normal([size_out], stddev=0.1, dtype=tf.float32), name="B")
```

#### After removing constants
![](img/lr005_steps500_remove_constants.png)

#### That didn't really help much. Accuracy is still around 0.1.

#### Filter size for the conv_layer is set to a 5 x 5 matrix. This could be an issue as our images 28 x 28. Let's try changing it to 4 x4, similar to cnn_mist program.
```python
w = tf.Variable(tf.truncated_normal([4, 4, size_in, size_out], stddev=0.1), name="W")
```

![](img/lr005_steps500_remove_constants_4x4.png)

#### Changing the filter to 4x4 increased accuracy to about 40%.
Let's try changing the fully connected output to 100 from 1024.
```python
if use_two_fc:
    fc1 = fc_layer(flattened, 7 * 7 * conv2_features, 100, "fc1")
    embedding_input = fc1
    embedding_size = 100
    logits = fc_layer(fc1, 100, 10, "fc2")
```

![](img/lr005_steps500_rm_const_4x4_fc100.png)
#### Changing the fully connected layer's output to 100 increased the accuracy to about 70%

#### Let's try changing the optimizer to MomentumOptimizer.
```python
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(xent)
```

![](img/lr005_steps500_rm_const_4x4_fc100_mon.png)
#### We can see changing the optimizer increased the accuracy about 90%

#### Let's change the fully connected layer 1 size from 100 to 512. We can clearly see that increased accuracy of 98%
![](img/final-500-512.png)

#### Cross entropy
![](img/final-500-512-xent.png)

#### Fixed Code
```python

# Copyright 2017 Google, Inc. All Rights Reserved.
#
# ==============================================================================
import os
import tensorflow as tf
import sys
import urllib


if sys.version_info[0] >= 3:
    from urllib.request import urlretrieve
else:
    from urllib import urlretrieve

LOGDIR = 'log_mnist_500_512_2/'
GITHUB_URL = 'https://raw.githubusercontent.com/mamcgrath/TensorBoard-TF-Dev-Summit-Tutorial/master/'
GENERATIONS = 500

### MNIST EMBEDDINGS ###
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(
    train_dir=LOGDIR + 'data', one_hot=True)
### Get a sprite and labels file for the embedding projector ###
urlretrieve(GITHUB_URL + 'labels_1024.tsv', LOGDIR + 'labels_1024.tsv')
urlretrieve(GITHUB_URL + 'sprite_1024.png', LOGDIR + 'sprite_1024.png')

# Add convolution layer


def conv_layer(input, size_in, size_out, name="conv"):
    with tf.name_scope(name):
        #w = tf.Variable(tf.zeros([5, 5, size_in, size_out]), name="W")
        #b = tf.Variable(tf.zeros([size_out]), name="B")
        w = tf.Variable(tf.truncated_normal(
            [4, 4, size_in, size_out], stddev=0.1), name="W")
        b = tf.Variable(tf.zeros([size_out], dtype=tf.float32), name="B")
        conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
        act = tf.nn.relu(conv + b)
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")


# Add fully connected layer
def fc_layer(input, size_in, size_out, name="fc"):
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal(
            [size_in, size_out], stddev=0.1), name="W")
        b = tf.Variable(tf.truncated_normal(
            [size_out], stddev=0.1, dtype=tf.float32), name="B")
        act = tf.nn.relu(tf.add(tf.matmul(input, w), b))
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return act


def mnist_model(learning_rate, use_two_conv, use_two_fc, conv1_features, conv2_features,
                hparam, generations=500, fully_connected_size1=100):
    tf.reset_default_graph()
    sess = tf.Session()

    # Setup placeholders, and reshape the data
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    tf.summary.image('input', x_image, 3)
    y = tf.placeholder(tf.float32, shape=[None, 10], name="labels")

    if use_two_conv:
        conv1 = conv_layer(x_image, 1, conv1_features, "conv1")
        conv_out = conv_layer(conv1, conv1_features, conv2_features, "conv2")
    else:
        conv1 = conv_layer(x_image, 1, conv2_features, "conv")
        conv_out = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[
                                  1, 2, 2, 1], padding="SAME")

    flattened = tf.reshape(conv_out, [-1, 7 * 7 * conv2_features])

    if use_two_fc:
        fc1 = fc_layer(flattened, 7 * 7 * conv2_features, 100, "fc1")
        embedding_input = fc1
        embedding_size = 100
        logits = fc_layer(fc1, 100, 10, "fc2")
    else:
        embedding_input = flattened
        embedding_size = 7 * 7 * conv2_features
        logits = fc_layer(flattened, 7 * 7 * conv2_features, 10, "fc")

    with tf.name_scope("xent"):
        xent = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(
                logits=logits, labels=y), name="xent")
        tf.summary.scalar("xent", xent)

    with tf.name_scope("train"):
        train_step = tf.train.MomentumOptimizer(
            learning_rate, 0.9).minimize(xent)

    with tf.name_scope("accuracy"):
        correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("accuracy", accuracy)

    summ = tf.summary.merge_all()

    embedding = tf.Variable(
        tf.zeros([1024, embedding_size]), name="test_embedding")
    assignment = embedding.assign(embedding_input)
    saver = tf.train.Saver()

    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(LOGDIR + hparam)
    writer.add_graph(sess.graph)

    config = tf.contrib.tensorboard.plugins.projector.ProjectorConfig()
    embedding_config = config.embeddings.add()
    embedding_config.tensor_name = embedding.name
    embedding_config.sprite.image_path = LOGDIR + 'sprite_1024.png'
    embedding_config.metadata_path = LOGDIR + 'labels_1024.tsv'
    # Specify the width and height of a single thumbnail.
    embedding_config.sprite.single_image_dim.extend([28, 28])
    tf.contrib.tensorboard.plugins.projector.visualize_embeddings(
        writer, config)

    for i in range(generations + 1):
        batch = mnist.train.next_batch(100)
        if i % 5 == 0:
            [train_accuracy, s] = sess.run([accuracy, summ], feed_dict={
                                           x: batch[0], y: batch[1]})
            writer.add_summary(s, i)
        if i % (generations / 4) == 0:
            sess.run(assignment, feed_dict={
                     x: mnist.test.images[:1024], y: mnist.test.labels[:1024]})
            saver.save(sess, os.path.join(LOGDIR, "model.ckpt"), i)
        sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})


def make_hparam_string(learning_rate, use_two_fc, use_two_conv, conv1_features, conv2_features):
    conv_param = "conv2" if use_two_conv else "conv1"
    fc_param = "fc2" if use_two_fc else "fc1"
    return "lr_%.0E%s%s_%d_%d" % (learning_rate, conv_param, fc_param, conv1_features, conv2_features)


def main():
    # You can try adding some more learning rates
    # for learning_rate in [1E-3, 1E-4, 1E-5]:
    for learning_rate in [.005]:
        # Include "False" as a value to try different model architectures
        # for use_two_fc in [True, False]:
        for use_two_fc in [True]:
            # for use_two_conv in [True, False]:
            for use_two_conv in [True]:
                # for use_two_conv in [25, 32]:
                for conv1_features in [32]:
                    # for use_two_conv in [50, 64]:
                    for conv2_features in [64]:
                        # Construct a hyperparameter string for each one (example:
                        # "lr_1E-3fc2conv2")
                        hparam = make_hparam_string(
                            learning_rate, use_two_fc, use_two_conv, conv1_features, conv2_features)
                        print('Starting run for %s' % hparam)
                        # this forces print-ed lines to show up.
                        sys.stdout.flush()

                        # Actually run with the new settings
                        mnist_model(learning_rate, use_two_fc, use_two_conv, conv1_features,
                                    conv2_features, hparam, GENERATIONS, fully_connected_size1=512)


if __name__ == '__main__':
    main()
```

### Problem 2. 
Run corrected version of mnist2.py for 4 different architectures (2 conv, 1 conv, 2 fully connected, 1 fully connected layer) and 3 values of the learning rate. As one learning rate choose the one you selected in Problem 1 and then add one smaller and one larger learning rate around that one. Capture Accuracy (summary) graphs and One of Histograms to demonstrate to us that your code is working. Please also capture an image of “colorful” T-SNE Embedding. Please be aware that you are running 12 models and the execution might take many minutes. You might want to run your models in smaller groups so that you see them finish their work without too much wait. Submit working code of  mnist2.py used in this problem. Collect execution times, final (smoothed) accuracies and final cross entropies for different models and provide tabulated presentation of the final results of different models (20%)

#### Collect execution times, final (smoothed) accuracies and final cross entropies for different models and provide tabulated presentation of the final results of different models 

In [1]:

# Copyright 2017 Google, Inc. All Rights Reserved.
#
# ==============================================================================
import os
import time
import sys
import tensorflow as tf
import urllib
import pandas as pd


if sys.version_info[0] >= 3:
    from urllib.request import urlretrieve
else:
    from urllib import urlretrieve

LOGDIR = 'log_mnist_fixed_1/'
GITHUB_URL = 'https://raw.githubusercontent.com/mamcgrath/TensorBoard-TF-Dev-Summit-Tutorial/master/'
GENERATIONS = 500

### MNIST EMBEDDINGS ###
mnist = tf.contrib.learn.datasets.mnist.read_data_sets(
    train_dir=LOGDIR + 'data', one_hot=True)
### Get a sprite and labels file for the embedding projector ###
#urlretrieve(GITHUB_URL + 'labels_1024.tsv', LOGDIR + 'labels_1024.tsv')
#urlretrieve(GITHUB_URL + 'sprite_1024.png', LOGDIR + 'sprite_1024.png')

# Add convolution layer
def conv_layer(input, size_in, size_out, name="conv"):
    with tf.name_scope(name):
        #w = tf.Variable(tf.zeros([5, 5, size_in, size_out]), name="W")
        #b = tf.Variable(tf.zeros([size_out]), name="B")
        w = tf.Variable(tf.truncated_normal(
            [4, 4, size_in, size_out], stddev=0.1), name="W")
        #b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
        b = tf.Variable(tf.zeros([size_out], dtype=tf.float32), name="B")
        conv = tf.nn.conv2d(input, w, strides=[1, 1, 1, 1], padding="SAME")
        act = tf.nn.relu(conv + b)
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")


# Add fully connected layer
def fc_layer(input, size_in, size_out, name="fc"):
    with tf.name_scope(name):
        w = tf.Variable(tf.truncated_normal(
            [size_in, size_out], stddev=0.1), name="W")
        #b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="B")
        #b = tf.Variable(tf.zeros([size_out], dtype=tf.float32), name="B")
        b = tf.Variable(tf.truncated_normal(
            [size_out], stddev=0.1, dtype=tf.float32), name="B")
        act = tf.nn.relu(tf.add(tf.matmul(input, w), b))
        tf.summary.histogram("weights", w)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return act


def mnist_model(learning_rate, use_two_conv, use_two_fc,
                hparam, conv1_features=32, conv2_features=64,
                generations=500, fully_connected_size1=100):
    tf.reset_default_graph()
    sess = tf.Session()

    # Setup placeholders, and reshape the data
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    tf.summary.image('input', x_image, 3)
    y = tf.placeholder(tf.float32, shape=[None, 10], name="labels")

    if use_two_conv:
        conv1 = conv_layer(x_image, 1, conv1_features, "conv1")
        conv_out = conv_layer(conv1, conv1_features, conv2_features, "conv2")
        # missing tf.nn.max_pool here ??
    else:
        conv1 = conv_layer(x_image, 1, conv2_features, "conv")
        # extra pooling here ??
        conv_out = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1],
                                  strides=[1, 2, 2, 1], padding="SAME")

    flattened = tf.reshape(conv_out, [-1, 7 * 7 * conv2_features])

    if use_two_fc:
        fc1 = fc_layer(flattened, 7 * 7 * conv2_features,
                       fully_connected_size1, "fc1")
        embedding_input = fc1
        embedding_size = fully_connected_size1
        logits = fc_layer(fc1, fully_connected_size1, 10, "fc2")
    else:
        embedding_input = flattened
        embedding_size = 7 * 7 * conv2_features
        logits = fc_layer(flattened, 7 * 7 * conv2_features, 10, "fc")

    with tf.name_scope("xent"):
        xent = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                                    labels=y), name="xent")
        tf.summary.scalar("xent", xent)

    with tf.name_scope("train"):
        #train_step = tf.train.AdamOptimizer(learning_rate).minimize(xent)
        train_step = tf.train.MomentumOptimizer(
            learning_rate, 0.9).minimize(xent)

    with tf.name_scope("accuracy"):
        correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("accuracy", accuracy)

    summ = tf.summary.merge_all()

    embedding = tf.Variable(tf.zeros([1024, embedding_size]),
                            name="test_embedding")
    assignment = embedding.assign(embedding_input)
    saver = tf.train.Saver()

    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(LOGDIR + hparam)
    writer.add_graph(sess.graph)

    config = tf.contrib.tensorboard.plugins.projector.ProjectorConfig()
    embedding_config = config.embeddings.add()
    embedding_config.tensor_name = embedding.name
    embedding_config.sprite.image_path = LOGDIR + 'sprite_1024.png'
    embedding_config.metadata_path = LOGDIR + 'labels_1024.tsv'
    # Specify the width and height of a single thumbnail.
    embedding_config.sprite.single_image_dim.extend([28, 28])
    tf.contrib.tensorboard.plugins.projector.visualize_embeddings(
        writer, config)

    for i in range(generations + 1):
        batch = mnist.train.next_batch(100)
        if i % 5 == 0:
            [train_accuracy, s] = sess.run([accuracy, summ],
                                           feed_dict={x: batch[0], y: batch[1]})
            writer.add_summary(s, i)
        if i % (generations / 4) == 0:
            sess.run(assignment,
                     feed_dict={x: mnist.test.images[:1024], y: mnist.test.labels[:1024]})
            saver.save(sess, os.path.join(LOGDIR, "model.ckpt"), i)
        sess.run(train_step, feed_dict={x: batch[0], y: batch[1]})
    [train_accuracy, train_xent] = sess.run(
        [accuracy, xent], feed_dict={x: batch[0], y: batch[1]})
    return [train_accuracy, train_xent]


def make_hparam_string(learning_rate, use_two_fc, use_two_conv):
    conv_param = "conv2" if use_two_conv else "conv1"
    fc_param = "fc2" if use_two_fc else "fc1"
    return "lr_%.0E%s%s" % (learning_rate, conv_param, fc_param)

# error Starting run for lr_1E-03conv2fc1_25+50


def main():
    model_metrics_cols = ['Exec. Time', 'Accuracy', 'Cross Entropy']
    model_metrics_result = []
    model_metrics_idx = []
    # You can try adding some more learning rates
    # for learning_rate in [1E-3, 1E-4, 1E-5]:
    for learning_rate in [1E-3, 5E-3, 1E-4]:
        # Include "False" as a value to try different model architectures
        # for use_two_fc in [True, False]:
        for use_two_fc in [True, False]:
            # for use_two_conv in [True, False]:
            for use_two_conv in [True, False]:
                # Construct a hyperparameter string for each one (example:
                # "lr_1E-3fc2conv2")
                hparam = make_hparam_string(learning_rate,
                                            use_two_fc, use_two_conv)
                print('Starting run for %s' % hparam)
                # this forces print-ed lines to show up.
                sys.stdout.flush()
                start_time = time.time()
                # Actually run with the new settings
                accuracy, xent = mnist_model(
                    learning_rate, use_two_fc,
                    use_two_conv, hparam, GENERATIONS,
                    fully_connected_size1=100)
                total_time = time.time() - start_time
                model_metrics_idx.append(hparam)
                model_metrics_result.append(
                    [total_time, accuracy, xent])
                print(model_metrics_result)
    df = pd.DataFrame(model_metrics_result,
                      index=model_metrics_idx,
                      columns=model_metrics_cols)
    print(df)


if __name__ == '__main__':
    main()


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting log_mnist_fixed_1/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting log_mnist_fixed_1/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting log_mnist_fixed_1/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting log_mnist_fixed_1/data/t10k-labels-idx1-ubyte.gz
Starting run for lr_1E-03conv2fc2
[[853.5753090381622, 0.40000001, 1.4487785]]
Starting run for lr_1E-03conv1fc2
[[853.5753090381622, 0.40000001, 1.4487785], [812.8208549022675, 0.31, 1.8688296]]
Starting run for lr_1E-03conv2fc1
[[853.5753090381622, 0.40000001, 1.4487785], [812.8208549022675, 0.31, 1.8688296], [47.31326913833618, 0.63999999, 1.0907539]]
Starting run for lr_1E-03conv1fc1
[[853.5753090381622, 0.40000001, 1.4487785], [812.8208549022675, 0.31, 1.8688296], [47.31326913833618, 0.63999

                  Exec. Time  Accuracy  Cross Entropy
lr_1E-03conv2fc2  121.250245      0.77       0.591943
lr_5E-03conv2fc2  111.394670      0.32       1.591292
lr_1E-04conv2fc2  115.845690      0.58       1.320814

### Problem 3. 
Modify file cnn_mnist.py  so that it publishes its summaries to the TensorBoard. Describe changes you are making and provide images of Accuracy and Cross Entropy summaries as captured by the Tensor Board. Provide the Graph of your model. Describe the differences if any between the graph of this program and the graph generated by mnist2.py script running with 2 convolutional and 2 fully connected layers. Provide working code.  (35%).