# 2D Convolutional Neural Network (CNN) - MNIST Example
This script uses TensorFlow and the MNIST handwritten character dataset to build a multi-class classifier; a separate script downloads the data onto your own machine.

I'll draft up a (neater and better) one for a 3D image soon. This dataset might be useful for testing models on larger datasets.

## Install TensorFlow
To get tensorflow to work I had to create a new anaconda environment with Python 3.5:
```bash
> conda create -n data-x-tf python=3.5 anaconda
```

To install tensorflow, run the following commands:
```bash
> activate data-x-tf
> pip install --ignore-installed --upgrade tensorflow
```
Let me know if you have any issues with this.

In [1]:
# Importing required packages
import os
import numpy as np
import tensorflow as tf

from six.moves import cPickle as pickle  # saves and loads most Python data types

Locate saved pickle file

In [2]:
data_root = 'mnist/'
pickle_file = os.path.join(data_root, 'MNISTu.pickle')

In [3]:
# Load data from pickle file
with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save

print("From file:")
print("Train: ", train_dataset.shape, train_labels.shape)
print("Valid: ", valid_dataset.shape, valid_labels.shape)
print("Test: ", test_dataset.shape, test_labels.shape)

From file:
Train:  (200000, 28, 28) (200000,)
Valid:  (10000, 28, 28) (10000,)
Test:  (10000, 28, 28) (10000,)
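
The loading cell above assumes the pickle file holds a single dict with six keys. As a rough sketch of that layout (not the script that produced `MNISTu.pickle`), the round trip below writes and reads a file of the same shape. The tiny random arrays and the `demo.pickle` path are made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

# Tiny random stand-ins for the real arrays; the six key names match the loading cell.
rng = np.random.RandomState(0)
save = {}
for name, n in (("train", 8), ("valid", 4), ("test", 4)):
    save[name + "_dataset"] = rng.rand(n, 28, 28).astype(np.float32)
    save[name + "_labels"] = rng.randint(0, 10, size=n)

# Hypothetical file in a temporary directory, not the real mnist/MNISTu.pickle.
demo_file = os.path.join(tempfile.mkdtemp(), "demo.pickle")
with open(demo_file, "wb") as f:
    pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)

with open(demo_file, "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded.keys()))
print(loaded["train_dataset"].shape, loaded["train_labels"].shape)
```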


### Setting everything up
Information about the model and some convenience functions:

In [4]:
num_labels = 10
image_size = 28
num_channels = 1


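def reformat(dataset, labels):
    # One-hot encode the integer labels: (N,) -> (N, num_labels)
    labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
    # Add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
    dataset = dataset.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    return dataset, labels


def accuracy(predictions, labels):
    # Percentage of rows whose highest-scoring class matches the one-hot label
    return 100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0]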
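
To see what the two helpers do, here is a small worked example of the same broadcasting trick and accuracy formula on three labels. The label values and prediction scores are made up for illustration.

```python
import numpy as np

num_labels = 10
labels = np.array([3, 0, 9])

# labels[:, None] has shape (3, 1); comparing it with np.arange(10) of shape (10,)
# broadcasts to a (3, 10) boolean grid that is True only where column == label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)
print(one_hot.shape)
print(one_hot[0])

# Made-up class scores: rows 0 and 1 peak at the correct class, row 2 does not.
predictions = np.zeros((3, num_labels), dtype=np.float32)
predictions[0, 3] = 0.9
predictions[1, 0] = 0.8
predictions[2, 1] = 0.7

acc = 100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(one_hot, 1)) / predictions.shape[0]
print("%.1f%%" % acc)
```

Two of the three rows are classified correctly, so the accuracy comes out at two thirds, expressed as a percentage.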

Reformatting the data loaded from the pickle file and re-checking its shapes:

In [5]:
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)

print("Reshaped")
print("Train: ", train_dataset.shape, train_labels.shape)
print("Valid: ", valid_dataset.shape, valid_labels.shape)
print("Test: ", test_dataset.shape, test_labels.shape)

Reshaped
Train:  (200000, 28, 28, 1) (200000, 10)
Valid:  (10000, 28, 28, 1) (10000, 10)
Test:  (10000, 28, 28, 1) (10000, 10)


Defining more model parameters -- hyperparameters and CNN architecture:

In [6]:
# defining parameters for the model architecture
batch_size = 32
patch_size = 5
k_size = 2
depth1 = 32
depth2 = 16
depth3 = 8
num_h1 = 32
num_h2 = 16

# hyperparameters
train_subset = 200000  # number of training examples to use
lbd = 0.0000  # L2 regularization strength (0 disables it)
keep_prob = 0.5  # probability of keeping a unit in dropout -- another type of regularization
learning_rate = 0.1  # optimizer step size, like 'alpha' in gradient descent for linear regression
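
Since `keep_prob` is only described in passing, here is a minimal NumPy sketch of what dropout does to a layer's activations. It is meant to mirror the documented behaviour of `tf.nn.dropout` (keep each unit with probability `keep_prob` and scale the survivors by `1 / keep_prob`), but the `dropout` helper itself is hypothetical and not part of this notebook.

```python
import numpy as np


def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.uniform(size=activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.RandomState(0)
activations = np.ones((1000, 32), dtype=np.float32)
dropped = dropout(activations, keep_prob=0.5, rng=rng)

# Surviving units are scaled to 1 / keep_prob = 2.0; the rest are exactly 0.
print(np.unique(dropped))
# The rescaling keeps the mean close to the original mean of 1.0.
print(dropped.mean())
```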

### Defining the computation graph -- the 'tensors'
This sets up the computation network/nodes without actually calculating or running anything. This model (by no means the best option) includes the following:
- 28x28x1 input images
- stochastic optimization: training batches of 32 images
- 5x5 2d conv, 32 filters + relu (rectified linear unit -- just adds a cheap non-linearity)
- 2x2 max pool
- 5x5 2d conv, 16 filters + relu
- 2x2 max pool
- 5x5 2d conv, 8 filters + relu
- 2x2 max pool
- fully connected layer, 32 units + relu
- fully connected layer, 16 units + relu
- 10 output classes
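
The `depth3 * 4 * 4` input size of the first fully connected layer in the graph code below follows from the pooling arithmetic. The sketch assumes, as in that code, three conv + pool stages with `'SAME'` padding, where a stride-1 convolution keeps the spatial size and a stride-2 pool rounds `size / 2` up.

```python
import math

image_size = 28
depths = [32, 16, 8]  # depth1, depth2, depth3

size = image_size
for depth in depths:
    # 'SAME' convolution with stride 1 keeps the spatial size;
    # 'SAME' 2x2 max pooling with stride 2 gives ceil(size / 2).
    size = math.ceil(size / 2)
    print("after conv + pool: %dx%dx%d" % (size, size, depth))

flat_features = size * size * depths[-1]
print("flattened features:", flat_features)
```

The spatial size goes 28 -> 14 -> 7 -> 4, so the flattened vector has 4 * 4 * 8 = 128 features.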

It's a bit of a mess here but I can rewrite these parts to be more general:

In [7]:
graph = tf.Graph()
with graph.as_default():
    with tf.device('/gpu:0'):
        tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
        tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)

        w1 = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth1], stddev=0.1))
        b1 = tf.Variable(tf.zeros([depth1]))

        w2 = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth1, depth2], stddev=0.1))
        b2 = tf.Variable(tf.constant(1.0, shape=[depth2]))

        w3 = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth2, depth3], stddev=0.1))
        b3 = tf.Variable(tf.constant(1.0, shape=[depth3]))

        w4 = tf.Variable(tf.truncated_normal([depth3 * 4 * 4, num_h1],
                                             stddev=0.1))
        b4 = tf.Variable(tf.constant(1.0, shape=[num_h1]))

        w5 = tf.Variable(tf.truncated_normal([num_h1, num_h2], stddev=0.1))
        b5 = tf.Variable(tf.constant(1.0, shape=[num_h2]))

        w6 = tf.Variable(tf.truncated_normal([num_h2, num_labels], stddev=0.1))
        b6 = tf.Variable(tf.constant(1.0, shape=[num_labels]))

        def model(dataset):
            conv = tf.nn.conv2d(dataset, w1, [1, 1, 1, 1], padding='SAME')
            hidden = tf.nn.relu(conv + b1)
            hidden = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

            conv = tf.nn.conv2d(hidden, w2,  [1, 1, 1, 1], padding='SAME')
            hidden = tf.nn.relu(conv + b2)
            hidden = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

            conv = tf.nn.conv2d(hidden, w3,  [1, 1, 1, 1], padding='SAME')
            hidden = tf.nn.relu(conv + b3)
            hidden = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

            shape = hidden.get_shape().as_list()
            reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
            hidden = tf.nn.relu(tf.matmul(reshape, w4) + b4)
            hidden = tf.nn.relu(tf.matmul(hidden, w5) + b5)
            return tf.matmul(hidden, w6) + b6


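    def model_drop(dataset):
        # Dropout variant of model(): same layers, with dropout applied to the
        # fully connected activations. Not used in the training run below.
        hidden = dataset
        for w, b in ((w1, b1), (w2, b2), (w3, b3)):
            conv = tf.nn.conv2d(hidden, w, [1, 1, 1, 1], padding='SAME')
            hidden = tf.nn.max_pool(tf.nn.relu(conv + b), [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        shape = hidden.get_shape().as_list()
        flat = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        h4 = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, w4) + b4), keep_prob)
        h5 = tf.nn.dropout(tf.nn.relu(tf.matmul(h4, w5) + b5), keep_prob)
        return tf.matmul(h5, w6) + b6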


    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits)) \
        + lbd * (tf.nn.l2_loss(w1)
                 + tf.nn.l2_loss(w2)
                 + tf.nn.l2_loss(w3)
                 + tf.nn.l2_loss(w4)
                 + tf.nn.l2_loss(w5))

    # Plain SGD with a fixed step of 0.5, kept as an alternative; the session below runs a_optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
    a_optimizer = tf.train.AdagradOptimizer(learning_rate).minimize(loss)

    train_prediction = tf.nn.softmax(model(tf_train_dataset))
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))
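
For intuition about the `loss` term, here is a NumPy sketch of softmax cross-entropy averaged over a batch. It is a plain re-derivation for illustration, not TensorFlow's implementation, and the `softmax` and `cross_entropy` helpers and the example logits are made up.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, one_hot_labels):
    # Mean over the batch of -sum(labels * log(softmax(logits))).
    log_probs = np.log(softmax(logits))
    return float(np.mean(-np.sum(one_hot_labels * log_probs, axis=1)))


num_labels = 10
labels = np.eye(num_labels)[[3, 7]]  # two one-hot rows

# Equal logits give a uniform prediction, so the loss is ln(10).
uniform_loss = cross_entropy(np.zeros((2, num_labels)), labels)
print(uniform_loss, np.log(num_labels))

# Confident, correct logits drive the loss toward 0.
confident_loss = cross_entropy(20.0 * labels, labels)
print(confident_loss)
```

A network that guesses uniformly over 10 classes has a loss of ln(10), about 2.30, which is a useful reference point for the loss printed at the first training step.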

### Running the graph in a 'session'
A session is where data is actually fed into the graph, either to train the model or to make predictions with an already-trained one.

Each call to `session.run()` evaluates the requested nodes of the graph we defined above once -- in this case, one step of training on one batch.
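
The training loop below picks each mini-batch with a wrapping offset rather than shuffling. As a quick sketch of how that offset behaves, the snippet reuses the same formula with the batch size, dataset size and step count from this notebook; the `batch_offset` helper name is made up for illustration.

```python
batch_size = 32
num_train = 200000
num_steps = 25001


def batch_offset(step):
    # Same formula as the training loop: wrap before a batch would run off the end.
    return (step * batch_size) % (num_train - batch_size)


# Number of steps before the offset returns to 0.
period = (num_train - batch_size) // batch_size
print(period)
print([batch_offset(s) for s in (0, 1, period - 1, period, period + 1)])

# Largest row index that any batch ever touches (exclusive end of the last batch).
last_row_end = batch_offset(period - 1) + batch_size
print(last_row_end)

# Rough number of passes over the training set made by the whole run.
print(num_steps * batch_size / num_train)
```

With these numbers the offset wraps every 6249 steps, the run makes about four passes over the data, and the final 32 rows of the training set are never drawn into a batch.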

In [8]:
num_steps = 25001

config = tf.ConfigProto()
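# config.gpu_options.per_process_gpu_memory_fraction = 0.65
config.gpu_options.allow_growth = True  # claim GPU memory as needed instead of all at once
config.allow_soft_placement = True  # fall back to CPU if '/gpu:0' is not available
config.log_device_placement = False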

with tf.Session(graph=graph, config=config) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}

        _, l, predictions = session.run([a_optimizer, loss, train_prediction], feed_dict=feed_dict)

        if step % 1000 == 0:
            print("\nLoss at step %i: %f" % (step, l))
            print("Batch accuracy: %.1f%%" % (accuracy(predictions, batch_labels)))
            print("Valid accuracy: %.1f%%" % (accuracy(valid_prediction.eval(), valid_labels)))

    print("Test accuracy: %.1f%%" % (accuracy(test_prediction.eval(), test_labels)))

Initialized

Loss at step 0: 2.333632
Batch accuracy: 9.4%
Valid accuracy: 10.0%

Loss at step 1000: 0.354879
Batch accuracy: 84.4%
Valid accuracy: 85.5%

Loss at step 2000: 0.642489
Batch accuracy: 90.6%
Valid accuracy: 87.5%

Loss at step 3000: 0.550981
Batch accuracy: 81.2%
Valid accuracy: 88.3%

Loss at step 4000: 0.591909
Batch accuracy: 81.2%
Valid accuracy: 88.9%

Loss at step 5000: 0.188805
Batch accuracy: 96.9%
Valid accuracy: 89.5%

Loss at step 6000: 0.377794
Batch accuracy: 84.4%
Valid accuracy: 89.8%

Loss at step 7000: 0.209640
Batch accuracy: 90.6%
Valid accuracy: 89.9%

Loss at step 8000: 0.413695
Batch accuracy: 87.5%
Valid accuracy: 90.0%

Loss at step 9000: 0.438875
Batch accuracy: 87.5%
Valid accuracy: 89.9%

Loss at step 10000: 0.319083
Batch accuracy: 93.8%
Valid accuracy: 90.2%

Loss at step 11000: 0.320049
Batch accuracy: 90.6%
Valid accuracy: 90.3%

Loss at step 12000: 0.391540
Batch accuracy: 84.4%
Valid accuracy: 90.6%

Loss at step 13000: 0.183492
Batch accu