The pooling and convolutional ops slide a "window" across the input tensor. Take tf.nn.conv2d as an example: if the input tensor has 4 dimensions, [batch, height, width, channels], then the convolution operates on a 2D window over the height and width dimensions.

strides determines how far the window shifts in each dimension. The typical use sets the first (batch) and last (depth) stride to 1.

Let's use a very concrete example: Running a 2-d convolution over a 32x32 greyscale input image. I say greyscale because then the input image has depth=1, which helps keep it simple. Let that image look like this:

00 01 02 03 04 ...
10 11 12 13 14 ...
20 21 22 23 24 ...
30 31 32 33 34 ...
...
Let's run a 2x2 convolution window over a single example (batch size = 1). We'll give the convolution an output channel depth of 8.

The input to the convolution has shape=[1, 32, 32, 1].

If you specify strides=[1,1,1,1] with padding=SAME, then the output of the convolution will have shape [1, 32, 32, 8].
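To make the shape arithmetic concrete, here is a minimal sketch in plain Python (no TensorFlow needed). It uses the output-size rules TensorFlow documents for SAME and VALID padding; the helper names `conv_output_size` and `conv2d_output_shape` are made up for this illustration and are not part of any library.

```python
import math

def conv_output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension."""
    if padding == "SAME":
        # SAME pads the input so that only the stride shrinks the output
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # VALID only keeps windows that fit entirely inside the input
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """input_shape is [batch, height, width, channels];
    filter_shape is [height, width, in_channels, out_channels]."""
    batch, in_h, in_w, _ = input_shape
    f_h, f_w, _, out_channels = filter_shape
    _, s_h, s_w, _ = strides
    return [batch,
            conv_output_size(in_h, f_h, s_h, padding),
            conv_output_size(in_w, f_w, s_w, padding),
            out_channels]

print(conv2d_output_shape([1, 32, 32, 1], [2, 2, 1, 8], [1, 1, 1, 1], "SAME"))
print(conv2d_output_shape([1, 32, 32, 1], [2, 2, 1, 8], [1, 2, 2, 1], "SAME"))
```

The first call reproduces the [1, 32, 32, 8] shape from the text; the second shows that a stride of 2 halves the height and width.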

The filter will first create an output for:

F(00 01
  10 11)
And then for:

F(01 02
  11 12)
and so on. Then it will move to the second row, calculating:

F(10, 11
  20, 21)
then

F(11, 12
  21, 22)
If you specify a stride of [1, 2, 2, 1], the 2x2 windows won't overlap. It will compute:

F(00, 01
  10, 11)
and then

F(02, 03
  12, 13)
The stride operates similarly for the pooling operators.
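The window walk described above can be sketched in a few lines of plain Python. This is only an illustration on a 4x4 corner of the image: it ignores padding, and `window_positions` and `window_labels` are hypothetical helper names, not TensorFlow functions.

```python
def window_positions(height, width, k, stride):
    """Top-left (row, col) of every k x k window that fits inside the image."""
    return [(r, c)
            for r in range(0, height - k + 1, stride)
            for c in range(0, width - k + 1, stride)]

def window_labels(row, col, k):
    """Cell labels covered by the window, in the same 'rowcol' notation as the text."""
    return [["%d%d" % (r, c) for c in range(col, col + k)]
            for r in range(row, row + k)]

stride1 = window_positions(4, 4, k=2, stride=1)
stride2 = window_positions(4, 4, k=2, stride=2)

# second window with stride 1: overlaps the first one
print(window_labels(*stride1[1], k=2))
# second window with stride 2: starts where the first one ended
print(window_labels(*stride2[1], k=2))
```

With stride 1 the second window covers 01 02 / 11 12; with stride 2 it covers 02 03 / 12 13, matching the two cases in the text.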

Question 2: Why strides [1, x, y, 1] for convnets

The first 1 is the batch: You don't usually want to skip over examples in your batch, or you shouldn't have included them in the first place. :)

The last 1 is the depth of the convolution: You don't usually want to skip inputs, for the same reason.

The conv2d operator is more general, so you could create convolutions that slide the window along other dimensions, but that's not a typical use in convnets. The typical use is to use them spatially.

Why reshape to -1

-1 is a placeholder that says "adjust as necessary to match the size needed for the full tensor." It's a way of making the code independent of the input batch size, so that you can change your pipeline without having to adjust the batch size everywhere in the code.
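A quick way to see the -1 behaviour is with NumPy, whose reshape follows the same convention as tf.reshape. This is a small sketch; the batch sizes 128 and 50 are arbitrary values chosen for the illustration.

```python
import numpy as np

# two flattened batches of 28x28 images with different batch sizes
batch_a = np.zeros((128, 784), dtype=np.float32)
batch_b = np.zeros((50, 784), dtype=np.float32)

# -1 lets reshape infer the batch dimension from the total number of elements
images_a = batch_a.reshape(-1, 28, 28, 1)
images_b = batch_b.reshape(-1, 28, 28, 1)

print(images_a.shape)
print(images_b.shape)
```

The same reshape call works for both batches, which is exactly why the batch size never has to be hard-coded.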

# Simple examples of convolution to do some basic filters
Also demonstrates the use of TensorFlow data readers.

We will apply some popular filters to our image.
This seems to work with grayscale images, but not with RGB images,
probably because I didn't choose the right kernels for RGB images.

Kernels for RGB images have dimensions 3 x 3 x 3 x 3.
Kernels for grayscale images have dimensions 3 x 3 x 1 x 1.
(tf.nn.conv2d expects filters as [height, width, in_channels, out_channels].)
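Here is one way the two kernel shapes can be built, sketched with NumPy. The 3x3 box blur is an assumed example and is not necessarily what examples.kernels contains. For RGB, each output channel reads only its matching input channel, so the colours are blurred independently.

```python
import numpy as np

# a simple 3x3 box blur (assumed example filter)
blur_3x3 = np.full((3, 3), 1.0 / 9.0, dtype=np.float32)

# grayscale: [height, width, in_channels=1, out_channels=1]
gray_kernel = blur_3x3.reshape(3, 3, 1, 1)

# RGB: [height, width, in_channels=3, out_channels=3]
rgb_kernel = np.zeros((3, 3, 3, 3), dtype=np.float32)
for c in range(3):
    # output channel c looks only at input channel c
    rgb_kernel[:, :, c, c] = blur_3x3

print(gray_kernel.shape, rgb_kernel.shape)
```

If the off-diagonal slices (for example `rgb_kernel[:, :, 0, 1]`) were non-zero, the filter would mix colour channels together.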

Note:
When you call tf.train.string_input_producer,
a tf.train.QueueRunner is added to the graph. It must be started with
e.g. tf.train.start_queue_runners(); otherwise nothing fills the queue and
any session call that reads from it will hang (deadlock) waiting for data.

To run a QueueRunner cleanly, you also need a coordinator to stop its threads for you.
Without a coordinator, the threads keep running after the session is closed and you will get the error:
ERROR:tensorflow:Exception in QueueRunner: Attempted to use a closed Session.

In [2]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import sys
sys.path.append('..')

from matplotlib import gridspec as gridspec
from matplotlib import pyplot as plt
import tensorflow as tf

import examples.kernels

FILENAME = 'examples/data/friday.jpg'

In [3]:
def read_one_image(filename):
    """ This is just to demonstrate how to open an image in TensorFlow,
    but it's actually a lot easier to use Pillow 
    """
    filename_queue = tf.train.string_input_producer([filename])
    image_reader = tf.WholeFileReader()
    _, image_file = image_reader.read(filename_queue)
    image = tf.image.decode_jpeg(image_file, channels=3)
    image = tf.cast(image, tf.float32) / 256.0 # cast to float to make conv2d work
    return image

In [4]:
def convolve(image, kernels, rgb=True, strides=[1, 3, 3, 1], padding='SAME'):
    images = [image[0]]
    for i, kernel in enumerate(kernels):
        filtered_image = tf.nn.conv2d(image, kernel, strides=strides, padding=padding)[0]
        if i == 2:
            filtered_image = tf.minimum(tf.nn.relu(filtered_image), 255)
        images.append(filtered_image)
    return images

tf.train.Coordinator implements a simple mechanism to coordinate the termination of a set of threads. Typical usage:

    # Create a coordinator.
    coord = tf.train.Coordinator()
    # Start a number of threads, passing the coordinator to each of them.
    ...start thread 1...(coord, ...)
    ...start thread N...(coord, ...)
    # Wait for all the threads to terminate.
    coord.join(threads)

Any of the threads can call coord.request_stop() to ask for all the threads to stop. To cooperate with the requests, each thread must check coord.should_stop() on a regular basis. coord.should_stop() returns True as soon as coord.request_stop() has been called.
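The request_stop / should_stop / join protocol can be tried out without TensorFlow. The sketch below uses a toy stand-in built on threading.Event; `SimpleCoordinator` and `worker` are hypothetical names for this illustration, and the real tf.train.Coordinator does more (for example, exception propagation).

```python
import threading
import time

class SimpleCoordinator:
    """Toy stand-in for a coordinator, built on threading.Event."""
    def __init__(self):
        self._stop_event = threading.Event()

    def request_stop(self):
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    def join(self, threads):
        for t in threads:
            t.join()

def worker(coord, worker_id, counts):
    # each thread cooperates by checking should_stop() on every iteration
    while not coord.should_stop():
        counts[worker_id] += 1
        # worker 0 decides when everybody should stop
        if worker_id == 0 and counts[worker_id] >= 5:
            coord.request_stop()
        time.sleep(0.001)

coord = SimpleCoordinator()
counts = [0, 0, 0]
threads = [threading.Thread(target=worker, args=(coord, i, counts))
           for i in range(3)]
for t in threads:
    t.start()
coord.join(threads)
print("all threads stopped:", all(not t.is_alive() for t in threads))
```

One thread requests the stop, and the others notice it the next time they check should_stop().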

In [5]:
def get_real_images(images):
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        images = sess.run(images)
        coord.request_stop()
        coord.join(threads)
    return images

In [14]:
def show_images(images, rgb=True):
    gs = gridspec.GridSpec(1, len(images))
    for i, image in enumerate(images):
        plt.subplot(gs[0, i])
        if rgb:
            print(image)
            plt.imshow(image)
        else: 
            image = image.reshape(image.shape[0], image.shape[1])
            print(image)
            plt.imshow(image, cmap='gray')
        plt.axis('off')
    plt.show()

In [16]:
def main():
    rgb = True
    if rgb:
        kernels_list = [examples.kernels.BLUR_FILTER_RGB, examples.kernels.SHARPEN_FILTER_RGB, examples.kernels.EDGE_FILTER_RGB, 
                    examples.kernels.TOP_SOBEL_RGB, examples.kernels.EMBOSS_FILTER_RGB]
    else:
        kernels_list = [examples.kernels.BLUR_FILTER, examples.kernels.SHARPEN_FILTER, examples.kernels.EDGE_FILTER, 
                    examples.kernels.TOP_SOBEL, examples.kernels.EMBOSS_FILTER]

    image = read_one_image(FILENAME)
    if not rgb:
        image = tf.image.rgb_to_grayscale(image)
    image = tf.expand_dims(image, 0) # to make it into a batch of 1 element
    images = convolve(image, kernels_list, rgb)
    images = get_real_images(images)
    show_images(images, rgb)
    
if __name__ == '__main__':
    main()

[[[ 0.98046875  0.98828125  0.96875   ]
  [ 0.98046875  0.98828125  0.96875   ]
  [ 0.9765625   0.984375    0.97265625]
  ..., 
  [ 0.9765625   0.9921875   0.99609375]
  [ 0.9765625   0.9921875   0.99609375]
  [ 0.9765625   0.9921875   0.99609375]]

 [[ 0.984375    0.9921875   0.97265625]
  [ 0.9765625   0.99609375  0.97265625]
  [ 0.984375    0.9921875   0.98046875]
  ..., 
  [ 0.9765625   0.9921875   0.99609375]
  [ 0.9765625   0.9921875   0.99609375]
  [ 0.9765625   0.9921875   0.99609375]]

 [[ 0.96875     0.98828125  0.97265625]
  [ 0.96484375  0.98828125  0.97265625]
  [ 0.96875     0.98828125  0.97265625]
  ..., 
  [ 0.98046875  0.99609375  0.9765625 ]
  [ 0.98046875  0.99609375  0.9765625 ]
  [ 0.98046875  0.99609375  0.9765625 ]]

 ..., 
 [[ 0.97265625  0.98046875  0.96875   ]
  [ 0.97265625  0.98046875  0.96875   ]
  [ 0.97265625  0.98046875  0.96875   ]
  ..., 
  [ 0.99609375  0.9765625   0.99609375]
  [ 0.98828125  0.98046875  0.9921875 ]
  [ 0.99609375  0.9765625   0.99609

ValueError: Floating point image RGB values must be in the 0..1 range.
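The ValueError above says that a float RGB image passed for display had values outside 0..1. Convolution outputs can overshoot that range, even when the input image was inside it. One possible workaround, sketched with NumPy and assuming the filtered images come back as float arrays, is to clip before plotting. `clip_for_display` is a hypothetical helper, and clipping does change how the filtered image looks.

```python
import numpy as np

def clip_for_display(image):
    """Clamp a float image to the 0..1 range expected for float RGB data."""
    return np.clip(image, 0.0, 1.0)

# a single made-up pixel with one value below 0 and one above 1
filtered = np.array([[[-0.2, 0.5, 1.3]]], dtype=np.float32)
displayable = clip_for_display(filtered)
print(displayable)
```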

<matplotlib.figure.Figure at 0x7ff191f49eb8>

# Using a convolutional net on the MNIST dataset of handwritten digits

In [1]:
from __future__ import print_function
from __future__ import division

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import time 

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import examples.utils

In [2]:
N_CLASSES = 10

# Step 1: Read in data
# using TF Learn's built in function to load MNIST data to the folder data/mnist
mnist = input_data.read_data_sets("examples/data/mnist", one_hot=True)

# Step 2: Define parameters for the model
LEARNING_RATE = 0.001
BATCH_SIZE = 128
SKIP_STEP = 10
DROPOUT = 0.75
N_EPOCHS = 1

Extracting examples/data/mnist/train-images-idx3-ubyte.gz
Extracting examples/data/mnist/train-labels-idx1-ubyte.gz
Extracting examples/data/mnist/t10k-images-idx3-ubyte.gz
Extracting examples/data/mnist/t10k-labels-idx1-ubyte.gz


In [3]:
# Step 3: create placeholders for features and labels
# each image in the MNIST data is of shape 28*28 = 784
# therefore, each image is represented with a 1x784 tensor
# We'll be doing dropout for the hidden layer, so we also need a placeholder
# for the keep probability (the second argument of tf.nn.dropout is keep_prob)
# Use None for shape so we can change the batch_size once we've built the graph
with tf.name_scope('data'):
    X = tf.placeholder(tf.float32, [None, 784], name="X_placeholder")
    Y = tf.placeholder(tf.float32, [None, 10], name="Y_placeholder")

dropout = tf.placeholder(tf.float32, name='dropout')

# Step 4 + 5: create weights + do inference
# the model is conv -> relu -> pool -> conv -> relu -> pool -> fully connected -> softmax

global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

examples.utils.make_dir('checkpoints')
examples.utils.make_dir('checkpoints/convnet_mnist')

In [4]:
with tf.variable_scope('conv1') as scope:
    # first, reshape the image to [BATCH_SIZE, 28, 28, 1] to make it work with tf.nn.conv2d
    # use the dynamic dimension -1
    images = tf.reshape(X, shape=[-1, 28, 28, 1])

    # create kernel variable of dimension [5, 5, 1, 32]
    # use tf.truncated_normal_initializer()
    kernel = tf.get_variable('kernels', shape=[5, 5, 1, 32], initializer=tf.truncated_normal_initializer())

    # create biases variable of dimension [32]
    # use tf.constant_initializer(0.0)
    bias = tf.get_variable('biases', shape=[32], initializer=tf.constant_initializer(0.0))

    # apply tf.nn.conv2d. strides [1, 1, 1, 1], padding is 'SAME'
    conv = tf.nn.conv2d(images, kernel, strides=[1, 1, 1, 1], padding='SAME')

    # apply relu on the sum of convolution output and biases
    conv1 = tf.nn.relu(conv + bias, name=scope.name)

    # output is of dimension BATCH_SIZE x 28 x 28 x 32
    #conv1 = layers.conv2d(images, 32, 5, 1, activation_fn=tf.nn.relu, padding='SAME')

In [5]:
with tf.variable_scope('pool1') as scope:
    # apply max pool with ksize [1, 2, 2, 1], and strides [1, 2, 2, 1], padding 'SAME'
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # output is of dimension BATCH_SIZE x 14 x 14 x 32
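To see what a 2x2 max pool with stride 2 does to a single feature map, here is a small NumPy sketch. It handles one 2D channel with even height and width only; `max_pool_2x2` is a hypothetical helper, not the TensorFlow op.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single 2D feature map."""
    h, w = feature_map.shape
    if h % 2 or w % 2:
        raise ValueError("this sketch only handles even height and width")
    # split the map into non-overlapping 2x2 blocks, then take each block's max
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]])
pooled = max_pool_2x2(x)
print(pooled)
```

Each output value is the maximum of one 2x2 block, so a 4x4 map becomes 2x2 in the same way that 28x28 becomes 14x14 above.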

In [6]:
with tf.variable_scope('conv2') as scope:
    # similar to conv1, except kernel now is of the size 5 x 5 x 32 x 64
    kernel = tf.get_variable('kernels', [5, 5, 32, 64], 
                        initializer=tf.truncated_normal_initializer())
    biases = tf.get_variable('biases', [64],
                        initializer=tf.random_normal_initializer())
    conv = tf.nn.conv2d(pool1, kernel, strides=[1, 1, 1, 1], padding='SAME')
    conv2 = tf.nn.relu(conv + biases, name=scope.name)

    # output is of dimension BATCH_SIZE x 14 x 14 x 64

In [7]:
with tf.variable_scope('pool2') as scope:
    # similar to pool1
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                            padding='SAME')

    # output is of dimension BATCH_SIZE x 7 x 7 x 64
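The shape comments in the cells above can be checked with a few lines of plain Python. This sketch assumes SAME padding everywhere, as in the cells above, so each layer's spatial size is ceil(size / stride); the `layers` list simply restates the strides and channel counts used so far.

```python
import math

def same_padding_size(size, stride):
    """Spatial size after a SAME-padded conv or pool with the given stride."""
    return math.ceil(size / stride)

# (layer name, stride, output channels)
layers = [("conv1", 1, 32), ("pool1", 2, 32), ("conv2", 1, 64), ("pool2", 2, 64)]

size = 28  # MNIST images are 28 x 28
trace = []
for name, stride, channels in layers:
    size = same_padding_size(size, stride)
    trace.append((name, size, size, channels))
    print("{}: BATCH_SIZE x {} x {} x {}".format(name, size, size, channels))

# number of features per example once pool2 is flattened
flattened = trace[-1][1] * trace[-1][2] * trace[-1][3]
print("flattened features:", flattened)
```

The final number is the 7 * 7 * 64 that the fully connected layer expects as its input size.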

In [8]:
with tf.variable_scope('fc') as scope:
    # use weight of dimension 7 * 7 * 64 x 1024
    input_features = 7 * 7 * 64

    # create weights and biases
    w = tf.get_variable('weights', [input_features, 1024], 
                        initializer=tf.truncated_normal_initializer())
    b = tf.get_variable('biases', [1024],
                        initializer=tf.random_normal_initializer())

    # reshape pool2 to 2 dimensional
    pool2 = tf.reshape(pool2, [-1, input_features])

    # apply relu on matmul of pool2 and w + b
    fc = tf.nn.relu(tf.matmul(pool2, w) + b, name='relu')

    # apply dropout; the second argument is the keep probability
    fc = tf.nn.dropout(fc, dropout, name='relu_dropout')
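What the dropout step does numerically can be sketched with NumPy. In TF 1.x the second argument of tf.nn.dropout is the keep probability, and kept units are scaled by 1 / keep_prob so the expected activation stays the same. `inverted_dropout` below is a hypothetical re-implementation for illustration, not the TensorFlow op.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Keep each unit with probability keep_prob and scale the kept units by 1 / keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 10))
y = inverted_dropout(x, keep_prob=0.75, rng=rng)

print("fraction kept:", (y != 0).mean())
print("mean activation:", y.mean())
```

With keep_prob = 0.75, roughly a quarter of the units are zeroed and the rest become 1 / 0.75, so the mean stays close to 1. At test time you would feed a keep probability of 1.0, which makes the op a no-op.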

In [9]:
with tf.variable_scope('softmax_linear') as scope:
    # this you should know. get logits without softmax
    # you need to create weights and biases
    w = tf.get_variable('weights', [1024, N_CLASSES],
                        initializer=tf.truncated_normal_initializer())
    b = tf.get_variable('biases', [N_CLASSES],
                        initializer=tf.random_normal_initializer())
    logits = tf.matmul(fc, w) + b

In [10]:
# Step 6: define loss function
# use softmax cross entropy with logits as the loss function
# compute mean cross entropy, softmax is applied internally
with tf.name_scope('loss'):
    # you should know how to do this too
    entropy = tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits)
    loss = tf.reduce_mean(entropy, name='loss')

# Step 7: define training op
# using the Adam optimizer with learning rate of LEARNING_RATE to minimize the loss
# don't forget to pass in global_step
optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss, 
                                        global_step=global_step)
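"Softmax is applied internally" means the loss op takes raw logits and computes the softmax and the cross entropy in one numerically stable step. The NumPy sketch below shows the same computation; `softmax_cross_entropy` is a hypothetical helper and the logits and labels are made-up values.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    # subtracting the row max keeps exp() from overflowing
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(logits, labels)
mean_loss = per_example.mean()
print(per_example, mean_loss)
```

The second example has uniform logits over 3 classes, so its loss is log(3): the loss of a model that has no preference at all.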

global_step refers to the number of batches seen by the graph. Every time a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one on every training step. Have a look at optimizer.minimize().



In [11]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    # to visualize using TensorBoard
    writer = tf.summary.FileWriter('./my_graph/mnist', sess.graph)
    ##### You have to create folders to store checkpoints
    ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/convnet_mnist/checkpoint'))
    # if that checkpoint exists, restore from checkpoint
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
    
    initial_step = global_step.eval()

    start_time = time.time()
    n_batches = int(mnist.train.num_examples / BATCH_SIZE)

    total_loss = 0.0
    for index in range(initial_step, n_batches * N_EPOCHS): # train the model n_epochs times
        X_batch, Y_batch = mnist.train.next_batch(BATCH_SIZE)
        _, loss_batch = sess.run([optimizer, loss], 
                                feed_dict={X: X_batch, Y:Y_batch, dropout: DROPOUT}) 
        total_loss += loss_batch
        if (index + 1) % SKIP_STEP == 0:
            print('Average loss at step {}: {:5.1f}'.format(index + 1, total_loss / SKIP_STEP))
            total_loss = 0.0
            saver.save(sess, 'checkpoints/convnet_mnist/mnist-convnet', index)
    
    print("Optimization Finished!") # should be around 0.35 after 25 epochs
    print("Total time: {0} seconds".format(time.time() - start_time))
    
    # test the model
    n_batches = int(mnist.test.num_examples/BATCH_SIZE)
    total_correct_preds = 0
    for i in range(n_batches):
        X_batch, Y_batch = mnist.test.next_batch(BATCH_SIZE)
        # evaluation only: do not run the optimizer on test data, and disable dropout (keep probability 1.0)
        loss_batch, logits_batch = sess.run([loss, logits], 
                                        feed_dict={X: X_batch, Y:Y_batch, dropout: 1.0}) 
        preds = tf.nn.softmax(logits_batch)
        correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(Y_batch, 1))
        accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32))
        total_correct_preds += sess.run(accuracy)   
    
    print("Accuracy {0}".format(total_correct_preds/mnist.test.num_examples))
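Since sess.run already returns the logits as a NumPy array, the per-batch count of correct predictions can also be computed in NumPy, which avoids adding new ops to the graph on every loop iteration. Softmax does not change which class has the largest score, so argmax on the raw logits is enough. This is a sketch with made-up values; `count_correct` is a hypothetical helper.

```python
import numpy as np

def count_correct(logits, one_hot_labels):
    """Number of rows where the highest-scoring class matches the one-hot label."""
    predicted = np.argmax(logits, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return int(np.sum(predicted == actual))

logits_batch = np.array([[0.1, 2.0, -1.0],
                         [3.0, 0.2, 0.1],
                         [0.0, 0.5, 0.4]])
labels_batch = np.array([[0, 1, 0],
                         [1, 0, 0],
                         [0, 0, 1]])

n_correct = count_correct(logits_batch, labels_batch)
print(n_correct, "of", len(labels_batch), "correct")
```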

Average loss at step 10: 27132.8
Average loss at step 20: 14443.2
Average loss at step 30: 8716.1
Average loss at step 40: 5877.9
Average loss at step 50: 4839.7
Average loss at step 60: 3897.3
Average loss at step 70: 3066.1
Average loss at step 80: 2905.6
Average loss at step 90: 2452.8
Average loss at step 100: 2215.2
Average loss at step 110: 1924.2
Average loss at step 120: 1952.7
Average loss at step 130: 1820.5
Average loss at step 140: 1737.6
Average loss at step 150: 1756.0
Average loss at step 160: 1330.7
Average loss at step 170: 1561.6
Average loss at step 180: 1308.4
Average loss at step 190: 1379.1
Average loss at step 200: 1296.3
Average loss at step 210: 1307.0
Average loss at step 220: 1147.4
Average loss at step 230: 1041.0
Average loss at step 240: 1026.9
Average loss at step 250: 763.9
Average loss at step 260: 859.7
Average loss at step 270: 850.6
Average loss at step 280: 895.4
Average loss at step 290: 830.9
Average loss at step 300: 766.9
Average loss at step 31