from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow as tf

## Intro to Convolutional Neural Networks

Convolutional neural networks (CNNs) are the current state-of-the-art model architecture for image classification tasks. CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification. A CNN contains three components:

* Convolutional layers, which apply a specified number of convolution filters to the image. For each subregion, the layer performs a set of mathematical operations to produce a single value in the output feature map. Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearities into the model.

* Pooling layers, which downsample the image data extracted by the convolutional layers to reduce the dimensionality of the feature map in order to decrease processing time. A commonly used pooling algorithm is max pooling, which extracts subregions of the feature map (e.g., 2x2-pixel tiles), keeps their maximum value, and discards all other values.

* Dense (fully connected) layers, which perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node in the layer is connected to every node in the preceding layer.

Typically, a CNN is composed of a stack of convolutional modules that perform feature extraction. Each module consists of a convolutional layer followed by a pooling layer. The last convolutional module is followed by one or more dense layers that perform classification. The final dense layer in a CNN contains a single node for each target class in the model (all the possible classes the model may predict), with a softmax activation function to generate a value between 0 and 1 for each node (the sum of all these softmax values is equal to 1). We can interpret the softmax values for a given image as relative measurements of how likely it is that the image falls into each target class.
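To make the softmax description concrete, here is a minimal pure-Python sketch. The three raw scores are made-up values for illustration, not outputs of the model in this notebook.

```python
import math

def softmax(logits):
    """Convert raw scores into values between 0 and 1 that sum to 1."""
    # Subtract the largest score first so exp() cannot overflow.
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a three-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest raw score always receives the largest softmax value, so the ranking of the classes is unchanged.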

https://cs231n.github.io/convolutional-networks/

Building the CNN MNIST Classifier

Let's build a model to classify the images in the MNIST dataset using the following CNN architecture:

* Convolutional Layer #1: Applies 32 5x5 filters (extracting 5x5-pixel subregions), with ReLU activation function
* Pooling Layer #1: Performs max pooling with a 2x2 filter and stride of 2 (which specifies that pooled regions do not overlap)
* Convolutional Layer #2: Applies 64 5x5 filters, with ReLU activation function
* Pooling Layer #2: Again, performs max pooling with a 2x2 filter and stride of 2
* Dense Layer #1: 1,024 neurons, with dropout regularization rate of 0.4 (probability of 0.4 that any given element will be dropped during training)
* Dense Layer #2 (Logits Layer): 10 neurons, one for each digit target class (0–9).
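The architecture above can be traced as a sequence of tensor shapes for a single image. This is a sketch under two assumptions: both convolutional layers use "same" padding with a stride of 1 (so they keep height and width), and the helper names below are hypothetical, not TensorFlow functions.

```python
def conv_same(shape, filters):
    # "same" padding with stride 1 keeps height and width;
    # the channel count becomes the number of filters.
    height, width, _ = shape
    return (height, width, filters)

def max_pool_2x2(shape):
    # A 2x2 window with stride 2 halves height and width.
    height, width, channels = shape
    return (height // 2, width // 2, channels)

shape = (28, 28, 1)            # one monochrome MNIST image
shape = conv_same(shape, 32)   # (28, 28, 32)
shape = max_pool_2x2(shape)    # (14, 14, 32)
shape = conv_same(shape, 64)   # (14, 14, 64)
shape = max_pool_2x2(shape)    # (7, 7, 64)

# Number of values per image if the result is flattened for a dense layer.
flat_size = shape[0] * shape[1] * shape[2]
print(shape, flat_size)        # (7, 7, 64) 3136
```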




The tf.layers module contains methods to create each of the three layer types above:

* conv2d(). Constructs a two-dimensional convolutional layer. Takes number of filters, filter kernel size, padding, and activation function as arguments.
* max_pooling2d(). Constructs a two-dimensional pooling layer using the max-pooling algorithm. Takes pooling filter size and stride as arguments.
* dense(). Constructs a dense layer. Takes number of neurons and activation function as arguments.

Each of these methods accepts a tensor as input and returns a transformed tensor as output. This makes it easy to connect one layer to another: just take the output from one layer-creation method and supply it as input to another.

Placeholders

We start building the computation graph by creating nodes for the input images and target output classes.

x = tf.placeholder(tf.float32, shape=[None, 784])

y_ = tf.placeholder(tf.float32, shape=[None, 10])

Here x and y_ aren't specific values. Rather, they are each a placeholder -- a value that we'll input when we ask TensorFlow to run a computation.

The dtype, shape, and name arguments are self-explanatory. The one thing to note is what happens when you set the shape of a placeholder to None: shape=None means that tensors of any shape will be accepted. This makes graphs easy to construct but nightmarish to debug, so you should always define the shape of your placeholders in as much detail as possible. shape=None also breaks all downstream shape inference, which causes many ops to fail because they expect a tensor of a certain rank.




The input images x will consist of a 2d tensor of floating point numbers. Here we assign it a shape of [None, 784], where 784 is the dimensionality of a single flattened 28 by 28 pixel MNIST image, and None indicates that the first dimension, corresponding to the batch size, can be of any size. The target output classes y_ will also consist of a 2d tensor, where each row is a one-hot 10-dimensional vector indicating which digit class (zero through nine) the corresponding MNIST image belongs to.
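As an illustration of the one-hot rows described above, the following sketch builds them with plain Python lists. The labels are made-up examples, and one_hot is a hypothetical helper, not part of TensorFlow.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Hypothetical digit labels for a batch of three images.
labels = [3, 0, 9]
y_batch = [one_hot(label) for label in labels]
for row in y_batch:
    print(row)
```

Each row has exactly one non-zero entry, so a batch of three labels becomes a 3 by 10 array that matches the [None, 10] placeholder shape.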

The shape argument to placeholder is optional, but it allows TensorFlow to automatically catch bugs stemming from inconsistent tensor shapes.



Input Layer

The methods in the layers module for creating convolutional and pooling layers for two-dimensional image data expect input tensors to have a shape of [batch_size, image_height, image_width, channels] by default (the channels_last data format), defined as follows:

* batch_size. Size of the subset of examples to use when performing gradient descent during training.
* image_height. Height of the example images.
* image_width. Width of the example images.
* channels. Number of color channels in the example images. For color images, the number of channels is 3 (red, green, blue). For monochrome images, there is just 1 channel (black).

Here, our MNIST dataset is composed of monochrome 28x28 pixel images, so the desired shape for our input layer is [batch_size, 28, 28, 1].

To convert our input feature map (features) to this shape, we can perform the following reshape operation:

input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])
Note that we've indicated -1 for batch size, which specifies that this dimension should be dynamically computed based on the number of input values in features["x"], holding the size of all other dimensions constant. This allows us to treat batch_size as a hyperparameter that we can tune. For example, if we feed examples into our model in batches of 5, features["x"] will contain 3,920 values (one value for each pixel in each image), and input_layer will have a shape of [5, 28, 28, 1]. Similarly, if we feed examples in batches of 100, features["x"] will contain 78,400 values, and input_layer will have a shape of [100, 28, 28, 1].
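The same -1 convention exists in NumPy, which makes the arithmetic easy to check outside a TensorFlow graph. This sketch uses an all-zeros array as a stand-in for a batch of five flattened images.

```python
import numpy as np

# Five flattened 28x28 images: 5 * 784 = 3,920 values in total.
flat_batch = np.zeros((5, 784), dtype=np.float32)

# -1 asks NumPy to infer the batch dimension from the remaining sizes.
images = flat_batch.reshape(-1, 28, 28, 1)
print(flat_batch.size, images.shape)  # 3920 (5, 28, 28, 1)
```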

Weight Initialization

To create this model, we're going to need to create a lot of weights and biases. One should generally initialize weights with a small amount of noise for symmetry breaking, and to prevent 0 gradients. Since we're using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid "dead neurons". Instead of doing this repeatedly while we build the model, let's create two handy functions to do it for us.



def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
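For intuition about what a truncated normal initializer produces, here is a pure-Python sketch of the sampling rule: values more than two standard deviations from the mean are discarded and drawn again. The function name and the seed are assumptions for illustration; this is not TensorFlow's implementation.

```python
import random

def truncated_normal(n, mean=0.0, stddev=0.1, seed=0):
    """Draw n samples, re-drawing any that fall more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples

weights = truncated_normal(1000)
print(min(weights), max(weights))
```

With stddev=0.1, every weight lands in the range [-0.2, 0.2], which gives the small, non-identical starting values the text asks for.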

Convolutional Layer #1

In our first convolutional layer, we want to apply 32 5x5 filters to the input layer, with a ReLU activation function. We can use the conv2d() method in the layers module to create this layer as follows:

conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
The inputs argument specifies our input tensor, which must have the shape [batch_size, image_height, image_width, channels]. Here, we're connecting our first convolutional layer to input_layer, which has the shape [batch_size, 28, 28, 1].

The filters argument specifies the number of filters to apply (here, 32), and kernel_size specifies the dimensions of the filters as [height, width] (here, [5, 5]).

TIP: If filter width and height have the same value, you can instead specify a single integer for kernel_size—e.g., kernel_size=5.

The padding argument specifies one of two enumerated values (case-insensitive): valid (default value) or same. To specify that the output tensor should have the same height and width values as the input tensor, we set padding="same" here, which instructs TensorFlow to add 0 values to the edges of the input tensor to preserve height and width of 28. (Without padding, a 5x5 convolution over a 28x28 tensor will produce a 24x24 tensor, as there are 24x24 locations to extract a 5x5 tile from a 28x28 grid.)
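The 24-versus-28 arithmetic can be written as a small helper. conv_output_size is a hypothetical name, and the sketch covers a single spatial dimension only.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution or pooling op."""
    if padding.lower() == "same":
        # Zero padding is added, so only the stride shrinks the output.
        return math.ceil(input_size / stride)
    # "valid": the kernel must fit entirely inside the input.
    return (input_size - kernel_size) // stride + 1

print(conv_output_size(28, 5, padding="valid"))  # 24
print(conv_output_size(28, 5, padding="same"))   # 28
```

The same formula covers pooling: a 2x2 window with a stride of 2 over 28 pixels gives conv_output_size(28, 2, stride=2), which is 14.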

The activation argument specifies the activation function to apply to the output of the convolution. Here, we specify ReLU activation with tf.nn.relu.

Our output tensor produced by conv2d() has a shape of [batch_size, 28, 28, 32]: the same width and height dimensions as the input, but now with 32 channels holding the output from each of the filters.

Convolution and Pooling

TensorFlow also gives us a lot of flexibility in convolution and pooling operations. How do we handle the boundaries? What is our stride size? In this example, we're always going to choose the vanilla version. Our convolutions use a stride of one and are zero-padded so that the output is the same size as the input. Our pooling is plain old max pooling over 2x2 blocks. To keep our code cleaner, let's also abstract those operations into functions.



Pooling Layer #1

Next, we connect our first pooling layer to the convolutional layer we just created. We can use the max_pooling2d() method in layers to construct a layer that performs max pooling with a 2x2 filter and stride of 2:

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
Again, inputs specifies the input tensor, with a shape of [batch_size, image_height, image_width, channels]. Here, our input tensor is conv1, the output from the first convolutional layer, which has a shape of [batch_size, 28, 28, 32].

The pool_size argument specifies the size of the max pooling filter as [height, width] (here, [2, 2]). If both dimensions have the same value, you can instead specify a single integer (e.g., pool_size=2).

The strides argument specifies the size of the stride. Here, we set a stride of 2, which indicates that the subregions extracted by the filter should be separated by 2 pixels in both the height and width dimensions (for a 2x2 filter, this means that none of the regions extracted will overlap). If you want to set different stride values for height and width, you can instead specify a tuple or list (e.g., strides=[3, 6]).

Our output tensor produced by max_pooling2d() (pool1) has a shape of [batch_size, 14, 14, 32]: the 2x2 filter reduces width and height by 50% each.
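To see what 2x2 max pooling with a stride of 2 does to actual values, here is a NumPy sketch on a single 4x4 feature map. It assumes even height and width and ignores the batch and channel dimensions; the input values are made up.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max-pool a [height, width] array with a 2x2 window and stride 2."""
    height, width = feature_map.shape
    # Group pixels into non-overlapping 2x2 tiles, then keep each tile's maximum.
    tiles = feature_map.reshape(height // 2, 2, width // 2, 2)
    return tiles.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each output value is the maximum of one tile, and both height and width drop from 4 to 2, the same 50% reduction described above.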
    
    
    

To help improve the results of our model, we also apply dropout regularization to our dense layer, using the dropout method in layers:

dropout = tf.layers.dropout(
    inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
Again, inputs specifies the input tensor, which is the output tensor from our dense layer (dense).

The rate argument specifies the dropout rate; here, we use 0.4, which means 40% of the elements will be randomly dropped out during training.

The training argument takes a boolean specifying whether or not the model is currently being run in training mode; dropout will only be performed if training is True. Here, we check if the mode passed to our model function cnn_model_fn is TRAIN mode.

Dropout

To reduce overfitting, we will apply dropout before the readout layer. We create a placeholder for the probability that a neuron's output is kept during dropout. This allows us to turn dropout on during training, and turn it off during testing. TensorFlow's tf.nn.dropout op automatically handles scaling neuron outputs in addition to masking them, so dropout just works without any additional scaling.

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
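The masking and scaling can be sketched in NumPy. This is an illustration of the general "inverted dropout" idea, not TensorFlow's implementation; the function name and the all-ones activations are assumptions. Note that a keep probability of 0.6 corresponds to the rate=0.4 used with tf.layers.dropout earlier, since rate is the fraction dropped.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected value of each activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones(10000)
dropped = dropout(h, keep_prob=0.6, rng=rng)    # training: keep about 60%
unchanged = dropout(h, keep_prob=1.0, rng=rng)  # testing: keep everything
print(dropped.mean(), unchanged.mean())
```

With keep_prob=1.0 the mask is all ones and the input passes through untouched, which is how feeding a different keep_prob value turns dropout off at test time.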



Logits Layer

The final layer in our neural network is the logits layer, which will return the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0–9), with linear activation (the default):

logits = tf.layers.dense(inputs=dropout, units=10)
Our final output tensor of the CNN, logits, has shape [batch_size, 10].

Generate Predictions

The logits layer of our model returns our predictions as raw values in a [batch_size, 10]-dimensional tensor. Let's convert these raw values into two different formats that our model function can return:

* The predicted class for each example: a digit from 0–9.
* The probabilities for each possible target class for each example: the probability that the example is a 0, is a 1, is a 2, etc.

For a given example, our predicted class is the element in the corresponding row of the logits tensor with the highest raw value. We can find the index of this element using the tf.argmax function:

tf.argmax(input=logits, axis=1)
The input argument specifies the tensor from which to extract maximum values—here logits. The axis argument specifies the axis of the input tensor along which to find the greatest value. Here, we want to find the largest value along the dimension with index of 1, which corresponds to our predictions (recall that our logits tensor has shape [batch_size, 10]).

We can derive probabilities from our logits layer by applying softmax activation using tf.nn.softmax:

tf.nn.softmax(logits, name="softmax_tensor")
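The effect of axis=1 is easiest to see on a small array. The following NumPy sketch uses made-up logits for two examples and three classes; np.argmax follows the same axis convention.

```python
import numpy as np

# Hypothetical logits: 2 examples (rows) by 3 classes (columns).
logits = np.array([[1.0, 3.0, 0.5],
                   [2.0, 0.1, 0.3]])

# axis=1 searches across the classes, giving one index per example.
classes = np.argmax(logits, axis=1)
print(classes)  # [1 0]

# Row-wise softmax; subtracting the row maximum avoids overflow in exp().
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities = exps / exps.sum(axis=1, keepdims=True)
print(probabilities.sum(axis=1))
```

Because softmax preserves ordering within a row, taking the argmax of the probabilities gives the same classes as taking it on the raw logits.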



Calculate Loss

For both training and evaluation, we need to define a loss function that measures how closely the model's predictions match the target classes. For multiclass classification problems like MNIST, cross entropy is typically used as the loss metric. The following code calculates cross entropy when the model runs in either TRAIN or EVAL mode:



Training

In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad. We call this the cost, or the loss, and it represents how far off our model is from our desired outcome. We try to minimize that error, and the smaller the error margin, the better our model is.

One very common, very nice function to determine the loss of a model is called "cross-entropy." Cross-entropy arises from thinking about information compressing codes in information theory but it winds up being an important idea in lots of areas, from gambling to machine learning. It's defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Where y is our predicted probability distribution, and y′ is the true distribution (the one-hot vector with the digit labels). In some rough sense, the cross-entropy is measuring how inefficient our predictions are for describing the truth. Going into more detail about cross-entropy is beyond the scope of this tutorial, but it's well worth understanding.

To implement cross-entropy we need to first add a new placeholder to input the correct answers:

y_ = tf.placeholder(tf.float32, [None, 10])

Then we can implement the cross-entropy function, $-\sum y' \log(y)$:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
First, tf.log computes the logarithm of each element of y. Next, we multiply each element of y_ with the corresponding element of tf.log(y). Then tf.reduce_sum adds the elements in the second dimension of y, due to the reduction_indices=[1] parameter. Finally, tf.reduce_mean computes the mean over all the examples in the batch.

Note that in the source code, we don't use this formulation, because it is numerically unstable. Instead, we apply tf.nn.softmax_cross_entropy_with_logits on the unnormalized logits (e.g., we call softmax_cross_entropy_with_logits on tf.matmul(x, W) + b), because this more numerically stable function internally computes the softmax activation. In your code, consider using tf.nn.softmax_cross_entropy_with_logits instead.
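The instability is easy to reproduce in plain Python. The sketch below compares the direct formula with one common stable formulation, the log-sum-exp trick. It illustrates the idea only and is not a copy of what tf.nn.softmax_cross_entropy_with_logits does internally; both function names are hypothetical.

```python
import math

def naive_cross_entropy(logits, one_hot):
    # Direct formula: softmax first, then -sum(y' * log(y)).
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def stable_cross_entropy(logits, one_hot):
    # Log-softmax via log-sum-exp: shift by the largest logit before exp().
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_sum_exp) for t, v in zip(one_hot, logits))

# For moderate logits, the two agree.
small = [2.0, 1.0, 0.1]
print(naive_cross_entropy(small, [1, 0, 0]), stable_cross_entropy(small, [1, 0, 0]))

# For a very large logit, exp() overflows in the naive version.
large = [1000.0, 0.0, 0.0]
try:
    naive_cross_entropy(large, [0, 1, 0])
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed, stable_cross_entropy(large, [0, 1, 0]))
```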
Now that we have defined our model and training loss function, it is straightforward to train using TensorFlow. Because TensorFlow knows the entire computation graph, it can use automatic differentiation to find the gradients of the loss with respect to each of the variables. TensorFlow has a variety of built-in optimization algorithms. For this example, we will use steepest gradient descent, with a step length of 0.5, to descend the cross entropy.

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
What TensorFlow actually did in that single line was to add new operations to the computation graph. These operations included ones to compute gradients, compute parameter update steps, and apply update steps to the parameters.

The returned operation train_step, when run, will apply the gradient descent updates to the parameters. Training the model can therefore be accomplished by repeatedly running train_step.

for _ in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})
We load 100 training examples in each training iteration. We then run the train_step operation, using feed_dict to replace the placeholder tensors x and y_ with the training examples. Note that you can replace any tensor in your computation graph using feed_dict -- it's not restricted to just placeholders.
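The update that each run of train_step applies is the plain gradient descent rule: subtract the learning rate times the gradient. Here is a toy sketch on a one-parameter loss, with a made-up loss function and learning rate chosen only for illustration.

```python
def gradient_descent(grad_fn, w, learning_rate, steps):
    """Repeatedly step against the gradient, starting from w."""
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w = gradient_descent(lambda w: 2.0 * (w - 3.0), w=0.0, learning_rate=0.1, steps=100)
print(w)
```

In the TensorFlow graph the gradient comes from automatic differentiation rather than a hand-written formula, and the parameters are tensors rather than a single number.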





Configure the Training Op

In the previous section, we defined loss for our CNN as the softmax cross-entropy of the logits layer and our labels. Let's configure our model to optimize this loss value during training. We'll use a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm:

if mode == tf.estimator.ModeKeys.TRAIN:
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
  train_op = optimizer.minimize(
      loss=loss,
      global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
  
  


Train and Evaluate the Model

How well does this model do? To train and evaluate it we will use code that is nearly identical to that for the simple one layer SoftMax network above.

The differences are that:

* We will replace the steepest gradient descent optimizer with the more sophisticated ADAM optimizer.
* We will include the additional parameter keep_prob in feed_dict to control the dropout rate.
* We will add logging to every 100th iteration in the training process.

We will also use tf.Session rather than tf.InteractiveSession. This better separates the process of creating the graph (model specification) and the process of evaluating the graph (model fitting). It generally makes for cleaner code. The tf.Session is created within a with block so that it is automatically destroyed once the block is exited.

Feel free to run this code. Be aware that it does 20,000 training iterations and may take a while (possibly up to half an hour), depending on your processor.

Add evaluation metrics

To add an accuracy metric to our model, we define an eval_metric_ops dict in EVAL mode as follows:

eval_metric_ops = {
    "accuracy": tf.metrics.accuracy(
        labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
    mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)


Evaluating Our Model

How well does our model do?

Well, first let's figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check if our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
That gives us a list of booleans. To determine what fraction are correct, we cast to floating point numbers and then take the mean. For example, [True, False, True, True] would become [1,0,1,1] which would become 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Finally, we ask for our accuracy on our test data.

print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
This should be about 92%.
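The equal, cast, and mean steps can be mirrored in plain Python. The predictions and labels below are made-up values chosen to reproduce the [True, False, True, True] example.

```python
predictions = [7, 2, 1, 0]  # hypothetical predicted digits
labels = [7, 3, 1, 0]       # hypothetical true digits

# Element-wise comparison, like tf.equal.
correct = [p == t for p, t in zip(predictions, labels)]

# Cast booleans to floats and take the mean, like tf.cast + tf.reduce_mean.
accuracy = sum(float(c) for c in correct) / len(correct)
print(correct, accuracy)  # [True, False, True, True] 0.75
```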





Load Training and Test Data

First, let's load our training and test data. Add a main() function to cnn_mnist.py with the following code:

We store the training feature data (the raw pixel values for 55,000 images of hand-drawn digits) and training labels (the corresponding value from 0–9 for each image) as numpy arrays in train_data and train_labels, respectively. Similarly, we store the evaluation feature data (10,000 images) and evaluation labels in eval_data and eval_labels, respectively.



Save a Checkpoint

In order to emit a checkpoint file that may be used to later restore a model for further training or evaluation, we instantiate a tf.train.Saver.

saver = tf.train.Saver()
In the training loop, the tf.train.Saver.save method will periodically be called to write a checkpoint file to the training directory with the current values of all the trainable variables.

saver.save(sess, FLAGS.train_dir, global_step=step)
At some later point in the future, training might be resumed by using the tf.train.Saver.restore method to reload the model parameters.

saver.restore(sess, FLAGS.train_dir)

Set Up a Logging Hook

Since CNNs can take a while to train, let's set up some logging so we can track progress during training. We can use TensorFlow's tf.train.SessionRunHook to create a tf.train.LoggingTensorHook that will log the probability values from the softmax layer of our CNN. Add the following to main():



TensorFlow Mechanics 101


Code: tensorflow/examples/tutorials/mnist/

The goal of this tutorial is to show how to use TensorFlow to train and evaluate a simple feed-forward neural network for handwritten digit classification using the (classic) MNIST data set. The intended audience for this tutorial is experienced machine learning users interested in using TensorFlow.

These tutorials are not intended for teaching Machine Learning in general.

Please ensure you have followed the instructions to install TensorFlow.

Tutorial Files

This tutorial references the following files:

| File | Purpose |
| --- | --- |
| mnist.py | The code to build a fully-connected MNIST model. |
| fully_connected_feed.py | The main code to train the built MNIST model against the downloaded dataset using a feed dictionary. |

Simply run the fully_connected_feed.py file directly to start training:

python fully_connected_feed.py

Prepare the Data

MNIST is a classic problem in machine learning. The problem is to look at greyscale 28x28 pixel images of handwritten digits and determine which digit the image represents, for all the digits from zero to nine.

MNIST Digits

For more information, refer to Yann LeCun's MNIST page or Chris Olah's visualizations of MNIST.

Download

At the top of the run_training() method, the input_data.read_data_sets() function will ensure that the correct data has been downloaded to your local training folder and then unpack that data to return a dictionary of DataSet instances.

data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
NOTE: The fake_data flag is used for unit-testing purposes and may be safely ignored by the reader.

| Dataset | Purpose |
| --- | --- |
| data_sets.train | 55,000 images and labels, for primary training. |
| data_sets.validation | 5,000 images and labels, for iterative validation of training accuracy. |
| data_sets.test | 10,000 images and labels, for final testing of trained accuracy. |

Inputs and Placeholders

The placeholder_inputs() function creates two tf.placeholder ops that define the shape of the inputs, including the batch_size, to the rest of the graph and into which the actual training examples will be fed.

images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       mnist.IMAGE_PIXELS))
labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size,))
Further down, in the training loop, the full image and label datasets are sliced to fit the batch_size for each step, matched with these placeholder ops, and then passed into the sess.run() function using the feed_dict parameter.

Build the Graph

After creating placeholders for the data, the graph is built from the mnist.py file according to a 3-stage pattern: inference(), loss(), and training().

inference() - Builds the graph as far as required for running the network forward to make predictions.
loss() - Adds to the inference graph the ops required to generate loss.
training() - Adds to the loss graph the ops required to compute and apply gradients.

Inference

The inference() function builds the graph as far as needed to return the tensor that would contain the output predictions.

It takes the images placeholder as input and builds on top of it a pair of fully connected layers with ReLU activation followed by a ten node linear layer specifying the output logits.

Each layer is created beneath a unique tf.name_scope that acts as a prefix to the items created within that scope.

with tf.name_scope('hidden1'):
Within the defined scope, the weights and biases to be used by each of these layers are generated into tf.Variable instances, with their desired shapes:

weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]),
                     name='biases')
When, for instance, these are created under the hidden1 scope, the unique name given to the weights variable would be "hidden1/weights".

Each variable is given initializer ops as part of their construction.

In this most common case, the weights are initialized with the tf.truncated_normal and given their shape of a 2-D tensor with the first dim representing the number of units in the layer from which the weights connect and the second dim representing the number of units in the layer to which the weights connect. For the first layer, named hidden1, the dimensions are [IMAGE_PIXELS, hidden1_units] because the weights are connecting the image inputs to the hidden1 layer. The tf.truncated_normal initializer generates a random distribution with a given mean and standard deviation.

Then the biases are initialized with tf.zeros to ensure they start with all zero values, and their shape is simply the number of units in the layer to which they connect.

The graph's three primary ops -- two tf.nn.relu ops wrapping tf.matmul for the hidden layers and one extra tf.matmul for the logits -- are then created, each in turn, with separate tf.Variable instances connected to each of the input placeholders or the output tensors of the previous layer.

hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
logits = tf.matmul(hidden2, weights) + biases
Finally, the logits tensor that will contain the output is returned.

Loss

The loss() function further builds the graph by adding the required loss ops.

First, the values from the labels_placeholder are converted to 64-bit integers. Then, a tf.nn.sparse_softmax_cross_entropy_with_logits op is added to automatically produce 1-hot labels from the labels_placeholder and compare the output logits from the inference() function with those 1-hot labels.

labels = tf.to_int64(labels)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits, name='xentropy')
It then uses tf.reduce_mean to average the cross entropy values across the batch dimension (the first dimension) as the total loss.

loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
And the tensor that will then contain the loss value is returned.
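To illustrate what a "sparse" softmax cross-entropy computes, here is a pure-Python sketch that takes integer class labels directly instead of one-hot vectors. The function name and inputs are assumptions for illustration, not the TensorFlow op itself.

```python
import math

def sparse_softmax_cross_entropy(logits_batch, labels):
    """Per-example loss from integer class labels."""
    losses = []
    for logits, label in zip(logits_batch, labels):
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
        # Equivalent to -log(softmax(logits)[label]).
        losses.append(log_sum_exp - logits[label])
    return losses

# Hypothetical batch: two examples, three classes.
logits_batch = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
labels = [0, 2]
losses = sparse_softmax_cross_entropy(logits_batch, labels)

# Average over the batch dimension, as tf.reduce_mean does.
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

The second example has identical logits for all three classes, so its loss is log(3), the cost of a uniform guess.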

Note: Cross-entropy is an idea from information theory that allows us to describe how bad it is to believe the predictions of the neural network, given what is actually true. For more information, read the blog post Visual Information Theory (http://colah.github.io/posts/2015-09-Visual-Information/).

Training

The training() function adds the operations needed to minimize the loss via Gradient Descent.

Firstly, it takes the loss tensor from the loss() function and hands it to a tf.summary.scalar, an op for generating summary values into the events file when used with a tf.summary.FileWriter (see below). In this case, it will emit the snapshot value of the loss every time the summaries are written out.

tf.summary.scalar('loss', loss)
Next, we instantiate a tf.train.GradientDescentOptimizer responsible for applying gradients with the requested learning rate.

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
We then generate a single variable to contain a counter for the global training step and the tf.train.Optimizer.minimize op is used to both update the trainable weights in the system and increment the global step. This op is, by convention, known as the train_op and is what must be run by a TensorFlow session in order to induce one full step of training (see below).

global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)

Train the Model

Once the graph is built, it can be iteratively trained and evaluated in a loop controlled by the user code in fully_connected_feed.py.

The Graph

At the top of the run_training() function is a Python with statement that indicates all of the built ops are to be associated with the default global tf.Graph instance.

with tf.Graph().as_default():
A tf.Graph is a collection of ops that may be executed together as a group. Most TensorFlow uses will only need to rely on the single default graph.

More complicated uses with multiple graphs are possible, but beyond the scope of this simple tutorial.

The Session

Once all of the build preparation has been completed and all of the necessary ops generated, a tf.Session is created for running the graph.

sess = tf.Session()
Alternately, a Session may be generated into a with block for scoping:

with tf.Session() as sess:
The empty parameter to session indicates that this code will attach to (or create if not yet created) the default local session.

Immediately after creating the session, all of the tf.Variable instances are initialized by calling tf.Session.run on their initialization op.

init = tf.global_variables_initializer()
sess.run(init)
The tf.Session.run method will run the complete subset of the graph that corresponds to the op(s) passed as parameters. In this first call, the init op is a tf.group that contains only the initializers for the variables. None of the rest of the graph is run here; that happens in the training loop below.

Train the Model

Now we're ready to train our model, which we can do by creating train_input_fn and calling train() on mnist_classifier. Add the following to main():


Train Loop

After initializing the variables with the session, training may begin.

The user code controls the training per step, and the simplest loop that can do useful training is:

for step in range(FLAGS.max_steps):
    sess.run(train_op)
However, this tutorial is slightly more complicated in that it must also slice up the input data for each step to match the previously generated placeholders.

Feed the Graph

For each step, the code will generate a feed dictionary that will contain the set of examples on which to train for the step, keyed by the placeholder ops they represent.

In the fill_feed_dict() function, the given DataSet is queried for its next batch_size set of images and labels, and tensors matching the placeholders are filled containing the next images and labels.

images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                               FLAGS.fake_data)
A Python dictionary object is then generated with the placeholders as keys and the representative feed tensors as values.

feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}
This is passed into the sess.run() function's feed_dict parameter to provide the input examples for this step of training.
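
The batching and feed-dictionary pattern can be sketched in plain Python. `ToyDataSet`, the string keys standing in for placeholder ops, and the extra `batch_size` argument are assumptions for this example and differ from the tutorial's actual DataSet class and fill_feed_dict() signature.

```python
class ToyDataSet(object):
    """Hypothetical stand-in for the tutorial's DataSet; serves consecutive batches."""

    def __init__(self, images, labels):
        self.images, self.labels = images, labels
        self._index = 0

    def next_batch(self, batch_size):
        # Assumes batch_size divides the data size, so wrapping lands on index 0.
        start = self._index
        stop = start + batch_size
        self._index = stop % len(self.images)
        return self.images[start:stop], self.labels[start:stop]


def fill_feed_dict(data_set, images_key, labels_key, batch_size):
    """Build a feed dictionary keyed by the (stand-in) placeholders."""
    images_feed, labels_feed = data_set.next_batch(batch_size)
    return {images_key: images_feed, labels_key: labels_feed}


data = ToyDataSet(images=[[0.0], [0.1], [0.2], [0.3]], labels=[0, 1, 2, 3])
first = fill_feed_dict(data, "images_placeholder", "labels_placeholder", batch_size=2)
second = fill_feed_dict(data, "images_placeholder", "labels_placeholder", batch_size=2)
print(first)
print(second)
```

Each call returns the next slice of the data, so successive training steps see different examples.
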

Check the Status

The code specifies two values to fetch in its run call: [train_op, loss].

for step in range(FLAGS.max_steps):
    feed_dict = fill_feed_dict(data_sets.train,
                               images_placeholder,
                               labels_placeholder)
    _, loss_value = sess.run([train_op, loss],
                             feed_dict=feed_dict)
Because there are two values to fetch, sess.run() returns a tuple with two items. Each Tensor in the list of values to fetch corresponds to a numpy array in the returned tuple, filled with the value of that tensor during this step of training. Since train_op is an Operation with no output value, the corresponding element in the returned tuple is None and, thus, discarded. However, the value of the loss tensor may become NaN if the model diverges during training, so we capture this value for logging.

Assuming that the training runs fine without NaNs, the training loop also prints a simple status text every 100 steps to let the user know the state of training.
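
A minimal sketch of this kind of status reporting follows. The `toy_loss` function is a hypothetical stand-in for the loss value that sess.run() would return, and the message format is only an assumption about what a status line might look like.

```python
import math


def toy_loss(step):
    """Hypothetical loss curve that decays as training proceeds."""
    return 2.0 / (1.0 + 0.01 * step)


status_lines = []
for step in range(300):
    loss_value = toy_loss(step)   # stands in for the loss fetched from sess.run(...)
    if math.isnan(loss_value):
        # Stop early if the model has diverged.
        raise ValueError("Model diverged with loss = NaN at step %d" % step)
    if step % 100 == 0:
        # Report progress every 100 steps.
        line = "Step %d: loss = %.2f" % (step, loss_value)
        status_lines.append(line)
        print(line)
```
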

In [None]:
# Low-level API tutorial:
# https://www.tensorflow.org/get_started/mnist/pros#build_a_multilayer_convolutional_network

# High-level API (tf.layers) tutorial:
# https://www.tensorflow.org/tutorials/layers

In [None]:
# More examples:
# https://github.com/aymericdamien/TensorFlow-Examples

In [None]:
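def cnn_model_fn(features, labels, mode):
  """Model function for the CNN, in the form expected by tf.estimator.Estimator."""
  # Input Layer: reshape to [batch_size, height, width, channels]
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])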

  # Convolutional Layer #1
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #1
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2 and Pooling Layer #2
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Dense Layer
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Logits Layer
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Calculate Loss (for both TRAIN and EVAL modes)
  onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=10)
  loss = tf.losses.softmax_cross_entropy(
      onehot_labels=onehot_labels, logits=logits)

  # Configure the Training Op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Add evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
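
The `7 * 7 * 64` used when flattening `pool2` follows from the layer settings above. The sketch below traces the spatial size through the two convolution and pooling stages, assuming stride-1 convolutions with "same" padding and unpadded 2x2 pooling with stride 2; the helper names are made up for the example.

```python
def same_conv(size):
    """With padding="same" and stride 1, a convolution keeps the spatial size."""
    return size


def max_pool(size, pool_size=2, stride=2):
    """Output size of an unpadded pooling window."""
    return (size - pool_size) // stride + 1


size = 28                          # MNIST images are 28x28 pixels
size = max_pool(same_conv(size))   # conv1 + pool1
after_first_module = size
size = max_pool(same_conv(size))   # conv2 + pool2
flat_features = size * size * 64   # conv2 has 64 filters
print(after_first_module, size, flat_features)
```
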

In [None]:
# Lower-level APIs
# https://www.tensorflow.org/api_docs/python/tf/nn

tf.nn.conv2d
tf.nn.dropout
tf.nn.max_pool

tf.nn.batch_normalization
tf.nn.dilation2d
tf.nn.embedding_lookup
tf.nn.raw_rnn
tf.nn.dynamic_rnn

tf.nn.sigmoid
tf.nn.relu
tf.nn.tanh
tf.nn.softmax
tf.nn.softplus


In [None]:
# Higher-level APIs
# https://www.tensorflow.org/api_docs/python/tf/layers

tf.layers.conv2d
tf.layers.dense
tf.layers.dropout
tf.layers.max_pooling2d
tf.layers.batch_normalization

# Alternatives that provide similar layers
tf.contrib.layers
tf.keras.layers

In [None]:
# https://www.tensorflow.org/tutorials/

# Using GPUs

On a typical system, there are multiple computing devices. In TensorFlow, the supported device types are CPU and GPU. They are represented as strings. For example:

* "/cpu:0": The CPU of your machine.
* "/device:GPU:0": The GPU of your machine, if you have one.
* "/device:GPU:1": The second GPU of your machine, etc.

If a TensorFlow operation has both CPU and GPU implementations, the GPU devices will be given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul.

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.


In [None]:
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

On a machine with a GPU, you should see output similar to the following (the device name depends on your hardware):

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
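
The printed matrix can be checked by hand. This plain-Python sketch lays the six constants out in row-major order, which is how the `shape` argument arranges them, and multiplies the resulting 2x3 and 3x2 matrices.

```python
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]           # shape [2, 3]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]                # shape [3, 2]

# c[i][j] is the dot product of row i of a with column j of b.
c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)
```
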

### Manual device placement

If you would like a particular operation to run on a device of your choice instead of what's automatically selected for you, you can use with tf.device to create a device context such that all the operations within that context will have the same device assignment.

In [None]:
# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

You will see that now a and b are assigned to cpu:0. Since a device was not explicitly specified for the MatMul operation, the TensorFlow runtime will choose one based on the operation and available devices (gpu:0 in this example) and automatically copy tensors between devices if required.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
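
The placement behavior described in this section can be summarized with a toy rule. This is a deliberately simplified sketch, not TensorFlow's actual placement algorithm; the `place` function and its arguments are hypothetical.

```python
def place(explicit_device, has_gpu_kernel, available_devices):
    """Toy rule: honor an explicit device, otherwise prefer the first GPU."""
    if explicit_device is not None:
        return explicit_device
    gpus = [d for d in available_devices if "GPU" in d]
    if has_gpu_kernel and gpus:
        return gpus[0]
    return "/cpu:0"


devices = ["/cpu:0", "/device:GPU:0"]
placement = {
    "a": place("/cpu:0", True, devices),       # pinned by the tf.device context
    "b": place("/cpu:0", True, devices),       # pinned by the tf.device context
    "MatMul": place(None, True, devices),      # no explicit device, so the GPU wins
}
print(placement)
```
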

### Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you will need to specify the preference explicitly:

In [None]:
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))