# Backprop guide: simple TensorFlow implementation

This is my first time using TensorFlow, so the code might not be great; however, some of the lessons I've learned as a newbie are fresh in my mind, so they may be useful to you too.

(Note: it's hard to see, but the text below has a few links to the TF documentation.)

TF runs things through a [`tf.Session()`](https://www.tensorflow.org/api_docs/python/tf/Session). When I first booted up this notebook, I was ready to start multiplying matrices NumPy style, but then I realized something was missing: `tf.Session`. You can do some setup before starting the session, but any operation must be passed through `tf.Session().run()` to actually run.

The other bit of confusion was with using NumPy arrays with TF. It turns out to be relatively easy. At first I thought I needed to call [`tf.convert_to_tensor()`](https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor), but it turns out that `feed_dict` does not accept TF tensors as values; it wants NumPy arrays (or a few other formats, such as Python scalars and lists).

Feeding the data to the model turned out to be simple, after more confusion. You can feed data to your model using [`feed_dict=`](https://www.tensorflow.org/versions/r0.12/how_tos/reading_data/#feeding) and [`tf.placeholder()`](https://www.tensorflow.org/versions/r0.12/api_docs/python/io_ops/placeholders). The `feed_dict` is just a Python dictionary object with placeholders as keys (that is, the Python variables pointing to them) and data as values. At first this seemed weird, but it really is just you designating a drop-off point, or a parking lot, for your data. One pleasant surprise is that you can also feed a value like a learning rate this way.
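
To make the session, placeholder, and `feed_dict` ideas concrete, here is a minimal sketch that is separate from the network below. It assumes TensorFlow 2.x, where the graph-style API used in this notebook lives under `tensorflow.compat.v1`; on TF 1.x a plain `import tensorflow as tf` should do instead. The names `x`, `scale`, and `row_sums` are made up for illustration.

```python
import numpy as np
# Assumption: TF 2.x. On TF 1.x, use `import tensorflow as tf` and drop the next call.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # go back to the build-a-graph-then-run-it behavior

# Placeholders are the "parking lots": a dtype and a shape, but no data yet
x = tf.placeholder(tf.float32, shape=[None, 2])
scale = tf.placeholder(tf.float32, shape=[])  # a scalar, much like a learning rate

# This line only describes the computation; nothing is calculated here
row_sums = scale * tf.reduce_sum(x, axis=1)

# Plain NumPy data, no tf.convert_to_tensor() needed
data = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

with tf.Session() as sess:
    # The values are dropped into the placeholders only for this run() call
    result = sess.run(row_sums, feed_dict={x: data, scale: 0.5})

print(result)        # the result comes back as a NumPy array
print(type(result))
```

Nothing is computed until `sess.run()` is called, and the same graph can be run again with different values in `feed_dict`.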

There you have it. Another beginner example of TensorFlow, this one written by a beginner. I'd like to thank my friends for coming over for some BBQ and some TensorFlow as we tried to make sense of this all.

(Note: the SGD optimizer doesn't work too well. Uncomment the Adam optimizer for a boost in performance!)

In [1]:
import numpy as np
import tensorflow as tf

from sklearn.datasets import load_digits # For the digit recognition example
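# Quickly get the data, as before: with return_X_y=True this is an (X, y) tuple
# TF will convert the NumPy arrays itself when they are fed in
digits = load_digits(n_class=10, return_X_y=True)
x_train = digits[0]/16           # pixel values run from 0 to 16, so scale to [0, 1]
y_train = np.eye(10)[digits[1]]  # one-hot encode the labels 0..9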
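
The `np.eye(10)[digits[1]]` line above is a compact way to one-hot encode labels: indexing the identity matrix with an array of labels picks out one row per label. A small sketch of the same trick, using a made-up `labels` array in place of the real `digits[1]`:

```python
import numpy as np

# Hypothetical labels, standing in for digits[1]
labels = np.array([3, 0, 9])

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(10)[labels]

print(one_hot.shape)           # one row per label, one column per class
print(one_hot[0])              # a single 1.0 in position 3
print(one_hot.argmax(axis=1))  # argmax undoes the encoding
```

This is also why the accuracy calculation in the network can compare `tf.argmax` of the targets with `tf.argmax` of the outputs.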

In [2]:
# This class stores the NN's parameters
class TFNeuralNetwork:
    # This is the constructor: it's called as the object is created
    # The important elements here are that tf.random_normal returns TF tensors
    # and that tf.Variable marks these as parameters of the model, to be trained
    def __init__(self, input_size, hidden_size, output_size):
        # First, save settings into class members
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Define the parameters
        # These will only actually be initialized when 
        # tf.global_variables_initializer() is run
        self.b1 = tf.Variable(tf.random_normal(shape=(1,self.hidden_size)))
        self.w1 = tf.Variable(tf.random_normal(shape=(self.input_size, 
                                                      self.hidden_size)))
        
        self.b2 = tf.Variable(tf.random_normal(shape=(1,self.hidden_size)))
        self.w2 = tf.Variable(tf.random_normal(shape=(self.hidden_size, 
                                                      self.hidden_size)))
        
        self.b_out = tf.Variable(tf.random_normal(shape=(1,self.output_size)))
        self.w_out = tf.Variable(tf.random_normal(shape=(self.hidden_size, 
                                                         self.output_size)))
        
        # Create the placeholders
        # These are filled with data only when training occurs
        self.x = tf.placeholder(tf.float32, shape=[None, input_size])
        self.y = tf.placeholder(tf.float32, shape=[None, output_size])
        
        # You can also store a learning rate in them (pleasantly surprised)
        self.lr = tf.placeholder(tf.float32, shape=[])
        
        # Define the NLL loss
        # This has a gotcha: it wants the "logits", meaning outputs not passed 
        # through the sigmoid function. The sigmoid is applied internally, fused 
        # with the cross-entropy, which keeps the computation numerically stable 
        # for large positive or negative logits.
        self.loss = tf.reduce_mean(tf.losses.sigmoid_cross_entropy(self.y, 
                                                                   self.feedforward(self.x)))
        
        # Define accuracy
        self.correct_prediction = tf.equal(tf.argmax(self.y,1), 
                                           tf.argmax(self.feedforward(self.x),1))
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))
        
        # Define the optimizer
        # This appears to be how TF calls the basic SGD optimizer. There are 
        # other options and they're pretty easy to choose if you want them.
        self.optimizer = tf.train.GradientDescentOptimizer(self.lr)
        # For example, here's the Adam optimizer commented out, ready to go
        #self.optimizer = tf.train.AdamOptimizer()
        
        # Define the backprop step
        self.train_step = self.optimizer.minimize(self.loss)

    def feedforward(self, x):
        # We just need to use the TF equivalents, otherwise not much different
        z1 = tf.matmul(x, self.w1) + self.b1
        a1 = tf.nn.sigmoid(z1)
        z2 = tf.matmul(a1, self.w2) + self.b2
        a2 = tf.nn.sigmoid(z2)
        z_out = tf.matmul(a2, self.w_out) + self.b_out
        a_out = z_out # don't apply sigmoid yourself
        return a_out

# Create a new NN
NewNeuralNetwork = TFNeuralNetwork(64, 64, 10)

# The session is what runs all TF operations
sess = tf.Session()

# When the initializer is run, all of our variables/parameters are given values
sess.run(tf.global_variables_initializer())

learning_rate = 1.0
batch_size = 10
# Start index of each batch; the last batch may be smaller than batch_size
batch_pos = list(range(0, x_train.shape[0], batch_size))
batch_amount = len(batch_pos)
epochs = 10
for ep in range(1, epochs+1):
    batch_num = 1
    epoch_loss = 0
    epoch_accuracy = 0
    for b in batch_pos:
        # The session will run a training step, but also return loss and accuracy.
        # The feed_dict is how you assign values for TF to use during its operation. 
        # As you can see, x, y, and the lr are set this way
        _, loss, accuracy = sess.run([
                NewNeuralNetwork.train_step, 
                NewNeuralNetwork.loss, 
                NewNeuralNetwork.accuracy
            ], 
             feed_dict={
                 NewNeuralNetwork.x: x_train[b:b+batch_size], 
                 NewNeuralNetwork.y: y_train[b:b+batch_size],
                 NewNeuralNetwork.lr : learning_rate
             }
        )
        epoch_loss += loss
        epoch_accuracy += accuracy
        batch_num += 1
    # Average over the number of batches (batch_num ends up one higher than that)
    print("epoch {:3d}, learning_rate {:6.2f}, loss {:6.2f}, accuracy {:6.2f}%".format(
        ep, learning_rate, epoch_loss/batch_amount, 100.0*epoch_accuracy/batch_amount))
    learning_rate *= 0.75 # learning rate slowdown


epoch   1, learning_rate   1.00, loss   0.25, accuracy  62.94%
epoch   2, learning_rate   0.75, loss   0.12, accuracy  86.61%
epoch   3, learning_rate   0.56, loss   0.09, accuracy  90.03%
epoch   4, learning_rate   0.42, loss   0.08, accuracy  91.97%
epoch   5, learning_rate   0.32, loss   0.07, accuracy  92.68%
epoch   6, learning_rate   0.24, loss   0.07, accuracy  93.13%
epoch   7, learning_rate   0.18, loss   0.06, accuracy  93.48%
epoch   8, learning_rate   0.13, loss   0.06, accuracy  93.59%
epoch   9, learning_rate   0.10, loss   0.06, accuracy  93.70%
epoch  10, learning_rate   0.08, loss   0.06, accuracy  93.98%
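
A note on the "logits" gotcha from the loss definition: here is a NumPy sketch of why a loss function might want the raw outputs rather than the sigmoid activations. The rearranged formula in `stable_loss` is the one given in the TF documentation for `tf.nn.sigmoid_cross_entropy_with_logits`; I am assuming `tf.losses.sigmoid_cross_entropy` relies on the same computation. The function names here are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def naive_loss(labels, logits):
    # Apply the sigmoid first, then take the cross-entropy of the result
    p = sigmoid(logits)
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def stable_loss(labels, logits):
    # The same quantity rearranged: max(x, 0) - x*z + log(1 + exp(-|x|))
    # exp() only ever sees non-positive arguments, so it cannot overflow
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# For moderate logits the two versions agree
labels = np.array([1.0, 0.0, 1.0])
logits = np.array([2.0, -1.0, 0.5])
print(naive_loss(labels, logits))
print(stable_loss(labels, logits))

# For a confident wrong answer the naive version hits log(0)
big_labels = np.array([0.0])
big_logits = np.array([100.0])
with np.errstate(divide="ignore"):
    naive_big = naive_loss(big_labels, big_logits)
stable_big = stable_loss(big_labels, big_logits)
print(naive_big, stable_big)
```

With a logit of 100 and a target of 0, the sigmoid rounds to exactly 1.0 in floating point, so the naive version returns `inf`, while the rearranged version returns a loss of about 100.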
