# Introduction to Neural Networks, TensorFlow, and its Estimators Interface (with an eye towards learning quantifiers)

### About this notebook:
This notebook was written by Shane Steinert-Threlkeld for the Neural Network Methods for Quantifiers coordinated project at the ILLC, Universiteit van Amsterdam in January 2018 (http://shane.st/NNQ).  

It introduces the basics of working with TensorFlow to train neural networks, with an eye to applications to quantifiers.  (In particular, the code is a warm-up to understanding this repository: https://github.com/shanest/quantifier-rnn-learning.)

There are three sections:

1. Basic TF abstractions: sessions, the graph, Variables/Placeholders
2. Training a feed-forward neural network to classify bit sequences
3. Re-doing the above using TF estimators  

#### Intended working environment for this notebook:
* Python 2.7
* TensorFlow 1.4

To run: (i) install Jupyter; (ii) save this .ipynb file in a directory; (iii) from that directory, run `jupyter notebook`; (iv) open this file.

### License
Copyright 2018 Shane Steinert-Threlkeld

> This program is free software: you can redistribute it and/or modify
> it under the terms of the GNU General Public License as published by
> the Free Software Foundation, either version 3 of the License, or
> (at your option) any later version.
>
> This program is distributed in the hope that it will be useful,
> but WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU General Public License for more details.
>
> You should have received a copy of the GNU General Public License
> along with this program.  If not, see <http://www.gnu.org/licenses/>.

# 1. TensorFlow Mechanics

In [1]:
import tensorflow as tf
print tf.__version__

1.4.1


### Defining and running a computational graph

In [2]:
c1 = tf.constant(3.0)
c2 = tf.constant(4.0)
print c1

add1 = tf.add(c1, c2)
add2 = c1 + c2 #same as above, though I prefer to use the `tf.` versions of ops, to be most clear
print add1

Tensor("Const:0", shape=(), dtype=float32)
Tensor("Add:0", shape=(), dtype=float32)


Note that what's printed is not the value 3.0, but a Tensor, a TF data-type corresponding to a node in the computational graph.

To get its value, we need to _run_ the graph inside a _session_.

[Note: it's always good to use a `with` block to wrap a session, so that it closes automatically.]

In [3]:
with tf.Session() as sess:
    print sess.run(c1)
    print sess.run(add1)
    # you can also pass a list of ops instead of a single op to `run`
    print sess.run([c1, c2, add1])

3.0
7.0
[3.0, 4.0, 7.0]
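To make the build-then-run distinction concrete, here is a toy sketch in plain Python. The classes `Node`, `Const`, `Add` and the function `run` are hypothetical names invented for this illustration; they are not part of TensorFlow and only mimic the idea that building a node computes nothing until it is evaluated.

```python
# Toy illustration only: none of these names exist in TensorFlow.

class Node(object):
    """A node in a tiny computational graph."""
    def evaluate(self):
        raise NotImplementedError


class Const(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add(Node):
    def __init__(self, left, right):
        # building the node only stores references to its inputs;
        # nothing is computed yet
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


def run(nodes):
    """Plays the role of `Session.run`: evaluate one node or a list of nodes."""
    if isinstance(nodes, list):
        return [node.evaluate() for node in nodes]
    return nodes.evaluate()


toy_c1 = Const(3.0)
toy_c2 = Const(4.0)
toy_add = Add(toy_c1, toy_c2)   # a graph node, not the number 7.0

toy_values = run([toy_c1, toy_c2, toy_add])
print(toy_values)
```

A real TF graph does much more (shapes, dtypes, devices, gradients), but the separation between describing a computation and executing it is the same.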


Tensors also have a _shape_, telling you how many dimensions the tensor has and the size of each dimension.  I find it to be a good practice to include the shape as a comment above every operation.  Because the shape is a property of the `Tensor`, it can be accessed without running the graph.

In [4]:
# -- mat: [3, 2]
mat = tf.constant([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
print mat.shape

# -- vec: [2, 1]
vec = tf.constant([[1.0],
                   [1.0]])

# -- mul: [3, 1]
mul = tf.matmul(mat, vec)

with tf.Session() as sess:
    print sess.run(mul)

(3, 2)
[[  3.]
 [  7.]
 [ 11.]]
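The shape comments above can be checked by hand. The following is a minimal plain-Python sketch, not TensorFlow code; `shape_of` and `matmul` are hypothetical helpers written for this illustration. It applies the rule that `[n, k]` times `[k, m]` gives `[n, m]` to the same numbers as the cell above.

```python
def shape_of(matrix):
    """Return (rows, cols) of a rectangular list of lists."""
    return (len(matrix), len(matrix[0]))


def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    a_rows, a_cols = shape_of(a)
    b_rows, b_cols = shape_of(b)
    # inner dimensions must agree: [n, k] x [k, m] -> [n, m]
    if a_cols != b_rows:
        raise ValueError('shape mismatch: {} vs {}'.format(shape_of(a), shape_of(b)))
    return [[sum(a[i][k] * b[k][j] for k in range(a_cols))
             for j in range(b_cols)]
            for i in range(a_rows)]


plain_mat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # [3, 2]
plain_vec = [[1.0], [1.0]]                          # [2, 1]
plain_mul = matmul(plain_mat, plain_vec)            # [3, 1]

print(shape_of(plain_mul))
print(plain_mul)
```

Swapping the arguments, `matmul(plain_vec, plain_mat)`, raises the `ValueError`, since a `[2, 1]` matrix cannot be multiplied by a `[3, 2]` one.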


### Variables and placeholders

A neural network learns to approximate a given function by seeing examples and updating its _parameters_ in order to do a better job at approximating the data it has seen.  While we postpone an actual discussion of training to the next section, we note two other pieces of machinery that are required for this:

1. Variables: these are `Tensor`s whose values can be changed.  So parameters of a model -- and anything else you want to be updated -- will be Variables.
2. Placeholders: these are `Tensor`s that represent input to the network/computational graph: their value must be provided externally via what TensorFlow calls a `feed_dict`.

In [5]:
# -- W: [3, 2]
W = tf.Variable([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
# -- b: [3, 1]
b = tf.Variable([[1.0],
                 [1.0],
                 [1.0]])

# -- x: [2, 1]
x = tf.placeholder(shape=(2,1), dtype=tf.float32)

# -- linear: [3, 1]
linear = tf.matmul(W, x)
# -- result: [3, 1]
result = tf.add(linear, b)

with tf.Session() as sess:
    # variables must be initialized
    sess.run(tf.global_variables_initializer())
    # result depends on a placeholder, so input must be fed in
    print sess.run(result, feed_dict={x: [[1.0], [1.0]]})

[[  4.]
 [  8.]
 [ 12.]]


Note that the shape of the placeholder `x` was specified precisely.  While this is good practice, it's often convenient to leave one of the dimensions as `None`, so that batches of different sizes can be sent to the model.  (For example, mini-batches during training, one big batch during evaluation.  We'll see how this works later.)
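Why a free batch dimension is harmless can be seen in a small plain-Python sketch (again not TensorFlow; `affine_rows` and the `toy_` values are made up for illustration). It uses the row convention of the next section, where each example is a row, so `toy_weights` is the transpose of `W` above. The same parameters are applied to every row, however many rows there are.

```python
def affine_rows(batch, weights, biases):
    """Apply x -> x.W + b to each row x of `batch`.

    batch:   [batch_size, input_size]  (batch_size can be anything)
    weights: [input_size, output_size]
    biases:  [output_size]
    """
    output_size = len(biases)
    return [[sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
             for j in range(output_size)]
            for row in batch]


toy_weights = [[1.0, 3.0, 5.0],
               [2.0, 4.0, 6.0]]      # [2, 3]
toy_biases = [1.0, 1.0, 1.0]         # [3]

# a "batch" of one example and a batch of three go through the same function
one_example = affine_rows([[1.0, 1.0]], toy_weights, toy_biases)
three_examples = affine_rows([[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]],
                             toy_weights, toy_biases)
print(one_example)
print(three_examples)
```

The first row of each result holds the same three numbers as the TensorFlow output above, laid out as a row instead of a column.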

# 2. Training a feed-forward neural network to learn 'at least three'

### Generating labeled data

First, we will generate labeled data.  

The Xs will be all sequences of 0s and 1s of a specified length.

The Ys will be labels -- 0 or 1 -- provided by a user-defined function that takes a sequence as its input.  Here we provide one: `at_least_three`.

The data is shuffled, so that the order is random.  Finally, it is split into training and test sets.

In [6]:
import itertools  # imported under its own name, to avoid shadowing the built-in `iter`
import random
import math

def generate_all_seqs(length, shuffle=True):
    seqs = list(itertools.product([0,1], repeat=length))
    if shuffle:
        random.shuffle(seqs)
    return seqs

def at_least_three(seq):
    # we return [0,1] for True and [1,0] for False
    return [0,1] if sum(seq) >= 3 else [1,0]

def get_labeled_data(seqs, func):
    return seqs, [func(seq) for seq in seqs]

# generate all labeled data
SEQ_LEN = 16
NUM_CLASSES = 2
TRAIN_SPLIT = 0.8

X, Y = get_labeled_data(generate_all_seqs(SEQ_LEN), at_least_three)

# split into training and test sets
pivot_index = int(math.ceil(TRAIN_SPLIT*len(X)))

trainX, trainY = X[:pivot_index], Y[:pivot_index]
testX, testY = X[pivot_index:], Y[pivot_index:]
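Before training, it helps to know how the two labels are distributed. The sketch below counts them combinatorially rather than by enumerating the data; `n_choose_k` and `count_labels` are hypothetical helpers introduced here, and the literal `16` is the `SEQ_LEN` used above.

```python
import math


def n_choose_k(n, k):
    """Binomial coefficient, written out so it also runs on older Pythons."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


def count_labels(length, threshold=3):
    """Count bit sequences of `length` with fewer than / at least `threshold` ones."""
    total = 2 ** length
    # sequences with 0, 1, ..., threshold-1 ones are labeled False
    negatives = sum(n_choose_k(length, k) for k in range(threshold))
    return negatives, total - negatives


num_false, num_true = count_labels(16)
majority_baseline = float(num_true) / (num_false + num_true)
print(num_false)
print(num_true)
print(round(majority_baseline, 4))
```

Only 1 + 16 + 120 = 137 of the 65536 sequences have fewer than three 1s, so a classifier that always answers True is already right on roughly 99.8% of all sequences. The exact share in the test split depends on the shuffle. This baseline is worth keeping in mind when reading accuracy figures for this task.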

### Building a network to classify sequences

We will build the neural network inside a wrapper class.  This helps readability, keeps the code components separate (graph building, session management/training, et cetera), and makes it easy to test many different models on the same data.

The initializer builds a simple feed-forward neural network with one hidden layer.

Instances of the class have properties for training, predicting, and evaluating, as well as for inputting sequences and labels.  These are the corresponding ops in the graph, so they can be passed directly to `Session.run()` and used in `feed_dict`s.
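As a warm-up for the TensorFlow version, here is a plain-Python sketch of the forward pass of a network with one hidden layer: an affine map, a ReLU, and a second affine map producing logits. The helper names and the hand-picked `toy_` parameters are assumptions made for this illustration (the real network starts from random weights and learns them); they are chosen so that the arithmetic is easy to follow on length-4 inputs.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def affine(row, weights, biases):
    """row: [n_in], weights: [n_in, n_out], biases: [n_out] -> [n_out]"""
    return [sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
            for j in range(len(biases))]


def forward(row, weights1, biases1, weights2, biases2):
    """One hidden layer: logits = relu(row.W1 + b1).W2 + b2"""
    hidden = relu(affine(row, weights1, biases1))
    return affine(hidden, weights2, biases2)


# hand-picked toy parameters (hypothetical, not learned)
toy_w1 = [[1.0, -1.0],
          [1.0, -1.0],
          [1.0, -1.0],
          [1.0, -1.0]]        # [input_size=4, hidden_size=2]
toy_b1 = [0.0, 0.0]           # [hidden_size=2]
toy_w2 = [[-1.0, 1.0],
          [0.0, 0.0]]         # [hidden_size=2, output_size=2]
toy_b2 = [5.0, 0.0]           # [output_size=2]

# the first hidden unit counts the 1s, so the logits are [5 - count, count]
toy_logits = forward([1, 1, 1, 0], toy_w1, toy_b1, toy_w2, toy_b2)
print(toy_logits)
```

With these particular weights, the second logit exceeds the first exactly when the input contains at least three 1s. So a network of this shape can in principle represent the target function; training has to find suitable weights on its own.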

In [7]:
class FFNN(object):
    
    def __init__(self, input_size, output_size, hidden_size=10):
        
        # first, basic network architecture
        
        # -- inputs: [batch_size, input_size]
        inputs = tf.placeholder(shape=[None, input_size], dtype=tf.float32)
        self._inputs = inputs
        # -- labels: [batch_size, output_size]
        labels = tf.placeholder(shape=[None, output_size], dtype=tf.float32)
        self._labels = labels
        
        # we will have one hidden layer
        # in general, this should be parameterized
        
        # -- weights1: [input_size, hidden_size]
        weights1 = tf.Variable(tf.random_uniform(shape=[input_size, hidden_size]))
        # -- biases1: [hidden_size]
        biases1 = tf.Variable(tf.random_uniform(shape=[hidden_size]))
        # -- linear: [batch_size, hidden_size]
        linear = tf.add(tf.matmul(inputs, weights1), biases1)
        # -- hidden: [batch_size, hidden_size]
        hidden = tf.nn.relu(linear)
        
        # -- weights2: [hidden_size, output_size]
        weights2 = tf.Variable(tf.random_uniform(shape=[hidden_size, output_size]))
        # -- biases2: [output_size]
        biases2 = tf.Variable(tf.random_uniform(shape=[output_size]))
        # -- logits: [batch_size, output_size]
        logits = tf.add(tf.matmul(hidden, weights2), biases2)
        
        # second, define loss and training
        # -- cross_entropy: [batch_size]
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
                labels=labels,
                logits=logits)
        # -- loss: []
        loss = tf.reduce_mean(cross_entropy)
        optimizer = tf.train.AdamOptimizer()
        self._train_op = optimizer.minimize(loss)
        
        # finally, some evaluation ops
        
        # -- probabilities: [batch_size, output_size]
        probabilities = tf.nn.softmax(logits)
        self._probabilities = probabilities
        # -- predictions: [batch_size]
        predictions = tf.argmax(probabilities, axis=1)
        # -- targets: [batch_size]
        targets = tf.argmax(labels, axis=1)
        # -- correct_prediction: [batch_size]
        correct_prediction = tf.equal(predictions, targets)
        # -- accuracy: []
        accuracy = tf.reduce_mean(tf.to_float(correct_prediction))
        # more evaluation ops could be added here
        self._eval_dict = {
            'accuracy': accuracy
        }
        
    @property
    def train(self):
        return self._train_op
    
    @property
    def predictions(self):
        # note: returns the class probabilities (softmax output), not argmax class indices
        return self._probabilities
    
    @property
    def evaluate(self):
        return self._eval_dict
    
    @property
    def inputs(self):
        return self._inputs
    
    @property
    def labels(self):
        return self._labels
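To make the loss and accuracy ops concrete, here is a plain-Python sketch of what they compute, assuming one-hot labels as produced by `at_least_three`. The function names and `toy_` values are hypothetical. TensorFlow's `softmax_cross_entropy_with_logits` fuses the softmax and the log for numerical stability, so this shows the mathematics rather than how the op is implemented.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # subtracting the max does not change the result but avoids overflow in exp
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def cross_entropy(one_hot_label, logits):
    """Negative log-probability that the model assigns to the labeled class."""
    probs = softmax(logits)
    return -math.log(probs[argmax(one_hot_label)])


def accuracy(one_hot_labels, batch_logits):
    """Fraction of rows where the predicted class matches the labeled class."""
    hits = [argmax(logits) == argmax(label)
            for label, logits in zip(one_hot_labels, batch_logits)]
    return sum(hits) / float(len(hits))


toy_labels = [[0, 1], [1, 0], [0, 1]]
toy_batch_logits = [[2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]   # last row is predicted wrongly

toy_losses = [cross_entropy(y, z) for y, z in zip(toy_labels, toy_batch_logits)]
toy_loss = sum(toy_losses) / len(toy_losses)               # mean over the batch
toy_accuracy = accuracy(toy_labels, toy_batch_logits)
print(toy_loss)
print(toy_accuracy)
```

The wrongly classified third row contributes by far the largest loss, which is what pushes the optimizer to fix it.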

### Training the network

In [8]:
# reset the graph before building a model
tf.reset_default_graph()

with tf.Session() as sess:

    # build our model
    model = FFNN(SEQ_LEN, NUM_CLASSES)
    # initialize the variables
    sess.run(tf.global_variables_initializer())
    
    # MAIN TRAINING LOOP
    NUM_EPOCHS = 4
    BATCH_SIZE = 8
    # integer division: only full batches are used in each epoch
    num_batches = len(trainX) // BATCH_SIZE
    
    for epoch in xrange(NUM_EPOCHS):
        
        # shuffle the training data at start of each epoch
        train_data = zip(trainX, trainY)
        random.shuffle(train_data)
        trainX = [datum[0] for datum in train_data]
        trainY = [datum[1] for datum in train_data]
        
        for batch_idx in xrange(num_batches):
            # get batch of training data
            batchX = trainX[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE]
            batchY = trainY[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE]
            # train on the batch
            sess.run(model.train, 
                     {model.inputs: batchX,
                      model.labels: batchY})
            
        # evaluate at end of each epoch; this can also be done more often
        print '\nAt end of epoch {}'.format(epoch)
        print sess.run(model.evaluate, {model.inputs: testX, model.labels: testY})


At end of epoch 0
{'accuracy': 0.99816895}

At end of epoch 1
{'accuracy': 0.99946594}

At end of epoch 2
{'accuracy': 1.0}

At end of epoch 3
{'accuracy': 1.0}
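One detail of the loop above: it runs only as many full batches as fit, so when the training set size is not a multiple of `BATCH_SIZE`, the last few examples of each shuffled epoch are skipped (with 52429 training examples and a batch size of 8, that is 5 per epoch). Because the data is reshuffled every epoch, this is usually harmless. If you want to use every example anyway, one option is a generator like the hypothetical `iter_batches` below, sketched in plain Python on toy data.

```python
def iter_batches(xs, ys, batch_size):
    """Yield (batch_x, batch_y) pairs, including a final smaller batch if needed."""
    for start in range(0, len(xs), batch_size):
        yield xs[start:start + batch_size], ys[start:start + batch_size]


# toy data: 10 examples with batch size 4 gives batches of size 4, 4 and 2
toy_xs = list(range(10))
toy_ys = [x % 2 for x in toy_xs]

batch_sizes = [len(batch_x) for batch_x, _ in iter_batches(toy_xs, toy_ys, 4)]
skipped_by_full_batches = len(toy_xs) % 4   # examples a full-batches-only loop would skip
print(batch_sizes)
print(skipped_by_full_batches)
```

Since the model's placeholders leave the batch dimension as `None`, a smaller final batch can be fed in the same way as the full ones.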


# 3. Re-writing the above using TensorFlow Estimator