Experiments with TensorFlow

gully

June 30, 2016

## Do hello world as a sanity check:

In [1]:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

b'Hello, TensorFlow!'


## Fizzbuzz example

In [2]:
# %load fizz_buzz.py
# Fizz Buzz in Tensorflow!
# see http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

["Standard"](https://en.wikipedia.org/wiki/Scare_quotes) Imports:

In [3]:
import numpy as np
import tensorflow as tf

### How `binary_encode()` works

In [4]:
NUM_DIGITS = 10

# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

What is the `>>` operator?  [A Google search](http://www.tutorialspoint.com/python/python_basic_operators.htm) yields:
> The left operand's value is moved right by the number of bits specified by the right operand.

In [5]:
#d = 0
i = 15
[(d, (i >> d) , ((i >> d) & 1), (i >> d & 1)) for d in range(10)]

[(0, 15, 1, 1),
 (1, 7, 1, 1),
 (2, 3, 1, 1),
 (3, 1, 1, 1),
 (4, 0, 0, 0),
 (5, 0, 0, 0),
 (6, 0, 0, 0),
 (7, 0, 0, 0),
 (8, 0, 0, 0),
 (9, 0, 0, 0)]
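
As a quick check of that description, shifting right by `d` bits is the same as floor-dividing by `2 ** d`. Also, `>>` binds more tightly than `&` in Python, so `i >> d & 1` parses as `(i >> d) & 1`, which is why the last two columns above agree. A minimal sketch in plain Python:

```python
i = 15

for d in range(6):
    shifted = i >> d
    # Shifting right by d bits drops the d lowest bits: floor division by 2**d.
    assert shifted == i // 2 ** d
    # `>>` has higher precedence than `&`, so the parentheses are optional.
    assert (i >> d & 1) == ((i >> d) & 1)

# Bit d of i, for d = 0..5 (lowest bit first).
bits = [i >> d & 1 for d in range(6)]
print(bits)
```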

So this is just a convenient way to write the base-10 number $i$ in base 2 (binary), with the least significant bit first.  It has the advantage of a fixed number of digits, in this case `NUM_DIGITS = 10`.  

In [6]:
binary_encode(i, NUM_DIGITS)

array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
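
One way to convince yourself that nothing is lost in this encoding is to decode it again. The sketch below repeats `binary_encode` so that it runs on its own; `binary_decode` is a hypothetical helper introduced here for illustration, not part of the original script.

```python
import numpy as np

def binary_encode(i, num_digits):
    # Same definition as above: least significant bit first.
    return np.array([i >> d & 1 for d in range(num_digits)])

def binary_decode(bits):
    # Hypothetical helper: weight bit d by 2**d and add everything up.
    return int(sum(int(b) << d for d, b in enumerate(bits)))

# Every integer that fits in 10 bits should survive the round trip.
round_trip_ok = all(binary_decode(binary_encode(n, 10)) == n
                    for n in range(2 ** 10))
print(round_trip_ok)
```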

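Here's an equivalent representation using `np.base_repr`, which left-pads the binary string with zeros; the slice then reverses it and keeps the 10 lowest digits: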

In [7]:
np.array([int(el) for el in np.base_repr(i, 2, 10)])[:-10-1:-1]

array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

They're both a bit obfuscated.  Which one is faster?

In [8]:
%%timeit 
binary_encode(i, NUM_DIGITS)

The slowest run took 4.53 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.94 µs per loop


In [9]:
%%timeit
np.array([int(el) for el in np.base_repr(i, 2, 10)])[:-10-1:-1]

The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.48 µs per loop


Mine is about twice as slow (8.48 µs versus 3.94 µs per loop)!

### `fizz_buzz_encode()` makes sense
One-hot encode the desired outputs: [number, "fizz", "buzz", "fizzbuzz"]

In [10]:
def fizz_buzz_encode(i):
    if   i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5  == 0: return np.array([0, 0, 1, 0])
    elif i % 3  == 0: return np.array([0, 1, 0, 0])
    else:             return np.array([1, 0, 0, 0])

Try it:

In [11]:
i = 7
np.array([i, 'fizz', 'buzz', 'fizzbuzz'])[np.array([bool(el) for el in fizz_buzz_encode(i)])]

array(['7'], 
      dtype='<U21')
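
The boolean-mask trick above works, but a one-hot vector can also be decoded by taking the position of its single 1. A small sketch, assuming the same label order; `fizz_buzz_label` is a hypothetical name used only for this illustration.

```python
import numpy as np

def fizz_buzz_encode(i):
    # Same one-hot encoding as above: [number, fizz, buzz, fizzbuzz].
    if   i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5  == 0: return np.array([0, 0, 1, 0])
    elif i % 3  == 0: return np.array([0, 1, 0, 0])
    else:             return np.array([1, 0, 0, 0])

def fizz_buzz_label(i):
    # Hypothetical helper: argmax finds the index of the single 1.
    labels = [str(i), "fizz", "buzz", "fizzbuzz"]
    return labels[int(np.argmax(fizz_buzz_encode(i)))]

decoded = [fizz_buzz_label(i) for i in (7, 9, 10, 15)]
print(decoded)
```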

### Train the model

Our goal is to produce fizzbuzz for the numbers 1 to 100, so it would be unfair to include these in our training data.  
Accordingly, the training data corresponds to the numbers 101 to `(2 ** NUM_DIGITS - 1)`, that is, 101 to 1023.

In [12]:
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = np.array([fizz_buzz_encode(i)          for i in range(101, 2 ** NUM_DIGITS)])

In [13]:
trX.shape, trY.shape

((923, 10), (923, 4))
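
It can help to know how the four classes are distributed over this training range, since always predicting the most common class sets a baseline for the accuracy. The sketch below uses only NumPy; `fizz_buzz_class` is a hypothetical helper that returns the class index directly instead of a one-hot vector.

```python
import numpy as np

def fizz_buzz_class(i):
    # Class index in the order [number, fizz, buzz, fizzbuzz].
    if i % 15 == 0:
        return 3
    if i % 5 == 0:
        return 2
    if i % 3 == 0:
        return 1
    return 0

NUM_DIGITS = 10
classes = np.array([fizz_buzz_class(i) for i in range(101, 2 ** NUM_DIGITS)])

# Number of training examples per class, and the share of the largest class.
counts = np.bincount(classes, minlength=4)
baseline = counts.max() / counts.sum()
print(counts, round(baseline, 3))
```

The plain-number class makes up a little over half of the 923 training examples, so a training accuracy near 0.53, like the 0.534 reported at epoch 500 in the run below, is consistent with a model that has learned little beyond answering "number".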

In [14]:
# We'll want to randomly initialize weights.
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

Our model is a standard multilayer perceptron with one hidden layer and ReLU activation.  
The softmax (which turns arbitrary real-valued outputs into probabilities) gets applied in the cost function. 

In [15]:
def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))
    return tf.matmul(h, w_o)
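
To see the shapes involved without building a TensorFlow graph, here is a NumPy stand-in for the same forward pass. It is only a sketch: `model_np` is a hypothetical name, the weights are random, and the hidden width of 100 is the value the notebook sets further down. Note that, like the model above, it has no bias terms.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_DIGITS, NUM_HIDDEN = 10, 100

# Small random weights, mirroring init_weights (normal, stddev 0.01).
w_h = rng.normal(scale=0.01, size=(NUM_DIGITS, NUM_HIDDEN))
w_o = rng.normal(scale=0.01, size=(NUM_HIDDEN, 4))

def model_np(X, w_h, w_o):
    h = np.maximum(X @ w_h, 0.0)  # ReLU on the hidden layer
    return h @ w_o                # raw scores (logits), one column per class

# Five made-up inputs of 10 binary digits each.
X = rng.integers(0, 2, size=(5, NUM_DIGITS)).astype(float)
logits = model_np(X, w_h, w_o)
print(logits.shape)
```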

In [16]:
# Our placeholders (fed at run time). The input has width NUM_DIGITS, and the output has width 4.
X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])

In [17]:
X.get_shape(), Y.get_shape()

(TensorShape([Dimension(None), Dimension(10)]),
 TensorShape([Dimension(None), Dimension(4)]))

In [18]:
X.name

'Placeholder:0'

Set the number of units in the hidden layer:

In [19]:
NUM_HIDDEN = 100

# Initialize the weights.
w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])

In [20]:
w_h.get_shape(), w_o.get_shape()

(TensorShape([Dimension(10), Dimension(100)]),
 TensorShape([Dimension(100), Dimension(4)]))

In [21]:
# Predict y given x using the model.
py_x = model(X, w_h, w_o)

In [22]:
py_x.get_shape()

TensorShape([Dimension(None), Dimension(4)])

We'll train our model by minimizing a cost function.

In [23]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
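
For intuition about what this cost measures, the sketch below computes a mean softmax cross-entropy in NumPy. It is an illustration of the formula under my own naming (`log_softmax`, `mean_cross_entropy`), not TensorFlow's implementation.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max first for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def mean_cross_entropy(logits, one_hot):
    # Negative log-probability of the true class, averaged over the batch.
    per_example = -np.sum(one_hot * log_softmax(logits), axis=1)
    return float(np.mean(per_example))

# With all-zero logits every class gets probability 1/4,
# so the loss is log(4) regardless of the true labels.
logits = np.zeros((2, 4))
one_hot = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1]], dtype=float)
loss = mean_cross_entropy(logits, one_hot)
print(loss)
```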

And we'll make predictions by choosing the largest output.

In [24]:
predict_op = tf.argmax(py_x, 1)

Check against the correct answer, since we know it:

In [25]:
correct_answer = ["{}".format(np.array([i, 'fizz', 'buzz', 'fizzbuzz'])[np.array([bool(el) for el in fizz_buzz_encode(i)])][0])
 for i in np.arange(1,101)]

## Run everything

In [26]:
def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]
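
Putting the last two pieces together: the predicted class is the index of the largest output, and `fizz_buzz` turns that index into text. The logits below are made up for illustration and are not real model output.

```python
import numpy as np

def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

# Made-up scores for the numbers 1, 3 and 5 (one row per number).
numbers = np.array([1, 3, 5])
logits = np.array([[2.0, 0.1, -1.0, 0.0],
                   [0.2, 1.5, 0.0, -0.3],
                   [0.0, 0.1, 3.0, 0.5]])

# The NumPy equivalent of tf.argmax(py_x, 1): largest entry in each row.
predictions = np.argmax(logits, axis=1)
words = [fizz_buzz(n, p) for n, p in zip(numbers, predictions)]
print(words)
```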

In [27]:
BATCH_SIZE = 128

# Launch the graph in a session
with tf.Session() as sess:
    tf.initialize_all_variables().run()

    for epoch in range(10000):
        # Shuffle the data before each training iteration.
        p = np.random.permutation(range(len(trX)))
        trX, trY = trX[p], trY[p]

        # Train in batches of 128 inputs.
        for start in range(0, len(trX), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

        # Compute the current accuracy on the training data (printed below).
        val = np.mean(np.argmax(trY, axis=1) ==
                             sess.run(predict_op, feed_dict={X: trX, Y: trY}))

        # And now for some fizz buzz
        numbers = np.arange(1, 101)
        teX = np.transpose(binary_encode(numbers, NUM_DIGITS))
        teY = sess.run(predict_op, feed_dict={X: teX})
        output = np.vectorize(fizz_buzz)(numbers, teY)

        if (epoch % 500) == 0:
            print("{:04d}".format(epoch), 
                  "{:0.3f}".format(val), 
                  "{: >2d}".format((output != correct_answer).sum()))

0000 0.467 56
0500 0.534 47
1000 0.561 41
1500 0.756 25
2000 0.914 12
2500 0.962 12
3000 0.973  9
3500 0.988 10
4000 0.989 10
4500 0.997 10
5000 0.989  9
5500 0.999 10
6000 1.000 12
6500 1.000 12
7000 1.000 12
7500 1.000 13
8000 0.999 13
8500 1.000 13
9000 1.000 13
9500 1.000 13
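
The inner loop above slices the shuffled training set into consecutive chunks of `BATCH_SIZE`. With 923 examples the last chunk is shorter than the rest, which NumPy slicing handles without any special casing. A sketch of just the batching, with a seeded generator (my assumption, for repeatability) standing in for `np.random.permutation`:

```python
import numpy as np

n_samples, batch_size = 923, 128
rng = np.random.default_rng(0)

# A random ordering of the example indices, as in the training loop.
p = rng.permutation(n_samples)

# Slicing past the end of an array simply returns what is left.
batch_sizes = [len(p[start:start + batch_size])
               for start in range(0, n_samples, batch_size)]
print(batch_sizes)
```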


## The "Correct" answer:

In [28]:
correct_answer = ["{}".format(np.array([i, 'fizz', 'buzz', 'fizzbuzz'])[np.array([bool(el) for el in fizz_buzz_encode(i)])][0])
 for i in np.arange(1,101)]
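
The same list can be built without going through the one-hot encoding. This is a plainer sketch of the reference answer; `fizz_buzz_reference` and `correct_answer_simple` are hypothetical names chosen so they do not clash with the cell above.

```python
def fizz_buzz_reference(i):
    # Check 15 first, otherwise multiples of 15 would match 3 or 5.
    if i % 15 == 0:
        return "fizzbuzz"
    if i % 5 == 0:
        return "buzz"
    if i % 3 == 0:
        return "fizz"
    return str(i)

correct_answer_simple = [fizz_buzz_reference(i) for i in range(1, 101)]
print(correct_answer_simple[:15])
```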

In [29]:
np.vstack([output, correct_answer]).T

array([['1', '1'],
       ['fizz', '2'],
       ['fizz', 'fizz'],
       ['buzz', '4'],
       ['buzz', 'buzz'],
       ['fizz', 'fizz'],
       ['7', '7'],
       ['8', '8'],
       ['fizz', 'fizz'],
       ['buzz', 'buzz'],
       ['11', '11'],
       ['fizz', 'fizz'],
       ['13', '13'],
       ['14', '14'],
       ['fizzbuzz', 'fizzbuzz'],
       ['16', '16'],
       ['17', '17'],
       ['fizz', 'fizz'],
       ['19', '19'],
       ['20', 'buzz'],
       ['21', 'fizz'],
       ['22', '22'],
       ['23', '23'],
       ['fizz', 'fizz'],
       ['buzz', 'buzz'],
       ['26', '26'],
       ['fizz', 'fizz'],
       ['28', '28'],
       ['29', '29'],
       ['fizzbuzz', 'fizzbuzz'],
       ['31', '31'],
       ['fizz', '32'],
       ['fizz', 'fizz'],
       ['fizz', '34'],
       ['buzz', 'buzz'],
       ['fizz', 'fizz'],
       ['37', '37'],
       ['fizz', '38'],
       ['fizz', 'fizz'],
       ['buzz', 'buzz'],
       ['41', '41'],
       ['fizz', 'fizz'],
       ['43', '43'],
   

In [30]:
fails = output != correct_answer

In [31]:
fails.sum()

13

In [32]:
np.vstack([output[fails], np.array(correct_answer)[fails]]).T

array([['fizz', '2'],
       ['buzz', '4'],
       ['20', 'buzz'],
       ['21', 'fizz'],
       ['fizz', '32'],
       ['fizz', '34'],
       ['fizz', '38'],
       ['buzz', 'fizz'],
       ['81', 'fizz'],
       ['84', 'fizz'],
       ['87', 'fizz'],
       ['93', 'fizz'],
       ['fizz', '98']], 
      dtype='<U8')

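The success appears to be fickle: the training accuracy reaches 1.000, yet the number of wrong answers on 1 to 100 never drops below 9 and ends at 13, which suggests the network is fitting the training data without generalizing to the held-out numbers.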

The end.