Experiments with TensorFlow

gully

June 30, 2016

## Prime number example

### Adapted from:  
fizz_buzz.py  
Fizz Buzz in Tensorflow!  
see http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
NUM_DIGITS = 10

# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])
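
As a quick illustration of the encoding, here is a minimal plain-Python sketch. The helpers `binary_digits` and `decode` are hypothetical names introduced only for this example; they are not part of the notebook. It shows that the digits are stored least-significant bit first, and that 10 digits can represent every integer from 0 to 1023.

```python
def binary_digits(i, num_digits):
    """Plain-list version of binary_encode: bit d of i, for d = 0 .. num_digits - 1."""
    return [(i >> d) & 1 for d in range(num_digits)]


def decode(bits):
    """Rebuild the integer from its least-significant-bit-first digits."""
    return sum(bit << d for d, bit in enumerate(bits))


encoded = binary_digits(5, 10)
print(encoded)          # [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(decode(encoded))  # 5

# Every integer below 2 ** 10 survives the round trip.
round_trip_ok = all(decode(binary_digits(i, 10)) == i for i in range(2 ** 10))
print(round_trip_ok)    # True
```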

### Write a prime number checker and encoder

In [3]:
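def prime_encode(n):
    """One-hot label for n: [1, 0] if n is prime, [0, 1] otherwise."""
    if (n == 2) or (n == 3): return [1, 0]
    if n < 2 or n % 2 == 0: return [0, 1]
    if n < 9: return [1, 0]  # 5 and 7
    if n % 3 == 0: return [0, 1]
    # Remaining candidate divisors all have the form 6k - 1 or 6k + 1.
    r = int(n**0.5)
    f = 5
    while f <= r:
        if n % f == 0: return [0, 1]
        if n % (f + 2) == 0: return [0, 1]
        f += 6
    return [1, 0]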
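
Because the labels drive everything that follows, it may be worth sanity-checking the encoder. The sketch below copies `prime_encode` from the cell above and compares it with a deliberately naive trial-division test. `is_prime_naive` is a hypothetical helper added only for this check.

```python
def prime_encode(n):
    # Copied from the notebook cell above.
    if (n == 2) or (n == 3): return [1, 0]
    if n < 2 or n % 2 == 0: return [0, 1]
    if n < 9: return [1, 0]
    if n % 3 == 0: return [0, 1]
    r = int(n**0.5)
    f = 5
    while f <= r:
        if n % f == 0: return [0, 1]
        if n % (f + 2) == 0: return [0, 1]
        f += 6
    return [1, 0]


def is_prime_naive(n):
    """Slow but obviously correct: try every divisor from 2 to n - 1."""
    return n >= 2 and all(n % d for d in range(2, n))


# Numbers where the two tests disagree over the whole 10-bit range.
mismatches = [n for n in range(0, 1024)
              if (prime_encode(n) == [1, 0]) != is_prime_naive(n)]
print(mismatches)  # []

# Number of primes among the test numbers 1 to 100.
num_primes = sum(prime_encode(n) == [1, 0] for n in range(1, 101))
print(num_primes)  # 25
```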

### Train the model

Our goal is to label each number from 1 to 100 as prime or not prime, so it would be unfair to include those numbers in the training data.  
Accordingly, the training data corresponds to the numbers 101 to `(2 ** NUM_DIGITS - 1)`.

In [4]:
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = np.array([prime_encode(i)          for i in range(101, 2 ** NUM_DIGITS)])

In [5]:
trX.shape, trY.shape

((923, 10), (923, 2))
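
The shapes say nothing about class balance, which matters when reading any accuracy figure. A minimal sketch, using a hypothetical stand-alone `is_prime` helper rather than the notebook's `prime_encode`, counts the primes in the training range and the accuracy a classifier would get by always answering "not prime".

```python
def is_prime(n):
    """Trial division up to the square root; enough for numbers below 1024."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


NUM_DIGITS = 10
train_numbers = range(101, 2 ** NUM_DIGITS)

num_primes = sum(is_prime(n) for n in train_numbers)
baseline = 1 - num_primes / len(train_numbers)

print(len(train_numbers), num_primes)  # 923 147
print("{:0.3f}".format(baseline))      # 0.841
```

Any training accuracy near 0.841 is therefore no better than ignoring the input entirely.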

In [6]:
# We'll want to randomly initialize weights.
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

Our model is a standard multilayer perceptron with one hidden layer and ReLU activation.  
The softmax (which turns arbitrary real-valued outputs into probabilities) gets applied in the cost function.

In [7]:
def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))
    return tf.matmul(h, w_o)
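
To make the forward pass concrete, here is a plain-Python sketch of the same computation on a tiny made-up example. The weights and the helpers `matmul`, `relu`, and `forward` are illustrative assumptions, not the notebook's TensorFlow ops. Like the model above, it has no bias terms.

```python
def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in m]


def forward(X, w_h, w_o):
    """Hidden layer with ReLU, then a linear output layer (raw scores, no softmax)."""
    return matmul(relu(matmul(X, w_h)), w_o)


X = [[1.0, 0.0]]                  # one input row with two features
w_h = [[1.0, -1.0], [0.0, 2.0]]   # made-up hidden weights
w_o = [[2.0, 0.0], [0.0, 3.0]]    # made-up output weights

# Hidden pre-activation is [1.0, -1.0]; ReLU zeroes the negative unit.
print(forward(X, w_h, w_o))  # [[2.0, 0.0]]
```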

In [8]:
# Our variables. The input has width NUM_DIGITS, and the output has width 2.
X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 2])

In [9]:
X.get_shape(), Y.get_shape()

(TensorShape([Dimension(None), Dimension(10)]),
 TensorShape([Dimension(None), Dimension(2)]))

Set the number of units in the hidden layer.

In [11]:
NUM_HIDDEN = 100

# Initialize the weights.
w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 2])

In [12]:
w_h.get_shape(), w_o.get_shape()

(TensorShape([Dimension(10), Dimension(100)]),
 TensorShape([Dimension(100), Dimension(2)]))

In [13]:
# Predict y given x using the model.
py_x = model(X, w_h, w_o)

In [14]:
py_x.get_shape()

TensorShape([Dimension(None), Dimension(2)])

We'll train our model by minimizing a cost function.

In [15]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
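
For intuition about what the cost measures, here is a plain-Python sketch of softmax followed by cross-entropy for a single example. It is an illustration of the math only, not TensorFlow's implementation, and `softmax` and `cross_entropy` are names chosen for this example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


print(softmax([0.0, 0.0]))  # [0.5, 0.5]

# True class is index 0 ("prime") in all three cases.
loss_undecided = cross_entropy([0.0, 0.0], [1, 0])   # equals log(2)
loss_right = cross_entropy([5.0, -5.0], [1, 0])      # confident and correct
loss_wrong = cross_entropy([-5.0, 5.0], [1, 0])      # confident and wrong
print(loss_right < loss_undecided < loss_wrong)      # True
```

The cost in the cell above is the mean of this per-example loss over the batch.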

And we'll make predictions by choosing the largest output.

In [16]:
predict_op = tf.argmax(py_x, 1)

## Check the performance

We know the correct answer for every number from 1 to 100, so we can build the expected output directly:

In [17]:
correct_answer = ["{}".format(np.array(['prime', i])[np.array([bool(el) for el in prime_encode(i)])][0])
 for i in np.arange(1,101)]

## Run everything

In [18]:
def prime(i, prediction):
    return ["prime", str(i)][prediction]
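
A small sketch of how a pair of output scores becomes a label: the index of the larger score selects either "prime" (index 0) or the number itself (index 1). The scores below are made up for illustration, and `argmax` is a plain-Python stand-in for `tf.argmax` on a single row.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda k: values[k])


def prime(i, prediction):
    # Same mapping as the notebook: 0 -> "prime", 1 -> the number as a string.
    return ["prime", str(i)][prediction]


# Hypothetical network outputs for the inputs 7 and 8.
scores = {7: [1.3, -0.2], 8: [-0.5, 0.9]}

labels = [prime(i, argmax(row)) for i, row in sorted(scores.items())]
print(labels)  # ['prime', '8']
```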

In [19]:
BATCH_SIZE = 128

# Launch the graph in a session
with tf.Session() as sess:
    tf.initialize_all_variables().run()

    for epoch in range(10000):
        # Shuffle the data before each training iteration.
        p = np.random.permutation(range(len(trX)))
        trX, trY = trX[p], trY[p]

        # Train in batches of 128 inputs.
        for start in range(0, len(trX), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

        # Compute the current accuracy on the training data.
        val = np.mean(np.argmax(trY, axis=1) ==
                             sess.run(predict_op, feed_dict={X: trX, Y: trY}))

        # And now classify the numbers 1 to 100.
        numbers = np.arange(1, 101)
        teX = np.transpose(binary_encode(numbers, NUM_DIGITS))
        teY = sess.run(predict_op, feed_dict={X: teX})
        output = np.vectorize(prime)(numbers, teY)

        if (epoch % 500) == 0:
            print("{:04d}".format(epoch), 
                  "{:0.3f}".format(val), 
                  "{: >2d}".format((output != correct_answer).sum()))

0000 0.841 27
0500 0.841 25
1000 0.841 25
1500 0.860 25
2000 0.872 25
2500 0.899 25
3000 0.914 25
3500 0.940 22
4000 0.940 23
4500 0.962 24
5000 0.983 23
5500 0.984 23
6000 0.961 22
6500 1.000 22
7000 1.000 23
7500 1.000 24
8000 1.000 22
8500 1.000 19
9000 1.000 21
9500 1.000 21


## The "Correct" answer:

In [22]:
np.vstack([output, correct_answer]).T

array([['prime', '1'],
       ['2', 'prime'],
       ['prime', 'prime'],
       ['4', '4'],
       ['prime', 'prime'],
       ['6', '6'],
       ['prime', 'prime'],
       ['8', '8'],
       ['prime', '9'],
       ['10', '10'],
       ['prime', 'prime'],
       ['12', '12'],
       ['prime', 'prime'],
       ['14', '14'],
       ['15', '15'],
       ['16', '16'],
       ['prime', 'prime'],
       ['18', '18'],
       ['19', 'prime'],
       ['20', '20'],
       ['prime', '21'],
       ['22', '22'],
       ['23', 'prime'],
       ['24', '24'],
       ['prime', '25'],
       ['26', '26'],
       ['27', '27'],
       ['28', '28'],
       ['prime', 'prime'],
       ['30', '30'],
       ['prime', 'prime'],
       ['32', '32'],
       ['33', '33'],
       ['34', '34'],
       ['prime', '35'],
       ['36', '36'],
       ['prime', 'prime'],
       ['38', '38'],
       ['39', '39'],
       ['40', '40'],
       ['prime', 'prime'],
       ['42', '42'],
       ['prime', 'prime'],
       ['44', '4

In [23]:
fails = output != correct_answer

In [24]:
fails.sum()

21

In [25]:
np.vstack([output[fails], np.array(correct_answer)[fails]]).T

array([['prime', '1'],
       ['2', 'prime'],
       ['prime', '9'],
       ['19', 'prime'],
       ['prime', '21'],
       ['23', 'prime'],
       ['prime', '25'],
       ['prime', '35'],
       ['47', 'prime'],
       ['prime', '49'],
       ['prime', '51'],
       ['prime', '55'],
       ['prime', '57'],
       ['prime', '65'],
       ['prime', '69'],
       ['79', 'prime'],
       ['prime', '81'],
       ['83', 'prime'],
       ['89', 'prime'],
       ['prime', '91'],
       ['prime', '99']], 
      dtype='<U5')
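
To see what kind of mistakes these are, the failure table above can be tallied by hand or with a short sketch like the following. The pairs are transcribed from the output above as (predicted, correct); the variable names are chosen for this example.

```python
# (predicted, correct) pairs copied from the failure table above.
fails = [
    ("prime", "1"), ("2", "prime"), ("prime", "9"), ("19", "prime"),
    ("prime", "21"), ("23", "prime"), ("prime", "25"), ("prime", "35"),
    ("47", "prime"), ("prime", "49"), ("prime", "51"), ("prime", "55"),
    ("prime", "57"), ("prime", "65"), ("prime", "69"), ("79", "prime"),
    ("prime", "81"), ("83", "prime"), ("89", "prime"), ("prime", "91"),
    ("prime", "99"),
]

# Non-primes the network called prime, and primes it missed.
false_positives = [int(truth) for pred, truth in fails if pred == "prime"]
missed_primes = [int(pred) for pred, truth in fails if truth == "prime"]

print(len(false_positives), len(missed_primes))  # 14 7
print(all(n % 2 == 1 for n in false_positives))  # True
```

Every false positive in this run is an odd number, which suggests the network has at least picked up that the lowest bit matters.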

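The prime number example performs much worse than the FizzBuzz example.  
Training accuracy reaches 1.000, yet the network still mislabels 21 of the numbers from 1 to 100. That is barely better than the 25 errors it made early in training, which is also what labeling every number as not prime would score. The network memorizes the training set without learning a rule that generalizes, presumably because primality has no simple pattern in the binary digits for it to pick up.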

# The end.