Although I failed the interview last time, I still believe that neural networks are capable of learning to solve "Fizz Buzz".
So I came back to rework the structure of my neural network.

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
def binary_encode(i, num_digits):
    # Bit d of i goes to position d, so the least significant bit comes first.
    return np.array([i >> d & 1 for d in range(num_digits)])
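
The following standalone check makes the input representation concrete. `binary_encode` is copied from the cell above and places the least significant bit first. The `binary_decode` helper is hypothetical: it was added only for this illustration and is not part of the notebook.

```python
import numpy as np

def binary_encode(i, num_digits):
    # Same encoding as the notebook: bit d of i goes to position d.
    return np.array([i >> d & 1 for d in range(num_digits)])

def binary_decode(bits):
    # Hypothetical inverse of binary_encode, for illustration only.
    return sum(int(b) << d for d, b in enumerate(bits))

print(binary_encode(6, 4))                     # [0 1 1 0]
print(binary_decode(binary_encode(1000, 10)))  # 1000
```

Ten bits cover every integer from 0 to 1023. This is why the training range below stops at `2 ** NUM_DIGITS`.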

In [3]:
def fizz_buzz_encode(i):
    # One-hot label in the order [number, fizz, buzz, fizzbuzz].
    if   i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5  == 0: return np.array([0, 0, 1, 0])
    elif i % 3  == 0: return np.array([0, 1, 0, 0])
    else            : return np.array([1, 0, 0, 0])

In [4]:
NUM_DIGITS = 10
# Train on 101 .. 2**NUM_DIGITS - 1; the numbers 1 .. 100 are held out for testing.
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = np.array([fizz_buzz_encode(i)          for i in range(101, 2 ** NUM_DIGITS)])

This time, let's increase the number of hidden units tenfold, to 1000.
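
Here is a rough sketch of what this change means for model size. It assumes the previous run used 100 hidden units, one tenth of 1000. The `model` defined below uses only the two weight matrices `w_h` and `w_o` and has no bias terms, so the weight count is NUM_DIGITS × NUM_HIDDEN + NUM_HIDDEN × 4. `count_weights` is a hypothetical helper.

```python
def count_weights(num_digits, num_hidden, num_classes=4):
    # w_h is num_digits x num_hidden and w_o is num_hidden x num_classes; no biases.
    return num_digits * num_hidden + num_hidden * num_classes

for hidden in (100, 1000):
    print(hidden, count_weights(10, hidden))
# 100 1400
# 1000 14000
```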

In [5]:
NUM_HIDDEN = 1000

In [6]:
X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])

In [7]:
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])

In [8]:
def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))
    return tf.matmul(h, w_o)
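
For readers without TensorFlow 1.x at hand, here is a NumPy sketch of the same forward pass. It has a single ReLU hidden layer followed by a linear output layer that produces unnormalised logits for the four classes. `model_np`, the seeded generator and the random inputs are illustrative assumptions, not part of the notebook.

```python
import numpy as np

NUM_DIGITS, NUM_HIDDEN = 10, 1000
rng = np.random.default_rng(0)

# Same initialisation scale as init_weights: normal with stddev 0.01.
w_h = rng.normal(0.0, 0.01, size=(NUM_DIGITS, NUM_HIDDEN))
w_o = rng.normal(0.0, 0.01, size=(NUM_HIDDEN, 4))

def model_np(X, w_h, w_o):
    # Hidden layer: ReLU(X @ w_h). Output layer: raw logits, no softmax applied.
    h = np.maximum(X @ w_h, 0.0)
    return h @ w_o

X = rng.integers(0, 2, size=(5, NUM_DIGITS)).astype(float)  # five random bit vectors
logits = model_np(X, w_h, w_o)
print(logits.shape)  # (5, 4)
```

The model has no bias terms, so an all-zero input always maps to all-zero logits.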

To shorten training time, I raised the learning rate tenfold, to 0.5, so that each gradient-descent update during backpropagation makes larger corrections to the weights.
Note, though, that a learning rate that is too large can keep the network from converging at all, so it cannot be raised without limit.
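
This trade-off can be illustrated with a toy problem rather than the network itself. The sketch runs plain gradient descent on f(w) = w², where each update multiplies w by (1 - 2·lr). For this particular function, the iterates shrink only when 0 < lr < 1. With lr = 0.5 they reach the minimum in one step, and with lr = 1.1 they blow up. The network's loss surface is far more complicated, so its actual threshold is different and not known from this example.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    # Toy loss f(w) = w**2 with gradient 2*w; each update is w <- w - lr * 2 * w.
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w

for lr in (0.1, 0.5, 0.9, 1.1):
    # |w| shrinks towards 0 for lr < 1 and grows without bound for lr > 1.
    print(lr, gradient_descent(lr))
```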

In [9]:
py_x = model(X, w_h, w_o)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(cost)
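
The cost above is the softmax cross-entropy averaged over the batch. The following NumPy sketch illustrates that quantity; it is not TensorFlow's implementation. With all-zero logits, every class gets probability 1/4, so the loss is log 4.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Subtract the row max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Per-example loss is -sum(labels * log_softmax(logits)); then average over the batch.
    return np.mean(-(labels * log_probs).sum(axis=1))

labels = np.array([[0, 0, 0, 1], [1, 0, 0, 0]], dtype=float)
print(softmax_cross_entropy(np.zeros((2, 4)), labels))  # log(4), about 1.386
```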

In [10]:
predict_op = tf.argmax(py_x, 1)

In [11]:
def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

With the larger learning rate, we can cut training from the original 10,000 epochs to 1,000, one tenth as many.
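
Epoch counts alone understate the change, because the weights are updated once per mini-batch. Here is a quick count, assuming the earlier run used the same `BATCH_SIZE` of 128 that is set below. The 923 training numbers (101 to 1023) split into 8 mini-batches per epoch.

```python
import math

NUM_DIGITS = 10
BATCH_SIZE = 128  # assumed to match the earlier 10,000-epoch run

num_train = len(range(101, 2 ** NUM_DIGITS))           # numbers 101..1023
batches_per_epoch = math.ceil(num_train / BATCH_SIZE)  # last batch is partial (27 examples)
print(num_train, batches_per_epoch)  # 923 8

for epochs in (10000, 1000):
    # One gradient-descent update per mini-batch.
    print(epochs, epochs * batches_per_epoch)
# 10000 80000
# 1000 8000
```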

In [12]:
NUM_EPOCHS = 1000

In [13]:
BATCH_SIZE = 128

As before, we print the epoch number and the accuracy on the training data after each epoch.
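
The accuracy printed each epoch is the fraction of examples whose predicted class index matches the position of the 1 in the one-hot label. Here is a tiny example with made-up labels and predictions:

```python
import numpy as np

# Made-up one-hot labels for three numbers (e.g. 3, 4 and 15).
labels = np.array([[0, 1, 0, 0],   # fizz
                   [1, 0, 0, 0],   # plain number
                   [0, 0, 0, 1]])  # fizzbuzz
predicted = np.array([1, 0, 2])    # class indices, like the output of predict_op

accuracy = np.mean(np.argmax(labels, axis=1) == predicted)
print(accuracy)  # 2 of 3 correct, about 0.667
```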

In [14]:
sess = tf.Session()

with sess.as_default():
    tf.global_variables_initializer().run()

    for epoch in range(NUM_EPOCHS):
        # Shuffle the training data at the start of every epoch.
        p = np.random.permutation(len(trX))
        trX, trY = trX[p], trY[p]

        # One gradient-descent step per mini-batch of BATCH_SIZE examples.
        for start in range(0, len(trX), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

        # Accuracy on the full training set after this epoch (predict_op only needs X).
        accuracy = np.mean(np.argmax(trY, axis=1) ==
                           sess.run(predict_op, feed_dict={X: trX}))
        print(epoch, accuracy)

0 0.534127843987
1 0.534127843987
2 0.534127843987
3 0.534127843987
4 0.534127843987
5 0.534127843987
6 0.534127843987
7 0.534127843987
8 0.534127843987
9 0.534127843987
10 0.534127843987
11 0.534127843987
12 0.534127843987
13 0.534127843987
14 0.534127843987
15 0.534127843987
16 0.534127843987
17 0.534127843987
18 0.534127843987
19 0.534127843987
20 0.534127843987
21 0.534127843987
22 0.534127843987
23 0.534127843987
24 0.534127843987
25 0.534127843987
26 0.534127843987
27 0.534127843987
28 0.534127843987
29 0.534127843987
30 0.534127843987
31 0.534127843987
32 0.534127843987
33 0.534127843987
34 0.534127843987
35 0.534127843987
36 0.534127843987
37 0.534127843987
38 0.534127843987
39 0.534127843987
40 0.534127843987
41 0.534127843987
42 0.534127843987
43 0.534127843987
44 0.534127843987
45 0.534127843987
46 0.548212351029
47 0.534127843987
48 0.534127843987
49 0.547128927411
50 0.56338028169
51 0.554712892741
52 0.534127843987
53 0.569880823402
54 0.561213434453
55 0.543878656555
...
452 1.0
453 1.0
454 1.0
455 0.993499458288
456 1.0
457 1.0
458 0.998916576381
459 0.997833152763
460 1.0
461 1.0
462 1.0
463 1.0
464 1.0
465 1.0
466 0.998916576381
467 1.0
468 0.998916576381
469 1.0
470 0.996749729144
471 1.0
472 1.0
473 1.0
474 0.998916576381
475 0.997833152763
476 1.0
477 0.996749729144
478 0.995666305525
479 1.0
480 0.998916576381
481 1.0
482 0.995666305525
483 0.980498374865
484 0.998916576381
485 1.0
486 1.0
487 1.0
488 1.0
489 1.0
490 0.997833152763
491 0.998916576381
492 1.0
493 0.997833152763
494 0.997833152763
495 1.0
496 0.990249187432
497 1.0
498 1.0
499 1.0
500 1.0
501 1.0
502 1.0
503 0.998916576381
504 0.996749729144
505 0.995666305525
506 0.996749729144
507 1.0
508 1.0
509 1.0
510 1.0
511 1.0
512 1.0
513 1.0
514 0.996749729144
515 1.0
516 1.0
517 1.0
518 1.0
519 1.0
520 1.0
521 0.996749729144
522 1.0
523 1.0
524 1.0
525 1.0
526 1.0
527 1.0
528 1.0
529 1.0
530 1.0
531 1.0
532 1.0
533 0.986998916576
534 1.0
535 0.994582881907
536 1.0
537 1.0
538 0.965330444
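
The flat value 0.534127843987 in the first epochs has a simple explanation. It equals 493/923, the fraction of training numbers (101 to 1023) that are multiples of neither 3 nor 5. That is consistent with the network initially predicting the plain-number class for every input, before it learns to separate fizz, buzz and fizzbuzz. Here is a quick check:

```python
def is_plain_number(i):
    # True exactly when fizz_buzz_encode(i) would return [1, 0, 0, 0].
    return i % 3 != 0 and i % 5 != 0

train = range(101, 2 ** 10)  # the training numbers, 101..1023
plain = sum(is_plain_number(i) for i in train)
print(plain, len(train), plain / len(train))  # 493 923, ratio about 0.534128
```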

In [15]:
# Test set: the numbers 1..100, which were held out of training.
numbers = np.arange(1, 101)
teX = np.transpose(binary_encode(numbers, NUM_DIGITS))
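
`binary_encode` is written for a single integer. Because `>>` and `&` broadcast over NumPy arrays, passing the whole `numbers` array yields one row per bit position, with shape (NUM_DIGITS, 100), and `np.transpose` turns this into one row per number. Here is a standalone check that this matches encoding the numbers one at a time:

```python
import numpy as np

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

NUM_DIGITS = 10
numbers = np.arange(1, 101)

# Passing an array makes each list entry a whole row: shape (NUM_DIGITS, 100).
stacked = binary_encode(numbers, NUM_DIGITS)
teX = np.transpose(stacked)  # one row per number: shape (100, NUM_DIGITS)

row_by_row = np.array([binary_encode(int(n), NUM_DIGITS) for n in numbers])
print(stacked.shape, teX.shape, np.array_equal(teX, row_by_row))
# (10, 100) (100, 10) True
```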

In [16]:
teY = sess.run(predict_op, feed_dict={X: teX})
output = np.vectorize(fizz_buzz)(numbers, teY)

print(output)

['1' '2' 'fizz' '4' 'buzz' 'fizz' '7' '8' 'fizz' 'buzz' '11' 'fizz' '13'
 '14' 'fizzbuzz' '16' '17' 'fizz' '19' 'buzz' 'fizz' '22' '23' 'fizz'
 'buzz' '26' 'fizz' '28' '29' 'fizzbuzz' '31' '32' 'fizz' '34' 'buzz'
 'fizz' '37' '38' 'fizz' 'buzz' '41' 'fizz' '43' '44' 'fizzbuzz' '46' '47'
 'fizz' '49' 'buzz' 'fizz' '52' '53' 'fizz' 'buzz' '56' 'fizz' '58' '59'
 'fizzbuzz' '61' '62' 'fizz' '64' 'buzz' 'fizz' '67' '68' 'fizz' 'buzz'
 '71' 'fizz' '73' '74' 'fizzbuzz' '76' '77' 'fizz' '79' 'buzz' 'fizz' '82'
 '83' 'fizz' 'buzz' '86' 'fizz' '88' '89' 'fizzbuzz' '91' '92' 'fizz' '94'
 'buzz' 'fizz' '97' '98' 'fizz' 'buzz']


Finally, the improved network made it! It gives the correct Fizz Buzz answer for every number from 1 to 100.

In [17]:
# Compare each prediction with the true answer; only mismatches are printed.
actuals = [fizz_buzz(i, fizz_buzz_encode(i).argmax()) for i in numbers]

for i, (predicted, actual) in enumerate(zip(output, actuals)):
    if predicted != actual:
        print("{0} {1} {2}".format(i+1, predicted, actual))

Thanks to machine learning once again! I'm sure I'll get the job next time!