# Fizz Buzz with PyTorch

This is a very simple task, often chosen by interviewers to get an idea of a candidate's ability to write simple functions. We will solve it with a very simple feed-forward neural network composed of a total of 3 weighted layers (2 hidden layers and 1 output layer).

This task usually consists of writing a function that takes an integer and returns the string 'fizz' if the number is divisible by (is a multiple of) 3, 'buzz' if it is divisible by 5, 'fizzbuzz' if it is divisible by both 3 and 5 (i.e. by 3*5=15), and the number itself otherwise.
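For reference, here is a minimal sketch of the classic rule-based solution that the network will have to learn. The name `fizz_buzz` is our own and does not appear in the notebook. Note that the check for 15 must come first, because a multiple of 15 is also a multiple of 3 and of 5.

```python
def fizz_buzz(i: int) -> str:
    """Rule-based FizzBuzz for a single integer (illustrative helper)."""
    # Check 15 first: a multiple of 15 would otherwise match the 3 or 5 branch.
    if i % 15 == 0:
        return "fizzbuzz"
    if i % 5 == 0:
        return "buzz"
    if i % 3 == 0:
        return "fizz"
    return str(i)


print([fizz_buzz(i) for i in range(1, 16)])
```

The network never sees these rules; it only sees input/label pairs generated by them.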

We'll approach this task as a classification problem, by first converting the decimal integers to their binary representation: our model will receive `num_bits` binary values per sample and output 4 values, one score for each of the possible classes (the number itself, fizz, buzz, fizzbuzz).
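To make the input representation concrete, the sketch below encodes an integer into its bits, least significant bit first (the same ordering the notebook's `binary_encode` uses), and decodes it back. `to_bits`, `from_bits` and `NUM_BITS` are hypothetical names used only for this illustration.

```python
import numpy as np

NUM_BITS = 12  # hypothetical name; the notebook calls this NUM_DIGITS


def to_bits(i, num_bits=NUM_BITS):
    # Bit d of i is (i >> d) & 1; index 0 holds the least significant bit.
    return np.array([(i >> d) & 1 for d in range(num_bits)], dtype=np.float32)


def from_bits(bits):
    # Inverse mapping: sum over d of bit_d * 2**d.
    return int(sum(int(b) << d for d, b in enumerate(bits)))


x = to_bits(6)
print(x[:4])          # [0. 1. 1. 0.]  since 6 = 0*1 + 1*2 + 1*4 + 0*8
print(from_bits(x))   # 6
print(2 ** NUM_BITS)  # 4096 distinct integers fit in 12 bits
```

Since the encoding is lossless, every integer from 0 to 4095 gets a distinct 12-value input vector.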

So we will start by importing the usual modules and defining the number of binary digits (bits) used to represent each number, which we set to 12 (`NUM_DIGITS`). Then we will write the **fizz_buzz_encode** function, plus two convenience functions: one for the binary encoding of the inputs and one for decoding the fizz buzz labels.

Source:
http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

In [1]:
import torch
from torch.utils import data

import torch.nn as nn

# functional module: functional implementations of
# unparameterized neural network operations
import torch.nn.functional as F

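# kept for compatibility: since PyTorch 0.4 Variable and Tensor are merged,
# and Variable(t) simply returns a Tensor
from torch.autograd import Variable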
import torch.optim as optim

import numpy as np

NUM_DIGITS = 12

Next we write our solution and the convenience functions.

In [2]:
# Represent each input by an array of its binary digits.
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

# Encode the desired output as a class index: 0=number, 1="fizz", 2="buzz", 3="fizzbuzz"
def fizz_buzz_encode(i):
    if   i % 15 == 0: return 3
    elif i % 5  == 0: return 2
    elif i % 3  == 0: return 1
    else:             return 0

# Map a predicted class index back to a printable label
def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]
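Before training, it helps to know how the labels are distributed. The sketch below repeats the notebook's labelling and counts the classes in the validation split (0 to 99) and the training split (100 to 4095), using the split defined further below. It also computes the accuracy of a trivial model that always predicts class 0 ("number"). The variable names are our own.

```python
from collections import Counter


def fizz_buzz_encode(i):
    # Same labelling as the notebook: 0=number, 1=fizz, 2=buzz, 3=fizzbuzz
    if i % 15 == 0:
        return 3
    if i % 5 == 0:
        return 2
    if i % 3 == 0:
        return 1
    return 0


val_counts = Counter(fizz_buzz_encode(i) for i in range(0, 100))
train_counts = Counter(fizz_buzz_encode(i) for i in range(100, 2 ** 12))
print(sorted(val_counts.items()))    # [(0, 53), (1, 27), (2, 13), (3, 7)]
print(sorted(train_counts.items()))  # [(0, 2131), (1, 1065), (2, 533), (3, 267)]

# Accuracy of a baseline that always answers "number"
baseline = val_counts[0] / sum(val_counts.values())
print(baseline)  # 0.53
```

The classes are imbalanced, and a constant guess already reaches 53% on the validation numbers. That baseline is the one to compare the trained network against.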

Now I will show you a way to build a custom dataset class that subclasses data.Dataset.

This is a simple way to share our data between objects, somewhat like a singleton: if and when we instantiate the dataset class multiple times, the data doesn't get created/loaded again. To do this, we create an empty dictionary **DATA_CACHE** at global scope (it is created once, when the module is imported). When we instantiate a dataset object and its `__init__` method gets called, we first check whether **DATA_CACHE** already contains the data, and if not we fill the cache. Each dataset object then stores only the list of indices belonging to its split (train or val), and `__getitem__` looks the samples up in the shared cache. This way we can create as many dataset objects as we like (e.g. one per split) without copying the samples, and thus without using more memory for them.
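The caching pattern can be sketched without PyTorch. In this minimal version, `CACHE`, `fill_calls` and `CachedSplit` are hypothetical names, and the cached "samples" are just placeholder integers. The counter shows that the cache is filled only once, even though two objects are created.

```python
CACHE = {}      # module-level dictionary, created once at import time
fill_calls = 0  # hypothetical counter, only here to show how often the cache is filled


def fill_cache(n):
    global fill_calls
    fill_calls += 1
    CACHE["y"] = list(range(n))  # placeholder samples standing in for real data


class CachedSplit:
    """Each instance keeps only a list of indices into the shared CACHE."""

    def __init__(self, mode="train", n=4096, n_val=100):
        if not CACHE:
            fill_cache(n)
        start, end = (0, n_val) if mode == "val" else (n_val, len(CACHE["y"]))
        self.idxs = list(range(start, end))

    def __len__(self):
        return len(self.idxs)

    def __getitem__(self, k):
        # Translate the split-local index into a global index into the cache
        return CACHE["y"][self.idxs[k]]


tr = CachedSplit("train")
va = CachedSplit("val")
print(fill_calls, len(tr), len(va))  # 1 3996 100
```

The second instance finds the cache already filled and only builds its own index list.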

In [3]:
DATA_CACHE = {} 

def fill_cache(num_bits):
    # encode every integer representable with num_bits bits, together with its label
    DATA_CACHE.update({
        'X': [binary_encode(i, num_bits) for i in range(2 ** num_bits)],
        'y': [fizz_buzz_encode(i) for i in range(2 ** num_bits)]
    })

class FizzbuzzDataset(data.Dataset):
    def __init__(self, num_bits=NUM_DIGITS, mode='train'):
        super(FizzbuzzDataset, self).__init__()
        
        if not DATA_CACHE:
            fill_cache(num_bits)
            
        # numbers 0-99 form the validation split, the remaining ones the training split
        start, end = (0, 100) if mode == 'val' else (100, len(DATA_CACHE['y']))
        self.idxs = list(range(start, end))
        
    def __len__(self):
        return len(self.idxs)
    
    def __getitem__(self, idx):
        x = DATA_CACHE['X'][self.idxs[idx]]
        x = x.astype(np.float32)
        y = DATA_CACHE['y'][self.idxs[idx]]
        return x, y

As said above, we create a simple model composed of a total of 3 fully connected (feed-forward) layers. The first one takes `num_digits` inputs for each sample, and each hidden layer is followed by a LeakyReLU activation function. We give each of the two hidden layers 50 "neural units" and, given that we have 4 classes, the output layer has 4 outputs (one unnormalized score, or logit, per class).
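As a quick sanity check on the model size, each `nn.Linear(n_in, n_out)` holds an `n_out x n_in` weight matrix plus an `n_out` bias vector, and LeakyReLU has no learnable parameters. The sketch below does the arithmetic in plain Python; `linear_params` is a hypothetical helper.

```python
def linear_params(n_in, n_out):
    # weight matrix (n_out x n_in) plus bias vector (n_out)
    return n_in * n_out + n_out


NUM_DIGITS, H_DIM, NUM_CLASSES = 12, 50, 4
layers = [(NUM_DIGITS, H_DIM), (H_DIM, H_DIM), (H_DIM, NUM_CLASSES)]
counts = [linear_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))  # [650, 2550, 204] 3404
```

With PyTorch installed, `sum(p.numel() for p in FizzbuzzModel().parameters())` should report the same total.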

In [4]:
# http://pytorch.org/docs/master/nn.html#torch.nn.LeakyReLU
class FizzbuzzModel(nn.Module):
    def __init__(self, h_dim=50, input_dim=NUM_DIGITS, num_classes=4):
        super(FizzbuzzModel, self).__init__()
        self.linear1 = nn.Sequential(
            nn.Linear(input_dim, h_dim),
            nn.LeakyReLU()
        )
        self.linear2 = nn.Sequential(
            nn.Linear(h_dim, h_dim),
            nn.LeakyReLU()
        )
        self.classifier = nn.Linear(h_dim, num_classes)
    
    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.classifier(x)
        return x   

In [5]:
# SAME AS PREVIOUS, BUT USING F (nn.functional) FOR THE ACTIVATIONS
# (this definition replaces the one in the previous cell)
class FizzbuzzModel(nn.Module):
    def __init__(self, h_dim=50, input_dim=NUM_DIGITS, num_classes=4):
        super(FizzbuzzModel, self).__init__()
        self.linear1 = nn.Linear(input_dim, h_dim)
        self.linear2 = nn.Linear(h_dim, h_dim)
        self.classifier = nn.Linear(h_dim, num_classes)
    
    def forward(self, x):
        x = F.leaky_relu(self.linear1(x))
        x = F.leaky_relu(self.linear2(x))
        x = self.classifier(x)
        return x   
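To show what `forward` computes, here is a NumPy sketch of the same architecture. It assumes random weights drawn only for illustration, which are not PyTorch's initialisation scheme. Each linear layer computes `x @ W.T + b`, as `nn.Linear` does. The leaky ReLU uses 0.01 as its negative slope, the default of `F.leaky_relu`.

```python
import numpy as np

rng = np.random.default_rng(0)


def leaky_relu(z, negative_slope=0.01):
    # identity for positive inputs, small slope for negative ones
    return np.where(z > 0, z, negative_slope * z)


def init_linear(n_in, n_out):
    # illustrative random weights, shape (n_out, n_in) like nn.Linear.weight
    return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)


W1, b1 = init_linear(12, 50)
W2, b2 = init_linear(50, 50)
W3, b3 = init_linear(50, 4)


def forward(x):
    # x: (batch, 12) binary inputs
    h = leaky_relu(x @ W1.T + b1)
    h = leaky_relu(h @ W2.T + b2)
    return h @ W3.T + b3  # raw scores (logits), shape (batch, 4)


x = rng.integers(0, 2, size=(8, 12)).astype(np.float32)
logits = forward(x)
print(logits.shape)  # (8, 4)
```

There is deliberately no softmax at the end: the loss used below expects raw logits.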

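Let's instantiate our dataset objects, their corresponding dataloaders, our FizzbuzzModel, the usual SGD optimizer (here with momentum) and a cross-entropy loss. `nn.CrossEntropyLoss` applies a log-softmax internally, which is why the model outputs raw scores without a final softmax.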
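Cross-entropy loss with the default mean reduction can be sketched in NumPy as a log-softmax over the logits, followed by averaging the negative log-probability of each sample's correct class. `cross_entropy` here is our own function, not a PyTorch API.

```python
import numpy as np


def cross_entropy(logits, targets):
    # log-softmax, subtracting the row maximum for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # negative log-probability of the correct class, averaged over the batch
    return -log_probs[np.arange(len(targets)), targets].mean()


logits = np.zeros((2, 4))  # uniform scores over the 4 classes
targets = np.array([0, 3])
print(cross_entropy(logits, targets))  # log(4), about 1.386
```

A model that spreads its scores uniformly over the 4 classes therefore has a loss of about 1.386, and a loss near 0 means the correct class gets almost all of the probability.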

In [6]:
dataset_tr = FizzbuzzDataset(mode='train')
dataset_val = FizzbuzzDataset(mode='val')
dataloader_tr = data.DataLoader(dataset_tr, batch_size=128, shuffle=True)
dataloader_val = data.DataLoader(dataset_val, batch_size=128, shuffle=False)

model = FizzbuzzModel()
optimizer = optim.SGD(model.parameters(), lr=.05, momentum=0.9)
loss = nn.CrossEntropyLoss()


We then train our model for 500 epochs, printing the loss of the last mini-batch every 100 epochs.
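Two details of the training loop can be checked by hand. First, `optim.SGD` with momentum keeps a velocity buffer per parameter: with the default `dampening=0` and `nesterov=False`, each step does `buf = momentum * buf + grad`, then `p = p - lr * buf`. The sketch applies that rule to the toy function f(p) = p**2, not to the network. Second, with 3996 training samples and `batch_size=128` there are 32 steps per epoch, and the last batch holds only 28 samples, so the printed loss is the loss of a small, random batch.

```python
import math


def sgd_momentum_step(p, grad, buf, lr=0.05, momentum=0.9):
    # PyTorch-style momentum update (dampening=0, nesterov=False)
    buf = grad if buf is None else momentum * buf + grad
    return p - lr * buf, buf


# minimise f(p) = p**2, whose gradient is 2 * p
p, buf = 1.0, None
for _ in range(500):
    p, buf = sgd_momentum_step(p, 2 * p, buf)
print(abs(p) < 1e-6)  # True

steps_per_epoch = math.ceil(3996 / 128)
print(steps_per_epoch, steps_per_epoch * 500)  # 32 16000
print(3996 - 128 * (steps_per_epoch - 1))      # 28 samples in the last batch
```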

In [7]:
for epoch in range(500):
    # train loop
    for x, y in dataloader_tr:
        x, y = Variable(x), Variable(y)
        l = loss(model(x), y)
        
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    if not epoch % 100:
        # l.item() extracts the scalar loss of the last mini-batch
        print('Epoch: {}, loss: {}'.format(epoch, l.item()))
        

Epoch: 0, loss: 1.1389070749282837
Epoch: 100, loss: 0.0060807508416473866
Epoch: 200, loss: 0.003156117396429181
Epoch: 300, loss: 0.000773159321397543
Epoch: 400, loss: 0.00013801614113617688


Finally, we run the prediction on our validation set, which contains the numbers from 0 to 99. We then decode the model's predictions back into fizz buzz labels and print them.
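The accuracy can also be cross-checked independently of the class indices by comparing the decoded labels with the rule-based answers. The sketch assumes the decoded list starts at 0 and is in order, which holds here because the validation loader uses `shuffle=False`. `rule_label` and `mismatches` are hypothetical helpers.

```python
def rule_label(i):
    # rule-based FizzBuzz, used as ground truth
    if i % 15 == 0:
        return "fizzbuzz"
    if i % 5 == 0:
        return "buzz"
    if i % 3 == 0:
        return "fizz"
    return str(i)


def mismatches(decoded, start=0):
    # numbers whose decoded label differs from the rule-based answer
    return [start + k for k, label in enumerate(decoded) if label != rule_label(start + k)]


sample = ["fizzbuzz", "1", "2", "fizz", "4", "buzz"]
print(mismatches(sample))                # []
print(mismatches(["fizzbuzz", "fizz"]))  # [1]
```

Applied to the full list printed below, an empty result would agree with the reported 0 errors.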

In [8]:
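preds = []
ys = []
# no gradient tracking is needed for evaluation
with torch.no_grad():
    for x, y in dataloader_val:
        x = Variable(x)
        # index of the highest-scoring class for each sample
        preds.extend(model(x).max(1)[1].tolist())
        # y is a LongTensor: convert it to plain Python ints
        ys.extend(y.tolist())

correct = np.array(preds) == np.array(ys)
# the validation set holds the numbers 0-99 in order (shuffle=False)
predictions = zip(range(0, 100), preds)

print('Accuracy: ', correct.mean(), ', Errors: ', np.logical_not(correct).sum())
print([fizz_buzz_decode(i, x) for (i, x) in predictions])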

Accuracy:  1.0 , Errors:  0
['fizzbuzz', '1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', 'buzz', '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19', 'buzz', 'fizz', '22', '23', 'fizz', 'buzz', '26', 'fizz', '28', '29', 'fizzbuzz', '31', '32', 'fizz', '34', 'buzz', 'fizz', '37', '38', 'fizz', 'buzz', '41', 'fizz', '43', '44', 'fizzbuzz', '46', '47', 'fizz', '49', 'buzz', 'fizz', '52', '53', 'fizz', 'buzz', '56', 'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', '64', 'buzz', 'fizz', '67', '68', 'fizz', 'buzz', '71', 'fizz', '73', '74', 'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', 'fizz', '82', '83', 'fizz', 'buzz', '86', 'fizz', '88', '89', 'fizzbuzz', '91', '92', 'fizz', '94', 'buzz', 'fizz', '97', '98', 'fizz']
