# Solve the Fizzbuzz problem with Flux
Inspired by the blog post "Fizz Buzz in Tensorflow" by Joel Grus:
http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

## Loading libs
We use Flux and Test.
If Flux is not installed yet, `Pkg.add("Flux")` installs it (Test ships with Julia's standard library).


```julia
import Pkg; Pkg.add("Flux")  # install Flux (only needed once)
using Flux: Chain, Dense, params, crossentropy, onehotbatch,
            ADAM, train!, softmax
using Test
```


[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.1/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.1/Manifest.toml`
[90m [no changes][39m


## Data preparation
Create a `fizzbuzz` function that takes an `Int` `x` and returns `"fizzbuzz"` if `x` is divisible by both 3 and 5, `"fizz"` if it is divisible by 3, `"buzz"` if it is divisible by 5, and `"else"` otherwise.

```julia
# Data preparation
function fizzbuzz(x::Int)
    is_divisible_by_three = x % 3 == 0
    is_divisible_by_five = x % 5 == 0

    if is_divisible_by_three && is_divisible_by_five
        return "fizzbuzz"
    elseif is_divisible_by_three
        return "fizz"
    elseif is_divisible_by_five
        return "buzz"
    else
        return "else"
    end
end
```

```
fizzbuzz (generic function with 1 method)
```

## Create the dataset
First, create a constant vector `LABELS` holding the four targets ("fizz", "buzz", "fizzbuzz", "else").
Then generate `raw_x`, the range of integers from 1 to 100, and apply `fizzbuzz` to each element to get the targets `raw_y`.

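```julia
# The order of LABELS fixes the row order of the one-hot targets
const LABELS = ["fizz", "buzz", "fizzbuzz", "else"];

# Sanity check: one input per label
@test fizzbuzz.([3, 5, 15, 98]) == LABELS

raw_x = 1:100;
raw_y = fizzbuzz.(raw_x);
```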
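Before training, it helps to look at how the 100 targets are distributed. The sketch below only uses Base Julia and is self-contained, so it re-implements the rule as a hypothetical `fizzbuzz_label` (checking divisibility by 15 first) instead of reusing the notebook's `fizzbuzz`.

```julia
# Self-contained copy of the FizzBuzz rule (15 is checked first)
function fizzbuzz_label(x::Int)
    x % 15 == 0 && return "fizzbuzz"
    x % 3 == 0 && return "fizz"
    x % 5 == 0 && return "buzz"
    return "else"
end

labels = ["fizz", "buzz", "fizzbuzz", "else"]
ys = fizzbuzz_label.(1:100)

# Number of inputs in 1:100 that fall into each class
counts = Dict(l => count(y -> y == l, ys) for l in labels)

for l in labels
    println(rpad(l, 10), counts[l])
end
```

It prints 27 for fizz, 14 for buzz, 6 for fizzbuzz and 53 for else: more than half of the inputs are "else", which is consistent with the early epochs of the training log, where the model mostly returns the number itself.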

## Feature engineering
Define a function `features(x)` that takes an integer and returns its remainders modulo 3, 5 and 15 as a vector of floats.


```julia
# Remainders modulo 3, 5 and 15, as Float64
features(x) = float.([x % 3, x % 5, x % 15])
```

```
features (generic function with 1 method)
```
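A quick check of what these features look like, and why they are informative: since 15 = lcm(3, 5), the FizzBuzz label of `x` is fully determined by `x % 15`, so the third feature alone already carries everything the model needs. The Base-only sketch below (`label` and `labels_by_residue` are names introduced just for this illustration) groups 1:100 by remainder and checks that each remainder maps to exactly one label.

```julia
features(x) = float.([x % 3, x % 5, x % 15])

println(features(7))    # remainders 1, 2 and 7
println(features(30))   # divisible by 3, 5 and 15

# Illustrative label rule; 15 is checked first
label(x) = x % 15 == 0 ? "fizzbuzz" : x % 3 == 0 ? "fizz" : x % 5 == 0 ? "buzz" : "else"

# Collect the set of labels seen for each remainder modulo 15
labels_by_residue = Dict{Int,Set{String}}()
for x in 1:100
    push!(get!(labels_by_residue, x % 15, Set{String}()), label(x))
end

# Every remainder should correspond to a single label
println(all(length(s) == 1 for s in values(labels_by_residue)))
```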

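Next, add a method that applies `features` to every element of an array such as `raw_x` and stacks the results into a 2-D array, one column per input (the samples-as-columns layout that Flux's `Dense` layers expect).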

```julia
# Feature engineering: one column per input, giving a 3 × N matrix
features(x::AbstractArray) = reduce(hcat, features.(x))
```

```
features (generic function with 2 methods)
```
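To make the layout concrete, here is the array method applied to a small range. This sketch uses `reduce(hcat, ...)`, which builds the same matrix as splatting with `hcat(features.(x)...)` but avoids passing one argument per sample; each column holds the three features of one input.

```julia
features(x) = float.([x % 3, x % 5, x % 15])
# Stack the per-input feature vectors as columns
features(x::AbstractArray) = reduce(hcat, features.(x))

X = features(1:4)
println(size(X))                  # 3 features × 4 samples
println(X[:, 3] == features(3))   # column j holds the features of the j-th input
println(X == hcat(features.(1:4)...))  # same result as the splatting version
```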

Extract the features of `raw_x` into `X`.
One-hot encode `raw_y` into `y` with `onehotbatch`, using `LABELS` to fix the row order.

```julia
X = features(raw_x);            # 3 × 100 feature matrix
y = onehotbatch(raw_y, LABELS); # 4 × 100 one-hot targets
```
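For intuition about what `onehotbatch` returns, here is a hypothetical Base-only stand-in, `onehot_matrix`. It is not Flux's implementation (Flux returns a specialised `OneHotMatrix`), but it produces the same pattern: one column per sample, with a single `true` in the row of that sample's label within the label list.

```julia
# Hypothetical stand-in for Flux.onehotbatch, for illustration only
function onehot_matrix(ys::AbstractVector, labels::AbstractVector)
    M = falses(length(labels), length(ys))   # classes × samples
    for (j, y) in enumerate(ys)
        i = findfirst(isequal(y), labels)    # row of this sample's label
        i === nothing && error("unknown label: $y")
        M[i, j] = true
    end
    return M
end

labels = ["fizz", "buzz", "fizzbuzz", "else"]
Y = onehot_matrix(["fizz", "else", "fizzbuzz"], labels)
println(Y)
```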

## Create the model
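We chain (`Chain`) two dense layers (`Dense`) and finish with a `softmax`.
The first layer takes the three remainders as input and has 10 neurons. The second layer maps these 10 neurons to 4 scores, which `softmax` turns into probabilities for the 4 cases in the order of `LABELS` (fizz, buzz, fizzbuzz, else). Since there is no activation function between the two `Dense` layers, they compose into a single affine map; passing an activation as a third argument, e.g. `Dense(3, 10, relu)` (with `relu` also imported from Flux), would make the hidden layer nonlinear.

We use cross-entropy (`crossentropy`) as the loss function and `ADAM` as the optimizer.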
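The model and loss can be written out by hand to see what `Chain`, `Dense`, `softmax` and `crossentropy` compute. The sketch below is Base Julia with random placeholder weights; `W1`, `b1`, `W2`, `b2`, `softmax_cols` and `crossentropy_mean` are illustrative names, not Flux API. The loss averages over the samples in the batch, which we believe matches Flux's `crossentropy` in this version, but treat that as an assumption. The last lines show that, with no activation between them, the two layers collapse into one affine map `W * x + b`.

```julia
using Random
Random.seed!(1)  # reproducible placeholder weights

# Column-wise softmax (subtracting the column max for numerical stability)
softmax_cols(z) = (e = exp.(z .- maximum(z, dims=1)); e ./ sum(e, dims=1))
# Cross-entropy averaged over the columns (samples) of the batch
crossentropy_mean(p, y) = -sum(y .* log.(p)) / size(y, 2)

W1, b1 = randn(10, 3), randn(10)   # like Dense(3, 10): 3 inputs -> 10 units
W2, b2 = randn(4, 10), randn(4)    # like Dense(10, 4): 10 units -> 4 scores
model(x) = softmax_cols(W2 * (W1 * x .+ b1) .+ b2)

X = [1.0 2.0 0.0; 1.0 2.0 3.0; 1.0 2.0 3.0]              # features of 1, 2, 3
Y = [0.0 0.0 1.0; 0.0 0.0 0.0; 0.0 0.0 0.0; 1.0 1.0 0.0]  # targets: else, else, fizz
P = model(X)
println("loss = ", crossentropy_mean(P, Y))

# Without an activation, the two layers equal a single affine map
W, b = W2 * W1, W2 * b1 .+ b2
println(softmax_cols(W * X .+ b) ≈ P)
```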

```julia
# Model
m = Chain(Dense(3, 10), Dense(10, 4), softmax)
loss(x, y) = crossentropy(m(x), y)  # cross-entropy between predictions m(x) and targets y
opt = ADAM()
```

```
ADAM(0.001, (0.9, 0.999), IdDict{Any,Any}())
```
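The printed `ADAM(0.001, (0.9, 0.999), IdDict{Any,Any}())` appears to show the learning rate η = 0.001, the decay rates (β1, β2) = (0.9, 0.999), and per-parameter state. As a rough illustration of what those numbers control, here is the textbook Adam update (Kingma & Ba) written out in Base Julia for a single scalar parameter. It is not Flux's implementation; `adam_minimize`, the toy objective, ϵ = 1e-8 and the larger demo step η = 0.1 are assumptions made for this sketch.

```julia
# Textbook Adam on a scalar parameter (illustrative, not Flux's code)
function adam_minimize(grad, θ; η=0.1, β1=0.9, β2=0.999, ϵ=1e-8, steps=2000)
    m, v = 0.0, 0.0
    for t in 1:steps
        g = grad(θ)
        m = β1 * m + (1 - β1) * g        # running mean of gradients
        v = β2 * v + (1 - β2) * g^2      # running mean of squared gradients
        mhat = m / (1 - β1^t)            # bias-corrected estimates
        vhat = v / (1 - β2^t)
        θ -= η * mhat / (sqrt(vhat) + ϵ) # per-parameter scaled step
    end
    return θ
end

# Minimise f(θ) = (θ - 3)^2, whose gradient is 2(θ - 3)
θ_star = adam_minimize(θ -> 2 * (θ - 3), 0.0)
println(θ_star)
```

The result ends close to 3; with the notebook's η = 0.001 the same walk would need many more steps, since each Adam step moves the parameter by roughly η at most in this regime.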

## Main function
Create a function `deepbuzz` that takes a number `x`, applies the model `m` to the features of `x` and returns the most probable label, or `x` itself when that label is "else".

```julia
# Helpers: predict the label of x, returning x itself when the label is "else"
deepbuzz(x) = (a = argmax(m(features(x))); a == 4 ? x : LABELS[a])
```

```
deepbuzz (generic function with 1 method)
```
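The decoding step of `deepbuzz` can be tried without a trained model by feeding probability vectors directly. `decode` is a hypothetical helper that applies the same `argmax` rule as `deepbuzz`.

```julia
const LABELS = ["fizz", "buzz", "fizzbuzz", "else"]

# Same decoding rule as deepbuzz, but taking the probability vector directly
decode(x, probs) = (a = argmax(probs); a == 4 ? x : LABELS[a])

println(decode(9, [0.7, 0.1, 0.1, 0.1]))   # index 1, i.e. "fizz"
println(decode(7, [0.1, 0.2, 0.1, 0.6]))   # index 4 ("else"), so the number itself
println(decode.([9, 7], [[0.7, 0.1, 0.1, 0.1], [0.1, 0.2, 0.1, 0.6]]))
```

Because broadcasting mixes `String` and `Int` results, the resulting vector has a loose element type, which explains the `Any[...]` vectors in the training log.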

## Monitor function

Create a function `monitor(e)` that takes an epoch number and prints the current loss along with the output of `deepbuzz` on a few test inputs.

```julia
function monitor(e)
    # .data unwraps the tracked loss value (Tracker-based Flux)
    print("epoch $(lpad(e, 4)): loss = $(round(loss(X,y).data; digits=4)) | ")
    @show deepbuzz.([3, 5, 15, 98])
end
```

```
monitor (generic function with 1 method)
```

## Train
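We are now ready to train the model.
Try, for example, 1000 epochs and monitor every 50.
You will need the `train!` function: each call runs one optimization step per element of the data it is given, here the single full batch `[(X, y)]`.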

```julia
# Training: one full-batch step per epoch, monitoring every 50 epochs
for e in 0:1000
    train!(loss, params(m), [(X, y)], opt)
    if e % 50 == 0; monitor(e) end
end
```

```
epoch    0: loss = 2.6849 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", "fizz", 15, "fizz"]
epoch   50: loss = 1.6216 | deepbuzz.([3, 5, 15, 98]) = Any[3, 5, "buzz", 98]
epoch  100: loss = 1.0701 | deepbuzz.([3, 5, 15, 98]) = Any[3, 5, "buzz", 98]
epoch  150: loss = 0.8143 | deepbuzz.([3, 5, 15, 98]) = Any[3, 5, "buzz", 98]
epoch  200: loss = 0.6899 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "buzz", 98]
epoch  250: loss = 0.5955 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "buzz", 98]
epoch  300: loss = 0.5189 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "buzz", 98]
epoch  350: loss = 0.4551 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "buzz", 98]
epoch  400: loss = 0.4012 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "buzz", 98]
epoch  450: loss = 0.3552 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "fizzbuzz", 98]
epoch  500: loss = 0.3155 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", 5, "fizzbuzz", 98]
epoch  550: loss = 0.2811 | deepbuzz.([3, 5, 15, 98]) = Any["fizz", "buzz", "fizzbuzz", 9
```
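To make the `train!` loop less of a black box, here is a Base-only sketch of what repeated training steps amount to, under simplifying assumptions: plain full-batch gradient descent instead of ADAM, a single affine layer instead of the two-layer chain (equally expressive here, since there is no activation between the layers), a constant 1 appended to the features as a bias input, and a hypothetical step size `η = 0.005`. It relies on the standard fact that, for softmax followed by mean cross-entropy, the gradient with respect to the logits is `(P - Y) / N`. `label_index` and the other names are illustrative only.

```julia
# Class index in the LABELS order ["fizz", "buzz", "fizzbuzz", "else"]
function label_index(x)
    x % 15 == 0 && return 3
    x % 3 == 0 && return 1
    x % 5 == 0 && return 2
    return 4
end

# The three remainders plus a constant 1 acting as a bias input
features(x) = float.([x % 3, x % 5, x % 15, 1])

xs = 1:100
X = reduce(hcat, features.(xs))   # 4 inputs × 100 samples
Y = zeros(4, length(xs))          # one-hot targets, 4 classes × 100 samples
for (j, x) in enumerate(xs)
    Y[label_index(x), j] = 1.0
end

softmax_cols(z) = (e = exp.(z .- maximum(z, dims=1)); e ./ sum(e, dims=1))
loss(W) = -sum(Y .* log.(softmax_cols(W * X))) / size(Y, 2)

W = zeros(4, 4)   # all-zero start gives uniform predictions
η = 0.005         # small step size, assumed for stability
initial = loss(W)
for epoch in 1:1000
    P = softmax_cols(W * X)
    grad = (P .- Y) * X' ./ size(Y, 2)   # gradient of the mean cross-entropy w.r.t. W
    W .-= η .* grad                      # in-place gradient-descent step
end
final = loss(W)
println("loss: ", round(initial, digits=4), " -> ", round(final, digits=4))
```

The loss starts at log(4) ≈ 1.3863 (uniform predictions) and decreases; the final value depends on the assumed step size and epoch count and is not comparable to the Flux log.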

You can now make predictions with `deepbuzz`, including on inputs outside the training range `1:100` such as 150.

```julia
deepbuzz.([3*2, 5*4, 3*5*10, 88])
```

```
4-element Array{Any,1}:
   "fizz"
   "buzz"
   "fizzbuzz"
 88
```