# Yahtzee

__Fully Connected Networks__

_By Marnick van der Arend & Jeroen Smienk_

![Yahtzee](yahtzee.png)

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

%matplotlib inline

__MODEL_PATH = 'models'
__TENSOR_LOG_DIR = 'logs'

## Dataset

Let's start by looking at the provided dataset.

It consists of 5832 'dice throws', each with a label that describes the type of throw.

The existing Yahtzee labels are: `nothing`, `small-straight`, `three-of-a-kind`, `large-straight`,
 `full-house`, `four-of-a-kind`, and `yahtzee`.
 
Name | Description
--- | ---
3-of-a-kind | Three dice the same
4-of-a-kind | Four dice the same
Full-house | Three of one number and two of another
Small-straight | Four sequential dice
Large-straight | Five sequential dice
Yahtzee | All five dice the same
Nothing | None of the above combinations
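
The rules in the table can be sketched as a small labelling function. This is a minimal illustration, not the code that produced the dataset: `classify_throw` is a hypothetical helper, the label strings are illustrative, and it assumes that when several combinations apply the most specific one wins (for example, four equal dice count as four-of-a-kind, not three-of-a-kind).

```python
from collections import Counter


def classify_throw(dice):
    """Return the label of a five-dice throw (hypothetical helper).

    Assumption: when several combinations apply, the most specific one wins.
    """
    counts = sorted(Counter(dice).values(), reverse=True)
    faces = set(dice)

    def has_run(length):
        # True if `length` consecutive face values are all present.
        return any(
            all(start + k in faces for k in range(length))
            for start in range(1, 8 - length)
        )

    if counts[0] == 5:
        return 'yahtzee'
    if counts[0] == 4:
        return 'four-of-a-kind'
    if counts == [3, 2]:
        return 'full-house'
    if has_run(5):
        return 'large-straight'
    if has_run(4):
        return 'small-straight'
    if counts[0] == 3:
        return 'three-of-a-kind'
    return 'nothing'


print(classify_throw([3, 3, 3, 6, 6]))  # full-house
```

The network below has to learn this mapping from examples alone, without being given the rules.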

In [None]:
df = pd.read_csv('yahtzee-dataset.csv')
df.head(5)

In order to classify these categorical labels, we have to 'one-hot encode' them:

In [None]:
one_hot_df = pd.get_dummies(df, prefix=['label'])
one_hot_df.head(10)
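
To make the encoding concrete, here is a plain-Python sketch of what one-hot encoding does to a label column. The helper `one_hot` is hypothetical and only illustrates the idea; the notebook itself relies on `pd.get_dummies`.

```python
def one_hot(labels):
    """Map each label to a 0/1 vector with a single 1 (illustrative helper)."""
    classes = sorted(set(labels))                 # fixed column order
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1                     # mark the label's column
        rows.append(row)
    return classes, rows


classes, rows = one_hot(['nothing', 'full-house', 'nothing'])
print(classes)  # ['full-house', 'nothing']
print(rows)     # [[0, 1], [1, 0], [0, 1]]
```

Each label becomes one column, so the seven Yahtzee labels yield seven output columns.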

Before we can train any model, we shuffle the rows and split them into the inputs X (the five dice columns) and the labels Y (the one-hot columns):

In [None]:
shuffled = one_hot_df.sample(frac=1.)
X = shuffled.iloc[:,:5].copy()
Y = shuffled.iloc[:,5:].copy()

X.head(5)

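We hold out 15% of the shuffled data as a validation set, then split the remaining 85% again in an 85:15 ratio into a training set and a test set (roughly 72% training, 13% test and 15% validation overall):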

In [None]:
split = int(len(X.index) * .85)
X_train = X.iloc[:split]
X_valid = X.iloc[split:]
Y_train = Y.iloc[:split]
Y_valid = Y.iloc[split:]

split = int(len(X_train.index) * .85)
X_test = X_train.iloc[split:]
X_train = X_train.iloc[:split]
Y_test = Y_train.iloc[split:]
Y_train = Y_train.iloc[:split]

print('Split X (train, test, validation):', X_train.shape, X_test.shape, X_valid.shape)
print('Split Y (train, test, validation):', Y_train.shape, Y_test.shape, Y_valid.shape)
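
As a quick check of the arithmetic, the set sizes can be worked out by hand. This sketch assumes the dataset has exactly the 5832 rows mentioned earlier and mirrors the two `int(... * .85)` splits.

```python
n_rows = 5832                        # dataset size stated earlier (assumed exact)

n_train_test = int(n_rows * .85)     # first split: rows kept for train + test
n_valid = n_rows - n_train_test      # held-out validation rows

n_train = int(n_train_test * .85)    # second split of the remaining rows
n_test = n_train_test - n_train

print(n_train, n_test, n_valid)      # 4213 744 875
```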

We define a function that returns a random batch of a given size, sampled without replacement. Each training iteration uses one such batch instead of the full training set.

In [None]:
def get_batch(data, labels, batch_size):
    x_batch = data.sample(frac=batch_size / len(data.index))
    return x_batch, labels.loc[x_batch.index]
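
The important detail is that inputs and labels are selected with the same row index, so they stay paired. A plain-list analogue might look like the sketch below; `get_batch_lists` and its `seed` parameter are hypothetical and only serve as illustration.

```python
import random


def get_batch_lists(data, labels, batch_size, seed=None):
    """Plain-list analogue of get_batch (hypothetical helper)."""
    rng = random.Random(seed)
    # Sample row positions without replacement, then pick the same
    # positions from both lists so inputs and labels stay paired.
    picked = rng.sample(range(len(data)), batch_size)
    return [data[i] for i in picked], [labels[i] for i in picked]


throws = [[1, 1, 1, 1, 1], [1, 2, 3, 4, 5], [2, 2, 3, 3, 3], [1, 3, 4, 6, 6]]
kinds = ['yahtzee', 'large-straight', 'full-house', 'nothing']
x_demo, y_demo = get_batch_lists(throws, kinds, batch_size=2, seed=0)
```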

## Models

We designed several models:

rank | name | layers | score
--- | --- | --- | ---
5 | model_8 | (64, Tanh, Drop=.2) (128, Tanh, Drop=.3) (256, Tanh, Drop=.4) (512, Tanh, Drop=.5) (64, Tanh) | 0.
4 | model_7 | (64, ReLU) (128, ReLU) (256, ReLU) (512, ReLU, Drop=.3) (64, ReLU) | 0.
3 | model_6 | (200, Tanh) (300, Tanh) (600, Tanh) | 0.
2 | model_5 | (600, Tanh) (300, Tanh) (200, Tanh) | 0.
6 | model_4 | (128, Tanh) (64, Tanh) (32, Tanh) | 0.
7 | model_3 | (12, ReLU) (24, ReLU) (48, ReLU, Drop=.1) (96, ReLU) | 0.
8 | model_2 | (128, ReLU) (64, ReLU) (32, ReLU) | 0.
9 | model_1 | (128, Sigmoid) | 0.

Dropout has a positive effect on the score as can be seen in the table. We also found that the tanh activation function performed well. 
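
For reference, dropout can be sketched in a few lines of plain Python. This is an illustrative sketch of the "inverted dropout" scheme, not TensorFlow's implementation, and the `dropout` helper and its parameters are hypothetical. Note that `tf.layers.dropout` only drops units when it is called with `training=True`; its default is `training=False`, in which case it returns its input unchanged.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training:
        return list(values)          # inference: activations pass through
    keep = 1.0 - rate
    # Zero each unit with probability `rate`; scale survivors by 1 / keep
    # so the expected activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
```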

- Model 1: BLUE
- Model 2: RED
- Model 3: LIGHT BLUE
- Model 4: PINK
- Model 5: GREEN
- Model 6: GRAY
- Model 7: ORANGE
- Model 8: ORANGE

### Batch Accuracy

![Batch Accuracy](yahtzee-acc.png)

### Batch Loss

![Batch Loss](yahtzee-loss.png)

In [None]:
def model_1(x, output_shape):
    """
    Single hidden layer with 128 neurons and Sigmoid activation function.
    """
    l_1 = tf.layers.dense(x, units=128, activation=tf.nn.sigmoid)
    return tf.layers.dense(l_1, units=output_shape, activation=None)

In [None]:
def model_2(x, output_shape):
    """
    Three hidden layers with different amounts of neurons and relu activation functions.
    """
    l_1 = tf.layers.dense(x, units=128, activation=tf.nn.relu)
    l_2 = tf.layers.dense(l_1, units=64, activation=tf.nn.relu)
    l_3 = tf.layers.dense(l_2, units=32, activation=tf.nn.relu)
    return tf.layers.dense(l_3, units=output_shape, activation=None)

In [None]:
def model_3(x, output_shape):
    """
    Four hidden layers with increasing numbers of neurons, ReLU activation
    functions and one dropout layer after the third hidden layer.
    """
    l_1 = tf.layers.dense(x, units=12, activation=tf.nn.relu)
    l_2 = tf.layers.dense(l_1, units=24, activation=tf.nn.relu)
    l_3 = tf.layers.dense(l_2, units=48, activation=tf.nn.relu)
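    # Note: tf.layers.dropout only drops units when called with training=True;
    # with the default training=False it returns its input unchanged.
    d_3 = tf.layers.dropout(l_3, rate=.1)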
    l_4 = tf.layers.dense(d_3, units=96, activation=tf.nn.relu)
    return tf.layers.dense(l_4, units=output_shape, activation=None)

In [None]:
def model_4(x, output_shape):
    """
    Three hidden layers with decreasing numbers of neurons and tanh activation functions.
    """
    l_1 = tf.layers.dense(x, units=128, activation=tf.nn.tanh)
    l_2 = tf.layers.dense(l_1, units=64, activation=tf.nn.tanh)
    l_3 = tf.layers.dense(l_2, units=32, activation=tf.nn.tanh)
    return tf.layers.dense(l_3, units=output_shape, activation=None)

In [None]:
def model_5(x, output_shape):
    """
    High number of neurons in layers, decreasing per layer
    """
    l_1 = tf.layers.dense(x, units=600, activation=tf.nn.tanh)
    l_2 = tf.layers.dense(l_1, units=300, activation=tf.nn.tanh)
    l_3 = tf.layers.dense(l_2, units=200, activation=tf.nn.tanh)
    return tf.layers.dense(l_3, units=output_shape, activation=None)

In [None]:
def model_5_2(x, output_shape):
    """
    High number of neurons in layers, decreasing per layer,
    with a dropout layer (rate .3) after each hidden layer
    """
    l_1 = tf.layers.dense(x, units=600, activation=tf.nn.tanh)
    d_1 = tf.layers.dropout(l_1, rate=.3)
    l_2 = tf.layers.dense(d_1, units=300, activation=tf.nn.tanh)
    d_2 = tf.layers.dropout(l_2, rate=.3)
    l_3 = tf.layers.dense(d_2, units=200, activation=tf.nn.tanh)
    d_3 = tf.layers.dropout(l_3, rate=.3)
    return tf.layers.dense(d_3, units=output_shape, activation=None)

In [None]:
def model_6(x, output_shape):
    """
    High number of neurons in layers, increasing per layer
    """
    l_1 = tf.layers.dense(x, units=200, activation=tf.nn.tanh)
    l_2 = tf.layers.dense(l_1, units=300, activation=tf.nn.tanh)
    l_3 = tf.layers.dense(l_2, units=600, activation=tf.nn.tanh)
    return tf.layers.dense(l_3, units=output_shape, activation=None)

In [None]:
def model_7(x, output_shape):
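    """
    Five hidden layers (64, 128, 256, 512, 64) with ReLU activation functions
    and one dropout layer (rate .3) after the fourth hidden layer.
    """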
    l_1 = tf.layers.dense(x, units=64, activation=tf.nn.relu)
    l_2 = tf.layers.dense(l_1, units=128, activation=tf.nn.relu)
    l_3 = tf.layers.dense(l_2, units=256, activation=tf.nn.relu)
    l_4 = tf.layers.dense(l_3, units=512, activation=tf.nn.relu)
    d_4 = tf.layers.dropout(l_4, rate=.3)
    l_5 = tf.layers.dense(d_4, units=64, activation=tf.nn.relu)
    return tf.layers.dense(l_5, units=output_shape, activation=None)

In [None]:
def model_8(x, output_shape):
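    """
    Five hidden layers (64, 128, 256, 512, 64) with tanh activation functions
    and dropout layers with increasing rates (.2 to .5) after the first four.
    """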
    l_1 = tf.layers.dense(x, units=64, activation=tf.nn.tanh)
    d_1 = tf.layers.dropout(l_1, rate=.2)
    l_2 = tf.layers.dense(d_1, units=128, activation=tf.nn.tanh)
    d_2 = tf.layers.dropout(l_2, rate=.3)
    l_3 = tf.layers.dense(d_2, units=256, activation=tf.nn.tanh)
    d_3 = tf.layers.dropout(l_3, rate=.4)
    l_4 = tf.layers.dense(d_3, units=512, activation=tf.nn.tanh)
    d_4 = tf.layers.dropout(l_4, rate=.5)
    l_5 = tf.layers.dense(d_4, units=64, activation=tf.nn.tanh)
    return tf.layers.dense(l_5, units=output_shape, activation=None)

We start with the placeholder for our 5-dice input and 7-class output and choose a model:

In [None]:
x = tf.placeholder(tf.float32, shape=[None, X.shape[1]], name='x')
y = tf.placeholder(tf.float32, shape=[None, Y.shape[1]], name='y')

model_fn = model_1
y_pred = model_fn(x, Y.shape[1])

## Train, Test, Validate

We choose an optimizer, a loss function and metrics:

In [None]:
# Loss function
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=y_pred)
loss_fn = tf.reduce_mean(cross_entropy)

# Optimizer minimizes the loss
optimizer = tf.train.AdamOptimizer(learning_rate=.001).minimize(loss_fn)

# Accuracy metric
#   checks if the indices of the highest values in the real 
#   and predicted arrays are equal
prediction = tf.equal(tf.argmax(y, axis=1), tf.argmax(y_pred, axis=1))
accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
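
To show what these two quantities compute, here is a plain-Python sketch of softmax cross-entropy and argmax accuracy on a tiny batch. The helper names and the example logits are made up for illustration; the notebook itself uses the TensorFlow ops above.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(z - m) for z in logits))
    return [z - m - log_total for z in logits]


def cross_entropy(one_hot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(one_hot_label, log_softmax(logits)))


def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=row.__getitem__)


def batch_metrics(labels, logits):
    """Mean loss and accuracy over a batch (lists of rows)."""
    losses = [cross_entropy(t, z) for t, z in zip(labels, logits)]
    hits = [argmax(t) == argmax(z) for t, z in zip(labels, logits)]
    return sum(losses) / len(losses), sum(hits) / len(hits)


labels = [[1, 0, 0], [0, 1, 0]]
logits = [[2.0, 0.5, 0.1], [1.5, 0.2, 0.3]]
loss, acc = batch_metrics(labels, logits)
print(acc)  # 0.5: the first row is predicted correctly, the second is not
```

The model outputs raw logits (its last layer has `activation=None`), which is why the softmax is applied inside the loss rather than in the network.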

We train the model using a certain batch size and for a number of iterations while posting scalars to TensorBoard:

In [None]:
iters = 3000
train_batch_size = 200
test_batch_size = 50

session = tf.Session()
with session:
    session.run(tf.global_variables_initializer())

    # Defining the metrics we want to log in TensorBoard
    sum_loss_train = tf.summary.scalar('loss_train', loss_fn)
    sum_loss_test = tf.summary.scalar('loss_test', loss_fn)
    sum_acc_train = tf.summary.scalar('acc_train', accuracy)
    sum_acc_test = tf.summary.scalar('acc_test', accuracy)
    tf.summary.merge_all()
    writer = tf.summary.FileWriter(os.path.join(__TENSOR_LOG_DIR, model_fn.__name__), session.graph)

    # Start training for a certain number of iterations
    for i in range(iters):
        # Every iteration we get a random batch of the training data
        x_batch, y_batch = get_batch(X_train, Y_train, train_batch_size)
        # We train the model by providing the 'optimizer' variable to the run function.
        # We also want to calculate the accuracy and loss TensorBoard metrics
        loss_val, _, acc_val, sum_1, sum_2 = session.run([loss_fn, optimizer, accuracy, 
                                                          sum_loss_train, sum_acc_train], 
                                                         feed_dict={x: x_batch, y: y_batch})
        # Write the metrics to TensorBoard
        writer.add_summary(sum_1, global_step=i)
        writer.add_summary(sum_2, global_step=i)

        # Evaluate on the test set every 50 iterations
        if i % 50 == 0:
            # DO NOT PROVIDE THE 'optimizer' VARIABLE HERE,
            # OTHERWISE THE MODEL WILL TRAIN ON THE TEST DATA
            acc_val, sum_1, sum_2 = session.run([accuracy, sum_loss_test, sum_acc_test], 
                                                feed_dict={x: X_test, y: Y_test})
            # Write the metrics to TensorBoard
            writer.add_summary(sum_1, global_step=i)
            writer.add_summary(sum_2, global_step=i)
            print('Testing - i:', i+1, ' Accuracy:', acc_val)
    

    # Validate the model with unseen data
    # (pass the tensor itself, not a list, so a scalar is returned)
    acc_val = session.run(accuracy, feed_dict={x: X_valid, y: Y_valid})
    print('Validation accuracy:', acc_val)

## Exporting & Importing

In [None]:
save_path = '{}.ckpt'.format(os.path.join(__MODEL_PATH, model_fn.__name__, model_fn.__name__))

model_to_load = model_5_2

load_path = '{}.ckpt'.format(os.path.join(__MODEL_PATH, model_to_load.__name__, model_to_load.__name__))

We save the model that worked best:

In [None]:
# with session:
#     tf.train.Saver().save(session, save_path)

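We load the model that worked best. The graph built above must use the same model function as the checkpoint (`model_to_load`), otherwise the saved variables will not match the graph: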

In [None]:
with tf.Session() as saved_session:
    tf.train.Saver().restore(saved_session, load_path)

    # Validate the model with unseen data
    # (pass the tensor itself, not a list, so a scalar is returned)
    acc_val = saved_session.run(accuracy, feed_dict={x: X_valid, y: Y_valid})

    # Print test metrics
    print('Accuracy:', acc_val)

## Conclusion