# Challenge 1 - Tic Tac Toe

In this lab you will apply deep learning to a dataset of [Tic Tac Toe](https://en.wikipedia.org/wiki/Tic-tac-toe) board configurations.

A Tic Tac Toe board has 9 cells, which are coded as the following picture shows:

![Tic Tac Toe Grids](tttboard.jpg)

The first 9 columns of the dataset tell you which mark (`x` or `o`) occupies each cell. An empty cell is labeled `b` (blank). The last column, `class`, tells you whether Player X (who always moves first in Tic Tac Toe) wins in this configuration. Note that a `class` value of `False` means that either Player O won the game or it ended in a draw.
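
To make the meaning of `class` concrete, here is a minimal sketch that checks whether X has three in a row on a board given as 9 marks. It assumes the cells are ordered like the dataset columns (`TL, TM, TR, ML, MM, MR, BL, BM, BR`); the names `WIN_LINES` and `x_wins` are illustrative and not part of the lab.

```python
# Cell order assumed from the dataset header: TL, TM, TR, ML, MM, MR, BL, BM, BR
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def x_wins(board):
    """Return True if 'x' occupies all three cells of any winning line."""
    return any(all(board[i] == "x" for i in line) for line in WIN_LINES)


# First row of the dataset preview (x,x,x,x,o,o,x,o,o): X holds the top row
print(x_wins(list("xxxxooxoo")))  # True

# A hypothetical board where O holds the top row, so X does not win
print(x_wins(list("oooxxbxbb")))  # False
```

A rule like this is what the neural network has to learn implicitly from the encoded columns.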

Follow the steps suggested below to conduct a neural network analysis using TensorFlow and Keras. You will build a deep learning model to predict whether Player X wins the game or not.

## Step 1: Data Engineering

This dataset is almost ready to use, so you do not need to worry about missing values and similar issues. Still, some simple data engineering is needed.

1. Read `tic-tac-toe.csv` into a dataframe.
1. Inspect the dataset. Determine if the dataset is reliable by eyeballing the data.
1. Convert the categorical values to numeric in all columns.
1. Separate the inputs and output.
1. Normalize the input data.
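
One possible way to do the categorical-to-numeric conversion is one-hot encoding with `pd.get_dummies`. The sketch below uses a tiny hypothetical frame (`toy`, with only two board columns) to show what `drop_first=True` does: the alphabetically first category, `b`, is dropped and becomes the baseline, so an empty cell is encoded as all zeros.

```python
import pandas as pd

# Hypothetical miniature of the dataset: two board cells plus the target
toy = pd.DataFrame(
    {
        "TL": ["x", "o", "b"],
        "MM": ["o", "b", "x"],
        "class": [True, False, True],
    }
)

# One-hot encode the board columns; drop_first removes the 'b' column of each cell
encoded = pd.get_dummies(toy, columns=["TL", "MM"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each cell ends up as two 0/1 columns (`_o` and `_x`), so the full board would yield 18 input features.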

In [13]:
# your code here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('tic-tac-toe.csv')
display(df.head())
display(df.info())
df['class'].value_counts()

Unnamed: 0,TL,TM,TR,ML,MM,MR,BL,BM,BR,class
0,x,x,x,x,o,o,x,o,o,True
1,x,x,x,x,o,o,o,x,o,True
2,x,x,x,x,o,o,o,o,x,True
3,x,x,x,x,o,o,o,b,b,True
4,x,x,x,x,o,o,b,o,b,True


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 958 entries, 0 to 957
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   TL      958 non-null    object
 1   TM      958 non-null    object
 2   TR      958 non-null    object
 3   ML      958 non-null    object
 4   MM      958 non-null    object
 5   MR      958 non-null    object
 6   BL      958 non-null    object
 7   BM      958 non-null    object
 8   BR      958 non-null    object
 9   class   958 non-null    bool  
dtypes: bool(1), object(9)
memory usage: 68.4+ KB


None

class
True     626
False    332
Name: count, dtype: int64

In [14]:
# convert the categorical data to numerical data
df = pd.get_dummies(df, drop_first=True, dtype=int)
display(df.head())
display(df['class'].value_counts())
df.info()

Unnamed: 0,class,TL_o,TL_x,TM_o,TM_x,TR_o,TR_x,ML_o,ML_x,MM_o,MM_x,MR_o,MR_x,BL_o,BL_x,BM_o,BM_x,BR_o,BR_x
0,True,0,1,0,1,0,1,0,1,1,0,1,0,0,1,1,0,1,0
1,True,0,1,0,1,0,1,0,1,1,0,1,0,1,0,0,1,1,0
2,True,0,1,0,1,0,1,0,1,1,0,1,0,1,0,1,0,0,1
3,True,0,1,0,1,0,1,0,1,1,0,1,0,1,0,0,0,0,0
4,True,0,1,0,1,0,1,0,1,1,0,1,0,0,0,1,0,0,0


class
True     626
False    332
Name: count, dtype: int64

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 958 entries, 0 to 957
Data columns (total 19 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   class   958 non-null    bool 
 1   TL_o    958 non-null    int32
 2   TL_x    958 non-null    int32
 3   TM_o    958 non-null    int32
 4   TM_x    958 non-null    int32
 5   TR_o    958 non-null    int32
 6   TR_x    958 non-null    int32
 7   ML_o    958 non-null    int32
 8   ML_x    958 non-null    int32
 9   MM_o    958 non-null    int32
 10  MM_x    958 non-null    int32
 11  MR_o    958 non-null    int32
 12  MR_x    958 non-null    int32
 13  BL_o    958 non-null    int32
 14  BL_x    958 non-null    int32
 15  BM_o    958 non-null    int32
 16  BM_x    958 non-null    int32
 17  BR_o    958 non-null    int32
 18  BR_x    958 non-null    int32
dtypes: bool(1), int32(18)
memory usage: 68.4 KB


In [15]:
# # convert categorical to numerical
# df = df.replace('x', 1)
# df = df.replace('o', -1)
# df = df.replace('b', 0)

# # convert target to numerical
# df = df.replace(True, 1)
# df = df.replace(False, 0)

# display(df.head())


In [44]:
# import scaler
from sklearn.preprocessing import StandardScaler

inputs = df.drop('class', axis=1)
target = df['class']

# Scaling is deferred until after the train/test split: the StandardScaler is fit
# on the training set only and then applied to the test set, to avoid data leakage.
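
The idea behind scaling after the split can be sketched with plain NumPy (used here instead of `StandardScaler` so the arithmetic is visible). The data is randomly generated and purely hypothetical; the point is that the mean and standard deviation come from the training rows only and are then reused for the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10, 4)).astype(float)  # hypothetical 0/1 features
X_train, X_test = X[:8], X[8:]

# Statistics are computed on the training rows only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0  # guard against division by zero for constant columns

# The same training statistics are applied to both sets
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 for every column
```

If the statistics were computed on all rows, information about the test set would influence the transformed training data.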

## Step 2: Build Neural Network

To build the neural network, you can refer to the code you wrote while following the [Deep Learning with Python, TensorFlow, and Keras tutorial](https://www.youtube.com/watch?v=wQ8BIBpya2k) in the lesson. It is very similar to what you will be doing in this lab.

1. Split the training and test data.
1. Create a `Sequential` model.
1. Add several layers to your model. Make sure you use ReLU as the activation function for the hidden layers. Use Softmax for the output layer because each sample has a single label and all the label probabilities add up to 1.
1. Compile the model using `adam` as the optimizer and `sparse_categorical_crossentropy` as the loss function. For metrics, use `accuracy` for now.
1. Fit the training data.
1. Evaluate your neural network model with the test data.
1. Save your model as `tic-tac-toe.model`.
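
To see why Softmax pairs naturally with `sparse_categorical_crossentropy`, here is a simplified NumPy sketch of both. It is an illustration under assumptions, not the Keras implementation (which also clips probabilities for numerical safety); the logits and labels are made up.

```python
import numpy as np


def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(y_true, probs):
    """Mean negative log-probability assigned to the true integer label."""
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())


# Hypothetical outputs of the last Dense layer for three samples, two classes
logits = np.array([[2.0, -1.0], [0.0, 0.0], [-0.5, 1.5]])
labels = np.array([0, 1, 1])  # integer class labels, as the "sparse" loss expects

probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
print(sparse_categorical_crossentropy(labels, probs))
```

The loss is small when the probability assigned to the correct class is close to 1 and grows as that probability shrinks.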

In [24]:
# !pip install tensorflow

In [166]:
# your code here
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split

num_classes = 2

# split data
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2, random_state=42)

# standardize the features: fit the scaler on the training set only
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=inputs.columns)

# apply the training-set statistics to the test set (no refitting, to avoid leakage)
X_test = pd.DataFrame(scaler.transform(X_test), columns=inputs.columns)

# create model
model = Sequential()

model.add(Input(shape=(X_train.shape[1],)))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

adam = Adam(learning_rate=0.01)
model.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.summary()

In [43]:
print(y_train.shape)
print(y_test.shape)
print(y_train)

# y_train

(766,)
(192,)
302     True
467     True
294     True
548     True
465     True
       ...  
106     True
270     True
860    False
435     True
102     True
Name: class, Length: 766, dtype: bool


In [167]:
fitting = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test), verbose=1)

Epoch 1/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 2s 13ms/step - accuracy: 0.6442 - loss: 0.6425 - val_accuracy: 0.6875 - val_loss: 0.5633
Epoch 2/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6863 - loss: 0.5707 - val_accuracy: 0.7552 - val_loss: 0.5129
Epoch 3/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7844 - loss: 0.4753 - val_accuracy: 0.7969 - val_loss: 0.4275
Epoch 4/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8330 - loss: 0.3668 - val_accuracy: 0.8646 - val_loss: 0.3530
Epoch 5/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8684 - loss: 0.3248 - val_accuracy: 0.9062 - val_loss: 0.2629
Epoch 6/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9029 - loss: 0.2685 - val_accuracy: 0.9427 - val_loss: 0.1792
Epoch 7/10
24/24 ━━━━━━━━━

In [168]:
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test Loss: {test_loss:.4f}")

6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9726 - loss: 0.1106
Test Accuracy: 0.9740
Test Loss: 0.1082


In [46]:
# save the model (the .keras extension is used here instead of tic-tac-toe.model)
model.save('tic-tac-toe.keras')

## Step 3: Make Predictions

Now load your saved model and use it to make predictions on a few random rows in the test dataset. Check if the predictions are correct.
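
A softmax model returns one probability per class for every row, so the probabilities have to be turned into labels before they can be compared with the true values. The sketch below uses made-up probabilities (not real model output) and assumes column 1 holds the probability of `class == True`; for two classes, thresholding that column at 0.5 and taking the `argmax` give the same labels, apart from exact ties.

```python
import numpy as np

# Hypothetical softmax outputs: column 0 = P(False), column 1 = P(True)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]])
y_true = np.array([False, True, False])

# Two equivalent ways to get a predicted label for a two-class problem
by_threshold = probs[:, 1] > 0.5
by_argmax = probs.argmax(axis=1).astype(bool)

# Fraction of rows where the prediction matches the true label
accuracy = (by_argmax == y_true).mean()
print(by_threshold, by_argmax, accuracy)
```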

In [None]:
# delete the in-memory model so the next cell has to load it from disk
del model

In [139]:
# load model
model = keras.models.load_model('tic-tac-toe.keras')

X_sample = X_test.sample(20)
y_sample = y_test.values[X_sample.index]

# make prediction on a few random rows from the test dataset
predictions = model.predict(X_sample)

# Check if predictions are correct
results = pd.DataFrame({'Predictions': predictions[:,1]>0.5, 'Real values': y_sample})
results['Correct'] = results['Predictions'] == results['Real values']
results['Correct'] = results['Correct'].replace({True: 'Yes', False: 'No'})

display(results)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 67ms/step


Unnamed: 0,Predictions,Real values,Correct
0,False,False,Yes
1,True,True,Yes
2,False,False,Yes
3,False,False,Yes
4,False,False,Yes
5,True,True,Yes
6,True,True,Yes
7,False,False,Yes
8,False,False,Yes
9,False,False,Yes


## Step 4: Improve Your Model

Did your model achieve low loss (<0.1) and high accuracy (>0.95)? If not, try to improve your model.

But how? There are many things you can tune in TensorFlow, and you will learn about them in the next challenge. In this challenge, let's just try a few things to see if they help.

* Add more layers to your model. If the data are complex you need more layers. But don't use more layers than you need. If adding more layers does not improve the model performance you don't need additional layers.
* Adjust the learning rate when you compile the model. This means you will create a custom `tf.keras.optimizers.Adam` instance where you specify the learning rate you want. Then pass the instance to `model.compile` as the optimizer.
    * `tf.keras.optimizers.Adam` [reference](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam).
    * Don't worry if you don't understand what the learning rate does. You'll learn about it in the next challenge.
* Adjust the number of epochs when you fit the training data to the model. Model performance usually keeps improving as you train for more epochs, but eventually it reaches a ceiling and stays the same.
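
As a rough intuition for the learning rate, the sketch below runs plain gradient descent (not Adam) on the toy function f(w) = w², whose minimum is at w = 0. The function `gradient_descent` and the three learning rates are illustrative choices, not values recommended for this lab.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 with a fixed learning rate and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2
        w -= learning_rate * gradient
    return w


for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.6f}")
```

With the small rate, `w` moves toward 0 only slowly; with the moderate rate it gets there quickly; with the large rate every step overshoots and `w` grows instead of shrinking. Real networks are far less tidy, but the same trade-off between slow progress and instability applies.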

In [189]:
# your code here
# create model
model2 = Sequential()

model2.add(Input(shape=(X_train.shape[1],)))
model2.add(Dense(8, activation='relu'))
model2.add(Dropout(0.2))
model2.add(Dense(16, activation='relu'))
model2.add(Dropout(0.2))
model2.add(Dense(32, activation='relu'))
model2.add(Dropout(0.2))
model2.add(Dense(num_classes, activation='softmax'))

adam = Adam(learning_rate=0.01)
model2.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model2.summary()

In [190]:
fitting = model2.fit(X_train, y_train, epochs=16, batch_size=32, validation_data=(X_test, y_test), verbose=1)

Epoch 1/16
24/24 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - accuracy: 0.6281 - loss: 0.6574 - val_accuracy: 0.7500 - val_loss: 0.5191
Epoch 2/16
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7071 - loss: 0.5464 - val_accuracy: 0.7812 - val_loss: 0.4497
Epoch 3/16
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7489 - loss: 0.4660 - val_accuracy: 0.8229 - val_loss: 0.3783
Epoch 4/16
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7834 - loss: 0.4032 - val_accuracy: 0.9219 - val_loss: 0.2646
Epoch 5/16
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8542 - loss: 0.3380 - val_accuracy: 0.9479 - val_loss: 0.1922
Epoch 6/16
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8895 - loss: 0.2578 - val_accuracy: 0.9688 - val_loss: 0.1146
Epoch 7/16
24/24 ━━━━━━━━━

In [191]:
test_loss2, test_accuracy2 = model2.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy2:.4f}")
print(f"Test Loss: {test_loss2:.4f}")

6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9609 - loss: 0.1273
Test Accuracy: 0.9688
Test Loss: 0.0909


**Which approach(es) did you find helpful to improve your model performance?**

> In the first iteration of my sequential model I used **16** and **32** neurons in the hidden layers and a dropout of 0.2. I also used 10 epochs and a learning rate of 0.001. The model achieved a **test loss of 0.145** and a **test accuracy of 0.97**.

> Since everything worked well in the first iteration, I changed the parameters in the second one to see how the results would differ. I used 2 hidden layers with 8 and 16 neurons and a dropout of 0.2. I also used 4 epochs and a learning rate of 0.11. The model achieved a **test loss of 0.25453** and a **test accuracy of 0.8906**.

> For the third iteration I kept everything the same and only changed the number of epochs from 4 to 16. The results were a **test loss of 0.2418** and a **test accuracy of 0.9531**.

> For the fourth iteration I changed the learning rate back to 0.001. The test accuracy was 0.7396 and the test loss was 0.5432, noticeably worse than in the third iteration.

> For the fifth iteration I used 1 hidden layer with 16 neurons and a dropout of 0.2. I also used 2 epochs and a learning rate of 0.11. The model achieved a **test accuracy of 0.8594** and a **test loss of 0.3559**.

> For the 6th iteration I used two hidden layers, one with 32 neurons and the other with 64 neurons, with dropouts of 0.2, **learning rate of 0.1**, and 16 epochs. The model achieved a **test accuracy of 0.9635** and a **test loss of 0.1374**.

> 7th iteration: same as 6th, but with a learning rate of 0.01. The model achieved a **test accuracy of 0.9688** and a **test loss of 0.0666**.

> For the 8th iteration I used 3 hidden layers with 8, 16 and 32 neurons, dropouts of 0.2, a **learning rate of 0.01**, and 16 epochs. The model achieved a **test accuracy of 0.9688** and a **test loss of 0.0909**.


Summarizing in a table:

| Iteration | Hidden Layers | Neurons   | Dropout | Learning Rate | Epochs | Test Loss | Test Accuracy |
|-----------|---------------|-----------|---------|---------------|--------|-----------|---------------|
| 1         | 2             | 16, 32    | 0.2     | 0.001         | 10     | 0.145     | 0.97          |
| 2         | 2             | 8, 16     | 0.2     | 0.11          | 4      | 0.25453   | 0.8906        |
| 3         | 2             | 8, 16     | 0.2     | 0.11          | 16     | 0.2418    | 0.9531        |
| 4         | 2             | 8, 16     | 0.2     | 0.001         | 16     | 0.5432    | 0.7396        |
| 5         | 1             | 16        | 0.2     | 0.11          | 2      | 0.3559    | 0.8594        |
| 6         | 2             | 32, 64    | 0.2     | 0.1           | 16     | 0.1374    | 0.9635        |
| 7         | 2             | 32, 64    | 0.2     | 0.01          | 16     | 0.0666    | 0.9688        |
| 8         | 3             | 8, 16, 32 | 0.2     | 0.01          | 16     | 0.0909    | 0.9688        |




> Observing the changes, there is no single way to improve results, but many. Between iterations 2 and 3, changing only the epochs from 4 to 16 improved the results. Between iterations 3 and 4, changing only the learning rate (making it smaller) made a big difference; the accuracy dropped from 0.95 to 0.74.

> Between iterations 6 and 7, the learning rate was also the only thing changed, yet the variation in results was not as big as between iterations 3 and 4. It looks like the learning rate has a larger effect when there are fewer neurons, but I cannot generalize since I only tested it in this case.

> Between iterations 7 and 8, another hidden layer was added and the layer sizes changed, while everything else was kept the same. The test accuracy was the same (0.9688), but the test loss was higher in iteration 8 (0.0909 versus 0.0666).

> In conclusion, neural networks are very complex and there are many ways to improve results. It is important to test different parameters and see how they affect the results.