# Predicting Clash Royale Battles
Clash Royale is a game of fast-paced 1v1 battles. If a model is trained on a dataset of each player's pre-battle information, such as their deck and trophy count, can it predict the outcome of a battle before the battle happens?

[Dataset collected by me.](https://www.kaggle.com/nonrice/clash-royale-battles-upper-ladder-december-2021)

## Load Dataset

In [None]:
import pandas as pd
df = pd.read_csv("../data/data_ord.csv").iloc[:,1:]
df

## Encoding
Player decks are:
* unordered
* distinct (no card appears twice)
* a set of 8 cards, selected from a pool of 106 cards
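
The three properties above can be checked directly. This is a minimal sketch that assumes cards are identified by integer IDs 0-105, as in the ordinal dataset. The helper name `is_valid_deck` is hypothetical.

```python
NUM_CARDS = 106  # size of the card pool described above
DECK_SIZE = 8

def is_valid_deck(card_ids):
    """Return True if card_ids could be a legal deck under the rules above.

    Assumes cards are integer IDs in range(NUM_CARDS). Order is ignored;
    only which IDs are present matters.
    """
    ids = list(card_ids)
    return (
        len(ids) == DECK_SIZE
        and len(set(ids)) == DECK_SIZE  # distinct: no duplicates
        and all(0 <= c < NUM_CARDS for c in ids)
    )

print(is_valid_deck([3, 17, 25, 40, 58, 71, 88, 105]))  # True
print(is_valid_deck([3, 3, 25, 40, 58, 71, 88, 105]))   # False (duplicate card)
```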

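The dataset's current ordinal encoding stores each deck as 8 columns of integer card IDs. This **wrongly implies** that decks and cards are:
* ordered, so the same deck listed in a different order looks like a different deck
* possibly non-distinct, since nothing stops two columns from holding the same card
* numerically related, as if card 12 were "closer" to card 13 than to card 90

The data therefore needs a different encoding.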
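
The following small illustration makes the first and third points concrete. The card IDs are made up and are not taken from the dataset. Two orderings of the same deck produce different ordinal rows, and the integer IDs imply distances that have no meaning in the game.

```python
# Two orderings of the same (made-up) deck, stored as ordinal rows
deck_a = [3, 17, 25, 40, 58, 71, 88, 105]
deck_b = [105, 88, 71, 58, 40, 25, 17, 3]

# As ordered rows they differ, so a model sees two different inputs...
print(deck_a == deck_b)            # False
# ...even though they describe the same deck.
print(set(deck_a) == set(deck_b))  # True

# The ordinal IDs also imply that card 3 is "closer" to card 17 than to
# card 105, which says nothing about how the cards actually play.
print(abs(3 - 17), abs(3 - 105))   # 14 102
```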

A natural way to do this is with one-hot encoding (strictly, multi-hot, because several columns are set at once). A deck can be fully described with 106 columns, one per card in the game. Exactly 8 of these columns (one per card in the deck) hold `1` to mark that the card is used, and the rest hold `0`. **This encoding preserves the unordered nature of a deck and does not treat cards as numerically related.** To describe both decks in each sample, `106*2=212` columns are used.
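
Below is a minimal plain-Python sketch of the 212-column layout described above. It assumes card IDs 0-105, puts player 1's deck in columns 0-105, and puts player 2's deck in columns 106-211. The function name `encode_matchup` is hypothetical.

```python
NUM_CARDS = 106

def encode_matchup(p1_deck, p2_deck):
    """Encode two 8-card decks as one 212-long 0/1 vector.

    Player 1's card c sets column c; player 2's card c sets column c + 106.
    """
    vec = [0] * (2 * NUM_CARDS)
    for c in p1_deck:
        vec[c] = 1
    for c in p2_deck:
        vec[c + NUM_CARDS] = 1
    return vec

p1 = [3, 17, 25, 40, 58, 71, 88, 105]
p2 = [0, 9, 17, 33, 50, 64, 77, 99]
v = encode_matchup(p1, p2)
print(len(v), sum(v))                     # 212 16
print(encode_matchup(p1[::-1], p2) == v)  # True: card order does not matter
```

Card 17 appears in both decks here. It still occupies two separate columns (17 and 123), so the model can tell which player is using it.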

In [None]:
# Offset player 2's card IDs by 106 so their cards map to columns 106-211,
# right after player 1's columns 0-105.
# Note: this modifies df in place, so run this cell only once.
for c in range(1, 9):
    df[f"p2card{c}"] += 106

In [None]:
# Generate Columns
cardlist = pd.read_csv("/kaggle/input/clash-royale-battles-upper-ladder-december-2021/cardlist.csv").iloc[:,1:]
cards = cardlist["card"]
columns = []
for i in range(1, 3):
    for card in cards:
        columns.append(card + str(i))
        
columns.append("trophies1")
columns.append("trophies2")
columns.append("outcome")
        
len(columns) # Should equal 215: 106*2 cards, 2 trophy values, 1 outcome value

In [None]:
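def enc_row(row_ord: tuple) -> list:
    """Encode one row from df.itertuples() as a 215-long list.

    row_ord[0] is the DataFrame index, so the 16 card IDs sit at
    positions 1-16 (player 1: 0-105, player 2: 106-211 after the offset).
    """
    row_enc = [0] * 215
    for c in range(1, 17):
        # An ID of 212-214 would silently overwrite a trophy/outcome slot, so fail loudly
        if not 0 <= row_ord[c] <= 211:
            raise ValueError(f"Card ID out of range: {row_ord[c]}")
        row_enc[row_ord[c]] = 1
    # Check that this order matches the "trophies1"/"trophies2" labels in `columns`
    row_enc[212] = row_ord[17] # p2trophies
    row_enc[213] = row_ord[18] # p1trophies
    row_enc[214] = row_ord[19] # outcome

    return row_enc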
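
`enc_row` reads card IDs from positions 1-16 because `DataFrame.itertuples()` puts the row's index label in position 0 by default. Passing `index=False` drops that field. The made-up frame below only shows the tuple layout:

```python
import pandas as pd

# Made-up two-column frame, only to show the tuple layout
demo = pd.DataFrame({"p1card1": [12, 40], "p2card1": [7, 99]})

for row in demo.itertuples():
    # row[0] is the index label; the columns start at row[1]
    print(row[0], row[1], row[2])
# 0 12 7
# 1 40 99
```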

In [None]:
# make the encoded dataframe

rows = []
for row in df.itertuples():
    rows.append(enc_row(row))
    
df_enc = pd.DataFrame(data=rows, columns=columns)
df_enc
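
Looping over `itertuples()` works at this scale, but NumPy fancy indexing can build the same table without a Python loop. This sketch assumes the layout that `enc_row` relies on: the raw frame's first 16 columns are card IDs in 0-211, with player 2's already offset by 106, and the next three columns are the two trophy values and the outcome. It also assumes all of these values are integers. The name `encode_frame` is hypothetical.

```python
import numpy as np
import pandas as pd

def encode_frame(df_ord: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Vectorized equivalent of applying enc_row to every row.

    Assumes df_ord's first 16 columns are integer card IDs in 0-211 and the
    next three are the two trophy values and the outcome (all integers).
    """
    card_ids = df_ord.iloc[:, 0:16].to_numpy(dtype=np.int64)
    n = len(df_ord)
    out = np.zeros((n, 215), dtype=np.int64)
    # rows has shape (n, 1) and broadcasts against card_ids (n, 16),
    # so every (row, card_id) pair gets a 1
    rows = np.arange(n)[:, None]
    out[rows, card_ids] = 1
    # Copy trophies and outcome into the last three columns
    out[:, 212:215] = df_ord.iloc[:, 16:19].to_numpy()
    return pd.DataFrame(out, columns=columns)
```

Usage would be `df_enc = encode_frame(df, columns)`, run after the cell that offsets player 2's card IDs.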

## Preprocessing


In [None]:
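# Split data + Preprocessing

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X = df_enc.iloc[:, 0:214]  # 212 card columns + 2 trophy columns
y = df_enc.iloc[:, 214]    # outcome label

# Split first, then fit the scaler on the training set only,
# so no statistics from the test set leak into training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)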
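
Min-max scaling maps each column to x' = (x - min) / (max - min). Assume each card is both used and unused somewhere in the data, so that min = 0 and max = 1 for its column. Under that assumption the scaling is the identity for the 0/1 card columns, and in practice only the two trophy columns change. Here is a plain-Python sketch with made-up trophy values:

```python
def min_max(column):
    """Scale a list of numbers to [0, 1], as MinMaxScaler does per column.

    Assumes the column is not constant (max > min).
    """
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

print(min_max([0, 1, 1, 0]))       # [0.0, 1.0, 1.0, 0.0]: binary columns are unchanged
print(min_max([5800, 6000, 6200]))  # [0.0, 0.5, 1.0]: made-up trophy counts
```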

## Neural Network
After experimenting with various layer sizes, I settled on a "funnel"-shaped network with L1 regularization. The narrowing layers are meant to force the network to compress its 214 inputs into smaller and smaller representations, which should push it toward more abstract relationships. L1 regularization is the more aggressive of the two common penalties, tending to drive unhelpful weights all the way to zero, so it filters out less important features.
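
The sketch below puts numbers on the funnel. First, it counts the weights and biases of each `Dense` layer (inputs × units + units) for the layer sizes used in the next cell. Second, it computes the penalty that Keras's `L1(0.0001)` regularizer adds to the loss for a single weight matrix, which is 0.0001 × the sum of the absolute weights. The weight values are made up.

```python
import numpy as np

def dense_param_counts(input_dim, units):
    """Weights + biases for a stack of fully connected layers."""
    counts = []
    prev = input_dim
    for u in units:
        counts.append(prev * u + u)
        prev = u
    return counts

sizes = [144, 72, 36, 36, 36, 36, 1]  # the funnel used in the model below
counts = dense_param_counts(214, sizes)
print(counts)       # [30960, 10440, 2628, 1332, 1332, 1332, 37]
print(sum(counts))  # 48061

def l1_penalty(weights, l1=0.0001):
    """L1 penalty for one weight matrix: l1 * sum(|w|)."""
    return l1 * np.sum(np.abs(weights))

w = np.array([[0.5, -0.25], [0.0, 1.0]])  # made-up weights
print(l1_penalty(w))
```

The first layer holds almost two thirds of the parameters, so that is where the L1 penalty has the most weights to prune.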

In [None]:
# Create Network

import os

# Set before importing TensorFlow so the log level reliably takes effect
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.regularizers import L1
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Input(shape=(214,)))  # 212 card columns + 2 trophy columns
# Funnel: 144 -> 72 -> 36, followed by three more 36-unit layers
for units in (144, 72, 36, 36, 36, 36):
    model.add(Dense(units, activation="relu", kernel_regularizer=L1(0.0001), kernel_initializer="he_uniform"))
# Single sigmoid unit: predicted probability that outcome == 1
model.add(Dense(1, activation="sigmoid", kernel_regularizer=L1(0.0001)))
model.summary()

model.compile(
    optimizer=Adam(learning_rate=0.00025),
    loss="binary_crossentropy",
    metrics=["accuracy"]
)
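
`binary_crossentropy` pairs with the single sigmoid output. For a true label y ∈ {0, 1} and predicted probability p, the per-sample loss is -(y·log p + (1 - y)·log(1 - p)), and the reported value is the average over the batch. The sketch below implements that formula. It omits the clipping Keras applies to keep p slightly away from 0 and 1, which avoids log(0).

```python
import math

def binary_crossentropy(y_true, p_pred):
    """Mean binary cross-entropy over paired labels and probabilities.

    Assumes every p is strictly between 0 and 1.
    """
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for y, p in zip(y_true, p_pred)
    ]
    return sum(losses) / len(losses)

print(round(binary_crossentropy([1], [0.5]), 4))          # 0.6931: a coin-flip guess
print(round(binary_crossentropy([1, 0], [0.9, 0.1]), 4))  # 0.1054: confident and correct
```

A constant prediction of 0.5 always scores ln 2 ≈ 0.693. A validation loss stuck near that value therefore means the model is doing little better than guessing.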

In [None]:
# Training

from tensorflow.keras.callbacks import EarlyStopping
from tqdm.keras import TqdmCallback

epochs_hist = model.fit(
    X_train,
    y_train,
    shuffle=True,
    epochs=128,
    batch_size=196,
    verbose=0,
    validation_split=0.2,
    callbacks=[TqdmCallback(), EarlyStopping(monitor="val_loss", patience=10)]
)
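
With `patience=10`, training stops once `val_loss` has gone 10 consecutive epochs without improving on its best value. Below is a simplified sketch of that rule. It is not Keras's implementation and ignores options such as `min_delta` and `baseline`. Also, because `restore_best_weights` defaults to `False`, the model keeps the weights from the last epoch rather than from the best one.

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the (0-based) epoch at which a simple patience rule stops training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # new best: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # ran out of epochs without stopping early

print(early_stop_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=3))  # 4
```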

## Training Results

In [None]:
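# Plot loss

from matplotlib import pyplot as plt

plt.figure(dpi=100)
plt.plot(epochs_hist.history["loss"], label="Training Loss")
plt.plot(epochs_hist.history["val_loss"], label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()  # display the labels passed to plt.plot
plt.show()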

In [None]:
# Confusion matrix on held-out test data

from sklearn.metrics import confusion_matrix

y_prob = model.predict(X_test).ravel()  # predicted probability that outcome == 1
y_pred = (y_prob >= 0.5).astype(int)    # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("True Negative:", tn)
print("False Positive:", fp)
print("False Negative:", fn)
print("True Positive:", tp)
print("Final Accuracy:", (tp+tn)/(tp+tn+fp+fn))
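
Accuracy is easier to judge against a baseline. This sketch assumes the `tn, fp, fn, tp` counts from the cell above. A model that always predicts the more common outcome in the test set scores max(positives, negatives) / total. Precision and recall break the errors down further. The counts in the example are made up and are not results from this notebook.

```python
def summarize(tn, fp, fn, tp):
    """Accuracy, majority-class baseline, precision and recall from confusion counts."""
    total = tn + fp + fn + tp
    positives = tp + fn  # samples whose true label is 1
    negatives = tn + fp  # samples whose true label is 0
    return {
        "accuracy": (tp + tn) / total,
        "majority_baseline": max(positives, negatives) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

print(summarize(tn=40, fp=20, fn=25, tp=35))  # made-up counts
```

In the notebook itself this would be called as `summarize(tn, fp, fn, tp)`. Accuracy only means something relative to `majority_baseline`: a model at 0.52 on a 50/50 test set is doing real work, while the same score on a 60/40 set is worse than always guessing the majority outcome.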

## Conclusions
This was one of my first machine learning projects. Even so, scraping my own data and developing a complex model was a magical experience. Granted, the predictive accuracy I achieved is nothing to gape at, but I don't take that too hard, because so many factors in gaming are not, and could not be, reflected in my dataset. That is all speculation, though. Maybe one day, a few years from now, I will look back at this early project and think *wow... I could have done so much better.*