# Rocket League Notebook 5: Neural Networks

## Goals 

- Create models using neural networks

## Contents

- (I) First Neural Network (Kitchen Sink Approach)
    - Matches is grouped by match_id and aggregated by mean
    - Use VarianceThreshold with default settings
    - Use StandardScaler with default settings on all columns
    - Run MLPClassifier with hidden_layer_sizes set to (10,10), activation set to tanh, and max_iter set to 1000
- (II) Neural Network: Kitchen Sink without Aggregation
    - No prior aggregation of training set
    - Use VarianceThreshold with default settings
    - Use StandardScaler with default settings on all columns
    - Run MLPClassifier with hidden_layer_sizes set to (10,10), activation set to tanh, and max_iter set to 1000
- (III) Neural Network Kitchen Sink with Differences
    - Matches is grouped by absolute values of differences between player (color) features
    - Use VarianceThreshold with default settings
    - Use StandardScaler with default settings on all columns
    - Run MLPClassifier with hidden_layer_sizes set to (10,10), activation set to tanh, and max_iter set to 1000
- (IV) Neural Network: Not Aggregated but with Cars
    - No prior aggregation of training set
    - OneHotEncode car_names, passthrough all other features
    - Use VarianceThreshold with default settings
    - Use StandardScaler with default settings on all columns
    - Run MLPClassifier with hidden_layer_sizes set to (10,10), activation set to tanh, and max_iter set to 1000
- (V) Neural Network: Not Aggregated, include cars and include engineered features
    - No prior aggregation of training set
    - Add the following features to the dataset:
        - score_per_second = lambda x: x['score']/x['duration'],
        - lowvhigh = lambda x: x['percent_low_air']/x['percent_high_air'],
        - percent_boost_50_100 = lambda x: x['percent_boost_50_75']+x['percent_boost_75_100'],
        - goals_saves_pm = lambda x: (x['goals']+x['saves'])*60/x['duration'],
        - save_prop = lambda x: x['saves']/x['shots_against']
    - OneHotEncode car_names, passthrough all other features
    - Use VarianceThreshold with default settings
    - Use StandardScaler with default settings on all columns
    - Run MLPClassifier with hidden_layer_sizes set to (10,10), activation set to tanh, and max_iter set to 1000
- (VI) Neural Network with Matches Wide, no cars
    - No prior aggregation of training set
    - Widen dataset so that each column has a _winner and _loser suffix and is all on the same line
    - Use VarianceThreshold with default settings
    - Use StandardScaler with default settings on all columns
    - Run MLPClassifier with hidden_layer_sizes set to (10,10), activation set to tanh, and max_iter set to 1000

## Results

- (I) First Neural Network (Kitchen Sink Approach)
    - Accuracy Score:  0.5537113265170628
    - Improvement over logistic regression without much extra effort. Sacrificed interpretability.
    - Yields *submission_2022-03-30_v1.csv*
- (II) Neural Network: Kitchen Sink without Aggregation
    - Accuracy Score:  0.49392470619480777
    - Worse than with aggregation
- (III) Neural Network Kitchen Sink with Differences
    - Accuracy Score:  0.34085778781038373
    - Much worse than aggregated data
- (IV) Neural Network: Not Aggregated but with Cars
    - Accuracy Score:  0.4967133656463714
    - Very slight improvement over simply not including cars
- (V) Neural Network: Not Aggregated, include cars and include engineered features
    - Accuracy score:  0.4932607396587212
    - No improvement when tacking on engineered features, though perhaps these could replace some raw features
- (VI) Neural Network with Matches Wide, no cars
    - Accuracy score:  0.48293719293586507
    - Slightly worse than unaggregated long with no cars. More opportunity of feature engineering here, though.

## Imports

In [35]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import VarianceThreshold
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline

## Read In

In [19]:
matches = pd.read_csv('../data/train.csv')
test = pd.read_csv('../data/test.csv')

## Converters and Functions

In [20]:
converter = { 'bronze': 1, 'silver': 2, 'gold': 3, 'platinum': 4, 'diamond': 5, 'champion': 6 }
catvars = ['rank', 'color', 'map_code', 'car_name']

In [21]:
def filter_outliers(df, col):
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lowbound = Q1-1.5*IQR
    highbound = Q3+1.5*IQR
    df_bounded = df[(df[col] >= lowbound) & (df[col] <= highbound)]

    return df_bounded

## Outlier Filter

In [22]:
numcols = matches.drop(columns = ['match_id', 'color', 'rank', 'map_code', 'car_name', 'mvp']).columns

In [23]:
matches_inliers = matches.copy()

for col in matches_inliers[numcols].columns:

    matches_inliers = filter_outliers(matches_inliers, col)

matches_inliers.shape

(22216, 91)

## (I) First Neural Network (Kitchen Sink Approach)

In [24]:
matches_prepped = matches.groupby(['match_id', 'rank']).mean().reset_index().fillna(0)

X = matches_prepped.drop(columns = ['match_id', 'rank'])
y = matches_prepped['rank']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify = y)

pipe = Pipeline(steps = [
        ('vt', VarianceThreshold()),
        ('scaler', StandardScaler()),
        ('neural', MLPClassifier(hidden_layer_sizes = (10,10),
                                 activation = 'tanh',
                                 max_iter = 1000))
    ])

pipe.fit(X_train, y_train)
accuracy_score(y_test, pipe.predict(X_test))

0.5537113265170628

In [25]:
print(classification_report(y_test, pipe.predict(X_test)))

              precision    recall  f1-score   support

      bronze       0.53      0.30      0.38       184
    champion       0.69      0.71      0.70      1470
     diamond       0.49      0.48      0.48      1729
        gold       0.55      0.57      0.56      1563
    platinum       0.51      0.54      0.52      1875
      silver       0.58      0.48      0.52       710

    accuracy                           0.55      7531
   macro avg       0.56      0.51      0.53      7531
weighted avg       0.55      0.55      0.55      7531



In [26]:
pipe.fit(X, y)

Pipeline(steps=[('vt', VarianceThreshold()), ('scaler', StandardScaler()),
                ('logreg',
                 MLPClassifier(activation='tanh', hidden_layer_sizes=(10, 10),
                               max_iter=1000))])

In [27]:
test_prep = test.groupby('match_id').mean().reset_index().fillna(0)
y_pred = pipe.predict(test_prep.drop(columns = 'match_id'))

In [28]:
submission = pd.DataFrame({'match_id':test_prep.index, 'rank': y_pred})
submission['rank'] = submission['rank'].map(converter)
submission['match_id'] = submission['match_id']+30121
#submission.to_csv('../submissions/submission_2022-03-30_v1.csv', index = False)

## (II) Neural Network: Kitchen Sink without Aggregation

Worse than aggregated data.

In [29]:
X = matches.drop(columns = catvars).fillna(0)
y = matches['rank']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify = y)

pipe = Pipeline(steps = [
        ('vt', VarianceThreshold()),
        ('scaler', StandardScaler()),
        ('neural', MLPClassifier(hidden_layer_sizes = (10,10),
                                 activation = 'tanh',
                                 max_iter = 1000))
    ])

pipe.fit(X_train, y_train)
accuracy_score(y_test, pipe.predict(X_test))

0.49392470619480777

## (III) Neural Network Kitchen Sink with Differences

Much worse than aggregated data.

In [30]:
match_diffs = (matches.drop(columns=catvars)
        .fillna(0)
        .groupby('match_id')
        .diff()
        .abs()
        .dropna()
        .reset_index(drop = True).reset_index()
        .rename(columns = {'index' : 'match_id'})
    )
match_diffs = match_diffs.merge(matches[['match_id', 'rank']].drop_duplicates())
X = match_diffs.drop(columns = 'rank')
y = match_diffs['rank']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify = y)

pipe = Pipeline(steps = [
        ('vt', VarianceThreshold()),
        ('scaler', StandardScaler()),
        ('neural', MLPClassifier(hidden_layer_sizes = (10,10),
                                 activation = 'tanh',
                                 max_iter = 1000))
    ])

pipe.fit(X_train, y_train)
accuracy_score(y_test, pipe.predict(X_test))

0.34085778781038373

## (IV) Neural Network: Not Aggregated but with Cars

In [32]:
dropcols = ['match_id', 'color', 'rank', 'map_code']
X = matches.drop(columns = dropcols).fillna(0)
y = matches['rank']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify = y)

ohe = OneHotEncoder(sparse = False, drop = 'first')

ct = ColumnTransformer(transformers= [
        ('ohe', ohe, ['car_name'])
        ],
        remainder = 'passthrough'
    )

pipe = Pipeline(steps = [
        ('ct', ct),
        ('vt', VarianceThreshold()),
        ('scaler', StandardScaler()),
        ('neural', MLPClassifier(hidden_layer_sizes = (10,10),
                                 activation = 'tanh',
                                 max_iter = 1000))
    ])

pipe.fit(X_train, y_train)
accuracy_score(y_test, pipe.predict(X_test))

0.4967133656463714

## (V) Neural Network: Not Aggregated, include cars and include engineered features

In [36]:
matches_plus = matches.assign(
            score_per_second = lambda x: x['score']/x['duration'],
            lowvhigh = lambda x: x['percent_low_air']/x['percent_high_air'],
            percent_boost_50_100 = lambda x: x['percent_boost_50_75']+x['percent_boost_75_100'],
            goals_saves_pm = lambda x: (x['goals']+x['saves'])*60/x['duration'],
            save_prop = lambda x: x['saves']/x['shots_against']
    ).replace([np.inf, -np.inf], 0).fillna(0)

X = matches_plus.drop(columns=dropcols)
y = matches_plus['rank']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify= y)

ohe = OneHotEncoder(sparse= False, drop='first')

ct = ColumnTransformer(transformers = [
            ('ohe', ohe, ['car_name'])
        ],
        remainder='passthrough'
    )

neural = MLPClassifier(hidden_layer_sizes=(10,10),
                      activation='tanh',
                      max_iter=1000)

pipe = Pipeline(steps = [
    ('ct', ct),
    ('vt', VarianceThreshold()),
    ('scaler', StandardScaler()),
    ('neural', neural)
])

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.4932607396587212


## (VI) Neural Network with Matches Wide, no cars

In [37]:
matches_win = matches.sort_values(['match_id', 'goals', 'score'], ascending=[True, False, False]).drop_duplicates(subset = ['match_id'], keep = 'first')
matches_lose = matches.sort_values(['match_id', 'goals', 'score'], ascending=[True, False, False]).drop_duplicates(subset = ['match_id'], keep = 'first')
matches_wide = matches_win.merge(matches_lose, on = ['match_id', 'rank', 'map_code'], suffixes=('_win','_lose'))

In [38]:
dropcols_wide = ['match_id', 'color_win', 'color_lose', 'rank', 'map_code', 'car_name_win', 'car_name_lose']
X = matches_wide.drop(columns = dropcols_wide).fillna(0)
y = matches_wide['rank']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify = y)

# ohe = OneHotEncoder(sparse = False, drop = 'first')

# ct = ColumnTransformer(transformers= [
#         ('ohe', ohe, ['car_name'])
#         ],
#         remainder = 'passthrough'
#     )

pipe = Pipeline(steps = [
        # ('ct', ct),
        ('vt', VarianceThreshold()),
        ('scaler', StandardScaler()),
        ('neural', MLPClassifier(hidden_layer_sizes = (10,10),
                                 activation = 'tanh',
                                 max_iter = 1000))
    ])

pipe.fit(X_train, y_train)
accuracy_score(y_test, pipe.predict(X_test))

0.48293719293586507