### League of Legends: Model Training
Standard Methodology:

1. Exploratory plots to get a sense of the data (e.g. relationships, distributions, etc.)
2. **Perform transformations (standardization, log transform, PCA, etc.)**
3. **Experiment with algorithms that make sense and with feature selection, and compare cross-validated performance. Algorithms to think about: tree-based methods, basis expansion, logistic regression, discriminant analysis, boosting, neural nets...**

4. **Run on test set**
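Step 3 can be sketched as a loop over candidate models, each scored with `cross_val_score` on the same folds. This is a minimal sketch under stated assumptions: the data is synthetic (`make_classification`) rather than the real match features, the model settings are illustrative rather than tuned, and ROC AUC is an assumed choice of metric. Scale-sensitive models get a `StandardScaler` inside a pipeline so that the scaling is refit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features and win/loss labels (hypothetical data)
x, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=0)

# A few of the algorithm families listed above; settings are illustrative defaults
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "lda": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated ROC AUC; with cv=5 and a classifier, every model sees
# the same (unshuffled, stratified) folds, so the scores are comparable
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, x, y, cv=5, scoring="roc_auc")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean AUC {fold_scores.mean():.3f} (std {fold_scores.std():.3f})")
```

Comparing the mean together with the fold-to-fold standard deviation helps judge whether a gap between two models is larger than the noise across folds.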

In [1]:
import pandas as pd
import numpy as np
import pickle
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import preprocessing
import math
from sklearn.decomposition import PCA
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import statsmodels.api as sm
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve

#Subclassing BaseEstimator provides the get_params and set_params methods.
#Subclassing TransformerMixin provides fit_transform, which calls fit and then transform; we supply our own fit and transform.
#Both keep our custom class consistent with existing sklearn classes (e.g. usable inside a Pipeline).
from sklearn.base import BaseEstimator, TransformerMixin
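To see what the two base classes contribute, here is a minimal sketch with a hypothetical `AddConstant` transformer (not part of this project): only `__init__`, `fit` and `transform` are written by hand, while `get_params`/`set_params` come from `BaseEstimator` and `fit_transform` comes from `TransformerMixin`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class AddConstant(BaseEstimator, TransformerMixin):
    """Hypothetical toy transformer: adds a fixed offset to every value."""

    def __init__(self, offset=1.0):
        # Stored under the same name as the __init__ argument,
        # which is what BaseEstimator.get_params relies on
        self.offset = offset

    def fit(self, X, y=None):
        # Nothing to learn; returning self allows method chaining
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) + self.offset


step = AddConstant(offset=2.0)
params = step.get_params()                      # inherited from BaseEstimator
step.set_params(offset=3.0)                     # inherited from BaseEstimator
result = step.fit_transform([[1, 2], [3, 4]])   # inherited from TransformerMixin
print(params, result)
```

Storing only hyperparameters in `__init__` is the convention that lets sklearn clone an estimator, for example once per fold during cross-validation.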

In [None]:
def read_pickle(path):
    #Load and return a pickled object; the with-block closes the file even if loading fails
    with open(path, 'rb') as input_file:
        return pickle.load(input_file)

In [None]:
x_train = '../data/x_train.pickle'
x_test = '../data/x_test.pickle'
y_train = '../data/y_train.pickle'
y_test = '../data/y_test.pickle'

x_train = read_pickle(x_train) 
x_test = read_pickle(x_test) 
y_train = read_pickle(y_train) 
y_test = read_pickle(y_test) 
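pandas also ships `pd.read_pickle`, which reads the same kind of file and could stand in for the custom `read_pickle` helper. The sketch below round-trips a small hypothetical frame (not the real `x_train`) through a temporary file and checks that both readers agree. As with any pickle, only load files from trusted sources.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for x_train
frame = pd.DataFrame({"gameid": [1, 2], "gamelength": [1800, 2100]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "x_train.pickle")
    frame.to_pickle(path)

    # pandas' built-in reader...
    via_pandas = pd.read_pickle(path)

    # ...and the plain pickle.load approach used by the read_pickle helper
    with open(path, "rb") as input_file:
        via_pickle = pickle.load(input_file)

print(via_pandas.equals(via_pickle))  # True
```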

### Transformations ###

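Let's begin by applying the transformations we deemed suitable during EDA. The log transform has to happen before standardization: standardized values can be negative, and the logarithm is only defined for positive numbers.
1. Remove the crit and crit per level variables
2. Log-transform the cs feature
3. Standardize the data
4. Create per level * gamelength variables
5. Perform PCA with 30 components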
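A small numeric check of why the log transform should precede standardization: standardizing centres a column at zero, so every value below the mean becomes negative and its logarithm is undefined (`nan`). The values are hypothetical stand-ins for `delta_total_cs`. Note also that `np.log` works element-wise on arrays and Series, whereas `math.log` only accepts a single number.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical positive creep-score values standing in for delta_total_cs
cs = np.array([[50.0], [120.0], [200.0], [400.0]])

# Standardizing first: values below the mean (192.5) become negative
standardized = StandardScaler().fit_transform(cs)
with np.errstate(invalid="ignore"):
    log_after = np.log(standardized)   # log of a negative number -> nan

# Logging first keeps every value finite; standardization can follow
log_first = StandardScaler().fit_transform(np.log(cs))

print(np.isnan(log_after).sum(), np.isnan(log_first).sum())  # 2 0
```

This sketch assumes the raw column is strictly positive; if it can be zero or negative, a plain log fails in either order and a different transform would be needed.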

In [3]:
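#Custom transformer
class FeatureEngineering(BaseEstimator, TransformerMixin):
    
    #Initiate class. x_df is kept as an optional first argument so existing calls still work;
    #the data itself is passed to fit and transform
    def __init__(self, x_df = None, pca_components = 30):
        self.x_df = x_df
        self.pca_components = pca_components
        
    #Steps that need no learned statistics. Assumes a 'gameid' column, a strictly positive
    #'delta_total_cs', and that every crit variable has 'crit' in its name
    def _prepare(self, x_df):
        x_df = x_df.set_index('gameid')
        
        #Remove crit and crit per level variables
        crit_columns = [feature for feature in x_df.columns if 'crit' in feature]
        x_df = x_df.drop(columns = crit_columns)
        
        #Log cs field before standardizing, since standardized values can be negative
        x_df['log_delta_total_cs'] = np.log(x_df['delta_total_cs'])
        return x_df.drop(columns = 'delta_total_cs')
    
    #Standardize with the fitted scaler, then create per_level * gamelength variables
    def _scale(self, x_df):
        x_df = pd.DataFrame(self.scaler_.transform(x_df), columns = x_df.columns, index = x_df.index)
        per_level = [feature for feature in x_df.columns if 'perlevel' in feature]
        for feature in per_level:
            x_df[feature + '_gamelength'] = x_df[feature] * x_df['gamelength']
        return x_df
    
    #Learn the scaling and the PCA basis from the training data only
    def fit(self, x_df, y = None):
        features = self._prepare(x_df)
        self.scaler_ = preprocessing.StandardScaler().fit(features)
        self.pca_ = PCA(n_components = self.pca_components).fit(self._scale(features).values)
        return self
    
    #Reuse the fitted scaler and PCA so train and test share the same projection
    def transform(self, x_df):
        features = self._scale(self._prepare(x_df))
        #Keep the result in self.x_df as before, and also return it as sklearn expects
        self.x_df = self.pca_.transform(features.values)
        return self.x_df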
    