### Introduction
This notebook demonstrates how I built highly accurate shot predictions for the MyDigitalTennisCoach app.  
I show the code and working towards the predictions, and how I combined them for the best results.  

I share my approaches and reasoning at different points and conclude with ways in which I could enhance this.  

There are elements which lie outside of building the predictions, such as the data creation and conversion.  
These will be covered in separate workbooks.

In [1]:
from jupyterthemes import get_themes
import jupyterthemes as jt
from jupyterthemes.stylefx import set_nb_theme
set_nb_theme("chesterish")

In [1]:
import pandas as pd
import numpy as np
import pickle
import xgboost as xgb
from sklearn.model_selection import train_test_split

pd.options.mode.chained_assignment = None  # default='warn'
from warnings import simplefilter 
simplefilter(action="ignore", category=pd.errors.PerformanceWarning)
# pd.set_option("display.max_columns", 500)
# pd.set_option("display.max_rows", 500)
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

## Source Data
The data was gathered via IMUs, IoT devices that measure accelerometer and gyroscope data at frequencies of 50 - 100Hz.  
This data has been preprocessed, meaning each segment has been identified as a shot, cut to the core data and labelled.  
The method by which it has been identified as a shot will be built into a separate notebook.

In [13]:
# read in all data sets and combine
d1 = pd.read_csv("D:/OneDrive/DataSci/Tennis/03_Modelling/DataSets/Watch_Q122_DataAndPreds.csv")
d1["Source"] = "WatchQ122"
d2 = pd.read_csv("D:/OneDrive/DataSci/Tennis/03_Modelling/DataSets/Xsens_EvalData.csv")
d2.drop(d2.iloc[:,739:749], axis =1, inplace =True)
d2["Source"] = "XsensEval"
d3 = pd.read_csv("D:/OneDrive/DataSci/Tennis/03_Modelling/DataSets/Xsens_ModelData.csv")
d3["Source"] = "XsensModel"
d4 = pd.read_csv("D:/OneDrive/DataSci/Tennis/03_Modelling/DataSets/Watch_Q2Data.csv")
d4["Source"] = "WatchQ222"

d5 = pd.read_csv("D:/OneDrive/DataSci/Tennis/03_Modelling/DataSets/Xsens_ServeEvalData.csv")
d5["Source"] = "XsensServeEval"

df = pd.concat([d1,d2,d3,d4, d5] , axis = 0)
#restrict to broader key variables
df2 = df[d4.columns.to_list()]
df2 = df2[df2.Shot.notnull()]
df2 = df2[df2.Shot.isin(["FH", "BH", "Serve", "Slice", "Volley", "BH_Slice", "FH_Slice", "BHS", "OH", "Overhead", "FHS"])]
print(df2.shape)


(8606, 387)


In [14]:
#reset the index as the data is built on concats
df2 = df2.drop("Key", axis=1).reset_index(drop = True)
df2 = df2.reset_index().rename(columns = {"index": "origindex"})
print(df2.shape)
print(len(df2.groupby("origindex")["Shot"].count()))

(8606, 387)
8606


## A brief description of the data:  
There are 8606 shots that have been labelled, and each shot has 387 columns.  

366 of these are variables which can be used for prediction.  
The others are labels and other models which have been added to this data.  

There are 3 direction vectors measured by the devices, (X, Y & Z) for the accelerometer and gyroscope data.  
For each shot I found the strike point, and took the 30 readings prior to this and the 30 readings after it.  
This results in 61 measurements per measurement type, per directional plane.  
61 readings * 3 axes * 2 sensor types (accelerometer and gyroscope) = 366
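
The feature count can be checked with a small sketch: 61 readings for each of 3 axes and 2 sensor types. The column naming pattern (`Acc_X_00` through `Gyr_Z_60`) is an assumption inferred from the column names used later in this notebook, and the helper name `build_feature_names` is hypothetical.

```python
from itertools import product


def build_feature_names(n_readings=61):
    """Return one column name per sensor, axis and reading (assumed naming pattern)."""
    sensors = ["Acc", "Gyr"]  # accelerometer and gyroscope
    axes = ["X", "Y", "Z"]    # three directional planes
    return [
        f"{sensor}_{axis}_{i:02d}"
        for sensor, axis in product(sensors, axes)
        for i in range(n_readings)
    ]


feature_names = build_feature_names()
print(len(feature_names))                   # number of predictive columns
print(feature_names[0], feature_names[-1])  # first and last generated names
```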

In [15]:
# There is a separate prediction for Serves and this notebook is focused on predictions for the other shots

# remove the serve data and create cleanshot which simplifies and aligns the labelling
df3 = df2[df2.Shot != "Serve"]
df3["CleanShot"] = np.where(df3.Shot.isin(["Slice", "BH_Slice", "FH_Slice", "BHS", "FHS"]), "Slice",
                           np.where(df3.Shot.isin(["OH", "Overhead"]), "OH", df3.Shot))
df3.groupby(["CleanShot", "Shot"])["Acc_X_01"].count()

CleanShot  Shot    
BH         BH          1833
FH         FH          3498
OH         OH            70
           Overhead      38
Slice      BHS          101
           BH_Slice     247
           FHS           37
           FH_Slice     113
           Slice        742
Volley     Volley       348
Name: Acc_X_01, dtype: int64

## Creating Training, Test, Validation & Eval Sets

This data is sourced from a number of players of different abilities.  
To ensure that biases in technique and ability do not leak into the models, each group includes a mix of players of different abilities.

I create 3 groups here, a Model group, a Validation group, and an Eval group.  
As I will bring multiple models together, it is important to be able to compare results against an intermediate "test" set, which is the Validation set. The Eval set is the ultimate holdout data set.
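
Because the groups are assigned by player (the `Who` column), it is worth checking that no player appears in more than one group, otherwise a model could be evaluated on a swing it has already seen. A minimal sketch of that check, using made-up player identifiers rather than the real data:

```python
import pandas as pd

# Toy data: the player identifiers here are made up for illustration.
shots = pd.DataFrame({
    "Who": ["P1", "P1", "P2", "P3", "P3", "P4"],
    "ModGroup": ["Mod", "Mod", "Mod", "Validation", "Validation", "Eval"],
})

# Count how many distinct groups each player appears in.
groups_per_player = shots.groupby("Who")["ModGroup"].nunique()

# Any player listed here would leak between groups.
leaking_players = groups_per_player[groups_per_player > 1].index.tolist()
print(leaking_players)
```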

In [16]:
df3["SourceHL"] = np.where(df3.Source.isin(["WatchQ122", "WatchQ222"]), "Watch", "Xsens")
df3["SourceHL2"] = np.where(df3.Source.isin(["WatchQ122", "WatchQ222"]), 1, 0)

df3["ModGroup"] = np.where(df3.Who.isin(["JJO", "SH","MH", "KS", "FS", "TL", "KD", "Sammy", "PS", "RO", "Frank", "Leo", "LM", "LeoMueller"]), "Mod",
                          np.where(df3.Who.isin(["JT", "PP_byEar", "KL", "SV", "RE", "RJ", "Roman" ]), "Validation", 
                          np.where(df3.Who.isin(["Toby_byEar","SVG", "LY", "JD", "Jimmy", "Kevin", "CH" ]), "Eval","Unassigned" )))

#generate split views to ensure that the mix is roughly right across the groups
splits = df3.groupby(["ModGroup","CleanShot"])["Acc_X_01"].count().reset_index().rename(columns = {"Acc_X_01": "Freq"})
splits_tot = df3.groupby("ModGroup")["Acc_X_01"].count().reset_index().rename(columns = {"Acc_X_01": "Totals"})
splits2 = pd.merge(splits, splits_tot, on = "ModGroup", how = "left" )
splits2["Prop"] = splits2.Freq / splits2.Totals
splits2

  ModGroup CleanShot  Freq  Totals      Prop
0     Eval        BH   253     820  0.308537
1     Eval        FH   391     820  0.476829
2     Eval        OH    37     820  0.045122
3     Eval     Slice   103     820   0.12561
4     Eval    Volley    36     820  0.043902
5      Mod        BH  1304    5276  0.247157
6      Mod        FH  2676    5276  0.507202
7      Mod        OH    47    5276  0.008908
8      Mod     Slice   991    5276  0.187832
9      Mod    Volley   258    5276  0.048901


Looking at the CleanShot == FH (Forehand) rows, the Prop column shows that forehands make up 48% of the Eval data, 51% of the Mod group, and 46% of the Validation group.  
Keeping these splits similar across groups means I can compare performance both by shot and at the ModGroup level.
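
The same per-group proportions can also be produced in one step with `pd.crosstab` and `normalize="index"`. This is an alternative sketch on made-up toy data, not the method used in the cell above:

```python
import pandas as pd

# Toy data with made-up counts, only to show the shape of the output.
toy = pd.DataFrame({
    "ModGroup": ["Mod"] * 4 + ["Eval"] * 4,
    "CleanShot": ["FH", "FH", "BH", "Slice", "FH", "FH", "BH", "BH"],
})

# normalize="index" divides each row by its row total,
# so every row holds the shot mix of one group.
props = pd.crosstab(toy["ModGroup"], toy["CleanShot"], normalize="index")
print(props.round(2))
```

Each row sums to 1, so the groups can be compared side by side without a separate merge on the totals.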

In [24]:
# To avoid the possibility of overfitting, the Mod data is split into mod & test
df3["ModTestSplit"] = np.where(df3.ModGroup == "Mod",
                               np.where(df3.Who.isin([ "MH", "LM", "Leo", "LeoMueller", "Frank", "TL"]), "modTest", "modMod"), df3.ModGroup)

#generate split views to ensure that the mix is roughly right across the groups
splits = df3[df3.ModGroup == "Mod"].groupby(["ModTestSplit","CleanShot"])["Acc_X_01"].count().reset_index().rename(columns = {"Acc_X_01": "Freq"})
splits_tot = df3[df3.ModGroup == "Mod"].groupby("ModTestSplit")["Acc_X_01"].count().reset_index().rename(columns = {"Acc_X_01": "Totals"})
splits2 = pd.merge(splits, splits_tot, on = "ModTestSplit", how = "left" )
splits2["Prop"] = splits2.Freq / splits2.Totals
splits2

  ModTestSplit CleanShot  Freq  Totals      Prop
0       modMod        BH   912    3759  0.242618
1       modMod        FH  1837    3759  0.488694
2       modMod        OH    27    3759  0.007183
3       modMod     Slice   826    3759  0.219739
4       modMod    Volley   157    3759  0.041766
5      modTest        BH   392    1517  0.258405
6      modTest        FH   839    1517  0.553065
7      modTest        OH    20    1517  0.013184
8      modTest     Slice   165    1517  0.108767
9      modTest    Volley   101    1517  0.066579


In [18]:
#create labels for the data to predict against
a = {}
a["BH"] = 0
a["Slice"] = 1
a["FH"] = 2
a["Volley"] = 3
a["OH"] = 4

    
df3["Label_Num"] = df3.CleanShot.apply(lambda x: a[x])
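
The numeric labels need to be mapped back to shot names once a model returns probabilities. A minimal sketch of one way to do that, assuming one probability column per class index; the name `num_to_label` and the probability values are made up for illustration:

```python
import numpy as np

label_to_num = {"BH": 0, "Slice": 1, "FH": 2, "Volley": 3, "OH": 4}

# Invert the mapping so a class index can be turned back into a shot name.
num_to_label = {num: name for name, num in label_to_num.items()}

# Made-up probabilities for two shots, one column per class index.
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.15, 0.60, 0.10, 0.10],
])

# argmax picks the most likely class index for each row.
pred_names = [num_to_label[i] for i in probs.argmax(axis=1)]
print(pred_names)
```

Deriving the inverse from the original dictionary keeps the two mappings in sync if a class is added or reordered.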

In [94]:
def mod_shotid(dinput, eta,start1, fin1, start2, fin2, start3, fin3, start4, fin4, start5, fin5, start6, fin6 ):
    """This function trains a model with early stop rounds.
    Given we are working with continuous data, we try to identify which part of the shot is predictive - hence the multiple starts and fins.
    The learning rate is also adjustable through the run once a good model is found.
    Function outputs the model, cross tab and prediction results for quick eval, and a validation eval data set"""
    
    alle_y=dinput[["Label_Num", "CleanShot","origindex","Who", "When","Shot", "PredShot", "Source", "ModTestSplit","SourceHL2"]]
    y = alle_y[alle_y.ModTestSplit.isin(["modMod", "modTest"])]
    valEval_y = alle_y[~alle_y.ModTestSplit.isin(["modMod", "modTest"])]
    #adjust in i
    X1 = dinput.iloc[:, start1:fin1]
    X2 = dinput.iloc[:, start2:fin2]
    X3 = dinput.iloc[:, start3:fin3]
    X4 = dinput.iloc[:, start4:fin4]
    X5 = dinput.iloc[:, start5:fin5]
    X6 = dinput.iloc[:, start6:fin6]
    X7 = dinput[["ModTestSplit"]]
    
    alle = pd.concat([X1, X2, X3, X4, X5, X6, X7], axis = 1)
    X = alle[alle.ModTestSplit.isin(["modMod", "modTest"])]
    valEval_X = alle[~alle.ModTestSplit.isin(["modMod", "modTest"])]
    valEval = pd.concat([valEval_y, valEval_X.drop("ModTestSplit", axis =1)], axis = 1)
#     print(X.head())
#     X_train, X_test, y_train1,y_test1=train_test_split(X,y, test_size=0.3, random_state=785)

    X_train = X[X.ModTestSplit == "modMod"].drop("ModTestSplit", axis = 1)
    X_test = X[X.ModTestSplit == "modTest"].drop("ModTestSplit", axis = 1)
#     print(X_train.head())
    y_train1 = y[y.ModTestSplit == "modMod"]
    y_test1 = y[y.ModTestSplit == "modTest"]
    y_test = y_test1[["Label_Num"]]
    y_train = y_train1[["Label_Num"]]

    dtrain = xgb.DMatrix(X_train,label=y_train)
    dtest = xgb.DMatrix(X_test,label=y_test)
    params={
            'max_depth':6,
        'min_child_weight': 4,
        'eta':eta,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
    
        # Other parameters
        'eval_metric' : "merror",
        #note merror chosen here over mlogloss as I want to focus on the best classification results rather than overall loss
        'objective':'multi:softprob',
        "num_class":5,
        'seed':123        
    }
    num_boost_round = 999
    mod=xgb.train(params,
                 dtrain,
                 num_boost_round=num_boost_round,
                 evals=[(dtest, "Test")],
                 early_stopping_rounds=30)
    
    
    results= y_test1[["CleanShot", "Label_Num","Who", "origindex"]].reset_index()

    probs = mod.predict(dtest)
    probs2 = pd.DataFrame({'A': probs[:, 0], 'B': probs[:, 1], 'C': probs[:, 2], 'D': probs[:, 3],
                          'E': probs[:, 4]})
    results=pd.concat([results, probs2],axis=1)

    # results["Correct"]= np.where( results.Label3 == results.preds, 1, 0)
    results['Max_Col'] = results.iloc[:,-5:].idxmax(axis=1)
    results['Max'] = results.iloc[:,-6:-1].max(axis=1)

    b = {}
    b["A"] = "BH"
    b["B"] = "Slice"
    b["C"] = "FH"
    b["D"] = "Volley"
    b["E"] = "OH"
#


    results["Preds"] = results.Max_Col.apply(lambda x: b[x])
    results["Correct"] = np.where( results.Preds == results.CleanShot, 1, 0)
    print("CROSS TAB VIEW")
    print(pd.crosstab(results.CleanShot,results.Preds))
    print()
    print("PREDS Accuracy")

    # a list (rather than a set) keeps the column order deterministic
    print(results.groupby("Preds")["Correct"].agg(["sum", "count", "mean"]))
    print("ACTUAL Accuracy")
    print(results.groupby("CleanShot")["Correct"].agg(["sum", "count", "mean"]))
    return mod, results, y_test1, dtest, X_test, valEval

def print_results(results):
    """Repeated from above, but created so can just show results and not the training"""
    print("CROSS TAB VIEW")
    print(pd.crosstab(results.CleanShot,results.Preds))
    print()
    print("PREDS Accuracy")

    # a list (rather than a set) keeps the column order deterministic
    print(results.groupby("Preds")["Correct"].agg(["sum", "count", "mean"]))
    print("ACTUAL Accuracy")
    print(results.groupby("CleanShot")["Correct"].agg(["sum", "count", "mean"]))

In [None]:
# Ball contact happens at reading 30, and the main shots differ mostly in the early part of the swing - therefore try the first 40 readings
#Model not run to save output space - results are printed below
mod_0_40, res_0_40, y_test_0_40, dtest_0_40, X_test_0_40, valEval_0_40  = mod_shotid(df3, 0.05, 14, 54, 75, 115, 136, 176, 197, 237, 258, 298, 319, 359 )
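
The twelve start and fin arguments in the call above follow a regular pattern, so they can be generated rather than typed by hand. A minimal sketch, assuming the first sensor column sits at position 14 and each of the 6 channels occupies 61 consecutive columns (both inferred from the arguments used in this notebook); `window_bounds` is a hypothetical helper:

```python
def window_bounds(first, last, offset=14, n_readings=61, n_channels=6):
    """Return flat (start, fin) iloc bounds covering readings first..last-1 in every channel."""
    bounds = []
    for channel in range(n_channels):
        base = offset + channel * n_readings  # first column of this channel
        bounds += [base + first, base + last]
    return bounds


print(window_bounds(0, 40))   # bounds for the 0_40 cut
print(window_bounds(20, 40))  # bounds for the 20_40 cut
```

With this helper, the call above could be written as `mod_shotid(df3, 0.05, *window_bounds(0, 40))`.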

In [79]:
print_results(res_0_40)

CROSS TAB VIEW
Preds       BH   FH  OH  Slice  Volley
CleanShot                             
BH         340   29   0     22       1
FH          31  708   0     95       5
OH           0    3   4     13       0
Slice       33   19   0    106       7
Volley       8    8   0     51      34

PREDS Accuracy
        sum  count      mean
Preds                       
BH      340    412  0.825243
FH      708    767  0.923077
OH        4      4  1.000000
Slice   106    287  0.369338
Volley   34     47  0.723404
ACTUAL Accuracy
           sum  count      mean
CleanShot                      
BH         340    392  0.867347
FH         708    839  0.843862
OH           4     20  0.200000
Slice      106    165  0.642424
Volley      34    101  0.336634


In [None]:
# Model accuracy is already pretty good - especially for main shots - FH and BH.  
# Experiment to see if different cuts yield different results

In [None]:
#Model not run to save output space - results are printed below
mod_10_40, res_10_40, y_test_10_40, dtest_10_40, X_test_10_40, valEval_10_40 = mod_shotid(df3, 0.05, 24, 54, 85, 115, 146, 176, 207, 237, 268, 298, 329, 359 )

In [81]:
print_results(res_10_40)

CROSS TAB VIEW
Preds       BH   FH  OH  Slice  Volley
CleanShot                             
BH         342   29   0     20       1
FH          33  713   1     85       7
OH           0    3   5     12       0
Slice       34   20   0    105       6
Volley      10    9   0     44      38

PREDS Accuracy
        sum  count      mean
Preds                       
BH      342    419  0.816229
FH      713    774  0.921189
OH        5      6  0.833333
Slice   105    266  0.394737
Volley   38     52  0.730769
ACTUAL Accuracy
           sum  count      mean
CleanShot                      
BH         342    392  0.872449
FH         713    839  0.849821
OH           5     20  0.250000
Slice      105    165  0.636364
Volley      38    101  0.376238


In [None]:
#marginal difference here but an improvement in FH and BH and less mix with slice

In [None]:
mod_20_40, res_20_40, y_test_20_40, dtest_20_40, X_test_20_40, valEval_20_40 = mod_shotid(df3, 0.05, 34, 54, 95, 115, 156, 176, 217, 237, 278, 298, 339, 359 )

In [92]:
print_results(res_20_40)

CROSS TAB VIEW
Preds       BH   FH  OH  Slice  Volley
CleanShot                             
BH         337   30   1     21       3
FH          29  729   0     73       8
OH           0    9   0     11       0
Slice       20   26   1    110       8
Volley       9    9   2     43      38

PREDS Accuracy
        sum  count      mean
Preds                       
BH      337    395  0.853165
FH      729    803  0.907846
OH        0      4  0.000000
Slice   110    258  0.426357
Volley   38     57  0.666667
ACTUAL Accuracy
           sum  count      mean
CleanShot                      
BH         337    392  0.859694
FH         729    839  0.868892
OH           0     20  0.000000
Slice      110    165  0.666667
Volley      38    101  0.376238


In [None]:
#an improvement in FH identification - FH is the most frequent shot in the data so a focus on this is important

In [None]:
#experiment with learning rates to see if this gives significantly better results
mod_20_40_2, res_20_40_2, y_test_20_40_2, dtest_20_40_2, X_test_20_40_2, valEval_20_40_2  = mod_shotid(df3, 0.01, 34, 54, 95, 115, 156, 176, 217, 237, 278, 298, 339, 359 )

In [86]:
print_results(res_20_40_2)

CROSS TAB VIEW
Preds       BH   FH  OH  Slice  Volley
CleanShot                             
BH         336   29   1     24       2
FH          35  720   0     74      10
OH           0    5   0     15       0
Slice       19   20   1    114      11
Volley      10    9   2     41      39

PREDS Accuracy
        sum  count      mean
Preds                       
BH      336    400  0.840000
FH      720    783  0.919540
OH        0      4  0.000000
Slice   114    268  0.425373
Volley   39     62  0.629032
ACTUAL Accuracy
           sum  count      mean
CleanShot                      
BH         336    392  0.857143
FH         720    839  0.858164
OH           0     20  0.000000
Slice      114    165  0.690909
Volley      39    101  0.386139


In [None]:
#experiments with lower learning rate on the same data show worse results.  
# Therefore we try another cut

In [None]:
#model not run to save on output space - results are shown below
mod_20_50, res_20_50, y_test_20_50, dtest_20_50, X_test_20_50, valEval_20_50 = mod_shotid(df3, 0.05, 34, 64, 95, 125, 156, 186, 217, 247, 278, 308, 339, 369 )

In [88]:
print_results(res_20_50)

CROSS TAB VIEW
Preds       BH   FH  OH  Slice  Volley
CleanShot                             
BH         337   32   0     21       2
FH          31  715   1     87       5
OH           0    3  11      6       0
Slice       26   28   0    105       6
Volley      10   10   0     49      32

PREDS Accuracy
        sum  count      mean
Preds                       
BH      337    404  0.834158
FH      715    788  0.907360
OH       11     12  0.916667
Slice   105    268  0.391791
Volley   32     45  0.711111
ACTUAL Accuracy
           sum  count      mean
CleanShot                      
BH         337    392  0.859694
FH         715    839  0.852205
OH          11     20  0.550000
Slice      105    165  0.636364
Volley      32    101  0.316832


### Evaluation of Models vs Test Sets
The best model uses the 20_40 data cut.  It has good accuracy for the Forehand, which is the most important shot in a rally situation.

Within this model however, there is some confusion between Forehands and Backhands, which are the 2 biggest shots as the data frequency shows.

So now we bring in a prior model, which is focused on identifying Backhands to see if we can combine them to improve the accuracy.

In [93]:

#change df here to be shots_wide2 - also add in timetruestrike or how its defined
eval_fin = df3[["Who", "TimeTrueStrike", "origindex"]]

def gen_results_old(results, Name, dic):
        # results["Correct"]= np.where( results.Label3 == results.preds, 1, 0)
    eval_fin[f"{Name}_prob"] = results.max(axis=1)
    results['Max_Col'] = results.idxmax(axis=1)
    
    eval_fin[f"{Name}_pred"] = results.Max_Col.apply(lambda x: dic[x])
    return eval_fin

dic2 = {}
dic2["A"] = "BH"
dic2["B"] = "Slice"
dic2["C"] = "FH"
dic2["D"] = "Volley"
dic2["E"] = "OH"

# create the data so can predict on it
    #data is fine as is - just drop the extra cols at start
BHfocus_mod_cols = ['Acc_X_00', 'Acc_X_21', 'Acc_X_22', 'Acc_X_23', 'Acc_X_24', 'Acc_X_25', 'Acc_X_26', 'Acc_X_27',
 'Acc_X_28', 'Acc_X_29', 'Acc_X_30', 'Acc_X_31', 'Acc_X_32', 'Acc_X_33', 'Acc_X_34', 'Acc_X_35',
 'Acc_X_36', 'Acc_X_37', 'Acc_X_38', 'Acc_X_39', 'Acc_X_40', 'Acc_X_41', 'Acc_Y_21', 'Acc_Y_22',
 'Acc_Y_23', 'Acc_Y_24', 'Acc_Y_25', 'Acc_Y_26', 'Acc_Y_27', 'Acc_Y_28', 'Acc_Y_29', 'Acc_Y_30',
 'Acc_Y_31', 'Acc_Y_32', 'Acc_Y_33', 'Acc_Y_34', 'Acc_Y_35', 'Acc_Y_36', 'Acc_Y_37', 'Acc_Y_38',
 'Acc_Y_39', 'Acc_Y_40', 'Acc_Y_41', 'Acc_Z_21', 'Acc_Z_22', 'Acc_Z_23', 'Acc_Z_24', 'Acc_Z_25',
 'Acc_Z_26', 'Acc_Z_27', 'Acc_Z_28', 'Acc_Z_29', 'Acc_Z_30', 'Acc_Z_31', 'Acc_Z_32', 'Acc_Z_33',
 'Acc_Z_34', 'Acc_Z_35', 'Acc_Z_36', 'Acc_Z_37', 'Acc_Z_38', 'Acc_Z_39', 'Acc_Z_40', 'Acc_Z_41',
 'Gyr_X_21', 'Gyr_X_22', 'Gyr_X_23', 'Gyr_X_24', 'Gyr_X_25', 'Gyr_X_26', 'Gyr_X_27', 'Gyr_X_28',
 'Gyr_X_29', 'Gyr_X_30', 'Gyr_X_31', 'Gyr_X_32', 'Gyr_X_33', 'Gyr_X_34', 'Gyr_X_35', 'Gyr_X_36',
 'Gyr_X_37', 'Gyr_X_38', 'Gyr_X_39', 'Gyr_X_40', 'Gyr_X_41', 'Gyr_Y_21', 'Gyr_Y_22', 'Gyr_Y_23',
 'Gyr_Y_24', 'Gyr_Y_25', 'Gyr_Y_26', 'Gyr_Y_27', 'Gyr_Y_28', 'Gyr_Y_29', 'Gyr_Y_30', 'Gyr_Y_31',
 'Gyr_Y_32', 'Gyr_Y_33', 'Gyr_Y_34', 'Gyr_Y_35', 'Gyr_Y_36', 'Gyr_Y_37', 'Gyr_Y_38', 'Gyr_Y_39',
 'Gyr_Y_40', 'Gyr_Y_41', 'Gyr_Z_21', 'Gyr_Z_22', 'Gyr_Z_23', 'Gyr_Z_24', 'Gyr_Z_25', 'Gyr_Z_26',
 'Gyr_Z_27', 'Gyr_Z_28', 'Gyr_Z_29', 'Gyr_Z_30', 'Gyr_Z_31', 'Gyr_Z_32', 'Gyr_Z_33', 'Gyr_Z_34',
 'Gyr_Z_35', 'Gyr_Z_36', 'Gyr_Z_37', 'Gyr_Z_38', 'Gyr_Z_39', 'Gyr_Z_40', 'Gyr_Z_41']

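# load the previously trained backhand-focused model; the context manager closes the file afterwards
with open("D:/OneDrive/DataSci/Tennis/03_Modelling/Models/ShotId_BH2040Focus_220126.pkl", "rb") as f:
    mod_BH2040Focus = pickle.load(f)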

probs = mod_BH2040Focus.predict(xgb.DMatrix(df3[BHfocus_mod_cols]))
eval_fin["BH_2040Focus_probs"] = probs
eval_fin["BH_2040Focus_preds"] = np.where(eval_fin.BH_2040Focus_probs>=0.2, "BH", "Other")

In [39]:
#combine the data so have all model predictions in one place

df4 = pd.merge(df3, eval_fin[["BH_2040Focus_probs","BH_2040Focus_preds","origindex"]], 
                             how = "left", on = "origindex")

#show the quality of the prediction via confusion matrix
pd.crosstab(df4.CleanShot, df4.BH_2040Focus_preds)

BH_2040Focus_preds    BH  Other
CleanShot                      
BH                  1734     99
FH                   108   3390
OH                     0    108
Slice                174   1066
Volley                56    292


In [40]:
# on basis of this, build a prediction of if BH from focus, else the rest
res_20_40_2 = pd.merge(res_20_40, df4[["BH_2040Focus_probs","origindex"]], how = "left", on = "origindex")
res_20_40_2["BH_FHPred"] = np.where(res_20_40_2.BH_2040Focus_probs > 0.2, "BH", res_20_40_2.Preds)
res_20_40_2["NewCombo_Correct"] = np.where(res_20_40_2.BH_FHPred == res_20_40_2.CleanShot, 1,0)
pd.crosstab(res_20_40_2.CleanShot, res_20_40_2.BH_FHPred)

BH_FHPred   BH   FH  OH  Slice  Volley
CleanShot                             
BH         379    8   0      4       1
FH          38  725   0     68       8
OH           0    9   0     11       0
Slice       34   26   1     96       8
Volley      16    9   2     37      37


In [41]:
res_20_40_2.groupby("CleanShot")["NewCombo_Correct"].agg({"count", "sum", "mean"}).reset_index()[["CleanShot", "count", "sum", "mean"]]

  CleanShot  count  sum      mean
0        BH    392  379  0.966837
1        FH    839  725  0.864124
2        OH     20    0       0.0
3     Slice    165   96  0.581818
4    Volley    101   37  0.366337


In [42]:
res_20_40_2.groupby("BH_FHPred")["NewCombo_Correct"].agg({"count", "sum", "mean"}).reset_index()[["BH_FHPred", "count", "sum", "mean"]]

  BH_FHPred  count  sum      mean
0        BH    467  379  0.811563
1        FH    777  725  0.933076
2        OH      3    0       0.0
3     Slice    216   96  0.444444
4    Volley     54   37  0.685185


### Eval of stacked model
The combination of these models appears to have a positive impact on the Forehand and Backhand accuracy, with less confusion between the two.  

We now evaluate the combined prediction on the Validation data set to see whether this hypothesis holds.
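
The stacking rule itself is simple: where the backhand-focused model's probability exceeds the threshold, predict "BH", otherwise keep the multi-class prediction. A minimal sketch on made-up values, using the 0.2 threshold from this notebook; the helper name `combine_predictions` and the column `BH_prob` are hypothetical:

```python
import numpy as np
import pandas as pd


def combine_predictions(multi_pred, bh_prob, threshold=0.2):
    """Predict "BH" where bh_prob exceeds the threshold, else keep the multi-class prediction."""
    return np.where(bh_prob > threshold, "BH", multi_pred)


toy = pd.DataFrame({
    "Preds": ["FH", "Slice", "BH", "Volley"],  # made-up multi-class predictions
    "BH_prob": [0.05, 0.45, 0.10, 0.21],       # made-up backhand probabilities
})
toy["Combined"] = combine_predictions(toy["Preds"], toy["BH_prob"])
print(toy)
```

Note that the rule can only add backhand predictions: a shot the multi-class model already calls "BH" stays "BH" even when the focused model's probability is low.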

In [67]:
#run predictions of new model on val_eval data
probs = mod_20_40.predict(xgb.DMatrix(valEval_20_40.iloc[:, 10:]))
probs2 = pd.DataFrame({'A': probs[:, 0], 'B': probs[:, 1], 'C': probs[:, 2], 'D': probs[:, 3],
                      'E': probs[:, 4]})
valEval_20_40_2 = valEval_20_40.reset_index(drop=True)

v_results=pd.concat([valEval_20_40_2, probs2],axis=1)

v_results['Max_Col'] = v_results.iloc[:,-5:].idxmax(axis=1)
v_results['Max'] = v_results.iloc[:,-6:-1].max(axis=1)

b = {}
b["A"] = "BH"
b["B"] = "Slice"
b["C"] = "FH"
b["D"] = "Volley"
b["E"] = "OH"

v_results["Preds"] = v_results.Max_Col.apply(lambda x: b[x])

#merge Backhand model to val_eval
v_results2 = pd.merge(v_results, df4[["origindex", "BH_2040Focus_probs"]], how ="left", on = "origindex")
#create combined prediction column
v_results2["BHFocus_Rest_Pred"] = np.where(v_results2.BH_2040Focus_probs > 0.2, "BH", v_results2.Preds)

In [71]:
def eval_mods(d_in, predCol):
    """Take in the data set, compare predCol to CleanShot
    Generate cross tabs & accuracy metrics for validation & Eval data sets"""
    d_in["Correct"] = np.where(d_in[predCol] == d_in.CleanShot,1,0)
    
    v_results = d_in[d_in.ModTestSplit == "Validation"]
    e_results = d_in[d_in.ModTestSplit == "Eval"]
    print("Results of VALIDATION Set")
    print(v_results.groupby("CleanShot")["Correct"].agg(["sum", "count", "mean"]))
    print(v_results.groupby(predCol)["Correct"].agg(["sum", "count", "mean"]))
    print(pd.crosstab(v_results.CleanShot, v_results[predCol]))
    print()
    print()
    print("Results of Eval Set")
    print(e_results.groupby("CleanShot")["Correct"].agg(["sum", "count", "mean"]))
    print(e_results.groupby(predCol)["Correct"].agg(["sum", "count", "mean"]))
    print(pd.crosstab(e_results.CleanShot, e_results[predCol]))
    print()
    

In [76]:
def eval_mods_val(d_in, predCol_1, predCol_2):
    """Take in the data set, compare different predictions to CleanShot
    Generate cross tabs & accuracy metrics for validation to choose the model"""
    d_in["Correct_1"] = np.where(d_in[predCol_1] == d_in.CleanShot,1,0)
    
    d_in["Correct_2"] = np.where(d_in[predCol_2] == d_in.CleanShot,1,0)
    
    print(f"Results of VALIDATION Set with {predCol_1}")
    print(d_in.groupby("CleanShot")["Correct_1"].agg(["sum", "count", "mean"]))
    print(d_in.groupby(predCol_1)["Correct_1"].agg(["sum", "count", "mean"]))
    print(pd.crosstab(d_in.CleanShot, d_in[predCol_1]))
    print()
    print()
    print(f"Results of VALIDATION Set with {predCol_2}")
    print(d_in.groupby("CleanShot")["Correct_2"].agg(["sum", "count", "mean"]))
    print(d_in.groupby(predCol_2)["Correct_2"].agg(["sum", "count", "mean"]))
    print(pd.crosstab(d_in.CleanShot, d_in[predCol_2]))
    print()
    

In [75]:
eval_mods_val(v_results2[v_results2.ModTestSplit == "Validation"], "BHFocus_Rest_Pred", "Preds" )

Results of VALIDATION Set with BHFocus_Rest_Pred
           sum  count      mean
CleanShot                      
BH         266    276  0.963768
FH         393    431  0.911833
OH           3     24  0.125000
Slice      112    146  0.767123
Volley      21     54  0.388889
                   sum  count      mean
BHFocus_Rest_Pred                      
BH                 266    294  0.904762
FH                 393    413  0.951574
OH                   3     10  0.300000
Slice              112    184  0.608696
Volley              21     30  0.700000
BHFocus_Rest_Pred   BH   FH  OH  Slice  Volley
CleanShot                                     
BH                 266    1   0      7       2
FH                   1  393   5     31       1
OH                   0   11   3      8       2
Slice               23    5   2    112       4
Volley               4    3   0     26      21


Results of VALIDATION Set with Preds
           sum  count      mean
CleanShot                      
BH         254 

### Stacked Model gives an accuracy bump
Comparing the models on the validation set shows that the stacked model gives a cleaner view of what is a backhand, and helps deliver prediction accuracy of over 90% for both forehand and backhand.  
Whilst the stacked model overpredicts backhands a little on slices compared with the Preds model alone, the prediction accuracy is very comparable, and given the higher frequency of backhands vs slices played, this is an acceptable trade-off.  
Forehands are identified very well: over 95% of the shots predicted as forehands are correct.
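
The two accuracy views printed throughout this notebook can be read directly off the cross tab: "ACTUAL Accuracy" divides the diagonal by the row totals (recall), and "PREDS Accuracy" divides it by the column totals (precision). A minimal sketch using the validation cross tab for `BHFocus_Rest_Pred` shown above; the variable names are my own:

```python
import pandas as pd

classes = ["BH", "FH", "OH", "Slice", "Volley"]

# Rows are the true CleanShot, columns are BHFocus_Rest_Pred (validation set).
cm = pd.DataFrame(
    [[266,   1, 0,   7,  2],
     [  1, 393, 5,  31,  1],
     [  0,  11, 3,   8,  2],
     [ 23,   5, 2, 112,  4],
     [  4,   3, 0,  26, 21]],
    index=classes, columns=classes,
)

# Diagonal entries are the correctly classified shots per class.
correct = pd.Series([cm.loc[c, c] for c in classes], index=classes)

recall = correct / cm.sum(axis=1)     # "ACTUAL Accuracy": correct / true count
precision = correct / cm.sum(axis=0)  # "PREDS Accuracy": correct / predicted count
overall = correct.sum() / cm.values.sum()

print(pd.DataFrame({"recall": recall, "precision": precision}).round(3))
print(round(overall, 3))
```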

In [72]:
eval_mods(v_results2,"BHFocus_Rest_Pred")

Results of VALIDATION Set
           sum  count      mean
CleanShot                      
BH         266    276  0.963768
FH         393    431  0.911833
OH           3     24  0.125000
Slice      112    146  0.767123
Volley      21     54  0.388889
                   sum  count      mean
BHFocus_Rest_Pred                      
BH                 266    294  0.904762
FH                 393    413  0.951574
OH                   3     10  0.300000
Slice              112    184  0.608696
Volley              21     30  0.700000
BHFocus_Rest_Pred   BH   FH  OH  Slice  Volley
CleanShot                                     
BH                 266    1   0      7       2
FH                   1  393   5     31       1
OH                   0   11   3      8       2
Slice               23    5   2    112       4
Volley               4    3   0     26      21


Results of Eval Set
           sum  count      mean
CleanShot                      
BH         251    253  0.992095
FH         318    391  

### Evaluation shows there is still some work to do
The stacked-model hypothesis was developed on the Test data and refined against the Validation data.  
The Eval data set is the ultimate holdout to check that there isn't overfitting.  
The results show that accuracy is still very good, but slice remains an issue and causes confusion.  
The next step would be to find a model that focuses on FH and ensures that it is not incorrectly categorised as slice.