In [1]:
#imports
import pandas as pd
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)

import numpy as np
import matplotlib.pyplot as plt

In [2]:
#import base dataset created for the study
path = "base_dataset.csv"
pp_final_df = pd.read_csv(path)

#### Methodology:
* In the previous step, we identified and calculated the metrics used to find the best power play players, based on event data from the 2021 NWHL season.
* The metrics are:
    * Goals Scored, Passes, Shots, Zone Entries, Shot and Pass Accuracy, Player Pass and Goal Contribution to the team, Player Zone Entry Contribution to the team, and Assists.

* Additionally, the final list of players must also cover the various skills required to succeed in power plays:
    * Passing
    * Shooting - Long Shots and Short Shots
    * Dribbling
 
**Proposed Approach: Use a weighted score to identify the players who rank highest on these metrics. Each metric is assigned a weight that reflects its importance in identifying a good power play player.**

## I. Normalize the features we will use for optimization
* The current metrics cannot be compared to each other directly.
    * Metrics like Goals and Passes are raw counts with no fixed upper bound, whereas Shot and Pass Accuracy are percentages between 0 and 100.
    * If we compared these values directly, the metrics with a larger range of values would dominate the score.
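
As a minimal sketch of the rescaling this calls for, min-max normalization maps each column to the range 0 to 1 using `(x - min) / (max - min)`. The toy values and the helper name `min_max_normalize` below are made up for illustration and are not taken from the study's dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not from the study's dataset)
toy = pd.DataFrame({"goals": [0.0, 1.0, 2.0], "pass_accuracy": [50.0, 75.0, 100.0]})

def min_max_normalize(df):
    """Rescale each column to [0, 1] using (x - min) / (max - min)."""
    col_range = df.max() - df.min()
    # A constant column has a range of 0; divide by 1 instead so it maps to 0
    col_range = col_range.replace(0, 1)
    return (df - df.min()) / col_range

toy_scaled = min_max_normalize(toy)
print(toy_scaled)
```

After rescaling, a count such as goals and a percentage such as pass accuracy share the same range, so neither dominates a sum purely because of its units.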

In [3]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

#replace missing values with 0 (applies to every column, not only the numeric ones)
pp_final_df = pp_final_df.fillna(0)

#create a df of features we want to normalize
feature_cols = ['Goal_player','Play_player','Shot_player','Zone Entry_player','shot_accuracy','pass_accuracy','Player_Goal_Contribution','Player_Pass_Contribution','Player_Zone_Entry_Contribution','Assists']
opt_features = pp_final_df[feature_cols]
data_scaled = scaler.fit_transform(opt_features)
#name each normalized column after its source column, prefixed with n_
pp_final_df2 = pd.DataFrame(data_scaled, columns = ['n_' + col for col in feature_cols])

#merge the normalized values with the original df [here columns beginning with n_ show normalized statistics]
pp_final_df3 = pd.merge(pp_final_df,pp_final_df2, left_index = True, right_index = True)
pp_final_df3.head()

Unnamed: 0.1,Unnamed: 0,Player,Goal_player,Play_player,Incomplete Play_player,Shot_player,Zone Entry_player,Team,shot_accuracy,pass_accuracy,Player_Goal_Contribution,Player_Pass_Contribution,Player_Zone_Entry_Contribution,Assists,shooter_type,n_Goal_player,n_Play_player,n_Shot_player,n_Zone Entry_player,n_shot_accuracy,n_pass_accuracy,n_Player_Goal_Contribution,n_Player_Pass_Contribution,n_Player_Zone_Entry_Contribution,n_Assists
0,0,Abbie Ives,0.0,4.0,1.0,0.0,0.0,Connecticut Whale,0.0,80.0,0.0,2.259887,0.0,0.0,0,0.0,0.033333,0.0,0.0,0.0,0.8,0.0,0.124922,0.0,0.0
1,1,Allie Thunstrom,0.0,15.0,2.0,4.0,3.0,Minnesota Whitecaps,0.0,88.235294,0.0,6.198347,10.0,0.0,Long,0.0,0.125,0.210526,0.157895,0.0,0.882353,0.0,0.342631,0.289474,0.0
2,2,Alyson Matteau,0.0,19.0,7.0,11.0,7.0,Buffalo Beauts,0.0,73.076923,0.0,6.859206,14.0,1.0,Short,0.0,0.158333,0.578947,0.368421,0.0,0.730769,0.0,0.379162,0.405263,0.5
3,3,Alyssa Wohlfeiler,0.0,31.0,2.0,2.0,3.0,Connecticut Whale,0.0,93.939394,0.0,17.514124,12.0,1.0,Long,0.0,0.258333,0.105263,0.157895,0.0,0.939394,0.0,0.968142,0.347368,0.5
4,4,Amanda Conway,0.0,6.0,1.0,0.0,1.0,Connecticut Whale,0.0,85.714286,0.0,3.389831,4.0,0.0,0,0.0,0.05,0.0,0.052632,0.0,0.857143,0.0,0.187382,0.115789,0.0


In [4]:
pp_final_df3.describe()

Unnamed: 0.1,Unnamed: 0,Goal_player,Play_player,Incomplete Play_player,Shot_player,Zone Entry_player,shot_accuracy,pass_accuracy,Player_Goal_Contribution,Player_Pass_Contribution,Player_Zone_Entry_Contribution,Assists,n_Goal_player,n_Play_player,n_Shot_player,n_Zone Entry_player,n_shot_accuracy,n_pass_accuracy,n_Player_Goal_Contribution,n_Player_Pass_Contribution,n_Player_Zone_Entry_Contribution,n_Assists
count,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0,102.0
mean,50.5,0.215686,19.872549,3.568627,4.088235,2.539216,3.866646,84.635508,5.882353,5.882353,5.882353,0.156863,0.107843,0.165605,0.21517,0.133643,0.038666,0.846355,0.058824,0.325163,0.170279,0.078431
std,29.588849,0.47984,19.75477,3.777237,4.834085,3.346852,12.187929,16.438829,16.693388,4.581074,6.825266,0.416137,0.23992,0.164623,0.254426,0.17615,0.121879,0.164388,0.166934,0.253232,0.197573,0.208069
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,25.25,0.0,6.0,0.25,0.0,0.0,0.0,77.898551,0.0,1.590395,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.778986,0.0,0.087914,0.0,0.0
50%,50.5,0.0,15.5,3.0,3.0,1.0,0.0,86.164274,0.0,5.824859,3.896104,0.0,0.0,0.129167,0.157895,0.052632,0.0,0.861643,0.0,0.321985,0.112782,0.0
75%,75.75,0.0,25.75,6.0,5.0,4.0,0.0,95.310559,0.0,8.858396,9.090909,0.0,0.0,0.214583,0.263158,0.210526,0.0,0.953106,0.0,0.489672,0.263158,0.0
max,101.0,2.0,120.0,21.0,19.0,19.0,100.0,100.0,100.0,18.090452,34.545455,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


**With the normalized dataset, all the key features now lie between 0 and 1, so the statistics can be compared directly.**
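
This property can also be checked programmatically instead of by reading the `describe()` output. The sketch below is one possible check; the helper name `columns_outside_unit_range` and the toy frame are hypothetical, and the `n_` prefix is assumed to mark the normalized columns as it does in this notebook.

```python
import pandas as pd

def columns_outside_unit_range(df, prefix="n_"):
    """Return the names of prefixed columns that have any value outside [0, 1]."""
    norm_cols = [c for c in df.columns if c.startswith(prefix)]
    bad = []
    for col in norm_cols:
        # between() is inclusive on both ends by default; NaN counts as outside
        if not df[col].between(0, 1).all():
            bad.append(col)
    return bad

# Toy frame (hypothetical values): n_passes deliberately contains 1.3
toy = pd.DataFrame(
    {"Player": ["A", "B"], "n_goals": [0.0, 1.0], "n_passes": [0.2, 1.3]}
)
print(columns_outside_unit_range(toy))
```

An empty list would indicate that every normalized column lies within the expected range.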

In [6]:
pp_final_opt_df = pp_final_df3[['Player','Team','shooter_type','n_Goal_player','n_Play_player','n_Shot_player','n_Zone Entry_player','n_shot_accuracy','n_pass_accuracy','n_Player_Goal_Contribution','n_Player_Pass_Contribution','n_Player_Zone_Entry_Contribution','n_Assists']]
pp_final_opt_df.head()

Unnamed: 0,Player,Team,shooter_type,n_Goal_player,n_Play_player,n_Shot_player,n_Zone Entry_player,n_shot_accuracy,n_pass_accuracy,n_Player_Goal_Contribution,n_Player_Pass_Contribution,n_Player_Zone_Entry_Contribution,n_Assists
0,Abbie Ives,Connecticut Whale,0,0.0,0.033333,0.0,0.0,0.0,0.8,0.0,0.124922,0.0,0.0
1,Allie Thunstrom,Minnesota Whitecaps,Long,0.0,0.125,0.210526,0.157895,0.0,0.882353,0.0,0.342631,0.289474,0.0
2,Alyson Matteau,Buffalo Beauts,Short,0.0,0.158333,0.578947,0.368421,0.0,0.730769,0.0,0.379162,0.405263,0.5
3,Alyssa Wohlfeiler,Connecticut Whale,Long,0.0,0.258333,0.105263,0.157895,0.0,0.939394,0.0,0.968142,0.347368,0.5
4,Amanda Conway,Connecticut Whale,0,0.0,0.05,0.0,0.052632,0.0,0.857143,0.0,0.187382,0.115789,0.0


In [7]:
pp_final_opt_df.to_csv('normalized_dataset.csv')

## II.I Add all metrics (unweighted approach)

In [8]:
#work on an explicit copy so that adding a column does not trigger SettingWithCopyWarning
pp_final_opt_df = pp_final_opt_df.copy()
pp_final_opt_df['overall_score'] = pp_final_opt_df['n_Goal_player'] + pp_final_opt_df['n_Play_player'] + pp_final_opt_df['n_Shot_player'] + pp_final_opt_df['n_Zone Entry_player'] + pp_final_opt_df['n_shot_accuracy'] + pp_final_opt_df['n_pass_accuracy'] + pp_final_opt_df['n_Player_Goal_Contribution'] + pp_final_opt_df['n_Player_Pass_Contribution'] + pp_final_opt_df['n_Player_Zone_Entry_Contribution'] + pp_final_opt_df['n_Assists']
pp_final_opt_df = pp_final_opt_df.sort_values(by = 'overall_score', ascending = False)
pp_final_opt_df.head(5)


Unnamed: 0,Player,Team,shooter_type,n_Goal_player,n_Play_player,n_Shot_player,n_Zone Entry_player,n_shot_accuracy,n_pass_accuracy,n_Player_Goal_Contribution,n_Player_Pass_Contribution,n_Player_Zone_Entry_Contribution,n_Assists,overall_score
70,Mikyla Grant-Mentis,Toronto Six,Long,0.5,0.4,0.684211,1.0,0.071429,0.888889,0.2,0.58963,1.0,0.5,5.834158
63,McKenna Brand,Boston Pride,Long,0.5,0.7,0.894737,0.526316,0.055556,0.913043,0.125,0.680841,0.37594,0.5,5.271432
78,Samantha Davis,Boston Pride,Long,0.5,0.408333,0.578947,0.894737,0.083333,0.859649,0.125,0.397157,0.639098,0.0,4.486255
33,Jillian Dempsey,Boston Pride,Long,1.0,0.283333,0.315789,0.157895,0.25,0.829268,0.25,0.275578,0.112782,1.0,4.474646
45,Lauren Kelly,Boston Pride,Short,1.0,0.475,0.947368,0.157895,0.1,0.838235,0.25,0.461999,0.112782,0.0,4.343279


### Top Players Identified according to Method 1:

1. Mikyla Grant-Mentis
2. McKenna Brand
3. Samantha Davis
4. Jillian Dempsey
5. Lauren Kelly

## II.II Weighted Linear Score

* This method assigns a weight to each metric.
* This makes the solution flexible for different teams:
    * A team looking for a good passer can increase the weight given to metrics like passes completed, pass accuracy and player pass contribution.
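
To illustrate how changing the weights changes the ranking, the sketch below scores the same toy players under two weight profiles. The player labels, values, weight profiles and the helper `weighted_score` are all hypothetical and chosen only to show the mechanism; they are not the weights used in this study.

```python
import pandas as pd

def weighted_score(df, weights):
    """Return, for each row, the weighted sum of the columns named in `weights`."""
    score = pd.Series(0.0, index=df.index)
    for col, w in weights.items():
        score = score + w * df[col]
    return score

# Toy normalized metrics (hypothetical values)
toy = pd.DataFrame(
    {"n_goals": [1.0, 0.5, 0.0], "n_pass_accuracy": [0.5, 1.0, 1.0]},
    index=["Player A", "Player B", "Player C"],
)

# Two illustrative profiles: one favors scoring, the other favors passing
scorer_weights = {"n_goals": 1.0, "n_pass_accuracy": 0.5}
passer_weights = {"n_goals": 0.5, "n_pass_accuracy": 1.0}

scorer_ranking = weighted_score(toy, scorer_weights).sort_values(ascending=False)
passer_ranking = weighted_score(toy, passer_weights).sort_values(ascending=False)
print(scorer_ranking)
print(passer_ranking)
```

With these toy numbers, Player A ranks first under the scoring profile and Player B ranks first under the passing profile, even though the underlying metrics are unchanged.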

In [9]:
pp_final_ml_df = pp_final_opt_df.copy()
pp_final_ml_df['overall_score'] = pp_final_opt_df['n_Goal_player'] + 0.1*pp_final_opt_df['n_Play_player'] + 0.1*pp_final_opt_df['n_Shot_player'] + 0.5*pp_final_opt_df['n_Zone Entry_player'] + 0.5*pp_final_opt_df['n_shot_accuracy'] + 0.5*pp_final_opt_df['n_pass_accuracy'] + 0.2*pp_final_opt_df['n_Player_Goal_Contribution'] + 0.2*pp_final_opt_df['n_Player_Pass_Contribution'] + 0.1*pp_final_opt_df['n_Player_Zone_Entry_Contribution'] + 0.2*pp_final_opt_df['n_Assists']
pp_final_ml_df = pp_final_ml_df.sort_values(by = 'overall_score', ascending = False)
pp_final_ml_df.head(5)

Unnamed: 0,Player,Team,shooter_type,n_Goal_player,n_Play_player,n_Shot_player,n_Zone Entry_player,n_shot_accuracy,n_pass_accuracy,n_Player_Goal_Contribution,n_Player_Pass_Contribution,n_Player_Zone_Entry_Contribution,n_Assists,overall_score
33,Jillian Dempsey,Boston Pride,Long,1.0,0.283333,0.315789,0.157895,0.25,0.829268,0.25,0.275578,0.112782,1.0,1.994888
70,Mikyla Grant-Mentis,Toronto Six,Long,0.5,0.4,0.684211,1.0,0.071429,0.888889,0.2,0.58963,1.0,0.5,1.946506
73,Nina Rodgers,Minnesota Whitecaps,Long,1.0,0.166667,0.315789,0.0,0.25,0.833333,0.4,0.456841,0.0,0.5,1.861281
45,Lauren Kelly,Boston Pride,Short,1.0,0.475,0.947368,0.157895,0.1,0.838235,0.25,0.461999,0.112782,0.0,1.84398
63,McKenna Brand,Boston Pride,Long,0.5,0.7,0.894737,0.526316,0.055556,0.913043,0.125,0.680841,0.37594,0.5,1.705693


### Top Players Identified according to Method 2:

1. Jillian Dempsey
2. Mikyla Grant-Mentis
3. Nina Rodgers
4. Lauren Kelly
5. McKenna Brand
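
To see how the weighting changes the outcome, the two top-5 lists can be compared directly. The sketch below hard-codes the names from the two lists above; the variable names are illustrative.

```python
# Top-5 lists copied from Method 1 (unweighted) and Method 2 (weighted)
method_1 = ["Mikyla Grant-Mentis", "McKenna Brand", "Samantha Davis",
            "Jillian Dempsey", "Lauren Kelly"]
method_2 = ["Jillian Dempsey", "Mikyla Grant-Mentis", "Nina Rodgers",
            "Lauren Kelly", "McKenna Brand"]

# Players selected by both methods, and players unique to each
in_both = [p for p in method_1 if p in method_2]
only_method_1 = [p for p in method_1 if p not in method_2]
only_method_2 = [p for p in method_2 if p not in method_1]

# Positions gained (positive) or lost (negative) when moving from Method 1 to Method 2
rank_change = {p: method_1.index(p) - method_2.index(p) for p in in_both}

print("In both:", in_both)
print("Only in Method 1:", only_method_1)
print("Only in Method 2:", only_method_2)
print("Rank change:", rank_change)
```

Four of the five players appear in both lists; the weighting replaces Samantha Davis with Nina Rodgers and moves Jillian Dempsey from fourth to first.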