**3. Calculating metrics for passes**

This notebook carries out the following tasks:

1. Cluster defender lineups based on the number of defenders in the formation. We classify these as four defenders at the back and three/five defenders at the back.

2. Compute multiple passing-based attributes for defenders for each match using match lineup data (from **match+def_lineup+footedness_ver2_top5.pkl**) and events data (from **events_com.pkl**).

3. Verify that all players have been assigned their associated metrics, and flag any discrepancies found in the data.

The following are the resulting pickle files:

1. Cluster-wise files with passing attributes for each defender for each match




In [54]:
import pandas as pd
import numpy as np
from unidecode import unidecode
from tqdm import tqdm
import re
from difflib import SequenceMatcher
pd.set_option("display.max_rows", 1000)
pd.set_option("display.max_columns",1000)

**Loading pickle file with Top 5 Leagues 2017-18 events data (along with player roles i.e. whether the player is a goalkeeper (GKP), defender (DEF), midfielder (MID) or forward (FWD))**

In [55]:
df_events_roles = pd.read_pickle("../data_top5/events/events_com.pkl")

**Loading the pickle file with defence lineup information for each team participating in a particular match.**

In [56]:
df_defence_footed = pd.read_pickle("../data_top5/matches/match+def_lineup+footedness_ver2_top5.pkl")

In [57]:
df_defence_footed.head()

Unnamed: 0,wyId,team,team_defense,RB,R-CB,L-CB,LB,RCB,CB,LCB,RWB,LWB,backline,match,gameweek,teamsData,dateutc,venue,referees,score,footedness
0,2499719,Arsenal,"[RobHolding, IgnacioMonrealEraso, SeadKolasinac]",,,,,RobHolding,IgnacioMonrealEraso,SeadKolasinac,,,3.0,Arsenal-Leicester City,1,"{'1609': {'scoreET': 0, 'coachId': 7845, 'side...",2017-08-11 18:45:00,Emirates Stadium,"[{'refereeId': 385909, 'role': 'referee'}, {'r...",4–3,right-left-left
1,2499719,Leicester City,"[DannySimpson, WesMorgan, HarryMaguire, Christ...",DannySimpson,WesMorgan,HarryMaguire,ChristianFuchs,,,,,,4.0,Arsenal-Leicester City,1,"{'1609': {'scoreET': 0, 'coachId': 7845, 'side...",2017-08-11 18:45:00,Emirates Stadium,"[{'refereeId': 385909, 'role': 'referee'}, {'r...",4–3,right-right-right-left
2,2499720,Brighton,"[BrunoSaltorGrau, LewisDunk, ShaneDuffy, Marku...",BrunoSaltorGrau,LewisDunk,ShaneDuffy,MarkusSuttner,,,,,,4.0,Brighton-Manchester City,1,"{'1651': {'scoreET': 0, 'coachId': 8093, 'side...",2017-08-12 16:30:00,The American Express Community Stadium,"[{'refereeId': 384965, 'role': 'referee'}, {'r...",0–2,right-right-right-left
3,2499720,Manchester City,"[VincentKompany, JohnStones, NicolasHernanOtam...",,,,,VincentKompany,JohnStones,NicolasHernanOtamendi,,,3.0,Brighton-Manchester City,1,"{'1651': {'scoreET': 0, 'coachId': 8093, 'side...",2017-08-12 16:30:00,The American Express Community Stadium,"[{'refereeId': 384965, 'role': 'referee'}, {'r...",0–2,right-right-right
4,2499721,Burnley,"[MatthewLowton, JamesTarkowski, BenMee, Stephe...",MatthewLowton,JamesTarkowski,BenMee,StephenWard,,,,,,4.0,Chelsea-Burnley,1,"{'1646': {'scoreET': 0, 'coachId': 8880, 'side...",2017-08-12 14:00:00,Stamford Bridge,"[{'refereeId': 378951, 'role': 'referee'}, {'r...",2–3,right-right-left-left


**Observing the unique footedness categories in the dataframe**

In [58]:
footedness_patterns = df_defence_footed["footedness"].unique()

**Renaming certain positional columns for better understanding**

In [59]:
df_defence_footed.rename(columns={'R-CB':'R_CB',"L-CB":'L_CB'},inplace=True)

**Extracting pass events for defenders and computing the total number of passes and of accurate passes by defenders across all five leagues**

In [60]:
df_events_pass = df_events_roles.loc[
    df_events_roles['eventName'].str.contains('Pass')
    & (df_events_roles['role'] == 'DEF')]

In [61]:
league_pass_info = dict()
league_pass_info['totalpasses'] = len(df_events_pass)

In [62]:
league_pass_info['totalaccuratepasses']=len(df_events_pass[df_events_pass['tags'].apply(lambda x: "Accurate" in x)])

In [63]:
league_pass_info

{'totalpasses': 660055, 'totalaccuratepasses': 552506}
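
The overall pass accuracy of defenders follows directly from these two totals. A minimal sketch, reusing the printed values rather than the dataframe (the variable `pass_accuracy` is introduced here for illustration only):

```python
# Totals copied from the output above
league_pass_info = {'totalpasses': 660055, 'totalaccuratepasses': 552506}

# Share of defender passes that carry the "Accurate" tag
pass_accuracy = (league_pass_info['totalaccuratepasses']
                 / league_pass_info['totalpasses'])
print(f"Defender pass accuracy: {pass_accuracy:.1%}")
```

This comes to roughly 83.7% of all defender passes.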

**Creating separate dataframes for four defenders and three/five defenders in the lineup**

In [74]:
df_four_defs = df_defence_footed[df_defence_footed['backline']==4]
# df_three_five_defs = df_defence_footed[df_defence_footed['backline'].isin([3,5])]
df_three_defs = df_defence_footed[df_defence_footed['backline']==3]
df_five_defs = df_defence_footed[df_defence_footed['backline']==5]

In [76]:
df_defs_atb = [df_four_defs,df_three_defs,df_five_defs]

**Creating a metrics collection function that takes x (match ID) and y (player name) and returns the following metrics:**

**numpasses** - number of passes made by the player in the queried match

**numaccpasses** - number of accurate passes made by the player in the queried match

**numhighpasses** - number of high (aerial) passes made by the player in the queried match

**numhighaccpasses** - number of high (aerial) accurate passes made by the player in the queried match

**accpasslocs** - starting and ending coordinates of all the accurate passes made by the player in the queried match

**inaccpasslocs** - starting and ending coordinates of all the inaccurate passes made by the player in the queried match

**acchighpasslocs** - starting and ending coordinates of all the accurate high passes made by the player in the queried match

**inacchighpasslocs** - starting and ending coordinates of all the inaccurate high passes made by the player in the queried match

In [77]:
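def getmetrics(x, y):
    """Return pass counts and pass coordinates for player y in match x."""
    # Split the concatenated name at capital letters,
    # e.g. 'RobHolding' -> ['Rob', 'Holding']
    split_y = re.findall('[A-Z][^A-Z]*', y)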
    try:
        pass_df = df_events_pass.loc[
            (df_events_pass['playerName'].str.contains(split_y[-1]))
            & (df_events_pass['playerName'].str.contains(split_y[-2])) &
            (df_events_pass['playerName'].str.contains(split_y[-3])) &
            (df_events_pass['matchId'] == int(x))]
    except IndexError:  # name has fewer than three parts
        try:
            pass_df = df_events_pass.loc[
                (df_events_pass['playerName'].str.contains(split_y[-1]))
                & (df_events_pass['playerName'].str.contains(split_y[-2])) &
                (df_events_pass['matchId'] == int(x))]
        except IndexError:  # name has fewer than two parts
            pass_df = df_events_pass.loc[
                (df_events_pass['playerName'].str.contains(split_y[-1]))
                & (df_events_pass['matchId'] == int(x))]
    numpasses = len(pass_df)
    numaccpasses = len(
        pass_df.loc[pass_df['tags'].apply(lambda a: "Accurate" in a)])
    numhighpasses = len(pass_df.loc[pass_df['subEventName'] == 'High pass'])
    numhighaccpasses = len(
        pass_df.loc[(pass_df['subEventName'] == 'High pass')
                    & (pass_df['tags'].apply(lambda a: "Accurate" in a))])
    accpasslocs = pass_df.loc[pass_df['tags'].apply(
        lambda a: "Accurate" in a)]['positions'].tolist()
    inaccpasslocs = pass_df.loc[pass_df['tags'].apply(
        lambda a: "Not accurate" in a)]['positions'].tolist()
    acchighpasslocs = pass_df.loc[(pass_df['subEventName'] == 'High pass') & (
        pass_df['tags'].apply(lambda a: "Accurate" in a))]['positions'].tolist(
        )
    inacchighpasslocs = pass_df.loc[
        (pass_df['subEventName'] == 'High pass')
        & (pass_df['tags'].apply(lambda a: "Not accurate" in a)
           )]['positions'].tolist()
    return [
        numpasses, numaccpasses, numhighpasses, numhighaccpasses, accpasslocs,
        inaccpasslocs, acchighpasslocs, inacchighpasslocs
    ]
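
The name matching in `getmetrics` depends on how the regular expression splits the concatenated lineup names. The sketch below isolates that step; `split_name` is a hypothetical helper introduced only for illustration and is not part of the notebook.

```python
import re

def split_name(name):
    # Same pattern as in getmetrics: one capital letter followed by
    # any run of non-capital characters
    return re.findall('[A-Z][^A-Z]*', name)

print(split_name("IgnacioMonrealEraso"))  # ['Ignacio', 'Monreal', 'Eraso']
print(split_name("VirgilvanDijk"))        # ['Virgilvan', 'Dijk']
print(split_name("Bruno"))                # ['Bruno']
```

A name with a single part, such as "Bruno", raises an `IndexError` when `split_y[-2]` is accessed, which is what sends `getmetrics` down to its last fallback. Lower-case particles ("van") stay attached to the preceding part, so they only match event names that contain the same run of characters.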

In [78]:
getmetrics(2500081,"Bruno")

[29,
 23,
 5,
 3,
 [[[30.16, 5.44], [26.0, 15.64]],
  [[33.28, 12.92], [29.12, 29.92]],
  [[32.24, 4.76], [37.44, 14.96]],
  [[75.92, 4.76], [83.2, 4.76]],
  [[99.84, 12.92], [91.52, 48.96]],
  [[69.68, 11.56], [78.0, 10.88]],
  [[32.24, 4.76], [36.4, 17.0]],
  [[78.0, 6.12], [71.76, 8.16]],
  [[32.24, 23.12], [26.0, 29.92]],
  [[47.84, 27.88], [28.08, 34.68]],
  [[4.16, 6.12], [16.64, 4.08]],
  [[71.76, 6.12], [74.88, 9.52]],
  [[39.52, 16.32], [23.92, 32.64]],
  [[43.68, 12.92], [30.16, 34.0]],
  [[35.36, 10.88], [29.12, 27.88]],
  [[46.8, 28.56], [93.6, 51.68]],
  [[9.36, 18.36], [10.4, 23.12]],
  [[63.44, 7.48], [65.52, 2.72]],
  [[64.48, 2.04], [58.24, 28.56]],
  [[17.68, 1.36], [43.68, 1.36]],
  [[31.2, 12.92], [34.32, 3.4]],
  [[58.24, 7.48], [30.16, 24.48]],
  [[36.4, 7.48], [60.32, 18.36]]],
 [[[32.24, 3.4], [53.04, 33.32]],
  [[20.8, 20.4], [40.56, 8.84]],
  [[17.68, 5.44], [59.28, 6.8]],
  [[87.36, 4.76], [76.96, 46.92]],
  [[100.88, 17.68], [0.0, 68.0]],
  [[36.4, 1.36], [6

In [79]:
new_cols = ['RB_all',
            'R_CB_all',
            'L_CB_all',
            'LB_all',
            'RCB_all',
            'CB_all',
            'LCB_all',
            'RWB_all',
            'LWB_all']

**Collecting metrics for each defender location for various clusters**

In [80]:
#R_CB - Right center back for 4 defender formation
#RCB - Right center back for 3 or 5 defender formation
#L_CB - Left center back for 4 defender formation
#LCB - Left center back for 3 or 5 defender formation
df_defs_atb_updated = list()
for df in tqdm(df_defs_atb):
    df = df.reindex(columns = df.columns.tolist() + new_cols)
    if df.iloc[0]['backline'] == 4.0:     
        df['RB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.RB), axis=1)
        df['R_CB_all'] = df.apply(lambda x: getmetrics(x.wyId,x['R_CB']), axis=1)
        df['L_CB_all'] = df.apply(lambda x: getmetrics(x.wyId,x['L_CB']), axis=1)
        df['LB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.LB), axis=1)
        df_defs_atb_updated.append(df)
    
    elif df.iloc[0]['backline'] == 3.0:
        df['RCB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.RCB), axis=1)
        df['CB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.CB), axis=1)
        df['LCB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.LCB), axis=1)
        df_defs_atb_updated.append(df)
        
    elif df.iloc[0]['backline'] == 5.0:
        df['RWB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.RWB), axis=1)
        df['RCB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.RCB), axis=1)
        df['CB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.CB), axis=1)
        df['LCB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.LCB), axis=1)
        df['LWB_all'] = df.apply(lambda x: getmetrics(x.wyId,x.LWB), axis=1)
        df_defs_atb_updated.append(df)

100%|██████████| 3/3 [4:39:56<00:00, 5598.80s/it]   
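
The run above takes several hours because every `getmetrics` call scans the full pass table. One possible way to cut this down, sketched here on toy data, is to split the events by match once and filter only the relevant match on each lookup. The frame `events`, the dictionary `events_by_match`, the helper `passes_in_match`, and the spaced name format are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for df_events_pass (hypothetical values)
events = pd.DataFrame({
    'matchId': [1, 1, 1, 2, 2],
    'playerName': ['Rob Holding', 'Rob Holding', 'Wes Morgan',
                   'Rob Holding', 'Ben Mee'],
    'tags': [['Accurate'], ['Not accurate'], ['Accurate'],
             ['Accurate'], ['Accurate']],
})

# Split the events once; each lookup then touches a single match only
events_by_match = {int(match_id): grp
                   for match_id, grp in events.groupby('matchId')}

def passes_in_match(match_id, name_part):
    """Return pass events in one match for players whose name contains name_part."""
    grp = events_by_match.get(int(match_id))
    if grp is None:
        return events.iloc[0:0]  # empty frame with the same columns
    return grp[grp['playerName'].str.contains(name_part, regex=False)]

print(len(passes_in_match(1, 'Holding')))  # 2
```

The same idea could be applied inside `getmetrics` by replacing the `matchId` comparison with a dictionary lookup; the name-matching logic would stay unchanged.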


In [81]:
df_defs_atb_metrics = list()
for df in tqdm(df_defs_atb_updated):
    if df.iloc[0]['backline'] == 4.0:
        df[[
            'RB_pass', 'RB_accpass', 'RB_highpass', 'RB_acchighpass',
            'RB_accpassloc', 'RB_inaccpassloc', 'RB_acchighpassloc',
            'RB_inacchighpassloc'
        ]] = pd.DataFrame(df['RB_all'].to_list(), index=df.index)
        df[[
            'R_CB_pass', 'R_CB_accpass', 'R_CB_highpass', 'R_CB_acchighpass',
            'R_CB_accpassloc', 'R_CB_inaccpassloc', 'R_CB_acchighpassloc',
            'R_CB_inacchighpassloc'
        ]] = pd.DataFrame(df['R_CB_all'].to_list(), index=df.index)
        df[[
            'L_CB_pass', 'L_CB_accpass', 'L_CB_highpass', 'L_CB_acchighpass',
            'L_CB_accpassloc', 'L_CB_inaccpassloc', 'L_CB_acchighpassloc',
            'L_CB_inacchighpassloc'
        ]] = pd.DataFrame(df['L_CB_all'].to_list(), index=df.index)
        df[[
            'LB_pass', 'LB_accpass', 'LB_highpass', 'LB_acchighpass',
            'LB_accpassloc', 'LB_inaccpassloc', 'LB_acchighpassloc',
            'LB_inacchighpassloc'
        ]] = pd.DataFrame(df['LB_all'].to_list(), index=df.index)
        df.drop([
            'RB_all', 'R_CB_all', 'L_CB_all', 'LB_all', 'RCB_all', 'LCB_all',
            'CB_all', 'RWB_all', 'LWB_all'
        ],
                axis=1,
                inplace=True)
        df_defs_atb_metrics.append(df)

    elif df.iloc[0]['backline'] == 3.0:
        df[[
            'RCB_pass', 'RCB_accpass', 'RCB_highpass', 'RCB_acchighpass',
            'RCB_accpassloc', 'RCB_inaccpassloc', 'RCB_acchighpassloc',
            'RCB_inacchighpassloc'
        ]] = pd.DataFrame(df['RCB_all'].to_list(), index=df.index)
        df[[
            'CB_pass', 'CB_accpass', 'CB_highpass', 'CB_acchighpass',
            'CB_accpassloc', 'CB_inaccpassloc', 'CB_acchighpassloc',
            'CB_inacchighpassloc'
        ]] = pd.DataFrame(df['CB_all'].to_list(), index=df.index)
        df[[
            'LCB_pass', 'LCB_accpass', 'LCB_highpass', 'LCB_acchighpass',
            'LCB_accpassloc', 'LCB_inaccpassloc', 'LCB_acchighpassloc',
            'LCB_inacchighpassloc'
        ]] = pd.DataFrame(df['LCB_all'].to_list(), index=df.index)
        df.drop([
            'RB_all', 'R_CB_all', 'L_CB_all', 'LB_all', 'RCB_all', 'LCB_all',
            'CB_all', 'RWB_all', 'LWB_all'
        ],
                axis=1,
                inplace=True)
        df_defs_atb_metrics.append(df)

    elif df.iloc[0]['backline'] == 5.0:
        df[[
            'RCB_pass', 'RCB_accpass', 'RCB_highpass', 'RCB_acchighpass',
            'RCB_accpassloc', 'RCB_inaccpassloc', 'RCB_acchighpassloc',
            'RCB_inacchighpassloc'
        ]] = pd.DataFrame(df['RCB_all'].to_list(), index=df.index)
        df[[
            'CB_pass', 'CB_accpass', 'CB_highpass', 'CB_acchighpass',
            'CB_accpassloc', 'CB_inaccpassloc', 'CB_acchighpassloc',
            'CB_inacchighpassloc'
        ]] = pd.DataFrame(df['CB_all'].to_list(), index=df.index)
        df[[
            'LCB_pass', 'LCB_accpass', 'LCB_highpass', 'LCB_acchighpass',
            'LCB_accpassloc', 'LCB_inaccpassloc', 'LCB_acchighpassloc',
            'LCB_inacchighpassloc'
        ]] = pd.DataFrame(df['LCB_all'].to_list(), index=df.index)
        df[[
            'RWB_pass', 'RWB_accpass', 'RWB_highpass', 'RWB_acchighpass',
            'RWB_accpassloc', 'RWB_inaccpassloc', 'RWB_acchighpassloc',
            'RWB_inacchighpassloc'
        ]] = pd.DataFrame(df['RWB_all'].to_list(), index=df.index)
        df[[
            'LWB_pass', 'LWB_accpass', 'LWB_highpass', 'LWB_acchighpass',
            'LWB_accpassloc', 'LWB_inaccpassloc', 'LWB_acchighpassloc',
            'LWB_inacchighpassloc'
        ]] = pd.DataFrame(df['LWB_all'].to_list(), index=df.index)
        df.drop([
            'RB_all', 'R_CB_all', 'L_CB_all', 'LB_all', 'RCB_all', 'LCB_all',
            'CB_all', 'RWB_all', 'LWB_all'
        ],
                axis=1,
                inplace=True)
        df_defs_atb_metrics.append(df)

100%|██████████| 3/3 [00:01<00:00,  2.65it/s]
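
The three branches above repeat the same expansion for every position. A more compact variant, shown here as a sketch on toy data, loops over the position names and builds the column names from a shared suffix list. `METRIC_SUFFIXES` and `expand_metrics` are hypothetical names, and unlike the cell above this version returns a copy instead of modifying the frame in place.

```python
import pandas as pd

METRIC_SUFFIXES = ['pass', 'accpass', 'highpass', 'acchighpass',
                   'accpassloc', 'inaccpassloc', 'acchighpassloc',
                   'inacchighpassloc']

def expand_metrics(df, positions):
    """Split each '<pos>_all' list column into eight '<pos>_<suffix>' columns."""
    out = df.copy()
    for pos in positions:
        cols = [f'{pos}_{suffix}' for suffix in METRIC_SUFFIXES]
        out[cols] = pd.DataFrame(out[f'{pos}_all'].tolist(), index=out.index)
        out = out.drop(columns=f'{pos}_all')
    return out

# Toy row shaped like the output of getmetrics (hypothetical values)
toy = pd.DataFrame({'RB': ['A'],
                    'RB_all': [[29, 23, 5, 3, [], [], [], []]]},
                   index=[7])
expanded = expand_metrics(toy, ['RB'])
print(expanded[['RB_pass', 'RB_accpass']])
```

For the four-defender frame the call would be `expand_metrics(df, ['RB', 'R_CB', 'L_CB', 'LB'])`, with the unused `*_all` columns dropped separately.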


In [83]:
df_defs_atb_updated[2][df_defs_atb_updated[2]['backline']==5].head()

Unnamed: 0,wyId,team,team_defense,RB,R_CB,L_CB,LB,RCB,CB,LCB,RWB,LWB,backline,match,gameweek,teamsData,dateutc,venue,referees,score,footedness,RCB_pass,RCB_accpass,RCB_highpass,RCB_acchighpass,RCB_accpassloc,RCB_inaccpassloc,RCB_acchighpassloc,RCB_inacchighpassloc,CB_pass,CB_accpass,CB_highpass,CB_acchighpass,CB_accpassloc,CB_inaccpassloc,CB_acchighpassloc,CB_inacchighpassloc,LCB_pass,LCB_accpass,LCB_highpass,LCB_acchighpass,LCB_accpassloc,LCB_inaccpassloc,LCB_acchighpassloc,LCB_inacchighpassloc,RWB_pass,RWB_accpass,RWB_highpass,RWB_acchighpass,RWB_accpassloc,RWB_inaccpassloc,RWB_acchighpassloc,RWB_inacchighpassloc,LWB_pass,LWB_accpass,LWB_highpass,LWB_acchighpass,LWB_accpassloc,LWB_inaccpassloc,LWB_acchighpassloc,LWB_inacchighpassloc
191,2499814,Leicester City,"[DannySimpson, BenChilwell, WesMorgan, HarryMa...",,,,,BenChilwell,WesMorgan,HarryMaguire,DannySimpson,ChristianFuchs,5.0,Leicester City-Everton,10,"{'1623': {'scoreET': 0, 'coachId': 434992, 'si...",2017-10-29 16:00:00,King Power Stadium,"[{'refereeId': 385911, 'role': 'referee'}, {'r...",2–0,right-left-right-right-left,18,11,1,0,"[[[36.4, 67.32], [16.64, 67.32]], [[70.72, 54....","[[[49.92, 65.28], [81.12, 61.88]], [[39.52, 66...",[],"[[[39.52, 66.64], [91.52, 61.88]]]",24,22,2,2,"[[[12.48, 24.48], [6.24, 35.36]], [[33.28, 20....","[[[32.24, 28.56], [39.52, 11.56]], [[36.4, 14....","[[[26.0, 24.48], [69.68, 16.32]], [[34.32, 34....",[],26,20,3,0,"[[[39.52, 59.84], [33.28, 20.4]], [[34.32, 55....","[[[9.36, 57.12], [70.72, 53.04]], [[45.76, 58....",[],"[[[9.36, 57.12], [70.72, 53.04]], [[33.28, 61....",27,20,6,4,"[[[70.72, 29.24], [60.32, 8.84]], [[63.44, 21....","[[[47.84, 19.72], [67.6, 7.48]], [[54.08, 30.6...","[[[63.44, 21.08], [82.16, 62.56]], [[6.24, 12....","[[[24.96, 3.4], [66.56, 17.0]], [[28.08, 4.08]...",44,32,8,3,"[[[28.08, 63.92], [69.68, 65.28]], [[58.24, 65...","[[[55.12, 57.8], [63.44, 55.76]], [[58.24, 61....","[[[32.24, 64.6], [72.8, 51.0]], [[38.48, 42.84...","[[[26.0, 66.64], [74.88, 57.8]], [[16.64, 67.3..."
194,2499816,Manchester Utd,"[LuisAntonioValenciaMosquera, EricBertrandBail...",,,,,EricBertrandBailly,ChrisSmalling,PhilJones,LuisAntonioValenciaMosquera,AshleyYoung,5.0,Manchester Utd-Tottenham,10,"{'1611': {'scoreET': 0, 'coachId': 3295, 'side...",2017-10-28 11:30:00,Old Trafford,"[{'refereeId': 381851, 'role': 'referee'}, {'r...",1–0,right-right-right-right-right,32,24,6,1,"[[[46.8, 10.88], [52.0, 15.64]], [[47.84, 4.76...","[[[36.4, 8.16], [76.96, 0.0]], [[36.4, 6.12], ...","[[[21.84, 9.52], [65.52, 10.88]]]","[[[36.4, 6.12], [70.72, 14.96]], [[34.32, 6.12...",33,25,3,0,"[[[26.0, 36.72], [24.96, 16.32]], [[24.96, 32....","[[[32.24, 51.68], [49.92, 53.72]], [[45.76, 25...",[],"[[[45.76, 25.84], [64.48, 57.12]], [[42.64, 3....",38,36,1,0,"[[[45.76, 54.4], [49.92, 9.52]], [[29.12, 38.0...","[[[44.72, 34.0], [66.56, 44.2]], [[34.32, 52.3...",[],"[[[34.32, 52.36], [87.36, 68.0]]]",38,30,2,1,"[[[74.88, 4.08], [74.88, 12.92]], [[78.0, 3.4]...","[[[80.08, 2.72], [69.68, 17.68]], [[68.64, 9.5...","[[[22.88, 3.4], [62.4, 19.04]]]","[[[57.2, 2.72], [76.96, 25.84]]]",45,26,13,4,"[[[67.6, 64.6], [45.76, 54.4]], [[43.68, 63.24...","[[[35.36, 65.96], [59.28, 32.64]], [[59.28, 62...","[[[43.68, 63.24], [86.32, 59.16]], [[42.64, 65...","[[[35.36, 65.96], [59.28, 32.64]], [[59.28, 62..."
277,2499857,Southampton,"[CedricRicardoAlvesSoares, MayaYoshida, Virgil...",,,,,MayaYoshida,VirgilvanDijk,WesleyHoedt,CedricRicardoAlvesSoares,RyanBertrand,5.0,Manchester City-Southampton,14,"{'1625': {'scoreET': 0, 'coachId': 267136, 'si...",2017-11-29 20:00:00,Etihad Stadium,"[{'refereeId': 385705, 'role': 'referee'}, {'r...",2–1,right-right-right-left-left,25,17,4,0,"[[[28.08, 16.32], [11.44, 27.2]], [[64.48, 7.4...","[[[28.08, 6.12], [92.56, 25.16]], [[20.8, 9.52...",[],"[[[28.08, 6.12], [92.56, 25.16]], [[12.48, 10....",19,15,4,2,"[[[31.2, 32.64], [37.44, 9.52]], [[28.08, 37.4...","[[[84.24, 26.52], [76.96, 24.48]], [[4.16, 4.0...","[[[33.28, 21.08], [101.92, 6.8]], [[14.56, 36....","[[[4.16, 4.08], [104.0, 0.0]], [[34.32, 35.36]...",22,14,5,2,"[[[28.08, 58.48], [42.64, 45.56]], [[38.48, 55...","[[[23.92, 49.64], [68.64, 56.44]], [[22.88, 55...","[[[26.0, 46.92], [86.32, 62.56]], [[36.4, 55.7...","[[[23.92, 49.64], [68.64, 56.44]], [[22.88, 55...",20,13,3,3,"[[[20.8, 6.12], [64.48, 19.72]], [[24.96, 7.48...","[[[14.56, 6.12], [14.56, 4.76]], [[35.36, 3.4]...","[[[20.8, 6.12], [64.48, 19.72]], [[48.88, 5.44...",[],20,13,4,1,"[[[26.0, 63.92], [22.88, 55.08]], [[14.56, 57....","[[[59.28, 55.76], [75.92, 36.04]], [[1.04, 60....","[[[42.64, 60.52], [70.72, 49.64]]]","[[[44.72, 34.68], [96.72, 68.0]], [[33.28, 62...."
327,2499882,West Ham,"[PabloJavierZabaletaGirod, WinstonReid, Angelo...",,,,,WinstonReid,AngeloObinzeOgbonna,AaronCresswell,PabloJavierZabaletaGirod,ArthurMasuaku,5.0,West Ham-Arsenal,17,"{'1609': {'scoreET': 0, 'coachId': 0, 'side': ...",2017-12-13 20:00:00,London Stadium,"[{'refereeId': 381851, 'role': 'referee'}, {'r...",0–0,right-right-left-left-left,19,14,2,0,"[[[37.44, 17.68], [43.68, 13.6]], [[34.32, 14....","[[[36.4, 17.68], [78.0, 65.96]], [[23.92, 11.5...",[],"[[[36.4, 17.68], [78.0, 65.96]], [[23.92, 11.5...",20,14,2,2,"[[[30.16, 37.4], [11.44, 34.0]], [[45.76, 35.3...","[[[24.96, 31.96], [58.24, 34.0]], [[17.68, 35....","[[[33.28, 56.44], [73.84, 53.72]], [[40.56, 40...",[],36,28,4,1,"[[[33.28, 38.08], [48.88, 38.08]], [[63.44, 63...","[[[9.36, 40.12], [31.2, 36.04]], [[39.52, 57.1...","[[[30.16, 44.2], [74.88, 34.0]]]","[[[39.52, 57.12], [75.92, 44.2]], [[44.72, 60....",27,15,1,1,"[[[43.68, 13.6], [44.72, 24.48]], [[62.4, 7.48...","[[[55.12, 4.08], [69.68, 2.04]], [[69.68, 10.8...","[[[61.36, 4.08], [101.92, 11.56]]]",[],27,23,0,0,"[[[46.8, 63.92], [74.88, 67.32]], [[49.92, 66....","[[[88.4, 57.8], [93.6, 30.6]], [[68.64, 63.92]...",[],[]
419,2499928,West Brom,"[MattPhillips, CraigDawson, AhmedHegazy, Jonny...",,,,,CraigDawson,AhmedHegazy,JonnyEvans,MattPhillips,KieranGibbs,5.0,West Brom-Arsenal,21,"{'1609': {'scoreET': 0, 'coachId': 7845, 'side...",2017-12-31 16:30:00,The Hawthorns,"[{'refereeId': 385909, 'role': 'referee'}, {'r...",1–1,right-right-right-right-left,27,18,6,1,"[[[21.84, 22.44], [31.2, 7.48]], [[62.4, 12.24...","[[[15.6, 8.16], [50.96, 0.0]], [[38.48, 9.52],...","[[[56.16, 9.52], [95.68, 6.8]]]","[[[15.6, 8.16], [50.96, 0.0]], [[38.48, 9.52],...",26,23,4,4,"[[[29.12, 38.76], [29.12, 44.88]], [[21.84, 40...","[[[38.48, 9.52], [71.76, 3.4]], [[35.36, 16.32...","[[[49.92, 37.4], [83.2, 8.16]], [[44.72, 17.68...",[],21,19,3,2,"[[[37.44, 61.88], [16.64, 34.68]], [[30.16, 46...","[[[33.28, 53.72], [75.92, 57.8]], [[57.2, 48.2...","[[[39.52, 38.76], [91.52, 61.88]], [[26.0, 28....","[[[33.28, 53.72], [75.92, 57.8]]]",0,0,0,0,[],[],[],[],35,27,5,2,"[[[30.16, 63.24], [10.4, 42.16]], [[63.44, 65....","[[[32.24, 61.88], [46.8, 48.28]], [[16.64, 44....","[[[59.28, 62.56], [83.2, 61.2]], [[40.56, 53.0...","[[[16.64, 44.88], [60.32, 68.0]], [[34.32, 60...."


In [89]:
df_defs_atb_metrics_combined = list()
df_defs_atb_metrics_combined.append(df_defs_atb_metrics[0])
df_defs_atb_metrics_combined.append(pd.concat([df_defs_atb_metrics[1],df_defs_atb_metrics[2]]))

In [92]:
# df_defs_atb_metrics_combined[1][df_defs_atb_metrics_combined[1]['backline']==5].head()

In [93]:
atb = ['four_defs','three_five_defs']
for i,df in enumerate(df_defs_atb_metrics_combined):
    df.to_pickle(f'../data_top5/clusters/clusters_v3/cluster_{atb[i]}.pkl')

In [19]:
df_fatb = pd.read_pickle('../data_top5/clusters/clusters_v3/cluster_four_defs.pkl')

In [10]:
df_fatb.head()

NameError: name 'df_fatb' is not defined

**Steps to validate that all players have been assigned metrics**

**Fetch players that have not registered a single pass in any particular match**

In [94]:
players_no_pass = list()
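# Note: the branch below is chosen from the first row of each frame only, so in
# the combined three/five-defender frame (which starts with three-defender rows)
# the wing-back columns (RWB, LWB) are not checked.
for df in df_defs_atb_metrics_combined: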
    if df.iloc[0]['backline']==4:
        for col in ['RB_pass','R_CB_pass','L_CB_pass','LB_pass']:
            players_no_pass.append(df[df[col].eq(0)][col.rsplit('_',1)[0]].values.tolist())
    elif df.iloc[0]['backline']==3:
        for col in ['RCB_pass','CB_pass','LCB_pass']:
            players_no_pass.append(df[df[col].eq(0)][col.rsplit('_',1)[0]].values.tolist())
    else:
        for col in ['RWB_pass','RCB_pass','CB_pass','LCB_pass','LWB_pass']:
            players_no_pass.append(df[df[col].eq(0)][col.rsplit('_',1)[0]].values.tolist())
players_no_pass_set = list(set([i for sublist in players_no_pass for i in sublist]))
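
Because the combined three/five frame mixes both backlines, a per-row lookup is one way to make sure the wing-back columns are checked as well. The sketch below uses toy data; `POSITIONS` and `players_without_passes` are hypothetical names and not part of the notebook.

```python
import pandas as pd

# Position columns that apply to each backline size
POSITIONS = {4: ['RB', 'R_CB', 'L_CB', 'LB'],
             3: ['RCB', 'CB', 'LCB'],
             5: ['RWB', 'RCB', 'CB', 'LCB', 'LWB']}

def players_without_passes(df):
    """Collect names whose '<pos>_pass' is 0, using each row's own backline."""
    names = set()
    for _, row in df.iterrows():
        for pos in POSITIONS.get(int(row['backline']), []):
            if row[f'{pos}_pass'] == 0:
                names.add(row[pos])
    return names

# Toy frame: one three-defender row and one five-defender row
toy = pd.DataFrame({
    'backline': [3.0, 5.0],
    'RCB': ['A', 'D'], 'CB': ['B', 'E'], 'LCB': ['C', 'F'],
    'RWB': [None, 'G'], 'LWB': [None, 'H'],
    'RCB_pass': [10, 12], 'CB_pass': [0, 9], 'LCB_pass': [7, 8],
    'RWB_pass': [float('nan'), 0], 'LWB_pass': [float('nan'), 5],
})
print(sorted(players_without_passes(toy)))  # ['B', 'G']
```

Here player `G`, a right wing-back in the five-defender row, is picked up even though the first row of the frame has a backline of three.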

In [95]:
players_no_pass_set

['GregorySertic',
 'EmreCan',
 'FranciscoJavierGuerreroMartin',
 'AssaneDiousseElHadji',
 'MarcAlbrighton',
 'YvesBissouma',
 'RomuloSouzaOrestesCaldeira',
 'MatthiasLehmann',
 'IbrahimAmadou',
 'JulianBaumgartlinger',
 'FabioDepaoli',
 'JesusNavasGonzalez',
 'DavidTimorCopovi',
 'PaoloPancrazioFarago',
 'SamClucas',
 'LarsBender',
 'IsmaelTiemokoDiomande',
 'BelDurelAvounou',
 'KwadwoAsamoah',
 'RubenPenaJimenez',
 'MatthiasZimmermann',
 'DanielAmartey',
 'MarcelRisse',
 'CarlosHenriqueCasimiro',
 'JohannesGeis',
 'StefanIlsanker',
 'VictorSanchezMata',
 'RaniKhedira',
 'RyanFraser',
 'JavierMartinezAginaga',
 'MakotoHasebe',
 'IvanRadovanovic',
 'LuizGustavoDias',
 'ThiagoMaiaAlencar',
 'IgnacioCamachoBarnola',
 'SamMcQueen',
 "AlfredJohnMomarN'Diaye",
 'LemouyaGoudiaby',
 'GeorgesConstantMandjeck',
 'IsaacHayden',
 'MohamedSalimFares',
 'SimonePadoin',
 "KevinN'Doram",
 'GeorginioWijnaldum',
 'SergiGomezSola',
 'EricDier',
 'JuanGuillermoCuadradoBello',
 'ThomasTeyePartey',
 'Stefan

**Narrowing the list down to players whose registered role is defender (Note: players who lined up in a defensive position but are not marked as defenders are not assigned metrics)**

In [96]:
players = pd.read_pickle('../data/players/players.pkl')

In [97]:
for player in players_no_pass_set:
    player_name_split = re.findall('[A-Z][^A-Z]*',player)
    try:
        role = players[(players['playerName'].str.contains(player_name_split[-1]))&
                       (players['playerName'].str.contains(player_name_split[-2]))&
                       (players['playerName'].str.contains(player_name_split[-3]))]['role'].values.tolist()[0]['code2']
    except IndexError:  # fewer than three name parts, or no matching player
        try:
            role = players[(players['playerName'].str.contains(player_name_split[-1]))&
                           (players['playerName'].str.contains(player_name_split[-2]))]['role'].values.tolist()[0]['code2']
        except IndexError:  # fewer than two name parts, or no matching player
            role = players[(players['playerName'].str.contains(player_name_split[-1]))]['role'].values.tolist()[0]['code2']
    if role=='DF':
        print(player+':'+role)

SergiGomezSola:DF
JorgeAndujarMoreno:DF
JordanTorunarigha:DF
JeremyGelin:DF


**Finding match ids for which these defenders do not have a single pass**

In [98]:
no_pass_defs = ['JorgeAndujarMoreno', 'SergiGomezSola','JeremyGelin', 'JordanTorunarigha']
df_indexes = dict()
for i in range(len(df_defs_atb_metrics)):
    check_indexes = dict()
    for defender in no_pass_defs:
        if df_defs_atb_metrics[i].iloc[0]['backline']==4:
            index_list = list()
            for col in ['RB_pass','R_CB_pass','L_CB_pass','LB_pass']:
                index_list.append(df_defs_atb_metrics[i][(df_defs_atb_metrics[i][col].eq(0))&(df_defs_atb_metrics[i][col.rsplit('_',1)[0]]==defender)].index.tolist())
                check_indexes[defender]=index_list
        elif df_defs_atb_metrics[i].iloc[0]['backline']==3:
            index_list = list()
            for col in ['RCB_pass','CB_pass','LCB_pass']:
                index_list.append(df_defs_atb_metrics[i][(df_defs_atb_metrics[i][col].eq(0))&(df_defs_atb_metrics[i][col.rsplit('_',1)[0]]==defender)].index.tolist())
                check_indexes[defender]=index_list
        else:
            index_list = list()
            for col in ['RWB_pass','RCB_pass','CB_pass','LCB_pass','LWB_pass']:
                index_list.append(df_defs_atb_metrics[i][(df_defs_atb_metrics[i][col].eq(0))&(df_defs_atb_metrics[i][col.rsplit('_',1)[0]]==defender)].index.tolist())
                check_indexes[defender]=index_list
        df_indexes[i]=check_indexes

In [99]:
df_indexes

{0: {'JorgeAndujarMoreno': [[2499], [], [], []],
  'SergiGomezSola': [[], [2399], [], []],
  'JeremyGelin': [[], [1043], [], []],
  'JordanTorunarigha': [[], [], [1810], []]},
 1: {'JorgeAndujarMoreno': [[], [], []],
  'SergiGomezSola': [[], [], []],
  'JeremyGelin': [[], [], []],
  'JordanTorunarigha': [[], [], []]},
 2: {'JorgeAndujarMoreno': [[], [], [], [], []],
  'SergiGomezSola': [[], [], [], [], []],
  'JeremyGelin': [[], [], [], [], []],
  'JordanTorunarigha': [[], [], [], [], []]}}
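
The lists above hold row index labels of the cluster frames rather than match ids. The ids themselves can be looked up through the `wyId` column, as in the sketch below. The frame `toy` and its id values are hypothetical stand-ins; only the index labels are taken from the output.

```python
import pandas as pd

# Toy stand-in for one cluster frame, indexed like the original (ids are made up)
toy = pd.DataFrame({'wyId': [111, 222]}, index=[1043, 2399])
toy_indexes = {'JeremyGelin': [[], [1043], [], []],
               'SergiGomezSola': [[], [2399], [], []]}

# Flatten the per-column index lists and map each label to its match id
match_ids = {
    player: [int(toy.loc[idx, 'wyId']) for per_col in lists for idx in per_col]
    for player, lists in toy_indexes.items()
}
print(match_ids)  # {'JeremyGelin': [111], 'SergiGomezSola': [222]}
```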

**Wyscout has not recorded any data for Jeremy Gelin in the match identified above, even though he played the full 90 minutes. Coke (JorgeAndujarMoreno) and Sergi Gomez were substituted, while Jordan Torunarigha was shown a red card in their respective matches, and hence none of them has any passing event associated with him.**