# Computing technical statistics
_@Lutecity April 2023_

**Description**

This notebook computes technical statistics for each player during a match, such as the number of shots on target, pass accuracy, the number of fouls and the number of successful tackles. This code will be used in the application to give the staff insights into the players' technical performance during the game.

The data come from StatsBomb and cover the Man City vs Arsenal match.

**Summary**

1. [Import the data](#data)
2. [Offensive stats](#offensive)
    1. [Shots](#shots)
    2. [Offsides](#offsides)
3. [Passing stats](#passing)
    1. [Crosses](#crosses)
    2. [Passes](#passes)
4. [Discipline stats](#discipline)
5. [Defensive stats](#defensive)
    1. [Tackles](#tackles)
    2. [Duels](#duels)
    3. [Interceptions](#interceptions)
    4. [Clearances](#clearances)
    5. [Lost balls](#lost-balls)
6. [Aggregate stats](#aggregate)
7. [Differentiate stats based on player's position](#differentiate)

## Import the data <a id="data"></a>

In [58]:
# LIBRARIES ----------------------------------------------------
import json
import os
import warnings

import numpy as np
import pandas as pd

warnings.simplefilter(action='ignore', category=FutureWarning)
pd.set_option('display.max_colwidth', 400)
pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None

In [59]:
# The StatsBomb files live in <parent of the notebook folder>/Data/StatsBomb
data_dir = os.path.join(os.path.dirname(os.getcwd()), "Data", "StatsBomb")
path_event = os.path.join(data_dir, "ManCity_Arsenal_events.json")
path_lineup = os.path.join(data_dir, "ManCity_Arsenal_lineups.json")

In [60]:
with open(path_event, encoding='utf-8') as f:
    json_file = json.load(f)
    df_events = pd.json_normalize(json_file, sep = "_")
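StatsBomb events are nested JSON objects; `pd.json_normalize` with `sep="_"` flattens them so that, for example, `type.name` becomes the `type_name` column used throughout this notebook. The sketch below uses two hypothetical, heavily simplified events (not real StatsBomb records) to show the effect. Keys missing from a record become `NaN`, which is also why `player_id` is stored as a float (e.g. `15570.0`) in the real data, where some events have no player.

```python
import pandas as pd

# Two hypothetical events, shaped loosely like StatsBomb records (nested dictionaries)
events = [
    {"id": "a", "type": {"id": 16, "name": "Shot"},
     "team": {"name": "Team A"}, "player": {"id": 1, "name": "Player 1"},
     "shot": {"outcome": {"name": "Goal"}}},
    {"id": "b", "type": {"id": 30, "name": "Pass"},
     "team": {"name": "Team A"}, "player": {"id": 2, "name": "Player 2"},
     "pass": {"cross": True}},
]

# Nested keys are joined with "_": type -> name becomes "type_name"
df = pd.json_normalize(events, sep="_")
print(sorted(df.columns))
# The pass has no "shot" key, so its shot_outcome_name is NaN
print(df[["type_name", "shot_outcome_name"]])
```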

## Offensive stats <a id="offensive"></a>

### Shots <a id="shots"></a>
#### Total shots

In [61]:
def compute_total_of_shots(df_events) :
    df_shot = df_events[df_events['type_name']=="Shot"]
    df_shot = df_shot.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total shots')
    
    return(df_shot)

df_shot = compute_total_of_shots(df_events)
print("Total of Shots: {}".format(sum(df_shot[df_shot['team_name']=='Manchester City WFC']['Total shots'])))
df_shot[df_shot['team_name']=='Manchester City WFC'].sort_values(by = 'Total shots',ascending = False)

Total of Shots: 16


|  | team_name | player_id | player_name | Total shots |
|---|---|---|---|---|
| 7 | Manchester City WFC | 15570.0 | Chloe Kelly | 5 |
| 9 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 3 |
| 6 | Manchester City WFC | 15555.0 | Lauren Hemp | 2 |
| 10 | Manchester City WFC | 25632.0 | Yui Hasegawa | 2 |
| 5 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |
| 8 | Manchester City WFC | 19416.0 | Laura Coombs | 1 |
| 11 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 1 |
| 12 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 1 |
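Every counting function in this notebook follows the same pattern: filter the events with a boolean mask, group by team and player, and count the rows. A minimal sketch of a generic helper, assuming the same key columns (`count_events_per_player` is a hypothetical name, not part of the notebook), run here on toy data:

```python
import pandas as pd

KEYS = ['team_name', 'player_id', 'player_name']

def count_events_per_player(df_events, mask, column_name):
    """Count the rows selected by a boolean mask, per team and player (hypothetical helper)."""
    # groupby drops rows whose keys are NaN, so events without a player are ignored
    return (df_events[mask]
            .groupby(KEYS)
            .size()
            .reset_index(name=column_name))

# Toy events: two shots by A, one shot by B and one pass that must be ignored
df_toy = pd.DataFrame({
    'team_name': ['T', 'T', 'T', 'T'],
    'player_id': [1.0, 1.0, 2.0, 2.0],
    'player_name': ['A', 'A', 'B', 'B'],
    'type_name': ['Shot', 'Shot', 'Shot', 'Pass'],
})
shots = count_events_per_player(df_toy, df_toy['type_name'] == 'Shot', 'Total shots')
print(shots)
```

With such a helper, `compute_total_of_shots(df_events)` would reduce to `count_events_per_player(df_events, df_events['type_name'] == 'Shot', 'Total shots')`.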


#### Shots on target

In [62]:
def compute_total_of_shots_on_target(df_events) :
    df_shot_on_target = df_events[(df_events['type_name']=="Shot") & df_events['shot_outcome_name'].isin(['Goal', 'Saved'])]
    df_shot_on_target = df_shot_on_target.groupby(['team_name','player_id','player_name']).size().reset_index(name='Shots on target')

    return(df_shot_on_target)

df_shot_on_target = compute_total_of_shots_on_target(df_events)
print("Total of Shots on target: {}".format(sum(df_shot_on_target[df_shot_on_target['team_name']=='Manchester City WFC']['Shots on target'])))
df_shot_on_target[df_shot_on_target['team_name']=='Manchester City WFC'].sort_values(by = 'Shots on target',ascending = False)

Total of Shots on target: 6


|  | team_name | player_id | player_name | Shots on target |
|---|---|---|---|---|
| 2 | Manchester City WFC | 15555.0 | Lauren Hemp | 2 |
| 3 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 4 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 1 |
| 5 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 1 |


#### Shot accuracy

In [63]:
def compute_shot_accuracy(df_shot,df_shot_on_target) :
    df_shots = pd.merge(df_shot,
                        df_shot_on_target, 
                        on =['team_name','player_id','player_name'])
    df_shots['Shot accuracy (%)'] = ((df_shots['Shots on target'] / df_shots['Total shots']) * 100).round(1)
    return(df_shots)

df_shots = compute_shot_accuracy(df_shot,df_shot_on_target)
df_shots[df_shots['team_name']=='Manchester City WFC'].sort_values(by = 'Shot accuracy (%)',ascending = False)

|  | team_name | player_id | player_name | Total shots | Shots on target | Shot accuracy (%) |
|---|---|---|---|---|---|---|
| 2 | Manchester City WFC | 15555.0 | Lauren Hemp | 2 | 2 | 100.0 |
| 5 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 1 | 1 | 100.0 |
| 3 | Manchester City WFC | 15570.0 | Chloe Kelly | 5 | 2 | 40.0 |
| 4 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 3 | 1 | 33.3 |
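`compute_shot_accuracy` relies on the default inner merge of `pd.merge`, so players who shot but never hit the target are dropped: Yui Hasegawa (2 shots) is missing from the accuracy table, and she also ends up with 0 total shots in the aggregated table below. A minimal sketch of a left-join variant that keeps these players with 0 shots on target, assuming the same column names (`compute_shot_accuracy_left` is a hypothetical name), on toy data:

```python
import pandas as pd

KEYS = ['team_name', 'player_id', 'player_name']

def compute_shot_accuracy_left(df_shot, df_shot_on_target):
    # Left join: keep every player who took at least one shot
    df_shots = pd.merge(df_shot, df_shot_on_target, on=KEYS, how='left')
    # Players without a shot on target get 0 instead of NaN
    df_shots['Shots on target'] = df_shots['Shots on target'].fillna(0).astype(int)
    df_shots['Shot accuracy (%)'] = (df_shots['Shots on target'] / df_shots['Total shots'] * 100).round(1)
    return df_shots

df_shot = pd.DataFrame({'team_name': ['T', 'T'], 'player_id': [1.0, 2.0],
                        'player_name': ['A', 'B'], 'Total shots': [5, 2]})
df_on_target = pd.DataFrame({'team_name': ['T'], 'player_id': [1.0],
                             'player_name': ['A'], 'Shots on target': [2]})
result = compute_shot_accuracy_left(df_shot, df_on_target)
print(result)
```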


### Offsides <a id="offsides"></a>

Two types of events are counted as offsides:
- `Offside` events: offside infringements resulting from a shot or a clearance (i.e. not from a pass);
- `Pass` events with the outcome `Pass Offside`: the ball reaches a teammate but the pass is judged offside.

In [64]:
def compute_total_of_offsides(df_events):
    df_offsides = df_events[(df_events['type_name']=='Offside') | ((df_events['type_name']=="Pass") & (df_events['pass_outcome_name']=="Pass Offside")) ]
    df_offsides = df_offsides.groupby(['team_name','player_id','player_name']).size().reset_index(name='Offsides')
    return(df_offsides)

df_offsides = compute_total_of_offsides(df_events)
print("Total of Offsides: {}".format(sum(df_offsides[df_offsides['team_name']=='Manchester City WFC']['Offsides'])))
df_offsides[df_offsides['team_name']=='Manchester City WFC'].sort_values(by = 'Offsides',ascending = False)

Total of Offsides: 2


|  | team_name | player_id | player_name | Offsides |
|---|---|---|---|---|
| 4 | Manchester City WFC | 15570.0 | Chloe Kelly | 1 |
| 5 | Manchester City WFC | 19416.0 | Laura Coombs | 1 |
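The offside mask combines the two event types with `|`. On toy rows (hypothetical, not taken from the match), only the `Offside` event and the pass flagged `Pass Offside` are selected; an ordinary completed pass and a shot are not:

```python
import pandas as pd

df_toy = pd.DataFrame({
    'type_name': ['Offside', 'Pass', 'Pass', 'Shot'],
    'pass_outcome_name': [None, 'Pass Offside', None, None],
})
# An offside is either an "Offside" event or a pass whose outcome is "Pass Offside"
is_offside = (df_toy['type_name'] == 'Offside') | (
    (df_toy['type_name'] == 'Pass') & (df_toy['pass_outcome_name'] == 'Pass Offside'))
print(is_offside.tolist())  # [True, True, False, False]
```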


## Passing stats <a id="passing"></a>

### Crosses <a id="crosses"></a>
#### Total crosses

In [65]:
def compute_total_of_crosses(df_events): 
    df_cross = df_events[(df_events['type_name']=="Pass") & (df_events['pass_cross']==True)]
    df_cross = df_cross.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total crosses')
    return(df_cross)

df_crosses = compute_total_of_crosses(df_events)
print("Total of crosses: {}".format(sum(df_crosses[df_crosses['team_name']=='Manchester City WFC']['Total crosses'])))
df_crosses[df_crosses['team_name']=='Manchester City WFC'].sort_values(by = 'Total crosses',ascending = False)

Total of crosses: 14


|  | team_name | player_id | player_name | Total crosses |
|---|---|---|---|---|
| 6 | Manchester City WFC | 15570.0 | Chloe Kelly | 5 |
| 5 | Manchester City WFC | 15555.0 | Lauren Hemp | 3 |
| 4 | Manchester City WFC | 10252.0 | Alex Greenwood | 2 |
| 9 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 2 |
| 7 | Manchester City WFC | 19416.0 | Laura Coombs | 1 |
| 8 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 1 |


#### Successful crosses

In [66]:
def compute_total_of_successfull_crosses(df_events):
    df_cross_successfull = df_events[(df_events['type_name']=="Pass") & (df_events['pass_cross']==True) &
                    (df_events['pass_outcome_name'].isnull())]
    df_cross_successfull = df_cross_successfull.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total succesfull crosses')
    return(df_cross_successfull)

df_successfull_crosses = compute_total_of_successfull_crosses(df_events)
print("Total of successfull crosses: {}".format(sum(df_successfull_crosses[df_successfull_crosses['team_name']=='Manchester City WFC']['Total succesfull crosses'])))
df_successfull_crosses[df_successfull_crosses['team_name']=='Manchester City WFC'].sort_values(by = 'Total succesfull crosses',ascending = False)

Total of successfull crosses: 4


|  | team_name | player_id | player_name | Total succesfull crosses |
|---|---|---|---|---|
| 3 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 1 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |
| 2 | Manchester City WFC | 15555.0 | Lauren Hemp | 1 |
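In StatsBomb data a completed pass carries no `pass_outcome_name` (the field is absent, so it becomes `NaN` after `json_normalize`); only unsuccessful passes have an outcome such as `Incomplete` or `Out`. This is why `isnull()` serves as the success test for crosses and passes. A toy illustration with hypothetical rows:

```python
import numpy as np
import pandas as pd

df_toy = pd.DataFrame({
    'type_name': ['Pass', 'Pass', 'Pass'],
    'pass_cross': [True, True, np.nan],
    'pass_outcome_name': [np.nan, 'Incomplete', np.nan],
})
# A successful cross: a pass flagged as a cross with no outcome recorded
successful_cross = ((df_toy['type_name'] == 'Pass')
                    & (df_toy['pass_cross'] == True)
                    & df_toy['pass_outcome_name'].isnull())
print(successful_cross.tolist())  # [True, False, False]
```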


### Passes <a id="passes"></a>
#### Total passes
Passes whose outcome is `Injury Clearance` or `Unknown` are excluded from the total.
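Because completed passes have a `NaN` outcome, it matters that `isin` returns `False` for `NaN`: the negated mask therefore keeps completed passes in the total while dropping the two excluded outcomes. A toy check on hypothetical outcome values:

```python
import numpy as np
import pandas as pd

outcomes = pd.Series([np.nan, 'Incomplete', 'Injury Clearance', 'Unknown', 'Out'])
# isin() is False for NaN, so negating it keeps completed passes (NaN outcome)
kept = ~outcomes.isin(['Injury Clearance', 'Unknown'])
print(kept.tolist())  # [True, True, False, False, True]
```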

In [67]:
def compute_total_of_passes(df_events):
    df_pass = df_events[(df_events['type_name']=="Pass") &
                    ~(df_events['pass_outcome_name'].isin(["Injury Clearance", "Unknown"]))]

    df_pass = df_pass.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total passes')
    return(df_pass)

df_pass = compute_total_of_passes(df_events)
print("Total of passes: {}".format(sum(df_pass[df_pass['team_name']=='Manchester City WFC']['Total passes'])))
df_pass[df_pass['team_name']=='Manchester City WFC'].sort_values(by = 'Total passes',ascending = False)

Total of passes: 499


|  | team_name | player_id | player_name | Total passes |
|---|---|---|---|---|
| 18 | Manchester City WFC | 10185.0 | Stephanie Houghton | 76 |
| 24 | Manchester City WFC | 25632.0 | Yui Hasegawa | 68 |
| 27 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 62 |
| 19 | Manchester City WFC | 10252.0 | Alex Greenwood | 59 |
| 26 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 47 |
| 25 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 46 |
| 21 | Manchester City WFC | 15570.0 | Chloe Kelly | 37 |
| 22 | Manchester City WFC | 19416.0 | Laura Coombs | 35 |
| 16 | Manchester City WFC | 4637.0 | Ellie Roebuck | 24 |
| 23 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 23 |


#### Successful passes

In [68]:
def compute_total_of_succesfull_passes(df_events):
    df_pass_successfull = df_events[(df_events['type_name']=="Pass") &
                    (df_events['pass_outcome_name'].isnull())]
    df_pass_successfull = df_pass_successfull.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total successfull passes')
    return(df_pass_successfull)

df_successfull_pass = compute_total_of_succesfull_passes(df_events) 
print("Total of successfull passes: {}".format(sum(df_successfull_pass[df_successfull_pass['team_name']=='Manchester City WFC']['Total successfull passes'])))
df_successfull_pass[df_successfull_pass['team_name']=='Manchester City WFC'].sort_values(by = 'Total successfull passes',ascending = False)

Total of successfull passes: 393


|  | team_name | player_id | player_name | Total successfull passes |
|---|---|---|---|---|
| 23 | Manchester City WFC | 25632.0 | Yui Hasegawa | 61 |
| 17 | Manchester City WFC | 10185.0 | Stephanie Houghton | 60 |
| 26 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 51 |
| 18 | Manchester City WFC | 10252.0 | Alex Greenwood | 49 |
| 25 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 40 |
| 24 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 37 |
| 21 | Manchester City WFC | 19416.0 | Laura Coombs | 28 |
| 20 | Manchester City WFC | 15570.0 | Chloe Kelly | 22 |
| 22 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 18 |
| 15 | Manchester City WFC | 4637.0 | Ellie Roebuck | 14 |


#### Pass accuracy

In [69]:
def compute_pass_accuracy(df_pass,df_pass_successfull):
    df_passes = pd.merge(df_pass,
                        df_pass_successfull,
                        on = ['team_name','player_name','player_id'],
                        how='left')
    # Players without any successful pass get 0 instead of NaN
    df_passes = df_passes.fillna(0)
    # Use the merged frame for both terms so that each ratio refers to the same player
    df_passes['Pass accuracy (%)'] = (df_passes["Total successfull passes"] / df_passes["Total passes"] * 100).round(1)
    return(df_passes)

df_passes = compute_pass_accuracy(df_pass,df_successfull_pass)
df_passes[df_passes['team_name']=='Manchester City WFC'].sort_values(by = 'Pass accuracy (%)',ascending = False)


|  | team_name | player_id | player_name | Total passes | Total successfull passes | Pass accuracy (%) |
|---|---|---|---|---|---|---|
| 17 | Manchester City WFC | 6818.0 | Hayley Emma Raso | 1 | 1.0 | 100.0 |
| 24 | Manchester City WFC | 25632.0 | Yui Hasegawa | 68 | 61.0 | 89.7 |
| 26 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 47 | 40.0 | 85.1 |
| 19 | Manchester City WFC | 10252.0 | Alex Greenwood | 59 | 49.0 | 83.1 |
| 27 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 62 | 51.0 | 82.3 |
| 25 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 46 | 37.0 | 80.4 |
| 22 | Manchester City WFC | 19416.0 | Laura Coombs | 35 | 28.0 | 80.0 |
| 18 | Manchester City WFC | 10185.0 | Stephanie Houghton | 76 | 60.0 | 78.9 |
| 23 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 23 | 18.0 | 78.3 |
| 21 | Manchester City WFC | 15570.0 | Chloe Kelly | 37 | 22.0 | 59.5 |
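The same "successes / attempts × 100, rounded to one decimal" ratio is computed for shots, passes and tackles. A minimal sketch of a shared helper (`compute_accuracy` is a hypothetical name) that also guards against players with 0 attempts, returning `NaN` rather than `inf`, in case it is reused on tables where such rows can occur:

```python
import numpy as np
import pandas as pd

def compute_accuracy(successes, attempts):
    """Percentage of successful attempts, rounded to one decimal (hypothetical helper).

    Returns NaN instead of inf when a player has 0 attempts.
    """
    successes = pd.Series(successes, dtype='float64')
    attempts = pd.Series(attempts, dtype='float64')
    # Replacing 0 by NaN avoids a division by zero
    ratio = successes / attempts.replace(0, np.nan)
    return (ratio * 100).round(1)

accuracy = compute_accuracy([61, 22, 0], [68, 37, 0])
print(accuracy.tolist())
```

In this notebook it could be called as `compute_accuracy(df_passes['Total successfull passes'], df_passes['Total passes'])`, since both columns share the same index.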


## Discipline stats <a id="discipline"></a>

#### Total fouls

In [70]:
def compute_total_of_fouls(df_events):
    df_fouls = df_events[df_events['type_name']=='Foul Committed']
    df_fouls = df_fouls.groupby(['team_name','player_id','player_name']).size().reset_index(name='Fouls')
    return(df_fouls)

df_fouls = compute_total_of_fouls(df_events)   
print("Total of Fouls: {}".format(sum(df_fouls[df_fouls['team_name']=='Manchester City WFC']['Fouls'])))
df_fouls[df_fouls['team_name']=='Manchester City WFC'].sort_values(by = 'Fouls',ascending = False)

Total of Fouls: 10


|  | team_name | player_id | player_name | Fouls |
|---|---|---|---|---|
| 13 | Manchester City WFC | 19416.0 | Laura Coombs | 3 |
| 14 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 2 |
| 9 | Manchester City WFC | 10185.0 | Stephanie Houghton | 1 |
| 10 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |
| 11 | Manchester City WFC | 15555.0 | Lauren Hemp | 1 |
| 12 | Manchester City WFC | 15570.0 | Chloe Kelly | 1 |
| 15 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 1 |


## Defensive stats <a id="defensive"></a>

### Tackles <a id="tackles"></a>
#### Total tackles

In [71]:
def compute_total_of_tackles(df_events):
    df_tackle = df_events[(df_events['type_name']=="Duel") & (df_events['duel_type_name']=="Tackle")]
    df_tackle = df_tackle.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total tackles')
    return(df_tackle)

df_tackle = compute_total_of_tackles(df_events)
print("Total of tackles: {}".format(sum(df_tackle[df_tackle['team_name']=='Manchester City WFC']['Total tackles'])))
df_tackle[df_tackle['team_name']=='Manchester City WFC'].sort_values(by = 'Total tackles',ascending = False)

Total of tackles: 25


|  | team_name | player_id | player_name | Total tackles |
|---|---|---|---|---|
| 14 | Manchester City WFC | 25632.0 | Yui Hasegawa | 5 |
| 10 | Manchester City WFC | 15555.0 | Lauren Hemp | 3 |
| 12 | Manchester City WFC | 19416.0 | Laura Coombs | 3 |
| 16 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 3 |
| 17 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 3 |
| 11 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 13 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 2 |
| 15 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 2 |
| 8 | Manchester City WFC | 10185.0 | Stephanie Houghton | 1 |
| 9 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |


#### Successful tackles

In [72]:
def compute_total_of_successfull_tackles(df_events):
    df_tackle_successfull = df_events[(df_events['type_name']=="Duel") & (df_events['duel_type_name']=="Tackle") & 
                            (df_events['duel_outcome_name'].isin(['Success In Play','Won','Success Out','Success']))]
    df_tackle_successfull = df_tackle_successfull.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total successfull tackles')
    return(df_tackle_successfull)

df_successfull_tackle = compute_total_of_successfull_tackles(df_events)   
print("Total of successfull tackles: {}".format(sum(df_successfull_tackle[df_successfull_tackle['team_name']=='Manchester City WFC']['Total successfull tackles'])))
df_successfull_tackle[df_successfull_tackle['team_name']=='Manchester City WFC'].sort_values(by = 'Total successfull tackles',ascending = False)

Total of successfull tackles: 16


|  | team_name | player_id | player_name | Total successfull tackles |
|---|---|---|---|---|
| 9 | Manchester City WFC | 19416.0 | Laura Coombs | 3 |
| 11 | Manchester City WFC | 25632.0 | Yui Hasegawa | 3 |
| 7 | Manchester City WFC | 15555.0 | Lauren Hemp | 2 |
| 8 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 10 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 2 |
| 13 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 2 |
| 6 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |
| 12 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 1 |


#### Tackle accuracy

In [73]:
def compute_tackle_accuracy(df_tackle,df_tackle_successfull):
    df_tackles = pd.merge(df_tackle,
                        df_tackle_successfull,
                        on = ['team_name','player_name','player_id'],
                        how='left')
    df_tackles = df_tackles.replace(np.nan,0)
    df_tackles['Tackle accuracy (%)'] = (df_tackles["Total successfull tackles"] / (df_tackles["Total tackles"]) * 100).round(1)
    return(df_tackles)

df_tackles = compute_tackle_accuracy(df_tackle,df_successfull_tackle)
df_tackles[df_tackles['team_name']=='Manchester City WFC'].sort_values(by = 'Tackle accuracy (%)',ascending = False)

|  | team_name | player_id | player_name | Total tackles | Total successfull tackles | Tackle accuracy (%) |
|---|---|---|---|---|---|---|
| 9 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 | 1.0 | 100.0 |
| 11 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 | 2.0 | 100.0 |
| 12 | Manchester City WFC | 19416.0 | Laura Coombs | 3 | 3.0 | 100.0 |
| 13 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 2 | 2.0 | 100.0 |
| 10 | Manchester City WFC | 15555.0 | Lauren Hemp | 3 | 2.0 | 66.7 |
| 17 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 3 | 2.0 | 66.7 |
| 14 | Manchester City WFC | 25632.0 | Yui Hasegawa | 5 | 3.0 | 60.0 |
| 15 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 2 | 1.0 | 50.0 |
| 8 | Manchester City WFC | 10185.0 | Stephanie Houghton | 1 | 0.0 | 0.0 |
| 16 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 3 | 0.0 | 0.0 |
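Counting attempts and successes in two functions and merging them works, but both counts can also come from a single `groupby` using pandas named aggregation (available since pandas 0.25). The sketch below reuses the notebook's column names on toy data; `SUCCESS_OUTCOMES` is a hypothetical constant holding the outcome list used in `compute_total_of_successfull_tackles`:

```python
import pandas as pd

KEYS = ['team_name', 'player_id', 'player_name']
SUCCESS_OUTCOMES = ['Success In Play', 'Won', 'Success Out', 'Success']

df_toy = pd.DataFrame({
    'team_name': ['T'] * 4,
    'player_id': [1.0, 1.0, 1.0, 2.0],
    'player_name': ['A', 'A', 'A', 'B'],
    'type_name': ['Duel'] * 4,
    'duel_type_name': ['Tackle'] * 4,
    'duel_outcome_name': ['Won', 'Lost In Play', 'Success In Play', 'Lost Out'],
})

tackles = df_toy[(df_toy['type_name'] == 'Duel') & (df_toy['duel_type_name'] == 'Tackle')]
# One pass over the groups: 'size' counts every tackle, the lambda counts the successful ones
df_tackles = (tackles.groupby(KEYS)
              .agg(**{'Total tackles': ('duel_outcome_name', 'size'),
                      'Total successfull tackles': ('duel_outcome_name',
                                                   lambda s: s.isin(SUCCESS_OUTCOMES).sum())})
              .reset_index())
df_tackles['Tackle accuracy (%)'] = (df_tackles['Total successfull tackles']
                                     / df_tackles['Total tackles'] * 100).round(1)
print(df_tackles)
```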


### Duels <a id="duels"></a>
#### Total duels

In [74]:
def compute_total_of_duels(df_events) : 
    df_duels = df_events[df_events['type_name']=='Duel']
    df_duels = df_duels.groupby(['team_name','player_id','player_name']).size().reset_index(name='Duels')
    return(df_duels)

df_duels = compute_total_of_duels(df_events)
print("Total of Duels: {}".format(sum(df_duels[df_duels['team_name']=='Manchester City WFC']['Duels'])))
df_duels[df_duels['team_name']=='Manchester City WFC'].sort_values(by = 'Duels',ascending = False)

Total of Duels: 40


|  | team_name | player_id | player_name | Duels |
|---|---|---|---|---|
| 16 | Manchester City WFC | 19416.0 | Laura Coombs | 7 |
| 14 | Manchester City WFC | 15555.0 | Lauren Hemp | 6 |
| 17 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 6 |
| 18 | Manchester City WFC | 25632.0 | Yui Hasegawa | 6 |
| 20 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 3 |
| 21 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 3 |
| 12 | Manchester City WFC | 10185.0 | Stephanie Houghton | 2 |
| 13 | Manchester City WFC | 10252.0 | Alex Greenwood | 2 |
| 15 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 19 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 2 |


#### Successful duels

In [75]:
def compute_total_of_successfull_duels(df_events):
    df_duels_successfull = df_events[(df_events['type_name']=="Duel")& 
                            (df_events['duel_outcome_name'].isin(['Success In Play','Won','Success Out','Success']))]
    df_duels_successfull = df_duels_successfull.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total successfull duels')
    return(df_duels_successfull)

df_successfull_duels = compute_total_of_successfull_duels(df_events)
print("Total of successfull duels: {}".format(sum(df_successfull_duels[df_successfull_duels['team_name']=='Manchester City WFC']['Total successfull duels'])))
df_successfull_duels[df_successfull_duels['team_name']=='Manchester City WFC'].sort_values(by = 'Total successfull duels',ascending = False)

Total of successfull duels: 16


|  | team_name | player_id | player_name | Total successfull duels |
|---|---|---|---|---|
| 9 | Manchester City WFC | 19416.0 | Laura Coombs | 3 |
| 11 | Manchester City WFC | 25632.0 | Yui Hasegawa | 3 |
| 7 | Manchester City WFC | 15555.0 | Lauren Hemp | 2 |
| 8 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 10 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 2 |
| 13 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 2 |
| 6 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |
| 12 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 1 |


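### Interceptions <a id="interceptions"></a>

Only interceptions with a successful outcome (`Won`, `Success`, `Success In Play` or `Success Out`) are counted.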

In [76]:
def compute_total_of_interceptions(df_events):
    df_interceptions = df_events[(df_events['type_name']=='Interception') & 
                                    (df_events['interception_outcome_name'].isin(['Success In Play','Won','Success Out','Success']))]
    df_interceptions = df_interceptions.groupby(['team_name','player_id','player_name']).size().reset_index(name='Interceptions')
    return(df_interceptions)

df_interceptions = compute_total_of_interceptions(df_events)
print("Total of Interceptions: {}".format(sum(df_interceptions[df_interceptions['team_name']=='Manchester City WFC']['Interceptions'])))
df_interceptions[df_interceptions['team_name']=='Manchester City WFC'].sort_values(by = 'Interceptions',ascending = False)

Total of Interceptions: 18


|  | team_name | player_id | player_name | Interceptions |
|---|---|---|---|---|
| 14 | Manchester City WFC | 25632.0 | Yui Hasegawa | 5 |
| 10 | Manchester City WFC | 10252.0 | Alex Greenwood | 3 |
| 17 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 3 |
| 12 | Manchester City WFC | 15570.0 | Chloe Kelly | 2 |
| 15 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 2 |
| 11 | Manchester City WFC | 15555.0 | Lauren Hemp | 1 |
| 13 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 1 |
| 16 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 1 |


### Clearances <a id="clearances"></a>

In [77]:
def compute_total_of_clearances(df_events):
    df_clearances = df_events[df_events['type_name']=='Clearance']
    df_clearances = df_clearances.groupby(['team_name','player_id','player_name']).size().reset_index(name='Clearances')
    return(df_clearances)

df_clearances = compute_total_of_clearances(df_events)
print("Total of Clearances: {}".format(sum(df_clearances[df_clearances['team_name']=='Manchester City WFC']['Clearances'])))
df_clearances[df_clearances['team_name']=='Manchester City WFC'].sort_values(by = 'Clearances',ascending = False)

Total of Clearances: 15


|  | team_name | player_id | player_name | Clearances |
|---|---|---|---|---|
| 10 | Manchester City WFC | 10185.0 | Stephanie Houghton | 5 |
| 16 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 4 |
| 14 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 2 |
| 11 | Manchester City WFC | 10252.0 | Alex Greenwood | 1 |
| 12 | Manchester City WFC | 15555.0 | Lauren Hemp | 1 |
| 13 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 1 |
| 15 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 1 |


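### Lost balls <a id="lost-balls"></a>

A ball is counted as lost when the player is dispossessed (`Dispossessed` event) or miscontrols it (`Miscontrol` event).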

In [78]:
def compute_total_of_lost_balls(df_events):
    df_lost_ball = df_events[(df_events['type_name']=='Dispossessed') | (df_events['type_name']=='Miscontrol')]
    df_lost_ball = df_lost_ball.groupby(['team_name','player_id','player_name']).size().reset_index(name='Total of lost balls')
    return(df_lost_ball)

df_lost_balls = compute_total_of_lost_balls(df_events)
print("Total of lost balls: {}".format(sum(df_lost_balls[df_lost_balls['team_name']=='Manchester City WFC']['Total of lost balls'])))
df_lost_balls[df_lost_balls['team_name']=='Manchester City WFC'].sort_values(by = 'Total of lost balls',ascending = False)

Total of lost balls: 21


|  | team_name | player_id | player_name | Total of lost balls |
|---|---|---|---|---|
| 12 | Manchester City WFC | 15570.0 | Chloe Kelly | 5 |
| 14 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 5 |
| 11 | Manchester City WFC | 15555.0 | Lauren Hemp | 4 |
| 13 | Manchester City WFC | 19416.0 | Laura Coombs | 3 |
| 15 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 3 |
| 16 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 1 |


## Aggregate stats <a id="aggregate"></a>

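Merge all the statistics computed above into a single DataFrame with one row per player. A player with no event of a given type gets 0 for that statistic.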

In [79]:
def aggregate_statistics(df_events) :
    df_shot = compute_total_of_shots(df_events)
    df_shot_on_target = compute_total_of_shots_on_target(df_events)
    df_shots = compute_shot_accuracy(df_shot,df_shot_on_target)
    df_offsides = compute_total_of_offsides(df_events)
    df_crosses = compute_total_of_crosses(df_events)
    df_successfull_crosses = compute_total_of_successfull_crosses(df_events)
    df_pass = compute_total_of_passes(df_events)
    df_successfull_pass = compute_total_of_succesfull_passes(df_events)
    df_passes = compute_pass_accuracy(df_pass,df_successfull_pass)
    df_fouls = compute_total_of_fouls(df_events)
    df_tackle = compute_total_of_tackles(df_events)
    df_successfull_tackle = compute_total_of_successfull_tackles(df_events)
    df_tackles = compute_tackle_accuracy(df_tackle,df_successfull_tackle)
    df_duels = compute_total_of_duels(df_events)
    df_successfull_duels = compute_total_of_successfull_duels(df_events)
    df_interceptions = compute_total_of_interceptions(df_events)
    df_clearances = compute_total_of_clearances(df_events)
    df_lost_balls = compute_total_of_lost_balls(df_events)

    df_players = df_events[~(df_events['player_id'].isnull())][['team_name','player_id','player_name']].drop_duplicates()

    list_of_df = [df_shots,df_offsides, df_crosses, df_successfull_crosses, df_passes, df_fouls, df_tackles, df_duels, 
                    df_successfull_duels, df_interceptions,df_clearances, df_lost_balls]

    for df in list_of_df :
        df_players = pd.merge(df_players,
                            df, 
                            on = ['team_name','player_id','player_name'],
                            how = 'left')
    df_players = df_players.replace(np.nan,0)

    return(df_players)

In [80]:
df_stats_players = aggregate_statistics(df_events)
df_stats_players[df_stats_players['team_name']=='Manchester City WFC']

|  | team_name | player_id | player_name | Total shots | Shots on target | Shot accuracy (%) | Offsides | Total crosses | Total succesfull crosses | Total passes | Total successfull passes | Pass accuracy (%) | Fouls | Total tackles | Total successfull tackles | Tackle accuracy (%) | Duels | Total successfull duels | Interceptions | Clearances | Total of lost balls |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manchester City WFC | 25554.0 | Khadija Monifa Shaw | 3.0 | 1.0 | 33.3 | 0.0 | 0.0 | 0.0 | 23 | 18.0 | 78.3 | 2.0 | 2.0 | 2.0 | 100.0 | 6.0 | 2.0 | 1.0 | 1.0 | 5.0 |
| 1 | Manchester City WFC | 25632.0 | Yui Hasegawa | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 68 | 61.0 | 89.7 | 0.0 | 5.0 | 3.0 | 60.0 | 6.0 | 3.0 | 5.0 | 0.0 | 0.0 |
| 2 | Manchester City WFC | 10185.0 | Stephanie Houghton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 76 | 60.0 | 78.9 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 5.0 | 0.0 |
| 3 | Manchester City WFC | 10252.0 | Alex Greenwood | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 59 | 49.0 | 83.1 | 1.0 | 1.0 | 1.0 | 100.0 | 2.0 | 1.0 | 3.0 | 1.0 | 0.0 |
| 5 | Manchester City WFC | 19416.0 | Laura Coombs | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 35 | 28.0 | 80.0 | 3.0 | 3.0 | 3.0 | 100.0 | 7.0 | 3.0 | 0.0 | 0.0 | 3.0 |
| 6 | Manchester City WFC | 32210.0 | Laia Aleixandri López | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 46 | 37.0 | 80.4 | 0.0 | 2.0 | 1.0 | 50.0 | 2.0 | 1.0 | 2.0 | 2.0 | 0.0 |
| 8 | Manchester City WFC | 221888.0 | Kerstin Yasmijn Casparij | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 62 | 51.0 | 82.3 | 1.0 | 3.0 | 2.0 | 66.7 | 3.0 | 2.0 | 3.0 | 4.0 | 1.0 |
| 9 | Manchester City WFC | 4637.0 | Ellie Roebuck | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24 | 14.0 | 58.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 13 | Manchester City WFC | 15555.0 | Lauren Hemp | 2.0 | 2.0 | 100.0 | 0.0 | 3.0 | 1.0 | 21 | 12.0 | 57.1 | 1.0 | 3.0 | 2.0 | 66.7 | 6.0 | 2.0 | 1.0 | 1.0 | 4.0 |
| 17 | Manchester City WFC | 62666.0 | Ingrid Filippa Angeldal | 1.0 | 1.0 | 100.0 | 0.0 | 2.0 | 0.0 | 47 | 40.0 | 85.1 | 0.0 | 3.0 | 0.0 | 0.0 | 3.0 | 0.0 | 1.0 | 1.0 | 3.0 |
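The `for` loop in `aggregate_statistics` is a left fold over the list of statistics tables, which can also be written with `functools.reduce`. A minimal sketch on toy tables (`merge_player_stats` is a hypothetical name). It also shows why counts such as `3.0` appear as floats in the aggregated table: the left joins introduce `NaN` for missing players, which forces a float dtype before `fillna(0)` runs.

```python
from functools import reduce

import pandas as pd

KEYS = ['team_name', 'player_id', 'player_name']

def merge_player_stats(df_players, stats_frames):
    """Left-join every statistics table onto the list of players, then replace gaps with 0."""
    merged = reduce(lambda left, right: left.merge(right, on=KEYS, how='left'),
                    stats_frames, df_players)
    return merged.fillna(0)

df_players = pd.DataFrame({'team_name': ['T', 'T'], 'player_id': [1.0, 2.0],
                           'player_name': ['A', 'B']})
df_fouls = pd.DataFrame({'team_name': ['T'], 'player_id': [1.0],
                         'player_name': ['A'], 'Fouls': [3]})
df_clearances = pd.DataFrame({'team_name': ['T'], 'player_id': [2.0],
                              'player_name': ['B'], 'Clearances': [5]})
merged = merge_player_stats(df_players, [df_fouls, df_clearances])
print(merged)
```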


## Differentiate stats based on player's position <a id="differentiate"></a>

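Only the statistics most relevant to the player's position are kept. Positions are grouped by their StatsBomb `position_id`: 2 to 8 for defenders, 9 to 20 for midfielders, and 21 and above for attackers. Goalkeepers (`position_id` 1) are not covered.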

In [81]:
# Keep only the statistics that are relevant for the player's position
def differentiate_statistics(df_events, df_players, position_id) :
    # StatsBomb position ids seen in the match, grouped by role
    defending_position = df_events[(df_events['position_id']>=2) & (df_events['position_id']<=8)]['position_id'].unique()
    middlefield_position = df_events[(df_events['position_id']>=9) & (df_events['position_id']<=20)]['position_id'].unique()
    attacking_position = df_events[df_events['position_id']>=21]['position_id'].unique()

    if position_id in attacking_position :
        cols_to_keep = ['team_name', 'player_id', 'player_name',
                        'Shot accuracy (%)', 'Pass accuracy (%)',
                        'Total shots', 'Total of lost balls',
                        'Shots on target', 'Offsides']
    elif position_id in middlefield_position :
        cols_to_keep = ['team_name', 'player_id', 'player_name',
                        'Shot accuracy (%)', 'Pass accuracy (%)',
                        'Tackle accuracy (%)', 'Total successfull tackles',
                        'Interceptions', 'Total of lost balls']
    elif position_id in defending_position :
        cols_to_keep = ['team_name', 'player_id', 'player_name',
                        'Pass accuracy (%)', 'Total crosses',
                        'Tackle accuracy (%)', 'Interceptions',
                        'Clearances', 'Total of lost balls', 'Fouls']
    else :
        # Goalkeepers (position_id 1) and unknown positions have no predefined set of statistics
        raise ValueError("No statistics defined for position_id {}".format(position_id))

    df_players = df_players[cols_to_keep]
    return(df_players)

Test the function on Alex Greenwood:

In [83]:
player_name = "Alex Greenwood"
# First position recorded for the player in this match
position_id = df_events[df_events['player_name']==player_name]['position_id'].unique()[0]
# Use a new variable so that df_stats_players still holds every player
df_player_stats = df_stats_players[df_stats_players['player_name']==player_name]
differentiate_statistics(df_events, df_player_stats, position_id)

|  | team_name | player_id | player_name | Pass accuracy (%) | Total crosses | Tackle accuracy (%) | Interceptions | Clearances | Total of lost balls | Fouls |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Manchester City WFC | 10252.0 | Alex Greenwood | 83.1 | 2.0 | 100.0 | 3.0 | 1.0 | 0.0 | 1.0 |
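`differentiate_statistics` builds its three groups of position ids from the events, but for a `position_id` taken from those same events the grouping depends only on numeric ranges. A standalone sketch of that mapping (`position_group` is a hypothetical name, not part of the notebook), returning `None` for the goalkeeper, which none of the groups covers:

```python
def position_group(position_id):
    """Map a StatsBomb position_id to the role used by differentiate_statistics (sketch)."""
    if 2 <= position_id <= 8:
        return 'defender'
    if 9 <= position_id <= 20:
        return 'midfielder'
    if position_id >= 21:
        return 'attacker'
    # position_id 1 (goalkeeper) is not covered by the notebook's groups
    return None

print([position_group(p) for p in [1, 6, 17, 23]])  # [None, 'defender', 'midfielder', 'attacker']
```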
