### Expected Goals including player positions
A step-by-step walkthrough of building an expected goals model that uses additional information about opposition player locations. This tutorial follows design choices similar to those of Javier Fernandez’s expected goals model in "A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions".

We will train a shallow neural network with the following features:

- ball location (x)
- binary variable signifying if ball was closer to the goal than the opponent’s goalkeeper
- angle between the ball and the goal
- distance between the ball and the goal
- distance between the ball and the goalkeeper in y-axis
- distance between the ball and the goalkeeper
- number of opponent players inside the triangle formed between the ball location and opponent’s goal posts
- number of opponent players less than 3 meters away from the ball location
- binary variable signifying if shot was a header
- expected goals based on distance to goal and angle between the ball and the goal
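
As a rough illustration of the two player-count features in the list above, here is a minimal sketch in plain Python. It assumes a 105 x 68 meter pitch with the attacked goal centred at (105, 34) and a 7.32 meter goal width. The helper names (`in_triangle`, `opponents_in_shot_triangle`, `opponents_within`) and the sample coordinates are hypothetical, chosen only for this example, and whether the goalkeeper should be counted among the opponents is left open here.

```python
import math

GOAL_X = 105.0        # x of the goal line in meters (assumed pitch length 105 m)
GOAL_Y = 34.0         # y of the goal centre (assumed pitch width 68 m)
HALF_GOAL = 7.32 / 2  # half of the goal width in meters


def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if point p lies inside, or on the edge of, triangle abc."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when all three signed areas share the same sign
    return not (has_neg and has_pos)


def opponents_in_shot_triangle(ball, opponents):
    """Count opponents inside the triangle ball - post - post."""
    post_1 = (GOAL_X, GOAL_Y - HALF_GOAL)
    post_2 = (GOAL_X, GOAL_Y + HALF_GOAL)
    return sum(in_triangle(p, ball, post_1, post_2) for p in opponents)


def opponents_within(ball, opponents, radius=3.0):
    """Count opponents closer to the ball than `radius` meters."""
    return sum(math.dist(ball, p) < radius for p in opponents)


# made-up positions, in meters
ball = (94.0, 34.0)
opponents = [(100.0, 34.0), (96.0, 35.0), (94.0, 40.0), (104.0, 34.5)]
n_triangle = opponents_in_shot_triangle(ball, opponents)
n_close = opponents_within(ball, opponents)
print(n_triangle, n_close)
```

The sign test keeps the triangle check free of divisions, so it also behaves sensibly for shots taken very close to the goal line.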

In [4]:
from pathlib import Path

In [5]:
#importing necessary libraries
from mplsoccer import Sbopen
import pandas as pd
import numpy as np
import warnings
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import os
import random as rn
import tensorflow as tf
#warnings not visible on the course webpage
pd.options.mode.chained_assignment = None
warnings.filterwarnings('ignore')



In [6]:
#setting random seeds so that the results are reproducible on the webpage

os.environ['PYTHONHASHSEED'] = '0'
os.environ['CUDA_VISIBLE_DEVICES'] = ''
np.random.seed(1)
rn.seed(1)
tf.random.set_seed(1)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

### Opening data
This task uses the StatsBomb Indian Super League 2021/2022 data, since it is the only openly available dataset that **contains both event and tracking data** for an entire season. We open each game and store the data for the whole season in the dataframes shot_df and track_df, converting StatsBomb coordinates (a 120 x 80 pitch) to meters (105 x 68) along the way. Finally, we keep only open-play shots and remove shots for which the opposing goalkeeper was not tracked.
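
The coordinate conversion described above amounts to a linear rescaling of each axis. A minimal sketch, assuming the 105 x 68 meter target pitch used in this notebook (the function name `to_meters` is hypothetical and not part of mplsoccer):

```python
def to_meters(x_sb, y_sb, length=105.0, width=68.0):
    """Rescale StatsBomb coordinates (120 x 80 pitch) to meters."""
    return x_sb * length / 120, y_sb * width / 80


centre = to_meters(60, 40)          # centre spot in StatsBomb coordinates
penalty_spot = to_meters(108, 40)   # penalty spot in StatsBomb coordinates
print(centre, penalty_spot)
```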

In [7]:
parser = Sbopen()
#get list of games during Indian Super League season
df_match = parser.match(competition_id=1238, season_id=108)
matches = df_match.match_id.unique()
matches[:10]

array([3827767, 3827335, 3827336, 3827338, 3827337, 3817856, 3817891,
       3817870, 3817899, 3817866])

In [8]:
file1 = Path('./shot_df_xGpos.csv')
file2 = Path('./track_df_xGpos.csv')
if file1.is_file() and file2.is_file():
    print('files are there')

files are there


In [9]:

file1 = Path('./shot_df_xGpos.csv')
file2 = Path('./track_df_xGpos.csv')
if file1.is_file() and file2.is_file():
    shot_df = pd.read_csv('shot_df_xGpos.csv')
    track_df = pd.read_csv('track_df_xGpos.csv')
else:
    shot_df = pd.DataFrame()
    track_df = pd.DataFrame()
    #store data in one dataframe
    for match in matches:
        shots = (parser.event(match)[0] # open events
                .query("type_name == 'Shot'") # query shots
                .assign(x = lambda df: df.x.apply(lambda cell: cell*105/120), # assign column updates
                        y = lambda df: df.y.apply(lambda cell: cell*68/80))
                )
        
        df_track = (parser.event(match)[2] # open 360 data
                    .assign(x = lambda df: df.x.apply(lambda cell: cell*105/120), # assign column updates
                            y = lambda df: df.y.apply(lambda cell: cell*68/80)))
        
        #append event and trackings to a dataframe
        shot_df = pd.concat([shot_df, shots], ignore_index = True)
        track_df = pd.concat([track_df, df_track], ignore_index = True)
    # reset indices
    shot_df.reset_index(drop=True, inplace=True)
    track_df.reset_index(drop=True, inplace=True)

    # filter out non open-play shots
    shot_df = shot_df.query('sub_type_name == "Open Play"')

    #filter out shots where goalkeeper was not tracked
    gks_tracked = track_df.query("teammate == False and position_name == 'Goalkeeper'")['id'].unique()
    shot_df = shot_df.loc[shot_df["id"].isin(gks_tracked)]

In [10]:
shot_df.sample(3)

Unnamed: 0.1,Unnamed: 0,id,index,period,timestamp,minute,second,possession,duration,match_id,...,shot_deflected,shot_open_goal,ball_recovery_offensive,pass_miscommunication,player_off_permanent,dribble_no_touch,foul_committed_penalty,foul_won_penalty,shot_follows_dribble,shot_redirect
2158,2331,9bbb8185-7062-43a0-8799-3152959cf926,896,1,00:22:10,22,10,54,1.158561,3813317,...,,,,,,,,,,
1395,1515,9f21f417-17ba-45c9-b7de-4be95d449909,3120,2,00:52:55,97,55,197,1.183715,3817876,...,,,,,,,,,,
1721,1867,7834afb5-e505-4028-9307-0778f636bb22,2383,2,00:27:05,72,5,154,0.243436,3813269,...,,,,,,,,,,


In [11]:
track_df.sample(3)

Unnamed: 0.1,Unnamed: 0,teammate,match_id,id,x,y,player_id,player_name,position_id,position_name,event_freeze_id
17854,17854,False,3817892,0821a975-2be4-4d8d-97ee-942d4bf2206e,87.7625,23.715,24860,Héctor Rodas Ramírez,3,Right Center Back,8
5948,5948,False,3817863,4763d971-6861-4e3d-a862-ae6953ebb7e5,95.2,23.8,124903,Akash Mishra,6,Left Back,7
40295,40295,False,3813268,058bf268-4f99-405d-85a9-a5a5eba780cf,87.0625,43.435,124756,Subhasish Bose,8,Left Wing Back,3


### Feature engineering
In this section we create the features that enrich the dataset and store them in the model_vars dataframe. The details are embedded in the code, so reading the code comments is the best way to follow this part of the tutorial.

In [16]:

model_vars = (shot_df[["id", "index", "x", "y", "outcome_name"]] # take important variables from shot dataframe
              .assign(goal = lambda df: np.where(df.outcome_name=='Goal',1,0), # dependent variable
                      goal_smf = lambda df: df.goal.astype(object), # object dtype for basic xG modelling
                      x0 = lambda df: df.x, # ball location (x)
                      x = lambda df: df.x.apply(lambda cell: 105-cell), # distance from the goal line
                      c = lambda df: df.y.apply(lambda cell: 34-cell), # signed offset from the centre of the goal
                      # angle between the ball and the goal posts, in radians;
                      # a negative arctan is shifted by pi so the angle stays in [0, pi]
                      angle = lambda df: np.where(np.arctan(7.32 * df.x/((df.x)**2 + (df.c)**2 - (7.32/2)**2)) >= 0, 
                                                  np.arctan(7.32 * df.x/((df.x)**2 + (df.c)**2 - (7.32/2)**2)),
                                                  np.arctan(7.32 * df.x/((df.x)**2 + (df.c)**2 - (7.32/2)**2)) + np.pi),
                      distance = lambda df: np.sqrt((df.x)**2 + (df.c)**2)) # distance between the ball and the goal
            )
# #take important variables from shot dataframe
# model_vars = shot_df[["id", "index", "x", "y"]]
# #get the dependent variable
# model_vars["goal"] = shot_df.outcome_name.apply(lambda cell: 1 if cell == "Goal" else 0)
# #change the dependent variable to object for basic xG modelling
# model_vars["goal_smf"] = model_vars["goal"].astype(object)
# # ball location (x)
# model_vars['x0'] = model_vars.x
# # x to calculate angle and distance
# model_vars["x"] = model_vars.x.apply(lambda cell: 105-cell)
# # c to calculate angle and distance between ball and the goal as in Lesson 2
# model_vars["c"] = model_vars.y.apply(lambda cell: abs(34-cell))
# #calculating angle and distance as in Lesson 2
# model_vars["angle"] = np.where(np.arctan(7.32 * model_vars["x"] / (model_vars["x"]**2 + model_vars["c"]**2 - (7.32/2)**2)) >= 0, np.arctan(7.32 * model_vars["x"] /(model_vars["x"]**2 + model_vars["c"]**2 - (7.32/2)**2)), np.arctan(7.32 * model_vars["x"] /(model_vars["x"]**2 + model_vars["c"]**2 - (7.32/2)**2)) + np.pi)*180/np.pi
# model_vars["distance"] = np.sqrt(model_vars["x"]**2 + model_vars["c"]**2)
model_vars.head(10)

Unnamed: 0,id,index,x,y,outcome_name,goal,goal_smf,x0,c,angle,distance
0,f4889f42-cf51-4c27-8c8d-d57040b17ce1,368,25.55,16.15,Saved,0,0,79.45,17.85,0.192795,31.167692
1,3e3c20ab-5233-479f-877e-73b94d01a963,495,7.175,33.405,Off T,0,0,97.825,0.595,0.938993,7.199628
2,f6fa4021-385f-4353-9e41-81a9ade1d2ac,731,22.8375,41.735,Off T,0,0,82.1625,-7.735,0.286239,24.111857
3,3c75ac72-8c43-4ca4-b6f4-2d38dc58c8c7,1069,26.075,15.98,Blocked,0,0,78.925,18.02,0.190229,31.695836
4,49c30b55-a2df-4b6e-84fa-101457ed2d08,1356,13.5625,47.515,Post,0,0,91.4375,-13.515,0.274009,19.146713
5,e6c21735-9791-43f8-ac6b-5e18782df505,1360,7.875,21.165,Wayward,0,0,97.125,12.835,0.26388,15.058315
6,7bd0743e-8b06-4292-b33f-1d5cd8bdfa26,1435,26.8625,43.18,Off T,0,0,78.1375,-9.18,0.243215,28.387784
7,0498726a-bc2c-4ff9-ae79-7a61a24820f0,1601,8.1375,28.22,Saved,0,0,96.8625,5.78,0.604506,9.981348
8,634a3e52-38d5-4804-8904-8005defb69fe,1760,27.3875,39.015,Saved,0,0,77.6125,-5.015,0.257318,27.842869
9,41c752ca-0121-4090-a8c1-685a10fa057e,2057,11.6375,41.225,Off T,0,0,93.3625,-7.225,0.454739,13.697884
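
The arctangent formula for the shot angle needs a correction of pi whenever its denominator is negative, which happens for shots taken very close to the goal. One way to avoid the explicit branch is a two-argument arctangent. The sketch below is an alternative formulation, not the code used in this tutorial; the names `shot_angle` and `shot_distance` are hypothetical. Here `x` is the distance from the goal line and `c` the lateral offset from the goal centre, both in meters, and the angle is returned in radians.

```python
import math

GOAL_WIDTH = 7.32  # meters


def shot_angle(x, c):
    """Angle (radians) subtended by the goal posts as seen from the ball."""
    # atan2 returns a value in [0, pi] when its first argument is non-negative,
    # so no separate handling of a negative denominator is needed
    return math.atan2(GOAL_WIDTH * x, x**2 + c**2 - (GOAL_WIDTH / 2) ** 2)


def shot_distance(x, c):
    """Euclidean distance (meters) between the ball and the goal centre."""
    return math.hypot(x, c)


# values taken from row 1 of the table above (x = 7.175, c = 0.595)
angle_row1 = shot_angle(7.175, 0.595)
distance_row1 = shot_distance(7.175, 0.595)
print(round(angle_row1, 4), round(distance_row1, 4))
```

For a shot one meter in front of the goal centre, `shot_angle(1, 0)` agrees with the geometric value `2 * math.atan(3.66)`, which is a case where the plain arctangent would return a negative number.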
