## Radar Plots
Radar plots are an excellent way to display a player's performance across a number of categories. They pack a lot of information into a small package. The main challenge is that they ask the reader to compare areas or arcs of a circle, which people tend to do poorly, and for that reason radars have often been criticized by data proponents in the community. That said, they still have a place in certain situations, and player profiles are one of those cases.

This lesson goes step-by-step through the process of making player radars for a striker. The following metrics are calculated directly by counting actions in the Wyscout event data:
- Non-penalty goals
- Assists
- Key passes
- Smart passes
- Aerial duels won
- Ground attacking duels won

In addition to these, the metrics calculated previously are included:
- Non-penalty expected goals
- Passes ending in the final third
- Receptions in the final third
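
As a rough illustration of what "counting actions" means here, the sketch below counts tagged events per player on a small made-up table. The helper name `count_tagged` and the toy rows are assumptions for illustration only. The tag id 101 is the goal tag used later in this lesson's xG function.

```python
import pandas as pd

def count_tagged(events, tag_id, column_name):
    """Count, per player, the events that carry the given Wyscout tag id."""
    # Each cell of "tags" is a list of dicts such as [{"id": 101}, {"id": 1801}]
    has_tag = events["tags"].apply(lambda tags: {"id": tag_id} in tags)
    return (events.loc[has_tag]
            .groupby("playerId")
            .size()
            .reset_index(name=column_name))

# Toy events (made up for illustration, not real Wyscout rows)
toy = pd.DataFrame({
    "playerId": [1, 1, 2, 2],
    "tags": [[{"id": 101}], [{"id": 1801}], [{"id": 101}, {"id": 1801}], [{"id": 101}]],
})

goals = count_tagged(toy, 101, "goals")
print(goals)
```

The same pattern (filter on a tag or sub-event name, group by `playerId`, count) is what the functions later in this lesson follow.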

### Imports
Required imports for the analysis.

In [1]:
import pandas as pd
import numpy as np
import json
# plotting
import matplotlib.pyplot as plt
# statistical fitting of models
import statsmodels.api as sm
import statsmodels.formula.api as smf
#opening data
import os
import pathlib
import warnings
#used for plots
from scipy import stats
from mplsoccer import PyPizza, FontManager

pd.options.mode.chained_assignment = None
warnings.filterwarnings('ignore')

### Load Data
This analysis will be leveraging the Wyscout data for the metrics creation.

In [2]:
#load data - store it in train dataframe
train = pd.DataFrame()
with open('./data/events_England.json') as f:
    data = json.load(f)
    train = pd.DataFrame(data)

In [3]:
#potential data collection error handling, doesn't create a difference that is visible
train2 = train.loc[train.apply (lambda x: len(x.positions) == 2, axis = 1)]
train2.head(2)

Unnamed: 0,eventId,subEventName,tags,playerId,positions,matchId,eventName,teamId,matchPeriod,eventSec,subEventId,id
0,8,Simple pass,[{'id': 1801}],25413,"[{'y': 49, 'x': 49}, {'y': 78, 'x': 31}]",2499719,Pass,1609,1H,2.758649,85,177959171
1,8,High pass,[{'id': 1801}],370224,"[{'y': 78, 'x': 31}, {'y': 75, 'x': 51}]",2499719,Pass,1609,1H,4.94685,83,177959172
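
To make the effect of the position filter concrete, here is a minimal sketch on made-up rows. It assumes, as in the cell above, that a complete event carries exactly two position dicts (start and end). The toy data is hypothetical.

```python
import pandas as pd

# Toy events: the second row is missing its end position
toy = pd.DataFrame({
    "eventName": ["Pass", "Pass", "Shot"],
    "positions": [
        [{"y": 49, "x": 49}, {"y": 78, "x": 31}],
        [{"y": 50, "x": 60}],
        [{"y": 40, "x": 90}, {"y": 0, "x": 0}],
    ],
})

# Keep only rows whose positions list has both a start and an end point
complete = toy.loc[toy["positions"].apply(len) == 2]
print(complete)
```

Rows without an end point would otherwise raise an `IndexError` in any code that reads `positions[1]`.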


In [4]:
# train['positions'][0][0]['x']
train.head(2)

Unnamed: 0,eventId,subEventName,tags,playerId,positions,matchId,eventName,teamId,matchPeriod,eventSec,subEventId,id
0,8,Simple pass,[{'id': 1801}],25413,"[{'y': 49, 'x': 49}, {'y': 78, 'x': 31}]",2499719,Pass,1609,1H,2.758649,85,177959171
1,8,High pass,[{'id': 1801}],370224,"[{'y': 78, 'x': 31}, {'y': 75, 'x': 51}]",2499719,Pass,1609,1H,4.94685,83,177959172


## Add xG Statistic
Expected goals is a key element of the overall view of striker performance, so it is added to the DataFrame. Two different xG models are fitted, one for headers and another for shots taken with the foot, and the xG value of each shot is then calculated. To get non-penalty xG, set the npxG argument of the function to True. The function calculates the cumulative xG for all players and returns a dataframe with one row per playerId and that sum.

This xG process uses the same method as in lesson 2 to calculate xG for each position.
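
The two features that feed the xG models are the distance to goal and the angle subtended by the goal mouth. The sketch below computes both for a single shot, using the same formulas as the function in the next cell (a 105 x 68 m pitch and a 7.32 m wide goal). The helper name `shot_geometry` is an assumption for illustration.

```python
import numpy as np

GOAL_WIDTH = 7.32  # metres, as used in the xG function below

def shot_geometry(x_pct, y_pct, pitch_length=105, pitch_width=68):
    """Distance (m) and goal-mouth angle (rad) for a shot given in Wyscout 0-100 coordinates."""
    X = (100 - x_pct) * pitch_length / 100   # metres from the goal line
    C = abs(y_pct - 50) * pitch_width / 100  # metres from the centre line of the pitch
    distance = np.sqrt(X**2 + C**2)
    angle = np.arctan(GOAL_WIDTH * X / (X**2 + C**2 - (GOAL_WIDTH / 2)**2))
    if angle < 0:
        # arctan returns a negative value for very close shots; shift into (0, pi)
        angle += np.pi
    return distance, angle

distance, angle = shot_geometry(90, 50)   # central shot, 10.5 m from goal
print(distance, angle)
```

For a central shot the angle equals twice the arctangent of half the goal width over the distance, which is a quick sanity check on the formula.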

In [5]:
def calulatexG(df, npxG):
    """
    Parameters
    ----------
    df : dataframe
        dataframe with Wyscout event data.
    npxG : boolean
        True if xG should not include penalties, False otherwise.

    Returns
    -------
    xG_sum: dataframe
        dataframe with sum of Expected Goals for players during the season.

    """
    # A very basic xG model based on an updated shots dataframe
    shots = (df.loc[df["eventName"] == "Shot"].copy()
             .assign(X = lambda df: df.positions.apply(lambda cell: (100 - cell[0]['x']) * 105/100),
                     Y = lambda df: df.positions.apply(lambda cell: cell[0]['y'] * 68/100),
                     C = lambda df: df.positions.apply(lambda cell: abs(cell[0]['y'] - 50) * 68/100),
                     Distance = lambda df: np.sqrt(df["X"]**2 + df["C"]**2),
                     Angle = lambda df: np.where(np.arctan(7.32 * df["X"] / (df["X"]**2 + df["C"]**2 - (7.32/2)**2)) > 0,
                                      np.arctan(7.32 * df["X"] /(df["X"]**2 + df["C"]**2 - (7.32/2)**2)), 
                                      np.arctan(7.32 * df["X"] /(df["X"]**2 + df["C"]**2 - (7.32/2)**2)) + np.pi),
                     Goal = lambda df: df.tags.apply(lambda x: 1 if {'id':101} in x else 0).astype(object)
                    )
            )
    # Split shots into headers (id = 403) and non headers
    headers = shots.loc[shots.apply (lambda x:{'id':403} in x.tags, axis = 1)]
    non_headers = shots.drop(headers.index)
    
    # Create a model for both types of shot
    headers_model = smf.glm(formula="Goal ~ Distance + Angle" , data=headers,
                               family=sm.families.Binomial()).fit()
    nonheaders_model = smf.glm(formula="Goal ~ Distance + Angle" , data=non_headers,
                               family=sm.families.Binomial()).fit()
    
    # Assigning xG to each, headers and non headers
    # params is a label-indexed Series, so use .iloc for positional access
    b_head = headers_model.params
    xG = 1/(1+np.exp(b_head.iloc[0]+b_head.iloc[1]*headers['Distance'] + b_head.iloc[2]*headers['Angle']))
    headers = headers.assign(xG = xG)

    b_nhead = nonheaders_model.params
    xG = 1/(1+np.exp(b_nhead.iloc[0]+b_nhead.iloc[1]*non_headers['Distance'] + b_nhead.iloc[2]*non_headers['Angle']))
    non_headers = non_headers.assign(xG = xG)

    # Account for penalties based on selection, then group by playerId
    if not npxG:
        # find pens
        penalties = df.loc[df["subEventName"] == "Penalty"]
        # assign 0.8
        penalties = penalties.assign(xG = 0.8)
        # concat all three(pens, heads and nonheads), group and sum only playerId and xG
        all_shots_xg = pd.concat([non_headers[["playerId", "xG"]], headers[["playerId", "xG"]], penalties[["playerId", "xG"]]])
        xG_sum = all_shots_xg.groupby(["playerId"])["xG"].sum().sort_values(ascending = False).reset_index()
    else:
        #concat (headers and non headers), group and sum only playerId and xG
        all_shots_xg = pd.concat([non_headers[["playerId", "xG"]], headers[["playerId", "xG"]]])
        all_shots_xg.rename(columns = {"xG": "npxG"}, inplace = True)
        xG_sum = all_shots_xg.groupby(["playerId"])["npxG"].sum().sort_values(ascending = False).reset_index()
    #group by player and sum

    return xG_sum

In [6]:
#call the function to get non-penalty xG per player
npxg = calulatexG(train, npxG = True)
#investigate structure
npxg.head(3)

Unnamed: 0,playerId,npxG
0,8717,22.01418
1,120353,17.215819
2,11066,14.144484


## Calculating passes ending in final third and receptions in final third
These two statistics capture how good a player is at receiving and passing the ball in the final third. They add context to passes: it isn’t enough for a striker to be a good passer of the ball; he or she should also be able to perform well in the final third.

For receptions, the basic idea is that the player who made the next action was the receiver. The data is filtered for successful passes that ended in the final third, which gives both the passer and the pass receiver. As in the previous step, the counts are summed by player and the two dataframes are merged into one. Note the use of an outer join to avoid losing a player who made no receptions in the final third but did make some passes.
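
The receiver logic can be sketched on a few made-up passes. The `accurate` column below is a stand-in for the Wyscout tag check and the rows are hypothetical. Note that in the function that follows, the shift is applied after filtering to pass events, so "next" there means the next pass in the data.

```python
import pandas as pd

# Toy passes in chronological order (made up for illustration)
toy_passes = pd.DataFrame({
    "playerId": [10, 11, 12, 10],
    "end_x": [80.0, 60.0, 95.0, 72.0],     # end location in metres on a 105 m pitch
    "accurate": [True, True, False, True],
})

# The player on the following row is treated as the receiver
toy_passes["nextPlayerId"] = toy_passes["playerId"].shift(-1)

# Accurate passes that end beyond two thirds of the pitch length
final_third = toy_passes.loc[toy_passes["accurate"] & (toy_passes["end_x"] > 2 * 105 / 3)]

passes_made = final_third.groupby("playerId").size()
receptions = final_third.groupby("nextPlayerId").size()
print(passes_made)
print(receptions)
```

The last row has no following event, so its `nextPlayerId` is `NaN`. That turns the column into floats and the row is dropped by `groupby`, which is why player ids appear as floats in the output of the real function.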

In [7]:
def FinalThird(df):
    """
    Parameters
    ----------
    df : dataframe
        dataframe with Wyscout event data.

    Returns
    -------
    final_third: dataframe
        dataframe with number of passes ending in final third and receptions in that area for a player.

    """
    passes = (df.loc[df["eventName"] == "Pass"].copy()
          .assign(nextPlayerId = lambda df: df.playerId.shift(-1),
                  x = lambda df: df.positions.apply(lambda cell: (cell[0]['x']) * 105/100),
                  y = lambda df: df.positions.apply(lambda cell: (100 - cell[0]['y']) * 68/100),
                  end_x = lambda df: df.positions.apply(lambda cell: (cell[1]['x']) * 105/100),
                  end_y = lambda df: df.positions.apply(lambda cell: (100 - cell[1]['y']) * 68/100)
                 )
         )

    # get accurate passes
    accurate_passes = passes.loc[passes.apply (lambda x:{'id':1801} in x.tags, axis = 1)]
    
    # get passes into final third
    final_third_passes = accurate_passes.loc[accurate_passes["end_x"] > 2*105/3]

    # passes into final third by player
    ftp_player = final_third_passes.groupby(["playerId"]).end_x.count().reset_index()
    ftp_player.rename(columns = {'end_x':'final_third_passes'}, inplace=True)

    # receptions of accurate passes in the final third
    rtp_player = final_third_passes.groupby(["nextPlayerId"]).end_x.count().reset_index()
    rtp_player.rename(columns = {'end_x':'final_third_receptions', "nextPlayerId": "playerId"}, inplace=True)

    # outer join not to lose values
    final_third = ftp_player.merge(rtp_player, how = "outer", on = ["playerId"])
    return final_third

In [8]:
final_third = FinalThird(train)
#investigate structure
final_third.head(3)

Unnamed: 0,playerId,final_third_passes,final_third_receptions
0,36.0,186.0,99.0
1,38.0,62.0,65.0
2,48.0,392.0,232.0
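
An outer join leaves `NaN` wherever a player appears in only one of the two tables, and that also turns the count columns into floats. One option, shown here as a sketch on made-up numbers rather than as part of the lesson's pipeline, is to fill the gaps with zero and cast the counts back to integers.

```python
import pandas as pd

# Toy per-player counts (made up for illustration)
passes = pd.DataFrame({"playerId": [1, 2], "final_third_passes": [5, 3]})
receptions = pd.DataFrame({"playerId": [2, 3], "final_third_receptions": [4, 7]})

# Outer join keeps players that appear in either table
merged = passes.merge(receptions, how="outer", on="playerId")

# A missing count means zero actions of that type
cleaned = (merged.fillna(0)
           .astype({"final_third_passes": int, "final_third_receptions": int}))
print(cleaned)
```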


## Calculating air and ground duels won
When adding the number of duels won, there is a need to differentiate between air duels and attacking ground duels, many of which will be dribbles. The definition of a Wyscout duel can be found in the API documentation. For both air duels and attacking ground duels the same steps are repeated: filter the won duels, count them by player, and outer join the two dataframes.
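
As an alternative to two separate counts and an outer join, the same table could be built in one pass by grouping on player and duel type and then pivoting. This is a sketch on made-up rows, not the approach used in the function below. The tag id 703 (won duel) is taken from that function.

```python
import pandas as pd

# Toy duel events (made up for illustration)
toy = pd.DataFrame({
    "playerId": [1, 1, 2, 2, 3],
    "subEventName": ["Air duel", "Ground attacking duel", "Air duel",
                     "Air duel", "Ground defending duel"],
    "tags": [[{"id": 703}], [{"id": 703}], [{"id": 703}], [], [{"id": 703}]],
})

# Keep won duels of the two types of interest
won = toy.loc[toy["tags"].apply(lambda tags: {"id": 703} in tags)]
won = won.loc[won["subEventName"].isin(["Air duel", "Ground attacking duel"])]

# Count per player and duel type, then spread the duel types into columns
duels_won = (won.groupby(["playerId", "subEventName"]).size()
             .unstack(fill_value=0)
             .rename(columns={"Air duel": "air_duels_won",
                              "Ground attacking duel": "ground_duels_won"})
             .reset_index())
print(duels_won)
```

Unlike the outer join, this version fills missing combinations with 0 rather than `NaN`. Players with no won duels of either type are absent in both versions.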

In [9]:
def wonDuels(df):
    """
    Parameters
    ----------
    df : dataframe
        dataframe with Wyscout event data.

    Returns
    -------
    duels_won: dataframe
        dataframe with number of won air and ground duels for a player

    """
    #find air duels
    air_duels = df.loc[df["subEventName"] == "Air duel"]
    #703 is the id of a won duel
    won_air_duels = air_duels.loc[air_duels.apply (lambda x:{'id':703} in x.tags, axis = 1)]

    #group and sum air duels
    wad_player =  won_air_duels.groupby(["playerId"]).eventId.count().reset_index()
    wad_player.rename(columns = {'eventId':'air_duels_won'}, inplace=True)

    #find ground duels won
    ground_duels = df.loc[df["subEventName"].isin(["Ground attacking duel"])]
    won_ground_duels = ground_duels.loc[ground_duels.apply (lambda x:{'id':703} in x.tags, axis = 1)]

    wgd_player =  won_ground_duels.groupby(["playerId"]).eventId.count().reset_index()
    wgd_player.rename(columns = {'eventId':'ground_duels_won'}, inplace=True)

    #outer join
    duels_won = wgd_player.merge(wad_player, how = "outer", on = ["playerId"])
    return duels_won

In [10]:
duels = wonDuels(train)
#investigate structure
duels.head(3)

Unnamed: 0,playerId,ground_duels_won,air_duels_won
0,0,2244.0,1061.0
1,36,13.0,23.0
2,38,7.0,11.0
