# Pass Reception and Transition

One of the key skills of a midfielder is the ability to drop back into her own half to receive passes and then move play into the attacking zones. This exploratory analysis studies that skill: we are going to find such ball-progressing midfielders from the FA Women’s Super League 2020-21 season and analyse them. For this analysis, we are going to use event data provided by StatsBomb and 'minutes played' data provided by FBref.

Before heading into the analysis, here are some important details related to the data and some rules which we are going to use to model such ball progressing sequences:

- We are going to use a pitch configuration where the pitch is divided into 18 zones, with 6 zones in each third of the pitch. For reference, here's an image of the same pitch configuration by [The Coaches' Voice](https://www.coachesvoice.com/):

    <img src="images/pitch_zones.jpg" alt="Football Pitch 18 Zones" width="500" height="auto">

<br>

- For a pass reception to be considered in this analysis, it must satisfy the following conditions:
    - The pass should be successfully received by a midfielder in zones 4-9.
    - The pass must have started in zones 1-9, i.e. the defensive half.

<br>

- A transition following a pass reception is considered successful if it satisfies the following conditions:
    - The transition must end in the final/attacking third, i.e. zones 13-18.
    - The transition can involve at most 2 players (the first being the player who receives the pass). The idea behind this is:<br>
    > A midfielder, after receiving the pass in her own half, will want to move the ball into the attacking half. She can carry the ball, pass it, or do both to get it into the attacking third on her own. Alternatively, she can pass to a teammate who is open and has the space and time to move the ball into the attacking third.
    >
    > In the second case, the first player saw an opportunity for progressing the ball through the second player and hence made the decision to pass the ball. Therefore, I want to give some credit to the first player because she made a decision depending on which of her teammates is open and has a higher probability of moving the ball forward. But if we add a third player, then the first player might not play a big role in the third player's decision making - the uncertainty will increase here. In other words, the first player is more certain of the second player's available choices and possible outcomes than she is for the third player.

    - The transition can have a maximum of 4 events. Here's the reason why:<br>
    > There are 2 major event types in any transition: a pass and a carry. Based on the numerous sample clips I watched, the most common transitions (into the attacking third) involving 2 players can be completed in 4 events. For example, player #1 carries (event count = 1) and passes (event count = 2) the ball to player #2, who then carries (event count = 3) and passes (event count = 4) the ball into the attacking third. Other examples might involve fewer events.

<br>

- Here are the rules which are used to count events in our analysis:
    - A successful pass of any length is counted as an event and the final location of the pass is recorded (to check if the ball has reached the attacking third).
    - A successful carry is counted as an event only when the carry distance is more than 5 metres (or 5.46807 yards - since StatsBomb coordinate data is in yards). The carry end location is also recorded. 
    - If a successful carry's length is less than or equal to 5 metres then it is not counted as an event but its end location is recorded.
    - Events like dispossession, miscontrol, block, interception, clearance, unsuccessful pass, unsuccessful dribble, shot, foul will break the transition.
    - If there's a successful dribble between two successful carries, then both the carries will be combined for 'event count' and 'ball location' calculation.
    - All other events like duels and ball recoveries are ignored since those are covered by one of the above events/rules related to them.

<br>
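The carry rules above can be sketched in a few lines. This is a minimal illustration rather than the notebook's own code: the names `CARRY_THRESHOLD_YD` and `carry_counts_as_event` are hypothetical, and a pair of carries around a successful dribble is assumed to be measured as a straight line from the start of the first carry to the end of the second.

```python
import math

# 1 yard is defined as 0.9144 metres, so 5 metres is about 5.46807 yards.
CARRY_THRESHOLD_YD = 5 / 0.9144


def carry_counts_as_event(start, end, threshold=CARRY_THRESHOLD_YD):
    """Return True when a carry covers strictly more than the threshold distance."""
    return math.dist(start, end) > threshold


# Coordinates are (x, y) pairs in yards on the 120-yard-long pitch.
short_carry = carry_counts_as_event((30, 40), (34, 43))  # 5 yards long
long_carry = carry_counts_as_event((30, 40), (36, 40))   # 6 yards long

# Two short carries around a successful dribble (assumption: measured
# from the first carry's start to the second carry's end).
first_leg = carry_counts_as_event((30, 40), (33, 40))
second_leg = carry_counts_as_event((33, 40), (37, 40))
combined = carry_counts_as_event((30, 40), (37, 40))
```

Here neither leg of the dribble sequence counts on its own, but the combined carry does, which is the case the dribble rule is meant to capture.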

- StatsBomb's pitch configuration is 120 yards long, i.e. the x-coordinate is from 0-120. Here are the x-coordinates for different areas of the pitch discussed above:
    - The Ball Reception Area (Zones 4-9): 20 < x < 60
    - The Pass Start Area (Defensive Half / Zones 1-9): x < 60
    - The Attacking/Final Third (Zones 13-18): x >= 80

<br>
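As a quick check of the x-coordinate bands listed above, the three areas can be written as small predicates. The function names are hypothetical. `zone_row` additionally assumes that each row of three zones spans 20 yards of pitch length, which is inferred from the bands above rather than taken from StatsBomb.

```python
def in_reception_area(x):
    """Zones 4-9: strictly between x = 20 and x = 60."""
    return 20 < x < 60


def pass_started_in_defensive_half(x):
    """Zones 1-9: x below the halfway line at 60."""
    return x < 60


def in_final_third(x):
    """Zones 13-18: x at or beyond 80."""
    return x >= 80


def zone_row(x):
    """Row of three zones (1 to 6) containing x, assuming 20-yard rows.

    Row 1 holds zones 1-3, row 2 holds zones 4-6, and so on.
    """
    return min(int(x // 20) + 1, 6)


reception_row = zone_row(50)  # row 3, i.e. zones 7-9
```

Note that the reception band excludes its boundaries (x = 20 and x = 60), while the final third includes x = 80.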

Hereafter, all mentions of pass/ball receptions and successful/unsuccessful transitions will refer to the above definitions.

With the conditions and rules defined, let's dive into the code and analyse our findings!

In [1]:
# IMPORTING LIBRARIES AND PACKAGES

from statsbombpy import sb
import math
import pandas as pd
from pandasql import sqldf  # for running SQL on dataframes
import requests
from io import StringIO
import matplotlib.pyplot as plt
import numpy as np

In [2]:
# READING IN STATSBOMB EVENT DATA 

columns = [
    'id',
    'match_id',
    'index',
    'minute',
    'second',
    'possession',
    'possession_team_id',
    'possession_team',
    'type',
    'player_id',
    'player',
    'position',
    'team_id',
    'team',
    'location',
    'pass_end_location',
    'pass_outcome',
    'ball_receipt_outcome',
    'pass_recipient_id',
    'pass_recipient',
    'carry_end_location',
    'dribble_outcome',
    'foul_won_advantage',
    'related_events',
    'under_pressure'
]

events = sb.competition_events(
    country='England',
    division= "FA Women's Super League",
    season='2020/2021',
    gender='female'
)[columns]

events.head()  # dataframe containing the entire season's event data



Unnamed: 0,id,match_id,index,minute,second,possession,possession_team_id,possession_team,type,player_id,...,pass_end_location,pass_outcome,ball_receipt_outcome,pass_recipient_id,pass_recipient,carry_end_location,dribble_outcome,foul_won_advantage,related_events,under_pressure
0,f51b1630-d1a8-4837-97b0-de862f0e299a,3775648,1,0,0,1,2647,Aston Villa,Starting XI,,...,,,,,,,,,,
1,1bbead8d-7790-4898-a765-3cdffeaf966e,3775648,2,0,0,1,2647,Aston Villa,Starting XI,,...,,,,,,,,,,
2,caa9cf6d-dd46-4bdc-ba4c-f6fc82cb9fa9,3775609,1,0,0,1,968,Arsenal WFC,Starting XI,,...,,,,,,,,,,
3,f85b9236-3d19-476a-8304-57b74b6416b3,3775609,2,0,0,1,968,Arsenal WFC,Starting XI,,...,,,,,,,,,,
4,b7f68694-0261-4929-ad96-907e62ec630c,3775633,1,0,0,1,2647,Aston Villa,Starting XI,,...,,,,,,,,,,


In [3]:
# READING IN STATSBOMB MATCH DATA

matches = sb.matches(competition_id=37, season_id=90)[
    [
        'match_id', 'competition', 'season', 
        'match_date', 'home_team', 'home_score', 
        'away_score', 'away_team'
    ]
]

matches



Unnamed: 0,match_id,competition,season,match_date,home_team,home_score,away_score,away_team
0,3775648,England - FA Women's Super League,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC
1,3775609,England - FA Women's Super League,2020/2021,2021-04-28,Arsenal WFC,2,0,West Ham United LFC
2,3775633,England - FA Women's Super League,2020/2021,2021-02-06,Aston Villa,1,0,Tottenham Hotspur Women
3,3775570,England - FA Women's Super League,2020/2021,2021-03-28,Brighton & Hove Albion WFC,0,5,Everton LFC
4,3775581,England - FA Women's Super League,2020/2021,2021-03-28,Chelsea FCW,2,0,Aston Villa
...,...,...,...,...,...,...,...,...
126,3775608,England - FA Women's Super League,2020/2021,2021-01-17,West Ham United LFC,0,1,Tottenham Hotspur Women
127,3775599,England - FA Women's Super League,2020/2021,2021-04-20,West Ham United LFC,0,0,Aston Villa
128,3775554,England - FA Women's Super League,2020/2021,2020-11-14,Everton LFC,1,1,Reading WFC
129,3775652,England - FA Women's Super League,2020/2021,2021-02-07,Chelsea FCW,1,2,Brighton & Hove Albion WFC


In [4]:
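# function to find if a pass started in defensive half for a given ball reception event
def pass_start_location(related_event_ids):
    # ids of the related events that are passes (the pass that led to the reception)
    pass_ids = [
        event_id for event_id in related_event_ids
        if events[events.id == event_id].type.item() == 'Pass'
    ]
    # x-coordinate of the pass origin; x < 60 means the defensive half
    return events[events.id == pass_ids[0]].location.item()[0] < 60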
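The function above filters the full `events` dataframe once per related id, which can be slow over a whole season. One possible alternative, sketched here on plain dictionaries, is to build a lookup from pass id to starting x-coordinate once and reuse it. The helper names are hypothetical, the toy records stand in for rows of `events`, and unlike the original this version returns `False` when no related pass is found instead of raising an error.

```python
def build_pass_start_x(event_records):
    """Map each pass event id to the x-coordinate where the pass started."""
    return {
        ev["id"]: ev["location"][0]
        for ev in event_records
        if ev["type"] == "Pass"
    }


def started_in_defensive_half(related_ids, pass_start_x, halfway_x=60):
    """Check the first related event that is a pass; False if there is none."""
    for event_id in related_ids:
        if event_id in pass_start_x:
            return pass_start_x[event_id] < halfway_x
    return False


# Toy records with made-up ids and coordinates.
toy_events = [
    {"id": "p1", "type": "Pass", "location": [35.0, 40.0]},
    {"id": "p2", "type": "Pass", "location": [72.5, 20.0]},
    {"id": "c1", "type": "Carry", "location": [50.0, 30.0]},
]
lookup = build_pass_start_x(toy_events)
from_own_half = started_in_defensive_half(["c1", "p1"], lookup)
from_opposition_half = started_in_defensive_half(["p2"], lookup)
```

With the real data, the records could come from something like `events.to_dict('records')`, although that has not been checked against the full dataset here.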

In [5]:
ball_receipts = events[
    (events.type == 'Ball Receipt*') \
    & (events.ball_receipt_outcome.isna()) \
    & (events.position.str.contains('Midfield'))
] # selecting event data for successful ball receptions by midfielders

ball_receipts = ball_receipts[
    ball_receipts.location.apply(lambda x: 20 < x[0] < 60)
] # checking if ball reception happened in zones 4-9

ball_receipts = ball_receipts[
    ball_receipts.related_events.apply(pass_start_location)
] # checking if the associated pass started in the defensive half


ball_receipts = pd.merge(
    left = ball_receipts, 
    right = matches, 
    how = 'left', 
    on = 'match_id'
) # joining ball reception and match data (to have match details for video verification)

'''
dataframe containing pass/ball reception event data (based on our definition 
of ball reception events discussed in the beginning of the analysis)
'''
ball_receipts.head() 

Unnamed: 0,id,match_id,index,minute,second,possession,possession_team_id,possession_team,type,player_id,...,foul_won_advantage,related_events,under_pressure,competition,season,match_date,home_team,home_score,away_score,away_team
0,0e9e4a0c-46ea-4aea-9c62-b4883bf1d993,3775648,9,0,1,2,968,Arsenal WFC,Ball Receipt*,10405.0,...,,[4467e274-205f-41be-ab7e-62ece2697dbc],,England - FA Women's Super League,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC
1,eab2985b-c33f-40fd-9a8a-02bc356a0a8e,3775648,15,0,6,2,968,Arsenal WFC,Ball Receipt*,10405.0,...,,[51af8ad7-5652-4717-9547-796fd8212fcf],,England - FA Women's Super League,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC
2,fe6cf834-bb08-4497-85dc-29f66ac5a744,3775648,46,0,31,3,2647,Aston Villa,Ball Receipt*,46539.0,...,,[5cd79af7-7929-4b2b-a3a7-9d9f839db461],,England - FA Women's Super League,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC
3,be59c93a-1476-4512-b90f-b02435b31ba0,3775648,57,0,40,4,968,Arsenal WFC,Ball Receipt*,10650.0,...,,[6ebcfd9e-39ed-490b-a120-c8a92c1a1c44],,England - FA Women's Super League,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC
4,94ab09b3-ae11-4630-8a70-35c5116348d0,3775648,61,0,44,4,968,Arsenal WFC,Ball Receipt*,10658.0,...,,[9674c1be-0675-499b-9328-82ec1f5fcc39],,England - FA Women's Super League,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC


In [7]:
data = dict()

'''
For every ball reception event (outer for loop), we'll now check if the 
subsequent events (inner for loop) in the same possession lead to a 
successful transition or not (based on the definition of transitions 
discussed in the beginning of this analysis).

For learning more about sequences and possessions, you can read this
article: 
https://www.statsperform.com/resource/introducing-a-possessions-framework/
'''
# outer for loop
for receipt in ball_receipts.itertuples():
    event_dict = dict()
    event_count = 0 # to count number of events
    final_location_x = receipt.location[0] # to track the ball's latest location
    players = set() # to keep a record of unique players involved
    dribble = False
    last_carry_start_location = receipt.location
    last_carry_distance = 0

    # dataframe with events following the ball reception in the same possession
    possession_events = events[
        (events.match_id == receipt.match_id) \
        & (events.possession == receipt.possession) \
        & (events['index'] > receipt.index)
    ].sort_values('index') 

    # inner for loop
    for event in possession_events.itertuples():

        # break if the attacking third is reached, 4 events have been counted,
        # or a second player has already been added to `players`
        if final_location_x >= 80 or event_count == 4 or len(players) == 2:
            break
        
        else:
            # for a successful pass
            if event.type == 'Pass' and pd.isna(event.pass_outcome):
                event_count += 1
                final_location_x = event.pass_end_location[0]
                players.add(event.player_id)

            # for a carry which doesn't follow a successful dribble: 
            elif dribble == False and event.type == 'Carry':
                if math.dist(event.location, event.carry_end_location) > 5.46807:
                    event_count += 1
                final_location_x = event.carry_end_location[0]
                players.add(event.player_id)

                # the following 2 variables will help combine 2 carries sandwiching a successful dribble
                last_carry_start_location = event.location
                last_carry_distance = math.dist(event.location, event.carry_end_location)

            # for defensive actions, stoppages in play or shots which break the transition
            elif event.type in ['Dispossessed', 'Miscontrol', 'Interception', 'Clearance', 'Block', 'Shot'] \
            or (event.type == 'Pass' and pd.notna(event.pass_outcome)) \
            or (event.type == 'Dribble' and event.dribble_outcome == 'Incomplete') \
            or (event.type == 'Foul Won' and pd.isna(event.foul_won_advantage)):
                break

            # for a successful dribble
            elif event.type == 'Dribble' and event.dribble_outcome == 'Complete':
                dribble = True

            # for a carry following a successful dribble:
            elif dribble == True and event.type == 'Carry':

                # both the carries are combined and considered to be a single carry
                final_location_x = event.carry_end_location[0]
                dribble = False
                # distance from the start of the carry before the dribble to the end of this carry
                combined_carry_distance = math.dist(last_carry_start_location, event.carry_end_location)

                # if the carry before the dribble was less than or equal to 5 metres long
                # and if both the carries combined are more than 5 metres long, then the
                # event_count variable is increased by 1
                if last_carry_distance <= 5.46807 \
                and combined_carry_distance > 5.46807:
                    event_count += 1

                # update only after the check so the pre-dribble carry length is still available above
                last_carry_distance = combined_carry_distance

            # for all other events
            else:
                continue
            
    event_dict['successful_transition'] = final_location_x >= 80
    event_dict['transition_final_location_x'] = final_location_x
    event_dict['transition_event_count'] = event_count 
    event_dict['transition_players_involved'] = len(players)
    data[receipt.id] = event_dict
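To make the transition rules easier to test in isolation, here is a simplified sketch that applies them to toy events. It is not the loop above: it works on hypothetical dictionary records with `start` and `end` coordinates, uses a made-up `failed` flag for unsuccessful actions, leaves out the dribble handling, and stops before a third player acts, as the written rules describe.

```python
import math

CARRY_THRESHOLD_YD = 5 / 0.9144  # 5 metres in yards, about 5.46807
FINAL_THIRD_X = 80               # zones 13-18 start at x = 80
BREAKING_TYPES = {"Dispossessed", "Miscontrol", "Interception",
                  "Clearance", "Block", "Shot"}


def walk_transition(start_x, events, max_events=4, max_players=2):
    """Apply the transition rules to a list of toy event dictionaries."""
    x, count, players = start_x, 0, set()
    for ev in events:
        # stop once the final third is reached or the event limit is hit
        if x >= FINAL_THIRD_X or count == max_events:
            break
        # defensive actions, shots and failed actions end the transition
        if ev["type"] in BREAKING_TYPES or ev.get("failed", False):
            break
        if ev["type"] not in ("Pass", "Carry"):
            continue  # other events, such as receptions and duels, are ignored
        if ev["player_id"] not in players and len(players) == max_players:
            break  # a third player would be needed
        players.add(ev["player_id"])
        # every pass counts; a carry counts only beyond the 5-metre threshold
        if ev["type"] == "Pass" or math.dist(ev["start"], ev["end"]) > CARRY_THRESHOLD_YD:
            count += 1
        x = ev["end"][0]  # the ball location is updated even for short carries
    return {"successful": x >= FINAL_THIRD_X, "final_x": x,
            "event_count": count, "players": len(players)}


two_player_move = [
    {"type": "Carry", "player_id": 1, "start": (45, 40), "end": (55, 40)},
    {"type": "Pass", "player_id": 1, "start": (55, 40), "end": (65, 30)},
    {"type": "Carry", "player_id": 2, "start": (65, 30), "end": (75, 30)},
    {"type": "Pass", "player_id": 2, "start": (75, 30), "end": (85, 30)},
]
result = walk_transition(45, two_player_move)
```

The toy sequence mirrors the four-event example from the rules: two players, each carrying and then passing, with the last pass ending in the attacking third.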

In [8]:
ball_receipts = pd.merge(
    left=ball_receipts,
    right=pd.DataFrame.from_dict(
        data=data, 
        orient='index'
    ).reset_index().rename(columns={'index': 'id'}),
    how='left',
    on='id'
) # joining ball_receipts dataframe with corresponding transition data

ball_receipts.head()

Unnamed: 0,id,match_id,index,minute,second,possession,possession_team_id,possession_team,type,player_id,...,season,match_date,home_team,home_score,away_score,away_team,successful_transition,transition_final_location_x,transition_event_count,transition_players_involved
0,0e9e4a0c-46ea-4aea-9c62-b4883bf1d993,3775648,9,0,1,2,968,Arsenal WFC,Ball Receipt*,10405.0,...,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC,False,40.3,1,2
1,eab2985b-c33f-40fd-9a8a-02bc356a0a8e,3775648,15,0,6,2,968,Arsenal WFC,Ball Receipt*,10405.0,...,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC,False,63.4,2,2
2,fe6cf834-bb08-4497-85dc-29f66ac5a744,3775648,46,0,31,3,2647,Aston Villa,Ball Receipt*,46539.0,...,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC,False,72.1,2,2
3,be59c93a-1476-4512-b90f-b02435b31ba0,3775648,57,0,40,4,968,Arsenal WFC,Ball Receipt*,10650.0,...,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC,False,43.9,2,2
4,94ab09b3-ae11-4630-8a70-35c5116348d0,3775648,61,0,44,4,968,Arsenal WFC,Ball Receipt*,10658.0,...,2020/2021,2021-02-28,Aston Villa,0,4,Arsenal WFC,True,80.6,2,2


In [9]:
# READING IN 'MINUTES PLAYED' DATA PROVIDED BY FBREF 
source = requests.get('https://fbref.com/en/comps/189/2020-2021/playingtime/2020-2021-Womens-Super-League-Stats')

minutes_played = pd.read_html(
    io=StringIO(source.text.replace('<!--','').replace('-->',''))
)[2].droplevel(level=0, axis=1)[['Player', 'Pos', 'Min']]

minutes_played.head()

Unnamed: 0,Player,Pos,Min
0,Angela Addison,FW,1270
1,Asmita Ale,DF,1457
2,Flo Allen,DF,1086
3,Jonna Andersson,DF,1563
4,Mackenzie Arnold,GK,1440


In [10]:
# grouping by player since there can be multiple rows for the same player changing clubs mid-season
# using SQL since simple aggregations are easier to write in SQL
minutes_played = sqldf(
    '''
    SELECT 
        player,
        SUM(CAST(COALESCE(min, 0) AS INT)) AS minutes_played
    FROM minutes_played
    GROUP BY 1
    '''
)

minutes_played.head()

Unnamed: 0,Player,minutes_played
0,Abbey-Leigh Stringer,695
1,Abbi Grant,38
2,Abbie Cowie,0
3,Abbie McManus,1305
4,Abby Dahlkemper,645
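Since pandasql runs its queries through SQLite, the aggregation above can be checked on a few toy rows with the standard-library `sqlite3` module. The player names and minutes below are made up; they only illustrate how duplicate rows are summed and how missing minutes become 0.

```python
import sqlite3

# Toy rows: one player listed twice (e.g. after a mid-season move) and
# one player with no recorded minutes.
rows = [("Player A", "900"), ("Player A", "450"), ("Player B", None)]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE minutes_played (Player TEXT, "Min" TEXT)')
conn.executemany("INSERT INTO minutes_played VALUES (?, ?)", rows)

# Same aggregation as in the cell above, with the column name quoted.
totals = dict(conn.execute(
    '''
    SELECT
        Player,
        SUM(CAST(COALESCE("Min", 0) AS INT)) AS minutes_played
    FROM minutes_played
    GROUP BY 1
    '''
))
conn.close()
```

`COALESCE` replaces a missing value with 0 before the cast, so a player with no minutes still gets a row instead of a null total.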


In [11]:
cte = ball_receipts[
    [
        'id', 'player_id', 'player', 'position', 
        'team_id', 'team', 'under_pressure', 
        'successful_transition', 'transition_players_involved'
    ]
]

# creating dataframe with player summary data for receptions and transitions
results = sqldf(
    '''
    SELECT 
        CAST(player_id AS INT) AS player_id,
        player,
        COUNT(id) AS receptions,
        COUNT(CASE WHEN successful_transition = TRUE THEN id END) AS transitions,
        COUNT(CASE WHEN successful_transition = TRUE AND transition_players_involved = 1 THEN id END) AS solo_transitions
    FROM cte
    GROUP BY 1, 2
    '''
)

results.head()

Unnamed: 0,player_id,player,receptions,transitions,solo_transitions
0,4638,Drew Spence,41,5,2
1,4641,Francesca Kirby,13,1,1
2,4643,Georgia Stanway,73,2,2
3,4645,Isobel Mary Christiansen,392,23,12
4,4646,Claire Emslie,15,4,3


In [12]:
# adding 'minutes played' data to results dataframe
results = sqldf(
    '''
    SELECT 
        r.*,
        mp.minutes_played
    FROM 
        results r
        LEFT JOIN
        minutes_played mp
        ON r.player = mp.player
    '''
)

results.head()

Unnamed: 0,player_id,player,receptions,transitions,solo_transitions,minutes_played
0,4638,Drew Spence,41,5,2,249.0
1,4641,Francesca Kirby,13,1,1,
2,4643,Georgia Stanway,73,2,2,1489.0
3,4645,Isobel Mary Christiansen,392,23,12,
4,4646,Claire Emslie,15,4,3,1289.0
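The empty `minutes_played` values in the output above mean that the join on player name found no match for those rows, presumably because the two sources spell some names differently. A small anti-join can list such players so they can be reviewed. This sketch uses the standard-library `sqlite3` module with made-up names and numbers; the table and column names mirror the dataframes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (player TEXT, receptions INT)")
conn.execute("CREATE TABLE minutes_played (player TEXT, minutes_played INT)")

# Toy data: the first player is spelled differently in the two tables.
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("Full Name One", 13), ("Player Two", 73)],
)
conn.executemany(
    "INSERT INTO minutes_played VALUES (?, ?)",
    [("Short Name One", 1200), ("Player Two", 1489)],
)

# Keep only the players from results that have no row in minutes_played.
unmatched = [
    row[0]
    for row in conn.execute(
        '''
        SELECT r.player
        FROM results r
        LEFT JOIN minutes_played mp ON r.player = mp.player
        WHERE mp.player IS NULL
        ORDER BY r.player
        '''
    )
]
conn.close()
```

A manual name mapping for the unmatched players is one way to fill these gaps before computing any per-minute figures.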
