# Passing Heat Maps
This notebook visualizes and measures where dangerous passes originate across a set of games. A danger pass is defined here as a pass made within 15 seconds before a shot, so the resulting maps show where a team starts the passing sequences that end in shots or goals.

## Imports
This version deviates from the Soccermatics approach because the mplsoccer library threw an error when requesting data from StatsBomb. Instead, the event data is loaded directly into a dataframe with the statsbombpy library.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from mplsoccer import Pitch, Sbopen, VerticalPitch
from statsbombpy import sb
import seaborn as sns

## Opening and creating the dataset
The event data is pulled with the sb module from statsbombpy in place of Sbopen from mplsoccer. Multiple games are used to give a fuller view of the team's performance. After extracting the match identifiers (match_id), each match is queried for its event-level data, which is stored in a dataframe df and evaluated in aggregate.

In [2]:
# get matches from competition
match_ids = (sb.matches(competition_id=72, season_id=30)
            .query('home_team.str.startswith("Eng") or away_team.str.startswith("Eng")')
            .match_id
            .tolist()
           )

no_games = len(match_ids)
print(len(match_ids), match_ids)

7 [69301, 22936, 68337, 22962, 68362, 69199, 69258]
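
The same team filter can also be written as a boolean mask, which some readers find easier to debug than a query string. The sketch below runs on a small made-up table rather than on StatsBomb data; the helper name `team_match_ids` and the toy ids are hypothetical.

```python
import pandas as pd

# Toy stand-in for the frame returned by sb.matches(...); all values are made up.
matches = pd.DataFrame({
    "match_id": [1, 2, 3, 4],
    "home_team": ["England Women's", "Japan Women's", "Norway Women's", "Sweden Women's"],
    "away_team": ["Scotland Women's", "England Women's", "France Women's", "Chile Women's"],
})

def team_match_ids(matches, prefix):
    """Return ids of matches where either side's name starts with `prefix`."""
    involved = (matches["home_team"].str.startswith(prefix)
                | matches["away_team"].str.startswith(prefix))
    return matches.loc[involved, "match_id"].tolist()

toy_ids = team_match_ids(matches, "Eng")
print(len(toy_ids), toy_ids)
```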


In [11]:
sb.events(69301).columns

Index(['bad_behaviour_card', 'ball_receipt_outcome',
       'ball_recovery_recovery_failure', 'block_deflection',
       'block_save_block', 'carry_end_location', 'clearance_aerial_won',
       'clearance_body_part', 'clearance_head', 'clearance_left_foot',
       'clearance_other', 'clearance_right_foot', 'counterpress',
       'dribble_outcome', 'dribble_overrun', 'duel_outcome', 'duel_type',
       'duration', 'foul_committed_advantage', 'foul_committed_card',
       'foul_committed_offensive', 'foul_committed_type', 'foul_won_advantage',
       'foul_won_defensive', 'goalkeeper_body_part', 'goalkeeper_end_location',
       'goalkeeper_outcome', 'goalkeeper_position',
       'goalkeeper_shot_saved_to_post', 'goalkeeper_technique',
       'goalkeeper_type', 'id', 'index', 'interception_outcome', 'location',
       'match_id', 'minute', 'miscontrol_aerial_won', 'off_camera', 'out',
       'pass_aerial_won', 'pass_angle', 'pass_assisted_shot_id',
       'pass_body_part', 'pass_cross', 
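
The location column in this listing holds [x, y] lists, so the coordinates have to be split into separate columns before plotting. A minimal sketch of one way to do that on made-up rows, assuming every row holds a two-element list (which should hold once the events are filtered to passes); the helper name `split_xy` is hypothetical.

```python
import pandas as pd

# Made-up events; each location is assumed to be an [x, y] list.
events = pd.DataFrame({
    "location": [[60.0, 40.0], [75.5, 22.0]],
    "pass_end_location": [[70.0, 35.0], [90.0, 30.0]],
})

def split_xy(events, column, x_name, y_name):
    """Expand a column of [x, y] lists into two numeric columns."""
    xy = pd.DataFrame(events[column].tolist(),
                      index=events.index,
                      columns=[x_name, y_name])
    return events.join(xy)

# Start coordinates go to x/y, end coordinates to xf/yf.
located = split_xy(events, "location", "x", "y")
located = split_xy(located, "pass_end_location", "xf", "yf")
print(located[["x", "y", "xf", "yf"]])
```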

## Finding danger passes
For each game, load the event data. The [0] index used in the Soccermatics version to keep only the event data is not needed here, because sb.events returns a single dataframe.
- Keep shots by England and accurate passes by England that were not set pieces.
- Look for passes made in the 15 seconds before a shot. This requires iterating over each period separately.

The periods are handled separately because the match clocks overlap: a shot in the 46th minute may belong to first-half stoppage time (for example, when 3 minutes were added), and only first-half passes should be matched to it. After extracting the danger passes for each game, concatenate them into a single pandas dataframe that stores the danger passes across all games.
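
The window test can be tried in isolation before running it on real events. The sketch below uses NumPy broadcasting on made-up times in seconds. The function name `in_any_window` is hypothetical, and it skips the clamping of window starts, so it is an illustration of the idea rather than a drop-in replacement for the loop.

```python
import numpy as np

def in_any_window(pass_times, shot_times, window=15):
    """Flag passes strictly inside (shot - window, shot) for at least one shot.

    Both inputs are clock times in seconds taken from the same period.
    """
    pass_col = np.asarray(pass_times, dtype=float)[:, None]   # shape (n_passes, 1)
    shot_row = np.asarray(shot_times, dtype=float)[None, :]   # shape (1, n_shots)
    inside = (pass_col > shot_row - window) & (pass_col < shot_row)
    return inside.any(axis=1)

# Made-up first-half times: shots at 10:00 and at 46:30 (stoppage time).
shots = [600, 2790]
passes = [580, 590, 599, 600, 2780, 2800]
flags = in_any_window(passes, shots)
print(flags.tolist())
```

A second-half pass at 46:20 has the same clock value (2780) as the fifth pass above, which is why the check should be called once per period instead of on the whole match.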

In [14]:
team = "England Women's"

#declare an empty dataframe
danger_passes = pd.DataFrame()
for idx in match_ids:
    #open the event data from this game
    df = sb.events(idx)
    for period in [1, 2]:
        #keep only accurate passes by England that were not set pieces in this period
        mask_pass = ((df.team == team) & (df.type == "Pass") & 
                     (df.pass_outcome.isnull()) & (df.period == period) & 
                     (df.pass_type.isnull()))
        #keep only necessary columns
        passes = (df.loc[mask_pass]
                  .assign(x = lambda df: df.location.apply(pd.Series)[0],
                          y = lambda df: df.location.apply(pd.Series)[1],
                         xf = lambda df: df.pass_end_location.apply(pd.Series)[0],
                         yf = lambda df: df.pass_end_location.apply(pd.Series)[1])
                  [["x","y", "xf", "yf", "minute", "second", "player"]])

        #keep only Shots by England in this period
        mask_shot = (df.team == team) & (df.type == "Shot") & (df.period == period)
        #keep only necessary columns
        shots = df.loc[mask_shot, ["minute", "second"]]
        #convert time to seconds
        shot_times = shots['minute']*60+shots['second']
        shot_window = 15
        #find starts of the window
        shot_start = shot_times - shot_window
        #replace non-positive window starts with the kick-off of this half, in seconds
        shot_start = shot_start.apply(lambda i: i if i>0 else (period-1)*45*60)
        #convert to seconds
        pass_times = passes['minute']*60+passes['second']
        #check if pass is in any of the windows for this half
        pass_to_shot = pass_times.apply(lambda x: ((shot_start < x) & (x < shot_times)).any())

        #keep only danger passes
        danger_passes_period = passes.loc[pass_to_shot]
        #concatenate dataframe with a previous one to keep danger passes from the whole tournament
        danger_passes = pd.concat([danger_passes, danger_passes_period], ignore_index = True)
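
An alternative to growing the dataframe inside the loop is to collect the per-period frames in a list and concatenate once at the end. This is only a sketch on made-up frames; the helper name `collect_frames` is hypothetical. Skipping empty frames is a design choice here, since recent pandas versions may warn when empty frames are concatenated.

```python
import pandas as pd

def collect_frames(frames):
    """Concatenate dataframes in a single call, skipping empty ones."""
    non_empty = [f for f in frames if not f.empty]
    if not non_empty:
        return pd.DataFrame()
    return pd.concat(non_empty, ignore_index=True)

# Made-up per-period results standing in for danger_passes_period.
period_frames = [
    pd.DataFrame({"x": [60.0, 75.5], "y": [40.0, 22.0]}),
    pd.DataFrame({"x": [], "y": []}),   # a half with no danger passes
    pd.DataFrame({"x": [90.0], "y": [35.0]}),
]
combined = collect_frames(period_frames)
print(combined.shape)
```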

In [18]:
danger_passes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176 entries, 0 to 175
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x       176 non-null    float64
 1   y       176 non-null    float64
 2   xf      176 non-null    float64
 3   yf      176 non-null    float64
 4   minute  176 non-null    int64  
 5   second  176 non-null    int64  
 6   player  176 non-null    object 
dtypes: float64(4), int64(2), object(1)
memory usage: 9.8+ KB
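
One possible sanity check on a table like this is to count danger passes per player and divide by the number of games. The sketch below uses made-up player labels and a made-up game count; `toy_passes` and `toy_no_games` are hypothetical stand-ins for danger_passes and no_games.

```python
import pandas as pd

# Made-up stand-ins for danger_passes and no_games.
toy_passes = pd.DataFrame({"player": ["A", "B", "A", "C", "A", "B"]})
toy_no_games = 2

# Danger passes per player, averaged over the number of games.
per_game = (toy_passes.groupby("player").size()
            .sort_values(ascending=False) / toy_no_games)
print(per_game)
```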
