In [None]:
# Markdown
from IPython.display import display, HTML, Javascript
import IPython.display

# Data reading, Preprocessing and analysis
import numpy as np
import pandas as pd

# Data Visualization
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Data
game_data = pd.read_csv('../input/nfl-big-data-bowl-2022/games.csv')
plays_data = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')
tracking = []
for n in range(2018, 2020+1):
    tracking.append(pd.read_csv('../input/nfl-big-data-bowl-2022/tracking'+str(n)+'.csv'))

# Updating Data
game_data = game_data[['gameId', 'season', 'homeTeamAbbr', 'visitorTeamAbbr']]
plays_data = plays_data[['gameId', 'playId', 'possessionTeam', 'specialTeamsPlayType', 'specialTeamsResult', 'kickLength', 'absoluteYardlineNumber', 'down']]
for n in range(3):
    tracking[n] = tracking[n][['gameId', 'playId', 'time', 'team', 'x', 'y', 'displayName', 'playDirection']]

    
# Merging Data
stamp = pd.merge(game_data, plays_data, on=['gameId'])

full2020 = pd.merge(stamp, tracking[2], on=['gameId', 'playId'])
full2019 = pd.merge(stamp, tracking[1], on=['gameId', 'playId'])
full2018 = pd.merge(stamp, tracking[0], on=['gameId', 'playId'])

# For punt returns plays only
full2020 = full2020[(full2020['specialTeamsPlayType'] == 'Punt') & (full2020['specialTeamsResult'] == 'Return')]
full2019 = full2019[(full2019['specialTeamsPlayType'] == 'Punt') & (full2019['specialTeamsResult'] == 'Return')]
full2018 = full2018[(full2018['specialTeamsPlayType'] == 'Punt') & (full2018['specialTeamsResult'] == 'Return')]



# CSS from Schubert's notebook
# CSS styling for markdown
styling = """
    <style>
        .main-heading{
            background-color: #bd1e21;
            color: white !important;
            font-family: Helvetica;
            font-size: 32px !important;
            padding: 12px 12px;
            margin-bottom: 5px;
            border-radius: 4px;
            box-shadow: rgba(0, 0, 0, 0.19) 0px 10px 20px, rgba(0, 0, 0, 0.23) 0px 6px 6px;
        }

        .sub-heading{
            width: auto !important;
            background-color: #ad2e30;
            color: white !important;
            font-family: Helvetica;
            font-size: 24px !important;
            padding: 10px 12px;
            margin-bottom: 3px;
            box-shadow: rgba(0, 0, 0, 0.16) 0px 3px 6px, rgba(0, 0, 0, 0.23) 0px 3px 6px;
        }
    </style>
"""
HTML(styling)

<h1>Punt Return, Blocking Scores</h1>
<h2 class="main-heading">Introduction</h2>

&nbsp;&nbsp;&nbsp;The usual statistics for punt return teams, such as the number of returns or the average return, are simple and do not show what happened behind the scenes. What about the blockers who keep the opposing team's players away from the punt returner? This notebook introduces a new metric that is less simple, but helps you find out which team has the best punt return unit in terms of protecting the punt returner from being tackled by the opposing team's players. If you object to any of the sensitive parameter values used in this notebook, don't worry: all of that has been taken into account, and the code lets you change each parameter and explains what it does specifically.

In [None]:
# To display an image uploaded on Kaggle
from IPython.display import Image
Image("../input/photos/Why-removebg-preview.png")

&nbsp;&nbsp;&nbsp;The reason is that the punt is the most likely play on 4th down, and a return is also the most likely result of a punt. Another reason is that yards gained alone do not tell how skilled your whole punt return team is.

In [None]:
def pie_chart(data, col, title, center_word, explode=None, colors=['lightgray', 'lightseagreen', 'lightblue', 'skyblue']):
    '''
    Plot a categorical variable as a pie chart with a word in its center and your specific colors
    data: DataFrame holding the variable
    col: Name of the categorical column to plot
    title: The title of the graph
    center_word: Text shown in the hole of the pie
    explode: Optional list of pull fractions for the sectors
    colors: List of colors, one for each of the variable's values
    '''

    fig = go.Figure()
    num_types = len(data[col].unique())
    if not explode:
        explode = [0]*num_types

    # Only labels are passed, so Plotly counts the occurrences of each label
    fig.add_trace(
        go.Pie(
            labels=data[col],
            hole=.4,
            pull=explode,
            title=dict(text=center_word, font=dict(size=22)),
            )
        )
    fig.update_traces(
        hoverinfo='label+value',
        textinfo='label+percent',
        textfont_size=14,
        marker=dict(
            colors=colors[:num_types],
            line=dict(color='#000000', width=2)
            )
        )

    # Updating the layout
    fig.update_layout(title=dict(text=title, font=dict(size=24)),
                      font_family="sans-serif",
                      legend=dict(
                      orientation="v", y=1, yanchor="top", x=1.25, xanchor="right")
                      )
    fig.show()
    
fourth_down_plays = plays_data.query('down == 4')
pie_chart(fourth_down_plays, 'specialTeamsPlayType', 'The Play Types in the fourth down', 'PlayType')

In [None]:
punt_plays = plays_data.query('down == 4 and specialTeamsPlayType == "Punt"')
pie_chart(punt_plays, 'specialTeamsResult', 'The Play Results in Punts', 'PlayResult', [0.1]+[0]*4)

In [None]:
# To display an image uploaded on Kaggle
from IPython.display import Image
Image("../input/photos/How-removebg-preview (1).png")

&nbsp;&nbsp;&nbsp;First, the focus is only on the interval from the moment the punt returner catches the ball until the return ends. The metric then works as follows: every time **the distance between any player on the opposing team and the punt returner is three yards** or less (you can change this distance, it is one of the changeable parameters) in any direction (think of this as a circle, which we'll call the **circle of danger**), a value is added to a variable in our equation, the block score. The value added is three minus the distance, so the closer the opponent is, the larger the contribution and the worse it is for the return team. Once we have the block score and the return gain, the **metric equals (block score / return gain)**, which takes the differences in return gains into account. We can improve it further by making the **metric equal (1/itself)**, so that the higher it is, the better it is.

&nbsp;&nbsp;&nbsp;**Distance Calculation For each Play**: the distance between each player on the opposing team and the punt returner is computed at every tracking frame, and these distances are then averaged within each second (sum of the distances in that second / number of distances). If this average distance is higher than **3 yards** it is ignored (not added to the block score); otherwise **3 - this average distance** is added to the block score. All these values are summed over one play, the result is divided by the yards gained, and the fraction is then flipped (or kept as it is). Doing this for all the punts returned by a team and averaging over them gives our metric for this team in any desired season.
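
The per-play calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the notebook's implementation: the helper name `block_score_for_play`, the tuple layout of `samples`, and all the numbers are made up for the example.

```python
from collections import defaultdict

def block_score_for_play(samples, risk_distance=3.0):
    """Sum (risk_distance - average distance) over every player-second inside the circle.

    samples: iterable of (player, second, distance_in_yards), one entry per tracking frame.
    """
    # Collect the frame-level distances of each opposing player within each second
    per_player_second = defaultdict(list)
    for player, second, distance in samples:
        per_player_second[(player, second)].append(distance)

    score = 0.0
    for distances in per_player_second.values():
        avg = sum(distances) / len(distances)
        if avg <= risk_distance:          # inside the circle of danger
            score += risk_distance - avg  # closer opponents add more
    return score

# Toy data (invented): two opponents, two seconds
samples = [
    ("A", 0, 5.0), ("A", 0, 4.0),  # average 4.5 -> ignored
    ("A", 1, 2.0), ("A", 1, 1.0),  # average 1.5 -> adds 1.5
    ("B", 0, 2.5), ("B", 0, 2.5),  # average 2.5 -> adds 0.5
    ("B", 1, 6.0),                 # average 6.0 -> ignored
]
block_score = block_score_for_play(samples)
yards_gained = 8.0
metric = yards_gained / block_score  # flipped form: higher is better
```

With these toy numbers the block score is 2.0 and the flipped metric is 4.0.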

## Explanation Video for the metric

https://youtu.be/LTz8Yy-5ME0

In [None]:
from base64 import b64encode

def play(filename):
    '''
    Embed a local MP4 file in the notebook as a base64 data URI
    '''
    with open(filename, 'rb') as f:
        video = f.read()
    src = 'data:video/mp4;base64,' + b64encode(video).decode()
    html = '<video width=1000 controls autoplay loop><source src="%s" type="video/mp4"></video>' % src
    return HTML(html)

play('../input/videonflmet/1_6.mp4')

<h2 class="sub-heading">Used Data</h2>

&nbsp;&nbsp;&nbsp;In this notebook, we've chosen some columns from the datasets and left out others in order to focus on our metric. Tracking Data and Plays Data are the most important datasets in our analysis, and Game Data is used as well; Players Data and PFF Scouting Data are left out. The following lines discuss which columns are used from each dataset.

**<mark>Game Data</mark>**: each season and each game matters, because some plays share the same playId in different games, so it is necessary to loop over each game and then over each play within it. homeTeamAbbr and visitorTeamAbbr help to know which team is punting and which team is returning.

**<mark>Plays Data</mark>**: gameId and playId are used for merging the datasets. specialTeamsPlayType is used to select punt plays only, and specialTeamsResult is used to keep only the punts that the opposing team returned. kickLength helps to locate the punt returner when he catches the ball. absoluteYardlineNumber helps to understand where the ball is, which was important for checking our results, as in the exploratory notebook.

**<mark>Tracking Data</mark>**: gameId and playId are used for merging this dataset with the previous ones. x and y are among the most important columns, as they are used to calculate the distance at each fraction of a second and in the rest of the calculation process. team tells whether a row belongs to the home team, the away team, or the ball itself, and displayName identifies the individual player. playDirection helps to know where the punt returner is when he catches the ball, as his position is calculated differently for left and right.
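
To see why both gameId and playId are needed as merge keys, here is a small sketch with invented rows (the IDs and values below are illustrative only, not taken from the competition data). Merging on playId alone pairs tracking rows with plays from the wrong game.

```python
import pandas as pd

# Two different games that happen to reuse the same playId
plays = pd.DataFrame({
    "gameId": [1, 2],
    "playId": [100, 100],
    "kickLength": [45, 52],
})
tracking = pd.DataFrame({
    "gameId": [1, 2],
    "playId": [100, 100],
    "x": [30.0, 70.0],
})

# playId alone is ambiguous: every play matches the tracking rows of both games
wrong = pd.merge(plays, tracking, on="playId")

# The (gameId, playId) pair identifies a play uniquely
right = pd.merge(plays, tracking, on=["gameId", "playId"])
```

Here `wrong` has four rows (a cross product of the two games), while `right` has the expected two.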

<h2 class="sub-heading">Hyper Parameters</h2>

&nbsp;&nbsp;&nbsp;The metric calculation process has some parameters that can be changed. One example is **`punt_return_initial_position_error`**: the error differs between plays, some have an error of 0.1, some have 0.5, and, believe it or not, some have more than 2.4! This parameter must therefore be set to a value higher than the maximum error over all plays. Each hyper parameter has advantages and disadvantages. The next few lines give a brief definition of each hyper parameter and discuss its effects: what happens if we increase or decrease it, and how it helps in the metric calculation process. Before that, here are quick definitions of the variables and concepts needed to discuss the hyper parameters precisely.

## Variables/Parameters and Concepts

**`Expected_Ball_Position`**: The expected position of the ball at the catch, computed from the ball's initial position when it is snapped (the start of the whole play) and the kick length. Whether the kick length is added or subtracted depends on the play direction.

**`opponent_player_distance`**: The distance from a player on the opposing team to the punt returner.

**`avg_distance_per_second`**: The average of all **"opponent_player_distance"** values for one player within one second.

**`Risky Distance`**: The upper limit on **"avg_distance_per_second"** below which it is counted in the metric as a risky attempt to tackle the punt returner (this is bad for the punt return team). This is a hyper parameter.

**`Risk Circle`**: As shown in the video, a circle around the punt returner whose radius is the risky distance (a hyper parameter which is discussed next). An opposing player is inside it when his **"avg_distance_per_second"** is at most the risky distance.

**`added value`**: It is added whenever "avg_distance_per_second" is less than or equal to the **"Risky Distance"**, and it equals **"Risky Distance"** - **"avg_distance_per_second"**. Subtracting from the **Risky Distance** makes the value larger the closer the opponent is, that is, the worse the situation.

**`Blocking Score`**: The sum of all **added values**.

**`Returner Score`**: The yards gained by the punt return team (end position of the ball - initial position of the ball, where the initial position is taken when the punt returner catches the ball).

**`Metric Score`**: It is our topic :-). It equals **"Blocking Score"** / **"Returner Score"** if you want higher to mean worse; otherwise flip the fraction to **"Returner Score"** / **"Blocking Score"**.
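
The position and score definitions above can be sketched as small functions. This is a minimal sketch under stated assumptions: the function names and the sample coordinates are hypothetical, x is assumed to be measured along the length of the field as in the tracking data, and the 0.1 fallback for a zero blocking score mirrors the value used in the implementation cell of this notebook.

```python
def expected_ball_position(snap_x, kick_length, play_direction):
    """Expected x of the catch: subtract the kick length for 'left', add it otherwise."""
    if play_direction == "left":
        return snap_x - kick_length
    return snap_x + kick_length

def returner_score(catch_x, end_x, play_direction):
    """Yards gained by the returner, who runs against the play direction; never negative."""
    gain = end_x - catch_x if play_direction == "left" else catch_x - end_x
    return max(gain, 0)

def metric_score(returner, blocking, zero_fallback=0.1):
    """Returner score / blocking score; a zero blocking score is replaced by zero_fallback."""
    denominator = blocking if blocking != 0 else zero_fallback
    return returner / denominator

# Made-up play going left: snapped at x=85, kicked 45 yards, return ends at x=52
catch_x = expected_ball_position(85.0, 45.0, "left")
gain = returner_score(catch_x, 52.0, "left")
score = metric_score(gain, 3.0)
```

In this toy play the expected catch is at x=40, the return gains 12 yards, and a blocking score of 3 gives a metric score of 4.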


## Hyper Parameters

**`punt_return_initial_position_error`**: This hyper parameter is the largest allowed error, where the error is the absolute difference between the ball's real position when it is caught and "Expected_Ball_Position". Setting it higher lets more plays be calculated without errors, but it also causes a problem. If it is set to a very high value, the metric calculation may start while the ball is still in the air (because of the punt). Since "opponent_player_distance" is actually measured between the ball and the players of the opposing team, and the z-axis is not considered, players passing underneath the ball would have a very small avg_distance_per_second and would start adding values to the "**Blocking Score**". (**The default value for season 2020 is about 2.5**)

**`Risky Distance`**: Its definition is above. Narrowing or widening it affects the block score, but "Typical NFL players are approximately 2 yards tall. Typical NFL player arm length is approximately 1 yard. Thus, considering the extreme example of a player who is diving to make a tackle, 2-3 yards would be a reasonable value for tackle range". These words were written by DS. Thompson Bliss; thanks to him for his helpful comments. **The default value is set to 3 yards**, which is changeable for the above reason. Making it less than 3 decreases the blocking score, because players at a distance of 3 yards are no longer taken into consideration, which is bad as they are very risky.

**`Used_Function`**: This is the statistic used to summarize the data. Instead of calculating the averages of all the above parameters, you can take the median, for example. **Default is "Mean"**
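
The effect of the first and last hyper parameters can be illustrated with a short sketch. Everything here is an assumption made for the example: `first_frame_near` is a hypothetical helper, and the ball track and distances are invented numbers.

```python
from statistics import mean, median

def first_frame_near(ball_xs, expected_x, tolerance=2.5):
    """Index of the first frame whose ball x is within `tolerance` yards of expected_x, or None."""
    for i, x in enumerate(ball_xs):
        if abs(x - expected_x) <= tolerance:
            return i
    return None

# Invented ball track: the ball flies towards x=80 and is caught around frame 5
ball_xs = [35.0, 50.0, 65.0, 76.0, 79.0, 80.5, 78.0]
expected_x = 80.0

tight = first_frame_near(ball_xs, expected_x, tolerance=0.5)    # frame 5
default = first_frame_near(ball_xs, expected_x, tolerance=2.5)  # frame 4
loose = first_frame_near(ball_xs, expected_x, tolerance=20.0)   # frame 2, ball still in flight

# Swapping the summary function changes which player-seconds count as risky
distances_in_one_second = [1.0, 1.5, 9.0]
by_mean = mean(distances_in_one_second)      # above 3 yards -> ignored
by_median = median(distances_in_one_second)  # 1.5 yards -> counted
```

A very loose tolerance starts the calculation well before the catch, which is the problem described for `punt_return_initial_position_error`. The last two lines show how a single outlying frame can push the mean outside the risky distance while the median stays inside it.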

The next cell defines a function you can use to see the plan view of a play. You only need to pass the data together with a gameId and a playId.

In [None]:
def calc_dist(x1, y1, x2, y2):
    '''
    Calculates the distance between two points
    '''
    return ( (x1-x2)**2 + (y1-y2)**2 )**0.5

def plan_view(data, gameId, playId):
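    '''
    Plot an animated plan (top-down) view of one play
    data: merged tracking data, e.g. full2020
    gameId, playId: identify the play to show
    '''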
    
    df = data
    df = df[(df['gameId'] == gameId) & (df['playId'] == playId)] # Select the specific game and play
    
    # Plotting the scatter plot (As a video)
    fig = px.scatter(df, x="x", y="y", animation_frame="time", color="team", hover_name='displayName', template='plotly_dark',
                    range_x=[0, 120], range_y=[0, 53.3])

    
    
    # The next three for loops draw the yard lines and end zone lines so the plot looks like a football field
    for i in range(10, 50+10, 10):

        fig.add_vline(x=i+10, line_dash='solid', annotation_text=str(i), annotation_position="top right", line_color='white', 
                      annotation=dict(font_size=30, font_family="Times New Roman", font_color='white'),
                      fillcolor="white", line_width=2)

    for i in range(40, 10-10, -10):

        fig.add_vline(x=60+(50-i), line_dash='solid', annotation_text=str(i), annotation_position="top right", line_color='white', 
                      annotation=dict(font_size=30, font_family="Times New Roman", font_color='white'),
                      fillcolor="white", line_width=2)

    for i in [10, 110]:

        fig.add_vline(x=i, line_dash='dash', annotation_text='End Zone', annotation_position="bottom right", line_color='white', 
                      annotation=dict(font_size=15, font_family="Times New Roman", font_color='white'),
                      fillcolor="white", line_width=2)
        

    # To hide tick labels and the axis itself
    fig.update_yaxes(showticklabels=False, visible=False)
    fig.update_xaxes(showticklabels=False, visible=False)
    
    # Color bg to green
    fig.update_layout(plot_bgcolor='green')
    
    # Update duration
    fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 30
    fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 5
    fig.show()
    

    

# Testing Part
plan_view(full2020, 2020110500, 3674)

<h1 class="main-heading">Metric Implementation</h1>

&nbsp;&nbsp;&nbsp;After this brief overview of each side of the problem, watching the video and trying the function that animates the scatter plot frames should broaden your understanding and get you ready for the implementation. Before going through it, keep in mind that this notebook is an example of how the NFL can use this metric. It is not limited to punt returns and can be applied to kickoff returns too; we treat them separately here because the NFL stats site separates kickoff return and punt return stats. The differences are small, so it is easy to adapt the code to kickoff returns only or to both of them.

## `Metric Formula Used =  Returner_score / Block_score`
## `punt_return_initial_position_error = 2.5`
## `Risk Distance = 3`

In [None]:
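def metric_stats(dataframe, season, punt_return_initial_position_error= 2.5, risk_distance= 3):
    '''
    Calculate the metric for every team in one season
    dataframe: merged games, plays and tracking data restricted to returned punts
    season: the season to evaluate
    punt_return_initial_position_error: tolerance (yards) used to locate the catch
    risk_distance: radius (yards) of the risk circle
    Returns the team abbreviations and, per team, the average metric score,
    blocking score and returner score
    '''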
    
    teams_scores = []
    teams = []
    return_scores = []
    block_scores = []
    return_fin_scores = []
    block_fin_scores = []

    teams_names = list(dataframe['visitorTeamAbbr'].unique()) # use in full dataframe
    df = dataframe.copy()
    
    for team in teams_names:


        game_data_per_team = df[(df['season'] == season) & ((df['visitorTeamAbbr'] == team) | (df['homeTeamAbbr'] == team))]
        games_per_team = game_data_per_team['gameId'].unique()
        
        # Setting them for all teams
        scores = []
        block_scores = []
        return_scores = []
        for game in games_per_team:

            plays_data_per_game_per_team = game_data_per_team.query(f'gameId == {game}')
            plays_per_game_per_team = plays_data_per_game_per_team['playId'].unique()
            for play in plays_per_game_per_team:

                plays_data_per_play_per_game_per_team = plays_data_per_game_per_team[(plays_data_per_game_per_team['playId'] == play) & (plays_data_per_game_per_team['possessionTeam'] != team)]
                if plays_data_per_play_per_game_per_team.shape[0] == 0:
                    continue

                # Split the tracking rows into home players, away players and the ball
                home = plays_data_per_play_per_game_per_team.query("team == 'home'")
                away = plays_data_per_play_per_game_per_team.query("team == 'away'")
                football = plays_data_per_play_per_game_per_team.query('team == "football"')

                # `team` is the returning team, so the opposing (punting) players are on the other side
                returner_is_home = plays_data_per_play_per_game_per_team['homeTeamAbbr'].values[0] == team
                opp = 'away' if returner_is_home else 'home'

                # Merge them again; this is done to have x and y for each of them on every row
                test = pd.merge(home, away, on=['gameId', 'playId', 'time'], suffixes=['_home', '_away'])
                test = pd.merge(test, football, on=['gameId', 'playId', 'time'])
                test[['x_football', 'y_football']] = test[['x', 'y']]

                # Distance between each player of the opposing team and the ball
                test['distance_metric'] = calc_dist(test[f'x_{opp}'], test[f'y_{opp}'], test['x_football'], test['y_football'])

                # Initial position of the ball (on x-axis only)
                start_kick_x = test.loc[0, 'x_football']

                # The length of the kick; it is the same on every row of a play, so any row will do
                kick_length = df[(df['gameId'] == game) & (df['playId'] == play)]['kickLength'].values[0]

                # The expected catch position is start_kick_x - kick_length for plays going left
                # and start_kick_x + kick_length for plays going right
                if test['playDirection'].unique()[0] == 'left':
                    expected_x = start_kick_x - kick_length
                else:
                    expected_x = start_kick_x + kick_length

                # First frame at which the ball is within the allowed error of the expected position
                near_catch = abs(test['x_football'] - expected_x) <= punt_return_initial_position_error
                start_time = pd.to_datetime(test[near_catch]['time'].values[0])

                # Final thing in the metric calculation

                # Prepare test dataframe: keep only the frames from the catch onwards
                test['time_update'] = pd.to_datetime(test['time'])
                test = test[test['time_update'] >= start_time]
                test['second'] = test['time_update'].dt.second

                # Calc averages per opposing player in each second
                avg_per_player_for_each_sec = test.groupby(['playId', f'displayName_{opp}', 'second']).agg({'distance_metric':'mean'}).reset_index()
                avgs_filtered = avg_per_player_for_each_sec.query(f'distance_metric <= {risk_distance}')
                sums = risk_distance - avgs_filtered['distance_metric']
                block_score = sum(sums.values)

                # Calculating the number of yards gained by the punt return team 
                start_point = test['x_football'].values[0]
                end_point = test['x_football'].values[-1]
                returner_score = end_point - start_point if test['playDirection'].unique()[0] == 'left' else start_point - end_point
                if returner_score < 0:
                    returner_score = 0

                # Final score for a play
                final_score_per_play = returner_score / block_score if block_score != 0 else returner_score / 0.1
                scores.append(final_score_per_play)
                return_scores.append(returner_score)
                block_scores.append(block_score)


        metric_score_per_team = round( sum(scores) / len(scores), 2 )
        block_score_per_team = round( sum(block_scores) / len(block_scores), 2 )
        return_score_per_team = round( sum(return_scores) / len(return_scores), 2 )
        block_fin_scores.append(block_score_per_team)
        return_fin_scores.append(return_score_per_team)
        teams_scores.append(metric_score_per_team)
        teams.append(team)
        
    return teams, teams_scores, block_fin_scores, return_fin_scores

> # Distribution of the metric for season 2020 as an example

In [None]:
full_stats_2020 = metric_stats(full2020, 2020)
teams, teams_scores = full_stats_2020[:2]
sns.kdeplot(teams_scores)
plt.xlabel('Metric Score')
plt.title('Distribution of the metric in season 2020')
plt.show()

The distribution looks roughly normal, which suggests the metric implementation has no obvious mistakes :-). Another interesting observation is that most teams score between 0.75 and 2.
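
A visual impression of normality can be complemented by a quick numeric check of symmetry. The sketch below is an illustration only: `sample_skewness` is a hypothetical helper, and the two lists are invented, not the notebook's scores.

```python
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; it is 0 for perfectly symmetric data."""
    mu = mean(values)
    n = len(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented, symmetric around 3
right_tailed = [1.0, 1.0, 1.0, 10.0]   # invented, one large value

skew_sym = sample_skewness(symmetric)
skew_right = sample_skewness(right_tailed)
```

Calling `sample_skewness(teams_scores)` on the list produced above would give a value near zero for a symmetric distribution and a clearly positive value if a few teams have much higher scores than the rest.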

## Highest Scores in season 2020

In [None]:
def plot_highest(teams, teams_scores, season, n=10):
    reorder = sorted([(team, score) for team, score in zip(teams, teams_scores)], key=lambda x : x[1])
    reorder = reorder[::-1] # reverse it
    highest_teams = [reorder[i][0] for i in range(n)]
    highest_scores = [reorder[i][1] for i in range(n)]
    plt.bar(highest_teams, highest_scores)
    plt.title(f'Highest {n} teams in season {season}')
    plt.ylabel('Metric Score')
    plt.xlabel('Team')
    plt.show()
    
plot_highest(teams, teams_scores, 2020)

## Highest Scores in season 2019

In [None]:
full_stats_2019 = metric_stats(full2019, 2019)
teams, teams_scores = full_stats_2019[:2]
plot_highest(teams, teams_scores, 2019)

## Highest Scores in season 2018

In [None]:
full_stats_2018 = metric_stats(full2018, 2018)
teams, teams_scores = full_stats_2018[:2]
plot_highest(teams, teams_scores, 2018)

<h1 class="main-heading">Conclusion & Recommendations</h1>

&nbsp;&nbsp;&nbsp;As explained above, this metric has many changeable parameters and hyper parameters. It shows a new way of looking at your data: not just the total score or the yards gained, but the strength of the players on the punt return team. One of the most interesting things about this metric is its flexibility: it can be used for any return play, such as kickoff returns, not just punt returns. Here are some additional recommendations for using this metric. The score can be calculated in several ways, as explained above. <mark>blocking score / return score</mark> was chosen first because our main goal was the blocking score, but to make it more accurate the metric must deal with different scenarios, such as different yards gained. If the punt return team gains many yards, more opposing players get the chance to approach or tackle the returner, which increases the blocking score; dividing by the return score compensates for this. The metric also deals with negative yards gained. The second way of scoring is <mark>return score / blocking score</mark>, which makes the metric the higher it is, the better it is; this is the form used in the implementation above. That's it :-)

## References:
- <a href="https://www.nfl.com/stats/team-stats/special-teams/punt-returns/2021/reg/all">NFL Stats</a>

- <a href="https://www.kaggle.com/c/nfl-big-data-bowl-2022/discussion/2979530">Thread for asking about the risky distance</a>