In this notebook we'll take a look at tracking2018.csv and plays.csv: first a light EDA, then a function that animates the players and the football over the course of a play. 

The animation function near the bottom of the notebook, getPlayAnimation(), should be easily applicable to other problems as well. 

**In this notebook, you will:**
* See some simple exploration of the tracking data
* Use widgets to pick a play and watch its animation


**Quick Note**

If you're looking for more widget examples, my [notebook](https://github.com/jvmohr/dataScience/blob/master/Spotify/SpotifyV.ipynb) on GitHub uses some other widgets and includes a layout system. 

Thanks to Jaron_Michael and his [notebook](https://www.kaggle.com/jaronmichal/tracking-data-visualization) for inspiration to improve my animation with widgets (with the newly created Interactive Play Animation section), and for the last line I needed for it (as noted by said line)!

In [None]:
from IPython.display import HTML, display
import matplotlib.animation as anim
import matplotlib.pyplot as plt
import ipywidgets as wg
import pandas as pd
import numpy as np

%matplotlib inline

In [None]:
t2018 = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking2018.csv')
t2018.head()

In [None]:
t2018.shape # Wow over 12 million rows!

# Rows Per Play

Looking at the first 5 rows above, each play has its own id (playId) alongside the game id (gameId). So, let's look at the first play in this file to see how many rows correspond to it. 

In [None]:
t2018[ (t2018['playId'] == 36) & (t2018['gameId'] == 2018123000) ]

Now checking the average across plays, it looks like the average number of rows is just below 1900. 

In [None]:
# Average number of rows per play
num_rows = t2018.groupby(['gameId', 'playId'])['x'].count()
num_rows.mean()
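
To see where a per-play row count comes from, it can help to split it into frames and rows per frame. Below is a minimal sketch on a made-up toy frame; the `toy` DataFrame and its values are invented for illustration, and only the column names `gameId`, `playId`, `frameId`, and `x` follow the tracking file.

```python
import pandas as pd

# Toy tracking data: one game, two plays, 3 tracked entities per frame
# (values are invented; only the column names mirror tracking2018.csv)
toy = pd.DataFrame({
    'gameId':  [1] * 15,
    'playId':  [10] * 6 + [20] * 9,
    'frameId': [1, 1, 1, 2, 2, 2] + [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'x':       [float(i) for i in range(15)],
})

per_play = toy.groupby(['gameId', 'playId']).agg(
    rows=('x', 'size'),              # rows recorded for the play
    frames=('frameId', 'nunique'),   # distinct frames in the play
)
per_play['rows_per_frame'] = per_play['rows'] / per_play['frames']
print(per_play)
```

Note that `size` counts every row, whereas `count` (used in the cell above) skips rows where `x` is missing, so the two can differ on real data.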

It would be nice if we could get separate averages for different kinds of plays. plays.csv has some columns that can help with that. 

In [None]:
plays = pd.read_csv('../input/nfl-big-data-bowl-2022/plays.csv')
plays.head(2)

In [None]:
plays['specialTeamsPlayType'].value_counts()

In [None]:
# Add the play type
# https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns
num_rows = num_rows.reset_index().merge(plays[['gameId', 'playId', 'specialTeamsPlayType', 'specialTeamsResult']], 
                             how='left', on=['gameId', 'playId'])
num_rows
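
When joining on two key columns like this, it can be worth checking that the right-hand table has one row per key and that every play found a match. The sketch below uses pandas' `validate` and `indicator` options on invented data; the `counts` and `meta` frames are stand-ins, not the real files.

```python
import pandas as pd

# Invented per-play row counts and play metadata
counts = pd.DataFrame({'gameId': [1, 1, 2], 'playId': [10, 20, 10], 'x': [1200, 1900, 2300]})
meta = pd.DataFrame({'gameId': [1, 1], 'playId': [10, 20],
                     'specialTeamsPlayType': ['Kickoff', 'Punt']})

merged = counts.merge(
    meta,
    how='left',
    on=['gameId', 'playId'],
    validate='many_to_one',  # raise if meta has duplicate (gameId, playId) keys
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Plays with tracking rows but no entry in the metadata
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['gameId', 'playId']])
```

A non-empty `unmatched` would mean some plays end up with a missing play type after the merge and silently drop out of a later `groupby`.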

Now we can look at the average rows per play type. 

In [None]:
avg_rows = num_rows.groupby('specialTeamsPlayType')['x'].mean()
avg_rows.plot.bar()
avg_rows

It makes sense that extra points and field goals are close together and the two lowest: once the ball is kicked, the play usually lasts only long enough for the ball to go through (or miss) the uprights. I do find it interesting that punts have that many more rows than kickoffs, but I suppose the hang time of punts and the possibly greater likelihood of a touchback on kickoffs would account for this. In fact, let's investigate that next. 

In [None]:
avg_rows_result = num_rows.groupby(['specialTeamsPlayType', 'specialTeamsResult'])['x'].mean().astype(int)
avg_rows_result.sort_values().plot.barh()
avg_rows_result
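
A two-level groupby like this can also be pivoted into a table with one row per play type and one column per result, which is sometimes easier to scan than a long Series. Here is a small sketch with invented numbers; only the column names follow the notebook.

```python
import pandas as pd

# Invented row counts per play (not real data)
toy = pd.DataFrame({
    'specialTeamsPlayType': ['Punt', 'Punt', 'Kickoff', 'Kickoff', 'Kickoff'],
    'specialTeamsResult': ['Return', 'Fair Catch', 'Return', 'Touchback', 'Touchback'],
    'x': [2400, 1800, 2100, 900, 1100],
})

# Mean rows per (play type, result), with results spread across columns
table = (toy.groupby(['specialTeamsPlayType', 'specialTeamsResult'])['x']
            .mean()
            .unstack('specialTeamsResult'))
print(table.round(0))
```

Combinations that never occur (a punt touchback in this toy data) show up as NaN in the table.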

I think several interesting things pop out by looking at this data. First is the specialTeamsResult of 'Non-Special Teams Result.' This would be something like a fake extra point or a fake field goal attempt (as in they set up as if they were going to kick the ball but actually did something else). These plays could likely be ignored for parts of later analysis, although the initial setup of players in these plays could offer some insight as to when teams are likely setting up to go for it. 

From there, blocked kick attempts tend to have more rows, which makes sense as players would try to return the ball. Kickoff/punt returns also seem to yield longer plays and more rows. 

As expected, normal extra point/field goal attempts and non-returned kickoffs generally have the fewest rows. 

# Play Animation

Let's take another look at the tracking data.

In [None]:
t2018

It looks like the football has its own rows as well. Let's create an animation to watch one play, including the players' movements and the football's movement. 

In [None]:
one_play = t2018[ (t2018['playId'] == 36) & (t2018['gameId'] == 2018123000) ]
one_play.head(2)
# difference between frames (looking at time column) is .1 seconds
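
With frames spaced 0.1 seconds apart, a frame count translates directly into a rough play duration. The sketch below assumes frame ids start at 1 with no gaps; the `toy` data and the `SECONDS_PER_FRAME` name are invented for illustration.

```python
import pandas as pd

SECONDS_PER_FRAME = 0.1  # frame spacing observed in the time column

# Invented frame ids for two plays
toy = pd.DataFrame({
    'gameId':  [1, 1, 1, 1, 1],
    'playId':  [10, 10, 10, 20, 20],
    'frameId': [1, 2, 3, 1, 2],
})

# Last frame id per play; frames are assumed to start at 1 with no gaps
n_frames = toy.groupby(['gameId', 'playId'])['frameId'].max()

# Time elapsed between the first and last frame of each play
duration_s = (n_frames - 1) * SECONDS_PER_FRAME
print(duration_s)
```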

In [None]:
def getPlayAnimation(df, playId=36, gameId=2018123000, color_dict={'home': 'green', 'away':'blue', 'football': 'brown'}, 
                     from_helper=False, add_info=None):
    # function to draw one frame
    def plotFrame(i):
        one_frame = one_play[ one_play['frameId'] == i+1 ]
        ax.cla()
        
        # Constants from frame to frame
        ax.set_xlim([0, 120])
        ax.set_ylim([0, 53.3])
        ax.plot([10, 10], [0, 53.3], c='black')
        ax.plot([110, 110], [0, 53.3], c='black')
        ax.plot([60, 60], [0, 53.3], c='gray')
        
        # What changes
        ax.scatter(one_frame['x'], one_frame['y'], c=one_frame['team'].map(color_dict), s=64)
        ax.set_title('Frame: {}'.format(i+1))
        
        if add_info is not None:
            pass
    
    # Get df of tracking data for play
    if not from_helper:
        one_play = df[(df['playId'] == playId) & (df['gameId'] == gameId)]
    else:
        one_play = df
    
    fig, ax = plt.subplots(figsize=(9,6))
    
    # plotFrame(i) draws frameId i+1, so max() frames are needed to include the last frame
    play_anim = anim.FuncAnimation(fig, plotFrame, frames=int(one_play['frameId'].max()), interval=100)
    html = play_anim.to_html5_video()
    plt.close(fig)
    return HTML(html)
    
# Can switch any of these parameters to change the colors/get a different play
# -- t2018 is the df for the year that the desired game and play is in
getPlayAnimation(t2018, playId=36, gameId=2018123000, 
                 color_dict={'home': 'green', 
                             'away':'blue', 
                             'football': 'brown'})
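
The `add_info` branch in `plotFrame` is currently a placeholder. One possible way to use it is to fold the extra details into the frame title. The helper below is hypothetical (it is not part of the notebook) and assumes `add_info` is either `None` or three plain strings: home team, away team, and play type.

```python
def frame_title(frame_no, add_info=None):
    """Build a title for one frame (hypothetical helper, not in the notebook).

    Assumes add_info is None or a 3-item sequence of plain strings:
    (home team, away team, play type).
    """
    title = 'Frame: {}'.format(frame_no)
    if add_info is not None:
        home, away, play_type = add_info
        title = '{} @ {} | {} | {}'.format(away, home, play_type, title)
    return title


print(frame_title(1))
print(frame_title(12, ['GB', 'DET', 'Punt']))
```

Inside `plotFrame`, a call like `ax.set_title(frame_title(i+1, add_info))` could then stand in for the `pass` branch, provided the caller passes strings rather than pandas Series.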

# Interactive Play Animation

In [None]:
# A few variables
teams = sorted(plays['possessionTeam'].dropna().unique()) # dropna so sorted() never compares NaN with strings
games_df = pd.read_csv('../input/nfl-big-data-bowl-2022/games.csv')

In [None]:
yr_dpd = wg.Dropdown(
    options=['2018'],
    value='2018',
    description='Year:',
    disabled=False,
)

team_dpd = wg.Dropdown(
    options=teams,
    value='GB',
    description='Teams:',
    disabled=False,
)

week_dpd = wg.Dropdown(
    options=list(range(1,18)),
    value=1,
    description='Weeks:',
    disabled=False,
)
btn = wg.Button(
    description='Click to Get Plays',
    disabled=False,
    button_style='',
    tooltip='To generate plays',
    icon='check' 
)
plays_dpd = wg.Dropdown(
    options=[1],
    value=1,
    description='Plays:',
    disabled=False,
)

anim_btn = wg.Button(
    description='Get Animation',
    disabled=False,
    button_style='',
    tooltip='To generate animation',
    icon='check'
)

output = wg.Output()

display(yr_dpd, team_dpd, week_dpd, btn, plays_dpd, anim_btn, output) # Display widgets

# onClick for btn
def getPlaysDPD(b):
#     if int(yr_dpd.value) != 2018:
#         t_df = pd.read_csv('../input/nfl-big-data-bowl-2022/tracking{}.csv'.format(yr_dpd.value)) # redo this
#     else:
#         t_df = t2018
    t_df = t2018
    temp_df = games_df[(week_dpd.value == games_df['week'])]
    temp_df = temp_df[(temp_df['homeTeamAbbr'] == team_dpd.value) | (temp_df['visitorTeamAbbr'] == team_dpd.value)]
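    if temp_df.empty: # chosen team has no game that week (e.g. a bye), so there is nothing to list
        return
    play_ids = list(t_df[t_df['gameId'] == temp_df['gameId'].iloc[0]]['playId'].unique())
    plays_dpd.options = play_ids
    plays_dpd.value = play_ids[0]
    return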
    
btn.on_click(getPlaysDPD)

#onClick for anim_btn
def getAnimOnClick(b):
    t_df = t2018
    temp_df = games_df[(week_dpd.value == games_df['week'])] # get df of games at chosen week
    temp_df = temp_df[(temp_df['homeTeamAbbr'] == team_dpd.value) | (temp_df['visitorTeamAbbr'] == team_dpd.value)] # down to chosen team
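    if temp_df.empty: # chosen team has no game that week (e.g. a bye)
        return
    game_id = temp_df['gameId'].iloc[0]
    one_play = t_df[(t_df['gameId'] == game_id) & (t_df['playId'] == plays_dpd.value)] # get tracking data for chosen game and play
    # play type for the chosen game and play, looked up in plays
    play_type = plays.loc[(plays['gameId'] == game_id) & (plays['playId'] == plays_dpd.value), 'specialTeamsPlayType']
    output.clear_output()
    with output:
        h = getPlayAnimation(one_play, color_dict={'home': 'green', 'away':'blue', 'football': 'brown'}, from_helper=True, 
                             add_info=[temp_df['homeTeamAbbr'], temp_df['visitorTeamAbbr'], play_type])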
        display(h) # Thanks to Jaron_Michael and his notebook linked below for this line, it was the last piece
        # https://www.kaggle.com/jaronmichal/tracking-data-visualization
    return

anim_btn.on_click(getAnimOnClick)
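
Both callbacks repeat the same week-and-team lookup, and both index into the result with `.iloc[0]`, which raises if the team has no game that week. A small shared helper is one way to handle that. `find_game_id` below is a hypothetical name, sketched on an invented schedule; the column names follow games.csv as used above.

```python
import pandas as pd


def find_game_id(games, team, week):
    """Return the gameId for team in week, or None if there is no such game.

    Hypothetical helper; column names follow games.csv as used above.
    """
    in_week = games[games['week'] == week]
    match = in_week[(in_week['homeTeamAbbr'] == team) | (in_week['visitorTeamAbbr'] == team)]
    if match.empty:
        return None
    return int(match['gameId'].iloc[0])


# Invented schedule for illustration
games = pd.DataFrame({
    'gameId': [101, 102],
    'week': [1, 2],
    'homeTeamAbbr': ['GB', 'CHI'],
    'visitorTeamAbbr': ['CHI', 'DET'],
})
print(find_game_id(games, 'GB', 1))
print(find_game_id(games, 'GB', 2))
```

Each callback could then call the helper with `games_df`, `team_dpd.value`, and `week_dpd.value`, and return early when it gets `None`.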

Future Improvements:
* Quick way to choose from different years without running into memory issues
* Clean/shorten widget code up a bit
* Make animation look nicer
* Add more info to add_info and display it in the animation nicely
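
For the memory point, one option might be to read a tracking file in chunks and keep only the rows for the selected game, so the full file is never held in memory at once. This is a rough sketch, not a tested solution for the real files: `load_game_tracking` is a hypothetical helper, and the in-memory CSV stands in for a tracking file.

```python
import io

import pandas as pd


def load_game_tracking(source, game_id, usecols=None, chunksize=100_000):
    """Read only the rows for one game from a tracking CSV, chunk by chunk.

    Hypothetical helper: source is a path or file-like object, and the
    gameId column name follows the tracking files used above. If usecols
    is given, it must include 'gameId'.
    """
    parts = []
    for chunk in pd.read_csv(source, usecols=usecols, chunksize=chunksize):
        parts.append(chunk[chunk['gameId'] == game_id])
    return pd.concat(parts, ignore_index=True)


# Tiny in-memory CSV standing in for a tracking file
csv_text = "gameId,playId,frameId,x,y\n1,10,1,5.0,6.0\n2,10,1,7.0,8.0\n1,10,2,5.5,6.5\n"
one_game = load_game_tracking(io.StringIO(csv_text), game_id=1,
                              usecols=['gameId', 'playId', 'frameId', 'x', 'y'],
                              chunksize=2)
print(one_game)
```

The trade-off is speed: every lookup rescans the file, so this suits an occasional switch between years better than repeated calls.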