First, we need to install a couple of libraries to make the visualizations. The important one is [ptplot](https://github.com/AndrewRook/ptplot), which provides a ggplot2-like interface to the extremely powerful [bokeh](https://bokeh.org/) interactive visualization package.

In [None]:
!conda install nodejs -y
!pip install ptplot

Imports!

In [None]:
import numpy as np
import pandas as pd

from ptplot import PTPlot
from ptplot.animation import Animation
from ptplot.hover import Hover
from ptplot.nfl import Aesthetics, Field
from ptplot.plot import Positions

from bokeh.plotting import show
from bokeh.io import output_notebook

output_notebook()

Now we'll load the raw data in.

In [None]:
scouting_data = pd.read_csv("../input/nfl-big-data-bowl-2022/PFFScoutingData.csv")
game_data = pd.read_csv("../input/nfl-big-data-bowl-2022/games.csv")
play_data = pd.read_csv("../input/nfl-big-data-bowl-2022/plays.csv")
tracking_data_2018 = pd.read_csv("../input/nfl-big-data-bowl-2022/tracking2018.csv")
scouting_data.shape, game_data.shape, play_data.shape, tracking_data_2018.shape

We'll pick an [interesting return somewhat at random as the example](https://www.youtube.com/watch?v=fIaNbxIQgxI), then join all the data together.

In [None]:
sample_play = play_data[(play_data.gameId == 2018100702) & (play_data.playId == 1828)]
sample_play_joined = sample_play.merge(
    tracking_data_2018, on=["gameId", "playId"], how="inner"
).merge(
    game_data, on="gameId", how="inner"
).merge(scouting_data, on=["gameId", "playId"], how="inner")

sample_play_joined.shape
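
Because these joins fan a single play row out to many tracking rows, it can be worth confirming that the join keys behave as expected. The following is a minimal sketch on toy frames (the values are invented for illustration, not taken from the competition files) that uses the `validate` argument of `DataFrame.merge` to assert that each `(gameId, playId)` pair appears only once on the plays side:

```python
import pandas as pd

# Toy stand-ins for plays.csv and a tracking file; the values are invented.
plays = pd.DataFrame({"gameId": [1, 1], "playId": [10, 20], "quarter": [1, 2]})
tracking = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [10, 10, 20],
    "frameId": [1, 2, 1],
})

# validate="one_to_many" raises pandas.errors.MergeError if (gameId, playId)
# is duplicated on the left-hand (plays) side.
joined = plays.merge(
    tracking, on=["gameId", "playId"], how="inner", validate="one_to_many"
)
print(joined.shape)
```

If the keys were accidentally duplicated in the plays table, the merge would fail loudly instead of silently multiplying rows.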

Some preprocessing to fill null values for the jersey number and add team abbreviations to the data.

In [None]:
# I want the jersey number to be a string with NA values filled by "", which is surprisingly
# convoluted to do:
sample_play_joined["jerseyNumber"] = sample_play_joined.jerseyNumber.fillna(0).astype(int).astype(str)
sample_play_joined.loc[sample_play_joined.jerseyNumber == "0", "jerseyNumber"] = ""

# First get home/away team abbreviations, then insert "football" for all rows relating to the
# ball's position:
sample_play_joined["team_abbr"] = sample_play_joined.homeTeamAbbr.where(
    sample_play_joined.team == 'home', sample_play_joined.visitorTeamAbbr
)
sample_play_joined.loc[sample_play_joined.displayName == "football", "team_abbr"] = "football"
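
As an aside, a possible alternative for the jersey-number conversion is pandas' nullable integer dtype. This is only a sketch on a toy column (the values are made up), not a change to the cell above:

```python
import numpy as np
import pandas as pd

# Toy column: floats with a missing value, like jerseyNumber on the football rows.
jersey = pd.Series([12.0, np.nan, 4.0])

# "Int64" (capital I) is pandas' nullable integer dtype, so NaN becomes <NA>
# rather than forcing a float column; fillna("") then blanks the missing entries.
jersey_str = jersey.astype("Int64").astype("string").fillna("")
print(jersey_str.tolist())
```

One difference from filling with 0 first: a genuine jersey number of 0, if one ever appeared, would stay distinct from a missing value.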

Now let's animate the play, showing the position and orientation of every player and the ball at each frame. We'll also add color-coding by team, jersey numbers, and a simple hover label with player names to make the plot easier to read.

In [None]:
plot = (
    PTPlot(data=sample_play_joined, pixel_height=300) 
    + Field() 
    + Positions(
        "x-10", "y", orientation="o", number="jerseyNumber",
        name="positions", marker_radius=1.2
    )
    + Aesthetics(team_ball_mapping="team_abbr", home_away_mapping="team == 'home'", ball_identifier="football")
    + Hover([("Name", "@displayName")], "positions", ["displayName"])
    + Animation("frameId", 10)
)
    
show(plot.draw())

Interestingly, you can see that there's a small error with the ball position, where it briefly goes behind the punter. We can plot one of the frames where that's happening to make it really clear.

In [None]:
plot = (
    PTPlot(data=sample_play_joined, pixel_height=300) 
    + Field() 
    + Positions(
        "x-10", "y", orientation="o", number="jerseyNumber",
        name="positions", marker_radius=1.2, frame_filter="frameId == 23"
    )
    + Aesthetics(team_ball_mapping="team_abbr", home_away_mapping="team == 'home'", ball_identifier="football")
    + Hover([("Name", "@displayName")], "positions", ["displayName"])
)
    
show(plot.draw())

That might be something that needs to be accounted for before building any kind of model off this data, especially if it's common.
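
One rough way to check how common this is would be to flag every frame where the ball trails a given player along the direction of play. The sketch below runs on invented toy data. The helper `frames_with_ball_behind` is hypothetical, and it assumes the tracking data has a `playDirection` column with values `"left"` and `"right"` giving the direction the punting team is moving.

```python
import pandas as pd

def frames_with_ball_behind(tracking: pd.DataFrame, player_name: str) -> pd.DataFrame:
    """Return one row per frame where the ball trails `player_name` along the play direction."""
    keys = ["gameId", "playId", "frameId"]
    ball = tracking.loc[tracking.displayName == "football", keys + ["x", "playDirection"]]
    player = tracking.loc[tracking.displayName == player_name, keys + ["x"]]
    both = ball.merge(player, on=keys, suffixes=("_ball", "_player"))
    # Flip the sign for plays moving left so that positive always means "downfield".
    # Assumption: playDirection only takes the values "right" and "left".
    sign = both.playDirection.map({"right": 1, "left": -1})
    both["ball_offset"] = sign * (both.x_ball - both.x_player)
    return both.loc[both.ball_offset < 0, keys + ["ball_offset"]]

# Toy data (invented): the ball is ahead of the punter in frame 1, behind in frame 2.
demo = pd.DataFrame({
    "gameId": [1, 1, 1, 1],
    "playId": [1, 1, 1, 1],
    "frameId": [1, 1, 2, 2],
    "displayName": ["football", "Punter", "football", "Punter"],
    "x": [30.0, 28.0, 27.5, 28.0],
    "playDirection": ["right"] * 4,
})
flagged = frames_with_ball_behind(demo, "Punter")
print(flagged)
```

Running something like this over many punts, with the punter identified per play, would give a rough count of affected frames. A small tolerance on `ball_offset` may be needed to ignore ordinary tracking noise.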

Another thing I was wondering about is what some of the player designations in the scouting data mean, especially who the "vises" are. Let's add those designations to the hover labels.

In [None]:
team_abbr_plus_number = sample_play_joined.team_abbr.str.cat(sample_play_joined.jerseyNumber, sep=" ")
# this is yucky but there's no clean way to do an n-n string contains operation in pandas.
# Caveat: this is a substring match, so a label like "BAL 1" also matches inside "BAL 13".
is_gunner = np.char.find(
    sample_play_joined.gunners.values.astype(str), team_abbr_plus_number.values.astype(str)
) != -1
is_vise = np.char.find(
    sample_play_joined.vises.values.astype(str), team_abbr_plus_number.values.astype(str)
) != -1
is_rusher = np.char.find(
    sample_play_joined.puntRushers.values.astype(str), team_abbr_plus_number.values.astype(str)
) != -1

sample_play_joined["player_role"] = "Not Specified"
sample_play_joined.loc[is_gunner, "player_role"] = "Gunner"
sample_play_joined.loc[is_vise, "player_role"] = "Vise"
sample_play_joined.loc[is_rusher, "player_role"] = "Rusher"
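
Since the lookup above is a substring search, a label such as "BAL 1" would also be found inside "BAL 13". A stricter variant could split each role string into entries and test for exact membership. The sketch below assumes the role columns are semicolon-separated lists such as "BAL 13; BAL 25". The helper name `in_role` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def in_role(role_strings: pd.Series, labels: pd.Series) -> np.ndarray:
    """Return a boolean array: True where the label is an exact entry of the role string."""
    matches = []
    for roles, label in zip(role_strings, labels):
        # Missing role strings (NaN/None) mean nobody on the play has that role.
        tokens = set() if pd.isna(roles) else {t.strip() for t in str(roles).split(";")}
        matches.append(label in tokens)
    return np.array(matches, dtype=bool)

# Toy data (invented) showing where the two approaches disagree.
demo = pd.DataFrame({
    "gunners": ["BAL 13; BAL 25", "BAL 13; BAL 25", None],
    "label": ["BAL 1", "BAL 25", "CLE 4"],
})
substring_match = np.char.find(
    demo.gunners.values.astype(str), demo.label.values.astype(str)
) != -1
exact_match = in_role(demo.gunners, demo.label)
print(substring_match, exact_match)
```

The Python-level loop is slower than the vectorized search, but for a single play's worth of rows that cost is negligible.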


In [None]:
plot = (
    PTPlot(data=sample_play_joined, pixel_height=350) 
    + Field() 
    + Positions(
        "x-10", "y", orientation="o", number="jerseyNumber",
        name="positions", marker_radius=1.2
    )
    + Aesthetics(team_ball_mapping="team_abbr", home_away_mapping="team == 'home'", ball_identifier="football")
    + Hover(
        [("Name", "@displayName"), ("Role", "@player_role")], 
        "positions", ["displayName", "player_role"]
    )
    + Animation("frameId", 10)
)
    
show(plot.draw())

Oh, ok. The gunners are the players on the punting team lined up near the sidelines who attempt to make the tackle, while the vises are the players on the returning team who are trying to keep the gunners away from the returner.