Given the special teams data, our team decided to explore punts, which we assumed to be the least studied aspect of special teams because they rarely produce points on their own. Specifically, we focused on punt returns and on comparing returns interactively, both within and between players. To begin, we loaded the necessary packages along with the player, game, and tracking data.

In [None]:
library(tidyverse)
install.packages("edeaR")
library(edeaR)
install.packages("sportyR")
library(sportyR)
games <- read.csv("../input/nfl-big-data-bowl-2022/games.csv")
scouting <- read.csv("../input/nfl-big-data-bowl-2022/PFFScoutingData.csv")
players <- read.csv("../input/nfl-big-data-bowl-2022/players.csv")
plays <- read.csv("../input/nfl-big-data-bowl-2022/plays.csv")
tracking2018 <- read.csv("../input/nfl-big-data-bowl-2022/tracking2018.csv")
tracking2019 <- read.csv("../input/nfl-big-data-bowl-2022/tracking2019.csv")
tracking2020 <- read.csv("../input/nfl-big-data-bowl-2022/tracking2020.csv")

After filtering the play data to only include punting plays, we began exploratory analysis, which included familiarizing ourselves with variables of interest, such as return length and kick length, and plotting single variable distributions for these and other variables. We moved forward from univariate analysis to explore the relationship between kick length and return length, hypothesizing that longer punts would lead to longer returns. However, the correlation between these two variables was low, only 0.18, and the resulting scatter plot confirmed the weak linear relationship between kick and return length. As a result, we turned to a deeper dive into returns alone, focusing on the paths of each returner using player tracking data. 

In [None]:
punt_plays <- plays %>% filter(specialTeamsPlayType=="Punt")

hist(punt_plays$kickLength, main="Distribution of Kick Lengths", xlab="Kick Length")
hist(punt_plays$kickReturnYardage, main="Distribution of Return Lengths", xlab="Return Length")


punt_plays %>% ggplot(aes(kickLength, kickReturnYardage)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Relationship Between Kick and Return Length", y="Kick Return Yardage", x="Kick Length") +
  annotate("text",x=20, y=35, label="r = 0.183458")
cor(punt_plays$kickReturnYardage, punt_plays$kickLength, use = "complete.obs")
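
As a small illustration of the correlation step, the sketch below uses made-up kick and return lengths (not values from the dataset) to show how `use = "complete.obs"` drops any pair with a missing value before computing Pearson's r, and how the plot label could be built from the computed value rather than typed in by hand. The vector names `kick`, `ret`, and `label` are hypothetical.

```r
# Toy kick and return lengths; NA marks a play with no recorded value
kick <- c(40, 45, 50, NA, 55, 60)
ret  <- c(5, 0, 12, 7, NA, 9)

# Pearson correlation over the pairs where both values are present
r <- cor(kick, ret, use = "complete.obs")

# Build the annotation text from the computed value
label <- paste0("r = ", round(r, 2))
print(label)
```

A label built this way could be passed to `annotate("text", ...)` so the number on the plot always matches the output of `cor()`.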

Our project was inspired by NFL NextGenStats wide receiver route charts. We wanted punt returns to be graphed and compared in a similar way. Our vision was an app that could display all of the returns for a particular player, standardized by the location where the ball was caught. To accomplish this, we had to understand how the data was reported and extract the location data that we needed. Due to the size of the player tracking datasets, we chose to focus only on the 2018 season. In the future, we could expand our app to include other seasons and potentially combine seasons to get a better picture of a player’s returns over time.

In [None]:
punt_plays_players <- left_join(punt_plays, players, by=c("kickerId"="nflId")) %>%
  rename(punter_height=height, punter_weight=weight, punter_college=collegeName, punter_name = displayName)
punt_plays_players$returnerId <- as.numeric(punt_plays_players$returnerId)

punt_tracking <- left_join(punt_plays_players, tracking2018, by=c("gameId", "playId"))

When beginning to look at tracking data, we created a new dataset with only the first punt play so that we could get an understanding of plotting the data without dealing with the size of the full 2018 dataset. We then filtered down to only have the data from the returner. By manually looking through the data, we identified the frames when the ball was caught and when the play ended and then graphed the location data onto a football field using the sportyR package. While this worked, we wanted to find a way to systematically filter the event data to only include instances when the return is occurring. 

In [None]:
# rows 1:2944 hold the tracking frames for the first punt play only
punt_tracking_simple <- punt_tracking[1:2944,]

playernames <- players %>% select(nflId, displayName)

punt_tracking_simple$returnerId <- as.numeric(punt_tracking_simple$returnerId)
punt_tracking_returner <- left_join(punt_tracking_simple, playernames, by=c("returnerId"="nflId")) %>%
  rename(returner_name = displayName.y, tracking_player = displayName.x) %>%
  filter(returner_name==tracking_player)

geom_football(league = "nfl", grass_color="darkgreen") +
  # nrow() gives the number of rows; length() on a data frame would give the number of columns
  geom_point(data=punt_tracking_returner[77:nrow(punt_tracking_returner),], aes(x,y)) +
  labs(title = "First Return: Justin Hardy")

For our next step, we plotted all returns for one player: Darren Sproles. We were able to implement an algorithm to filter the data down to instances that occurred between the punt being received and an event that would end the return such as a tackle, fumble, touchback, lateral, safety, fair catch, touchdown, handoff, or out of bounds. We then separated Sproles’ returns by the direction that he was moving, left to right or right to left, in order to understand whether his returns were positive or negative. We plotted the two directions separately to create a map of Sproles’ total returns from left to right and right to left. 

In [None]:
#punt tracking data for specific returner
punt_tracking_by_returner_sproles <- left_join(punt_tracking, playernames, by=c("returnerId"="nflId")) %>%
  rename(returner_name = displayName.y, tracking_player = displayName.x) %>%
  filter(returner_name==tracking_player) %>%
  filter(returner_name=="Darren Sproles")

#only from catch to end of play
punt_tracking_by_returner2_sproles <- punt_tracking_by_returner_sproles %>%
  group_by(gameId, playId) %>%
  filter(any(event=="punt_received")) %>%
  mutate(counter = ifelse(event=="punt_received", 1,
                          ifelse(event %in% c("tackle","fumble_defense_recovered","touchback",
                                              "fumble","lateral","safety","fair_catch",
                                              "fumble_offense_recovered","punt_muffed","touchdown",
                                              "handoff","out_of_bounds"), -1, 0)),
         counter_sum = cumsum(counter)) %>%
  filter(counter_sum == 1)

# only going in one direction: punt to the right, return to the left
punt_tracking_by_returner_right2left_sproles <- punt_tracking_by_returner2_sproles %>%
  filter(playDirection == "right")

# only going in one direction: punt to the left, return to the right
punt_tracking_by_returner_left2right_sproles <- punt_tracking_by_returner2_sproles %>%
  filter(playDirection == "left")

first_sproles_return <- punt_tracking_by_returner2_sproles %>%
  filter(gameId==2018090600, playId==1989)

geom_football(league = "nfl") +
  geom_point(data=first_sproles_return, aes(x,y)) +
    labs(title="First Sproles Return")

#plotting return path on football field
geom_football(league = "nfl", grass_color="darkgreen") +
  geom_point(data=punt_tracking_by_returner_left2right_sproles, aes(x,y))+
  labs(title = "Left to Right")

geom_football(league = "nfl", grass_color="darkgreen") +
  geom_point(data=punt_tracking_by_returner_right2left_sproles, aes(x,y)) +
  labs(title = "Right to Left")
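
To make the catch-to-end-of-play filter easier to follow, here is a minimal sketch on a single made-up play. The frame values and the shortened `end_events` list are assumptions for illustration only; the idea is the same as above: mark the catch with +1, mark any play-ending event with -1, take a running sum, and keep the frames where the sum equals 1.

```r
library(dplyr)

# Toy tracking frames for one hypothetical play (values are made up)
frames <- tibble(
  gameId  = 1,
  playId  = 1,
  frameId = 1:7,
  event   = c("None", "punt", "punt_received", "None", "None", "tackle", "None"),
  x       = c(30, 31, 75, 72, 68, 65, 65)
)

# Shortened list of play-ending events for the example
end_events <- c("tackle", "out_of_bounds", "touchdown")

return_frames <- frames %>%
  group_by(gameId, playId) %>%
  mutate(
    counter = case_when(
      event == "punt_received" ~ 1,   # the return window opens
      event %in% end_events    ~ -1,  # the return window closes
      TRUE                     ~ 0
    ),
    counter_sum = cumsum(counter)
  ) %>%
  filter(counter_sum == 1) %>%        # keep frames inside the window
  ungroup()

print(return_frames$frameId)
```

The running sum is 1 from the catch frame up to, but not including, the frame carrying the ending event, so the catch frame is kept and the tackle frame is dropped.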

Finally, we standardized all player returns so they could be plotted on the same graph, moving in the same direction and starting from the same yard line. Subtracting the x coordinate at the time of the catch from each subsequent x coordinate shifted every play to a common start line, but the plays still ran in opposite directions, so we had to combine them. To do this, we multiplied every standardized x coordinate in the right-to-left dataset by negative one so that those returns would run the same way as the left-to-right dataset. We originally tried using the absolute value of the difference between the initial x and the current x, but this did not work because it made all plays look positive even if the return had actually been negative.

In [None]:
punt_tracking_standardized_by_returner_right2left_sproles <- punt_tracking_by_returner_right2left_sproles %>%
  group_by(gameId, playId) %>%
  mutate(x2 = x-first(x))

punt_tracking_standardized_by_returner_left2right_sproles <- punt_tracking_by_returner_left2right_sproles %>%
  group_by(gameId, playId) %>%
  mutate(x2 = x-first(x))

geom_football(league = "nfl", grass_color="darkgreen") +
  geom_point(data=punt_tracking_standardized_by_returner_right2left_sproles, aes(-x2,y)) +
  labs(title = "Right to Left Standard")

geom_football(league = "nfl", grass_color="darkgreen") +
  geom_point(data=punt_tracking_standardized_by_returner_left2right_sproles, aes(x2,y)) +
  labs(title = "Left to Right Standard")

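# first attempt: absolute displacement from the catch point
# (this hides the sign, so negative returns look positive)
punt_tracking_standardized_by_returner_sproles <- punt_tracking_by_returner2_sproles %>%
  group_by(gameId, playId) %>%
  mutate(x2 = abs(x-first(x)))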

ggplot(data=punt_tracking_standardized_by_returner_sproles, aes(x2,y)) +
  geom_path(data=punt_tracking_standardized_by_returner_sproles, aes(x2,y,group = playId)) +
  labs(title = "Standard General") +
  geom_vline(aes(xintercept=0))

geom_football(league = "nfl", grass_color="darkgreen") +
  geom_path(data=punt_tracking_standardized_by_returner_sproles, aes(x2+10,y,group = playId)) +
  labs(title = "Standard General")
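
The difference between the signed and absolute-value approaches can be seen on a few made-up paths. In this sketch the column names `signed_gain` and `abs_gain` are hypothetical, and the x values are invented; `playDirection == "right"` is treated as a return heading left, matching the convention used above.

```r
library(dplyr)

# Three toy returns, three frames each (values are made up)
paths <- tibble(
  playId        = rep(c(1, 2, 3), each = 3),
  playDirection = rep(c("left", "right", "right"), each = 3),
  x = c(20, 25, 32,   # return heading right, gains 12 yards
        90, 84, 80,   # return heading left, gains 10 yards
        90, 92, 93)   # return heading left, pushed back 3 yards
)

net_yards <- paths %>%
  group_by(playId) %>%
  mutate(
    x2          = x - first(x),                            # displacement from the catch
    signed_gain = ifelse(playDirection == "right", -x2, x2), # flip returns that head left
    abs_gain    = abs(x2)                                  # loses the sign
  ) %>%
  summarise(net_signed = last(signed_gain), net_abs = last(abs_gain))

print(net_yards)
```

For the third play the signed version reports a loss while the absolute version reports a gain of the same size, which is the problem described above.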

We expanded this approach to all returns in the dataset and plotted them on a simple line graph using the same system as above. This allowed us to view the distribution of the returns and notice that there were some net negative returns, but none that exceeded a loss of 10 yards. Net positive returns were more common and maxed out just under 90 yards. Knowing this, we standardized the returns about the 10-yard line in our final app, making sure that every return was neatly plotted on the football field.

In [None]:
punt_tracking_by_returner <- left_join(punt_tracking, playernames, by=c("returnerId"="nflId")) %>%
    rename(returner_name = displayName.y, tracking_player = displayName.x) %>%
    filter(returner_name==tracking_player)

punt_tracking_by_returner2 <- punt_tracking_by_returner %>%
    group_by(gameId, playId) %>%
    filter(any(event=="punt_received")) %>%
    mutate(counter = ifelse(event=="punt_received", 1, 
                            ifelse(event %in% c("tackle","fumble_defense_recovered","touchback",
                                                "fumble","lateral","safety","fair_catch",
                                                "fumble_offense_recovered","punt_muffed","touchdown",
                                                "handoff","out_of_bounds"), -1, 0)), 
           counter_sum = cumsum(counter)) %>%
    filter(counter_sum == 1)

punt_tracking_by_returner_right2left <- punt_tracking_by_returner2 %>%
  filter(playDirection == "right")

punt_tracking_by_returner_left2right <- punt_tracking_by_returner2 %>%
  filter(playDirection == "left")

punt_tracking_standardized_by_returner_right2left <- punt_tracking_by_returner_right2left %>%
  group_by(gameId, playId) %>%
  mutate(x2 = x-first(x))

punt_tracking_standardized_by_returner_left2right <- punt_tracking_by_returner_left2right %>%
  group_by(gameId, playId) %>%
  mutate(x2 = x-first(x))

ggplot() +
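  # playId repeats across games, so group paths by the (gameId, playId) pair
  geom_path(data=punt_tracking_standardized_by_returner_left2right, aes(x2,y,group = interaction(gameId, playId))) +
  geom_path(data=punt_tracking_standardized_by_returner_right2left, aes(-x2,y,group = interaction(gameId, playId))) +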
  labs(title = "Both Standardized")
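
When many plays are drawn at once, it helps to remember that `playId` is only unique within a game in this dataset, so two different returns can share a `playId`. The sketch below uses made-up rows to count how many separate paths each grouping choice would produce; the variable names are hypothetical.

```r
library(dplyr)

# Two toy returns from different games that happen to share a playId
toy <- tibble(
  gameId = c(1, 1, 2, 2),
  playId = c(100, 100, 100, 100),
  x2     = c(0, 8, 0, -3),
  y      = c(20, 22, 30, 31)
)

# Number of paths when grouping by playId alone
n_by_play <- n_distinct(toy$playId)

# Number of paths when grouping by the (gameId, playId) pair
n_by_game_play <- nrow(distinct(toy, gameId, playId))

print(c(n_by_play, n_by_game_play))
```

In a ggplot2 layer, one way to group by the pair is `aes(group = interaction(gameId, playId))`, which keeps returns from different games from being joined into a single line.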

The app displays the standardized returns for each player, assuming all punts were received at the 10-yard line. A dropdown menu allows the user to pick any returner from the data. Once a player is selected, the graph responds dynamically and plots their returns. Negative returns are shown in red and positive returns are shown in green to enhance the viewer's experience. We also created a slider that allows the user to pick a range of net return yardage. The graph then dynamically filters the returns by this range.

The output allows users to compare different returns by individual players and see which types of returns are more favorable. For example, does running up the middle or down the sideline generate more yardage for Tyreek Hill? Additionally, one can compare the return lengths between players and make a decision about who is the better punt returner. Overall trends can be identified through this process of comparison as well, making it a useful tool for teams looking to optimize punt returns. Finally, this app, or a version of it, could be shown to the TV audience during games to highlight the prowess of certain returners and compare returners from competing teams.

Note: The Shiny app may not run in the Kaggle notebook, but copying and pasting the code below (written in markdown) into a Shiny app file and running it locally will produce the desired app. In that case, the data must be downloaded locally, and the intermediate datasets that feed the Shiny app must be created first.

In [None]:
punt_tracking_standardized_by_returner <- punt_tracking_by_returner2 %>%
    group_by(gameId, playId) %>%
    mutate(x2 = abs(x-first(x)))

data = punt_tracking_standardized_by_returner

returner_names <- data %>% 
    group_by(returner_name) %>%
    filter(row_number()==1) %>%
    pull(returner_name) %>%
    sort()

data1 = punt_tracking_standardized_by_returner_right2left
data2 = punt_tracking_standardized_by_returner_left2right

library(shiny)
server <- function(input, output, session) {
    
    updateSelectizeInput(session, "returner", choices = returner_names, server = TRUE)
    data1_filtered <- reactive({
        data1 %>% filter(returner_name %in% input$returner) %>%
            group_by(gameId, playId, returner_name) %>% 
            mutate(col = ifelse(last(x2)>0, "red", "green")) %>%
            filter(-last(x2) > input$return_yards[1], -last(x2) < input$return_yards[2])
            
    })
    data2_filtered <- reactive({
        data2 %>% filter(returner_name %in% input$returner) %>%
            group_by(gameId, playId, returner_name) %>% 
            mutate(col = ifelse(last(x2)<0, "red", "green")) %>%
            filter(last(x2) > input$return_yards[1], last(x2) < input$return_yards[2]) 
            
    })
    
    output$scatterPlot <- renderPlot({
        geom_football(league = "nfl", grass_color="darkgreen") +
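            # playId repeats across games, so group paths by the (gameId, playId) pair
            geom_path(data=data1_filtered(), aes(-x2+20,y,group = interaction(gameId, playId), color=col)) +
            geom_path(data=data2_filtered(), aes(x2+20,y,group = interaction(gameId, playId), color=col)) +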
            scale_color_identity()
    })
}


ui <- fluidPage(
    
    titlePanel("Standardized Punt Returns for Selected Player: 2018 Season"),
    
    sidebarLayout(
        sidebarPanel(
            selectizeInput("returner", "Returner", multiple = FALSE, choices = NULL),
            sliderInput("return_yards",
                        "Range of Net Return Yards:",
                        min = -100,
                        max = 100,
                        value = c(-30,30))
        ),
        
        mainPanel(
            plotOutput("scatterPlot")
        )
    )
)
 
shinyApp(ui = ui, server = server)
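
The filtering that happens inside the two reactive expressions can be tried outside of Shiny. The sketch below is a simplified stand-in, not the app's code: `filter_returns()` is a hypothetical helper, and it assumes a single `gain` column that is already signed so that positive means yards gained (the app instead keeps the two directions in separate datasets and flips the sign for one of them).

```r
library(dplyr)

# Hypothetical helper: keep one returner's plays whose net gain lies
# strictly inside yard_range, and colour each play by its sign
filter_returns <- function(data, returner, yard_range) {
  data %>%
    filter(returner_name == returner) %>%
    group_by(gameId, playId) %>%
    mutate(
      net = last(gain),                       # net yards at the final frame
      col = ifelse(net < 0, "red", "green")   # red for a loss, green otherwise
    ) %>%
    filter(net > yard_range[1], net < yard_range[2]) %>%
    ungroup()
}

# Toy data: one returner, two plays, two frames each (values are made up)
toy <- tibble(
  returner_name = "Player A",
  gameId        = 1,
  playId        = rep(c(1, 2), each = 2),
  gain          = c(0, 12, 0, -4)
)

# Mimic a slider set to (-10, 10): only the 4-yard loss survives
kept <- filter_returns(toy, "Player A", c(-10, 10))
print(kept)
```

Testing the filter on a small table like this makes it easier to check the colour and range logic before wiring it to `input$returner` and `input$return_yards`.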
