# Routes Over Expected

*Metric Track*

*Matt Flaherty*



# Introduction

Every pass play combines several routes. If the defense can anticipate each player's route, it can adjust its coverage, and defenders can choose which way to force the route runner. Both adjustments aim to minimize the yards gained by the offense. The defense has less than 40 seconds to do this while players substitute and the offense shifts and motions, which makes it difficult for a defender to determine which route each receiver might run. Predicting the route for each route runner lets defenders make decisions on the fly and increases their chance of generating an incompletion or an interception. Defenses should pursue these outcomes because the data suggests that they reduce offensive efficiency. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/EPADist.png" style="width: 600px"/></center>

It would also benefit defenses to know when deeper routes are being run because, on average, deeper routes add more expected points for the offense than shorter routes. The figure below shows a moderate positive relationship between the average depth of target and EPA/Pass. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/RouteEPA.png" style="width: 600px"/></center>

This research provides a model that predicts the route for each route runner on a play. A Routes Over Expected (ROE) metric will be created from this model, which will allow defenses to pick up on opponent tendencies. Each route can have its own ROE metric; however, for this research, a Go Routes Over Expected (GROE) metric will be created as an example. Additionally, offenses can use these predictions for self-scouting to determine how predictable they are. This paper walks through the process of building the model as well as what may be gleaned from the results.

# Data

This research uses tracking, play-by-play, route runner, and game data from weeks 1-9 of the 2022 season. The data was filtered to remove post-snap information, run plays, and non-route runners. Wildcat and jumbo formations as well as 1x0, 3x0, and 3x3 receiver alignments were also removed because of their small sample sizes.
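
The filtering rules above can be sketched as a single row-level predicate. This is only an illustration: the field names (`is_pass_play`, `is_route_runner`, `after_snap`, `formation`, `alignment`) and the label spellings are hypothetical stand-ins, not the column names used in the project code.

```python
# Formations and alignments dropped for small sample size (from the text).
# The exact label spellings are assumptions.
EXCLUDED_FORMATIONS = {"WILDCAT", "JUMBO"}
EXCLUDED_ALIGNMENTS = {"1x0", "3x0", "3x3"}


def keep_row(row):
    """Return True if a tracking row survives every filter described above."""
    return (
        row["is_pass_play"]
        and row["is_route_runner"]
        and not row["after_snap"]
        and row["formation"] not in EXCLUDED_FORMATIONS
        and row["alignment"] not in EXCLUDED_ALIGNMENTS
    )


# Hypothetical rows: only the first one passes all filters.
rows = [
    {"is_pass_play": True, "is_route_runner": True, "after_snap": False,
     "formation": "SHOTGUN", "alignment": "2x2"},
    {"is_pass_play": True, "is_route_runner": True, "after_snap": False,
     "formation": "WILDCAT", "alignment": "2x1"},
    {"is_pass_play": False, "is_route_runner": True, "after_snap": False,
     "formation": "SHOTGUN", "alignment": "3x1"},
    {"is_pass_play": True, "is_route_runner": True, "after_snap": True,
     "formation": "SHOTGUN", "alignment": "2x2"},
]

filtered = [row for row in rows if keep_row(row)]
```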

## Feature Engineering

Engineered features were added to the dataset to improve the performance of the model. The offensive formation, receiver alignment, and route runner's team were one-hot encoded to add context about tendencies. For example, shotgun formations are more likely to lead to GO routes while under center formations are more likely to lead to FLAT and CROSS routes. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/RouteRateByFormation.png" style="width: 1000px"/></center>
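
As a minimal sketch of one-hot encoding, assuming each route runner row carries a categorical value such as the offensive formation: the `one_hot` helper and the category list below are illustrative and are not taken from the project code.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical category list; the real dataset may use other labels.
FORMATIONS = ["SHOTGUN", "SINGLEBACK", "I_FORM", "PISTOL", "EMPTY"]

rows = [
    {"player": "A", "formation": "SHOTGUN"},
    {"player": "B", "formation": "I_FORM"},
]

# Each formation becomes a vector of indicator columns, one per category.
encoded = [one_hot(row["formation"], FORMATIONS) for row in rows]
```

The same helper would apply to receiver alignment and team, with each categorical column expanding into one indicator column per category.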

Additionally, the distance from the route runner to each of the other 10 offensive players was added to give context about the setup of the formation. The example below suggests a positive relationship between predictability and distance to the closest offensive player. As players line up farther from the closest offensive player, they are more likely to run either a HITCH or GO route, while players lining up near another offensive player have a more diverse route tree. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/RouteRateByDistToNearestOff.png" style="width: 1000px"/></center>
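
One way to compute these distances from the pre-snap coordinates is sketched below. The function name, the player ids, and the choice to sort the distances from nearest to farthest are assumptions made for illustration; the project code may order or store them differently.

```python
import math


def distances_to_teammates(runner_id, positions):
    """Distances (yards) from one player to every other offensive player.

    `positions` maps a player id to an (x, y) field coordinate. The result
    is sorted so index 0 is the nearest teammate (an assumed convention).
    """
    rx, ry = positions[runner_id]
    return sorted(
        math.hypot(x - rx, y - ry)
        for player_id, (x, y) in positions.items()
        if player_id != runner_id
    )


# Hypothetical four-player snapshot; a real play has 11 offensive players.
positions = {
    "WR1": (30.0, 10.0),
    "TE1": (30.0, 22.0),
    "QB": (25.0, 26.0),
    "RB": (24.0, 28.0),
}

dists = distances_to_teammates("WR1", positions)
nearest = dists[0]  # distance to the closest offensive player
```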

Another feature added to the dataset was the route runner's distance to the line of scrimmage. There is evidence that route runners lined up closer to the line of scrimmage run more GO and HITCH routes while players lined up in the backfield run more FLAT and ANGLE routes. This is likely because route runners closer to the line of scrimmage are wide receivers while those farther from the line of scrimmage are running backs. The player's distance to the line of scrimmage is used rather than the player's position on the depth chart in case a wide receiver lines up in the backfield or a running back motions out wide. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/RouteRateByYdsToLOS.png" style="width: 1000px"/></center>

# Model

Two multilayer perceptron neural networks were trained to generate the route predictions: one for shotgun formations and one for under center formations. A single model was trained initially, but splitting the dataset by formation type led to better F1 scores on unseen data. The best performing models had 86 input features, 30 nodes in the first hidden layer, 20 nodes in the second hidden layer, and 13 nodes in the output layer, one for each possible route. The 86 input features include the route runner's coordinates, direction, and speed as well as features for the distance to each offensive player, quarter, seconds remaining in the half, down, distance, pre-snap win probability of the possession team, distance to the line of scrimmage, formation, receiver alignment, and team of the route runner. The training set consisted of weeks 1-7, and the test set consisted of weeks 8 and 9. A baseline that predicts from the route rates alone produced an F1 score of 0.1148. The under center model had an F1 score of 0.1531 and the shotgun model had 0.1261; therefore, the models produce more accurate predictions than the route rates alone.
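
To make the layer sizes concrete, the sketch below runs a forward pass through an untrained 86-30-20-13 network in plain Python. Only the layer sizes come from the text. The ReLU hidden activations, the softmax output, and the random weight initialization are assumptions, and the weights here are not the trained ones.

```python
import math
import random

# Input, hidden 1, hidden 2, output sizes as stated in the text.
LAYER_SIZES = [86, 30, 20, 13]


def init_layers(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def softmax(values):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, features):
    """Forward pass: ReLU on hidden layers, softmax on the output (assumed)."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        outputs = [sum(w * a for w, a in zip(row, activations)) + b
                   for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        activations = softmax(outputs) if is_last else [max(0.0, o) for o in outputs]
    return activations


layers = init_layers(LAYER_SIZES)
route_probs = forward(layers, [0.5] * 86)  # one probability per route

# Weights plus biases across the three layers.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

With these sizes, each network has 3,503 trainable parameters (2,610 + 620 + 273), which is small relative to nine weeks of tracking data.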

# Results

## Feature Importance

Feature importance scores were created for each model by shuffling the values of one feature, generating new predictions, calculating the difference between the new score and the original score, and resetting the dataset to the original data. This process was repeated ten times for each feature, and the average difference was taken as the feature's importance. The average differences were sorted in descending order to find the most important features. The following figure shows the 10 most important features for both models. Distance to the line of scrimmage is the most important feature because players lined up in the backfield were more likely to run FLAT routes while players lined up on the line of scrimmage were more likely to run GO and HITCH routes. Additionally, players lined up close to the 3 nearest offensive players had a diverse route tree while those farther away primarily ran GO and HITCH routes. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/featureImportance.png" style="width: 1200px"/></center>
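
The shuffling procedure described above is commonly called permutation importance, and a minimal sketch follows. The toy `predict` rule, the accuracy score (used here in place of F1 for brevity), and the sign convention of original score minus shuffled score are assumptions; the text does not specify them.

```python
import random


def permutation_importance(predict, score, X, y, n_repeats=10, seed=0):
    """Average drop in score when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Build a shuffled copy so the original data is never modified.
            X_perm = [row[:col] + [value] + row[col + 1:]
                      for row, value in zip(X, shuffled)]
            permuted = score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(actual, predicted):
    """Share of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def predict(row):
    """Toy model: only feature 0 (yards to line of scrimmage) matters."""
    return "GO" if row[0] < 1.0 else "FLAT"


# Hypothetical data: column 0 drives the label, column 1 is noise.
X = [[0.0, 5.0], [0.5, 1.0], [0.2, 3.0], [0.8, 2.0],
     [4.0, 5.0], [5.0, 1.0], [6.0, 3.0], [7.0, 2.0]]
y = ["GO"] * 4 + ["FLAT"] * 4

importances = permutation_importance(predict, accuracy, X, y)
ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
```

Shuffling the noise column leaves the score unchanged, so its importance is exactly zero, while shuffling the informative column lowers the score.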

## Go Routes Over Expected

From the model, a ROE metric can be created to identify player tendencies. Defenses can use this metric to understand what to expect from the offense's route runners. While each route can have a ROE, Go Routes Over Expected (GROE) will be used as an example. The table below shows the top 5 route runners by GROE. These route runners are the most likely to run GO routes even when the average route runner might run another route; therefore, it might behoove the defender to play deeper against these players. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/goRouteOverExpected.png" style="width: 600px"/></center>
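
The text does not give a formula for ROE, so the sketch below assumes a common "over expected" construction: for each route run, take 1 if the player actually ran the route and 0 otherwise, subtract the model's predicted probability of that route, and sum per player. The function name, field names, and sample values are hypothetical.

```python
from collections import defaultdict


def routes_over_expected(plays, route="GO"):
    """Sum of (actual indicator - predicted probability) per player.

    This is an assumed definition; the paper's exact formula may differ.
    """
    totals = defaultdict(float)
    for play in plays:
        actual = 1.0 if play["route"] == route else 0.0
        expected = play["probs"].get(route, 0.0)
        totals[play["player"]] += actual - expected
    return dict(totals)


# Hypothetical predictions for three route runs.
plays = [
    {"player": "WR A", "route": "GO", "probs": {"GO": 0.25, "HITCH": 0.75}},
    {"player": "WR A", "route": "GO", "probs": {"GO": 0.5, "HITCH": 0.5}},
    {"player": "WR B", "route": "HITCH", "probs": {"GO": 0.5, "HITCH": 0.5}},
]

groe = routes_over_expected(plays)
```

Under this definition a positive value means the player ran more GO routes than the model expected from the pre-snap picture, and a negative value means fewer. Dividing by the number of routes run would give a per-route rate instead of a total.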

## Team/Player Predictability

The model's predictions show a moderate negative relationship between offensive predictability and offensive efficiency. Therefore, teams could use this model to determine whether they are being predictable and improve their efficiency by reducing their predictability.
<center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/teamPredictabilityVsEPAPass.png" style="width: 600px"/></center>

Furthermore, players who want to be more efficient should be willing to diversify their route tree and their location at the time of the snap because there is evidence of a negative relationship between player predictability and EPA/Reception. Additionally, defenses could use features from this model to help their defensive backs determine which route their man is going to run. With a better understanding of the route that is about to be run, the defensive backs can minimize the effectiveness of their man's route.
<center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/playerPredictabilityVsEPAPass.png" style="width: 600px"/></center>

## Play Example

The following example illustrates how the model works throughout the pre-snap process. The table to the right of the animation shows the predicted route for each player in each frame along with the probability that the player runs that route. The example shows how important the player's distance from the line of scrimmage is, as both players in the backfield are predicted to run FLAT routes. Both receivers are also the lone receiver on their side of the formation, which leads the model to predict GO routes for them. <center><img src="https://raw.githubusercontent.com/mattflaherty97/BDB2025/refs/heads/main/visuals/samplePlay.gif" style="width: 1200px"/></center>


# Conclusions

Teams can use this analysis in preparation for the upcoming week both offensively and defensively. Offensively, teams could review how predictable they have been through the current week and look to change their patterns. This research has shown that there are benefits to being unpredictable with routes. Defensively, teams could use important features in this model, like the route runner's distance to the line of scrimmage and distance to the other receivers, to find tendencies in the receiver alignment and formation of their upcoming opponent. This research gives evidence that the more predictable a defense can make an offense, the less efficient that offense will be.

# Appendix

[Code](https://github.com/mattflaherty97/BDB2025/tree/main)