## Relevance of Tackles to Game Wins

### Overview

The goal of this Bayesian classifier is to explore the relevance of tackles to game wins.

The process is as follows:
1. The datasets `games.csv`, `plays.csv`, and `tackles.csv` will be joined, and data cleaning will be performed to get the winning team and the number of successful tackles performed by each team.
2. The number of successful tackles will be bucketed into different categories (High, Average, Low) for use in the classifier.
3. The classifier will be constructed.

In [1]:
# Importing packages for analysis
import pandas as pd
import numpy as np

# Specific to Bayes classifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import (
    accuracy_score,
    f1_score
)

In [2]:
# Importing data provided by NFL Kaggle dataset
tackles = pd.read_csv("../data/tackles.csv")
plays = pd.read_csv("../data/plays.csv")
games = pd.read_csv("../data/games.csv")
players = pd.read_csv("../data/players.csv")

# Display
print("\nTackles")
display(tackles.head())
print("\nPlays")
display(plays.head())
print("\nGames")
display(games.head())
# print("\nPlayers")
# display(players.head())


Tackles


Unnamed: 0,gameId,playId,nflId,tackle,assist,forcedFumble,pff_missedTackle
0,2022090800,101,42816,1,0,0,0
1,2022090800,393,46232,1,0,0,0
2,2022090800,486,40166,1,0,0,0
3,2022090800,646,47939,1,0,0,0
4,2022090800,818,40107,1,0,0,0



Plays


Unnamed: 0,gameId,playId,ballCarrierId,ballCarrierDisplayName,playDescription,quarter,down,yardsToGo,possessionTeam,defensiveTeam,...,preSnapHomeTeamWinProbability,preSnapVisitorTeamWinProbability,homeTeamWinProbabilityAdded,visitorTeamWinProbilityAdded,expectedPoints,expectedPointsAdded,foulName1,foulName2,foulNFLId1,foulNFLId2
0,2022100908,3537,48723,Parker Hesse,(7:52) (Shotgun) M.Mariota pass short middle t...,4,1,10,ATL,TB,...,0.976785,0.023215,-0.00611,0.00611,2.360609,0.981955,,,,
1,2022091103,3126,52457,Chase Claypool,(7:38) (Shotgun) C.Claypool right end to PIT 3...,4,1,10,PIT,CIN,...,0.160485,0.839515,-0.010865,0.010865,1.733344,-0.263424,,,,
2,2022091111,1148,42547,Darren Waller,(8:57) D.Carr pass short middle to D.Waller to...,2,2,5,LV,LAC,...,0.756661,0.243339,-0.037409,0.037409,1.312855,1.133666,,,,
3,2022100212,2007,46461,Mike Boone,(13:12) M.Boone left tackle to DEN 44 for 7 ya...,3,2,10,DEN,LV,...,0.620552,0.379448,-0.002451,0.002451,1.641006,-0.04358,,,,
4,2022091900,1372,47857,Devin Singletary,(8:33) D.Singletary right guard to TEN 32 for ...,2,1,10,BUF,TEN,...,0.83629,0.16371,0.001053,-0.001053,3.686428,-0.167903,,,,



Games


Unnamed: 0,gameId,season,week,gameDate,gameTimeEastern,homeTeamAbbr,visitorTeamAbbr,homeFinalScore,visitorFinalScore
0,2022090800,2022,1,09/08/2022,20:20:00,LA,BUF,10,31
1,2022091100,2022,1,09/11/2022,13:00:00,ATL,NO,26,27
2,2022091101,2022,1,09/11/2022,13:00:00,CAR,CLE,24,26
3,2022091102,2022,1,09/11/2022,13:00:00,CHI,SF,19,10
4,2022091103,2022,1,09/11/2022,13:00:00,CIN,PIT,20,23


In [5]:
# Data cleaning

################################
# PART 1 - Getting Tackle Counts
################################

# Merging tables
team_tackles = pd.merge(left=tackles, right=plays,
                        how='left',
                        left_on=['gameId', 'playId'],
                        right_on=['gameId', 'playId'])

# Subsetting for relevant columns
team_tackles = team_tackles[['gameId', 'tackle', 'assist', 'possessionTeam', 'defensiveTeam']]

# Filtering for successful tackles only
successful_tackles = team_tackles[(team_tackles['tackle'] == 1) |
                                  ((team_tackles['tackle'] == 0) & (team_tackles['assist'] == 1))]

# Filtering for unsuccessful tackles only
unsuccessful_tackles = team_tackles[(team_tackles['tackle'] == 0) |
                                    ((team_tackles['tackle'] == 0) & (team_tackles['assist'] == 0))]

# Getting counts of successful tackles by game and team
successful_tackle_count = successful_tackles.value_counts(['gameId', 'defensiveTeam'])
successful_tackle_count = successful_tackle_count.to_frame().reset_index()
successful_tackle_count = successful_tackle_count.rename(columns={0: 'numTackles'})

# Getting counts of unsuccessful tackles by game and team
unsuccessful_tackle_count = unsuccessful_tackles.value_counts(['gameId', 'defensiveTeam'])
unsuccessful_tackle_count = unsuccessful_tackle_count.to_frame().reset_index()
unsuccessful_tackle_count = unsuccessful_tackle_count.rename(columns={0: 'numTackles'})

###############################
# PART 2 - Getting Game Victors
###############################

# Get winning team
games['winningTeam'] = np.where((games['homeFinalScore'] > games['visitorFinalScore']),
                                games['homeTeamAbbr'], games['visitorTeamAbbr'])

# Subsetting for relevant columns
game_victors = games[['gameId', 'winningTeam']]

# Merging
successful_tackle_count = successful_tackle_count.merge(game_victors,
                                                        left_on='gameId', right_on='gameId')
unsuccessful_tackle_count = unsuccessful_tackle_count.merge(game_victors,
                                                            left_on='gameId', right_on='gameId')

# Getting boolean column for winning team
successful_tackle_count['winBool'] = np.where((successful_tackle_count['defensiveTeam'] ==
                                               successful_tackle_count['winningTeam']), 1, 0)
unsuccessful_tackle_count['winBool'] = np.where((successful_tackle_count['defensiveTeam'] ==
                                                 successful_tackle_count['winningTeam']), 1, 0)

####################
# PART 3 - Bucketing
####################

# Bucketing 'numTackles' into categories
num_tackles_sorting_conditions = [(pd.to_numeric(successful_tackle_count['numTackles']) > 61),
                                  (pd.to_numeric(successful_tackle_count['numTackles']) <= 61) & (pd.to_numeric(successful_tackle_count['numTackles']) >= 45),
                                  (pd.to_numeric(successful_tackle_count['numTackles']) < 45)]
num_tackles_sorting_values = ['High', 'Average', 'Low']
successful_tackle_count['numTacklesCategory'] = np.select(num_tackles_sorting_conditions, num_tackles_sorting_values,
                                                  default="")
unsuccessful_tackle_count['numTacklesCategory'] = np.select(num_tackles_sorting_conditions, num_tackles_sorting_values,
                                                            default="")

display(successful_tackle_count)
display(unsuccessful_tackle_count)

Unnamed: 0,gameId,defensiveTeam,numTackles,winningTeam,winBool,numTacklesCategory
0,2022091105,HOU,86,IND,0,High
1,2022091105,IND,53,IND,1,Average
2,2022100209,JAX,82,PHI,0,High
3,2022100209,PHI,36,PHI,1,Low
4,2022102306,GB,81,WAS,0,High
...,...,...,...,...,...,...
266,2022092500,CAR,48,CAR,1,Average
267,2022091900,TEN,50,BUF,0,Average
268,2022091900,BUF,48,BUF,1,Average
269,2022100206,LAC,48,LAC,1,Average


Unnamed: 0,gameId,defensiveTeam,numTackles,winningTeam,winBool,numTacklesCategory
0,2022091101,CAR,54,CLE,0,High
1,2022091101,CLE,21,CLE,1,Average
2,2022102306,GB,52,WAS,0,High
3,2022102306,WAS,35,WAS,1,Low
4,2022100209,JAX,51,PHI,0,High
...,...,...,...,...,...,...
266,2022102309,SEA,18,SEA,1,Average
267,2022103005,ARI,19,MIN,0,Average
268,2022103005,MIN,15,MIN,1,Average
269,2022091803,IND,16,JAX,1,Average


In [6]:
final_subset_success = successful_tackle_count[['winBool', 'numTacklesCategory']]
final_subset_success = pd.get_dummies(final_subset_success, columns=['numTacklesCategory'])

# Bayes Classifier

x = final_subset_success.drop('winBool', axis=1)
y = final_subset_success['winBool']

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=125
)

bayes_model = BernoulliNB()
bayes_model.fit(x_train, y_train)

y_pred_bayes = bayes_model.predict(x_test)

accuracy_bayes = accuracy_score(y_pred_bayes, y_test)
f1_bayes = f1_score(y_pred_bayes, y_test, average="weighted")

print("Accuracy:", accuracy_bayes)
print("F1 Score:", f1_bayes)

# Probability of a victory given a high number of successful tackles
sample_pred_high = {'numTacklesCategory_Average': [0],
                    'numTacklesCategory_High': [1],
                    'numTacklesCategory_Low': [0]}
sample_pred_df_high = pd.DataFrame(data=sample_pred_high)
print("\nSample Prediction (High):", bayes_model.predict(sample_pred_df_high))

# Probability of a victory given an average number of successful tackles
sample_pred_average = {'numTacklesCategory_Average': [1],
                       'numTacklesCategory_High': [0],
                       'numTacklesCategory_Low': [0]}
sample_pred_df_average = pd.DataFrame(data=sample_pred_average)
print("Sample Prediction (Average):", bayes_model.predict(sample_pred_df_average))

# Probability of a victory given a low number of successful tackles
sample_pred_low = {'numTacklesCategory_Average': [0],
                   'numTacklesCategory_High': [0],
                   'numTacklesCategory_Low': [1]}
sample_pred_df_low = pd.DataFrame(data=sample_pred_low)
print("Sample Prediction (Low):", bayes_model.predict(sample_pred_df_low))

Accuracy: 0.6
F1 Score: 0.6178628389154706

Sample Prediction (High): [0]
Sample Prediction (Average): [1]
Sample Prediction (Low): [1]


In [8]:
final_subset_failure = unsuccessful_tackle_count[['winBool', 'numTacklesCategory']]
final_subset_failure = pd.get_dummies(final_subset_failure, columns=['numTacklesCategory'])

# Bayes Classifier

x = final_subset_failure.drop('winBool', axis=1)
y = final_subset_failure['winBool']

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=125
)

bayes_model = BernoulliNB()
bayes_model.fit(x_train, y_train)

y_pred_bayes = bayes_model.predict(x_test)

accuracy_bayes = accuracy_score(y_pred_bayes, y_test)
f1_bayes = f1_score(y_pred_bayes, y_test, average="weighted")

print("Accuracy:", accuracy_bayes)
print("F1 Score:", f1_bayes)

# Probability of a victory given a high number of missed tackles
sample_pred_high = {'numTacklesCategory_Average': [0],
                    'numTacklesCategory_High': [1],
                    'numTacklesCategory_Low': [0]}
sample_pred_df_high = pd.DataFrame(data=sample_pred_high)
print("\nSample Prediction (High):", bayes_model.predict(sample_pred_df_high))

# Probability of a victory given an average number of missed tackles
sample_pred_average = {'numTacklesCategory_Average': [1],
                       'numTacklesCategory_High': [0],
                       'numTacklesCategory_Low': [0]}
sample_pred_df_average = pd.DataFrame(data=sample_pred_average)
print("Sample Prediction (Average):", bayes_model.predict(sample_pred_df_average))

# Probability of a victory given a low number of missed tackles
sample_pred_low = {'numTacklesCategory_Average': [0],
                   'numTacklesCategory_High': [0],
                   'numTacklesCategory_Low': [1]}
sample_pred_df_low = pd.DataFrame(data=sample_pred_low)
print("Sample Prediction (Low):", bayes_model.predict(sample_pred_df_low))

Accuracy: 0.6
F1 Score: 0.6178628389154706

Sample Prediction (High): [0]
Sample Prediction (Average): [1]
Sample Prediction (Low): [1]


### Conclusions

The Bayes Classifier for the relevance of the number of successful tackles to the likelihood of winning is shown to have a 60% accuracy and an F1 score of about 0.618. As demonstrated, for teams with the average number of tackles (between 45 and 61, inclusive), it predicted a victory. For teams with over 61 tackles, it predicted a loss, and for teams with less than 45 tackles, it predicted a victory.

Thus, the model does not serve as a good predictor. The primary reason for this is likely that only one attribute (the number of successful tackles by a team) was passed to the Bayes classifier, and thus accuracy of the model was inherently limited. For example, it is not uncommon for opposing teams to get a comparable number of tackles within the same game, but only one team can win and only one team can lose. As such, tackles do not seem to have a large impact on game outcomes, at least not when removed from the context of their plays and the overarching strategies.