---
## What’s This About 🤔
Welcome to *Finding NFL Winners*, a series where we dive into what it takes to predict **NFL game outcomes**. In this first part, *The Start*, we’ll explore the data to uncover which stats truly matter. Are turnovers, yards, or something else the key to victory?

As the series unfolds, we’ll build tools and eventually create a predictive model to tackle the ultimate question: *Who’s going to win?* Let’s get started!

### Disclaimers

First, I’m relatively new to American football, with just three years of watching under my belt. While I still have plenty to learn, this fresh perspective allows me to focus solely on the numbers, free from bias.

Second, although I’ll reference Vegas odds in my analysis, I strongly discourage gambling. It’s addictive and harmful. This series is about understanding and predicting the game—not betting on it.

## CODE UTILS
---

In [26]:
# Imports
from itables import show
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'notebook_connected'
from scipy.stats import gaussian_kde

# Internals
from common.data_sources import get_nfl_pbp_data, get_nfl_per_game_df, get_nfl_per_game_per_team_df
from common.graph import plot_overlap_histogram


In [27]:
# Import Data
nfl_df = get_nfl_pbp_data()


Columns (36,37,45,179,180,182,183,189,190,193,194,197,198,203,204,205,206,207,208,209,210,211,212,213,214,218,219,220,222,224,226,233,234,235,236,237,238,243,244,245,248,249,253,254,255,260,262,263,266,267,268,269,283,284,292,293,294,295,296,299,301,302,303,306,332,373,375,376,377,379,381,382,383,389,390,391) have mixed types. Specify dtype option on import or set low_memory=False.



In [28]:
# Transform
nfl_per_game_df = get_nfl_per_game_df(nfl_df)
nfl_per_game_per_team_df = get_nfl_per_game_per_team_df(nfl_df)
show(
    nfl_per_game_df,
    # layout={"top1": "searchPanes"},
    # searchPanes={"layout": "columns-3", "cascadePanes": True, "columns": [1, 6, 7]},
)

*[Interactive table output of `nfl_per_game_df`, one row per game.]*

*Index columns:* game_id, home_team, away_team, season_type, week, game_date, season, location, div_game, roof, surface, temp, wind, home_coach, away_coach, game_stadium

*Value columns:* yards_gained, air_yards, yards_after_catch, kick_distance, winning_team_score, losing_team_score, score_differential_post, first_down_rush, first_down_pass, first_down_penalty, third_down_failed, third_down_converted, fourth_down_converted, fourth_down_failed, incomplete_pass, interception, rush_attempt, pass_attempt, sack, touchdown, pass_touchdown, rush_touchdown, return_touchdown, passing_yards, receiving_yards, rushing_yards


## CONTENT
---

---
## Just Looking 👀

<div style="text-align: center;">
  <figure>
    <img src="article_assets/michael-lehan-cleveland-browns-student-athletes.jpg" alt="Michael Lehan" style="max-height: 600px;">
    <figcaption>Michael Lehan (Getty Images)</figcaption>
  </figure>
</div>

### How Do You Win in the NFL?
According to the [NFL Rules](https://operations.nfl.com/the-rules/nfl-rulebook/): *"A team wins by having more points than its opponent after four quarters (60 minutes) or after overtime, if applicable"*. In simple terms, your team just needs to score more points than your defense allows the opponent to score.

But how many points should my offense score to win, or how many points should my defense allow?


In [29]:
# Plot Histogram
# Data
score_data = nfl_per_game_per_team_df['posteam_score_post'].dropna()
winning_score_data = nfl_per_game_df['winning_team_score'].dropna()
losing_score_data = nfl_per_game_df['losing_team_score'].dropna()
# Figure
HIST_BINS = 25
fig = go.Figure()
plot_overlap_histogram(score_data, fig, bins=HIST_BINS, name='All', color='green')
plot_overlap_histogram(winning_score_data, fig, bins=HIST_BINS, name='Winning Team', color='blue')
plot_overlap_histogram(losing_score_data, fig, bins=HIST_BINS, name='Losing Team',color='red')
# Update layout
fig.update_layout(
    title=dict(text="Fig.1: Team Points Probability", x=0.5, y=0.95, font=dict(size=20, color="black")),
    xaxis=dict(range=[0, 60], title="Points"),
    yaxis_title="Frequency",
    template="plotly_white",
    barmode='overlay'
)
fig.show()

In [30]:
# Calculate Probability
w_m, w_std = winning_score_data.mean(), winning_score_data.std()
l_m, l_std = losing_score_data.mean(), losing_score_data.std()
w_lower_bound, w_higher_bound = (w_m - w_std).round(), (w_m + w_std).round()
l_lower_bound, l_higher_bound = (l_m - l_std).round(), (l_m + l_std).round()
total = nfl_per_game_df['winning_team_score'].count()
w_between_bound = nfl_per_game_df[
    (nfl_per_game_df['winning_team_score'] >= w_lower_bound) &
    (nfl_per_game_df['winning_team_score'] <= w_higher_bound)
]['winning_team_score'].count()
l_between_bound = nfl_per_game_df[
    (nfl_per_game_df['losing_team_score'] >= l_lower_bound) &
    (nfl_per_game_df['losing_team_score'] <= l_higher_bound)
]['losing_team_score'].count()
w_l_between_bound = nfl_per_game_df[
    (nfl_per_game_df['winning_team_score'] >= w_lower_bound) &
    (nfl_per_game_df['winning_team_score'] <= w_higher_bound) &
    (nfl_per_game_df['losing_team_score'] >= l_lower_bound) &
    (nfl_per_game_df['losing_team_score'] <= l_higher_bound)
]['losing_team_score'].count()
print(f'Probability of winning team score between [{w_lower_bound}, {w_higher_bound}]: {round(w_between_bound/total, 2)}')
print(f'Probability of losing team score between [{l_lower_bound}, {l_higher_bound}]: {round(l_between_bound/total, 2)}')
print(f'Probability of winning and losing team score between bounds: {round(w_l_between_bound/total,2)}')

Probability of winning team score between [19.0, 36.0]: 0.69
Probability of losing team score between [8.0, 24.0]: 0.69
Probability of winning and losing team score between bounds: 0.51


In [31]:
# HeatMap 2D Histogram
# Create 2D histogram bins
RANGE_SIZE = 5
x_bins = np.arange(nfl_per_game_df['winning_team_score'].min(), nfl_per_game_df['winning_team_score'].max() + 1, RANGE_SIZE)
y_bins = np.arange(nfl_per_game_df['losing_team_score'].min(), nfl_per_game_df['losing_team_score'].max() + 1, RANGE_SIZE)
hist, x_edges, y_edges = np.histogram2d(
    nfl_per_game_df['winning_team_score'], nfl_per_game_df['losing_team_score'], bins=[x_bins, y_bins]
)
# Create the heatmap
heatmap = go.Heatmap(z=hist.T, x=x_bins[:-1], y=y_bins[:-1], colorscale='Viridis', colorbar=dict(title="Occurrences"),)
# Annotate the heatmap with counts
annotations = []
for i, y in enumerate(y_bins[:-1]):
    for j, x in enumerate(x_bins[:-1]):
        value = hist[j, i]
        if value > 0:
            annotations.append(
                go.layout.Annotation(
                    text=f"{int(value)}", x=x, y=y, showarrow=False,
                    font=dict(size=10, color="white" if value > hist.max() / 2 else "black")
                )
            )
# Build figure
fig = go.Figure(data=[heatmap])
fig.update_layout(
    title=dict(text="Fig.2: Winning/Losing Points Heatmap", x=0.5, y=0.95, font=dict(size=20, color="black")),
    xaxis=dict(title="Winning Team Points", range=[0, 40]),
    yaxis=dict(title="Losing Team Points",range=[-3, 33]),
    annotations=annotations,  # Add annotations
    template="plotly_white"
)
# Show plot
fig.show()

<br>

*Fig. 1* shows the distribution of team scores in NFL games from **1999 through 11/24/2024**. The average score for any team (winning or losing) is 22 points, while losing teams average 16 points and winning teams average 27 points.

One thing stands out: the coefficient of variation (standard deviation divided by mean) of losing-team scores is over 50%, which means *losing scores are very volatile*. Although they average 16 points, most of them fall anywhere between 8 and 24. So **why don't winning teams behave like this?** 🤔 Let's save this one for later analysis.
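As a rough illustration of the coefficient of variation mentioned above, here is a minimal sketch. The helper `coefficient_of_variation` and the toy score lists are hypothetical and only mimic the shape of the real data; on the actual dataset the same function could be applied to `losing_score_data` and `winning_score_data`.

```python
import pandas as pd

def coefficient_of_variation(values: pd.Series) -> float:
    """Return std / mean, using the sample standard deviation (pandas default, ddof=1)."""
    values = values.dropna()
    return values.std() / values.mean()

# Hypothetical toy scores, NOT taken from the real dataset
toy_losing_scores = pd.Series([3, 10, 13, 17, 20, 24, 27])
toy_winning_scores = pd.Series([20, 23, 24, 27, 30, 31, 34])

cv_losing = coefficient_of_variation(toy_losing_scores)
cv_winning = coefficient_of_variation(toy_winning_scores)
print(f"CV losing: {cv_losing:.2f}, CV winning: {cv_winning:.2f}")
```

A higher value means the scores are more spread out relative to their average, which is what "volatile" means here.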

In *Fig. 2*, we can see a heatmap of winning team points versus losing team points, grouped into 5-point bins. The most common combination is a winning score in the 23–28 bin against a losing score in the 20–25 bin. Additionally, the most frequent exact scores are 23–20 (77 occurrences), 20–17 (75 occurrences), 24–17 (74 occurrences), and 27–24 (71 occurrences).
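The exact-score counts quoted above can be obtained by counting each (winning, losing) pair. The sketch below shows one way to do it, assuming a table with the same two column names as `nfl_per_game_df`; the `toy_games` data is hypothetical.

```python
import pandas as pd

# Hypothetical toy games; the real table would be nfl_per_game_df
toy_games = pd.DataFrame({
    "winning_team_score": [23, 20, 23, 27, 24, 23, 20],
    "losing_team_score":  [20, 17, 20, 24, 17, 20, 17],
})

# Count how often each exact final score occurs, most frequent first
top_scores = (
    toy_games
    .groupby(["winning_team_score", "losing_team_score"])
    .size()
    .sort_values(ascending=False)
    .head(3)
)
print(top_scores)
```

Unlike the heatmap, this counts exact scores rather than 5-point bins, so the two views can rank combinations differently.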

Finally, let's keep this in mind: almost **70% of winning team points** fall between 19 and 36 points, while **70% of losing team points** fall between 8 and 24 points.
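A quick sanity check on those numbers: for a roughly bell-shaped distribution, about 68% of values fall within one standard deviation of the mean, so 0.69 for each band is expected. If the two bands were independent, the joint probability would be about 0.69 × 0.69 ≈ 0.48, slightly below the 0.51 observed, which hints at some dependence between the two scores (rounding may explain part of the gap). The sketch below reproduces the comparison on synthetic, independent draws; `within_one_std`, `score_a`, and `score_b` are hypothetical names, and the data is not real NFL data.

```python
import numpy as np
import pandas as pd

def within_one_std(values: pd.Series) -> pd.Series:
    """Boolean mask: True where a value lies within mean ± 1 std (bounds rounded)."""
    lower = round(values.mean() - values.std())
    upper = round(values.mean() + values.std())
    return values.between(lower, upper)  # inclusive on both ends

# Synthetic, independent draws (assumption: roughly normal scores)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "score_a": rng.normal(27, 9, 5000).round().clip(0),
    "score_b": rng.normal(16, 8, 5000).round().clip(0),
})

a_mask = within_one_std(toy["score_a"])
b_mask = within_one_std(toy["score_b"])
p_a, p_b = a_mask.mean(), b_mask.mean()
p_joint = (a_mask & b_mask).mean()

print(f"P(a in band)={p_a:.2f}, P(b in band)={p_b:.2f}")
print(f"joint={p_joint:.2f}, product if independent={p_a * p_b:.2f}")
```

On independent data the joint share and the product stay close; a larger gap on the real table would point to correlated winning and losing scores.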
<br>


In [32]:
# Group by losing_team_score and calculate the mean and std of winning_team_score
grouped_data = nfl_per_game_df.groupby('losing_team_score').agg(
    avg_winning_team_score=('winning_team_score', 'mean'),
    std_winning_team_score=('winning_team_score', 'std')
).reset_index()

# Create scatter plot
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=grouped_data['losing_team_score'],
    y=grouped_data['avg_winning_team_score'],
    mode='lines+markers',
    name='Team Scores'
))

# Add x=y line
min_val = min(grouped_data['losing_team_score'].min(), grouped_data['avg_winning_team_score'].min())
max_val = max(grouped_data['losing_team_score'].max(), grouped_data['avg_winning_team_score'].max())
fig.add_trace(go.Scatter(
    x=[min_val, max_val],
    y=[min_val, max_val],
    mode='lines',
    line=dict(color='red', dash='dash'),
    name='x = y'
))

# Calculate bounds for the shaded region
grouped_data['lower_bound'] = grouped_data['avg_winning_team_score'] - grouped_data['std_winning_team_score']
grouped_data['upper_bound'] = grouped_data['avg_winning_team_score'] + grouped_data['std_winning_team_score']
# Add shaded area for mean ± std
fig.add_trace(go.Scatter(
    x=grouped_data['losing_team_score'],
    y=grouped_data['upper_bound'],
    mode='lines',
    line=dict(width=0, color='rgba(0,100,200,0.2)'),  # Transparent upper bound
    showlegend=False
))
fig.add_trace(go.Scatter(
    x=grouped_data['losing_team_score'],
    y=grouped_data['lower_bound'],
    mode='lines',
    line=dict(width=0, color='rgba(0,100,200,0.2)'),  # Transparent lower bound
    fill='tonexty',  # Fill area between lower and upper bounds
    fillcolor='rgba(0,100,200,0.2)',
    showlegend=True,
    name='Mean ± Std'
))
# Plot
fig.update_layout(
    title=dict(text="Fig.3: Winning Points vs. Losing Points", x=0.5, y=0.95, font=dict(size=20, color="black")),
    xaxis_title="Losing Points",
    yaxis_title="Winning Points",
    template="plotly_white"
)
fig.show()

In [33]:
nfl_per_game_per_team_per_wp = (
    nfl_df[ nfl_df['posteam'] != '']
    .groupby([
        'game_id', 'posteam', 'wp',
    ])
    .agg({
        'epa': 'mean',
    })
    .reset_index()
)

In [34]:

# Bin win probability to two decimals and average the shifted EPA within each bin
nfl_per_game_per_team_per_wp['wp_bin'] = (nfl_per_game_per_team_per_wp['wp'] * 100).round() / 100
# Shift EPA by |min(EPA)| so every value is non-negative (a shift, not an absolute value);
# Series.min() skips NaN, unlike the built-in min()
nfl_per_game_per_team_per_wp['abs_epa'] = (
    nfl_per_game_per_team_per_wp['epa'] + abs(nfl_per_game_per_team_per_wp['epa'].min())
)
grouped_data = nfl_per_game_per_team_per_wp.groupby('wp_bin').agg(
    avg_abs_epa=('abs_epa', 'mean')
).reset_index()

# Start a new figure so the trace is not added on top of the previous plot
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=grouped_data['wp_bin'],
    y=grouped_data['avg_abs_epa'],
    mode='lines+markers',
    name='Avg. shifted EPA'
))

# Update layout
fig.update_layout(
    title="",
    xaxis_title="Win Probability",
    yaxis_title="EPA",
    template="plotly_white"
)
# Show the plot
fig.show()

## Cooking 🧑‍🍳
---


## Credits
---
* Assistants: Since content, images,

---
## To be continued ...