# Day 46

I've heard announcers talk about teams that led at halftime and ended up losing the game, and how many times a team has done this in 2022. So I want to see which teams led at halftime but failed to win, and which teams trailed at halftime and managed to win.

I'll answer this question mainly with SQL.

In [1]:
import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_palette('deep')

# Create database connection
conn = sqlite3.connect('../../data/db/database.db')

## Query the Data

In [7]:
query = """
WITH data AS (
    -- Get the score at the end of each quarter
    SELECT
        game_id,
        season,
        week,
        home_team,
        away_team,
        total_home_score,
        total_away_score,
        away_score,
        home_score,
        desc
    FROM pbp
    WHERE season = 2022
        -- AND week = 1
        -- AND game_id = '2022_01_BAL_NYJ'
        AND desc IN ('END QUARTER 1', 'END QUARTER 2', 'END QUARTER 3', 'END GAME')
), 
-- Stack the dataset to make it easier to work with
stacked AS (
    WITH home_team AS (
        SELECT 
            game_id,
            season,
            week,
            home_team AS team,
            total_home_score AS total_score,
            total_away_score AS total_opp_score,
            home_score AS score,
            away_score AS opp_score,
            desc,
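            -- Order explicitly so the quarter number does not depend on row order
            ROW_NUMBER() OVER(PARTITION BY game_id, home_team ORDER BY "desc" = 'END GAME', "desc") AS quarter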
        FROM data
    ), away_team AS (
        SELECT 
            game_id,
            season,
            week,
            away_team AS team,
            total_away_score AS total_score,
            total_home_score AS total_opp_score,
            away_score AS score,
            home_score AS opp_score,
            desc,
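            -- Order explicitly so the quarter number does not depend on row order
            ROW_NUMBER() OVER(PARTITION BY game_id, away_team ORDER BY "desc" = 'END GAME', "desc") AS quarter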
        FROM data
    )
    SELECT *
    FROM home_team
    UNION
    SELECT *
    FROM away_team
    ORDER BY game_id, team, quarter
)
SELECT
    *,
    /*
    Get the amount of points scored in the quarter
    If the result is NULL from the LAG function,
    COALESCE() will return the total_score value for the quarter
    */
    COALESCE(total_score - LAG(total_score, 1) OVER(PARTITION BY game_id, team ORDER BY quarter), total_score) AS points_scored,
    CASE
        WHEN score > opp_score THEN 1
        WHEN score < opp_score THEN 0
        ELSE NULL
    END AS win_loss,
    CASE
        WHEN total_score > total_opp_score THEN 1
        WHEN total_score < total_opp_score THEN 0
        ELSE NULL
    END AS lead_trail
FROM stacked
ORDER BY game_id, team
"""

df = pd.read_sql(query, conn)
df.head(15)

Unnamed: 0,game_id,season,week,team,total_score,total_opp_score,score,opp_score,desc,quarter,points_scored,win_loss,lead_trail
0,2022_01_BAL_NYJ,2022,1,BAL,3.0,0.0,24,9,END QUARTER 1,1,3.0,1.0,1.0
1,2022_01_BAL_NYJ,2022,1,BAL,10.0,3.0,24,9,END QUARTER 2,2,7.0,1.0,1.0
2,2022_01_BAL_NYJ,2022,1,BAL,24.0,3.0,24,9,END QUARTER 3,3,14.0,1.0,1.0
3,2022_01_BAL_NYJ,2022,1,BAL,24.0,9.0,24,9,END GAME,4,0.0,1.0,1.0
4,2022_01_BAL_NYJ,2022,1,NYJ,0.0,3.0,9,24,END QUARTER 1,1,0.0,0.0,0.0
5,2022_01_BAL_NYJ,2022,1,NYJ,3.0,10.0,9,24,END QUARTER 2,2,3.0,0.0,0.0
6,2022_01_BAL_NYJ,2022,1,NYJ,3.0,24.0,9,24,END QUARTER 3,3,0.0,0.0,0.0
7,2022_01_BAL_NYJ,2022,1,NYJ,9.0,24.0,9,24,END GAME,4,6.0,0.0,0.0
8,2022_01_BUF_LA,2022,1,BUF,7.0,0.0,31,10,END QUARTER 1,1,7.0,1.0,1.0
9,2022_01_BUF_LA,2022,1,BUF,10.0,10.0,31,10,END QUARTER 2,2,3.0,1.0,
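
The `COALESCE(... LAG ...)` step in the query can be cross-checked in pandas. This is a minimal sketch, not part of the original notebook: `sample` is a hypothetical hand-built frame holding the BAL and NYJ cumulative scores from the output above. Here `groupby().diff()` plays the role of `LAG`, and `fillna` plays the role of `COALESCE`.

```python
import pandas as pd

# Cumulative scores copied from the 2022_01_BAL_NYJ rows in the output above.
# `sample` is a hypothetical name used only for this illustration.
sample = pd.DataFrame({
    "game_id": ["2022_01_BAL_NYJ"] * 8,
    "team": ["BAL"] * 4 + ["NYJ"] * 4,
    "quarter": [1, 2, 3, 4] * 2,
    "total_score": [3.0, 10.0, 24.0, 24.0, 0.0, 3.0, 3.0, 9.0],
})

# Sort so that diff() compares each quarter with the one before it.
sample = sample.sort_values(["game_id", "team", "quarter"])

# diff() is NaN for the first quarter of each team (nothing to subtract),
# so fall back to the cumulative score, just like COALESCE in the SQL.
diffs = sample.groupby(["game_id", "team"])["total_score"].diff()
sample["points_scored"] = diffs.fillna(sample["total_score"])

print(sample[["team", "quarter", "total_score", "points_scored"]])
```

The resulting `points_scored` values should line up with the column of the same name in the query output.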


This query produces a table at the game, team, and quarter level. From here I can aggregate by quarter or by half. I'll need a flag that tells me whether a team was leading at halftime and another for whether the team won the game.
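
One way such flags could be built in pandas is sketched below. The names `half`, `flags`, `led_at_half`, `lost_game`, and `blew_lead` are hypothetical, and the rows are copied from the notebook's outputs rather than pulled from the database. A team that is tied at halftime has a missing `lead_trail`, so the comparison uses `.eq()`, which treats missing values as `False`.

```python
import pandas as pd

# Halftime rows copied from the notebook outputs (BUF was tied at the half,
# so its lead_trail is missing). `half` is a hypothetical name.
half = pd.DataFrame({
    "game_id": ["2022_01_BAL_NYJ", "2022_01_BAL_NYJ", "2022_01_BUF_LA", "2022_05_IND_DEN"],
    "team": ["BAL", "NYJ", "BUF", "DEN"],
    "desc": ["END QUARTER 2"] * 4,
    "win_loss": [1.0, 0.0, 1.0, 0.0],
    "lead_trail": [1.0, 0.0, None, 1.0],
})

# Keep one row per game and team, taken at the halftime checkpoint.
flags = half.loc[half["desc"] == "END QUARTER 2", ["game_id", "team"]].copy()

# .eq() returns False for missing values, so ties are neither leads nor deficits.
flags["led_at_half"] = half["lead_trail"].eq(1)
flags["lost_game"] = half["win_loss"].eq(0)
flags["blew_lead"] = flags["led_at_half"] & flags["lost_game"]

print(flags)
```

In this small sample only the Denver row ends up with `blew_lead` set.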

In [12]:
# Get the number of times a team has led at halftime and lost the game
df.query("desc == 'END QUARTER 2' & lead_trail == 1 & win_loss == 0")\
    .groupby('team')['game_id']\
    .count()\
    .sort_values(ascending=False)[:5]
    

team
DEN    5
LV     4
LAC    4
BAL    3
LA     3
Name: game_id, dtype: int64

Denver has lost the most games where they were actually **winning** at halftime! 

In [14]:
df.query("desc == 'END QUARTER 2' & lead_trail == 1 & win_loss == 0 & team == 'DEN'")

Unnamed: 0,game_id,season,week,team,total_score,total_opp_score,score,opp_score,desc,quarter,points_scored,win_loss,lead_trail
561,2022_05_IND_DEN,2022,5,DEN,6.0,3.0,9,12,END QUARTER 2,2,3.0,0.0,1.0
689,2022_06_DEN_LAC,2022,6,DEN,13.0,10.0,16,19,END QUARTER 2,2,3.0,0.0,1.0
1121,2022_10_DEN_TEN,2022,10,DEN,10.0,7.0,10,17,END QUARTER 2,2,10.0,0.0,1.0
1265,2022_11_LV_DEN,2022,11,DEN,10.0,7.0,16,22,END QUARTER 2,2,3.0,0.0,1.0
1461,2022_13_DEN_BAL,2022,13,DEN,6.0,3.0,9,10,END QUARTER 2,2,3.0,0.0,1.0


All of these Denver losses were by one score, so that's not too bad.
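
The one-score claim can be checked with a little arithmetic. The sketch below assumes that "one score" means a margin of eight points or fewer (a touchdown plus a two-point conversion); `den` and `ONE_SCORE_MAX` are hypothetical names, and the scores are copied from the Denver rows above.

```python
import pandas as pd

# Final scores copied from the five Denver rows shown above.
den = pd.DataFrame({
    "game_id": [
        "2022_05_IND_DEN",
        "2022_06_DEN_LAC",
        "2022_10_DEN_TEN",
        "2022_11_LV_DEN",
        "2022_13_DEN_BAL",
    ],
    "score": [9, 16, 10, 16, 9],
    "opp_score": [12, 19, 17, 22, 10],
})

# Assumption: a one-score game is decided by 8 points or fewer.
ONE_SCORE_MAX = 8

den["margin"] = den["opp_score"] - den["score"]
den["one_score"] = den["margin"] <= ONE_SCORE_MAX

print(den[["game_id", "margin", "one_score"]])
```

Under that assumption the margins are 3, 3, 7, 6, and 1 points, so every one of these losses qualifies.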

Let's find the opposite – the number of times a team has *trailed* at halftime and ended up winning the game.

In [15]:
df.query("desc == 'END QUARTER 2' & lead_trail == 0 & win_loss == 1")\
    .groupby('team')['game_id']\
    .count()\
    .sort_values(ascending=False)[:5]

team
KC     4
IND    3
TB     3
NYG    3
NO     3
Name: game_id, dtype: int64

Kansas City being able to pull out wins comes as no surprise, although I'm a little surprised they weren't leading at halftime to begin with.

In [17]:
df.query("desc == 'END QUARTER 2' & lead_trail == 0 & win_loss == 1 & team == 'KC'")

Unnamed: 0,game_id,season,week,team,total_score,total_opp_score,score,opp_score,desc,quarter,points_scored,win_loss,lead_trail
185,2022_02_LAC_KC,2022,2,KC,7.0,10.0,27,24,END QUARTER 2,2,7.0,1.0,0.0
577,2022_05_LV_KC,2022,5,KC,10.0,20.0,30,29,END QUARTER 2,2,10.0,1.0,0.0
1081,2022_09_TEN_KC,2022,9,KC,9.0,14.0,20,17,END QUARTER 2,2,6.0,1.0,0.0
1249,2022_11_KC_LAC,2022,11,KC,13.0,20.0,30,27,END QUARTER 2,2,7.0,1.0,0.0


They were *just barely* able to eke out these wins!

Lastly, I'll look at teams that were leading at the end of the 3rd quarter and still managed to lose the game. Not good.

In [20]:
df.query("desc == 'END QUARTER 3' & lead_trail == 1 & win_loss == 0")\
    .groupby('team')['game_id']\
    .count()\
    .sort_values(ascending=False)[:5]

team
CHI    3
CLE    3
LV     3
LAC    3
BAL    3
Name: game_id, dtype: int64

Vegas and the Chargers show up in this list again. This helps explain why they have been so disappointing this year – they are giving up leads not only at halftime but also late in the game!
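
The overlap between the two lists can be made explicit. The sketch below simply retypes the two top-five outputs shown earlier as Series (the variable names are hypothetical) and intersects their indexes. Because each list is cut off at five teams, a team tied at three losses may be missing from either one, so this is only a rough comparison.

```python
import pandas as pd

# Top-five counts copied from the halftime and 3rd-quarter outputs above.
lost_after_half_lead = pd.Series({"DEN": 5, "LV": 4, "LAC": 4, "BAL": 3, "LA": 3})
lost_after_q3_lead = pd.Series({"CHI": 3, "CLE": 3, "LV": 3, "LAC": 3, "BAL": 3})

# Teams that appear in both top-five lists.
both = lost_after_half_lead.index.intersection(lost_after_q3_lead.index)

print(sorted(both))
```

Besides Vegas and the Chargers, Baltimore also appears in both lists.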

I can't finish without checking the opposite: the teams that were losing at the end of the 3rd and came back to win the game.

In [21]:
df.query("desc == 'END QUARTER 3' & lead_trail == 0 & win_loss == 1")\
    .groupby('team')['game_id']\
    .count()\
    .sort_values(ascending=False)[:5]

team
MIN    4
NYG    4
IND    3
CIN    3
KC     3
Name: game_id, dtype: int64
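
The four counts above all share one shape: filter on a checkpoint, a lead or deficit, and a final result, then count games per team. A small helper could capture that pattern. This is a sketch only; `count_flips` and `demo` are hypothetical names, and the demo rows are a handful copied from the outputs above rather than the full `df`.

```python
import pandas as pd


def count_flips(df, checkpoint, lead_trail, win_loss, top=5):
    """Count games per team with the given status at `checkpoint` and final result.

    lead_trail: 1 for leading, 0 for trailing at the checkpoint.
    win_loss:   1 for a win, 0 for a loss at the end of the game.
    """
    mask = (
        (df["desc"] == checkpoint)
        & (df["lead_trail"] == lead_trail)
        & (df["win_loss"] == win_loss)
    )
    counts = df.loc[mask].groupby("team")["game_id"].count()
    return counts.sort_values(ascending=False).head(top)


# A few rows copied from the outputs above, for illustration only.
demo = pd.DataFrame({
    "game_id": ["2022_05_IND_DEN", "2022_06_DEN_LAC", "2022_02_LAC_KC", "2022_01_BAL_NYJ"],
    "team": ["DEN", "DEN", "KC", "BAL"],
    "desc": ["END QUARTER 2", "END QUARTER 2", "END QUARTER 2", "END QUARTER 3"],
    "lead_trail": [1.0, 1.0, 0.0, 1.0],
    "win_loss": [0.0, 0.0, 1.0, 1.0],
})

blown_half = count_flips(demo, "END QUARTER 2", lead_trail=1, win_loss=0)
comeback_half = count_flips(demo, "END QUARTER 2", lead_trail=0, win_loss=1)
blown_q3 = count_flips(demo, "END QUARTER 3", lead_trail=1, win_loss=0)

print(blown_half)
print(comeback_half)
```

With the full `df` from the query, the same call with `"END QUARTER 3"` would cover the late-game cases without repeating the filter each time.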