# Day 46

I've heard announcers talk about teams that lead at halftime and then lose the game, and how many times a team has done this in 2022. So I want to see which teams led at halftime but failed to win, and which teams trailed at halftime and managed to win.

I'll answer this question mainly with SQL.

In [1]:
import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_palette('deep')

# Create database connection
conn = sqlite3.connect('../../data/db/database.db')

## Query the Data

In [2]:
query = """
WITH data AS (
    -- Get the score at the end of each quarter
    SELECT
        game_id,
        season,
        week,
        home_team,
        away_team,
        total_home_score,
        total_away_score,
        away_score,
        home_score,
        desc
    FROM pbp
    WHERE season = 2022
        -- AND week = 1
        -- AND game_id = '2022_01_BAL_NYJ'
        AND desc IN ('END QUARTER 1', 'END QUARTER 2', 'END QUARTER 3', 'END GAME')
), 
-- Stack the dataset to make it easier to work with
stacked AS (
    WITH home_team AS (
        SELECT 
            game_id,
            season,
            week,
            home_team AS team,
            total_home_score AS total_score,
            total_away_score AS total_opp_score,
            home_score AS score,
            away_score AS opp_score,
            desc,
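            -- Derive the quarter from the play description so the numbering does not depend on row order
            CASE "desc"
                WHEN 'END QUARTER 1' THEN 1
                WHEN 'END QUARTER 2' THEN 2
                WHEN 'END QUARTER 3' THEN 3
                WHEN 'END GAME' THEN 4
            END AS quarter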
        FROM data
    ), away_team AS (
        SELECT 
            game_id,
            season,
            week,
            away_team AS team,
            total_away_score AS total_score,
            total_home_score AS total_opp_score,
            away_score AS score,
            home_score AS opp_score,
            desc,
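            -- Derive the quarter from the play description so the numbering does not depend on row order
            CASE "desc"
                WHEN 'END QUARTER 1' THEN 1
                WHEN 'END QUARTER 2' THEN 2
                WHEN 'END QUARTER 3' THEN 3
                WHEN 'END GAME' THEN 4
            END AS quarter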
        FROM data
    )
    SELECT *
    FROM home_team
    UNION
    SELECT *
    FROM away_team
    ORDER BY game_id, team, quarter
)
SELECT
    *,
    /*
    Get the amount of points scored in the quarter
    If the result is NULL from the LAG function,
    COALESCE() will return the total_score value for the quarter
    */
    COALESCE(total_score - LAG(total_score, 1) OVER(PARTITION BY game_id, team ORDER BY quarter), total_score) AS points_scored
FROM stacked
ORDER BY game_id, team
"""

df = pd.read_sql(query, conn)
df.head(10)

| | game_id | season | week | team | total_score | total_opp_score | score | opp_score | desc | quarter | points_scored |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022_01_BAL_NYJ | 2022 | 1 | BAL | 3.0 | 0.0 | 24 | 9 | END QUARTER 1 | 1 | 3.0 |
| 1 | 2022_01_BAL_NYJ | 2022 | 1 | BAL | 10.0 | 3.0 | 24 | 9 | END QUARTER 2 | 2 | 7.0 |
| 2 | 2022_01_BAL_NYJ | 2022 | 1 | BAL | 24.0 | 3.0 | 24 | 9 | END QUARTER 3 | 3 | 14.0 |
| 3 | 2022_01_BAL_NYJ | 2022 | 1 | BAL | 24.0 | 9.0 | 24 | 9 | END GAME | 4 | 0.0 |
| 4 | 2022_01_BAL_NYJ | 2022 | 1 | NYJ | 0.0 | 3.0 | 9 | 24 | END QUARTER 1 | 1 | 0.0 |
| 5 | 2022_01_BAL_NYJ | 2022 | 1 | NYJ | 3.0 | 10.0 | 9 | 24 | END QUARTER 2 | 2 | 3.0 |
| 6 | 2022_01_BAL_NYJ | 2022 | 1 | NYJ | 3.0 | 24.0 | 9 | 24 | END QUARTER 3 | 3 | 0.0 |
| 7 | 2022_01_BAL_NYJ | 2022 | 1 | NYJ | 9.0 | 24.0 | 9 | 24 | END GAME | 4 | 6.0 |
| 8 | 2022_01_BUF_LA | 2022 | 1 | BUF | 7.0 | 0.0 | 31 | 10 | END QUARTER 1 | 1 | 7.0 |
| 9 | 2022_01_BUF_LA | 2022 | 1 | BUF | 10.0 | 10.0 | 31 | 10 | END QUARTER 2 | 2 | 3.0 |
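
The `points_scored` column in the query comes from `LAG` plus `COALESCE`. As a rough cross-check, the same idea can be sketched in pandas with a grouped `shift` and `fillna`. The `cumulative` frame below is a hand-typed copy of the BAL rows from the output above, not a fresh query result, and the variable names are just for illustration.

```python
import pandas as pd

# Cumulative scores for BAL, copied by hand from the sample output above
cumulative = pd.DataFrame({
    "game_id": ["2022_01_BAL_NYJ"] * 4,
    "team": ["BAL"] * 4,
    "quarter": [1, 2, 3, 4],
    "total_score": [3.0, 10.0, 24.0, 24.0],
})

cumulative = cumulative.sort_values(["game_id", "team", "quarter"])

# Equivalent of LAG(total_score, 1) OVER (PARTITION BY game_id, team ORDER BY quarter)
prev_total = cumulative.groupby(["game_id", "team"])["total_score"].shift(1)

# Equivalent of COALESCE(total_score - lag, total_score): the first quarter has no
# previous row, so it falls back to the cumulative score itself
cumulative["points_scored"] = (cumulative["total_score"] - prev_total).fillna(
    cumulative["total_score"]
)

print(cumulative)
```

For these four rows the result should match the `points_scored` values the SQL query returned for BAL.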


This query produces a table at the game, team, and quarter level. From here I can aggregate by quarter or by half. I'll need to create flags that tell me whether a team was winning at halftime and whether it won the game.
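
One possible way to build those flags in pandas is sketched below. It assumes the halftime score is the `quarter == 2` row and that `score` and `opp_score` hold the final result, as in the output above. The helper `add_halftime_flags` and the flag column names are hypothetical, and the `sample` frame is typed in from the BAL/NYJ rows rather than read from the database.

```python
import pandas as pd

# BAL and NYJ rows from the sample output above, trimmed to the columns needed here
sample = pd.DataFrame({
    "game_id": ["2022_01_BAL_NYJ"] * 8,
    "team": ["BAL"] * 4 + ["NYJ"] * 4,
    "quarter": [1, 2, 3, 4] * 2,
    "total_score": [3, 10, 24, 24, 0, 3, 3, 9],
    "total_opp_score": [0, 3, 3, 9, 3, 10, 24, 24],
    "score": [24] * 4 + [9] * 4,
    "opp_score": [9] * 4 + [24] * 4,
})


def add_halftime_flags(df):
    """Hypothetical helper: one row per game and team with halftime and result flags."""
    cols = ["game_id", "team", "total_score", "total_opp_score", "score", "opp_score"]
    # Assumption: the cumulative score at the end of quarter 2 is the halftime score
    half = df.loc[df["quarter"] == 2, cols].copy()
    half["led_at_half"] = half["total_score"] > half["total_opp_score"]
    half["trailed_at_half"] = half["total_score"] < half["total_opp_score"]
    half["won_game"] = half["score"] > half["opp_score"]
    # Led at halftime but failed to win (a final tie counts as failing to win)
    half["blown_lead"] = half["led_at_half"] & ~half["won_game"]
    # Trailed at halftime and still won
    half["comeback_win"] = half["trailed_at_half"] & half["won_game"]
    return half.reset_index(drop=True)


flags = add_halftime_flags(sample)
print(flags[["game_id", "team", "led_at_half", "won_game", "blown_lead", "comeback_win"]])
```

Applied to the full `df`, summing `blown_lead` and `comeback_win` by `team` would give the per-team counts the question asks about. Games tied at halftime get neither flag under this definition.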