# Comeback kids (and their coaches)

Few things in sports are more inspiring than a comeback. When a team that has dug itself a hole manages to reverse the momentum of the game, it creates an emotional rollercoaster that players, coaches and fans all experience together. However, for every Disney movie that a sports comeback has inspired, there is also a crowd full of devastated fans. There is even a term for the instinctive pose fans strike when their team collapses before their eyes: the surrender cobra.

Personally, I think college football comebacks are the most exciting of all. Unlike baseball, football has a clock, which adds an extra challenge for teams trying to stage a comeback. It's not just about making up the point difference; it has to be done before the clock hits 0:00. Unlike basketball, points come in larger chunks, so a single play can carry more weight in the outcome of a football game than a typical jump shot, block or steal does in basketball. Having grown up in the state of Alabama and walked on to Auburn's team, I may be biased, but a football comeback is tough to beat.

With this in mind, I wanted to know more about the people behind these comebacks: the head coaches. Some coaches are solemn strategists who remain unflappable while facing a hefty deficit. Others are fiery competitors who can spontaneously deliver an inspirational speech, chest bump the players or chew out a ref to get their teams going. Others are a mixture of both. Regardless, some coaches just cannot be counted out when their teams are down. Their teams simply refuse to quit. On the flip side, some fans cannot rest because they know their coach can blow a lead in the blink of an eye. It is a helpless, devastating feeling that can take an indefinite amount of time to heal depending on the final outcome of the game and the sequence of events leading to the opponent taking the lead.

To determine which coaches most frequently find themselves on either side of a comeback, I defined a comeback as a team trailing by 17+ points and then going on to take the lead. The team does not necessarily have to win the game, but the deficit of 17+ points must have been erased at some point in the game. More than one comeback (by this definition) will sometimes occur in a single game, creating an unhealthy range of emotions for all fans involved. To gain a better understanding of college football comebacks and the ringleaders behind them, I used the API of [CollegeFootballData.com](https://collegefootballdata.com/) for schedule, coaching and play-by-play data, and SAS Visual Analytics to create graphs and charts of the data. The data covers the years 2001 (when CollegeFootballData.com's play-by-play data begins) through 2019 (the final year of complete play-by-play data).
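To make the definition concrete, here is a minimal sketch of the rule applied to a made-up score timeline. The helper name `comeback_teams` and the `(home_score, away_score)` snapshot format are assumptions for illustration only; they are not part of the notebook's pipeline.

```python
from typing import List, Set, Tuple

DEFICIT_THRESHOLD = 17  # the 17+ point cutoff from the definition above


def comeback_teams(scores: List[Tuple[int, int]]) -> Set[str]:
    """Return which sides ('home'/'away') trailed by 17+ and later took the lead.

    `scores` is a chronological list of (home_score, away_score) snapshots.
    Hypothetical helper, for illustration only.
    """
    trailed_big = {"home": False, "away": False}
    came_back = set()
    for home, away in scores:
        margin = home - away  # positive: home leads, negative: away leads
        if margin <= -DEFICIT_THRESHOLD:
            trailed_big["home"] = True
        elif margin >= DEFICIT_THRESHOLD:
            trailed_big["away"] = True
        # A comeback is recorded once a side that trailed big holds the lead.
        if margin > 0 and trailed_big["home"]:
            came_back.add("home")
        elif margin < 0 and trailed_big["away"]:
            came_back.add("away")
    return came_back


# Made-up game: away jumps out 21-0, home rallies to lead 24-21, away wins 28-24.
timeline = [(0, 7), (0, 14), (0, 21), (7, 21), (14, 21), (21, 21), (24, 21), (24, 28)]
print(comeback_teams(timeline))
```

In this toy game the home side is credited with a comeback even though it loses, which is exactly the case the definition allows for.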

#### Import pandas and numpy

In [1]:
import pandas as pd
import numpy as np

#### Display 20 rows of datasets by default
#### Establish API Base URL

In [2]:
pd.set_option('display.max_rows', 20)
base = 'https://api.collegefootballdata.com'

#### Retrieve Coach data from 2000 to present

In [3]:
coach_df = pd.read_json('{}/coaches?minYear=2000'.format(base))
coach_df

Unnamed: 0,first_name,last_name,seasons
0,Steve,Addazio,"[{'school': 'Temple', 'year': 2011, 'games': 1..."
1,Terry,Allen,"[{'school': 'Kansas', 'year': 2000, 'games': 1..."
2,Tom,Allen,"[{'school': 'Indiana', 'year': 2016, 'games': ..."
3,Barry,Alvarez,"[{'school': 'Wisconsin', 'year': 2000, 'games'..."
4,Chuck,Amato,"[{'school': 'NC State', 'year': 2000, 'games':..."
...,...,...,...
433,Jeff,Woodruff,"[{'school': 'Eastern Michigan', 'year': 2000, ..."
434,Chris,Woods,"[{'school': 'Texas State', 'year': 2018, 'game..."
435,Brian,Wright,"[{'school': 'Florida Atlantic', 'year': 2013, ..."
436,Paul,Wulff,"[{'school': 'Washington State', 'year': 2008, ..."


#### Establish one coach for each school-year combination based on most games coached that year

In [4]:
coach_df2 = pd.DataFrame()
for index, row in coach_df.iterrows():
    temp_df = pd.json_normalize(coach_df['seasons'][index])
    temp_df['first_name'] = row['first_name']
    temp_df['last_name'] = row['last_name']
    if coach_df2.empty:
        coach_df2 = temp_df.copy()
    else:
        coach_df2 = pd.concat([coach_df2, temp_df])
coach_df2 = coach_df2[['first_name','last_name','school','year','games']].reset_index(drop=True)
coach_df2 = coach_df2.loc[coach_df2.groupby(by=['school','year'])['games'].idxmax()].sort_values(by=['school','year'])
coach_df2

Unnamed: 0,first_name,last_name,school,year,games
595,Fisher,DeBerry,Air Force,2000,12
596,Fisher,DeBerry,Air Force,2001,12
597,Fisher,DeBerry,Air Force,2002,13
598,Fisher,DeBerry,Air Force,2003,12
599,Fisher,DeBerry,Air Force,2004,11
...,...,...,...,...,...
210,Craig,Bohl,Wyoming,2016,14
211,Craig,Bohl,Wyoming,2017,13
212,Craig,Bohl,Wyoming,2018,12
213,Craig,Bohl,Wyoming,2019,13
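For readers who want to see the flattening step in isolation, the sketch below reproduces it on a toy table. The coach names and game counts are made up, and `explode` plus a single `json_normalize` call is an alternative I am assuming here, not the loop used in the cell above.

```python
import pandas as pd

# Toy stand-in for the /coaches response (values are made up for illustration).
toy_coaches = pd.DataFrame({
    "first_name": ["Ann", "Bob"],
    "last_name": ["Alpha", "Beta"],
    "seasons": [
        [{"school": "State U", "year": 2010, "games": 4}],
        [{"school": "State U", "year": 2010, "games": 8},
         {"school": "State U", "year": 2011, "games": 12}],
    ],
})

# One row per coach-season: explode the list, then expand each dict into columns.
exploded = toy_coaches.explode("seasons").reset_index(drop=True)
seasons = pd.json_normalize(exploded["seasons"].tolist())
flat = pd.concat([exploded[["first_name", "last_name"]], seasons], axis=1)

# For each school-year, keep the coach with the most games coached.
primary = flat.loc[flat.groupby(["school", "year"])["games"].idxmax()]
print(primary)
```

Both toy coaches have a 2010 season at the same school, and only the one with more games survives the `idxmax` selection.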


#### Determine which weeks had games each year so that we can iterate through the schedule

In [5]:
year_range = range(2001, 2020)
schedule_df = pd.DataFrame()
for year in year_range:
    temp_df = pd.read_json('{}/calendar?year={}'.format(base, str(year)))
    temp_df2 = pd.DataFrame({'year': year, 'week': pd.Series(temp_df['week'].unique())})
    if schedule_df.empty:
        schedule_df = temp_df2.copy()
    else:
        schedule_df = pd.concat([schedule_df, temp_df2])
schedule_df

Unnamed: 0,year,week
0,2001,1
1,2001,2
2,2001,3
3,2001,5
4,2001,6
...,...,...
10,2019,11
11,2019,12
12,2019,13
13,2019,14


#### Merge coach data with results from games

In [6]:
results_df = pd.DataFrame()
for year in year_range:
    temp_df = pd.read_json('{}/games?year={}'.format(base, str(year)))
    temp_df2 = pd.merge(temp_df, coach_df2[coach_df2['year'] == year], how='left', left_on='home_team', right_on='school')
    temp_df2['home_coach'] = temp_df2['first_name'] + ' ' + temp_df2['last_name']
    temp_df2.rename(columns={"id": "game_id"}, inplace=True)
    temp_df3 = pd.merge(temp_df2, coach_df2[coach_df2['year'] == year], how='left', left_on='away_team', right_on='school')
    temp_df3['away_coach'] = temp_df3['first_name_y'] + ' ' + temp_df3['last_name_y']
    temp_df3['year'] = year
    temp_df3 = temp_df3[['year','game_id','home_team','home_points','home_coach','away_team','away_points','away_coach']]
    # Keep only games with a final score, for every year including the first one
    temp_df3 = temp_df3[~temp_df3['home_points'].isnull()]
    if results_df.empty:
        results_df = temp_df3.copy()
    else:
        results_df = pd.concat([results_df, temp_df3])
results_df['winning_team'] = np.where(results_df['home_points'] > results_df['away_points'], results_df['home_team'], results_df['away_team'])
results_df['losing_team'] = np.where(results_df['home_points'] > results_df['away_points'], results_df['away_team'], results_df['home_team'])
results_df

Unnamed: 0,year,game_id,home_team,home_points,home_coach,away_team,away_points,away_coach,winning_team,losing_team
0,2001,212350097,Louisville,45,John Smith,New Mexico State,24,Tony Samuel,Louisville,New Mexico State
1,2001,63770,Nebraska,21,Frank Solich,TCU,7,Gary Patterson,Nebraska,TCU
2,2001,212370275,Wisconsin,26,Barry Alvarez,Virginia,17,Al Groh,Wisconsin,Virginia
3,2001,212370252,BYU,70,Gary Crowton,Tulane,35,Chris Scelfo,BYU,Tulane
4,2001,212370201,Oklahoma,10,Bob Stoops,North Carolina,0,John Bunting,Oklahoma,North Carolina
...,...,...,...,...,...,...,...,...,...,...
843,2019,401132979,Boise State,31,Bryan Harsin,Hawai'i,10,Nick Rolovich,Boise State,Hawai'i
844,2019,401132981,LSU,37,Ed Orgeron,Georgia,10,Kirby Smart,LSU,Georgia
845,2019,401132975,Clemson,62,Dabo Swinney,Virginia,17,Bronco Mendenhall,Clemson,Virginia
846,2019,401132983,Wisconsin,21,Paul Chryst,Ohio State,34,Ryan Day,Ohio State,Wisconsin
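The `first_name_y` and `last_name_y` columns in the cell above come from pandas suffixing column names that clash on the second merge. A small sketch with made-up teams and coaches shows the mechanism; note that a team with no match in the coach table ends up with a missing coach name.

```python
import pandas as pd

# Made-up rows standing in for one season of games and coaches.
games = pd.DataFrame({
    "id": [1, 2],
    "home_team": ["State U", "Tech"],
    "away_team": ["Tech", "Small College"],
})
coaches = pd.DataFrame({
    "school": ["State U", "Tech"],
    "first_name": ["Ann", "Bob"],
    "last_name": ["Alpha", "Beta"],
})

# First merge: coach columns arrive with their plain names.
home = pd.merge(games, coaches, how="left", left_on="home_team", right_on="school")
# Second merge: the clashing columns are suffixed, _x (home side) and _y (away side).
both = pd.merge(home, coaches, how="left", left_on="away_team", right_on="school")

both["home_coach"] = both["first_name_x"] + " " + both["last_name_x"]
both["away_coach"] = both["first_name_y"] + " " + both["last_name_y"]
print(both[["id", "home_team", "home_coach", "away_team", "away_coach"]])
```

Because both merges use `how="left"`, every game is kept even when a coach is not found, which is why some rows in later outputs show an empty coach field.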


#### Iterate through all of the games from 2001 to 2019 and accumulate the data from scoring plays to track scores throughout the stages of the games
#### NOTE: This takes a few minutes

In [7]:
score_change_df = pd.DataFrame()
print_year = '0'
for index, row in schedule_df.iterrows():
    year = str(row['year'])
    week = str(row['week'])
    if year != print_year:
        print_year = year
        print('Year: ' + print_year)
    temp_df = pd.read_json('{}/plays?year={}&week={}'.format(base, year, week))
    if len(temp_df.index) > 0:
        temp_df = temp_df[temp_df['scoring'] == True]
        temp_df = temp_df[['id','game_id','period','clock','offense','defense','offense_score','defense_score','scoring']].sort_values(by='id')
        if score_change_df.empty:
            score_change_df = temp_df.copy()
        else:
            score_change_df = pd.concat([score_change_df, temp_df])
score_change_df['winning_team_temp'] = np.where(score_change_df['offense_score'] > score_change_df['defense_score'], score_change_df['offense'], np.where(score_change_df['offense_score'] < score_change_df['defense_score'], score_change_df['defense'], 'tie'))
score_change_df['losing_team_temp'] = np.where(score_change_df['offense_score'] > score_change_df['defense_score'], score_change_df['defense'], np.where(score_change_df['offense_score'] < score_change_df['defense_score'], score_change_df['offense'], 'tie'))
score_change_df['winning_team_score_temp'] = np.where(score_change_df['offense_score'] > score_change_df['defense_score'], score_change_df['offense_score'], score_change_df['defense_score'])
score_change_df['losing_team_score_temp'] = np.where(score_change_df['offense_score'] > score_change_df['defense_score'], score_change_df['defense_score'], score_change_df['offense_score'])
score_change_df.drop(columns=['offense','defense','offense_score','defense_score'], inplace=True)
score_change_df

Year: 2001
Year: 2002
Year: 2003
Year: 2004
Year: 2005
Year: 2006
Year: 2007
Year: 2008
Year: 2009
Year: 2010
Year: 2011
Year: 2012
Year: 2013
Year: 2014
Year: 2015
Year: 2016
Year: 2017
Year: 2018
Year: 2019


Unnamed: 0,id,game_id,period,clock,scoring,winning_team_temp,losing_team_temp,winning_team_score_temp,losing_team_score_temp
8,2123702010204,212370201,1,"{'minutes': 13, 'seconds': 33}",True,Oklahoma,North Carolina,3,0
35,2123800380409,212380038,1,"{'minutes': 6, 'seconds': 54}",True,Fresno State,Colorado,6,0
36,2123800380410,212380038,1,"{'minutes': 6, 'seconds': 54}",True,Fresno State,Colorado,7,0
48,2123800380703,212380038,1,"{'minutes': 3, 'seconds': 47}",True,Fresno State,Colorado,13,0
49,2123800380704,212380038,1,"{'minutes': 3, 'seconds': 47}",True,Fresno State,Colorado,14,0
...,...,...,...,...,...,...,...,...,...
1910,401132984103944192,401132984,3,"{'minutes': 5, 'seconds': 57}",True,Appalachian State,Louisiana,42,17
1920,401132984103975808,401132984,3,"{'minutes': 2, 'seconds': 41}",True,Appalachian State,Louisiana,42,24
1946,401132984104918720,401132984,4,"{'minutes': 8, 'seconds': 12}",True,Appalachian State,Louisiana,45,24
1961,401132984104958080,401132984,4,"{'minutes': 4, 'seconds': 18}",True,Appalachian State,Louisiana,45,31
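The `clock` column is stored as a dictionary, and the key order varies from row to row. If a single number is easier to work with, something like the hypothetical helper below could convert it. It assumes the clock counts down the time remaining in the period and that every period is 15 minutes long, so it would not apply to overtime.

```python
def seconds_elapsed(period: int, clock: dict, period_minutes: int = 15) -> int:
    """Seconds of regulation elapsed at a play.

    Assumes `clock` holds the time REMAINING in the period and that each
    period lasts `period_minutes` (an assumption, not confirmed by the API docs).
    """
    remaining = clock.get("minutes", 0) * 60 + clock.get("seconds", 0)
    period_seconds = period_minutes * 60
    return (period - 1) * period_seconds + (period_seconds - remaining)


# First row of the table above: period 1 with 13:33 on the clock.
print(seconds_elapsed(1, {"minutes": 13, "seconds": 33}))
```

Reading the values by key rather than by position means rows such as `{'seconds': 13, 'minutes': 0}` are handled the same way.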


#### Merge game result data with play-by-play data to compare the final score with temporary scores throughout the games

In [8]:
merged_df = pd.merge(results_df, score_change_df, on='game_id').sort_values(by='id')
# Ensure no score statuses are double-counted
merged_df.drop_duplicates(subset=['game_id','winning_team_score_temp','losing_team_score_temp'], inplace=True)
merged_df.to_csv('output_data.csv', index=False)
merged_df

Unnamed: 0,year,game_id,home_team,home_points,home_coach,away_team,away_points,away_coach,winning_team,losing_team,id,period,clock,scoring,winning_team_temp,losing_team_temp,winning_team_score_temp,losing_team_score_temp
19121,2005,252440009,Arizona State,63,Dirk Koetter,Temple,16,Bobby Wallace,Arizona State,Temple,252440009032,1,"{'minutes': 5, 'seconds': 56}",True,Arizona State,Temple,6,0
19122,2005,252440009,Arizona State,63,Dirk Koetter,Temple,16,Bobby Wallace,Arizona State,Temple,252440009033,1,"{'minutes': 5, 'seconds': 56}",True,Arizona State,Temple,7,0
19123,2005,252440009,Arizona State,63,Dirk Koetter,Temple,16,Bobby Wallace,Arizona State,Temple,252440009053,1,"{'seconds': 13, 'minutes': 0}",True,Arizona State,Temple,13,0
19124,2005,252440009,Arizona State,63,Dirk Koetter,Temple,16,Bobby Wallace,Arizona State,Temple,252440009054,1,"{'seconds': 13, 'minutes': 0}",True,Arizona State,Temple,14,0
19125,2005,252440009,Arizona State,63,Dirk Koetter,Temple,16,Bobby Wallace,Arizona State,Temple,252440009069,2,"{'minutes': 10, 'seconds': 11}",True,Arizona State,Temple,14,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
163743,2019,401183793,New Mexico,22,Bob Davie,Air Force,44,Troy Calhoun,Air Force,New Mexico,401183793103947520,3,"{'minutes': 5, 'seconds': 24}",True,Air Force,New Mexico,23,16
163744,2019,401183793,New Mexico,22,Bob Davie,Air Force,44,Troy Calhoun,Air Force,New Mexico,401183793103977280,3,"{'minutes': 2, 'seconds': 26}",True,Air Force,New Mexico,30,16
163745,2019,401183793,New Mexico,22,Bob Davie,Air Force,44,Troy Calhoun,Air Force,New Mexico,401183793104869120,4,"{'minutes': 13, 'seconds': 8}",True,Air Force,New Mexico,37,16
163746,2019,401183793,New Mexico,22,Bob Davie,Air Force,44,Troy Calhoun,Air Force,New Mexico,401183793104924928,4,"{'minutes': 7, 'seconds': 50}",True,Air Force,New Mexico,44,16
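The `drop_duplicates` call above treats two rows as the same score status when they share a game and both running scores. A toy sketch with made-up numbers shows which rows survive:

```python
import pandas as pd

# Made-up scoring rows: the first two rows of game 1 describe the same score.
plays = pd.DataFrame({
    "game_id": [1, 1, 1, 2],
    "winning_team_score_temp": [7, 7, 14, 7],
    "losing_team_score_temp": [0, 0, 0, 0],
})

# The first occurrence of each (game, leader score, trailer score) is kept.
deduped = plays.drop_duplicates(
    subset=["game_id", "winning_team_score_temp", "losing_team_score_temp"]
)
print(deduped)
```

The 7-0 row in game 2 is kept because `game_id` is part of the subset, so identical scores in different games are never merged.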


#### Filter down to games that qualify as a "blowout" (games with comeback potential by our definition)

In [9]:
blowout_games_df = merged_df[merged_df['winning_team_score_temp'] - merged_df['losing_team_score_temp'] >= 17].copy()
blowout_games_df['diff'] = blowout_games_df['winning_team_score_temp'] - blowout_games_df['losing_team_score_temp']
blowout_games_df = blowout_games_df.loc[blowout_games_df.groupby(by=['game_id','winning_team_temp'])['diff'].idxmax()].sort_values(by='id')
blowout_games_df

Unnamed: 0,year,game_id,home_team,home_points,home_coach,away_team,away_points,away_coach,winning_team,losing_team,id,period,clock,scoring,winning_team_temp,losing_team_temp,winning_team_score_temp,losing_team_score_temp,diff
19138,2005,252440009,Arizona State,63,Dirk Koetter,Temple,16,Bobby Wallace,Arizona State,Temple,252440009189,4,"{'minutes': 5, 'seconds': 24}",True,Arizona State,Temple,56,9,47
19102,2005,252440041,Connecticut,38,Randy Edsall,Buffalo,0,Jim Hofher,Connecticut,Buffalo,252440041184,4,"{'minutes': 9, 'seconds': 0}",True,Connecticut,Buffalo,38,0,38
19118,2005,252440265,Washington State,38,Bill Doba,Idaho,26,Nick Holt,Washington State,Idaho,252440265207,4,"{'minutes': 13, 'seconds': 19}",True,Washington State,Idaho,38,19,19
19082,2005,252442579,South Carolina,24,Steve Spurrier,UCF,15,George O'Leary,South Carolina,UCF,252442579118,3,"{'minutes': 6, 'seconds': 35}",True,South Carolina,UCF,24,3,21
19069,2005,252442649,Toledo,62,Tom Amstutz,Western Illinois,14,,Toledo,Western Illinois,252442649178,4,"{'minutes': 8, 'seconds': 24}",True,Toledo,Western Illinois,62,0,62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
164787,2019,401132980,Florida Atlantic,49,Lane Kiffin,UAB,6,Bill Clark,Florida Atlantic,UAB,401132980104904832,4,"{'minutes': 9, 'seconds': 51}",True,Florida Atlantic,UAB,49,6,43
164811,2019,401132981,LSU,37,Ed Orgeron,Georgia,10,Kirby Smart,LSU,Georgia,401132981103995392,3,"{'seconds': 45, 'minutes': 0}",True,LSU,Georgia,34,3,31
164754,2019,401132984,Appalachian State,45,Eli Drinkwitz,Louisiana,38,Billy Napier,Appalachian State,Louisiana,401132984103944192,3,"{'minutes': 5, 'seconds': 57}",True,Appalachian State,Louisiana,42,17,25
157010,2019,401135910,Charlotte,49,Will Healy,Gardner-Webb,28,,Charlotte,Gardner-Webb,401135910104858432,4,"{'minutes': 14, 'seconds': 15}",True,Charlotte,Gardner-Webb,49,21,28
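The filter-then-`idxmax` pattern in the cell above can be hard to picture, so here is a sketch on a made-up timeline. The column names match the notebook; the scores do not come from real games.

```python
import pandas as pd

# Made-up scoring timeline for two games (leader and trailer scores after each score).
toy = pd.DataFrame({
    "game_id": [1, 1, 1, 1, 2, 2],
    "winning_team_temp": ["A", "A", "A", "B", "C", "C"],
    "winning_team_score_temp": [17, 24, 24, 28, 7, 14],
    "losing_team_score_temp": [0, 0, 7, 24, 0, 0],
})
toy["diff"] = toy["winning_team_score_temp"] - toy["losing_team_score_temp"]

# Keep only moments where the lead was 17+ points...
big_leads = toy[toy["diff"] >= 17]
# ...then, per game and leading team, the single row with the largest lead.
peak_leads = big_leads.loc[
    big_leads.groupby(["game_id", "winning_team_temp"])["diff"].idxmax()
]
print(peak_leads)
```

Game 2 drops out entirely because its lead never reaches 17, and game 1 is reduced to the one row where team A's lead peaked at 24.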


#### Filter down to games where a blowout lead was blown (comeback successful)

In [10]:
comeback_wins_df = blowout_games_df[blowout_games_df['winning_team'] != blowout_games_df['winning_team_temp']].copy()
comeback_wins_df

Unnamed: 0,year,game_id,home_team,home_points,home_coach,away_team,away_points,away_coach,winning_team,losing_team,id,period,clock,scoring,winning_team_temp,losing_team_temp,winning_team_score_temp,losing_team_score_temp,diff
19225,2005,252460356,Illinois,33,Ron Zook,Rutgers,30,Greg Schiano,Illinois,Rutgers,252460356141,3,"{'minutes': 10, 'seconds': 53}",True,Rutgers,Illinois,27,7,20
21390,2005,252670252,BYU,50,Bronco Mendenhall,TCU,51,Gary Patterson,TCU,BYU,252670252153,3,"{'minutes': 10, 'seconds': 14}",True,BYU,TCU,34,16,18
21729,2005,252690099,LSU,27,Les Miles,Tennessee,30,Phillip Fulmer,Tennessee,LSU,252690099094,2,"{'minutes': 6, 'seconds': 55}",True,LSU,Tennessee,21,0,21
22140,2005,252740009,Arizona State,28,Dirk Koetter,USC,38,Pete Carroll,USC,Arizona State,252740009113,2,"{'minutes': 4, 'seconds': 34}",True,Arizona State,USC,21,3,18
23121,2005,252872655,Tulane,21,Chris Scelfo,UTEP,45,Mike Price,UTEP,Tulane,252872655150,3,"{'minutes': 8, 'seconds': 7}",True,Tulane,UTEP,21,0,21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
163082,2019,401119303,Kent State,30,Sean Lewis,Buffalo,27,Lance Leipold,Kent State,Buffalo,401119303104888128,4,"{'minutes': 11, 'seconds': 18}",True,Buffalo,Kent State,27,6,21
163296,2019,401119304,Ball State,44,Mike Neu,Central Michigan,45,Jim McElwain,Central Michigan,Ball State,401119304102979776,2,"{'minutes': 2, 'seconds': 1}",True,Ball State,Central Michigan,20,3,17
164203,2019,401119314,Ball State,41,Mike Neu,Miami (OH),27,Chuck Martin,Ball State,Miami (OH),401119314102895808,2,"{'minutes': 10, 'seconds': 41}",True,Miami (OH),Ball State,24,7,17
158168,2019,401121937,Georgia State,48,Shawn Elliott,Furman,42,,Georgia State,Furman,401121937102874432,2,"{'minutes': 12, 'seconds': 55}",True,Furman,Georgia State,20,3,17
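One way to turn a table like this into per-coach tallies is sketched below. This is a pandas illustration under my own assumptions, with made-up rows and hypothetical column names `comeback_coach` and `collapse_coach`; the charts for this analysis were built in SAS Visual Analytics, not with this code.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column names as comeback_wins_df.
toy_comebacks = pd.DataFrame({
    "home_team": ["State U", "Tech", "State U"],
    "home_coach": ["Ann Alpha", "Bob Beta", "Ann Alpha"],
    "away_team": ["Tech", "Central", "Central"],
    "away_coach": ["Bob Beta", "Cy Gamma", "Cy Gamma"],
    "winning_team": ["State U", "Central", "State U"],
})

home_won = toy_comebacks["winning_team"] == toy_comebacks["home_team"]
# The winner's coach completed the comeback; the loser's coach gave up the lead.
toy_comebacks["comeback_coach"] = np.where(
    home_won, toy_comebacks["home_coach"], toy_comebacks["away_coach"]
)
toy_comebacks["collapse_coach"] = np.where(
    home_won, toy_comebacks["away_coach"], toy_comebacks["home_coach"]
)

comeback_counts = toy_comebacks["comeback_coach"].value_counts()
collapse_counts = toy_comebacks["collapse_coach"].value_counts()
print(comeback_counts)
print(collapse_counts)
```

Every row in the comeback table has exactly one coach on each side, so the two tallies always sum to the same total.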


#### The rare "double comeback" games where each team held a 17+ point lead at some point in a single game!

In [11]:
double_comeback_games_df = blowout_games_df[blowout_games_df.duplicated(subset='game_id', keep=False)].copy()
double_comeback_games_df

Unnamed: 0,year,game_id,home_team,home_points,home_coach,away_team,away_points,away_coach,winning_team,losing_team,id,period,clock,scoring,winning_team_temp,losing_team_temp,winning_team_score_temp,losing_team_score_temp,diff
23121,2005,252872655,Tulane,21,Chris Scelfo,UTEP,45,Mike Price,UTEP,Tulane,252872655150,3,"{'minutes': 8, 'seconds': 7}",True,Tulane,UTEP,21,0,21
23126,2005,252872655,Tulane,21,Chris Scelfo,UTEP,45,Mike Price,UTEP,Tulane,252872655202,4,"{'minutes': 10, 'seconds': 34}",True,UTEP,Tulane,45,21,24
24842,2005,253020197,Oklahoma State,28,Mike Gundy,Texas,47,Mack Brown,Texas,Oklahoma State,253020197091,2,"{'minutes': 5, 'seconds': 44}",True,Oklahoma State,Texas,28,9,19
24852,2005,253020197,Oklahoma State,28,Mike Gundy,Texas,47,Mack Brown,Texas,Oklahoma State,253020197205,4,"{'minutes': 3, 'seconds': 39}",True,Texas,Oklahoma State,47,28,19
31464,2006,262800061,Georgia,33,Mark Richt,Tennessee,51,Phillip Fulmer,Tennessee,Georgia,262800061074,2,"{'minutes': 4, 'seconds': 50}",True,Georgia,Tennessee,24,7,17
31478,2006,262800061,Georgia,33,Mark Richt,Tennessee,51,Phillip Fulmer,Tennessee,Georgia,262800061197,4,"{'minutes': 2, 'seconds': 55}",True,Tennessee,Georgia,51,33,18
31833,2006,262872305,Kansas,32,Mark Mangino,Oklahoma State,42,Mike Gundy,Oklahoma State,Kansas,262872305113,3,"{'minutes': 10, 'seconds': 46}",True,Kansas,Oklahoma State,17,0,17
31847,2006,262872305,Kansas,32,Mark Mangino,Oklahoma State,42,Mike Gundy,Oklahoma State,Kansas,262872305196,4,"{'minutes': 2, 'seconds': 27}",True,Oklahoma State,Kansas,42,25,17
44541,2007,273140249,North Texas,62,Todd Dodge,Navy,74,Paul Johnson,Navy,North Texas,273140249046,1,"{'seconds': 46, 'minutes': 0}",True,North Texas,Navy,21,3,18
44569,2007,273140249,North Texas,62,Todd Dodge,Navy,74,Paul Johnson,Navy,North Texas,273140249191,4,"{'minutes': 8, 'seconds': 16}",True,Navy,North Texas,73,56,17
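The filter above relies on `blowout_games_df` holding one peak-lead row per game and leading team, so a `game_id` that appears twice means each team led by 17+ at some point. A minimal sketch with made-up rows shows how `keep=False` behaves:

```python
import pandas as pd

# Made-up peak-lead rows: game 10 has a 17+ lead recorded for both teams.
peak = pd.DataFrame({
    "game_id": [10, 10, 11, 12],
    "winning_team_temp": ["A", "B", "C", "D"],
    "diff": [21, 24, 30, 17],
})

# keep=False flags every row of a repeated game_id, not just the later copies.
both_led_big = peak[peak.duplicated(subset="game_id", keep=False)]
print(both_led_big)
```

With the default `keep='first'`, only the second row of game 10 would be returned and half of each pair would be lost.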
