# Conceit
I want to simulate NFL games with a Monte Carlo method. The idea is to choose drives from a team's history and use them to construct a complete football game. In its simplest form, this means randomly choosing drive outcomes from a team's history. In a more realistic model, the game situation should guide which drive outcomes are more likely: a drive starting with 20 seconds to go in the half is much less likely to result in a touchdown or field goal than an opening drive.

In this document, I aim to use drive-level history and this kind of Monte Carlo simulation to find and implement a reasonable way to model NFL football games.
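
As a rough illustration of the simplest form described above, here is a minimal sketch that draws drives uniformly at random until a half runs out of time. The toy history, the function name `simulate_half_simple`, and the tuple layout are all made up for illustration; the real data is loaded from the csv.

```python
import random

# Hypothetical toy drive history: (result, points scored, seconds used)
toy_history = [
    ("Punt", 0, 95),
    ("Touchdown", 7, 210),
    ("Field Goal", 3, 160),
    ("Interception", 0, 60),
]

def simulate_half_simple(history, half_length=1800, rng=None):
    """Draw drives uniformly at random until the half runs out of time."""
    rng = rng or random.Random()
    time_rem = half_length
    scores = [0, 0]   # points for the two teams
    offense = 0       # index of the team with the ball
    while time_rem > 0:
        # Only drives that fit in the remaining time are eligible
        eligible = [d for d in history if d[2] <= time_rem]
        if not eligible:
            break
        result, points, secs = rng.choice(eligible)
        scores[offense] += points
        time_rem -= secs
        offense = 1 - offense  # possession changes after every drive
    return scores, time_rem

scores, time_left = simulate_half_simple(toy_history, rng=random.Random(0))
```

This ignores game situation entirely, which is exactly the limitation the more realistic model is meant to address.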

In [1]:
# Load relevant packages
import pandas as pd
import numpy as np

In [2]:
# Read drive-level data from csv
alldrives = pd.read_csv('../data/espn_drives2009-2017.csv')
alldrives.sample(5)

Unnamed: 0.1,Unnamed: 0,away,away_score_after,away_score_before,drive,home,home_score_after,home_score_before,offense,plays,...,uid,TD,FG,punt,turnover,EoH,secs_rem,starting_fieldposition,time_in_secs,left_in_half
281,281,ARI,16,16,27,SF,20,20,SF,3,...,290913022-27,0,0,1,0,0,111.0,-17.0,55,111.0
51557,51557,TEN,17,17,23,CIN,13,13,TEN,3,...,400951656-23,0,0,1,0,0,411.0,-24.0,67,411.0
22672,22672,TEN,3,3,17,HOU,21,21,TEN,3,...,321202010-17,0,0,1,0,0,1800.0,15.0,113,1800.0
16568,16568,MIN,32,32,28,DEN,32,29,DEN,7,...,311204016-28,0,1,0,0,0,186.0,-30.0,93,186.0
37162,37162,OAK,10,10,5,BAL,10,3,BAL,6,...,400791702-5,1,0,0,0,0,2869.0,-30.0,179,1069.0


In [3]:
alldrives.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54298 entries, 0 to 54297
Data columns (total 24 columns):
Unnamed: 0                54298 non-null int64
away                      54298 non-null object
away_score_after          54298 non-null int64
away_score_before         54298 non-null int64
drive                     54298 non-null int64
home                      54298 non-null object
home_score_after          54298 non-null int64
home_score_before         54298 non-null int64
offense                   54298 non-null object
plays                     54298 non-null int64
result                    54298 non-null object
time                      54298 non-null object
yds_gained                54298 non-null int64
gameId                    54298 non-null int64
uid                       54298 non-null object
TD                        54298 non-null int64
FG                        54298 non-null int64
punt                      54298 non-null int64
turnover                  54298 non-nul

Given a game situation, write a function that builds the set of drives that could come next. Then choose one of those drives.

In the simple case, the game situation consists of the home team, the away team, whether the home team is on offense, and how much time remains in the half. To build the list of possible next drives, keep every drive where the offensive team has possession, or where the defensive team is playing but does not have possession. In addition, the drive time should be <= the time remaining in the simulated game's half.
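
To check the filter logic on something small, here is a sketch that applies the same conditions to a toy DataFrame. The rows and the helper name `filter_drives` are hypothetical; only the column names are taken from the drive data.

```python
import pandas as pd

# Hypothetical toy drives, using the same column names as the drive data
toy = pd.DataFrame({
    "home":         ["CAR", "MIA", "NO",  "CAR", "DAL"],
    "away":         ["ATL", "NYJ", "MIA", "MIA", "NYG"],
    "offense":      ["CAR", "NYJ", "MIA", "MIA", "DAL"],
    "time_in_secs": [120,   90,    45,    300,   60],
})

def filter_drives(df, off_team, def_team, time_rem):
    # Condition 1: the simulated offense had the ball
    on_offense = df["offense"] == off_team
    # Condition 2: the simulated defense played in the game but did not have the ball
    def_in_game = (df["home"] == def_team) | (df["away"] == def_team)
    on_defense = def_in_game & (df["offense"] != def_team)
    # The drive must fit in the time remaining
    fits_clock = df["time_in_secs"] <= time_rem
    return df[(on_offense | on_defense) & fits_clock]

candidates = filter_drives(toy, off_team="CAR", def_team="MIA", time_rem=150)
```

With these toy rows, only the CAR offensive drive and the NYJ drive against MIA survive; drives where MIA has the ball are dropped.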

In [4]:
# game_situation (home, away, home_poss, time_rem)

home = 'CAR'
away = 'MIA'
home_poss = True
time_rem = 150

# Set some keywords for the filter
if home_poss:
    off_team = home
    def_team = away  
else:
    off_team = away
    def_team = home
    
# Filter for plays for this offense and defense
teamdrives = alldrives.loc[ 
           # Condition 1: team is the offense
           ( 
             alldrives.offense.astype(str) == off_team 
           ) 
           |  # OR
           # Condition 2:
           ( # Defensive team is home or away
             ( 
               (alldrives.home.astype(str) == def_team) |
               (alldrives.away.astype(str) == def_team)
             ) 
             &  # AND
               # not the offense
             (alldrives.offense.astype(str) != def_team)
           )
         ]

possible_drives = teamdrives[ teamdrives.time_in_secs <= time_rem ]

possible_drives.sample(5)

Unnamed: 0.1,Unnamed: 0,away,away_score_after,away_score_before,drive,home,home_score_after,home_score_before,offense,plays,...,uid,TD,FG,punt,turnover,EoH,secs_rem,starting_fieldposition,time_in_secs,left_in_half
21002,21002,NYJ,3,3,21,MIA,27,27,NYJ,4,...,321028020-21,0,0,0,1,0,1237.0,18.0,112,1237.0
42773,42773,SEA,3,3,4,MIA,0,0,SEA,3,...,400874577-4,0,0,1,0,0,2769.0,-33.0,120,969.0
39300,39300,CAR,10,10,7,IND,0,0,CAR,3,...,400791690-7,0,0,1,0,0,2725.0,-40.0,45,925.0
7705,7705,CAR,6,6,27,CHI,17,17,CAR,3,...,301010029-27,0,0,1,0,0,402.0,-35.0,39,402.0
17001,17001,CAR,23,16,9,ATL,7,7,CAR,4,...,311211029-9,1,0,0,0,0,2102.0,-10.0,119,302.0


Now, given a list of possible drives, assign a weight to each, and choose one based on the weights.
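
For reference, the standard library's `random.choices` does weighted selection directly. The wrapper below is only a sketch of that alternative (the name `select_weighted` is made up), not the approach this notebook uses.

```python
import random

def select_weighted(container, weights, rng=random):
    """Pick one item with probability proportional to its weight."""
    # random.choices returns a list of k items; take the single draw
    return rng.choices(container, weights=weights, k=1)[0]

# An item with zero weight is never chosen
pick = select_weighted(["punt", "touchdown", "field goal"], [0, 1, 0])
```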

In [5]:
# Assign a weight to each drive. Start simple: every drive gets equal weight.
drives = possible_drives.index.values.tolist()
drive_weights = [ 1 for d in drives ]

In [139]:
import random

# Function to return one item from a list, where each has a weight
def select(container, weights):
    total_weight = float(sum(weights))
    rel_weight = [w / total_weight for w in weights]

    # Probability for each element
    probs = [sum(rel_weight[:i + 1]) for i in range(len(rel_weight))]
    
    r = random.random()
    for (i, element) in enumerate(container):
        if r <= probs[i]:
            break

    return element

In [8]:
drive_id = select(drives, drive_weights)
possible_drives.loc[drive_id,:]

Unnamed: 0                            2349
away                                   MIA
away_score_after                         7
away_score_before                        0
drive                                    2
home                                    NO
home_score_after                         0
home_score_before                        0
offense                                 NO
plays                                    2
result                    Intercepted Pass
time                                  0:54
yds_gained                              20
gameId                           291025015
uid                            291025015-2
TD                                       0
FG                                       0
punt                                     0
turnover                                 1
EoH                                      0
secs_rem                              3199
starting_fieldposition                 -43
time_in_secs                            54
left_in_hal

In [90]:
def get_possible_drives(home,away,home_poss,time_rem):

    # Set some keywords for the filter
    if home_poss:
        off_team = home
        def_team = away  
    else:
        off_team = away
        def_team = home
    
    # Filter for plays for this offense and defense
    teamdrives = alldrives.loc[ 
               # Condition 1: team is the offense
               ( 
                 alldrives.offense.astype(str) == off_team 
               ) 
               |  # OR
               # Condition 2:
               ( # Defensive team is home or away
                 ( 
                   (alldrives.home.astype(str) == def_team) |
                   (alldrives.away.astype(str) == def_team)
                 ) 
                 &  # AND
                   # not the offense
                 (alldrives.offense.astype(str) != def_team)
               )
             ]

    poss_ds = teamdrives[ teamdrives.time_in_secs <= time_rem ]
    
    # Cut out End of Half drives unless they're relevant
    possible_drives = poss_ds[ ~(poss_ds.EoH==1) | ~(poss_ds.time_in_secs < time_rem - 20) ]

    return possible_drives


def get_drive_weights(drives_df):
    # Assign a weight to each drive. Start simple: every drive gets equal weight.
    drives = drives_df.index.values.tolist()
    drive_weights = [ 1 for d in drives ]
        
    return drive_weights
    
    
# Function that takes game situation and returns a drive.
def next_drive(game_sit):
    """Takes a tuple describing the game situation.
    Returns a Series describing the next drive."""
    (home, away, home_poss, time_rem, home_score, away_score) = game_sit
    
    # Get df of possible drives
    poss_drives = get_possible_drives(home, away, home_poss, time_rem)
    
    # Get weights for the possible drives
    drive_ids = poss_drives.index.values.tolist()
    weights = get_drive_weights(poss_drives)
    
    # Randomly choose a drive, with weights assigned to each.
    chosen_drive_id = select(drive_ids, weights)
    
    return poss_drives.loc[chosen_drive_id, :]    

In [10]:
# Test the next_drive function
game_sit = ( "CAR", # home
             "MIA", # away
             True,  # home has possession
             200,   # seconds remaining
             10,    # home score
             10,    # away score
           )

next_drive(game_sit)

Unnamed: 0                        1099
away                               DAL
away_score_after                    10
away_score_before                   10
drive                               15
home                               CAR
home_score_after                     7
home_score_before                    7
offense                            CAR
plays                                3
result                            Punt
time                              1:26
yds_gained                           9
gameId                       290928006
uid                       290928006-15
TD                                   0
FG                                   0
punt                                 1
turnover                             0
EoH                                  0
secs_rem                          1194
starting_fieldposition             -34
time_in_secs                        86
left_in_half                      1194
Name: 1099, dtype: object

In [132]:
# Define a class that constitutes a game
class football_game:
    """Class for representing a football game"""
    def __init__(self,home,away):
        # Set some initial values
        self.home = home
        self.away = away
        self.half = 1
        self.time_rem = 1800
        self.home_score = 0
        self.away_score = 0
        
        # Decide which team gets the ball to start
        coin = random.randint(1,2)
        if coin == 1:
            self.home_poss = True
        else:
            self.home_poss = False
            
            
    def get_game_sit(self):
        game_sit = (self.home,
                    self.away,
                    self.home_poss,
                    self.time_rem,
                    self.home_score,
                    self.away_score )
        return game_sit
    
    def game_sit_series(self, drive):
        # Figure out which team has possession
        if self.home_poss:
            possessor = self.home
        else:
            possessor = self.away
        
        sit_dict = {'home':self.home,
                    'away':self.away,
                    'offense':possessor,
                    'half':self.half,
                    'time_rem':self.time_rem,
                    'home_score':self.home_score,
                    'away_score':self.away_score,
                    'result':drive.result}
        return pd.Series(sit_dict)
    
    
    def game_sit_dict(self):
        # Figure out which team has possession
        if self.home_poss:
            possessor = self.home
        else:
            possessor = self.away
        
        sit_dict = {'home':self.home,
                    'away':self.away,
                    'offense':possessor,
                    'half':self.half,
                    'time_rem':self.time_rem,
                    'home_score':self.home_score,
                    'away_score':self.away_score}
        return sit_dict
    
    
    def update_game_sit(self,drive):
        """Takes a Series and updates game situation vars accordingly"""
        
        # Update clock and score
        self.time_rem -= drive.time_in_secs
        self.home_score += drive.home_score_after - drive.home_score_before
        self.away_score += drive.away_score_after - drive.away_score_before
        
        # Flip the possession arrow
        if self.home_poss:
            self.home_poss = False
        else:
            self.home_poss = True
            
            
    def record_drive(self,drive,drive_num=1):
        """Given a drive, update the proper quantities, 
        assuming dataframes for chosen drives and game history have 
        already been created"""
        # Get gamestate before this drive
        gamestate = self.game_sit_dict()
        
        # Clock changes
        gamestate_delta = {'time':drive.time_in_secs}
        if drive.time_in_secs < 10:
            gamestate_delta['time'] = 10
        
        # Score changes
        # Home team in selected drive might not be home team in sim. game
        if self.home_poss and (drive.offense == drive.home):
            flip = False
        elif self.home_poss and (drive.offense == drive.away):
            flip = True
        elif (not self.home_poss) and (drive.offense == drive.home):
            flip = True
        elif (not self.home_poss) and (drive.offense == drive.away):
            flip = False
        else:
            print("Something went wrong in determining flipped possession")
            flip = False  # fall back so the score update below can still run
            
        if not flip:
            gamestate_delta['home_score'] = drive.home_score_after - drive.home_score_before
            gamestate_delta['away_score'] = drive.away_score_after - drive.away_score_before
        else:
            gamestate_delta['away_score'] = drive.home_score_after - drive.home_score_before
            gamestate_delta['home_score'] = drive.away_score_after - drive.away_score_before

        # Check for implausible score deltas (negative, or more than 8 points)
        scores_delta = (gamestate_delta['home_score'], gamestate_delta['away_score'])
        if any(val < 0 or val > 8 for val in scores_delta):
            # Recalculate score change based on drive result
            # Default to zero points
            gamestate_delta['home_score'] = 0
            gamestate_delta['away_score'] = 0
            if (drive.FG == 1):
                if self.home_poss:
                    gamestate_delta['home_score'] = 3
                else:
                    gamestate_delta['away_score'] = 3
            elif (drive.TD == 1):
                if self.home_poss:
                    gamestate_delta['home_score'] = 7
                else:
                    gamestate_delta['away_score'] = 7
                
        
        # Figure out whether possession changes. Default True.
        # If the defense scored on this drive, the offense gets the ball back.
        gamestate_delta['poss'] = True
        if self.home_poss and (gamestate_delta['away_score'] != 0):
            gamestate_delta['poss'] = False
        elif (not self.home_poss) and (gamestate_delta['home_score'] != 0):
            gamestate_delta['poss'] = False
                    
#        # Add chosen drive to appropriate dataframe
#        drivedf = pd.Series.to_frame(drive)
#        dfs = [self.drives_selected, drivedf]
#        self.drives_selected = pd.concat( dfs, axis=1 )
        
        # Add entry to simulated game history
        this_series = pd.Series(gamestate)
        this_series['home_score_after'] = self.home_score + gamestate_delta['home_score']
        this_series['away_score_after'] = self.away_score + gamestate_delta['away_score']
        this_series['result'] = drive.result
        this_series['time'] = gamestate_delta['time']
        
#        # Additional for debugging
#        this_series['drive_hsb'] = drive.home_score_before
#        this_series['drive_hsa'] = drive.home_score_after
#        this_series['drive_asb'] = drive.away_score_before
#        this_series['drive_asa'] = drive.away_score_after
        
        if drive_num == 1:  # Need to start gamestate dataFrame
            self.gamestate_df = pd.Series.to_frame(this_series)
        else:            # Add this series to gamestate dF
            series_df = pd.Series.to_frame(this_series)
            dfs = [ self.gamestate_df, series_df ]
            self.gamestate_df = pd.concat( dfs, axis=1 )
        
        # Update the game's state vars
        self.home_score += gamestate_delta['home_score']
        self.away_score += gamestate_delta['away_score']
        self.time_rem -= gamestate_delta['time']
        if gamestate_delta['poss']:
            self.home_poss = not self.home_poss
    
    
    def check_for_EoH(self):
        pass

In [114]:
def simulate_game(home,away):

    # Need new wrapper for simulating a game
    newgame = football_game( home, away )
    
    # Possible drives are filtered per drive inside next_drive(),
    # so nothing is precomputed here.

    # Choose the first drive
    game_sit = newgame.get_game_sit()
    first_drive = next_drive(game_sit)

    # Make drive history DF, starting with this first drive.
    newgame.drives_selected = pd.Series.to_frame(first_drive)

    # Update game object after the first drive
    drive_num = 1
    newgame.record_drive( first_drive, drive_num )

    #while newgame.time_rem.astype(float) >= 5:
    for half in (1,2):
        newgame.half = half
        if half > 1:
            newgame.time_rem = 1800
    
        end_of_half = False
        while (not end_of_half) and (newgame.time_rem > 0):
            drive_num += 1
    
            # Choose a new drive
            this_drive = next_drive( newgame.get_game_sit() )
            # Add drive to chosen drives dataframe
            newgame.drives_selected = pd.concat( [newgame.drives_selected, this_drive], axis=1 )
        
            # Update the game object
            newgame.record_drive( this_drive, drive_num )
        
            # Check for end of Half
            if this_drive.EoH == 1:
                end_of_half = True
        
    
    # Post-game, need to transpose the dataFrames
    newgame.drives_selected = newgame.drives_selected.transpose()
    newgame.gamestate_df = newgame.gamestate_df.transpose()
    
    newgame.result = [newgame.home, newgame.home_score, newgame.away, newgame.away_score]
    
    return newgame

In [117]:
game = simulate_game("TB",'NO')
print(game.result)
cols = [ 'home','away','offense','half','time_rem','time','result',
         'home_score','home_score_after','away_score','away_score_after' ]
game.gamestate_df[cols]

['TB', 17, 'NO', 35]


Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score,home_score_after,away_score,away_score_after
0,TB,NO,NO,1,1800,415,Touchdown,0,0,0,7
0,TB,NO,TB,1,1385,112,Punt,0,0,7,7
0,TB,NO,NO,1,1273,77,Touchdown,0,0,7,14
0,TB,NO,TB,1,1196,43,Field Goal,0,3,14,14
0,TB,NO,NO,1,1153,45,Punt,3,3,14,14
0,TB,NO,TB,1,1108,126,Punt,3,3,14,14
0,TB,NO,NO,1,982,103,Punt,3,3,14,14
0,TB,NO,TB,1,879,100,Punt,3,3,14,14
0,TB,NO,NO,1,779,246,Touchdown,3,10,14,21
0,TB,NO,NO,1,533,120,Intercepted Pass,10,10,21,21


In [118]:
for i in range(10):
    game = simulate_game("CAR","PHI")
    print(game.result)

['CAR', 20, 'PHI', 26]
['CAR', 20, 'PHI', 16]
['CAR', 39, 'PHI', 38]
['CAR', 13, 'PHI', 11]
['CAR', 31, 'PHI', 37]
['CAR', 23, 'PHI', 39]
['CAR', 34, 'PHI', 24]
['CAR', 21, 'PHI', 43]
['CAR', 17, 'PHI', 56]
['CAR', 23, 'PHI', 34]


### Figure out slightly less simple weighting function
- Look at time remaining in half.
- Look at how old the drive is.
- Possibly, look at the score difference.
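
One possible shape for such a weighting function is sketched below. Everything here is an assumption made for illustration: the function name `drive_weight`, the exponential decay, and the `time_scale` and `score_scale` values are not fitted to anything, and how the three factors should really be combined is still an open question.

```python
import math

def drive_weight(season, left_in_half, score_diff,
                 sim_time_rem, sim_score_diff,
                 current_season=2018, time_scale=300.0, score_scale=14.0):
    """Hypothetical weight for one historical drive given the simulated situation."""
    # Age: more recent seasons count more
    w_age = 1.0 / max(current_season - season, 1)
    # Time: favor drives that started with a similar amount of time left in the half
    w_time = math.exp(-abs(left_in_half - sim_time_rem) / time_scale)
    # Score: favor drives played at a similar score margin
    w_score = math.exp(-abs(score_diff - sim_score_diff) / score_scale)
    return w_age * w_time * w_score

# A 2017 drive that started with 120 s left, compared against a simulated
# situation with 120 s left and a tied score
recent = drive_weight(2017, 120, 0, sim_time_rem=120, sim_score_diff=0)
old = drive_weight(2010, 120, 0, sim_time_rem=120, sim_score_diff=0)
far = drive_weight(2017, 1800, 0, sim_time_rem=120, sim_score_diff=0)
```

Multiplying the factors keeps each one easy to switch off by setting it to 1.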

In [119]:
gamedata = pd.read_csv('../data/espn_gamedata2009-2017.csv')
gamedata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2306 entries, 0 to 2305
Data columns (total 10 columns):
gameId        2306 non-null int64
result        2306 non-null object
season        2306 non-null int64
week          2306 non-null int64
home          2297 non-null object
away          2297 non-null object
winner        2306 non-null object
home_score    2306 non-null object
away_score    2306 non-null object
OT            2306 non-null object
dtypes: int64(3), object(7)
memory usage: 180.2+ KB


In [124]:
alldrives = alldrives.merge(
                right=gamedata[['gameId','season','week']],
                how='left',
                left_on='gameId',
                right_on='gameId')

In [135]:
def get_drive_weights(drives_df):
    """Weight each drive by recency: 1 / (2018 - season)."""
    drives = drives_df.index.values.tolist()
    drive_weights = [ 1 for d in drives ]
    for i, d in enumerate(drives):
        w_age = 1 / (2018 - drives_df.loc[d,'season'])
        w_time = 1   # placeholder: time remaining in half
        w_score = 1  # placeholder: score difference
        drive_weights[i] = w_age * w_time * w_score
        
    return drive_weights

In [141]:
game = simulate_game("TB",'NO')
print(game.result)
cols = [ 'home','away','offense','half','time_rem','time','result',
         'home_score_after','away_score_after' ]
print(game.drives_selected[['season','week']])
game.gamestate_df[cols]

['TB', 14, 'NO', 41]
      season week
28166   2013   11
52568   2017   13
40993   2015   13
50528   2017    7
45696   2016   10
45716   2016   10
39155   2015    8
33865   2014   10
52380   2017   12
42175   2015   17
50221   2017    6
15356   2011   10
42590   2016    1
37499   2015    3
50531   2017    7
18132   2011   17
22025   2012   11
49748   2017    4
49828   2017    5
50222   2017    6
39137   2015    8
48821   2017    2
21570   2012   10
48821   2017    2
52968   2017   14
45633   2016   10
48707   2017    1
46835   2016   13
45129   2016    8
37517   2015    3
43538   2016    3
50245   2017    6


Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score_after,away_score_after
0,TB,NO,TB,1,1800,109,Punt,0,0
0,TB,NO,NO,1,1691,241,Touchdown,0,6
0,TB,NO,TB,1,1450,67,Fumble Touchdown,0,13
0,TB,NO,TB,1,1383,203,Touchdown,7,13
0,TB,NO,NO,1,1180,131,Interception,7,13
0,TB,NO,TB,1,1049,115,Punt,7,13
0,TB,NO,NO,1,934,104,Punt,7,13
0,TB,NO,TB,1,830,183,Punt,7,13
0,TB,NO,NO,1,647,86,Punt,7,13
0,TB,NO,TB,1,561,229,Touchdown,14,13


### Pieces to add:
Rather than randomly choosing drives that happen to end in 'End of Half', I should estimate the probability that a drive ends in 'End of Half' from how much time is remaining.
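
A minimal sketch of how that probability could be estimated empirically, assuming the `left_in_half` and `EoH` columns mean what their names suggest. The toy rows and the bucket edges are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with the drive data's column names
toy = pd.DataFrame({
    "left_in_half": [1700, 900, 400, 110, 90, 45, 30, 20],
    "EoH":          [0,    0,   0,   0,   1,  1,  0,  1],
})

# Bucket drives by time left in the half when the drive started
bins = [0, 60, 120, 600, 1800]
toy["time_bucket"] = pd.cut(toy["left_in_half"], bins=bins)

# The mean of the 0/1 EoH flag in each bucket is the empirical probability
p_eoh = toy.groupby("time_bucket", observed=False)["EoH"].mean()
```

The simulation could then look up the bucket for the current `time_rem` and decide whether the next drive ends the half before choosing a specific drive.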

Tune the weights that decide which drives are more likely. There is a lot to think about here.