# Conceit
I want to simulate NFL games with a Monte Carlo method. The idea is to sample drives from each team's history and stitch them together into a complete football game. In its simplest form, this means choosing drive outcomes uniformly at random from a team's history. In a more realistic model, the game situation should guide which drive outcomes are more likely: a drive starting with 20 seconds to go in the half is much less likely to result in a touchdown or field goal than an opening drive.

In this document, I aim to use drive-level history and this kind of Monte Carlo simulation to find and implement a reasonable way to model NFL football games.

In [41]:
# Load relevant packages
import pandas as pd
import numpy as np
import math

In [2]:
# Read drive-level data from csv
alldrives = pd.read_csv('../data/espn_drives2009-2017.csv')
alldrives.sample(5)

Unnamed: 0.1,Unnamed: 0,away,away_score_after,away_score_before,drive,home,home_score_after,home_score_before,offense,plays,...,uid,TD,FG,punt,turnover,EoH,secs_rem,starting_fieldposition,time_in_secs,left_in_half
45041,45041,CLE,7,7,1,NYJ,0,0,NYJ,3,...,400874612-1,0,0,1,0,0,3405.0,0.0,134,1605.0
53382,53382,WSH,7,7,2,ARI,3,0,ARI,16,...,400951809-2,0,1,0,0,0,3510.0,0.0,583,1710.0
4795,4795,HOU,24,24,7,SEA,0,0,SEA,6,...,291213034-7,0,0,1,0,0,2572.0,-30.0,159,772.0
23385,23385,CLE,21,21,26,WSH,38,38,CLE,4,...,321216005-26,0,0,0,1,0,294.0,-19.0,86,294.0
23047,23047,PIT,0,0,4,SD,3,3,PIT,6,...,321209023-4,0,0,1,0,0,2880.0,-29.0,194,1080.0


In [3]:
alldrives.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54298 entries, 0 to 54297
Data columns (total 24 columns):
Unnamed: 0                54298 non-null int64
away                      54298 non-null object
away_score_after          54298 non-null int64
away_score_before         54298 non-null int64
drive                     54298 non-null int64
home                      54298 non-null object
home_score_after          54298 non-null int64
home_score_before         54298 non-null int64
offense                   54298 non-null object
plays                     54298 non-null int64
result                    54298 non-null object
time                      54298 non-null object
yds_gained                54298 non-null int64
gameId                    54298 non-null int64
uid                       54298 non-null object
TD                        54298 non-null int64
FG                        54298 non-null int64
punt                      54298 non-null int64
turnover                  54298 non-nul

Given the game situation, write a function that builds the set of drives that could come next. Then choose one of those drives.

In the simple case, the game situation means the home team, the away team, whether the home team has possession, and how much time remains in the half. To build the list of possible next drives, keep every historical drive where the simulated offense had the ball, plus every drive run against the simulated defense (that is, drives from games involving the defensive team where that team was not on offense). In addition, the drive's duration must be <= the time remaining in the simulated game's half.

In [4]:
# game_situation (home, away, home_poss, time_rem)

home = 'CAR'
away = 'MIA'
home_poss = True
time_rem = 150

# Set some keywords for the filter
if home_poss:
    off_team = home
    def_team = away  
else:
    off_team = away
    def_team = home
    
# Filter for drives by this offense or against this defense
teamdrives = alldrives.loc[ 
           # Condition 1: team is the offense
           ( 
             alldrives.offense.astype(str) == off_team 
           ) 
           |  # OR
           # Condition 2:
           ( # Defensive team is home or away
             ( 
               (alldrives.home.astype(str) == def_team) |
               (alldrives.away.astype(str) == def_team)
             ) 
             &  # AND
               # not the offense
             (alldrives.offense.astype(str) != def_team)
           )
         ]

possible_drives = teamdrives[ teamdrives.time_in_secs <= time_rem ]

possible_drives.sample(5)

Unnamed: 0.1,Unnamed: 0,away,away_score_after,away_score_before,drive,home,home_score_after,home_score_before,offense,plays,...,uid,TD,FG,punt,turnover,EoH,secs_rem,starting_fieldposition,time_in_secs,left_in_half
25467,25467,MIA,10,10,9,ATL,20,13,ATL,4,...,330922015-9,1,0,0,0,0,1752.0,38.0,92,1752.0
53689,53689,KC,29,29,20,MIA,13,13,KC,4,...,400951625-20,0,1,0,0,0,152.0,19.0,37,152.0
2706,2706,ARI,7,7,11,CAR,28,28,CAR,3,...,291101022-11,0,0,0,0,1,1883.0,-7.0,83,83.0
42510,42510,CAR,10,7,6,TB,3,3,CAR,6,...,400791610-6,0,1,0,0,0,2503.0,-10.0,134,703.0
51275,51275,CAR,14,7,11,ATL,10,10,CAR,4,...,400951749-11,1,0,0,0,0,1875.0,19.0,52,75.0


Now, given a list of possible drives, assign a weight to each, and choose one based on the weights.

In [5]:
# Assign weight to each drive. Start simple: every drive is equally likely
drives = possible_drives.index.values.tolist()
drive_weights = [ 1 for d in drives ]

In [107]:
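import random

# Function to return one item from a list, where each has a weight
def select(container, weights):
    """Return one element of container, chosen with probability
    proportional to its weight."""
    total_weight = float(sum(weights))
    if len(container) == 0 or total_weight <= 0:
        raise ValueError("select needs at least one element with positive weight")

    # Walk the cumulative distribution until it passes the random draw
    r = random.random()
    cumulative = 0.0
    for element, w in zip(container, weights):
        cumulative += w / total_weight
        if r <= cumulative:
            print(element, w / total_weight)
            break

    # If round-off kept the loop from breaking, this is the last element
    return element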
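
The same kind of weighted draw is also available in the standard library as `random.choices`, which takes relative weights directly. A minimal sketch, with made-up drive ids and weights standing in for `drives` and `drive_weights` (the helper name `select_with_choices` is hypothetical and is not used elsewhere in this notebook):

```python
import random

def select_with_choices(container, weights, rng=random):
    """Weighted draw of one element (hypothetical helper for illustration)."""
    # random.choices returns a list of k draws; take the single element
    return rng.choices(container, weights=weights, k=1)[0]

# Illustrative ids and weights only, not real drives
example_ids = [101, 102, 103]
example_weights = [0, 5, 0]   # only id 102 has a nonzero weight
picked = select_with_choices(example_ids, example_weights)
print(picked)
```

Unlike the hand-written loop, this does not print the chosen element's relative weight, so the debugging output would need to be added separately.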

In [7]:
drive_id = select(drives, drive_weights)
possible_drives.loc[drive_id,:]

Unnamed: 0                       28541
away                               MIA
away_score_after                    10
away_score_before                   10
drive                               11
home                               CAR
home_score_after                     3
home_score_before                    3
offense                            CAR
plays                                3
result                            Punt
time                              1:24
yds_gained                           0
gameId                       331124015
uid                       331124015-11
TD                                   0
FG                                   0
punt                                 1
turnover                             0
EoH                                  0
secs_rem                          2161
starting_fieldposition             -44
time_in_secs                        84
left_in_half                       361
Name: 28541, dtype: object

In [21]:
def get_possible_drives(home,away,home_poss,time_rem):

    # Set some keywords for the filter
    if home_poss:
        off_team = home
        def_team = away  
    else:
        off_team = away
        def_team = home
    
    # Filter for drives by this offense or against this defense
    teamdrives = alldrives.loc[ 
               # Condition 1: team is the offense
               ( 
                 alldrives.offense.astype(str) == off_team 
               ) 
               |  # OR
               # Condition 2:
               ( # Defensive team is home or away
                 ( 
                   (alldrives.home.astype(str) == def_team) |
                   (alldrives.away.astype(str) == def_team)
                 ) 
                 &  # AND
                   # not the offense
                 (alldrives.offense.astype(str) != def_team)
               )
             ]

    poss_ds = teamdrives[ teamdrives.time_in_secs <= time_rem ]
    
    # Cut out End of Half drives unless they're relevant
    possible_drives = poss_ds[ ~(poss_ds.EoH==1) | ~(poss_ds.time_in_secs < time_rem - 20) ]

    return possible_drives


def get_drive_weights(drives_df):
    # Assign weight to each drive. Start simple: every drive is equally likely
    drives = drives_df.index.values.tolist()
    drive_weights = [ 1 for d in drives ]
        
    return drive_weights
    
    
# Function that takes game situation and returns a drive.
def next_drive(game_sit, game=None):
    """Takes a tuple describing the game situation.
    Returns a Series describing the next drive.
    If a football_game object is passed as game, its weights are used;
    otherwise every drive gets equal weight."""
    (home, away, home_poss, time_rem, home_score, away_score) = game_sit
    
    # Get df of possible drives
    poss_drives = get_possible_drives(home, away, home_poss, time_rem)
    
    # Get weights for the possible drives
    drive_ids = poss_drives.index.values.tolist()
    if game is None:
        weights = get_drive_weights(poss_drives)
    else:
        weights = game.get_drive_weights(poss_drives)
    
    # Randomly choose a drive, with weights assigned to each.
    chosen_drive_id = select(drive_ids, weights)
    
    return poss_drives.loc[chosen_drive_id, :]    

In [9]:
# Test the next_drive function
game_sit = ( "CAR", # home
             "MIA", # away
             True,  # home has possession
             200,   # seconds remaining
             10,    # home score
             10,    # away score
           )

next_drive(game_sit)

Unnamed: 0                      33429
away                              CAR
away_score_after                    0
away_score_before                   0
drive                               1
home                               NO
home_score_after                    0
home_score_before                   0
offense                           CAR
plays                               8
result                           Punt
time                             3:18
yds_gained                         34
gameId                      400554309
uid                       400554309-1
TD                                  0
FG                                  0
punt                                1
turnover                            0
EoH                                 0
secs_rem                         3310
starting_fieldposition            -23
time_in_secs                      198
left_in_half                     1510
Name: 33429, dtype: object

In [111]:
# Define a class that constitutes a game
class football_game:
    """Class for representing a football game"""
    def __init__(self,home,away):
        # Set some initial values
        self.home = home
        self.away = away
        self.half = 1
        self.time_rem = 1800
        self.home_score = 0
        self.away_score = 0
        
        # Decide which team gets the ball to start
        coin = random.randint(1,2)
        if coin == 1:
            self.home_poss = True
        else:
            self.home_poss = False
            
            
    def get_drive_weights(self, drives_df):
        drives = drives_df.index.values.tolist()
        drive_weights = [ 1 for d in drives ]
        # Get current score difference
        if self.home_poss:
            curr_score_diff = self.home_score - self.away_score
        else:
            curr_score_diff = self.away_score - self.home_score
        
        print("\n","Getting weights for drives")
        for i, d in enumerate(drives):
            drive = drives_df.loc[d,:]
            # Older drives count less. 'season' comes from merging in the game data.
            w_age = 1 / (2018 - drive['season'])
            # Gaussian function for selecting drives with similar time remaining in half
            # (64800 = 2 * 180**2, i.e. a width of 180 seconds)
            w_time = math.exp( ( drive['left_in_half'] - self.time_rem )**2 / -64800 )

            # Get score difference before this drive, from the offense's point of view
            if drive['offense'] == drive['home']:
                hist_score_diff = drive['home_score_before'] - drive['away_score_before']
            elif drive['offense'] == drive['away']:
                hist_score_diff = drive['away_score_before'] - drive['home_score_before']
            else:
                # Offense matches neither team: force a nan so the weight is zeroed below
                hist_score_diff = float('nan')
                
            # Want Gaussian for drives with similar score situations
            # (98 = 2 * 7**2, i.e. a width of 7 points)
            w_score = math.exp( ( curr_score_diff - hist_score_diff )**2 / -98 )
            
            # Finally, set weight as product of the other pieces
            weight = w_age * w_time * w_score
            
            # Try and catch nans
            if math.isnan(weight):
                weight = 0
                
            drive_weights[i] = weight
            
#            if d in (54265, 54266, 54267):
#                print("weights for",d,":", w_age, w_time, drive_weights[i])
                
        print("sum of weights = ",sum(drive_weights))
            
        return drive_weights
    
    
    def next_drive(self):
        """Uses the game's current situation.
        Returns a Series describing the next drive."""
    
        # Get df of possible drives
        poss_drives = get_possible_drives(self.home, self.away, self.home_poss, self.time_rem)
    
        # Get weights for the possible drives
        drive_ids = poss_drives.index.values.tolist()
        weights = self.get_drive_weights(poss_drives)
    
        # Randomly choose a drive, with weights assigned to each.
        chosen_drive_id = select(drive_ids, weights)
    
        return poss_drives.loc[chosen_drive_id, :]
            
            
    def get_game_sit(self):
        game_sit = (self.home,
                    self.away,
                    self.home_poss,
                    self.time_rem,
                    self.home_score,
                    self.away_score )
        return game_sit
    
    def game_sit_series(self, drive):
        # Figure out which team has possession
        if self.home_poss:
            possessor = self.home
        else:
            possessor = self.away
        
        sit_dict = {'home':self.home,
                    'away':self.away,
                    'offense':possessor,
                    'half':self.half,
                    'time_rem':self.time_rem,
                    'home_score':self.home_score,
                    'away_score':self.away_score,
                    'result':drive.result}
        return pd.Series(sit_dict)
    
    
    def game_sit_dict(self):
        # Figure out which team has possession
        if self.home_poss:
            possessor = self.home
        else:
            possessor = self.away
        
        sit_dict = {'home':self.home,
                    'away':self.away,
                    'offense':possessor,
                    'half':self.half,
                    'time_rem':self.time_rem,
                    'home_score':self.home_score,
                    'away_score':self.away_score}
        return sit_dict
    
    
    def update_game_sit(self,drive):
        """Takes a Series and updates game situation vars accordingly"""
        
        # Update clock and score
        self.time_rem -= drive.time_in_secs
        self.home_score += drive.home_score_after - drive.home_score_before
        self.away_score += drive.away_score_after - drive.away_score_before
        
        # Flip the possession arrow
        if self.home_poss:
            self.home_poss = False
        else:
            self.home_poss = True
            
            
    def record_drive(self,drive,drive_num=1):
        """Given a drive, update the proper quantities, 
        assuming dataframes for chosen drives and game history have 
        already been created"""
        # Get gamestate before this drive
        gamestate = self.game_sit_dict()
        
        # Clock changes
        gamestate_delta = {'time':drive.time_in_secs}
        if drive.time_in_secs < 10:
            gamestate_delta['time'] = 10
        
        # Score changes
        # Home team in selected drive might not be home team in sim. game.
        # Flip the scores when the drive's offense was on the other side
        # (home vs. away) from the simulated offense. A drive whose offense
        # matches neither team is treated as an away-team drive.
        drive_off_is_home = (drive.offense == drive.home)
        flip = (self.home_poss != drive_off_is_home)
            
        if not flip:
            gamestate_delta['home_score'] = drive.home_score_after - drive.home_score_before
            gamestate_delta['away_score'] = drive.away_score_after - drive.away_score_before
        else:
            gamestate_delta['away_score'] = drive.home_score_after - drive.home_score_before
            gamestate_delta['home_score'] = drive.away_score_after - drive.away_score_before

        # Check for implausible values in score delta (negative or more than 8 points)
        scores_delta = (gamestate_delta['home_score'], gamestate_delta['away_score'])
        if any(val < 0 or val > 8 for val in scores_delta):
            # Recalculate score change based on drive result
            # Default to zero points
            gamestate_delta['home_score'] = 0
            gamestate_delta['away_score'] = 0
            if (drive.FG == 1):
                if self.home_poss:
                    gamestate_delta['home_score'] = 3
                else:
                    gamestate_delta['away_score'] = 3
            elif (drive.TD == 1):
                if self.home_poss:
                    gamestate_delta['home_score'] = 7
                else:
                    gamestate_delta['away_score'] = 7
                
        
        # Figure out whether possession changes. Default True.
        # If the defense scored on this drive, possession is treated as
        # staying with the offense (as after a return touchdown).
        gamestate_delta['poss'] = True
        if self.home_poss and (gamestate_delta['away_score'] != 0):
            gamestate_delta['poss'] = False
        elif (not self.home_poss) and (gamestate_delta['home_score'] != 0):
            gamestate_delta['poss'] = False
                    
#        # Add chosen drive to appropriate dataframe
#        drivedf = pd.Series.to_frame(drive)
#        dfs = [self.drives_selected, drivedf]
#        self.drives_selected = pd.concat( dfs, axis=1 )
        
        # Add entry to simulated game history
        this_series = pd.Series(gamestate)
        this_series['home_score_after'] = self.home_score + gamestate_delta['home_score']
        this_series['away_score_after'] = self.away_score + gamestate_delta['away_score']
        this_series['result'] = drive.result
        this_series['time'] = gamestate_delta['time']
        
#        # Additional for debugging
#        this_series['drive_hsb'] = drive.home_score_before
#        this_series['drive_hsa'] = drive.home_score_after
#        this_series['drive_asb'] = drive.away_score_before
#        this_series['drive_asa'] = drive.away_score_after
        
        if drive_num == 1:  # Need to start gamestate dataFrame
            self.gamestate_df = pd.Series.to_frame(this_series)
        else:            # Add this series to gamestate dF
            series_df = pd.Series.to_frame(this_series)
            dfs = [ self.gamestate_df, series_df ]
            self.gamestate_df = pd.concat( dfs, axis=1 )
        
        # Update the game's state vars
        self.home_score += gamestate_delta['home_score']
        self.away_score += gamestate_delta['away_score']
        self.time_rem -= gamestate_delta['time']
        if gamestate_delta['poss']:
            self.home_poss = not self.home_poss
    
    
    def check_for_EoH(self):
        pass

In [31]:
def simulate_game(home,away):

    # Need new wrapper for simulating a game
    newgame = football_game( home, away )
    
    # Assign possible drives for this game
    newgame.home_drives = get_possible_drives( home, away, True, 1800 )
    newgame.away_drives = get_possible_drives( home, away, False, 1800 )

    # Choose the first drive
#    game_sit = newgame.get_game_sit()
    first_drive = newgame.next_drive()

    # Make drive history DF, starting with this first drive.
    newgame.drives_selected = pd.Series.to_frame(first_drive)

    # Update game object after the first drive
    drive_num = 1
    newgame.record_drive( first_drive, drive_num )

    #while newgame.time_rem.astype(float) >= 5:
    for half in (1,2):
        newgame.half = half
        if half > 1:
            newgame.time_rem = 1800
    
        end_of_half = False
        while (not end_of_half) and (newgame.time_rem > 0):
            drive_num += 1
    
            # Choose a new drive
            this_drive = newgame.next_drive()
            # Add drive to chosen drives dataframe
            newgame.drives_selected = pd.concat( [newgame.drives_selected, this_drive], axis=1 )
        
            # Update the game object
            newgame.record_drive( this_drive, drive_num )
        
            # Check for end of Half
            if this_drive.EoH == 1:
                end_of_half = True
        
    
    # Post-game, need to transpose the dataFrames
    newgame.drives_selected = newgame.drives_selected.transpose()
    newgame.gamestate_df = newgame.gamestate_df.transpose()
    
    newgame.result = [newgame.home, newgame.home_score, newgame.away, newgame.away_score]
    
    return newgame

In [51]:
game = simulate_game("TB",'NO')
print(game.result)
cols = [ 'home','away','offense','half','time_rem','time','result',
         'home_score','home_score_after','away_score','away_score_after' ]
game.gamestate_df[cols]

['TB', 112, 'NO', 16]


Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score,home_score_after,away_score,away_score_after
0,TB,NO,NO,1,1800,201,Punt,0,0,0,0
0,TB,NO,TB,1,1599,109,Touchdown,0,8,0,0
0,TB,NO,NO,1,1490,61,Punt,8,8,0,0
0,TB,NO,TB,1,1429,109,Touchdown,8,16,0,0
0,TB,NO,NO,1,1320,172,Punt,16,16,0,0
0,TB,NO,TB,1,1148,109,Touchdown,16,24,0,0
0,TB,NO,NO,1,1039,120,Punt,24,24,0,0
0,TB,NO,TB,1,919,109,Touchdown,24,32,0,0
0,TB,NO,NO,1,810,187,Punt,32,32,0,0
0,TB,NO,TB,1,623,109,Touchdown,32,40,0,0


In [16]:
for i in range(10):
    game = simulate_game("CAR","PHI")
    print(game.result)

['CAR', 19, 'PHI', 33]
['CAR', 41, 'PHI', 31]
['CAR', 24, 'PHI', 13]
['CAR', 23, 'PHI', 27]
['CAR', 17, 'PHI', 64]
['CAR', 22, 'PHI', 27]
['CAR', 27, 'PHI', 46]
['CAR', 14, 'PHI', 5]
['CAR', 16, 'PHI', 10]
['CAR', 41, 'PHI', 45]


### Figure out a slightly less simple weighting function
- Look at time remaining in the half.
- Look at how old the drive is.
- Maybe look at the score difference.
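
One way to combine these three ideas is to multiply a recency factor by Gaussian kernels on the time and score differences. The `get_drive_weights` method in the `football_game` class divides the squared differences by 64800 and 98, which correspond to widths of 180 seconds and 7 points if the kernel is written as exp(-delta² / (2·sigma²)). The sketch below spells that out in plain Python; `gaussian_kernel` and `drive_weight` are hypothetical helper names, and the default widths and reference year are assumptions to be tuned, not settled values.

```python
import math

def gaussian_kernel(delta, sigma):
    """Unnormalized Gaussian: 1 when delta is 0, falling off with width sigma."""
    return math.exp(-(delta ** 2) / (2 * sigma ** 2))

def drive_weight(season, left_in_half, score_diff,
                 sim_time_rem, sim_score_diff,
                 ref_year=2018, sigma_time=180.0, sigma_score=7.0):
    """Weight for one historical drive (hypothetical helper; defaults are assumptions)."""
    w_age = 1.0 / (ref_year - season)   # more recent drives count more
    w_time = gaussian_kernel(left_in_half - sim_time_rem, sigma_time)
    w_score = gaussian_kernel(score_diff - sim_score_diff, sigma_score)
    return w_age * w_time * w_score

# A 2017 drive in exactly the simulated situation gets the maximum weight of 1
print(drive_weight(2017, 600, 3, sim_time_rem=600, sim_score_diff=3))
# A 2013 drive that started one sigma (180 s) away in time is weighted much less
print(drive_weight(2013, 780, 3, sim_time_rem=600, sim_score_diff=3))
```

Writing the widths as named parameters makes it easier to experiment with how sharply the model should prefer drives from similar situations.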

In [17]:
gamedata = pd.read_csv('../data/espn_gamedata2009-2017.csv')
gamedata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2306 entries, 0 to 2305
Data columns (total 10 columns):
gameId        2306 non-null int64
result        2306 non-null object
season        2306 non-null int64
week          2306 non-null int64
home          2297 non-null object
away          2297 non-null object
winner        2306 non-null object
home_score    2306 non-null object
away_score    2306 non-null object
OT            2306 non-null object
dtypes: int64(3), object(7)
memory usage: 180.2+ KB


In [18]:
alldrives = alldrives.merge(
                right=gamedata[['gameId','season','week']],
                how='left',
                left_on='gameId',
                right_on='gameId')

In [113]:
game = simulate_game("TB",'NO')
print(game.result)
cols = [ 'home','away','offense','half','time_rem','time','result',
         'home_score_after','away_score_after' ]
print(game.drives_selected[['season','week']])
game.gamestate_df[cols]


 Getting weights for drives
sum of weights =  112.862252662
52821 0.00878832052248

 Getting weights for drives
sum of weights =  107.7788778
35673 0.00226640922682

 Getting weights for drives
sum of weights =  101.355465875
52386 0.000711364813812

 Getting weights for drives
sum of weights =  114.029166536
33397 0.00106005898904

 Getting weights for drives
sum of weights =  102.758569741
25979 0.00177418685004

 Getting weights for drives
sum of weights =  112.697090933
37453 0.00164201995265

 Getting weights for drives
sum of weights =  105.156356093
52563 0.00416649491493

 Getting weights for drives
sum of weights =  117.673310472
52965 0.0045612896205

 Getting weights for drives
sum of weights =  123.112114019
47540 0.00225851407152

 Getting weights for drives
sum of weights =  108.33178074
53335 0.00578439877399

 Getting weights for drives
sum of weights =  98.0587573266
1252 0.000641913661464

 Getting weights for drives
sum of weights =  102.058044357
46382 0.0026916706

Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score_after,away_score_after
0,TB,NO,NO,1,1800,96,Touchdown,0,7
0,TB,NO,TB,1,1704,106,Punt,0,7
0,TB,NO,NO,1,1598,146,Punt,0,7
0,TB,NO,TB,1,1452,123,Downs,0,7
0,TB,NO,NO,1,1329,55,Punt,0,7
0,TB,NO,TB,1,1274,185,Punt,0,7
0,TB,NO,NO,1,1089,87,Punt,0,7
0,TB,NO,TB,1,1002,270,Touchdown,7,7
0,TB,NO,NO,1,732,204,Touchdown,7,14
0,TB,NO,TB,1,528,106,Punt,7,14


### Pieces to add:
Rather than letting drives that end in 'End of Half' be picked at random like any other drive, I should estimate the probability that a drive ends in 'End of Half' as a function of how much time is remaining.

Tune the weights that decide which drives are more likely. There is a lot to think about here.
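
For the End of Half idea, one possible approach is to estimate from history the share of drives that ended the half within each bucket of time remaining, then draw the End of Half outcome from that probability before sampling a drive. A rough sketch, assuming rows of `(left_in_half, EoH)` pairs like the columns in `alldrives`; the 30-second bucket width, the toy data, and both helper names are illustrative assumptions:

```python
import random
from collections import defaultdict

def eoh_probability_by_bucket(rows, bucket_secs=30):
    """Map each time-remaining bucket to the observed share of drives that ended the half."""
    counts = defaultdict(lambda: [0, 0])   # bucket -> [EoH drives, all drives]
    for left_in_half, eoh in rows:
        bucket = int(left_in_half // bucket_secs)
        counts[bucket][0] += eoh
        counts[bucket][1] += 1
    return {b: n_eoh / n_all for b, (n_eoh, n_all) in counts.items()}

def drive_ends_half(time_rem, probs, bucket_secs=30, rng=random):
    """Draw True/False for 'this drive runs out the half'; unseen buckets default to 0."""
    p = probs.get(int(time_rem // bucket_secs), 0.0)
    return rng.random() < p

# Toy data, not real drives: (seconds left in half, EoH flag)
toy_rows = [(20, 1), (25, 1), (28, 0), (50, 1), (55, 0), (400, 0), (410, 0)]
probs = eoh_probability_by_bucket(toy_rows)
print(probs)
print(drive_ends_half(22, probs))
```

With real data, sparse buckets would give noisy estimates, so wider buckets or some smoothing would probably be needed.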