# Conceit
I want to simulate NFL games with a Monte Carlo method. The idea is to draw drives from a team's history and chain them together into a complete football game. In its simplest form, this means choosing drive outcomes uniformly at random from a team's history. In a more realistic model, the game situation should guide which drive outcomes are more likely: a drive starting with 20 seconds to go in the half is much less likely to end in a touchdown or field goal than an opening drive.

In this document, I aim to use drive-level history and this kind of Monte Carlo simulation to find and implement a reasonable way to model NFL football games.
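
As a minimal sketch of the simplest form of this idea (not the model built in the rest of this document), the snippet below samples drives uniformly at random from a tiny, made-up history until the clock runs out. The `HISTORY` table, the team labels, and `simulate_half` are hypothetical names used only for illustration, and the toy model assumes possession changes after every drive.

```python
import random

# Hypothetical drive histories: (seconds used, points scored) per drive.
# These numbers are made up for illustration only.
HISTORY = {
    "HOME": [(140, 0), (215, 3), (270, 7), (95, 0), (32, 0)],
    "AWAY": [(160, 7), (84, 0), (338, 3), (120, 0), (55, 0)],
}

def simulate_half(rng, offense, seconds=1800):
    """Play one half by sampling drives uniformly until the clock runs out."""
    score = {"HOME": 0, "AWAY": 0}
    while seconds > 0:
        used, points = rng.choice(HISTORY[offense])
        seconds -= used            # the last drive may overrun the clock in this toy model
        score[offense] += points
        # Assumption: possession changes after every drive
        offense = "AWAY" if offense == "HOME" else "HOME"
    return score

rng = random.Random(0)             # seeded so the run is repeatable
first = simulate_half(rng, "HOME")
second = simulate_half(rng, "AWAY")
final = {team: first[team] + second[team] for team in first}
print(final)
```

Every drive here is equally likely regardless of the clock or the score, which is exactly the limitation the situation-dependent weights are meant to address.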

In [1]:
# Load relevant packages
import pandas as pd
import numpy as np
import math

In [2]:
# Read drive-level data from csv
alldrives = pd.read_csv('../data/espn_drives2009-2017.csv')
alldrives.sample(5)

Unnamed: 0.1,Unnamed: 0,away,away_score_after,away_score_before,drive,home,home_score_after,home_score_before,offense,plays,...,uid,TD,FG,punt,turnover,EoH,secs_rem,starting_fieldposition,time_in_secs,left_in_half
11465,11465,ARI,21,21,17,DAL,13,13,ARI,3,...,301225022-17,0,0,1,0,0,1300.0,-37.0,55,1300.0
31762,31762,IND,14,7,3,TEN,0,0,IND,11,...,400554238-3,1,0,0,0,0,3126.0,0.0,374,1326.0
12747,12747,NO,30,30,25,CHI,13,13,NO,1,...,310918018-25,0,0,0,1,0,256.0,23.0,32,256.0
26978,26978,PIT,10,10,10,BAL,6,6,BAL,8,...,331020023-10,0,0,1,0,0,1800.0,15.0,241,1800.0
31151,31151,NYG,0,0,2,ARI,10,7,ARI,10,...,400554211-2,0,1,0,0,0,3098.0,-21.0,307,1298.0


In [3]:
alldrives.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54298 entries, 0 to 54297
Data columns (total 24 columns):
Unnamed: 0                54298 non-null int64
away                      54298 non-null object
away_score_after          54298 non-null int64
away_score_before         54298 non-null int64
drive                     54298 non-null int64
home                      54298 non-null object
home_score_after          54298 non-null int64
home_score_before         54298 non-null int64
offense                   54298 non-null object
plays                     54298 non-null int64
result                    54298 non-null object
time                      54298 non-null object
yds_gained                54298 non-null int64
gameId                    54298 non-null int64
uid                       54298 non-null object
TD                        54298 non-null int64
FG                        54298 non-null int64
punt                      54298 non-null int64
turnover                  54298 non-nul

Given a game situation, write a function that builds the set of drives that could come next, then choose one of those drives.

In the simple case, the game situation consists of the home team, the away team, whether the home team is on offense, and how much time remains in the half. To build the list of possible next drives, keep every historical drive where the simulated offense had possession, or where the simulated defense was playing but did not have possession (that is, it was on defense). In addition, the drive's duration should be <= the time remaining in the simulated game's half.

In [4]:
# game_situation (home, away, home_poss, time_rem)

home = 'CAR'
away = 'MIA'
home_poss = True
time_rem = 150

# Set some keywords for the filter
if home_poss:
    off_team = home
    def_team = away  
else:
    off_team = away
    def_team = home
    
# Filter for plays for this offense and defense
teamdrives = alldrives.loc[ 
           # Condition 1: team is the offense
           ( 
             alldrives.offense.astype(str) == off_team 
           ) 
           |  # OR
           # Condition 2:
           ( # Defensive team is home or away
             ( 
               (alldrives.home.astype(str) == def_team) |
               (alldrives.away.astype(str) == def_team)
             ) 
             &  # AND
               # not the offense
             (alldrives.offense.astype(str) != def_team)
           )
         ]

possible_drives = teamdrives[ teamdrives.time_in_secs <= time_rem ]

possible_drives.sample(5)

Unnamed: 0.1,Unnamed: 0,away,away_score_after,away_score_before,drive,home,home_score_after,home_score_before,offense,plays,...,uid,TD,FG,punt,turnover,EoH,secs_rem,starting_fieldposition,time_in_secs,left_in_half
47796,47796,CAR,3,0,7,ATL,13,13,CAR,6,...,400874540-7,0,1,0,0,0,2509.0,6.0,92,709.0
17280,17280,BUF,7,7,13,MIA,13,13,BUF,3,...,311218002-13,0,0,1,0,0,2161.0,-36.0,98,361.0
43716,43716,ATL,48,41,23,CAR,33,33,CAR,1,...,400874664-23,0,0,0,1,0,74.0,-26.0,11,74.0
42352,42352,MIA,17,17,20,NE,10,10,NE,3,...,400791607-20,0,0,1,0,0,418.0,0.0,137,418.0
45800,45800,SD,24,24,22,MIA,31,31,SD,2,...,400874544-22,0,0,0,1,0,61.0,0.0,24,61.0


Now, given a list of possible drives, assign a weight to each, and choose one based on the weights.
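
As a hedged alternative to a hand-rolled cumulative-probability loop, the standard library's `random.choices` performs the same kind of weighted draw. The wrapper name `select_weighted` and the drive ids below are hypothetical; the explicit checks are an assumption about how empty or all-zero inputs should be handled.

```python
import random

def select_weighted(container, weights, rng=random):
    """Return one item, chosen with probability proportional to its weight."""
    if not container:
        raise ValueError("container is empty")
    if sum(weights) <= 0:
        raise ValueError("weights must sum to a positive number")
    # random.choices returns a list of k items; take the single draw
    return rng.choices(container, weights=weights, k=1)[0]

drive_ids = [101, 202, 303]      # made-up drive ids
weights = [1.0, 0.0, 3.0]        # drive 202 has zero weight
rng = random.Random(42)          # seeded so the run is repeatable
picks = [select_weighted(drive_ids, weights, rng) for _ in range(1000)]
print(picks.count(101), picks.count(202), picks.count(303))
```

A drive with weight zero is never selected, and the remaining drives appear roughly in proportion to their weights.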

In [5]:
# Assign a weight to each drive. Start simple: every drive gets weight 1
drives = possible_drives.index.values.tolist()
drive_weights = [ 1 for d in drives ]

In [6]:
import random

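# Function to return one item from a list, where each item has a weight
def select(container, weights):
    total_weight = float(sum(weights))
    if len(container) == 0 or total_weight <= 0:
        raise ValueError("select() needs elements with positive total weight")

    # Draw a point in [0, total_weight) and walk the running total of weights
    r = random.random() * total_weight
    running = 0.0
    for (element, w) in zip(container, weights):
        running += w
        if r < running:
            # Debug output: chosen element and its relative weight
            print("^", element, w / total_weight)
            break

    # If rounding left r above the final running total, the last element is returned
    return element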

In [7]:
drive_id = select(drives, drive_weights)
possible_drives.loc[drive_id,:]

^ 18663 0.0005241090146750524


Unnamed: 0                      18663
away                               TB
away_score_after                    0
away_score_before                   0
drive                               0
home                              CAR
home_score_after                    0
home_score_before                   0
offense                           CAR
plays                               5
result                           Punt
time                             2:20
yds_gained                         13
gameId                      320909027
uid                       320909027-0
TD                                  0
FG                                  0
punt                                1
turnover                            0
EoH                                 0
secs_rem                         3600
starting_fieldposition            -30
time_in_secs                      140
left_in_half                     1800
Name: 18663, dtype: object

In [36]:
def get_possible_drives(home,away,home_poss,time_rem):

    # Set some keywords for the filter
    if home_poss:
        off_team = home
        def_team = away  
    else:
        off_team = away
        def_team = home
    
    # Filter for plays for this offense and defense
    teamdrives = alldrives.loc[ 
               # Condition 1: team is the offense
               ( 
                 alldrives.offense.astype(str) == off_team 
               ) 
               |  # OR
               # Condition 2:
               ( # Defensive team is home or away
                 ( 
                   (alldrives.home.astype(str) == def_team) |
                   (alldrives.away.astype(str) == def_team)
                 ) 
                 &  # AND
                   # not the offense
                 (alldrives.offense.astype(str) != def_team)
               )
             ]

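    # Keep only drives short enough to fit in the time remaining
    poss_ds = teamdrives[ teamdrives.time_in_secs <= time_rem ]
    
    # Cut out End of Half drives unless they're relevant
#    possible_drives = poss_ds[ ~(poss_ds.EoH==1) | 
#                               ~(poss_ds.time_in_secs < time_rem) ]

    # Return the time-filtered drives (not the unfiltered teamdrives)
    return poss_ds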


def get_drive_weights(drives_df):
    # Assign a weight to each drive. Start simple: every drive gets weight 1
    drives = drives_df.index.values.tolist()
    drive_weights = [ 1 for d in drives ]
        
    return drive_weights
    
    
# Function that takes game situation and returns a drive.
def next_drive(game_sit, game=None):
    """Takes a tuple describing the game situation.
    Returns a Series describing the next drive.
    If a football_game is passed as `game`, its weights are used;
    otherwise every drive gets the same weight."""
    (home, away, home_poss, time_rem, home_score, away_score) = game_sit
    
    # Get df of possible drives
    poss_drives = get_possible_drives(home, away, home_poss, time_rem)
    
    # Get weights for the possible drives
    drive_ids = poss_drives.index.values.tolist()
    if game is None:
        weights = get_drive_weights(poss_drives)
    else:
        weights = game.get_drive_weights(poss_drives)
    
    # Randomly choose a drive, with weights assigned to each.
    chosen_drive_id = select(drive_ids, weights)
    
    return poss_drives.loc[chosen_drive_id, :]

In [9]:
# Test the next_drive function
game_sit = ( "CAR", # home
             "MIA", # away
             True,  # home has possession
             200,   # seconds remaining
             10,    # home score
             10,    # away score
           )

next_drive(game_sit)

NameError: name 'next_drive' is not defined

In [48]:
# Define a class that constitutes a game
class football_game:
    """Class for representing a football game"""
    def __init__(self,home,away,season=2018):
        # Set some initial values
        self.home = home
        self.away = away
        self.season = season
        self.half = 1
        self.time_rem = 1800
        self.home_score = 0
        self.away_score = 0
        self.drive_base_weights = {}
        
        # Decide which team gets the ball to start (fair coin toss)
        self.home_poss = (random.randint(1,2) == 1)
            
    
    def get_drive_base_weights(self, drives_df):
        """Return a dictionary with base weights for past plays"""
        drives = drives_df.index.values.tolist()
        base_weights = {}
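        for i, d in enumerate(drives):
            drive = drives_df.loc[d,:]
            # Older drives get less weight. Needs the 'season' column merged in
            # from the game data; max() avoids dividing by zero when a drive
            # is from the same season as the simulated game
            w_age = 1 / ( 2*max(1, self.season - drive['season']) )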
            
            # Place to add weight for home/away
            w_home = 1
            if (drive['home'] == self.home) or (drive['away'] == self.away):
                w_home = 2
                
            # Add weight to previous matchups of same teams
            w_matchup = 1
            teams = [drive['home'],drive['away']]
            if (self.home in teams) and (self.away in teams):
                w_matchup = 2
            
            base_weights[d] = w_age * w_home * w_matchup
            
        return base_weights
                 
            
    def get_drive_weights(self, drives_df):
        drives = drives_df.index.values.tolist()
        drive_weights = [ 1 for d in drives ]
        # Get current score difference
        if self.home_poss:
            curr_score_diff = self.home_score - self.away_score
        else:
            curr_score_diff = self.away_score - self.home_score        

#        print("\n","Getting weights for drives")
        for i, d in enumerate(drives):
            drive = drives_df.loc[d,:]
            w_base = self.drive_base_weights[d]
#            w_age = 1 / (2018 - drive['season'])

            # Gaussian function for selecting plays with similar time remaining in half
            w_time = 0            
            if (not drive.EoH==1) or (drive.time_in_secs < self.time_rem + 10):
                    
                numerator = drive['left_in_half'] - self.time_rem
                # Scale variance in Gaussian by time remaining. More possibilities with more time left
                stdev = self.time_rem / 3
                w_time = math.exp( (numerator)**2 / (-2*stdev**2) )

            # Get score difference (from the offense's point of view) before this drive
            if drive['offense'] == drive['home']:
                hist_score_diff = drive['home_score_before'] - drive['away_score_before']
            elif drive['offense'] == drive['away']:
                hist_score_diff = drive['away_score_before'] - drive['home_score_before']
            else:
                # Assumption: if the offense matches neither team, treat it as a tie
                hist_score_diff = 0
                
            # Want Gaussian for plays with similar score situations.
            # -98 = -2 * 7**2, i.e. a standard deviation of 7 points
            w_score = math.exp( ( curr_score_diff - hist_score_diff )**2 / -98 )
            
            # Finally, set weight as product of the other pieces
            weight = w_base * w_time * w_score
            
            # Try and catch nans
            if math.isnan(weight):
                weight = 0
                
            drive_weights[i] = weight
                
        print("sum of weights = ",sum(drive_weights))
            
        return drive_weights
    
    
    def next_drive(self):
#    def next_drive(game_sit, game=game):
        """Returns a Series describing the next drive."""
    
        # Get df of possible drives
#        poss_drives = get_possible_drives(self.home, self.away, 
#                                          self.home_poss, self.time_rem)
        poss_drives = self.away_drives
        if self.home_poss:
            poss_drives = self.home_drives
            
        # Choose subset of drives based on time remaining in 1st/2nd half
        if self.time_rem > 300:
            poss_drives = poss_drives[ poss_drives.left_in_half >= 300 ]
        elif self.half == 1:
            poss_drives = poss_drives[ (poss_drives.left_in_half < 300) &
                                       (poss_drives.secs_rem > 1800) ]
        else:
            poss_drives = poss_drives[ (poss_drives.left_in_half < 300) &
                                       (poss_drives.secs_rem < 1800) ]
    
        # Get weights for the possible drives
        drive_ids = poss_drives.index.values.tolist()
        weights = self.get_drive_weights(poss_drives)
    
        # Randomly choose a drive, with weights assigned to each.
        chosen_drive_id = select(drive_ids, weights)
    
        return poss_drives.loc[chosen_drive_id, :]
            
            
    def get_game_sit(self):
        game_sit = (self.home,
                    self.away,
                    self.home_poss,
                    self.time_rem,
                    self.home_score,
                    self.away_score )
        return game_sit
 

    def game_sit_series(self, drive):
        # Figure out which team has possession
        if self.home_poss:
            possessor = self.home
        else:
            possessor = self.away
        
        sit_dict = {'home':self.home,
                    'away':self.away,
                    'offense':possessor,
                    'half':self.half,
                    'time_rem':self.time_rem,
                    'home_score':self.home_score,
                    'away_score':self.away_score,
                    'result':drive.result}
        return pd.Series(sit_dict)
    
    
    def game_sit_dict(self):
        # Figure out which team has possession
        if self.home_poss:
            possessor = self.home
        else:
            possessor = self.away
        
        sit_dict = {'home':self.home,
                    'away':self.away,
                    'offense':possessor,
                    'half':self.half,
                    'time_rem':self.time_rem,
                    'home_score':self.home_score,
                    'away_score':self.away_score}
        return sit_dict
    
    
    def update_game_sit(self,drive):
        """Takes a Series and updates game situation vars accordingly"""
        
        # Update clock and score
        self.time_rem -= drive.time_in_secs
        self.home_score += drive.home_score_after - drive.home_score_before
        self.away_score += drive.away_score_after - drive.away_score_before
        
        # Flip the possession arrow
        self.home_poss = not self.home_poss
            
            
    def record_drive(self,drive,drive_num=1):
        """Given a drive, update the proper quantities, 
        assuming dataframes for chosen drives and game history have 
        already been created"""
        # Get gamestate before this drive
        gamestate = self.game_sit_dict()
        
        # Clock changes
        gamestate_delta = {'time':drive.time_in_secs}
        if drive.time_in_secs < 10:
            gamestate_delta['time'] = 10
        
        # Score changes
        # Home team in selected drive might not be home team in sim. game
        flip = False  # default, so that flip is always defined below
        if self.home_poss and (drive.offense == drive.home):
            flip = False
        elif self.home_poss and (drive.offense == drive.away):
            flip = True
        elif (not self.home_poss) and (drive.offense == drive.home):
            flip = True
        elif (not self.home_poss) and (drive.offense == drive.away):
            flip = False
        else:
            print("Something went wrong in determining flipped possession")
            
        if not flip:
            gamestate_delta['home_score'] = drive.home_score_after - drive.home_score_before
            gamestate_delta['away_score'] = drive.away_score_after - drive.away_score_before
        else:
            gamestate_delta['away_score'] = drive.home_score_after - drive.home_score_before
            gamestate_delta['home_score'] = drive.away_score_after - drive.away_score_before

        # Check for implausible score changes (negative, or more than 8 points)
        scores_delta = (gamestate_delta['home_score'], gamestate_delta['away_score'])
        if any(val < 0 or val > 8 for val in scores_delta):
            # Recalculate score change based on drive result
            # Default to zero points
            gamestate_delta['home_score'] = 0
            gamestate_delta['away_score'] = 0
            if (drive.FG == 1):
                if self.home_poss:
                    gamestate_delta['home_score'] = 3
                else:
                    gamestate_delta['away_score'] = 3
            elif (drive.TD == 1):
                if self.home_poss:
                    gamestate_delta['home_score'] = 7
                else:
                    gamestate_delta['away_score'] = 7
                
        
        # Figure out whether possession arrow changes. Default True.
        # If the team on defense scored during this drive, the arrow does not flip
        gamestate_delta['poss'] = True
        if self.home_poss and (gamestate_delta['away_score'] != 0):
            gamestate_delta['poss'] = False
        elif (not self.home_poss) and (gamestate_delta['home_score'] != 0):
            gamestate_delta['poss'] = False
                    
#        # Add chosen drive to appropriate dataframe
#        drivedf = pd.Series.to_frame(drive)
#        dfs = [self.drives_selected, drivedf]
#        self.drives_selected = pd.concat( dfs, axis=1 )
        
        # Add entry to simulated game history
        this_series = pd.Series(gamestate)
        this_series['home_score_after'] = self.home_score + gamestate_delta['home_score']
        this_series['away_score_after'] = self.away_score + gamestate_delta['away_score']
        this_series['result'] = drive.result
        this_series['time'] = gamestate_delta['time']
        
        if drive_num == 1:  # Need to start gamestate dataFrame
            self.gamestate_df = pd.Series.to_frame(this_series)
        else:            # Add this series to gamestate dF
            series_df = pd.Series.to_frame(this_series)
            dfs = [ self.gamestate_df, series_df ]
            self.gamestate_df = pd.concat( dfs, axis=1 )
        
        # Update the game's state vars
        self.home_score += gamestate_delta['home_score']
        self.away_score += gamestate_delta['away_score']
        self.time_rem -= gamestate_delta['time']
        if gamestate_delta['poss']:
            self.home_poss = not self.home_poss
    
    
    def check_for_EoH(self):
        pass

In [38]:
def simulate_game(home,away):

    # Need new wrapper for simulating a game
    newgame = football_game( home, away )
    
    # Assign possible drives for this game
    newgame.home_drives = get_possible_drives( home, away, True, 1800 )
    newgame.away_drives = get_possible_drives( home, away, False, 1800 )
    newgame.drive_base_weights = newgame.get_drive_base_weights(newgame.home_drives)
    newgame.drive_base_weights.update(newgame.get_drive_base_weights(newgame.away_drives))
#    newgame.home_drives['base_weight'] = newgame.home_drives.index.map(newgame.drive_base_weights)
#    newgame.away_drives['base_weight'] = newgame.away_drives.index.map(newgame.drive_base_weights)

    # Choose the first drive
#    game_sit = newgame.get_game_sit()
    first_drive = newgame.next_drive()

    # Make drive history DF, starting with this first drive.
    newgame.drives_selected = pd.Series.to_frame(first_drive)

    # Update game object after the first drive
    drive_num = 1
    newgame.record_drive( first_drive, drive_num )

    #while newgame.time_rem.astype(float) >= 5:
    for half in (1,2):
        newgame.half = half
        if half > 1:
            newgame.time_rem = 1800
    
        end_of_half = False
        while (not end_of_half) and (newgame.time_rem > 0):
            drive_num += 1
    
            # Choose a new drive
            this_drive = newgame.next_drive()
            # Add drive to chosen drives dataframe
            newgame.drives_selected = pd.concat( [newgame.drives_selected, this_drive], axis=1 )
        
            # Update the game object
            newgame.record_drive( this_drive, drive_num )
        
            # Check for end of Half
            if this_drive.EoH == 1:
                end_of_half = True
        
    
    # Post-game, need to transpose the dataFrames
    newgame.drives_selected = newgame.drives_selected.transpose()
    newgame.gamestate_df = newgame.gamestate_df.transpose()
    
    newgame.result = [newgame.home, newgame.home_score, newgame.away, newgame.away_score]
    
    return newgame

In [39]:
game = simulate_game("TB",'NO')
print(game.result)
cols = [ 'home','away','offense','half','time_rem','time','result',
         'home_score','home_score_after','away_score','away_score_after' ]
game.gamestate_df[cols]

sum of weights =  408.820195135
^ 49757 0.00121880926492
sum of weights =  437.245036574
^ 5482 0.000916749106923
sum of weights =  460.085791956
^ 53801 0.00339313887231
sum of weights =  364.894775393
^ 49237 0.00530489514959
sum of weights =  327.885861712
^ 50947 0.00231997458846
sum of weights =  190.722369798
^ 53347 0.000775058898831
sum of weights =  146.686500791
^ 53794 0.00595587067707
sum of weights =  142.16832867
^ 44906 0.00633992879027
sum of weights =  99.1594071902
^ 24618 0.000900004365869
sum of weights =  100.72402022
^ 53795 0.0164039790481
sum of weights =  78.208329179
^ 18920 0.00280129311022
sum of weights =  38.7607161116
^ 32150 0.0216645209205
sum of weights =  6.97169870427
^ 54267 0.0578008066578
sum of weights =  394.334198513
^ 50358 0.00440516654401
sum of weights =  419.723887342
^ 21332 0.000488134545127
sum of weights =  419.175746979
^ 52562 0.00448859320248
sum of weights =  363.863310342
^ 51195 0.00849543140855
sum of weights =  352.997556371
^ 

Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score,home_score_after,away_score,away_score_after
0,TB,NO,NO,1,1800,242,Missed FG,0,0,0,0
0,TB,NO,TB,1,1558,215,Punt,0,0,0,0
0,TB,NO,NO,1,1343,338,Field Goal,0,0,0,3
0,TB,NO,TB,1,1005,94,Punt,0,0,3,3
0,TB,NO,NO,1,911,270,Touchdown,0,0,3,10
0,TB,NO,TB,1,641,93,Punt,0,0,10,10
0,TB,NO,NO,1,548,84,Punt,0,0,10,10
0,TB,NO,TB,1,464,159,Field Goal,0,3,10,10
0,TB,NO,NO,1,305,12,Fumble,3,3,10,10
0,TB,NO,TB,1,293,138,Field Goal,3,6,10,10


In [16]:
for i in range(10):
    game = simulate_game("CAR","PHI")
    print(game.result)

['CAR', 19, 'PHI', 33]
['CAR', 41, 'PHI', 31]
['CAR', 24, 'PHI', 13]
['CAR', 23, 'PHI', 27]
['CAR', 17, 'PHI', 64]
['CAR', 22, 'PHI', 27]
['CAR', 27, 'PHI', 46]
['CAR', 14, 'PHI', 5]
['CAR', 16, 'PHI', 10]
['CAR', 41, 'PHI', 45]


### Figure out a slightly less simple weighting function
- Look at time remaining in half.
- Look at how old the drive is.
- Maybe, score difference.
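
As a standalone sketch of how these three factors can be combined, the function below mirrors the forms used in `get_drive_base_weights` and `get_drive_weights`: an inverse-age factor, a Gaussian in time remaining whose width is one third of the simulated time remaining, and a Gaussian in score margin. The function name `drive_weight` and its parameter names are hypothetical; `score_sigma=7.0` is chosen because it reproduces the `-98` denominator used in the class.

```python
import math

def drive_weight(sim_time_rem, sim_score_diff, sim_season,
                 hist_left_in_half, hist_score_diff, hist_season,
                 score_sigma=7.0):
    """Weight of one historical drive for the current simulated situation."""
    # Age: older drives count less (assumes hist_season < sim_season)
    w_age = 1.0 / (2 * (sim_season - hist_season))
    # Time: Gaussian in the difference of time left in the half,
    # with a width that grows with the simulated time remaining
    time_sigma = sim_time_rem / 3.0
    w_time = math.exp(-(hist_left_in_half - sim_time_rem) ** 2 / (2 * time_sigma ** 2))
    # Score: Gaussian in the difference of score margins
    w_score = math.exp(-(sim_score_diff - hist_score_diff) ** 2 / (2 * score_sigma ** 2))
    return w_age * w_time * w_score

# A drive from last season in an identical situation vs. a very different one
same = drive_weight(900, 0, 2018, 900, 0, 2017)
far = drive_weight(900, 0, 2018, 100, 14, 2017)
print(same, far)
```

Because the weight is a product, a drive that is a poor match on any one factor is strongly suppressed even if it matches well on the others.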

In [13]:
gamedata = pd.read_csv('../data/espn_gamedata2009-2017.csv')
gamedata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2306 entries, 0 to 2305
Data columns (total 10 columns):
gameId        2306 non-null int64
result        2306 non-null object
season        2306 non-null int64
week          2306 non-null int64
home          2297 non-null object
away          2297 non-null object
winner        2306 non-null object
home_score    2306 non-null object
away_score    2306 non-null object
OT            2306 non-null object
dtypes: int64(3), object(7)
memory usage: 180.2+ KB


In [14]:
# Add season and week information to each drive
alldrives = alldrives.merge(
                right=gamedata[['gameId','season','week']],
                how='left',
                left_on='gameId',
                right_on='gameId')

In [49]:
game = simulate_game("TB",'NO')
print(game.result)
cols = [ 'home','away','offense','half','time_rem','time','result',
         'home_score_after','away_score_after' ]
print(game.drives_selected[['season','week']])
game.gamestate_df[cols]

sum of weights =  202.420239138
^ 48804 0.00309429787685
sum of weights =  217.25315508
^ 37817 0.000767148966401
sum of weights =  172.048280848
^ 50942 0.00529384895312
sum of weights =  153.86061248
^ 42025 0.000254203108996
sum of weights =  96.8903785057
^ 51471 0.00089756445051
sum of weights =  75.9322137026
^ 43870 0.00265450866577
sum of weights =  78.9863005287
^ 23436 0.00241502040811
sum of weights =  62.13888714
^ 30532 0.00316452318434
sum of weights =  65.268078692
^ 35173 0.000556107580318
sum of weights =  42.0865643168
^ 53352 0.0157766152939
sum of weights =  25.9513193567
^ 50246 0.0374721857375
sum of weights =  16.0207288762
^ 39558 0.00480840737656
sum of weights =  10.3620742363
^ 22022 0.00435447944303
sum of weights =  90.5136567834
^ 45474 0.00160731202826
sum of weights =  130.28687158
^ 28448 0.000332076839938
sum of weights =  187.935973531
^ 52726 0.00181599767787
sum of weights =  114.728685957
^ 43144 0.00241646649098
sum of weights =  93.6959023243
^ 3

Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score_after,away_score_after
0,TB,NO,NO,1,1800,173,Field Goal,0,3
0,TB,NO,TB,1,1627,88,Interception Touchdown,0,10
0,TB,NO,TB,1,1539,189,Punt,0,10
0,TB,NO,NO,1,1350,160,Touchdown,0,17
0,TB,NO,TB,1,1190,84,Punt,0,17
0,TB,NO,NO,1,1106,200,Punt,0,17
0,TB,NO,TB,1,906,71,Punt,0,17
0,TB,NO,NO,1,835,92,Punt,0,17
0,TB,NO,TB,1,743,265,Field Goal,3,17
0,TB,NO,NO,1,478,148,Punt,3,17


### Pieces to add:
Rather than randomly choosing drives that happen to end in 'End of Half', I should compute the probability that a drive ends in 'End of Half' as a function of how much time is remaining.
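
One way to estimate that probability, sketched here under stated assumptions, is to bin historical drives by the time left in the half when they started and take the fraction in each bin that ended the half. The bin edges, the function name `eoh_probability`, and the toy data are all hypothetical; with the real data the pairs would come from the `left_in_half` and `EoH` columns.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper edges (seconds left in the half) of the time bins; an assumption
BIN_EDGES = [60, 120, 300, 1800]

def eoh_probability(drives):
    """Estimate P(drive ends the half | time-left bin) from (left_in_half, EoH) pairs."""
    counts = defaultdict(lambda: [0, 0])   # bin edge -> [EoH drives, all drives]
    for left_in_half, eoh in drives:
        # Find the first bin whose upper edge is >= left_in_half
        idx = min(bisect_left(BIN_EDGES, left_in_half), len(BIN_EDGES) - 1)
        edge = BIN_EDGES[idx]
        counts[edge][0] += eoh
        counts[edge][1] += 1
    return {edge: hits / total for edge, (hits, total) in sorted(counts.items())}

# Made-up (left_in_half, EoH) pairs for illustration only
toy_drives = [(1700, 0), (900, 0), (250, 0), (110, 0),
              (95, 1), (40, 1), (30, 0), (20, 1)]
p_eoh = eoh_probability(toy_drives)
print(p_eoh)
```

During a simulation, the estimate for the current bin could then be compared against a uniform random draw to decide whether the next drive ends the half.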

Configure the weights that decide which drives are more likely. This is mostly implemented the way I want, but there is still a lot to think about:
- Home/away differences?
- 1st half vs. 2nd half.

In [43]:
games = []
for i in range(10):
    games.append(simulate_game("CAR","NO"))

for g in games:
    print(g.result)

sum of weights =  199.483331178
^ 19540 5.29502987357e-05
sum of weights =  213.88532683
^ 18681 2.21452206755e-05
sum of weights =  217.814188827
^ 51193 0.00391587557906
sum of weights =  205.975834846
^ 26243 0.000827816904373
sum of weights =  187.232319818
^ 48212 0.00203303524449
sum of weights =  222.172044765
^ 23728 0.00034917181504
sum of weights =  186.039401253
^ 54262 0.00197620811819
sum of weights =  190.427648954
^ 52974 0.000560173870548
sum of weights =  121.037876962
^ 53794 0.000380465626568
sum of weights =  99.1596775537
^ 51270 0.00331848973864
sum of weights =  91.1953475856
^ 31861 0.000909297211494
sum of weights =  78.4921206474
^ 30258 0.00170531232101
sum of weights =  54.7270885647
^ 53335 0.0174815525325
sum of weights =  40.8609876419
^ 12732 0.00335266117179
sum of weights =  27.0589238228
^ 25381 0.00341312129534
sum of weights =  16.6601110737
^ 51926 0.00205094984773
sum of weights =  100.089524701
^ 39414 0.00165462859364
sum of weights =  117.15945

^ 52821 0.00126942317683
sum of weights =  87.6582557527
^ 45637 0.000293538182051
sum of weights =  71.6032210373
^ 44485 0.00895969308491
sum of weights =  89.7809532703
^ 34880 0.00220916830728
sum of weights =  64.3546454135
^ 43466 0.00122381921802
sum of weights =  41.0644386208
^ 45900 0.000199943807618
sum of weights =  30.1638775956
^ 46396 0.00453361150372
sum of weights =  25.9049502102
^ 50248 0.02320400605
sum of weights =  15.2177523759
^ 52736 0.0393138504635
sum of weights =  15.3478350336
^ 45481 0.0131630174378
sum of weights =  8.73629857525
^ 50249 0.037108042831
sum of weights =  3.01495298988
^ 26049 0.00763084199872
sum of weights =  201.86606579
^ 45636 0.00144983260656
sum of weights =  192.35790455
^ 11839 8.6819359701e-05
sum of weights =  176.21618328
^ 53724 0.00517505880772
sum of weights =  197.788940064
^ 50942 0.00450171168846
sum of weights =  164.076293643
^ 50544 0.000952043652274
sum of weights =  171.818357551
^ 28274 0.00062305499383
sum of weight

In [133]:
games[6].gamestate_df[cols]

Unnamed: 0,home,away,offense,half,time_rem,time,result,home_score_after,away_score_after
0,CAR,NO,CAR,1,1800,171,Field Goal,3,0
0,CAR,NO,NO,1,1629,319,Missed FG,3,0
0,CAR,NO,CAR,1,1310,95,Punt,3,0
0,CAR,NO,NO,1,1215,253,Fumble,3,0
0,CAR,NO,CAR,1,962,113,Touchdown,10,0
0,CAR,NO,NO,1,849,135,TOUCHDOWN,10,7
0,CAR,NO,CAR,1,714,192,Punt,10,7
0,CAR,NO,NO,1,522,131,Punt Return Touchdown,16,7
0,CAR,NO,NO,1,391,88,Touchdown,16,15
0,CAR,NO,CAR,1,303,189,Field Goal,19,15


Next, time how long a simulated game takes.

In [132]:
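import timeit
t = timeit.Timer("simulate_game('CAR','NYG')",
                  "from __main__ import simulate_game",
                )
# The default number=1000000 would simulate a million games per repeat,
# so time a single game per repeat instead
t.repeat(repeat=3, number=1)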

sum of weights =  292.282754112
^ 47619 0.00138852792395
sum of weights =  316.603303307
^ 44678 0.000196891107333
sum of weights =  299.451802332
^ 53375 0.00179351585259
sum of weights =  269.440878267
^ 11726 5.17956248453e-05
sum of weights =  214.040246747
^ 3976 0.000138477142238
sum of weights =  143.713372267
^ 36666 0.00142453244198
sum of weights =  116.881480897
^ 44694 0.00243545975343
sum of weights =  56.1316403364
^ 47433 0.000659126376333
sum of weights =  49.149304971
^ 33284 0.000496913472247
sum of weights =  15.0605740006
^ 46082 0.00628616687044
sum of weights =  1.94680342385
^ 43718 0.21173438878
sum of weights =  64.6391849962
^ 5307 0.000424593318504
sum of weights =  53.7114744781
^ 47804 0.0071652152676
sum of weights =  52.4376989021
^ 44486 0.00504691391321
sum of weights =  97.9910389501
^ 49899 0.00210529524773
sum of weights =  117.980987853
^ 31731 0.000678650887922
sum of weights =  86.2960567433
^ 10498 0.000865374393653
sum of weights =  88.381694971

sum of weights =  52.0508723367
^ 30080 0.00104326393111
sum of weights =  2.8049331446
^ 27191 0.0368634163711
sum of weights =  1.03577194038
^ 43704 0.140883104629
sum of weights =  297.917687817
^ 29443 0.000172693199821
sum of weights =  332.596556131
^ 44515 0.000620321297091
sum of weights =  330.48380929
^ 30395 0.000600492696408
sum of weights =  317.024883656
^ 28655 0.000374755744325
sum of weights =  209.154027128
^ 50732 0.00211853198804
sum of weights =  213.048467745
^ 49980 0.00321288436294
sum of weights =  166.72641748
^ 40124 0.00126314553515
sum of weights =  99.7721840786
^ 49759 0.0018600999612
sum of weights =  76.1182022332
^ 47654 0.00598764376874
sum of weights =  33.6161772911
^ 51038 0.0196624604458
sum of weights =  18.7293453596
^ 52944 0.03801168939
sum of weights =  10.959212659
^ 40346 0.0120461417839
sum of weights =  5.57705066796
^ 17861 0.0155363905154
sum of weights =  34.9505057147
^ 32697 0.000210113213125
sum of weights =  29.2772070543
^ 51761 

^ 42367 0.00107755369483
sum of weights =  298.232919034
^ 24880 0.00053746644119
sum of weights =  326.330380293
^ 45741 0.00139948085469
sum of weights =  305.818454797
^ 2695 0.000179021038059
sum of weights =  165.252606324
^ 2588 0.000520458850155
sum of weights =  104.775748871
^ 48657 0.00281399774328
sum of weights =  104.376329341
^ 42361 0.000101293188183
sum of weights =  72.7539438979
^ 33718 5.93542812188e-05
sum of weights =  64.9334409839
^ 49325 0.00746394741936
sum of weights =  34.6168497618
^ 34241 0.00170952973887
sum of weights =  6.0248266738
^ 46068 0.0148403156126
sum of weights =  297.917687817
^ 4392 0.000307032449756
sum of weights =  333.672280094
^ 53358 0.00264197731177
sum of weights =  286.522189553
^ 45896 0.000635009800349
sum of weights =  226.602364874
^ 52933 0.000711476258372
sum of weights =  200.976466669
^ 49981 0.00424846420463
sum of weights =  177.550942076
^ 49747 0.000581682213035
sum of weights =  159.520326522
^ 48371 0.00154507196694
sum

KeyboardInterrupt: 

In [None]:
    # segment possible plays into those for home/away team,
    # 3 categories for each: 5+ mins remaining --> main game
    #                        <5 mins remaining in 1st half
    #                        <5 mins remaining in 2nd half
    # Keep weights in a dictionary. Recalculate individual weights only when necessary
    # Give each play a base weight according to how old game is.
    # Could expand base weight to include home/away matching.
    # Recalculate score weight when score changes.
    # Eliminate time weight.
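
One of the notes above, keeping weights in a dictionary and recalculating the score weight only when the score changes, could be sketched as follows. The class name `ScoreWeightCache`, its attributes, and the drive ids are hypothetical; the Gaussian with `sigma=7.0` matches the `-98` denominator used in `get_drive_weights`.

```python
import math

class ScoreWeightCache:
    """Cache Gaussian score weights; recompute only for a new score margin."""
    def __init__(self, hist_score_diffs, sigma=7.0):
        self.hist_score_diffs = hist_score_diffs   # drive id -> historical score margin
        self.sigma = sigma
        self._cache = {}                           # current margin -> {drive id: weight}
        self.computations = 0                      # how many times weights were rebuilt

    def weights_for(self, curr_score_diff):
        if curr_score_diff not in self._cache:
            self.computations += 1
            self._cache[curr_score_diff] = {
                d: math.exp(-(curr_score_diff - h) ** 2 / (2 * self.sigma ** 2))
                for d, h in self.hist_score_diffs.items()
            }
        return self._cache[curr_score_diff]

# Made-up drive ids and historical margins for illustration only
cache = ScoreWeightCache({101: 0, 202: 7, 303: -14})
w0 = cache.weights_for(0)
w0_again = cache.weights_for(0)    # served from the cache, no recomputation
w7 = cache.weights_for(7)
print(cache.computations)
```

Since most drives end without points, consecutive drives usually share a score margin, so a cache like this would skip most of the per-drive exponentials.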