# Pythagorean Expectation Via Weibull Distribution

The goal of this notebook is to build a model that predicts the winner of a baseball game using the data gathered as a season goes forward. The main idea comes from Steven J. Miller's paper "A Derivation of the Pythagorean Won-Loss Formula in Baseball" (see https://arxiv.org/pdf/math/0509698.pdf). Rather than using the Pythagorean Expectation itself as our measure of whether a team will win, we start from the empirical observation that the Pythagorean Expectation is a good approximation of the actual win percentage in baseball. Furthermore, as the paper shows, the Pythagorean Expectation gives the precise win percentage when we assume that runs scored and runs allowed follow Weibull distributions with suitable parameters. The model uses this observation: for any particular game between two teams, we look at how the teams have done so far in the season to determine the appropriate parameters of their Weibull distributions for that game. We then simulate the game many times, drawing the runs scored by each team from these distributions, to predict who will win.
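
For reference, the Pythagorean Expectation mentioned above can be sketched in a few lines. The function name and the default exponent of 2 are our own choices for illustration. With exponent 2, the first ANA game of 2010 (6 runs scored, 3 allowed) gives 0.8, which agrees with the `win_frac_cummulative_pythagorean` value in the first row of the game log shown later in this notebook.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Win fraction predicted from runs scored and runs allowed.

    Assumes at least one of the two totals is positive.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# After ANA's first game of 2010: 6 runs scored, 3 runs allowed
print(pythagorean_expectation(6, 3))        # 0.8 with the classical exponent 2
print(pythagorean_expectation(6, 3, 1.82))  # slightly lower with exponent 1.82
```

A smaller exponent pulls the prediction toward 0.5, which is why the choice of exponent matters for the model.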

Note: this notebook is meant as a proof of concept, so we shall only focus on the data from the 2010 season. 

In [1]:
##Import of packages that we will use 
import pandas as pd
import numpy as np
import math

Note: the data we read in is an adjusted version of Joe's game data, from which we have removed the variables we do not use (i.e. win percentage, pythagorean expectation, etc.). We also add columns that we use to get the parameters of the Weibull distribution: cumulative runs scored and cumulative runs allowed, each holding the total before that particular game (so for each team's first game the value is 0). We use these cumulative runs scored and allowed to calculate the Weibull distribution parameters at that point in the season.

Remark: The three parameter Weibull distribution has pdf 
$$
f(x;\alpha,\beta,\gamma)=\begin{cases}
\frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}e^{-\left(\frac{x-\beta}{\alpha}\right)^\gamma} & x\geq \beta\\
0 & \text{else}
\end{cases}
$$
Per the paper, $\gamma$ corresponds to the exponent appearing in the Pythagorean Expectation formula. For simplicity of the model, we assume in what follows that $\gamma=1.82$ is a constant which we shall not change. Furthermore, per the comment in the paper, the variable $\beta$ is just a translation, which has no effect if we assume that it is constant in all cases (it was included so that the ideas may be extrapolated to other sports with higher scoring games). For baseball Miller takes $\beta=-.5$, so that the bins are centered at integers. Thus, we shall take these variables to be constant throughout. Per equation (2.8), we have a formula for the parameter $\alpha$ for each team given by 
$$
\alpha_{RS}(n+1)=\frac{RS(n)-\beta}{\Gamma(1+\gamma^{-1})}
$$
and
$$
\alpha_{RA}(n+1)=\frac{RA(n)-\beta}{\Gamma(1+\gamma^{-1})}
$$
Thus, if we use the previous RS and RA for a team, we can assume that their RS and RA follow Weibull distributions with the above parameters. There are then two ways to predict whether a team will win (namely using the RS or the RA of each team). 
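
As a sanity check on the formula for $\alpha$, here is a minimal sketch. It assumes, as in Miller's paper, that RS is an average number of runs per game; the value 4.5 and the helper name `weibull_alpha` are made up for illustration. Note that the columns built later in this notebook plug in cumulative totals rather than averages. The sketch picks $\alpha$ from a target mean and then confirms by sampling that the resulting three parameter Weibull distribution has that mean.

```python
import math
import numpy as np

def weibull_alpha(mean_runs, beta=-0.5, gamma=1.82):
    """Scale alpha for which a Weibull(alpha, beta, gamma) variable has mean `mean_runs`."""
    return (mean_runs - beta) / math.gamma(1 + 1 / gamma)

rng = np.random.default_rng(0)
mean_runs = 4.5                      # hypothetical average runs per game
alpha = weibull_alpha(mean_runs)

# rng.weibull(gamma) draws with shape gamma and scale 1,
# so we multiply by alpha (scale) and add beta (translation)
samples = alpha * rng.weibull(1.82, size=200_000) + (-0.5)

print(alpha)            # the scale parameter
print(samples.mean())   # should be close to mean_runs
```

The check works because the mean of the three parameter Weibull distribution is $\alpha\,\Gamma(1+\gamma^{-1})+\beta$, which is exactly the relation the formula for $\alpha$ inverts.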

In [2]:
#Initialize our values of gamma and beta.
#We are fixing them now, but might play with
#this later. Note that beta is set to 0 here,
#not the -.5 that Miller uses for baseball
gamma = 1.82
beta = 0

In [3]:
##Read data from Joe's csv file as gl_2010 (i.e. game log 2010)
gl_2010 = pd.read_csv("./game_log_sorter/sorted_game_logs/gl2010_sorted.csv")

In [4]:
##We can take a quick look at our data
gl_2010.head()

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,opponent_hits,win_loss,win_frac_actual,win_frac_cummulative_pythagorean,win_frac_cummulative_pythagenpat
0,81,20100405,ANA,MIN,6,3,1,33,9,32,7,1.0,1.0,0.8,0.786213
1,82,20100406,ANA,MIN,3,5,2,33,8,32,9,0.0,0.5,0.558621,0.548292
2,83,20100407,ANA,MIN,2,4,3,35,9,34,7,0.0,0.333333,0.456604,0.468054
3,84,20100408,ANA,MIN,1,10,4,34,8,37,11,0.0,0.25,0.229299,0.297089
4,85,20100409,ANA,OAK,4,10,5,35,8,41,13,0.0,0.2,0.2,0.275311


In [5]:
#Gives the list of teams and alphabetizes it
team_list = gl_2010.team.value_counts().keys()
team_list = sorted(team_list)

We will now make the data that we need. We do this first for ANA as a test case, and then generalize it to all the other teams at once. 

In [6]:
#This is a function that takes in a dataframe
#for an individual team, and returns a series
#which gives the cumulative sum of the runs scored
#for all the games prior to that given one
def make_previous_cum_sum_rs(df):
    cumsumlist = [0]
    for x in df.runs_scored.cumsum():
        cumsumlist.append(x)
    cumsumlist.pop()
    return pd.Series(cumsumlist)
#This is an analogous function for runs allowed
def make_previous_cum_sum_ra(df):
    cumsumlist = [0]
    for x in df.runs_allowed.cumsum():
        cumsumlist.append(x)
    cumsumlist.pop()
    return pd.Series(cumsumlist)    
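
The same "runs before this game" quantity can also be computed for all teams in one pass. The following is a sketch on a small toy frame (the frame `toy` is ours; its numbers mirror the first three ANA vs MIN games in the game log). It relies on the fact that the running total within a team, minus the current game, is the total before that game.

```python
import pandas as pd

toy = pd.DataFrame({
    "team": ["ANA", "ANA", "ANA", "MIN", "MIN", "MIN"],
    "runs_scored": [6, 3, 2, 3, 5, 4],
    "runs_allowed": [3, 5, 4, 6, 3, 2],
})

for col in ["runs_scored", "runs_allowed"]:
    # running total within each team, minus the current game,
    # gives the total accumulated before that game
    toy["previous_cumulative_" + col] = toy.groupby("team")[col].cumsum() - toy[col]

print(toy)
```

Because the result keeps the index of the original frame, no reindexing is needed before attaching it as a column.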

In [7]:
#This is a function that takes in the data frame
#it then takes the given cumulative runs scored
#from all previous games, and uses it to construct
#the alpha parameters
def make_alpha_rs(df, beta, gamma):
    return (df.previous_cumulative_runs_scored - beta) / (math.gamma(1 + (1 / gamma)))
#This is the analogous function for runs allowed
def make_alpha_ra(df, beta, gamma):
    return (df.previous_cumulative_runs_allowed - beta) / (math.gamma(1 + (1 / gamma)))

In [8]:
#Creates a dataframe just for the team ANA
ANA_2010 = gl_2010.loc[gl_2010.team == "ANA"]
ANA_2010.head()

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,opponent_hits,win_loss,win_frac_actual,win_frac_cummulative_pythagorean,win_frac_cummulative_pythagenpat
0,81,20100405,ANA,MIN,6,3,1,33,9,32,7,1.0,1.0,0.8,0.786213
1,82,20100406,ANA,MIN,3,5,2,33,8,32,9,0.0,0.5,0.558621,0.548292
2,83,20100407,ANA,MIN,2,4,3,35,9,34,7,0.0,0.333333,0.456604,0.468054
3,84,20100408,ANA,MIN,1,10,4,34,8,37,11,0.0,0.25,0.229299,0.297089
4,85,20100409,ANA,OAK,4,10,5,35,8,41,13,0.0,0.2,0.2,0.275311


In [9]:
#We shall add columns for cumulative RS, RA, 
#and the corresponding Weibull parameters
ANA_2010.insert(len(ANA_2010.T),"previous_cumulative_runs_scored", make_previous_cum_sum_rs(ANA_2010))
ANA_2010.insert(len(ANA_2010.T),"previous_cumulative_runs_allowed", make_previous_cum_sum_ra(ANA_2010))
ANA_2010.insert(len(ANA_2010.T),"alpha_rs", make_alpha_rs(ANA_2010, beta, gamma))
ANA_2010.insert(len(ANA_2010.T),"alpha_ra", make_alpha_ra(ANA_2010, beta, gamma))

In [10]:
ANA_2010.head()

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,opponent_hits,win_loss,win_frac_actual,win_frac_cummulative_pythagorean,win_frac_cummulative_pythagenpat,previous_cumulative_runs_scored,previous_cumulative_runs_allowed,alpha_rs,alpha_ra
0,81,20100405,ANA,MIN,6,3,1,33,9,32,7,1.0,1.0,0.8,0.786213,0,0,0.0,0.0
1,82,20100406,ANA,MIN,3,5,2,33,8,32,9,0.0,0.5,0.558621,0.548292,6,3,6.75046,3.37523
2,83,20100407,ANA,MIN,2,4,3,35,9,34,7,0.0,0.333333,0.456604,0.468054,9,8,10.12569,9.000613
3,84,20100408,ANA,MIN,1,10,4,34,8,37,11,0.0,0.25,0.229299,0.297089,11,12,12.375843,13.50092
4,85,20100409,ANA,OAK,4,10,5,35,8,41,13,0.0,0.2,0.2,0.275311,12,22,13.50092,24.751687


We will now generalize the code above so that it splits the data by team and does the same for every team.

In [11]:
#This function takes in our data and the name of the team and 
#returns a dataframe that is just for that specific team
#it then calls a function that adds the additional data we 
#want from what we have
def make_df_for_team(gamelog, team):
    df = gamelog.loc[gamelog.team == team].reset_index(drop = True) 
    #Note we had to reindex, otherwise the 0-based Series returned
    #by the cumulative sum helpers would only line up with the
    #rows of the first team
    return make_df_from_team_df(df)
    
#This function takes in the dataframe for a specific team
#It then appends new columns with the additional data that we 
#are interested in which we will be using
def make_df_from_team_df(df):
    df.insert(len(df.T),"previous_cumulative_runs_scored", make_previous_cum_sum_rs(df))
    df.insert(len(df.T),"previous_cumulative_runs_allowed", make_previous_cum_sum_ra(df))
    df.insert(len(df.T),"alpha_rs", make_alpha_rs(df, beta, gamma))
    df.insert(len(df.T),"alpha_ra", make_alpha_ra(df, beta, gamma))
    return df

In [12]:
#Make a list for the data frame for each team
frames = []
for team in team_list:
    df = make_df_for_team(gl_2010, team)#for each data frame we get the rest of the data we want
    frames.append(df)  #adds the dataframe for that team to the list
result = pd.concat(frames) #puts together all the dataframes into a single dataframe
result["alpha_rs"] = result["alpha_rs"] + (result["alpha_rs"] == 0) #adds 1 if alpha_rs = 0
result["alpha_ra"] = result["alpha_ra"] + (result["alpha_ra"] == 0) #adds 1 if alpha_ra = 0

In [13]:
result

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,opponent_hits,win_loss,win_frac_actual,win_frac_cummulative_pythagorean,win_frac_cummulative_pythagenpat,previous_cumulative_runs_scored,previous_cumulative_runs_allowed,alpha_rs,alpha_ra
0,81,20100405,ANA,MIN,6,3,1,33,9,32,7,1.0,1.000000,0.800000,0.786213,0,0,1.000000,1.000000
1,82,20100406,ANA,MIN,3,5,2,33,8,32,9,0.0,0.500000,0.558621,0.548292,6,3,6.750460,3.375230
2,83,20100407,ANA,MIN,2,4,3,35,9,34,7,0.0,0.333333,0.456604,0.468054,9,8,10.125690,9.000613
3,84,20100408,ANA,MIN,1,10,4,34,8,37,11,0.0,0.250000,0.229299,0.297089,11,12,12.375843,13.500920
4,85,20100409,ANA,OAK,4,10,5,35,8,41,13,0.0,0.200000,0.200000,0.275311,12,22,13.500920,24.751687
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157,160,20100928,WAS,PHI,2,1,158,27,3,35,9,1.0,0.430380,0.444856,0.485334,647,724,727.924607,814.555511
158,161,20100929,WAS,PHI,1,7,159,30,3,33,8,0.0,0.427673,0.440874,0.484297,649,725,730.174761,815.680588
159,78,20101001,WAS,NYN,1,2,160,33,3,33,4,0.0,0.425000,0.440287,0.484187,650,732,731.299837,823.556124
160,79,20101002,WAS,NYN,2,7,161,32,6,33,11,0.0,0.422360,0.437123,0.483369,651,734,732.424914,825.806278


Now that we have the data we are interested in, we can build our predictive model. We do this by using the alpha_rs and alpha_ra columns as the parameters of the Weibull distributions for each team.

We shall first go through things for the first game of the season before writing the code that does it all at once. Note that this code only works for the first game of the season: when iterated, we found that it failed at game 4, because CHA's game 4 was against MIN and ANA's game 4 was also against MIN, so we have to go about this a different way. 

In [14]:
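#Each function below simulates a game 1001 times and returns
#the number of simulations in which team 1 comes out ahead.
#NOTE: np.random.weibull(a) takes the shape parameter as its
#argument and draws with scale 1, so in these calls alpha acts
#as the shape and gamma as the scale.

#Team 1 wins a simulation if it scores more runs than team 2
def run_simulation_same_rs (alpha1_rs, alpha2_rs):
    team1wincount = 0
    for x in range(1001):
        rs1 = gamma * np.random.weibull(alpha1_rs)
        rs2 = gamma * np.random.weibull(alpha2_rs)
        if (rs1>rs2):
            team1wincount = team1wincount + 1
    return team1wincount

#Team 1 wins a simulation if it allows fewer runs than team 2
def run_simulation_same_ra (alpha1_ra, alpha2_ra):
    team1wincount = 0
    for x in range(1001):
        ra1 = gamma * np.random.weibull(alpha1_ra)
        ra2 = gamma * np.random.weibull(alpha2_ra)
        if (ra2>ra1):
            team1wincount = team1wincount + 1
    return team1wincount

#Team 1 wins a simulation if its runs scored minus the runs
#allowed by team 2 exceed the same quantity for team 2
def run_simulation_rs_ra (alpha1_rs, alpha1_ra, alpha2_rs,alpha2_ra):
    team1wincount = 0
    for x in range(1001):
        rs1 = gamma * np.random.weibull(alpha1_rs)
        rs2 = gamma * np.random.weibull(alpha2_rs)
        ra1 = gamma * np.random.weibull(alpha1_ra)
        ra2 = gamma * np.random.weibull(alpha2_ra)
        if (rs1-ra2>rs2-ra1):
            team1wincount = team1wincount + 1
    return team1wincount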
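
One point worth checking in the functions above is the parametrization. `np.random.weibull(a)` draws from a one parameter Weibull distribution in which `a` is the shape and the scale is 1, so the calls above treat alpha as the shape and gamma as the scale. If the intent is to follow the remark earlier in this notebook, where $\alpha$ is the scale and $\gamma$ the shape, one possible version looks like the sketch below. The function name, the sample size, and the alpha values are hypothetical. For two independent Weibull variables with the same shape, the probability that the first exceeds the second is $\alpha_1^\gamma/(\alpha_1^\gamma+\alpha_2^\gamma)$, which is the Pythagorean form, so the simulation can be compared against it.

```python
import numpy as np

def simulate_win_fraction(alpha1, alpha2, gamma=1.82, beta=0.0,
                          n_sims=100_000, seed=0):
    """Fraction of simulated games in which team 1 outscores team 2.

    alpha1 and alpha2 are used as Weibull scale parameters, gamma as the
    shared shape parameter, and beta as the shared translation.
    """
    rng = np.random.default_rng(seed)
    runs1 = alpha1 * rng.weibull(gamma, size=n_sims) + beta
    runs2 = alpha2 * rng.weibull(gamma, size=n_sims) + beta
    return np.mean(runs1 > runs2)

alpha1, alpha2 = 5.0, 4.0   # hypothetical scale parameters
simulated = simulate_win_fraction(alpha1, alpha2)
closed_form = alpha1**1.82 / (alpha1**1.82 + alpha2**1.82)
print(simulated, closed_form)   # the two values should be close
```

Drawing all samples at once with `size=n_sims` also avoids the Python loop, which matters once this runs for every game of every team.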

In [15]:
def simulate_game(result, game_number, team_list):
    #team_list is kept in the signature but is not needed below
    games = result.loc[result.game_number == game_number].reset_index(drop = True)
    #Makes an empty list of all the teams that have been handled
    teams_played = []
    #Adds float columns that will hold the simulated win fractions
    games.insert(len(games.T),"same_rs_percentage",0.0)
    games.insert(len(games.T),"same_ra_percentage",0.0)
    games.insert(len(games.T),"both_percentage",0.0)
    #Goes through each of the rows for the particular game
    #number we are interested in
    for game in range(games.shape[0]):
        #If the team in this row has not been handled yet,
        #then we add it and its opponent to the teams played
        #list, and simulate its game against its opponent
        if games.team[game] not in teams_played:
            if games.opponent[game] not in teams_played:
                teams_played.append(games.opponent[game])
            teams_played.append(games.team[game])
            opponents_index = games.loc[games.opponent == games.team[game]].index[0]
            team1wincount_same_rs = run_simulation_same_rs(games.alpha_rs[game]
                                                       ,games.alpha_rs[opponents_index])
            team1wincount_same_ra = run_simulation_same_ra(games.alpha_ra[game]
                                                       ,games.alpha_ra[opponents_index])
            team1wincount_different = run_simulation_rs_ra (games.alpha_rs[game], 
                                                        games.alpha_ra[game], 
                                                        games.alpha_rs[opponents_index],
                                                        games.alpha_ra[opponents_index])
            #Each simulation runs 1001 games, so we divide by 1001.
            #.loc writes into games directly instead of into a copy
            games.loc[game, "same_rs_percentage"] = team1wincount_same_rs/1001
            games.loc[opponents_index, "same_rs_percentage"] = 1 - team1wincount_same_rs/1001
            games.loc[game, "same_ra_percentage"] = team1wincount_same_ra/1001
            games.loc[opponents_index, "same_ra_percentage"] = 1 - team1wincount_same_ra/1001
            games.loc[game, "both_percentage"] = team1wincount_different/1001
            games.loc[opponents_index, "both_percentage"] = 1 - team1wincount_different/1001
    return games

Instead of breaking things up by game number, we shall go through each individual team. For each team we look at who the predicted winner of each of its games is. We do this for ANA first and then generalize to all the others by iterating over team_list. Things might also get difficult to code if there are doubleheaders, but we will cross that bridge when we come to it.
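
One way to line up each row with its opponent's row is a self merge. This is a sketch on a toy frame rather than the code used below: the frame `games`, the `opp_` column prefix, and the rounded alpha values are ours. It matches on the date, the two teams, and the score, with scored and allowed swapped.

```python
import pandas as pd

games = pd.DataFrame({
    "Date": [20100405, 20100405, 20100406, 20100406],
    "team": ["ANA", "MIN", "ANA", "MIN"],
    "opponent": ["MIN", "ANA", "MIN", "ANA"],
    "runs_scored": [6, 3, 3, 5],
    "runs_allowed": [3, 6, 5, 3],
    "alpha_rs": [1.0, 1.0, 6.75, 3.38],
    "alpha_ra": [1.0, 1.0, 3.38, 6.75],
})

# The opponent's view of each game: swap team/opponent and scored/allowed,
# and rename the alpha columns so they do not clash after the merge
opponent_view = games.rename(columns={
    "team": "opponent", "opponent": "team",
    "runs_scored": "runs_allowed", "runs_allowed": "runs_scored",
    "alpha_rs": "opp_alpha_rs", "alpha_ra": "opp_alpha_ra",
})

keys = ["Date", "team", "opponent", "runs_scored", "runs_allowed"]
paired = games.merge(opponent_view, on=keys, how="left")
print(paired[["Date", "team", "opponent", "alpha_rs", "opp_alpha_rs"]])
```

A doubleheader in which both games end with the same score would still produce duplicate matches under these keys, so a per-day game counter would be needed to separate them.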

In [16]:
team_games = result.loc[result.team == "ANA"]


In [17]:
result.head()

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,opponent_hits,win_loss,win_frac_actual,win_frac_cummulative_pythagorean,win_frac_cummulative_pythagenpat,previous_cumulative_runs_scored,previous_cumulative_runs_allowed,alpha_rs,alpha_ra
0,81,20100405,ANA,MIN,6,3,1,33,9,32,7,1.0,1.0,0.8,0.786213,0,0,1.0,1.0
1,82,20100406,ANA,MIN,3,5,2,33,8,32,9,0.0,0.5,0.558621,0.548292,6,3,6.75046,3.37523
2,83,20100407,ANA,MIN,2,4,3,35,9,34,7,0.0,0.333333,0.456604,0.468054,9,8,10.12569,9.000613
3,84,20100408,ANA,MIN,1,10,4,34,8,37,11,0.0,0.25,0.229299,0.297089,11,12,12.375843,13.50092
4,85,20100409,ANA,OAK,4,10,5,35,8,41,13,0.0,0.2,0.2,0.275311,12,22,13.50092,24.751687


In [18]:
team_games.head()

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,opponent_hits,win_loss,win_frac_actual,win_frac_cummulative_pythagorean,win_frac_cummulative_pythagenpat,previous_cumulative_runs_scored,previous_cumulative_runs_allowed,alpha_rs,alpha_ra
0,81,20100405,ANA,MIN,6,3,1,33,9,32,7,1.0,1.0,0.8,0.786213,0,0,1.0,1.0
1,82,20100406,ANA,MIN,3,5,2,33,8,32,9,0.0,0.5,0.558621,0.548292,6,3,6.75046,3.37523
2,83,20100407,ANA,MIN,2,4,3,35,9,34,7,0.0,0.333333,0.456604,0.468054,9,8,10.12569,9.000613
3,84,20100408,ANA,MIN,1,10,4,34,8,37,11,0.0,0.25,0.229299,0.297089,11,12,12.375843,13.50092
4,85,20100409,ANA,OAK,4,10,5,35,8,41,13,0.0,0.2,0.2,0.275311,12,22,13.50092,24.751687


In [19]:
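#iterate over all the game numbers
#NOTE: random.seed seeds Python's built-in random module only; the
#simulations draw from np.random, so this does not fix their results
import random
random.seed(10)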
team_games.insert(len(team_games.T),"same_rs_percentage",0)
team_games.insert(len(team_games.T),"same_ra_percentage",0)
team_games.insert(len(team_games.T),"both_percentage",0)
for game_number in range(len(team_games)):
    #get the index for where the opponent is for this game
    opponent_index = result.loc[(result.team == team_games.opponent[game_number]) 
                         & (result.opponent == "ANA")
                         & (result.Date == team_games.Date[game_number])
                         & (result.runs_allowed == team_games.runs_scored[game_number])
                        ].values[0][0]
    #gets the alpha values for each team
    home_alpha_rs = team_games.alpha_rs[game_number]
    home_alpha_ra = team_games.alpha_ra[game_number]
    away_alpha_rs = result.reset_index(drop = True).alpha_rs[opponent_index]
    away_alpha_ra = result.reset_index(drop = True).alpha_ra[opponent_index]
    
    team1wincount_same_rs = run_simulation_same_rs(home_alpha_rs,away_alpha_rs)
    team1wincount_same_ra = run_simulation_same_ra(home_alpha_ra,away_alpha_ra)
    team1wincount_different = run_simulation_rs_ra (home_alpha_rs, 
                                                    home_alpha_ra, 
                                                    away_alpha_rs,
                                                    away_alpha_ra)
    #print(team1wincount_same_rs)
    team_games["same_rs_percentage"][game_number] = team1wincount_same_rs/1001
    team_games["same_ra_percentage"][game_number] = team1wincount_same_ra/1001
    team_games["both_percentage"][game_number] = team1wincount_different/1001
team_games.insert(len(team_games.T),"same_rs_prediction",(team_games["same_rs_percentage"].values>.5)*1)
team_games.insert(len(team_games.T),"same_ra_prediction",(team_games["same_ra_percentage"].values>.5)*1)
team_games.insert(len(team_games.T),"both_prediction",(team_games["both_percentage"].values>.5)*1)
team_games.insert(len(team_games.T),"same_rs_prediction_correct",(team_games.same_rs_prediction == team_games.win_loss)*1)
team_games.insert(len(team_games.T),"same_ra_prediction_correct",(team_games.same_ra_prediction == team_games.win_loss)*1)
team_games.insert(len(team_games.T),"both_prediction_correct",(team_games.both_prediction == team_games.win_loss)*1)
print("ANA RS Prediction accuracy " + str((team_games.cumsum().same_rs_prediction_correct[161])/162))
print("ANA RA Prediction accuracy " + str((team_games.cumsum().same_ra_prediction_correct[161])/162))
print("ANA Both Prediction accuracy " + str((team_games.cumsum().both_prediction_correct[161])/162))


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  team_games["same_rs_percentage"][game_number] = team1wincount_same_rs/1001
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  team_games["same_ra_percentage"][game_number] = team1wincount_same_ra/1001
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  team_games["both_percentage"][game_number] = team1wincount_different/1001


ANA RS Prediction accuracy 0.5123456790123457
ANA RA Prediction accuracy 0.48148148148148145
ANA Both Prediction accuracy 0.5617283950617284
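
The warning above is raised because `team_games` is a slice of `result` and the assignments use chained indexing (`team_games[column][row] = ...`). A minimal sketch of the usual alternative follows, on a toy frame of our own with a placeholder value of 0.5: take an explicit copy, create the column as float, and write with a single `.loc` call.

```python
import pandas as pd

result = pd.DataFrame({
    "team": ["ANA", "ANA", "ARI"],
    "alpha_rs": [1.0, 6.75, 1.0],
})

# .copy() makes team_games an independent frame rather than a slice of result
team_games = result.loc[result.team == "ANA"].copy()
# a float column, so win fractions can be stored without a dtype change
team_games["same_rs_percentage"] = 0.0

for game_number in range(len(team_games)):
    # one .loc call selects the row and the column together,
    # so the value is written into team_games itself
    team_games.loc[game_number, "same_rs_percentage"] = 0.5  # placeholder value

print(team_games)
```

The loop uses the index labels 0, 1, ... just as the cell above does, so it relies on each team's rows being labelled from 0.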


In [20]:
team_games

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,...,alpha_ra,same_rs_percentage,same_ra_percentage,both_percentage,same_rs_prediction,same_ra_prediction,both_prediction,same_rs_prediction_correct,same_ra_prediction_correct,both_prediction_correct
0,81,20100405,ANA,MIN,6,3,1,33,9,32,...,1.000000,0.489510,0.503497,0.504496,0,1,1,0,1,1
1,82,20100406,ANA,MIN,3,5,2,33,8,32,...,3.375230,0.511489,0.496503,0.496503,1,0,0,0,1,1
2,83,20100407,ANA,MIN,2,4,3,35,9,34,...,9.000613,0.524476,0.507493,0.502498,1,1,1,0,0,0
3,84,20100408,ANA,MIN,1,10,4,34,8,37,...,13.500920,0.500500,0.497502,0.507493,1,0,1,0,1,0
4,85,20100409,ANA,OAK,4,10,5,35,8,41,...,24.751687,0.593407,0.379620,0.562438,1,0,1,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157,161,20100929,ANA,OAK,2,1,158,39,8,40,...,771.802597,0.523477,0.461538,0.600400,1,0,1,1,0,1
158,77,20100930,ANA,TEX,2,3,159,31,5,29,...,772.927674,0.495504,0.490509,0.492507,0,0,0,1,1,1
159,78,20101001,ANA,TEX,5,4,160,41,11,41,...,776.302904,0.504496,0.514486,0.498501,1,1,0,1,1,0
160,79,20101002,ANA,TEX,2,6,161,32,6,32,...,780.803211,0.514486,0.490509,0.487512,1,0,0,0,1,1


In [21]:
print("ANA RS Prediction accuracy " + str((team_games.cumsum().same_rs_prediction_correct[161])/162))
print("ANA RA Prediction accuracy " + str((team_games.cumsum().same_ra_prediction_correct[161])/162))
print("ANA Both Prediction accuracy " + str((team_games.cumsum().both_prediction_correct[161])/162))


ANA RS Prediction accuracy 0.5123456790123457
ANA RA Prediction accuracy 0.48148148148148145
ANA Both Prediction accuracy 0.5617283950617284
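
The accuracy expression above divides the last cumulative sum by 162, which assumes the team has exactly 162 rows labelled 0 to 161. Since the `*_prediction_correct` columns hold 0 or 1, their mean is the same fraction without hard-coding the number of games. A sketch on a toy frame of our own:

```python
import pandas as pd

toy = pd.DataFrame({
    "team": ["ANA", "ANA", "ANA", "ANA", "ARI", "ARI"],
    "both_prediction_correct": [1, 0, 1, 1, 0, 1],
})

# accuracy for a single team: the mean of a 0/1 column is the fraction of 1s
ana_accuracy = toy.loc[toy.team == "ANA", "both_prediction_correct"].mean()
print(ana_accuracy)  # 0.75

# accuracy for every team at once
per_team = toy.groupby("team")["both_prediction_correct"].mean()
print(per_team)
```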


In [22]:
def simulate_for_team(result, team):
    #work on an explicit copy so that the assignments below
    #do not trigger pandas' SettingWithCopyWarning
    team_games = result.loc[result.team == team].copy()
    #float columns that will hold the simulated win fractions
    team_games.insert(len(team_games.T),"same_rs_percentage",0.0)
    team_games.insert(len(team_games.T),"same_ra_percentage",0.0)
    team_games.insert(len(team_games.T),"both_percentage",0.0)
    #iterate over all the game numbers
    for game_number in range(len(team_games)):
        #get the index for where the opponent is for this game
        opponent_index = result.loc[(result.team == team_games.opponent[game_number]) 
                             & (result.opponent == team)
                             & (result.Date == team_games.Date[game_number])
                             & (result.runs_allowed == team_games.runs_scored[game_number])
                            ].values[0][0]
        #gets the alpha values for each team
        home_alpha_rs = team_games.alpha_rs[game_number]
        home_alpha_ra = team_games.alpha_ra[game_number]
        away_alpha_rs = result.reset_index(drop = True).alpha_rs[opponent_index]
        away_alpha_ra = result.reset_index(drop = True).alpha_ra[opponent_index]
        team1wincount_same_rs = run_simulation_same_rs(home_alpha_rs,away_alpha_rs)
        team1wincount_same_ra = run_simulation_same_ra(home_alpha_ra,away_alpha_ra)
        team1wincount_different = run_simulation_rs_ra (home_alpha_rs, 
                                                        home_alpha_ra, 
                                                        away_alpha_rs,
                                                        away_alpha_ra)
        #each simulation runs 1001 games, so divide by 1001;
        #.loc writes into team_games directly
        team_games.loc[game_number, "same_rs_percentage"] = team1wincount_same_rs/1001
        team_games.loc[game_number, "same_ra_percentage"] = team1wincount_same_ra/1001
        team_games.loc[game_number, "both_percentage"] = team1wincount_different/1001
    #a prediction is 1 when the simulated win fraction is above .5
    team_games.insert(len(team_games.T),"same_rs_prediction",(team_games["same_rs_percentage"].values>.5)*1)
    team_games.insert(len(team_games.T),"same_ra_prediction",(team_games["same_ra_percentage"].values>.5)*1)
    team_games.insert(len(team_games.T),"both_prediction",(team_games["both_percentage"].values>.5)*1)
    #a prediction is correct when it matches the actual win_loss value
    team_games.insert(len(team_games.T),"same_rs_prediction_correct",(team_games.same_rs_prediction == team_games.win_loss)*1)
    team_games.insert(len(team_games.T),"same_ra_prediction_correct",(team_games.same_ra_prediction == team_games.win_loss)*1)
    team_games.insert(len(team_games.T),"both_prediction_correct",(team_games.both_prediction == team_games.win_loss)*1)
    print(str(team) + "RS Prediction accuracy " + str((team_games.cumsum().same_rs_prediction_correct[161])/162))
    print(str(team) + "RA Prediction accuracy " + str((team_games.cumsum().same_ra_prediction_correct[161])/162))
    print(str(team) + "Both Prediction accuracy " + str((team_games.cumsum().both_prediction_correct[161])/162))
    return team_games

In [23]:
#This will loop through each team in team_list,
#run the simulations for it, and collect the
#resulting data frame for each team in a list
frames = []
for team in team_list:
    print(team)
    df = simulate_for_team(result,team) #for each data frame we get the rest of the data we want
    frames.append(df)  #adds the dataframe for that team to the list
final_results = pd.concat(frames)
        

ANA




ANARS Prediction accuracy 0.5493827160493827
ANARA Prediction accuracy 0.4691358024691358
ANABoth Prediction accuracy 0.5493827160493827
ARI




ARIRS Prediction accuracy 0.5555555555555556
ARIRA Prediction accuracy 0.46296296296296297
ARIBoth Prediction accuracy 0.5617283950617284
ATL




ATLRS Prediction accuracy 0.6234567901234568
ATLRA Prediction accuracy 0.3950617283950617
ATLBoth Prediction accuracy 0.6296296296296297
BAL




BALRS Prediction accuracy 0.5493827160493827
BALRA Prediction accuracy 0.4691358024691358
BALBoth Prediction accuracy 0.5370370370370371
BOS




BOSRS Prediction accuracy 0.5123456790123457
BOSRA Prediction accuracy 0.4876543209876543
BOSBoth Prediction accuracy 0.5
CHA




CHARS Prediction accuracy 0.5370370370370371
CHARA Prediction accuracy 0.4506172839506173
CHABoth Prediction accuracy 0.5246913580246914
CHN




CHNRS Prediction accuracy 0.4876543209876543
CHNRA Prediction accuracy 0.49382716049382713
CHNBoth Prediction accuracy 0.5061728395061729
CIN


CINRS Prediction accuracy 0.5370370370370371
CINRA Prediction accuracy 0.4691358024691358
CINBoth Prediction accuracy 0.5246913580246914
CLE


CLERS Prediction accuracy 0.5555555555555556
CLERA Prediction accuracy 0.4691358024691358
CLEBoth Prediction accuracy 0.5370370370370371
COL


COLRS Prediction accuracy 0.5987654320987654
COLRA Prediction accuracy 0.43209876543209874
COLBoth Prediction accuracy 0.6049382716049383
DET


DETRS Prediction accuracy 0.6111111111111112
DETRA Prediction accuracy 0.4012345679012346
DETBoth Prediction accuracy 0.6234567901234568
FLO


FLORS Prediction accuracy 0.5123456790123457
FLORA Prediction accuracy 0.5185185185185185
FLOBoth Prediction accuracy 0.47530864197530864
HOU


HOURS Prediction accuracy 0.5740740740740741
HOURA Prediction accuracy 0.41975308641975306
HOUBoth Prediction accuracy 0.5679012345679012
KCA


KCARS Prediction accuracy 0.5246913580246914
KCARA Prediction accuracy 0.5123456790123457
KCABoth Prediction accuracy 0.5493827160493827
LAN


LANRS Prediction accuracy 0.5493827160493827
LANRA Prediction accuracy 0.4382716049382716
LANBoth Prediction accuracy 0.5555555555555556
MIL


MILRS Prediction accuracy 0.5
MILRA Prediction accuracy 0.47530864197530864
MILBoth Prediction accuracy 0.5061728395061729
MIN


MINRS Prediction accuracy 0.5740740740740741
MINRA Prediction accuracy 0.43209876543209874
MINBoth Prediction accuracy 0.5679012345679012
NYA


NYARS Prediction accuracy 0.5246913580246914
NYARA Prediction accuracy 0.49382716049382713
NYABoth Prediction accuracy 0.5370370370370371
NYN


NYNRS Prediction accuracy 0.5740740740740741
NYNRA Prediction accuracy 0.42592592592592593
NYNBoth Prediction accuracy 0.6049382716049383
OAK


OAKRS Prediction accuracy 0.5493827160493827
OAKRA Prediction accuracy 0.4567901234567901
OAKBoth Prediction accuracy 0.5740740740740741
PHI


PHIRS Prediction accuracy 0.5802469135802469
PHIRA Prediction accuracy 0.41975308641975306
PHIBoth Prediction accuracy 0.5617283950617284
PIT


PITRS Prediction accuracy 0.6111111111111112
PITRA Prediction accuracy 0.42592592592592593
PITBoth Prediction accuracy 0.6172839506172839
SDN


SDNRS Prediction accuracy 0.5
SDNRA Prediction accuracy 0.5061728395061729
SDNBoth Prediction accuracy 0.47530864197530864
SEA


SEARS Prediction accuracy 0.5617283950617284
SEARA Prediction accuracy 0.42592592592592593
SEABoth Prediction accuracy 0.5679012345679012
SFN


SFNRS Prediction accuracy 0.5432098765432098
SFNRA Prediction accuracy 0.5
SFNBoth Prediction accuracy 0.5432098765432098
SLN


SLNRS Prediction accuracy 0.5925925925925926
SLNRA Prediction accuracy 0.43209876543209874
SLNBoth Prediction accuracy 0.6111111111111112
TBA


TBARS Prediction accuracy 0.5123456790123457
TBARA Prediction accuracy 0.47530864197530864
TBABoth Prediction accuracy 0.5123456790123457
TEX


TEXRS Prediction accuracy 0.5432098765432098
TEXRA Prediction accuracy 0.4382716049382716
TEXBoth Prediction accuracy 0.5617283950617284
TOR


TORRS Prediction accuracy 0.5061728395061729
TORRA Prediction accuracy 0.47530864197530864
TORBoth Prediction accuracy 0.5493827160493827
WAS


WASRS Prediction accuracy 0.5740740740740741
WASRA Prediction accuracy 0.43209876543209874
WASBoth Prediction accuracy 0.5925925925925926
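
The `SettingWithCopyWarning` messages emitted by the loop above come from chained assignments of the form `team_games["both_percentage"][game_number] = ...`. A minimal sketch of the usual remedy follows: take an explicit `.copy()` of the per-team slice and write through a single `.loc` call. The toy `games` frame, the value `502`, and the name `n_simulations` are hypothetical stand-ins, and the sketch assumes the per-game key is a row label of `team_games`.

```python
import pandas as pd

# Hypothetical stand-in for the full game table used in this notebook
games = pd.DataFrame({
    "team": ["BOS", "BOS", "CHA"],
    "game_number": [1, 2, 1],
})

# .copy() makes team_games an independent DataFrame instead of a slice of games
team_games = games.loc[games["team"] == "BOS"].copy()
team_games["both_percentage"] = 0.0

team1wincount_different = 502   # hypothetical count of simulated wins
n_simulations = 1001            # divisor used in the notebook's assignments

for row_label in team_games.index:
    # One .loc call selects row and column together, so pandas writes in place
    team_games.loc[row_label, "both_percentage"] = team1wincount_different / n_simulations

print(team_games)
```

Because the write goes through one indexing operation on an explicit copy, pandas has no ambiguity about which object is being modified, and the original `games` frame is left untouched.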


In [25]:
final_results

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,...,alpha_ra,same_rs_percentage,same_ra_percentage,both_percentage,same_rs_prediction,same_ra_prediction,both_prediction,same_rs_prediction_correct,same_ra_prediction_correct,both_prediction_correct
0,81,20100405,ANA,MIN,6,3,1,33,9,32,...,1.000000,0.519481,0.486513,0.501499,1,0,1,1,0,1
1,82,20100406,ANA,MIN,3,5,2,33,8,32,...,3.375230,0.512488,0.499500,0.521479,1,0,1,0,1,0
2,83,20100407,ANA,MIN,2,4,3,35,9,34,...,9.000613,0.532468,0.495504,0.471528,1,0,0,0,1,1
3,84,20100408,ANA,MIN,1,10,4,34,8,37,...,13.500920,0.497502,0.500500,0.515485,0,1,1,1,0,0
4,85,20100409,ANA,OAK,4,10,5,35,8,41,...,24.751687,0.607393,0.374625,0.560440,1,0,1,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157,160,20100928,WAS,PHI,2,1,158,27,3,35,...,814.555511,0.528472,0.456543,0.579421,1,0,1,1,0,1
158,161,20100929,WAS,PHI,1,7,159,30,3,33,...,815.680588,0.558442,0.452547,0.555445,1,0,1,0,1,0
159,78,20101001,WAS,NYN,1,2,160,33,3,33,...,823.556124,0.474525,0.495504,0.523477,0,0,1,1,1,0
160,79,20101002,WAS,NYN,2,7,161,32,6,33,...,825.806278,0.493506,0.525475,0.488511,0,1,0,1,0,1
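
The displayed rows suggest how the prediction columns relate to the simulated percentages. The sketch below rebuilds them for the first five rows shown above. It assumes a prediction of 1 means a simulated win percentage above 0.5 and that a prediction is correct when it matches whether `runs_scored` exceeded `runs_allowed`. Both rules are inferred from the displayed values, not taken from the code that produced `final_results`.

```python
import pandas as pd

# First five rows of final_results as displayed above (only the columns needed here)
toy = pd.DataFrame({
    "runs_scored": [6, 3, 2, 1, 4],
    "runs_allowed": [3, 5, 4, 10, 10],
    "same_rs_percentage": [0.519481, 0.512488, 0.532468, 0.497502, 0.607393],
    "same_ra_percentage": [0.486513, 0.499500, 0.495504, 0.500500, 0.374625],
    "both_percentage": [0.501499, 0.521479, 0.471528, 0.515485, 0.560440],
})

# 1 if the team actually won the game, 0 otherwise
won = (toy["runs_scored"] > toy["runs_allowed"]).astype(int)

for name in ["same_rs", "same_ra", "both"]:
    # Assumed rule: predict a win when more than half of the simulations were wins
    toy[name + "_prediction"] = (toy[name + "_percentage"] > 0.5).astype(int)
    # A prediction is correct when it agrees with the actual result
    toy[name + "_prediction_correct"] = (toy[name + "_prediction"] == won).astype(int)

print(toy.filter(like="_prediction"))
```

Under these assumed rules the rebuilt columns agree with the five displayed rows.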


In [26]:
# Overall accuracy of the "both" prediction: the mean of its 0/1 correctness column
final_results["both_prediction_correct"].mean()

0.554320987654321

In [27]:
# Overall accuracy of each prediction: the mean of its 0/1 correctness column
print("RS Prediction accuracy " + str(final_results["same_rs_prediction_correct"].mean()))
print("RA Prediction accuracy " + str(final_results["same_ra_prediction_correct"].mean()))
print("Both Prediction accuracy " + str(final_results["both_prediction_correct"].mean()))

RS Prediction accuracy 0.5508230452674897
RA Prediction accuracy 0.4567901234567901
Both Prediction accuracy 0.554320987654321


Note: over the entire season this model is only marginally better than flipping a coin. Early in the season, however, the Weibull parameters are estimated from very few games, so those predictions are close to coin flips themselves. Since the Pythagorean expectation is normally used partway through the season, we also evaluate the model on only the later games, starting from a chosen game number (the cells below start at game 75; this cutoff can be changed).
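
One rough way to quantify "marginally better than flipping a coin" is to compare the overall accuracy with the spread expected from pure coin flips. The sketch below uses the "Both" accuracy and the denominator 4860 from the output above. The helper `z_vs_coin` is a hypothetical name, and the calculation assumes independent predictions. That assumption is optimistic if each game appears once per team, as the `team`/`opponent` layout suggests, so the same calculation is repeated with half as many games.

```python
import math

accuracy = 0.554320987654321   # "Both" prediction accuracy printed above
n_rows = 4860                  # denominator used in the accuracy cells above

def z_vs_coin(acc, n):
    """Standard errors by which acc exceeds 0.5 under a fair-coin model with n trials."""
    standard_error = math.sqrt(0.25 / n)
    return (acc - 0.5) / standard_error

# Treat every row as an independent prediction (assumption)
z_all_rows = z_vs_coin(accuracy, n_rows)

# More cautious: count each game once, assuming every game appears in two rows
z_distinct_games = z_vs_coin(accuracy, n_rows // 2)

print("z-score using all rows:", round(z_all_rows, 2))
print("z-score using distinct games:", round(z_distinct_games, 2))
```

A small edge in accuracy can still sit several standard errors above 0.5 when the number of games is large, so the z-score and the raw accuracy answer different questions.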

In [28]:
# Keep only games numbered 75 or later
LaterSeasondf = final_results.loc[final_results.game_number >= 75]

In [30]:
LaterSeasondf

Unnamed: 0,index,Date,team,opponent,runs_scored,runs_allowed,game_number,at_bats,hits,opponent_at_bats,...,alpha_ra,same_rs_percentage,same_ra_percentage,both_percentage,same_rs_prediction,same_ra_prediction,both_prediction,same_rs_prediction_correct,same_ra_prediction_correct,both_prediction_correct
74,115,20100624,ANA,LAN,6,10,75,35,12,41,...,399.402219,0.539461,0.453546,0.609391,1,0,1,0,1,0
75,116,20100625,ANA,COL,3,4,76,38,8,43,...,410.652985,0.550450,0.469530,0.630370,1,0,1,0,1,0
76,117,20100626,ANA,COL,4,2,77,28,4,30,...,415.153292,0.553447,0.434565,0.599401,1,0,1,1,0,1
77,118,20100627,ANA,COL,10,3,78,33,9,35,...,417.403446,0.546454,0.467532,0.599401,1,0,1,1,0,1
78,119,20100629,ANA,TEX,6,5,79,34,11,37,...,420.778676,0.558442,0.434565,0.619381,1,0,1,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
157,160,20100928,WAS,PHI,2,1,158,27,3,35,...,814.555511,0.528472,0.456543,0.579421,1,0,1,1,0,1
158,161,20100929,WAS,PHI,1,7,159,30,3,33,...,815.680588,0.558442,0.452547,0.555445,1,0,1,0,1,0
159,78,20101001,WAS,NYN,1,2,160,33,3,33,...,823.556124,0.474525,0.495504,0.523477,0,0,1,1,1,0
160,79,20101002,WAS,NYN,2,7,161,32,6,33,...,825.806278,0.493506,0.525475,0.488511,0,1,0,1,0,1


In [32]:
for number in range(75, 100):
    # Restrict to games numbered `number` or later, then report the mean of each 0/1 correctness column
    LaterSeasondf = final_results.loc[final_results.game_number >= number]
    print("Prediction after game number " + str(number))
    print("RS Prediction accuracy " + str(LaterSeasondf["same_rs_prediction_correct"].mean()))
    print("RA Prediction accuracy " + str(LaterSeasondf["same_ra_prediction_correct"].mean()))
    print("Both Prediction accuracy " + str(LaterSeasondf["both_prediction_correct"].mean()))

Prediction after game number 75
RS Prediction accuracy 0.5378787878787878
RA Prediction accuracy 0.4662878787878788
Both Prediction accuracy 0.5446969696969697
Prediction after game number 76
RS Prediction accuracy 0.5371647509578544
RA Prediction accuracy 0.4674329501915709
Both Prediction accuracy 0.5440613026819924
Prediction after game number 77
RS Prediction accuracy 0.537984496124031
RA Prediction accuracy 0.4662790697674419
Both Prediction accuracy 0.5453488372093023
Prediction after game number 78
RS Prediction accuracy 0.5364705882352941
RA Prediction accuracy 0.46705882352941175
Both Prediction accuracy 0.543921568627451
Prediction after game number 79
RS Prediction accuracy 0.5369047619047619
RA Prediction accuracy 0.4666666666666667
Both Prediction accuracy 0.5444444444444444
Prediction after game number 80
RS Prediction accuracy 0.5357429718875502
RA Prediction accuracy 0.4678714859437751
Both Prediction accuracy 0.5437751004016064
Prediction after game number 81
RS Predic