# Figuring out SCORE9

So SCORE9 had a small but statistically significant negative coefficient, which can be read as the tile reducing the outcome variable for all four players. Recall that the outcome is each player's deviation from the mean victory points for the game. I had assumed the coefficient on every score tile would have to be about 0: if a tile increases one player's score, that pulls up the mean, which should have a counterbalancing negative impact on the other three players. That turned out to be true for every score tile but this one.
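
The counterbalancing argument can be checked on a toy game. The scores and the 4-point bonus below are made up for illustration; the only thing taken from the text is that the outcome is each player's deviation from the game mean.

```python
import numpy as np

# Hypothetical scores for one four-player game (made up for illustration).
scores = np.array([100.0, 100.0, 100.0, 100.0])

# Give player 1 a 4-point bonus, e.g. from a score tile.
boosted = scores.copy()
boosted[0] += 4.0
boosted_margins = boosted - boosted.mean()

print(boosted_margins)        # player 1 gains 3, the other three lose 1 each
print(boosted_margins.sum())  # the margins cancel out within the game
```

The bonus moves the mean by a quarter of its size, so the margins within a game always add up to zero no matter who benefits.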

Talking about it with Kyle, I learned: "Score 9 is the hardest tile to use effectively, scoring wise it almost never has an impact on score as much as others."

Let's see if I can simulate some data that works the same way and reproduce that result.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

In [2]:
# Make a series of games, give 4 players random scores
game_size = 10_000
game_df = pd.DataFrame(index=np.arange(game_size))
for i in range(1, 5):
    game_df[f"player{i}_vp"] = 100 + np.random.normal(size=game_size) * 5

In [3]:
# https://github.com/numpy/numpy/issues/5173
def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return a
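
As far as I know, newer NumPy versions (1.20 and later) provide `Generator.permuted`, which shuffles each slice along an axis independently, so it may be able to stand in for the helper above. A minimal sketch, assuming that version is available; the seed and the five-game size are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only so the example is repeatable

# Six active tiles and three inactive ones, repeated for a handful of games.
template = np.array([True] * 6 + [False] * 3)
tiles = np.tile(template, (5, 1))

# permuted returns a shuffled copy; each row is shuffled independently.
shuffled = rng.permuted(tiles, axis=1)

print(shuffled.sum(axis=1))  # every row still has six active tiles
```

Unlike `disarrange`, this leaves the input array untouched and returns a new one.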

In [4]:
# Add random scoring tiles
index = np.ones(game_size)
score_cols = [f"SCORE{i}" for i in range(1, 10)]
# Make an array with the correct number of true and false values for a valid game
unshuffled_scores = np.array([True, True, True, True, True, True, False, False, False])
scores_array = disarrange(np.outer(index, unshuffled_scores))
scores = pd.DataFrame(data=scores_array, columns=score_cols)
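
One thing worth checking here: since every game has exactly six active tiles, the nine `SCORE` columns always sum to 6. That makes all nine columns plus a constant perfectly collinear, which is presumably why the regression further down leaves `SCORE1` out (that is my reading, not something stated in the notebook). A small sketch of the check, using a rebuilt toy tile matrix rather than the `scores` frame above:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for a repeatable example

# Toy tile matrix: 200 games, six of nine tiles active in each.
template = np.array([1.0] * 6 + [0.0] * 3)
tiles = np.array([rng.permutation(template) for _ in range(200)])

# Every row sums to 6, so the tile columns add up to 6 times the constant.
row_sums = tiles.sum(axis=1)

const = np.ones((tiles.shape[0], 1))
full = np.hstack([const, tiles])            # constant + all nine tiles
reduced = np.hstack([const, tiles[:, 1:]])  # constant + tiles 2 to 9

# Compare the rank with the number of columns for each design matrix.
print(np.linalg.matrix_rank(full), full.shape[1])
print(np.linalg.matrix_rank(reduced), reduced.shape[1])
```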

In [5]:
game_df = pd.concat([game_df, scores], axis="columns")

In [6]:
# Now give every score tile except SCORE9 an impact on one randomly chosen player (the same player in every game)
for score in range(1, 9):
    player_impacted = np.random.choice(4) + 1  # choice(4) returns 0-3; players are numbered 1-4
    score_col = f"SCORE{score}"
    player_col = f"player{player_impacted}_vp"
    # Base impact is a random integer from 0 to 10, plus per-game noise
    impact = np.random.normal(size=game_size) + np.random.choice(np.arange(11))
    game_df[player_col] += game_df[score_col] * impact

In [7]:
# Get mean victory points for each player
player_cols = [f"player{i}_vp" for i in range(1, 5)]
game_df["mean_vp"] = game_df[player_cols].mean(axis="columns")
for i in range(1, 5):
    game_df[f"player_{i}_vp_margin"] = game_df[f"player{i}_vp"] - game_df["mean_vp"]

In [8]:
# Pull out the game-level tile indicators to merge onto the player-level dataframe we'll build a model from
game_level_info = game_df.reindex(columns=[f"SCORE{i}" for i in range(1,10)]).reset_index()

In [9]:
def _player_n_frame(base_df, n):
    """Simplified version of the helper from the model_data module."""
    player_dict = {
        f"player_{n}_vp_margin": "vp_margin",
    }
    player_n_df = (
        base_df.rename(columns=player_dict)
        .reindex(columns=["vp_margin"])
        .assign(player_num=n)
        .reset_index()
    )
    return player_n_df

In [10]:
player_df = pd.concat([_player_n_frame(game_df, i) for i in range(1, 5)])
recombined_df = player_df.merge(game_level_info, on="index")
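
The same wide-to-long reshape could probably also be done in one step with `DataFrame.melt`. This is only a sketch on a tiny made-up frame that follows the column naming above (`player_{n}_vp_margin`, `SCORE{n}`), not a drop-in replacement for the `model_data` helper:

```python
import pandas as pd

# Tiny made-up game-level frame with the same column naming as above.
toy = pd.DataFrame(
    {
        "player_1_vp_margin": [3.0, -2.0],
        "player_2_vp_margin": [-1.0, 1.0],
        "player_3_vp_margin": [-1.0, 0.5],
        "player_4_vp_margin": [-1.0, 0.5],
        "SCORE1": [1, 0],
        "SCORE2": [0, 1],
    }
)

margin_cols = [f"player_{i}_vp_margin" for i in range(1, 5)]
score_cols = ["SCORE1", "SCORE2"]

# One row per player per game; the game-level columns repeat for each player.
long_df = toy.reset_index().melt(
    id_vars=["index"] + score_cols,
    value_vars=margin_cols,
    var_name="player_col",
    value_name="vp_margin",
)

# Pull the player number out of names like "player_3_vp_margin".
long_df["player_num"] = (
    long_df["player_col"].str.extract(r"player_(\d+)_", expand=False).astype(int)
)
long_df = long_df.drop(columns="player_col")

print(long_df)
```

This skips the separate merge, since the tile columns are carried along as `id_vars`.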

In [11]:
assert len(recombined_df) == 4 * game_size

In [12]:
y = recombined_df["vp_margin"]
X = sm.add_constant(recombined_df[[f"SCORE{i}" for i in range(2, 10)]])
model = sm.OLS(y, X).fit()
model.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | vp_margin | R-squared: | 0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | 1.505e-12 |
| Date: | Sun, 05 Apr 2020 | Prob (F-statistic): | 1.0 |
| Time: | 16:57:22 | Log-Likelihood: | -129860.0 |
| No. Observations: | 40000 | AIC: | 259700.0 |
| Df Residuals: | 39991 | BIC: | 259800.0 |
| Df Model: | 8 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.499e-14 | 0.353 | -4.25e-14 | 1.000 | -0.692 | 0.692 |
| SCORE2 | 9.992e-15 | 0.088 | 1.13e-13 | 1.000 | -0.173 | 0.173 |
| SCORE3 | 8.438e-15 | 0.087 | 9.65e-14 | 1.000 | -0.171 | 0.171 |
| SCORE4 | 6.661e-16 | 0.089 | 7.52e-15 | 1.000 | -0.174 | 0.174 |
| SCORE5 | 3.886e-15 | 0.087 | 4.44e-14 | 1.000 | -0.171 | 0.171 |
| SCORE6 | 3.83e-15 | 0.088 | 4.36e-14 | 1.000 | -0.172 | 0.172 |
| SCORE7 | 3.442e-15 | 0.088 | 3.92e-14 | 1.000 | -0.172 | 0.172 |
| SCORE8 | 4.108e-15 | 0.088 | 4.66e-14 | 1.000 | -0.173 | 0.173 |
| SCORE9 | 5.348e-15 | 0.088 | 6.08e-14 | 1.000 | -0.172 | 0.172 |

| | | | |
|---|---|---|---|
| Omnibus: | 479.309 | Durbin-Watson: | 2.906 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 493.834 |
| Skew: | 0.268 | Prob(JB): | 5.83e-108 |
| Kurtosis: | 2.903 | Cond. No. | 27.4 |


So this is showing zero effect for every score column, including SCORE9. That doesn't reproduce the negative coefficient I was trying to explain.
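
In hindsight the exact zeros (the 1e-15 values are floating-point noise) look built into this setup rather than something the simulated data produced. Within each game the four margins sum to zero, and a score tile has the same value for all four players in that game, so every tile column is orthogonal to `vp_margin` and least squares has to return zero for it. A minimal sketch of that with plain NumPy instead of statsmodels, on made-up data with a single hypothetical tile:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for a repeatable example
n_games = 500

# One made-up game-level indicator, and scores that really do depend on it,
# but for player 1 only.
tile = rng.integers(0, 2, size=n_games).astype(float)
scores = 100 + 5 * rng.normal(size=(n_games, 4))
scores[:, 0] += 8 * tile

# Deviation from the game mean: each row of margins sums to zero.
margins = scores - scores.mean(axis=1, keepdims=True)

# Stack to one row per player per game; the tile value repeats for all four players.
y = margins.reshape(-1)
X = np.column_stack([np.ones(4 * n_games), np.repeat(tile, 4)])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # both entries are zero up to floating-point error
```

Even though the tile is worth 8 points to player 1 here, the pooled coefficient is zero, because the gain for one player is exactly offset by the losses of the other three.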

Let's check the interaction terms with player numbers. If I set up the simulation right, they should have actual nonzero coefficients on them.

In [13]:
interact_cols = []
for player_num in range(1, 5):
    for score_num in range(2, 10):
        interact = f"player{player_num}_x_score{score_num}"
        interact_cols.append(interact)
        recombined_df[interact] = ((recombined_df["player_num"] == player_num) & (recombined_df[f"SCORE{score_num}"] == 1)).astype(int)

In [14]:
y = recombined_df["vp_margin"]
X = sm.add_constant(recombined_df[interact_cols])
model = sm.OLS(y, X).fit()
model.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | vp_margin | R-squared: | 0.489 |
| Model: | OLS | Adj. R-squared: | 0.489 |
| Method: | Least Squares | F-statistic: | 1195.0 |
| Date: | Sun, 05 Apr 2020 | Prob (F-statistic): | 0.0 |
| Time: | 16:57:23 | Log-Likelihood: | -116440.0 |
| No. Observations: | 40000 | AIC: | 232900.0 |
| Df Residuals: | 39967 | BIC: | 233200.0 |
| Df Model: | 32 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.131e-13 | 0.252 | -4.48e-13 | 1.000 | -0.495 | 0.495 |
| player1_x_score2 | -1.5135 | 0.096 | -15.786 | 0.000 | -1.701 | -1.326 |
| player1_x_score3 | 2.2408 | 0.096 | 23.327 | 0.000 | 2.053 | 2.429 |
| player1_x_score4 | 2.9565 | 0.096 | 30.760 | 0.000 | 2.768 | 3.145 |
| player1_x_score5 | -0.6101 | 0.096 | -6.382 | 0.000 | -0.797 | -0.423 |
| player1_x_score6 | -0.6101 | 0.096 | -6.364 | 0.000 | -0.798 | -0.422 |
| player1_x_score7 | 0.6552 | 0.096 | 6.831 | 0.000 | 0.467 | 0.843 |
| player1_x_score8 | -1.6740 | 0.096 | -17.424 | 0.000 | -1.862 | -1.486 |
| player1_x_score9 | 0.1431 | 0.096 | 1.491 | 0.136 | -0.045 | 0.331 |

| | | | |
|---|---|---|---|
| Omnibus: | 2.294 | Durbin-Watson: | 2.504 |
| Prob(Omnibus): | 0.318 | Jarque-Bera (JB): | 2.31 |
| Skew: | -0.017 | Prob(JB): | 0.315 |
| Kurtosis: | 2.983 | Cond. No. | 22.7 |


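Yup, interaction terms work fine: in the rows shown, every tile that was given a simulated impact gets a significant coefficient, and SCORE9, which was given none, does not (p = 0.136). So the mystery of SCORE9 remains...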