In [1]:
import os
os.chdir("..")

In [2]:
import seaborn as sns
import pandas as pd

from sim import run_monte_carlo, params

# Monte Carlo sims

The previous page's data all came from a single simulated season of 100 teams and 100 games. That is too little data to pin down small effects, where the correlation is low but non-zero, so we're going to need stronger tools.

From now on, every result is aggregated across 1000 simulated seasons. Teams and players are regenerated each time.
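
To make the approach concrete, here is a minimal sketch of what a Monte Carlo driver of this kind might look like. It is not the actual `run_monte_carlo` from `sim`: `simulate_toy_season`, `run_monte_carlo_sketch`, and the toy relationship between `attr_0` and `average` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np
import pandas as pd


def simulate_toy_season(rng, n_batters=200):
    """Hypothetical stand-in for one simulated season: one row per batter."""
    attr = rng.normal(size=n_batters)
    # Assumed toy model: the average depends weakly on the attribute, plus noise.
    average = 0.25 + 0.01 * attr + rng.normal(scale=0.03, size=n_batters)
    return pd.DataFrame({"attr_0": attr, "average": average})


def run_monte_carlo_sketch(metric_fn, n_seasons=1000, seed=0):
    """Regenerate a fresh season per iteration and collect one row of metrics each."""
    rng = np.random.default_rng(seed)
    rows = [metric_fn(simulate_toy_season(rng)) for _ in range(n_seasons)]
    return pd.DataFrame(rows)


def attr_correlation(player_df):
    """Metric function: Pearson correlation between the attribute and the average."""
    return {"attr_0": player_df["attr_0"].corr(player_df["average"])}


results = run_monte_carlo_sketch(attr_correlation, n_seasons=200)
print(results.mean())
```

Each row of `results` is one season's estimate, and the column mean is the averaged estimate across seasons.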

Let's start by asking: what's the actual correlation between attributes and hit rate? This is a repeat of the previous page's charts, except with 1000x as much data.

In [3]:
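def get_correlations(team_df, player_df) -> dict:
    """Return the Pearson correlation between each attribute and batting average."""
    # Only batters have a meaningful batting average, so drop the pitchers.
    batters = player_df.loc[player_df["is_pitcher"] == False]
    corrs = {}
    for attr in ["goodness"] + [f"attr_{i}" for i in range(4)]:
        # Series.corr gives the same value as the off-diagonal entry of DataFrame.corr().
        corrs[attr] = batters[attr].corr(batters["average"])
    return corrs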

print("Here are the correlations for each season across 1000 seasons:")
df = run_monte_carlo(get_correlations)
display(df)

print("And the final correlations, averaged across all iterations:")
display(df.mean())

Here are the correlations for each season across 1000 seasons:


season,goodness,attr_0,attr_1,attr_2,attr_3
0,0.820398,0.729107,0.400817,0.060962,-0.049424
1,0.835669,0.729912,0.373068,0.039231,-0.001411
2,0.834708,0.724292,0.369391,0.116796,0.045185
3,0.808906,0.694124,0.372405,0.160246,0.015655
4,0.822382,0.727843,0.353234,0.092963,0.038340
...,...,...,...,...,...
995,0.832745,0.730756,0.348985,0.147038,0.030493
996,0.832475,0.728904,0.404927,0.088539,0.005551
997,0.832534,0.741229,0.335942,0.158517,-0.025376
998,0.816521,0.709570,0.399886,0.097352,0.039180


And the final correlations, averaged across all iterations:


goodness    0.825394
attr_0      0.729851
attr_1      0.365426
attr_2      0.122071
attr_3     -0.000124
dtype: float64
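
The per-season table shows how noisy a single season is: in the rows displayed, `attr_3` ranges from about -0.05 to 0.05, while its 1000-season mean is -0.000124. One way to judge how far to trust an averaged figure is the standard error of the mean, which is the per-season standard deviation divided by the square root of the number of seasons. The sketch below is illustrative only: `summarize_runs` is a hypothetical helper, and the synthetic frame (with an assumed spread of 0.04) stands in for the real `df`.

```python
import numpy as np
import pandas as pd


def summarize_runs(per_season: pd.DataFrame) -> pd.DataFrame:
    """Mean, sample standard deviation, and standard error of the mean per column."""
    n = len(per_season)
    mean = per_season.mean()
    std = per_season.std(ddof=1)
    sem = std / np.sqrt(n)
    return pd.DataFrame({"mean": mean, "std": std, "sem": sem})


# Synthetic stand-in for the per-season correlations; these are not real results.
rng = np.random.default_rng(0)
fake_df = pd.DataFrame({
    "attr_2": rng.normal(0.12, 0.04, size=1000),
    "attr_3": rng.normal(0.00, 0.04, size=1000),
})

summary = summarize_runs(fake_df)
print(summary)
```

On the real data, calling `summarize_runs(df)` would show whether a mean such as the one for `attr_3` is distinguishable from zero given its standard error.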

## Effects of matchmaking windows

We'll now rerun the experiment above across several matchmaking windows to see what effect matchmaking has on the stat-attribute correlations.

In [4]:
window_correlations = {}
for window in (2, 6, 20, 100):
    params.MATCHMAKING_SPREAD = window
    df = run_monte_carlo(get_correlations)
    window_correlations[window] = df.mean()

display(pd.DataFrame(window_correlations).T)

window,goodness,attr_0,attr_1,attr_2,attr_3
2,0.825975,0.730506,0.365681,0.122477,-0.00077
6,0.825718,0.730665,0.364706,0.121656,0.000766
20,0.826659,0.731065,0.366406,0.12239,-0.000752
100,0.825944,0.730275,0.364884,0.123448,0.001022
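
Note that the sweep assigns `params.MATCHMAKING_SPREAD` directly, so the setting stays at its last value (100) for any cells that run afterwards. One possible way to keep a sweep from leaking state is a small context manager that restores the previous value on exit. This is a sketch, not part of `sim`: `override_param` is a hypothetical helper, and the `SimpleNamespace` with a default of 6 is an assumed stand-in for the real `params` module.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for `sim.params`; the default of 6 is an assumption.
params = SimpleNamespace(MATCHMAKING_SPREAD=6)


@contextmanager
def override_param(obj, name, value):
    """Temporarily set obj.<name> to value, restoring the old value on exit."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, old)


seen = []
for window in (2, 6, 20, 100):
    with override_param(params, "MATCHMAKING_SPREAD", window):
        # The Monte Carlo run for this window would go here.
        seen.append(params.MATCHMAKING_SPREAD)

print(seen, params.MATCHMAKING_SPREAD)
```

The `try`/`finally` means the original setting is restored even if a run raises partway through the sweep.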


In [5]:
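# Compare the widest and narrowest windows using the per-window means.
# (`df` itself only holds the per-season results of the last run.)
window_df = pd.DataFrame(window_correlations).T
display(window_df.loc[100] - window_df.loc[2])

goodness   -0.000031
attr_0     -0.000231
attr_1     -0.000797
attr_2      0.000971
attr_3      0.001792
dtype: float64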