## What is the relationship between ELO ratings and winning percentage?

The simulator was implemented using 538's ELO ratings as the team ratings, with winning percentages computed from the difference between two teams' ratings. For compatibility, I am keeping that design even as I switch to MLB as the data source. Instead, I will compute ELO-scaled ratings from projected winning percentages. The goal here is to examine the relationship between ELO rating difference and predicted winning percentage, and then reverse the calculation (by brute force) to generate the ELO-scaled ratings.

In [1]:
import pandas as pd
import series_probs_compute as probs
import plotly.express as px
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [2]:
probs.p_from_diff(10)

0.5143871841659987
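
The value above is consistent with the standard Elo expectation formula, p = 1 / (1 + 10^(-d/400)). The sketch below assumes `probs.p_from_diff` implements that formula; the module's source is not shown here, so treat this as an assumption rather than a description of the actual code. `elo_win_prob` and `elo_diff_from_prob` are hypothetical names used only for illustration. If the assumption holds, the formula can also be inverted in closed form, which gives a handy cross-check on any brute-force reversal.

```python
import math


def elo_win_prob(diff: float) -> float:
    """Win probability for a team rated `diff` points above its opponent.

    Assumption: the standard Elo formula with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff_from_prob(p: float) -> float:
    """Closed-form inverse: the rating difference that yields win probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


p = elo_win_prob(10)
print(p)                      # compare with probs.p_from_diff(10) above
print(elo_diff_from_prob(p))  # round trip should recover roughly 10
```

The round trip only works for probabilities strictly between 0 and 1, which is why the inverse rejects the endpoints.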

In [3]:
{d: probs.p_from_diff(d) for d in range(-50, 50)}

{-50: 0.4285368825916186,
 -49: 0.42994717641601177,
 -48: 0.43135860811633236,
 -47: 0.4327711556570478,
 -46: 0.43418479693076684,
 -45: 0.4355995097595788,
 -44: 0.43701527189640416,
 -43: 0.43843206102635396,
 -42: 0.4398498547680971,
 -41: 0.44126863067523897,
 -40: 0.44268836623770724,
 -39: 0.4441090388831469,
 -38: 0.44553062597832394,
 -37: 0.4469531048305375,
 -36: 0.4483764526890395,
 -35: 0.449800646746463,
 -34: 0.4512256641402582,
 -33: 0.45265148195413535,
 -32: 0.4540780772195163,
 -31: 0.4555054269169921,
 -30: 0.4569335079777882,
 -29: 0.45836229728523653,
 -28: 0.45979177167625435,
 -27: 0.46122190794282847,
 -26: 0.4626526828335072,
 -25: 0.4640840730548977,
 -24: 0.46551605527316847,
 -23: 0.46694860611555894,
 -22: 0.46838170217189273,
 -21: 0.469815319996098,
 -20: 0.4712494361077314,
 -19: 0.4726840269935071,
 -18: 0.4741190691088309,
 -17: 0.47555453887933863,
 -16: 0.4769904127024377,
 -15: 0.47842666694885455,
 -14: 0.47986327796418354,
 -13: 0.48130022207044

In [4]:
rng = range(-100, 100)
df = pd.DataFrame([probs.p_from_diff(d) for d in rng], index=rng, columns=['wpct'])
df

| index | wpct |
|---|---|
| -100 | 0.359935 |
| -99 | 0.361262 |
| -98 | 0.362592 |
| -97 | 0.363923 |
| -96 | 0.365257 |
| ... | ... |
| 95 | 0.633408 |
| 96 | 0.634743 |
| 97 | 0.636077 |
| 98 | 0.637408 |


In [5]:
px.scatter(df, trendline="ols")

In [6]:
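# Use linear regression to predict ELO rating difference from projected winning percentage
y = df.reset_index()['index']  # target: rating difference (the DataFrame index)
X = df[['wpct']]               # feature: winning percentage, kept 2-D for sklearn
model = LinearRegression().fit(X, y)
model.coef_, model.intercept_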

(array([706.26879653]), -353.13978057104595)
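
The fitted coefficients define a straight-line map from projected winning percentage to an ELO-scale rating difference. As a minimal sketch, the function below applies them directly. The slope and intercept are copied from the output above; `elo_diff_from_wpct` is a hypothetical helper name, and the line should only be trusted inside the range the fit was built on (winning percentages of roughly 0.36 to 0.64).

```python
# Coefficients copied from the LinearRegression output above.
SLOPE = 706.26879653
INTERCEPT = -353.13978057104595


def elo_diff_from_wpct(wpct: float) -> float:
    """Linear approximation of the ELO-scale rating difference for a projected winning percentage."""
    return SLOPE * wpct + INTERCEPT


# A few projected winning percentages and their approximate rating differences.
for wpct in (0.450, 0.500, 0.550):
    print(f"{wpct:.3f} -> {elo_diff_from_wpct(wpct):+.1f}")

# Rating change implied by moving from .500 to .510 (ten "points" of winning percentage).
print(elo_diff_from_wpct(0.510) - elo_diff_from_wpct(0.500))
```

A .500 team does not map to exactly zero because the fitted range, -100 to 99, is not symmetric around zero.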

In [7]:
# OK, so every 10 points of winning percentage (0.010, e.g. .500 -> .510) equate to about 7 points of ELO (706.27 * 0.010 ≈ 7.06)
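
How much accuracy does the straight-line shortcut give up? The sketch below is one way to check. It assumes the standard Elo formula stands in for `probs.p_from_diff` (an assumption, since that module is not shown), refits the line with a hand-rolled least-squares calculation so it runs without sklearn, and reports the largest gap between the line and the true rating difference over the same -100 to 99 range. All names here are illustrative.

```python
def elo_win_prob(diff):
    # Assumption: standard Elo formula standing in for probs.p_from_diff.
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


diffs = list(range(-100, 100))
wpcts = [elo_win_prob(d) for d in diffs]

# Ordinary least squares for: diff = slope * wpct + intercept
n = len(diffs)
mean_x = sum(wpcts) / n
mean_y = sum(diffs) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wpcts, diffs))
sxx = sum((x - mean_x) ** 2 for x in wpcts)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Largest absolute error of the linear approximation, in rating points.
max_err = max(abs(slope * x + intercept - y) for x, y in zip(wpcts, diffs))
print(slope, intercept, max_err)
```

If the assumption holds, the slope and intercept should land close to the values reported above, and the worst-case error should be on the order of one rating point, at the edges of the range where the logistic curve bends away from the line.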