# Applying `elo-grad` to NBA data

This notebook shows how to apply the `elo-grad` package to NBA data. It is based on an example notebook from the [kickscore](https://github.com/lucasmaystre/kickscore) package.

In [1]:
import datetime

import pandas as pd

from elo_grad import EloEstimator

## Getting the data

**NOTE:** the cell below relies on a POSIX shell (`test`, `curl`) and will not run on Windows machines. Download the CSV manually to `./data/nba_elo.csv` in that case.

In [2]:
! ! test -f ./data/nba_elo.csv && curl https://projects.fivethirtyeight.com/nba-model/nba_elo.csv -o ./data/nba_elo.csv
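For readers without a POSIX shell, the same "download only if missing" step can be sketched in plain Python with the standard library. The helper name `download_if_missing` is hypothetical, and the sketch assumes the URL from the cell above is still being served.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL and destination copied from the shell cell above.
URL = "https://projects.fivethirtyeight.com/nba-model/nba_elo.csv"
DEST = Path("./data/nba_elo.csv")


def download_if_missing(url: str, dest: Path) -> bool:
    """Fetch `url` into `dest` unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if dest.is_file():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)  # create ./data if needed
    urlretrieve(url, dest)
    return True
```

Calling `download_if_missing(URL, DEST)` in a notebook cell would then stand in for the shell command on any platform.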

## Processing the data

In [3]:
df = pd.read_csv(
    './data/nba_elo.csv',
    usecols=['date', 'team1', 'team2', 'score1', 'score2'],
    dtype={'team1': 'category', 'team2': 'category'},
    parse_dates=['date'],
).sort_index()
# There are some duplicates here (with respect to date/team1/team2)
df = df.drop_duplicates(subset=['date', 'team1', 'team2'], keep='first').set_index('date')
df['t'] = df.index.astype('int64')  # Unix timestamp in nanoseconds (negative before 1970)
print(f'We have {df.shape[0]:,} matches.')
df.head()

We have 73,358 matches.


| date | team1 | team2 | score1 | score2 | t |
|---|---|---|---|---|---|
| 1946-11-01 | TRH | NYK | 66 | 68 | -731116800000000000 |
| 1946-11-02 | PRO | BOS | 59 | 53 | -731030400000000000 |
| 1946-11-02 | STB | PIT | 56 | 51 | -731030400000000000 |
| 1946-11-02 | CHS | NYK | 63 | 47 | -731030400000000000 |
| 1946-11-02 | DTF | WSC | 33 | 50 | -731030400000000000 |
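The large negative values in `t` are nanoseconds since the Unix epoch, so any date before 1970 comes out negative. A small check of that reading, using the first value from the output above. The sketch forces nanosecond resolution explicitly, since the default resolution may depend on the pandas version.

```python
import pandas as pd

# First value of `t` in the output above.
t_ns = -731116800000000000

# pandas interprets a bare integer as nanoseconds since 1970-01-01.
recovered = pd.Timestamp(t_ns)
print(recovered)

# Round trip in the other direction, forcing nanosecond resolution first.
idx = pd.to_datetime(["1946-11-01", "2022-01-01"]).astype("datetime64[ns]")
as_ns = idx.astype("int64")
print(list(as_ns))
```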


Quick check for nulls.

In [4]:
df.isnull().sum()

team1     0
team2     0
score1    0
score2    0
t         0
dtype: int64

Assign a *result* field based on the score: 1 if `team1` outscored `team2`, 0 otherwise.

In [5]:
df['result'] = (df['score1'] > df['score2']).astype(int)
df.head()

| date | team1 | team2 | score1 | score2 | t | result |
|---|---|---|---|---|---|---|
| 1946-11-01 | TRH | NYK | 66 | 68 | -731116800000000000 | 0 |
| 1946-11-02 | PRO | BOS | 59 | 53 | -731030400000000000 | 1 |
| 1946-11-02 | STB | PIT | 56 | 51 | -731030400000000000 | 1 |
| 1946-11-02 | CHS | NYK | 63 | 47 | -731030400000000000 | 1 |
| 1946-11-02 | DTF | WSC | 33 | 50 | -731030400000000000 | 0 |


## Train/val split

Split the data into training and validation sets at 1 January 2022.

In [6]:
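split_date = datetime.date(2022, 1, 1)
split_ts = pd.Timestamp(split_date)

cols = ['team1', 'team2', 'result', 't']

# Slicing with .loc[:split_date] and .loc[split_date:] includes the split date in both sets,
# so use boolean masks instead; matches on the split date stay in the training set.
X_train = df.loc[df.index <= split_ts, cols].set_index('t', drop=False)
X_val = df.loc[df.index > split_ts, cols].set_index('t', drop=False)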
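One detail worth checking in any date-based split is that label slices with `.loc` include both endpoints, so a row dated exactly on the split date can end up in both sets. A toy illustration on made-up data (the frame `toy` and its values are purely hypothetical):

```python
import pandas as pd

# Toy frame: one match before the split date, one on it, one after it.
toy = pd.DataFrame(
    {"result": [1, 0, 1]},
    index=pd.to_datetime(["2021-12-31", "2022-01-01", "2022-01-02"]),
)
split_ts = pd.Timestamp("2022-01-01")

# Label slices are inclusive at both ends, so the 2022-01-01 row appears twice.
train_slice = toy.loc[:split_ts]
val_slice = toy.loc[split_ts:]
overlap = train_slice.index.intersection(val_slice.index)
print(len(overlap))  # 1

# A strict comparison on one side keeps the two sets disjoint.
train_mask = toy.loc[toy.index <= split_ts]
val_mask = toy.loc[toy.index > split_ts]
print(len(train_mask), len(val_mask))  # 2 1
```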

## Benchmark

Time a full pass of the estimator over the training set.

In [7]:
%%timeit -n 20
elo = EloEstimator(entity_cols=('team1', 'team2'), score_col='result', beta=200, k_factor=20, default_init_rating=1500)

elo.transform(X_train)

376 ms ± 43.7 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)


## Get predictions/ratings

Use the `transform` method to calculate expected scores/ratings.
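For intuition about what `beta` and `k_factor` control, here is a minimal sketch of the classic Elo update. It assumes the standard base-10 logistic expected score with scale `2 * beta` (so `beta=200` gives the familiar 400). The function names are hypothetical and this is not taken from elo-grad's source, which may differ in detail.

```python
def expected_score(r1: float, r2: float, beta: float = 200.0) -> float:
    """Probability that the first entity wins under the classic Elo model (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** (-(r1 - r2) / (2.0 * beta)))


def elo_update(r1: float, r2: float, result: int,
               k_factor: float = 20.0, beta: float = 200.0) -> tuple:
    """Return both ratings after one match; `result` is 1 if the first entity won, else 0."""
    delta = k_factor * (result - expected_score(r1, r2, beta))
    # The update is zero-sum: whatever one side gains, the other loses.
    return r1 + delta, r2 - delta


# Two teams at the default initial rating; the first one wins.
new_r1, new_r2 = elo_update(1500.0, 1500.0, result=1)
print(new_r1, new_r2)  # 1510.0 1490.0
```

Under this form, a larger `k_factor` makes ratings react faster to each result, while a larger `beta` flattens the expected score for a given rating gap.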

In [8]:
elo = EloEstimator(entity_cols=('team1', 'team2'), score_col='result', beta=200, k_factor=20, default_init_rating=1500)

preds = elo.transform(X_train)
# Each value is a (timestamp, rating) tuple; sort by rating, highest first
sorted(elo.model.ratings.items(), key=lambda item: item[1][1], reverse=True)[:10]

[('NYA', (200880000000000000, 1752.0559323398625)),
 ('PHO', (1640995200000000000, 1724.2373346063296)),
 ('DNA', (200880000000000000, 1721.998076830764)),
 ('SAA', (199238400000000000, 1697.53730433188)),
 ('UTA', (1640995200000000000, 1682.0141963706546)),
 ('GSW', (1640995200000000000, 1678.9729618061392)),
 ('MIL', (1640995200000000000, 1673.7471552660552)),
 ('PHW', (-244166400000000000, 1668.484710295501)),
 ('KEN', (199584000000000000, 1657.4588446037305)),
 ('OAK', (-6825600000000000, 1656.4269126889053))]
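The raw timestamps hide the fact that several top entries (for example `NYA`, last updated in 1976) belong to teams that stopped playing long ago. The sketch below turns ratings of this shape into a table with readable dates and keeps only recently active teams. It assumes each value is a `(nanosecond timestamp, rating)` pair, as the output above suggests. It uses three entries copied from that output rather than the live `elo.model.ratings`, and the 30-day window is an arbitrary choice.

```python
import pandas as pd

# Three entries copied from the output above.
ratings = {
    "NYA": (200880000000000000, 1752.0559323398625),
    "PHO": (1640995200000000000, 1724.2373346063296),
    "UTA": (1640995200000000000, 1682.0141963706546),
}

table = pd.DataFrame(
    [(team, ts, rating) for team, (ts, rating) in ratings.items()],
    columns=["team", "t", "rating"],
)
# Convert nanoseconds since the epoch into readable timestamps.
table["last_update"] = pd.to_datetime(table["t"], unit="ns")

# Keep teams whose rating changed within 30 days of the most recent update.
cutoff = table["last_update"].max() - pd.Timedelta(days=30)
active = table[table["last_update"] >= cutoff].sort_values("rating", ascending=False)
print(active[["team", "rating", "last_update"]])
```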