In this notebook, we attempt to fit a higher-dimensional player static player embedding model, in the hope that this allows us to model player interactions rather than just main effects.  Later, we will try to fit an equivalent recurrent model so that the embeddings update over time through the model rather than through refitting the embeddings.

Our goal for this notebook is to demonstrate that a static multidimensional player embedding model will outperform a logistic regression.  Such a conclusion will allow us to next focus on a temporal multidimensional player embedding model.

Conclusions:  So far, we haven't been able to make this work.  We suspect this has to do with the inputs to sigmoid activation functions getting pushed to extreme enough regions that the gradients become close to 0.

In [1]:
from tennis_new.fetch.tennis_explorer.combiner import read_joined

jd = read_joined()

  if (yield from self.run_code(code, result)):


#### Run Set ELO

Run SetELO first so that we have easy access to training set and validation set and all that

In [2]:
from tennis_new.model.config.elo.global_set_elo import SetELO

set_elo = SetELO()
set_elo.run(jd)

In [3]:
set_elo.validation_evaluation

{'DummyFilter_prediction_AUCMetric': 0.8187031847302881,
 'DummyFilter_prediction_AccuracyMetric': 0.7358520800135314,
 'DummyFilter_prediction_LogLikelihoodMetric': -0.5226366377611569,
 'HasOddsFilter_prediction_AUCMetric': 0.7839029874196454,
 'HasOddsFilter_prediction_AccuracyMetric': 0.7056423354253945,
 'HasOddsFilter_prediction_LogLikelihoodMetric': -0.5594758958654537,
 'DummyFilter_odds_implied_probability_AUCMetric': None,
 'DummyFilter_odds_implied_probability_AccuracyMetric': None,
 'DummyFilter_odds_implied_probability_LogLikelihoodMetric': None,
 'HasOddsFilter_odds_implied_probability_AUCMetric': 0.7937506478103871,
 'HasOddsFilter_odds_implied_probability_AccuracyMetric': 0.7114980299325661,
 'HasOddsFilter_odds_implied_probability_LogLikelihoodMetric': -0.5501844612492598}

#### Define Logit Training X, y

Now we'll need to create a sparse dataset for the logistic regression.  We'll start by making sure we have the right date filtering.  Recall that for ELO models, our training data is the full date range.  We'll have to manually cut the dates for our logit model.

In [4]:
elo_training_set = set_elo.training_filter.filter_data(jd)
elo_validation_set = set_elo.validation_filter.filter_data(set_elo.all_jd)
elo_test_set = set_elo.test_filter.filter_data(set_elo.all_jd)
logit_training_set = elo_training_set[
    elo_training_set['date'] < elo_validation_set['date'].min()
].copy()
(
    (logit_training_set['date'].min(), logit_training_set['date'].max()),
    (elo_validation_set['date'].min(), elo_validation_set['date'].max()),
    (elo_test_set['date'].min(), elo_test_set['date'].max())
)

(('1997-01-01', '2010-12-31'),
 ('2011-01-01', '2014-12-31'),
 ('2015-01-01', '2020-12-21'))

#### Mess Around with Pytorch DataLoaders

In [5]:
import pandas as pd

all_players = pd.concat([
    logit_training_set[['p1_link', 'date']].rename(columns={'p1_link': 'pid'}).drop_duplicates('pid', keep='first'),
    logit_training_set[['p2_link', 'date']].rename(columns={'p2_link': 'pid'}).drop_duplicates('pid', keep='first')
]).sort_values('date', ascending=True)['pid'].drop_duplicates(keep='first')
player_map = dict(enumerate(all_players))
inv_player_map = {v: k for k, v in player_map.items()}

In [6]:
torch_training_set = logit_training_set[[
    'p1_link',
    'p2_link',
    'p1_sets_won',
    'p2_sets_won'
]].copy()
torch_training_set['p1_id'] = torch_training_set['p1_link'].map(inv_player_map)
torch_training_set['p2_id'] = torch_training_set['p2_link'].map(inv_player_map)

In [7]:
import torch

torch_validation_set = elo_validation_set[
    elo_validation_set['p1_link'].isin(torch_training_set['p1_link']) &
    elo_validation_set['p2_link'].isin(torch_training_set['p2_link']) &
    (elo_validation_set['date'] < '2012-01-01')
].copy()

torch_val_X = torch.from_numpy(
    pd.DataFrame({
        'p1_id': torch_validation_set['p1_link'].map(inv_player_map),
        'p2_id': torch_validation_set['p2_link'].map(inv_player_map)
    }).values
)

In [8]:
# Calculate embedding size
N_PLAYERS = torch_training_set[['p1_id', 'p2_id']].max().max() + 1
N_PLAYERS

26545

In [9]:
import numpy as np

class MyDataSet(torch.utils.data.Dataset):
    def __init__(self, X, w1, w2):
        self.X = torch.from_numpy(X.values)
        self.w1 = torch.from_numpy(w1.values.astype(np.float32))
        self.w2 = torch.from_numpy(w2.values.astype(np.float32))
        
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self, idx):
        return self.X[idx], self.w1[idx], self.w2[idx]

batch_size = 1024
train_ds = MyDataSet(
    torch_training_set[['p1_id', 'p2_id']], 
    torch_training_set['p1_sets_won'],
    torch_training_set['p2_sets_won']
)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, shuffle=True)

In [20]:
class TorchRunner(object):
    
    def __init__(
        self,
        train_dl,
        validation_set=None,
        test_set=None,
    ):
        self.train_dl = train_dl
        self.n_epochs = 0
        self.epoch_loss = 0
        self.validation_set = validation_set
        self.test_set = test_set
        self.training_initialized=False
        self.model = None
        
        
    def on_epoch_end(self):
        # Callback to do at the end of an epoch
        pass
    
    def on_minibatch_end(self):
        # Callback to perform at end of a minibatch
        pass
   
    @property
    def model_cls(self):
        raise NotImplementedError()

    @property
    def model_kwargs(self):
        return {}

    @property
    def optimizer_cls(self):
        raise NotImplementedError()

    @property
    def optimizer_kwargs(self): 
        return {}

    @property
    def loss_criterion(self):
        raise NotImplementedError()
    
    def loss(self, minibatch_data):
        raise NotImplementedError()

    def init_training(self):
        # Instantiate model and optimizer
        self.training_initialized = True
        self.model = self.model_cls(**self.model_kwargs)
        self.optimizer = self.optimizer_cls(self.model.parameters(), **self.optimizer_kwargs)
        
    def train(self, n_epochs):
        if not self.training_initialized:
            self.init_training()
        for epoch in range(n_epochs):
            self.epoch_loss = 0
            for minibatch_data in self.train_dl:
                self.optimizer.zero_grad()
                loss = self.loss(minibatch_data)
                self.epoch_loss += loss.item()
                loss.backward()
                self.optimizer.step()
                self.on_minibatch_end()
            self.n_epochs += 1
            self.on_epoch_end()

In [21]:
torch.cat([embedded[:, 0, :], embedded[:, 1, :]], 1).shape

torch.Size([3, 64])

In [87]:
def centered_sigmoid(x):
    return torch.sigmoid(x) - 0.5


class MultiDimModel(torch.nn.Module):
    def __init__(self, n_players):
        super(MultiDimModel, self).__init__()
        self.main_effect_embedding = torch.nn.Embedding(n_players, 2)

        # Interaction
        self.interaction_embedding = torch.nn.Embedding(n_players, 3)
        self.sigmoid = torch.nn.Sigmoid()
        self.linear1 = torch.nn.Linear(3, 3, bias=False)
        # self.linear2 = torch.nn.Linear(3, 3, bias=False)
        # self.linear3 = torch.nn.Linear(3, 3, bias=False)
        self.interaction_output = torch.nn.Linear(3, 1, bias=False)

    def main_effect(self, x):
        main_effect_embedded = self.main_effect_embedding(x)
        main_effect_diff = main_effect_embedded[:, 0, 0] - main_effect_embedded[:, 1, 0]
        return main_effect_diff

    def interaction_score(self, x):
        interaction_embedded = self.interaction_embedding(x)
        interaction_embedding_diff = interaction_embedded[:, 0, :] - interaction_embedded[:, 1, :] 
        hidden1 = centered_sigmoid(self.linear1(interaction_embedding_diff))
        # hidden2 = centered_sigmoid(self.linear2(hidden1))
        # hidden3 = centered_sigmoid(self.linear3(hidden2))
        interaction_score = self.interaction_output(hidden1)[:, 0]
        return interaction_score
        
    def forward(self, x):
        main_effect = self.main_effect(x)
        interaction_score = self.interaction_score(x)
        return torch.sigmoid(main_effect + interaction_score)

In [91]:
?torch.optim.RMSprop

In [209]:
INITIAL_LR = 0.001

class MyLogitFitter(TorchRunner):

    def __init__(self, *args, **kwargs):
        super(MyLogitFitter, self).__init__(*args, **kwargs)
        self.last_epoch_loss = 9999999999999
        self.lr = INITIAL_LR
        
    @property
    def model_cls(self):
        return MultiDimModel 

    @property
    def model_kwargs(self):
        return {
            'n_players': N_PLAYERS 
        }

    @property
    def loss_criterion(self):
        return torch.nn.BCELoss(reduction='none')
    
    @property
    def optimizer_cls(self):
        return torch.optim.RMSprop

    @property
    def optimizer_kwargs(self):
        return {
            'lr': INITIAL_LR,
            'momentum': 0.9
        }

    def on_epoch_end(self):
        val_preds = self.model(self.validation_set)
        accuracy = (val_preds.detach().numpy() > 0.5).mean()
        print("Iteration: {}, Loss: {}, Accuracy: {}.".format(self.n_epochs, self.epoch_loss, accuracy))
        if self.epoch_loss > self.last_epoch_loss:  # If training loss is getting worse, halve learning rate
            self.lr /= 10
            print("Reducing learning rate to %0.4f" % self.lr)
            for pg in self.optimizer.param_groups:
                pg['lr'] = self.lr
        self.last_epoch_loss = self.epoch_loss

        interaction_embedded = self.model.interaction_embedding(self.validation_set)
        interaction_embedding_diff = interaction_embedded[:, 0, :] - interaction_embedded[:, 1, :] 
        hidden1 = centered_sigmoid(self.model.linear1(interaction_embedding_diff))
        print("Hidden Interaction Layer Values")
        print(pd.Series(hidden1.detach().numpy()[:, 0]).describe())
        
    def loss(self, minibatch_data):
        # Set-Weighted Loss
        X, w1, w2 = minibatch_data
        outputs = self.model(X)
        y_1 = torch.from_numpy(np.ones(X.shape[0], dtype=np.float32))
        y_2 = torch.from_numpy(np.zeros(X.shape[0], dtype=np.float32))
        loss_1 = torch.mean(torch.mul(w1, self.loss_criterion(outputs, y_1)))
        loss_2 = torch.mean(torch.mul(w2, self.loss_criterion(1. - outputs, y_2)))
        loss = loss_1 + loss_2
        return loss

In [210]:
logit_fitter = MyLogitFitter(train_dl, validation_set=torch_val_X)

In [212]:
logit_fitter.train(20)

Iteration: 21, Loss: 489.70415127277374, Accuracy: 0.7092137087573587.
Hidden Interaction Layer Values
count    43826.000000
mean        -0.141610
std          0.337391
min         -0.500000
25%         -0.458364
50%         -0.253036
75%          0.158322
max          0.499988
dtype: float64
Iteration: 22, Loss: 488.1647711992264, Accuracy: 0.7082553735225665.
Hidden Interaction Layer Values
count    43826.000000
mean        -0.143072
std          0.346877
min         -0.500000
25%         -0.467885
50%         -0.268834
75%          0.171397
max          0.499997
dtype: float64
Iteration: 23, Loss: 486.6281443834305, Accuracy: 0.7095103363300324.
Hidden Interaction Layer Values
count    43826.000000
mean        -0.146467
std          0.354266
min         -0.500000
25%         -0.476319
50%         -0.286953
75%          0.180762
max          0.499998
dtype: float64
Iteration: 24, Loss: 485.2019747495651, Accuracy: 0.7088942636790946.
Hidden Interaction Layer Values
count    43826.000

In [213]:
interaction_scores = logit_fitter.model.interaction_score(torch_val_X)
main_effects = logit_fitter.model.main_effect(torch_val_X)

In [226]:
torch_validation_set['interaction_score'] = interaction_scores.detach().numpy()
torch_validation_set['main_effect'] = main_effects.detach().numpy()
# torch_validation_set['interaction_score'] -= torch_validation_set['interaction_score'].mean()
# torch_validation_set['main_effect'] += torch_validation_set['interaction_score'].mean()

In [230]:
mean_interaction_score = torch_validation_set['interaction_score'].mean()
mean_main_effect = torch_validation_set['main_effect'].mean()
torch_validation_set['interaction_score'] -= mean_interaction_score
torch_validation_set['main_effect'] += mean_interaction_score

In [235]:
torch_validation_set['p1_id'] = torch_validation_set['p1_link'].map(inv_player_map)
torch_validation_set['p2_id'] = torch_validation_set['p2_link'].map(inv_player_map)

In [236]:
fed_wins = torch_validation_set[torch_validation_set['p1_link'] == '/player/nadal/']
fed_losses = torch_validation_set[torch_validation_set['p2_link'] == '/player/nadal/'].copy()
fed_losses['interaction_score'] *= -1
fed_losses['main_effect'] *= -1
together = pd.concat([
    fed_wins,
    fed_losses
])
together.sort_values('interaction_score', ascending=True)[[
    'p1_name',
    'p2_name',
    'interaction_score',
    'main_effect',
    'p1_id',
    'p2_id'
]].drop_duplicates('p2_name').head(40)

Unnamed: 0,p1_name,p2_name,interaction_score,main_effect,p1_id,p2_id
446106,Nadal R.,Federer R.,-0.882128,0.435751,1552,836
484375,Nadal R.,Murray A.,-0.381895,0.898965,1552,3325
497140,Nadal R.,Del Potro J.,-0.153313,0.185482,1552,8879
448585,Nadal R.,Cilic M.,-0.151081,0.705787,1552,9919
465496,Nadal R.,Andujar P.,-0.150477,1.675149,1552,3542
482954,Nadal R.,Golubev A.,0.031765,1.152369,1552,8176
486013,Nadal R.,Tsonga J.,0.086151,0.491189,1552,2167
459316,Nadal R.,Monfils G.,0.172508,0.776197,1552,6993
466052,Nadal R.,Ljubicic I.,0.17506,1.280476,1552,200
496334,Federer R.,Nadal R.,0.259354,-0.705731,836,1552


TODO: Figure out how to get multidim to work!  It's got to eventually!