In [None]:
from zamboni.training import ZamboniData, ModelInitializer, Trainer, OneSplitStrategy, SequentialStrategy, ResultsAnalyzer
from zamboni.data_management import ColumnTracker, ZamboniDataManager
import torch

### What is the best way to train a model iteratively over time for maximum predictive power?
The sequential method tested here trains day by day: each day the model is trained on the previous day's outcomes and then predicts today's games. Training this way gives a realistic measure of performance, as if the model had been operating in production over that time period. For comparison, it is tested against a model trained on the first 80% of games in our dataset and evaluated on the remaining 20%.
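The day-by-day loop described above can be sketched as follows. This is only an illustration, not the `SequentialStrategy` implementation: the `HomeWinRateModel` class, the `walk_forward` function, and the toy game records are hypothetical stand-ins for the real network and dataset. The point is the ordering: each day's games are predicted before that day's outcomes are used for training.

```python
from collections import defaultdict
from datetime import date


class HomeWinRateModel:
    """Hypothetical stand-in model: predicts a home win whenever home
    teams have won at least half of the games seen so far."""

    def __init__(self):
        self.home_wins = 0
        self.games_seen = 0

    def predict(self, game):
        # With no training data yet, default to predicting a home win.
        if self.games_seen == 0:
            return 1
        return 1 if self.home_wins / self.games_seen >= 0.5 else 0

    def update(self, games):
        for game in games:
            self.home_wins += game["home_win"]
            self.games_seen += 1


def walk_forward(games, model):
    """Predict each day's games, then train on that day's outcomes."""
    by_day = defaultdict(list)
    for game in games:
        by_day[game["date"]].append(game)

    preds, labels = [], []
    for day in sorted(by_day):
        todays_games = by_day[day]
        # Predict first: the model has only seen earlier days.
        preds.extend(model.predict(g) for g in todays_games)
        labels.extend(g["home_win"] for g in todays_games)
        # Only now learn from today's outcomes.
        model.update(todays_games)
    return preds, labels


# Toy data (made up for illustration).
toy_games = [
    {"date": date(2024, 10, 1), "home_win": 1},
    {"date": date(2024, 10, 1), "home_win": 0},
    {"date": date(2024, 10, 2), "home_win": 1},
    {"date": date(2024, 10, 3), "home_win": 1},
]
preds, labels = walk_forward(toy_games, HomeWinRateModel())
accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
```

Because every prediction is made before the corresponding outcome is seen, the accuracy from a loop like this reflects what the model would have achieved if deployed from the first day.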

### First set up our data and create our two models

In [None]:
manager = ZamboniDataManager('../data/games_all.parquet')
manager.load_parquet()
all_data = ZamboniData(manager.data)
all_columns = all_data.data.columns.tolist()

In [None]:
column_tracker = ColumnTracker(all_columns)
all_data.column_tracker = column_tracker

In [None]:
model_init = ModelInitializer('data/embed_test_nn', 'EmbeddingNN', column_tracker)
norm_model, norm_optimizer, _, _ = model_init.get_model()
seq_model, seq_optimizer, _, _ = model_init.get_model()

In [None]:
norm_trainer = Trainer(norm_model, norm_optimizer)
seq_trainer = Trainer(seq_model, seq_optimizer)

### Run the basic 80/20 strategy

In [None]:
one_split_strat = OneSplitStrategy(all_data, norm_trainer)
one_split_strat.split_by_percentage(0.8)
min_test_date = min(one_split_strat.test_data.data['datePlayed'])
seq_strat = SequentialStrategy(all_data, seq_trainer)

In [None]:
one_split_trainer, one_split_preds, one_split_labels = one_split_strat.run()
one_split_analyzer = ResultsAnalyzer(one_split_preds, one_split_labels)

### Run the sequential training strategy

In [None]:
seq_trainer, seq_preds, seq_labels = seq_strat.run()
seq_analyzer = ResultsAnalyzer(seq_preds, seq_labels)

In [None]:
print(f'Accuracy of 80/20 model: {one_split_analyzer.get_accuracy().item()*100:.1f}%')
print(f'Accuracy of sequential model: {seq_analyzer.get_accuracy().item()*100:.1f}%')

It seems like the 80/20 model performs better! But remember that it was only evaluated on the last 20% of games, after training on the first 80%. That is not realistic unless we are fine with making no predictions for the majority of games. It also starts predicting only once it is fully trained, whereas the sequential model is asked to make its first prediction before it has trained on anything! So for a fair comparison, we should only consider sequential model predictions from the first date in the 80/20 model's test set onward.
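A minimal sketch of that restriction, using plain Python lists rather than the notebook's tensors and DataFrame. The function `accuracy_since` and the toy values are hypothetical; the sketch assumes predictions, labels, and dates are aligned one-to-one and in the same order.

```python
from datetime import date


def accuracy_since(dates, preds, labels, cutoff):
    """Accuracy over games played on or after `cutoff`."""
    kept = [(p, l) for d, p, l in zip(dates, preds, labels) if d >= cutoff]
    if not kept:
        raise ValueError("no games on or after the cutoff date")
    return sum(p == l for p, l in kept) / len(kept)


# Toy values (made up for illustration).
dates = [date(2024, 10, 1), date(2024, 11, 1), date(2024, 12, 18), date(2025, 1, 5)]
preds = [1, 0, 1, 0]
labels = [0, 1, 1, 0]

# Accuracy over every game versus only games from the cutoff onward.
overall = accuracy_since(dates, preds, labels, date(2024, 10, 1))
late_only = accuracy_since(dates, preds, labels, date(2024, 12, 18))
```

In this toy case the early mistakes drag the overall accuracy down, while the late-only figure counts just the games that a fully trained model would also have been scored on.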

In [None]:
dates_played_mask = all_data.data['datePlayed'] >= min_test_date # 2024-12-18
dates_played = all_data.data['datePlayed'][dates_played_mask]

In [None]:
dates_played_mask = dates_played_mask.reset_index(drop=True)
seq_analyzer_comp = ResultsAnalyzer(seq_preds[dates_played_mask], seq_labels[dates_played_mask])

In [None]:
print(f'Accuracy of sequential model after max 80/20 training date: {seq_analyzer_comp.get_accuracy().item()*100:.1f}%')

Still not as high as the 80/20 model, but much closer. The remaining gap could be down to random variation in how the networks are initialized and trained, or it could be that the sequential training method genuinely performs worse. But we can take advantage of the sequential training by pushing the minimum evaluation date even later.
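One rough way to judge whether a gap like this is larger than sampling noise is to compare it with the binomial standard error of each accuracy. The sketch below assumes each game is an independent trial and treats the two evaluation sets as independent samples, both of which are only approximations here. It also says nothing about run-to-run variation from random initialization, which would need repeated training runs to measure. The function names are hypothetical, and the accuracies and game counts are made-up placeholders, not results from this notebook.

```python
import math


def accuracy_standard_error(accuracy, n_games):
    """Binomial standard error of an accuracy measured over n_games."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_games)


def gap_in_standard_errors(acc_a, n_a, acc_b, n_b):
    """Accuracy gap divided by its combined standard error."""
    se_a = accuracy_standard_error(acc_a, n_a)
    se_b = accuracy_standard_error(acc_b, n_b)
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Placeholder numbers: 60% vs 58% accuracy over 400 games each.
z = gap_in_standard_errors(0.60, 400, 0.58, 400)
```

A gap of well under two standard errors, as in this placeholder case, would be hard to distinguish from noise.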

In [None]:
min_test_date = '2025-01-18'

In [None]:
dates_played_mask = all_data.data['datePlayed'] >= min_test_date # 2025-01-18
dates_played = all_data.data['datePlayed'][dates_played_mask]

In [None]:
dates_played_mask = dates_played_mask.reset_index(drop=True)
seq_analyzer_comp = ResultsAnalyzer(seq_preds[dates_played_mask], seq_labels[dates_played_mask])

In [None]:
print(f'Accuracy of sequential model at a later date: {seq_analyzer_comp.get_accuracy().item()*100:.1f}%')

The accuracy continues to increase, which suggests that more data is helping our sequential model make better and better predictions!