# Bradley-Terry example notebook
This notebook contains examples of using skpref:

* to set up the task-based modelling framework
* to fit a classifier from scikit-learn on the same problem, with skpref applying reduction and aggregation methods in the background
* to fit a Bradley-Terry model with and without covariates on the pairwise comparison data of basketball matches
* to apply the grid search technique for model selection

In [1]:
# Optionally change the theme of the notebook to dark
# from jupyterthemes.stylefx import set_nb_theme
# set_nb_theme('chesterish')

In [2]:
# Import skpref modules
import sys
sys.path.insert(0, "../..")
from skpref.random_utility import BradleyTerry
from skpref.task import PairwiseComparisonTask
from skpref.base import ClassificationReducer
from skpref.model_selection import GridSearchCV
from skpref.utils import nice_print_results

# Import scikit-learn packages to be used in tandem with skpref architecture
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Import other useful packages
import pandas as pd
import numpy as np

# Reading in the data
The example dataset consists of matches played by NBA teams. We will use the 2016 season's matches to predict the results of the 2017 matches. The dataset contains:

- team1 and team2, the two teams that played each other
- season_start, which indicates which season the match belongs to
- team1_wins, which takes the value 1 if the team in column team1 won the match and 0 if they lost (there are no ties in basketball)
- team_1_home, which takes the value 1 if team1 was playing on their home court and 0 if they were playing away (there are no neutral courts in the NBA)

In [3]:
NBA_results = pd.read_csv('data/NBA_matches.csv')
NBA_results.head()

Unnamed: 0,team1,team2,season_start,team1_wins,team_1_home
0,Atlanta Hawks,Toronto Raptors,2014,0,0
1,Atlanta Hawks,Indiana Pacers,2014,1,1
2,Atlanta Hawks,San Antonio Spurs,2014,0,0
3,Atlanta Hawks,Charlotte Hornets,2014,0,0
4,Atlanta Hawks,New York Knicks,2014,1,1


In [4]:
NBA_results.tail()

Unnamed: 0,team1,team2,season_start,team1_wins,team_1_home
9835,Washington Wizards,Houston Rockets,2017,0,0
9836,Washington Wizards,Cleveland Cavaliers,2017,0,0
9837,Washington Wizards,Atlanta Hawks,2017,0,1
9838,Washington Wizards,Boston Celtics,2017,1,1
9839,Washington Wizards,Orlando Magic,2017,0,0


In [5]:
season_split = 2016
train_data = NBA_results[NBA_results.season_start == season_split].copy()
test_data = NBA_results[NBA_results.season_start == season_split+1].copy()

We will also use team salary data as a covariate in the model later. The idea is that a team with more money to pay its athletes has an advantage over other teams, because it has a better chance of attracting the top talent in the league.

In [6]:
NBA_team_salary_budget = pd.read_csv('data/team_salary_budgets.csv')
NBA_team_salary_budget.head()

Unnamed: 0,team,season_start,salary
0,Atlanta Hawks,2014,58337671
1,Atlanta Hawks,2015,71378126
2,Atlanta Hawks,2016,95957250
3,Atlanta Hawks,2017,99375302
4,Boston Celtics,2014,59418142


# Setting up the tasks

We set up the preference learning task with the PairwiseComparisonTask object in skpref. This is the only extra step that may be a completely new concept to seasoned scikit-learn users. Once the task is specified, in this case as a pairwise comparison task, skpref knows the structure of the problem for any model applied to it, whether that is a scikit-learn classifier used via reduction or a model that is not itself a pairwise comparison model, and it can perform the necessary reduction and aggregation in the background.

In this example the PairwiseComparisonTask has the following components:

- primary_table: the table that contains the observed preferences
- primary_table_alternatives_names: the column or columns that contain the alternatives, in this case both columns team1 and team2 contain alternatives
- primary_table_target_name: the column that indicates the result of the pairwise comparison
- target_column_correspondence: in pairwise comparisons where the alternatives are split across two columns, the result column usually takes the values 1/0 to show which of the two columns, in our case team1 or team2, was preferred. This argument names the column whose alternative was preferred when the target is 1. Here, team1_wins taking the value 1 means that the alternative in column team1 has won.
- features_to_use: indicates which columns to use as covariates
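
As a rough illustration of what target_column_correspondence conveys, the sketch below maps a 1/0 target back to the name of the preferred alternative. It is plain Python rather than skpref's internal code, and the helper name `winner_of` is hypothetical. The two rows are taken from the first lines of the dataset shown earlier.

```python
# Hypothetical helper, not part of skpref: decode a 1/0 target column
# into the name of the alternative that was preferred.
def winner_of(row, target="team1_wins", corresponds_to="team1", other="team2"):
    # target == 1 means the alternative in `corresponds_to` was preferred
    return row[corresponds_to] if row[target] == 1 else row[other]

rows = [
    {"team1": "Atlanta Hawks", "team2": "Toronto Raptors", "team1_wins": 0},
    {"team1": "Atlanta Hawks", "team2": "Indiana Pacers", "team1_wins": 1},
]
winners = [winner_of(r) for r in rows]
print(winners)
```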

In [7]:
NBA_results_task_train_LR = PairwiseComparisonTask(
    primary_table=train_data,
    primary_table_alternatives_names=['team1', 'team2'],
    primary_table_target_name ='team1_wins',
    target_column_correspondence='team1',
    features_to_use=['team_1_home']
)

# For the test task, it's possible to make a copy of the training task and
# update the primary table
NBA_results_task_predict_LR = PairwiseComparisonTask(
    primary_table=test_data,
    primary_table_alternatives_names=['team1', 'team2'],
    primary_table_target_name ='team1_wins',
    target_column_correspondence='team1',
    features_to_use=['team_1_home']
)

# Fitting a Logistic Regression
For now the only covariate is the team_1_home column, so the model can only learn the average home team advantage. This is equivalent to fitting a logistic regression in which whether team1 is playing at home is the only covariate.

$P(\texttt{team1}\_\texttt{wins}=1) = \text{logit}^{-1}(\alpha + \beta_1 \texttt{team}\_\texttt{1}\_\texttt{home})$

In [8]:
my_log_red = ClassificationReducer(LogisticRegression(solver='lbfgs'))
my_log_red.fit_task(NBA_results_task_train_LR)
preds = my_log_red.predict_task(NBA_results_task_predict_LR)

In [9]:
# predict_task returns a SubsetPosetVector which has the attributes
# top_input_data and boot_input_data corresponding to chosen and not chosen 
# alternatives.
preds.top_input_data, preds.boot_input_data

(array(['Dallas Mavericks', 'Charlotte Hornets', 'Brooklyn Nets', ...,
        'Washington Wizards', 'Washington Wizards', 'Orlando Magic'],
       dtype=object),
 array(['Atlanta Hawks', 'Atlanta Hawks', 'Atlanta Hawks', ...,
        'Atlanta Hawks', 'Boston Celtics', 'Washington Wizards'],
       dtype=object))

In [10]:
NBA_results_task_predict_LR.primary_table.head()

Unnamed: 0,team1,team2,season_start,team1_wins,team_1_home
7380,Atlanta Hawks,Dallas Mavericks,2017,1,0
7381,Atlanta Hawks,Charlotte Hornets,2017,0,0
7382,Atlanta Hawks,Brooklyn Nets,2017,0,0
7383,Atlanta Hawks,Miami Heat,2017,0,0
7384,Atlanta Hawks,Chicago Bulls,2017,0,0


In [11]:
NBA_results_task_predict_LR.primary_table.tail()

Unnamed: 0,team1,team2,season_start,team1_wins,team_1_home
9835,Washington Wizards,Houston Rockets,2017,0,0
9836,Washington Wizards,Cleveland Cavaliers,2017,0,0
9837,Washington Wizards,Atlanta Hawks,2017,0,1
9838,Washington Wizards,Boston Celtics,2017,1,1
9839,Washington Wizards,Orlando Magic,2017,0,0


In [12]:
# All this learns so far is the home team advantage, since team_1_home is the
# only covariate used in the task
nice_print_results(
    my_log_red.predict_proba_task(NBA_results_task_predict_LR,
                                  outcome=['Dallas Mavericks', 'Atlanta Hawks']))

Dallas Mavericks  [0.58 0.   0.   ... 0.   0.   0.  ]
Atlanta Hawks     [0.42 0.42 0.42 ... 0.42 0.   0.  ]


In [13]:
nice_print_results(
    my_log_red.predict_proba_task(NBA_results_task_predict_LR,
                                  column=['team1', 'team2'])
)

team1 is preferred  [0.42 0.42 0.42 ... 0.58 0.58 0.42]
team2 is preferred  [0.58 0.58 0.58 ... 0.42 0.42 0.58]
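
Because team_1_home is binary, this model can only output two distinct probabilities for team1. The two values printed above (0.42 when team1 is away, 0.58 when team1 is at home) are enough to back out approximate values of $\alpha$ and $\beta_1$. The sketch below does this with the standard library. The inputs are the rounded probabilities from the output, so the results only approximate the fitted coefficients.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Logistic (inverse-logit) function."""
    return 1 / (1 + math.exp(-x))

p_away, p_home = 0.42, 0.58   # rounded values read off the output above

alpha = logit(p_away)            # team_1_home = 0 leaves only the intercept
beta_1 = logit(p_home) - alpha   # the home effect is the change in log-odds

print(round(alpha, 3), round(beta_1, 3))
```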


## Fitting a Bradley-Terry model
As the example above shows, the logistic regression approach does not learn different win probabilities depending on the opponent. The Dallas Mavericks could be playing against the strongest or the weakest team in the league and their estimated probability of winning would be the same. The Bradley-Terry model differs from this logistic regression in that it learns a function that estimates whether each team will win or lose given the other team they are playing.

The task for Bradley-Terry is defined slightly differently: the first demo does not use any covariates, so we set features_to_use=None.

In the Bradley-Terry model each team gets a latent strength parameter $\lambda_{\text{team}}$, for example $\lambda_{\text{Atlanta Hawks}}$.

The Bradley-Terry model learns these strength parameters to maximise the likelihood according to the following formulation for observation $i$:
$$P(\texttt{team1}\_\texttt{wins}=1)_i= \frac{e^{\lambda_{\texttt{team1}_i}}}{e^{\lambda_{\texttt{team1}_i}} + e^{\lambda_{\texttt{team2}_i}}}$$
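
Dividing the numerator and the denominator by $e^{\lambda_{\texttt{team1}_i}}$ shows that this is a logistic function of the difference in strengths, which makes the link with the logistic regression fitted earlier explicit:

$$P(\texttt{team1}\_\texttt{wins}=1)_i = \frac{1}{1 + e^{-(\lambda_{\texttt{team1}_i} - \lambda_{\texttt{team2}_i})}}$$

Only differences between strengths enter the formula, so adding the same constant to every $\lambda$ leaves all probabilities unchanged.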

In [14]:
NBA_results_task_train_BT = PairwiseComparisonTask(
    primary_table=train_data,
    primary_table_alternatives_names=['team1', 'team2'],
    primary_table_target_name ='team1_wins',
    target_column_correspondence='team1',
    features_to_use=None
)

NBA_results_task_predict_BT = PairwiseComparisonTask(
    primary_table=test_data,
    primary_table_alternatives_names=['team1', 'team2'],
    primary_table_target_name ='team1_wins',
    target_column_correspondence='team1',
    features_to_use=None
)

In [15]:
# Fitting Bradley Terry model
mybt = BradleyTerry(method='BFGS', alpha=1e-5)
mybt.fit_task(NBA_results_task_train_BT)

In [16]:
mybt.params_

Unnamed: 0,entity,learned_strength
0,Atlanta Hawks,0.047522
1,Boston Celtics,0.580896
2,Brooklyn Nets,-1.178393
3,Charlotte Hornets,-0.278154
4,Chicago Bulls,-0.037967
5,Cleveland Cavaliers,0.489737
6,Dallas Mavericks,-0.386261
7,Denver Nuggets,-0.040408
8,Detroit Pistons,-0.225709
9,Golden State Warriors,1.538386


We can use the latent strength parameters that the Bradley-Terry model learns to rank the teams, either by sorting the mybt.params_ DataFrame by the learned_strength column or by calling the rank_entities method.

In [17]:
mybt.rank_entities(ascending=False)

['Golden State Warriors',
 'San Antonio Spurs',
 'Houston Rockets',
 'Boston Celtics',
 'Los Angeles Clippers',
 'Utah Jazz',
 'Cleveland Cavaliers',
 'Toronto Raptors',
 'Washington Wizards',
 'Oklahoma City Thunder',
 'Memphis Grizzlies',
 'Atlanta Hawks',
 'Portland Trail Blazers',
 'Milwaukee Bucks',
 'Indiana Pacers',
 'Miami Heat',
 'Chicago Bulls',
 'Denver Nuggets',
 'Detroit Pistons',
 'Charlotte Hornets',
 'New Orleans Pelicans',
 'Dallas Mavericks',
 'Sacramento Kings',
 'Minnesota Timberwolves',
 'New York Knicks',
 'Orlando Magic',
 'Philadelphia 76ers',
 'Los Angeles Lakers',
 'Phoenix Suns',
 'Brooklyn Nets']
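
The sorting route can be sketched in plain Python using the ten strengths displayed in the params_ output. The remaining teams are not shown there, so they are left out. With the DataFrame itself, pandas' `sort_values` on the learned_strength column would presumably give the same order.

```python
# Learned strengths copied from the params_ output above (first ten teams only)
strengths = {
    "Atlanta Hawks": 0.047522,
    "Boston Celtics": 0.580896,
    "Brooklyn Nets": -1.178393,
    "Charlotte Hornets": -0.278154,
    "Chicago Bulls": -0.037967,
    "Cleveland Cavaliers": 0.489737,
    "Dallas Mavericks": -0.386261,
    "Denver Nuggets": -0.040408,
    "Detroit Pistons": -0.225709,
    "Golden State Warriors": 1.538386,
}

# Strongest team first, mirroring rank_entities(ascending=False)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

The relative order of these ten teams agrees with their order in the full ranking above.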

In [18]:
# We can compute the probability of each team winning in a specific observation
nice_print_results(
    mybt.predict_proba_task(NBA_results_task_predict_BT,
                            outcome=['Atlanta Hawks', 'Washington Wizards'])
)

Atlanta Hawks       [0.61 0.58 0.77 ... 0.42 0.   0.  ]
Washington Wizards  [0.   0.   0.   ... 0.58 0.45 0.73]


In [19]:
nice_print_results(
    mybt.predict_proba_task(NBA_results_task_predict_BT,
                            column=['team1', 'team2'])
)

team1 is preferred  [0.61 0.58 0.77 ... 0.58 0.45 0.73]
team2 is preferred  [0.39 0.42 0.23 ... 0.42 0.55 0.27]
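
These probabilities can be checked by hand against the strengths in params_. The first three test matches are Atlanta Hawks against Dallas Mavericks, Charlotte Hornets and Brooklyn Nets, so plugging the corresponding learned strengths into the Bradley-Terry formula should reproduce the first three entries above. The function name `bt_prob` is only for illustration.

```python
import math

def bt_prob(strength_1, strength_2):
    """Bradley-Terry probability that alternative 1 beats alternative 2."""
    e1, e2 = math.exp(strength_1), math.exp(strength_2)
    return e1 / (e1 + e2)

hawks = 0.047522  # learned strength of Atlanta Hawks from params_
opponents = {
    "Dallas Mavericks": -0.386261,
    "Charlotte Hornets": -0.278154,
    "Brooklyn Nets": -1.178393,
}

probs = [round(bt_prob(hawks, s), 2) for s in opponents.values()]
print(probs)  # [0.61, 0.58, 0.77]
```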


In [20]:
mybt.predict_choice_task(NBA_results_task_predict_BT)

array(['Atlanta Hawks', 'Atlanta Hawks', 'Atlanta Hawks', ...,
       'Washington Wizards', 'Boston Celtics', 'Washington Wizards'],
      dtype=object)

In [21]:
preds = mybt.predict_task(NBA_results_task_predict_BT)

In [22]:
preds.top_input_data, preds.boot_input_data

(array(['Atlanta Hawks', 'Atlanta Hawks', 'Atlanta Hawks', ...,
        'Washington Wizards', 'Boston Celtics', 'Washington Wizards'],
       dtype=object),
 array(['Dallas Mavericks', 'Charlotte Hornets', 'Brooklyn Nets', ...,
        'Atlanta Hawks', 'Washington Wizards', 'Orlando Magic'],
       dtype=object))

## Augmenting the models with covariates
In this section we add one more covariate to the models above: the team salary budget. We also show how to define a single task that can be used to run different models in skpref.

In [23]:
NBA_results_task_train = PairwiseComparisonTask(
    primary_table=train_data,
    primary_table_alternatives_names=['team1', 'team2'],
    primary_table_target_name ='team1_wins',
    target_column_correspondence='team1',
    features_to_use=['salary', 'team_1_home'],
    secondary_table=NBA_team_salary_budget,
    secondary_to_primary_link={
        'team': ['team1', 'team2'],
        'season_start': 'season_start'
    })

NBA_results_task_predict = PairwiseComparisonTask(
    primary_table=test_data,
    primary_table_alternatives_names=['team1', 'team2'],
    primary_table_target_name ='team1_wins',
    target_column_correspondence='team1',
    features_to_use=['salary', 'team_1_home'],
    secondary_table=NBA_team_salary_budget,
    secondary_to_primary_link={
        'team': ['team1', 'team2'],
        'season_start': 'season_start'
    })
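
Conceptually, the secondary_to_primary_link mapping describes a lookup of each team's salary for the season of the match, once for team1 and once for team2. The sketch below mimics that lookup in plain Python. It illustrates the idea and is not skpref's actual join logic. The match row is made up, and the salary figures are the 2014 values from the salary table shown earlier. The output names follow the salary_team1 / salary_team2 pattern that appears in the model input later in the notebook.

```python
# Salary lookup keyed by (team, season_start), values from the salary table above
salary = {
    ("Atlanta Hawks", 2014): 58337671,
    ("Boston Celtics", 2014): 59418142,
}

# Hypothetical match row, for illustration only
match = {"team1": "Atlanta Hawks", "team2": "Boston Celtics", "season_start": 2014}

# 'team' links to both alternative columns, 'season_start' links to itself
linked = dict(match)
for side in ("team1", "team2"):
    linked[f"salary_{side}"] = salary[(match[side], match["season_start"])]

print(linked["salary_team1"], linked["salary_team2"])
```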

# Reduction to logistic regression with covariates
Here we fit a logistic regression on three covariates: whether team1 is playing at home, team1's salary budget and team2's salary budget.
$P(\texttt{team1}\_\texttt{wins}=1) = \text{logit}^{-1}(\alpha + \beta_1 \texttt{team}\_\texttt{1}\_\texttt{home} + \beta_2 \texttt{team1}\_\texttt{salary} + \beta_3 \texttt{team2}\_\texttt{salary})$

In [24]:
my_log_red = ClassificationReducer(LogisticRegression(solver='lbfgs'))
my_log_red.fit_task(NBA_results_task_train)
preds = my_log_red.predict_task(NBA_results_task_predict)

In [25]:
# We can inspect the internal table that was passed to LogisticRegression; the
# rows shown here are the 2017 test matches from the last predict_task call
my_log_red.model_input.head(7)

Unnamed: 0,team1_wins,team_1_home,salary_team1,salary_team2
0,1,0,99375302,85753772
1,0,0,99375302,117228164
2,0,0,99375302,95964560
3,0,0,99375302,129458084
4,0,0,99375302,89524016
5,0,1,99375302,107015203
6,0,1,99375302,115375243


In [26]:
# We can also investigate the coefficients which were learned
my_log_red.model.coef_ 

array([[ 5.35210228e-15,  1.54775613e-08, -1.54775613e-08]])

We can see that the coefficients learned for $\beta_2$ and $\beta_3$ have the same magnitude and opposite signs. ClassificationReducer gives users the option to take the difference of the features directly rather than entering them separately, effectively learning the following model:
$P(\texttt{team1}\_\texttt{wins}=1) = \text{logit}^{-1}(\alpha + \beta_1 \texttt{team}\_\texttt{1}\_\texttt{home} + \beta_2 (\texttt{team1}\_\texttt{salary} - \texttt{team2}\_\texttt{salary}))$

In [27]:
my_log_red = ClassificationReducer(
    LogisticRegression(solver='lbfgs'),
    take_feature_diff_for_pairwise_comparison=True
)
my_log_red.fit_task(NBA_results_task_train)
preds = my_log_red.predict_task(NBA_results_task_predict)

In [28]:
my_log_red.model_input.head(7)

Unnamed: 0,team1_wins,team_1_home,salary_diff
0,1,0,13621530
1,0,0,-17852862
2,0,0,3410742
3,0,0,-30082782
4,0,0,9851286
5,0,1,-7639901
6,0,1,-15999941
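
The salary_diff column can be reproduced from the earlier model input table. With $\beta_3 = -\beta_2$, the two-feature model also gives the same linear term as the difference model. A quick check in plain Python, using the seven rows and the coefficients printed earlier:

```python
import math

# Values copied from the first seven rows of the earlier model input table
salary_team1 = 99375302
salary_team2 = [85753772, 117228164, 95964560, 129458084,
                89524016, 107015203, 115375243]

salary_diff = [salary_team1 - s2 for s2 in salary_team2]
print(salary_diff)

# Salary coefficients from the coef_ output of the model without differencing
beta_2, beta_3 = 1.54775613e-08, -1.54775613e-08
for s2, d in zip(salary_team2, salary_diff):
    split_term = beta_2 * salary_team1 + beta_3 * s2   # separate features
    diff_term = beta_2 * d                             # single difference feature
    assert math.isclose(split_term, diff_term, rel_tol=1e-9)
```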


In [29]:
my_log_red.model.coef_

array([[2.67602286e-15, 1.54775613e-08]])

In [30]:
preds.top_input_data, preds.boot_input_data

(array(['Atlanta Hawks', 'Charlotte Hornets', 'Atlanta Hawks', ...,
        'Washington Wizards', 'Washington Wizards', 'Washington Wizards'],
       dtype=object),
 array(['Dallas Mavericks', 'Atlanta Hawks', 'Brooklyn Nets', ...,
        'Atlanta Hawks', 'Boston Celtics', 'Orlando Magic'], dtype=object))

In [31]:
# Probability that team1 wins under the logistic regression fitted on the
# home indicator and the salary difference
nice_print_results(
    my_log_red.predict_proba_task(NBA_results_task_predict,
                                  column='team1')
)

team1 is preferred  [0.55 0.43 0.51 ... 0.59 0.53 0.61]


# Bradley-Terry model with salary covariate

Here we augment the initial Bradley-Terry model to learn the following relationship:

$$P(\texttt{team1}\_\texttt{wins}=1)_i= \frac{e^{(\lambda_{\texttt{team1}_i} + \beta_1 \texttt{team1}\_\texttt{salary}_i)}}{e^{(\lambda_{\texttt{team1}_i} + \beta_1 \texttt{team1}\_\texttt{salary}_i)} + e^{(\lambda_{\texttt{team2}_i}+ \beta_1 \texttt{team2}\_\texttt{salary}_i)}}$$
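
As a minimal sketch of this formula, the function below adds a salary term to each team's strength before applying the Bradley-Terry ratio. The strengths and the coefficient are hypothetical values chosen for illustration, since the notebook does not print the fitted ones, and `bt_prob_with_salary` is not a skpref function. The two salaries come from the first row of the model input table shown earlier.

```python
import math

def bt_prob_with_salary(lam_1, lam_2, salary_1, salary_2, beta_1):
    """Probability that team1 wins when each utility is lambda + beta_1 * salary."""
    u1 = lam_1 + beta_1 * salary_1
    u2 = lam_2 + beta_1 * salary_2
    # Same value as exp(u1) / (exp(u1) + exp(u2)), written in terms of u1 - u2
    return 1 / (1 + math.exp(u2 - u1))

# Hypothetical strengths and coefficient, for illustration only
p = bt_prob_with_salary(lam_1=0.1, lam_2=-0.3,
                        salary_1=99375302, salary_2=85753772,
                        beta_1=1e-8)
print(round(p, 3))
```

Setting beta_1 to zero recovers the covariate-free Bradley-Terry probability.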

In [32]:
mybt = BradleyTerry(method='BFGS', alpha=1e-5)
mybt.fit_task(NBA_results_task_train)
mybt.rank_entities(ascending=False)

array(['Golden State Warriors', 'San Antonio Spurs', 'Houston Rockets',
       'Utah Jazz', 'Boston Celtics', 'Oklahoma City Thunder',
       'Washington Wizards', 'Toronto Raptors', 'Los Angeles Clippers',
       'Denver Nuggets', 'Atlanta Hawks', 'Indiana Pacers',
       'Chicago Bulls', 'Cleveland Cavaliers', 'Memphis Grizzlies',
       'Miami Heat', 'Milwaukee Bucks', 'Charlotte Hornets',
       'Minnesota Timberwolves', 'Portland Trail Blazers',
       'New Orleans Pelicans', 'Sacramento Kings', 'Detroit Pistons',
       'Dallas Mavericks', 'Philadelphia 76ers', 'New York Knicks',
       'Phoenix Suns', 'Los Angeles Lakers', 'Orlando Magic',
       'Brooklyn Nets'], dtype=object)

In [33]:
nice_print_results(mybt.predict_proba_task(NBA_results_task_predict, column=['team1', 'team2']))

team1 is preferred  [0.69 0.48 0.75 ... 0.65 0.43 0.82]
team2 is preferred  [0.31 0.52 0.25 ... 0.35 0.57 0.18]


In [34]:
mybt.predict_choice_task(NBA_results_task_predict)

array(['Atlanta Hawks', 'Charlotte Hornets', 'Atlanta Hawks', ...,
       'Washington Wizards', 'Boston Celtics', 'Washington Wizards'],
      dtype=object)

In [35]:
mybt.predict_task(NBA_results_task_predict).top_input_data

array(['Atlanta Hawks', 'Charlotte Hornets', 'Atlanta Hawks', ...,
       'Washington Wizards', 'Boston Celtics', 'Washington Wizards'],
      dtype=object)

# Example using GridSearchCV()
The models fitted above also have hyperparameters, such as the optimisation method or the regularisation strength. To optimise the hyperparameter selection we can use GridSearchCV(), which tries out a series of hyperparameter combinations and runs a k-fold cross-validation on a scoring metric chosen by the user to check which combination performs best.

In [36]:
to_tune = {'alpha': [1, 2, 4], 'method': ['BFGS']}
gs_bt = GridSearchCV(BradleyTerry(), to_tune,  cv=3, scoring='neg_log_loss')
gs_bt.fit_task(NBA_results_task_train)
gs_bt.inspect_results()

The model with the best parameters was:
BradleyTerry(alpha=2, method='BFGS')
With a score of -0.6265008194657992
All the trials results summarised in descending score
   alpha method  mean_test_score
1      2   BFGS        -0.626501
0      1   BFGS        -0.626742
2      4   BFGS        -0.628853
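
To put the neg_log_loss scores in context, the log loss of a model that always predicts 0.5 is $\ln 2 \approx 0.693$. A score of about -0.6265 is therefore better than a coin flip, although not by a wide margin. The sketch below is a plain Python version of the binary log loss, written for illustration. The outcomes and probabilities in it are made up.

```python
import math

def log_loss(y_true, p_pred):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for y, p in zip(y_true, p_pred)]
    return sum(terms) / len(terms)

y = [1, 0, 1, 1]                          # made-up outcomes
coin_flip = log_loss(y, [0.5] * len(y))   # uninformative baseline, equals ln 2
informed = log_loss(y, [0.7, 0.4, 0.6, 0.8])
print(round(coin_flip, 4), round(informed, 4))
```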


In [37]:
# Showing that metric functions from sklearn.metrics also work
to_tune = {'alpha': [1, 2, 4], 'method': ['BFGS']}
gs_bt = GridSearchCV(BradleyTerry(), to_tune,  cv=3, scoring=f1_score)
gs_bt.fit_task(NBA_results_task_train)
gs_bt.inspect_results()

The model with the best parameters was:
BradleyTerry(alpha=4, method='BFGS')
With a score of 0.6337744652191032
All the trials results summarised in descending score
   alpha method  mean_test_score
2      4   BFGS         0.633774
1      2   BFGS         0.631136
0      1   BFGS         0.630085


In [38]:
to_tune = {'C': [0.5, 1, 2, 4, 8], 'solver': ['saga'], 'penalty': ['l1','l2'],
           'fit_intercept': [True, False]}
gs_lr = GridSearchCV(ClassificationReducer(LogisticRegression()), to_tune,
                     cv=3, scoring='neg_log_loss')
gs_lr.fit_task(NBA_results_task_train)
gs_lr.inspect_results()

The model with the best parameters was:
ClassificationReducer(model=LogisticRegression(C=1, solver='saga'))
With a score of -0.6864987580029042
All the trials results summarised in descending score
    model__C  model__fit_intercept model__penalty model__solver  \
5        1.0                  True             l2          saga   
18       8.0                 False             l1          saga   
0        0.5                  True             l1          saga   
13       4.0                  True             l2          saga   
4        1.0                  True             l1          saga   
16       8.0                  True             l1          saga   
2        0.5                 False             l1          saga   
10       2.0                 False             l1          saga   
7        1.0                 False             l2          saga   
14       4.0                 False             l1          saga   
12       4.0                  True             l1          saga  
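
For a sense of the size of this search, the grid above can be expanded by hand. The sketch below takes the Cartesian product over the alphabetically sorted parameter names, which is how scikit-learn's ParameterGrid orders its combinations. Whether skpref's GridSearchCV enumerates them in the same way is an assumption, although the resulting indices agree with the rows visible in the table above (for example rows 5 and 18).

```python
from itertools import product

to_tune = {'C': [0.5, 1, 2, 4, 8], 'solver': ['saga'], 'penalty': ['l1', 'l2'],
           'fit_intercept': [True, False]}

names = sorted(to_tune)  # ['C', 'fit_intercept', 'penalty', 'solver']
combos = [dict(zip(names, values))
          for values in product(*(to_tune[n] for n in names))]

print(len(combos))   # 20 combinations, each evaluated with 3-fold cross-validation
print(combos[5])     # C=1, fit_intercept=True, penalty='l2', solver='saga'
print(combos[18])    # C=8, fit_intercept=False, penalty='l1', solver='saga'
```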

In [39]:
gs_lr.predict_task(NBA_results_task_predict).top_input_data

array(['Atlanta Hawks', 'Charlotte Hornets', 'Atlanta Hawks', ...,
       'Washington Wizards', 'Washington Wizards', 'Washington Wizards'],
      dtype=object)

In [40]:
nice_print_results(gs_lr.predict_proba_task(NBA_results_task_predict, column='team1'))

team1 is preferred  [0.55 0.43 0.51 ... 0.59 0.53 0.61]


In [41]:
nice_print_results(gs_bt.predict_proba_task(NBA_results_task_predict, column='team1'))

team1 is preferred  [0.67 0.47 0.7  ... 0.64 0.45 0.79]


In [42]:
gs_bt.rank_entities(ascending=False)

array(['Golden State Warriors', 'San Antonio Spurs', 'Houston Rockets',
       'Utah Jazz', 'Boston Celtics', 'Oklahoma City Thunder',
       'Washington Wizards', 'Toronto Raptors', 'Los Angeles Clippers',
       'Denver Nuggets', 'Atlanta Hawks', 'Indiana Pacers',
       'Chicago Bulls', 'Cleveland Cavaliers', 'Memphis Grizzlies',
       'Miami Heat', 'Milwaukee Bucks', 'Charlotte Hornets',
       'Minnesota Timberwolves', 'Portland Trail Blazers',
       'Detroit Pistons', 'New Orleans Pelicans', 'Sacramento Kings',
       'Philadelphia 76ers', 'Dallas Mavericks', 'New York Knicks',
       'Phoenix Suns', 'Los Angeles Lakers', 'Orlando Magic',
       'Brooklyn Nets'], dtype=object)