# Player ratings based on the team results

In this notebook we predict individual player ratings from team results. We use a dataset from "What? Where? When?", a team quiz game popular in CIS countries.

In [1]:
# imports
import numpy as np
import pandas as pd
import math
import pickle

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from scipy.stats import kendalltau, spearmanr
from scipy.special import logit, expit

import pdb
import zipfile
import operator
from IPython.display import clear_output

### 1. Processing data 

In [2]:
# load the needed files
! wget https://www.dropbox.com/s/s4qj0fpsn378m2i/chgk.zip -nc

with zipfile.ZipFile('chgk.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

with open('tournaments.pkl', 'rb') as f:
    tournaments = pickle.load(f)
with open('results.pkl', 'rb') as f:
    results = pickle.load(f)
clear_output()

In [3]:
# get ids of the tournaments held in the given year that have results with answer masks
def get_tournaments_ids(year):
    cur_tournaments = [v for v in tournaments.values() if v['dateStart'][:4] == year]
    tournaments_with_results = [v for v in cur_tournaments if results.get(v['id'])]
    tournaments_with_results_mask = [v for v in tournaments_with_results if 'mask' in results[v['id']][0]]
    return [v['id'] for v in tournaments_with_results_mask]

train_tournaments_id = set(get_tournaments_ids('2019'))
test_tournaments_id = set(get_tournaments_ids('2020'))

In [4]:
# create a table for training and testing using the results file
def make_table(tournaments_ids):
    table = []
    question_id = 0

    for tourn_id, tourn_data in results.items():
        if tourn_id in tournaments_ids:
            # get number of questions in the tournament
            numb_questions = 9999
            for team in tourn_data:
                if 'mask' in team and team['mask'] != None:
                    if len(team['mask']) < numb_questions:
                        numb_questions = len(team['mask'])
            
            for team in tourn_data:
                if 'mask' in team and team['mask'] != None:
                    total_answers = 0
                    for answer in str(team['mask']):
                        if answer == '1' or answer == '0':
                            total_answers = total_answers + int(answer)

                    team_id = team['team']['id']
                    for member in team['teamMembers']:
                        member_id = member['player']['id']
                        for i in range(numb_questions):
                            answer = team['mask'][i]
                            if answer == '1' or answer == '0':
                                table.append([tourn_id, team_id, member_id, question_id+i, int(answer), total_answers])
            question_id = question_id + numb_questions           

    return pd.DataFrame(table, columns=['tourn_id', 'team_id', 'p_id', 'q_id', 'answer', 'total_answers'])

In [5]:
# get dataset for model training purposes
df_train = make_table(train_tournaments_id)

### 2. Baseline Model for Player Ratings

We train the baseline model as follows. The data is one-hot encoded: the columns are players and questions, and each row is one player-question pair. The target is whether the player's team answered the question. The coefficients of the fitted logistic regression then correspond to player skills (the higher a player's coefficient, the stronger the player) and question complexities (the higher a question's coefficient, the easier the question). We use these coefficients to approximately rate the players.
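A minimal sketch on made-up toy data (the ids and answers are hypothetical) of how the one-hot design turns into per-player and per-question coefficients. It relies on `OneHotEncoder` ordering its output columns feature by feature (all `p_id` categories first, then all `q_id` categories, each sorted), so the coefficient vector can be split using `categories_`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one row per (player, question) with the recorded answer as target
toy = pd.DataFrame({
    'p_id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'q_id': [10, 11, 12, 10, 11, 12, 10, 11, 12],
    'answer': [1, 1, 0, 1, 0, 0, 1, 1, 1],
})

enc = OneHotEncoder(handle_unknown='ignore')
X = enc.fit_transform(toy[['p_id', 'q_id']])
model = LogisticRegression().fit(X, toy['answer'])

# Columns are ordered as [players..., questions...], matching enc.categories_
n_players = len(enc.categories_[0])
coef = model.coef_[0]
player_skill = pd.Series(coef[:n_players], index=enc.categories_[0]).sort_values(ascending=False)
question_ease = pd.Series(coef[n_players:], index=enc.categories_[1])

print(player_skill)   # higher coefficient -> stronger player
print(question_ease)  # higher coefficient -> easier question
```

Player 3 answered every question and player 2 only one, so their coefficients end up at the top and bottom of the ranking; likewise question 10 (answered by everyone) gets the largest question coefficient.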

In [67]:
# use one hot encoder from scikit
one_hot_enc = OneHotEncoder(handle_unknown='ignore')
X_train = one_hot_enc.fit_transform(df_train[['p_id', 'q_id']])
y_train = np.array(df_train['answer'], dtype=np.int32)

# train logistic regression 
baseline_model = LogisticRegression(random_state=42)
baseline_model.fit(X_train, y_train)
clear_output()

### 3. Model Accuracy Metric

We evaluate the model by predicting each team's ranking within a tournament from the probability that at least one team member answers a question, and comparing it with the actual ranking.
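For a team $t$ and question $q$, the probability that at least one member answers is $1 - \prod_{i \in t}(1 - p_{iq})$. A small sketch with made-up per-player probabilities (teams `A`, `B`, `C` and their totals are hypothetical) showing that aggregation, then ranking the teams within one tournament and comparing with the true ranking via Spearman and Kendall correlations. `rank(method='first')` is used here as a compact alternative to the `np.arange` trick in the cells below.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau, spearmanr

# Hypothetical per-player probabilities of answering a question (one tournament, 3 teams)
players = pd.DataFrame({
    'team_id': ['A', 'A', 'B', 'B', 'C'],
    'answer_prob': [0.5, 0.4, 0.2, 0.1, 0.6],
})
# P(team answers) = 1 - prod(1 - p_i) over teammates
team_prob = players.groupby('team_id')['answer_prob'].agg(lambda p: 1 - np.prod(1 - p))

ratings = pd.DataFrame({
    'prob_score': team_prob,
    'total_answers': pd.Series({'A': 25, 'B': 12, 'C': 30}),
})
# Rank 1 = best team; method='first' breaks ties by order of appearance
ratings['true_rating'] = ratings['total_answers'].rank(ascending=False, method='first').astype(int)
ratings['pred_rating'] = ratings['prob_score'].rank(ascending=False, method='first').astype(int)

rho, _ = spearmanr(ratings['true_rating'], ratings['pred_rating'])
tau, _ = kendalltau(ratings['true_rating'], ratings['pred_rating'])
print(ratings)
print(rho, tau)  # 0.5 and 1/3: teams A and C are swapped
```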

In [10]:
# get dataset for model testing purposes
df_test = make_table(test_tournaments_id)

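# keep only the players present in the training set (.copy() avoids SettingWithCopyWarning below)
train_players = np.unique(df_train['p_id'])
df_test = df_test[df_test.p_id.isin(train_players)].copy()

# test questions are unseen, so replace their ids with a value the encoder has never seen;
# with handle_unknown='ignore' the question part of the one-hot row becomes all zeros,
# so predictions depend only on player skills and the intercept
df_test['q_id_copy'] = df_test['q_id']
df_test['q_id'] = -999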

In [11]:
# get the model's probability predictions for answering the question
X_test = one_hot_enc.transform(df_test[['p_id', 'q_id']])
y_pred_prob = baseline_model.predict_proba(X_test)[:, 1]

In [12]:
# calculate the predicted score based on probabilities of answering questions by the teammates
df_test['answer_prob'] = y_pred_prob
df_test['prob_score'] = df_test.groupby(['tourn_id', 'team_id', 'q_id_copy'])['answer_prob'].transform(lambda x: 1 - np.prod(1 - x))

# create dataframe for comparing ratings
df_ratings = df_test[['tourn_id', 'team_id', 'total_answers', 'prob_score']].drop_duplicates() 

# true rating is based on the number of questions team answered in the given tournament 
df_ratings = df_ratings.sort_values(by=['tourn_id', 'total_answers'], ascending=False) 
df_ratings['true_rating'] = df_ratings.groupby('tourn_id')['total_answers'].transform(lambda x: np.arange(1, len(x) + 1))

# predicted rating is based on the probability scores calculated previously
df_ratings = df_ratings.sort_values(by=['tourn_id', 'prob_score'], ascending=False) 
df_ratings['pred_rating'] = df_ratings.groupby('tourn_id')['prob_score'].transform(lambda x: np.arange(1, len(x) + 1))
df_ratings['pred_rating'] = df_ratings['pred_rating'].astype(np.int32)

In [13]:
# calculate the correlation to check model's accuracy
print(f"Spearman correlation: {df_ratings.groupby('tourn_id').apply(lambda x: spearmanr(x['true_rating'], x['pred_rating']).correlation).mean()}")
print(f"Kendall correlation: {df_ratings.groupby('tourn_id').apply(lambda x: kendalltau(x['true_rating'], x['pred_rating']).correlation).mean()}")

Spearman correlation: 0.7884643695008277
Kendall correlation: 0.6164033759517489


### 4. EM model

We now train an EM model. We introduce a latent variable $z_{iq}$ for each player-question pair, equal to 1 if player $i$ answered question $q$ correctly. If the team did not answer the question, $z_{iq} = 0$ for all of its members; if the team did answer it, $z_{iq} = 1$ for at least one teammate. <br>

At the E-step we compute the expectation of $z$ while keeping players' skills and question complexities fixed. At the M-step we train the logistic regression with $z$ as the target for each player. <br>

For a team $t$ that answered question $q$, the E-step uses $P(z_{iq}=1 \mid t \text{ answered } q) = \frac{\sigma(x_{iq})}{1 - \prod_{j\in t} (1 - \sigma(x_{jq}))}$,
where $z_{iq}$ indicates whether player $i$ answers question $q$, $\sigma(x_{iq})$ is the probability predicted by the logistic regression model, and $x_{iq}$ is the model input for player $i$ and question $q$. <br>

Reference: Sergei Nikolenko, A probabilistic Rating System for Team Competitions with Individual Contributions
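To make the two steps concrete, a minimal sketch on made-up numbers. The E-step applies the formula above to one team of three players on two questions. The M-step shown here is one possible way (an assumption, not the notebook's approach, which uses hand-written gradient descent) to fit scikit-learn's `LogisticRegression` to soft targets $z$: each row is duplicated with labels 1 and 0 and weighted by $z$ and $1 - z$ through `sample_weight`, which is equivalent to cross-entropy with soft labels.

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

# Hypothetical logits x_iq for 3 teammates (rows) on 2 questions (columns)
logits = np.array([[0.0, -1.0],
                   [1.0, -2.0],
                   [-1.0, 0.5]])
team_answered = np.array([1, 0])  # the team took question 0 and missed question 1

# E-step: P(z_iq = 1 | team answered q) = sigma(x_iq) / (1 - prod_j (1 - sigma(x_jq)))
p = expit(logits)
team_p = 1 - np.prod(1 - p, axis=0)  # probability that someone in the team answers q
z = np.where(team_answered == 1, p / team_p, 0.0)
print(z.round(3))

# M-step sketch: one-hot rows [player indicators | question indicators], row order = z.ravel()
n_players, n_questions = logits.shape
X = np.hstack([np.repeat(np.eye(n_players), n_questions, axis=0),
               np.tile(np.eye(n_questions), (n_players, 1))])
z_flat = z.ravel()

# Duplicate rows: label 1 weighted by z, label 0 weighted by 1 - z
X_dup = np.vstack([X, X])
y_dup = np.concatenate([np.ones(len(z_flat)), np.zeros(len(z_flat))])
w_dup = np.concatenate([z_flat, 1 - z_flat])
m_step = LogisticRegression().fit(X_dup, y_dup, sample_weight=w_dup)
print(m_step.coef_)
```

Because $\sigma(x_{iq}) \le 1 - \prod_j (1 - \sigma(x_{jq}))$, every $z_{iq}$ stays in $[0, 1]$, and $z$ is zero for the question the team missed.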

In [None]:
# map each player id to its column index in the one-hot matrix
labels = one_hot_enc.categories_[0]
one_hot_dict = {label: i for i, label in enumerate(labels)}

In [None]:
class EM:
    """EM training of player skills and question complexities from team results.

    X: sparse one-hot matrix (players + questions), one row per (player, question);
    Y: team result for that question (0/1), repeated for every team member;
    data: DataFrame aligned row by row with X, containing 'team_id' and 'q_id'.
    """
    def __init__(self, learning_rate=10, iterations=10, m_iterations=1):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.m_iterations = m_iterations  # gradient steps per M-step

    def _answer_prob(self):
        # sigma(x_iq): probability that player i answers question q
        return expit(self.X.dot(self.W) + self.b)

    # E-step: calculate z-values fixing skill and complexity
    def _E_step(self):
        q_prob = self._answer_prob()
        # team probability 1 - prod_j (1 - sigma(x_jq)), accumulated in log space per (team, question)
        log_miss = np.log1p(-np.clip(q_prob, 0, 1 - 1e-12))
        team_log_miss = np.bincount(self.group_idx, weights=log_miss)[self.group_idx]
        t_prob = 1 - np.exp(team_log_miss)
        # P(z_iq = 1 | team answered q); z_iq = 0 when the team missed the question
        z = q_prob / np.maximum(t_prob, 1e-12)
        z[self.Y == 0] = 0
        self.Z = np.clip(z, 0, 1)

    # M-step: train log regression with z-values as targets
    def _M_step(self):
        for _ in range(self.m_iterations):
            self._update_weights()
        return self

    def _update_weights(self):
        A = self._answer_prob()
        # gradient of the cross-entropy with soft targets Z
        tmp = A - self.Z
        dW = self.X.T.dot(tmp) / self.m  # works for sparse and dense X
        db = np.sum(tmp) / self.m
        # update weights
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self

    # Function for model training
    def fit(self, X, Y, data, one_hot_dict=None):
        # no_of_training_examples, no_of_features
        self.m, self.n = X.shape
        # weight initialization
        self.W = np.zeros(self.n)
        self.b = 0.0
        self.X = X
        self.Y = np.asarray(Y)
        self.Z = self.Y.astype(float)
        self.one_hot_dict = one_hot_dict
        # index of the (team, question) group of every row, used to combine teammates
        self.group_idx = data.groupby(['team_id', 'q_id']).ngroup().to_numpy()

        # alternate E- and M-steps
        for _ in range(self.iterations):
            self._E_step()
            self._M_step()

        return self

    def predict_prob(self):
        return self._answer_prob()

    def predict(self):
        return np.where(self.predict_prob() > 0.5, 1, 0)

In [None]:
em_model = EM(iterations=1)
em_model.fit(X_train, y_train, df_train, one_hot_dict)

Unfortunately, there was not enough RAM to check whether this implementation works.

### 5. Tournament ratings based on question complexity

We measure a tournament's difficulty by the average complexity of its questions. The baseline model provides a coefficient for each question (the higher it is, the easier the question), so averaging these coefficients over a tournament's questions approximates its overall difficulty: the lowest averages correspond to the toughest tournaments.
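A toy sketch of this aggregation with hypothetical coefficients and tournament ids: average the question coefficients per tournament and sort ascending, so the toughest tournament (lowest average coefficient, i.e. hardest questions) comes first.

```python
import pandas as pd

# Hypothetical question coefficients from a fitted model (higher = easier)
q_coef = pd.Series({0: -1.2, 1: -0.4, 2: 0.9, 3: 1.5, 4: -2.0})
# Hypothetical mapping of questions to tournaments
q_tourn = pd.Series({0: 'T1', 1: 'T1', 2: 'T2', 3: 'T2', 4: 'T3'})

difficulty = (pd.DataFrame({'tourn_id': q_tourn, 'coef': q_coef})
              .groupby('tourn_id')['coef'].mean()
              .sort_values())  # ascending: toughest tournament first
print(difficulty)
```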

In [43]:
# import tournament names for displaying results
df_tournaments = pd.DataFrame(pd.read_pickle('tournaments.pkl')).T
df_tournaments['year'] = pd.to_datetime(df_tournaments['dateStart'], utc=True).dt.year
df_tournaments = df_tournaments[df_tournaments['year'] == 2019]
tourn_dict = dict(zip(df_tournaments['id'], df_tournaments['name']))

In [59]:
# find the average question complexity of each tournament and sort the results (toughest first)
# question coefficients follow the player coefficients, in the order of one_hot_enc.categories_[1]
questions = one_hot_enc.categories_[1]
q_ratings = dict(zip(questions, baseline_model.coef_[0][-len(questions):]))
df_train['q_diffic'] = df_train['q_id'].map(q_ratings)
t_ratings = df_train[['tourn_id', 'q_id', 'q_diffic']].drop_duplicates()
t_ratings = t_ratings.groupby('tourn_id')['q_diffic'].mean().sort_values()
ratings_list = t_ratings.index

In [66]:
top_n = 20
# print the top_n toughest tournaments (lowest average question coefficient)
print(f"Toughest {top_n} tournaments:\n")
for t_id in ratings_list[:top_n]: print(tourn_dict[t_id])

print(f"\n\n\nEasiest {top_n} tournaments:\n")
# print the top_n easiest tournaments (highest average question coefficient)
for t_id in ratings_list[::-1][:top_n]: print(tourn_dict[t_id])

Toughest 20 tournaments:

Чемпионат Санкт-Петербурга. Первая лига
Угрюмый Ёрш
Синхрон высшей лиги Москвы
Первенство правого полушария
Воображаемый музей
Записки охотника
Знание – Сила VI
Ускользающая сова
Кубок городов
Чемпионат Мира. Этап 2. Группа В
Чемпионат Минска. Лига А. Тур четвёртый
VERSUS: Коробейников vs. Матвеев
All Cats Are Beautiful
Антибинго
Чемпионат Мира. Этап 2 Группа С
Чемпионат России
Львов зимой. Адвокат
Чемпионат Мира. Этап 3. Группа В
Кубок Москвы
Чемпионат Мира. Этап 1. Группа С



Easiest 20 tournaments:

(а)Синхрон-lite. Лига старта. Эпизод V
Синхрон Лиги Разума
(а)Синхрон-lite. Лига старта. Эпизод III
(а)Синхрон-lite. Лига старта. Эпизод IX
(а)Синхрон-lite. Лига старта. Эпизод VI
(а)Синхрон-lite. Лига старта. Эпизод VII
Второй тематический турнир имени Джоуи Триббиани
(а)Синхрон-lite. Лига старта. Эпизод X
(а)Синхрон-lite. Лига старта. Эпизод IV
Синхрон-lite. Выпуск XXX
Joystick Cup
Синхрон-lite. Выпуск XXIX
Лига Сибири. VI тур.
Лига Сибири. IV тур.
Лига вузов

### 6. Players with few games but high ratings (bonus)

Players who have played only a few games can get extreme ratings that are not reliable. To keep such outliers from distorting the model results, we can set a threshold after which a player's answers start counting towards the rating system. For example, we could start accounting for a player's contribution only after 100 answered questions. <br>
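A minimal sketch of applying such a threshold when reporting the ratings. `rated_players` is a hypothetical helper: it counts the distinct questions each player appears in (one reading of "answered questions") and keeps only players at or above `min_questions`, with 100 taken from the example above. The toy data is made up.

```python
import pandas as pd

def rated_players(df, skills, min_questions=100):
    """Return skills only for players who played at least `min_questions` questions.

    df: one row per (player, question), like df_train; skills: Series of coefficients indexed by p_id.
    """
    n_questions = df.groupby('p_id')['q_id'].nunique()
    eligible = n_questions[n_questions >= min_questions].index
    return skills[skills.index.isin(eligible)].sort_values(ascending=False)

# Hypothetical data: player 1 played 150 questions, player 2 only 5
toy = pd.DataFrame({'p_id': [1] * 150 + [2] * 5,
                    'q_id': list(range(150)) + list(range(5))})
skills = pd.Series({1: 0.8, 2: 3.0})
result = rated_players(toy, skills)
print(result)  # player 2 is excluded despite the higher coefficient
```

Filtering at reporting time leaves training unchanged; dropping low-count players from the training table before fitting would be the stricter variant.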

Another option is to add the number of answered questions to the logistic regression model as one more column that tracks how many questions each player has answered. The model then learns a coefficient for this column, which we can use to adjust the rating score.
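A sketch of appending such a column to the sparse one-hot matrix with `scipy.sparse.hstack`. It assumes the count is the number of questions a player has played (rows in the table) and log-scales it with `np.log1p`; both choices, and the toy data, are assumptions rather than part of the original setup.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one row per (player, question)
toy = pd.DataFrame({
    'p_id': [1, 1, 1, 1, 2, 2, 3],
    'q_id': [10, 11, 12, 13, 10, 11, 12],
    'answer': [1, 1, 0, 1, 1, 0, 1],
})
# Experience feature: how many questions each player has played
toy['n_played'] = toy.groupby('p_id')['q_id'].transform('count')

enc = OneHotEncoder(handle_unknown='ignore')
X_onehot = enc.fit_transform(toy[['p_id', 'q_id']])
experience = sp.csr_matrix(np.log1p(toy[['n_played']].to_numpy()))
X = sp.hstack([X_onehot, experience]).tocsr()

model = LogisticRegression().fit(X, toy['answer'])
experience_coef = model.coef_[0][-1]  # weight of the extra column
print(experience_coef)
```

Because the count is constant for each player, it is collinear with that player's indicator column, so only the L2 regularization decides how the effect is split between them; its coefficient is best read as a shrinkage-style adjustment rather than a cleanly identified effect.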

### 7. Accounting for players' skill improvements over time (bonus)

Our goal is to give less weight to past results and more to recent ones. One way to do this is to randomly drop out past results: games played long ago get a high dropout probability, while recent games get a low one. This way the model accounts more for recently played games.
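A minimal sketch of time-based dropout, assuming each row carries a date (in this notebook it would have to be joined from `tournaments[tourn_id]['dateStart']`). The exponential half-life schedule and the helper `time_dropout` are hypothetical choices; the text only asks for a dropout probability that grows with age.

```python
import numpy as np
import pandas as pd

def time_dropout(df, date_col, half_life_days=180.0, seed=0):
    """Randomly drop old rows: a row of age a days is kept with probability 0.5 ** (a / half_life_days)."""
    rng = np.random.default_rng(seed)
    dates = pd.to_datetime(df[date_col])
    age_days = (dates.max() - dates).dt.days.to_numpy()
    keep_prob = 0.5 ** (age_days / half_life_days)
    return df[rng.random(len(df)) < keep_prob]

# Hypothetical data: one row per day of 2019
toy = pd.DataFrame({'date': pd.date_range('2019-01-01', '2019-12-31', freq='D')})
sample = time_dropout(toy, 'date', half_life_days=90)
print(len(toy), len(sample))  # the most recent rows are always kept
```

A deterministic alternative is to keep every row and pass the same decay values as `sample_weight` to `LogisticRegression.fit`.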