# TrueSkill: A Bayesian Skill Rating System

- TrueSkill is a generalisation of the Elo system used in Chess
    - Can deal with any number of competing entities,
    - and infer individual skills from team results.

- Why do we need a skill rating system?
    - To make balanced matches,
    - to stimulate interest and competition,
    - and to serve as a qualification criterion for tournaments.

## Elo system

- Created by Arpad Elo in 1959.
- The probability that player 1 wins is given by the probability that their performance $p_1$ exceeds the opponent's performance $p_2$:
$$
P(p_1 > p_2 | s_1, s_2) = \Phi\left(\frac{s_1 - s_2}{\sqrt{2}\beta}\right)
$$
where $s_1$ and $s_2$ are the skill ratings of players 1 and 2, each performance is assumed to be Gaussian around the player's skill with fixed variance $\beta^2$, and $\Phi$ denotes the cumulative distribution function of a zero-mean, unit-variance Gaussian (the standard normal distribution).
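The win probability above can be evaluated with nothing more than the error function. This is a minimal sketch: the helper names `std_normal_cdf` and `win_probability` and the value `beta=200.0` are illustrative assumptions, not taken from the notes.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal, via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(s1: float, s2: float, beta: float) -> float:
    """P(p1 > p2 | s1, s2) when each performance is Gaussian around the skill
    with variance beta**2 (beta is a free parameter; its value here is assumed)."""
    return std_normal_cdf((s1 - s2) / (math.sqrt(2.0) * beta))


# Equal skills give an even match; a skill gap favours the stronger player.
print(win_probability(1500.0, 1500.0, beta=200.0))
print(win_probability(1700.0, 1500.0, beta=200.0))
```

Only the skill difference $s_1 - s_2$ matters, and a larger $\beta$ flattens the curve, so the same gap gives a less decisive prediction.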

- After the game, $s_1$ and $s_2$ are updated such that their sum $s_1 + s_2$ remains constant.
- To update the scores, let $y=+1$ if player 1 wins, $y=-1$ if player 1 loses, and $y=0$ if the game is a draw.
- The linearised Elo update is then given by
$$
s_1 \leftarrow s_1 + y\Delta, \qquad s_2 \leftarrow s_2 - y\Delta,
$$
with

$$
\Delta = \alpha \beta \sqrt{\pi}\left(\frac{y+1}{2} - \Phi\left(\frac{s_1 - s_2}{\sqrt{2}\beta}\right)\right),
$$
where $0 < \alpha < 1$ determines the weighting of the new evidence versus the old estimate.
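A minimal sketch of one update step follows. It rests on an assumption about how the update is read: since $\Delta$ already carries the sign of the outcome through the $(y+1)/2$ term (positive after a win, negative after a loss), the sketch adds $\Delta$ to $s_1$ and subtracts it from $s_2$ directly. Multiplying by $y$ a second time would raise player 1's rating after a loss and ignore draws. The function names and the default values `alpha=0.07` and `beta=200.0` are illustrative.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal, via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def elo_update(s1: float, s2: float, y: int,
               alpha: float = 0.07, beta: float = 200.0) -> tuple:
    """One linearised Elo step; y is +1 (player 1 wins), -1 (loses) or 0 (draw)."""
    expected = std_normal_cdf((s1 - s2) / (math.sqrt(2.0) * beta))
    # (y + 1) / 2 maps the outcome to a score of 1, 0.5 or 0 for player 1.
    delta = alpha * beta * math.sqrt(math.pi) * ((y + 1) / 2.0 - expected)
    # Assumption: delta is applied directly, so s1 + s2 stays constant.
    return s1 + delta, s2 - delta


# The underdog (1500) beats the favourite (1600) and gains rating.
print(elo_update(1500.0, 1600.0, y=+1))
```

The size of the step is proportional to the surprise: an expected result barely moves the ratings, while an upset moves them by up to $\alpha\beta\sqrt{\pi}$.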

- Some Elo variants use a [logistic distribution](https://en.wikipedia.org/wiki/Logistic_distribution) instead of a Gaussian because it is argued to provide a better fit for chess data.
- From a statistical point of view, the Elo system addresses the problem of estimating skill from paired-comparison data.
- A player's rating is considered provisional while it is based on only a small number of games (roughly 20-25), so new players need separate treatment.
    - The Glicko model addresses this problem by tracking the uncertainty of each rating.
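For comparison with the Gaussian form, the logistic variant mentioned above can be sketched as follows. The base-10, 400-point scale is the parameterisation conventionally used in chess ratings; it is an assumption here and does not come from these notes.

```python
def logistic_win_probability(s1: float, s2: float, scale: float = 400.0) -> float:
    """Expected score of player 1 under a logistic Elo model.

    The base 10 and the default scale of 400 follow the usual chess
    convention (assumed here, not taken from the notes).
    """
    return 1.0 / (1.0 + 10.0 ** ((s2 - s1) / scale))


# Equal ratings give an even match; the curve saturates for large gaps.
for gap in (0.0, 100.0, 200.0, 400.0):
    print(gap, logistic_win_probability(1500.0 + gap, 1500.0))
```

The logistic curve has heavier tails than the Gaussian one, so for large rating gaps it leaves the weaker player a somewhat higher chance of an upset.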

TrueSkill addresses two further challenges that arise in online matchmaking:
1. Game outcomes often refer to teams of players, yet a skill rating for each individual player is needed for future matchmaking.
2. More than two players or teams may compete, so the game outcome is a permutation of teams or players rather than just a winner and a loser.

## Section 2-3:
bunch of math

## Section 4: Experiments and Online Service
- To test the TrueSkill algorithm, the study used a data set containing thousands of Halo 2 match outcomes across four game types:
1. 8 player free-for-all
2. 4v4
3. 1v1 (head-to-head)
4. 8v8

- They compared the TrueSkill algorithm to Elo with a Gaussian performance distribution.
- For games with more than two teams, where plain Elo does not apply, they used the so-called duelling heuristic: for each player, compute a $\Delta$ against the other players based on the team outcome, then update the player's score with the average $\Delta$.
- According to the study, TrueSkill is significantly better at predicting match outcomes than the Elo system.
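The duelling heuristic can be sketched as below. The notes give only a one-line description, so several details are assumptions: duels are formed only against players on other teams, a lower rank means a better finish, equal ranks count as a draw, and each $\Delta$ is the Gaussian Elo step applied with its own sign. All names and default parameter values are illustrative.

```python
import math
from typing import Dict, List


def std_normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal, via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def elo_delta(s_a: float, s_b: float, y: int,
              alpha: float = 0.07, beta: float = 200.0) -> float:
    """Signed Gaussian Elo step for player a in a duel against player b."""
    expected = std_normal_cdf((s_a - s_b) / (math.sqrt(2.0) * beta))
    return alpha * beta * math.sqrt(math.pi) * ((y + 1) / 2.0 - expected)


def duelling_update(teams: List[List[str]], ranks: List[int],
                    skills: Dict[str, float]) -> Dict[str, float]:
    """Update every player with the average delta over all implied duels.

    teams[i] finished with rank ranks[i]; lower is better (assumption).
    """
    updated = dict(skills)
    for i, team in enumerate(teams):
        for player in team:
            deltas = []
            for j, other_team in enumerate(teams):
                if i == j:
                    continue  # assumption: no duels between teammates
                # Team outcome from this player's point of view.
                y = 1 if ranks[i] < ranks[j] else (-1 if ranks[i] > ranks[j] else 0)
                for opponent in other_team:
                    deltas.append(elo_delta(skills[player], skills[opponent], y))
            if deltas:
                updated[player] = skills[player] + sum(deltas) / len(deltas)
    return updated


skills = {name: 1500.0 for name in "abcd"}
print(duelling_update([["a", "b"], ["c"], ["d"]], ranks=[0, 1, 2], skills=skills))
```

All deltas are computed from the pre-game skills and applied at once, so the result does not depend on the order in which players are processed.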

- Match Quality
    - TrueSkill is significantly better for matchmaking in free-for-all and head-to-head games but falls short for small teams (possibly because of how that game type works in Halo 2).
- Win Probability
    - Win probability was computed from each player's winning ratio, restricted to players with a minimum number of games played. With TrueSkill, players with very few games mostly got fair matches.
- Convergence Properties
    - Evaluated using data from the two highest-rated players in free-for-all.
    - TrueSkill chooses the correct learning rate (judged in hindsight against the players' later match results), while Elo converges only slowly to the target skill curve.
    
   

```python
import trueskill
```

```python
from trueskill import Rating, quality_1vs1, rate_1vs1

alice, bob = Rating(25), Rating(30)  # assign Alice's and Bob's ratings
if quality_1vs1(alice, bob) < 0.50:
    print('This match seems to be not so fair')
# update the ratings after the match; the first argument is the winner
alice, bob = rate_1vs1(alice, bob)
```

```
This match seems to be not so fair
```

```python
Rating()  # a new player starts with the default rating
```

```
trueskill.Rating(mu=25.000, sigma=8.333)
```