**Note:** results are listed below the methods.

## Pre-match Predictions

Before play starts, an in-match prediction model cannot draw on information from the match itself. Before a match between players $i$ and $j$ begins, the model should therefore use the best-informed pre-match forecast $\hat{\pi}_{ij}(t)$ as the starting point for its predictions. For this reason, we first explore pre-match models.

Earlier this year, Kovalchik released a survey of eleven different pre-match prediction models, assessing them side by side on accuracy, log-loss, calibration, and discrimination. 538's Elo-based model and the Bookmaker Consensus Model (BCM) performed the best. Elo-based prediction incorporates the entire match histories of players $i$ and $j$, while the BCM incorporates all information encoded in the betting market. However, the paper leaves out a point-based method devised by Klaassen and Magnus that derives serving probabilities from historical player data (combining player outcomes).


## Elo Rating System

Elo was originally developed as a head-to-head rating system for chess players (1978). Recently, 538's Elo variant has gained prominence in the media. For match $t$ between $p_i$ and $p_j$ with Elo ratings $E_i(t)$ and $E_j(t)$, $p_i$ is forecast to win with probability:

$\hat{\pi}_{ij}(t) = (1 + 10^{\frac{E_j(t)-E_i(t)}{400}})^{-1}$
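
As a minimal sketch of this forecast, the logistic Elo curve can be written as a one-line function. The function name `elo_win_prob` is illustrative and not part of the original notebook; the expression matches the one used in the notebook's own log-loss cells.

```python
def elo_win_prob(elo_i, elo_j):
    """Forecast probability that player i beats player j, given Elo ratings."""
    return 1.0 / (1.0 + 10.0 ** ((elo_j - elo_i) / 400.0))


# Equal ratings give an even match; a 200-point edge gives roughly a 76% forecast.
print(elo_win_prob(1500, 1500))
print(round(elo_win_prob(1700, 1500), 4))
```

The two forecasts for a pairing are complementary, so `elo_win_prob(a, b) + elo_win_prob(b, a)` equals 1 up to floating-point error.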


$p_i$'s rating for the following match $t+1$ is then updated accordingly:

$E_i(t+1) = E_i(t) + K_{it}*(W_i(t)-\hat{\pi}_{ij}(t))$

$W_i(t)$ is an indicator for whether $p_i$ won the given match, while $K_{it}$ is the learning rate for $p_i$ at time $t$. According to 538's analysts, Elo ratings perform optimally when $K_{it}$ is allowed to decay slowly over time. With $m_i(t)$ representing the number of career matches $p_i$ has played at time $t$, we update the learning rate:

$K_{it} = 250/(5+m_i(t))^{.4}$

This variant updates a player's Elo rating most quickly when we have no information about the player and makes smaller changes as $m_i(t)$ accumulates. To apply this Elo rating method to our dataset, we initialize each player's Elo rating at $E_i(0)=1500$ and match history at $m_i(0)=0$. Then, we iterate through all tour-level matches from 1968 to 2017 in chronological order, storing $E_i(t),E_j(t)$ for each match and updating each player's rating accordingly.
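
The iteration described above can be sketched as follows. This is an illustrative sketch, not the notebook's actual implementation: the `(winner, loser)` input format and the names `k_factor` and `run_elo` are assumptions, and the sketch assumes that $m_i(t)$ counts matches played before the current one. It uses the standard Elo convention in which a win raises the winner's rating.

```python
from collections import defaultdict


def k_factor(matches_played):
    """Decaying learning rate: 250 / (5 + m)^0.4."""
    return 250.0 / (5.0 + matches_played) ** 0.4


def run_elo(matches, start=1500.0):
    """Process (winner, loser) pairs in chronological order.

    Returns the pre-match ratings stored for each match and the final ratings.
    """
    elo = defaultdict(lambda: start)   # every player starts at 1500
    played = defaultdict(int)          # career matches played so far
    history = []
    for winner, loser in matches:
        e_w, e_l = elo[winner], elo[loser]
        history.append((e_w, e_l))     # ratings going into the match
        p_w = 1.0 / (1.0 + 10.0 ** ((e_l - e_w) / 400.0))
        # The winner gains K * (1 - p_w); the loser's W is 0 and forecast is 1 - p_w.
        elo[winner] = e_w + k_factor(played[winner]) * (1.0 - p_w)
        elo[loser] = e_l + k_factor(played[loser]) * (0.0 - (1.0 - p_w))
        played[winner] += 1
        played[loser] += 1
    return history, dict(elo)


# Toy input: player "A" beats player "B" twice.
history, final = run_elo([("A", "B"), ("A", "B")])
print(history)
print(final)
```

Because each player has their own $K_{it}$, the update is not zero-sum: a newcomer's rating can move much further than that of an established opponent in the same match.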

## Point-based Model

The hierarchical Markov Model offers an analytical solution to the win probability $\hat{\pi}_{ij}(t)$ between players $p_i$ and $p_j$, given serving probabilities $f_{ij}, f_{ji}$. Klaassen and Magnus outline a way to estimate each player's serving probability from historical serve and return data.

$f_{ij} = f_t + (f_i-f_{av})-(g_j-g_{av})$

$f_{ji} = f_t + (f_j-f_{av})-(g_i-g_{av})$

Each player's serve percentage is a function of their own serving ability and their opponent's returning ability. $f_t$ denotes the average serve percentage for the match's given tournament, while $f_i,f_j$ and $g_i,g_j$ represent player $i$ and $j$'s percentage of points won on serve and return, respectively. $f_{av},g_{av}$ are the tour-level averages in serve and return percentage; since all points are won by either server or returner, $f_{av} =1-g_{av}$.

As per Klaassen and Magnus' implementation, we use the previous year's tournament serving statistics to calculate $f_t$ for a given tournament and year, where $(w,y)$ represents the set of all matches played at tournament $w$ in year $y$.

$f_t(w,y) = \frac{\sum_{k \in (w,y-1)}{\text{\# of points won on serve in match } k}}{\sum_{k \in (w,y-1)}\text{\# of points played in match } k}$
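
A minimal sketch of this calculation, assuming each match is available as a dictionary of point totals. The keys `tny_name`, `serve_pts_won` and `serve_pts` are hypothetical, as is the function name, and the toy numbers below are invented purely for illustration.

```python
def tournament_serve_pct(matches, tournament, year):
    """Share of points won on serve at `tournament` in the previous year."""
    won = 0
    played = 0
    for m in matches:
        if m["tny_name"] == tournament and m["match_year"] == year - 1:
            won += m["serve_pts_won"]
            played += m["serve_pts"]
    # No matches from the previous year: no estimate is available.
    return won / float(played) if played else None


# Invented toy data, not real tournament statistics.
toy_matches = [
    {"tny_name": "Australian Open", "match_year": 2012, "serve_pts_won": 120, "serve_pts": 200},
    {"tny_name": "Australian Open", "match_year": 2012, "serve_pts_won": 130, "serve_pts": 200},
    {"tny_name": "Wimbledon", "match_year": 2012, "serve_pts_won": 150, "serve_pts": 200},
]
f_t = tournament_serve_pct(toy_matches, "Australian Open", 2013)
print(f_t)
```

Returning `None` when a tournament has no prior-year data leaves the fallback choice (for example, a tour-level average) to the caller; the source does not say how that case is handled.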

With our tour-level match dataset, we can keep a year-long tally of serve/return statistics for each player at any point in time (for more details, see the LaTeX file). Below, we combine player statistics over the past 12 months to produce $f_{ij},f_{ji}$ for Kevin Anderson and Fernando Verdasco's 3rd round match at the 2013 Australian Open.

From 2012 Australian Open statistics, $f_t=.6153$. From tour-level data spanning 2010-2017, $f_{av} = .6468; g_{av} = 1-f_{av} =.3532$. Using the above serve/return statistics from 02/12-01/13, we can calculate:

$f_{ij} = f_t + (f_i-f_{av})-(g_j-g_{av}) = .6153 + (.6799-.6468) - (.3795-.3532) = .6221$

$f_{ji} = f_t + (f_j-f_{av})-(g_i-g_{av}) = .6153 + (.6461-.6468) - (.3478-.3532) = .6199$
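
The arithmetic in this worked example can be checked with a few lines of Python. This is a sketch only: the function name and argument order are illustrative, and the inputs are the rounded figures quoted in the text. With these rounded inputs the second value comes out at about .6200 rather than .6199; the gap is presumably due to rounding in the quoted statistics.

```python
def serve_probs(f_i, g_i, f_j, g_j, f_t, f_av):
    """Combine serve (f) and return (g) percentages into per-match serve probabilities."""
    g_av = 1.0 - f_av  # every point is won by either the server or the returner
    f_ij = f_t + (f_i - f_av) - (g_j - g_av)
    f_ji = f_t + (f_j - f_av) - (g_i - g_av)
    return f_ij, f_ji


# Player i = Anderson, player j = Verdasco, using the rounded values from the text.
f_ij, f_ji = serve_probs(f_i=.6799, g_i=.3478, f_j=.6461, g_j=.3795,
                         f_t=.6153, f_av=.6468)
print(round(f_ij, 4), round(f_ji, 4))
```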

With the above serving percentages, Kevin Anderson is favored to win the best-of-five match with probability $M_p(0,0,0,0,0,0) = .5139$.
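
To illustrate how serving probabilities turn into a match forecast, here is a compact sketch of a hierarchical Markov calculation (points to games to sets to match). It is not the notebook's own implementation. It assumes points are independent, that every set is decided by a tiebreak at 6-6, and that player $i$ serves first in each set; all function names are illustrative. The result may therefore differ slightly from the figure quoted above.

```python
from functools import lru_cache
from math import comb


def game_prob(p):
    """P(server wins a game) when the server wins each point with probability p."""
    q = 1.0 - p
    deuce = p * p / (p * p + q * q)  # P(server wins from deuce)
    return p ** 4 * (1 + 4 * q + 10 * q * q) + 20 * p ** 3 * q ** 3 * deuce


def tiebreak_prob(p_a, p_b):
    """P(A wins a first-to-7 tiebreak, win by 2), with A serving the first point."""
    @lru_cache(maxsize=None)
    def tb(a, b):
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        if a == 6 and b == 6:
            # From 6-6 the next two points are served once by each player.
            x, y = p_a * (1 - p_b), (1 - p_a) * p_b
            return x / (x + y)
        a_serves = ((a + b + 1) // 2) % 2 == 0  # serving order A, B, B, A, A, ...
        p = p_a if a_serves else 1.0 - p_b
        return p * tb(a + 1, b) + (1 - p) * tb(a, b + 1)
    return tb(0, 0)


def set_prob(p_a, p_b):
    """P(A wins a tiebreak set), with A serving the first game."""
    g_a, g_b = game_prob(p_a), game_prob(p_b)

    @lru_cache(maxsize=None)
    def st(a, b):
        if a == 7 or (a == 6 and b <= 4):
            return 1.0
        if b == 7 or (b == 6 and a <= 4):
            return 0.0
        if a == 6 and b == 6:
            return tiebreak_prob(p_a, p_b)
        p = g_a if (a + b) % 2 == 0 else 1.0 - g_b  # servers alternate by game
        return p * st(a + 1, b) + (1 - p) * st(a, b + 1)
    return st(0, 0)


def match_prob(p_a, p_b, best_of=5):
    """P(A wins the match) from the start, i.e. at score (0,0,0,0,0,0)."""
    s = set_prob(p_a, p_b)
    need = best_of // 2 + 1
    return sum(comb(need - 1 + k, k) * s ** need * (1 - s) ** k for k in range(need))


print(match_prob(.6221, .6199))
```

With the serving probabilities from the worked example, the sketch gives Anderson a probability slightly above one half, since his serving edge is small. Inputs should lie strictly between 0 and 1 so that the deuce and 6-6 closed forms are well defined.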

## Results

From experimentation, we found that giving 30% weight to surface-specific Elo ratings and serve/return statistics was optimal for minimizing cross entropy in both the Elo and point-based methods. Elo ratings still far outperform point-based models in accuracy (70% vs 65%), yet the point-based models do improve significantly when serve/return statistics are normalized with James-Stein estimators (cross entropy decreased from .649 to .616). Still, standard Elo ratings are superior, with a cross entropy of .59. While we have yet to test a point-based model with adjusted serve/return percentages, Elo ratings still appear to provide one of the most reliable pre-match forecasts, short of betting odds. This is consistent with the findings in Kovalchik's "Searching for the GOAT of tennis win prediction" (2017). It also suggests that an effective in-match prediction model must incorporate players' Elo ratings. While this is easy to plug into logistic regression or neural nets, we must find a suitable way to incorporate this information into our point-based hierarchical Markov Model.
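
One way such a weight could be chosen is a simple grid search over the blending weight, scored by cross entropy. The sketch below is an assumption about the procedure, not a record of it: it assumes the surface-specific and overall rating differences are combined linearly, and the names `blend`, `cross_entropy` and `best_weight` are illustrative.

```python
import math


def blend(overall, surface, w=0.3):
    """Linear blend giving weight w to the surface-specific value."""
    return (1.0 - w) * overall + w * surface


def cross_entropy(outcomes, probs, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes under forecast probs."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)


def best_weight(outcomes, overall_diffs, surface_diffs, grid):
    """Return the weight in `grid` whose blended Elo forecasts minimize cross entropy."""
    def loss(w):
        probs = [1.0 / (1.0 + 10.0 ** (-blend(d, s, w) / 400.0))
                 for d, s in zip(overall_diffs, surface_diffs)]
        return cross_entropy(outcomes, probs)
    return min(grid, key=loss)


# Blending a 100-point overall edge with a 200-point surface edge at w = 0.3.
print(blend(100.0, 200.0))
```

In practice the weight would be selected on held-out matches rather than on the matches used for evaluation, to avoid an optimistic estimate.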

In [1]:
from helper_functions import validate_results
import pandas as pd
import numpy as np
from sklearn.metrics import log_loss,accuracy_score

# can test this on our subset of 10,000 matches as well as all matches in the database:
df = pd.read_csv('../my_data/elo_pbp_with_surface_11_26_dynamic_rating_tny_level_wrong.csv')
del df['Unnamed: 0']

# currently looking at 2017 tour-level matches, excluding Davis Cup
df = df[df['match_year']==2017].reset_index(drop=True)

In [2]:
# For each rating variant, print the accuracy (share of matches won by the
# higher-rated player) and the log loss of the implied win probabilities.
# The unlabeled lines are log losses: overall rating first, then surface rating.
print 'elo baseline: ',  sum((df['elo_diff']>0) == df['winner'])/float(len(df))
print log_loss(df['winner'],[(1+10**(diff/-400.))**-1 for diff in df['elo_diff']])
print log_loss(df['winner'],[(1+10**(diff/-400.))**-1 for diff in df['sf_elo_diff']])
print 'surface elo baseline: ', sum((df['sf_elo_diff']>0) == df['winner'])/float(len(df))
# Same four numbers for the 538-style ratings
print 'elo 538 baseline: ',  sum((df['elo_diff_538']>0) == df['winner'])/float(len(df))
print log_loss(df['winner'],[(1+10**(diff/-400.))**-1 for diff in df['elo_diff_538']])
print log_loss(df['winner'],[(1+10**(diff/-400.))**-1 for diff in df['sf_elo_diff_538']])
print 'surface elo 538 baseline: ', sum((df['sf_elo_diff_538']>0) == df['winner'])/float(len(df))

elo baseline:  0.718614718615
0.5549066549627972
0.5494030388214018
surface elo baseline:  0.735930735931
elo 538 baseline:  0.718614718615
0.5738021551287015
0.5771998135696892
surface elo 538 baseline:  0.744588744589


In [55]:
cols = [['elo_diff_538'],['elo_diff','sf_elo_diff'],['elo_diff_538','sf_elo_diff_538'],\
        ['elo_diff','sf_elo_diff','match_z_kls'],\
        ['elo_diff_538','sf_elo_diff_538','match_z_kls']]
probs = ['match_prob_wgt_0.3',u'match_prob_kls',u'match_prob_kls_JS', u'match_prob_sf_kls',\
          u'match_prob_sf_kls_JS']


n_splits = 5
validate_results(df,probs=probs,lm_columns=cols,n_splits=n_splits)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
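
An error like this usually means that at least one feature or probability column contains missing or infinite values. The sketch below shows one way to locate the affected columns before calling a validation routine. It is a diagnostic aid under stated assumptions, not a fix for `validate_results`, whose internals are not shown here; the helper name `report_non_finite` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def report_non_finite(df, columns):
    """Count NaN/inf entries per column; only columns with problems are returned."""
    values = df[columns].apply(pd.to_numeric, errors="coerce").astype(float)
    bad = ~np.isfinite(values.to_numpy())
    counts = pd.Series(bad.sum(axis=0), index=columns)
    return counts[counts > 0]


# Toy frame with invented values; the real check would pass the flattened
# feature and probability column names used in the cell above.
toy = pd.DataFrame({
    "elo_diff": [10.0, np.nan, -5.0],
    "match_z_kls": [0.1, np.inf, 0.2],
    "winner": [1, 0, 1],
})
counts = report_non_finite(toy, ["elo_diff", "match_z_kls", "winner"])
print(counts)
```

Whether the offending rows should be dropped or imputed depends on why the values are missing, for example players with no serve/return history in the past 12 months.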