## Testing of model Transitive V1

This notebook aims to use cross validation to test the performance of mode Transitive V1.

Transitive V1 is a simple model that instead of predicting matches, aims to predict the probability of a player winning any given point. Then, with dynamic programming we convert the point probabilities to a match win probability. This model only considers two different types of points, service and return points.

We estimate a player's point % win rate against another player by using common opponents. Based on the performance of each player against these common opponents, we estimate the relative serve and return win % between the two plyers.

2024 data is reserved for the validation set.

In [1]:
import numpy as np
import pandas as pd

from atp_forecaster.models.transitive_v1 import TransitiveV1
from atp_forecaster.data.clean import get_cleaned_atp_matches

matches = get_cleaned_atp_matches()
matches[matches['tourney_date'] >= 20240101].head()


Unnamed: 0,surface,draw_size,tourney_level,tourney_date,id_a,name_a,hand_a,ht_a,age_a,id_b,...,2ndWon_b,SvGms_b,bpSaved_b,bpFaced_b,rank_a,rank_points_a,rank_b,rank_points_b,result,order
88725,Hard,32,A,20240101,209992,Juncheng Shang,L,180.0,18.9,126207,...,11.0,10.0,6.0,8.0,183.0,335.0,16.0,2310.0,1,88725
88726,Hard,32,A,20240101,209950,Arthur Fils,R,185.0,19.5,126094,...,15.0,13.0,7.0,10.0,36.0,1158.0,5.0,4805.0,0,88726
88727,Hard,32,A,20240101,200325,Emil Ruusuvuori,R,188.0,24.7,124116,...,22.0,15.0,9.0,12.0,69.0,771.0,43.0,1048.0,1,88727
88728,Hard,32,A,20240101,209992,Juncheng Shang,L,180.0,18.9,126094,...,19.0,13.0,5.0,8.0,183.0,335.0,5.0,4805.0,0,88728
88729,Hard,32,A,20240101,126094,Andrey Rublev,R,188.0,26.1,200325,...,12.0,10.0,5.0,7.0,5.0,4805.0,69.0,771.0,1,88729


In [None]:
from sklearn.metrics import log_loss, roc_auc_score, accuracy_score

def test_transitive(model, start_date: int, end_date: int):
    """
    Test the performance of the Transitive V1 model over a given date range.
    """
    # Get all matches between start_date and end_date
    train = matches[matches['tourney_date'] < start_date]
    test = matches[(matches['tourney_date'] >= start_date) & (matches['tourney_date'] < end_date)]

    model.fit(train)

    y_pred = []
    y_true = []

    # Predict the outcome of each match in the test set
    for (_, row) in test.iterrows():
        if (len(model.find_common_opponents(row['id_a'], row['id_b'])) < 15):
            continue

        prob = model.predict_match(row['id_a'], row['id_b'])

        y_pred.append(prob)
        y_true.append(row['result'])

        print(row['id_a'], row['id_b'], row['name_a'], row['name_b'], prob)

        # update model with the match
        model.add_match(row)
    
    y_pred = np.array(y_pred, dtype=float)
    y_true = np.array(y_true, dtype=int)

    loss = log_loss(y_true, y_pred)
    auc = roc_auc_score(y_true, y_pred)
    acc = accuracy_score(y_true, (y_pred > 0.5).astype(int))

    return loss, auc, acc

model = TransitiveV1(
    look_back=None,
    base_spw=0.6
)

In [11]:
loss, auc, acc = test_transitive(model, 20230101, 20230231)
print("Log Loss: ", loss)
print("AUC: ", auc)
print("Accuracy: ", acc)

105138 126094 Roberto Bautista Agut Andrey Rublev 0.41309284337186225
105138 200624 Roberto Bautista Agut Sebastian Korda 0.4368147398535824
207733 111575 Jack Draper Karen Khachanov 0.3901875844525627
106421 132283 Daniil Medvedev Lorenzo Sonego 0.6565995101670611
200624 206173 Sebastian Korda Jannik Sinner 0.3981600888596126
122330 128034 Alexander Bublik Hubert Hurkacz 0.6131612746372195
105676 105777 David Goffin Grigor Dimitrov 0.4026369956471162
105676 126774 David Goffin Stefanos Tsitsipas 0.39962411065167464
111815 126203 Cameron Norrie Taylor Fritz 0.3291934446187826
128034 126203 Hubert Hurkacz Taylor Fritz 0.4681493528532111
207518 126207 Lorenzo Musetti Frances Tiafoe 0.21829625429289598
105554 126207 Daniel Evans Frances Tiafoe 0.45784843429836136
111575 106421 Karen Khachanov Daniil Medvedev 0.5670044735945442
106421 104925 Daniil Medvedev Novak Djokovic 0.5394048469480298
104925 200624 Novak Djokovic Sebastian Korda 0.5512734424902674
111815 104745 Cameron Norrie Rafael 

In [9]:
model.fit(matches[matches['tourney_date'] < 20241231])

# sinner vs Kokkinakis
model.predict_match(206173, 106423, debug=True)

delta_serve:  0.03162463828028106 delta_return:  -0.03090179765265194 p:  0.5036286035524267
delta_serve:  0.08749999999999991 delta_return:  -0.09545454545454546 p:  0.46266102900720596
delta_serve:  0.05062770117308091 delta_return:  0.030604026845637566 p:  0.8508508168714084
delta_serve:  0.062324739936680196 delta_return:  0.06397390677741116 p:  0.9480945687658601
delta_serve:  -0.03029301608283763 delta_return:  -0.1298309178743961 p:  0.024652042474509042
delta_serve:  0.05940343781597568 delta_return:  0.2064153678869155 p:  0.9997690126301189
delta_serve:  -0.10343602887885117 delta_return:  -0.10466826525729692 p:  0.0039224703732727885
delta_serve:  -0.09589542596261513 delta_return:  -0.08123393770777831 p:  0.011224796881983716


0.47560041756959825