## Optimize K On The Go

This notebook will take in your dataset, and it will find the K value that leads to the Elo ratings best fitting the data at all points throughout the season. This notebook also runs the Elo system at that K value.

Compared to OptimizeK_FullTraining, this method for optimizing K will value its accuracy at all points in the season, so it produces the K that is the most accurate to the outcome of every game at the time of the game. This idea originally comes from https://opisthokonta.net/?p=1387

-Grant Harkins

In [1]:
#This cell contains the main calculations associated with the Elo method
#The first function calculates the probability of each team winning
#The second function takes those probabilities, and the outcome of the game to determine the new ratings

import math  
def Probability(rating1, rating2): 
  
    return 1.0 * 1.0 / (1 + 1.0 * math.pow(10, 1.0 * (rating1 - rating2) / 400)) 
  
# Function to calculate Elo rating 
# K is a constant. 
# Player A wins over Player B.  
# tie = true if tie, false otherwise
def EloRating(Ra, Rb, K, tie): 
    
    # To calculate the Winning 
    # Probability of Player B 
    Pb = Probability(Ra, Rb) 
  
    # To calculate the Winning 
    # Probability of Player A 
    Pa = Probability(Rb, Ra) 
  
    # Updating the Elo Ratings 
    if tie:
       Ra = Ra + K * (1/2 - Pa) 
        
       Rb = Rb + K * (1/2 - Pb) 
    else:        
       Ra = Ra + K * (1 - Pa) 
       Rb = Rb + K * (0 - Pb) 
    
    return Ra, Rb

In [2]:
import pandas as pd
pathGames = '/FILEPATH/' #Load in Game filepath
pathTeams = '/FILEPATH/' #Load in the team filepath
gameFilename = '.txt' #Load Game file
teamFilename = '.txt' #Load Teamfile
games = pd.read_csv(pathGames + gameFilename, skiprows = 1, header = None)

#We got our data from masseyratings.com, so reading the files is based on the structure of those files

In [3]:
import pandas as pd

teamNames = pd.read_csv(pathTeams + teamFilename, header = None)
numTeams = len(teamNames)

In [4]:
# columns of games are:
#	column 0 = days since 1/1/0000
#	column 1 = date in YYYYMMDD format
#	column 2 = team1 index
#	column 3 = team1 homefield (1 = home, -1 = away, 0 = neutral)
#	column 4 = team1 score
#	column 5 = team2 index
#	column 6 = team2 homefield (1 = home, -1 = away, 0 = neutral)
#	column 7 = team2 score
numGames = len(games)

In [5]:
#Finding the optimal K
import numpy as np

sigFig = 2 #number of sigfigs for optimal K to go to 
for p in range(sigFig):
    
    if p == 0:    
        startK = 0 #The K's to check range from startK to endK (inclusive)
        endK = 30
        step = 1
    else:
        startK = bestK - step*(1/2)
        endK = bestK + step*(1/2)
        step = 10**(-p)
        
    runs = int((endK - startK) / step) + 1 #number of checks at each sigfig power 

    allErrors = [] #holds all the errors at current sigfig level
    for m in range(runs):
        K = startK + (m * step)
        squaredErrors = [] #Holds the errors for this particular K
        eloRatings = np.zeros(numTeams) #resets ratings for this particular K
        for i in range(numGames): 
            team1ID = games.loc[i, 2] - 1 # subtracting 1 since python indexes at 0
            team1Score = games.loc[i, 4]
            team2ID = games.loc[i, 5] - 1 # subtracting 1 since python indexes at 0
            team2Score = games.loc[i, 7]
            
            # Update ratings 
            if team1Score > team2Score: 
                localError = 2*((Probability(eloRatings[team2ID],eloRatings[team1ID]) - 1)**2) #Errors calculated at the time of the game
                squaredErrors.append(localError)
                team1Rating, team2Rating = EloRating(eloRatings[team1ID], eloRatings[team2ID], K, False)
            elif team1Score < team2Score: 
                localError = 2*((Probability(eloRatings[team1ID],eloRatings[team2ID]) - 1)**2) 
                squaredErrors.append(localError)
                team2Rating, team1Rating = EloRating(eloRatings[team2ID], eloRatings[team1ID], K, False)
            else:  
                team1Rating, team2Rating = EloRating(eloRatings[team1ID], eloRatings[team2ID], K, True)
                
            eloRatings[team1ID] = team1Rating
            eloRatings[team2ID] = team2Rating
        meanError = sum(squaredErrors) / len(squaredErrors) 
        
        allErrors.append(meanError)
        #print(f'The mean squared Error for K = {K} is {meanError}') #optional to see MSE at each K value
    
    bestK = (np.argmin(allErrors) * step) + startK
    print(f'The best K in [{startK}, {endK}] is {bestK} with MSE of {allErrors[np.argmin(allErrors)]}')
K = bestK #Setting K to the best K to see the rankings with that K

The best K in [0, 30] is 28 with MSE of 0.4364364901863854
The best K in [27.5, 28.5] is 28.2 with MSE of 0.4364360773920701


In [6]:
#Printing the ratings/rankings with the optimal K just found

k = 0 #Number of teams to display; k = 0 displays all teams
iSort = np.argsort(-eloRatings)

print('\n\n************** ELO Rating Method **************\n')
print('===========================')
print('Rank   Rating      Team   ')
print('===========================')
if k == 0:
    for i in range(numTeams):
        print(f'{i+1:4d}   {eloRatings[iSort[i]]:.5f}  {teamNames.loc[iSort[i],1]}')
else:
    for i in range(k):
        print(f'{i+1:4d}   {eloRatings[iSort[i]]:.5f}  {teamNames.loc[iSort[i],1]}')

print('')   # extra carriage return



************** ELO Rating Method **************

Rank   Rating      Team   
   1   264.49582   Oklahoma_City
   2   190.66572   Boston
   3   160.96067   Cleveland
   4   154.24309   LA_Clippers
   5   130.43202   Indiana
   6   112.55701   Minnesota
   7   105.60607   Golden_State
   8   105.12610   Milwaukee
   9   95.13273   LA_Lakers
  10   88.06545   Houston
  11   84.38761   Denver
  12   65.80435   New_York
  13   26.35769   Orlando
  14   25.25217   Detroit
  15   20.92438   Memphis
  16   20.10014   Chicago
  17   -1.90479   Portland
  18   -26.79873   Sacramento
  19   -35.68379   Miami
  20   -42.91009   Atlanta
  21   -60.79175   Dallas
  22   -65.47211   Phoenix
  23   -74.81054   San_Antonio
  24   -96.61217   Toronto
  25   -163.02383   Brooklyn
  26   -180.40347   New_Orleans
  27   -201.22185   Washington
  28   -226.81253   Philadelphia
  29   -233.91940   Utah
  30   -239.74597   Charlotte

