## Implementing Elo Ratings

Many websites and experts have attempted to predict NBA game outcomes with varying degrees of success. One notable example is FiveThirtyEight, which has built a reputation for data-driven predictions in fields including politics and sports. I will be using some of the methodology from FiveThirtyEight's Elo model in my analysis; it is described here: https://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/. The Elo rating system did not originate with FiveThirtyEight. It was invented much earlier for chess and later adapted to professional sports leagues such as the NBA and NFL.

The Elo rating system measures a team's relative strength throughout a season. Each team begins its first game (in my dataset, this would be the first game of the 2014-2015 season) with an Elo rating of 1500. After each game, the two teams that played exchange a certain number of points based on factors such as the winner, the margin of victory, and home-court advantage, with more points being awarded for upsets where the underdog won the game.

After the season is over, teams do not have a "hard reset" back to 1500 points for the beginning of the next season; rather, they revert toward the mean in a "soft reset". The Elo rating for the start of the next season is calculated as 75% of the team's Elo rating at the end of the previous season, plus 25% of the original Elo rating of 1500. The soft reset is desirable because not all teams are created equal: good teams tend to stay good from season to season, while bad teams tend to stay bad. Take the Golden State Warriors, for example, who pay a large amount of what is called 'luxury tax' to keep expensive players on their roster. Of course, factors such as offseason trades, changes in player skill, and new coaching can have a big impact on team performance, but assuming that all teams revert to being equal every season would be badly wrong.
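
As a quick illustration of the soft reset, here is a minimal sketch. The function name `season_start_elo` and the example ratings of 1700 and 1300 are my own placeholders, not values from the dataset.

```python
BASELINE_ELO = 1500   # rating every team starts from
CARRYOVER = 0.75      # share of last season's final rating that is kept

def season_start_elo(previous_final_elo):
    """Regress last season's final rating toward the 1500 baseline."""
    return CARRYOVER * previous_final_elo + (1 - CARRYOVER) * BASELINE_ELO

# Hypothetical end-of-season ratings, for illustration only
print(season_start_elo(1700))  # 1650.0 (a strong team keeps most of its edge)
print(season_start_elo(1300))  # 1350.0 (a weak team recovers part of its deficit)
```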

At the end of each game, the two participating teams exchange Elo points in a zero-sum game, meaning that the points awarded to the winning team are taken from the losing team in the same quantity. The nice thing about the Elo rating is that it builds in factors such as margin of victory and upsets, which can be hard to measure. The basic premise is that points are awarded to the winner of the game. The number of points increases if the winner won by a greater margin, did not have home-court advantage, or was the underdog, and it decreases if the winner won by a smaller margin, had home-court advantage, or was the stronger team coming into the game. The Elo ratings are updated after every game as follows:

1) First, the Elo rating for each team following its most recent game is obtained. To account for home-court advantage, 100 points are temporarily added to the home team's rating before proceeding with the calculations below.
2) A win probability is calculated for each team according to the below formula: 
    * Team Probability of Winning =  1 / (1 + 10 ^ ((Opponent Elo Rating - Team Elo Rating) / 400) )
3) A margin of victory multiplier is calculated for the winning team. This multiplier lets the model account for the fact that a 1-point win and a 30-point win say different things about team performance and ability. The multiplier also builds in diminishing returns, meaning that the difference between a 1-point and a 10-point win is weighted more heavily than the difference between a 20-point and a 30-point win.
    * Margin of Victory Multiplier = ((Winning Team Points Scored - Losing Team Points Scored)^(0.8)) / (7.5 + (0.006 * (Winning Team Elo Rating - Losing Team Elo Rating)))
4) Finally, Elo ratings for each team are updated according to the formulas below. The K-factor is a fixed constant that determines how quickly the Elo ratings react to new game results. In the basic Elo formula it also caps the number of points exchanged in a single game; here the margin of victory multiplier scales that amount, so a lopsided upset can move more than K points. According to FiveThirtyEight's methodology, the optimal K-factor for NBA games is 20, which is what I will use here. In addition, note that multiplying by 1 minus the winner's win probability means that teams with a lower probability of winning are awarded more points if they win. Also note that the 100 points added for home-court advantage are not included in the updated Elo ratings - they are used only for calculating the Probability of Winning and the MOV Multiplier.
    * Winning Team Updated Elo Rating = Winning Team Previous Elo Rating + (K-factor * (1 - Probability of Winning Team Winning) * MOV Multiplier)
    * Losing Team Updated Elo Rating = Losing Team Previous Elo Rating - (K-factor * (1 - Probability of Winning Team Winning) * MOV Multiplier)
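
The four steps above can be sketched as a single function for one game. This is a simplified illustration rather than the code I run on the full dataset; the function names and the example matchup (two 1500-rated teams, home team winning by 10) are hypothetical.

```python
K_FACTOR = 20          # how quickly ratings react to a result
HOME_ADVANTAGE = 100   # temporary boost for the home team

def win_probability(team_elo, opponent_elo):
    """Step 2: expected win probability from (home-adjusted) ratings."""
    return 1 / (1 + 10 ** ((opponent_elo - team_elo) / 400))

def mov_multiplier(point_margin, winner_elo, loser_elo):
    """Step 3: margin of victory multiplier with diminishing returns."""
    return point_margin ** 0.8 / (7.5 + 0.006 * (winner_elo - loser_elo))

def update_elo(winner_elo, loser_elo, point_margin, winner_is_home):
    """Steps 1-4 for a single game; returns the new (winner, loser) ratings."""
    # Step 1: the home bonus is used only inside the calculation
    adj_winner = winner_elo + (HOME_ADVANTAGE if winner_is_home else 0)
    adj_loser = loser_elo + (0 if winner_is_home else HOME_ADVANTAGE)

    p_winner = win_probability(adj_winner, adj_loser)
    multiplier = mov_multiplier(point_margin, adj_winner, adj_loser)

    # Step 4: zero-sum exchange applied to the unadjusted ratings
    shift = K_FACTOR * (1 - p_winner) * multiplier
    return winner_elo + shift, loser_elo - shift

# Hypothetical game: two 1500-rated teams, home team wins by 10
new_winner, new_loser = update_elo(1500, 1500, point_margin=10, winner_is_home=True)
print(round(new_winner, 2), round(new_loser, 2))
```

In this example roughly 5.6 points move from the loser to the winner, and the two ratings still sum to 3000, which is the zero-sum property described earlier. If the away team had won by the same margin, the exchange would be larger (about 11.7 points) because the home-court adjustment made it the underdog.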

The code below implements this rating system and its nuances.

In [333]:
league_games = league_games.sort_values(by=['GAME_DATE', 'TEAM_ID', 'SEASON_ID'])

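# Final Elo rating for each team from the previous season (used for the soft reset)
previous_season_elo = {}
# Most recent post-game Elo rating for each team
elo_scores = {}

# K-factor: controls how quickly ratings react to new game results
k_factor = 20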

# Loop through each unique season in the dataset
for season_id in league_games['SEASON_ID'].unique():
    
    # Initialize a dictionary to store the starting ELO score for each team in the current season
    starting_elo = {}
    
    # Loop through each unique team in the dataset
    for team_id in league_games['TEAM_ID'].unique():
        
        # If the team played in the previous season, use their final ELO score from the previous season as their starting ELO score
        if team_id in previous_season_elo:
            starting_elo[team_id] = (0.75 * previous_season_elo[team_id]) + (0.25 * 1500)
        
        # If the team did not play in the previous season, use the static 1500 value as their starting ELO score
        else:
            starting_elo[team_id] = 1500
    
    # Loop through each game in the current season and update the ELO scores for each team
    for game_id in league_games.loc[league_games['SEASON_ID'] == season_id, 'GAME_ID'].unique():
        for i, team_id in enumerate(league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id)), 
                                                     'TEAM_ID'].unique()):
            
            # Both teams are updated while processing the first team in the game,
            # so skip the second team to avoid updating the same game twice
            if i == 1:
                continue

            # If this is the first game of the season for the team, use their starting ELO score
            if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == team_id) & 
                                 (league_games['GAME_ID'] == game_id)), 'GAME_DATE'].iloc[0] == \
                                league_games.loc[((league_games['SEASON_ID'] == season_id) 
                                & (league_games['TEAM_ID'] == team_id)), 'GAME_DATE'].min():
                team_elo = starting_elo[team_id]
                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id) & 
                                  (league_games['TEAM_ID'] == team_id)), 'elo'] = team_elo
            
            # Otherwise, use the ELO score from the previous game
            else:
                team_elo = elo_scores[team_id]
                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id) 
                                  & (league_games['TEAM_ID'] == team_id)), 'elo'] = team_elo
            
            # Find the ID of the opponent team
            opponent_id = league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id) 
                                            & (league_games['TEAM_ID'] != team_id)), 'TEAM_ID'].iloc[0]
            
            # If this is the first game of the season for the opponent team, use their starting ELO score
            if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == opponent_id) & 
                                 (league_games['GAME_ID'] == game_id)), 'GAME_DATE'].iloc[0] ==  \
                                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == opponent_id)), 
                                'GAME_DATE'].min():
                opponent_elo = starting_elo[opponent_id]
                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id) & 
                                  (league_games['TEAM_ID'] == opponent_id)), 'elo'] = opponent_elo

            # Otherwise, use the ELO score from the previous game
            else:
                opponent_elo = elo_scores[opponent_id]
                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id) & 
                                  (league_games['TEAM_ID'] == opponent_id)), 'elo'] = opponent_elo

            # If team has home court advantage, add 100 to their elo score

            if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == team_id) & 
                                 (league_games['GAME_ID'] == game_id)), 'home'].iloc[0] == 1: 
                team_elo_home = team_elo + 100 
            
            # Otherwise, add 100 to opponent elo score
            else:
                opponent_elo_home = opponent_elo + 100

            # Get number of points scored by each team in game_id
            team_points = league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == team_id) 
                                            & (league_games['GAME_ID'] == game_id)), 'PTS'].iloc[0]
            opponent_points = league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] != team_id) 
                                                & (league_games['GAME_ID'] == game_id)), 'PTS'].iloc[0]

            # Calculate the expected win probability for each team, taking into account whether they are home or away

            # If team won
            if league_games.loc[(league_games['SEASON_ID'] == season_id) & (league_games['GAME_ID'] == game_id) & 
                                (league_games['TEAM_ID'] == team_id), 'win'].iloc[0] == True:
                
                # If team won and is home
                if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == team_id) & 
                                     (league_games['GAME_ID'] == game_id)), 'home'].iloc[0] == 1: 
                    team_win_prob =  1 / (1 + 10**((opponent_elo - team_elo_home) / 400))
                    mov_multiplier = ((team_points-opponent_points)**(0.8))/(7.5 + (.006 * (team_elo_home - opponent_elo)))
                    elo_scores[team_id] = team_elo + (k_factor * (1 - team_win_prob) * mov_multiplier)
                    elo_scores[opponent_id] = opponent_elo - (k_factor * (1 - team_win_prob) * mov_multiplier)
                # If team won and is away 
                else:
                    team_win_prob =  1 / (1 + 10**((opponent_elo_home - team_elo) / 400))
                    mov_multiplier = ((team_points-opponent_points)**(0.8))/(7.5 + (.006 * (team_elo - opponent_elo_home)))
                    elo_scores[team_id] = team_elo + (k_factor * (1 - team_win_prob) * mov_multiplier)
                    elo_scores[opponent_id] = opponent_elo - (k_factor * (1 - team_win_prob) * mov_multiplier)

            # If opponent won
            else: 

                # If opponent won and is home
                if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == opponent_id) & 
                                     (league_games['GAME_ID'] == game_id)), 'home'].iloc[0] == 1: 
                    opponent_win_prob = 1 / (1 + 10**((team_elo - opponent_elo_home) / 400))
                    mov_multiplier = ((opponent_points-team_points)**(0.8))/(7.5 + (.006 * (opponent_elo_home - team_elo)))
                    elo_scores[opponent_id] = opponent_elo + (k_factor * (1 - opponent_win_prob) * mov_multiplier)
                    elo_scores[team_id] = team_elo - (k_factor * (1 - opponent_win_prob) * mov_multiplier)
                # If opponent won and is away
                else: 
                    opponent_win_prob = 1 / (1 + 10**((team_elo_home - opponent_elo) / 400))
                    mov_multiplier = ((opponent_points-team_points)**(0.8))/(7.5 + (.006 * (opponent_elo - team_elo_home)))
                    elo_scores[opponent_id] = opponent_elo + (k_factor * (1 - opponent_win_prob) * mov_multiplier)
                    elo_scores[team_id] = team_elo - (k_factor * (1 - opponent_win_prob) * mov_multiplier)
            
            # If this is the last game of the season for the team, store its post-game Elo in previous_season_elo
            if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == team_id) & 
                                 (league_games['GAME_ID'] == game_id)), 'GAME_DATE'].iloc[0] == \
                                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == team_id)), 
                                'GAME_DATE'].max():
                previous_season_elo[team_id] = elo_scores[team_id]

            # If this is the last game of the season for the opponent, store its post-game Elo in previous_season_elo
            if league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == opponent_id) & 
                                 (league_games['GAME_ID'] == game_id)), 'GAME_DATE'].iloc[0] == \
                                league_games.loc[((league_games['SEASON_ID'] == season_id) & (league_games['TEAM_ID'] == opponent_id)), 
                                'GAME_DATE'].max():
                previous_season_elo[opponent_id] = elo_scores[opponent_id]
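
The loop above filters the full DataFrame many times per game. As a possible alternative, the same update rules can be applied in a single pass if per-team state is kept in dictionaries. The sketch below is a simplification under several assumptions, not the code used for the results in this analysis: `run_elo` is a hypothetical helper, it takes plain dictionaries (for example from `league_games.to_dict('records')`) that are already sorted by date, it expects exactly two rows per `GAME_ID`, it infers the winner from `PTS` instead of the `win` column, and it assumes `home` is stored as 1 or 0.

```python
from collections import defaultdict

def run_elo(rows, k_factor=20, home_adv=100, base=1500, carry=0.75):
    """Return {(GAME_ID, TEAM_ID): pre-game Elo} for date-sorted game rows."""
    games = defaultdict(list)          # GAME_ID -> the two team rows
    for row in rows:
        games[row["GAME_ID"]].append(row)

    current = {}        # TEAM_ID -> latest rating
    last_season = {}    # TEAM_ID -> SEASON_ID of the team's latest game
    pre_game = {}

    # Dictionaries preserve insertion order, so games stay in date order
    for game_id, (a, b) in games.items():
        for row in (a, b):
            team = row["TEAM_ID"]
            if team not in current:
                current[team] = base                     # first appearance
            elif last_season[team] != row["SEASON_ID"]:
                # Soft reset at the team's first game of a new season
                current[team] = carry * current[team] + (1 - carry) * base
            last_season[team] = row["SEASON_ID"]
            pre_game[(game_id, team)] = current[team]

        winner, loser = (a, b) if a["PTS"] > b["PTS"] else (b, a)
        # The home bonus is used only for the probability and the multiplier
        w_adj = current[winner["TEAM_ID"]] + home_adv * winner["home"]
        l_adj = current[loser["TEAM_ID"]] + home_adv * loser["home"]

        p_winner = 1 / (1 + 10 ** ((l_adj - w_adj) / 400))
        margin = winner["PTS"] - loser["PTS"]
        multiplier = margin ** 0.8 / (7.5 + 0.006 * (w_adj - l_adj))

        shift = k_factor * (1 - p_winner) * multiplier
        current[winner["TEAM_ID"]] += shift
        current[loser["TEAM_ID"]] -= shift
    return pre_game

# Tiny made-up schedule: one game in each of two seasons
rows = [
    {"SEASON_ID": 1, "GAME_ID": "g1", "TEAM_ID": "A", "PTS": 110, "home": 1},
    {"SEASON_ID": 1, "GAME_ID": "g1", "TEAM_ID": "B", "PTS": 100, "home": 0},
    {"SEASON_ID": 2, "GAME_ID": "g2", "TEAM_ID": "A", "PTS": 90, "home": 0},
    {"SEASON_ID": 2, "GAME_ID": "g2", "TEAM_ID": "B", "PTS": 95, "home": 1},
]
pre_game_elo = run_elo(rows)
print(pre_game_elo[("g1", "A")], round(pre_game_elo[("g2", "A")], 2))
```

The returned dictionary plays the role of the `elo` column: each value is the rating a team carried into that game. Because every game is visited once, the running time grows with the number of games rather than with repeated scans of the whole table.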
