<a href="https://colab.research.google.com/github/sdac-vt/Ranking-Workshop-2021/blob/main/Ranking_Workshop_2021.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Rating Sports Teams with Current ACC Basketball Data

This notebook walks through rating and ranking ACC men's basketball teams using the current season's game data from before the ACC Tournament.

The blocks below import data from https://www.masseyratings.com/scores.php?s=320158&sub=10423&all=1&mode=2&format=1.

The columns are, in order: a game ID, the date, team 1, a 1 for a win or -1 for a loss, and the points team 1 scored. The next three columns repeat the last three for team 2.

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('https://www.masseyratings.com/scores.php?s=320158&sub=10423&all=1&mode=2&format=1', header=None)
df

We name the columns so that the dataframe is easier to work with.

In [None]:
df.columns = ['game', 'date', 'team1', 'win1', 'points1', 'team2', 'win2', 'points2']
df
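As an aside, the column names can also be supplied while reading the file. The sketch below uses the `names` parameter of `pd.read_csv` on two made-up rows held in memory; the values are illustrative only and are not real game data.

```python
import io
import pandas as pd

# Made-up rows in the eight-column layout described above (not real game data).
sample = io.StringIO(
    "738000,20210301,1,1,70,2,-1,60\n"
    "738005,20210306,2,-1,65,3,1,72\n"
)

cols = ['game', 'date', 'team1', 'win1', 'points1', 'team2', 'win2', 'points2']

# header=None says the file has no header row; names labels the columns on load.
toy = pd.read_csv(sample, header=None, names=cols)
print(toy)
```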

We will take a subset of the dataframe so that it only includes games from the regular season and only the columns we are interested in.

In [None]:
df = df.loc[df['date'] < 20210309]
# Reset the index so rows are numbered 0..n-1 after filtering.
df = df[['team1', 'points1', 'team2', 'points2']].reset_index(drop=True)
df
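The same filter can be tried on a small made-up dataframe to see what it keeps. This is only a sketch: the dates and scores are invented, and `cutoff` is a name introduced here for illustration. Because the dates are stored as integers in YYYYMMDD form, a plain `<` comparison orders them correctly.

```python
import pandas as pd

# Toy data in the same layout as the notebook's dataframe (values are made up).
toy = pd.DataFrame({
    'date':    [20210301, 20210306, 20210309, 20210313],
    'team1':   [1, 2, 3, 1],
    'points1': [70, 65, 80, 77],
    'team2':   [2, 3, 1, 3],
    'points2': [60, 72, 75, 70],
})

# Keep games strictly before the cutoff, and only the four columns of interest.
cutoff = 20210309
regular = toy.loc[toy['date'] < cutoff, ['team1', 'points1', 'team2', 'points2']]

# Renumber rows 0..n-1 so that positional loops and label lookups agree.
regular = regular.reset_index(drop=True)
print(regular)
```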

Next, we create the X matrix to hold the matchups and the p vector to hold each team's cumulative point differential. Taking the number of games from the dataframe's shape attribute means this works for however many regular-season games there are.

In [None]:
df.shape

In [None]:
games = df.shape[0]
teams = 15
X = np.zeros((games, teams))
p = np.zeros((teams,1))

The next loop iterates through the data frame to populate the X matrix with matchup data and the p vector with point differentials.

In [None]:
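for k in range(games):
    # Team IDs in the data are 1-based; shift to 0-based column indices.
    col = df['team1'].iloc[k] - 1
    col2 = df['team2'].iloc[k] - 1

    # Point differential from each team's perspective.
    for1 = df['points1'].iloc[k] - df['points2'].iloc[k]
    for2 = -for1

    # Accumulate each team's total point differential.
    p[col] += for1
    p[col2] += for2

    # Mark the winner with 1 and the loser with -1 in this game's row.
    if for1 > for2:
        wL, wL2 = 1, -1
    else:
        wL, wL2 = -1, 1

    X[k, col] = wL
    X[k, col2] = wL2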
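For comparison, here is a minimal vectorized sketch of the same construction on a made-up three-team schedule. The names `toy_games`, `X_toy`, and `p_toy` are introduced only for this illustration, and `p_toy` is a flat array rather than the column vector used in the notebook.

```python
import numpy as np

# Hypothetical toy schedule: (team1, points1, team2, points2), 1-based team IDs.
toy_games = np.array([
    [1, 70, 2, 60],
    [2, 65, 3, 72],
    [3, 80, 1, 75],
])
n_games, n_teams = toy_games.shape[0], 3

t1 = toy_games[:, 0] - 1                     # zero-based column for team 1
t2 = toy_games[:, 2] - 1                     # zero-based column for team 2
margin = toy_games[:, 1] - toy_games[:, 3]   # team 1's point differential

# One row per game: +1 for the winner, -1 for the loser.
X_toy = np.zeros((n_games, n_teams))
rows = np.arange(n_games)
sign = np.where(margin > 0, 1, -1)
X_toy[rows, t1] = sign
X_toy[rows, t2] = -sign

# Cumulative point differential per team; np.add.at accumulates correctly
# even when the same team index appears more than once.
p_toy = np.zeros(n_teams)
np.add.at(p_toy, t1, margin)
np.add.at(p_toy, t2, -margin)

print(X_toy)
print(p_toy)
```

Each row of `X_toy` has exactly one 1 and one -1, and the entries of `p_toy` sum to zero because every point one team gains in differential is lost by its opponent.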

We can see some of our X matrix here:

In [None]:
X

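Now we will create the M matrix and verify that it looks correct. Each row of the raw M sums to zero, so the matrix is singular; replacing the bottom row with all ones adds the constraint that the ratings sum to zero and makes the system solvable.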

In [None]:
M = X.T @ X
# Replace the last row with ones (the ratings-sum-to-zero constraint).
M[teams - 1, :] = 1
M

We also have to set the final entry of our p vector to zero so that the last equation matches the change made in M.

In [None]:
p[teams - 1] = 0
p

Finally, we can calculate the team ratings by solving the linear system Mr = p.

In [None]:
r = np.linalg.solve(M, p)
r
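To see the whole step on something small enough to check by hand, here is a sketch for a made-up three-team round robin. The matrix and vector values are hypothetical, and the `_toy` and `_adj` names are introduced only for this illustration.

```python
import numpy as np

# Toy matchup matrix and point differentials for three made-up teams.
X_toy = np.array([[ 1., -1.,  0.],
                  [ 0., -1.,  1.],
                  [-1.,  0.,  1.]])
p_toy = np.array([5., -17., 12.])

# Diagonal: games played by each team. Off-diagonal: minus the number of meetings.
M_toy = X_toy.T @ X_toy

# Every row sums to zero, so M_toy is singular (rank 2 here, not 3).
print(np.linalg.matrix_rank(M_toy))

# Replace the last equation with "the ratings sum to zero".
M_adj = M_toy.copy()
M_adj[-1, :] = 1
p_adj = p_toy.copy()
p_adj[-1] = 0

r_toy = np.linalg.solve(M_adj, p_adj)
print(r_toy)
```

Working the system by hand gives ratings of 5/3, -17/3, and 4, which sum to zero as the replaced row requires.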

Next, we'll visualize these ratings in a nicer way.

In [None]:
ratings_df = pd.DataFrame(r, columns=['Rating'],
                index = ['Boston_College', 'Clemson', 'Duke',
                         'Florida_St', 'Georgia_Tech', 'Louisville', 
                         'Miami_FL', 'NC_State', 'North_Carolina', 
                         'Notre_Dame', 'Pittsburgh', 'Syracuse',
                         'Virginia', 'Virginia_Tech', 'Wake_Forest'])

ratings_df.sort_values('Rating', inplace=True, ascending=False)
ranking = [x+1 for x in range(teams)]

ratings_df['Ranking'] = ranking

ratings_df
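An alternative way to attach the ranking is pandas' `rank` method, which also gives tied ratings the same rank. The sketch below uses three made-up teams and ratings; `toy_ratings` is a name introduced only for this example.

```python
import pandas as pd

# Hypothetical ratings for three made-up teams.
toy_ratings = pd.DataFrame({'Rating': [5/3, -17/3, 4.0]},
                           index=['Team_A', 'Team_B', 'Team_C'])

# Sort best-first, then rank so that the highest rating gets rank 1.
toy_ratings = toy_ratings.sort_values('Rating', ascending=False)
toy_ratings['Ranking'] = (toy_ratings['Rating']
                          .rank(ascending=False, method='min')
                          .astype(int))
print(toy_ratings)
```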

So, we might be able to construct an argument that the ACC tournament should have been seeded according to these rankings.

Now, we can take the next steps of adding offensive and defensive ratings.

We start by creating the diagonal T matrix, and then subtract to obtain the off-diagonal P matrix. We build a new M1 matrix here because the M matrix above had its last row overwritten, and this step needs the unaltered version.

In [None]:
M1 = X.T @ X
T = np.diag(np.diag(M1))
T

In [None]:
P = T-M1
P
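On a made-up three-team round robin, the split looks like this. The sketch assumes the same ±1 matchup layout as above; the `_toy` names are introduced only for illustration.

```python
import numpy as np

# Toy matchup matrix: three made-up teams, each pair meets once.
X_toy = np.array([[ 1., -1.,  0.],
                  [ 0., -1.,  1.],
                  [-1.,  0.,  1.]])

M1_toy = X_toy.T @ X_toy

# T keeps only the diagonal: the number of games each team played.
T_toy = np.diag(np.diag(M1_toy))

# P = T - M1 has a zero diagonal and the number of meetings off the diagonal.
P_toy = T_toy - M1_toy

print(T_toy)
print(P_toy)
```

Each row of `P_toy` sums to the matching diagonal entry of `T_toy`, since a team's games played equals its total meetings with all opponents.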

We can now calculate our f, a, and p vectors, where f holds each team's total points for and a holds its total points against. We build a new p1 vector here because the p vector above had its last entry overwritten.

In [None]:
f = np.zeros((teams,1))
a = np.zeros((teams,1))
p1 = np.zeros((teams,1))

for k in range(games):
    # Team IDs in the data are 1-based; shift to 0-based indices.
    col = df['team1'].iloc[k] - 1
    col2 = df['team2'].iloc[k] - 1

    # Point differential from each team's perspective.
    for1 = df['points1'].iloc[k] - df['points2'].iloc[k]
    for2 = -for1

    p1[col] += for1
    p1[col2] += for2

    # Points scored by one team are the points allowed by its opponent.
    all_for1 = df['points1'].iloc[k]
    all_against1 = df['points2'].iloc[k]
    all_for2 = df['points2'].iloc[k]
    all_against2 = df['points1'].iloc[k]

    # Accumulate points for (f) and points against (a).
    f[col] += all_for1
    f[col2] += all_for2

    a[col] += all_against1
    a[col2] += all_against2

We can verify that our f, a, and p vectors all look appropriate.

In [None]:
f

In [None]:
a

Here we also check that the p vector is equal to f - a.

In [None]:
f-a

In [None]:
p1
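Rather than comparing the two printouts by eye, the check can be done programmatically with `np.allclose`. The sketch below uses a made-up three-game schedule and `np.bincount` with weights to total the points per team; all names ending in `_toy` are introduced only for this illustration.

```python
import numpy as np

# Hypothetical toy schedule: (team1, points1, team2, points2), 1-based team IDs.
toy_games = np.array([
    [1, 70, 2, 60],
    [2, 65, 3, 72],
    [3, 80, 1, 75],
])
n_teams = 3
t1, t2 = toy_games[:, 0] - 1, toy_games[:, 2] - 1
pts1, pts2 = toy_games[:, 1], toy_games[:, 3]

# Weighted bincount sums the weights that share the same team index.
f_toy = (np.bincount(t1, weights=pts1, minlength=n_teams)
         + np.bincount(t2, weights=pts2, minlength=n_teams))
a_toy = (np.bincount(t1, weights=pts2, minlength=n_teams)
         + np.bincount(t2, weights=pts1, minlength=n_teams))

# Point differentials accumulated independently from the game margins.
margin = pts1 - pts2
p_toy = (np.bincount(t1, weights=margin, minlength=n_teams)
         + np.bincount(t2, weights=-margin, minlength=n_teams))

# True when points for minus points against reproduces the differentials.
print(np.allclose(f_toy - a_toy, p_toy))
```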

We now calculate the defensive rating, d, by solving the system of equations that we derived previously, (T + P)d = Tr - f.

In [None]:
d = np.linalg.solve(T + P, T @ r - f)
d

Then, using the existing d and r vectors, we can find the offensive rating, o.

In [None]:
o = r-d
o
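As a sanity check on a made-up three-team round robin, the sketch below solves for d and o and then confirms that they reproduce the points for and against. It assumes each game's points follow p_i = o_i - d_j, so that summing over a team's games gives To - Pd = f and Po - Td = a. All values are hypothetical and the `_toy` names are introduced only here.

```python
import numpy as np

# Toy inputs for three made-up teams that each play the other two once.
T_toy = 2 * np.eye(3)                      # games played on the diagonal
P_toy = np.ones((3, 3)) - np.eye(3)        # one meeting between each pair
r_toy = np.array([5/3, -17/3, 4.0])        # overall ratings from Mr = p
f_toy = np.array([145., 125., 152.])       # total points for
a_toy = np.array([140., 142., 140.])       # total points against

# Defensive ratings from (T + P)d = Tr - f, then offensive ratings o = r - d.
d_toy = np.linalg.solve(T_toy + P_toy, T_toy @ r_toy - f_toy)
o_toy = r_toy - d_toy

# Both identities should hold if the solve is consistent.
print(np.allclose(T_toy @ o_toy - P_toy @ d_toy, f_toy))
print(np.allclose(P_toy @ o_toy - T_toy @ d_toy, a_toy))
```

Note that the defensive ratings come out negative in this toy case. Under p_i = o_i - d_j, a larger d still means fewer points allowed, which is why sorting the defensive ratings in descending order puts the best defenses first.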

Once again, we will view the ratings in a nicer format.

In [None]:
def_ratings_df = pd.DataFrame(d, columns=['Defensive Rating'],
                index = ['Boston_College', 'Clemson', 'Duke',
                         'Florida_St', 'Georgia_Tech', 'Louisville', 
                         'Miami_FL', 'NC_State', 'North_Carolina', 
                         'Notre_Dame', 'Pittsburgh', 'Syracuse',
                         'Virginia', 'Virginia_Tech', 'Wake_Forest'])

def_ratings_df.sort_values('Defensive Rating', inplace=True, ascending=False)
ranking_d = [x+1 for x in range(teams)]

def_ratings_df['Defensive Ranking'] = ranking_d

def_ratings_df

In [None]:
off_ratings_df = pd.DataFrame(o, columns=['Offensive Rating'],
                index = ['Boston_College', 'Clemson', 'Duke',
                         'Florida_St', 'Georgia_Tech', 'Louisville', 
                         'Miami_FL', 'NC_State', 'North_Carolina', 
                         'Notre_Dame', 'Pittsburgh', 'Syracuse',
                         'Virginia', 'Virginia_Tech', 'Wake_Forest'])

off_ratings_df.sort_values('Offensive Rating', inplace=True, ascending=False)
ranking_o = [x+1 for x in range(teams)]

off_ratings_df['Offensive Ranking'] = ranking_o

off_ratings_df


Now we can take these ratings and test predicting the outcomes of the ACC Tournament games.

Using the equation p_i = o_i - d_j, we calculate the predicted scores for each team.
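A short derivation, using o = r - d from above (so that r_i = o_i + d_i), shows what this equation implies for the predicted margin in a game between teams i and j:

$$
\begin{aligned}
p_i - p_j &= (o_i - d_j) - (o_j - d_i) \\
          &= (o_i + d_i) - (o_j + d_j) \\
          &= r_i - r_j
\end{aligned}
$$

So the offensive and defensive ratings determine the predicted score, while the predicted winner is always the team with the higher overall rating.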

Here is an example for Duke vs. Boston College.

In [None]:
off_ratings_df.loc['Duke', 'Offensive Rating'] - def_ratings_df.loc['Boston_College', 'Defensive Rating']

In [None]:
off_ratings_df.loc['Boston_College', 'Offensive Rating'] - def_ratings_df.loc['Duke', 'Defensive Rating']

To make the calculation for the whole tournament easier, we can write a function to predict the outcomes.

In [None]:
def predict_points(team1, team2):
  # Predicted points: a team's offensive rating minus its opponent's defensive rating.
  team1_points = off_ratings_df.loc[team1, 'Offensive Rating'] - def_ratings_df.loc[team2, 'Defensive Rating']
  team2_points = off_ratings_df.loc[team2, 'Offensive Rating'] - def_ratings_df.loc[team1, 'Defensive Rating']

  print(team1, round(team1_points, 0), team2, round(team2_points, 0))

In [None]:
predict_points('Duke', 'Boston_College')

Here we can make a list of all of the matchups that took place in the tournament, and test to see if our ratings would have predicted the winner and the score correctly.

In [None]:
matchups = [['Pittsburgh', 'Miami_FL'],
            ['Duke', 'Boston_College'],
            ['Notre_Dame', 'Wake_Forest'],
            ['Syracuse', 'NC_State'],
            ['Miami_FL', 'Clemson'],
            ['Duke', 'Louisville'],
            ['Notre_Dame', 'North_Carolina'],
            ['Virginia', 'Syracuse'],
            ['Georgia_Tech', 'Miami_FL'],
            ['Florida_St', 'Duke'],
            ['Virginia_Tech', 'North_Carolina'],
            ['Virginia', 'Georgia_Tech'],
            ['Florida_St', 'North_Carolina'],
            ['Georgia_Tech', 'Florida_St']]

In [None]:
for team1, team2 in matchups:
  predict_points(team1, team2)
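If the predictions need to be compared against actual results later, it can help to return the numbers and collect them in a table instead of printing them. The following is a sketch under stated assumptions: the teams and ratings are made up, and `predicted_scores`, `toy_matchups`, and `predictions` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Hypothetical ratings for three made-up teams (not taken from the ACC data).
offense = pd.Series({'Team_A': 37.8, 'Team_B': 25.2, 'Team_C': 42.5})
defense = pd.Series({'Team_A': -36.2, 'Team_B': -30.8, 'Team_C': -38.5})

def predicted_scores(team1, team2):
    """Return (team1_points, team2_points) using p_i = o_i - d_j."""
    return (offense[team1] - defense[team2],
            offense[team2] - defense[team1])

toy_matchups = [('Team_A', 'Team_B'), ('Team_C', 'Team_A')]

# Build one row per matchup with both predicted scores and the predicted winner.
rows = []
for team1, team2 in toy_matchups:
    pts1, pts2 = predicted_scores(team1, team2)
    rows.append({'team1': team1, 'points1': round(pts1, 1),
                 'team2': team2, 'points2': round(pts2, 1),
                 'predicted_winner': team1 if pts1 > pts2 else team2})

predictions = pd.DataFrame(rows)
print(predictions)
```

Keeping the results in a dataframe makes it straightforward to add columns for the actual scores and count how many winners were predicted correctly.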

And just for fun, we can see what the outcome might have been in the championship game that most people believe should have happened.

In [None]:
predict_points('Florida_St', 'Virginia')