In [1]:
import numpy as np
import pandas as pd

Implementing the Massey Method
===

The [Massey Method](https://en.wikipedia.org/wiki/Kenneth_Massey) ranks teams from the score margins of the games played between them. It is commonly used to rank sports teams. The input is a list of games, where each game names two teams and a score. The score is the margin `team one's score - team two's score`, so a game with a final score of Team A: 40, Team B: 37 becomes the entry Team A, Team B, 3. I have the data in this format already. To apply the Massey Method, I follow the shortcut laid out by [Dr. Justin Wyss-Gallifent](https://www.math.umd.edu/~immortal/) in [this chapter](https://www.math.umd.edu/~immortal/MATH401/ch_team_ranking.pdf).
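For data that is not yet in this format, the margin column can be derived from raw results. The following is a minimal sketch; the `raw` table and its `Points 1` / `Points 2` columns are hypothetical and are not part of `team_scores.csv`.

```python
import pandas as pd

# Hypothetical raw results: one row per game with each team's points.
raw = pd.DataFrame({
    "Team 1": ["Team A", "Team C"],
    "Points 1": [40, 14],
    "Team 2": ["Team B", "Team A"],
    "Points 2": [37, 21],
})

# Margin from Team 1's point of view: positive means Team 1 won.
games = raw[["Team 1", "Team 2"]].copy()
games["Score"] = raw["Points 1"] - raw["Points 2"]
```

The first row reproduces the example above: Team A, Team B, 3.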

In [2]:
df = pd.read_csv("team_scores.csv")
df.head()

| | Team 1 | Team 2 | Score |
|---|---|---|---|
| 0 | Florida | MiamiFlorida | 4 |
| 1 | Arizona | Hawaii | -7 |
| 2 | Villanova | Colgate | 20 |
| 3 | YoungstownSt | Samford | 23 |
| 4 | UCLA | CincinnatiU | -10 |


Functions
---

To assist, two functions were made.

The first function below, `teams_dict()`, gives me a Python dictionary mapping each team to an assigned number. This number will later be the team's index in the matrices. The function works by concatenating all the teams in `'Team 1'` and `'Team 2'`, and then using `pd.unique()` to get the unique team names.

The next function, `make_massey()`, takes `df` and returns `M` and `q`. `M` is a matrix where M<sub>i,i</sub> equals the number of games played by team i, and M<sub>i,j</sub> (for i ≠ j) is the negative of the number of times teams i and j have played. `q` is a column vector such that q<sub>i</sub> is team i's cumulative score margin over all its games. Finally, as per the shortcut method, I set the last row of `M` to all 1s and the last entry of `q` to 0.
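Written out as formulas, the definitions look roughly like this. The notation is my own shorthand (n teams, g ranging over games, x for the rating vector solved for later), not taken from the chapter.

```latex
\[
M_{ii} = \#\{\text{games played by team } i\}, \qquad
M_{ij} = -\,\#\{\text{games between teams } i \text{ and } j\} \quad (i \neq j),
\]
\[
q_i = \sum_{g \,\ni\, i} \bigl(\text{team } i\text{'s points in } g - \text{opponent's points in } g\bigr).
\]
% Shortcut: overwrite the last row (team n) so that the ratings x sum to zero.
\[
M_{nj} = 1 \ \text{for all } j, \qquad q_n = 0
\quad\Longrightarrow\quad \sum_{j=1}^{n} x_j = 0 .
\]
```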

In [3]:
# Takes a DataFrame with 2 team lists. Will get all unique
# team names, and give them a mapping.
def teams_dict(df):
    # Gets list of all unique team names
    temp = pd.concat([df['Team 1'], df['Team 2']], axis=0)
    unique_teams = pd.unique(temp)

    # Return our dict, with the names zipped to numbers
    return dict(zip(unique_teams, range(len(unique_teams))))


# Iterate over the rows of df. For each row, look up the indices i and j of
# the two teams, increment Mii and Mjj, decrement Mij and Mji, and update the
# totals in q. Then set the last row of M to 1s and the last entry of q to 0,
# and return both.
def make_massey(df):
    # Our dictionary of teams mapped to index.
    # This is a dictionary so I have O(1) access time for team names
    teams = teams_dict(df)
    num_teams = len(teams)

    M = np.zeros((num_teams, num_teams))
    q = np.zeros((num_teams, 1))

    for ind, x in df.iterrows():
        # Getting the index of team 1 and team 2
        i = teams[x['Team 1']]
        j = teams[x['Team 2']]

        # Updating the Mii and Mjj
        M[i, i] += 1
        M[j, j] += 1

        # Update Mij and Mji
        M[i, j] -= 1
        M[j, i] -= 1

        # Updating our scores
        q[i] += x['Score']

        # Notice this is -=. That is because the scores are in terms of team1
        # so team 2 would want += -1*score, or just subtract the score
        q[j] -= x['Score']
    
    # I need the last row of M to be all 1's, and the last entry of q to be 0
    M[-1, :] = 1
    q[-1] = 0

    return (M, q)
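`iterrows()` can be slow on a long list of games. As an alternative, here is a vectorized sketch of the same construction. The name `massey_system` and the toy `games` table are my own and purely illustrative; it assumes the same `'Team 1'`, `'Team 2'`, and `'Score'` columns as `df`.

```python
import numpy as np
import pandas as pd


def massey_system(games):
    """Build the modified Massey matrix M and vector q without a Python loop."""
    # Encode both team columns together so every team gets one integer index.
    # pd.factorize numbers the labels in order of first appearance.
    both = pd.concat([games["Team 1"], games["Team 2"]], ignore_index=True)
    codes, teams = pd.factorize(both)
    n_games = len(games)
    i, j = codes[:n_games], codes[n_games:]
    margin = games["Score"].to_numpy(dtype=float)

    n = len(teams)
    M = np.zeros((n, n))
    q = np.zeros(n)

    # np.add.at accumulates correctly even when an index appears many times.
    np.add.at(M, (i, i), 1)
    np.add.at(M, (j, j), 1)
    np.add.at(M, (i, j), -1)
    np.add.at(M, (j, i), -1)
    np.add.at(q, i, margin)
    np.add.at(q, j, -margin)

    # Shortcut step: last row of M becomes all 1s, last entry of q becomes 0.
    M[-1, :] = 1
    q[-1] = 0
    return M, q, list(teams)


# Toy data: A beat B by 3, B beat C by 7.
games = pd.DataFrame({
    "Team 1": ["A", "B"],
    "Team 2": ["B", "C"],
    "Score": [3, 7],
})
M, q, teams = massey_system(games)
```

Unlike `make_massey()`, this sketch returns `q` as a 1-D array and also hands back the team names in index order.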

Putting it All Together
---

Now that I have these helper functions, the process is simple. I call `make_massey()`, and then all I need to do is solve the linear system

> Mx = q

for `x`. This is done using `np.linalg.solve()`. The entries of `x` are the teams' relative ratings. I want a `DataFrame` that has the ordered rank, team name, and Massey rating for each team. I do this by making a temporary `DataFrame`, `temp`, of just the team names and their ratings (the `'Score Ranking'` column), then sorting it on that column to get the teams in order. Lastly, I add a column for the numerical rank and export the result as a CSV file.

In [4]:
# Find the massey M and q
M, q = make_massey(df)

# I can now just do a normal linear solve to get the rankings, x
x = np.linalg.solve(M, q)
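The last-row replacement is what makes this solve possible. In the unmodified Massey matrix every row sums to zero, so the matrix is singular and `np.linalg.solve()` could not return a unique answer. A small sketch on a hand-built three-team example (toy numbers of my own, not from `team_scores.csv`) illustrates this:

```python
import numpy as np

# Unmodified Massey system for three teams: A beat B by 3, B beat C by 7.
M_raw = np.array([[1.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 1.0]])
q_raw = np.array([3.0, 4.0, -7.0])

# Every row sums to zero, so the matrix does not have full rank.
rank_raw = np.linalg.matrix_rank(M_raw)

# Apply the shortcut: last row all 1s, last entry of q set to 0.
M_mod, q_mod = M_raw.copy(), q_raw.copy()
M_mod[-1, :] = 1
q_mod[-1] = 0

ratings = np.linalg.solve(M_mod, q_mod)
```

Here `rank_raw` is 2 rather than 3, and the solved `ratings` sum to zero, with A rated 3 points above B and B rated 7 points above C.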

In [5]:
# This gives me a way to get all of the team names
team_dict = teams_dict(df)

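# The temp df, will hold team names and their massey score.
# Dict keys keep insertion order, so they line up with the indices used in M.
temp = pd.DataFrame()
temp['Team Name'] = list(team_dict.keys())
# x has shape (num_teams, 1); flatten it to a 1-D column
temp['Score Ranking'] = x.ravel()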

# Now, sort this temp and make it my ranked df. I can then insert numerical ranks
ranked_df = temp.sort_values(by=['Score Ranking'], ascending=False)
ranked_df.insert(0, 'Rank', range(1, len(team_dict) + 1))

ranked_df.head()

| | Rank | Team Name | Score Ranking |
|---|---|---|---|
| 194 | 1 | OhioState | 60.327306 |
| 150 | 2 | LSU | 52.58465 |
| 209 | 3 | Clemson | 50.521256 |
| 201 | 4 | Alabama | 47.377479 |
| 33 | 5 | Wisconsin | 45.387632 |


In [6]:
# And export as a CSV as well
ranked_df.to_csv('rank.csv', index=False)