<a href="https://colab.research.google.com/github/mattscocchia/NHL-Player-Ratings/blob/main/Player_Ratings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introducing the [ScotiaStat](https://linktr.ee/scotiastat) Player Rating system.
These ratings are intended to be a catch-all metric that values a player's performance both defensively and offensively. They are far from perfect, but they are an attempt at taking free, publicly sourced data and using it to build a player performance evaluation metric. They are inspired by Net Rating, the "all-in-one player value stat" that Dom Luszczyszyn at The Athletic has made. I have tried to design this to be easy to reproduce and/or alter.

The player and team statistics are from [MoneyPuck](https://moneypuck.com/data.htm) and the Woodmoney data is from [PuckIQ](https://puckiq.com/woodmoney).


---


I will be adding more in-depth comments when I have the time. I will also add new code when I have changes or additions. Unless my commitments change, I will keep working on this behind the scenes. If you have any comments, suggestions, or feedback, or want to be a part of this somehow, please let me know. I am open to hearing how you think this can be improved. You can message me by whichever method you prefer: [X](https://x.com/ScotiaStat), [Instagram](https://www.instagram.com/scotiastathockey), or [Email](mailto:matthewscocchia@gmail.com).


---


If I have time in the future, I will try to add code for more open-source, hockey-related advanced statistics/ratings/models, but for now this is all I have to share. Also, please feel free to copy this and make it your own. It is meant to be used as a starting point.

Enjoy

If you want to use your own data from MoneyPuck, you can source it using the code below. For team stats, replace `file_type='skaters'` with `file_type='teams'`. You can also replace `gametype='regular'` with `gametype='playoffs'` for playoff stats.


```
!pip install nhldata

from nhldata import moneypuck

# Make connectors for the file endpoints
moneypuck = moneypuck.Connector()

# Pull the file(s) and return the raw data
moneypuck.season_stats(file_type='skaters',seasons=[2023,2024],gametype='regular')
```



### How The Ratings Work

The rating magnitudes are designed to be the total number of goals better than average a forward or defenceman is. This value accumulates over the games the player has played during the season, but it can also be measured as a per-game value. For example, in Connor McDavid's 2022-2023 season, he had an Overall Rating of 45.51 and played 82 games. This can be understood as McDavid being 45.51 goals better than average over the course of the 82 games he played. His per-game Overall Rating would be 45.51/82 = 0.555, which can be interpreted as McDavid being 0.555 goals per game better than the average forward. If we break that down into offence and defence, he is 0.529 goals per game better on offence and 0.026 goals per game better on defence. Put another way, the team is expected to score 0.529 more goals and allow 0.026 fewer goals per game with McDavid on the ice.

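As a quick check of the arithmetic above, here is a minimal sketch in Python. The helper name `per_game` is hypothetical and is not part of the notebook's code; the inputs are McDavid's 2022-2023 values from the Results table further down.

```python
def per_game(rating: float, games_played: int) -> float:
    """Convert a season-total rating (goals above average) to a per-game value."""
    return rating / games_played

# Connor McDavid, 2022-2023 (off_rating, def_rating, ovr_rating from the Results table).
games = 82
off_pg = per_game(43.34, games)
def_pg = per_game(2.17, games)
ovr_pg = per_game(45.51, games)

print(round(off_pg, 3), round(def_pg, 3), round(ovr_pg, 3))
# 0.529 0.026 0.555
```
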
### Limitations/Known Issues/Possible Issues



*   One limitation of the MoneyPuck dataset is that a player's team is only the team they finished the season on, even though their statistics cover the whole season. This affects players who move between good and bad teams midseason, i.e. the model can't distinguish how many goals against came from the first team versus the second.
*   Faceoffs are weighted too heavily in the defence rating system, but centers don't get enough recognition otherwise.
*   Overall Ratings are way too low for players who haven't played a lot of games. The threshold seems to be around 10 games.
*   Team/line results might be impacting team stats too heavily.





### Possible Improvements/Next to Test



*   On-the-fly shift starts (shift start %) seem to be an indicator of poor defensive play: better players start fewer shifts on the fly. This is just an observation at this point and hasn't been tested.



### Results

You can view the full list separated by season on [this spreadsheet](https://docs.google.com/spreadsheets/d/1ztc2N8VahNJjAKbbMSWRbSOxiCE4Euj_J_lyWuHMQhA/edit?usp=sharing).

I have tested it against the Hart and Norris Trophy winners. The players listed below matched the winner in 13 of the 20 decided cases, a 65% success rate overall, with better results in more recent seasons.

|index|name|season|ford|games\_played|team|off\_rating|def\_rating|off\_percentile|def\_percentile|ovr\_rating|ovr\_ptile|Won Hart or Norris?|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|2108|Erik Karlsson|2014|D|82|OTT|22\.58|1\.89|1\.0|0\.77|24\.48|1\.0|Yes|
|2576|John Tavares|2014|F|82|NYI|25\.71|0\.83|1\.0|0\.78|26\.54|1\.0|No|
|2109|Erik Karlsson|2015|D|82|OTT|25\.41|1\.97|1\.0|0\.85|27\.38|1\.0|No|
|1841|Patrick Kane|2015|F|82|CHI|27\.61|-0\.35|1\.0|0\.29|27\.26|1\.0|Yes|
|566|Brent Burns|2016|D|81|S\.J|23\.96|1\.95|1\.0|0\.81|25\.91|1\.0|Yes|
|6858|Connor McDavid|2016|F|81|EDM|28\.67|-0\.46|1\.0|0\.23|28\.21|1\.0|Yes|
|3693|John Klingberg|2017|D|81|DAL|19\.06|5\.6|0\.99|0\.99|24\.66|1\.0|No|
|6859|Connor McDavid|2017|F|82|EDM|31\.3|-0\.22|1\.0|0\.29|31\.08|1\.0|No|
|568|Brent Burns|2018|D|82|S\.J|24\.28|2\.22|1\.0|0\.86|26\.5|1\.0|No|
|4261|Nikita Kucherov|2018|F|82|T\.B|33\.36|-0\.52|1\.0|0\.19|32\.84|1\.0|Yes|
|2166|John Carlson|2019|D|69|WSH|22\.32|1\.63|1\.0|0\.82|23\.95|1\.0|No|
|6225|Leon Draisaitl|2019|F|71|EDM|28\.64|1\.16|1\.0|0\.85|29\.8|1\.0|Yes|
|7618|Adam Fox|2020|D|55|NYR|12\.1|5\.81|0\.99|0\.99|17\.91|1\.0|Yes|
|6862|Connor McDavid|2020|F|56|EDM|29\.07|0\.05|1\.0|0\.56|29\.12|1\.0|Yes|
|8462|Cale Makar|2021|D|77|COL|25\.24|5\.13|0\.99|0\.98|30\.37|1\.0|Yes|
|6863|Connor McDavid|2021|F|80|EDM|35\.78|1\.64|1\.0|0\.89|37\.42|1\.0|No|
|2116|Erik Karlsson|2022|D|82|SJS|29\.66|1\.74|1\.0|0\.76|31\.4|1\.0|Yes|
|6864|Connor McDavid|2022|F|82|EDM|43\.34|2\.17|1\.0|0\.89|45\.51|1\.0|Yes|
|8772|Quinn Hughes|2023|D|82|VAN|27\.09|2\.67|1\.0|0\.86|29\.76|1\.0|Yes|
|5820|Nathan MacKinnon|2023|F|82|COL|40\.0|-0\.32|1\.0|0\.27|39\.68|1\.0|Yes|
|8465|Cale Makar|2024|D|55|COL|18\.9|4\.6|1\.0|0\.98|23\.5|1\.0|TBD|
|6230|Leon Draisaitl|2024|F|54|EDM|21\.49|2\.49|1\.0|0\.95|23\.98|1\.0|TBD|

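The 65% figure can be reproduced by tallying the last column of the table. The sketch below hard-codes the outcomes in table order; leaving the "TBD" rows out of the denominator is an assumption about how the rate was computed.

```python
# Outcomes from the "Won Hart or Norris?" column, in table order.
outcomes = [
    "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes",
    "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes",
    "TBD", "TBD",
]

# Seasons without a decided award do not count for or against the model.
decided = [o for o in outcomes if o != "TBD"]
hits = decided.count("Yes")
success_rate = hits / len(decided)

print(f"{hits}/{len(decided)} = {success_rate:.0%}")
# 13/20 = 65%
```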

---

Similarly, I have tested it against the Selke Trophy winners with a 60% success rate.

|index|name|season|ford|games\_played|team|off\_rating|def\_rating|off\_percentile|def\_percentile|ovr\_rating|ovr\_ptile|Won Selke?|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|619|Patrice Bergeron|2014|F|81|BOS|15\.03|5\.73|0\.92|1\.0|20\.76|0\.99|Yes|
|579|Ryan Kesler|2015|F|79|ANA|13\.44|5\.81|0\.89|1\.0|19\.25|0\.97|No|
|621|Patrice Bergeron|2016|F|79|BOS|15\.56|7\.78|0\.93|1\.0|23\.34|0\.99|Yes|
|258|Mikko Koivu|2017|F|82|MIN|11\.02|6\.15|0\.81|1\.0|17\.17|0\.91|No|
|2534|Ryan O'Reilly|2018|F|82|STL|19\.77|6\.72|0\.95|1\.0|26\.49|0\.98|Yes|
|2535|Ryan O'Reilly|2019|F|71|STL|14\.11|5\.27|0\.93|1\.0|19\.38|0\.98|No|
|703|Joe Pavelski|2020|F|56|DAL|11\.85|6\.72|0\.96|1\.0|18\.58|0\.99|No|
|626|Patrice Bergeron|2021|F|73|BOS|16\.13|8\.35|0\.91|1\.0|24\.48|0\.97|Yes|
|627|Patrice Bergeron|2022|F|78|BOS|16\.9|9\.28|0\.88|1\.0|26\.17|0\.97|Yes|
|5831|Aleksander Barkov|2023|F|73|FLA|20\.95|6\.97|0\.96|1\.0|27\.92|0\.98|Yes|
|5017|Jordan Martinook|2024|F|54|CAR|3\.75|6\.01|0\.69|1\.0|9\.76|0\.87|TBD|

### Possible FAQs

This will get updated as more questions are asked.

How do you know that these are the right weightings?
* I don't. They are based loosely on the model that Dom Luszczyszyn at The Athletic has made, but they include some different stats in addition to some different weights. Measuring results against winners of major individual NHL awards seemed like a logical starting point.

These ratings suck.
* Tell me how to fix them then

# Code Section

## Initialize

In [None]:
#!pip install requests

import json
import requests
import numpy as np
import pandas as pd
from scipy.stats import rankdata

## Load the Woodmoney data

The Woodmoney data provides the minutes players play against each type of competition (Elite, Middle, Gritensity).

In [None]:

seasons=["20142015","20152016","20162017","20172018","20182019","20192020","20202021","20212022","20222023","20232024","20242025"]
teams=["ANA","ARI","BOS","BUF","CGY","CAR","CHI","COL","CBJ","DAL","DET","EDM","FLA","LAK","MIN","MTL","NSH","NJD","NYI","NYR","OTT","PHI","PIT","SJS","SEA","STL","TBL","TOR","UTA","VAN","VGK","WSH","WPG"]

def get_woodmoney_data(payload):
    url = "https://api.puckiq.com/woodmoney"  # Replace with the correct base URL
    try:
        response = requests.post(url, json=payload)
        response.raise_for_status()  # Raise an exception for HTTP errors
        try:
            # Force JSON parsing even if Content-Type isn't set properly
            data = response.json()
        except ValueError:
            # If Content-Type is not application/json, handle raw response text
            data = json.loads(response.text)
        return data
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")
        return None

# Download the Woodmoney data for every season and team combination
combined_df2 = pd.DataFrame()
if __name__ == "__main__":
  for season in seasons:
    for team in teams:
        payload = {
            "season": season,
            "team": team
        }
        data = get_woodmoney_data(payload)

        if data:
            # Extract the results
            results = data.get("results", [])

            # Create a DataFrame from the results
            df = pd.DataFrame(results)

            # Append this team's rows to the combined DataFrame
            combined_df2 = pd.concat([combined_df2, df], ignore_index=False)

# Clean the combined data and sum each player's rows per season and tier
woodmoney = combined_df2.copy()
woodmoney.rename(columns={'player_id': 'playerId'}, inplace=True)
woodmoney['season'] = woodmoney['season'].astype(str).str[:4]
woodmoney = woodmoney.groupby(['playerId', 'season', 'name', 'woodmoneytier']).sum().reset_index()

pivoted_woodmoney = woodmoney.pivot(index=['playerId', 'season', 'name'], columns='woodmoneytier', values=['ctoipct','evtoi'])

pivoted_woodmoney.columns = [
    f"ctoipct_{tier}" if col == "ctoipct" else tier
    for col, tier in pivoted_woodmoney.columns
]
pivoted_woodmoney.reset_index(inplace=True)

pivoted_woodmoney['ctoipct_All'] = 100
pivoted_woodmoney['ctoipct_Elite'] = pivoted_woodmoney['Elite']/pivoted_woodmoney['All']
pivoted_woodmoney['ctoipct_Middle'] = pivoted_woodmoney['Middle']/pivoted_woodmoney['All']
pivoted_woodmoney['ctoipct_Gritensity'] = pivoted_woodmoney['Gritensity']/pivoted_woodmoney['All']


# Define weight mapping for each woodmoneytier
tier_weights = {'Elite': 0.9, 'Middle': 0.5, 'Gritensity': 0.2, 'All': 1}

# 1. Apply weights to each column directly
pivoted_woodmoney['Elite_weighted'] = pivoted_woodmoney['Elite'] * tier_weights['Elite']
pivoted_woodmoney['Middle_weighted'] = pivoted_woodmoney['Middle'] * tier_weights['Middle']
pivoted_woodmoney['Gritensity_weighted'] = pivoted_woodmoney['Gritensity'] * tier_weights['Gritensity']
# For 'All', keep it unweighted (raw TOI)
pivoted_woodmoney['All_weighted'] = pivoted_woodmoney['All']

# 2. Calculate total weighted TOI across tiers (excluding 'All')
pivoted_woodmoney['total_weighted_toi'] = (
    pivoted_woodmoney['Elite_weighted'] +
    pivoted_woodmoney['Middle_weighted'] +
    pivoted_woodmoney['Gritensity_weighted']
)

# 3. Calculate the final opponent strength: the weighted TOI divided by 1.6, the sum of the three tier weights (0.9 + 0.5 + 0.2)
pivoted_woodmoney['minute_strength'] = (
    pivoted_woodmoney['total_weighted_toi'] /
    1.6
)

# 4. Keep only relevant columns for the summary (copy so the assignment below does not write to a slice)
opponent_strength_summary = pivoted_woodmoney[['playerId', 'name', 'season', 'minute_strength']].copy()

# Season was stored as a 4-character string above; convert it to int for the merge with the player data.
opponent_strength_summary['season'] = opponent_strength_summary['season'].astype(int)

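To make the weighting concrete, here is a small sketch of the same calculation on made-up numbers. The two players and their minutes are invented for illustration; only the tier weights and the divisor of 1.6 come from the cell above.

```python
import pandas as pd

# Hypothetical even-strength minutes against each competition tier.
toy = pd.DataFrame(
    {"Elite": [400.0, 100.0], "Middle": [500.0, 300.0], "Gritensity": [100.0, 600.0]},
    index=["top_line_player", "sheltered_player"],
)

tier_weights = {"Elite": 0.9, "Middle": 0.5, "Gritensity": 0.2}

# Weight each tier's minutes, add them up, and divide by 1.6 as in the notebook.
weighted_toi = sum(toy[tier] * weight for tier, weight in tier_weights.items())
toy["minute_strength"] = weighted_toi / 1.6

print(toy["minute_strength"])
```

Both toy players log 1,000 minutes, but the one who spends more of them against Elite competition ends up with the higher `minute_strength` (about 394 versus 225).
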
## Load the player and team data

In [None]:
# Load the data.
df = pd.read_csv('all_players.csv')

# Make all team names uniform. Only the 'team' column is changed; the rest of each row is left intact.
team_name_map = {"L.A": "LAK", "N.J": "NJD", "S.J": "SJS", "T.B": "TBL"}
df['team'] = df['team'].replace(team_name_map)
# Optional to change Arizona to Utah
#df['team'] = df['team'].replace({"ARI": "UTA"})

# Implement a cut year as Woodmoney data only goes back to 2014.
cut_year = 2014
data = df[df['season'] >= cut_year].copy()

# Drop rows with missing values.
data = data.dropna()

# Establish a required games played cutoff if desired.
games_played_req = 1

# Merge the Woodmoney dataset with the player dataset, while also filtering the player dataset to only include 5on5 stats and the required games played.
new_data = pd.merge(data[(data['situation'] == '5on5') & (data['games_played'] > games_played_req)], opponent_strength_summary[['playerId', 'season', 'minute_strength']], on=['playerId','season'], how='left')

# Load the team statistics and merge it to the player dataset.
teams = pd.read_csv("all_teams.csv")

# Make all team names uniform. Only the 'team' column is changed; the rest of each row is left intact.
team_name_map = {"L.A": "LAK", "N.J": "NJD", "S.J": "SJS", "T.B": "TBL"}
teams['team'] = teams['team'].replace(team_name_map)
# Optional to change Arizona to Utah
#teams['team'] = teams['team'].replace({"ARI": "UTA"})

new_data['goalsAgainst'] = pd.merge(new_data, teams[teams['situation'] == '5on5'][['team','goalsAgainst','season']], on=['team','season'], how='left')['goalsAgainst']
new_data['goalsFor'] = pd.merge(new_data, teams[teams['situation'] == '5on5'][['team','goalsFor','season']], on=['team','season'], how='left')['goalsFor']
new_data['xGoalsAgainst'] = pd.merge(new_data, teams[teams['situation'] == '5on5'][['team','xGoalsAgainst','season']], on=['team','season'], how='left')['xGoalsAgainst']
new_data['xGoalsFor'] = pd.merge(new_data, teams[teams['situation'] == '5on5'][['team','xGoalsFor','season']], on=['team','season'], how='left')['xGoalsFor']
new_data['team_icetime'] = pd.merge(new_data, teams[teams['situation'] == '5on5'][['team','iceTime','season']], on=['team','season'], how='left')['iceTime']
# The penalty kill is the team's 4on5 situation and the power play is its 5on4 situation.
new_data['pk_team_icetime'] = pd.merge(new_data, teams[teams['situation'] == '4on5'][['team','iceTime','season']], on=['team','season'], how='left')['iceTime']
new_data['pp_team_icetime'] = pd.merge(new_data, teams[teams['situation'] == '5on4'][['team','iceTime','season']], on=['team','season'], how='left')['iceTime']
new_data['pp_goalsAgainst'] = pd.merge(new_data, teams[teams['situation'] == '5on4'][['team','goalsAgainst','season']], on=['team','season'], how='left')['goalsAgainst_y']
new_data['pp_goalsFor'] = pd.merge(new_data, teams[teams['situation'] == '5on4'][['team','goalsFor','season']], on=['team','season'], how='left')['goalsFor_y']
new_data['pp_xgoalsAgainst'] = pd.merge(new_data, teams[teams['situation'] == '5on4'][['team','xGoalsAgainst','season']], on=['team','season'], how='left')['xGoalsAgainst_y']
new_data['pp_xgoalsFor'] = pd.merge(new_data, teams[teams['situation'] == '5on4'][['team','xGoalsFor','season']], on=['team','season'], how='left')['xGoalsFor_y']
new_data['pk_xgoalsAgainst'] = pd.merge(new_data, teams[teams['situation'] == '4on5'][['team','xGoalsAgainst','season']], on=['team','season'], how='left')['xGoalsAgainst_y']
new_data['pk_xgoalsFor'] = pd.merge(new_data, teams[teams['situation'] == '4on5'][['team','xGoalsFor','season']], on=['team','season'], how='left')['xGoalsFor_y']
new_data['pk_goalsAgainst'] = pd.merge(new_data, teams[teams['situation'] == '4on5'][['team','goalsAgainst','season']], on=['team','season'], how='left')['goalsAgainst_y']
new_data['pk_goalsFor'] = pd.merge(new_data, teams[teams['situation'] == '4on5'][['team','goalsFor','season']], on=['team','season'], how='left')['goalsFor_y']
new_data['team_games'] = pd.merge(new_data, teams[teams['situation'] == '4on5'][['team','games_played','season']], on=['team','season'], how='left')['games_played_y']
new_data['xGoalsp'] = pd.merge(new_data, teams[teams['situation'] == '5on5'][['team','xGoalsPercentage','season']], on=['team','season'], how='left')['xGoalsPercentage']

# Load the players' statistics from other situations for later use.
# Each pivot has one row per player-season and one column per situation ('all', '5on5', '5on4', '4on5', ...).
eligible = data[data['games_played'] > games_played_req]

def pivot_by_situation(stat):
    return eligible.pivot(index=['playerId', 'season', 'name'], columns='situation', values=[stat]).reset_index()[stat]

icetime = pivot_by_situation('icetime')
timeOnBench = pivot_by_situation('timeOnBench')
oigf = pivot_by_situation('OnIce_F_goals')
oiga = pivot_by_situation('OnIce_A_goals')
oixgf = pivot_by_situation('OnIce_F_xGoals')
oixga = pivot_by_situation('OnIce_A_xGoals')
penaltiesT = pivot_by_situation('penalties')
penaltiesD = pivot_by_situation('penaltiesDrawn')
takeaways = pivot_by_situation('I_F_takeaways')
giveaways = pivot_by_situation('I_F_giveaways')
points = pivot_by_situation('I_F_points')
I_F_primaryAssists = pivot_by_situation('I_F_primaryAssists')
I_F_secondaryAssists = pivot_by_situation('I_F_secondaryAssists')
I_F_goals = pivot_by_situation('I_F_goals')
I_F_xGoals = pivot_by_situation('I_F_xGoals')
shotsBlockedByPlayer = pivot_by_situation('shotsBlockedByPlayer')

# Add the below statistics to the player data.
new_data['pk_time'] = icetime['4on5']
new_data['pk_time_bench'] = timeOnBench['4on5']
new_data['pp_time'] = icetime['5on4']
new_data['pk_gf'] = oigf['4on5']
new_data['pp_gf'] = oigf['5on4']
new_data['pk_ga'] = oiga['4on5']
new_data['pp_ga'] = oiga['5on4']
new_data['pk_xgf'] = oixgf['4on5']
new_data['pp_xgf'] = oixgf['5on4']
new_data['pk_xga'] = oixga['4on5']
new_data['pp_xga'] = oixga['5on4']

# Overwrite the player statistics below using the totals from all situations.
new_data['penalties'] = penaltiesT['all']
new_data['penaltiesDrawn'] = penaltiesD['all']
new_data['I_F_takeaways'] = takeaways['all']
new_data['I_F_giveaways'] = giveaways['all']
new_data['pp_points'] = points['5on4']
new_data['I_F_primaryAssists'] = I_F_primaryAssists['all']
new_data['I_F_secondaryAssists'] = I_F_secondaryAssists['all']
new_data['I_F_goals'] = I_F_goals['all']
new_data['I_F_xGoals'] = I_F_xGoals['all']
new_data['shotsBlockedByPlayer'] = shotsBlockedByPlayer['all']

# Make a temporary dataframe to separate forwards and defencemen.
combined_df = new_data.copy()

# Step 1: Assign values to 'ford' based on 'position'
def determine_ford(position):
    if "D" in position:
        return "D"
    else:
        return "F"

combined_df["ford"] = combined_df["position"].apply(determine_ford)

combined_df['minute_strength_percentile'] = (
    combined_df.groupby(["situation", "season", "ford"])['minute_strength']
    .rank(pct=True)
)

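# Copy the position group and the percentiles back into the main dataframe.
# 'ford' is needed on new_data because the rating cells below select on new_data['ford'].
new_data['ford'] = combined_df['ford'].values
new_data['minute_strength_percentile'] = combined_df['minute_strength_percentile'].values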

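The grouped percentile used above can be seen on a toy table. This sketch uses invented `minute_strength` values; `groupby(...).rank(pct=True)` is the same pandas call as in the cell above and returns each row's rank divided by the size of its group.

```python
import pandas as pd

# Five hypothetical players in one season: three forwards and two defencemen.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "season": [2023, 2023, 2023, 2023, 2023],
    "ford": ["F", "F", "F", "D", "D"],
    "minute_strength": [300.0, 450.0, 150.0, 500.0, 200.0],
})

# Rank each player only against players of the same season and position group.
toy["minute_strength_percentile"] = (
    toy.groupby(["season", "ford"])["minute_strength"].rank(pct=True)
)

print(toy)
```

Forwards are ranked only against forwards and defencemen only against defencemen, so the top player in each group gets 1.0 regardless of how the two groups compare with each other.
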
## Evaluation

In [None]:
### DEFENSE CALCULATION

# Component 1 is blocked shots.
new_data['Component_1'] = 0.02*new_data['shotsBlockedByPlayer']

# Component 2 is faceoffs.
new_data['Component_2'] = 0.01 * (new_data['faceoffsWon'] - new_data['faceoffsLost'])

# Component 3 is penalties taken.
new_data['Component_3'] = 0.06 * ( - new_data['penalties'])

# Component 4 is xGoals against compared to the season average.
xga_d = -2*((((new_data['OnIce_A_xGoals'] / new_data['icetime']) -
     (new_data.groupby("season")["xGoalsAgainst"].transform("mean") /
      new_data.groupby("season")["team_icetime"].transform("mean"))))*60*60)
new_data['Component_4'] = np.where(
    new_data['ford'] == 'F',
    1.75 * xga_d,
    2.3 * xga_d
)

# Component 5 is goals against compared to the season average.
ga_d = -2*((((new_data['OnIce_A_goals'] / new_data['icetime']) -
     (new_data.groupby("season")["goalsAgainst"].transform("mean") /
      new_data.groupby("season")["team_icetime"].transform("mean"))))*60*60)
new_data['Component_5'] = np.where(
    new_data['ford'] == 'F',
    0.4375 * ga_d,
    0.575 * ga_d
)

# 5on5 d rating is the measure of goals against and xGoals against both compared to the season average.
base_rating = 2*(new_data['Component_4'] + new_data['Component_5'])
new_data['5on5_drating'] = np.where(
    base_rating < 0,
    base_rating * (1 - new_data['minute_strength_percentile']),
    base_rating * new_data['minute_strength_percentile']
)

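# pk rating is the sum of pk goals against and pk xGoals against, both compared to the season average.
pkt = new_data['pk_time'] / 60 / 60
# Season-average penalty-kill goals against and xGoals against per second of team pk ice time.
pk_avg_ga_rate = new_data.groupby("season")["pk_goalsAgainst"].transform("mean") / new_data.groupby("season")["pk_team_icetime"].transform("mean")
pk_avg_xga_rate = new_data.groupby("season")["pk_xgoalsAgainst"].transform("mean") / new_data.groupby("season")["pk_team_icetime"].transform("mean")
# Per-60 difference from average; negated because allowing fewer goals than average is good.
pk_goals = -(
    0.1 * ((new_data['pk_ga'] / new_data['pk_time'] - pk_avg_ga_rate) * 60 * 60).fillna(0)
    + 0.1 * ((new_data['pk_xga'] / new_data['pk_time'] - pk_avg_xga_rate) * 60 * 60).fillna(0)
)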
new_data['pk_rating'] = np.where(
    pkt > 0.3,
    pk_goals*pkt,
    0
)

# Component 6 is takeaways.
new_data['Component_6'] = 0.01 * new_data['I_F_takeaways']


# Turn the 5on5 d rating into a percentile.
new_data["5on5_drating_pct"] = (
    new_data.groupby(["season"])["5on5_drating"]
    .rank(pct=True)
)

# Calculate the defence rating stat.
def_rating = (
    new_data['Component_1'] +
    new_data['Component_2'] +
    new_data['Component_3'] +
    new_data['5on5_drating'] +
    new_data['pk_rating'] +
    new_data['Component_6']
) * (new_data['games_played']/new_data['team_games'])

# Adjust defence rating by position to make the average be 0 for each.
new_data['def_rating'] = np.where(
    new_data['ford'] == 'F',
    def_rating-0.05,
    def_rating-0.35
)

# Turn the defence rating into a percentile.
new_data["def_percentile"] = (
    new_data.groupby(["season","ford"])["def_rating"]
    .rank(pct=True)
)

# Display the defence rating by its components.
components_table = new_data[['name','season', 'games_played', 'team', 'ford','minute_strength_percentile', 'Component_1', 'Component_2', 'Component_3', 'Component_4',
                             'Component_5', 'Component_6','pk_rating','5on5_drating', '5on5_drating_pct', 'def_rating', 'def_percentile']]

# Display the components table
components_table

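The opponent-strength adjustment applied to the 5on5 rating is easier to see on a few hand-picked values. This is a minimal sketch with invented inputs; `scale_by_competition` is a hypothetical helper name, and its logic mirrors the `np.where` call above: positive ratings are multiplied by the minute-strength percentile, negative ratings by one minus the percentile.

```python
import numpy as np

def scale_by_competition(base_rating, pct):
    """Scale a rating by the minute-strength percentile (0 to 1)."""
    base_rating = np.asarray(base_rating, dtype=float)
    pct = np.asarray(pct, dtype=float)
    return np.where(base_rating < 0, base_rating * (1 - pct), base_rating * pct)

# Same raw ratings, faced against tough (0.9) and soft (0.2) competition.
base = [4.0, 4.0, -4.0, -4.0]
pct = [0.9, 0.2, 0.9, 0.2]

print(scale_by_competition(base, pct))
```

The result is roughly `[3.6, 0.8, -0.4, -3.2]`: a good result against tough competition keeps most of its value, while a poor result against tough competition is mostly forgiven.
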
In [None]:
### OFFENSE CALCULATION

# Component 1 is individual goals scored.
new_data['Component_1'] = np.where(
    new_data['ford'] == 'F',
    0.3 * new_data['I_F_goals'],
    0.4 * new_data['I_F_goals']
)

# Component 2 is primary assists.
new_data['Component_2'] = 0.4 * new_data['I_F_primaryAssists']
# Component 3 is secondary assists.
new_data['Component_3'] = 0.25 * new_data['I_F_secondaryAssists']

# Component 6 is individual expected goals for.
new_data['Component_6'] = np.where(
    new_data['ford'] == 'F',
    0.25 * new_data['I_F_xGoals'],
    0.3 * new_data['I_F_xGoals']
)

# Component 7 is penalties drawn.
new_data['Component_7'] = 0.06 * (new_data['penaltiesDrawn'])

# Component 8 is giveaways.
new_data['Component_8'] = -0.01 * new_data['I_F_giveaways']

# Component 4 is 5on5 xGoals for compared to the season average.
xgf_o = 2*((((new_data['OnIce_F_xGoals'] / new_data['icetime']) -
     (new_data.groupby("season")["xGoalsFor"].transform("mean") /
      new_data.groupby("season")["team_icetime"].transform("mean"))))*60*60)
new_data['Component_4'] = np.where(
    new_data['ford'] == 'F',
    0.625 * xgf_o,
    1.7 * xgf_o
)

# Component 5 is 5on5 goals for compared to the season average.
gf_o = 2*((((new_data['OnIce_F_goals'] / new_data['icetime']) -
     (new_data.groupby("season")["goalsFor"].transform("mean") /
      new_data.groupby("season")["team_icetime"].transform("mean"))))*60*60)
new_data['Component_5'] = np.where(
    new_data['ford'] == 'F',
    0.625 * gf_o,
    0.425 * gf_o
)

# 5on5 o rating is the measure of 5on5 goals for and 5on5 xGoals for both compared to the season average.
base_rating = (new_data['Component_4'] + new_data['Component_5'])
new_data['5on5_orating'] = np.where(
    base_rating < 0,
    base_rating * (1 - new_data['minute_strength_percentile']),
    base_rating * new_data['minute_strength_percentile']
)

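# pp rating is the sum of pp goals for and pp xGoals for, both compared to the season average.
# Scoring more than average on the power play is a positive contribution, so neither term is negated.
ppt = new_data['pp_time'] / 60 / 60
# Season-average power-play goals for and xGoals for per second of team pp ice time.
pp_avg_gf_rate = new_data.groupby("season")["pp_goalsFor"].transform("mean") / new_data.groupby("season")["pp_team_icetime"].transform("mean")
pp_avg_xgf_rate = new_data.groupby("season")["pp_xgoalsFor"].transform("mean") / new_data.groupby("season")["pp_team_icetime"].transform("mean")
pp_goals = (
    0.1 * ((new_data['pp_gf'] / new_data['pp_time'] - pp_avg_gf_rate) * 60 * 60).fillna(0)
    + 0.1 * ((new_data['pp_xgf'] / new_data['pp_time'] - pp_avg_xgf_rate) * 60 * 60).fillna(0)
)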
new_data['pp_rating'] = np.where(
    ppt > 0.3,
    pp_goals*ppt,
    0
)

# Turn the 5on5 o rating into a percentile.
new_data["5on5_orating_pct"] = (
    new_data.groupby(["season"])["5on5_orating"]
    .rank(pct=True)
)

# Calculate the offence rating stat.
off_rating = (
    new_data['Component_1'] +
    new_data['Component_2'] +
    new_data['Component_3'] +
    new_data['Component_8'] +
    new_data['5on5_orating'] +
    new_data['pp_rating'] +
    new_data['Component_6'] +
    new_data['Component_7']
) * (new_data['games_played']/new_data['team_games']) /1.5

# Adjust offence rating by position to make the average be 0 for each.
new_data['off_rating'] = np.where(
    new_data['ford'] == 'F',
    off_rating-4.4,
    off_rating-1.85
)

# Turn the offence rating into a percentile.
new_data["off_percentile"] = (
    new_data.groupby(["season","ford"])["off_rating"]
    .rank(pct=True)
)

# Display the offence rating by its components.
components_table = new_data[['name','season', 'games_played', 'team', 'ford','minute_strength_percentile', 'Component_1', 'Component_2', 'Component_3', 'Component_4',
                             'Component_5', 'Component_6', 'Component_7','Component_8','pp_rating','5on5_orating', '5on5_orating_pct', 'off_rating', 'off_percentile']]

# Display the components table
components_table

In [None]:
### OVERALL CALCULATION

# Calculate the overall rating.
new_data['ovr_rating'] = new_data['off_rating'] + new_data['def_rating']

# Turn the overall rating into a percentile.
new_data["ovr_percentile"] = (
    new_data.groupby(["season","ford"])["ovr_rating"]
    .rank(pct=True)
)

# Make a new table with the overall rating by its components.
components_table = new_data[['name','season','team', 'ford', 'games_played', 'off_rating', 'def_rating', 'off_percentile', 'def_percentile', 'ovr_rating', 'ovr_percentile']]

# Save the components table as a csv file.
components_table.to_csv('player_ratings.csv', index=False)

In [None]:
### PRORATED OVERALL CALCULATION

# Calculate the overall rating.
new_data['ovr_rating'] = new_data['off_rating'] + new_data['def_rating']

# Turn the overall rating into a percentile.
new_data["ovr_percentile"] = (
    new_data.groupby(["season","ford"])["ovr_rating"]
    .rank(pct=True)
)

# Calculate the overall rating on a per game basis.
new_data['ovr_pg'] = new_data['ovr_rating']/new_data['games_played']

# Calculate the overall rating prorated to 82 games.
new_data['ovr_rating_pr'] = new_data['ovr_pg']*82

# Calculate the defensive rating on a per game basis.
new_data['def_pg'] = new_data['def_rating']/new_data['games_played']

# Calculate the defensive rating prorated to 82 games.
new_data['def_rating_pr'] = new_data['def_pg']*82

# Calculate the offensive rating on a per game basis.
new_data['off_pg'] = new_data['off_rating']/new_data['games_played']

# Calculate the offensive rating prorated to 82 games.
new_data['off_rating_pr'] = new_data['off_pg']*82

# Turn the prorated overall rating into a percentile.
new_data["ovr_percentile_pr"] = (
    new_data.groupby(["season","ford"])["ovr_rating_pr"]
    .rank(pct=True)
)

# Make a new table with the overall rating by its components.
components_table = new_data[['name','season', 'ford', 'games_played', 'team', 'off_rating_pr', 'def_rating_pr', 'off_pg', 'def_pg', 'ovr_rating', 'ovr_percentile','ovr_pg', 'ovr_rating_pr', 'ovr_percentile_pr']]

# Display the prorated ratings.
components_table

In [None]:
# Average on-ice goals for per game by position per season prorated to 82 games
#new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['totalGoalsForpr'].mean()/new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['games_played'].mean()*82

# Average on-ice goals for per game by position per season
new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['totalGoalsForpr'].mean()/new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['games_played'].mean()

In [None]:
# Average on-ice goals against per game by position per season prorated to 82 games
#new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['totalGoalsAgainstpr'].mean()/new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['games_played'].mean()*82

# Average on-ice goals against per game by position per season
new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['totalGoalsAgainstpr'].mean()/new_data[new_data['games_played'] > 0].groupby(["season",'ford'])['games_played'].mean()