# Predicting Hockey Contracts - The Complete Guide

by Luke Kerwin, Eric Wu, Brian Ellis, and Griffin Jordan

In [1]:
import requests
import json
import pandas as pd
import numpy as np
from datetime import datetime
from bs4 import BeautifulSoup

# Step 1: Getting the Data

Our goal is to predict NHL (hockey) player contracts for the 2022-2023 season, which we chose because it is the most recent completed season. To do this, we needed three kinds of data: contract details, player information (age, height, weight, and so on), and performance statistics (goals, assists, points, and so on). We gathered this data from the following sources:

- [CapFriendly](https://www.capfriendly.com/) - a website that tracks NHL contracts
- [Hockey Reference](https://www.hockey-reference.com/) - a website that tracks NHL player performance statistics
- [NHL.com](https://www.nhl.com/) - the official website of the NHL, which tracks player information

Below is the code we used to scrape these websites. We used the [requests](https://docs.python-requests.org/en/master/) library to make the HTTP requests and the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library to scrape the pages. Where a page exposes its data as an HTML table, as in the cells below, we read it directly with `pd.read_html`.

### CapFriendly

In [18]:
import calendar

# Signing years 2012 through 2022 (range() excludes its end value)
years = range(2012, 2023)

# Request the signings table one calendar month at a time
contracts = []
for year in years:
    for month in range(1, 13):
        print(f'Getting contract data for {year}-{month}...', end='\r')
        # End each window on the last day of the same month
        last_day = calendar.monthrange(year, month)[1]
        # Assumption: the date filter takes zero-padded MMDDYYYY-MMDDYYYY
        url = ('https://www.capfriendly.com/signings/all/all/all/1-15/0-15000000/'
               f'{month:02d}01{year}-{month:02d}{last_day}{year}')
        r = requests.get(url)
        # The first table on the page holds the signings
        table = pd.read_html(r.text)[0]
        # Add the data to our list
        contracts.append(table)

# Combine the monthly tables and save them for later steps
contracts = pd.concat(contracts)
contracts.to_csv('data/contracts.csv', index=False)
print('\nDone!')
contracts.head()

Getting contract data for 2022-12...
Done!


| | PLAYER | PLAYER.1 | AGE | POS | TEAM | DATE | TYPE | EXTENSION | STRUCTURE | LENGTH | VALUE | CAP HIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Patrick Kane | Patrick Kane | 34 | RW | DET | Nov. 28, 2023 | Stnd (UFA) | | 1-way | 1 | $2,750,000 | $2,750,000 |
| 1 | Justin Bailey | Justin Bailey | 27 | RW | SJS | Nov. 27, 2023 | Stnd (UFA) | | 2-way | 1 | $775,000 | $775,000 |
| 2 | Ben Hemmerling | Ben Hemmerling | 19 | RW | VGK | Nov. 26, 2023 | ELC | | 2-way | 3 | $2,532,500 | $844,167 |
| 3 | Samuel Laberge | Samuel Laberge | 26 | C | NJD | Nov. 25, 2023 | Stnd (UFA) | | 2-way | 1 | $775,000 | $775,000 |
| 4 | Nils Åman | Nils Åman | 24 | C | VAN | Nov. 24, 2023 | Stnd (RFA) | | 1-way | 2 | $1,650,000 | $825,000 |
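
The VALUE and CAP HIT columns in the output above are strings with dollar signs and thousands separators, so they need to be converted to numbers before they can be modeled. Here is a minimal sketch using three rows copied from that output. `parse_dollars` and the `AAV` column are hypothetical names chosen for illustration, not something taken from the notebook.

```python
import pandas as pd

def parse_dollars(series: pd.Series) -> pd.Series:
    """Convert strings such as '$2,750,000' to integers."""
    return series.str.replace(r'[$,]', '', regex=True).astype('int64')

# Three rows copied from the contracts output above
sample = pd.DataFrame({
    'PLAYER': ['Patrick Kane', 'Ben Hemmerling', 'Nils Åman'],
    'LENGTH': [1, 3, 2],
    'VALUE': ['$2,750,000', '$2,532,500', '$1,650,000'],
    'CAP HIT': ['$2,750,000', '$844,167', '$825,000'],
})

for col in ['VALUE', 'CAP HIT']:
    sample[col] = parse_dollars(sample[col])

# Average annual value: total value divided by contract length, in whole dollars
sample['AAV'] = (sample['VALUE'] / sample['LENGTH']).round().astype('int64')
print(sample)
```

In these three rows the computed average annual value matches the listed cap hit, which is a quick sanity check that the conversion worked.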


### Hockey Reference

We use the three seasons of statistics before a player is awarded a contract to predict that contract. Because the contract data starts in 2012, the statistics start in 2009.
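
To make the three-season window concrete, here is a minimal sketch. `seasons_before` is a hypothetical helper, and it assumes that a contract signed in year Y is matched to the seasons labeled Y-3 through Y-1. The notebook may define the window differently.

```python
from typing import List

def seasons_before(signing_year: int, n: int = 3) -> List[int]:
    """Return the n season labels immediately preceding signing_year."""
    return list(range(signing_year - n, signing_year))

# Under this assumption, a 2012 contract needs statistics going back to 2009
print(seasons_before(2012))  # [2009, 2010, 2011]
print(seasons_before(2022))  # [2019, 2020, 2021]
```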

In [26]:
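# Seasons 2009 through 2022 (range() excludes its end value)
stats_seasons = range(2009, 2023)

# Download the skater statistics table for each season
stats = []
for season in stats_seasons:
    print(f'Getting stats for {season}...', end='\r')
    req = requests.get(f'https://www.hockey-reference.com/leagues/NHL_{season}_skaters.html')
    # The first table on the page holds the skater statistics
    table = pd.read_html(req.text)[0]
    # Record which season each row belongs to
    table['season'] = season
    stats.append(table)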

Getting stats for 2022...
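
The loop above sends one request per season with no pause in between. Some sites throttle or block rapid automated requests, so a short delay between calls may make a scraping run more reliable. The sketch below shows one way to add that delay. `fetch_all`, its parameters, and the three-second pause are illustrative assumptions, not part of the notebook and not a documented requirement of either site.

```python
import time
from typing import Callable, Iterable, List

def fetch_all(urls: Iterable[str],
              fetch: Callable[[str], str],
              delay_seconds: float = 3.0,
              sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request except the first
        pages.append(fetch(url))
    return pages

# Stand-in fetcher and sleeper so the example runs offline and instantly.
# In the notebook, fetch could be: lambda url: requests.get(url).text
season_urls = [f'https://www.hockey-reference.com/leagues/NHL_{s}_skaters.html'
               for s in range(2009, 2012)]
pauses = []
pages = fetch_all(season_urls,
                  fetch=lambda url: f'<html>{url}</html>',
                  delay_seconds=3.0,
                  sleep=pauses.append)
print(len(pages), pauses)
```

Passing the fetch and sleep functions as arguments keeps the pacing logic separate from the network call, so it can be checked without touching the real site.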

In [29]:
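stats_ = pd.concat(stats)
# Remove the top level of the column multi-index
stats_.columns = stats_.columns.droplevel()
# Dropping the top level leaves the season column with an empty name, so restore it
stats_ = stats_.rename(columns={'': 'season'})
stats_

| | Rk | Player | Age | Tm | Pos | GP | G | A | PTS | +/- | ... | S | S% | TOI | ATOI | BLK | HIT | FOW | FOL | FO% | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Justin Abdelkader | 21 | DET | LW | 2 | 0 | 0 | 0 | 0 | ... | 2 | 0.0 | 19 | 9:18 | 0 | 3 | 4 | 3 | 57.1 | 2009 |
| 1 | 2 | Craig Adams | 31 | TOT | RW | 45 | 2 | 5 | 7 | -3 | ... | 47 | 4.3 | 391 | 8:41 | 20 | 67 | 8 | 13 | 38.1 | 2009 |
| 2 | 2 | Craig Adams | 31 | CHI | RW | 36 | 2 | 4 | 6 | -3 | ... | 38 | 5.3 | 314 | 8:43 | 16 | 53 | 6 | 10 | 37.5 | 2009 |
| 3 | 2 | Craig Adams | 31 | PIT | RW | 9 | 0 | 1 | 1 | 0 | ... | 9 | 0.0 | 77 | 8:34 | 4 | 14 | 2 | 3 | 40.0 | 2009 |
| 4 | 3 | Maxim Afinogenov | 29 | BUF | RW | 48 | 6 | 14 | 20 | -7 | ... | 93 | 6.5 | 605 | 12:36 | 11 | 20 | 0 | 3 | 0.0 | 2009 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1207 | Rk | Player | Age | Tm | Pos | GP | G | A | PTS | +/- | ... | S | S% | TOI | ATOI | BLK | HIT | FOW | FOL | FO% | 2022 |
| 1208 | 1001 | Radim Zohorna | 25 | PIT | F | 17 | 2 | 4 | 6 | 12 | ... | 9 | 22.2 | 176 | 10:20 | 3 | 23 | 4 | 8 | 33.3 | 2022 |
| 1209 | 1002 | Artem Zub | 26 | OTT | D | 81 | 6 | 16 | 22 | 1 | ... | 92 | 6.5 | 1704 | 21:02 | 124 | 155 | 0 | 0 | | 2022 |
| 1210 | 1003 | Mats Zuccarello | 34 | MIN | LW | 70 | 24 | 55 | 79 | 21 | ... | 159 | 15.1 | 1301 | 18:35 | 33 | 36 | 21 | 34 | 38.2 | 2022 |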
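
The combined table above still needs some cleanup. The page's header row is repeated inside the table body (see index 1207), and a player who changed teams mid-season has a `TOT` row followed by one row per team (see Craig Adams). The sketch below shows one possible way to handle both. `clean_skater_stats` is a hypothetical helper. It assumes the season column is named `season` and that the `TOT` row comes first within a player's rows, as it does in the output above.

```python
import pandas as pd

def clean_skater_stats(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated header rows and keep one row per player per season."""
    # Repeated header rows carry the literal string 'Rk' in the Rk column
    df = raw[raw['Rk'] != 'Rk'].copy()
    # The scraped values are strings, so convert the count columns to numbers
    for col in ['Rk', 'GP', 'PTS']:
        df[col] = pd.to_numeric(df[col])
    # Rows for one player share an Rk within a season; keeping the first
    # keeps the TOT line for traded players (assumes TOT is listed first)
    df = df.drop_duplicates(subset=['season', 'Rk'], keep='first')
    return df.reset_index(drop=True)

# A few rows copied from the output above, plus one repeated header row
raw = pd.DataFrame({
    'Rk': ['1', '2', '2', '2', '3', 'Rk'],
    'Player': ['Justin Abdelkader', 'Craig Adams', 'Craig Adams',
               'Craig Adams', 'Maxim Afinogenov', 'Player'],
    'Tm': ['DET', 'TOT', 'CHI', 'PIT', 'BUF', 'Tm'],
    'GP': ['2', '45', '36', '9', '48', 'GP'],
    'PTS': ['0', '7', '6', '1', '20', 'PTS'],
    'season': [2009] * 6,
})

clean = clean_skater_stats(raw)
print(clean)
```

Keeping only the `TOT` row avoids counting a traded player's games and points twice when the three-season totals are computed.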
