# NHL Stats Preparation
## Collecting NHL Players' Stats from EliteProspects
This step gathers player performance data across leagues and seasons for downstream NHL prediction.

- Collect league-wide player stats for a given season from
https://eliteprospects.com/league/{league}/stats/{season}?page={n}

- Extract metadata, including individual player profile links.

- Build a unique player list from seasons 2000–2025 by storing profile links in a set.

- Scrape each player’s page to retrieve their first 5 years of NHL stats, filtered by seasons where they played more than 30 games.

- Gather pre-NHL stats for each player (junior, college, international, etc.).

### Import Libraries

In [1]:
import eliteprospects_scraper_api as ep
import pandas as pd
import os
import time
import random

### Collecting NHL Players' Stats from EliteProspects


In [2]:
if not os.path.exists('./data/nhl/nhl_players_metadata.csv'):
    # Collect players from 2000-2025 season
    for i in range(0, 25):
        season = f'20{str(i).zfill(2)}-20{str(i + 1).zfill(2)}'
        print(f'Scraping {season}')
        nhl_players = ep.get_season_roster("nhl", season)
        nhl_players.to_csv(f'./data/nhl/players/nhl_players_{season}.csv', index=False, encoding='utf-8-sig')
        print(f'Finished scraping {season}')
else:
    print('NHL players metadata already exists. Skipping scraping.')

NHL players metadata already exists. Skipping scraping.


In [3]:
if not os.path.exists('./data/nhl/nhl_players_metadata.csv'):
    # Collect players' metadata from 2000-2025 season
    nhl_players_metadata = pd.DataFrame()
    for i in range(0, 25):
        season = f'20{str(i).zfill(2)}-20{str(i + 1).zfill(2)}'
        nhl_players = pd.read_csv(f'./data/nhl/players/nhl_players_{season}.csv')
        nhl_players_metadata = pd.concat([nhl_players_metadata, ep.get_players_metadata(nhl_players)]).reset_index(drop=True)
        # Remove duplicates
        nhl_players_metadata = nhl_players_metadata.drop_duplicates(subset=['playername']).reset_index(drop=True)
        print(f'Finished Merging Metadata for {season}')
else:
    print('NHL players metadata already exists. Skipping merging.')
    nhl_players_metadata = pd.read_csv('./data/nhl/nhl_players_metadata.csv')


NHL players metadata already exists. Skipping merging.


"### Collect Stats for Each Player from 2000-2025 CSV file

In [4]:
# Define output file
output_path = './data/nhl/stats/nhl_players_stats.csv'

In [5]:
def get_players_stats_by_batch(players_to_scrape):
    curr_len = len(players_to_scrape)
    fail_count = 0

    for i in range(curr_len):
        player_metadata = players_to_scrape.iloc[i]
        player_name = player_metadata['playername']
        player_url = player_metadata['link']
        print(f"\n [{i + 1}] Collecting stats for {player_name} at {player_url}")

        try:
            player_stats = ep.get_player_stats(player_metadata)

            # Write to CSV file
            if os.path.exists(output_path):
                player_stats.to_csv(output_path, mode='a', header=False, index=False, encoding='utf-8-sig')
            else:
                player_stats.to_csv(output_path, index=False, encoding='utf-8-sig')
            print(f'Successfully scraped stats for {player_name}')
            
            # Print Fail Rate
            print(f'Failed rate: {fail_count / (i + 1):.2f}')

            # Add random sleep to prevent getting blocked
            if i < curr_len - 1:
                sleep_time = random.uniform(10, 120)
                print(f"Sleep for {sleep_time / 60:.2f} minutes to prevent getting blocked")
                time.sleep(sleep_time) 
        except Exception as e:
            print(f"Failed to get stats for {player_name}: {e}")

            fail_count += 1

            if i < curr_len - 1:
                # Sleep for 15-60 seconds before trying the next player
                sleep_time = random.uniform(15, 60)
                print(f"Sleeping for {sleep_time / 60:.2f} seconds before trying the next player")
                time.sleep(sleep_time)

### Fetch Players' Stats by Batch

In [6]:
# Scrape players from 380-400
curr_batch_metadata = nhl_players_metadata[380:400]
get_players_stats_by_batch(curr_batch_metadata)


 [1] Collecting stats for Paul Ranheim at https://www.eliteprospects.com/player/31355/paul-ranheim
Collecting Regular Season + Postseason stats for Paul Ranheim at https://www.eliteprospects.com/player/31355/paul-ranheim
Error scraping Paul Ranheim: HTTPConnectionPool(host='localhost', port=64334): Read timed out. (read timeout=120)
Failed to get stats for Paul Ranheim: 'NoneType' object has no attribute 'to_csv'
Sleeping for 0.47 seconds before trying the next player

 [2] Collecting stats for René Corbet at https://www.eliteprospects.com/player/20911/rene-corbet
Collecting Regular Season + Postseason stats for René Corbet at https://www.eliteprospects.com/player/20911/rene-corbet
Successfully scraped stats for René Corbet
Failed rate: 0.50
Sleep for 0.60 minutes to prevent getting blocked

 [3] Collecting stats for Ladislav Nagy at https://www.eliteprospects.com/player/5522/ladislav-nagy
Collecting Regular Season + Postseason stats for Ladislav Nagy at https://www.eliteprospects.com

In [7]:
# Scrape players from 400-450
curr_batch_metadata = nhl_players_metadata[400:450]
get_players_stats_by_batch(curr_batch_metadata)


 [1] Collecting stats for Bill Houlder at https://www.eliteprospects.com/player/67298/bill-houlder
Collecting Regular Season + Postseason stats for Bill Houlder at https://www.eliteprospects.com/player/67298/bill-houlder
Successfully scraped stats for Bill Houlder
Failed rate: 0.00
Sleep for 0.24 minutes to prevent getting blocked

 [2] Collecting stats for Ossi Väänänen at https://www.eliteprospects.com/player/2685/ossi-vaananen
Collecting Regular Season + Postseason stats for Ossi Väänänen at https://www.eliteprospects.com/player/2685/ossi-vaananen
Successfully scraped stats for Ossi Väänänen
Failed rate: 0.00
Sleep for 2.00 minutes to prevent getting blocked

 [3] Collecting stats for Jonathan Girard at https://www.eliteprospects.com/player/67360/jonathan-girard
Collecting Regular Season + Postseason stats for Jonathan Girard at https://www.eliteprospects.com/player/67360/jonathan-girard
Successfully scraped stats for Jonathan Girard
Failed rate: 0.00
Sleep for 1.77 minutes to prev

In [8]:
# Scrape players from 450-500
curr_batch_metadata = nhl_players_metadata[450:500]
get_players_stats_by_batch(curr_batch_metadata)


 [1] Collecting stats for Matthew Barnaby at https://www.eliteprospects.com/player/9073/matthew-barnaby
Collecting Regular Season + Postseason stats for Matthew Barnaby at https://www.eliteprospects.com/player/9073/matthew-barnaby
Successfully scraped stats for Matthew Barnaby
Failed rate: 0.00
Sleep for 0.86 minutes to prevent getting blocked

 [2] Collecting stats for Marty Reasoner at https://www.eliteprospects.com/player/8708/marty-reasoner
Collecting Regular Season + Postseason stats for Marty Reasoner at https://www.eliteprospects.com/player/8708/marty-reasoner
Successfully scraped stats for Marty Reasoner
Failed rate: 0.00
Sleep for 1.36 minutes to prevent getting blocked

 [3] Collecting stats for James Patrick at https://www.eliteprospects.com/player/32586/james-patrick
Collecting Regular Season + Postseason stats for James Patrick at https://www.eliteprospects.com/player/32586/james-patrick
Successfully scraped stats for James Patrick
Failed rate: 0.00
Sleep for 0.71 minutes

In [9]:
# Scrape players from 500-550
curr_batch_metadata = nhl_players_metadata[500:550]
get_players_stats_by_batch(curr_batch_metadata)


 [1] Collecting stats for Nathan Dempsey at https://www.eliteprospects.com/player/8741/nathan-dempsey
Collecting Regular Season + Postseason stats for Nathan Dempsey at https://www.eliteprospects.com/player/8741/nathan-dempsey
Successfully scraped stats for Nathan Dempsey
Failed rate: 0.00
Sleep for 1.88 minutes to prevent getting blocked

 [2] Collecting stats for Willie Mitchell at https://www.eliteprospects.com/player/8548/willie-mitchell
Collecting Regular Season + Postseason stats for Willie Mitchell at https://www.eliteprospects.com/player/8548/willie-mitchell
Error scraping Willie Mitchell: HTTPConnectionPool(host='localhost', port=56623): Read timed out. (read timeout=120)
Failed to get stats for Willie Mitchell: 'NoneType' object has no attribute 'to_csv'
Sleeping for 0.39 seconds before trying the next player

 [3] Collecting stats for Kirk Muller at https://www.eliteprospects.com/player/22413/kirk-muller
Collecting Regular Season + Postseason stats for Kirk Muller at https:

In [10]:
# Scrape players from 550-600
curr_batch_metadata = nhl_players_metadata[550:600]
get_players_stats_by_batch(curr_batch_metadata)


 [1] Collecting stats for Ian Moran at https://www.eliteprospects.com/player/5085/ian-moran
Collecting Regular Season + Postseason stats for Ian Moran at https://www.eliteprospects.com/player/5085/ian-moran
Successfully scraped stats for Ian Moran
Failed rate: 0.00
Sleep for 0.65 minutes to prevent getting blocked

 [2] Collecting stats for Marc Chouinard at https://www.eliteprospects.com/player/8547/marc-chouinard
Collecting Regular Season + Postseason stats for Marc Chouinard at https://www.eliteprospects.com/player/8547/marc-chouinard
Successfully scraped stats for Marc Chouinard
Failed rate: 0.00
Sleep for 1.68 minutes to prevent getting blocked

 [3] Collecting stats for Juha Lind at https://www.eliteprospects.com/player/259/juha-lind
Collecting Regular Season + Postseason stats for Juha Lind at https://www.eliteprospects.com/player/259/juha-lind
Successfully scraped stats for Juha Lind
Failed rate: 0.00
Sleep for 0.57 minutes to prevent getting blocked

 [4] Collecting stats for

### Fetch Players' Stats for Players Not Scraped

In [24]:
# Get unique players from nhl_players_stats.csv
if os.path.exists(output_path):
    nhl_players_stats = pd.read_csv(output_path)
    unique_players = set(nhl_players_stats['playername'].unique())
else:
    unique_players = set()

In [25]:
# Find players in nhl_players_metadata 0-600 not in unique_players
subset = nhl_players_metadata[0:600]
players_to_scrape = subset[~subset['playername'].isin(unique_players)].reset_index(drop=True)
players_to_scrape

Unnamed: 0,playername,fw_def,link
0,Bryan Helmer,DEF,https://www.eliteprospects.com/player/11481/br...


In [26]:
curr_len = len(players_to_scrape)
# curr_len = 10

for i in range(curr_len):
    player_metadata = players_to_scrape.iloc[i]
    player_name = player_metadata['playername']
    player_url = player_metadata['link']
    print(f"\nCollecting stats for {player_name} at {player_url}")

    try:
        player_stats = ep.get_player_stats(player_metadata)

        # Write to CSV file
        if os.path.exists(output_path):
            player_stats.to_csv(output_path, mode='a', header=False, index=False, encoding='utf-8-sig')
        else:
            player_stats.to_csv(output_path, index=False, encoding='utf-8-sig')
        print(f'Successfully scraped stats for {player_name}')

        # Add random sleep to prevent getting blocked
        if i < curr_len - 1:
            sleep_time = random.uniform(10, 120)
            print(f"Sleep for {sleep_time / 60:.2f} minutes to prevent getting blocked")
            time.sleep(sleep_time)
    except Exception as e:
        print(f"Failed to get stats for {player_name}: {e}")


        if i < curr_len - 1:
            # Sleep for 15-60 seconds before trying the next player
            sleep_time = random.uniform(15, 60)
            print(f"Sleeping for {sleep_time / 60:.2f} seconds before trying the next player")
            time.sleep(sleep_time)


Collecting stats for Bryan Helmer at https://www.eliteprospects.com/player/11481/bryan-helmer
Collecting Regular Season + Postseason stats for Bryan Helmer at https://www.eliteprospects.com/player/11481/bryan-helmer
Successfully scraped stats for Bryan Helmer


In [27]:
# Check how many distinct players are in the output_path
nhl_players_stats = pd.read_csv(output_path)
unique_players = set(nhl_players_stats['playername'].unique())
print(f'Number of unique players: {len(unique_players)}')

Number of unique players: 600


In [28]:
print(nhl_players_metadata.iloc[[380]])

       playername fw_def                                               link
380  Paul Ranheim     FW  https://www.eliteprospects.com/player/31355/pa...
