# NHL Stats Preparation
## Collecting NHL Players' Stats from EliteProspects
This step gathers player performance data across leagues and seasons for downstream NHL prediction.

- Collect league-wide player stats for a given season from `https://eliteprospects.com/league/{league}/stats/{season}?page={n}`.

- Extract metadata, including individual player profile links.

- Build a unique player list from seasons 2000–2025 by storing profile links in a set.

- Scrape each player’s page to retrieve their first 5 years of NHL stats, filtered by seasons where they played more than 30 games.

- Gather pre-NHL stats for each player (junior, college, international, etc.).
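The "first 5 years, more than 30 games" rule can be read in two ways. The sketch below takes one reading as an assumption: keep NHL seasons with more than 30 games, then take the earliest five. It also assumes hypothetical columns `league`, `season` and `gp`; the real columns returned by `ep.get_player_stats` may differ. Games are summed per season because a player traded mid-season can have several NHL rows for the same season.

```python
from typing import List

import pandas as pd


def first_nhl_seasons(stats: pd.DataFrame, n_seasons: int = 5, min_gp: int = 30) -> List[str]:
    """Return up to n_seasons earliest NHL seasons with more than min_gp games played.

    Assumes hypothetical columns 'league', 'season' (e.g. '2005-2006') and 'gp'.
    """
    nhl = stats[stats["league"] == "NHL"]
    # A player traded mid-season can have several NHL rows for one season, so sum games per season
    gp_per_season = nhl.groupby("season")["gp"].sum()
    qualifying = gp_per_season[gp_per_season > min_gp]
    # 'YYYY-YYYY' season labels sort chronologically as plain strings
    return sorted(qualifying.index)[:n_seasons]


stats = pd.DataFrame({
    "league": ["OHL", "NHL", "NHL", "NHL", "NHL"],
    "season": ["2004-2005", "2005-2006", "2006-2007", "2006-2007", "2007-2008"],
    "gp": [60, 20, 25, 30, 82],
})
print(first_nhl_seasons(stats))  # ['2006-2007', '2007-2008']
```

Under the other reading (take the first five NHL seasons, then drop short ones), the `[:n_seasons]` slice would move before the `> min_gp` filter.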

### Import Libraries

In [1]:
import eliteprospects_scraper_api as ep
import pandas as pd
import os
import time
import random

### Collect Season Rosters and Player Metadata


In [2]:
if not os.path.exists('./data/nhl/nhl_players_metadata.csv'):
    # Make sure the per-season roster folder exists before writing to it
    os.makedirs('./data/nhl/players', exist_ok=True)
    # Collect rosters for the 2000-2001 through 2024-2025 seasons
    for i in range(0, 25):
        season = f'20{str(i).zfill(2)}-20{str(i + 1).zfill(2)}'
        print(f'Scraping {season}')
        nhl_players = ep.get_season_roster("nhl", season)
        nhl_players.to_csv(f'./data/nhl/players/nhl_players_{season}.csv', index=False, encoding='utf-8-sig')
        print(f'Finished scraping {season}')
else:
    print('NHL players metadata already exists. Skipping scraping.')

NHL players metadata already exists. Skipping scraping.
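The `zfill` expression in the loop builds season strings such as `2000-2001`. A plainer equivalent is sketched below, together with a helper that fills in the URL template from the overview. Both helper names are hypothetical, and the assumption that the site accepts the same `YYYY-YYYY` season format in the URL is untested here.

```python
from typing import List


def season_labels(first_start_year: int, last_end_year: int) -> List[str]:
    """Season strings such as '2000-2001', from first_start_year up to the season ending in last_end_year."""
    return [f"{year}-{year + 1}" for year in range(first_start_year, last_end_year)]


def stats_page_url(league: str, season: str, page: int) -> str:
    """Fill in the league-stats URL template listed in the overview (hypothetical helper)."""
    return f"https://eliteprospects.com/league/{league}/stats/{season}?page={page}"


seasons = season_labels(2000, 2025)
print(seasons[0], seasons[-1], len(seasons))  # 2000-2001 2024-2025 25
print(stats_page_url("nhl", seasons[0], 1))
```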


In [3]:
if not os.path.exists('./data/nhl/nhl_players_metadata.csv'):
    # Collect players' metadata for the 2000-2001 through 2024-2025 seasons
    nhl_players_metadata = pd.DataFrame()
    for i in range(0, 25):
        season = f'20{str(i).zfill(2)}-20{str(i + 1).zfill(2)}'
        nhl_players = pd.read_csv(f'./data/nhl/players/nhl_players_{season}.csv')
        nhl_players_metadata = pd.concat([nhl_players_metadata, ep.get_players_metadata(nhl_players)]).reset_index(drop=True)
        # Remove players already collected from earlier seasons
        nhl_players_metadata = nhl_players_metadata.drop_duplicates(subset=['playername']).reset_index(drop=True)
        print(f'Finished merging metadata for {season}')
    # Save the merged metadata so later runs skip both scraping and merging
    nhl_players_metadata.to_csv('./data/nhl/nhl_players_metadata.csv', index=False, encoding='utf-8-sig')
else:
    print('NHL players metadata already exists. Skipping merging.')
    nhl_players_metadata = pd.read_csv('./data/nhl/nhl_players_metadata.csv')


NHL players metadata already exists. Skipping merging.
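The overview describes the unique player list as a set of profile links, while the merge cell deduplicates on `playername`. Distinct players can share a name (the NHL has had two players named Sebastian Aho), so deduplicating on `playername` can silently drop one of them. The toy sketch below contrasts the two keys; the names and `example.com` links are placeholders, not real profiles. It also collects the per-season frames in a list and concatenates once, which avoids re-copying the growing frame on every iteration.

```python
import pandas as pd

# Toy per-season metadata: "Player B" in season 2 is a different player (different link)
frames = [
    pd.DataFrame({"playername": ["Player A", "Player B"],
                  "link": ["https://example.com/player/1", "https://example.com/player/2"]}),
    pd.DataFrame({"playername": ["Player A", "Player B"],
                  "link": ["https://example.com/player/1", "https://example.com/player/3"]}),
]

# Concatenate once at the end instead of inside the loop
metadata = pd.concat(frames, ignore_index=True)

by_name = metadata.drop_duplicates(subset=["playername"])  # merges the two different "Player B"s
by_link = metadata.drop_duplicates(subset=["link"])        # keeps one row per profile
print(len(by_name), len(by_link))  # 2 3
```

Switching the real notebook to `link` would change the row order and count, so the hard-coded batch slices further down would no longer line up with earlier runs.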


### Collect Stats for Each Player in the 2000-2025 Metadata CSV

In [4]:
# Define output file and make sure its directory exists
output_path = './data/nhl/stats/nhl_players_stats.csv'
os.makedirs(os.path.dirname(output_path), exist_ok=True)
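The scraping loops in this notebook append each player's rows to `output_path` and write the CSV header only when the file is first created. That pattern can be factored into one helper. The sketch below is a hypothetical `append_rows` function that mirrors the notebook's `to_csv` arguments; it is not part of `eliteprospects_scraper_api`.

```python
import os

import pandas as pd


def append_rows(df: pd.DataFrame, path: str) -> None:
    """Append df to a CSV file, writing the header only when the file does not exist yet."""
    # Create the parent folder if needed ('.' when path has no directory part)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", header=write_header, index=False, encoding="utf-8-sig")
```

Usage inside a loop would be `append_rows(player_stats, output_path)` in place of the `if os.path.exists(...)` branch.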

In [5]:
def get_players_stats_by_batch(players_to_scrape):
    """Scrape stats for every player in players_to_scrape and append them to output_path.

    Sleeps a random interval between players to avoid getting blocked.
    Returns the number of players whose stats could not be collected.
    """
    curr_len = len(players_to_scrape)
    fail_count = 0

    for i in range(curr_len):
        player_metadata = players_to_scrape.iloc[i]
        player_name = player_metadata['playername']
        player_url = player_metadata['link']
        print(f"\nCollecting stats for {player_name} at {player_url}")

        try:
            player_stats = ep.get_player_stats(player_metadata)

            # Append to the CSV file, writing the header only when the file is new
            if os.path.exists(output_path):
                player_stats.to_csv(output_path, mode='a', header=False, index=False, encoding='utf-8-sig')
            else:
                player_stats.to_csv(output_path, index=False, encoding='utf-8-sig')
            print(f'Successfully scraped stats for {player_name}')

            # Print the running failure rate for this batch
            print(f'Failed rate: {fail_count / (i + 1):.2f}')

            # Sleep 10-180 seconds after a success to avoid getting blocked
            if i < curr_len - 1:
                sleep_time = random.uniform(10, 180)
                print(f"Sleep for {sleep_time / 60:.2f} minutes to prevent getting blocked")
                time.sleep(sleep_time)
        except Exception as e:
            print(f"Failed to get stats for {player_name}: {e}")

            fail_count += 1

            if i < curr_len - 1:
                # Sleep for 15-60 seconds before trying the next player
                sleep_time = random.uniform(15, 60)
                print(f"Sleeping for {sleep_time:.2f} seconds before trying the next player")
                time.sleep(sleep_time)

    return fail_count
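The batch function tries each player once and moves on after a failure. If failures are mostly transient (timeouts, temporary rate limiting), one alternative is retrying with exponential backoff and random jitter. This is a swapped-in technique, not what the notebook does, and `call_with_backoff` with its default delays is a hypothetical helper. The `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 15.0,
                      max_delay: float = 300.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponentially growing, jittered delays.

    Re-raises the last exception once max_attempts attempts have failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # "Full jitter": wait a random time up to base_delay * 2**attempt, capped at max_delay
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

Inside the loop it would wrap the scraper call, e.g. `player_stats = call_with_backoff(lambda: ep.get_player_stats(player_metadata))`, so only players that fail every attempt reach the `except` branch.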

### Fetch Players' Stats by Batch

In [None]:
# Scrape players from 340-350
curr_batch_metadata = nhl_players_metadata[340:350]
get_players_stats_by_batch(curr_batch_metadata)

In [9]:
# Scrape players from 350-400
curr_batch_metadata = nhl_players_metadata[350:400]
get_players_stats_by_batch(curr_batch_metadata)


Collecting stats for Hans Jonsson at https://www.eliteprospects.com/player/715/hans-jonsson
Collecting Regular Season + Postseason stats for Hans Jonsson at https://www.eliteprospects.com/player/715/hans-jonsson
Successfully scraped stats for Hans Jonsson
Failed rate: 0.00
Sleep for 2.24 minutes to prevent getting blocked

Collecting stats for Dan Boyle at https://www.eliteprospects.com/player/5369/dan-boyle
Collecting Regular Season + Postseason stats for Dan Boyle at https://www.eliteprospects.com/player/5369/dan-boyle
Successfully scraped stats for Dan Boyle
Failed rate: 0.00
Sleep for 1.89 minutes to prevent getting blocked

Collecting stats for Dmitri Kalinin at https://www.eliteprospects.com/player/8673/dmitri-kalinin
Collecting Regular Season + Postseason stats for Dmitri Kalinin at https://www.eliteprospects.com/player/8673/dmitri-kalinin
Successfully scraped stats for Dmitri Kalinin
Failed rate: 0.00
Sleep for 0.63 minutes to prevent getting blocked

Collecting stats for Wayn

In [None]:
# Scrape players from 400-450
curr_batch_metadata = nhl_players_metadata[400:450]
get_players_stats_by_batch(curr_batch_metadata)

In [None]:
# Scrape players from 450-500
curr_batch_metadata = nhl_players_metadata[450:500]
get_players_stats_by_batch(curr_batch_metadata)

In [None]:
# Scrape players from 500-600
curr_batch_metadata = nhl_players_metadata[500:600]
get_players_stats_by_batch(curr_batch_metadata)
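The batch cells above repeat the same two lines with hand-typed row ranges. A small range generator can produce those slices instead; `batch_ranges` is a hypothetical helper, sketched under the assumption that batches are contiguous, half-open row ranges like the ones used above.

```python
from typing import List, Tuple


def batch_ranges(start: int, stop: int, size: int) -> List[Tuple[int, int]]:
    """Split the row range [start, stop) into consecutive (begin, end) pairs of at most size rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [(begin, min(begin + size, stop)) for begin in range(start, stop, size)]


print(batch_ranges(400, 600, 50))
# [(400, 450), (450, 500), (500, 550), (550, 600)]
```

In the notebook this would drive a loop such as `for begin, end in batch_ranges(400, 600, 50): get_players_stats_by_batch(nhl_players_metadata.iloc[begin:end])`, with the trade-off that one long-running cell is harder to interrupt and resume than several short ones.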

### Fetch Stats for Players Not Yet Scraped

In [13]:
# Get unique players from nhl_players_stats.csv
if os.path.exists(output_path):
    nhl_players_stats = pd.read_csv(output_path)
    unique_players = set(nhl_players_stats['playername'].unique())
else:
    unique_players = set()

In [14]:
# Find players in nhl_players_metadata rows 0-340 not in unique_players
subset = nhl_players_metadata[0:340]
players_to_scrape = subset[~subset['playername'].isin(unique_players)].reset_index(drop=True)
players_to_scrape

Empty DataFrame
Columns: [Unnamed: 0, playername, fw_def, link]
Index: []
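The `isin` filter above is a left anti-join: it keeps metadata rows with no match in the scraped stats. The same step can be written with `merge(..., indicator=True)`, which makes the join key explicit. The sketch uses toy data and a hypothetical helper name; keying on `link` instead of `playername` would avoid name collisions, but only if the stats file also records a `link` column, which is an assumption.

```python
import pandas as pd


def not_yet_scraped(metadata: pd.DataFrame, stats: pd.DataFrame, key: str = "playername") -> pd.DataFrame:
    """Return metadata rows whose key does not appear in the scraped stats (a left anti-join)."""
    # Deduplicate keys first: stats has one row per player-season, which would multiply merge rows
    scraped_keys = stats[[key]].drop_duplicates()
    merged = metadata.merge(scraped_keys, on=key, how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge").reset_index(drop=True)


metadata = pd.DataFrame({"playername": ["A", "B", "C"], "link": ["u1", "u2", "u3"]})
stats = pd.DataFrame({"playername": ["A", "A", "C"], "season": ["2000-2001", "2001-2002", "2000-2001"]})
print(not_yet_scraped(metadata, stats))
```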


In [12]:
# Same loop as get_players_stats_by_batch, without failure-rate tracking
curr_len = len(players_to_scrape)

for i in range(curr_len):
    player_metadata = players_to_scrape.iloc[i]
    player_name = player_metadata['playername']
    player_url = player_metadata['link']
    print(f"\nCollecting stats for {player_name} at {player_url}")

    try:
        player_stats = ep.get_player_stats(player_metadata)

        # Append to the CSV file, writing the header only when the file is new
        if os.path.exists(output_path):
            player_stats.to_csv(output_path, mode='a', header=False, index=False, encoding='utf-8-sig')
        else:
            player_stats.to_csv(output_path, index=False, encoding='utf-8-sig')
        print(f'Successfully scraped stats for {player_name}')

        # Sleep 10-180 seconds after a success to avoid getting blocked
        if i < curr_len - 1:
            sleep_time = random.uniform(10, 180)
            print(f"Sleep for {sleep_time / 60:.2f} minutes to prevent getting blocked")
            time.sleep(sleep_time)
    except Exception as e:
        print(f"Failed to get stats for {player_name}: {e}")

        if i < curr_len - 1:
            # Sleep for 15-60 seconds before trying the next player
            sleep_time = random.uniform(15, 60)
            print(f"Sleeping for {sleep_time:.2f} seconds before trying the next player")
            time.sleep(sleep_time)

Collecting stats for Larry Murphy at https://www.eliteprospects.com/player/21498/larry-murphy
Collecting Regular Season + Postseason stats for Larry Murphy at https://www.eliteprospects.com/player/21498/larry-murphy
Successfully scraped stats for Larry Murphy
Sleep for 2.24 minutes to prevent getting blocked

Collecting stats for Tie Domi at https://www.eliteprospects.com/player/9138/tie-domi
Collecting Regular Season + Postseason stats for Tie Domi at https://www.eliteprospects.com/player/9138/tie-domi
Successfully scraped stats for Tie Domi
Sleep for 0.97 minutes to prevent getting blocked

Collecting stats for Oleg Kvasha at https://www.eliteprospects.com/player/8587/oleg-kvasha
Collecting Regular Season + Postseason stats for Oleg Kvasha at https://www.eliteprospects.com/player/8587/oleg-kvasha
Successfully scraped stats for Oleg Kvasha
Sleep for 1.12 minutes to prevent getting blocked

Collecting stats for Chris Simon at https://www.eliteprospects.com/player/8882/chris-simon
Colle