# Player Facts Preparation
## Collecting NHL Players' Facts from EliteProspects
This step collects player facts from EliteProspects for use in downstream NHL prediction.

- Collect league-wide player stats for a given season from `https://eliteprospects.com/league/{league}/stats/{season}?page={n}`.

- Extract metadata, including individual player profile links.

- Build a unique player list covering the 2000-2001 through 2024-2025 seasons by storing profile links in a set.

- Scrape each player's page to retrieve their first five years of NHL stats, keeping only seasons in which they played more than 30 games.

- Gather pre-NHL stats for each player (junior, college, international, etc.).
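
The first steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual implementation: `build_stats_url` and `collect_unique_links` are hypothetical helper names, the `YYYY-YYYY` season format in the URL is assumed from the season strings used later in this notebook, and the rows are made-up placeholders standing in for scraped data.

```python
from typing import Iterable

# URL pattern quoted in the list above
STATS_URL = "https://eliteprospects.com/league/{league}/stats/{season}?page={n}"


def build_stats_url(league: str, season: str, page: int) -> str:
    """Fill in the stats URL for one page of one season (hypothetical helper)."""
    return STATS_URL.format(league=league, season=season, n=page)


def collect_unique_links(rows: Iterable[dict]) -> set:
    """Keep one entry per player by storing profile links in a set (hypothetical helper)."""
    return {row["link"] for row in rows if row.get("link")}


# Placeholder rows: the links are invented and only illustrate the idea
rows = [
    {"playername": "Player A", "link": "https://example.com/player/1/player-a"},
    # Same player appearing in a second season
    {"playername": "Player A", "link": "https://example.com/player/1/player-a"},
    # A different player who happens to share the name
    {"playername": "Player A", "link": "https://example.com/player/2/player-a"},
]

url = build_stats_url("nhl", "2000-2001", 1)
unique_links = collect_unique_links(rows)
print(url)
print(len(unique_links))
```

Keying on the profile link rather than the name keeps two different players who share a name as separate entries.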

## Import Libraries

In [1]:
import eliteprospects_scraper_api as ep
import pandas as pd
import os
import time
import random

### Collecting NHL Players' Metadata from EliteProspects

In [2]:
if not os.path.exists('./data/nhl/nhl_players_metadata.csv'):
    # Make sure the per-season output directory exists before writing to it
    os.makedirs('./data/nhl/players', exist_ok=True)
    # Collect players for the 2000-2001 through 2024-2025 seasons
    for i in range(0, 25):
        season = f'20{i:02d}-20{i + 1:02d}'
        print(f'Scraping {season}')
        nhl_players = ep.get_season_roster("nhl", season)
        nhl_players.to_csv(f'./data/nhl/players/nhl_players_{season}.csv', index=False, encoding='utf-8-sig')
        print(f'Finished scraping {season}')
else:
    print('NHL players metadata already exists. Skipping scraping.')

NHL players metadata already exists. Skipping scraping.
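
The season strings built in the loop above follow a `YYYY-YYYY` pattern. As a small sketch, the same labels can be generated from full start years, which avoids hard-coding the `20` prefix. The helper name `season_labels` is hypothetical and not part of `eliteprospects_scraper_api`.

```python
def season_labels(first_start_year: int, last_start_year: int) -> list:
    """Return 'YYYY-YYYY' labels for every season starting in the given year range."""
    return [f"{year}-{year + 1}" for year in range(first_start_year, last_start_year + 1)]


# Same 25 seasons as range(0, 25) in the cell above
labels = season_labels(2000, 2024)
print(labels[0], labels[-1], len(labels))
```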


In [3]:
if not os.path.exists('./data/nhl/nhl_players_metadata.csv'):
    # Collect players' metadata for the 2000-2001 through 2024-2025 seasons
    nhl_players_metadata = pd.DataFrame()
    for i in range(0, 25):
        season = f'20{i:02d}-20{i + 1:02d}'
        nhl_players = pd.read_csv(f'./data/nhl/players/nhl_players_{season}.csv')
        nhl_players_metadata = pd.concat([nhl_players_metadata, ep.get_players_metadata(nhl_players)]).reset_index(drop=True)
        # Remove duplicates by profile link, since two different players can share a name
        nhl_players_metadata = nhl_players_metadata.drop_duplicates(subset=['link']).reset_index(drop=True)
        print(f'Finished merging metadata for {season}')
    # Save the merged metadata so that later runs can skip scraping and merging
    nhl_players_metadata.to_csv('./data/nhl/nhl_players_metadata.csv', index=False, encoding='utf-8-sig')
else:
    print('NHL players metadata already exists. Skipping merging.')
    nhl_players_metadata = pd.read_csv('./data/nhl/nhl_players_metadata.csv')


NHL players metadata already exists. Skipping merging.


### Testing API

In [4]:
# Row 3188 holds Cale Makar in this metadata file; the position depends on the scrape order
cale_makar_metadata = nhl_players_metadata.iloc[3188]
cale_makar_metadata

playername                                           Cale Makar
fw_def                                                      DEF
link          https://www.eliteprospects.com/player/199655/c...
Name: 3188, dtype: object

In [5]:
cale_makar_facts = ep.get_player_facts(cale_makar_metadata)

Collecting facts for Cale Makar at https://www.eliteprospects.com/player/199655/cale-makar
Player type: PP Specialist
Player type: Speedster
Player type: Two-Way Defenseman


In [6]:
cale_makar_facts

| | Player Name | Nation | Position | Height (cm) | Weight (kg) | Shoots | Player type | NHL Rights | Draft | Highlights | Description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cale Makar | Canada | D | 180 | 85 | R | [PP Specialist, Speedster, Two-Way Defenseman] | Colorado Avalanche / Signed | (1, 4, 2017) | [1-time WJAC-19 Gold Medal, 1-time U20 WJC Gol... | Many have called Cale Makar one of the purest ... |
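
The `Player type` and `Draft` cells above look like a Python list and a tuple. Assuming they really are stored as such objects (the output does not confirm this), writing the frame to CSV would turn them into plain strings, so they would need to be parsed again after reading the file back. A minimal sketch using the standard library's `ast.literal_eval`; the raw strings below are assumed from how Python renders a list and a tuple as text.

```python
import ast

# Assumed string forms of the list and tuple cells after a CSV round trip
raw_player_type = "['PP Specialist', 'Speedster', 'Two-Way Defenseman']"
raw_draft = "(1, 4, 2017)"

# literal_eval parses Python literals only, so it does not execute arbitrary code
player_types = ast.literal_eval(raw_player_type)
draft = ast.literal_eval(raw_draft)

print(player_types)
print(draft)
```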


### Collecting NHL Players' Facts from EliteProspects

In [None]:
# Define output file
output_path = './data/nhl/stats/nhl_players_facts.csv'

In [None]:
def get_players_stats_by_batch(players_to_scrape):
    """Scrape facts for each player in `players_to_scrape` and append them to `output_path`."""
    curr_len = len(players_to_scrape)
    fail_count = 0

    for i in range(curr_len):
        player_metadata = players_to_scrape.iloc[i]
        player_name = player_metadata['playername']
        player_url = player_metadata['link']
        print(f"\n [{i + 1}] Collecting facts for {player_name} at {player_url}")

        try:
            player_stats = ep.get_player_facts(player_metadata)

            # Write to CSV file
            if os.path.exists(output_path):
                player_stats.to_csv(output_path, mode='a', header=False, index=False, encoding='utf-8-sig')
            else:
                player_stats.to_csv(output_path, index=False, encoding='utf-8-sig')
            print(f'Successfully scraped facts for {player_name}')

            # Print the running failure rate
            print(f'Failure rate: {fail_count / (i + 1):.2f}')

            # Add random sleep to prevent getting blocked
            if i < curr_len - 1:
                sleep_time = random.uniform(10, 120)
                print(f"Sleep for {sleep_time / 60:.2f} minutes to prevent getting blocked")
                time.sleep(sleep_time)
        except Exception as e:
            print(f"Failed to get facts for {player_name}: {e}")

            fail_count += 1

            if i < curr_len - 1:
                # Sleep for 15-60 seconds before trying the next player
                sleep_time = random.uniform(15, 60)
                print(f"Sleeping for {sleep_time:.2f} seconds before trying the next player")
                time.sleep(sleep_time)
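
The function above writes the header only when the output file does not exist yet and appends without a header afterwards. The same pattern is sketched below with the standard library's `csv` module, so that it runs without any scraping. The helper name `append_row`, the file name, and the placeholder rows are assumptions made for illustration.

```python
import csv
import os
import tempfile


def append_row(path: str, row: dict) -> None:
    """Append one row, writing the header only when the file is first created."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


# Demo with placeholder rows in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "facts_demo.csv")
    append_row(demo_path, {"Player Name": "Player A", "Position": "D"})
    append_row(demo_path, {"Player Name": "Player B", "Position": "F"})
    with open(demo_path, newline="", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```

Because each player is written as soon as it is scraped, an interrupted run keeps everything collected so far. The sketch assumes every row has the same columns in the same order, since later rows are appended without a header.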

### Fetch Player Facts in Batches