<h1 style="text-align:center;">WNBA DATA SCRAPE</h1> 🏀

## I used this notebook to scrape WNBA game logs from [www.basketball-reference.com](https://www.basketball-reference.com). I followed this [YouTube video](https://youtu.be/pZBcBrOxCVQ?si=Bgi8-ZXV9h7O4Pcu), which details how to scrape NBA data, and made a few adjustments.

### Note: I wanted ten seasons of data, but the current Las Vegas Aces were the San Antonio Silver Stars for the first two years of the ten-year window. I have not yet decided how to proceed. This notebook produces a dataset with ten seasons for every other team and eight seasons for the Las Vegas Aces.
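
One option for the open question above is to request the earlier seasons under the franchise's old team code and keep a single franchise label in the data. The sketch below is only an illustration: the code `SAS` and the cutoff season `2017` are assumptions that should be checked against the site, and `FRANCHISE_HISTORY` and `code_for_season` are hypothetical names that do not appear elsewhere in this notebook.

```python
# Hypothetical mapping: current code -> list of (old_code, last_season_used).
# 'SAS' and 2017 are assumptions; verify them on basketball-reference.com.
FRANCHISE_HISTORY = {
    'LVA': [('SAS', 2017)],
}


def code_for_season(team, season):
    """Return the team code to request for a given season."""
    for old_code, last_season in FRANCHISE_HISTORY.get(team, []):
        if int(season) <= last_season:
            return old_code
    return team


print(code_for_season('LVA', '2016'))
print(code_for_season('LVA', '2024'))
print(code_for_season('SEA', '2016'))
```

In the scraping loop, the URL would use `code_for_season(team, season)` while the `Team` column keeps `team`, so every row stays under one franchise label.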

In [1]:
# Import libraries
import numpy as np
import pandas as pd
import random
import time
from unidecode import unidecode

In [2]:
# Create list of team codes
teams = ['ATL', 'CHI', 'CON', 'IND', 'LAS', 'MIN', 'NYL', 'PHO', 'LVA', 'SEA', 'DAL', 'WAS', 'GSV']
len(teams)

13

In [3]:
# seasons = ['2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024']
seasons = ['2025']
len(seasons)

1
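
The commented-out ten-season list can also be generated rather than typed out. A small sketch, where `first_season`, `last_season`, and `seasons_full` are illustrative names:

```python
first_season = 2015
last_season = 2024

# Season labels stay strings so they match the format used in the URL
seasons_full = [str(year) for year in range(first_season, last_season + 1)]
print(len(seasons_full))
```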

In [4]:
stats = [
    'FG', 'FGA', 'FG%', 
    '3P', '3PA', '3P%', 
    'FT', 'FTA', 'FT%', 
    'ORB', 'TRB', 'AST', 
    'STL', 'BLK', 'TOV', 'PF'
]

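# Map each team stat to a prefixed column name, e.g. 'FG' -> 'Tm_FG'
tm_stats_dict = {stat: 'Tm_' + stat for stat in stats}

# Map each opponent stat to a prefixed column name, e.g. 'FG.1' -> 'Opp_FG'
# (the table repeats each stat header, so pandas suffixes the second copy with '.1')
opp_stats_dict = {stat + '.1': 'Opp_' + stat for stat in stats}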
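
The `.1` keys exist because pandas renames repeated column headers when it parses a table. The sketch below reproduces that behavior with `pd.read_csv` on a tiny made-up table, used here as a stand-in for the scraped page (the numbers are invented), and then applies the same two rename dictionaries.

```python
import io

import pandas as pd

stats = ['FG', 'FGA']
tm_stats_dict = {stat: 'Tm_' + stat for stat in stats}
opp_stats_dict = {stat + '.1': 'Opp_' + stat for stat in stats}

# Made-up data: each stat header appears twice, once per side
raw = io.StringIO("FG,FGA,FG,FGA\n30,70,28,65\n")
demo_df = pd.read_csv(raw)
parsed_columns = list(demo_df.columns)

# Rename the team columns first, then the suffixed opponent columns
demo_df = demo_df.rename(columns=tm_stats_dict).rename(columns=opp_stats_dict)
renamed_columns = list(demo_df.columns)

print(parsed_columns)
print(renamed_columns)
```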

In [5]:
# Create an empty dataframe to append each team's game logs to
wnba_df = pd.DataFrame()

# Iterate through the seasons
for season in seasons: 
    # Iterate through the teams
    for team in teams:
        # Set the URL
        url = f'https://www.basketball-reference.com/wnba/teams/{team}/{season}/gamelog'
        print(f"Scraping URL: {url}")
        
        try:
            # Attempt to scrape the table
            team_df = pd.read_html(url, header=1, attrs={'id': 'wnba_tgl_basic'})[0]
            
            # Keep only game rows: drop rows where 'Rk' is null or repeats the header text 'Rk'
            team_df = team_df[team_df['Rk'].astype(str).str.isnumeric()]

            # Drop the rank column and the blank spacer columns
            team_df = team_df.drop(columns=['Rk', 'Unnamed: 6', 'Unnamed: 9', 'Unnamed: 26'])

            # Rename columns
            team_df = team_df.rename(columns={'Unnamed: 3': 'Home', 'Tm': 'Tm_Pts', 'Opp.1': 'Opp_Pts'})
            team_df = team_df.rename(columns=tm_stats_dict)
            team_df = team_df.rename(columns=opp_stats_dict)

            # Replace values in 'Home'
            team_df['Home'] = team_df['Home'].apply(lambda x: 0 if x == '@' else 1)

            # Add two columns to the front of team_df
            team_df.insert(loc=0, column='Season', value=season)
            team_df.insert(loc=1, column='Team', value=team.upper())

            # Append the current year and team gamelogs to the aggregate dataframe
            wnba_df = pd.concat([wnba_df, team_df], ignore_index=True)

        except ValueError as e:
            # Handle the error if the table is not found, and report what pandas raised
            print(f"Table not found for Team: {team.upper()} in Season: {season} ({e}). Skipping...")

        # Pause program to abide by basketball-reference.com rules
        time.sleep(random.randint(4, 6))

# Display the aggregate dataframe
print(wnba_df)

Scraping URL: https://www.basketball-reference.com/wnba/teams/ATL/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/CHI/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/CON/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/IND/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/LAS/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/MIN/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/NYL/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/PHO/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/LVA/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/SEA/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/DAL/2025/gamelog
Scraping URL: https://www.basketball-reference.com/wnba/teams/WAS/2025/gamelog
Scraping URL: https://www.basketball-reference.com/w
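
The cleaning steps inside the loop can be tried offline on a small hand-made frame before spending requests on the site. The rows below are invented and only mimic the layout the loop expects: a repeated header row, a null rank, and `@` marking away games. `sample_df` and `clean_df` are illustrative names.

```python
import numpy as np
import pandas as pd

# Invented rows mimicking the scraped layout
sample_df = pd.DataFrame({
    'Rk': ['1', '2', 'Rk', np.nan, '3'],
    'Unnamed: 3': ['@', np.nan, 'Unnamed: 3', np.nan, '@'],
    'Tm': ['80', '75', 'Tm', np.nan, '91'],
})

# Keep only rows whose rank is a number (drops repeated headers and nulls)
clean_df = sample_df[sample_df['Rk'].astype(str).str.isnumeric()]
clean_df = clean_df.drop(columns=['Rk'])
clean_df = clean_df.rename(columns={'Unnamed: 3': 'Home', 'Tm': 'Tm_Pts'})

# '@' marks an away game; anything else is treated as a home game
clean_df['Home'] = clean_df['Home'].apply(lambda x: 0 if x == '@' else 1)
print(clean_df)
```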

In [6]:
pwd

'/home/maxkennebrew/projects/sports_etl_project/notebooks'

In [7]:
wnba_df.to_csv('../data/wnba_gamelogs_2025_1stQtr.csv', index=False)
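
Because `Season` is inserted as a string, it helps to know that a plain `pd.read_csv` will read it back as an integer. The sketch below writes a tiny invented frame to a temporary file (not the project's `../data` path) and reads it back both ways; `demo_df`, `default_df`, and `typed_df` are illustrative names.

```python
import os
import tempfile

import pandas as pd

# Invented rows with the same leading columns the notebook produces
demo_df = pd.DataFrame({
    'Season': ['2025', '2025'],
    'Team': ['ATL', 'CHI'],
    'Tm_Pts': [80, 75],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'demo_gamelogs.csv')
    demo_df.to_csv(csv_path, index=False)

    # Default parsing infers a numeric type for Season
    default_df = pd.read_csv(csv_path)
    # Passing dtype keeps Season as text, matching the in-memory frame
    typed_df = pd.read_csv(csv_path, dtype={'Season': str})

season_default_is_int = pd.api.types.is_integer_dtype(default_df['Season'])
season_typed_values = typed_df['Season'].tolist()
print(season_default_is_int, season_typed_values)
```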