# Introduction
This project is a fun, data-driven exploration of WNBA Draft picks from 2020 to 2024. Using data on player position, drafting team, and draft round, I analyze trends in who gets waived and why. I also use machine learning with Scikit-learn to predict whether a player is likely to be waived before playing for their team.

# Data Sources
I worked with two CSV files for this analysis. The first file, enriched_wnba_draftees.csv, contains a list of all WNBA draftees from 1997 to 2024. I initially imported it directly from Kaggle but noticed that many players were missing position data. To fix this, I downloaded the CSV, updated the missing information, and then uploaded the revised version to my content folder.

The second file, WNBADraft_20_24.csv, includes all players drafted between 2020 and 2024 who were waived before playing for their teams. I created this dataset myself by checking the current status of each player from those draft years. Sources used for verification are listed in the CSV.

In the section below, I clean this data up a bit to prepare it for use.



In [152]:
import pandas as pd
import re

def clean_team(name):
    # Drop everything from "FROM" or "VIA" onward; I only care about who actually drafted the player
    return re.split(r'FROM|VIA', name.upper())[0].strip()

f_all = pd.read_csv('enriched_wnba_draftees.csv')
f_waived = pd.read_csv('/content/WNBADraft_20_24.csv')

f_all_2020_2024 = f_all[f_all['year'].between(2020, 2024)].copy()

f_all_2020_2024['team'] = f_all_2020_2024['team'].str.strip().str.upper()
f_all_2020_2024['team'] = f_all_2020_2024['team'].apply(clean_team)
f_waived['Team'] = f_waived['Team'].str.strip().str.upper()

# Normalize f_all_2020_2024 positions to lowercase
f_all_2020_2024['position'] = f_all_2020_2024['position'].str.strip().str.lower()

# Normalize f_waived positions (convert abbreviations to full names)
position_map = {
    'g': 'guard',
    'f': 'forward',
    'c': 'center',
    'g/f': 'guard/forward'
}

f_waived['Position'] = f_waived['Position'].str.strip().str.lower().map(position_map).fillna(f_waived['Position'].str.strip().str.lower())
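To make the cleaning steps concrete, here is a minimal sketch that runs the same two normalizations on a few made-up values. The raw strings are hypothetical and only illustrate the kind of input the real CSVs might contain.

```python
import re
import pandas as pd

def clean_team(name):
    # Keep only the drafting team: drop anything from "FROM" or "VIA" onward
    return re.split(r'FROM|VIA', name.upper())[0].strip()

# Hypothetical raw team strings, made up for illustration
raw_teams = pd.Series(['Dallas Wings from New York', ' Indiana Fever ', 'Chicago Sky via Phoenix'])
cleaned_teams = raw_teams.str.strip().apply(clean_team)
print(cleaned_teams.tolist())

# Same abbreviation map as in the cell above
position_map = {'g': 'guard', 'f': 'forward', 'c': 'center', 'g/f': 'guard/forward'}

# Hypothetical raw positions: abbreviations plus one value that is already a full name
raw_positions = pd.Series([' G', 'F ', 'g/f', 'Forward'])
normalized = raw_positions.str.strip().str.lower()
# map() yields NaN for values missing from the dict, so fall back to the normalized text
cleaned_positions = normalized.map(position_map).fillna(normalized)
print(cleaned_positions.tolist())
```

One thing to watch: the regex matches "FROM" or "VIA" anywhere in the string, so a team name that happened to contain either sequence would be cut short.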


# Data Analysis
The following sections explore waiver trends by team, player position, and draft pick. Each analysis focuses on identifying patterns in who gets waived and when.

## Team Waiver Trends
In this section I compare the number of draft picks each team made with the number of those picks the team waived, then use the two counts to calculate a waiver rate per team. Connecticut stands out: 11 of its 14 picks (78.6%) were waived!

In [153]:
waived_by_team = f_waived['Team'].value_counts().sort_values(ascending=False)
drafts_by_team = f_all_2020_2024['team'].value_counts().sort_values(ascending=False)

team_stats = pd.DataFrame({
    'Draft Picks(2020-2024)': drafts_by_team,
    'Waived Players' : waived_by_team
}).fillna(0)

team_stats = team_stats.astype(int)


team_stats['Waiver Rate (%)'] = (team_stats['Waived Players'] / team_stats['Draft Picks(2020-2024)'] * 100).round(1)

team_stats = team_stats.sort_values('Waiver Rate (%)', ascending=False)

team_stats


| Team | Draft Picks(2020-2024) | Waived Players | Waiver Rate (%) |
|---|---|---|---|
| CONNECTICUT SUN | 14 | 11 | 78.6 |
| WASHINGTON MYSTICS | 10 | 5 | 50.0 |
| PHOENIX MERCURY | 10 | 5 | 50.0 |
| LAS VEGAS ACES | 14 | 7 | 50.0 |
| INDIANA FEVER | 24 | 10 | 41.7 |
| CHICAGO SKY | 10 | 4 | 40.0 |
| DALLAS WINGS | 19 | 7 | 36.8 |
| MINNESOTA LYNX | 12 | 4 | 33.3 |
| SEATTLE STORM | 17 | 5 | 29.4 |
| NEW YORK LIBERTY | 17 | 5 | 29.4 |


### Visualization
Here I include an interactive grouped bar chart that visualizes the table above.

In [154]:
import plotly.graph_objects as go

# Sort team stats
team_stats_sorted = team_stats.sort_values('Waiver Rate (%)', ascending=False)

# Create interactive bar chart
fig = go.Figure()

fig.add_trace(go.Bar(
    x=team_stats_sorted.index,
    y=team_stats_sorted['Draft Picks(2020-2024)'],
    name='Draft Picks (2020–2024)',
    marker_color='skyblue'
))

fig.add_trace(go.Bar(
    x=team_stats_sorted.index,
    y=team_stats_sorted['Waived Players'],
    name='Waived Players',
    marker_color='salmon'
))

# Update layout
fig.update_layout(
    title='Draft Picks vs. Waived Players by Team (2020–2024)',
    xaxis_title='Team',
    yaxis_title='Count',
    barmode='group',
    xaxis_tickangle=-45,
    hovermode='x unified',
    height=500,
    width=1000
)

fig.show()


## Draft Round Waiver Risk

This section visualizes the waiver rate for each draft round. It's no surprise that the last round of the draft has the highest waiver rate.
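The `round_stats` table used in the next cell is not built in any cell shown here. As a minimal sketch of how such a table could be derived, the example below groups a toy frame by round and computes a waiver rate. The toy rows and the assumption that a boolean `Waived` column already exists are illustrative only.

```python
import pandas as pd

# Toy stand-in for the draft table; values are made up for illustration
toy = pd.DataFrame({
    'round':  [1, 1, 2, 2, 3, 3, 3, 3],
    'Waived': [False, False, True, False, True, True, True, False],
})
# Cast the flag to int so that summing it counts waived players
toy['Waived'] = toy['Waived'].astype(int)

round_stats = (
    toy.groupby('round')['Waived']
       .agg(Drafted='count', Waived='sum')   # picks per round, waived per round
       .reset_index()
       .rename(columns={'round': 'Round'})
)
round_stats['Waiver Rate (%)'] = (
    round_stats['Waived'] / round_stats['Drafted'] * 100
).round(1)

print(round_stats)
```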

In [142]:
import plotly.express as px

# Make sure round is treated as a string for x-axis clarity
round_stats['Round'] = round_stats['Round'].astype(str)

fig = px.bar(
    round_stats,
    x='Round',
    y='Waiver Rate (%)',
    text='Waiver Rate (%)',
    title='Waiver Rate by Draft Round (2020–2024)',
    labels={'Round': 'Draft Round', 'Waiver Rate (%)': 'Waiver Rate (%)'},
    color='Waiver Rate (%)',
    color_continuous_scale='Blues'
)

fig.update_traces(texttemplate='%{text}%', textposition='outside')
fig.update_layout(
    xaxis_title='Draft Round',
    yaxis_title='Waiver Rate (%)',
    uniformtext_minsize=8,
    uniformtext_mode='hide',
    height=400
)

fig.show()



## Waiver Trends by Position

This section compares drafted and waived counts by position. Guards are both the most drafted and the most waived position. Possible explanations include:

- A surplus of guard talent in the draft
- Limited guard roster spots on WNBA teams
- More variability in guard performance
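The `position_df` table plotted in the next cell is not constructed in the cells shown. A minimal sketch of one way to build it, mirroring the `value_counts` approach from the team section, is given here with made-up positions standing in for `f_all_2020_2024['position']` and `f_waived['Position']`.

```python
import pandas as pd

# Toy stand-ins for the drafted and waived position columns (made-up values)
drafted_positions = pd.Series(['guard', 'guard', 'guard', 'forward', 'forward', 'center'])
waived_positions = pd.Series(['guard', 'guard', 'forward'])

# Align the two counts on position; positions with no waived players become 0
position_df = pd.DataFrame({
    'Drafted': drafted_positions.value_counts(),
    'Waived': waived_positions.value_counts(),
}).fillna(0).astype(int)

# A rate separates "waived often" from "drafted often"
position_df['Waiver Rate (%)'] = (
    position_df['Waived'] / position_df['Drafted'] * 100
).round(1)

print(position_df)
```

Raw counts alone can mislead: a position that is drafted the most will tend to be waived the most in absolute terms, so the rate column is the fairer comparison.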

In [156]:
import plotly.graph_objects as go


# Reindex to ensure consistent order
position_df = position_df.sort_index()

# Create the interactive bar chart
fig = go.Figure()

fig.add_trace(go.Bar(
    x=position_df.index,
    y=position_df['Drafted'],
    name='Drafted',
    marker_color='skyblue'
))

fig.add_trace(go.Bar(
    x=position_df.index,
    y=position_df['Waived'],
    name='Waived',
    marker_color='salmon'
))

fig.update_layout(
    title='Drafted vs. Waived Players by Position (2020–2024)',
    xaxis_title='Position',
    yaxis_title='Number of Players',
    barmode='group',
    xaxis_tickangle=0,
    hovermode='x unified',
    height=400,
    width=700
)

fig.show()


# Predictive Modeling

This is where things get a bit interesting! In the next few sections, I train a Random Forest classifier to predict a player's odds of being waived based on:

- Draft Round
- Position
- Team

In [144]:
df_model = f_all_2020_2024.copy()

# Keep only the modeling columns and rename round_pick to round.
# 'Waived' must already exist on f_all_2020_2024 at this point.
df_model = df_model[['round_pick', 'team', 'position', 'Waived']].rename(columns={'round_pick': 'round'})


df_model

| | round | team | position | Waived |
|---|---|---|---|---|
| 0 | 1 | INDIANA FEVER | guard | False |
| 1 | 1 | LOS ANGELES SPARKS | forward | False |
| 2 | 1 | CHICAGO SKY | center | False |
| 3 | 1 | LOS ANGELES SPARKS | forward | False |
| 4 | 1 | DALLAS WINGS | guard | False |
| ... | ... | ... | ... | ... |
| 175 | 3 | CHICAGO SKY | forward | True |
| 176 | 3 | LAS VEGAS ACES | forward | True |
| 177 | 3 | LOS ANGELES SPARKS | guard | True |
| 178 | 3 | CONNECTICUT SUN | guard | True |
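The `Waived` column in the table above is not created in any cell shown here. One plausible way to derive it is to flag each draftee whose name appears in the waived list. The sketch below assumes hypothetical name columns called `player` and `Player`; the real CSVs may use different column names, and the rows are placeholders.

```python
import pandas as pd

# Toy frames with placeholder names; 'player' and 'Player' are assumed column names
drafted = pd.DataFrame({
    'player': ['A. Example', 'B. Sample', 'C. Placeholder'],
    'round_pick': [1, 2, 3],
    'team': ['INDIANA FEVER', 'DALLAS WINGS', 'CONNECTICUT SUN'],
    'position': ['guard', 'forward', 'guard'],
})
waived = pd.DataFrame({'Player': [' c. placeholder ']})

# Normalize names on both sides so case and stray spaces don't cause misses
waived_names = set(waived['Player'].str.strip().str.lower())
drafted['Waived'] = drafted['player'].str.strip().str.lower().isin(waived_names)

print(drafted[['player', 'Waived']])
```

Matching on name alone would mislabel two different players who share a name, so adding draft year to the key is worth considering if the data allows it.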


## Model Prep
Before training a model, I encoded the categorical features team and position using LabelEncoder from scikit-learn. This converts the string values into numeric format so they can be used in machine learning models.

The features used for prediction include the draft round, encoded team, and encoded position. The target variable is whether or not a player was waived.

In [147]:
from sklearn.preprocessing import LabelEncoder

le_team = LabelEncoder()
le_position = LabelEncoder()

df_model['team_encoded'] = le_team.fit_transform(df_model['team'].str.lower())
df_model['position_encoded'] = le_position.fit_transform(df_model['position'].str.lower())


X = df_model[['round', 'team_encoded', 'position_encoded']]
y = df_model['Waived']
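To see what the encoding step does, here is a small sketch with a handful of team names. It shows that `LabelEncoder` assigns integer codes in sorted order of the labels, and that asking it to transform a label it never saw during fitting raises a `ValueError`. The team list is illustrative, not the full set in the data.

```python
from sklearn.preprocessing import LabelEncoder

teams = ['dallas wings', 'connecticut sun', 'atlanta dream', 'dallas wings']

le = LabelEncoder()
codes = le.fit_transform(teams)

# classes_ holds the sorted unique labels; a label's code is its position in classes_
print(list(le.classes_))
print(codes.tolist())

# A label that was not present during fit cannot be encoded
try:
    le.transform(['golden state valkyries'])
    unseen_ok = True
except ValueError as err:
    unseen_ok = False
    print(f"Unseen label rejected: {err}")
```

The integer codes carry an arbitrary alphabetical order. The scikit-learn documentation describes `LabelEncoder` as intended for target labels, so `OrdinalEncoder` or `OneHotEncoder` may be worth trying for the feature columns.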


## Model Training

I split the data into training and testing sets using an 80/20 split to evaluate model performance on unseen data. Then, I trained a RandomForestClassifier using the training set. I set class_weight to "balanced" to account for any imbalance between waived and non-waived players, helping the model treat both classes more equally.

In [148]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42, class_weight='balanced')
model.fit(X_train, y_train)
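The held-out test set is not scored in the cells shown here. As a minimal sketch of what that evaluation could look like, the example below trains the same kind of classifier on synthetic data and reports accuracy plus per-class precision and recall. The data is randomly generated for illustration, the `stratify` argument is an addition that is not in the original split, and none of the printed numbers say anything about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (NOT the real draft data)
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    'round': rng.integers(1, 4, size=n),            # rounds 1-3
    'team_encoded': rng.integers(0, 12, size=n),
    'position_encoded': rng.integers(0, 4, size=n),
})
# In this made-up data, later rounds are waived more often
y_demo = rng.random(n) < X_demo['round'].to_numpy() / 4

# stratify keeps the waived/kept ratio similar in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(random_state=42, class_weight='balanced')
clf.fit(X_tr, y_tr)

y_pred = clf.predict(X_te)
demo_accuracy = accuracy_score(y_te, y_pred)
print(f"Accuracy: {demo_accuracy:.2f}")
print(classification_report(y_te, y_pred, zero_division=0))
```

With only a few hundred rows, a single 80/20 split leaves a small test set, so the scores can swing noticeably with the random seed.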

## Prediction Function
This function, predict_waiver, takes a team name, player position, and draft round to estimate the likelihood that a player will be waived. It encodes the input values using the same label encoders used during training and feeds the resulting data into the trained RandomForestClassifier.

The model returns the probability of a waiver. If the probability exceeds a specified threshold (default is 0.5), the function predicts that the player will be waived. The output includes a formatted summary showing the input details, the prediction, and the waiver probability.

In [149]:
def predict_waiver(team_name, position_name, round_number, threshold=0.5):
    """
    Predicts whether a WNBA draft pick will be waived based on team, position, and round (1-3).
    """
    try:
        team_encoded = le_team.transform([team_name.lower()])[0]
        position_encoded = le_position.transform([position_name.lower()])[0]

        new_player = pd.DataFrame({
            'round': [round_number],
            'team_encoded': [team_encoded],
            'position_encoded': [position_encoded]
        })

        probability = model.predict_proba(new_player)[0][1]  # probability of being waived
        prediction = "Yes" if probability > threshold else "No"

        summary = (
            f"{team_name.title()} | {position_name.title()} | Round {round_number}\n"
            f"Prediction: {'Waived' if prediction == 'Yes' else 'Kept on roster'}\n"
            f"Waiver Probability: {round(probability * 100, 2)}%"
        )


        return summary

    except Exception as e:
        # Return a plain string so callers always get the same type back
        return f"Error: {e}"
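Two details of the function are worth illustrating: `predict_proba(...)[0][1]` relies on the "waived" class being the second column, and the threshold comparison is strict. The sketch below, on a tiny made-up training set, looks the column up through `classes_` rather than hard-coding it and isolates the threshold rule in a hypothetical helper named `decide`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training set: one feature (draft round), boolean target
X_toy = np.array([[1], [1], [2], [2], [3], [3]])
y_toy = np.array([False, False, False, True, True, True])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_toy, y_toy)

# Columns of predict_proba follow clf.classes_, so find the True column by label
waived_col = list(clf.classes_).index(True)
proba = clf.predict_proba([[3]])[0]
p_waived = proba[waived_col]
print(f"P(waived | round 3) = {p_waived:.2f}")

def decide(probability, threshold=0.5):
    # Hypothetical helper: strict '>' means a probability equal to the threshold is "kept"
    return 'Waived' if probability > threshold else 'Kept on roster'

print(decide(p_waived))
print(decide(0.6, threshold=0.7))
```

Raising the threshold makes the "Waived" label harder to trigger, which trades fewer false alarms for more missed waivers.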

## Usage!

Let's test it out with the following teams: Connecticut Sun, Dallas Wings, and Atlanta Dream. I chose the team with the highest waiver rate, a team in the middle, and the team with the lowest waiver rate.

In [157]:
print(predict_waiver('connecticut sun', 'guard', 3))
print(predict_waiver('dallas wings', 'guard', 3))
print(predict_waiver('atlanta dream', 'guard', 3))


Connecticut Sun | Guard | Round 3
Prediction: Waived
Waiver Probability: 99.66%
Dallas Wings | Guard | Round 3
Prediction: Waived
Waiver Probability: 59.48%
Atlanta Dream | Guard | Round 3
Prediction: Kept on roster
Waiver Probability: 48.39%
