# NBA Player Seasons Count Dataset

**Overview**:
This dataset pairs each NBA player's debut-season statistics with the number of seasons that player went on to play in the NBA. It is meant for studying how first-season performance relates to career longevity.

**Description**:
The dataset is derived from season-average statistics of NBA players. Each row holds one player's metrics from their first NBA season. These metrics are used to predict how many further seasons the player goes on to play in the NBA, which serves as a measure of career longevity.

**Data Filtering and Preprocessing**:
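- The data has been filtered to NBA seasons only (the `Lg` column); seasons from other leagues such as the ABA and BAA are excluded.
- Any player who appeared in at least one of the last five seasons (2018-19 through 2022-23) is treated as potentially active and excluded, because their career is not yet finished and their season count would understate the true value.
- Players who played for more than one team in their debut season have been removed. Such a season is stored as several rows (a combined `TOT` row plus one row per team), so keeping it would give the player multiple, overlapping debut records.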
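
The team-change rule can be illustrated on a few hypothetical rows. This sketch assumes, as in the raw file, that a traded player's season is stored as a `TOT` row plus one row per team; the player names are made up, and it uses `groupby(...).transform("min")` as an equivalent alternative to the merge used in the processing cell.

```python
import pandas as pd

# Hypothetical rows: "Player A" was traded during his 2006-07 debut season,
# so that season appears as a TOT row plus one row per team.
rows = pd.DataFrame({
    "Name":   ["Player A", "Player A", "Player A", "Player A", "Player B", "Player B"],
    "Season": ["2006-07", "2006-07", "2006-07", "2007-08", "1999-00", "2000-01"],
    "Tm":     ["TOT", "IND", "GSW", "GSW", "CLE", "CLE"],
})

# Start year of each season, e.g. "2006-07" -> 2006.
rows["Start Year"] = rows["Season"].str.split("-").str[0].astype(int)

# Keep only the rows that belong to each player's earliest season.
debut = rows[rows["Start Year"] == rows.groupby("Name")["Start Year"].transform("min")]

# More than one row in the debut season means the player changed teams.
rows_per_debut = debut.groupby("Name").size()
multi_team = rows_per_debut[rows_per_debut > 1].index.tolist()

single_team_debuts = debut[~debut["Name"].isin(multi_team)]
print(multi_team)  # ['Player A']
```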

**Primary Task**:
Using this dataset, the main objective is to predict, from a player's debut-season statistics, how many more seasons that player will play in the NBA. This is a regression problem: the debut-season metrics are the features, and the number of distinct NBA seasons played after the debut (the `Future Seasons Played` column) is the target.
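
A minimal sketch of this regression setup, on hypothetical rows that keep only a few of the real columns. It treats numeric columns as features and compares a constant-mean baseline with an ordinary-least-squares fit; the toy numbers and the choice of baselines are illustrative assumptions, not part of this dataset's pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the processed file (subset of columns).
df = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "Pos":  ["PG", "C", "SF", "PF"],
    "MP":   [10.0, 20.0, 30.0, 15.0],
    "PTS":  [3.0, 8.0, 14.0, 5.0],
    "Future Seasons Played": [1, 5, 9, 3],
})

target = "Future Seasons Played"
# Numeric debut-season columns become features; text columns (Name, Pos) are skipped.
X = df.drop(columns=[target]).select_dtypes(include="number")
y = df[target].to_numpy(dtype=float)

# Baseline 1: always predict the mean number of future seasons.
baseline_mae = np.mean(np.abs(y - y.mean()))

# Baseline 2: ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
ols_mae = np.mean(np.abs(y - A @ coef))  # in-sample only; real use needs a held-out split

print(f"mean-baseline MAE: {baseline_mae:.2f}, OLS MAE: {ols_mae:.2f}")
```

On the real file, the numeric `Unnamed: 0` column would also be picked up by `select_dtypes` and should be dropped first, and some columns (for example shooting percentages) may contain missing values that `lstsq` cannot handle.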

```python
import numpy as np
import pandas as pd
import os
```

```python
dataset_name = 'nba_seasons_count'
```

```python
input_dir = './data'
seasons_fname = "player_season_averages.csv"
output_dir = f'./../../processed/{dataset_name}/'
outp_fname = os.path.join(output_dir, f'{dataset_name}.csv')
```

# Read Data

```python
seasons = pd.read_csv(os.path.join(input_dir, seasons_fname))
seasons.head()
```

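| Unnamed: 0 | Name | Season | Age | Tm | Lg | Pos | G | MP | FG | FGA | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | GS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Šarūnas Jasikevičius | 2005-06 | 29.0 | IND | NBA | PG | 75 | 20.8 | 2.3 | 5.7 | ... | 0.3 | 1.8 | 2.0 | 3.0 | 0.5 | 0.1 | 1.5 | 1.4 | 7.3 | 15.0 |
| 1 | Šarūnas Jasikevičius | 2006-07 | 30.0 | TOT | NBA | PG,SG | 63 | 15.4 | 2.0 | 5.1 | ... | 0.2 | 0.9 | 1.1 | 2.7 | 0.4 | 0.0 | 1.4 | 1.2 | 6.1 | 3.0 |
| 2 | Šarūnas Jasikevičius | 2006-07 | 30.0 | IND | NBA | SG | 37 | 17.9 | 2.5 | 6.0 | ... | 0.3 | 1.0 | 1.3 | 3.0 | 0.4 | 0.0 | 1.6 | 1.4 | 7.4 | 1.0 |
| 3 | Šarūnas Jasikevičius | 2006-07 | 30.0 | GSW | NBA | PG | 26 | 11.9 | 1.4 | 3.9 | ... | 0.2 | 0.6 | 0.8 | 2.3 | 0.5 | 0.0 | 1.2 | 0.8 | 4.3 | 2.0 |
| 4 | A.C. Green | 1985-86 | 22.0 | LAL | NBA | PF | 82 | 18.8 | 2.5 | 4.7 | ... | 2.0 | 2.7 | 4.6 | 0.7 | 0.6 | 0.6 | 1.2 | 2.8 | 6.4 | 1.0 |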
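
The first column, `Unnamed: 0`, is the name pandas gives a CSV column with an empty header. It most likely holds a row index written when `player_season_averages.csv` was saved (an assumption; the file's origin is not described here). The sketch below reproduces the effect on a hypothetical two-row frame and shows that `index_col=0` reads such a column back as the index instead.

```python
import io
import pandas as pd

# Hypothetical frame saved with its default integer index (index=True).
original = pd.DataFrame({"Name": ["A", "B"], "PTS": [7.3, 6.4]})
buffer = io.StringIO()
original.to_csv(buffer)

# Reading it back naively turns the saved index into an "Unnamed: 0" column.
buffer.seek(0)
with_extra = pd.read_csv(buffer)
print(with_extra.columns.tolist())  # ['Unnamed: 0', 'Name', 'PTS']

# index_col=0 restores that column as the index rather than a data column.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.columns.tolist())  # ['Name', 'PTS']
```

Applying `index_col=0` in the read cell would keep the column out of the features, but it would also change the column count of the saved file.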


# Process Data

```python
# 1. Keep NBA rows only (drops ABA and BAA seasons)
filtered_seasons = seasons[seasons['Lg'] == 'NBA']

# 2. Exclude possibly active players: anyone with a season from 2018-19 through 2022-23
recent_years = ('2018', '2019', '2020', '2021', '2022')
recent_players = filtered_seasons.loc[filtered_seasons['Season'].str.startswith(recent_years), 'Name'].unique()
# .copy() gives an independent frame, so adding a column below does not trigger SettingWithCopyWarning
filtered_seasons = filtered_seasons[~filtered_seasons['Name'].isin(recent_players)].copy()

# 3. Find each player's debut season and drop players with more than one row in it
#    (a mid-season trade produces a TOT row plus one row per team)
filtered_seasons['Start Year'] = filtered_seasons['Season'].str.split('-').str[0].astype(int)
min_start_years = filtered_seasons.groupby('Name')['Start Year'].min().reset_index()
first_seasons = pd.merge(min_start_years, filtered_seasons, on=['Name', 'Start Year'], how='left')
player_counts_first_season = first_seasons['Name'].value_counts()
multi_team_players_first_season = player_counts_first_season[player_counts_first_season > 1].index.tolist()
first_seasons_filtered = first_seasons[~first_seasons['Name'].isin(multi_team_players_first_season)]

# 4. Target: number of distinct seasons played after the debut
season_counts = filtered_seasons.groupby('Name')['Season'].nunique().reset_index()
season_counts['Season'] = season_counts['Season'] - 1
season_counts = season_counts.rename(columns={'Season': 'Future Seasons Played'})

# 5. Attach the target to each player's debut-season row
final_dataset_seasons = pd.merge(first_seasons_filtered, season_counts, on='Name', how='left')

# 6. Drop the helper column "Start Year" and the now-constant "Lg" column
final_dataset_seasons = final_dataset_seasons.drop(columns=['Start Year', 'Lg'])
```
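
The active-player filter and the target computation can be checked on a handful of hypothetical rows. This sketch swaps the `startswith` test for a numeric comparison on the start year, which selects the same rows as long as the data ends with 2022-23. It also shows that the target counts distinct seasons: a traded season counts once, and a gap year is not counted.

```python
import pandas as pd

# Hypothetical rows: A was traded in 2006-07 (TOT + two team rows),
# B skipped 2001-02, and C still appeared in 2019-20.
raw = pd.DataFrame({
    "Name":   ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Season": ["2005-06", "2006-07", "2006-07", "2006-07",
               "2000-01", "2002-03", "2003-04", "2016-17", "2019-20"],
})

# Vectorized start year: "2006-07" -> 2006.
raw["Start Year"] = raw["Season"].str.split("-").str[0].astype(int)

# Possibly active: any season starting in 2018 or later.
recent = raw.loc[raw["Start Year"] >= 2018, "Name"].unique()
kept = raw[~raw["Name"].isin(recent)].copy()

# Distinct seasons minus the debut season.
future = kept.groupby("Name")["Season"].nunique() - 1
print(future.to_dict())  # {'A': 1, 'B': 2}
```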

```python
final_dataset_seasons.head()
```

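| Unnamed: 0 | Name | Season | Age | Tm | Pos | G | MP | FG | FGA | FG% | ... | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | GS | Future Seasons Played |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A.C. Green | 1985-86 | 22.0 | LAL | PF | 82 | 18.8 | 2.5 | 4.7 | 0.539 | ... | 2.7 | 4.6 | 0.7 | 0.6 | 0.6 | 1.2 | 2.8 | 6.4 | 1.0 | 15 |
| 1 | A.J. Bramlett | 1999-00 | 23.0 | CLE | C | 8 | 7.6 | 0.5 | 2.6 | 0.19 | ... | 1.3 | 2.8 | 0.0 | 0.1 | 0.0 | 0.4 | 1.6 | 1.0 | 0.0 | 0 |
| 2 | A.J. English | 1990-91 | 23.0 | WSB | SG | 70 | 20.6 | 3.6 | 8.2 | 0.439 | ... | 1.2 | 2.1 | 2.5 | 0.4 | 0.2 | 1.6 | 1.8 | 8.8 | 12.0 | 1 |
| 3 | A.J. Guyton | 2000-01 | 22.0 | CHI | PG | 33 | 19.1 | 2.4 | 5.8 | 0.406 | ... | 0.8 | 1.1 | 1.9 | 0.3 | 0.2 | 0.7 | 1.1 | 6.0 | 8.0 | 2 |
| 4 | A.J. Hammons | 2016-17 | 24.0 | DAL | C | 22 | 7.4 | 0.8 | 1.9 | 0.405 | ... | 1.3 | 1.6 | 0.2 | 0.0 | 0.6 | 0.5 | 1.0 | 2.2 | 0.0 | 0 |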
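
A few invariants follow directly from the processing steps, and a small check function can guard them before the file is saved. The function name `check_final_dataset` is hypothetical. The sample rows reuse values from the output above, and the checks rely on Python `assert`, so they raise `AssertionError` on failure.

```python
import pandas as pd

def check_final_dataset(df: pd.DataFrame) -> None:
    """Hypothetical sanity checks for the processed table."""
    # One debut-season row per player once multi-team debuts are removed.
    assert df["Name"].is_unique, "duplicate player rows"
    # The target is a season count, so it can never be negative.
    assert (df["Future Seasons Played"] >= 0).all(), "negative season count"
    # Players with any season from 2018-19 on were excluded, so every debut predates 2018.
    start_years = df["Season"].str.split("-").str[0].astype(int)
    assert (start_years < 2018).all(), "debut season inside the excluded window"
    # The league column is dropped during processing.
    assert "Lg" not in df.columns, "Lg column still present"

# Sample rows taken from the first lines of the processed output.
sample = pd.DataFrame({
    "Name": ["A.C. Green", "A.J. Bramlett", "A.J. English"],
    "Season": ["1985-86", "1999-00", "1990-91"],
    "Future Seasons Played": [15, 0, 1],
})
check_final_dataset(sample)
print("checks passed")
```

Applied to `final_dataset_seasons` in the notebook, the same call would stop the run before a malformed file is written.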


```python
final_dataset_seasons.shape
```

(3261, 31)

# Save Main Data File

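```python
# Create the output folder if it does not exist yet, then save without the pandas index
os.makedirs(output_dir, exist_ok=True)
final_dataset_seasons.to_csv(outp_fname, index=False)
```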
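
Saving with `index=False` is what keeps the output free of an extra `Unnamed: 0`-style column when it is read back. The sketch below checks that round trip in a throwaway directory. The two-row frame stands in for `final_dataset_seasons`, and the nested folder mirrors the notebook's `processed/<dataset_name>/` layout; both are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for final_dataset_seasons (values from the output above).
frame = pd.DataFrame({
    "Name": ["A.C. Green", "A.J. Bramlett"],
    "Season": ["1985-86", "1999-00"],
    "Future Seasons Played": [15, 0],
})

with tempfile.TemporaryDirectory() as tmp:
    # Nested output folder that does not exist yet.
    out_dir = os.path.join(tmp, "processed", "nba_seasons_count")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "nba_seasons_count.csv")
    frame.to_csv(path, index=False)

    # Reading back should give the same table with no extra index column.
    reloaded = pd.read_csv(path)

print(reloaded.equals(frame))  # True
```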