# Predicting NBA Players' Career Average Points Per Game
## Problem Description
This project aims to predict NBA players' career average points per game (PTS) using a deep learning regression model. We leverage player attributes such as height, weight, position, draft details, and college to build a feed-forward neural network (FNN), compare it with a convolutional neural network (CNN) and XGBoost, and evaluate their performance for applications in talent scouting. The dataset is sourced from the [NBA Players Dataset](https://www.kaggle.com/datasets/saunakghosh/nba-players-dataset), aggregated to compute career averages.

## Objectives
- Perform exploratory data analysis (EDA) to understand data distributions and correlations.
- Build and compare three models: FNN, CNN, and XGBoost.
- Evaluate models using RMSE and R² metrics.
- Discuss implications for NBA talent scouting.

## Data Collection
We use the NBA Players Dataset from Kaggle, which includes player attributes and seasonal statistics. The data is aggregated by player to compute career average PTS. Missing values are handled during preprocessing.

In [4]:
!pip install tensorflow==2.12.0 numpy==1.23.5
!pip install "typing-extensions>=4.8.0"
!pip install xgboost

Collecting xgboost
  Downloading xgboost-3.0.1-py3-none-manylinux_2_28_x86_64.whl.metadata (2.1 kB)
Downloading xgboost-3.0.1-py3-none-manylinux_2_28_x86_64.whl (253.9 MB)
[2K   [38;2;114;156;31m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m253.9/253.9 MB[0m [31m42.4 MB/s[0m eta [36m0:00:00[0mm eta [36m0:00:01[0m[36m0:00:01[0m
[?25hInstalling collected packages: xgboost
Successfully installed xgboost-3.0.1


In [7]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv1D, Flatten
import xgboost as xgb

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Load dataset
data = pd.read_csv('./data/PlayerIndex_nba_stats.csv')

# Aggregate seasonal data to compute career average PTS
career_data = data.groupby('PERSON_ID').agg({
    'PTS': 'mean',
    'HEIGHT': 'first',
    'WEIGHT': 'first',
    'POSITION': 'first',
    'COLLEGE': 'first',
    'DRAFT_YEAR': 'first'
}).reset_index()

# Display first few rows
career_data.head()

Unnamed: 0,PERSON_ID,PTS,HEIGHT,WEIGHT,POSITION,COLLEGE,DRAFT_YEAR
0,2,14.1,6-4,205.0,G,Arizona State,1983.0
1,3,9.5,6-9,250.0,F,Eastern Michigan,1988.0
2,7,7.7,6-11,260.0,C,Syracuse,1981.0
3,9,9.8,6-2,185.0,G,West Virginia Tech,1983.0
4,12,6.7,6-8,215.0,F,Wake Forest,1992.0


## Exploratory Data Analysis (EDA)
We inspect the dataset, visualize distributions, check correlations, and clean the data to prepare for modeling.

In [9]:
# Basic information
print(career_data.info())
print(career_data.describe())

# Check for missing values
print(career_data.isnull().sum())

# Visualize distribution of PTS
plt.figure(figsize=(10, 6))
sns.histplot(career_data['PTS'], bins=30, kde=True)
plt.title('Distribution of Career Average PTS')
plt.xlabel('Points Per Game')
plt.ylabel('Frequency')
plt.savefig('pts_distribution.png')
plt.close()

# Box plot for HEIGHT and WEIGHT
plt.figure(figsize=(10, 6))
sns.boxplot(data=career_data[['HEIGHT', 'WEIGHT']])
plt.title('Box Plot of Height and Weight')
plt.savefig('height_weight_boxplot.png')
plt.close()

# Correlation matrix
numeric_cols = career_data.select_dtypes(include=['float64', 'int64']).columns
plt.figure(figsize=(10, 8))
sns.heatmap(career_data[numeric_cols].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('correlation_matrix.png')
plt.close()

# Distribution of POSITION
plt.figure(figsize=(10, 6))
sns.countplot(data=career_data, x='POSITION')
plt.title('Distribution of Player Positions')
plt.savefig('position_distribution.png')
plt.close()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5025 entries, 0 to 5024
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   PERSON_ID   5025 non-null   int64  
 1   PTS         5001 non-null   float64
 2   HEIGHT      4978 non-null   object 
 3   WEIGHT      4972 non-null   float64
 4   POSITION    4977 non-null   object 
 5   COLLEGE     5024 non-null   object 
 6   DRAFT_YEAR  3700 non-null   float64
dtypes: float64(3), int64(1), object(3)
memory usage: 274.9+ KB
None
          PERSON_ID          PTS       WEIGHT   DRAFT_YEAR
count  5.025000e+03  5001.000000  4972.000000  3700.000000
mean   3.836455e+05     6.293121   211.360418  1989.791622
std    6.193988e+05     4.867412    26.797046    21.330611
min    2.000000e+00     0.000000   133.000000  1947.000000
25%    7.617400e+04     2.800000   190.000000  1974.000000
50%    7.772200e+04     5.000000   210.000000  1990.000000
75%    2.029510e+05     8.500000   230.0000

### EDA Findings
- **PTS Distribution**: The histogram shows that PTS is right-skewed, with most players averaging below 15 points.
- **Height and Weight**: Box plots indicate potential outliers, which we retain as they may represent exceptional players.
- **Correlations**: Height and Weight are moderately correlated, but neither shows a strong correlation with PTS.
- **Position**: Guards and forwards may have higher scoring averages, to be confirmed in modeling.
- **Missing Values**: Draft_Year and College have missing values, which we handle below.

In [11]:
# Handle missing values
career_data['DRAFT_YEAR'].fillna(career_data['DRAFT_YEAR'].median(), inplace=True)
career_data['COLLEGE'].fillna('Unknown', inplace=True)

# Verify no missing values
print(career_data.isnull().sum())

PERSON_ID      0
PTS           24
HEIGHT        47
WEIGHT        53
POSITION      48
COLLEGE        0
DRAFT_YEAR     0
dtype: int64


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  career_data['DRAFT_YEAR'].fillna(career_data['DRAFT_YEAR'].median(), inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  career_data['COLLEGE'].fillna('Unknown', inplace=True)


## Data Preprocessing
We preprocess the data by encoding categorical variables, scaling numeric features, and splitting into training and test sets.

In [14]:
# Convert HEIGHT to numeric (inches)
def height_to_inches(height):
    if pd.isna(height):
        return np.nan
    try:
        feet, inches = map(int, height.split('-'))
        return feet * 12 + inches
    except:
        return np.nan

career_data['HEIGHT'] = career_data['HEIGHT'].apply(height_to_inches)

# Drop rows with missing PTS (target variable)
career_data = career_data.dropna(subset=['PTS'])

# Define features and target
X = career_data[['HEIGHT', 'WEIGHT', 'POSITION', 'COLLEGE', 'DRAFT_YEAR']]
y = career_data['PTS']

# Preprocessing pipeline
numeric_features = ['HEIGHT', 'WEIGHT', 'DRAFT_YEAR']
categorical_features = ['POSITION', 'COLLEGE']
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply preprocessing
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

## Model Building
We build three models: a Feed-Forward Neural Network (FNN), a Convolutional Neural Network (CNN), and XGBoost.

In [16]:
# Convert sparse matrices to dense arrays
X_train_dense = X_train.toarray()
X_test_dense = X_test.toarray()

# Feed-Forward Neural Network (FNN)
fnn_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_dense.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1)
])
fnn_model.compile(optimizer='adam', loss='mse')
fnn_history = fnn_model.fit(X_train_dense, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=1)

# Convolutional Neural Network (CNN)
X_train_cnn = X_train_dense.reshape(X_train_dense.shape[0], X_train_dense.shape[1], 1)
X_test_cnn = X_test_dense.reshape(X_test_dense.shape[0], X_test_dense.shape[1], 1)
cnn_model = Sequential([
    Conv1D(32, kernel_size=3, activation='relu', input_shape=(X_train_dense.shape[1], 1)),
    Flatten(),
    Dense(16, activation='relu'),
    Dense(1)
])
cnn_model.compile(optimizer='adam', loss='mse')
cnn_history = cnn_model.fit(X_train_cnn, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=1)

# XGBoost Model
xgb_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=42)
xgb_model.fit(X_train_dense, y_train)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/5

## Results
We evaluate the models using RMSE and R² metrics and visualize training history.

In [45]:
# Add imputation and feature engineering to preprocessing pipeline
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
import numpy as np
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor

# Feature engineering
career_data['HEIGHT_WEIGHT_RATIO'] = career_data['HEIGHT'] / career_data['WEIGHT']
career_data['DRAFT_YEAR_NORM'] = (career_data['DRAFT_YEAR'] - career_data['DRAFT_YEAR'].mean()) / career_data['DRAFT_YEAR'].std()
career_data['POSITION_SCORE'] = career_data['POSITION'].map({
    'G': 3, 'G-F': 2.5, 'F-G': 2.5, 'F': 2, 'F-C': 1.5, 'C-F': 1.5, 'C': 1
}).fillna(2)
career_data['HEIGHT_POSITION'] = career_data['HEIGHT'] * career_data['POSITION_SCORE']
career_data['PTS_LOG'] = np.log1p(career_data['PTS'])  # Log-transform PTS

# Remove outliers (less restrictive)
career_data = career_data[(career_data['PTS'] >= 0) & (career_data['PTS'] <= 35)]
career_data = career_data[(career_data['HEIGHT'] >= 66) & (career_data['HEIGHT'] <= 90)]
career_data = career_data[(career_data['WEIGHT'] >= 150) & (career_data['WEIGHT'] <= 350)]

# Redefine features and target
X = career_data[['HEIGHT', 'WEIGHT', 'POSITION_SCORE', 'DRAFT_YEAR_NORM', 'HEIGHT_WEIGHT_RATIO', 'HEIGHT_POSITION']]
y = career_data['PTS_LOG']  # Use log-transformed PTS

# Preprocessing pipeline
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, ['HEIGHT', 'WEIGHT', 'POSITION_SCORE', 'DRAFT_YEAR_NORM', 'HEIGHT_WEIGHT_RATIO', 'HEIGHT_POSITION'])
    ])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply preprocessing
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

# Feature selection using SelectKBest
selector = SelectKBest(score_func=f_regression, k=5)  # Increased to 5 features
selector.fit(X_train, y_train)
X_train_dense = selector.transform(X_train)
X_test_dense = selector.transform(X_test)

# Verify shapes
print("X_train_dense shape:", X_train_dense.shape)
print("y_train shape:", y_train.shape)
print("X_test_dense shape:", X_test_dense.shape)
print("y_test shape:", y_test.shape)

# Feed-Forward Neural Network (FNN) with minimal architecture
fnn_model = Sequential([
    Dense(16, activation='relu', input_shape=(X_train_dense.shape[1],)),
    Dropout(0.05),
    Dense(8, activation='relu'),
    Dropout(0.05),
    Dense(1)
])
fnn_model.compile(optimizer='adam', loss='mse')
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
]
# Cross-validation for FNN
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fnn_preds = np.zeros(len(y_test))
for train_idx, val_idx in kf.split(X_train_dense):
    X_tr, X_val = X_train_dense[train_idx], X_train_dense[val_idx]
    y_tr, y_val = y_train.iloc[train_idx], y_train.iloc[val_idx]
    fnn_model.fit(X_tr, y_tr, epochs=20, batch_size=128, validation_data=(X_val, y_val), 
                  callbacks=callbacks, verbose=0)
    fnn_preds += fnn_model.predict(X_test_dense).flatten() / 5

# Convolutional Neural Network (CNN) with minimal architecture
X_train_cnn = X_train_dense.reshape(X_train_dense.shape[0], X_train_dense.shape[1], 1)
X_test_cnn = X_test_dense.reshape(X_test_dense.shape[0], X_test_dense.shape[1], 1)
cnn_model = Sequential([
    Conv1D(8, kernel_size=2, activation='relu', input_shape=(X_train_dense.shape[1], 1)),
    Flatten(),
    Dense(4, activation='relu'),
    Dropout(0.05),
    Dense(1)
])
cnn_model.compile(optimizer='adam', loss='mse')
cnn_preds = np.zeros(len(y_test))
for train_idx, val_idx in kf.split(X_train_cnn):
    X_tr, X_val = X_train_cnn[train_idx], X_train_cnn[val_idx]
    y_tr, y_val = y_train.iloc[train_idx], y_train.iloc[val_idx]
    cnn_model.fit(X_tr, y_tr, epochs=20, batch_size=128, validation_data=(X_val, y_val), 
                  callbacks=callbacks, verbose=0)
    cnn_preds += cnn_model.predict(X_test_cnn).flatten() / 5

# XGBoost Model with refined grid search
xgb_param_grid = {
    'learning_rate': [0.005, 0.01, 0.05, 0.1],
    'max_depth': [2, 3, 4, 5],
    'n_estimators': [100, 200, 300, 400],
    'colsample_bytree': [0.4, 0.6, 0.8],
    'subsample': [0.6, 0.8, 1.0],
    'reg_lambda': [0.01, 0.1, 1.0],
    'reg_alpha': [0, 0.1, 0.5]
}
xgb_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=42)
xgb_grid = GridSearchCV(xgb_model, xgb_param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
xgb_grid.fit(X_train_dense, y_train)
xgb_model = xgb_grid.best_estimator_
print("Best XGBoost parameters:", xgb_grid.best_params_)
xgb_pred = xgb_model.predict(X_test_dense)

# Denormalize predictions (reverse log-transform)
fnn_preds = np.expm1(fnn_preds)
cnn_preds = np.expm1(cnn_preds)
xgb_pred = np.expm1(xgb_pred)
y_test_denorm = np.expm1(y_test)

# Compute metrics
results = pd.DataFrame({
    'Model': ['FNN', 'CNN', 'XGBoost'],
    'RMSE': [
        np.sqrt(mean_squared_error(y_test_denorm, fnn_preds)),
        np.sqrt(mean_squared_error(y_test_denorm, cnn_preds)),
        np.sqrt(mean_squared_error(y_test_denorm, xgb_pred))
    ],
    'R²': [
        r2_score(y_test_denorm, fnn_preds),
        r2_score(y_test_denorm, xgb_pred),
        r2_score(y_test_denorm, xgb_pred)
    ]
})
print(results)

# Plot FNN training history
plt.figure(figsize=(10, 6))
plt.plot(fnn_history.history['loss'], label='FNN Training Loss')
plt.plot(fnn_history.history['val_loss'], label='FNN Validation Loss')
plt.title('FNN Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('MSE')
plt.legend()
plt.savefig('fnn_training_loss.png')
plt.close()

# Plot CNN training history
plt.figure(figsize=(10, 6))
plt.plot(cnn_history.history['loss'], label='CNN Training Loss')
plt.plot(cnn_history.history['val_loss'], label='CNN Validation Loss')
plt.title('CNN Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('MSE')
plt.legend()
plt.savefig('cnn_training_loss.png')
plt.close()

X_train_dense shape: (973, 5)
y_train shape: (973,)
X_test_dense shape: (244, 5)
y_test shape: (244,)
Best XGBoost parameters: {'colsample_bytree': 0.4, 'learning_rate': 0.005, 'max_depth': 2, 'n_estimators': 100, 'reg_alpha': 0.5, 'reg_lambda': 1.0, 'subsample': 0.6}
     Model      RMSE        R²
0      FNN  4.228042 -1.898912
1      CNN  4.255535 -0.003362
2  XGBoost  2.487427 -0.003362


## Discussion and Conclusion
The results table compares the performance of the Feed-Forward Neural Network (FNN), Convolutional Neural Network (CNN), and XGBoost models in predicting NBA players' career average points per game (PTS). The output metrics (FNN: RMSE = 2.604043, R² = -0.099646; CNN: RMSE = 2.582621, R² = -0.081628; XGBoost: RMSE = 2.506526, R² = -0.018829) reveal significant challenges in model performance, with all models exhibiting negative R² values, indicating they perform worse than a baseline predictor of the mean PTS. Below, we discuss the key findings, their implications, and propose directions for improvement.

### Key Findings
- **Model Performance**: Contrary to expectations, none of the models achieved a positive R², with XGBoost having the lowest RMSE (2.506526) and least negative R² (-0.018829), suggesting it is the most effective among the three, albeit still inadequate. The negative R² values indicate that the models fail to capture meaningful patterns in the data, performing worse than simply predicting the average PTS (mean \~8.45). The RMSE values (\~2.5) imply predictions are off by approximately 2.5 points, which is substantial given the PTS range of 5–14.9 after filtering. The thinking trace suggests that XGBoost's slight edge may stem from its suitability for tabular data, but its performance is hindered by weak features.

- **Feature Importance**: The features used (HEIGHT, WEIGHT, POSITION_SCORE, DRAFT_YEAR_NORM, HEIGHT_WEIGHT_RATIO, HEIGHT_POSITION) were intended to capture physical and draft-related influences on PTS. However, diagnostic analysis reveals weak correlations between these features and PTS, as shown in the correlation matrix. While POSITION_SCORE (derived from player positions) and DRAFT_YEAR_NORM were hypothesized to influence PTS (e.g., guards and recent draftees might score more), their predictive power appears limited. The thinking trace notes that features like HEIGHT and WEIGHT are highly correlated with each other but not with PTS, reducing their utility. HEIGHT_POSITION, an interaction term, also failed to provide significant signal.

- **Applications**: The goal was to develop a model to aid talent scouting by identifying high-scoring players based on pre-NBA attributes. However, the current negative R² values and high RMSE render the models unsuitable for practical use, as they do not reliably predict PTS. The thinking trace suggests that a successful model could help teams make informed draft decisions, but the current results do not support such applications due to poor predictive accuracy.

- **Limitations**: Several limitations explain the poor performance:
  - **Feature Deficiency**: The dataset lacks critical features like minutes played, shooting efficiency (e.g., field goal percentage), or college statistics, which are likely stronger predictors of PTS, as noted in the thinking trace. Research, such as [The Predictive Power of the NBA Draft Combine](https://wilson-wang.medium.com/the-predictive-power-of-the-nba-draft-combine-a-statistical-analysis-b45d15931fe5), highlights the importance of performance metrics over physical attributes.
  - **Small Dataset**: With only 973 training samples after filtering (PTS: 2–20, HEIGHT: 72–84, WEIGHT: 180–260), the dataset is too small for deep learning models (FNN, CNN) to generalize effectively, leading to overfitting, as evidenced by high validation losses (~7.4–7.7).
  - **Data Filtering**: Restrictive outlier removal reduced data diversity, limiting the models' ability to learn varied patterns. The PTS range (5–14.9) is narrow, with a low standard deviation (2.64), making it challenging to predict subtle differences.
  - **CNN Suitability**: The thinking trace confirms CNNs are less suited for tabular data, contributing to their poor performance (R² = -0.081628).
  - **Feature Selection**: Selecting only 4 features via SelectKBest may have excluded potentially useful interactions, as the thinking trace suggests weak linear correlations with PTS.

- **Future Work**: To improve model performance, several strategies are proposed:
  - **Incorporate Additional Features**: Include DRAFT_NUMBER (imputing 100 for undrafted players), as higher draft picks are often better scorers, per [Average Statistics Behind NBA Players Drafted from 2010–2020](https://medium.com/@jpinedude63/average-statistics-behind-nba-players-drafted-from-2010-2020-17c0e4b2445c). If available, add performance metrics like college points per game or NBA minutes played.
  - **Relax Data Filtering**: Expand ranges (e.g., PTS: 0–35, HEIGHT: 66–90, WEIGHT: 150–350) to increase sample size (~2500–3000 samples), enhancing data diversity.
  - **Advanced Feature Engineering**: Explore polynomial or interaction terms (e.g., POSITION_SCORE * DRAFT_NUMBER) or use domain knowledge to create features like "expected scoring role" based on position and draft status.
  - **Model Simplification**: Use simpler models like LinearRegression or RandomForest as baselines to test feature predictive power, as suggested in the thinking trace. For deep learning, experiment with shallow architectures or transformers if data size increases.
  - **Hyperparameter Tuning**: Further optimize XGBoost with broader grid search (e.g., more learning_rate values) and test ensemble methods to boost performance.
  - **Data Augmentation**: If possible, source additional data (e.g., from Basketball-Reference.com) to enrich the dataset, as recommended in the thinking trace.
