## Luka Teammates Modeling Analysis

### Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

In [3]:
luka_merged_df = pd.read_csv('../outputs/luka_teammate_playtypes_percentiles_minutes_limit.csv')

The following feature list excludes the PG's percentiles because every lineup modeled here already has Luka at PG, and the goal is to learn about his teammates. The model is therefore informed by the offensive and defensive percentiles of the SG, SF, PF, and C.

In [5]:
luka_pg_features = ['SG_Defensive_Handoff',
       'SG_Defensive_OffScreen', 'SG_Defensive_PRBallHandler',
       'SG_Defensive_PRRollMan', 'SG_Defensive_Postup', 'SG_Defensive_Spotup',
       'SG_Offensive_Cut', 'SG_Offensive_Handoff', 'SG_Offensive_Misc',
       'SG_Offensive_OffScreen', 'SG_Offensive_PRBallHandler',
       'SG_Offensive_PRRollMan', 'SG_Offensive_Postup', 'SG_Offensive_Spotup',
       'SF_Defensive_Handoff', 'SF_Defensive_OffScreen',
       'SF_Defensive_PRBallHandler', 'SF_Defensive_PRRollMan',
       'SF_Defensive_Postup', 'SF_Defensive_Spotup', 'SF_Offensive_Cut',
       'SF_Offensive_Handoff', 'SF_Offensive_Misc', 'SF_Offensive_OffScreen',
       'SF_Offensive_PRBallHandler', 'SF_Offensive_PRRollMan',
       'SF_Offensive_Postup', 'SF_Offensive_Spotup', 'PF_Defensive_Handoff',
       'PF_Defensive_OffScreen', 'PF_Defensive_PRBallHandler',
       'PF_Defensive_PRRollMan', 'PF_Defensive_Postup', 'PF_Defensive_Spotup',
       'PF_Offensive_Cut', 'PF_Offensive_Handoff', 'PF_Offensive_Misc',
       'PF_Offensive_OffScreen', 'PF_Offensive_PRBallHandler',
       'PF_Offensive_PRRollMan', 'PF_Offensive_Postup', 'PF_Offensive_Spotup',
       'C_Defensive_Handoff', 'C_Defensive_OffScreen',
       'C_Defensive_PRBallHandler', 'C_Defensive_PRRollMan',
       'C_Defensive_Postup', 'C_Defensive_Spotup', 'C_Offensive_Cut',
       'C_Offensive_Handoff', 'C_Offensive_Misc', 'C_Offensive_OffScreen',
       'C_Offensive_PRBallHandler', 'C_Offensive_PRRollMan',
       'C_Offensive_Postup', 'C_Offensive_Spotup']
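
Because the column names follow a regular `<position>_<side>_<playtype>` pattern, a list like the one above could also be generated rather than typed out. The sketch below is one way to do that; the helper name `build_features` and the constant names are hypothetical and not part of the original notebook.

```python
# Play types as they appear in the column names above
DEFENSIVE_PLAYTYPES = ["Handoff", "OffScreen", "PRBallHandler",
                       "PRRollMan", "Postup", "Spotup"]
OFFENSIVE_PLAYTYPES = ["Cut", "Handoff", "Misc", "OffScreen",
                       "PRBallHandler", "PRRollMan", "Postup", "Spotup"]
TEAMMATE_POSITIONS = ["SG", "SF", "PF", "C"]


def build_features(positions, sides):
    """Return column names ordered by position, then side, then play type."""
    features = []
    for pos in positions:
        for side in sides:
            if side == "Defensive":
                playtypes = DEFENSIVE_PLAYTYPES
            else:
                playtypes = OFFENSIVE_PLAYTYPES
            features.extend(f"{pos}_{side}_{play}" for play in playtypes)
    return features


generated_features = build_features(TEAMMATE_POSITIONS, ["Defensive", "Offensive"])
print(len(generated_features))  # 4 positions * (6 + 8) play types
```

In the notebook, comparing `generated_features` with the hand-written `luka_pg_features` would be a quick guard against typos in either list.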

The following feature list also excludes the PG's percentiles, for the same reason: Luka is already the PG and the goal is to learn about his teammates. This list is restricted to the offensive percentiles of the SG, SF, PF, and C.

In [6]:
luka_pg_offensive_features = [ 'SG_Offensive_Cut', 'SG_Offensive_Handoff', 'SG_Offensive_Misc',
       'SG_Offensive_OffScreen', 'SG_Offensive_PRBallHandler',
       'SG_Offensive_PRRollMan', 'SG_Offensive_Postup', 'SG_Offensive_Spotup', 'SF_Offensive_Cut',
       'SF_Offensive_Handoff', 'SF_Offensive_Misc', 'SF_Offensive_OffScreen',
       'SF_Offensive_PRBallHandler', 'SF_Offensive_PRRollMan',
       'SF_Offensive_Postup', 'SF_Offensive_Spotup',
       'PF_Offensive_Cut', 'PF_Offensive_Handoff', 'PF_Offensive_Misc',
       'PF_Offensive_OffScreen', 'PF_Offensive_PRBallHandler',
       'PF_Offensive_PRRollMan', 'PF_Offensive_Postup', 'PF_Offensive_Spotup',
        'C_Offensive_Cut','C_Offensive_Handoff', 'C_Offensive_Misc', 'C_Offensive_OffScreen',
       'C_Offensive_PRBallHandler', 'C_Offensive_PRRollMan',
       'C_Offensive_Postup', 'C_Offensive_Spotup']

The following feature list covers defensive percentiles only. Unlike the two lists above, it also includes the PG's defensive percentiles (the `PG_Defensive_*` columns) alongside those of the SG, SF, PF, and C, for 30 features in total.

In [7]:
luka_pg_defensive_features = ['PG_Defensive_Handoff', 'PG_Defensive_OffScreen',
       'PG_Defensive_PRBallHandler', 'PG_Defensive_PRRollMan',
       'PG_Defensive_Postup', 'PG_Defensive_Spotup', 'SG_Defensive_Handoff',
       'SG_Defensive_OffScreen', 'SG_Defensive_PRBallHandler',
       'SG_Defensive_PRRollMan', 'SG_Defensive_Postup', 'SG_Defensive_Spotup', 
       'SF_Defensive_Handoff', 'SF_Defensive_OffScreen',
       'SF_Defensive_PRBallHandler', 'SF_Defensive_PRRollMan',
       'SF_Defensive_Postup', 'SF_Defensive_Spotup','PF_Defensive_Handoff',
       'PF_Defensive_OffScreen', 'PF_Defensive_PRBallHandler',
       'PF_Defensive_PRRollMan', 'PF_Defensive_Postup', 'PF_Defensive_Spotup',
       'C_Defensive_Handoff', 'C_Defensive_OffScreen',
       'C_Defensive_PRBallHandler', 'C_Defensive_PRRollMan',
       'C_Defensive_Postup', 'C_Defensive_Spotup']

### Net Rating Models

In [8]:
# Define features and target
X = luka_merged_df[luka_pg_features]
y = luka_merged_df['NET_RATING']
weights = np.sqrt(luka_merged_df['MIN']) # Square root weighting on mins played
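
To make the effect of square-root weighting concrete, here is a small sketch on toy data (the values and the `feat` column are made up for illustration). It shows two things: the square root compresses differences in minutes, and the resulting Series keeps the name `MIN`, so after a `pd.concat` the column called `MIN` holds the square-rooted values, not the raw minutes.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the lineup table (hypothetical values)
toy_df = pd.DataFrame({
    "feat": [50.0, 60.0, np.nan, 80.0],
    "NET_RATING": [1.0, -2.0, 3.0, 4.0],
    "MIN": [16.0, 25.0, 36.0, 100.0],
})

toy_weights = np.sqrt(toy_df["MIN"])  # Series is still named 'MIN'

# A 100-minute lineup gets 10 / 4 = 2.5x the weight of a 16-minute lineup,
# instead of 100 / 16 = 6.25x under raw-minute weighting
ratio_sqrt = toy_weights.iloc[3] / toy_weights.iloc[0]
ratio_raw = toy_df["MIN"].iloc[3] / toy_df["MIN"].iloc[0]

# Same pattern as the cleaning step: concat, then drop incomplete rows
toy_clean = pd.concat(
    [toy_df[["feat"]], toy_df["NET_RATING"], toy_weights], axis=1
).dropna()

print(toy_clean["MIN"].tolist())           # already square-rooted
print((toy_clean["MIN"] ** 0.5).tolist())  # a second root gives MIN ** 0.25
```

If the intent is weights proportional to the square root of minutes, the `MIN` column of the cleaned frame can be used directly without taking another root.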

In [9]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['NET_RATING']
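# Note: data_clean['MIN'] already holds sqrt(MIN), because the weights Series keeps the name 'MIN',
# so the extra **0.5 makes the effective weight MIN**0.25
weights_clean = data_clean['MIN']**0.5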

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted net rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             NET_RATING   R-squared:                       0.257
Model:                            WLS   Adj. R-squared:                 -0.047
Method:                 Least Squares   F-statistic:                    0.8453
Date:                Sun, 11 May 2025   Prob (F-statistic):              0.760
Time:                        23:57:21   Log-Likelihood:                -831.90
No. Observations:                 194   AIC:                             1778.
Df Residuals:                     137   BIC:                             1964.
Df Model:                          56                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

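The weighted least squares (WLS) model highlights **SG_Offensive_PRRollMan, SG_Offensive_Postup, SF_Offensive_Postup, and PF_Defensive_Postup** as the strongest features for predicting NET_RATING, with large positive coefficients suggesting that excelling in these skills is associated with higher ratings. The overall fit is weak, however (adjusted R-squared of -0.047, F-statistic p-value of 0.760, 56 predictors for 194 lineups), so these coefficients should be read as tentative.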
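
The gap between the R-squared (0.257) and the negative adjusted R-squared in the summary above follows directly from the number of predictors relative to the number of lineups. The short sketch below reproduces the adjusted value from the figures printed in the summary using the standard formula; the function name is just for illustration.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values taken from the NET_RATING summary: R^2 = 0.257, n = 194, Df Model = 56
net_adj = adjusted_r_squared(0.257, 194, 56)
print(round(net_adj, 3))  # -0.047, matching the summary
```

With 56 predictors and 137 residual degrees of freedom, the penalty factor (193 / 137) is large enough to push a modest R-squared below zero.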

In [10]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted net rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [11]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

|   | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | SG_Defensive_Handoff | -0.084654 |
| 1 | SG_Defensive_OffScreen | 0.146844 |
| 2 | SG_Defensive_Postup | -0.083972 |
| 3 | SG_Offensive_Misc | 0.033463 |
| 4 | SG_Offensive_Postup | 0.185993 |
| 5 | SF_Defensive_Handoff | 0.129105 |
| 6 | SF_Defensive_PRBallHandler | 0.016173 |
| 7 | SF_Defensive_PRRollMan | -0.221012 |
| 8 | SF_Defensive_Postup | -0.04755 |
| 9 | SF_Offensive_Misc | 0.064298 |


This Lasso model agrees with the WLS model on **SG_Offensive_Postup and PF_Defensive_Postup.**
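
For intuition about why `Lasso(alpha=0.1)` sets many coefficients to exactly zero, here is a minimal NumPy sketch of coordinate descent with soft-thresholding. It minimises an objective of the same form that scikit-learn documents for Lasso, `(1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1`, but it is not scikit-learn's implementation. The toy data and the function names are assumptions made for illustration.

```python
import numpy as np


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (X and y centred)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Toy data: only the first two of five standardized features matter
rng = np.random.default_rng(0)
n = 200
X_toy = rng.normal(size=(n, 5))
X_toy = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + rng.normal(scale=0.1, size=n)
y_toy = y_toy - y_toy.mean()

w_toy = lasso_coordinate_descent(X_toy, y_toy, alpha=0.1)
print(w_toy)  # informative features shrunk toward 0, the rest exactly 0
```

A larger `alpha` widens the band in which a coefficient is set to zero, which is why the number of selected features depends strongly on that setting.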

### Offensive Rating Models

The following models use only offensive percentiles to predict the offensive ratings of lineups where Luka is the PG.

In [12]:
# Define features and target
X = luka_merged_df[luka_pg_offensive_features]
y = luka_merged_df['OFF_RATING']
weights = np.sqrt(luka_merged_df['MIN']) # Square root weighting on mins played

In [13]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['OFF_RATING']
weights_clean = data_clean['MIN']**0.5  # Recalculate in case any rows were dropped

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted offensive rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             OFF_RATING   R-squared:                       0.158
Model:                            WLS   Adj. R-squared:                 -0.010
Method:                 Least Squares   F-statistic:                    0.9415
Date:                Mon, 12 May 2025   Prob (F-statistic):              0.562
Time:                        00:01:30   Log-Likelihood:                -772.75
No. Observations:                 194   AIC:                             1611.
Df Residuals:                     161   BIC:                             1719.
Df Model:                          32                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

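The WLS model highlights **SG_Offensive_PRBallHandler, SG_Offensive_Spotup, SF_Offensive_OffScreen, and PF_Offensive_OffScreen** as the strongest features for predicting OFF_RATING, with large positive coefficients suggesting that excelling in these skills is associated with higher ratings. The overall fit is weak, however (adjusted R-squared of -0.010, F-statistic p-value of 0.562), so these coefficients should be read as tentative.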
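
The residuals of this WLS fit are what the Lasso in the next cell is trained on, using the same features. One property worth keeping in mind is that residuals `r = y - X @ beta` from a weighted least squares fit satisfy `X.T @ (w * r) = 0`, meaning they are uncorrelated with the regressors in the weighted sense. That may partly explain why the Lasso coefficients fitted to the residuals are small. The sketch below checks the property on toy data with plain NumPy; the data and variable names are assumptions, and percentiles are assumed to lie on a 0 to 1 scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Toy design matrix: intercept plus three percentile-like features
X_toy = np.column_stack([np.ones(n), rng.uniform(0, 1, size=(n, 3))])
y_toy = rng.normal(loc=110, scale=10, size=n)
w_toy = np.sqrt(rng.uniform(5, 200, size=n))  # sqrt-of-minutes style weights

# WLS is OLS on rows scaled by sqrt(weight)
sw = np.sqrt(w_toy)
beta, *_ = np.linalg.lstsq(X_toy * sw[:, None], y_toy * sw, rcond=None)

resid = y_toy - X_toy @ beta
weighted_moments = X_toy.T @ (w_toy * resid)
print(np.round(weighted_moments, 8))  # numerically zero for every column
```

Because the Lasso is unweighted and standardizes the features, the match is not exact there, so small non-zero coefficients can still appear.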

In [14]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted offensive rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [15]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

|   | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | SG_Offensive_Postup | 0.009743 |
| 1 | SG_Offensive_Spotup | 0.023069 |
| 2 | SF_Offensive_OffScreen | -0.019034 |
| 3 | PF_Offensive_Cut | -0.045906 |
| 4 | PF_Offensive_PRBallHandler | -0.010446 |
| 5 | PF_Offensive_Spotup | 0.039802 |
| 6 | C_Offensive_Handoff | -0.132914 |
| 7 | C_Offensive_PRBallHandler | 0.032117 |


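This Lasso model selects two of the features highlighted by the WLS model, **SG_Offensive_Spotup and SF_Offensive_OffScreen**, although the Lasso coefficient for SF_Offensive_OffScreen is negative (-0.019).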
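
The overlap between the two models can be checked mechanically instead of by eye. The sketch below hard-codes the features named in the WLS paragraph and the Lasso table for the offensive model, and compares them case-insensitively in case capitalization differs between prose and column names (for example `SpotUp` versus `Spotup`). The variable names are illustrative.

```python
# Features highlighted in the WLS discussion of OFF_RATING
wls_highlighted = {
    "SG_Offensive_PRBallHandler",
    "SG_Offensive_Spotup",
    "SF_Offensive_OffScreen",
    "PF_Offensive_OffScreen",
}

# Non-zero Lasso coefficients from the table above
lasso_selected = {
    "SG_Offensive_Postup": 0.009743,
    "SG_Offensive_Spotup": 0.023069,
    "SF_Offensive_OffScreen": -0.019034,
    "PF_Offensive_Cut": -0.045906,
    "PF_Offensive_PRBallHandler": -0.010446,
    "PF_Offensive_Spotup": 0.039802,
    "C_Offensive_Handoff": -0.132914,
    "C_Offensive_PRBallHandler": 0.032117,
}

wls_lower = {name.lower() for name in wls_highlighted}
shared = sorted(name for name in lasso_selected if name.lower() in wls_lower)

for name in shared:
    sign = "positive" if lasso_selected[name] > 0 else "negative"
    print(f"{name}: Lasso coefficient is {sign} ({lasso_selected[name]})")
```

Printing the sign alongside each shared feature makes it easier to see when both models select a feature but disagree on its direction.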

### Defensive Rating Models

The following models use only defensive percentiles to predict the defensive ratings of lineups where Luka is the PG.

In [16]:
# Define features and target
X = luka_merged_df[luka_pg_defensive_features]
y = luka_merged_df['DEF_RATING']
weights = np.sqrt(luka_merged_df['MIN']) # Square root weighting on mins played

In [17]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['DEF_RATING']
weights_clean = data_clean['MIN']**0.5  # Recalculate in case any rows were dropped

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted defensive rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             DEF_RATING   R-squared:                       0.093
Model:                            WLS   Adj. R-squared:                 -0.073
Method:                 Least Squares   F-statistic:                    0.5599
Date:                Mon, 12 May 2025   Prob (F-statistic):              0.968
Time:                        00:04:14   Log-Likelihood:                -781.73
No. Observations:                 194   AIC:                             1625.
Df Residuals:                     163   BIC:                             1727.
Df Model:                          30                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

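The WLS model highlights **PG_Defensive_PRBallHandler, SG_Defensive_Handoff, SF_Defensive_Handoff, and C_Defensive_Spotup** as the strongest features for predicting DEF_RATING. Their large negative coefficients suggest that excelling in these skills is associated with lower defensive ratings, that is, fewer points allowed and therefore better defense. The overall fit is weak, however (adjusted R-squared of -0.073, F-statistic p-value of 0.968), so these coefficients should be read as tentative.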

In [18]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted defensive rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [19]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

|   | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Defensive_PRRollMan | 0.021835 |
| 1 | SG_Defensive_OffScreen | -0.035119 |
| 2 | SG_Defensive_Spotup | -0.019746 |
| 3 | SF_Defensive_PRRollMan | 0.01423 |
| 4 | SF_Defensive_Postup | 0.109445 |
| 5 | PF_Defensive_Spotup | 0.114343 |
| 6 | C_Defensive_PRBallHandler | 9.1e-05 |


This Lasso model agrees with the WLS model on none of the features.

### Final Results

Across the net, offensive, and defensive rating models, most of the key features belong to the SG and SF positions, and the key skill sets are Offensive_Postup, Offensive_Spotup, Offensive_OffScreen, and Defensive_Handoff. The recommendations script will look for wings from the 2024-2025 season who rank highly in those attributes.
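
As a rough illustration of what that search could look like, the sketch below filters a toy table of players down to wings (SG and SF) and ranks them by their average percentile across the four key skill sets. Everything here is hypothetical: the `wings_df` table, its column names, the placeholder players, the 0 to 1 percentile scale, and the 0.7 cutoff are assumptions, not the actual recommendations script.

```python
import pandas as pd

KEY_SKILLS = ["Offensive_Postup", "Offensive_Spotup",
              "Offensive_OffScreen", "Defensive_Handoff"]

# Hypothetical player table with percentiles on a 0-1 scale
wings_df = pd.DataFrame({
    "PLAYER": ["Player A", "Player B", "Player C", "Player D"],
    "POSITION": ["SG", "SF", "C", "SF"],
    "Offensive_Postup": [0.90, 0.40, 0.95, 0.80],
    "Offensive_Spotup": [0.80, 0.50, 0.90, 0.90],
    "Offensive_OffScreen": [0.70, 0.30, 0.85, 0.85],
    "Defensive_Handoff": [0.60, 0.60, 0.90, 0.85],
})

MIN_AVG_PERCENTILE = 0.7  # hypothetical cutoff

# Keep wings only, score them on the key skills, and rank
candidates = wings_df[wings_df["POSITION"].isin(["SG", "SF"])].copy()
candidates["KEY_SKILL_AVG"] = candidates[KEY_SKILLS].mean(axis=1)
recommended = (
    candidates[candidates["KEY_SKILL_AVG"] >= MIN_AVG_PERCENTILE]
    .sort_values("KEY_SKILL_AVG", ascending=False)
    .reset_index(drop=True)
)

print(recommended[["PLAYER", "POSITION", "KEY_SKILL_AVG"]])
```

A simple average treats the four skills as equally important; weighting them by model coefficients would be another option, though the weak fits reported above argue for caution there.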