## LeBron Teammates Modeling Analysis

### Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import statsmodels.api as sm
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

In [2]:
lebron_merged_df = pd.read_csv('../outputs/lebron_teammate_playtypes_percentiles_minutes_limit.csv')

The following feature list excludes the SF's percentiles because it is used for lineups in which LeBron already plays SF, and the goal is to learn about his teammates. The features informing the model are therefore the offensive and defensive percentiles of the PG, SG, PF, and C.

In [3]:
lebron_sf_features = ['PG_Defensive_Handoff', 'PG_Defensive_OffScreen',
       'PG_Defensive_PRBallHandler', 'PG_Defensive_PRRollMan',
       'PG_Defensive_Postup', 'PG_Defensive_Spotup', 'PG_Offensive_Cut',
       'PG_Offensive_Handoff', 'PG_Offensive_Misc', 'PG_Offensive_OffScreen',
       'PG_Offensive_PRBallHandler', 'PG_Offensive_PRRollMan',
       'PG_Offensive_Postup', 'PG_Offensive_Spotup', 'SG_Defensive_Handoff',
       'SG_Defensive_OffScreen', 'SG_Defensive_PRBallHandler',
       'SG_Defensive_PRRollMan', 'SG_Defensive_Postup', 'SG_Defensive_Spotup',
       'SG_Offensive_Cut', 'SG_Offensive_Handoff', 'SG_Offensive_Misc',
       'SG_Offensive_OffScreen', 'SG_Offensive_PRBallHandler',
       'SG_Offensive_PRRollMan', 'SG_Offensive_Postup', 'SG_Offensive_Spotup','PF_Defensive_Handoff',
       'PF_Defensive_OffScreen', 'PF_Defensive_PRBallHandler',
       'PF_Defensive_PRRollMan', 'PF_Defensive_Postup', 'PF_Defensive_Spotup',
       'PF_Offensive_Cut', 'PF_Offensive_Handoff', 'PF_Offensive_Misc',
       'PF_Offensive_OffScreen', 'PF_Offensive_PRBallHandler',
       'PF_Offensive_PRRollMan', 'PF_Offensive_Postup', 'PF_Offensive_Spotup',
       'C_Defensive_Handoff', 'C_Defensive_OffScreen',
       'C_Defensive_PRBallHandler', 'C_Defensive_PRRollMan',
       'C_Defensive_Postup', 'C_Defensive_Spotup', 'C_Offensive_Cut',
       'C_Offensive_Handoff', 'C_Offensive_Misc', 'C_Offensive_OffScreen',
       'C_Offensive_PRBallHandler', 'C_Offensive_PRRollMan',
       'C_Offensive_Postup', 'C_Offensive_Spotup']

The following feature list excludes the PF's percentiles because it is used for lineups in which LeBron already plays PF, and the goal is to learn about his teammates. The features informing the model are therefore the offensive and defensive percentiles of the PG, SG, SF, and C.

In [4]:
lebron_pf_features = ['PG_Defensive_Handoff', 'PG_Defensive_OffScreen',
       'PG_Defensive_PRBallHandler', 'PG_Defensive_PRRollMan',
       'PG_Defensive_Postup', 'PG_Defensive_Spotup', 'PG_Offensive_Cut',
       'PG_Offensive_Handoff', 'PG_Offensive_Misc', 'PG_Offensive_OffScreen',
       'PG_Offensive_PRBallHandler', 'PG_Offensive_PRRollMan',
       'PG_Offensive_Postup', 'PG_Offensive_Spotup', 'SG_Defensive_Handoff',
       'SG_Defensive_OffScreen', 'SG_Defensive_PRBallHandler',
       'SG_Defensive_PRRollMan', 'SG_Defensive_Postup', 'SG_Defensive_Spotup',
       'SG_Offensive_Cut', 'SG_Offensive_Handoff', 'SG_Offensive_Misc',
       'SG_Offensive_OffScreen', 'SG_Offensive_PRBallHandler',
       'SG_Offensive_PRRollMan', 'SG_Offensive_Postup', 'SG_Offensive_Spotup',
       'SF_Defensive_Handoff', 'SF_Defensive_OffScreen',
       'SF_Defensive_PRBallHandler', 'SF_Defensive_PRRollMan',
       'SF_Defensive_Postup', 'SF_Defensive_Spotup', 'SF_Offensive_Cut',
       'SF_Offensive_Handoff', 'SF_Offensive_Misc', 'SF_Offensive_OffScreen',
       'SF_Offensive_PRBallHandler', 'SF_Offensive_PRRollMan',
       'SF_Offensive_Postup', 'SF_Offensive_Spotup',
       'C_Defensive_Handoff', 'C_Defensive_OffScreen',
       'C_Defensive_PRBallHandler', 'C_Defensive_PRRollMan',
       'C_Defensive_Postup', 'C_Defensive_Spotup', 'C_Offensive_Cut',
       'C_Offensive_Handoff', 'C_Offensive_Misc', 'C_Offensive_OffScreen',
       'C_Offensive_PRBallHandler', 'C_Offensive_PRRollMan',
       'C_Offensive_Postup', 'C_Offensive_Spotup']

The following feature list contains only offensive percentiles and excludes the SF's, because it is used for lineups in which LeBron already plays SF. The features informing the model are therefore the offensive percentiles of the PG, SG, PF, and C.

In [5]:
lebron_sf_offensive_features = ['PG_Offensive_Cut',
       'PG_Offensive_Handoff', 'PG_Offensive_Misc', 'PG_Offensive_OffScreen',
       'PG_Offensive_PRBallHandler', 'PG_Offensive_PRRollMan',
       'PG_Offensive_Postup', 'PG_Offensive_Spotup', 
       'SG_Offensive_Cut', 'SG_Offensive_Handoff', 'SG_Offensive_Misc',
       'SG_Offensive_OffScreen', 'SG_Offensive_PRBallHandler',
       'SG_Offensive_PRRollMan', 'SG_Offensive_Postup', 'SG_Offensive_Spotup',
       'PF_Offensive_Cut', 'PF_Offensive_Handoff', 'PF_Offensive_Misc',
       'PF_Offensive_OffScreen', 'PF_Offensive_PRBallHandler',
       'PF_Offensive_PRRollMan', 'PF_Offensive_Postup', 'PF_Offensive_Spotup',
        'C_Offensive_Cut','C_Offensive_Handoff', 'C_Offensive_Misc', 'C_Offensive_OffScreen',
       'C_Offensive_PRBallHandler', 'C_Offensive_PRRollMan',
       'C_Offensive_Postup', 'C_Offensive_Spotup']

The following feature list contains only offensive percentiles and excludes the PF's, because it is used for lineups in which LeBron already plays PF. The features informing the model are therefore the offensive percentiles of the PG, SG, SF, and C.

In [6]:
lebron_pf_offensive_features = ['PG_Offensive_Cut',
       'PG_Offensive_Handoff', 'PG_Offensive_Misc', 'PG_Offensive_OffScreen',
       'PG_Offensive_PRBallHandler', 'PG_Offensive_PRRollMan',
       'PG_Offensive_Postup', 'PG_Offensive_Spotup', 
       'SG_Offensive_Cut', 'SG_Offensive_Handoff', 'SG_Offensive_Misc',
       'SG_Offensive_OffScreen', 'SG_Offensive_PRBallHandler',
       'SG_Offensive_PRRollMan', 'SG_Offensive_Postup', 'SG_Offensive_Spotup',
        'SF_Offensive_Cut','SF_Offensive_Handoff', 'SF_Offensive_Misc', 'SF_Offensive_OffScreen',
       'SF_Offensive_PRBallHandler', 'SF_Offensive_PRRollMan',
       'SF_Offensive_Postup', 'SF_Offensive_Spotup',
        'C_Offensive_Cut','C_Offensive_Handoff', 'C_Offensive_Misc', 'C_Offensive_OffScreen',
       'C_Offensive_PRBallHandler', 'C_Offensive_PRRollMan',
       'C_Offensive_Postup', 'C_Offensive_Spotup']

The following feature list contains only defensive percentiles and excludes the SF's, because it is used for lineups in which LeBron already plays SF. The features informing the model are therefore the defensive percentiles of the PG, SG, PF, and C.

In [7]:
lebron_sf_defensive_features = ['PG_Defensive_Handoff', 'PG_Defensive_OffScreen',
       'PG_Defensive_PRBallHandler', 'PG_Defensive_PRRollMan',
       'PG_Defensive_Postup', 'PG_Defensive_Spotup', 'SG_Defensive_Handoff',
       'SG_Defensive_OffScreen', 'SG_Defensive_PRBallHandler',
       'SG_Defensive_PRRollMan', 'SG_Defensive_Postup', 'SG_Defensive_Spotup',
       'PF_Defensive_Handoff',
       'PF_Defensive_OffScreen', 'PF_Defensive_PRBallHandler',
       'PF_Defensive_PRRollMan', 'PF_Defensive_Postup', 'PF_Defensive_Spotup',
       'C_Defensive_Handoff', 'C_Defensive_OffScreen',
       'C_Defensive_PRBallHandler', 'C_Defensive_PRRollMan',
       'C_Defensive_Postup', 'C_Defensive_Spotup']

The following feature list contains only defensive percentiles and excludes the PF's, because it is used for lineups in which LeBron already plays PF. The features informing the model are therefore the defensive percentiles of the PG, SG, SF, and C.

In [8]:
lebron_pf_defensive_features = ['PG_Defensive_Handoff', 'PG_Defensive_OffScreen',
       'PG_Defensive_PRBallHandler', 'PG_Defensive_PRRollMan',
       'PG_Defensive_Postup', 'PG_Defensive_Spotup', 'SG_Defensive_Handoff',
       'SG_Defensive_OffScreen', 'SG_Defensive_PRBallHandler',
       'SG_Defensive_PRRollMan', 'SG_Defensive_Postup', 'SG_Defensive_Spotup',
       'SF_Defensive_Handoff', 'SF_Defensive_OffScreen',
       'SF_Defensive_PRBallHandler', 'SF_Defensive_PRRollMan',
       'SF_Defensive_Postup', 'SF_Defensive_Spotup', 
       'C_Defensive_Handoff', 'C_Defensive_OffScreen',
       'C_Defensive_PRBallHandler', 'C_Defensive_PRRollMan',
       'C_Defensive_Postup', 'C_Defensive_Spotup']
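The six feature lists above follow a regular `{position}_{side}_{play type}` naming pattern, so they could also be generated instead of typed out. The following is a minimal sketch of that idea. The helper `build_features`, the constants, and the `_alt` variable names are illustrative and not part of the original notebook.

```python
# Play types as they appear in the column names above.
DEFENSIVE_PLAYS = ["Handoff", "OffScreen", "PRBallHandler",
                   "PRRollMan", "Postup", "Spotup"]
OFFENSIVE_PLAYS = ["Cut", "Handoff", "Misc", "OffScreen",
                   "PRBallHandler", "PRRollMan", "Postup", "Spotup"]
POSITIONS = ["PG", "SG", "SF", "PF", "C"]


def build_features(lebron_position, sides=("Defensive", "Offensive")):
    """Return teammate feature names, skipping the position LeBron occupies."""
    plays = {"Defensive": DEFENSIVE_PLAYS, "Offensive": OFFENSIVE_PLAYS}
    return [
        f"{pos}_{side}_{play}"
        for pos in POSITIONS if pos != lebron_position
        for side in sides
        for play in plays[side]
    ]


# Hypothetical equivalents of three of the hand-written lists.
lebron_sf_features_alt = build_features("SF")
lebron_pf_offensive_features_alt = build_features("PF", sides=("Offensive",))
lebron_sf_defensive_features_alt = build_features("SF", sides=("Defensive",))

print(len(lebron_sf_features_alt),
      len(lebron_pf_offensive_features_alt),
      len(lebron_sf_defensive_features_alt))
```

The three lengths (56, 32, and 24) match the `Df Model` values reported in the WLS summaries further down, which is a quick way to confirm that no column was dropped or duplicated.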

In [9]:
lebron_merged_sf_df = lebron_merged_df[lebron_merged_df['SF'] == 'L. James']
lebron_merged_pf_df = lebron_merged_df[lebron_merged_df['PF'] == 'L. James']

### Net Rating Models

### LeBron as SF

The following models are based on the rows in which LeBron is an SF. This means the features informing the model are the offensive and defensive percentiles of the PG, SG, PF, and C.

In [32]:
# Define features and target
X = lebron_merged_sf_df[lebron_sf_features]
y = lebron_merged_sf_df['NET_RATING']
weights = np.sqrt(lebron_merged_sf_df['MIN']) # Square root weighting on mins played

In [34]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['NET_RATING']
weights_clean = data_clean['MIN']**0.5  # Note: this column already holds sqrt(MIN), so the effective weight is MIN**0.25

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted net rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             NET_RATING   R-squared:                       0.322
Model:                            WLS   Adj. R-squared:                 -0.039
Method:                 Least Squares   F-statistic:                    0.8910
Date:                Wed, 07 May 2025   Prob (F-statistic):              0.679
Time:                        00:03:07   Log-Likelihood:                -668.30
No. Observations:                 162   AIC:                             1451.
Df Residuals:                     105   BIC:                             1627.
Df Model:                          56                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

The weighted least squares model assigns its largest positive coefficients to **PF_Defensive_Spotup, C_Defensive_PRBallHandler, and PF_Defensive_OffScreen**, suggesting that lineups whose teammates excel in these skills tend to post a higher NET_RATING. SG_Defensive_OffScreen, by contrast, shows a negative relationship with NET_RATING, suggesting that this defensive skill may be less valuable for lineup performance. These readings are tentative because the overall fit is weak (adjusted R-squared of -0.039, F-statistic p-value of 0.679).
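The gap between R-squared and adjusted R-squared in the summary above comes from the number of predictors relative to the number of lineups. A minimal sketch of the standard adjusted R-squared formula, using the figures printed in the summary (the function name is illustrative):

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment: penalize R-squared for the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values taken from the WLS summary: R-squared 0.322, 162 observations, 56 predictors.
adj_net_sf = adjusted_r_squared(0.322, 162, 56)
print(round(adj_net_sf, 3))
```

The result is slightly below zero, in line with the reported -0.039 up to rounding of the printed R-squared. With 56 predictors and 162 observations, an R-squared of 0.322 is no better than what unrelated predictors would be expected to produce.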

In [41]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted net rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [40]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

| | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Defensive_PRBallHandler | -0.120357 |
| 1 | PG_Defensive_Postup | -0.110942 |
| 2 | PG_Defensive_Spotup | -0.152619 |
| 3 | PG_Offensive_Misc | 0.002122 |
| 4 | PG_Offensive_PRRollMan | -0.103377 |
| 5 | PG_Offensive_Postup | -0.215471 |
| 6 | SG_Defensive_OffScreen | 0.072246 |
| 7 | SG_Offensive_Misc | -0.328829 |
| 8 | PF_Defensive_PRRollMan | -0.034998 |
| 9 | PF_Offensive_Cut | -0.145044 |


This Lasso model produces mostly negative coefficients (8 of the 10 shown above), which doesn't make sense in context.
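The cells above pair feature names with coefficients through two separate list comprehensions. A minimal sketch of doing the same pairing in one step with a labeled `pandas.Series` follows. The column names and coefficient values here are a small illustrative subset, and `lasso_results_alt` is a hypothetical name. The `const` entry stands in for the column added by `sm.add_constant`; it has zero variance, so it carries no information after standardization and is dropped here, just as the notebook skips it with `[1:]`.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for X_clean.columns and lasso_model.coef_.
columns = ["const", "PG_Offensive_Misc", "SG_Offensive_Misc", "PF_Offensive_Cut"]
coefs = np.array([0.0, 0.002122, -0.328829, 0.0])

# Label each coefficient with its feature name, then drop the constant column.
coef_series = pd.Series(coefs, index=columns).drop("const")

# Keep only the features the Lasso did not shrink to zero.
selected = coef_series[coef_series != 0]

lasso_results_alt = (
    selected.rename_axis("Selected Features")
            .reset_index(name="Lasso Coefficients")
)
print(lasso_results_alt)
```

Because names and values travel together in one Series, the filter cannot leave the two columns out of step.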

### LeBron as PF

The following models are based on the rows in which LeBron is a PF. This means the features informing the model are the offensive and defensive percentiles of the PG, SG, SF, and C.

In [10]:
# Define features and target
X = lebron_merged_pf_df[lebron_pf_features]
y = lebron_merged_pf_df['NET_RATING']
weights = np.sqrt(lebron_merged_pf_df['MIN']) # Square root weighting on mins played

In [12]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['NET_RATING']
weights_clean = data_clean['MIN']**0.5  # Note: this column already holds sqrt(MIN), so the effective weight is MIN**0.25

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted net rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             NET_RATING   R-squared:                       0.648
Model:                            WLS   Adj. R-squared:                  0.331
Method:                 Least Squares   F-statistic:                     2.040
Date:                Sun, 11 May 2025   Prob (F-statistic):            0.00330
Time:                        23:26:52   Log-Likelihood:                -462.87
No. Observations:                 119   AIC:                             1040.
Df Residuals:                      62   BIC:                             1198.
Df Model:                          56                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

The WLS model assigns its largest positive coefficients to **PG_Defensive_Spotup, PG_Offensive_Handoff, SG_Defensive_Postup, SG_Offensive_Misc, C_Defensive_PRBallHandler, C_Defensive_PRRollMan, C_Offensive_Cut, and C_Offensive_PRBallHandler**, suggesting that lineups whose teammates excel in these skills tend to post a higher NET_RATING.
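One detail of the WLS cells is worth checking by hand. `weights` already holds `sqrt(MIN)` but keeps the column name `MIN`, so after `pd.concat(...).dropna()` the expression `data_clean['MIN']**0.5` takes a second square root. A minimal sketch with made-up minutes shows the effect; the data below is illustrative only.

```python
import numpy as np
import pandas as pd

minutes = pd.Series([16.0, 81.0, 256.0], name="MIN")  # made-up minutes played
x = pd.Series([1.0, np.nan, 3.0], name="x")           # middle row has a missing feature

weights = np.sqrt(minutes)                             # Series is still named "MIN"
data_clean = pd.concat([x, weights], axis=1).dropna()  # drops the middle row

already_aligned = data_clean["MIN"]       # sqrt(MIN) for the surviving rows
second_root = data_clean["MIN"] ** 0.5    # equals MIN ** 0.25

print(already_aligned.tolist())  # [4.0, 16.0]
print(second_root.tolist())      # [2.0, 4.0]
```

If square-root-of-minutes weighting is the intent, `data_clean['MIN']` could be passed to `sm.WLS` directly, since `dropna` already keeps it aligned with the remaining rows. The printed summaries in this notebook reflect the fourth-root weights.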

In [14]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted net rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [15]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

| | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Defensive_Spotup | 0.010878 |
| 1 | PG_Offensive_Cut | 0.095567 |
| 2 | PG_Offensive_Handoff | 0.162849 |
| 3 | SG_Offensive_Cut | -0.161526 |
| 4 | SG_Offensive_Handoff | 0.074711 |
| 5 | SG_Offensive_OffScreen | 0.112214 |
| 6 | SG_Offensive_PRBallHandler | 0.013465 |
| 7 | SG_Offensive_Spotup | -0.086397 |
| 8 | SF_Defensive_Handoff | -0.037277 |
| 9 | SF_Defensive_PRRollMan | -0.013871 |


This Lasso model agrees with the WLS findings on **PG_Defensive_Spotup, PG_Offensive_Handoff, and C_Defensive_PRBallHandler.**

### Offensive Rating Models

The following models use only offensive percentiles to try to predict the offensive ratings of lineups where LeBron is present, first as an SF and then as a PF.

### LeBron as SF

In [16]:
# Define features and target
X = lebron_merged_sf_df[lebron_sf_offensive_features]
y = lebron_merged_sf_df['OFF_RATING']
weights = np.sqrt(lebron_merged_sf_df['MIN']) # Square root weighting on mins played

In [18]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['OFF_RATING']
weights_clean = data_clean['MIN']**0.5  # Note: this column already holds sqrt(MIN), so the effective weight is MIN**0.25

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted offensive rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             OFF_RATING   R-squared:                       0.192
Model:                            WLS   Adj. R-squared:                 -0.008
Method:                 Least Squares   F-statistic:                    0.9594
Date:                Sun, 11 May 2025   Prob (F-statistic):              0.536
Time:                        23:35:11   Log-Likelihood:                -616.87
No. Observations:                 162   AIC:                             1300.
Df Residuals:                     129   BIC:                             1402.
Df Model:                          32                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

The WLS model assigns its largest positive coefficients to **PG_Offensive_Misc, PG_Offensive_Spotup, and PF_Offensive_PRRollMan**, suggesting that lineups whose teammates excel in these skills tend to post a higher OFF_RATING. The overall fit is weak, however (adjusted R-squared of -0.008, F-statistic p-value of 0.536).

In [19]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted offensive rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [20]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

| | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Offensive_Misc | 0.195002 |
| 1 | PG_Offensive_Postup | -0.071531 |
| 2 | SG_Offensive_Handoff | 0.057001 |
| 3 | SG_Offensive_Misc | -0.189726 |
| 4 | SG_Offensive_OffScreen | -0.024201 |
| 5 | SG_Offensive_Postup | 0.111389 |
| 6 | PF_Offensive_Cut | -0.043414 |
| 7 | PF_Offensive_Misc | -0.017625 |
| 8 | PF_Offensive_OffScreen | -0.190047 |


This Lasso model agrees with the WLS findings on **PG_Offensive_Misc.**

### LeBron as PF

In [21]:
# Define features and target
X = lebron_merged_pf_df[lebron_pf_offensive_features]
y = lebron_merged_pf_df['OFF_RATING']
weights = np.sqrt(lebron_merged_pf_df['MIN']) # Square root weighting on mins played

In [22]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['OFF_RATING']
weights_clean = data_clean['MIN']**0.5  # Note: this column already holds sqrt(MIN), so the effective weight is MIN**0.25

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted offensive rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             OFF_RATING   R-squared:                       0.260
Model:                            WLS   Adj. R-squared:                 -0.015
Method:                 Least Squares   F-statistic:                    0.9458
Date:                Sun, 11 May 2025   Prob (F-statistic):              0.557
Time:                        23:38:17   Log-Likelihood:                -458.67
No. Observations:                 119   AIC:                             983.3
Df Residuals:                      86   BIC:                             1075.
Df Model:                          32                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

The WLS model assigns its largest positive coefficients to **PG_Offensive_Postup, SG_Offensive_Spotup, SF_Offensive_OffScreen, C_Offensive_Cut, and C_Offensive_Handoff**, suggesting that lineups whose teammates excel in these skills tend to post a higher OFF_RATING. The overall fit is weak, however (adjusted R-squared of -0.015, F-statistic p-value of 0.557).

In [23]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted offensive rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [24]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

| | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Offensive_Misc | -0.079521 |
| 1 | PG_Offensive_PRBallHandler | -0.012227 |
| 2 | PG_Offensive_PRRollMan | 0.002154 |
| 3 | SG_Offensive_Cut | -0.24068 |
| 4 | SG_Offensive_Misc | -0.033855 |
| 5 | SG_Offensive_OffScreen | -0.114199 |
| 6 | SG_Offensive_Spotup | -0.063252 |
| 7 | SF_Offensive_Cut | -0.060417 |
| 8 | SF_Offensive_Handoff | 0.092743 |
| 9 | SF_Offensive_Misc | -0.037442 |


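This Lasso model overlaps with the WLS findings on **SG_Offensive_Spotup**, although the Lasso coefficient for that feature is negative (-0.063252) while the WLS coefficient is positive.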

### Defensive Rating Models

The following models use only defensive percentiles to try to predict the defensive ratings of lineups where LeBron is present, first as an SF and then as a PF.

### LeBron as SF

In [25]:
# Define features and target
X = lebron_merged_sf_df[lebron_sf_defensive_features]
y = lebron_merged_sf_df['DEF_RATING']
weights = np.sqrt(lebron_merged_sf_df['MIN']) # Square root weighting on mins played

In [26]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['DEF_RATING']
weights_clean = data_clean['MIN']**0.5  # Note: this column already holds sqrt(MIN), so the effective weight is MIN**0.25

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted defensive rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             DEF_RATING   R-squared:                       0.172
Model:                            WLS   Adj. R-squared:                  0.027
Method:                 Least Squares   F-statistic:                     1.187
Date:                Sun, 11 May 2025   Prob (F-statistic):              0.265
Time:                        23:41:25   Log-Likelihood:                -640.14
No. Observations:                 162   AIC:                             1330.
Df Residuals:                     137   BIC:                             1407.
Df Model:                          24                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

The WLS model assigns its largest negative coefficients to **SG_Defensive_PRBallHandler, PF_Defensive_OffScreen, PF_Defensive_PRRollMan, and C_Defensive_PRBallHandler**, suggesting that lineups whose teammates excel in these skills tend to post a lower DEF_RATING, which is the desirable direction for a defensive rating. The overall fit is weak, however (adjusted R-squared of 0.027, F-statistic p-value of 0.265).

In [27]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted defensive rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [28]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

| | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Defensive_PRBallHandler | 0.128822 |
| 1 | PG_Defensive_Postup | 0.062066 |
| 2 | SG_Defensive_Handoff | 0.102763 |
| 3 | SG_Defensive_OffScreen | 0.034954 |
| 4 | SG_Defensive_PRRollMan | 0.106726 |
| 5 | SG_Defensive_Postup | 0.227398 |
| 6 | PF_Defensive_Handoff | -0.001089 |
| 7 | PF_Defensive_OffScreen | -0.070803 |
| 8 | PF_Defensive_Postup | 0.059494 |
| 9 | C_Defensive_PRRollMan | -0.067548 |


This Lasso model agrees with the WLS findings on **PF_Defensive_OffScreen.**

### LeBron as PF

In [29]:
# Define features and target
X = lebron_merged_pf_df[lebron_pf_defensive_features]
y = lebron_merged_pf_df['DEF_RATING']
weights = np.sqrt(lebron_merged_pf_df['MIN']) # Square root weighting on mins played

In [30]:
# Add constant for WLS
X_wls = sm.add_constant(X)

# Replace infs with NaNs
X_wls_clean = X_wls.replace([np.inf, -np.inf], np.nan)

# Combine all into one DataFrame and drop rows with NaNs
data_clean = pd.concat([X_wls_clean, y, weights], axis=1).dropna()

# Separate back into X, y, weights
X_clean = data_clean[X_wls_clean.columns]
y_clean = data_clean['DEF_RATING']
weights_clean = data_clean['MIN']**0.5  # Note: this column already holds sqrt(MIN), so the effective weight is MIN**0.25

# Fit Weighted Least Squares (WLS) model
wls_model = sm.WLS(y_clean, X_clean, weights=weights_clean).fit()

# Output summary
print("WLS Model Summary:")
print(wls_model.summary())

# Get WLS residuals (adjusted defensive rating)
adjusted_y = wls_model.resid

WLS Model Summary:
                            WLS Regression Results                            
Dep. Variable:             DEF_RATING   R-squared:                       0.379
Model:                            WLS   Adj. R-squared:                  0.220
Method:                 Least Squares   F-statistic:                     2.389
Date:                Sun, 11 May 2025   Prob (F-statistic):            0.00155
Time:                        23:46:43   Log-Likelihood:                -466.47
No. Observations:                 119   AIC:                             982.9
Df Residuals:                      94   BIC:                             1052.
Df Model:                          24                                         
Covariance Type:            nonrobust                                         
                                 coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------
c

The WLS model assigns its largest negative coefficients to **PG_Defensive_OffScreen, PG_Defensive_PRBallHandler, PG_Defensive_Spotup, SG_Defensive_Handoff, SG_Defensive_OffScreen, SG_Defensive_Spotup, and C_Defensive_PRRollMan**, suggesting that lineups whose teammates excel in these skills tend to post a lower DEF_RATING, which is the desirable direction for a defensive rating.

In [31]:
# Standardize features for Lasso using X_clean
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_clean)

# Fit Lasso regression on WLS residuals (adjusted defensive rating)
lasso_model = Lasso(alpha=0.1)  # You can adjust alpha for regularization strength
lasso_model.fit(X_scaled, adjusted_y.loc[X_clean.index])

# Extract selected features and coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]  # skip intercept

In [32]:
# Create a list of selected features and their corresponding non-zero coefficients
selected_features = [feature for feature, coef in zip(X_clean.columns[1:], lasso_model.coef_[1:]) if coef != 0]
coefficients = [coef for coef in lasso_model.coef_[1:] if coef != 0]

# Create a DataFrame with the selected features and coefficients
lasso_results_df = pd.DataFrame({
    'Selected Features': selected_features,
    'Lasso Coefficients': coefficients
})

lasso_results_df

| | Selected Features | Lasso Coefficients |
|---|---|---|
| 0 | PG_Defensive_Handoff | -0.059549 |
| 1 | PG_Defensive_Postup | 0.005495 |
| 2 | SG_Defensive_Handoff | 0.020982 |
| 3 | SG_Defensive_OffScreen | 0.06532 |
| 4 | SG_Defensive_PRRollMan | -0.050211 |
| 5 | SF_Defensive_OffScreen | 0.085058 |
| 6 | SF_Defensive_PRRollMan | 0.050483 |
| 7 | SF_Defensive_Postup | -0.088943 |
| 8 | C_Defensive_OffScreen | -0.003794 |
| 9 | C_Defensive_PRBallHandler | -0.041406 |


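This Lasso model overlaps with the WLS findings on **SG_Defensive_OffScreen**, although the Lasso coefficient for that feature is positive (0.06532) while the WLS coefficient is negative.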

### Final Results

After reviewing the features flagged as important for net, offensive, and defensive ratings, we found that most of the features highlighted by more than one model belonged to the PF and C positions, with Defensive_PRBallHandler, Defensive_OffScreen, Defensive_PRRollMan, and Offensive_Cut as the key skillsets. In the recommendations script, we will look for big men from the 2024-2025 season who have excelled in those attributes.
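The summary above can be cross-checked by tallying the features named in the six WLS write-ups. The sketch below copies those highlighted features by hand from the text, spelled as in the column names and counting only the features highlighted in the favorable direction. The dictionary keys and variable names are illustrative.

```python
from collections import Counter

# Features highlighted in each WLS write-up, copied from the commentary above.
highlighted = {
    "net_sf": ["PF_Defensive_Spotup", "C_Defensive_PRBallHandler",
               "PF_Defensive_OffScreen"],
    "net_pf": ["PG_Defensive_Spotup", "PG_Offensive_Handoff", "SG_Defensive_Postup",
               "SG_Offensive_Misc", "C_Defensive_PRBallHandler",
               "C_Defensive_PRRollMan", "C_Offensive_Cut", "C_Offensive_PRBallHandler"],
    "off_sf": ["PG_Offensive_Misc", "PG_Offensive_Spotup", "PF_Offensive_PRRollMan"],
    "off_pf": ["PG_Offensive_Postup", "SG_Offensive_Spotup", "SF_Offensive_OffScreen",
               "C_Offensive_Cut", "C_Offensive_Handoff"],
    "def_sf": ["SG_Defensive_PRBallHandler", "PF_Defensive_OffScreen",
               "PF_Defensive_PRRollMan", "C_Defensive_PRBallHandler"],
    "def_pf": ["PG_Defensive_OffScreen", "PG_Defensive_PRBallHandler",
               "PG_Defensive_Spotup", "SG_Defensive_Handoff",
               "SG_Defensive_OffScreen", "SG_Defensive_Spotup",
               "C_Defensive_PRRollMan"],
}

# Count how many models highlight each feature and keep those appearing at least twice.
counts = Counter(feature for features in highlighted.values() for feature in features)
recurring = {feature: n for feature, n in counts.items() if n >= 2}

for feature, n in sorted(recurring.items(), key=lambda item: -item[1]):
    print(f"{feature}: {n}")
```

Of the five features that recur, four belong to the PF or C, and their skills are the four named in the summary. The fifth is PG_Defensive_Spotup.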