# R01. Output Weights
- Estimates the impact of each plate appearance (PA) outcome on a player's fantasy points (FP), for use in weighting outputs in M03. Plate Appearance model evaluations
- Type: Research
- Run Frequency: Irregular
- Sources:
    - A10. Player Results
- Created: 3/2/2025
- Updated: 3/2/2025

Since plate appearance models predict a variety of possible outcomes, there is no single natural metric for evaluating their accuracy. <br>
However, since the ultimate goal is to predict fantasy points accurately, FP is a reasonable proxy. <br>
One could weight each outcome by its direct scoring value (e.g., 3 FP for a single, 10 for a HR), but this ignores the R, RBI, and SB that the outcome tends to generate later in the game. <br>
The regressions below estimate each outcome's weight from the FP a player actually scores over the whole game, so those downstream effects are included.
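The idea can be illustrated with a toy simulation. Everything in it is hypothetical: the scoring values (3 per single, 10 per HR, 2 per R, 2 per RBI) and the run and RBI rates are made-up assumptions, not the league's actual scoring. Because every HR also produces an R and an RBI, and a single sometimes leads to one later, a no-intercept regression of game FP on outcome counts should land near 4 and 14 rather than the naive 3 and 10.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical number of player-games

# Hypothetical per-game outcome counts
singles = rng.poisson(0.9, n)
hr = rng.poisson(0.15, n)

# Hypothetical downstream effects: every HR yields one R and one RBI;
# each single scores later 30% of the time and drives in a run 20% of the time
r = hr + rng.binomial(singles, 0.3)
rbi = hr + rng.binomial(singles, 0.2)

# Hypothetical scoring: 3 per single, 10 per HR, 2 per R, 2 per RBI
fp = 3 * singles + 10 * hr + 2 * r + 2 * rbi

# No-intercept least squares of game FP on outcome counts
X = np.column_stack([singles, hr]).astype(float)
weights, *_ = np.linalg.lstsq(X, fp.astype(float), rcond=None)
print(f"single: {weights[0]:.2f} FP, HR: {weights[1]:.2f} FP")
```

In expectation the single is worth 3 + 2(0.3) + 2(0.2) = 4 FP and the HR 10 + 2 + 2 = 14 FP, which is the "full game" value the regressions below try to capture.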

### Imports

In [1]:
%run "U1. Imports.ipynb"
%run "U2. Utilities.ipynb"
%run "U3. Classes.ipynb"
%run "U4. Datasets.ipynb"
%run "U5. Models.ipynb"

### Batters

##### Data

In [2]:
%%time
# List to hold all dataframes
batters_data = []

# Loop through all files in the 'A10. Player Results' directory and its subdirectories
for root, dirs, files in os.walk(os.path.join(baseball_path, 'A10. Player Results')):
    for filename in files:
        if 'batters' in filename and filename.endswith('.csv'):
            file_path = os.path.join(root, filename)
            df = pd.read_csv(file_path)
            batters_data.append(df)

# Concatenate all dataframes into one
batters_combined = pd.concat(batters_data, ignore_index=True)

CPU times: total: 42.6 s
Wall time: 2min 42s
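The batter and pitcher loading cells differ only in the filename filter. A minimal sketch of a shared helper, assuming (as the loops do) that result files are CSVs whose names contain 'batters' or 'pitchers'; `load_player_results` is a hypothetical name, not something defined in the U-series utility notebooks.

```python
import os

import pandas as pd


def load_player_results(base_dir: str, role: str) -> pd.DataFrame:
    """Concatenate every CSV under base_dir (recursively) whose filename contains `role`."""
    frames = []
    for root, _dirs, files in os.walk(base_dir):
        # Sort so the row order is reproducible across runs
        for filename in sorted(files):
            if role in filename and filename.endswith(".csv"):
                frames.append(pd.read_csv(os.path.join(root, filename)))
    if not frames:
        # pd.concat([]) would raise a less informative "No objects to concatenate"
        raise FileNotFoundError(f"No '{role}' CSVs found under {base_dir}")
    return pd.concat(frames, ignore_index=True)
```

Usage in this notebook would look like `batters_combined = load_player_results(os.path.join(baseball_path, 'A10. Player Results'), 'batters')`, with `'pitchers'` for the pitcher cell.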


##### Create Variables

In [3]:
batters_combined['singles'] = batters_combined['h'] - batters_combined[['doubles', 'triples', 'hr']].sum(axis=1)
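Since `h` counts every hit, singles are whatever remains after removing extra-base hits. A small check on hypothetical box-score lines; the non-negativity assertion is a suggested sanity check that would flag rows where the hit columns disagree.

```python
import pandas as pd

# Hypothetical box-score lines (h = 1B + 2B + 3B + HR)
games = pd.DataFrame({
    "h": [3, 1, 0],
    "doubles": [1, 0, 0],
    "triples": [0, 0, 0],
    "hr": [1, 1, 0],
})
games["singles"] = games["h"] - games[["doubles", "triples", "hr"]].sum(axis=1)
print(games["singles"].tolist())  # [1, 0, 0]

# Sanity check: a negative count means the hit columns are inconsistent
assert (games["singles"] >= 0).all()
```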

In [4]:
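# Runs produced: a HR credits the batter with both an R and an RBI for one run, so subtract HR to avoid double counting
batters_combined['runs_produced'] = batters_combined['rbi'] + batters_combined['r'] - batters_combined['hr']
# Alternative: average of RBI and R
batters_combined['runs_produced2'] = (batters_combined['rbi'] + batters_combined['r']) / 2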
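The two definitions differ in how they treat a home run. On three hypothetical games (a solo HR, a single that later scored, and a two-run double), the first definition counts the solo HR as one run rather than two, while the second halves R + RBI for every game.

```python
import pandas as pd

# Hypothetical games: solo HR, single that later scored, 2-run double
games = pd.DataFrame({
    "r":   [1, 1, 0],
    "rbi": [1, 0, 2],
    "hr":  [1, 0, 0],
})
games["runs_produced"] = games["rbi"] + games["r"] - games["hr"]
games["runs_produced2"] = (games["rbi"] + games["r"]) / 2
print(games["runs_produced"].tolist())   # [1, 1, 2]
print(games["runs_produced2"].tolist())  # [1.0, 0.5, 1.0]
```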

##### Regression

In [5]:
X = batters_combined[['singles', 'doubles', 'triples', 'hr', 'bb', 'hbp']]
y = batters_combined['fp']

model = sm.OLS(y, X).fit()  # no sm.add_constant, so the fit is forced through the origin

print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                     fp   R-squared (uncentered):                   0.953
Model:                            OLS   Adj. R-squared (uncentered):              0.953
Method:                 Least Squares   F-statistic:                          5.163e+05
Date:                Mon, 03 Mar 2025   Prob (F-statistic):                        0.00
Time:                        18:37:44   Log-Likelihood:                     -3.2696e+05
No. Observations:              153986   AIC:                                  6.539e+05
Df Residuals:                  153980   BIC:                                  6.540e+05
Df Model:                           6                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------
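No constant is added in this cell (there is no `sm.add_constant` call), so statsmodels reports R-squared (uncentered), which compares the residual sum of squares to the sum of squared FP values rather than to the variance of FP around its mean. That makes the 0.953 above not directly comparable to an ordinary R². A sketch on synthetic data (entirely made up) where the regressor has no relationship to y shows how far apart the two can be.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Synthetic data: y has a large mean but is unrelated to the positive regressor x
x = rng.uniform(1.0, 2.0, n)
y = 10.0 + rng.normal(0.0, 1.0, n)

# No-intercept least squares, the analogue of sm.OLS(y, X) without add_constant
X = x.reshape(-1, 1)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ssr = np.sum((y - X @ beta) ** 2)

r2_uncentered = 1 - ssr / np.sum(y ** 2)             # reported when there is no constant
r2_centered = 1 - ssr / np.sum((y - y.mean()) ** 2)  # the usual definition
print(f"uncentered R^2: {r2_uncentered:.3f}, centered R^2: {r2_centered:.3f}")
```

The uncentered value is high only because y is far from zero; the centered value is negative because the forced-through-origin line fits worse than a flat mean.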

### Pitchers

##### Data

In [14]:
%%time
# List to hold all dataframes
pitchers_data = []

# Loop through all files in the 'A10. Player Results' directory and its subdirectories
for root, dirs, files in os.walk(os.path.join(baseball_path, 'A10. Player Results')):
    for filename in files:
        if 'pitchers' in filename and filename.endswith('.csv'):
            file_path = os.path.join(root, filename)
            df = pd.read_csv(file_path)
            pitchers_data.append(df)

# Concatenate all dataframes into one
pitchers_combined = pd.concat(pitchers_data, ignore_index=True)

CPU times: total: 23.5 s
Wall time: 1min 26s


##### Create Variables

Note: this data does not distinguish between types of non-strikeout outs (e.g., groundouts vs. flyouts), so they are combined into a single variable.

In [17]:
pitchers_combined['non_strikeout'] = pitchers_combined['outs'] - pitchers_combined['k']
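The subtraction assumes every strikeout is an out. A strikeout on an uncaught third strike can leave the batter on base, so `non_strikeout` can occasionally come out negative. A quick count like the one below (on hypothetical pitcher lines) would show whether that happens in the combined data.

```python
import pandas as pd

# Hypothetical pitcher lines; the last row is a short relief outing
pitchers = pd.DataFrame({"outs": [18, 15, 1], "k": [7, 4, 2]})
pitchers["non_strikeout"] = pitchers["outs"] - pitchers["k"]
print(pitchers["non_strikeout"].tolist())  # [11, 11, -1]

# Count rows where strikeouts exceed outs (possible on an uncaught third strike)
n_negative = int((pitchers["non_strikeout"] < 0).sum())
print(n_negative)  # 1
```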

##### Regression

In [30]:
X = pitchers_combined[['k', 'non_strikeout']]
y = pitchers_combined['fp']

model = sm.OLS(y, X).fit()

print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                     fp   R-squared (uncentered):                   0.801
Model:                            OLS   Adj. R-squared (uncentered):              0.801
Method:                 Least Squares   F-statistic:                          1.275e+05
Date:                Sun, 02 Mar 2025   Prob (F-statistic):                        0.00
Time:                        08:52:07   Log-Likelihood:                     -1.8166e+05
No. Observations:               63279   AIC:                                  3.633e+05
Df Residuals:                   63277   BIC:                                  3.633e+05
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
                    coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------
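One way the fitted coefficients could feed into M03 evaluations (an assumption, since M03 is not shown here): convert each predicted outcome distribution into expected FP per PA and compare it with the FP value of the actual outcome. `expected_fp` and `fp_error` are hypothetical helpers, and the weights in the usage lines are dummy numbers, not the fitted coefficients. In practice the batter weights would come from the batter regression's `model.params`; note that `model` is reassigned by the pitcher cell.

```python
import numpy as np


def expected_fp(probs, weights):
    """Expected FP per PA: each row of outcome probabilities dotted with per-outcome FP weights.

    Outcomes left out of the regression (such as outs) carry an implicit weight of 0.
    """
    return np.asarray(probs, dtype=float) @ np.asarray(weights, dtype=float)


def fp_error(pred_probs, actual_onehot, weights):
    """Mean absolute gap between predicted expected FP and the FP value of the actual outcome."""
    gap = expected_fp(pred_probs, weights) - expected_fp(actual_onehot, weights)
    return float(np.mean(np.abs(gap)))


# Outcome order matches the batter regression: singles, doubles, triples, hr, bb, hbp
dummy_weights = np.array([1.0, 2.0, 3.0, 4.0, 0.5, 0.5])  # dummy values, NOT fitted coefficients
pred = np.array([[0.20, 0.10, 0.00, 0.05, 0.10, 0.00]])   # remaining 0.55 is an out (weight 0)
actual = np.array([[1, 0, 0, 0, 0, 0]])                    # the PA was a single

print(expected_fp(pred, dummy_weights))         # about 0.65
print(fp_error(pred, actual, dummy_weights))    # about 0.35
```

Weighting errors by FP this way penalizes confusing a HR with an out far more than confusing a walk with a HBP, which is the motivation stated at the top of this notebook.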