# R01. Output Weights
- Estimates the impact of each plate appearance output on a player's fantasy points (FP), for use in weighting outputs in M03. Plate Appearance model evaluations
- Type: Research
- Run Frequency: Irregular
- Sources:
    - A10. Player Results
- Created: 3/2/2025
- Updated: 3/2/2025

Since plate appearance models predict a variety of possible outputs, no single accuracy metric captures their performance. <br>
However, since the ultimate goal is to accurately predict fantasy points (FP), FP is a reasonable common scale for evaluation. <br>
One could weight each output by its direct scoring value (e.g., singles as 3 FP, HR as 10), but this ignores the R, RBI, and SB that a result tends to generate. <br>
The regressions below instead estimate each output's weight from the FP a player scores over the whole game, so the weights capture the value that follows from that PA result.
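As a minimal sketch of how such weights might be applied in M03, assume a PA model that outputs a probability for each outcome. The expected FP of one plate appearance is then the probability-weighted sum of the per-outcome weights. The weight values and probabilities below are hypothetical placeholders (only the 3 and 10 echo the example above), not the fitted coefficients, and `expected_fp` is a hypothetical helper name.

```python
# Hypothetical per-outcome FP weights (placeholders, NOT the fitted regression coefficients).
HYPOTHETICAL_WEIGHTS = {
    "singles": 3.0,
    "doubles": 6.0,
    "triples": 9.0,
    "hr": 10.0,
    "bb": 3.0,
    "hbp": 3.0,
}


def expected_fp(outcome_probs: dict, weights: dict) -> float:
    """Expected FP of one PA: sum over outcomes of P(outcome) * weight(outcome).

    Outcomes missing from `weights` (e.g. outs) contribute 0 FP.
    """
    return sum(p * weights.get(outcome, 0.0) for outcome, p in outcome_probs.items())


# A hypothetical predicted outcome distribution for a single PA (sums to 1).
probs = {"out": 0.68, "singles": 0.15, "doubles": 0.05, "triples": 0.005,
         "hr": 0.035, "bb": 0.075, "hbp": 0.005}
print(round(expected_fp(probs, HYPOTHETICAL_WEIGHTS), 4))
```

Swapping `HYPOTHETICAL_WEIGHTS` for the regression coefficients is what would let the FP-based evaluation account for downstream R, RBI, and SB value.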

### Imports

In [2]:
%run "U1. Imports.ipynb"
%run "U2. Functions.ipynb"
%run "U3. Classes.ipynb"
%run "U4. Datasets.ipynb"
%run "U5. Models.ipynb"

### Batters

##### Data

In [2]:
%%time
# List to hold all dataframes
batters_data = []

# Loop through all files in the 'A10. Player Results' directory and its subdirectories
for root, dirs, files in os.walk(os.path.join(baseball_path, 'A10. Player Results')):
    for filename in files:
        if 'batters' in filename and filename.endswith('.csv'):
            file_path = os.path.join(root, filename)
            df = pd.read_csv(file_path)
            batters_data.append(df)

# Concatenate all dataframes into one
batters_combined = pd.concat(batters_data, ignore_index=True)

CPU times: total: 42.6 s
Wall time: 2min 42s
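The file filter in the loop above can also be sketched with `pathlib`. This is a minimal alternative, not the notebook's loader, and `find_result_files` is a hypothetical helper name. It keeps the same rule (name contains the role keyword and ends in `.csv`) but sorts the paths, since `os.walk` yields entries in filesystem order and the concatenation order would otherwise depend on the machine.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_result_files(base_dir: str | Path, role: str) -> list[Path]:
    """Recursively collect CSV files whose names contain `role` (e.g. 'batters').

    Mirrors the os.walk filter: substring match on the file name plus a '.csv'
    suffix. Sorted so the downstream pd.concat order is deterministic.
    """
    base = Path(base_dir)
    return sorted(p for p in base.rglob("*.csv") if p.is_file() and role in p.name)


# Demo on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    season = Path(tmp) / "2022"
    season.mkdir()
    for name in ("batters_0426.csv", "pitchers_0426.csv", "batters_notes.txt"):
        (season / name).write_text("x\n1\n")
    (Path(tmp) / "batters_0427.csv").write_text("x\n1\n")
    batter_names = [p.name for p in find_result_files(tmp, "batters")]
    pitcher_names = [p.name for p in find_result_files(tmp, "pitchers")]

print(batter_names)
print(pitcher_names)
```

Under these assumptions, the loader cell could become `pd.concat((pd.read_csv(p) for p in find_result_files(os.path.join(baseball_path, 'A10. Player Results'), 'batters')), ignore_index=True)`.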


##### Create Variables

In [3]:
# Singles = total hits minus extra-base hits (doubles, triples, home runs)
batters_combined['singles'] = batters_combined['h'] - batters_combined[['doubles', 'triples', 'hr']].sum(axis=1)

In [4]:
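# Runs produced: R + RBI, minus HR so a home run (credited as both an R and an RBI) counts once
batters_combined['runs_produced'] = batters_combined['rbi'] + batters_combined['r'] - batters_combined['hr']
# Alternative: the average of RBI and R
batters_combined['runs_produced2'] = (batters_combined['rbi'] + batters_combined['r']) / 2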
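A small worked example of the two definitions above, written as plain functions for illustration (the function names are hypothetical). A solo home run credits the batter with one R and one RBI for the same run, so `rbi + r` counts it twice; subtracting `hr` counts it once. Averaging instead halves every run, including runs where the batter earned only the RBI.

```python
def runs_produced(rbi: int, r: int, hr: int) -> int:
    """RBI + R - HR: removes the double credit a home run gives the batter."""
    return rbi + r - hr


def runs_produced2(rbi: int, r: int) -> float:
    """Average of RBI and R, the alternative definition."""
    return (rbi + r) / 2


# Solo home run: r=1, rbi=1, hr=1, one run actually produced.
print(runs_produced(rbi=1, r=1, hr=1))   # 1
print(runs_produced2(rbi=1, r=1))        # 1.0

# Two-run single where the batter does not score: r=0, rbi=2.
print(runs_produced(rbi=2, r=0, hr=0))   # 2
print(runs_produced2(rbi=2, r=0))        # 1.0
```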

##### Regression

In [5]:
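# No constant is added, so this is a regression through the origin: each coefficient
# is the FP associated with one occurrence of that outcome, and statsmodels reports
# an uncentered R-squared.
X = batters_combined[['singles', 'doubles', 'triples', 'hr', 'bb', 'hbp']]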
y = batters_combined['fp']

model = sm.OLS(y, X).fit()

print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                     fp   R-squared (uncentered):                   0.953
Model:                            OLS   Adj. R-squared (uncentered):              0.953
Method:                 Least Squares   F-statistic:                          5.163e+05
Date:                Mon, 03 Mar 2025   Prob (F-statistic):                        0.00
Time:                        18:37:44   Log-Likelihood:                     -3.2696e+05
No. Observations:              153986   AIC:                                  6.539e+05
Df Residuals:                  153980   BIC:                                  6.540e+05
Df Model:                           6                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------
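To illustrate why a regression through the origin yields per-event weights, here is a minimal sketch on synthetic data, assuming game FP is roughly additive in the event counts. All counts, the "true" weights, and the noise level are made up for illustration, and `numpy.linalg.lstsq` stands in for `sm.OLS` without a constant. In the real data, each coefficient also absorbs the average R, RBI, and other FP that tend to accompany that event, which is what distinguishes it from the raw scoring value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic counts for 5,000 hypothetical batter-games
# (columns: singles, doubles, triples, hr, bb, hbp).
n_games = 5_000
counts = rng.poisson(lam=[0.9, 0.25, 0.03, 0.15, 0.4, 0.05], size=(n_games, 6))

# Hypothetical "true" per-event FP values (placeholders, not fitted results).
true_weights = np.array([5.0, 8.0, 11.0, 15.0, 4.0, 4.0])

# Simulated game FP: additive event contributions plus zero-mean noise standing
# in for everything the regressors don't capture.
fp = counts @ true_weights + rng.normal(0.0, 2.0, size=n_games)

# Least squares with no constant column, matching sm.OLS(y, X) without add_constant.
weights_hat, *_ = np.linalg.lstsq(counts, fp, rcond=None)
print(np.round(weights_hat, 2))
```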

### Pitchers

##### Data

In [5]:
%%time
# List to hold all dataframes
pitchers_data = []

# Loop through all files in the 'A10. Player Results' directory and its subdirectories
for root, dirs, files in os.walk(os.path.join(baseball_path, 'A10. Player Results')):
    for filename in files:
        if 'pitchers' in filename and filename.endswith('.csv'):
            file_path = os.path.join(root, filename)
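            try:
                df = pd.read_csv(file_path)
            except Exception as e:
                # Skip unreadable files; without `continue`, the previous df
                # would be appended again (or df would be undefined)
                print(f"Skipping {file_path}: {e}")
                continue
            pitchers_data.append(df)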

# Concatenate all dataframes into one
pitchers_combined = pd.concat(pitchers_data, ignore_index=True)

CPU times: total: 43.5 s
Wall time: 1min 11s


In [6]:
pitchers_combined.head()

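| Unnamed: 0 | name | personId | starter | ip | pa | outs | h | r | er | bb | k | hr | hbp | w | l | cg | cgso | nh | fp | gamePk | date | year | venue_id | team | teamabbrev |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | McKenzie | 663474 | 1 | 5.2 | 24.0 | 17 | 7 | 4 | 4 | 0 | 6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 12.55 | 661032 | 20220426 | 2022 | 1 | away | CLE |
| 1 | De Los Santos, E | 660853 | 0 | 1.1 | 4.0 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.0 | 661032 | 20220426 | 2022 | 1 | away | CLE |
| 2 | Gose | 543238 | 0 | 1.0 | 4.0 | 3 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.05 | 661032 | 20220426 | 2022 | 1 | away | CLE |
| 3 | Sandoval | 663776 | 1 | 7.0 | 24.0 | 21 | 2 | 0 | 0 | 1 | 9 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 35.95 | 661032 | 20220426 | 2022 | 1 | home | LAA |
| 4 | Loup | 571901 | 0 | 1.0 | 3.0 | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.25 | 661032 | 20220426 | 2022 | 1 | home | LAA |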


##### Create Variables

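Note: this data can't distinguish among types of non-strikeout outs (e.g., groundouts vs. flyouts) or among types of non-HR hits (singles, doubles, triples), so each group is combined into a single variable.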

In [10]:
pitchers_combined['non_strikeout'] = pitchers_combined['outs'] - pitchers_combined['k']
pitchers_combined['non_hr'] = pitchers_combined['h'] - pitchers_combined['hr']
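As a worked check of the two derived variables, here they are applied by hand to the first row of the `head()` output above (McKenzie, 5.2 IP). Note that `outs` counts every out recorded while the pitcher was in the game, so `non_strikeout` may also include baserunning outs such as caught stealing.

```python
# Values copied from row 0 of pitchers_combined.head() above.
row = {"outs": 17, "k": 6, "h": 7, "hr": 1}

non_strikeout = row["outs"] - row["k"]  # outs recorded other than by strikeout
non_hr = row["h"] - row["hr"]           # hits allowed other than home runs

print(non_strikeout, non_hr)  # 11 6
```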

##### Regression

In [11]:
# As with batters, no constant is added: coefficients are FP per event and R-squared is uncentered
X = pitchers_combined[['k', 'non_strikeout', 'non_hr', 'hr', 'bb']]
y = pitchers_combined['fp']

model = sm.OLS(y, X).fit()

print(model.summary())

                                 OLS Regression Results                                
Dep. Variable:                     fp   R-squared (uncentered):                   0.956
Model:                            OLS   Adj. R-squared (uncentered):              0.956
Method:                 Least Squares   F-statistic:                          3.388e+05
Date:                Fri, 15 Aug 2025   Prob (F-statistic):                        0.00
Time:                        10:54:14   Log-Likelihood:                     -1.6633e+05
No. Observations:               78485   AIC:                                  3.327e+05
Df Residuals:                   78480   BIC:                                  3.327e+05
Df Model:                           5                                                  
Covariance Type:            nonrobust                                                  
                    coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------
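Both summaries report an uncentered R-squared because neither model includes a constant. statsmodels then computes R² as `1 - SSR / sum(y**2)` rather than `1 - SSR / sum((y - mean(y))**2)`, and since the uncentered denominator is never smaller, the figure is at least as high as the centered one for the same residuals. The sketch below uses a toy constant prediction (hypothetical numbers) to show how large the gap can be; it suggests the 0.953 and 0.956 above should not be read as conventional R² values.

```python
import numpy as np


def r_squared(y, y_hat, centered=True):
    """R² = 1 - SSR / TSS. Centered TSS uses deviations from mean(y);
    uncentered TSS (reported when X has no constant) uses raw y."""
    y = np.asarray(y, dtype=float)
    ssr = np.sum((y - np.asarray(y_hat, dtype=float)) ** 2)
    tss = np.sum((y - y.mean()) ** 2) if centered else np.sum(y ** 2)
    return 1.0 - ssr / tss


# Toy target far from zero, "predicted" by its mean every time.
y = np.array([10.0, 12.0, 11.0, 13.0, 9.0])
y_hat = np.full(5, 11.0)

print(r_squared(y, y_hat, centered=True))    # 0.0
print(r_squared(y, y_hat, centered=False))   # about 0.984
```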