# M05. Leverage
- Predicts pitcher leverage from the game situation (score margin, half-inning, inning) to determine which type of pitcher to bring into the game
- Type: Model
- Run Frequency: Irregular
- Sources:
    - MLB Stats API
    - Steamer
- Created: 12/30/2024
- Updated: 10/31/2025

### Imports

In [1]:
%run "C:\Users\james\Documents\MLB\Code\U1. Imports.ipynb"
%run "C:\Users\james\Documents\MLB\Code\U2. Functions.ipynb"
%run "C:\Users\james\Documents\MLB\Code\U3. Classes.ipynb"
%run "C:\Users\james\Documents\MLB\Code\U4. Datasets.ipynb"

### Data

##### Plate Appearances 

In [2]:
complete_dataset = pd.read_csv(os.path.join(baseball_path, "Final Dataset.csv"))

##### Bullpens

In [3]:
def append_files_from_bullpens_folders(base_path):
    # Initialize an empty list to store dataframes
    dataframes = []

    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(base_path):
        # Filter folders that start with "Bullpens 2024"
        for dir_name in dirs:
            if dir_name.startswith("Bullpens 2024"):
                folder_path = os.path.join(root, dir_name)
                
                # Get all files in this folder
                for file in os.listdir(folder_path):
                    file_path = os.path.join(folder_path, file)

                    # Attempt to read the file into a dataframe
                    try:
                        df = pd.read_csv(file_path)  # Use the correct reader for your file format
                        dataframes.append(df)
                    except Exception as e:
                        print(f"Error reading {file_path}: {e}")

    # Concatenate all dataframes; return an empty dataframe if no files were found
    if dataframes:
        return pd.concat(dataframes, ignore_index=True)
    return pd.DataFrame()

In [4]:
bullpen_df = append_files_from_bullpens_folders(r"C:\Users\james\Documents\MLB\Database\A04. Bullpens")

### Clean

Keep only the columns needed for the model

In [5]:
complete_dataset = complete_dataset[['date', 'pitcher', 'pitcherName', 'halfInning', 'inning', 'prePitcherScore', 'preBatterScore', 'startingPitcher']]

### Merge

In [6]:
merged_df = complete_dataset.merge(bullpen_df, left_on=['pitcherName', 'date'], right_on=['Name', 'date'], how='inner', suffixes=("_Actual", "_Assigned"))
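
Because this is an inner merge, any plate appearance whose pitcher name and date have no matching bullpen row is dropped silently. A minimal sketch of how the match rate could be checked, assuming toy data: `plate_appearances` and `bullpens` are hypothetical stand-ins for `complete_dataset` and `bullpen_df`, and all names and values are made up.

```python
import pandas as pd

# Toy stand-ins for complete_dataset and bullpen_df (all values are made up)
plate_appearances = pd.DataFrame({
    "date": [20240401, 20240401, 20240402],
    "pitcherName": ["A. Smith", "B. Jones", "C. Lee"],
    "inning": [7, 8, 9],
})
bullpens = pd.DataFrame({
    "date": [20240401, 20240402],
    "Name": ["A. Smith", "C. Lee"],
    "Leverage": [2, 3],
})

# A left merge with indicator=True keeps every plate appearance and
# records in the "_merge" column whether a bullpen row was found for it
check = plate_appearances.merge(
    bullpens,
    left_on=["pitcherName", "date"],
    right_on=["Name", "date"],
    how="left",
    indicator=True,
)

match_rate = (check["_merge"] == "both").mean()
unmatched = check.loc[check["_merge"] == "left_only", "pitcherName"].tolist()
print(f"Matched {match_rate:.0%} of plate appearances; unmatched: {unmatched}")
```

A low match rate would usually point to name spelling differences or mismatched date formats between the two sources.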

### Sample

Keep only data after January 1, 2024 (dates are compared as YYYYMMDD strings)

In [17]:
merged_df = merged_df[merged_df['date'].astype(str) > '20240101']

Keep only rows with a non-missing reliever leverage (missing values are coded as 0)

In [18]:
merged_df = merged_df[merged_df['Leverage'] != 0]

### Model

$ \widehat{\text{Leverage}} = f(\text{pitcherLead},\ \text{top},\ \text{inning\_dummy\_list}) $

##### Inputs

Pitcher's team lead in runs (negative when trailing)

In [19]:
merged_df['pitcherLead'] = merged_df['prePitcherScore'] - merged_df['preBatterScore']

Top of the inning

In [24]:
merged_df['top'] = (merged_df['halfInning'] == "top").astype(int)

Inning dummies

In [25]:
# Innings 1-10 each get their own dummy
for i in range(1, 11):
    merged_df[f'inning_{i}'] = (merged_df['inning'] == i).astype(int)

# Innings 11 and later share a single dummy
merged_df['inning_11'] = (merged_df['inning'] >= 11).astype(int)

In [26]:
inning_dummy_list = [col for col in merged_df.columns if col.startswith("inning_")]
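
The inning dummies above can be illustrated on a small example. This is a sketch on a made-up `toy` dataframe, not the real data; it caps the inning at 11 first, which is an alternative way to get the same "11 or later" bucket, and builds the column list explicitly instead of by prefix match.

```python
import pandas as pd

# Made-up innings, including extra innings
toy = pd.DataFrame({"inning": [1, 9, 10, 11, 13]})

# Cap the inning at 11 so the 11th inning and everything after it share one bucket
capped = toy["inning"].clip(upper=11)

# One indicator column per bucket 1..11, created even if a bucket is absent from the data
for i in range(1, 12):
    toy[f"inning_{i}"] = (capped == i).astype(int)

# Explicit column list, so unrelated columns that happen to start with "inning_" are never picked up
inning_cols = [f"inning_{i}" for i in range(1, 12)]

print(toy[["inning", "inning_10", "inning_11"]])
```

Each row has exactly one dummy set to 1, so the 11 columns together encode the inning without gaps.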

Model inputs

In [27]:
leverage_inputs = ['pitcherLead', 'top'] + inning_dummy_list

##### Train/Test Split

Define features and target

In [28]:
X = merged_df[leverage_inputs]
y = merged_df['Leverage']

Split

In [29]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

##### Train

In [30]:
# Define model
predict_leverage = MLPClassifier(hidden_layer_sizes=(100,100), max_iter=500, random_state=42)
predict_leverage.fit(X_train, y_train)

# Save model (the context manager closes the file handle after writing)
with open(os.path.join(model_path, "M05. Leverage", f"predict_leverage_{todaysdate}.sav"), 'wb') as f:
    pickle.dump(predict_leverage, f)
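
For reference, a saved model can be read back with `pickle.load` and used for prediction. The sketch below is a self-contained round trip on synthetic data: the file name `predict_leverage_example.sav`, the two-feature input, and the leverage labels are all made up for illustration, and a temporary directory stands in for `model_path`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.neural_network import MLPClassifier

# Tiny synthetic training set: columns are [pitcherLead, top]; labels are made-up leverage classes
X_toy = np.array([[-3, 0], [-1, 1], [0, 0], [1, 1], [2, 0], [5, 1]] * 5)
y_toy = np.array([1, 2, 3, 3, 2, 1] * 5)

model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=200, random_state=42)
model.fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_file = os.path.join(tmp_dir, "predict_leverage_example.sav")

    # Write and read inside context managers so the file handles are closed
    with open(model_file, "wb") as f:
        pickle.dump(model, f)
    with open(model_file, "rb") as f:
        reloaded = pickle.load(f)

# The reloaded model should reproduce the original model's predictions
same_predictions = bool((reloaded.predict(X_toy) == model.predict(X_toy)).all())
print(same_predictions)
```

Pickled scikit-learn models are generally only safe to load with the same library version that wrote them, and only from trusted files.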

### Evaluate

This is a simple, low-stakes model that performs well in the simulation. The held-out test set (`X_test`, `y_test`) is not yet used; evaluation metrics should be added to measure accuracy and guide improvements.
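
One possible starting point for an evaluation is to compare held-out accuracy against a majority-class baseline and inspect the confusion matrix. The sketch below is not this notebook's evaluation: it runs on synthetic data with a made-up labeling rule, and a smaller network than the one trained above, purely to show the shape of the check.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)

# Synthetic stand-ins for the real features: run differential and a late-inning flag
lead = rng.integers(-5, 6, size=400)
late = rng.integers(0, 2, size=400)
X = np.column_stack([lead, late])

# Made-up rule: close, late situations are "high leverage" (class 1), everything else is 0
y = ((np.abs(lead) <= 2) & (late == 1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=42).fit(X_train, y_train)

# Baseline that always predicts the most frequent class in the training data
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

model_acc = accuracy_score(y_test, model.predict(X_test))
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])

print(f"Model accuracy: {model_acc:.3f} | Baseline accuracy: {baseline_acc:.3f}")
print(cm)
```

On the real data, the same calls would take the notebook's `X_test` and `y_test`; a model that barely beats the baseline would suggest the inputs carry little signal about leverage.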