joshsalce/arbitration_tensorflow

Predicting MLB Arbitration Salaries with Keras Sequential Network

Description

This project attempts to predict MLB arbitration salaries from a range of factors. The repository includes Jupyter notebooks covering the data cleaning and model building procedures. All models are neural networks trained with TensorFlow Keras and evaluated on metrics including Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Also included are Python files containing helper functions used in the data cleaning and preprocessing steps. An article explaining this project can be found on my Medium page.
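For readers unfamiliar with the two evaluation metrics named above, here is a minimal sketch of how Mean Absolute Error and Mean Absolute Percentage Error can be computed by hand. It is not the code used in the notebooks: the function names and the salary figures are made up for illustration, and only the standard library is used.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted values (in dollars here)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute gap as a percentage of the true value."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    ratios = (abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(ratios) / len(y_true)


# Hypothetical salaries, not taken from the project's data.
true_salaries = [1_000_000, 2_000_000, 4_000_000]
predicted_salaries = [1_100_000, 1_800_000, 4_000_000]

mae = mean_absolute_error(true_salaries, predicted_salaries)
mape = mean_absolute_percentage_error(true_salaries, predicted_salaries)
print(f"MAE:  ${mae:,.0f}")
print(f"MAPE: {mape:.2f}%")
```

MAE is expressed in dollars, so large salaries dominate it, while MAPE scales each error by the true salary. Reporting both gives a view of error across very different salary levels.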

Packages and Tech Used

Table of Contents

| Component | Description |
| --- | --- |
| Data | CSV files of raw data to be cleaned. Includes custom FanGraphs data and scraped arbitration data supplemented with further data collection. Includes a 'Metadata' folder describing the datasets |
| Predictions | Excel file with the test set's true and predicted salaries in tables and pivot tables |
| Train Test Data | CSV files of the data after cleaning and splitting into training and test sets |
| Visualizations | Histograms showing the distributions of individual features, and scatterplots of model predictions under different groupings |
| Data_Cleaning | Jupyter notebook for importing and cleaning data. Procedure includes standardizing names and positions, and changing data types and values |
| DNN_pitchers | Jupyter notebook for importing and preprocessing data, then training and evaluating a TensorFlow neural network for MLB pitchers. Trained on 2011-2022 pitchers, tested on 2023 pitchers |
| DNN_players | Jupyter notebook for importing and preprocessing data, then training and evaluating a TensorFlow neural network for MLB position players. Trained on 2011-2022 position players, tested on 2023 position players |
| Helper Functions | Python files containing helper functions to scrape data, build histogram visuals, and clean data |
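
Both model notebooks train on the 2011-2022 seasons and test on 2023, which is a split by season rather than a random split. A rough sketch of that idea follows. It is illustrative only: the `season` and `salary` field names, the `split_by_season` helper, and the rows themselves are assumptions, not the project's actual schema or code.

```python
def split_by_season(rows, first_train_season=2011, test_season=2023):
    """Split records by season: earlier seasons for training, the last for testing.

    `rows` is a list of dicts with a hypothetical integer "season" key.
    """
    train = [r for r in rows if first_train_season <= r["season"] < test_season]
    test = [r for r in rows if r["season"] == test_season]
    return train, test


# Made-up records for illustration; not real arbitration data.
rows = [
    {"player": "Player A", "season": 2011, "salary": 1_500_000},
    {"player": "Player B", "season": 2018, "salary": 3_200_000},
    {"player": "Player C", "season": 2022, "salary": 5_000_000},
    {"player": "Player D", "season": 2023, "salary": 2_750_000},
    {"player": "Player E", "season": 2023, "salary": 6_100_000},
]

train_rows, test_rows = split_by_season(rows)
print(len(train_rows), "training rows;", len(test_rows), "test rows")
```

Holding out the most recent season means the model is evaluated on salaries it could not have seen during training, which is closer to how a prediction for an upcoming arbitration class would be used.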

Visualizations: Descriptions