jacobdanovitch/Trouble-With-The-Curve

Trouble with the Curve

This repository contains the data, models, and web app for my paper Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports.


Data

To the best of my knowledge, this is the only existing dataset of its kind for baseball prospect profiles. Almost 10,000 profiles were collected from MLB.com and FanGraphs, each containing a player's scouting report and 20-80 scale grades, as well as select metadata.

Models

With the above data, an obvious question arises: can we predict whether a player will make the major leagues? We use a variety of deep learning methods to attempt to answer this question, and achieve a strong "maybe". We also present an analysis of how the language of the reports varies with player success, as well as across positions.

| Model             | Accuracy | F1     |
|-------------------|----------|--------|
| Bag-Of-Embeddings | 64.65%   | 53.78% |
| TextCNN           | 69.02%   | 56.42% |
| LSTM+SelfAttn     | 68.64%   | 54.65% |
| BCN               | 73.52%   | 43.33% |
| HAN               | 66.00%   | 54.07% |
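
Accuracy and F1 can rank models differently: in the results above, BCN has the highest accuracy but the lowest F1. The sketch below shows how both metrics fall out of the same confusion counts. It assumes F1 is computed for the positive class (the player reached the majors), which this README does not state, and the counts are invented for illustration, not taken from the paper.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for a binary classifier from its confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts for 100 prospects, 35 of whom reached the majors.
# A cautious model that rarely predicts "makes the majors":
cautious_acc, cautious_f1 = accuracy_and_f1(tp=10, fp=5, fn=25, tn=60)

# A model that predicts the positive class more freely:
liberal_acc, liberal_f1 = accuracy_and_f1(tp=25, fp=25, fn=10, tn=40)

print(f"cautious: accuracy={cautious_acc:.2%}, F1={cautious_f1:.2%}")
print(f"liberal:  accuracy={liberal_acc:.2%}, F1={liberal_f1:.2%}")
```

In this toy case the cautious model wins on accuracy while missing most of the positive class, so its F1 is lower. That is one plausible reading of the pattern in the table, not a confirmed explanation.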

Web App

A Hierarchical Attention Network (HAN) is trained to answer the above question. It serves both as a demonstration of the research problem and as an interpretable visualization of each prediction, using the model's attention weights.
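
To illustrate why attention weights lend themselves to visualization, here is a minimal sketch of one attention-pooling step. It is not the code in this repository: it scores each word vector by a plain dot product with a context vector (a full HAN first passes each vector through a learned tanh projection, and repeats the step at the sentence level), and the tokens, vectors, and context vector are all made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by its similarity to `context` and sum them.

    Returns the pooled vector and the attention weights (which sum to 1).
    """
    scores = [sum(x * c for x, c in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


# Hypothetical 2-dimensional word vectors for a fragment of a scouting report.
tokens = ["plus", "fastball", "with", "late", "life"]
vectors = [[2.0, 0.1], [1.5, 0.3], [0.0, 0.2], [0.5, 0.9], [1.0, 0.4]]
context = [1.0, 0.0]  # stands in for a learned context vector

pooled, weights = attention_pool(vectors, context)

# Sorting tokens by weight is the basis for highlighting words in a report.
ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
for token, weight in ranked:
    print(f"{token:10s} {weight:.3f}")
```

Because the weights are non-negative and sum to 1, they can be mapped directly onto highlight intensity for each word.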
