Using scouting reports to predict if players will make the MLB.

Trouble with the Curve

This repository contains the data, models, and web app for my paper Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports.



To the best of my knowledge, this is the only existing dataset of its kind for baseball prospect profiles. Almost 10,000 profiles were collected from sources including FanGraphs; each contains a player's scouting report and 20-80 scale grades, as well as select metadata.


With the above data, an obvious question arises: can we predict whether a player will make the major leagues? We use a variety of deep learning methods to attempt to answer this question, and achieve a strong "maybe". We also present an analysis of how the language of the reports varies between players who did and did not reach the majors, as well as between positions.

| Model | Accuracy | F1 |
| --- | --- | --- |
| Bag-Of-Embeddings | 64.65% | 53.78% |
| TextCNN | 69.02% | 56.42% |
| LSTM+SelfAttn | 68.64% | 54.65% |
| BCN | 73.52% | 43.33% |
| HAN | 66.00% | 54.07% |
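The results above list both accuracy and F1, and the two can disagree: BCN has the highest accuracy but the lowest F1. The following is a minimal sketch of how the two metrics are computed for a binary task. It assumes label 1 means "made the majors" and uses invented toy labels, not data from this repository. The function name `accuracy_and_f1` is hypothetical, and the paper may compute F1 differently (for example, averaged across classes).

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 of the positive class) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Toy labels (assumed for illustration): 3 of 10 players made the majors,
# and the classifier finds only one of them.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

In this toy case a classifier that rarely predicts the positive class still scores well on accuracy, because most labels are negative, while its F1 is penalized for the missed positives.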

Web App

A Hierarchical Attention Network (HAN) is trained to answer the above question. It serves both as a demonstration of the research problem and as an interpretable visualization of each prediction, using the model's attention weights.
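As a rough illustration of how attention weights can drive such a visualization, the sketch below turns per-word attention scores into normalized weights with a softmax and ranks the words by weight. It is not the web app's code: the tokens, the scores, and the names `softmax` and `top_attended` are assumptions made for this example. In an actual HAN the scores come from learned parameters, and a second attention layer weights sentences as well as words.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended(tokens, scores, k=3):
    """Return the k (token, weight) pairs with the largest attention weight."""
    weights = softmax(scores)
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return ranked[:k]


# Hypothetical report fragment and made-up attention scores.
tokens = ["plus", "fastball", "with", "late", "life"]
scores = [2.0, 1.5, -1.0, 0.5, 0.0]

for token, weight in top_attended(tokens, scores):
    print(f"{token}: {weight:.3f}")
```

A visualization can then map each weight to a highlight intensity, so the words the model attended to most stand out in the report text.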
