Skip to content
App to determine who is the greatest player for a team during a given time period
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.circleci
__pycache__
static
views
.gitignore
README.md
baseball.sql
batting.sql
fabfile.py
index.js
init.txt
package.json
pitching.sql
queries.sql
remote.sql

README.md

GOATFinder

Not goat as in the animal that eats tin cans. GOAT as in Greatest Of All Time. The purpose of this app is to figure out, using various weights and measures, who the greatest baseball players of all time are, and the take it a step further and answer the question: who are the greatest baseball players of any given time period?

The data was sourced from Sean Lahman's baseball database, and extends to 2016. This project was an exercise and study in SQL

Inital MVP: Create a slider that queries for a user-given year window and returns a roster of the greatest baseball players by their primary position through various criteria: batting stats (On Base Plus Slugging) for position players, Wins then ERA/WHIP for starting pitchers, and Saves then ERA/WHIP for bullpen pitchers.

Stretch goals:

  • Allow for user-defined weighting of various other stats, such as homeruns, strikeouts, etc.
  • *Done!
  • Allow for querying by a given team/division/year
  • *Done, except for league/division, which will require far more research
  • Refactor in React to allow for dynamic return of data
  • Fork off to a dashboard in Mongo or another noSQL database to learn its dynamics

Challenges:

  • Finding a comprehensive dataset online
  • Working with MySQL after being so comfortable with Postgres
  • Creating an aggregate query. The dataset was raw and there were all sorts of quirks to the database.
    • Datatypes were not as expected (varchar on an integer-only column)
    • Teamid was not what I thought it was, and there's a key called franchid that I had to research what the difference was
  • Making the query more efficient. Initial query was 4 seconds long. I used combination of indexes and materialized views, as the data was static. I also broke out some of the queries in to separate queries to reduce the number of joins. Aggregator query is now .3 seconds.

Technologies used:

  • MySQL
  • PostgreSQL
  • Express
  • Node.js
  • Bulma
You can’t perform that action at this time.