FFDB: Local Statcast database

Tired of using the MLB API? Downloads folder full of CSV files exported from Baseball Savant? Look no further. Here's what you can get out of FFDB:

- Access to all of the pitch-level and play-by-play data in the pitch-tracking era (beginning in the late 2000s)
- A robust database schema based on the MLB API format
- Lightning-fast queries using Apache Parquet and DuckDB

I originally built this as an internal tool for my own sabermetrics projects, then cleaned it up just enough to make it public in case anyone else finds it useful.

(You can find more about me, the author, at https://harperawl.net. Feel free to contact me with any questions you have about this project!)

Setup

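1. Create a virtual environment and install dependencies (the activation command below is for Windows; on macOS or Linux, run `source .venv/bin/activate` instead):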

python -m venv .venv
.venv\Scripts\activate
pip install -e .

2. Create a `.env` file (see `.env.example`) and set:

RAW_DATA_DIR
PROCESSED_DATA_DIR
DUCKDB_PATH
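
Before running the CLI, it can help to confirm that all three settings are present. The snippet below is a minimal sketch using only the standard library; it is not how FFDB itself loads configuration, and the helper names (`parse_env`, `missing_settings`, `check_env_file`) are hypothetical. It assumes a plain `KEY=VALUE` file format.

```python
from pathlib import Path

# The three settings named in the setup step above.
REQUIRED_SETTINGS = ("RAW_DATA_DIR", "PROCESSED_DATA_DIR", "DUCKDB_PATH")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_settings(settings: dict) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not settings.get(name)]


def check_env_file(path: str = ".env") -> list:
    """Read a .env file and report which required settings are missing."""
    return missing_settings(parse_env(Path(path).read_text()))
```

Calling `check_env_file()` from the project root returns an empty list when everything is set, or the names that still need values.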

3. Run the CLI:

This command line tool helps you set up the database and refresh it as needed.

Set up the database from start to finish:

ffdb setup --start-year 2024 --end-year 2026

Refresh the database (defaults to the current year):

ffdb refresh
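
If you want to trigger these commands from a script (for example, a scheduled refresh), one option is to call the CLI through `subprocess`. This is only a sketch: the function names are hypothetical, it uses only the commands and flags shown above, and it assumes `ffdb` is on your `PATH` inside the activated virtual environment.

```python
import subprocess


def setup_command(start_year: int, end_year: int) -> list:
    """Build the argument list for `ffdb setup` over a range of years."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return [
        "ffdb", "setup",
        "--start-year", str(start_year),
        "--end-year", str(end_year),
    ]


def refresh_command() -> list:
    """Build the argument list for `ffdb refresh`."""
    return ["ffdb", "refresh"]


def run(command: list) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(command, check=True)
```

For example, `run(setup_command(2024, 2026))` is equivalent to the setup command above, and `run(refresh_command())` is equivalent to `ffdb refresh`.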

Queries

Install the DuckDB package into your environment:

Python:

pip install duckdb

R:

install.packages("duckdb")

Make your query using SQL syntax:

Python:

import duckdb

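# Open the database read-only so queries cannot modify it
conn = duckdb.connect("path/to/your/database.duckdb", read_only=True)

# execute() only runs the query; fetchall() returns the rows as a list of tuples
data = conn.execute("""
    SELECT * FROM games LIMIT 100;
""").fetchall()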
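
When exploring the database from Python, a small helper for previewing tables can save some typing. The sketch below is an illustration rather than part of FFDB: `preview_table` is a hypothetical name, and it relies only on `execute(...)`, `.description`, and `.fetchall()`, which a DuckDB connection provides. Because a table name cannot be bound as a query parameter, the helper checks that it is a plain identifier before placing it in the SQL; the row limit is passed as a bound parameter.

```python
import re

# Table names are interpolated into the SQL, so accept plain identifiers only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preview_table(conn, table: str, limit: int = 100):
    """Return (column_names, rows) for the first `limit` rows of `table`."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a valid table name: {table!r}")
    # The limit is bound as a parameter rather than formatted into the string.
    cursor = conn.execute(f"SELECT * FROM {table} LIMIT ?", [limit])
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return columns, rows
```

With a connection opened as shown above, `columns, rows = preview_table(conn, "games", 5)` returns the column names and the first five rows of the `games` table.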

R:

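library(duckdb)

# Open the database read-only so queries cannot modify it
conn <- dbConnect(duckdb(), dbdir = "path/to/your/database.duckdb", read_only = TRUE)

games <- dbGetQuery(conn, "
    SELECT * FROM games LIMIT 100;
")

# Close the connection and shut down the embedded database
dbDisconnect(conn, shutdown = TRUE)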

See the database documentation for more information on how to query the database.
