- Install from CRAN
install.packages("baseballDBR")
- Or, install the latest development version from GitHub:
devtools::install_github("keberwein/baseballDBR")
The baseballDBR
package requires data that is formatted similar to the Baseball Databank or Sean Lahman's Baseball Database. The package also contains the get_bbdb()
function, which allows us to download the most up-to-date tables directly from the Chadwick Bureau's GitHub repository. For example, we can easily load the "Batting" table into our R environment.
library(baseballDBR)
get_bbdb(table = "Batting")
head(Batting)
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB
#> 1 abercda01 1871 1 TRO <NA> 1 4 0 0 0 0 0 0 0 0 0
#> 2 addybo01 1871 1 RC1 <NA> 25 118 30 32 6 0 0 13 8 1 4
#> 3 allisar01 1871 1 CL1 <NA> 29 137 28 40 4 5 0 19 3 1 2
#> 4 allisdo01 1871 1 WS3 <NA> 27 133 28 44 10 2 2 27 1 1 0
#> 5 ansonca01 1871 1 RC1 <NA> 25 120 29 39 11 3 0 16 6 2 2
#> 6 armstbo01 1871 1 FW1 <NA> 12 49 9 11 2 1 0 5 0 1 0
#> SO IBB HBP SH SF GIDP
#> 1 0 NA NA NA NA NA
#> 2 0 NA NA NA NA NA
#> 3 5 NA NA NA NA NA
#> 4 2 NA NA NA NA NA
#> 5 1 NA NA NA NA NA
#> 6 1 NA NA NA NA NA
library(Lahman)
library(baseballDBR)
Batting <- Lahman::Batting
head(Batting)
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB
#> 1 abercda01 1871 1 TRO NA 1 4 0 0 0 0 0 0 0 0 0
#> 2 addybo01 1871 1 RC1 NA 25 118 30 32 6 0 0 13 8 1 4
#> 3 allisar01 1871 1 CL1 NA 29 137 28 40 4 5 0 19 3 1 2
#> 4 allisdo01 1871 1 WS3 NA 27 133 28 44 10 2 2 27 1 1 0
#> 5 ansonca01 1871 1 RC1 NA 25 120 29 39 11 3 0 16 6 2 2
#> 6 armstbo01 1871 1 FW1 NA 12 49 9 11 2 1 0 5 0 1 0
#> SO IBB HBP SH SF GIDP
#> 1 0 NA NA NA NA NA
#> 2 0 NA NA NA NA NA
#> 3 5 NA NA NA NA NA
#> 4 2 NA NA NA NA NA
#> 5 1 NA NA NA NA NA
#> 6 1 NA NA NA NA NA
Simple batting metrics can be easily added to any batting data frame. For example, we can add slugging percentage, on-base percentage and on-base plus slugging. Note that OPS and OBP appears as "NA" for the years before IBB was tracked.
library(baseballDBR)
Batting$SLG <- SLG(Batting)
Batting$OBP <- OBP(Batting)
head(Batting, 3)
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB
#> 1 abercda01 1871 1 TRO NA 1 4 0 0 0 0 0 0 0 0 0
#> 2 addybo01 1871 1 RC1 NA 25 118 30 32 6 0 0 13 8 1 4
#> 3 allisar01 1871 1 CL1 NA 29 137 28 40 4 5 0 19 3 1 2
#> SO IBB HBP SH SF GIDP SLG OBP
#> 1 0 NA NA NA NA NA 0.000 NA
#> 2 0 NA NA NA NA NA 0.322 NA
#> 3 5 NA NA NA NA NA 0.394 NA
The package includes a suite of advanced metrics such as wOBA, RAA, and FIP, among others. Many of the advanced metrics require multiple tables. For example, the wOBA metric requires the Batting, Pitching, and Fielding tables in order to establish a player's regular defensive position.
library(baseballDBR)
get_bbdb(table = c("Batting", "Pitching", "Fielding"))
Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = T)
head(Batting, 3)
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB
#> 1 abercda01 1871 1 TRO <NA> 1 4 0 0 0 0 0 0 0 0 0
#> 2 addybo01 1871 1 RC1 <NA> 25 118 30 32 6 0 0 13 8 1 4
#> 3 allisar01 1871 1 CL1 <NA> 29 137 28 40 4 5 0 19 3 1 2
#> SO IBB HBP SH SF GIDP wOBA
#> 1 0 NA NA NA NA NA 0.0000000
#> 2 0 NA NA NA NA NA 0.2855902
#> 3 5 NA NA NA NA NA 0.3078849
The code above uses Fangraphs wOBA values. The default behavior is to uses Tom Tango's adapted SQL formula. Other options include Sep.Leagues
, which may act as a buffer to any bias created by the designated hitter.
library(baseballDBR)
get_bbdb(table = c("Batting", "Pitching", "Fielding"))
Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = F, Sep.Leagues = T)
head(Batting, 3)
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB
#> 1 abercda01 1871 1 TRO <NA> 1 4 0 0 0 0 0 0 0 0 0
#> 2 addybo01 1871 1 RC1 <NA> 25 118 30 32 6 0 0 13 8 1 4
#> 3 allisar01 1871 1 CL1 <NA> 29 137 28 40 4 5 0 19 3 1 2
#> SO IBB HBP SH SF GIDP wOBA
#> 1 0 NA NA NA NA NA NA
#> 2 0 NA NA NA NA NA NA
#> 3 5 NA NA NA NA NA NA
We can also produce a data frame that only shows the wOBA multipliers. Notice the Fangraphs wOBA multipliers slightly differ from the Tango multipliers.
library(baseballDBR)
get_bbdb(table = c("Batting", "Pitching", "Fielding"))
fangraphs_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs=T)
head(fangraphs_woba, 3)
#> yearID lg_woba woba_scale wBB wHBP w1B w2B w3B wHR runSB
#> 1 2017 0.320 1.192 0.693 0.723 0.878 1.236 1.558 1.989 0.2
#> 2 2016 0.318 1.212 0.691 0.721 0.878 1.242 1.569 2.015 0.2
#> 3 2015 0.313 1.251 0.687 0.718 0.881 1.256 1.594 2.065 0.2
#> runCS lg_r_pa lg_r_w cFIP
#> 1 -0.421 0.121 10.007 3.126
#> 2 -0.410 0.118 9.778 3.147
#> 3 -0.392 0.113 9.421 3.134
tango_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs=F)
head(tango_woba, 3)
#> # A tibble: 3 x 35
#> # Groups: yearID, RperOut, runBB, runHBP, run1B, run2B, run3B, runHR,
#> # runSB, runCS [3]
#> yearID AB R H X2B X3B HR SB CS BB SO IBB
#> <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1871 23179 5659 6616 950 495 101 948 270 817 371 0
#> 2 1872 34755 7487 10003 1212 293 88 536 264 477 532 0
#> 3 1873 40346 8487 11832 1308 472 102 395 253 747 552 0
#> # ... with 23 more variables: HBP <dbl>, SF <dbl>, RperOut <dbl>,
#> # runBB <dbl>, runHBP <dbl>, run1B <dbl>, run2B <dbl>, run3B <dbl>,
#> # runHR <dbl>, runSB <dbl>, runCS <dbl>, runMinus <dbl>, runPlus <dbl>,
#> # lg_woba <dbl>, woba_scale <dbl>, wBB <dbl>, wHBP <dbl>, w1B <dbl>,
#> # w2B <dbl>, w3B <dbl>, wHR <dbl>, wSB <dbl>, wCS <dbl>
A relational database is not needed to work with these data. However, we may want to store the data to be called more quickly at a later time. We can download all of the tables at once with the get_bbdb()
function and then write them to an empty schema in our favorite database. The example uses a newly created PostgreSQL instance, but other database tools can be used assuming an appropriate R package exists.
library(baseballDBR)
library(RPostgreSQL)
# Load all tables into the Global Environment.
get_bbdb(AllTables = TRUE)
# Make a list of all data frames.
dbTables <- names(Filter(isTRUE, eapply(.GlobalEnv, is.data.frame)))
# Load data base drivers and load all data frames in a loop.
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host= "localhost", dbname= "lahman", user= "YOUR_USERNAME", password = "YOUR_PASSWORD")
for (i in 1:length(dbTables)) {
dbWriteTable(con, name = dbTables[i], value = get0(dbTables[i]), overwrite = TRUE)
}
# Disconnect from database.
dbDisconnect(con)
rm(con, drv)