R package to download, import, convert, and cache the RAM Legacy Stock Assessment Database
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
R
man
.Rbuildignore
.gitignore
.travis.yml
DESCRIPTION
NAMESPACE
README.Rmd
README.md
ramlegacy.Rproj

README.md

Import the RAM Legacy Stock Assessment Database into R

Travis-CI Build Status

This package does one thing: it downloads a Microsoft Access copy of the RAM Legacy Stock Assessment Database and converts it to a local sqlite3 database. This makes it easy to work with dplyr/dbplyr, for example. The make_ramlegacy() function also leaves a copy of .csv files for each table in the database in the R working directory if you’d prefer to work with those.

Note that you must have the utility mdb-tables installed and in your path from mdbtools. This utility provides tools for extracting Access databases. You can find installation instructions at http://mdbtools.sourceforge.net. If you’re on OS X and using homebrew, you can install it with brew install mdbtools.

Example use

Install the package:

# install.packages("devtools")
devtools::install_github("seananderson/ramlegacy")

Cache and convert the database:

library("ramlegacy")
make_ramlegacy()
#> Converting table area
#> Converting table assessment
#> Converting table assessmethod
#> Converting table assessor
#> Converting table biometrics
#> Converting table bioparams
#> Converting table bioparams_units_views
#> Converting table bioparams_values_views
#> Converting table management
#> Converting table model_results
#> Converting table stock
#> Converting table taxonomy
#> Converting table timeseries
#> Converting table timeseries_units_views
#> Converting table timeseries_values_views
#> Converting table tsmetrics

Work with the data:

library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
ram <- src_sqlite("ramlegacy.sqlite3")
ram # see the available tables
#> src:  sqlite 3.19.3 [/Users/seananderson/src/ramlegacy/ramlegacy.sqlite3]
#> tbls: area, assessment, assessmethod, assessor, biometrics, bioparams,
#>   bioparams_units_views, bioparams_values_views, management,
#>   model_results, sqlite_stat1, sqlite_stat4, stock, taxonomy, timeseries,
#>   timeseries_units_views, timeseries_values_views, tsmetrics

Access the area table:

tbl(ram, "area")
#> # Source:   table<area> [?? x 6]
#> # Database: sqlite 3.19.3
#> #   [/Users/seananderson/src/ramlegacy/ramlegacy.sqlite3]
#>    country   areatype areacode areaname        alternateareaname areaid   
#>    <chr>     <chr>    <chr>    <chr>           <chr>             <chr>    
#>  1 Argentina CFP      ARG-N    Northern Argen… <NA>              Argentin…
#>  2 Argentina CFP      ARG-S    Southern Argen… <NA>              Argentin…
#>  3 Australia AFMA     CASCADE  Cascade Plateau <NA>              Australi…
#>  4 Australia AFMA     ESE      Eastern half o… <NA>              Australi…
#>  5 Australia AFMA     GAB      Great Australi… <NA>              Australi…
#>  6 Australia AFMA     MI       Macquarie Isla… ""                Australi…
#>  7 Australia AFMA     NAUST    Northern Austr… <NA>              Australi…
#>  8 Australia AFMA     NSWWA    New South Wale… <NA>              Australi…
#>  9 Australia AFMA     SE       Southeast Aust… <NA>              Australi…
#> 10 Australia AFMA     TAS      Tasmania        <NA>              Australi…
#> # ... with more rows

Join the time series ts and stock tables on the stockid column:

ts <- tbl(ram, "timeseries")
stock <- tbl(ram, "stock")
select(stock, stockid, scientificname, commonname, region) %>%
  inner_join(ts)
#> Joining, by = "stockid"
#> # Source:   lazy query [?? x 9]
#> # Database: sqlite 3.19.3
#> #   [/Users/seananderson/src/ramlegacy/ramlegacy.sqlite3]
#>    stockid  scientificname  commonname region assessid  stocklong   tsid  
#>    <chr>    <chr>           <chr>      <chr>  <chr>     <chr>       <chr> 
#>  1 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  2 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  3 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  4 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  5 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  6 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  7 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  8 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#>  9 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#> 10 ACADRED… Sebastes fasci… Acadian r… US Ea… NEFSC-AC… Acadian re… BdivB…
#> # ... with more rows, and 2 more variables: tsyear <dbl>, tsvalue <dbl>

Note that because you are working with dplyr and a database, you will need to use dplyr::collect() once you want to load the data into a local data frame. For example:

tbl(ram, "area") %>% 
  collect()
#> # A tibble: 275 x 6
#>    country   areatype areacode areaname        alternateareaname areaid   
#>    <chr>     <chr>    <chr>    <chr>           <chr>             <chr>    
#>  1 Argentina CFP      ARG-N    Northern Argen… <NA>              Argentin…
#>  2 Argentina CFP      ARG-S    Southern Argen… <NA>              Argentin…
#>  3 Australia AFMA     CASCADE  Cascade Plateau <NA>              Australi…
#>  4 Australia AFMA     ESE      Eastern half o… <NA>              Australi…
#>  5 Australia AFMA     GAB      Great Australi… <NA>              Australi…
#>  6 Australia AFMA     MI       Macquarie Isla… ""                Australi…
#>  7 Australia AFMA     NAUST    Northern Austr… <NA>              Australi…
#>  8 Australia AFMA     NSWWA    New South Wale… <NA>              Australi…
#>  9 Australia AFMA     SE       Southeast Aust… <NA>              Australi…
#> 10 Australia AFMA     TAS      Tasmania        <NA>              Australi…
#> # ... with 265 more rows

For safety, you may want to use dplyr::collect(n = Inf) to return all rows of data, not just the minimum default number. In this case it won’t make a difference.

tbl(ram, "area") %>% 
  collect(n = Inf)
#> # A tibble: 275 x 6
#>    country   areatype areacode areaname        alternateareaname areaid   
#>    <chr>     <chr>    <chr>    <chr>           <chr>             <chr>    
#>  1 Argentina CFP      ARG-N    Northern Argen… <NA>              Argentin…
#>  2 Argentina CFP      ARG-S    Southern Argen… <NA>              Argentin…
#>  3 Australia AFMA     CASCADE  Cascade Plateau <NA>              Australi…
#>  4 Australia AFMA     ESE      Eastern half o… <NA>              Australi…
#>  5 Australia AFMA     GAB      Great Australi… <NA>              Australi…
#>  6 Australia AFMA     MI       Macquarie Isla… ""                Australi…
#>  7 Australia AFMA     NAUST    Northern Austr… <NA>              Australi…
#>  8 Australia AFMA     NSWWA    New South Wale… <NA>              Australi…
#>  9 Australia AFMA     SE       Southeast Aust… <NA>              Australi…
#> 10 Australia AFMA     TAS      Tasmania        <NA>              Australi…
#> # ... with 265 more rows