arkdb

The goal of arkdb is to provide a convenient way to move data from large compressed text files (tsv, csv, etc) into any DBI-compliant database connection (e.g. MYSQL, Postgres, SQLite; see DBI), and move tables out of such databases into text files. The key feature of arkdb is that files are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much too large to read into memory all at once.
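
The package exposes two core functions: ark() streams tables out of a database into compressed text files, and unark() streams those files back into a database. Below is a minimal sketch of that round trip, assuming an existing SQLite file ("mydb.sqlite") and an output directory ("archive/"); a full worked example with the nycflights13 demo database follows in the Basic use section.

library(arkdb)
library(DBI)

# Connect to any DBI-compliant database (SQLite here, via RSQLite)
con <- DBI::dbConnect(RSQLite::SQLite(), "mydb.sqlite")

# Stream every table out to compressed text files, 50,000 lines at a time
dir.create("archive", showWarnings = FALSE)
ark(con, "archive", lines = 50000)

# Stream those files back into a fresh database, chunk by chunk
new_con <- DBI::dbConnect(RSQLite::SQLite(), "restored.sqlite")
unark(list.files("archive", full.names = TRUE), new_con, lines = 50000)

DBI::dbDisconnect(con)
DBI::dbDisconnect(new_con)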

Installation

You can install the development version of arkdb from GitHub with:

# install.packages("devtools")
devtools::install_github("cboettig/arkdb")
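
arkdb is also distributed on CRAN, so a released version can be installed with:

install.packages("arkdb")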

Basic use

library(arkdb)

# additional libraries just for this demo
library(dbplyr)
library(dplyr)
library(fs)

Creating an archive of a database

Consider the nycflights database in SQLite:

tmp <- tempdir() # Or can be your working directory, "."
db <- dbplyr::nycflights13_sqlite(tmp)
#> Caching nycflights db at /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//RtmpGaGbYx/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather

Create an archive of the database:

dir <- fs::dir_create(fs::path(tmp, "nycflights"))
ark(db, dir, lines = 50000)
#> Exporting airlines in 50000 line chunks:
#>  ...Done! (in 0.007248163 secs)
#> Exporting airports in 50000 line chunks:
#>  ...Done! (in 0.02768707 secs)
#> Exporting flights in 50000 line chunks:
#>  ...Done! (in 12.79838 secs)
#> Exporting planes in 50000 line chunks:
#>  ...Done! (in 0.03708911 secs)
#> Exporting weather in 50000 line chunks:
#>  ...Done! (in 0.907186 secs)

Unarchive

Import a list of compressed tabular files (e.g. *.tsv.bz2) into a local SQLite database:

files <- fs::dir_ls(dir)
new_db <- src_sqlite(fs::path(tmp, "local.sqlite"), create = TRUE)

unark(files, new_db, lines = 50000)
#> Importing airlines.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.01713181 secs)
#> Importing airports.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.03514004 secs)
#> Importing flights.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 10.64125 secs)
#> Importing planes.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.136945 secs)
#> Importing weather.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.999218 secs)

new_db
#> src:  sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpGaGbYx/local.sqlite]
#> tbls: airlines, airports, flights, planes, weather
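
To verify the round trip, you can query the restored tables with dplyr (output not shown); for example, count the rows in the flights table:

# lazy query against the restored database
tbl(new_db, "flights") %>% count()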

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
