Archive and unarchive databases as flat text files

arkdb

The goal of arkdb is to provide a convenient way to move data from large compressed text files (tsv, csv, etc.) into any DBI-compliant database connection (e.g. MySQL, Postgres, SQLite; see DBI), and to move tables out of such databases into text files. The key feature of arkdb is that data are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much too large to read into memory all at once.
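To make the chunking idea concrete, here is a minimal base R sketch of reading a compressed text file a fixed number of lines at a time. It is only an illustration of the general technique, not arkdb's internal code; the file, the column names, and the `chunk_lines` value are invented for the example.

```r
# Illustrative sketch only: chunked reading of a compressed TSV in base R.
# The file and its contents are hypothetical demo data.
path <- tempfile(fileext = ".tsv.bz2")
out <- bzfile(path, "w")
write.table(data.frame(id = 1:10, value = letters[1:10]),
            out, sep = "\t", quote = FALSE, row.names = FALSE)
close(out)

chunk_lines <- 4L          # number of data rows held in memory at once
con <- bzfile(path, "r")
header <- readLines(con, n = 1L)
rows_seen <- 0L
n_chunks <- 0L
repeat {
  body <- readLines(con, n = chunk_lines)
  if (length(body) == 0L) break
  # Parse only this chunk; the header is re-attached so columns are named.
  chunk <- read.table(text = c(header, body), header = TRUE,
                      sep = "\t", stringsAsFactors = FALSE)
  # A real importer would append the chunk to a database table here,
  # e.g. with DBI::dbWriteTable(conn, "tbl", chunk, append = TRUE).
  rows_seen <- rows_seen + nrow(chunk)
  n_chunks <- n_chunks + 1L
}
close(con)
```

Because only `chunk_lines` rows are parsed at a time, peak memory use depends on the chunk size rather than on the size of the file.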

Installation

You can install arkdb from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("cboettig/arkdb")
```

Basic use

```r
library(arkdb)

# additional libraries just for this demo
library(dbplyr)
library(dplyr)
library(fs)
```

Creating an archive of a database

Consider the nycflights database in SQLite:

```r
tmp <- tempdir() # Or can be your working directory, "."
db <- dbplyr::nycflights13_sqlite(tmp)
#> Caching nycflights db at /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//RtmpGaGbYx/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather
```

Create an archive of the database:

```r
dir <- fs::dir_create(fs::path(tmp, "nycflights"))
ark(db, dir, lines = 50000)
#> Exporting airlines in 50000 line chunks:
#>  ...Done! (in 0.007248163 secs)
#> Exporting airports in 50000 line chunks:
#>  ...Done! (in 0.02768707 secs)
#> Exporting flights in 50000 line chunks:
#>  ...Done! (in 12.79838 secs)
#> Exporting planes in 50000 line chunks:
#>  ...Done! (in 0.03708911 secs)
#> Exporting weather in 50000 line chunks:
#>  ...Done! (in 0.907186 secs)
```

Unarchive

Import a list of compressed tabular files (e.g. *.tsv.bz2) into a local SQLite database:

```r
files <- fs::dir_ls(dir)
new_db <- src_sqlite(fs::path(tmp, "local.sqlite"), create = TRUE)

unark(files, new_db, lines = 50000)
#> Importing airlines.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.01713181 secs)
#> Importing airports.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.03514004 secs)
#> Importing flights.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 10.64125 secs)
#> Importing planes.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.136945 secs)
#> Importing weather.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.999218 secs)

new_db
#> src:  sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpGaGbYx/local.sqlite]
#> tbls: airlines, airports, flights, planes, weather
```
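One way to sanity-check a round trip like the one above is to compare row counts before and after. The sketch below does this on a small in-memory SQLite database. It assumes that the DBI and RSQLite packages are installed and that `ark()` and `unark()` accept a plain DBI connection, which the package description suggests but this README does not demonstrate; the demo directory and the choice of the built-in `mtcars` data are made up for illustration.

```r
# Illustrative round-trip check; assumes DBI, RSQLite and arkdb are installed.
library(DBI)
library(arkdb)

# Source database holding one small demo table
src_con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(src_con, "mtcars", mtcars)

# Export every table to compressed text files in a fresh directory
demo_dir <- tempfile("ark_demo")   # hypothetical location for the archive
dir.create(demo_dir)
ark(src_con, demo_dir, lines = 10)

# Import the archive into a second, empty database
dest_con <- dbConnect(RSQLite::SQLite(), ":memory:")
unark(list.files(demo_dir, full.names = TRUE), dest_con, lines = 10)

# Compare row counts between the original and the restored table
n_src  <- dbGetQuery(src_con,  "SELECT COUNT(*) AS n FROM mtcars")$n
n_dest <- dbGetQuery(dest_con, "SELECT COUNT(*) AS n FROM mtcars")$n

dbDisconnect(src_con)
dbDisconnect(dest_con)
```

If the two counts differ, the chunked export or import dropped or duplicated rows. A stricter check would also compare column names and a sample of values.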

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
