Archive and unarchive databases as flat text files



The goal of arkdb is to provide a convenient way to move data from large compressed text files (tsv, csv, etc.) into any DBI-compliant database connection (e.g. MySQL, Postgres, SQLite; see DBI), and to move tables out of such databases into text files. The key feature of arkdb is that data are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much too large to read into memory all at once.
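The chunked approach can be illustrated with base R alone. The following is only a sketch of the idea, not arkdb's internal implementation: stream a text file a fixed number of lines at a time, so the full table never has to fit in memory at once.

```r
# Write a small example tsv to stream through in chunks
src <- tempfile(fileext = ".tsv")
write.table(data.frame(x = 1:100, y = rep(letters[1:4], 25)),
            src, sep = "\t", row.names = FALSE, quote = FALSE)

con <- file(src, open = "r")
header <- readLines(con, n = 1)   # header read once, kept out of the chunks
rows <- 0
repeat {
  chunk <- readLines(con, n = 30) # the equivalent of `lines = 30`
  if (length(chunk) == 0) break
  # arkdb would parse this chunk and append it to a database table here
  rows <- rows + length(chunk)
}
close(con)
rows
#> [1] 100
```

Only one chunk (here at most 30 lines) is held in memory at any moment, which is why the same pattern scales to tables far larger than RAM.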



Installation

You can install arkdb from GitHub with:

# install.packages("devtools")
devtools::install_github("cboettig/arkdb")

Basic use


library(arkdb)

# additional libraries just for this demo
library(dbplyr)
library(fs)

Creating an archive of a database

Consider the nycflights database in SQLite:

tmp <- tempdir() # Or can be your working directory, "."
db <- dbplyr::nycflights13_sqlite(tmp)
#> Caching nycflights db at /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//RtmpGaGbYx/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather

Create an archive of the database:

dir <- fs::dir_create(fs::path(tmp, "nycflights"))
ark(db, dir, lines = 50000)
#> Exporting airlines in 50000 line chunks:
#>  ...Done! (in 0.007248163 secs)
#> Exporting airports in 50000 line chunks:
#>  ...Done! (in 0.02768707 secs)
#> Exporting flights in 50000 line chunks:
#>  ...Done! (in 12.79838 secs)
#> Exporting planes in 50000 line chunks:
#>  ...Done! (in 0.03708911 secs)
#> Exporting weather in 50000 line chunks:
#>  ...Done! (in 0.907186 secs)
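By default the archive is written as bzip2-compressed tsv files. The compression and the text format can be changed through arguments documented in `?ark`; the call below is a sketch using the documented argument names, so check the help page of your installed version before relying on it.

```r
# Hypothetical variation on the call above (argument names per the ark() docs):
ark(db, dir,
    lines = 50000,
    compress = "gzip",                          # instead of the bzip2 default
    streamable_table = streamable_base_tsv())   # base-R tsv chunk reader/writer
```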


Unarchive

Import a list of compressed tabular files (i.e. *.tsv.bz2) into a local SQLite database:

files <- fs::dir_ls(dir)
new_db <- src_sqlite(fs::path(tmp, "local.sqlite"), create=TRUE)

unark(files, new_db, lines = 50000)
#> Importing airlines.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.01713181 secs)
#> Importing airports.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.03514004 secs)
#> Importing flights.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 10.64125 secs)
#> Importing planes.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.136945 secs)
#> Importing weather.tsv.bz2 in 50000 line chunks:
#>  ...Done! (in 0.999218 secs)

new_db
#> src:  sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpGaGbYx/local.sqlite]
#> tbls: airlines, airports, flights, planes, weather
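Once restored, the new database behaves like any other DBI/dplyr source. For example (a usage sketch continuing the session above, assuming the dplyr package is installed):

```r
# Query the restored database lazily with dplyr: the aggregation runs inside
# SQLite, so the flights table is never read into R in full.
library(dplyr)
tbl(new_db, "flights") %>%
  group_by(carrier) %>%
  summarise(n = n())
```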

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

