rpg
Timothy H. Keitt
02/08/2017
This package wraps PostgreSQL's libpq, a library for interacting with a PostreSQL database. Unlike other database access packages for R, rpg is designed to be specific to PostgreSQL and, as such, exposes a more of the functionality of libpq. A great deal of thought went into making rpg simple to use. The major difference between rpg and most other database packages is that rpg does not use an object-oriented model. There are no connection objects, result objects and so on. This simplifies the interface and makes using rpg a lot like using psql, PostgreSQL's command line interface. I basically wrote rpg as a nice comfy environment for my own work. If you are building infrastructure, you probably want to investigate DBI and RPostgres. There is also the excellent RPostgreSQL package on CRAN.
Installation
install.packages(rpg) # Released
devtools::install_github("thk686/rpg") # Development
Queries
rpg supports parameterized queries and prepared statements.
library(rpg)
createdb("exampledb"); connect("exampledb")## No connection... attempting reset... nope... trying default... that worked.
## CREATE DATABASE
data(mtcars); write_table(mtcars)## Initiating new transaction block... done.
## INSERT 0 1
fetch("select * from mtcars where mpg > $1", 30)## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 3 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 4 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
prepared = prepare("select * from mtcars where mpg > $1")
prepared(25); fetch(); prepared(30); fetch()## SELECT 6
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 3 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 4 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## 5 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## 6 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## SELECT 4
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 3 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 4 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Connection stack
Even though the database connection is implicit, you are not restricted to a single connection. Connections can be managed on a stack.
push_conn()
createdb("exampledb2"); connect("exampledb2")## No connection... attempting reset... nope... trying default... that worked.
## CREATE DATABASE
data(iris); write_table(iris); list_tables()## Initiating new transaction block... done.
## INSERT 0 1
## [1] "iris"
swap_conn(); list_tables()## [1] "mtcars"
show_conn_stack()## host dbname status.ok
## 1 /tmp exampledb2 TRUE
swap_conn(); describe_table("iris")## schema table column position type default
## 1 public iris Sepal.Length 1 double precision <NA>
## 2 public iris Sepal.Width 2 double precision <NA>
## 3 public iris Petal.Length 3 double precision <NA>
## 4 public iris Petal.Width 4 double precision <NA>
## 5 public iris Species 5 text <NA>
pop_conn(); get_conn_info("dbname")## [1] "exampledb"
dropdb("exampledb2")## DROP DATABASE
Database cursors and foreach
The rpg package integrates nicely with the iterators package. This allows you to use foreach to iterate over query results. This is perhaps the most unique feature of rpg.
library(foreach)
c1 = cursor("SELECT * FROM mtcars", by = 8)## Initiating new transaction block... done.
x = foreach(i = c1, .combine = rbind) %do% { i$mpg }
print(x, digits = 2)## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## result.1 21 21 23 21 19 18 14 24
## result.2 23 19 18 16 17 15 10 10
## result.3 15 32 30 34 22 16 15 13
## result.4 19 27 26 30 16 20 15 21
This usage is compatible with %dopar%; you can access these cursors from multiple threads of execution and even across a heterogeneous cluster of machines. See the documentation for cursor for an example.
Note: after some additional experimentation, I have concluded that parallel processing of cursors in PostgreSQL does not in fact work. You can iterate over a cursor on multiple backends. However it appears that only one of these backends will do any work, probably to ensure the access is serialized. My guess is that requests are forwarded to the portal where the cursor was opened.
Asynchronous queries
You can submit asynchronous queries with rpg. When combined with the connection stack, this allows you to launch queries, push the connection, make a new connection while the first query is completing. By swapping the first connection back to the active state, you can then check for completion or try to cancel the transaction.
async_query("SELECT a.* FROM mtcars a, mtcars b, mtcars c") # >30,000 rows## [1] TRUE
count = 0
repeat {
status = async_status()
if (status == "BUSY") {
if (count > 2) {
cat("calling cancel...\n")
cancel() }}
else break
cat("busy... \n")
Sys.sleep(1)
count = count + 1 }## busy...
## busy...
## busy...
## calling cancel...
## busy...
## calling cancel...
## busy...
## calling cancel...
## busy...
print(status)## ERROR: canceling statement due to user request
##
finish_async()Transactions and save points
A PostgreSQL-specific feature are save points for nested transactions. rpg provides a convenient wrapper.
sp1 = savepoint(); write_table(iris)## Error in current transaction: rolling back... done.
## Initiating new transaction block... done.
## INSERT 0 1
sp2 = savepoint(); data(Loblolly); write_table(Loblolly); list_tables()## INSERT 0 1
## [1] "Loblolly" "iris" "mtcars"
rollback(sp2); list_tables()## ROLLBACK
## [1] "iris" "mtcars"
rollback(sp1); list_tables()## ROLLBACK
## [1] "mtcars"
Object serialization and storage
The stow and stow.image functions write entire R objects into a database. You can stow your entire working environment and reload it anywhere else you can reconnect to the database. (A facility listing and commenting images is in the works.)
stow("mtcars")## Initiating new transaction block... done.
rm(list = objects())
ls()## character(0)
list_stowed()## objname stamp
## 1 mtcars 2017-10-27 12:16:01-05
retrieve("mtcars")
ls()## [1] "mtcars"
disconnect()
dropdb("exampledb")## No connection... attempting reset... nope... trying default... that worked.
## DROP DATABASE
Additional features
- Intelligent handling of passwords:
rpgwill query for the password only if it is needed. You can set a default password. - Easy creation of connection aliases: call
make_serviceand your connection settings will be saved as a named service. You only have to type complex connection options once. If you used a password, that will be saved as well. - Lots of options for getting and setting connection options and defaults.
- Easy tracing of data flowing between client and server.
- High bandwidth options for reading/writing bulk data from/to the database.
News
2/7/17 -- Gave up trying to build libpq on the fly and instead borrowed the excellent work done by the RPostgres team. RPostgres is looking very promising for DBI work.
4/15/16 -- I ripped out and replaced the build setup. I was using the autoconf bits and libpq files distributed with RPostgreSQL (a very good package you should check out). However it had a few peculiarities, like never using the included libpq on Linux. Also, I could not check the libpq version number. So now the package will check your postgres install using the pg_config command and if its not found or the version is not new enough, then libpq is downloaded and built.