Skip to content

Latest commit

 

History

History
67 lines (54 loc) · 1.54 KB

RPostgreSQL.md

File metadata and controls

67 lines (54 loc) · 1.54 KB

RPostgreSQL

PostgreSQL is an open source, relational database.

Read about it here: http://en.wikipedia.org/wiki/PostgreSQL

What is RPostgreSQL?

RPostgreSQL is an R library that allows you to interact with databases in PostgreSQL in a much more efficient way than your standard ODBC connector.

Documentation: https://code.google.com/p/rpostgresql/

Why use RPostgreSQL?

  1. Data is often stored in databases.
  2. Automated R/DB interaction means less time transferring data and more time doing analysis!

PostgreSQL basics

From terminal

psql database_name

Try these commands:

\h, \?, \l, \c database_name, \d, \d table_name, \du, \e

Create a table

CREATE TABLE my_table (
    id serial primary key,
    some_string varchar(255),
    some_int int,
    some_float float4,
    bigger_float float8,
    some_date date,
    ...
)

Integrating R and PostgreSQL

Setup Connection:

con <- dbConnect(PostgreSQL(), user='user', password='password', dbname='mydb')```

Find your tables: 
```R 
dbListTables(con)```

Run a Query and Save as R object:
```R
query <- "SELECT some_stuff FROM some_db"
mydata <- dbGetQuery(con, query)

Oh noes, query too big! Return 500 line at a time.

query <- "SELECT tons_of_stuff FROM some_db"
result <- dbSendQuery(con, query)
mydata <- fetch(result, n=500) 

Screw it, let the server/database do all the work (beta)

query <- "SELECT tons_of_stuff FROM some_db"
result <- dbSendQuery(con, query)
findings <- dbApply(result, INDEX='group_by_variables', FUN=some_function)