
Cannot overwrite or append in schemas #140

Closed
etiennebr opened this issue Dec 10, 2017 · 4 comments

@etiennebr

Description

It seems that specifying a table name using DBI::SQL() prevents the use of overwrite and append. Here's an example using the public schema, but the problem is obviously harder to work around when using other schemas.

Database

PostgreSQL 9.3.20 64-bit

Example

library(DBI)                                                                       
con <- dbConnect(odbc::odbc(), driver = "PostgreSQL Unicode", database = "odbc")
                                                                                   
dbWriteTable(con, "a", data.frame(a=1, b=2))                                       
dbWriteTable(con, "a", data.frame(a=1, b=2))                                       
#> Error: Table a exists in database, and both overwrite and append are FALSE
dbWriteTable(con, "a", data.frame(a=1, b=2), overwrite = TRUE)                     
dbWriteTable(con, "a", data.frame(a=1, b=2), append = TRUE)                        
                                                                                   
dbWriteTable(con, DBI::SQL("a"), data.frame(a=1, b=2))                             
#> Error: Table a exists in database, and both overwrite and append are FALSE
dbWriteTable(con, DBI::SQL("a"), data.frame(a=1, b=2), overwrite = TRUE)           
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#>  target signature 'PostgreSQL#SQL'.
#>  "OdbcConnection#character" would also be valid
dbWriteTable(con, DBI::SQL("a"), data.frame(a=1, b=2), append = TRUE)              
                                                                                   
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a=1, b=2))                      
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: ERROR: relation "a" already exists;
#> Error while executing the query
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a=1, b=2), overwrite = TRUE)    
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: ERROR: relation "a" already exists;
#> Error while executing the query
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a=1, b=2), append = TRUE)       
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: ERROR: relation "a" already exists;
#> Error while executing the query
@colearendt

Working with schemas is tricky in DBI in general, but thankfully work is outlined on that front, at least (r-dbi/DBI#24). The easiest way to work around this upstream limitation is by defining the schema server-side.

Maybe your point is that this workaround shouldn't be necessary, but I think (hope) some of the finicky schema behavior will be taken care of once DBI has taken schemas into account.

library(odbc)
library(DBI)

con <- dbConnect(odbc::odbc(), ...)

dbWriteTable(con, "a", iris)  ## ok

dbWriteTable(con, "a", iris)  ## err
#> Error: Table a exists in database, and both overwrite and append are FALSE

dbWriteTable(con, "a", iris, overwrite = TRUE)  ## ok

dbWriteTable(con, DBI::SQL("public.a"), iris, overwrite = TRUE)  ## err
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#>  target signature 'PostgreSQL#SQL'.
#>  "OdbcConnection#character" would also be valid
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "Sepal.Length" DOUBLE PRECISION,
#>   "Sepal.Width" DOUBLE PRECISION,
#>   "Petal.Length" DOUBLE PRECISION,
#>   "Petal.Width" DOUBLE PRECISION,
#>   "Species" TEXT
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 

## working in a different schema

dbWriteTable(con, DBI::SQL("datawarehouse.a"), iris)  ## ok

dbWriteTable(con, DBI::SQL("datawarehouse.a"), iris, overwrite = TRUE)  ## err
#> Error: <SQL> 'CREATE TABLE datawarehouse.a (
#>   "Sepal.Length" DOUBLE PRECISION,
#>   "Sepal.Width" DOUBLE PRECISION,
#>   "Petal.Length" DOUBLE PRECISION,
#>   "Petal.Width" DOUBLE PRECISION,
#>   "Species" TEXT
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 

dbExecute(con, "SET search_path = datawarehouse")
#> [1] 0

dbWriteTable(con, "a", iris, overwrite = TRUE) ## ok

Wondering if maybe the dispatch of dbWriteTable on a SQL object doesn't include a DROP TABLE if overwrite=TRUE...

The only place that this really gets painful is cross-schema joins... but dbplyr comes to the rescue there!!

library(dbplyr)
library(dplyr)
library(odbc)
library(DBI)


con <- dbConnect(odbc::odbc(), ...)

tbl_a <- tbl(con, in_schema("public", "a"))
tbl_b <- tbl(con, in_schema("datawarehouse", "a"))

tbl_a %>% left_join(tbl_b, by = c("Species")) %>% sql_render()
#> <SQL> SELECT "TBL_LEFT"."Sepal.Length" AS "Sepal.Length.x", "TBL_LEFT"."Sepal.Width" AS "Sepal.Width.x", "TBL_LEFT"."Petal.Length" AS "Petal.Length.x", "TBL_LEFT"."Petal.Width" AS "Petal.Width.x", "TBL_LEFT"."Species" AS "Species", "TBL_RIGHT"."Sepal.Length" AS "Sepal.Length.y", "TBL_RIGHT"."Sepal.Width" AS "Sepal.Width.y", "TBL_RIGHT"."Petal.Length" AS "Petal.Length.y", "TBL_RIGHT"."Petal.Width" AS "Petal.Width.y"
#>   FROM public.a AS "TBL_LEFT"
#>   LEFT JOIN datawarehouse.a AS "TBL_RIGHT"
#>   ON ("TBL_LEFT"."Species" = "TBL_RIGHT"."Species")

@colearendt

colearendt commented Dec 21, 2017

So extending what I said, I think what you have uncovered, @etiennebr, is an inconsistency in how SQL interpretation happens with respect to schemas. Again, something that will hopefully be taken into account and tested for as DBI adds schema support.

Also, I reiterate my suggestion to do the following until that is the case.

  1. Use dbplyr for in_schema() references
  2. Alter the search path
  3. Use valid database names... double quoting can get nasty
  4. Use verbatim SQL with dbGetQuery or dbSendQuery
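
For example, option 4 might look something like this (a sketch, assuming `con` is a live connection as in the examples above): issuing the DROP and the SELECT as verbatim SQL means the server, rather than DBI, interprets the schema qualification.

```r
library(DBI)

con <- dbConnect(odbc::odbc(), ...)

# Drop and recreate the table with verbatim SQL, so public.a is
# resolved by the server rather than by DBI's identifier handling
dbExecute(con, "DROP TABLE IF EXISTS public.a")
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a = 1, b = 2))

# Verbatim queries also sidestep the identifier-quoting question
dbGetQuery(con, "SELECT * FROM public.a")
```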

Quick semantics: when overwrite=TRUE, dbWriteTable uses dbExistsTable internally to test whether it should drop the existing table. The problem you are running into comes from dbExistsTable and dbWriteTable understanding schemas differently.

Check out this sequence:

library(odbc)
library(DBI)

con <- dbConnect(odbc::odbc(), ...)

# I got this problematic sequence from the original example above
dbWriteTable(con, DBI::SQL("a"), data.frame(a = 1, b = 2))
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a = 1, b = 2))
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a = 1, b = 2), overwrite = TRUE)
#> Error: <SQL> 'CREATE TABLE public.a (
#>   "a" DOUBLE PRECISION,
#>   "b" DOUBLE PRECISION
#> )
#> '
#>   nanodbc/nanodbc.cpp:1587: 42P07: [RStudio][PostgreSQL] (30) Error occurred while trying to execute a query: [SQLState 42P07] ERROR:  relation "a" already exists
#> 

# this is why that happens
dbExistsTable(con, DBI::SQL("public.a"))
#> [1] FALSE
dbWriteTable(con, "public.a", data.frame(a = 1, b = 2))
dbExistsTable(con, DBI::SQL("public.a"))
#> [1] TRUE

And our schema looks like...
(screenshot of the schema listing omitted)

Essentially, when you say overwrite=TRUE, dbExistsTable comes along and says "no worries - this table is not defined yet." So the DROP TABLE never happens. Then the CREATE TABLE fails because the table already exists. They're referencing different tables!
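
Given that diagnosis, one blunt workaround (a sketch, same assumptions about `con` as above) is to issue the DROP yourself before writing, so the mismatched existence check never matters:

```r
# dbExistsTable(con, DBI::SQL("public.a")) returns FALSE even though
# public.a exists, so overwrite = TRUE never issues the DROP.
# Dropping explicitly first lets the subsequent CREATE succeed:
dbExecute(con, "DROP TABLE IF EXISTS public.a")
dbWriteTable(con, DBI::SQL("public.a"), data.frame(a = 1, b = 2))
```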

@colearendt

Oops. This is a duplicate of #91 . Same issue, I think.

@etiennebr

Thanks @colearendt, you're absolutely right, and your workaround using dbplyr is ingenious! I thought the issue was specific to odbc's overwrite and append options, but indeed it generalizes to DBI, and I know there are plans to fix schema management, hopefully soon.
