Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pool compatible with dbplyr #42

Merged
merged 12 commits into from Jun 22, 2017
Merged

Make pool compatible with dbplyr #42

merged 12 commits into from Jun 22, 2017

Conversation

@bborgesr
Copy link
Contributor

@bborgesr bborgesr commented Jun 19, 2017

Fixes #39

# install.packages("nycflights13")
# install.packages("dplyr")
# devtools::install_github("rstudio/pool@fix/dbplyr")

library(dplyr)
library(pool)

# Using dbplyr directly

# con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
# 
# copy_to(con, nycflights13::flights, "flights",
#   temporary = FALSE, 
#   indexes = list(
#     c("year", "month", "day"), 
#     "carrier", 
#     "tailnum",
#     "dest"
#   )
# )
# 
# (flights_db <- tbl(con, "flights"))
# 
# flights_db %>% select(year:day, dep_delay, arr_delay)
# flights_db %>% filter(dep_delay > 240)
# flights_db %>% 
#   group_by(dest) %>%
#   summarise(delay = mean(dep_time))
# 
# tailnum_delay_db <- flights_db %>% 
#   group_by(tailnum) %>%
#   summarise(
#     delay = mean(arr_delay),
#     n = n()
#   ) %>% 
#   arrange(desc(delay)) %>%
#   filter(n > 100)
# 
# tailnum_delay_db


# Using pool
pool <- dbPool(RSQLite::SQLite(), path = ":memory:")

copy_to(pool, nycflights13::flights, "flights",
  temporary = FALSE, 
  indexes = list(
    c("year", "month", "day"), 
    "carrier", 
    "tailnum",
    "dest"
  )
)

(flights_db <- tbl(pool, "flights"))

flights_db %>% select(year:day, dep_delay, arr_delay)
flights_db %>% filter(dep_delay > 240)
flights_db %>% 
  group_by(dest) %>%
  summarise(delay = mean(dep_time))

tailnum_delay_db <- flights_db %>% 
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>% 
  arrange(desc(delay)) %>%
  filter(n > 100)

tailnum_delay_db

poolClose(pool)
bborgesr added 7 commits Jun 19, 2017
R/dbplyr.R Outdated
#' @include DBI.R
NULL

temporaryErrorMessage <- paste0("You cannot use `temporary = TRUE`",

This comment has been minimized.

@hadley

hadley Jun 20, 2017
Member

I think this would be slightly better as checkTemporary(), i.e. including the boilerplate if (temporary) stop(temporaryErrorMessage)

This comment has been minimized.

@jcheng5

jcheng5 Jun 20, 2017
Member

@hadley What do you think of long, narrative error messages like this?

This comment has been minimized.

@hadley

hadley Jun 21, 2017
Member

I have mixed feelings. They feel inelegant to me, but users tend to really appreciate them.

}

#' @export
db_create_table.Pool <- function(con, table, types, temporary = FALSE, ...) {

This comment has been minimized.

@hadley

hadley Jun 20, 2017
Member

I do wonder about changing the default value of temporary here. Obviously it's going to be annoying if it's TRUE, but it might be surprising if it's FALSE.

This comment has been minimized.

@jcheng5

jcheng5 Jun 20, 2017
Member

Isn't the default always FALSE? Are you wondering about changing the default in dplyr itself?

This comment has been minimized.

@bborgesr

bborgesr Jun 20, 2017
Author Contributor

You mean changing it in the original generic in dplyr? Or just here in pool?

This comment has been minimized.

@bborgesr

bborgesr Jun 20, 2017
Author Contributor

(sorry Joe, didn't see your comment asking exactly the same thing)

This comment has been minimized.

@hadley

hadley Jun 21, 2017
Member

Isn't the default in the generic TRUE?

This comment has been minimized.

This comment has been minimized.

@hadley

hadley Jun 21, 2017
Member

Ok, I was confused. You can ignore this whole thread.

R/dbplyr.R Outdated
#' @export
copy_to.Pool <- function(dest, df, name = deparse(substitute(df)),
overwrite = FALSE, ...) {
db_con <- poolCheckout(dest)

This comment has been minimized.

@hadley

hadley Jun 20, 2017
Member

I now know how to convert

db_con <- poolCheckout(dest)
on.exit(poolReturn(db_con))

to

db_con <- localPoolCheckout(dest)

with the on.exit() being automatically added.

It's up to you if you'd prefer that API.

This comment has been minimized.

@jcheng5

jcheng5 Jun 20, 2017
Member

I find that slightly terrifying

This comment has been minimized.

@bborgesr

bborgesr Jun 20, 2017
Author Contributor

If it's quick to explain, can you elaborate on why, @jcheng5? Also, @hadley, this is merely in your head now, right? (I couldn't find any instances of localPoolCheckoutin github)

@bborgesr
Copy link
Contributor Author

@bborgesr bborgesr commented Jun 21, 2017

@hadley, is there anything that you think I may have missed? All I did was a thin Pool wrapper around all the methods for DBIConnection (including copy_to() and tbl()), and throw an error when temporary = TRUE. Is this enough? In particular, I was wondering about your 3rd item from here:

Rework joins so that upload happens later, using same connection as the rest of the query

db_con <- poolCheckout(con)
on.exit(poolReturn(db_con))
sql_subquery(db_con, from = from, name = name, ... = ...)
}

This comment has been minimized.

@bborgesr

bborgesr Jun 21, 2017
Author Contributor

another worry I have is if this function is amenable to pool. Is the name = random_table_name()'s value actually made into a table using create_table or something?

This comment has been minimized.

@hadley

hadley Jun 21, 2017
Member

No, it's for databases that require subqueries be named (even if that name isn't actually used for anything).

Copy link
Member

@hadley hadley left a comment

The 3rd item is optional, so I wouldn't worry about it for now.

You should probably also include some basic tests of pool dplyr using RSQLite.

db_con <- poolCheckout(con)
on.exit(poolReturn(db_con))
sql_subquery(db_con, from = from, name = name, ... = ...)
}

This comment has been minimized.

@hadley

hadley Jun 21, 2017
Member

No, it's for databases that require subqueries be named (even if that name isn't actually used for anything).

@hadley

This comment has been minimized.

Copy link
Member

@hadley hadley commented on tests/testthat/test-dplyr.R in 68e4a50 Jun 22, 2017

This will create pool in the global environment, which you probably don't want

@hadley

This comment has been minimized.

Copy link
Member

@hadley hadley commented on tests/testthat/test-dplyr.R in 68e4a50 Jun 22, 2017

warning -> error

bborgesr added 2 commits Jun 22, 2017
@bborgesr bborgesr merged commit 2980515 into master Jun 22, 2017
1 of 3 checks passed
1 of 3 checks passed
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
@bborgesr bborgesr deleted the fix/dbplyr branch Jun 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants
You can’t perform that action at this time.