dplyr 0.5.0 has a new set of internals for SQL database backends. For the most part, the frontend APIs behave the same. However, variables used in building SQL statements are handled differently in 0.5.0 than in 0.4.3: they are only evaluated when sql_render is finally called, usually within collect.
In the following example, I create a vector of three carrier names and loop over it. Each iteration filters the flights_sqlite table by carrier, and I store the resulting sub-tables in a list. When I call sql_render on each element outside the loop, they all produce the same query. This is because all three sub-tables have filter ops on the variable crr, which is not evaluated in the loop. By the time sql_render is called outside of the loop, crr holds the last value in the vector, namely EV, so all three sub-tables are filtered to EV flights only.
I don't consider this a bug, but this behavior is very different from dplyr 0.4.3, in which variables were evaluated immediately when filter was called and thus remembered in the sub-table. It would be great if dplyr 0.5.0 could offer a way to keep this behavior rather than simply replacing it with a pass-by-reference style.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(RSQLite)
flights_sqlite <- tbl(nycflights13_sqlite(), "flights")
#> Caching nycflights db at /var/folders/0x/zdvx0wzs3dn8mzmj9dftnqxm00377p/T//Rtmpe1R0r2/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather
vec.carriers <- c("UA", "DL", "EV")
list.flights <- list()
for (crr in vec.carriers) {
  list.flights[[crr]] <- flights_sqlite %>%
    filter(carrier == crr)
}
sql_render(list.flights[[1]])
#> <SQL> SELECT *
#> FROM `flights`
#> WHERE (`carrier` = 'EV')
sql_render(list.flights[[2]])
#> <SQL> SELECT *
#> FROM `flights`
#> WHERE (`carrier` = 'EV')
sql_render(list.flights[[3]])
#> <SQL> SELECT *
#> FROM `flights`
#> WHERE (`carrier` = 'EV')
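The same effect can be reproduced without a database, which may help to see why every query ends up with EV. This is only a base R analogy, not dplyr's actual internals: plain closures stand in for the deferred filter expressions, and the names deferred, captured, loop_values and lapply_values are made up for the illustration.

```r
vec.carriers <- c("UA", "DL", "EV")

# Each closure refers to `crr` in the enclosing (global) environment.
# Nothing is looked up until the closure is called, after the loop is done.
deferred <- list()
for (crr in vec.carriers) {
  deferred[[crr]] <- function() crr
}
loop_values <- vapply(deferred, function(f) f(), character(1))
print(unname(loop_values))
#> [1] "EV" "EV" "EV"

# Wrapping the body in a function gives every iteration its own environment,
# so each closure keeps its own `crr`.
captured <- lapply(vec.carriers, function(crr) {
  force(crr)  # evaluate the argument now rather than on first use
  function() crr
})
lapply_values <- vapply(captured, function(f) f(), character(1))
print(lapply_values)
#> [1] "UA" "DL" "EV"
```

All three closures built in the for loop share one environment, so they all see whatever crr holds when they are finally called.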
Here is an approach that doesn't address the underlying problem but does give you your desired result: try using lapply rather than a for loop (for some reason...).
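A rough sketch of what the lapply version might look like, assuming the same dplyr 0.5.0 setup as in the original example (nycflights13 installed, sql_render and nycflights13_sqlite available from dplyr). The likely reason it behaves differently is that each call to the anonymous function gets its own environment with its own crr, so the deferred filter has a distinct value to find when the SQL is finally rendered.

```r
library(dplyr)
library(RSQLite)

# Assumption: dplyr 0.5.0, where nycflights13_sqlite() and sql_render()
# are exported by dplyr itself.
flights_sqlite <- tbl(nycflights13_sqlite(), "flights")
vec.carriers <- c("UA", "DL", "EV")

# One function call per carrier; `crr` is local to each call.
list.flights <- lapply(vec.carriers, function(crr) {
  flights_sqlite %>%
    filter(carrier == crr)
})
names(list.flights) <- vec.carriers

# Render each query to check which carrier it filters on.
sql_render(list.flights[["UA"]])
sql_render(list.flights[["DL"]])
sql_render(list.flights[["EV"]])
```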