Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutate behaves differently in R and in database #1831

Closed
iangow opened this issue May 16, 2016 · 1 comment
Closed

Mutate behaves differently in R and in database #1831

iangow opened this issue May 16, 2016 · 1 comment
Labels
bug an unexpected problem or unintended behavior
Milestone

Comments

@iangow
Copy link

iangow commented May 16, 2016

In SQL, mutate using an existing variable name adds another variable of the same name; in R, it replaces the variable.

Sample code:

library(dplyr)
pg <- src_postgres()
stuff <- tbl(pg, sql("SELECT 5 AS a, 2 AS b"))

stuff %>%
    mutate(a = a * 2)

stuff %>%
    collect() %>%
    mutate(a = a * 2)

stuff %>%
    mutate(a = a * 2) %>%
    explain()

Similar code, but for SQLite:

library(dplyr)
pg <- src_sqlite(path = tempfile(), create = TRUE)
stuff <- tbl(pg, sql("SELECT 5 AS a, 2 AS b"))

stuff %>%
    mutate(a = a * 2)

stuff %>%
    collect() %>%
    mutate(a = a * 2)

stuff %>%
    mutate(a = a * 2) %>%
    explain()

It seems that by tweaking the SQL, it should be possible to mimic the behaviour of mutate on R-side tbl_df objects.

@hadley hadley added bug an unexpected problem or unintended behavior database labels May 26, 2016
@hadley hadley added this to the future milestone May 26, 2016
@iangow
Copy link
Author

iangow commented Jul 4, 2016

It seems this was fixed in dplyr 0.5.0.

> library(dplyr)

Attaching package:dplyrThe following objects are masked frompackage:stats:

    filter, lag

The following objects are masked frompackage:base:

    intersect, setdiff, setequal, union

> pg <- src_sqlite(path = tempfile(), create = TRUE)
> stuff <- tbl(pg, sql("SELECT 5 AS a, 2 AS b"))
> 
> stuff %>%
+     mutate(a = a * 2)
Source:   query [?? x 2]
Database: sqlite 3.8.6 [/var/folders/c6/c3yp0f9148zbwmvvd4pl1xrr0000gn/T//Rtmp9przwS/file144ef5c94a9c]

      b     a
  <int> <dbl>
1     2    10
> 
> stuff %>%
+     collect() %>%
+     mutate(a = a * 2)
# A tibble: 1 x 2
      a     b
  <dbl> <int>
1    10     2
> 
> stuff %>%
+     mutate(a = a * 2) %>%
+     explain()
<SQL>
SELECT `b`, `a` * 2.0 AS `a`
FROM (SELECT 5 AS a, 2 AS b)


<PLAN>
   addr        opcode p1 p2 p3 p4 p5 comment
1     0          Init  0 14  0    00    <NA>
2     1 InitCoroutine  1  6  2    00    <NA>
3     2       Integer  5  2  0    00    <NA>
4     3       Integer  2  3  0    00    <NA>
5     4         Yield  1  0  0    00    <NA>
6     5  EndCoroutine  1  0  0    00    <NA>
7     6 InitCoroutine  1  0  2    00    <NA>
8     7         Yield  1 13  0    00    <NA>
9     8          Copy  3  4  0    00    <NA>
10    9          Copy  2  6  0    00    <NA>
11   10      Multiply  7  6  5    00    <NA>
12   11     ResultRow  4  2  0    00    <NA>
13   12          Goto  0  7  0    00    <NA>
14   13          Halt  0  0  0    00    <NA>
15   14          Real  0  7  0  2 00    <NA>
16   15          Goto  0  1  0    00    <NA>

@hadley hadley closed this as completed Jul 5, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants