Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mutate doesn't allow variable assignment in expression #315

Closed
wch opened this issue Mar 12, 2014 · 12 comments
Closed

mutate doesn't allow variable assignment in expression #315

wch opened this issue Mar 12, 2014 · 12 comments
Assignees
Labels
bug an unexpected problem or unintended behavior
Milestone

Comments

@wch
Copy link
Member

wch commented Mar 12, 2014

Sometimes it's useful to use a multi-line expression with transform or mutate, and have some temporary variables in there. But variable assignment within such an expression doesn't work with dplyr::mutate:

# OK
transform(mtcars, cyl2 = { x <- cyl^2; -x } )

# OK
plyr::mutate(mtcars, cyl2 = { x <- cyl^2; -x } )

# Error
dplyr::mutate(mtcars, cyl2 = { x <- cyl^2; -x } )
# Error in c(36, 36, 16, 36, 64, 36, 64, 16, 16, 36, 36, 64, 64, 64, 64,  : 
#  invalid (do_set) left-hand side to assignment 
@romainfrancois
Copy link
Member

Very odd. it works the first time, but not on subsequent calls :

> mutate(mtcars, cyl2 = { x <- cyl^2; -x } )
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb cyl2
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  -36
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  -36
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  -16
...
> mutate(mtcars, cyl2 = { x <- cyl^2; -x } )
Erreur dans c(36, 36, 16, 36, 64, 36, 64, 16, 16, 36, 36, 64, 64, 64, 64,  :
  membre gauche de l'assignation (do_set) incorrect
> mutate(mtcars, cyl2 = { x <- cyl^2; -x } )
Erreur dans c(36, 36, 16, 36, 64, 36, 64, 16, 16, 36, 36, 64, 64, 64, 64,  :
  membre gauche de l'assignation (do_set) incorrect

@wch
Copy link
Member Author

wch commented Mar 14, 2014

Another weird thing: after you run it that first time, it leaves x in the global environment:

dplyr::mutate(mtcars, cyl2 = { x <- cyl^2; -x } )
x
# [1] 36 36 16 36 64 36 64 16 16 36 36 64 64 64 64 64 64 16 16 16 16 64 64 64 64
#[26] 16 16 16 64 36 64 16

transform and plyr::mutate don't do this.

@hadley
Copy link
Member

hadley commented Mar 14, 2014

It's not obvious to me where x should be created - I think dplyr::mutate generally does a better job of evaluating things in the right environment, and creating it in the global environment is not obviously wrong.

Maybe we should just prevent assignment altogether?

@wch for your use case, dplyr::mutate(mtcars, cyl2 = local({ x <- cyl^2; -x }) ) seems preferable.

@romainfrancois
Copy link
Member

Ah. That does help me understand what is going on. The first time around the expression is replaced by:

{
    x <- c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8,
    4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4)^2
    -x
}

i.e. cyl is substituted in place. The second time xexists in the global environment and so its value is substituted in the expression, which then becomes:

{
    c(36, 36, 16, 36, 64, 36, 64, 16, 16, 36, 36, 64, 64, 64,
    64, 64, 64, 16, 16, 16, 16, 64, 64, 64, 64, 16, 16, 16, 64,
    36, 64, 16) <- c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8,
    8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4)^2
    -c(36, 36, 16, 36, 64, 36, 64, 16, 16, 36, 36, 64, 64, 64,
    64, 64, 64, 16, 16, 16, 16, 64, 64, 64, 64, 16, 16, 16, 64,
    36, 64, 16)
}

I think I need to have a fresh look at void traverse_call in CallProxy and GroupedCallProxy. Perhaps I'm trying to do too much, and should instead let R do its work.

@wch
Copy link
Member Author

wch commented Mar 14, 2014

@hadley It makes sense to me to create it in a temporary environment (e.g., a child of the global env) that gets discarded at the end. This is what transform and plyr::mutate do, right? They evaluate the expressions in a new environment, which should be effectively the same as using local().

Is there a performance reason for not creating a new environment in dplyr::mutate?

@hadley
Copy link
Member

hadley commented Mar 14, 2014

I don't that's think that's how they work, and even if it was, it wasn't by design.

@wch
Copy link
Member Author

wch commented Mar 14, 2014

FWIW:

invisible(transform(mtcars, cyl2 = { print(devtools::parenvs()); -cyl }))
#   label                      name
# 1 <environment: 0x358d038>   ""  
# 2 <environment: R_GlobalEnv> ""  

invisible(plyr::mutate(mtcars, cyl2 = { print(devtools::parenvs()); -cyl }))
#   label                      name
# 1 <environment: 0x358d038>   ""  
# 2 <environment: R_GlobalEnv> ""  

invisible(dplyr::mutate(mtcars, cyl2 = { print(devtools::parenvs()); -cyl }))
#   label                      name
# 1 <environment: R_GlobalEnv> ""  

@hadley
Copy link
Member

hadley commented Mar 14, 2014

I think that's just a result of it being evaluated inside a function. I think we should just disallow assignment inside dplyr functions.

@hadley hadley added the bug label Mar 17, 2014
@hadley hadley added this to the v0.2 milestone Mar 17, 2014
@hadley
Copy link
Member

hadley commented Mar 17, 2014

@romainfrancois can you make <- throw an error? And make sure code inside local() is not interpreted by the hybrid evaluator?

@wch I think the semantics are not clear without an explicit local. It should probably be a fresh environment per group, but I think this is sufficiently unusual behaviour that dplyr doesn't need to support it out of the box.

@romainfrancois
Copy link
Member

About not doing anything special with local, then what happens is this:

 mutate(mtcars, cyl2 = local({ x <- cyl^2; -x }) )
Erreur dans eval(expr, envir, enclos) : objet 'cyl' introuvable

because it is not local within the environment of the data frame. This works:

> mutate(mtcars, cyl2 = local({ x <- cyl^2; -x }, mtcars) )
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb cyl2
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  -36
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  -36
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  -16
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  -36
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  -64

Should this be implicit ?

There may also be other complications.

> df <- data.frame(cyl = 2 )
> mutate(mtcars, cyl2 = with(df, cyl) )
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb cyl2
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4    6
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4    6
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1    4
...

Because cyl is substituted by cyl from the data.

romainfrancois added a commit that referenced this issue Apr 2, 2014
@romainfrancois
Copy link
Member

I have forbidden assignments with <-. I've let <<- alone as it is probably doing the right thing.

But not sure how to handle mix of non standard evaluation, e.g. the code with with above. To be stricly correct, we'd have to let R evaluate, but we lose performance if we do that. perhaps that's documentation ...

@hadley
Copy link
Member

hadley commented Apr 9, 2014

I think forbidding <- is a reasonable fix for now. At some point we'll need to think through the implications more broadly, but I don't think there's any urgency.

@hadley hadley closed this as completed Apr 9, 2014
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants