Fix scoping issues #3

Closed
hadley opened this Issue Mar 2, 2010 · 15 comments


@hadley
Owner
hadley commented Mar 2, 2010
library(plyr)

# Generate some data
set.seed(321)
myD <- data.frame( 
  Place = sample(c("AWQ","DFR", "WEQ"), 10, replace=T),
  Light = sample(LETTERS[1:2], 15, replace=T),
  value=rnorm(30) 
)

myfunc <- function(xdf, sctype) {
  force(sctype)
  ddply(xdf, .(Place, Light), transform, 
    rng = paste(value, sctype))
}

myfunc(myD, "range")
@hadley
Owner
hadley commented Sep 8, 2010

This involves capturing parent.frame() at the first plyr call, then passing it on down. Check for performance implications
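A minimal sketch of what that might look like (top_level, backend, and show_scope are hypothetical names, not plyr functions):

```r
# Hypothetical sketch: capture parent.frame() once at the user-facing
# entry point and pass it down, so dynamically scoped functions see
# the right frame.
top_level <- function(.data, .fun, ...) {
  caller <- parent.frame()          # captured once, at the first call
  backend(.data, .fun, caller, ...)
}

backend <- function(.data, .fun, caller, ...) {
  # Evaluating the call in `caller` means parent.frame() inside .fun
  # is the user's frame, not backend's. (Building list(...) forces the
  # arguments, so laziness is lost -- part of the performance question.)
  eval(as.call(c(list(.fun, .data), list(...))), caller)
}

# A toy dynamically scoped function: looks up `a` in its caller's frame
show_scope <- function(.data) get("a", envir = parent.frame())

a <- 1
local({
  a <- 2
  top_level(NULL, show_scope)       # finds a = 2, the local frame
})
```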

@hadley
Owner
hadley commented Dec 30, 2010

This is only necessary for functions that use non-standard/dynamic evaluation, like subset, summarise, replicate, etc.
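For illustration, here is the dynamic lookup such functions perform, independent of plyr (f and cutoff are made-up names):

```r
# subset() evaluates its condition first in the data frame, then in
# parent.frame() -- so `cutoff` must be visible from the calling frame
f <- function() {
  cutoff <- 2
  subset(data.frame(x = 1:4), x > cutoff)
}
f()
# the rows where x > 2, i.e. x = 3 and x = 4
```

When plyr interposes its own frames between the user and subset, that parent.frame() lookup lands in the wrong place.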

@hadley
Owner
hadley commented Dec 30, 2010
a <- 1
res <- local({
  a <- 2
  llply(1, replicate, a ^ 2)
})
expect_that(res[[1]], equals(2))
@wch
Collaborator
wch commented Feb 7, 2012

Update: after some more experimentation and what I figured out (in my next post), I don't think this idea is workable.

Let me preface this by saying that I only just started playing around with this stuff. This bug has been bothering me for a while and I finally decided to take a crack at it.

It looks like it works if you put a wrapper function around the problem function. For example:

wrapreplicate <- function(...) {
  # For debugging: print out this frame and all the parents
  print(sys.frame())
  for (i in 1:sys.nframe()) {
    print(parent.frame(i))  
  }

  do.call(replicate, list(...))
}

a <- 1
res <- local({
  a <- 2
  llply(1, wrapreplicate, a ^ 2)
})
res
# [[1]]
# [1] 4

(I think the answer is supposed to be 4, and not 2?)

I tried wrapping all such functions by putting the following near the top of llply, and changing all subsequent instances of .fun to myfun:

  myfun <- function(...) {
    do.call(.fun, list(...))
  }

It passed the test here using llply(1, replicate, a ^ 2) inside the local block. However, it broke some other things in the testing suite. For example:

> ddply(bsmall, "id", summarise, first = min(year))
Error in do.call(.fun, list(...)) : object 'year' not found

So it's not a complete solution, but maybe it's a start.

@wch
Collaborator
wch commented Feb 7, 2012

I did some investigation and found that it might be possible to get much better control over evaluation, but it might require extensive modifications.

This is the regular case:

dat <- data.frame(x=1:4)
summarise(dat, x = mean(x))
# 2.5

This is what happens when you try to call it inside a function with do.call and list(...):

mycall1 <- function(.fun, ...) {
  do.call(.fun, list(...))
}
mycall1(.fun=summarise, dat, x = mean(x))
# Error in mean(x) : object 'x' not found
# It tried to evaluate mean(x) when list(...) was called

However, you can use match.call to get a pairlist that doesn't get evaluated right away. You can even convert that pairlist to a list that doesn't get evaluated immediately.

mycall2 <- function(.fun, ...) {
  # Instead of list(...)
  # Use match.call instead. This turns the ... into a dotted pairlist
  # The dotted pairlist can contain language objects like mean(x) without evaluating them
  # (Stole it from section 6.5 of http://cran.r-project.org/doc/manuals/R-lang.html) 
  myargs <- match.call(expand.dots = FALSE)$...
  print(myargs)

  # Call the function, and coerce it to a normal list (it still doesn't evaluate mean(x))
  do.call(.fun, as.list(myargs))
  # If we used list(myargs), it would try to evaluate mean(x)
}
mycall2(.fun=summarise, dat, x = mean(x))
# 2.5

# The content of 'myargs' is:
# [[1]]
# dat
#
# $x
# mean(x)
#
# It contains the unevaluated language object mean(x)!

The above seems like it might be a solution, but it gets unhappy when it's called via a chain of functions that use .... This can be problematic for the way the plyr functions are presently designed, since they pass along ... many times.

mycall3 <- function(myfun, ...) {
  mycall2(.fun=summarise, ...)
}
mycall3(dat, x = mean(x))
# Error in eval(substitute(list(...)), .data, parent.frame()) : 
#  argument ".data" is missing, with no default
# The printed myargs (in mycall2) starts getting into ..1 and other
# crazy stuff like that. If you type ..1 and press enter, it will try to evaluate
# mean(x), and say "Error during wrapup: object 'x' not found"

It may be possible to pass along a usable set of arguments by converting to an unevaluated list (via the match.call method) in all the user-facing functions, then passing that list along the chain and evaluating the elements only when desired.
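A rough sketch of that idea (user_facing and worker are made-up names, and a real version would need to keep the expressions unevaluated even longer for NSE functions like summarise):

```r
# Capture ... unevaluated at the user-facing boundary, remember the
# caller's frame, and only force the arguments where they belong.
user_facing <- function(.data, .fun, ...) {
  dots <- match.call(expand.dots = FALSE)$...   # unevaluated expressions
  env  <- parent.frame()                        # where they should be evaluated
  worker(.data, .fun, dots, env)
}

worker <- function(.data, .fun, dots, env) {
  args <- lapply(dots, eval, envir = env)       # forced here, in the right scope
  do.call(.fun, c(list(.data), args))
}

a <- 10
user_facing(1:5, sum, a)    # `a` is found in the caller's frame: 15 + 10 = 25
```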

Hopefully this helps. My brain is burned out now...

@hadley
Owner
hadley commented Oct 9, 2012

The challenge is in (e.g.) llply, where we need to evaluate something like .fun(.data[[i]], ...) - .fun and .data need to be evaluated in the llply environment, and ... (and the function call as a whole) needs to be evaluated in the parent frame from which it was called.

I think that means we need a version of llply (and similarly l_ply and rlply) that works with quoted calls. Then all user facing functions would capture their quoted expressions (using substitute) and pass on to the backends.
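One hedged sketch of that shape (llply_quoted and my_llply are hypothetical names, and this toy handles only a single element rather than looping):

```r
# The user-facing function captures the whole call with substitute()
# and hands the backend an expression plus the frame it came from.
llply_quoted <- function(expr, env) {
  eval(expr, env)     # evaluated where the user wrote it
}

my_llply <- function(.data, .fun, ...) {
  # substitute() builds the call from the *unevaluated* arguments
  expr <- substitute(.fun(.data[[1]], ...))
  llply_quoted(expr, parent.frame())
}

my_llply(list(1:3), sum, 10)    # sum(1:3, 10) = 16
```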

@hadley
Owner
hadley commented Oct 9, 2012

Except that can't work either, because most of the plyr functions do something to the input data: e.g. pieces <- splitter_d(.data, .variables, drop = .drop) - so we need to work with the data in the plyr function environment.

But I guess we can always ram that data directly into the call (although I think this will have performance implications because we'll need to make a copy). Does that mean the splitter functions should really return a call that shows how to extract the data from the original dataset? (That would have the advantage of eliminating a copy of the data). It might also make debugging easier, and remove the need for .inform?
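As a rough illustration of a splitter that returns extraction calls rather than copies (split_calls is a made-up helper, not a plyr function):

```r
dat <- data.frame(g = c("a", "b", "a"), value = 1:3)

# Build one unevaluated call per group that extracts that group's rows
# from the named data frame -- nothing is copied until the call is evaluated
split_calls <- function(data_name, groups) {
  idx <- split(seq_along(groups), groups)
  lapply(idx, function(i) {
    bquote(.(as.name(data_name))[.(i), , drop = FALSE])
  })
}

calls <- split_calls("dat", dat$g)
eval(calls$a)    # the rows for group "a", extracted only when evaluated
```

Because each piece is a call against the original name, it could be spliced into the larger call and evaluated in whatever frame holds the data.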

@hadley
Owner
hadley commented Oct 9, 2012

It also seems useful to study the behaviour of lapply:

lapply(1:10, function(i) browser())
# Browse[1]> match.call()
# FUN(i = 1:10[[1L]])

lapply(1:10, function(...) browser(), a = 1, b = sum, c = d + e)
# Browse[1]> match.call()
# FUN(1:10[[1L]], a = 1, b = ..2, c = ..3)

lapply(1:10, function(i, a, b, c) browser(), a = 1, b = sum, c = d + e)
# Browse[1]> match.call()
# FUN(i = 1:10[[1L]], a = 1, b = ..2, c = ..3)

f <- function(...) {
   lapply(1:10, function(i, ...) browser(), ...)
}
f(a = 1, b = asdf)
# Browse[1]> match.call()
# FUN(i = 1:10[[1L]], a = 1, b = ..2)
@crowding
Contributor
crowding commented Oct 9, 2012

How about putting a wrapper around such functions so that they see an appropriate parent.frame()?

here <- function(f) eval(substitute(function(...) f(...), list(f = f)), parent.frame())

Then anywhere **ply(data, fn, ...) gets you a scope problem, **ply(data, here(fn), ...) will usually work.

myfunc <- function(xdf, sctype) {
  force(sctype)
  ddply(xdf, .(Place, Light), here(transform), 
    rng = paste(value, sctype))
}

myfunc(myD, "range")

It's not making the call in the best frame, only a child of it, but that works for almost all the examples I find. And it doesn't break argument evaluation for standard evaluation functions.

@crowding
Contributor

IMO the problem is not one of finding a new, subtle way to call a function -- the normal way is fine. The problem is that a few functions like transform are ill-behaved and assume that parent.frame() gives the scope of their arguments. Promises and everything else in R use lexical scope, so functions that use parent.frame() to effect dynamic scope clash with our intuitions.

Changing plyr's behavior to suit these misbehaving functions while breaking R's evaluation model for normal functions does not seem likely to succeed. I'd rather frame the problem as one of how to work around the small set of functions that abuse parent.frame().

This could be a wrapper like above; another more bizarre strategy might be to put a trace on parent.frame while a **ply is running and edit the results.

@hadley
Owner
hadley commented Oct 10, 2012

@crowding I somewhat agree - but lapply somehow manages to do this, so it would be nice to be consistent. I do like the here idea, though.

@hadley
Owner
hadley commented Oct 10, 2012

@crowding Hmmm, I'm not sure I understand how your here function works. It's a bit confusing because:

here <- function(f) {
  eval(substitute(function(...) f(...), list(f = f)), parent.frame())
}
here(transform)
# function(...) f(...)

But that mainly seems to be a printing problem:

here <- function(f) {
  substitute(function(...) f(...), list(f = f))
}
here(transform)
# function(...) function (`_data`, ...) 
# UseMethod("transform")(...)

However, I suspect you really want to quote the call to f:

here <- function(f) {
  substitute(function(...) f(...), list(f = substitute(f)))
}
here(transform)
# function(...) transform(...)
# Although then that has problems with anonymous functions
here(function(x) x + 1)
# function(...) function(x) x + 1(...)

# Adding parentheses fixes that
here <- function(f) {
  substitute(function(...) (f)(...), list(f = substitute(f)))
}
here(transform)
# function(...) (transform)(...)
here(function(x) x + 1)
# function(...) (function(x) x + 1)(...)

Also, in the course of mucking around, I discovered that this variant still works in this example, which really confuses me.

here <- function(f) {
  eval(substitute(function(...) f(...)), parent.frame())
}
@crowding
Contributor

I don't think I want to quote the call to f --- the pattern for all *ply functions is that they evaluate the function argument once, and normally, so here should also evaluate its function argument normally and work with the resulting function object rather than the quoted argument. Think of putting a colwise() or each() or other constructed function as the argument to here.

The way that it's printed is somewhat broken, yes. What's confusing there is that a function object is printed by R in the same way as a call to function.

a <- quote(function(x) y)
b <- function(x) y

a #function(x) y
b #function(x) y

class(a) #call
class(b) #function

a[[1]] # `function`
#b[[1]] #Error in b[[1]] : object of type 'closure' is not subsettable
body(a) #NULL
body(b) #y

I'm using substitute to put the whole function object in.

here <- function(f) {
  eval(substitute(function(...) f(...), list(f = f)), parent.frame())
}
x <- here(transform)
identical(body(x)[[1]], transform) #TRUE
identical(body(x)[[1]], as.name("transform")) #FALSE

Your last example works for this example because substitute picks up the quoted value of f, but that version of here is semantically different:

here <- function(f) {
  eval(substitute(function(...) f(...)), parent.frame())
}
x <- here(transform)
identical(body(x)[[1]], transform) #FALSE
identical(body(x)[[1]], as.name("transform")) #TRUE

You can elicit a difference in behavior between the two like this:

fn <- here( { print("when is the argument to 'here' evaluated?"); summarize } )
ddply(baseball, .(year), fn, sum(rbi)) 

Maybe a more understandable way of achieving the here effect would be:

here <- function(f) {
  eval(substitute(
         local({
           `__here_function__` <- f
           function(...) `__here_function__`(...)
         })),
         parent.frame())
}

The only difference is that f would now be called from a grandchild of the original environment instead of a child.

@crowding
Contributor

I don't see how lapply manages anything different, using transform with it leads to the same scoping issue:

# Generate some data
set.seed(321)
myD <- data.frame( 
  Place = sample(c("AWQ","DFR", "WEQ"), 10, replace=T),
  Light = sample(LETTERS[1:2], 15, replace=T),
  value=rnorm(30) 
)

myfunc <- function(xdf, sctype) {
  force(sctype)
  splits <- splitter_d(xdf, .(Place, Light))
  lapply(splits, transform,
    rng = paste(value, sctype))
}

myfunc(myD, "range")
@hadley
Owner
hadley commented Oct 10, 2012

Oh hmm, I don't see how I missed that. In that case, it's not a bug, and I'll include your here function in the next version. Thanks!

@hadley hadley closed this in 05cc488 Oct 11, 2012
@jucor referenced this issue in xfim/ggmcmc Jun 25, 2013
Merged

Fix: warnings in ggs_autocorrelation #21
