Expose more foreach settings #33

Closed
hadley opened this issue Mar 20, 2011 · 4 comments

Comments

@hadley
Owner

hadley commented Mar 20, 2011

e.g. (from Paul Hiemstra)

bla = function(x) {
   x*y
}
y = 10
ddply(dat, .(category), bla, .parallel = TRUE, .export = "y")
@PaulHiemstra

To fix this I would add a parameter foreachPars to the l*ply functions. Within llply, a small change would be needed in the call to foreach:

result = foreach(i = seq_len(n)) %dopar% do.ply(i)

changes into:

# Wrap the index vector in list() so it stays a single argument named i
foreachPars = c(list(i = seq_len(n)), foreachPars)
result = do.call('foreach', foreachPars) %dopar% do.ply(i)

I think this should work the way I intend it. The use would be:

ddply(dat, .(category), bla, .parallel = TRUE, foreachPars = list(.export = 'y'))

The advantage would be that all foreach parameters can be used in a call to the plyr functions without adding each of them explicitly to their signatures.
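
A minimal base-R sketch of how such an argument list could be assembled for do.call. Here fake_foreach is a hypothetical stand-in for foreach (it only returns the arguments it receives), so nothing below depends on foreach itself. It shows why the index vector has to be wrapped in list() before being combined with the extra parameters.

```r
# Hypothetical stand-in for foreach(): returns the arguments it was given.
fake_foreach <- function(...) list(...)

n <- 3
foreachPars <- list(.export = "y")

# Wrapped in list(), the index vector stays one argument named "i".
args_ok <- c(list(i = seq_len(n)), foreachPars)
received <- do.call("fake_foreach", args_ok)
names(received)

# Without list(), c() splices the vector into separate elements.
args_flat <- c(i = seq_len(n), foreachPars)
names(args_flat)
```

The first call receives two arguments (i and .export). The second list has one element per index value, which is not what foreach would expect.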

@PaulHiemstra

I have a fix for this issue which involves no changes to plyr. Once a cluster is active, one can use clusterExport to load variables into the workers. An example:

library(plyr)  # ddply() and .() come from plyr

# Functions
createCluster = function(noCores, logfile = "") {
  require(doSNOW)
  cl = makeCluster(noCores, type = "SOCK", outfile = logfile)
  registerDoSNOW(cl)
  return(cl)
}

bla = function(arg) {
  return(arg$x*y)
}

# Constants
y = 10
dat = data.frame(x = 1:10, category = LETTERS[1:10])

# Create a cluster
cl = createCluster(2)

# Fails
#   Error in do.ply(i) : task 1 failed - "object 'y' not found"
ddply(dat, .(category), bla, .parallel = TRUE)

# Works!
clusterExport(cl, list("y"))
ddply(dat, .(category), bla, .parallel = TRUE)

The same approach works for libraries (found this on Stack Overflow):

clusterEvalQ(cl, library(boot))

I was far too focused on solving this within foreach, when the solution was already there: load things directly into the workers :).
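
For comparison, a minimal sketch of the same idea using the parallel package that ships with R, instead of snow/doSNOW as above. The helper scale_by_y and the choice of stats as the package to attach are assumptions made only for illustration.

```r
library(parallel)  # ships with R; used here in place of snow/doSNOW

y <- 10
# Hypothetical helper that relies on the global variable y.
scale_by_y <- function(x) x * y

cl <- makeCluster(2)
# Copy y from the master's global environment into every worker.
clusterExport(cl, "y")
# Attach a package on every worker (stats is only a placeholder).
invisible(clusterEvalQ(cl, library(stats)))
res <- parLapply(cl, 1:3, scale_by_y)
stopCluster(cl)

unlist(res)
```

Without the clusterExport call, the workers would not find y, which mirrors the "object 'y' not found" failure shown earlier.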

@PaulHiemstra

The following example includes an extended version of 'createCluster' which supports passing on objects to export and libraries to load. It requires an adapted version of clusterExport because it needs to find the variable to be exported not in the .GlobalEnv, but in the environment of the function.

library(plyr)     # ddply(), l_ply() and .()
library(ggplot2)

# Functions
clusterExport = local({
  gets = function(n, v) { assign(n, v, envir = .GlobalEnv); NULL }
  function(cl, list, envir = .GlobalEnv) {
    ## do this with only one clusterCall--loop on slaves?
    for (name in list) {
      clusterCall(cl, gets, name, get(name, envir = envir))
    }
  }
})
 
# Functions
createCluster = function(noCores, logfile = "/dev/null", export = NULL, lib = NULL) {
  require(doSNOW)
  cl = makeCluster(noCores, type = "SOCK", outfile = logfile)
  if(!is.null(export)) clusterExport(cl, export)
  if(!is.null(lib)) {
    l_ply(lib, function(dum) { 
      clusterExport(cl, "dum", envir = environment())
      clusterEvalQ(cl, library(dum, character.only = TRUE))
    })
  }
  registerDoSNOW(cl)
  return(cl)
}

library(ggplot2)
library(doSNOW)
 
bla = function(arg) {
  dum = ggplot(aes(x = x, y = x), data = arg)
  summary(dum)
  xi = bla2(arg$x)
  return(arg$x*xi)
}
 
bla2 = function(arg) {
  return(arg + 1)
}
 
# Constants
y = 10
dat = data.frame(x = 1:10, category = LETTERS[1:10])
 
# Create a cluster
 
# Fails
# Error in do.ply(i) : task 1 failed - "could not find function "ggplot""
cl = createCluster(2)
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
 
# Fails, package is loaded, function 'bla2' is not
# Error in do.ply(i) : task 1 failed - "could not find function "bla2""
cl = createCluster(2, lib = list("ggplot2"))
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
 
# Works! Also export the function 'bla2' and object 'y'
cl = createCluster(2, export = list("bla2","y"), lib = list("ggplot2"))
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
 
# Sanity check
all.equal(res, ddply(dat, .(category), bla, .parallel = FALSE))
# TRUE!
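
The reason the adapted clusterExport needs an envir argument can be seen without any cluster. The base-R sketch below uses a hypothetical function lookup_demo and argument pkg_name to show that a function argument is visible in the function's own environment but not in .GlobalEnv.

```r
# Hypothetical helper: checks where its own argument can be found.
lookup_demo <- function(pkg_name) {
  # pkg_name lives in this call's environment, not in the global one.
  in_global <- exists("pkg_name", envir = .GlobalEnv, inherits = FALSE)
  from_here <- get("pkg_name", envir = environment())
  list(in_global = in_global, from_here = from_here)
}

out <- lookup_demo("ggplot2")
out$in_global
out$from_here
```

This is why the lookup inside createCluster passes envir = environment(). The clusterExport in the parallel package also accepts an envir argument, so a custom version may not be needed there.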

@hadley
Owner Author

hadley commented Oct 8, 2012

Duplicate of #84 (closing this one because there's more discussion there).

@hadley hadley closed this as completed Oct 8, 2012