<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Apply Operations using Clusters</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="stylesheet" type="text/css" href="R.css">
</head><body>
<table width="100%" summary="page for clusterApply"><tr><td>clusterApply</td><td align="right">R Documentation</td></tr></table>
<h2>Apply Operations using Clusters</h2>
<h3>Description</h3>
<p>These functions provide several ways to parallelize computations using
a cluster.
</p>
<h3>Usage</h3>
<pre>
clusterCall(cl = NULL, fun, ...)
clusterApply(cl = NULL, x, fun, ...)
clusterApplyLB(cl = NULL, x, fun, ...)
clusterEvalQ(cl = NULL, expr)
clusterExport(cl = NULL, varlist, envir = .GlobalEnv)
clusterMap(cl = NULL, fun, ..., MoreArgs = NULL, RECYCLE = TRUE,
           SIMPLIFY = FALSE, USE.NAMES = TRUE,
           .scheduling = c("static", "dynamic"))
clusterSplit(cl = NULL, seq)
parLapply(cl = NULL, X, fun, ...)
parSapply(cl = NULL, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl = NULL, X, MARGIN, FUN, ...)
parRapply(cl = NULL, x, FUN, ...)
parCapply(cl = NULL, x, FUN, ...)
parLapplyLB(cl = NULL, X, fun, ...)
parSapplyLB(cl = NULL, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>cl</code></td>
<td>
<p>a cluster object, created by this package or by package
<a href="http://CRAN.R-project.org/package=snow"><span class="pkg">snow</span></a>. If <code>NULL</code>, use the registered default cluster.</p>
</td></tr>
<tr valign="top"><td><code>fun, FUN</code></td>
<td>
<p>function or character string naming a function.</p>
</td></tr>
<tr valign="top"><td><code>expr</code></td>
<td>
<p>expression to evaluate.</p>
</td></tr>
<tr valign="top"><td><code>seq</code></td>
<td>
<p>vector to split.</p>
</td></tr>
<tr valign="top"><td><code>varlist</code></td>
<td>
<p>character vector of names of objects to export.</p>
</td></tr>
<tr valign="top"><td><code>envir</code></td>
<td>
<p>environment from which to export variables.</p>
</td></tr>
<tr valign="top"><td><code>x</code></td>
<td>
<p>a vector for <code>clusterApply</code> and <code>clusterApplyLB</code>, a
matrix for <code>parRapply</code> and <code>parCapply</code>.</p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p>additional arguments to pass to <code>fun</code> or <code>FUN</code>:
beware of partial matching to earlier arguments.</p>
</td></tr>
<tr valign="top"><td><code>MoreArgs</code></td>
<td>
<p>additional arguments for <code>fun</code>.</p>
</td></tr>
<tr valign="top"><td><code>RECYCLE</code></td>
<td>
<p>logical; if true, shorter arguments are recycled.</p>
</td></tr>
<tr valign="top"><td><code>X</code></td>
<td>
<p>A vector (atomic or list) for <code>parLapply</code> and
<code>parSapply</code>, an array for <code>parApply</code>.</p>
</td></tr>
<tr valign="top"><td><code>MARGIN</code></td>
<td>
<p>vector specifying the dimensions to use.</p>
</td></tr>
<tr valign="top"><td><code>simplify, USE.NAMES</code></td>
<td>
<p>logical; see <code>sapply</code>.</p>
</td></tr>
<tr valign="top"><td><code>SIMPLIFY</code></td>
<td>
<p>logical; see <code>mapply</code>.</p>
</td></tr>
<tr valign="top"><td><code>.scheduling</code></td>
<td>
<p>should tasks be statically allocated to nodes or
dynamic load-balancing used?</p>
</td></tr>
</table>
<h3>Details</h3>
<p><code>clusterCall</code> calls a function <code>fun</code> with identical
arguments <code>...</code> on each node.
</p>
<p><code>clusterEvalQ</code> evaluates a literal expression on each cluster
node. It is a parallel version of <code>evalq</code>, and is a convenience
function invoking <code>clusterCall</code>.
</p>
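<p>A minimal sketch of <code>clusterCall</code> and <code>clusterEvalQ</code>,
assuming a local cluster of two worker processes. The cluster size and the
variable name <code>offset</code> are illustrative choices, not part of the API.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

## clusterCall: the same function and arguments run once on every node
res_call <- clusterCall(cl, function(a, b) a + b, 1, 2)

## clusterEvalQ: the expression is passed unevaluated and run on each node
res_eval <- clusterEvalQ(cl, {
    offset <- 10       # hypothetical variable, created on each worker
    offset * 2
})

stopCluster(cl)
</pre>
<p>Both results are lists with one element per node, so here each has length two.
</p>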
<p><code>clusterApply</code> calls <code>fun</code> on the first node with
arguments <code>x[[1]]</code> and <code>...</code>, on the second node with
<code>x[[2]]</code> and <code>...</code>, and so on, recycling nodes as needed.
</p>
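<p>The node recycling described above can be observed by asking each job for the
process id of the worker that ran it. This is a sketch that assumes two local
workers and the default static assignment; the names <code>pids</code>,
<code>used</code> and <code>squares</code> are illustrative.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

## One process id per node, in node order
pids <- unlist(clusterCall(cl, Sys.getpid))

## Five jobs on two nodes: jobs 1, 3, 5 go to node 1 and jobs 2, 4 to node 2
used <- unlist(clusterApply(cl, 1:5, function(i) Sys.getpid()))

## Extra arguments after 'fun' are passed on to every call
squares <- clusterApply(cl, 1:5, function(i, power) i^power, 2)

stopCluster(cl)
</pre>
<p>The result list always has one element per input element, in input order.
</p>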
<p><code>clusterApplyLB</code> is a load balancing version of
<code>clusterApply</code>. If the length <code>p</code> of <code>x</code> is not
greater than the number of nodes <code>n</code>, then a job is sent to
<code>p</code> nodes. Otherwise the first <code>n</code> jobs are placed in order
on the <code>n</code> nodes. When the first job completes, the next job is
placed on the node that has become free; this continues until all jobs
are complete. Using <code>clusterApplyLB</code> can result in better
cluster utilization than using <code>clusterApply</code>, but increased
communication can reduce performance. Furthermore, the node that
executes a particular job is non-deterministic.
</p>
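<p>As a rough illustration of load balancing, the sketch below uses a
hypothetical workload in which the first job is much slower than the rest.
The delays and the helper <code>slow_job</code> are made up for the example,
and two local workers are assumed.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

## Hypothetical workload: one slow job followed by three quick ones
delays <- c(0.5, 0.1, 0.1, 0.1)
slow_job <- function(d) { Sys.sleep(d); d }

## While one node is busy with the slow job, the other can take the rest
res <- clusterApplyLB(cl, delays, slow_job)

stopCluster(cl)
</pre>
<p>Which node runs which job may vary between runs, but the results are still
returned in the order of the input.
</p>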
<p><code>clusterMap</code> is a multi-argument version of <code>clusterApply</code>,
analogous to <code>mapply</code> and <code>Map</code>. If
<code>RECYCLE</code> is true, shorter arguments are recycled (and either none
or all must be of length zero); otherwise, the result length is the
length of the shortest argument. Nodes are recycled if the length of
the result is greater than the number of nodes. (<code>mapply</code> always
uses <code>RECYCLE = TRUE</code>, and has argument <code>SIMPLIFY = TRUE</code>.
<code>Map</code> always uses <code>RECYCLE = TRUE</code>.)
</p>
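<p>The effect of <code>RECYCLE</code>, <code>MoreArgs</code> and
<code>SIMPLIFY</code> can be sketched as follows, assuming two local workers.
The input values and result names are illustrative only.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

## RECYCLE = TRUE (the default): the length-1 argument is recycled to length 3
rec <- clusterMap(cl, function(x, y) x + y, 1:3, 10)

## RECYCLE = FALSE: the result is as long as the shortest argument
short <- clusterMap(cl, function(x, y) x + y, 1:3, c(10, 20),
                    RECYCLE = FALSE)

## MoreArgs holds arguments that are the same for every call;
## SIMPLIFY = TRUE collapses the list result as mapply would
scaled <- clusterMap(cl, function(x, y, k) k * (x + y), 1:3, 4:6,
                     MoreArgs = list(k = 2), SIMPLIFY = TRUE)

stopCluster(cl)
</pre>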
<p><code>clusterExport</code> assigns the values on the master <font face="Courier New,Courier" color="#666666"><b>R</b></font> process of
the variables named in <code>varlist</code> to variables of the same names
in the global environment (aka ‘workspace’) of each node. The
environment on the master from which variables are exported defaults
to the global environment.
</p>
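<p>When the variable to export lives inside a function rather than in the global
environment, <code>envir</code> has to point at that function's environment.
A minimal sketch, assuming two local workers; <code>export_local</code> and
<code>threshold</code> are hypothetical names.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

export_local <- function(cl) {
    threshold <- 5     # local to this function on the master
    ## Look the name up in this function's environment, not in .GlobalEnv
    clusterExport(cl, "threshold", envir = environment())
}
export_local(cl)

## The value now exists in the global environment of every node
res <- clusterEvalQ(cl, threshold)

stopCluster(cl)
</pre>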
<p><code>clusterSplit</code> splits <code>seq</code> into a consecutive piece for
each cluster node and returns the result as a list with length equal to the
number of nodes. Currently the pieces are chosen to be close
to equal in length: the computation is done on the master.
</p>
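<p>For instance, splitting seven values across an assumed two-node cluster gives
two consecutive pieces that together reproduce the input. The name
<code>pieces</code> is illustrative.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

## One consecutive piece per node, computed on the master
pieces <- clusterSplit(cl, 1:7)
lengths(pieces)        # piece sizes, chosen to be close to equal

stopCluster(cl)
</pre>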
<p><code>parLapply</code>, <code>parSapply</code>, and <code>parApply</code> are parallel
versions of <code>lapply</code>, <code>sapply</code> and <code>apply</code>.
<code>parLapplyLB</code>, <code>parSapplyLB</code> are load-balancing versions,
intended for use when applying <code>FUN</code> to different elements of
<code>X</code> takes quite variable amounts of time, and either the function
is deterministic or reproducible results are not required.
</p>
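<p>The sketch below puts the parallel functions next to their serial
counterparts, assuming two local workers. The sleep-based workload for
<code>parSapplyLB</code> is hypothetical and only stands in for tasks whose run
times vary.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

squares_list <- parLapply(cl, 1:6, function(i) i^2)           # cf. lapply
squares_vec  <- parSapply(cl, 1:6, function(i) i^2)           # cf. sapply
col_sums     <- parApply(cl, matrix(1:6, nrow = 2), 2, sum)   # cf. apply

## Load-balanced variant for tasks with uneven (hypothetical) run times
uneven <- parSapplyLB(cl, c(0.3, 0.05, 0.05, 0.05),
                      function(d) { Sys.sleep(d); d * 2 })

stopCluster(cl)
</pre>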
<p><code>parRapply</code> and <code>parCapply</code> are parallel row and column
<code>apply</code> functions for a matrix <code>x</code>; they may be slightly
more efficient than <code>parApply</code> but do less post-processing of the
result.
</p>
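<p>A small sketch of the row and column variants on a 2 x 3 matrix, assuming two
local workers. The matrix <code>m</code> and the result names are illustrative.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

m <- matrix(1:6, nrow = 2)            # rows are (1, 3, 5) and (2, 4, 6)

row_sums <- parRapply(cl, m, sum)     # one value per row
col_max  <- parCapply(cl, m, max)     # one value per column

## FUN returns two values per row, so the per-row results are concatenated
## into a plain vector rather than arranged as a matrix
row_rng <- parRapply(cl, m, range)

stopCluster(cl)
</pre>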
<h3>Value</h3>
<p>For <code>clusterCall</code>, <code>clusterEvalQ</code> and <code>clusterSplit</code>, a
list with one element per node.
</p>
<p>For <code>clusterApply</code> and <code>clusterApplyLB</code>, a list the same
length as <code>x</code>.
</p>
<p><code>clusterMap</code> follows <code>mapply</code>.
</p>
<p><code>clusterExport</code> returns nothing.
</p>
<p><code>parLapply</code> returns a list the length of <code>X</code>.
</p>
<p><code>parSapply</code> and <code>parApply</code> follow <code>sapply</code> and
<code>apply</code> respectively.
</p>
<p><code>parRapply</code> and <code>parCapply</code> always return a vector. If
<code>FUN</code> always returns a scalar result this will be of length the
number of rows or columns: otherwise it will be the concatenation of
the returned values.
</p>
<p>An error is signalled on the master if any of the workers produces an
error.
</p>
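<p>One way to handle such an error on the master is to wrap the call in
<code>tryCatch</code>. This is a sketch under the assumption of two local
workers; the failing function <code>risky</code> is hypothetical.
</p>
<pre>
library(parallel)
cl <- makeCluster(2)   # assumption: two local worker processes

## Hypothetical task that fails for one particular input
risky <- function(i) if (i == 2) stop("bad input: ", i) else i

## The worker's error is re-signalled on the master and caught here
msg <- tryCatch(
    { clusterApply(cl, 1:3, risky); "no error" },
    error = function(e) conditionMessage(e)
)

## In this sketch the cluster is still usable after the error is caught
ok <- clusterCall(cl, function() "alive")

stopCluster(cl)
</pre>
<p>The caught message includes the text of the worker's original error.
</p>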
<h3>Note</h3>
<p>These functions are almost identical to those in package <a href="http://CRAN.R-project.org/package=snow"><span class="pkg">snow</span></a>.
</p>
<p>Two exceptions: <code>parLapply</code> has argument <code>X</code>
not <code>x</code> for consistency with <code>lapply</code>, and
<code>parSapply</code> has been updated to match <code>sapply</code>.
</p>
<h3>Author(s)</h3>
<p>Luke Tierney and R Core.
</p>
<p>Derived from the <a href="http://CRAN.R-project.org/package=snow"><span class="pkg">snow</span></a> package.
</p>
<h3>Examples</h3>
<pre>
## Use option cl.cores to choose an appropriate cluster size.
cl <- makeCluster(getOption("cl.cores", 2))
clusterApply(cl, 1:2, get("+"), 3)
xx <- 1
clusterExport(cl, "xx")
clusterCall(cl, function(y) xx + y, 2)
## Use clusterMap like an mapply example
clusterMap(cl, function(x, y) seq_len(x) + y,
           c(a = 1, b = 2, c = 3), c(A = 10, B = 0, C = -10))
parSapply(cl, 1:20, get("+"), 3)
## A bootstrapping example, which can be done in many ways:
clusterEvalQ(cl, {
  ## set up each worker. Could also use clusterExport()
  library(boot)
  cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
  cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
  NULL
})
res <- clusterEvalQ(cl, boot(cd4, corr, R = 100,
                    sim = "parametric", ran.gen = cd4.rg, mle = cd4.mle))
library(boot)
cd4.boot <- do.call(c, res)
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
        conf = 0.9, h = atanh, hinv = tanh)
stopCluster(cl)
## or
library(boot)
run1 <- function(...) {
   library(boot)
   cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
   cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
   boot(cd4, corr, R = 500, sim = "parametric",
        ran.gen = cd4.rg, mle = cd4.mle)
}
cl <- makeCluster(mc <- getOption("cl.cores", 2))
## to make this reproducible
clusterSetRNGStream(cl, 123)
cd4.boot <- do.call(c, parLapply(cl, seq_len(mc), run1))
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
        conf = 0.9, h = atanh, hinv = tanh)
stopCluster(cl)
</pre>
</body></html>