\name{ddply}
\alias{ddply}
\title{Split data frame, apply function, and return results in a data frame.}
\usage{ddply(.data, .variables, .fun, ..., .progress = "none",
  .drop = TRUE, .parallel = FALSE)}
\description{
For each subset of a data frame, apply a function, then combine the
results into a data frame.
}
\details{
All plyr functions use the same split-apply-combine
strategy: they split the input into simpler pieces, apply
\code{.fun} to each piece, and then combine the pieces
into a single data structure. This function splits data
frames by variables and combines the result into a data
frame. If there are no results, then this function will
return a data frame with zero rows and columns
(\code{data.frame()}).

The most unambiguous behaviour is achieved when \code{.fun} returns a
data frame: in that case the pieces are combined with
\code{\link{rbind.fill}}. If \code{.fun} returns an atomic vector of
fixed length, the vectors are \code{rbind}ed together and converted to
a data frame. Any other return value results in an error.
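
The two accepted return types can be sketched as follows. This is a
minimal illustration rather than part of the package: the data frame
\code{toy} and its columns \code{g} and \code{x} are made up for the
example.
\preformatted{
library(plyr)

# Hypothetical toy data: two groups in column g
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 4))

# .fun returns a one-row data frame for each piece
by_df <- ddply(toy, .(g), function(piece) data.frame(total = sum(piece$x)))

# .fun returns an atomic vector of fixed length (here, length 1)
by_vec <- ddply(toy, .(g), function(piece) c(total = sum(piece$x)))
}
In both cases the result should have one row per value of \code{g}.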
}
\keyword{manip}
\value{a data frame}
\references{Hadley Wickham (2011). The Split-Apply-Combine Strategy for
Data Analysis. Journal of Statistical Software, 40(1), 1-29.
\url{http://www.jstatsoft.org/v40/i01/}.}
\arguments{
\item{.data}{data frame to be processed}
\item{.variables}{variables to split the data frame by, given as quoted
variables, a formula, or a character vector}
\item{.fun}{function to apply to each piece}
\item{.drop}{should combinations of variables that do not appear in the
data be preserved (\code{FALSE}) or dropped (\code{TRUE}, the default)?}
\item{...}{other arguments passed on to \code{.fun}}
\item{.progress}{name of the progress bar to use, see \code{\link{create_progress_bar}}}
\item{.parallel}{if \code{TRUE}, apply the function in parallel, using the
parallel backend provided by \pkg{foreach}}
}
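\examples{# Count the rows in each yearly piece
ddply(baseball, .(year), "nrow")
# Apply several functions to each piece by naming them
ddply(baseball, .(lg), c("nrow", "ncol"))
# Mean runs batted in per year, plotted over time
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
with(rbi, plot(year, mean_rbi, type="l"))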
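# A minimal sketch of the .drop argument, not one of the original examples.
# The data frame toy and its columns g and x are made up for illustration;
# level "c" of g is declared but never observed.
toy <- data.frame(
  g = factor(c("a", "a", "b"), levels = c("a", "b", "c")),
  x = c(1, 2, 4)
)
# Default (.drop = TRUE): only the observed groups get a row
dropped <- ddply(toy, .(g), summarise, n = length(x))
# .drop = FALSE: the unobserved level should be kept as well, as an empty piece
kept <- ddply(toy, .(g), summarise, n = length(x), .drop = FALSE)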
# Add the career year for each player, counting from their first season
base2 <- ddply(baseball, .(id), transform,
  career_year = year - min(year) + 1
)}