biggest #8

Merged
merged 1 commit into from Jan 17, 2014
2 changes: 1 addition & 1 deletion tidy-data.tex
@@ -322,7 +322,7 @@ \subsection{Manipulation}

In \proglang{R}, filtering and transforming are performed by the base \proglang{R} functions \code{subset()} and \code{transform()}. These are both input- and output-tidy. The \code{aggregate()} function performs group-wise aggregation. It is input-tidy and, provided that a single aggregation method is used, also output-tidy. The \pkg{plyr} package provides tidy \code{summarise()} and \code{arrange()} functions for aggregation and sorting.

The four verbs can be, and often are, modified by the ``by'' preposition. We often need to perform group-wise aggregation, transformation and subsetting, to pick the biggest ?? in each group, to average over replicates and so on. Combining a verb with a ``by'' operator is a concise way to apply that operation to subsets of a data frame. Many \proglang{SAS} {\sc proc}s possess a {\sc by} statement which allows the operation to be performed by group. They are generally input-tidy. Base \proglang{R} possesses a \code{by()} function, which is input-tidy, but, because it produces a list, is not output-tidy. The \code{ddply()} function from the \pkg{plyr} package is a tidy alternative.
The four verbs can be, and often are, modified by the ``by'' preposition. We often need to perform group-wise aggregation, transformation and subsetting, to find the largest value in each group, to average over replicates and so on. Combining a verb with a ``by'' operator is a concise way to apply that operation to subsets of a data frame. Many \proglang{SAS} {\sc proc}s possess a {\sc by} statement that allows the operation to be performed by group. These procedures are generally input-tidy. Base \proglang{R} possesses a \code{by()} function, which is input-tidy but, because it produces a list, is not output-tidy. The \code{ddply()} function from the \pkg{plyr} package is a tidy alternative.
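The difference between a list-valued ``by'' result and a tidy one can be sketched in a few lines. The following is a minimal, language-neutral illustration in Python, not the \proglang{R} implementation of \code{by()} or \code{ddply()}; the helper names split_by, by_list and by_tidy, and the toy records, are hypothetical. Both functions split the records by a grouping variable and apply the same function to each group, but by_list returns a mapping from group to result, while by_tidy combines the results into one record per group, so its output is again a table.

```python
from collections import defaultdict

def split_by(rows, key):
    """Split: partition a list of dict records by the value of `key`,
    keeping groups in order of first appearance."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

def by_list(rows, key, column, fn):
    """Apply fn to each group's `column` values and return a mapping of
    group -> result (hypothetical helper; a collection of results, not a table)."""
    return {value: fn([r[column] for r in members])
            for value, members in split_by(rows, key).items()}

def by_tidy(rows, key, column, fn):
    """Split-apply-combine (hypothetical helper): the same results combined
    into one record per group, so the output is again a list of records."""
    return [{key: value, column: result}
            for value, result in by_list(rows, key, column, fn).items()]

rows = [
    {"group": "a", "x": 3},
    {"group": "a", "x": 7},
    {"group": "b", "x": 5},
    {"group": "b", "x": 2},
]

# Largest value in each group, first as a mapping, then as tidy records.
print(by_list(rows, "group", "x", max))   # {'a': 7, 'b': 5}
print(by_tidy(rows, "group", "x", max))   # [{'group': 'a', 'x': 7}, {'group': 'b', 'x': 5}]
```

Passing a different function, such as a mean, to by_tidy gives group-wise averaging over replicates with the same tidy shape.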

% Some aggregations occur so frequently they deserve their own optimised implementations. One such operation is (weighted) counting. Base R provides the {\tt table} function for this, but it is not output-tidy: it returns a multidimensional array. A tidy alternative is the {\tt count} function from {\tt plyr}, which returns a tidy dataset with a column for each of the input variables plus a new variable {\tt freq}, which records the number of observations in each category.
