From ef89a171eaa7cfe6980d9fd7ff1d7fbe6bd73f48 Mon Sep 17 00:00:00 2001
From: lindbrook
Date: Thu, 16 Jan 2014 08:32:11 -0800
Subject: [PATCH] biggest

---
 tidy-data.tex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tidy-data.tex b/tidy-data.tex
index 707ca5f..673a6a0 100644
--- a/tidy-data.tex
+++ b/tidy-data.tex
@@ -322,7 +322,7 @@ \subsection{Manipulation}

 In \proglang{R}, filtering and transforming are performed by the base \proglang{R} functions \code{subset()} and \code{transform()}. These are input and output-tidy. The \code{aggregate()} function performs group-wise aggregation. It is input-tidy. Provided that a single aggregation method is used, it is also output-tidy . The \pkg{plyr} package provides tidy \code{summarise()} and \code{arrange()} functions for aggregation and sorting.

-The four verbs can be, and often are, modified by the ``by'' preposition. We often need to perform group-wise aggregation, transformation and subsetting, to pick the biggest ?? in each group, to average over replicates and so on. Combining a verb with a ``by'' operator is a concise way to apply that operation to subsets of a data frame. Many \proglang{SAS} {\sc proc}s possess a {\sc by} statement which allows the operation to be performed by group. They are generally input-tidy. Base \proglang{R} possesses a \code{by()} function, which is input-tidy, but, because it produces a list, is not output-tidy. The \code{ddply()} function from the \pkg{plyr} package is a tidy alternative.
+The four verbs can be, and often are, modified by the ``by'' preposition. We often need to perform group-wise aggregation, transformation and subsetting, to find the largest value in each group, to average over replicates and so on. Combining a verb with a ``by'' operator is a concise way to apply that operation to subsets of a data frame. Many \proglang{SAS} {\sc proc}s possess a {\sc by} statement which allows the operation to be performed by group. They are generally input-tidy. Base \proglang{R} possesses a \code{by()} function, which is input-tidy, but, because it produces a list, is not output-tidy. The \code{ddply()} function from the \pkg{plyr} package is a tidy alternative.

 % Some aggregations occur so frequently they deserve their own optimised implementations. One such operation is (weighted) counting. Base R provides the {\tt table} function for this, but it is not output-tidy: it returns a multidimensional array. An tidy alternative is the {\tt count} function from {\tt plyr}, which returns a tidy dataset with a column for each of the input variables plus a new variable {\tt freq}, which records the number of records in each category.

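The changed paragraph contrasts grouped operations that are input-tidy only with ones that are also output-tidy. The R sketch below illustrates that contrast. It is not taken from the paper, and the data frame `df` and its columns `group` and `value` are hypothetical. The sketch finds the largest value in each group in three ways. Base R's `by()` returns an object of class "by" rather than a data frame. `aggregate()`, used with a single summary function, returns one row per group. The `plyr::ddply()` call runs only if plyr is installed.

```r
# Hypothetical toy data: measurements in two groups.
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 4, 2, 9, 5),
  stringsAsFactors = FALSE
)

# by() accepts a data frame (input-tidy), but its result has class "by":
# an array-like object indexed by group (a list when FUN returns
# non-scalar values), not a data frame.
largest_by <- by(df, df$group, function(d) max(d$value))
print(class(largest_by))

# aggregate() with a single summary function returns a data frame with
# one row per group and one column per variable (output-tidy).
largest_agg <- aggregate(value ~ group, data = df, FUN = max)
print(largest_agg)

# plyr::ddply() also returns a data frame. The call is guarded so the
# sketch still runs when plyr is not installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  largest_ddply <- plyr::ddply(df, "group", plyr::summarise,
                               largest = max(value))
  print(largest_ddply)
}
```

All three calls compute the same per-group maxima. Only the shape of the result differs, and that difference is what "output-tidy" refers to here.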