# Benchmark study: Computing group summaries

This notebook uses the packages *data.table*, *dplyr*, *memisc*, and *rbenchmark*. To run it on your own computer, you may need to install them from [CRAN](https://cran.r-project.org) with
`install.packages(c("data.table","dplyr","memisc","rbenchmark"))`. (The packages are already installed in the notebook container.)

In [1]:
library(data.table)

In [2]:
library(memisc)

Loading required package: lattice
Loading required package: MASS

Attaching package: ‘memisc’

The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

    as.array



In [3]:
library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:memisc’:

    collect, recode, rename, syms

The following object is masked from ‘package:MASS’:

    select

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



In [4]:
library(rbenchmark)

In [5]:
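# Load the prepared data. The objects used below (BDataF, BDataT, BDTbl and
# SDataF, SDataT, SDTbl) are not created in this notebook, so they are
# expected to come from these two files.
load("BData.RData")
load("SData.RData")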

In [6]:
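# Time six ways of computing the mean of X1 within each combination of a and b
# on the "B" data; suppressMessages() keeps messages emitted during the
# replications out of the output.
suppressMessages(grouped_summary_benchmark_1 <- benchmark(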
    aggregate =
        aggregate(X1~a+b,data=BDataF, FUN=mean),
    `with + tapply` =
        with(BDataF,tapply(X1,list(a,b),mean)),
    data.table =
        BDataT[,mean(X1),by=.(a,b)],
    `group_by + summarize` =
        BDTbl %>% group_by(a,b) %>% summarize(mean(X1)),
    `select + group_by + summarize` =
        BDTbl %>% select(X1,a,b) %>% group_by(a,b) %>%
                  summarize(mean(X1)),
    withGroups =
        with(Groups(BDataF,~a+b),mean(X1)),
  columns = c("test","user.self","relative"),
  replications = 100,
  order = NULL,
  relative = "user.self"
))
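
All candidates in the benchmark compute the same quantity, the mean of `X1` within each combination of `a` and `b`, but they return it in different shapes. The following is a minimal sketch in base R only, on a made-up toy data frame (the names `toy`, `res_aggregate`, `res_tapply`, and `lookup` are illustrative and are not part of the benchmark data). It shows how the long result of `aggregate` can be checked against the matrix returned by `tapply`.

```r
# Toy data: hypothetical values, not the BDataF/SDataF used in the benchmark
toy <- data.frame(
    a  = factor(c("x", "x", "y", "y", "x", "y")),
    b  = factor(c("u", "v", "u", "v", "u", "v")),
    X1 = c(1, 2, 3, 4, 5, 6)
)

# Long format: one row per combination of a and b
res_aggregate <- aggregate(X1 ~ a + b, data = toy, FUN = mean)

# Wide format: a matrix with the levels of a in rows and of b in columns
res_tapply <- with(toy, tapply(X1, list(a, b), mean))

# Look up each row of the aggregate() result in the tapply() matrix
# (character matrix indexing uses the dimnames) and compare the means
lookup <- res_tapply[cbind(as.character(res_aggregate$a),
                           as.character(res_aggregate$b))]
all.equal(res_aggregate$X1, unname(lookup))
```

A check of this kind is useful before timing alternatives, since a speed comparison is only meaningful if the candidates agree on the result.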

In [7]:
# Repeat the same six approaches on the "S" data.
suppressMessages(grouped_summary_benchmark_2 <- benchmark(
    aggregate =
        aggregate(X1~a+b,data=SDataF, FUN=mean),
    `with + tapply` =
        with(SDataF,tapply(X1,list(a,b),mean)),
    data.table =
        SDataT[,mean(X1),by=.(a,b)],
    `group_by + summarize` =
        SDTbl %>% group_by(a,b) %>% summarize(mean(X1)),
    `select + group_by + summarize` =
        SDTbl %>% select(X1,a,b) %>% group_by(a,b) %>%
                  summarize(mean(X1)),
    withGroups =
        with(Groups(SDataF,~a+b),mean(X1)),
  columns = c("test","user.self","relative"),
  replications = 100,
  order = NULL,
  relative = "user.self"
))

In [8]:
save(grouped_summary_benchmark_1,
     grouped_summary_benchmark_2,
     file="grouped-summary-benchmark.RData")