# Fun with the apply() family of functions

The apply() family of functions is challenging to think about but extremely powerful. These functions let us use R's vectorized operations on subgroups of a data frame, often in a single line of code.

Some references may help you to think about these functions:

https://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/

https://www.datacamp.com/community/tutorials/r-tutorial-apply-family/

Let's assemble the simple student survey dataset for some extremely basic examples:

In [13]:
height = c(68, 75, 60)        # inches
weight = c(120, 160, 118)     # pounds
age = c(16, 17, 16)           # years
handed = c('L', 'R', 'R')     # dominant hand: R=right, L=left

# Here's our data frame:
data = data.frame(Height=height,
                  Weight=weight,
                  Age=age,
                  Hand=handed)

In [14]:
print(data)

  Height Weight Age Hand
1     68    120  16    L
2     75    160  17    R
3     60    118  16    R


### Use tapply() to operate on subgroups in the data frame

First argument: column to operate on (a single vector)

Second argument: column to group on

Third argument: function to apply

What does R do here? First, group all samples in the dataset by their *Hand* value, then compute the *mean* of each group's *Height*.

In [20]:
means = tapply(data$Height, data$Hand, mean)
print(means)

   L    R 
68.0 67.5 
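
To make the "group, then apply" idea concrete, the same result can be sketched in two explicit steps with split() and sapply(). This only illustrates the logic; it is not a claim about how tapply() is implemented internally. The data frame is rebuilt here (with just the two columns needed) so the snippet runs on its own.

```r
# Rebuild the relevant columns of the survey data frame
data = data.frame(Height = c(68, 75, 60),
                  Hand   = c('L', 'R', 'R'))

# Step 1: split the Height values into one vector per Hand value
groups = split(data$Height, data$Hand)

# Step 2: apply mean() to each group
manual_means = sapply(groups, mean)
print(manual_means)

# tapply() performs both steps in a single call
means = tapply(data$Height, data$Hand, mean)
```

Any function that takes a vector and returns a single value (max, median, length, ...) can be passed in place of mean.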


### Use by() to operate on subgroups in the data frame

First argument: column(s) to operate on (a data frame)

Second argument: column to group on

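Third argument: function to apply

Here the rows are grouped by their *Age* value, and colMeans computes the mean of *Height* and *Weight* within each group.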

In [21]:
by(data[,c("Height","Weight")], data$Age, colMeans)

data$Age: 16
Height Weight 
    64    119 
------------------------------------------------------------ 
data$Age: 17
Height Weight 
    75    160 
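
Because by() hands each group to the function as a small data frame, the function can combine several columns at once. The sketch below computes a mean weight-to-height ratio per age group; the ratio and the names `df` and `ratio_by_age` are made up purely for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the numeric columns of the survey data frame
data = data.frame(Height = c(68, 75, 60),
                  Weight = c(120, 160, 118),
                  Age    = c(16, 17, 16))

# Each group arrives in the function as a data frame (called df here)
ratio_by_age = by(data[, c("Height", "Weight")], data$Age,
                  function(df) mean(df$Weight / df$Height))
print(ratio_by_age)
```

This is something tapply() cannot do directly, since tapply() only sees one vector at a time.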

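### Use sapply()

sapply() applies a function to each element of a list (or vector) and simplifies the result to a vector when it can. Here each list element is one column of the data frame, so we get one mean per column.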

In [26]:
columns = list(data$Height, data$Weight, data$Age)
sapply(columns, mean)
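
A data frame is itself a list of columns, so it can be passed to sapply() directly, and the column names carry through to the result. The sketch below also shows lapply(), the sibling function that always returns a list instead of simplifying. The variable names `numeric_cols`, `col_means_list`, and `col_means` are chosen here for illustration.

```r
# Rebuild the survey data frame so this snippet is self-contained
data = data.frame(Height = c(68, 75, 60),
                  Weight = c(120, 160, 118),
                  Age    = c(16, 17, 16),
                  Hand   = c('L', 'R', 'R'))

# Keep only the numeric columns; mean() is not meaningful for Hand
numeric_cols = data[, c("Height", "Weight", "Age")]

# lapply() returns a list with one element per column
col_means_list = lapply(numeric_cols, mean)

# sapply() simplifies the same result to a named numeric vector
col_means = sapply(numeric_cols, mean)
print(col_means)
```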

Hopefully that makes the idea clear. Can you think of some examples on the iris or diamonds datasets?

In [30]:
# For example: mean diamond price by diamond cut
library(ggplot2)    # the diamonds dataset ships with ggplot2
by(diamonds[, "price"], diamonds$cut, colMeans)

diamonds$cut: Fair
[1] 4358.758
------------------------------------------------------------ 
diamonds$cut: Good
[1] 3928.864
------------------------------------------------------------ 
diamonds$cut: Very Good
[1] 3981.76
------------------------------------------------------------ 
diamonds$cut: Premium
[1] 4584.258
------------------------------------------------------------ 
diamonds$cut: Ideal
[1] 3457.542
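
As one possible answer for the iris dataset, which is built into R and needs no extra packages, the same grouping pattern gives the mean sepal length per species. This is a minimal sketch; the variable name `sepal_means` is chosen here for illustration.

```r
# Mean sepal length for each iris species
sepal_means = tapply(iris$Sepal.Length, iris$Species, mean)
print(sepal_means)

# The same grouping works for any numeric column, e.g. the largest petal width
petal_max = tapply(iris$Petal.Width, iris$Species, max)
print(petal_max)
```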