## Loop Functions:

- **lapply** : Loop over a list and evaluate a function on each element
- **sapply** : Same as *lapply*, but tries to simplify the result
- **apply** : Apply a function over the margins of an array
- **tapply** : Apply a function over subsets of a vector
- **mapply** : Multivariate version of *lapply*

Useful auxiliary function: **split** (used in conjunction with *lapply*)

## lapply:

In [None]:
function (X, FUN, ...){
    FUN <- match.fun(FUN)
    if(!is.vector(X) || is.object(X))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
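A small sketch of what this definition implies, using made-up inputs (`v` and `df` are hypothetical): atomic vectors are accepted directly, classed objects such as data frames are coerced with as.list, and the result is always a list that keeps the input's names.

```r
# Hypothetical inputs for illustration only
v  <- c(a = 1, b = 4, c = 9)                   # named atomic vector
df <- data.frame(x = 1:3, y = c(2, 4, 6))      # classed object: coerced via as.list

res_v  <- lapply(v, sqrt)    # one list element per vector element
res_df <- lapply(df, mean)   # one list element per column

is.list(res_v)    # TRUE: lapply always returns a list
names(res_v)      # names are carried over from v
res_df$y          # mean of column y
```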

In [1]:
x <- list(a = 1:5, b = rnorm(10))

In [2]:
lapply(x, mean)

In [3]:
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20,1), d = rnorm(100,5))

In [4]:
lapply(x, mean)

In [5]:
x <- 1:4

In [6]:
lapply(x, runif)

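Each element of x is passed to runif as its first argument (n), so this evaluates runif(1), runif(2), runif(3) and runif(4) and returns a list of four vectors of increasing length. It is not the same as runif(x), which would return a single vector of 4 draws.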

In [9]:
x <- 1:4
lapply(x, runif, min = 0, max = 10)
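The extra arguments (min, max) are forwarded to runif through the `...` of lapply. A minimal sketch to check this, with an arbitrary seed chosen only for reproducibility:

```r
set.seed(1)  # arbitrary seed, an assumption made for reproducibility

# min and max are passed on to every runif call via `...`
draws <- lapply(1:4, runif, min = 0, max = 10)

lengths(draws)         # element i holds i draws: 1 2 3 4
range(unlist(draws))   # every draw lies between 0 and 10
```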

In [10]:
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

In [11]:
x

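$a

| | [,1] | [,2] |
|---|---|---|
| [1,] | 1 | 3 |
| [2,] | 2 | 4 |

$b

| | [,1] | [,2] |
|---|---|---|
| [1,] | 1 | 4 |
| [2,] | 2 | 5 |
| [3,] | 3 | 6 |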


## Using anonymous functions:

The following anonymous function extracts the first column of each matrix in the list:

In [12]:
lapply(x, function(elt) elt[,1])

The anonymous function has no name and is never stored in the workspace. It exists only for the duration of the lapply call.
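For comparison, a sketch of the same extraction written with a named helper (`first_col` is a hypothetical name, not something defined elsewhere in these notes). Both forms give identical results; the anonymous version just avoids cluttering the workspace.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Named helper (hypothetical name) that stays in the workspace
first_col <- function(m) m[, 1]
named <- lapply(x, first_col)

# Anonymous function that exists only inside this call
anon <- lapply(x, function(elt) elt[, 1])

identical(named, anon)   # TRUE
```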

## sapply

- Calls *lapply* and then tries to simplify the result
- If every element of the result has length 1, a vector is returned
- If every element is a vector of the same length (> 1), a matrix is returned
- If it can't figure out how to simplify, a list is returned
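The three cases above can be sketched with a small made-up list (the data and the filter `v[v > 3]` are illustrative assumptions):

```r
x <- list(a = 1:4, b = 5:10)   # hypothetical data

as_vector <- sapply(x, mean)                   # length-1 results: named vector
as_matrix <- sapply(x, range)                  # length-2 results: 2 x 2 matrix
as_list   <- sapply(x, function(v) v[v > 3])   # unequal lengths: stays a list

as_vector        # a = 2.5, b = 7.5
dim(as_matrix)   # 2 2, one column per list element
is.list(as_list) # TRUE
```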

In [13]:
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20,1), d = rnorm(100,5))

In [14]:
lapply(x, mean)

In [15]:
sapply(x, mean)

## apply

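- Evaluates a function over the margins of an array
- Most often used on the rows or columns of a matrix
- Also works on general arrays (e.g. averaging an array of matrices)
- Not really faster than writing a loop, but more concise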

In [17]:
str(apply)

function (X, MARGIN, FUN, ...)  


In [2]:
x <- matrix(rnorm(200), 20, 10)

In [3]:
apply(x, 2, mean)
# MARGIN = 2: mean of each column (a vector of length 10)

In [4]:
apply(x, 1, mean)
# MARGIN = 1: mean of each row (a vector of length 20)

Shortcut functions (optimized, and much faster than the equivalent *apply* calls on large matrices):

- rowSums = apply(x, 1, sum)
- rowMeans = apply(x, 1, mean)
- colSums = apply(x, 2, sum)
- colMeans = apply(x, 2, mean)
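A quick sketch checking that the shortcuts return the same values as the apply calls they replace (the seed is an arbitrary assumption; all.equal is used because floating-point sums may differ in the last digits):

```r
set.seed(42)  # arbitrary seed for reproducibility
x <- matrix(rnorm(200), 20, 10)

same_row_sums  <- all.equal(rowSums(x),  apply(x, 1, sum))
same_row_means <- all.equal(rowMeans(x), apply(x, 1, mean))
same_col_sums  <- all.equal(colSums(x),  apply(x, 2, sum))
same_col_means <- all.equal(colMeans(x), apply(x, 2, mean))

c(same_row_sums, same_row_means, same_col_sums, same_col_means)
```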

In [6]:
x <- matrix(rnorm(200), 20, 10)

In [8]:
# calculate the 25th and 75th percentiles of each row:

apply(x, 1, quantile, probs = c(0.25, 0.75))

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25% | -1.522126 | -0.04185278 | -1.015911 | -0.3908364 | -0.86665071 | -0.2559881 | -0.7912073 | -0.2597336 | -0.7319669 | -0.3921775 | -0.3022219 | -0.4862302 | -0.4843043 | -0.6325084 | -0.7428567 | -0.5166707 | -0.2488422 | -1.09300027 | -0.08303857 | -0.8632086 |
| 75% | 0.6390472 | 0.85053279 | 1.078312 | 0.5439737 | 0.02564567 | 0.9952905 | 0.1856007 | 0.6756005 | 0.209747 | 0.8375178 | 0.3504602 | 0.233172 | 0.8907894 | 0.7292796 | 0.4393425 | 0.5174418 | 0.3718593 | -0.02136504 | 1.40913167 | 0.9673943 |
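When the function returns more than one value per row, apply stacks the results as columns, which is why the output has 2 rows and 20 columns. A sketch of the shape (the seed is an arbitrary assumption):

```r
set.seed(123)  # arbitrary seed for reproducibility
x <- matrix(rnorm(200), 20, 10)

q <- apply(x, 1, quantile, probs = c(0.25, 0.75))

dim(q)        # 2 20: one column per row of x
rownames(q)   # "25%" "75%"

# Transpose if one output row per input row is preferred
q_by_row <- t(q)
dim(q_by_row) # 20 2
```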


In [9]:
a <- array(rnorm(2*2*10), c(2,2,10))

In [10]:
apply(a, c(1,2), mean)

| | [,1] | [,2] |
|---|---|---|
| [1,] | -0.05557865 | -0.05218854 |
| [2,] | -0.16845684 | 0.19109441 |


In [11]:
rowMeans(a, dims = 2)

| | [,1] | [,2] |
|---|---|---|
| [1,] | -0.05557865 | -0.05218854 |
| [2,] | -0.16845684 | 0.19109441 |


In [13]:
# Both calls above keep dimensions 1 and 2 and average over the 3rd, leaving a 2 x 2 matrix of means
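To make the collapse easier to verify by hand, here is a sketch with a deterministic array in place of random data (the array `a` below is an illustrative assumption):

```r
# 2 x 2 x 10 array filled with 1..40, so the means can be checked by hand
a <- array(1:40, c(2, 2, 10))

m_apply    <- apply(a, c(1, 2), mean)   # keep dims 1 and 2, average over dim 3
m_rowmeans <- rowMeans(a, dims = 2)     # same result via the shortcut

m_apply[1, 1]   # mean of 1, 5, 9, ..., 37, which is 19
all.equal(m_apply, m_rowmeans)
```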

## mapply

In [1]:
str(mapply)

function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)  


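Multivariate version of *lapply* and *sapply*.  
Applies the function "in parallel" over several argument vectors: the i-th call receives the i-th element of each argument. This refers to element-wise pairing, not concurrent execution.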

In [2]:
# tedious to type out by hand:
list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

In [4]:
# mapply builds the same list: rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
mapply(rep, 1:4, 4:1)
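A short sketch confirming how the arguments are paired up. all.equal is used rather than identical because 1:4 yields integers while rep(1, 4) yields doubles:

```r
by_mapply <- mapply(rep, 1:4, 4:1)
by_hand   <- list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

lengths(by_mapply)             # 4 3 2 1
all.equal(by_mapply, by_hand)  # same values, element by element

# Arguments can also be matched by name instead of position
by_name <- mapply(rep, x = 1:4, times = 4:1)
```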

Vectorizing a function:

In [6]:
noise <- function(n, mean, sd){
    rnorm(n, mean, sd)
}

In [7]:
noise(5,1,2)

In [8]:
# not what we want: a single vector of 5 draws (one per mean), not 1, 2, ..., 5 draws
noise(1:5, 1:5, 2)

In [9]:
# 1 draw with mean 1, 2 draws with mean 2, ..., 5 draws with mean 5 (sd = 2 throughout)
mapply(noise, 1:5, 1:5, 2)

This is essentially the same as: 

In [10]:
list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2))

So mapply lets you vectorize a function over arguments that the function itself does not accept as vectors.
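A sketch contrasting the direct call with the mapply call. It passes the constant sd through MoreArgs, which is one possible way to write it (the seed is an arbitrary assumption):

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(10)  # arbitrary seed for reproducibility

direct <- noise(1:5, 1:5, 2)   # one vector of 5 draws
paired <- mapply(noise, n = 1:5, mean = 1:5, MoreArgs = list(sd = 2))

length(direct)    # 5
lengths(paired)   # 1 2 3 4 5: the i-th element holds i draws around mean i
```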

## tapply

Apply a function over subsets of a vector

In [12]:
str(tapply)

function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)  


Example: Take Group means:

In [13]:
x <- c(rnorm(10), runif(10), rnorm(10,1)) # 3 groups of random data
f <- gl(3,10) # set group indices
f

In [14]:
tapply(x, f, mean) # mean of each of the 3 groups

Personally, though, I feel that this is a clumsy method.

In [15]:
tapply(x, f, mean, simplify = FALSE) # returns a list

In [17]:
tapply(x, f, range) # min and max of each group; a list, because each result has length 2
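With random data the group results are hard to verify by eye, so here is a sketch on a small deterministic vector (the values and the labels "low"/"high" are illustrative assumptions):

```r
x <- c(1, 2, 3, 10, 20, 30)                # hypothetical data
f <- gl(2, 3, labels = c("low", "high"))   # first 3 values "low", last 3 "high"

group_means  <- tapply(x, f, mean)    # scalar per group: simplified
group_ranges <- tapply(x, f, range)   # two values per group: kept as a list

group_means                # low = 2, high = 20
group_ranges[["high"]]     # 10 30
```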

## split

In [18]:
str(split)

function (x, f, drop = FALSE, ...)  


Takes a vector or other object and splits it into groups determined by a factor or a list of factors. It always returns a list, with one element per group.

In [20]:
x <- c(rnorm(10), runif(10), rnorm(10,1)) # 3 groups of random data
f <- gl(3,10) # set group indices
split(x,f) # splitting into 3 parts

Common idea: using lapply and split in conjunction

In [21]:
lapply(split(x,f), mean)

In [22]:
# tapply gives the same group means, simplified instead of returned as a list:
tapply(x, f, mean)
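A sketch checking that equivalence numerically. Attributes are stripped before comparing because tapply returns a 1-d array while sapply returns a named vector (the seed is an arbitrary assumption):

```r
set.seed(7)  # arbitrary seed for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

via_split  <- sapply(split(x, f), mean)   # split, apply, then simplify
via_tapply <- tapply(x, f, mean)          # the same in a single call

all.equal(unname(via_split), as.vector(via_tapply))
```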

In [25]:
library(datasets)
head(airquality)

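| Ozone | Solar.R | Wind | Temp | Month | Day |
|---|---|---|---|---|---|
| 41 | 190 | 7.4 | 67 | 5 | 1 |
| 36 | 118 | 8.0 | 72 | 5 | 2 |
| 12 | 149 | 12.6 | 74 | 5 | 3 |
| 18 | 313 | 11.5 | 62 | 5 | 4 |
| NA | NA | 14.3 | 56 | 5 | 5 |
| 28 | NA | 14.9 | 66 | 5 | 6 |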


In [27]:
s <- split(airquality, airquality$Month) # split according to the months

In [31]:
lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))

In [32]:
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")])) # simplifies the result to a matrix; NA wherever a month has missing values

| | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|
| Ozone | NA | NA | NA | NA | NA |
| Solar.R | NA | 190.16667 | 216.483871 | NA | 167.4333 |
| Wind | 11.62258 | 10.26667 | 8.941935 | 8.793548 | 10.18 |


In [33]:
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)) # na.rm = TRUE drops missing values before averaging

| | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|
| Ozone | 23.61538 | 29.44444 | 59.115385 | 59.961538 | 31.44828 |
| Solar.R | 181.2963 | 190.16667 | 216.483871 | 171.857143 | 167.43333 |
| Wind | 11.62258 | 10.26667 | 8.941935 | 8.793548 | 10.18 |
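The effect of na.rm is easier to see on a tiny data frame. A sketch with made-up values (`df` is hypothetical): a single NA makes the whole column mean NA unless na.rm = TRUE is passed.

```r
# Hypothetical data frame: column a has one missing value
df <- data.frame(a = c(1, NA, 3), b = c(2, 4, 6))

with_na    <- colMeans(df)                 # a is NA, b is 4
without_na <- colMeans(df, na.rm = TRUE)   # a is the mean of 1 and 3, i.e. 2

with_na
without_na
```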


Splitting on more than one factor:

In [34]:
x <- rnorm(10)
f1 <- gl(2,5)
f2 <- gl(5,2)
f1

In [35]:
f2

In [36]:
interaction(f1,f2)

In [38]:
str(split(x, list(f1,f2)))

List of 10
 $ 1.1: num [1:2] -0.0785 -0.7572
 $ 2.1: num(0) 
 $ 1.2: num [1:2] 1.17 -1.31
 $ 2.2: num(0) 
 $ 1.3: num 1.14
 $ 2.3: num -0.232
 $ 1.4: num(0) 
 $ 2.4: num [1:2] -1.454 -0.747
 $ 1.5: num(0) 
 $ 2.5: num [1:2] 0.877 -0.662


In [41]:
str(split(x, list(f1,f2), drop = TRUE)) # drop the empty levels

List of 6
 $ 1.1: num [1:2] -0.0785 -0.7572
 $ 1.2: num [1:2] 1.17 -1.31
 $ 1.3: num 1.14
 $ 2.3: num -0.232
 $ 2.4: num [1:2] -1.454 -0.747
 $ 2.5: num [1:2] 0.877 -0.662
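A sketch counting the groups in both cases, using 1:10 in place of random data so the contents are predictable (an illustrative assumption). split forms the interaction of the two factors itself, so calling interaction() is not required.

```r
x  <- 1:10        # stand-in for the random data
f1 <- gl(2, 5)
f2 <- gl(5, 2)

all_groups <- split(x, list(f1, f2))               # every level combination
non_empty  <- split(x, list(f1, f2), drop = TRUE)  # only observed combinations

length(all_groups)             # 10 combinations (2 x 5)
sum(lengths(all_groups) == 0)  # 4 of them are empty
names(non_empty)               # "1.1" "1.2" "1.3" "2.3" "2.4" "2.5"
```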
