# lapply, sapply, vapply, apply, tapply, mapply #

Writing `for` and `while` loops is not particularly convenient when working at the command line. R provides several functions that implement looping to make life easier:

- lapply: loop over a list and evaluate a function on each element
- sapply: same as lapply but try to simplify the result
- vapply: similar to sapply but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
- apply: apply a function over the margins of an array
- tapply: apply a function over subsets of a vector
- mapply: multivariate version of lapply.
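As a minimal illustration of what these functions replace, the sketch below squares each element of a small list twice: once with an explicit `for` loop and once with `lapply`. The list `nums` and the other variable names are made up for this example.

```r
# A for loop that squares each element of a list, collecting the results in a list
nums <- list(a = 1, b = 2, c = 3)
squares_loop <- vector("list", length(nums))
names(squares_loop) <- names(nums)
for (nm in names(nums)) {
  squares_loop[[nm]] <- nums[[nm]]^2
}

# The same computation in one line with lapply (names are kept automatically)
squares_lapply <- lapply(nums, function(v) v^2)

identical(squares_loop, squares_lapply)
# [1] TRUE
```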

The auxiliary function split is also useful, particularly in conjunction with lapply.

## lapply ##

In [13]:
str(lapply)

function (X, FUN, ...)  


lapply takes three arguments:
- a list X (if X is not a list, it is coerced with as.list)
- a function FUN
- other arguments via the "..." argument, which are passed on to FUN

lapply always returns a list (the 'l' stands for list).
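A short sketch of the argument rules above, using made-up data: extra named arguments after FUN travel through "..." to the function, and a non-list input such as an integer vector is coerced with as.list, yet the result is still a list.

```r
# "..." passes extra arguments on to FUN: here na.rm = TRUE goes to mean()
vals <- list(a = c(1, NA, 3), b = c(4, 5, NA))
means <- lapply(vals, mean, na.rm = TRUE)
str(means)
# List of 2
#  $ a: num 2
#  $ b: num 4.5

# An integer vector is coerced with as.list(); the result is still a list
sq <- lapply(1:3, function(i) i^2)
class(sq)
# [1] "list"
```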

In [3]:
x <- list(a = 1:5, b = rnorm(10))
lapply(x,mean)

lapply can make use of anonymous functions

In [1]:
# extract the first column of each matrix
x <- list(a = matrix(1:4, 2,2), b = matrix(1:6, 3,2))
lapply(x, function(elt) elt[,1]) # the anonymous function(elt) exists only inside this lapply call

## sapply ##

sapply will try to simplify the result of lapply if possible:
- if the result is a list where every element has length 1, a vector is returned
- if the result is a list where every element is a vector of the same length (> 1), a matrix is returned
- otherwise a list is returned
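The three simplification rules can be seen side by side with small made-up lists; the variable names `s1`, `s2` and `s3` are only for illustration.

```r
# Every result has length 1 -> a named vector
s1 <- sapply(list(a = 1:3, b = 4:6), sum)
s1
#  a  b
#  6 15

# Every result has the same length 2 -> a 2 x 2 matrix, one column per element
s2 <- sapply(list(a = 1:3, b = 4:6), range)
s2
#      a b
# [1,] 1 4
# [2,] 3 6

# Results of different lengths (3 and 2) -> sapply falls back to a list
s3 <- sapply(list(a = 1:3, b = 4:5), function(v) v * 2)
is.list(s3)
# [1] TRUE
```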

In [11]:
x <- list(a = 1:5, b = rnorm(10))
sapply(x, mean)

In [11]:
sapply(iris, class)

## vapply ##

Although sapply tries to return results in the most convenient form, this can cause problems in a script that expects a particular type. **vapply** lets you specify the type of the returned values. If a returned value does not match the specified type, vapply throws an error, stopping the script.

In [9]:
iris_min_max <- sapply(iris[,1:4], range)
iris_min_max
class(iris_min_max)
vapply(iris[,1:4], range, integer(2)) # wrong: the values returned are double, not integer

| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| [1,] | 4.3 | 2.0 | 1.0 | 0.1 |
| [2,] | 7.9 | 4.4 | 6.9 | 2.5 |


ERROR: Error in vapply(iris[, 1:4], range, integer(2)): values must be type 'integer',
but FUN(X[[1]]) result is type 'double'


In [10]:
vapply(iris[,1:4], range, numeric(2))

| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| [1,] | 4.3 | 2.0 | 1.0 | 0.1 |
| [2,] | 7.9 | 4.4 | 6.9 | 2.5 |
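Two further properties of vapply, shown in a small sketch (the helper that returns a named min/max pair is made up for this example): if the FUN.VALUE template is named, its names become the row names of the result, and on an empty input vapply still returns a vector of the declared type, whereas sapply returns an empty list.

```r
# A named FUN.VALUE template: its names become the row names of the result
rng <- vapply(iris[, 1:4], function(v) c(min = min(v), max = max(v)),
              c(min = 0, max = 0))
rng["max", "Petal.Length"]
# [1] 6.9

# Empty input: sapply gives an empty list, vapply keeps the declared type
sapply(list(), length)
# list()
vapply(list(), length, integer(1))
# integer(0)
```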


## apply ##

apply is used to evaluate a function (often an anonymous one) over the margins of an array:
- it is most often used to apply a function to the rows or columns of a matrix
- it can also be used with general arrays, e.g. to take the average of an array of matrices
- it is not really faster than writing a loop, but it fits in one line!

In [12]:
str(apply)

function (X, MARGIN, FUN, ...)  


apply takes the following arguments:
- an array (X)
- MARGIN is an integer vector indicating which margin should be "retained"
- FUN is a function to be applied
- "..." is for other arguments to be passed to FUN
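A minimal sketch of how MARGIN and "..." behave on a small deterministic matrix (the matrix `m` is made up for illustration): MARGIN = 1 produces one result per row, MARGIN = 2 one result per column, and extra arguments such as `collapse` are forwarded to FUN.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

apply(m, 1, sum)   # MARGIN = 1 keeps rows: one sum per row
# [1]  9 12
apply(m, 2, sum)   # MARGIN = 2 keeps columns: one sum per column
# [1]  3  7 11

# collapse = "-" is passed through "..." to paste()
apply(m, 1, paste, collapse = "-")
# [1] "1-3-5" "2-4-6"
```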

In [21]:
x <- matrix(rnorm(200), 20, 10) # 20 x 10 matrix of random normal values
apply(x,2,mean) # MARGIN = 2: the mean of each column
apply(x,1,sum) # MARGIN = 1: the sum of each row

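For sums and means over the rows or columns of a matrix, there are dedicated functions that are much faster:
- rowSums(x), equivalent to apply(x, 1, sum)
- rowMeans(x), equivalent to apply(x, 1, mean)
- colSums(x), equivalent to apply(x, 2, sum)
- colMeans(x), equivalent to apply(x, 2, mean)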
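The equivalences in the list can be checked directly. This sketch fixes the random seed (an addition, so the matrix is reproducible) and compares each dedicated function with its apply counterpart using all.equal, which tolerates tiny floating-point differences.

```r
set.seed(1)                          # make the random matrix reproducible
x <- matrix(rnorm(200), 20, 10)

# Each dedicated function matches the corresponding apply() call;
# every comparison below returns TRUE
all.equal(rowSums(x),  apply(x, 1, sum))
all.equal(rowMeans(x), apply(x, 1, mean))
all.equal(colSums(x),  apply(x, 2, sum))
all.equal(colMeans(x), apply(x, 2, mean))
```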

Additional arguments for the function used with apply are passed after the function itself. In this example we use **quantile**, which takes a vector `probs` with the probabilities to compute. Since we ask for two probabilities, **quantile** returns two values for each row, and **apply** assembles them into a matrix with one column per row of x.

In [24]:
x <- matrix(rnorm(200), 20, 10)
apply(x,1, quantile, probs = c(0.25,0.75)) # 25th and 75th percentiles of each row

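| | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] | [,8] | [,9] | [,10] | [,11] | [,12] | [,13] | [,14] | [,15] | [,16] | [,17] | [,18] | [,19] | [,20] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25% | -1.1281165 | -0.6551563 | -1.2461853 | -0.2946508 | -0.6462725 | -0.1821137 | -0.3555318 | -0.1891911 | -0.5374931 | -0.7787305 | -0.75984242 | -0.3472447 | 0.003965777 | -0.40480287 | -0.09510317 | -0.616001926 | -1.033804 | -0.5833867 | -0.8369061 | -0.7461815 |
| 75% | 0.2968702 | 0.6246136 | 0.9678086 | 0.5995795 | 0.5054866 | 0.788032 | 1.0509892 | 0.8040874 | 0.2328972 | 0.7482307 | 0.08840117 | 0.3565331 | 0.941734674 | 0.08119203 | 0.87157136 | -0.005301727 | 0.355121 | 0.5950902 | 1.7205051 | 0.4763815 |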
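To confirm the shape of this result, the sketch below (with an added `set.seed` for reproducibility) checks its dimensions and row names, and shows that passing `probs` through "..." gives the same matrix as wrapping quantile in an anonymous function.

```r
set.seed(1)
x <- matrix(rnorm(200), 20, 10)

q <- apply(x, 1, quantile, probs = c(0.25, 0.75))
dim(q)        # 2 rows (one per probability) x 20 columns (one per row of x)
# [1]  2 20
rownames(q)   # taken from the names quantile() gives its result
# [1] "25%" "75%"

# Passing probs through "..." is equivalent to an anonymous wrapper
q2 <- apply(x, 1, function(r) quantile(r, probs = c(0.25, 0.75)))
identical(q, q2)
# [1] TRUE
```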


In [33]:
x <- array(rnorm(2 * 2 *10), c(2,2,10)) # 2 x 2 x 10 array
apply(x,c(1,2),mean) # keep the first two dimensions, averaging over the third

| | [,1] | [,2] |
|---|---|---|
| [1,] | 0.2512754 | -0.5248534 |
| [2,] | 0.2748219 | 0.5271803 |


In [34]:
rowMeans(x, dims = 2) # equivalent to the previous call

| | [,1] | [,2] |
|---|---|---|
| [1,] | 0.2512754 | -0.5248534 |
| [2,] | 0.2748219 | 0.5271803 |
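With random numbers it is hard to verify the averaging by eye, so here is a small deterministic sketch (the array `a` is made up for illustration): two stacked 2 x 2 matrices are averaged element-wise, once with apply and once with rowMeans.

```r
a <- array(1:8, dim = c(2, 2, 2))   # two stacked 2 x 2 matrices
# a[, , 1] has rows (1, 3) and (2, 4); a[, , 2] has rows (5, 7) and (6, 8)

avg <- apply(a, c(1, 2), mean)      # element-wise average of the two matrices
avg
#      [,1] [,2]
# [1,]    3    5
# [2,]    4    6

all(avg == rowMeans(a, dims = 2))
# [1] TRUE
```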


## mapply ##

In [35]:
str(mapply)

function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)  


**mapply** is a loop function that applies a function element-wise, in parallel, over several sets of arguments (lapply, sapply and apply iterate over a single object only).

- FUN is a function to apply
- "..." contains the arguments to apply the function over
- MoreArgs is a list of other arguments to FUN
- SIMPLIFY indicates whether the result should be simplified
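A minimal sketch of MoreArgs and SIMPLIFY, with made-up inputs: MoreArgs holds an argument that is the same for every call, and SIMPLIFY = FALSE keeps a list even when the results could be combined into a matrix. Because the first "..." argument is an unnamed character vector, USE.NAMES = TRUE uses it as names, hence the unname() call.

```r
# Element-wise over two vectors; sep is fixed for every call via MoreArgs
labels <- mapply(function(x, y, sep) paste(x, y, sep = sep),
                 c("a", "b", "c"), c("x", "y", "z"),
                 MoreArgs = list(sep = "-"))
unname(labels)
# [1] "a-x" "b-y" "c-z"

# SIMPLIFY = FALSE keeps a list (the default would give a 2 x 3 matrix)
pairs <- mapply(function(x, y) c(x, y), 1:3, 4:6, SIMPLIFY = FALSE)
str(pairs)
```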

In [None]:
x <- list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))
y <- mapply(rep, 1:4, 4:1) # the same list as x (with integer rather than double elements)

In [42]:
# generate random normal noise
# number of observations, value of the mean, and standard deviation
noise <- function(n,mean,sd){
    rnorm(n, mean, sd)
}
noise(5,1,2) # 5 random values with mean 1 and standard deviation 2
noise(1:5, 1:5, 2) # not what we want: rnorm takes length(n) = 5 as the count, giving 5 values (one per mean) instead of vectors of lengths 1 to 5 with means 1 to 5
mapply(noise, 1:5, 1:5, 2) # a way to vectorize a function over arguments it does not vectorize over on its own
list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2)) # equivalent code
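Base R also offers Vectorize, which builds a wrapper around a function using mapply, so the call above can be packaged once and reused. This is a swapped-in alternative rather than something the notes above use; the seed is only there to make the draws reproducible.

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

# Vectorize() returns a wrapper that calls mapply over the arguments
vnoise <- Vectorize(noise)

set.seed(42)
out <- vnoise(1:5, 1:5, 2)   # one vector per (n, mean) pair, sd recycled
lengths(out)
# [1] 1 2 3 4 5
```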

## tapply ##

In [43]:
str(tapply)

function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)  


tapply is used to apply a function over subsets of a vector.
- X is a vector
- INDEX is a factor or a list of factors (or else they are coerced to factors)
- FUN is a function to be applied
- "..." contains other arguments to be passed to FUN
- simplify indicates whether the result should be simplified

The first argument is a (numeric) vector; the second is a factor of the same length that identifies which group each element of the vector belongs to.
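A deterministic sketch with made-up values and factors: with a single factor, tapply returns one result per level; with a list of two factors, it returns one cell per combination of levels, here a 3 x 2 matrix.

```r
values <- c(1, 2, 3, 4, 5, 6)
group  <- factor(c("a", "a", "b", "b", "c", "c"))   # which group each value is in
side   <- factor(c("x", "y", "x", "y", "x", "y"))   # a second, crossed factor

by_group <- tapply(values, group, sum)
by_group
#  a  b  c
#  3  7 11

# A list of factors gives one cell per combination of levels
by_both <- tapply(values, list(group, side), sum)
by_both
#   x y
# a 1 2
# b 3 4
# c 5 6
```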

In [46]:
x <- c(rnorm(10), runif(10), rnorm(10,1)) # a vector with three groups of 10 values
f <- gl(3,10) # a factor with levels 1, 2, 3, each repeated 10 times
tapply(x,f,mean) # the mean of each group

In [1]:
head(iris)
tapply(iris$Sepal.Length, iris$Species,mean) # mean Sepal.Length for each species

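| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |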


## split ##

In [47]:
str(split)

function (x, f, drop = FALSE, ...)  


split takes a vector or other objects and splits it into groups determined by a factor or list of factors
- x is a vector (or list) or data frame
- f is a factor (or coerced to one) or a list of factors
- drop indicates whether empty factor levels should be dropped

split is not a loop function itself, but it is often used together with lapply or sapply.

It returns a list, which is what makes it easy to combine with lapply or sapply.

In [49]:
x <- c(rnorm(10), runif(10), rnorm(10,1))
f <- gl(3,10)
split(x,f) # splits x into 3 groups defined by the factor f
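Because split returns a list, following it with sapply reproduces what tapply does in one call. A small sketch, with an added seed so the comparison is reproducible:

```r
set.seed(1)
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

via_split  <- sapply(split(x, f), mean)   # named numeric vector
via_tapply <- tapply(x, f, mean)          # 1-d array with the same names

identical(names(via_split), names(via_tapply))
# [1] TRUE
all(via_split == via_tapply)
# [1] TRUE
```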

In [51]:
library(datasets)
head(airquality)

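| Ozone | Solar.R | Wind | Temp | Month | Day |
|---|---|---|---|---|---|
| 41 | 190 | 7.4 | 67 | 5 | 1 |
| 36 | 118 | 8.0 | 72 | 5 | 2 |
| 12 | 149 | 12.6 | 74 | 5 | 3 |
| 18 | 313 | 11.5 | 62 | 5 | 4 |
| NA | NA | 14.3 | 56 | 5 | 5 |
| 28 | NA | 14.9 | 66 | 5 | 6 |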


In [53]:
s <- split(airquality, airquality$Month) # split the data frame by month
lapply(s, function(x) colMeans(x[,c("Ozone", "Solar.R", "Wind")]))

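The result contains NA values, because colMeans returns NA for any column with missing values in that month. Setting na.rm = TRUE removes them before averaging: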

In [55]:
lapply(s, function(x) colMeans(x[,c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
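Replacing lapply with sapply in the call above gives a more compact result: each month yields a length-3 vector, so sapply simplifies them into a matrix. The helper variable `cols` is introduced only for readability.

```r
s <- split(airquality, airquality$Month)
cols <- c("Ozone", "Solar.R", "Wind")

# One row per variable, one column per month (May = 5 through September = 9)
monthly <- sapply(s, function(x) colMeans(x[, cols], na.rm = TRUE))
dim(monthly)
# [1] 3 5
colnames(monthly)
# [1] "5" "6" "7" "8" "9"
```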

Splitting on more than one level

In [5]:
x <- rnorm(10)
f1 <- gl(2,5) # factor f1 with two levels repeated 5 times
f2 <- gl(5,2) # factor f2 with 5 levels repeated 2 times
f1
f2
interaction(f1,f2) # all combinations of the levels of f1 and f2

In [6]:
str(split(x,list(f1,f2))) # list calls interaction

List of 10
 $ 1.1: num [1:2] -0.022 0.771
 $ 2.1: num(0) 
 $ 1.2: num [1:2] -0.156 -1.632
 $ 2.2: num(0) 
 $ 1.3: num 0.601
 $ 2.3: num -0.0258
 $ 1.4: num(0) 
 $ 2.4: num [1:2] 0.696 0.729
 $ 1.5: num(0) 
 $ 2.5: num [1:2] 0.48 0.699
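Four of the ten level combinations above are empty. Setting drop = TRUE, the split argument described earlier, removes them. A short sketch with an added seed:

```r
set.seed(1)
x  <- rnorm(10)
f1 <- gl(2, 5)
f2 <- gl(5, 2)

# drop = TRUE removes the level combinations that contain no observations
kept <- split(x, list(f1, f2), drop = TRUE)
names(kept)
# [1] "1.1" "1.2" "1.3" "2.3" "2.4" "2.5"
```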
