# Loop functions and debugging

## lapply

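`lapply` applies a function to each element of a list (or of any object that can be coerced to a list) and always returns a list, regardless of the class of the input.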
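The "always a list" rule also holds when the input is not a list. The sketch below uses a named numeric vector and a small data frame; the names `v_out` and `df_out` and the values are arbitrary and only for illustration:

```r
# A named numeric vector: each element is passed to sqrt() in turn
v_out <- lapply(c(a = 1, b = 4, c = 9), sqrt)
class(v_out)   # "list", even though the input was a vector
v_out$b        # 2; the names of the input are kept

# A data frame is a list of columns, so lapply() works column by column
df_out <- lapply(data.frame(x = 1:3, y = 4:6), sum)
class(df_out)  # "list"
df_out$y       # 15
```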

In [1]:
lapply

In [2]:
x <- list(a = 1:5, b = rnorm(10))
x

In [3]:
# Returns the mean of each element in the list
lapply(x, mean)

In [4]:
lapply(1:4, runif)

In [5]:
runif

In [6]:
# Extra named arguments are passed on to runif through lapply's ... argument

In [7]:
lapply(1:4, runif, min = 0, max = 10)

In [8]:
x <- list(a = matrix(1:4, 2), b = matrix(1:6, 3))
x

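`$a`

| | [,1] | [,2] |
|---|---|---|
| [1,] | 1 | 3 |
| [2,] | 2 | 4 |

`$b`

| | [,1] | [,2] |
|---|---|---|
| [1,] | 1 | 4 |
| [2,] | 2 | 5 |
| [3,] | 3 | 6 |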


**NOTE TO SELF:** `lapply` and friends make heavy use of anonymous functions, i.e. functions written inline without being given a name.

In [9]:
# Extract the first column of each matrix.
# The anonymous function works like a lambda in other languages
lapply(x, function(myMatrix) myMatrix[, 1])

## sapply

`sapply` will simplify the result of `lapply` if possible:
- If every element of the result has length 1, a vector is returned.
- If every element of the result is a vector of the same length (> 1), a matrix is returned.
- If it can't figure it out, a list is returned.
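The three simplification rules can be checked with toy functions. This is a sketch with arbitrary inputs; `v`, `m` and `l` are illustrative names:

```r
# Every result has length 1 -> simplified to a numeric vector
v <- sapply(1:3, function(i) i * 10)
v         # 10 20 30

# Every result has length 2 -> simplified to a 2 x 3 matrix (one column per input)
m <- sapply(1:3, function(i) c(i, i^2))
m

# Results have different lengths -> sapply() gives up and returns a list
l <- sapply(1:3, function(i) seq_len(i))
class(l)  # "list"
```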

In [10]:
lapply(1:4, function(x) x^2) # A list of four length-1 vectors

In [11]:
sapply(1:4, function(x) x^2) # Simplified to the vector 1 4 9 16

## apply

`apply` is used to evaluate a function over the margins of an array:
- Over the rows or columns of a matrix.
- Over general arrays, e.g. taking the average of an array of matrices.
- It is not really faster than writing a loop, but it requires less typing.
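The `MARGIN` argument picks the dimension that is kept: `1` for rows, `2` for columns. A sketch on a small matrix with known values (the name `m` is illustrative) makes this easy to verify by hand:

```r
# A small matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

apply(m, 1, sum)    # MARGIN = 1 (rows):    9 12
apply(m, 2, sum)    # MARGIN = 2 (columns): 3 7 11
apply(m, 2, range)  # FUN returns 2 values per column, so the result is a 2 x 3 matrix
```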

In [12]:
str(apply)

function (X, MARGIN, FUN, ...)  


In [13]:
x <- matrix(rnorm(200), 20, 10)

In [14]:
apply(x, 2, mean) # Taking mean of the matrix columns

In [15]:
apply(x, 1, sum) # Summing up the rows

### Shortcuts

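For sums and means of a matrix, there are shortcut functions:
- `rowSums(x)` is equivalent to `apply(x, 1, sum)`
- `rowMeans(x)` is equivalent to `apply(x, 1, mean)`
- `colSums(x)` is equivalent to `apply(x, 2, sum)`
- `colMeans(x)` is equivalent to `apply(x, 2, mean)`

They are faster than the `apply` versions and should be used for large matrices.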
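A quick way to convince yourself that the shortcuts compute the same thing as their `apply` counterparts is to compare them on random data. In this sketch the seed and the names `mat` and `y` are arbitrary:

```r
set.seed(1)                       # any seed; only makes the random data reproducible
mat <- matrix(rnorm(200), 20, 10)

# Each shortcut agrees with its apply() counterpart (up to floating-point tolerance)
all.equal(rowSums(mat),  apply(mat, 1, sum))    # TRUE
all.equal(rowMeans(mat), apply(mat, 1, mean))   # TRUE
all.equal(colSums(mat),  apply(mat, 2, sum))    # TRUE
all.equal(colMeans(mat), apply(mat, 2, mean))   # TRUE

# Like sum() and mean(), the shortcuts accept na.rm
y <- matrix(c(1, NA, 3, 4), nrow = 2)   # rows: (1, 3) and (NA, 4)
rowSums(y, na.rm = TRUE)                # 4 4
```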

In [16]:
x <- matrix(rnorm(200), 20, 10)
apply(x, 2, quantile, probs = c(0.25, 0.75)) # Extra arguments (here probs) are passed on to quantile

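| | [,1] | [,2] | [,3] | [,4] | [,5] | [,6] | [,7] | [,8] | [,9] | [,10] |
|---|---|---|---|---|---|---|---|---|---|---|
| 25% | -0.5189913 | -0.9131652 | -0.5449028 | -0.7495604 | -0.5278997 | -0.309002 | -1.2905993 | -0.1750097 | -0.745738 | -0.6470574 |
| 75% | 0.6973501 | 0.2725515 | 0.8300761 | 0.8745787 | 1.000794 | 1.1047977 | 0.939496 | 0.3653384 | 0.4036975 | 0.4443589 |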


In [17]:
# Average matrix in array
a <- array(rnorm(2 * 2 * 3), c(2, 2, 3))
a
apply(a, c(1, 2), mean)

| | [,1] | [,2] |
|---|---|---|
| [1,] | 0.07104738 | -1.31710442 |
| [2,] | -0.9417952 | 0.4307628 |


In [18]:
rowMeans(a, dims = 2) # Shortcut for apply(a, c(1, 2), mean)

| | [,1] | [,2] |
|---|---|---|
| [1,] | 0.07104738 | -1.31710442 |
| [2,] | -0.9417952 | 0.4307628 |
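With random data it is hard to check the "average matrix" by eye. Here is the same computation on an array with known integer entries (the names `a2` and `avg` are illustrative):

```r
# Three 2 x 2 matrices stacked along the third dimension:
# a2[, , 1] = 1 3   a2[, , 2] = 5 7   a2[, , 3] =  9 11
#             2 4               6 8               10 12
a2 <- array(1:12, c(2, 2, 3))

# Keep dimensions 1 and 2 and average over the third (the "stack")
avg <- apply(a2, c(1, 2), mean)
avg   # 5 7 / 6 8, i.e. the element-wise mean of the three matrices

# rowMeans() with dims = 2 treats the first two dimensions as "rows"
all.equal(avg, rowMeans(a2, dims = 2))  # TRUE
```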


## mapply

In [19]:
str(mapply)

function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)  


In [20]:
first <- 1:3
second <- 2:4

func <- function(x, y) {
    x / y
}
# Equivalent to c(func(1, 2), func(2, 3), func(3, 4))
mapply(func, first, second)
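`mapply` is a multivariate apply. It walks over several argument vectors in parallel, calling `FUN` with the first elements, then the second elements, and so on. This sketch uses base `rep()` and also shows `MoreArgs` for arguments that stay fixed across calls (the names `r1` and `r2` are illustrative):

```r
# rep(1, 3), rep(2, 2), rep(3, 1): the results differ in length, so a list is returned
r1 <- mapply(rep, 1:3, 3:1)
r1

# MoreArgs holds arguments that are the same for every call: rep(i, times = 2).
# Every result has length 2, so the output is simplified to a 2 x 3 matrix
r2 <- mapply(rep, 1:3, MoreArgs = list(times = 2))
r2
```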

## tapply

`tapply` applies a function over subsets of a vector, where the subsets are defined by a factor (or a list of factors). Don't know why it's called tapply, though.
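Before the random-data example below, here is a sketch with values small enough to check by hand. The names `values`, `groups` and `group_means` are illustrative:

```r
values <- c(1, 2, 3, 10, 20, 30)
groups <- gl(2, 3)                          # factor: 1 1 1 2 2 2
group_means <- tapply(values, groups, mean)
group_means                                 # group "1": 2, group "2": 20
```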

In [21]:
str(tapply)

function (X, INDEX, FUN = NULL, ..., simplify = TRUE)  


In [22]:
x <- c(rnorm(10), runif(10), rnorm(10, 1)) # Three groups of 10: N(0, 1), U(0, 1), N(1, 1)
f <- gl(3, 10) # Factor with levels 1, 2, 3, each repeated 10 times

In [23]:
f # Group label for each element of x

In [24]:
# Take mean of each group
means <- tapply(x, f, mean)
means

ERROR: Error in dn[[2L]]: subscript out of bounds


In [25]:
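for (m in means) print(m) # Printing the values works, so the error above most likely comes from the notebook's display of the 1-d array returned by tapply, not from tapply itself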

[1] -0.2990282
[1] 0.4455419
[1] 1.242498


In [26]:
mean(x[1:10]) # Same as means[1]

## split

`split` takes a vector or other object and splits it into groups determined by a factor or a list of factors.
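Here is a small sketch of `split` followed by `sapply`, which gives the same group sums as `tapply`. The names `values`, `groups` and `parts` are illustrative:

```r
values <- c(1, 2, 3, 10, 20, 30)
groups <- c("a", "b", "a", "b", "a", "b")   # a character vector is coerced to a factor

parts <- split(values, groups)
parts                        # $a: 1 3 20   $b: 2 10 30

# split() followed by sapply() matches tapply(values, groups, sum)
sapply(parts, sum)           # a: 24, b: 42
tapply(values, groups, sum)  # a: 24, b: 42
```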

In [27]:
str(split)

function (x, f, drop = FALSE, ...)  


In [28]:
split(x, f)

In [29]:
lapply(split(x, f), mean) # Same group means as tapply(x, f, mean), but returned as a list

### Splitting a data frame

In [30]:
library(datasets)

In [31]:
head(airquality)

| | Ozone | Solar.R | Wind | Temp | Month | Day |
|---|---|---|---|---|---|---|
| 1 | 41 | 190 | 7.4 | 67 | 5 | 1 |
| 2 | 36 | 118 | 8.0 | 72 | 5 | 2 |
| 3 | 12 | 149 | 12.6 | 74 | 5 | 3 |
| 4 | 18 | 313 | 11.5 | 62 | 5 | 4 |
| 5 | NA | NA | 14.3 | 56 | 5 | 5 |
| 6 | 28 | NA | 14.9 | 66 | 5 | 6 |


In [32]:
s <- split(airquality, airquality$Month) # One data frame per month (Month is used as the grouping factor)

In [33]:
lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")])) # Columns with missing values give NA

In [34]:
sout <- sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
sout

| | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|
| Ozone | 23.61538 | 29.44444 | 59.11538 | 59.96154 | 31.44828 |
| Solar.R | 181.2963 | 190.1667 | 216.4839 | 171.8571 | 167.4333 |
| Wind | 11.622581 | 10.266667 | 8.941935 | 8.793548 | 10.18 |


In [35]:
cor(sout["Ozone", ], sout["Wind", ]) # Correlation between monthly mean ozone and monthly mean wind speed

## Debugging

R has several ways of indicating that something is wrong:
- `message`: a generic notification; execution continues.
- `warning`: an indication that something is wrong but not fatal; execution continues.
- `error`: a fatal problem; execution stops.
- `condition`: a generic concept. Messages, warnings and errors are all conditions, and programmers can create their own conditions.
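One way to see that all four are conditions is to catch them with `tryCatch`, which dispatches on the condition's class. This is a sketch: `classify` and `my_cond` are hypothetical names, and `"myCondition"` is a made-up condition class:

```r
# Hypothetical helper: report which kind of condition evaluating `expr` signals
classify <- function(expr) {
    tryCatch({
        expr                 # the argument is evaluated lazily, here inside tryCatch
        "no condition"
    },
    error   = function(e) "error",
    warning = function(w) "warning",
    message = function(m) "message")
}

classify(message("just a note"))    # "message"
classify(warning("something odd"))  # "warning"
classify(stop("fatal problem"))     # "error"
classify(log(-1))                   # "warning" (NaNs produced)
classify(1 + 1)                     # "no condition"

# A home-made condition class; signalCondition() signals it to any handler
my_cond <- structure(
    list(message = "custom condition", call = NULL),
    class = c("myCondition", "condition")
)
tryCatch(signalCondition(my_cond),
         myCondition = function(cond) conditionMessage(cond))  # "custom condition"
```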

In [36]:
log(-1) # Warning: you still get a value back (NaN) and execution continues

In log(-1): NaNs produced

In [37]:
printmsg <- function(x) {
    if (x > 0) {
        print("x is greater than 0")
    } else {
        print("x is less than or equal to 0")
    }
    invisible(x) # Return x without auto-printing it
}

printmsg2 <- function(x) {
    if (is.na(x)) {
        print("x is a missing value")
    } else if (x > 0) {
        print("x is greater than 0")
    } else {
        print("x is less than or equal to 0")
    }
    invisible(x) # Return x without auto-printing it
}

In [38]:
printmsg(x = 3)
printmsg(NA) # Error: if() cannot handle NA, so execution stops

[1] "x is greater than 0"


ERROR: Error in if (x > 0) {: missing value where TRUE/FALSE needed


In [39]:
printmsg2(NA)

[1] "x is a missing value"


In [40]:
# This is just an example
x <- log(-1)
printmsg2(x) # is.na(NaN) is TRUE, so the function now gives an answer, but log(-1) was not a sensible input in the first place

In log(-1): NaNs produced

[1] "x is a missing value"
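Instead of letting a bad input slip through as `NaN`, a function can check its input and signal an error itself with `stop()`. This is a minimal sketch; `checked_log` is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: fail early with an informative error instead of
# letting log() return NaN with only a warning
checked_log <- function(x) {
    if (!is.numeric(x)) stop("x must be numeric")
    if (any(x < 0, na.rm = TRUE)) stop("x must be non-negative")
    log(x)
}

checked_log(1)  # 0

# Wrapped in tryCatch() so the example keeps running after the error
tryCatch(checked_log(-1), error = function(e) conditionMessage(e))
# "x must be non-negative"
```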


### Debugging tools in R

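- traceback: prints out the function call stack after an error occurs; does nothing if there is no error.
- debug: flags a function for "debug" mode, which allows you to step through execution of the function one line at a time.
- browser: suspends the execution of a function wherever it is called and puts the function in debug mode.
- trace: allows you to insert debugging code into a function at a specific place.
- recover: allows you to modify the error behavior so that you can browse the function call stack.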

### Using the tools

In [41]:
mean(r)
traceback() # Shows the call stack of the most recent error

ERROR: Error in mean(r): object 'r' not found


In [42]:
lm(y1 ~ x2)
traceback()

ERROR: Error in eval(expr, envir, enclos): object 'y1' not found


In [43]:
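lm(y1 ~ x2) # The error stops the cell here, so debug(lm) below never runs
debug(lm) # Must be called before lm() to step through lm on its next call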

ERROR: Error in eval(expr, envir, enclos): object 'y1' not found


In [44]:
double <- function(x) return(x1^2) # Deliberate bug: x1 is undefined (x was probably intended)

addtwo <- function(x, y) {
    z <- x + 2*double(y)
    result <- x. + y # Deliberate bug: x. is undefined
    return(result)
}

In [45]:
debug(addtwo)
addtwo(1, 2) # Steps through addtwo; the first error is raised inside double()

debugging in: addtwo(1, 2)
debug at <text>#3: {
    z <- x + 2 * double(y)
    result <- x. + y
    return(result)
}
debug at <text>#4: z <- x + 2 * double(y)


ERROR: Error in double(y): object 'x1' not found


In [46]:
options(error = recover) # On error, offer a menu of call frames to browse; options(error = NULL) restores the default
read.csv("nosuchfile") # Will show where the error happens

In file(file, "rt"): cannot open file 'nosuchfile': No such file or directory

ERROR: Error in file(file, "rt"): cannot open the connection
