In [1]:
# lapply ("list apply") applies a function to each element of a list
# and always returns a list.

x = list(a=1:5, b=rnorm(10))
lapply(x, mean)

In [2]:
help(rnorm)

Normal {stats}    R Documentation

Arguments:
x, q         vector of quantiles.
p            vector of probabilities.
n            number of observations. If length(n) > 1, the length is taken to be the number required.
mean         vector of means.
sd           vector of standard deviations.
log, log.p   logical; if TRUE, probabilities p are given as log(p).
lower.tail   logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].


In [3]:
x

In [4]:
# runif draws from a uniform distribution: every value between min and max is equally likely.
help(runif)

Uniform {stats}    R Documentation

Arguments:
x, q         vector of quantiles.
p            vector of probabilities.
n            number of observations. If length(n) > 1, the length is taken to be the number required.
min, max     lower and upper limits of the distribution. Must be finite.
log, log.p   logical; if TRUE, probabilities p are given as log(p).
lower.tail   logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x].


In [5]:
# Extra arguments after FUN are passed through ... to the function being applied.
# Here each element of x is used as n, so this calls runif(1, ...), runif(2, ...), and so on.
x = 1:4
lapply(x, runif, min=0, max=10)
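The same forwarding works with any function, including your own. A minimal sketch, where `scale_by` and `nums` are hypothetical names used only for illustration:

```r
# Hypothetical helper: multiplies a numeric vector by k.
scale_by <- function(v, k) v * k

nums <- list(a = 1:3, b = c(10, 20))

# `k = 2` is not an element of `nums`; lapply forwards it through `...`,
# so each call is scale_by(<element>, k = 2).
scaled <- lapply(nums, scale_by, k = 2)
scaled
```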

In [6]:
# lapply is often used with anonymous functions
x = list(a=matrix(1:4, 2,2), b=matrix(1:6, 3, 2))
x

$a
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$b
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6


In [7]:
# With this simple anonymous function we extract the first column of each matrix in the list.
lapply(x, function(elt) elt[,1])

<h2>sapply</h2>

In [19]:
str(sapply)
str(lapply)

function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)  
function (X, FUN, ...)  


In [10]:
# sapply simplifies the result of lapply when possible:
# if every element of the result is a single number, it returns a vector instead of a list.
x = list(a=1:4, b=rnorm(4), c=runif(4), d=rnorm(10), e=rnorm(100,5))
lapply(x, mean)
sapply(x, mean)
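To see the simplification rule at work: when every result has length 1, sapply returns a vector; when every result has the same length greater than 1, it returns a matrix; and when the lengths differ it gives up and returns a list, just like lapply. A small sketch with fixed (non-random) data in a hypothetical list `lst`:

```r
lst <- list(a = 1:4, b = c(2, 8), c = 5)

# One number per element -> a named vector: a = 2.5, b = 5, c = 5
sapply(lst, mean)

# Two numbers per element (min and max) -> a 2 x 3 matrix, one column per element
sapply(lst, range)

# Results of different lengths -> no simplification, a list is returned
sapply(lst, function(v) v[v > 2])
```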

<h2>apply</h2>
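apply evaluates a function over the margins (rows, columns, ...) of a matrix or array. It is not really faster than a loop; the main reason to use it is that it fits on one line, which is convenient at the command line.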
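For comparison, here is the same column-mean computation written as an explicit loop and with apply, on a small hypothetical matrix `m`. Both give the same numbers; the apply version just fits on one line:

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

# Explicit loop: one mean per column.
loop_means <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  loop_means[j] <- mean(m[, j])
}

# Same computation with apply (MARGIN = 2 keeps the columns).
apply_means <- apply(m, 2, mean)

loop_means    # 1.5 3.5 5.5
apply_means   # 1.5 3.5 5.5
```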

In [14]:
x = matrix(rnorm(200), 20, 10)

# The second argument is the MARGIN, the dimension to keep.
# MARGIN = 2 keeps the columns, so this computes the mean of each column.
apply(x, 2, mean)
colMeans(x)

In [15]:
# MARGIN = 1 keeps the rows: the sum of each row.
apply(x, 1, sum)
rowSums(x)

In [17]:
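# Optimized built-ins exist for these common cases and are much faster than apply.
# Each pair below gives the same result:
all.equal(rowSums(x),  apply(x, 1, sum))
all.equal(rowMeans(x), apply(x, 1, mean))
all.equal(colSums(x),  apply(x, 2, sum))
all.equal(colMeans(x), apply(x, 2, mean))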

In [18]:
help(quantile)

quantile {stats}    R Documentation

Arguments:
x        numeric vector whose sample quantiles are wanted, or an object of a class for which a method has been defined (see also ‘details’). NA and NaN values are not allowed in numeric vectors unless na.rm is TRUE.
probs    numeric vector of probabilities with values in [0,1]. (Values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
na.rm    logical; if true, any NA and NaN's are removed from x before the quantiles are computed.
names    logical; if true, the result has a names attribute. Set to FALSE for speedup with many probs.
type     an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.
...      further arguments passed to or from other methods.


In [24]:
quantile(c(1,2,3,4,5,6,7), probs=c(0.25, 0.75))   # 25%: 2.5, 75%: 5.5

In [25]:
x = matrix(rnorm(200), 20, 10)
apply(x, 1, quantile, probs = c(0.25, 0.75))   # extra arguments are passed on to quantile

(2 x 20 matrix: column j holds the 25% and 75% quantiles of row j of x)
25% -0.91238263 -0.28163436 -0.65830839 -0.93911855 -0.19701391 -0.42185601 -0.1987485 -0.41206504 -0.78366753 -0.16032532 -0.9018405 -0.75115017 0.08186614 -0.69705779 -0.79185089 -1.23644854 -0.51975539 -0.48334279 -0.51557741 -0.06126219
75% 0.48541531 0.78178713 1.07129981 0.55389678 0.95086062 0.66749134 0.33714949 0.31405143 0.19963913 1.30813568 -0.0620048 0.66612102 1.00288862 0.75911176 0.09213262 0.29772381 0.9715987 0.7329413 -0.20318658 0.87167855


In [26]:
help(array)

array {base}    R Documentation

Arguments:
data       a vector (including a list or expression vector) giving data to fill the array. Non-atomic classed objects are coerced by as.vector.
dim        the dim attribute for the array to be created, that is an integer vector of length one or more giving the maximal indices in each dimension.
dimnames   either NULL or the names for the dimensions. This must be a list (or it will be ignored) with one component for each dimension, either NULL or a character vector of the length given by dim for that dimension. The list can be named, and the list names will be used as names for the dimensions. If the list is shorter than the number of dimensions, it is extended by NULLs to the length required.
x          an R object.
...        additional arguments to be passed to or from methods.


In [38]:
# Playing with arrays
# A two-dimensional array is the same thing as a matrix.

# One-dimensional arrays often look like vectors, but may be handled differently by some 
# functions: str does distinguish them in recent versions of R.

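a = array(1:200, c(4,5,10))   # a 4 x 5 x 10 array
a[,,2]                        # the second 4 x 5 slice (values 21 to 40)
a[1,,]                        # first index fixed at 1: a 5 x 10 matrix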

     [,1] [,2] [,3] [,4] [,5]
[1,]   21   25   29   33   37
[2,]   22   26   30   34   38
[3,]   23   27   31   35   39
[4,]   24   28   32   36   40


     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   21   41   61   81  101  121  141  161   181
[2,]    5   25   45   65   85  105  125  145  165   185
[3,]    9   29   49   69   89  109  129  149  169   189
[4,]   13   33   53   73   93  113  133  153  173   193
[5,]   17   37   57   77   97  117  137  157  177   197


In [48]:
apply(a, c(2,3), mean)   # keep dimensions 2 and 3, average over dimension 1: a 5 x 10 matrix

     [,1] [,2] [,3] [,4] [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,]  2.5 22.5 42.5 62.5 82.5 102.5 122.5 142.5 162.5 182.5
[2,]  6.5 26.5 46.5 66.5 86.5 106.5 126.5 146.5 166.5 186.5
[3,] 10.5 30.5 50.5 70.5 90.5 110.5 130.5 150.5 170.5 190.5
[4,] 14.5 34.5 54.5 74.5 94.5 114.5 134.5 154.5 174.5 194.5
[5,] 18.5 38.5 58.5 78.5 98.5 118.5 138.5 158.5 178.5 198.5


In [49]:
# keep dimension 1: one mean per row, averaging over dimensions 2 and 3
apply(a, c(1), mean)
rowMeans(a, dims=1)   # same result, faster

In [47]:
colMeans(a, dims=1)   # averages over dimension 1: same as apply(a, c(2,3), mean), but faster

     [,1] [,2] [,3] [,4] [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,]  2.5 22.5 42.5 62.5 82.5 102.5 122.5 142.5 162.5 182.5
[2,]  6.5 26.5 46.5 66.5 86.5 106.5 126.5 146.5 166.5 186.5
[3,] 10.5 30.5 50.5 70.5 90.5 110.5 130.5 150.5 170.5 190.5
[4,] 14.5 34.5 54.5 74.5 94.5 114.5 134.5 154.5 174.5 194.5
[5,] 18.5 38.5 58.5 78.5 98.5 118.5 138.5 158.5 178.5 198.5


<h2>mapply</h2>
mapply is a multivariate apply: it applies a function in parallel over a set of arguments.
Where we might otherwise write a for loop that walks through two lists at the same time so that each iteration can access both, mapply does the same thing in a single call.
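The loop that mapply replaces can be written out next to it. A sketch with hypothetical vectors `xs` and `times`, pairing each value with its repeat count:

```r
xs <- 1:4
times <- 4:1

# For loop walking through both vectors in parallel.
out_loop <- vector("list", length(xs))
for (i in seq_along(xs)) {
  out_loop[[i]] <- rep(xs[i], times[i])
}

# mapply does the same walk: rep(1, 4), then rep(2, 3), rep(3, 2), rep(4, 1).
# The results have different lengths, so mapply returns a list.
out_mapply <- mapply(rep, xs, times)

identical(out_loop, out_mapply)   # TRUE
```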

In [50]:
help(rep)

rep {base}    R Documentation

Arguments:
x            a vector (of any mode including a list) or a factor or (for rep only) a POSIXct or POSIXlt or Date object; or an S4 object containing such an object.
...          further arguments to be passed to or from other methods. For the internal default method these can include:
               times        an integer vector giving the (non-negative) number of times to repeat each element if of length length(x), or to repeat the whole vector if of length 1. Negative or NA values are an error.
               length.out   non-negative integer. The desired length of the output vector. Other inputs will be coerced to an integer vector and the first element taken. Ignored if NA or invalid.
               each         non-negative integer. Each element of x is repeated each times. Other inputs will be coerced to an integer vector and the first element taken. Treated as 1 if NA or invalid.
times        see ...
length.out   non-negative integer: the desired length of the output vector.


In [51]:
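rep(1:4, 2)            # repeat the whole vector twice: 1 2 3 4 1 2 3 4

In [52]:
rep(1:4, each=2)       # repeat each element twice: 1 1 2 2 3 3 4 4

In [55]:
rep(1:4, c(2,2,3,4))   # per-element counts: 1 1 2 2 3 3 3 4 4 4 4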

In [56]:
LETTERS

In [59]:
x <- factor(LETTERS[1:4])
names(x) <- letters[1:4]
x
rep(x, 2)
rep(x, each = 2)

In [60]:
mapply(rep, 1:4, 4:1)   # same as list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

In [62]:
noise = function(n, mean, sd){
    rnorm(n, mean, sd)
}

noise(5, 3, 3)

In [64]:
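# 4 draws each with means 4 to 7 and sd 3: a 4 x 4 matrix, one column per mean
mapply(noise, c(4,4,4,4), 4:7, c(3,3,3,3))
# 1 to 5 draws with means 1 to 5 and sd 2 (recycled); the lengths differ, so this returns a list
mapply(noise, 1:5, 1:5, 2)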

         [,1]     [,2]     [,3]      [,4]
[1,] 4.501764 7.114038 4.954372  5.885024
[2,] 6.350176 7.786993  3.63009 11.083846
[3,] 5.219416 5.844742 5.407422  7.952227
[4,] 5.767866 3.460245  5.19162  7.606983


<h3>Instant Vectorization</h3>
If a function does not accept vectors as arguments, wrapping it in mapply effectively vectorizes it instantly.
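A small sketch of the idea, using a hypothetical scalar-only function `sign_label`. Base R's Vectorize() packages the same trick: it returns a wrapper that calls the function through mapply.

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE value,
# so calling it on a vector is an error in R >= 4.2.0 (a warning in older versions).
sign_label <- function(x) {
  if (x > 0) "positive" else "non-positive"
}

# mapply calls sign_label once per element, which vectorizes it.
labels_mapply <- mapply(sign_label, c(-2, 0, 3))

# Vectorize() builds the mapply wrapper for us.
vsign_label <- Vectorize(sign_label)
labels_vec <- vsign_label(c(-2, 0, 3))

labels_mapply   # "non-positive" "non-positive" "positive"
```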

<h2>tapply</h2>

In [17]:
# tapply applies a function to subsets of a vector, where the subsets are defined by a factor.
x = c(rnorm(10), runif(10), rnorm(10))
# gl() generates factor levels: here 3 levels, each repeated 10 times
f = gl(3, 10)
f
tapply(x, f, mean)

In [68]:
str(gl)
help(gl)

function (n, k, length = n * k, labels = seq_len(n), ordered = FALSE)  


gl {base}    R Documentation

Arguments:
n         an integer giving the number of levels.
k         an integer giving the number of replications.
length    an integer giving the length of the result.
labels    an optional vector of labels for the resulting factor levels.
ordered   a logical indicating whether the result should be ordered or not.


In [69]:
# By default tapply simplifies the result to a vector when FUN returns a single value;
# with simplify = FALSE we get a list instead.
tapply(x, f, mean, simplify=FALSE)

In [70]:
tapply(x, f, range)   # range returns two numbers per group, so the result is a list

<h2>Split</h2>

In [74]:
# split divides a vector into groups according to the levels of a factor
str(split)

function (x, f, drop = FALSE, ...)  


In [73]:
x = c(rnorm(10), runif(10), rnorm(10))
f = gl(3, 10)

# split always returns a list
split(x, f)

In [75]:
# split is often combined with lapply: split the data, then apply a function to each group
lapply(split(x,f), mean)
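split followed by an apply function is the general split-apply pattern; for one vector and one factor, tapply performs both steps in a single call. A sketch with fixed, hypothetical data (`vals`, `grp`) so the group means are easy to check:

```r
vals <- c(1, 2, 3, 10, 20, 30)
grp  <- gl(2, 3)                              # levels "1" and "2", each repeated 3 times

by_split  <- sapply(split(vals, grp), mean)   # split, then apply (and simplify)
by_tapply <- tapply(vals, grp, mean)          # both steps in one call

by_split    # group "1" -> 2, group "2" -> 20
by_tapply   # same numbers, returned as a one-dimensional array
```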

In [76]:
# split also works on more complex objects such as data frames.
# The datasets package ships with R and provides the airquality data.
library(datasets)

In [77]:
head(airquality)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6


In [89]:
s = split(airquality, airquality$Month)   # a list of data frames, one per month
s["7"]
lapply(s, function(x){colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm=TRUE)})

$`7`
   Ozone Solar.R Wind Temp Month Day
62   135     269  4.1   84     7   1
63    49     248  9.2   85     7   2
64    32     236  9.2   81     7   3
65    NA     101 10.9   84     7   4
66    64     175  4.6   83     7   5
67    40     314 10.9   83     7   6
68    77     276  5.1   88     7   7
69    97     267  6.3   92     7   8
70    97     272  5.7   92     7   9
71    85     175  7.4   89     7  10


In [88]:
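class(s)          # "list"
class(s["7"])     # "list": single brackets return a sub-list
class(s[["7"]])   # "data.frame": double brackets extract the element itself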

In [94]:
# With sapply the result is simplified into a matrix: one column per month, one row per variable.
sapply(s, function(x){colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm=TRUE)})

                5         6        7        8        9
Ozone    23.61538  29.44444 59.11538 59.96154 31.44828
Solar.R  181.2963  190.1667 216.4839 171.8571 167.4333
Wind    11.622581 10.266667 8.941935 8.793548    10.18


In [95]:
# Splitting on more than one factor
x = rnorm(10)
# two grouping variables
f1 = gl(2,5)
f2 = gl(5,2)

# interaction() combines the two factors into a single factor whose levels are
# all 2 x 5 = 10 combinations of their levels (e.g. "1.1", "2.5")
interaction(f1, f2)


In [98]:
split(x, list(f1, f2))   # the factors are combined as in interaction(); unused combinations give empty vectors

In [99]:
# drop = TRUE removes the empty combinations (levels that never occur)
split(x, list(f1, f2), drop=TRUE)

<h2>Debugging Tools</h2>
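1. Message: a generic notification or diagnostic; execution continues.
2. Warning: something unexpected happened, but it is not fatal; execution continues.
3. Error: a fatal problem; it stops execution of the function.
4. Condition: the general concept; messages, warnings, and errors are all conditions.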
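All three kinds can be raised from base R with message(), warning(), and stop(), and because they are all conditions, a single tryCatch() can intercept any of them. A minimal sketch; `raise_condition` and `classify` are hypothetical helpers:

```r
# Hypothetical helper that signals a different kind of condition depending on `kind`.
raise_condition <- function(kind) {
  switch(kind,
         message = message("just a note"),
         warning = warning("something looks odd"),
         error   = stop("cannot continue"))
  "finished normally"
}

# Each handler runs only for conditions of its class.
classify <- function(kind) {
  tryCatch(raise_condition(kind),
           message = function(m) "caught a message",
           warning = function(w) "caught a warning",
           error   = function(e) "caught an error")
}

outcomes <- sapply(c("message", "warning", "error", "none"), classify)
outcomes

# Each kind inherits from the general "condition" class.
is_condition <- sapply(list(simpleMessage("m"), simpleWarning("w"), simpleError("e")),
                       inherits, "condition")
is_condition   # TRUE TRUE TRUE
```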

In [1]:
# Warning
log(-1)

In log(-1): NaNs produced

In [2]:
# invisible() returns its argument without auto-printing it at the top level
printmessage = function(x){
    if(x>0)
        print("x is greater than zero")
    else
        print("x is less than or equal to zero")

    invisible(x)
}
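invisible() only controls auto-printing; the value is still returned. A small sketch with a hypothetical function `quiet_double`, using base R's withVisible() to check visibility:

```r
# Hypothetical function that returns its result invisibly.
quiet_double <- function(n) invisible(2 * n)

quiet_double(5)               # nothing is auto-printed at the top level
doubled <- quiet_double(5)    # but the value is still returned
doubled                       # 10

# withVisible() reports whether a value would be auto-printed.
withVisible(quiet_double(5))$visible   # FALSE
withVisible(2 * 5)$visible             # TRUE
```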

In [4]:
printmessage(1)
printmessage(NA)

[1] "x is greater than zero"


ERROR: Error in if (x > 0) print("x is greater than zero") else print("x is less than or equal to zero"): missing value where TRUE/FALSE needed


In [6]:
printmessage_2 = function(x){
    # check for NA first, so if() never receives a missing value
    if(is.na(x))
        print("x is a missing value")
    else if(x > 0)
        print("x is greater than zero")
    else
        print("x is less than or equal to zero")

    invisible(x)
}

x = log(-1)
printmessage_2(x)

In log(-1): NaNs produced

[1] "x is a missing value"


<h2> Debugging Tools in R</h2>
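1. traceback: prints the function call stack after an error occurs.
2. debug: flags a function for debug mode, so you can step through its execution one line at a time.
3. browser: suspends execution wherever it is called and puts the function into debug mode.
4. trace: inserts debugging code into a function without editing its source.
5. recover: changes the error behavior so that you can browse the function call stack when an error occurs.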
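Of the tools listed, trace() does not need an interactive prompt, so unlike debug, browser, and recover it can be tried inside a notebook. A minimal sketch, assuming a hypothetical function `square` defined at the top level: trace() inserts a tracer expression at the start of the function, and untrace() removes it again.

```r
# Hypothetical function to trace.
square <- function(x) x^2

# Insert the tracer at the start of every call to `square`;
# print = FALSE suppresses the extra "Tracing ... on entry" line.
trace("square", tracer = quote(cat("square called with x =", x, "\n")), print = FALSE)

traced_output <- capture.output(result <- square(3))
untrace("square")   # restore the original, untraced function

traced_output   # the line written by the tracer
result          # 9
```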

In [9]:
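# The interactive debugging tools don't seem to work inside this notebook:
# traceback() reports nothing, presumably because the kernel handles the error itself.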

mean(y)
traceback()

ERROR: Error in mean(y): object 'y' not found


No traceback available 


In [10]:
# Note: model formulas use ~ (lm(y ~ v)); either way this fails because y does not exist
lm(y - v)

ERROR: Error in stats::model.frame(formula = y - v, drop.unused.levels = TRUE): object 'y' not found


In [11]:
traceback()

No traceback available 


In [12]:
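debug(lm)   # flag lm: each later call to lm() runs step by step in the browser
# remove the flag afterwards with undebug(lm)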

In [13]:
lm(y - x)   # lm is flagged, so this call enters the debugger before failing

debugging in: lm(y - x)
debug: {
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame") 
        return(mf)
    else if (method != "qr") 
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w)) 
        stop("'weights' must be a numeric vector")
    offset <- as.vector(model.offset(mf))
    if (!is.null(offset)) {
        if (length(offset) != NROW(y)) 
            stop(gettextf("number of offsets is %d, should equal %d (number of observations)", 
                length(offset), NROW(y)), domain = NA)
    }
    if (is.empty.model(mt)) {
        x <- NULL
     

ERROR: Error in stats::model.frame(formula = y - x, drop.unused.levels = TRUE): object 'y' not found


In [14]:
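# "Browse[2]>" is the debugger's prompt, not a command; typed into a normal cell,
# R just looks for an object named Browse
Browse[2]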

ERROR: Error in eval(expr, envir, enclos): object 'Browse' not found


<h3>Recover</h3>

In [15]:
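options(error=recover)   # on error, offer a menu of call frames to browse (interactive sessions)
# restore the default afterwards with options(error = NULL)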

In [16]:
read.csv("nosuchfile.csv")

In file(file, "rt"): cannot open file 'nosuchfile.csv': No such file or directory

ERROR: Error in file(file, "rt"): cannot open the connection
