# Exercises

**1. Fix each of the following common data frame subsetting errors:**

```
mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]
```

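```
mtcars[mtcars$cyl == 4, ]          # `==` tests equality; `=` is assignment
mtcars[-(1:4), ]                   # `-1:4` is c(-1, 0, 1, ..., 4), which mixes signs
mtcars[mtcars$cyl <= 5, ]          # the comma is needed to subset rows
mtcars[mtcars$cyl %in% c(4, 6), ]  # or: mtcars$cyl == 4 | mtcars$cyl == 6
```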

**2. Why does `x <- 1:5; x[NA]` yield five missing values? (Hint: why is it different from `x[NA_real_]`?)**

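`NA` is of type `logical`, and a logical index is recycled to the length of the vector being subset. So `x[NA]` is equivalent to `x[c(NA, NA, NA, NA, NA)]`, and a missing value in a logical index always yields a missing value in the result, giving five `NA`s. `NA_real_` is of type `double`, so it is treated as a single numeric position and is not recycled; `x[NA_real_]` therefore returns a single `NA`.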
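A short sketch of the difference, using `x[TRUE]` as an extra illustration of the same recycling rule:

```r
x <- 1:5

# Logical NA is recycled to length(x), so every position yields NA.
from_logical <- x[NA]

# NA_real_ is numeric, so it is one (unknown) position and is not recycled.
from_numeric <- x[NA_real_]

# The same recycling happens with an ordinary logical value of length one.
from_true <- x[TRUE]

length(from_logical)
length(from_numeric)
from_true
```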

**3. What does `upper.tri()` return? How does subsetting a matrix with it work? Do we need any additional subsetting rules to describe its behaviour?**

    x <- outer(1:5, 1:5, FUN = "*")
    x[upper.tri(x)]

`upper.tri()` returns a logical matrix that is `TRUE` for values in the upper triangle of the input matrix, and `FALSE` otherwise.

You can subset a matrix with it, as in `x[upper.tri(x)]`. Because a matrix is a vector with a dimension attribute, the logical matrix acts like a logical vector in column-major order, so no additional subsetting rule is needed. The result is a plain vector of the upper-triangle elements, not a matrix, so their row and column positions in the original matrix are lost and would have to be computed separately. The same form is still useful for assigning to the upper triangle of a matrix.
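A minimal sketch of both points: `which(..., arr.ind = TRUE)` recovers the positions that plain logical subsetting drops, and assignment through the logical matrix changes only the upper triangle. The names `pos`, `vals`, and `y` are only illustrative.

```r
x <- outer(1:5, 1:5, FUN = "*")

# Values of the upper triangle, returned as a plain vector.
vals <- x[upper.tri(x)]

# Row/column positions of those values, in the same column-major order.
pos <- which(upper.tri(x), arr.ind = TRUE)

# Assignment through the logical matrix: zero out the upper triangle only.
y <- x
y[upper.tri(y)] <- 0

head(pos)
y
```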

**4. Why does `mtcars[1:20]` return an error? How does it differ from the similar `mtcars[1:20, ]`?**

Subsetting a data frame with one vector subsets it like a list, whereas subsetting with two vectors subsets it like a matrix. So `mtcars[1:20]` is interpreted as "select the first 20 columns of `mtcars`", and `mtcars` has only 11 columns. `mtcars[1:20, ]`, however, is interpreted as "select rows `1:20` and all columns".

Note that `as.matrix(mtcars)[1:20]` works, though: a matrix is a vector with dimensions, so a single index selects individual elements rather than columns.
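The three cases can be compared directly. This is a small sketch; `m` and `first20` are illustrative names.

```r
# A data frame is a list of columns, so a single index selects columns.
ncol(mtcars)           # number of columns available
dim(mtcars[1:5])       # first five columns, all rows
dim(mtcars[1:20, ])    # first twenty rows, all columns

# A matrix is a vector with dimensions, so a single index walks the
# elements in column-major order: here, the first 20 values of `mpg`.
m <- as.matrix(mtcars)
first20 <- m[1:20]
first20
```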

**5. Implement your own function that extracts the diagonal entries from a matrix (it should behave like `diag(x)` where `x` is a matrix).**

    diag2 <- function(x) {
        # Use the shorter dimension so non-square matrices work, as with diag()
        n <- min(dim(x))
        sapply(seq_len(n), function(i) x[i, i])
    }
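As an alternative sketch, the diagonal can also be extracted in one step with a two-column index matrix, where each row gives a (row, column) pair. The helper name `diag_by_index` is hypothetical and not part of the original solution.

```r
# Hypothetical helper: select (1,1), (2,2), ... with a two-column index matrix.
diag_by_index <- function(x) {
  i <- seq_len(min(dim(x)))  # shorter dimension, so non-square input works
  x[cbind(i, i)]
}

m <- matrix(1:12, nrow = 3)
diag_by_index(m)
diag(m)
```

Matrix indexing avoids the explicit loop over `i` and relies only on base R subsetting.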

**6. What does `df[is.na(df)] <- 0` do? How does it work?**

`is.na(df)` returns a logical matrix with the same dimensions as `df`, whose elements are `TRUE` wherever the corresponding element of `df` is `NA`. So `df[is.na(df)]` selects all the elements of `df` that are `NA`, as a vector. Assigning `df[is.na(df)] <- 0` therefore replaces every `NA` element within `df` with `0`.
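A small sketch on made-up data (the data frame below is hypothetical and only serves to show the mechanics):

```r
# Hypothetical example data with a few missing values.
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# Logical matrix with the same shape as df, TRUE where a value is missing.
mask <- is.na(df)
mask

# Replace every position flagged TRUE in the mask.
df[is.na(df)] <- 0
df
```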

**7. Given a linear model, e.g., `mod <- lm(mpg ~ wt, data = mtcars)`, extract the residual degrees of freedom. Extract the R squared from the model summary (`summary(mod)`).**

    mod <- lm(mpg ~ wt, data = mtcars)
    print(paste0('Residual dof: ', mod$df.residual,
                 ' R^2: ', summary(mod)$r.squared))

    [1] "Residual dof: 30 R^2: 0.752832793658264"
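Both objects are lists, so their components can be discovered with `names()` and extracted with `$` or `[[`. The sketch below also uses `df.residual()`, the accessor function from the `stats` package, as an alternative to reaching into the object directly.

```r
mod <- lm(mpg ~ wt, data = mtcars)

# An lm object is a list, so its components can be listed and extracted.
names(mod)
mod[["df.residual"]]   # same value as mod$df.residual

# The summary is a separate list with its own components.
s <- summary(mod)
names(s)
s[["r.squared"]]

# Accessor function for the residual degrees of freedom.
df.residual(mod)
```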


**8. How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?**

    d <- data.frame(a = 1:3, b = 1:3, c = 1:3, d = 1:3)

    # Permuting columns
    d[, sample(ncol(d))]

    # Permuting both rows and columns in one step
    d[sample(nrow(d)), sample(ncol(d))]

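Columns permuted:

| c | b | d | a |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 |
| 3 | 3 | 3 | 3 |

Rows and columns permuted (the first column holds the row names):

|   | d | a | b | c |
|---|---|---|---|---|
| 3 | 3 | 3 | 3 | 3 |
| 2 | 2 | 2 | 2 | 2 |
| 1 | 1 | 1 | 1 | 1 |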


**9. How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?**

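    # Selecting a random sample of m rows from a data frame
    m <- 4
    mtcars[sample(nrow(mtcars), m), ]

    # If the rows are required to be contiguous: pick a start row that
    # leaves room for m rows, then take exactly m rows from there
    idx <- sample(nrow(mtcars) - m + 1, 1)
    mtcars[idx:(idx + m - 1), ]

Random sample of `m` rows:

|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.84 | 15.41 | 0 | 0 | 3 | 4 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.73 | 17.6 | 0 | 0 | 3 | 3 |

Contiguous sample of `m` rows:

|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |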


**10. How could you put the columns in a data frame in alphabetical order?**

    d <- data.frame(c = 1:3, b = 1:3, a = 1:3)
    d[, sort(colnames(d))]

| a | b | c |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
