# Iteration
---

In [1]:
# Attaching libraries
library(tidyverse)

# install.packages('nycflights13')  
library(nycflights13)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Exercise 1

In [2]:
output <- vector("double", ncol(mtcars))
names(output) <- names(mtcars)
for (i in names(mtcars)) {
  output[i] <- mean(mtcars[[i]])
}
output

The output must be a list rather than a character vector, because a column's class can have more than one value (for example, a date-time column has class `c("POSIXct", "POSIXt")`).

In [3]:
output <- vector("list", ncol(flights))
names(output) <- names(flights)
for (i in seq_along(flights)) {
    output[[i]] <- class(flights[[i]])
}
output
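To illustrate why a list is needed, here is a minimal base-R sketch. It uses `Sys.time()` as a stand-in for a date-time column, and the names `cls`, `out_list`, `out_chr`, and `res` are introduced only for this illustration.

```r
# A date-time value carries two classes, so class() returns a length-2 vector.
cls <- class(Sys.time())
cls
length(cls)

# A list element can hold the whole class vector.
out_list <- vector("list", 1)
out_list[[1]] <- cls

# A single element of a character vector can hold only one string,
# so assigning a length-2 vector to it raises an error.
out_chr <- vector("character", 1)
res <- tryCatch(
  {
    out_chr[[1]] <- cls
    "ok"
  },
  error = function(e) "error"
)
res
```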

In [4]:
data("iris")
iris_uniq <- vector("double", ncol(iris))
names(iris_uniq) <- names(iris)
for (i in names(iris)) {
  iris_uniq[i] <- length(unique(iris[[i]]))
}
iris_uniq

---
## Exercise 2

In [5]:
str_c(letters, collapse = "")

In [6]:
x <- sample(100)
sd(x)

Alternatively, if the formula has to be written out (e.g. for pedagogical reasons), `mean()` and `sum()` already operate on whole vectors, so no loop is needed:

In [7]:
sqrt(sum((x - mean(x)) ^ 2) / (length(x) - 1))
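As a quick sanity check, the hand-written formula can be compared against `sd()`. This is a small sketch; the seed and the name `manual` are arbitrary choices made here for reproducibility and are not part of the exercise.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- sample(100)

# Sample standard deviation written out by hand (n - 1 in the denominator).
manual <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# all.equal() compares with a numerical tolerance rather than exact equality.
all.equal(manual, sd(x))
```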

---
## Exercise 3

In [8]:
show_mean <- function(df, digits = 2) {
    for (nm in names(df)){
        if (is.numeric(df[[nm]])) {
            cat(paste(nm, round(mean(df[[nm]]), digits)), "\n")
        }
    }
}

In [9]:
show_mean(iris)

Sepal.Length 5.84 
Sepal.Width 3.06 
Petal.Length 3.76 
Petal.Width 1.2 


---
## Exercise 4

In [10]:
col_summary2 <- function(df, fun = mean) {
  # Summarizes data frame df using the provided function fun.
  # Any column that is not numeric is excluded from the calculation.
  # A warning is issued if such a column is found.
  #
  # Args:
  #   df: the data frame to summarize
  #   fun: summary function applied to each numeric column
  #
  # Returns:
  #   A named double vector with one summary value per numeric column.
    
  # test whether each column is numeric
  numeric_cols <- vector("logical", ncol(df))
  for (i in seq_along(df)) {
    numeric_cols[[i]] <- is.numeric(df[[i]])
  }

  # number of numeric columns
  n <- sum(numeric_cols)
  if (n != ncol(df)) {
    warning("Not all the columns are numeric.")
  }

  # indexes of numeric columns
  idxs <- seq_along(df)[numeric_cols]

  out <- vector("double", n)
  names(out) <- colnames(df)[numeric_cols]
  # seq_along() is safe when idxs has length 0 or 1; apply fun, not a hard-coded mean
  for (i in seq_along(idxs)) {
    out[i] <- fun(df[[idxs[[i]]]])
  }
  out
}

Let's test it out:

In [11]:
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  x = letters[1:10],
  c = rnorm(10)
)

In [12]:
col_summary2(df, mean)

“Not all the columns are numeric.”
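For comparison, the same idea can be sketched more compactly in base R with `vapply()`. This is an alternative under stated assumptions, not the solution above: the names `col_summary_sketch` and `df_demo` are hypothetical, and `fun` is assumed to return a single number per column.

```r
col_summary_sketch <- function(df, fun = mean) {
  # Flag numeric columns; vapply() guarantees a logical vector back.
  numeric_cols <- vapply(df, is.numeric, logical(1))
  if (!all(numeric_cols)) {
    warning("Not all the columns are numeric.")
  }
  # Apply fun to the numeric columns only; column names are kept.
  vapply(df[numeric_cols], fun, numeric(1))
}

# Small hand-made data frame with one non-numeric column.
df_demo <- data.frame(
  a = c(1, 2, 3),
  x = c("u", "v", "w"),
  c = c(10, 20, 60)
)

# The warning is suppressed here only to keep the output short.
res <- suppressWarnings(col_summary_sketch(df_demo, median))
res
```

Because `fun` is passed through rather than hard-coded, the same call works with `mean`, `median`, `sd`, and similar single-value summaries.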

---
## Exercise 5

In [13]:
map_dbl(mtcars, mean)

In [14]:
map(nycflights13::flights, class)

In [15]:
x <- map(iris, unique)
map_int(x, length)

---
## Exercise 6

In [16]:
map_lgl(iris, is.factor)

---
## Exercise 7
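`map()` applies the function to each element of the vector. Here it calls `runif(1)`, `runif(2)`, ..., `runif(5)`, so the result is a list of five numeric vectors with lengths 1 through 5.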

In [17]:
map(1:5, runif)
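A short check of that behaviour, assuming purrr is installed (the name `draws` is introduced here for illustration): each element of `1:5` is passed as the first argument of `runif()`, which is `n`, so the lengths of the results match the inputs.

```r
library(purrr)

# Each input value becomes the n argument of runif().
draws <- map(1:5, runif)

# The lengths of the list elements are 1, 2, 3, 4, 5.
map_int(draws, length)
```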

---
## Exercise 8
This draws five samples from each of five normal distributions with means -2, -1, 0, 1, and 2, and returns a list in which each element is a numeric vector of length 5.

In [18]:
map(-2:2, rnorm, n = 5)

However, `map_dbl()` throws an error here: it expects the function to return a numeric vector of length one for each element, whereas `rnorm()` with `n = 5` returns five values.

In [19]:
# map_dbl(-2:2, rnorm, n = 5)
# Error: Result 1 is not a length 1 atomic vector
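Two possible ways around the error, sketched under the assumption that purrr is installed (the names `one_each` and `all_draws` are hypothetical): either draw a single value per mean so that `map_dbl()` is satisfied, or keep `map()` and flatten the resulting list afterwards.

```r
library(purrr)

# One draw per mean: each call returns a length-1 vector, so map_dbl() works.
one_each <- map_dbl(-2:2, rnorm, n = 1)
length(one_each)

# Five draws per mean: keep the list from map(), then flatten it with unlist().
all_draws <- unlist(map(-2:2, rnorm, n = 5))
length(all_draws)
```

Which one fits depends on whether a single value or all five draws per mean are wanted.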