# Lecture 10.2: Vectors, lists, iteration & FP
In this lecture we'll learn about:
- [Atomic vectors](#Atomic-vectors), or what we have been calling vectors up to this point.
- [Lists](#Lists), a.k.a. recursive vectors.
- [Iteration](#Iteration): `for`/`while` loops.
- [Functional programming](#Functional-programming) (FP): functions that operate on other functions.



In [5]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## Atomic vectors
Vectors are sequences of data elements in R. So far we have exclusively studied *atomic* vectors, which are sequences of elements that all have the same type. The two most important properties of a vector are its *type* and its *length*:
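As a small illustration, `typeof()` and `length()` report these two properties. The vector `x` below is a made-up example, not one from the lecture:

```{r}
# A hypothetical atomic vector of three doubles
x <- c(1.5, 2, 10)

typeof(x)   # "double"
length(x)   # 3
```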

A single data element is called a *scalar.* An important thing to realize is that, to R, there is no distinction between scalars and vectors -- a scalar is simply an atomic vector of length one.
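A quick check of this claim, using an arbitrary value `s` chosen only for illustration:

```{r}
# A "scalar" is just a vector with one element
s <- 42

is.vector(s)          # TRUE
length(s)             # 1
identical(s, c(42))   # TRUE: wrapping it in c() changes nothing
```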

### Types of atomic vectors
The most important types of atomic vector are logical, numeric, and character.

Logical vectors hold the values `TRUE`, `FALSE` and `NA`.

Numeric vectors hold integers or doubles. By default, if you enter a number in R it is stored as a double:
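For instance, a plain `1` typed at the console is a double, even though it looks like an integer (the variable name `n` is just for illustration):

```{r}
n <- 1

typeof(n)       # "double"
is.integer(n)   # FALSE
is.numeric(n)   # TRUE: both doubles and integers count as numeric
```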

If you want to explicitly store integers, you can use the `as.integer()` function:
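A minimal sketch with made-up values. The `L` suffix shown at the end is an alternative way to write an integer literal; it is standard R but is not mentioned in the text above:

```{r}
# Convert doubles to integers
i <- as.integer(c(1, 2, 3))
typeof(i)          # "integer"

# as.integer() truncates toward zero rather than rounding
as.integer(2.9)    # 2

# Alternative: an integer literal written with the L suffix
j <- 5L
typeof(j)          # "integer"
```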

### Names

It is possible to assign names to each entry of a vector:
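Names can be supplied when the vector is created or attached afterwards with `names()`. The vectors `heights` and `scores` are hypothetical examples:

```{r}
# Names given at creation time
heights <- c(alice = 165, bob = 180, carol = 172)
names(heights)     # "alice" "bob" "carol"
heights["bob"]     # look up an entry by name

# Names attached to an existing vector
scores <- c(90, 85)
names(scores) <- c("midterm", "final")
scores
```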

## Lists
Lists are another type of sequence data type found in R. Unlike atomic vectors, lists can hold objects of multiple types:
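A small sketch, using an invented list `l` whose four elements each have a different type:

```{r}
# Each element is itself a vector, and the types differ
l <- list(1L, "two", c(TRUE, FALSE), 4.5)
l

typeof(l)   # "list"
length(l)   # 4
```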

As the printout suggests, you can think of a list as a "vector of vectors". For this reason, they are sometimes referred to as "recursive vectors".

The `str()` command prints a compact summary of the **str**ucture of any R object, including lists:
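For example, applied to a hypothetical three-element list, `str()` prints one line per element showing its type and contents:

```{r}
l <- list(1L, "two", c(TRUE, FALSE))

# One summary line per list element
str(l)
```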

Just like atomic vectors, you can name each individual entry of a list:    
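A sketch with an invented list `person`:

```{r}
# Names are given with name = value, just as in c()
person <- list(name = "Ada", age = 36, languages = c("R", "Python"))

names(person)   # "name" "age" "languages"
person$age      # 36
```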

### Subsetting lists
Subsetting lists is a little more complex than subsetting atomic vectors. We will use the following example list:
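A small list like the following is enough to try out each operator. The name `x` and its contents are illustrative assumptions; note that the last element is itself a list:

```{r}
# Hypothetical example list with a nested list in position 4
x <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
str(x)
```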

#### `[]`
The `[]` operator extracts a sub-list. That is, the return type will always be a list:
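To illustrate with a hypothetical list `x`: even when only one element is selected, the result is still a list (of length one), not the element itself.

```{r}
x <- list(a = 1:3, b = "a string", c = pi)

sub <- x[1:2]
typeof(sub)    # "list"
length(sub)    # 2

one <- x[1]
typeof(one)    # "list", not "integer"
length(one)    # 1
```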

As with atomic vectors, the single brackets accept integer, logical and character vectors:
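The three index types can be compared side by side. In this sketch (the list `x` is again a made-up example) all three calls select the same two elements:

```{r}
x <- list(a = 1:3, b = "a string", c = pi)

by_position <- x[c(1, 3)]               # integer (positional) index
by_logical  <- x[c(TRUE, FALSE, TRUE)]  # logical index
by_name     <- x[c("a", "c")]           # character (name) index

identical(by_position, by_logical)   # TRUE
identical(by_position, by_name)      # TRUE
```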

#### `[[]]`
The double-brackets will extract a single component from the list:
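A short comparison with `[]`, using a hypothetical list `x`. The `$` operator shown at the end is a shorthand for `[[]]` with a name:

```{r}
x <- list(a = 1:3, b = "a string", c = pi)

x[[1]]           # the integer vector 1:3 itself
x[["b"]]         # "a string"

typeof(x[1])     # "list": single brackets keep the list wrapper
typeof(x[[1]])   # "integer": double brackets remove it

x$c              # same as x[["c"]]
```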

You can also pass an integer vector to `[[]]`. This will index into successive levels of the list:
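A sketch of this with a nested list (the list `x` is an assumption made for illustration): `x[[c(2, 1)]]` first takes element 2 of `x`, then element 1 of that result.

```{r}
x <- list(a = 1:3, d = list(-1, -5))

x[[c(2, 1)]]     # -1, same as x[[2]][[1]]
x[[c(2, 2)]]     # -5, same as x[[2]][[2]]

identical(x[[c(2, 1)]], x[[2]][[1]])   # TRUE
```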

### Data frames are lists
Many data types in R are actually lists plus some additional attributes. For example, tibbles and data frames are both lists:
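This can be checked directly. The sketch uses a small base `data.frame` with invented columns so that it runs without any packages; a tibble gives the same answers for `typeof()` and `is.list()`:

```{r}
df_small <- data.frame(a = 1:3, b = c("x", "y", "z"))

typeof(df_small)       # "list"
is.list(df_small)      # TRUE
length(df_small)       # 2: one list element per column

# The extra attributes that turn the list into a data frame
attributes(df_small)
```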

The `names()` of a tibble/data frame correspond to columns. This means we can use the list indexing methods shown above to access columns:
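For example, with a small made-up data frame (a base `data.frame` is used here so the sketch needs no packages):

```{r}
df_small <- data.frame(a = 1:3, b = c("x", "y", "z"))

names(df_small)     # "a" "b"

df_small[["a"]]     # column a as a plain vector
df_small$b          # column b as a plain vector
df_small[[2]]       # the second column, by position

df_small["a"]       # single brackets: a one-column data frame
```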

## Iteration
Iteration means, roughly, "running the same piece of code repeatedly". There are many ways to perform iteration in R. The one you have probably heard of is the *for loop*:
```{r}
for (<index> in <vector>) {
    [do something for each value of <index>]
}
```

For example, suppose we wanted to compute the median for each column of the following tibble:

In [6]:
set.seed(1)
df = tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
df

| a `<dbl>` | b `<dbl>` | c `<dbl>` | d `<dbl>` |
|---:|---:|---:|---:|
| -0.6264538 | 1.51178117 | 0.91897737 | 1.35867955 |
| 0.1836433 | 0.38984324 | 0.7821363 | -0.10278773 |
| -0.8356286 | -0.62124058 | 0.07456498 | 0.38767161 |
| 1.5952808 | -2.21469989 | -1.9893517 | -0.05380504 |
| 0.3295078 | 1.12493092 | 0.61982575 | -1.37705956 |
| -0.8204684 | -0.04493361 | -0.05612874 | -0.41499456 |
| 0.4874291 | -0.01619026 | -0.15579551 | -0.39428995 |
| 0.7383247 | 0.94383621 | -1.47075238 | -0.0593134 |
| 0.5757814 | 0.8212212 | -0.47815006 | 1.10002537 |
| -0.3053884 | 0.59390132 | 0.41794156 | 0.76317575 |


One option is to write out a separate call to `median()` for each column:

In [9]:
median(df$a)
median(df$b)
median(df$c)
median(df$d)

But this involves too much repetition, and we argued last lecture that repetition is generally a bad idea when coding. Instead, we can use a for loop to "loop over" each column of `df` and grab the median:
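A sketch of such a loop is shown below, pieced together from the description that follows (pre-allocated `output`, `seq_along(df)`, and the body `output[[i]] = median(df[[i]])`). To keep the block self-contained it rebuilds `df` as a base `data.frame`, which is an assumption; the tibble defined earlier works the same way here.

```{r}
set.seed(1)
# Stand-in for the tibble `df` defined earlier
df <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))

output <- vector("double", ncol(df))   # pre-allocate one slot per column
for (i in seq_along(df)) {             # i takes the values 1, 2, 3, 4
  output[[i]] <- median(df[[i]])       # median of the i-th column
}
output
```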

Again, this works because data frames are *lists*, and each element of the list is one column.

Every `for` loop has three components:
1. The *output*, in this case a vector with one entry per column of `df`.
2. The *sequence* of values along which we will iterate. Here we are using `seq_along(df)`, which generates a sequence of numbers from one up to `ncol(df)`. (This relies on the fact that a `data.frame` is really a list with one entry per column of data.)
3. The *body*, which is the piece of code that gets executed in each iteration of the loop. In the example above, the body first runs `output[[1]] = median(df[[1]])`, then `output[[2]] = median(df[[2]])`, etc., on up to `i=4`.
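One reason to prefer `seq_along()` over the more familiar `1:length(x)` is how it behaves when the input is empty. The sketch below uses invented objects `df_small` and `empty` to show the difference:

```{r}
df_small <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
seq_along(df_small)    # 1 2 3 4, the same as 1:ncol(df_small)

empty <- list()
seq_along(empty)       # integer(0): a loop over this never runs
1:length(empty)        # 1 0: a loop over this runs twice, which is wrong
```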

### Unknown output length
In each of the examples above we "pre-allocated" the `output` vector before running the `for` loop. Sometimes you may not know in advance how much output will be generated, for example when each iteration produces a random number of values. In that case you can start with an empty `output` and grow it with `c()` as you go. The following code uses this pattern to collect the column medians of `df`, looping over the columns themselves rather than over their indices:

In [13]:
output = NULL
for (column in df) {
  output = c(output, median(column))
}

output
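Growing a vector with `c()` creates a new copy on every iteration, which can become slow in long loops. One common alternative, sketched here for a case where the output length really is random, is to store each piece in a pre-allocated list and flatten it once at the end. The names `n_values` and `pieces` are illustrative assumptions:

```{r}
set.seed(1)
# Three random sizes between 0 and 100
n_values <- sample(0:100, 3)

# One list slot per iteration; each slot can hold a vector of any length
pieces <- vector("list", length(n_values))
for (i in seq_along(n_values)) {
  pieces[[i]] <- rnorm(n_values[[i]])
}

# Flatten the list into a single numeric vector
output <- unlist(pieces)
length(output) == sum(n_values)   # TRUE
```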

## Down with `for` loops
We don't use `for` loops that often in R because many operations are vectorized: a single function call operates on a whole vector at once. For example, 
```{r}
# sum the numbers 1 to 100
output <- 0
v <- 1:100
for (i in v) {
    output <- output + i
}
```
gives the same result as
```{r}
output = sum(1:100)
```
The latter is both faster and more concise.

### Exercise
Eliminate the for loops in each of the following commands by using functions that work with vectors.

#### Function 1
```{r}
# Function 1
out <- ""
for (x in letters) {
  out <- stringr::str_c(out, x)
}
```

#### Function 2
```{r}
x <- sample(100)
sd <- 0
for (i in seq_along(x)) {
  sd <- sd + (x[i] - mean(x)) ^ 2
}
sd <- sqrt(sd / (length(x) - 1))
```

#### Function 3
```{r}
x <- runif(100)
out <- vector("numeric", length(x))
out[1] <- x[1]
for (i in 2:length(x)) {
  out[i] <- out[i - 1] + x[i]
}
```

### While loops
In some cases you don't even know how long the sequence you are iterating over will be. A `for` loop cannot be used here; instead you must use a `while` loop:
```{r}
while (<condition>) {
    <body>
}
```
The `while` loop re-checks `<condition>` before each iteration and keeps running until it evaluates to `FALSE`.

Here's an example of how we would use a `while` loop. The following command counts the number of heads and tails encountered in tosses of a fair coin until the third head is encountered:
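A minimal sketch of such a simulation is given below. The variable names and the use of `sample()` to toss the coin are assumptions. The number of iterations is not known in advance, because it depends on the random tosses:

```{r}
set.seed(1)
heads <- 0
tails <- 0

# Keep tossing until we have seen three heads in total
while (heads < 3) {
  toss <- sample(c("H", "T"), 1)   # one toss of a fair coin
  if (toss == "H") {
    heads <- heads + 1
  } else {
    tails <- tails + 1
  }
}

c(heads = heads, tails = tails)
```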


`while` loops are used mainly in random simulations. They don't come up a lot in data analysis. Still, it's useful to know about them.