# Lecture 10.3: Functional Programming and Some Exercises


In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## Functional programming
R is a *functional programming language*, which means, loosely, that functions are treated just like any other data. In particular, they can be passed to other functions. As we will see, this means that most `for` loop type iterations can be replaced by cleaner, functional constructs.
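Here is a small base-R sketch that makes "functions are data" concrete: three functions are stored in a named list, and each one is then passed as an argument to another function. The names `summaries` and `apply_to` are hypothetical, chosen only for this illustration.

```{r}
# Functions are ordinary values: store three of them in a named list.
summaries <- list(mean = mean, median = median, max = max)

# Hypothetical helper: takes a function f as an argument and calls it on x.
apply_to <- function(f, x) f(x)

v <- c(1, 5, 2, 8)

# sapply() passes each stored function to apply_to() as its first argument;
# x = v is forwarded as the second argument.
results <- sapply(summaries, apply_to, x = v)
results
```

The result is a named numeric vector with one entry per stored function.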

### Example
In the following series of examples, we'll see how the need to write extensible code leads naturally to ideas from functional programming (FP). Earlier we saw several examples of code that applies the `mean` or `median` function to each column of a tibble:

In [2]:
set.seed(1)
df = tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
print(df)
median(df$a)
median(df$b)
median(df$c)
median(df$d)

# A tibble: 10 x 4
        a       b       c       d
    <dbl>   <dbl>   <dbl>   <dbl>
 1 -0.626  1.51    0.919   1.36  
 2  0.184  0.390   0.782  -0.103 
 3 -0.836 -0.621   0.0746  0.388 
 4  1.60  -2.21   -1.99   -0.0538
 5  0.330  1.12    0.620  -1.38  
 6 -0.820 -0.0449 -0.0561 -0.415 
 7  0.487 -0.016

Since we have already used this code (or a close variant) on several occasions, it makes sense to extract it into a function:

In [4]:
col_median = function(df) {
  output = vector("double", length(df))
  for (i in seq_along(df)) {
    output[i] = median(df[[i]])
  }
  output
}

col_mean = function(df) {
  output = vector("double", length(df))
  for (i in seq_along(df)) {
    output[i] = mean(df[[i]])
  }
  output
}

col_sd = function(df) {
  output = vector("double", length(df))
  for (i in seq_along(df)) {
    output[i] = sd(df[[i]])
  }
  output
}



col_median(df)
col_mean(df)
col_sd(df)

The function `col_mean` could just as easily compute the `median`, the `sd`, or any other single-number summary of each column. Indeed, we would only need to change a single function call in the body of the `for` loop:
```{r}
output[i] = mean(df[[i]])
```
So it makes sense to generalize `col_mean` to a new function that takes as parameters a data frame `df` as well as a function `fun` to apply to each column:

In [8]:
col_summary <- function(df,fun){
    output = vector("double", length(df))
    for(i in seq_along(df)){
        output[i]  = fun(df[[i]])
    }
    output   
}

In [10]:
col_summary(df,median)
col_summary(df,mean)
col_summary(df,sd)
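Because `fun` is just an argument, `col_summary` accepts any function that returns a single number, including an anonymous one written on the spot. The sketch below repeats the definition of `col_summary` so it runs on its own; the data frame `toy` and its values are made up for illustration.

```{r}
# Same definition as col_summary above, repeated so the snippet is self-contained.
col_summary <- function(df, fun) {
  output <- vector("double", length(df))
  for (i in seq_along(df)) {
    output[i] <- fun(df[[i]])
  }
  output
}

# Made-up data frame, for illustration only.
toy <- data.frame(a = c(1, 4, 2), b = c(10, 0, 5))

# An anonymous function computing the width of each column's range.
widths <- col_summary(toy, function(x) max(x) - min(x))
widths
```

A function that returns more than one value (a rescaling function, for instance) does not fit this pattern: each result is stored in a single slot of `output`, so R keeps only the first value and issues a warning.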

Notice how much more elegant and readable `df %>% col_summary(median)` is compared to
```{r}
output <- vector("double", length(df))
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}
output
```
If you understand why the former is preferable, you understand the Zen of Functional Programming!

## `map` functions
The pattern of looping over a sequence, doing something to each element and saving the results turns out to be extremely common in data analysis. It even has a name: "map".

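The `purrr` package, part of the `tidyverse`, provides a family of functions designed to help you map over data as easily as possible. Each takes a vector or list together with a function, applies the function to every element, and returns the results as: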
- `map()` makes a list.
- `map_lgl()` makes a logical vector.
- `map_int()` makes an integer vector.
- `map_dbl()` makes a double vector.
- `map_chr()` makes a character vector.
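To see how the suffix determines the output type, the following sketch applies each variant to the same small list `x`, whose contents are made up for this example.

```{r}
library(purrr)

# Made-up list of three numeric vectors.
x <- list(a = 1:3, b = c(2.5, 3.5), c = c(-1, 0, 1))

lst  <- map(x, length)                          # a list of three integers
ints <- map_int(x, length)                      # named integer vector
dbls <- map_dbl(x, mean)                        # named double vector
lgls <- map_lgl(x, function(v) any(v < 0))      # named logical vector
chrs <- map_chr(x, typeof)                      # named character vector
```

All five calls walk over the same list; only the type of the returned container changes.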

In most cases we will be able to replace `for` loops with calls to these functions, leading to simpler and more readable code.

### Example
How would we write `col_summary` using one of the `map` functions?

In [12]:
map(df,median)
col_summary(df,median)
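`map()` returns a list, whereas `col_summary()` returns a double vector, so `map_dbl()` is the closer match. A minimal sketch of `col_summary` rewritten around it; `col_summary_map` is a hypothetical name and `toy` is made-up data.

```{r}
library(purrr)

# Hypothetical name: col_summary as a one-liner around map_dbl().
# Any extra arguments in ... are forwarded to fun.
col_summary_map <- function(df, fun, ...) {
  map_dbl(df, fun, ...)
}

# Made-up data frame, for illustration only.
toy <- data.frame(a = c(1, 4, 2), b = c(10, 0, 5))

col_summary_map(toy, median)
```

Unlike `col_summary()`, the result carries the column names.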

Compared to `col_summary`, the `map_` functions have a few advantages. First, we can forward additional arguments to the function being called:

In [13]:
map(df,mean,na.rm=TRUE)
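Argument forwarding matters most when the data contain missing values. In this sketch, the list `x` and its `NA` entries are made up for illustration.

```{r}
library(purrr)

# Made-up list containing missing values.
x <- list(a = c(1, NA, 3), b = c(4, 5, NA))

# Without na.rm, mean() returns NA whenever its input contains NA.
means_na <- map_dbl(x, mean)

# The extra argument na.rm = TRUE is forwarded to every call of mean().
means_clean <- map_dbl(x, mean, na.rm = TRUE)
means_clean
```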

Second, the names of the input elements are preserved in the output:

In [14]:
x <- list(a = 1, b = 2, c= c(2,3))

map_dbl(x,mean)

Third, the `map_` functions accept some handy shortcuts in addition to ordinary functions. If you pass a one-sided *formula* such as `~ 1 + .` instead of a function, it is converted into a function in which `.` stands for the current list element. The following three calls are therefore equivalent:

In [18]:
map(x,function(x){1+x})
map(x,function(.) {1+.})
map(x,~1+.)
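The formula shortcut can be inspected directly with `purrr::as_mapper()`, which performs the formula-to-function conversion that `map()` applies to its `.f` argument. Inside such a formula, `.x` is an equivalent name for `.`; the sketch below checks this on the same list `x`.

```{r}
library(purrr)

x <- list(a = 1, b = 2, c = c(2, 3))

# as_mapper() turns the formula into an ordinary function of one argument.
add_one <- as_mapper(~ 1 + .x)
add_one(10)

# Inside a formula, . and .x both refer to the current element.
r_dot  <- map(x, ~ 1 + .)
r_dotx <- map(x, ~ 1 + .x)
identical(r_dot, r_dotx)
```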

If you supply a string instead of a function, the map function extracts the element with that name from each list element:

In [20]:
list(a = list(a=1,b=2), b = list(a=5,b=3), d = list(a=8, b=4)) %>% map("a")

Similarly, an integer extracts the element at that position from each list element:

In [22]:
list(a = list(a=1,b=2), b = list(a=5,b=3), d = list(a=8, b=4)) %>% map(2)
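Both extraction shortcuts also work with the typed variants. With `map_dbl()`, the extracted values come back as a named double vector instead of a list; `records` below is the same nested list as above, given a name.

```{r}
library(purrr)

records <- list(a = list(a = 1, b = 2), b = list(a = 5, b = 3), d = list(a = 8, b = 4))

# Extract by name: the element called "a" inside each record.
by_name <- map_dbl(records, "a")

# Extract by position: the second element of each record.
by_position <- map_dbl(records, 2)
```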

### Exercise
Create a function `str_rev(s)` which, given a string `s`, returns the reversed string. For example, 
```{r}
> str_rev("Hello my name is KM!")
[1] "!MK si eman ym olleH"

```

Hint: use `str_split()` and `str_c()`.

In [42]:
s <- "Hello my name is km!"
tmp <- str_split(s,pattern="")
unlist(tmp)
#rev(tmp)
#unlist(tmp)
#rev(unlist(tmp))
#str_c(rev(unlist(tmp)),collapse="")

In [33]:
str_rev <- function(s){
    
    s %>% str_split(pattern="") %>% unlist %>% rev %>% str_c(collapse="")
    
}
str_rev(s)

In [32]:
str_rev = function(string) {
    tmp <- str_split(string, pattern = "")
    str_c(rev(unlist(tmp)), collapse = "")
}
str_rev(s)
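Both solutions assume `s` is a single string: `unlist()` merges the characters of every element, so a vector of several strings would be reversed into one. A vectorized sketch using `map_chr()`; the name `str_rev_vec` is hypothetical.

```{r}
library(purrr)
library(stringr)

# Hypothetical name: reverses each string of a character vector separately.
str_rev_vec <- function(s) {
  # str_split() returns one character vector per input string;
  # map_chr() reverses and re-joins each one, giving one string per input.
  map_chr(str_split(s, pattern = ""), function(chars) str_c(rev(chars), collapse = ""))
}

str_rev_vec(c("Hello my name is KM!", "abc"))
```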

### Exercise
Create a function `n_entries(v, x)` which, given a vector `v` and a scalar `x`, returns the number of times that `x` occurs in `v`.

In [30]:
number <- c(1,2,3,4,5,1,2,3,4,5,5,4,3)

In [46]:
number==1

In [70]:
n_entries <- function(v, x){
    # one line of code: v == x is a logical vector, and sum() counts the TRUEs
    sum(v == x)
}

n_entries(number,1)

In [45]:
n_entries <- function(v, x) {
    count = 0
    for (i in v) {
        if (i == x) {count = count + 1}
    }
    return(count)
}

n_entries(number,3)
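Combining `n_entries()` with `map_int()` counts several values at once. A sketch using the one-line version of `n_entries()` and the `number` vector from above, both redefined here so the snippet is self-contained.

```{r}
library(purrr)

# One-line version: v == x is a logical vector, and sum() counts the TRUEs.
n_entries <- function(v, x) sum(v == x)

number <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 5, 4, 3)

# Count how often each of the values 1 to 5 occurs in number.
counts <- map_int(1:5, function(val) n_entries(number, val))
counts
```

`sum()` of a logical vector returns an integer, which is why `map_int()` fits here.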

### Exercise
Create a function `standardize(df)` which, given a tibble `df`, returns a new data frame where each *numerical* column is standardized (has mean zero and variance one, accomplished by subtracting the column mean and dividing by its standard deviation). For example:
```{r}
> df <- tibble(x=1:3, y=4:6, z=c(0, 0, pi), a=c("a", "b", "c")) %>% print
# A tibble: 3 x 4
      x     y     z a    
  <int> <int> <dbl> <chr>
1     1     4  0    a    
2     2     5  0    b    
3     3     6  3.14 c  

> standardize(df)
# A tibble: 3 x 4
      x     y      z a    
  <dbl> <dbl>  <dbl> <chr>
1    -1    -1 -0.577 a    
2     0     0 -0.577 b    
3     1     1  1.15  c    
```

In [68]:
standardize2 <- function(df){
    for (i in seq_along(df)) {
        # only transform numeric columns; leave the others (e.g. character) unchanged
        if (is.numeric(df[[i]])) {
            df[[i]] <- (df[[i]] - mean(df[[i]])) / sd(df[[i]])
        }
    }
    return(df)
}

In [69]:
standardize2(df)

      x     y          z a
  <dbl> <dbl>      <dbl> <chr>
1    -1    -1 -0.5773503 a
2     0     0 -0.5773503 b
3     1     1  1.1547005 c


In [58]:
is.numeric("a")

In [51]:
standardize <- function(df){
    
    df %>% mutate_if(is.numeric, ~ (. - mean(.))/sd(.))
    
}
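`mutate_if()` still works, but since dplyr 1.0.0 the scoped `_if` variants are superseded by `across()` combined with the `where()` selection helper. An equivalent sketch in that style; the name `standardize_across` is hypothetical.

```{r}
library(dplyr)

# Hypothetical name: standardize every numeric column, leave the others unchanged.
standardize_across <- function(df) {
  df %>% mutate(across(where(is.numeric), ~ (.x - mean(.x)) / sd(.x)))
}

df <- tibble(x = 1:3, y = 4:6, z = c(0, 0, pi), a = c("a", "b", "c"))
out <- standardize_across(df)
out
```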

In [53]:
df <- tibble(x=1:3, y=4:6, z=c(0, 0, pi), a=c("a", "b", "c")) %>% print
standardize(df)

# A tibble: 3 x 4
      x     y     z a    
  <int> <int> <dbl> <chr>
1     1     4  0    a    
2     2     5  0    b    
3     3     6  3.14 c    


      x     y          z a
  <dbl> <dbl>      <dbl> <chr>
1    -1    -1 -0.5773503 a
2     0     0 -0.5773503 b
3     1     1  1.1547005 c
