# Lecture 10.1: Writing Functions in R
<div style="border: 1px double black; padding: 10px; margin: 10px">

**After today's lecture you will understand:**
* how to write functions in R
</div>

This corresponds to Chapter 19.1--19.6 of your book.



    




In [19]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## Functions

Often when programming we find ourselves repeating the same block of code with minor modifications. 

We start with a preliminary exercise in which we standardize a vector to have mean zero and standard deviation one. We can brute-force our way through this by first centering the data and then dividing by its standard deviation.

In [3]:
x <- c(1,5,-11,20)
print(x)

[1]   1   5 -11  20
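
The brute-force calculation described above might look like the following sketch. The intermediate names `x_centered` and `x_standardized` are chosen here for illustration and are not from the lecture.

```{r}
x <- c(1, 5, -11, 20)

# Step 1: center the data by subtracting the mean
x_centered <- x - mean(x)

# Step 2: scale by the (sample) standard deviation
x_standardized <- x_centered / sd(x_centered)

mean(x_standardized)  # zero, up to floating-point rounding
sd(x_standardized)    # one, up to floating-point rounding
```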


Now let's say you have to perform this task again for another vector. You could simply repeat the above calculations.

Or, we could write a function in R to help us achieve what we want! 
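
As a minimal sketch of that idea, the two steps can be wrapped in a function. The name `standardize` is an assumption made for this example.

```{r}
# Standardize a numeric vector to mean zero and standard deviation one.
# `standardize` is a name chosen for this sketch.
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

standardize(c(1, 5, -11, 20))
standardize(c(2, 4, 6))  # -1 0 1: reused on another vector with no retyping
```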

Now let us try to create another function with a different purpose based on what we have learnt so far. Suppose we want to normalize each column of this tibble to be in $[0,1]$:

In [22]:
set.seed(1)
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
) %>% print

# A tibble: 10 x 4
        a       b       c       d
    <dbl>   <dbl>   <dbl>   <dbl>
 1 -0.626  1.51    0.919   1.36
 2  0.184  0.390   0.782  -0.103
 3 -0.836 -0.621   0.0746  0.388
 4  1.60  -2.21   -1.99   -0.0538
 5  0.330  1.12    0.620  -1.38
 6 -0.820 -0.0449 -0.0561 -0.415
 7  0.487 -0.016

To normalize we will need to subtract the minimum from each column and divide by its range:
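
A sketch of the copy-and-paste approach, using the same formula for each of the four columns (the tibble is recreated here so the snippet runs on its own):

```{r}
library(tidyverse)

set.seed(1)
df <- tibble(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))

# Rescale each column to [0, 1]: subtract the minimum, divide by the range.
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
  (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
```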

This required a bunch of repetitive typing.  In situations like this we should again write a function to help us achieve what we want to do!

## Anatomy of a function
To write a function we should first think about its inputs and output. A function takes one or more inputs, does something to them, and then returns an output.

What are the input(s) and output of our normalize function?
```{r}
df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
```
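
One way to answer: the input is a single numeric vector, and the output is that vector rescaled to lie in $[0,1]$. A sketch of such a function follows. The name `normalize` comes from the text, but the exact body used in lecture is not shown here, so treat this definition as an assumption.

```{r}
# Input: a numeric vector x. Output: x rescaled to lie in [0, 1].
normalize <- function(x) {
  (x - min(x, na.rm = TRUE)) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

normalize(c(0, 5, 10))  # 0.0 0.5 1.0
```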

Notice how we have taken our code and converted every instance of `df$a` to `x`, which is the name that we have assigned to our function argument.

Now that we have defined our function, we can replace our code with a nicer looking version:

In [29]:
set.seed(1)
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
) %>% print

# A tibble: 10 x 4
        a       b       c       d
    <dbl>   <dbl>   <dbl>   <dbl>
 1 -0.626  1.51    0.919   1.36
 2  0.184  0.390   0.782  -0.103
 3 -0.836 -0.621   0.0746  0.388
 4  1.60  -2.21   -1.99   -0.0538
 5  0.330  1.12    0.620  -1.38
 6 -0.820 -0.0449 -0.0561 -0.415
 7  0.487 -0.016

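With a `normalize()` function in hand, the four column updates shrink to one short line each. The function is redefined below so the snippet runs on its own, and its body is a sketch rather than the lecture's confirmed definition.

```{r}
library(tidyverse)

# Sketch of the normalize function: rescale a numeric vector to [0, 1].
normalize <- function(x) {
  (x - min(x, na.rm = TRUE)) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(1)
df <- tibble(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))

# One function call per column replaces the long copied formula.
df$a <- normalize(df$a)
df$b <- normalize(df$b)
df$c <- normalize(df$c)
df$d <- normalize(df$d)
```
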
This is considerably simpler, but still has some repetition. Soon we will learn about iteration and ways to cut down further on repetition.

## Conditional execution
Often when writing functions we need to do different things depending on what data is passed in. This is known as *conditional execution*, and is accomplished using the `if/else` construct:
```{r}
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
```
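
As a small illustration of the template above, here is a sketch of a function that branches on its input. The helper name `describe_sign` is hypothetical.

```{r}
# Hypothetical helper: describe the sign of a single number.
describe_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else {
    "non-negative"
  }
}

describe_sign(-3)  # "negative"
describe_sign(2)   # "non-negative"
```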

### Exercise
Write a function that takes a number x as input and outputs x/2 when x is divisible by 2, and 3*x+1 otherwise.

### Exercise
Write a function `fizzbuzz(x)` that prints "fizz" if x is at most three, and "buzz" otherwise.
```{r}
> fizzbuzz(3)
[1] "fizz"
> fizzbuzz(4)
[1] "buzz"
```

`if/else` and `ifelse()` are very different. `ifelse()` is a *function* that takes three vector arguments and returns a new vector. `if/else` tells R to conditionally execute code.
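
The contrast can be seen in a short sketch. The vector `x` is made up for this example.

```{r}
x <- c(-2, 0, 3)

# ifelse() is vectorized: it tests every element and returns a vector.
labels <- ifelse(x < 0, "negative", "non-negative")
labels  # "negative" "non-negative" "non-negative"

# if/else chooses which block of code to run, based on a single TRUE/FALSE.
if (length(x) > 2) {
  message("x has more than two elements")
} else {
  message("x is short")
}
```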

### Conditions
The `condition` part of the `if` statement must evaluate to a single `TRUE` or `FALSE`. If it evaluates to a vector of length greater than one, you will get a warning (or, from R 4.2.0 onward, an error):
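
Because the behavior differs between R versions, the sketch below wraps the offending `if` in `tryCatch()` so that it runs whether R signals a warning or an error. The variable name `result` is illustrative.

```{r}
# A condition of length two is not a single TRUE/FALSE.
result <- tryCatch(
  {
    if (c(TRUE, FALSE)) "ran" else "skipped"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error = function(e) paste("error:", conditionMessage(e))
)

print(result)
```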

(Why?) Similarly, a condition of `NA` will generate an error:
```{r}
> if (NA) { 1 }
Error in if (NA) {: missing value where TRUE/FALSE needed
```

#### Logical operators
Often you will need to combine multiple logical conditions in an `if` statement. To do this we have the `&&` and `||` operators, which take the logical `and` and `or`, respectively, of several logical conditions:
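
A brief sketch of both operators inside `if` statements, with a made-up value for `x`:

```{r}
x <- 5

# && is TRUE only when both conditions are TRUE.
if (x > 0 && x < 10) {
  print("x is between 0 and 10")
}

# || is TRUE when at least one condition is TRUE.
if (x < 0 || x > 3) {
  print("x is negative or greater than 3")
}

# Both operators short-circuit: the right-hand side is skipped
# once the left-hand side settles the answer.
FALSE && stop("never evaluated")  # FALSE
TRUE || stop("never evaluated")   # TRUE
```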

There is a subtle but important difference between the single and double versions of these operators. The single `&` performs entrywise `AND` over logical vectors:
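
A sketch with two made-up logical vectors `a` and `b`:

```{r}
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b  # TRUE FALSE FALSE: element-by-element AND
a | b  # TRUE TRUE TRUE: element-by-element OR

# && and || work on single TRUE/FALSE values, which is what if() needs.
a[1] && b[1]  # TRUE
```

Passing longer vectors to `&&` or `||` is best avoided: older versions of R silently used only the first element of each side, while R 4.3.0 and later raise an error.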

#### Testing for equality
Be careful when testing for equality in conditionals. The `==` operator will return a *vector* of logicals. If you want to make sure that any/all entries of a vector are `TRUE`, use the `any()` or `all()` functions:
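
A minimal sketch, with vectors `x` and `y` invented for the example:

```{r}
x <- c(1, 2, 3)
y <- c(1, 2, 4)

x == y       # TRUE TRUE FALSE: one logical per element
all(x == y)  # FALSE: not every entry matches
any(x == y)  # TRUE: at least one entry matches

# Collapse to a single TRUE/FALSE before using the comparison in if().
if (all(x == y)) {
  print("x and y agree everywhere")
} else {
  print("x and y differ somewhere")
}
```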

### Return statement