# **Lab 10: Functions, vectors and lists**

In [1]:
library(tidyverse)

-- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.3.1 --

v ggplot2 3.3.5     v purrr   0.3.4
v tibble  3.1.4     v dplyr   1.0.7
v tidyr   1.1.3     v stringr 1.4.0
v readr   2.0.1     v forcats 0.5.1

-- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()



## Anatomy of a function
To write a function, first think about its inputs and output. A function takes one or more inputs, does something to them, and then returns an output.

What are the input(s) and output of our normalize function?

In [None]:
df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

In [5]:
rescale01 <- function(x) {
#  ^ function name   ^ function argument (input vector)
    (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
#   ^ function output
}
x = c(1:10, Inf)
rescale01(x)


We have turned up a bug in our function! Because `max(x)` is `Inf`, every finite value is rescaled to 0 and the infinite entry becomes `NaN`. But since the code now all lives in one place, we can fix the function once rather than having to chase down the bug every place that we copied and pasted the code.

In [6]:
rescale01 = function(x) {
  rng = range(x, na.rm = TRUE, finite = TRUE) # https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/range
  (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(x)

In [9]:
x = c(1:10, Inf)
(rng = range(x, na.rm = TRUE, finite = TRUE)) # https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/range
(rng[2] - rng[1])
(x - rng[1]) / (rng[2] - rng[1])
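
Because the logic now lives in one function, the copy-pasted line for `df$a` can be replaced by one call per column. A minimal sketch, assuming a small made-up data frame `df` (its column names and values are hypothetical):

```r
# rescale01 as defined above, repeated so this block runs on its own
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical data, for illustration only
df <- data.frame(a = c(1, 5, 10), b = c(2, NA, 4))

# One call per column instead of one copy-pasted formula per column
df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df
```

Missing values stay missing, and every other value lands in the interval from 0 to 1.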

## Conditions

The condition part of the if statement must evaluate to a single TRUE or FALSE. If it is a longer vector, you will get a warning (in R 4.2.0 and later, this is an error):

In [10]:
if (c(T, F)) { 1 }

"the condition has length > 1 and only the first element will be used"


Similarly, a condition of NA will generate an error:

In [11]:
if (NA) { 1 }

ERROR: Error in if (NA) {: missing value where TRUE/FALSE needed
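
One way to protect a condition that might be missing is to wrap it in `isTRUE()`, which returns `TRUE` only for a single, non-missing `TRUE`. A small sketch; the helper name `is_positive` is made up for illustration:

```r
# Hypothetical helper: TRUE only when `x > 0` is a single, non-missing TRUE
is_positive <- function(x) {
  isTRUE(x > 0)
}

is_positive(3)    # TRUE
is_positive(-1)   # FALSE
is_positive(NA)   # FALSE, so it is safe to use inside if ()

if (is_positive(NA)) { 1 } else { 0 }
```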


### Logical operators

Often you will need to combine multiple logical conditions in an if statement. To do this we have the `&&` and `||` operators, which compute the logical AND and OR, respectively, of several logical conditions:

In [12]:
TRUE && FALSE && TRUE

In [7]:
FALSE || TRUE || FALSE

There is a subtle but important difference between the single and double versions of these operators. The single `&` performs entrywise AND over logical vectors:

In [8]:
c(T, T, F) & c(F, T, F)

In contrast, the longer form (i.e., `&&`) evaluates left to right, examining only the first element of each vector. (From R 4.3.0 onward, passing a vector longer than one to `&&` or `||` is an error.)

In [21]:
c(T, T, T) && c(F, T, F)

The double forms also "short-circuit":

* R can stop evaluating as soon as it hits one false value, since this will cause the `&&` to return false;
* R can also stop evaluating as soon as it hits one true value, since this will cause the `||` to return true.

In [11]:
f = function() { print("f called"); F }
g = function() { print("g called"); T }
f() && g()

g() && f()

[1] "f called"


[1] "g called"
[1] "f called"


The or operator works similarly:

In [12]:
g() || f()

f() || g()

[1] "g called"


[1] "f called"
[1] "g called"
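
Short-circuiting is what makes guard conditions safe: a cheap check on the left can protect an expression on the right that would otherwise produce `NA`. A sketch, with a made-up function name `first_is_positive`:

```r
# Hypothetical helper, for illustration only
first_is_positive <- function(x) {
  # Each check runs only if everything to its left was TRUE,
  # so empty or missing input yields FALSE rather than NA.
  length(x) > 0 && !is.na(x[1]) && x[1] > 0
}

first_is_positive(c(2, -1))    # TRUE
first_is_positive(numeric(0))  # FALSE
first_is_positive(NA)          # FALSE
```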


### Testing for equality

Be careful when testing for equality in conditionals. The == operator will return a vector of logicals. If you want to make sure that any/all entries of a vector are TRUE, use the any() or all() functions:

In [23]:
v1 = c(1, 2, 3)
v2 = c(1, 1, 2)
if (v1 == v2) { print("Wrong!") }
if (all(v1 == v2)) { print("All!") }
if (any(v1 == v2)) { print("Any!") }

"the condition has length > 1 and only the first element will be used"


[1] "Wrong!"
[1] "Any!"


Also be wary of testing floating point numbers for equality:

In [14]:
2 == sqrt(2) ^ 2

In [33]:
sqrt(2) ^ 2 - 2

In [15]:
sqrt(2) ^ 2

If you need to do this, use the `near()` function instead:

In [36]:
near(2, sqrt(2) ^ 2) # https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/near
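
Comparing within a tolerance can also be sketched in base R. The helper `almost_equal` below is hypothetical, written for illustration rather than taken from dplyr, and its default tolerance (the square root of machine epsilon) is an assumption:

```r
# Hypothetical helper: TRUE where x and y differ by less than tol
almost_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

almost_equal(2, sqrt(2)^2)        # TRUE
almost_equal(2, 2.1)              # FALSE

# Base R's all.equal() returns TRUE or a description of the difference,
# so wrap it in isTRUE() before using it in a condition.
isTRUE(all.equal(2, sqrt(2)^2))   # TRUE
```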

### Multiple conditions

Sometimes you will want to check multiple conditions using an if statement. For example, let's define the function:
$$
sign(x)=\begin{cases}
-1, & x<0\\
0, & x=0\\
1, & x>0
\end{cases}$$

The general form is

```
if (this) {
   do that
} else if (that) {
   do something else
} else {
   do a default action
}
```
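
Applied to the sign function, the pattern might look like the sketch below. The name `my_sign` is chosen here only to avoid masking base R's built-in `sign()`, and the function assumes `x` is a single non-missing number:

```r
# Hypothetical name; assumes x is a single, non-missing number
my_sign <- function(x) {
  if (x < 0) {
    -1
  } else if (x == 0) {
    0
  } else {
    1
  }
}

my_sign(-3)  # -1
my_sign(0)   # 0
my_sign(7)   # 1
```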

## Function arguments

Functions can take multiple arguments. Generally they fall into one of two categories:

*   Data to be processed by the function, and
*   Options, which affect how the data gets processed.


### Rules for function arguments

Generally:

*   The data parameters should come first; and
*   The options should come second, and have sensible defaults.

Default parameter values are specified by the option=default notation:

In [37]:
mean_ci <- function(x, conf = 0.95) {
  se <- sd(x) / sqrt(length(x))
  alpha <- 1 - conf
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}


When you call a function, you can omit any arguments that have default values. When overriding a default, name the argument you are overriding and give the new value, with an = in between:



In [41]:
mean_ci(c(1, 2, 3, 4))

In [43]:
mean_ci(c(1, 2, 3, 4), conf = .99) # yes: the option is named
mean_ci(c(1, 2, 3, 4), .99)        # no: this works, but an unnamed option is harder to read

## Validation

When writing functions it's a good idea to validate the input -- that is, make sure it matches your assumptions about what is being passed to the function. Consider the following function which returns the weighted average of a vector:

In [44]:
w_mean = function(x, w) {
    sum(x * w) / sum(w)
}

This function relies implicitly on the fact that the weight vector `w` is the same length as the input vector `x`. If it's not, R recycles the shorter vector, which gives unexpected results, and it warns only when the longer length is not a multiple of the shorter one.

In [45]:
w_mean(c(1,2,3), w=c(1, 2))

"longer object length is not a multiple of shorter object length"


In [49]:
(c(1,2,3) * c(1, 2))/ 3

"longer object length is not a multiple of shorter object length"



It's best to make the assumption of equal length explicit by checking it:

In [51]:
w_mean = function(x, w) {
    stopifnot(length(w) == length(x)) # https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/stopifnot
    sum(x * w) / sum(w)
}
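
To see the check in action, the failing call can be wrapped in `tryCatch()` so that the error message is captured instead of stopping the session. A sketch, assuming the weighted mean is computed as `sum(x * w) / sum(w)`:

```r
w_mean <- function(x, w) {
  stopifnot(length(w) == length(x))
  sum(x * w) / sum(w)
}

# Matching lengths: (1*1 + 2*1 + 3*2) / 4
w_mean(c(1, 2, 3), w = c(1, 1, 2))   # 2.25

# Mismatched lengths now fail immediately; capture the message to inspect it
msg <- tryCatch(
  w_mean(c(1, 2, 3), w = c(1, 2)),
  error = function(e) conditionMessage(e)
)
msg
```

Failing early with a message that names the violated assumption is usually easier to debug than a recycling warning that surfaces later.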

## ...

Some functions are designed to take a variable number of inputs. We saw this for example with the str_c function:

In [55]:
stringr::str_c("a", "b")
stringr::str_c("a", "b", "c", "d")

To construct a function that takes a variable number of arguments we use the `...` notation:

```
f = function(...) {
    <do something with variable arguments>
}

```
One thing you can do with the ... is pass it to another function:

In [56]:
commas <- function(...) stringr::str_c(..., collapse = ", ")
commas(letters[1:10])
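
Another option is to capture the arguments with `list(...)` and work with them directly. A base R sketch; the function name `describe_args` is made up for illustration:

```r
# Hypothetical helper: report how many arguments were passed and what they were
describe_args <- function(...) {
  args <- list(...)   # capture the variable arguments as a list
  paste0(length(args), " argument(s): ", paste(unlist(args), collapse = ", "))
}

describe_args("a", "b")          # "2 argument(s): a, b"
describe_args("a", "b", "c", 4)  # "4 argument(s): a, b, c, 4"
```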