# Lab 12 (4/6): Functions

### Web pages
Course page: https://ambujtewari.github.io/teaching/STATS306-Winter2020/

Lab page: https://rogerfan.github.io/stats306_w20/

### Office Hours
    Mondays: 2-4pm, USB 2165
    
### Contact
    Questions on problems: Use the slack discussions
    If you need to email me, include in the subject line: [STATS 306]
    Email: rogerfan@umich.edu

In [None]:
library(tidyverse)
library(nycflights13)

In this lecture we will learn about functions. We already have plenty of experience calling functions in order to mutate, plot, and explore data. Now we will learn how and when to write our own functions.

## Functions

Recall that we can write functions when we find ourselves repeating the same operations multiple times.

In [None]:
df = tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

print(df)

In [None]:
df2 = df %>% 
    mutate(a = (a-min(a))/(max(a)-min(a)),
           b = (b-min(b))/(max(b)-min(b)),
           c = (c-min(c))/(max(c)-min(c)),
           d = (d-min(d))/(max(d)-min(d)))
 
print(df2)

In [None]:
rescale01 = function(x) {
    (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

df2 = df %>%
    mutate(a=rescale01(a), b=rescale01(b), c=rescale01(c), d=rescale01(d))

print(df2)

### Function definition syntax
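As a minimal sketch of the general pattern, a function is created with the `function` keyword and assigned to a name. The name `add_scaled` and its arguments are made up for illustration only:

```{r}
# General shape: name = function(arguments) { body }
add_scaled = function(x, y, scale = 1) {
    # the value of the last expression is what the function returns
    (x + y) * scale
}

add_scaled(1, 2)              # 3
add_scaled(1, 2, scale = 10)  # 30
```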

## Conditional execution
Often when writing functions we need to do different things depending on what data is passed in. This is known as *conditional execution*, and is accomplished using the `if/else` construct:
```{r}
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
```

### Example
The *Heaviside step function* is defined as
$$H(x)=\begin{cases}0,&x\le 0\\
1,&\text{otherwise}
\end{cases}.$$
How can we code this as an R function?

In [None]:
H = function(x) {
    if (x <= 0) {
        return(0)
    } else {
        return(1)
    }
}

H(-1)
H(0)
H(1)

H(c(-1, 0, 1))  # `if` needs a single TRUE/FALSE: an error in R >= 4.2.0, a warning in earlier versions

In [None]:
H2 = function(x) {
    ifelse(x <= 0, 0, 1)
}

H2(c(-1, 0, 1))
H2(0)
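
The two versions differ in how they treat vectors: `if` looks at a single logical value, while `ifelse` works element by element. As a rough sketch (redefining `H` and `H2` compactly so the block runs on its own), one way to reuse the scalar version on a vector is to apply it one element at a time with `sapply`:

```{r}
H = function(x) {
    if (x <= 0) 0 else 1    # `if` expects a single TRUE/FALSE
}
H2 = function(x) {
    ifelse(x <= 0, 0, 1)    # evaluated elementwise
}

xs = c(-1, 0, 1)
sapply(xs, H)   # 0 0 1, calling H once per element
H2(xs)          # 0 0 1, in a single vectorized call
```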

### Multiple conditions
Sometimes you will want to check multiple conditions using an `if` statement. For example, let's define the function $$\operatorname{sgn}(x) = \begin{cases}-1,&x<0\\0,&x=0\\1,&x>0.\end{cases}$$

In [None]:
sgn = function(x) {
    if (x < 0) {
        return(-1)
    } else if (x == 0) {
        return(0)
    } else {
        return(1)
    }
}

The general form is
```{r}
if (condition1) {
  # do something
} else if (condition2) {
  # do something else
} else {
  # do another thing
}
```
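
The `if` / `else if` chain handles one value at a time. A possible vectorized counterpart, sketched here with nested `ifelse` calls (the name `sgn_vec` is made up for this example), can be checked against base R's built-in `sign()`:

```{r}
sgn_vec = function(x) {
    # first test x < 0; for the remaining elements test x == 0
    ifelse(x < 0, -1, ifelse(x == 0, 0, 1))
}

xs = c(-2.5, 0, 3)
sgn_vec(xs)   # -1 0 1
sign(xs)      # -1 0 1
```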

### Brackets
Both `function` and `if` are usually written with the curly brace delimiters `{` and `}`. For one-line statements, the braces are optional:

In [None]:
if (TRUE) { 
    print("A1") 
} else { 
    print("B") 
}

if (TRUE) print("A2") else print("B")

In [None]:
square = function(x) {
    return(x*x)
}

square = function(x) x*x

You should almost always use the curly braces. One exception is for very brief, unnamed functions. We'll see some examples of this next week when we study map/reduce computations.
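
As a small illustration of a brief, unnamed function, the sketch below passes one directly to base R's `sapply()`. The variable name `squares` is chosen just for this example:

```{r}
# an unnamed one-line function, written without braces
squares = sapply(1:3, function(x) x*x)
squares   # 1 4 9
```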

## Function arguments
Functions can take multiple arguments. Generally they fall into one of two categories:
* *Data* to be processed by the function, and
* *Options*, which affect how the data gets processed.

```{r}
mean(x, na.rm=TRUE)
log(x, base=y)
str_c(..., sep=" ")
```
What is/are the data? What are the options?

### Rules for function arguments
Generally:
1. The *data* parameters should come first; and
2. The *options* should come second, and have sensible defaults.

Default parameter values are specified by the `option=default` notation:

In [None]:
mean_ci = function(x, conf=0.95) {
    se = sd(x) / sqrt(length(x))
    alpha = 1 - conf
    mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}

When you call a function, you can omit any argument that has a default value. To override a default, name the parameter and supply the new value after an `=`:
```{r}
mean_ci(c(1, 2, 3, 4))              # uses default value
mean_ci(c(1, 2, 3, 4), conf = .99)  # ok
mean_ci(c(1, 2, 3, 4), conf=.99)    # ok
mean_ci(c(1, 2, 3, 4), .99)         # dangerous
```

### Validation
When writing functions it's a good idea to *validate* the input -- that is, make sure it matches your assumptions about what is being passed to the function. Consider the following function which returns the weighted average of a vector:

In [None]:
w_mean = function(x, w) {
    sum(x * w) / sum(w)
}

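This function relies implicitly on the fact that the weight vector `w` is the same length as the input vector `x`. If it's not, R recycles the shorter vector to match the longer one, so you get a misleading result (and a warning only when the longer length is not a multiple of the shorter one).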

In [None]:
w_mean(c(1,2,3), w=c(1, 2))
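
To see where the number comes from, the sketch below redoes the computation step by step. It assumes the unvalidated definition `sum(x * w) / sum(w)` and wraps the product in `suppressWarnings()` only to keep the output clean:

```{r}
x = c(1, 2, 3)
w = c(1, 2)

# w is recycled to c(1, 2, 1) to match the length of x
products = suppressWarnings(x * w)
products                 # 1 4 3

# sum(w) still uses the original two weights, so the denominator is 3
sum(products) / sum(w)   # 8 / 3
```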

It's best to make the assumption of equal length explicit by checking it:

In [None]:
w_mean = function(x, w) {
    stopifnot(length(w) == length(x))
    sum(x * w) / sum(w)
}

Now:

In [None]:
w_mean(c(1,2,3), c(1, 2))

In [None]:
w_mean = function(x, w) {
    if (length(w) != length(x)) {
        stop('length of data and weight vector do not match')
    }
    sum(x * w) / sum(w)
}

w_mean(c(1,2,3), c(1, 2))
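
If you want to inspect the error rather than halt, one option is base R's `tryCatch()`. This is only a sketch: it repeats the `stop()`-based definition so the block is self-contained, and the variable name `msg` is made up for the example.

```{r}
w_mean = function(x, w) {
    if (length(w) != length(x)) {
        stop('length of data and weight vector do not match')
    }
    sum(x * w) / sum(w)
}

# capture the error message instead of stopping execution
msg = tryCatch(
    w_mean(c(1, 2, 3), c(1, 2)),
    error = function(e) conditionMessage(e)
)
msg

w_mean(c(1, 2, 3), c(1, 1, 2))   # (1 + 2 + 6) / 4 = 2.25
```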

A comment stating what the function expects and returns is another good way to help callers avoid misusing it:

In [None]:
w_mean = function(x, w) {
    # Return the average of `x` weighted by weight vector `w`
    stopifnot(length(w) == length(x))
    sum(x * w) / sum(w)
}

### Dot-dot-dot (`...`)
Some functions are designed to take a variable number of inputs. We saw this for example with the `str_c` function:

In [None]:
# stringr::str_c("a", "b", "c")

str_c('a', 'b', 'c', sep=' ')

To construct a function that takes a variable number of arguments we use the `...` notation:
```{r}
f = function(...) {
    # do something with the variable arguments
}
```

One thing you can do with the `...` is pass it to another function that takes a variable number of inputs.

In [None]:
commas <- function(...) {
    stringr::str_c(..., collapse = ", ")
}
commas(letters[1:10])

You can also access individual arguments in `...` using the `list(...)` notation. We'll learn more about lists in the next lecture.

In [None]:
test = function(...) {
    print(list(...))
}

test(3, 5, a=4, b=6)
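
Once the arguments are captured with `list(...)`, they can be counted, named, and indexed like any other list. The helper `describe_args` below is a hypothetical name used only for this sketch:

```{r}
describe_args = function(...) {
    args = list(...)   # capture all arguments as a list
    list(
        n = length(args),      # how many arguments were passed
        names = names(args),   # "" for arguments passed without a name
        first = args[[1]]      # the first argument
    )
}

res = describe_args(3, 5, a = 4, b = 6)
res$n       # 4
res$names   # "" "" "a" "b"
res$first   # 3
```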

## Return values
Thus far we have relied on the default behavior of R, which is to return the value of the last expression evaluated in the function:

In [None]:
f = function() {
    1
    2
    3  # this will be returned
}
f()

In [None]:
fact = function(n) {
    
    if (n <= 1) {  # base case: 0! and 1! are both 1
        return(1)
    }
    
    return(n * fact(n-1)) 
}

fact(5)

In more complicated functions you'll need to manually return values using the `return()` function. As an example:
```{r}
complicated_function <- function(x, y, z) {
  if (length(x) == 0 || length(y) == 0) {
      return(0)  # this immediately returns and halts the function.
  }
  # Complicated code here
  return(object)
}
```
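
A concrete, runnable version of the early-return pattern might look like the sketch below. The function `safe_mean` is invented for illustration and is not part of any package:

```{r}
safe_mean = function(x) {
    if (length(x) == 0) {
        return(NA)   # exit early: nothing to average
    }
    # only reached for non-empty input
    sum(x) / length(x)
}

safe_mean(numeric(0))   # NA
safe_mean(c(2, 4))      # 3
```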

### Pipeable functions
We've seen a lot of uses of the pipe operator `%>%`. As you become more advanced, you may find it useful to create your own functions which can be used in data pipelines. 

#### Transformations
For pipeable functions that transform a data frame, simply return the altered version of the data frame. For example:

In [None]:
first_row <- function(df, n=1) {
    df %>% slice(n)
}

mydf = tibble(x=c(1,2,3), y=c("a","b","c"))
print(mydf)

mydf %>% first_row()
mydf %>% first_row(3)

#### Side effects
Some functions have *side effects* but don't modify the original data frame. For example, consider the following function which counts how many missing values are present in a data frame:

In [None]:
show_missings = function(df) {
  n <- sum(is.na(df))
  cat("Missing values: ", n, "\n", sep = "")
  df  # note return value
}

This function works, but when called at the console it has the undesirable effect of printing the whole data frame, because R auto-prints the returned value:

In [None]:
show_missings(mpg)

To correct this, base R provides the `invisible()` function, which returns its argument without auto-printing it:

In [None]:
show_missings = function(df) {
  n <- sum(is.na(df))
  cat("Missing values: ", n, "\n", sep = "")
  invisible(df)  # df is still returned, but it is not auto-printed
}

Now we can run the command interactively, and also use it in pipelines:

In [None]:
show_missings(flights)

In [None]:
flights %>% filter(month < 5) %>% show_missings
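
To check that the data frame is still returned even though nothing is auto-printed, base R's `withVisible()` reports both the value and its visibility flag. The sketch below repeats `show_missings` and uses a small made-up data frame, `small`, so that it runs on its own:

```{r}
show_missings = function(df) {
    n = sum(is.na(df))
    cat("Missing values: ", n, "\n", sep = "")
    invisible(df)
}

small = data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

res = withVisible(show_missings(small))
res$visible                   # FALSE: the result would not be auto-printed
identical(res$value, small)   # TRUE: the data frame is passed through unchanged
```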

## Environments
The environment is, roughly, the set of variables and data defined in your R session. The default environment is called the "global environment":

In [None]:
environment()

A function depends on the environment in which it was defined. In particular, if you reference a variable inside a function that is not *defined* in that function, R will look for it in the enclosing environment, that is, the environment where the function was created.

In [None]:
f = function(x) {
    x + y
}

x = c(1, 2, 3)
y = 3
f(x)

In [None]:
y = 4
f(x)

In [None]:
y = 4
x = 2
f(c(1, 2, 3))
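
The enclosing environment does not have to be the global one. In the sketch below (the names `make_adder` and `add2` are made up for illustration), the inner function is defined inside another function, so it finds `y` there rather than in the global environment:

```{r}
make_adder = function(y) {
    # the inner function is defined here, so it looks up `y` in this call
    function(x) x + y
}

add2 = make_adder(2)

y = 100    # a global `y` has no effect on add2
add2(1)    # 3
```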

In [None]:
f = function(x) {
    x = 2*x
    print(x)
}

x = 5
f(x)
x  # x is unchanged in the global environment even though we changed it inside the function

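You can assign values to variables in the enclosing environment using the `<<-` operator. R searches outward through the enclosing environments for an existing variable with that name and modifies it; if none is found, the variable is created in the global environment.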

In [None]:
f = function(x) {
    x = 2*x
    x <<- x
    
    print(x)
}

x = 5
f(x)
x  # x is now changed in the global environment since we used <<-

In [None]:
f = function(x) {
    y <<- x
    print(y)
}

y = 10
f(5)
y

### Challenge problem
Define a function `drop_even()` which drops all the even-numbered rows from a data frame:
```{r}
> tibble(x=c(1,2,3), y=c("a","b","c")) %>% drop_even
# A tibble: 2 x 2
      x y    
  <dbl> <chr>
1     1 a    
2     3 c    
```
Hint: Remember the modulo operator `%%` is very useful for checking for divisibility. Also you can use the function `row_number()` inside of a mutate to easily access the row numbers.
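
As a quick reminder of how `%%` behaves on a plain vector (this only illustrates the operator, not the data frame step; the variable `v` is made up for the example):

```{r}
v = 1:6
v %% 2        # 1 0 1 0 1 0
v %% 2 == 0   # FALSE TRUE FALSE TRUE FALSE TRUE
```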

In [None]:
drop_even = ...

tibble(x=c(1,2,3), y=c("a","b","c")) %>% drop_even

### Challenge problem
Write a function `howmany()` which prints the number of times that it has been called:
```
> howmany()
[1] 1
> howmany()
[1] 2
> howmany()
[1] 3
```
Hint: You can use `exists('counter')` to check if the variable `counter` exists. Remember that if R can't find a variable in the current environment, it will also check the enclosing environment.


In [None]:
howmany = ...

howmany()
howmany()
howmany()


Harder: write a function `same(x)` which prints "yes!" if `x` is the same value that was passed into `same()` on the previous call, and "no!" otherwise. (`same()` always prints "no!" to start):
```{r}
> same(1)
[1] "no!"
> same(1)
[1] "yes!"
> same(1)
[1] "yes!"
> same("hello")
[1] "no!"
```

In [None]:
same = ...

same(1)
same(1)
same(1)
same('hello')