<a class="anchor" id="jump_to_top"></a>
# Functions and Conditionals
---

### Table of Contents
* [Functions](#Function)
* [Conditionals](#Conditionals)
* [cut()](#cut)
* [stop()](#stop)
* [return()](#return)
* [ifelse()](#ifelse)

In [1]:
# Loading packages
library(tidyverse)
library(lubridate)  # lubridate is not part of core tidyverse, so has to be loaded separately.

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date



<a class="anchor" id="Function"></a>
## Functions
One of the best ways to improve your code readability is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Writing a function has three big advantages over using copy-and-paste:

1. You can give a function an evocative name that makes your code easier to understand.

2. As requirements change, you only need to update code in one place, instead of many.

3. You eliminate the chance of making incidental mistakes when you copy and paste (e.g. updating a variable name in one place, but not in another).

General template of a function:

> `MyFunction <- function(arg1, arg2, ... ){
  statements
  return(object)
}`

Example: The following function adds `a` and `b` and returns the result:

In [2]:
AddFunction <- function(a, b) {
    return(a + b)
}

In [3]:
AddFunction(3, 2)

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>

### When should you write a function?
You should consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). For example, take a look at this code. What does it do?

In [4]:
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
df

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
df

a,b,c,d
-1.08830302,-0.4791421,0.8886581,-0.2712239
1.339821,-0.450475,-2.1733072,2.1558084
0.3262122,0.7925343,0.9989489,0.388962
0.98202921,-0.729786,0.3484742,2.3535507
-0.01806718,1.0606187,-2.9481675,0.41652
1.25528378,1.1488172,0.705749,0.5812109
0.89520676,-0.866519,0.1700609,0.4436826
0.27624727,-1.524631,0.5055564,-1.2241507
-2.24393034,-1.3414404,0.1111239,1.1011523
-1.34728132,0.3325674,1.1904059,-1.2329229


a,b,c,d
0.322463,0.9100569,0.9270889,0.268146128
1.0,0.9350105,0.1872288,0.944864428
0.7171654,2.017001,0.9537384,0.452222731
0.9001628,0.6918812,0.7965647,1.0
0.6210987,2.2503578,0.0,0.459906587
0.976411,2.3271311,0.8828928,0.505826603
0.8759361,0.5728606,0.7534549,0.46748022
0.7032233,0.0,0.8345204,0.002445908
0.0,0.1594602,0.739214,0.650799501
0.2501984,1.6166179,1.0,0.0


You might be able to puzzle out that this rescales each column to have a range from 0 to 1. But did you spot the mistake? I made an error when copying-and-pasting the code for `df$b`: I forgot to change an `a` to a `b`. Extracting repeated code out into a function is a good idea because it prevents you from making this type of mistake.

Since we are rescaling each column individually, we can write a function that rescales a single vector and call it whenever we need to:

In [5]:
Rescale <- function(x) {
  # Rescales a numeric vector to the range from 0 to 1.
  #
  # Args:
  #   x: The numeric vector to rescale.
  #
  # Returns:
  #   The rescaled vector, the same length as x.

  rng <- range(x, na.rm = TRUE)  # rng[1] is the minimum, rng[2] the maximum
  (x - rng[1]) / (rng[2] - rng[1])
}

In [6]:
x  <- c(0, 50, 100)
Rescale(x)
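
To see how a function like `Rescale()` behaves on less tidy input, here is a small sketch. It repeats the function so the block runs on its own, and the input vectors are made up for illustration.

```r
# Same idea as Rescale() above, repeated so this block is self-contained.
Rescale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

with_missing <- Rescale(c(1, NA, 3))  # the NA stays NA, the rest are rescaled
constant     <- Rescale(c(5, 5, 5))   # zero range, so 0 / 0 gives NaN

print(with_missing)
print(constant)
```

A constant vector has no range to rescale against, so whether to return `NaN`, zeros, or an error in that case is a design decision the function does not yet make.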

We can simplify the original example now that we have a function:

In [7]:
df$a <- Rescale(df$a)
df$b <- Rescale(df$b)
df$c <- Rescale(df$c)
df$d <- Rescale(df$d)
df

a,b,c,d
0.322463,0.39106386,0.9270889,0.268146128
1.0,0.40178675,0.1872288,0.944864428
0.7171654,0.86673284,0.9537384,0.452222731
0.9001628,0.29731081,0.7965647,1.0
0.6210987,0.96700947,0.0,0.459906587
0.976411,1.0,0.8828928,0.505826603
0.8759361,0.24616601,0.7534549,0.46748022
0.7032233,0.0,0.8345204,0.002445908
0.0,0.06852221,0.739214,0.650799501
0.2501984,0.69468277,1.0,0.0


Compared to the original, this code is easier to understand and we've eliminated one class of copy-and-paste errors. There is still quite a bit of duplication since we're doing the same thing to multiple columns. We'll learn how to eliminate that duplication when we cover iteration.

### Practice
What do these functions do?

In [8]:
f1 <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}

In [9]:
# Your answer goes here

In [10]:
f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}

In [11]:
# Your answer goes here

In [12]:
f3 <- function(x, y) {
  rep(y, length.out = length(x))
}

In [13]:
# Your answer goes here

### Practice
Write a function that takes a data frame, the `x` and `y` variables to plot, and a color, and returns a scatterplot drawn in that color.

In [14]:
# Your answer goes here

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>

## Function arguments
Generally, data arguments should come first. Detail arguments should go on the end, and usually should have default values. We specify a default argument by giving it a default value in the function definition using `=`.

For example, let's modify our `AddFunction()` so that it adds 1 to `a` if `b` is not provided:

In [15]:
AddFunction <- function(a, b = 1) {
    return(a + b)
}

In [16]:
AddFunction(8)

In [17]:
AddFunction(5, 10)  # It still does what we expect when both arguments are supplied

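The default value should almost always be the most common value. The few exceptions have to do with safety: for example, `na.rm` defaults to `FALSE` in functions like `mean()`, because silently dropping missing values could hide a problem in the data.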
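
A minimal sketch of these conventions, using a hypothetical `RoundedMean()` helper that is not part of the original notebook. The data argument `x` comes first, the detail arguments `digits` and `na.rm` come last with defaults, and `na.rm` defaults to `FALSE` so that missing values are not dropped without the caller asking for it.

```r
# Hypothetical helper: data argument first, detail arguments last with defaults.
RoundedMean <- function(x, digits = 2, na.rm = FALSE) {
  round(mean(x, na.rm = na.rm), digits)
}

values <- c(1.234, 2.345, NA)

default_result <- RoundedMean(values)                # NA: the missing value is not hidden
cleaned_result <- RoundedMean(values, na.rm = TRUE)  # mean of the two observed values

print(default_result)
print(cleaned_result)
```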

### Choosing names
The names of the arguments are also important. R doesn't care, but the readers of your code (including future-you!) will. Generally you should prefer longer, more descriptive names, but there are a handful of very common, very short names. It's worth memorizing these:

`x`, `y`, `z`: vectors.

`w`: a vector of weights.

`df`: a data frame.

`i`, `j`: numeric indices (typically rows and columns).

`n`: length, or number of rows.

`p`: number of columns.

Otherwise, consider matching names of arguments in existing R functions. For example, use `na.rm` to determine if missing values should be removed.

---
<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="Conditionals"></a>

## Conditionals
An `if` statement allows you to conditionally execute code. It looks like this:

> `if (condition) {
  code executed when condition is TRUE
} else {
  code executed when condition is FALSE
}`

In [18]:
condition <- TRUE
if (condition) {
  print("Condition is TRUE")
} else {
  print("Condition is FALSE")
}

[1] "Condition is TRUE"


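The condition must evaluate to a single `TRUE` or `FALSE`. An `NA` condition is an error, and a condition with more than one element gives a warning or an error, depending on the R version.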

You can use `||` (or) and `&&` (and) to combine multiple logical expressions.
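
As a small illustration (the variables here are invented for the example), `&&` evaluates left to right and stops as soon as the answer is known, which makes it handy for guarding a test that would otherwise fail. The vectorised operators `&` and `|` work element by element instead, so they are usually combined with `any()` or `all()` before being used in an `if`.

```r
x <- NULL

# && stops at the first FALSE, so `x > 0` is never evaluated when x is NULL.
if (!is.null(x) && x > 0) {
  message_text <- "x is positive"
} else {
  message_text <- "x is missing or not positive"
}
print(message_text)

# Collapse an element-wise comparison to the single value that `if` needs.
y <- c(-1, 2, 3)
any_positive <- any(y > 0)
all_positive <- all(y > 0)
print(c(any_positive, all_positive))
```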

You can chain multiple if statements together:
> `if (this) {
  do that
} else if (that) {
  do something else
} else {
  do a third thing
}`
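
A concrete version of the chained template, as a sketch. `DescribeSign()` is a made-up function for illustration, and it assumes `x` is a single non-missing number.

```r
# Hypothetical example: classify a single number by its sign.
DescribeSign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(DescribeSign(3))
print(DescribeSign(-2))
print(DescribeSign(0))
```

The branches are tested in order and only the first one whose condition is `TRUE` runs.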

---
### Exercise 1
Write a greeting `if` statement that says "good morning", "good afternoon", or "good evening", depending on the time of day. (Hint: use lubridate's `now()` function to get the current time, and `hour()` to extract the hour of the day.)

In [19]:
# Your answer goes here

---
### Exercise 2
Implement an `if` statement that receives an integer `number`. If `number` is divisible by 3, it prints "fizz". If it's divisible by 5, it prints "buzz". If it's divisible by both 3 and 5, it prints "fizzbuzz". Otherwise, it prints the number.

(Hint: `x%%y` gives the remainder of `x` divided by `y`)

In [20]:
# Your answer goes here

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="cut"></a>

## cut()
`cut()` divides the range of `x` into intervals and labels each value in `x` according to the interval it falls into. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.

For instance, here we label 100 random draws from a standard normal distribution:

In [21]:
z <- rnorm(100)
print(cut(z, breaks = -6:6))

  [1] (-1,0]  (0,1]   (-1,0]  (1,2]   (0,1]   (-1,0]  (4,5]   (-2,-1] (1,2]  
 [10] (0,1]   (0,1]   (0,1]   (1,2]   (0,1]   (-1,0]  (-2,-1] (0,1]   (-3,-2]
 [19] (-1,0]  (0,1]   (-1,0]  (1,2]   (-1,0]  (-2,-1] (0,1]   (-2,-1] (1,2]  
 [28] (1,2]   (-2,-1] (1,2]   (0,1]   (-2,-1] (-1,0]  (-1,0]  (-2,-1] (1,2]  
 [37] (1,2]   (-1,0]  (0,1]   (-1,0]  (-1,0]  (-2,-1] (-3,-2] (0,1]   (0,1]  
 [46] (-2,-1] (-1,0]  (0,1]   (2,3]   (-1,0]  (0,1]   (-1,0]  (-1,0]  (1,2]  
 [55] (0,1]   (-1,0]  (0,1]   (-3,-2] (0,1]   (0,1]   (-1,0]  (0,1]   (2,3]  
 [64] (0,1]   (-2,-1] (-2,-1] (-1,0]  (-3,-2] (-1,0]  (-2,-1] (0,1]   (2,3]  
 [73] (1,2]   (0,1]   (-1,0]  (1,2]   (-2,-1] (-1,0]  (0,1]   (1,2]   (-1,0] 
 [82] (-1,0]  (-2,-1] (0,1]   (-1,0]  (-2,-1] (-1,0]  (0,1]   (1,2]   (0,1]  
 [91] (-1,0]  (0,1]   (1,2]   (-2,-1] (0,1]   (-1,0]  (-2,-1] (-1,0]  (0,1]  
[100] (-1,0] 
12 Levels: (-6,-5] (-5,-4] (-4,-3] (-3,-2] (-2,-1] (-1,0] (0,1] (1,2] ... (5,6]


Let's summarize these bins for a sample of 10,000 using `table()`:

In [22]:
Z <- rnorm(10000)
table(cut(Z, breaks = -6:6))


(-6,-5] (-5,-4] (-4,-3] (-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3]   (3,4] 
      0       0      13     220    1353    3372    3417    1419     186      20 
  (4,5]   (5,6] 
      0       0 
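
Because the samples above are random, here is a small deterministic sketch with made-up scores and hypothetical labels. It also shows that a value outside every interval becomes `NA`.

```r
scores <- c(5, 12, 15, 25, 42)

# Three intervals between 0 and 30, each with a custom label.
score_level <- cut(scores, breaks = c(0, 10, 20, 30),
                   labels = c("low", "medium", "high"))

print(score_level)
print(table(score_level))
```

The value 42 lies beyond the last break, so it gets no label and is not counted by `table()`.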

We could also answer Exercise 1 using `cut()`:

In [23]:
greeting <- cut(hour(now()), c(-1, 11, 17, 24), right = TRUE,
                labels = c("Good Morning!", "Good Afternoon!", "Good Evening!"))
print(greeting)

[1] Good Afternoon!
Levels: Good Morning! Good Afternoon! Good Evening!


Question: what does `right = TRUE` do in the code above?

In [24]:
# Answer goes here

---
<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="stop"></a>

### stop()
**Checking function input arguments**

It's good practice to check important preconditions and throw an error (with `stop()`) if they are not met.

For example, this function returns `TRUE` if its input is an even number and `FALSE` if it is odd:

In [25]:
IsEven <- function(a) {
    if (a %% 2 == 0) {
        return(TRUE)
    } else {
        return(FALSE)
    }
}

In [26]:
IsEven(4)
IsEven(5)

Now what happens if we give a non-integer input?

In [27]:
IsEven(4.4)

4.4 is not an odd number! In fact it's not an integer at all, so we shouldn't have run the test. Let's add a `stop()` that first checks whether the input is an integer:

In [28]:
IsEven <- function(a) {
    # Check the precondition first: a must be a whole number.
    if (a %% 1 != 0) {
        stop("a must be an integer!")
    }

    if (a %% 2 == 0) {
        return(TRUE)
    } else {
        return(FALSE)
    }
}

In [29]:
# IsEven(4.4)  # Should throw an error now
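
One way to see the error without halting the notebook is to wrap the call in base R's `tryCatch()`, which this notebook does not otherwise cover. The sketch below repeats a compact version of `IsEven()` so that it runs on its own.

```r
# Compact version of IsEven() with the same precondition check.
IsEven <- function(a) {
  if (a %% 1 != 0) {
    stop("a must be an integer!")
  }
  a %% 2 == 0
}

# Capture the error message instead of stopping execution.
result <- tryCatch(
  IsEven(4.4),
  error = function(e) conditionMessage(e)
)
print(result)
```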

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="return"></a>

### return()
**Explicit return statements**

The value returned by the function is usually the last statement it evaluates, but you can choose to return early by using `return()`. I think it's best to save the use of `return()` to signal that you can return early with a simpler solution. A common reason to do this is because the inputs are empty:

In [30]:
ComplicatedFunction <- function(x, y, z) {
  if (length(x) == 0 || length(y) == 0) {
    return(0)
  }
    
  # Complicated code here
}

Another reason is that you have an `if` statement with one complex block and one simple block:

In [31]:
f <- function() {
  if (x) {
    # Do 
    # something
    # that
    # takes
    # many
    # lines
    # to
    # express
  } else {
    # return something short
  }
}

But if the first block is very long, by the time you get to the else, you've forgotten the condition. One way to rewrite it is to use an early return for the simple case:

In [32]:
f <- function() {
  if (!x) {
    return(something_short)
  }

  # Do 
  # something
  # that
  # takes
  # many
  # lines
  # to
  # express
}
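
The templates above are placeholders, so here is a runnable sketch of the same pattern. `MeanAbsoluteChange()` is a hypothetical function invented for this example: the simple case (too few values to compute a change) returns early, and the main computation follows without extra nesting.

```r
# Hypothetical example: average absolute difference between neighbouring values.
MeanAbsoluteChange <- function(x) {
  # Simple case first: fewer than two values means there is no change to measure.
  if (length(x) < 2) {
    return(NA_real_)
  }

  mean(abs(diff(x)))
}

print(MeanAbsoluteChange(5))
print(MeanAbsoluteChange(c(1, 4, 2)))
```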

<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>
<a class="anchor" id="ifelse"></a>

## ifelse()

`ifelse(test, yes, no)` returns a value with the same shape as `test`, filled with elements taken from either `yes` or `no` depending on whether the corresponding element of `test` is `TRUE` or `FALSE`.

Example:

In [33]:
x <- c(6:-4)
ifelse(x >= 0, x, NA)

In [34]:
number <- 4
ifelse(number %% 2 == 0, "even", "odd")

In [35]:
(a <- matrix(1:9, 3, 3))

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


In [36]:
ifelse(a %% 2 == 0, a, 0)

     [,1] [,2] [,3]
[1,]    0    4    0
[2,]    2    0    8
[3,]    0    6    0
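
One more sketch, with made-up temperatures, showing that `ifelse()` works element by element and that a missing value in the test stays missing in the result:

```r
temperatures <- c(31, 12, NA, 25)

# Each element is tested separately; the NA test yields an NA label.
temperature_label <- ifelse(temperatures > 20, "warm", "cool")
print(temperature_label)
```

This is the main difference from a plain `if` statement, which expects a single `TRUE` or `FALSE` rather than a whole vector.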


<div style="text-align: right"> [[Jump to top]](#jump_to_top) </div>

## Function documentation
Functions should contain a comments section immediately below the function definition line. These comments should consist of a one-sentence description of the function; a list of the function's arguments, denoted by `Args:`, with a description of each (including the data type); and a description of the return value, denoted by `Returns:`. The comments should be descriptive enough that a caller can use the function without reading any of the function's code. Example:

In [37]:
CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
  # Computes the sample covariance between two vectors.
  #
  # Args:
  #   x: One of two vectors whose sample covariance is to be calculated.
  #   y: The other vector. x and y must have the same length, greater than one,
  #      with no missing values.
  #   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
  #
  # Returns:
  #   The sample covariance between x and y.
  
  n <- length(x)
    
  # Error handling
  if (n <= 1 || n != length(y)) {
    stop("Arguments x and y must have the same length, greater than one: ",
         length(x), " and ", length(y), ".")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Arguments x and y must not have missing values.")
  }
  covariance <- var(x, y)
  if (verbose) {
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  }
  return(covariance)
}