# Functions and Conditionals
---
## Functions
One of the best ways to improve your code readability is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting. Writing a function has three big advantages over using copy-and-paste:

1. You can give a function an evocative name that makes your code easier to understand.

2. As requirements change, you only need to update code in one place, instead of many.

3. You eliminate the chance of making incidental mistakes when you copy and paste (e.g. updating a variable name in one place, but not in another).

General template of a function:

> `MyFunction <- function(arg1, arg2, ... ){
  statements
  return(object)
}`

Example: The following function adds `a` and `b` and returns the result:


In [1]:
AddTwoNums <- function(a, b) {
    return(a + b)
}

In [2]:
AddTwoNums(3, 2)

### When should you write a function?
You should consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). For example, take a look at this code. What does it do?

In [3]:
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
df

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
df

a,b,c,d
-0.82084275,-1.180065,-0.60090186,-0.57635299
0.44961916,1.8449456,-0.05813793,1.03059546
-1.68080415,-0.5327509,0.7227186,0.42242532
0.04726862,1.5193581,0.19005997,-0.45715189
-1.45272368,1.0775926,-0.78215212,2.29525121
-0.51696272,1.4535923,-0.11381506,0.66460878
-0.25118648,1.2243649,2.42639541,0.2749276
-0.41863268,0.1824934,4.04581635,-0.26502514
0.71747376,-0.6815385,-0.86611267,0.3493407
-0.47175534,0.5592928,0.61206784,-0.07518895


a,b,c,d
0.35857454,0.0,0.05399321,0.0
0.88831378,1.6396205,0.16449235,0.55959956
0.0,0.3508581,0.32346381,0.34781197
0.72054734,1.4631451,0.21502197,0.04151028
0.09510177,1.2236987,0.01709319,1.0
0.48528214,1.4274986,0.15315726,0.43214931
0.59610175,1.3032525,0.67030856,0.29644775
0.5262824,0.7385358,1.0,0.108416
1.0,0.270212,0.0,0.32236117
0.50413207,0.9427691,0.30093686,0.17452407


You might be able to puzzle out that this rescales each column to have a range from 0 to 1. But did you spot the mistake? I made an error when copying-and-pasting the code for `df$b`: I forgot to change an `a` to a `b`. Extracting repeated code out into a function is a good idea because it prevents you from making this type of mistake.

Since we are rescaling each column individually, we can write a function that does just that and call it whenever we need to rescale a vector:

In [4]:
Rescale <- function(x) {
  min <- min(x, na.rm = TRUE)
  max <- max(x, na.rm = TRUE)
  (x - min) / (max - min)
}

In [5]:
Rescale(c(0, 50, 100))
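Before relying on a new function, it is worth checking it on a few small inputs whose answers you can work out by hand, including one with a missing value. A quick sketch (it repeats the definition of `Rescale()` so the block runs on its own; the test vectors are made up):

```r
Rescale <- function(x) {
  min <- min(x, na.rm = TRUE)
  max <- max(x, na.rm = TRUE)
  (x - min) / (max - min)
}

Rescale(c(0, 50, 100))       # 0.0 0.5 1.0
Rescale(c(-10, 0, 10))       # 0.0 0.5 1.0
Rescale(c(1, 2, 3, NA, 5))   # 0.00 0.25 0.50 NA 1.00 (the NA stays in place)
```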

We can simplify the original example now that we have a function:

In [6]:
df$a <- Rescale(df$a)
df$b <- Rescale(df$b)
df$c <- Rescale(df$c)
df$d <- Rescale(df$d)
df

a,b,c,d
0.35857454,0.0,0.05399321,0.0
0.88831378,1.0,0.16449235,0.55959956
0.0,0.2139874,0.32346381,0.34781197
0.72054734,0.8923681,0.21502197,0.04151028
0.09510177,0.7463305,0.01709319,1.0
0.48528214,0.8706275,0.15315726,0.43214931
0.59610175,0.7948501,0.67030856,0.29644775
0.5262824,0.450431,1.0,0.108416
1.0,0.1648016,0.0,0.32236117
0.50413207,0.5749923,0.30093686,0.17452407


Compared to the original, this code is easier to understand and we've eliminated one class of copy-and-paste errors. There is still quite a bit of duplication since we're doing the same thing to multiple columns.

### Practice
Read each of the following functions, work out what it does, and suggest a more descriptive name for it.

In [7]:
f1 <- function(string, prefix) {
  substr(string, 1, nchar(prefix)) == prefix
}

In [8]:
# Your answer goes here

In [9]:
f2 <- function(x) {
  if (length(x) <= 1) return(NULL)
  x[-length(x)]
}

In [10]:
# Your answer goes here


In [11]:
f3 <- function(x, y) {
  rep(y, length.out = length(x))
}

In [12]:
# Your answer goes here

### Practice
Write a function that takes a data frame, `x`, `y`, and `color`, and returns a scatterplot with the given color.

In [13]:
# Your answer goes here

## Function arguments
Generally, data arguments should come first. Detail arguments should go at the end and should usually have default values. You give an argument a default by assigning it a value with `=` in the function definition.

For example, let's modify our `AddTwoNums()` so that it adds 1 to `a` if `b` is not provided:

In [14]:
AddTwoNums <- function(a, b = 1) {
    return(a + b)
}

In [15]:
AddTwoNums(8)

In [16]:
AddTwoNums(5, 10)  # Still behaves as expected when both arguments are supplied

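The default value should almost always be the most common value. The few exceptions are for safety reasons: for example, `na.rm` defaults to `FALSE` in functions such as `mean()` so that missing values are never dropped silently.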

### Choosing names
The names of the arguments are also important. R doesn't care, but the readers of your code (including future-you!) will. Generally you should prefer longer, more descriptive names, but there are a handful of very common, very short names. It's worth memorizing these:

`x`, `y`, `z`: vectors.

`w`: a vector of weights.

`df`: a data frame.

`i`, `j`: numeric indices (typically rows and columns).

`n`: length, or number of rows.

`p`: number of columns.

Otherwise, consider matching names of arguments in existing R functions. For example, use `na.rm` to determine if missing values should be removed.
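The conventions above can be sketched together in one function: the data argument `x` comes first, and the detail argument comes last, has a default, and reuses the familiar name `na.rm`. The function name `Rescale2()` and the test vectors are assumptions made for this illustration only.

```r
# Hypothetical variant of Rescale(): data argument first, detail argument last
Rescale2 <- function(x, na.rm = FALSE) {
  rng <- range(x, na.rm = na.rm)       # rng[1] is the minimum, rng[2] the maximum
  (x - rng[1]) / (rng[2] - rng[1])
}

Rescale2(c(0, 5, 10))                     # 0.0 0.5 1.0
Rescale2(c(0, 5, NA, 10))                 # all NA, because na.rm defaults to FALSE
Rescale2(c(0, 5, NA, 10), na.rm = TRUE)   # 0.0 0.5 NA 1.0
```

Defaulting `na.rm` to `FALSE` follows the safety exception: a missing value shows up in the result instead of being ignored quietly.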

---
## Conditionals
An `if` statement allows you to conditionally execute code. It looks like this:

> `if (condition) {
  code executed when condition is TRUE
} else {
  code executed when condition is FALSE
}`

In [17]:
condition <- TRUE
if (condition) {
  print("Condition is TRUE")
} else {
  print("Condition is FALSE")
}

[1] "Condition is TRUE"


The condition must evaluate to a single `TRUE` or `FALSE`.

You can use `||` (or) and `&&` (and) to combine multiple logical expressions.
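As a small illustration of combining conditions (the variables `temperature` and `is_raining` are made up for this sketch): `&&` and `||` work on single values and stop evaluating as soon as the result is known, which is what `if` needs. The vectorised forms `&` and `|` are meant for element-wise work on whole vectors.

```r
temperature <- 22     # hypothetical inputs
is_raining <- FALSE

if (temperature > 15 && !is_raining) {
  advice <- "Go for a walk"
} else {
  advice <- "Stay inside"
}
print(advice)   # "Go for a walk"

# || short-circuits: the right-hand side is skipped when the left side is TRUE,
# so this works even though `x` is NULL
x <- NULL
if (is.null(x) || x > 0) {
  print("x is missing or positive")
}
```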

You can chain multiple if statements together:
> `if (this) {
  do that
} else if (that) {
  do something else
} else {
  do yet another thing
}`

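A concrete version of the chained form, as a sketch using a hypothetical function `DescribeSign()` that is not part of the original material. Each branch is tried in order, and the value of the first matching branch becomes the function's result:

```r
DescribeSign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

DescribeSign(5)    # "positive"
DescribeSign(-2)   # "negative"
DescribeSign(0)    # "zero"
```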
---
### Exercise 1
Write a greeting if statement that says "good morning", "good afternoon", or "good evening", depending on the time of day. (Hint: use lubridate's `now()` function to get the current time, and `hour()` to extract the hour of the day.)

In [18]:
# Your answer goes here

---
### Exercise 2
Implement an if statement that receives an integer `number`. If `number` is divisible by three, it prints "fizz". If it's divisible by five, it prints "buzz". If it's divisible by both three and five, it prints "fizzbuzz". Otherwise, it returns the number.

(Hint: `x%%y` gives the remainder of `x` divided by `y`)

In [19]:
# Your answer goes here

## `cut()`
`cut()` divides the range of `x` into intervals and codes the values in `x` according to the interval they fall into. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.

For instance, here we label a sample of 100 random numbers drawn from a standard normal distribution:

In [20]:
z <- rnorm(100)
print(cut(z, breaks = -6:6))

  [1] (-1,0]  (0,1]   (-2,-1] (-2,-1] (-2,-1] (-1,0]  (0,1]   (-1,0]  (0,1]  
 [10] (-1,0]  (0,1]   (0,1]   (1,2]   (0,1]   (1,2]   (-1,0]  (0,1]   (-1,0] 
 [19] (-1,0]  (-2,-1] (0,1]   (-1,0]  (-1,0]  (0,1]   (-1,0]  (-1,0]  (-1,0] 
 [28] (-1,0]  (-1,0]  (0,1]   (1,2]   (-1,0]  (0,1]   (1,2]   (-1,0]  (0,1]  
 [37] (-1,0]  (0,1]   (0,1]   (0,1]   (-1,0]  (1,2]   (-1,0]  (-1,0]  (-1,0] 
 [46] (0,1]   (0,1]   (-2,-1] (-2,-1] (-2,-1] (0,1]   (-1,0]  (-2,-1] (-1,0] 
 [55] (0,1]   (0,1]   (0,1]   (0,1]   (-1,0]  (0,1]   (1,2]   (0,1]   (-1,0] 
 [64] (1,2]   (-1,0]  (1,2]   (-2,-1] (-1,0]  (-2,-1] (1,2]   (1,2]   (1,2]  
 [73] (0,1]   (0,1]   (0,1]   (0,1]   (-1,0]  (-2,-1] (-2,-1] (0,1]   (0,1]  
 [82] (1,2]   (1,2]   (-2,-1] (-1,0]  (0,1]   (-2,-1] (0,1]   (-2,-1] (-1,0] 
 [91] (0,1]   (0,1]   (1,2]   (0,1]   (-2,-1] (-2,-1] (-2,-1] (2,3]   (0,1]  
[100] (-1,0] 
12 Levels: (-6,-5] (-5,-4] (-4,-3] (-3,-2] (-2,-1] (-1,0] (0,1] (1,2] ... (5,6]


Let's count how many values fall into each bin for a sample of 10,000, using `table()`:

In [21]:
Z <- rnorm(10000)
table(cut(Z, breaks = -6:6))


(-6,-5] (-5,-4] (-4,-3] (-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3]   (3,4] 
      0       0      15     198    1367    3424    3373    1388     221      14 
  (4,5]   (5,6] 
      0       0 
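On a small, fixed input the mapping from values to levels is easier to see than on random draws. A sketch with made-up ages, break points, and labels (all assumptions for this illustration); the values are chosen so that none sits exactly on a break:

```r
ages <- c(3, 15, 42, 70)   # made-up values
groups <- cut(ages,
              breaks = c(0, 12, 19, 64, Inf),
              labels = c("child", "teen", "adult", "senior"))

as.character(groups)   # "child" "teen" "adult" "senior"
as.integer(groups)     # 1 2 3 4: the leftmost interval is level one
table(groups)          # one observation per group
```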

In [22]:
library(lubridate)


Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date



We could also answer Exercise 1 using `cut()`:

In [23]:
greeting <- cut(hour(now()), c(-1, 5, 12, 17, 24), right = TRUE,
                labels = c("Good Evening!", "Good Morning!", "Good Afternoon!", "Good Evening!"))
print(greeting)

[1] Good Evening!
Levels: Good Evening! Good Morning! Good Afternoon!


Question: what does `right = TRUE` do in the code above?

In [24]:
hour(now())

---
### Checking function input arguments
**Stop**

It's good practice to check important preconditions and throw an error (with `stop()`) if they are not true.

For example, this function returns `TRUE` if its input is an even number and `FALSE` if it is odd:

In [25]:
IsEven <- function(a) {
    if (a %% 2 == 0) {
        return(TRUE)
    } else {
        return(FALSE)
    }
}

In [26]:
IsEven(4)
IsEven(5)

Now what happens if we give a non-integer input?

In [27]:
IsEven(4.4)

4.4 is not an odd number! In fact it's not an integer at all, so we shouldn't have run the test. Let's add a `stop()` and check first whether the input is an integer:

In [28]:
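IsEven <- function(a) {
    # is.integer() tests the storage type, so is.integer(4) is FALSE;
    # check for a whole-number value instead
    if (!is.numeric(a) || a %% 1 != 0) {
        stop("a must be an integer!")
    }

    if (a %% 2 == 0) {
        return(TRUE)
    } else {
        return(FALSE)
    }
}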

In [37]:
# IsEven(4.4)  # Should raise an error now

ERROR: Error in IsEven(4.4): a must be an integer!
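One caveat when checking for integers: `is.integer()` tests how a value is stored, not whether it is a whole number, so a plain literal such as `4` (stored as a double) does not pass. A minimal sketch of a value-based check, using a hypothetical helper `IsWholeNumber()` introduced only for this illustration:

```r
# is.integer() looks at the storage type, not the value
is.integer(4)    # FALSE: numeric literals are doubles
is.integer(4L)   # TRUE: the L suffix creates an integer

# Hypothetical helper: TRUE when a single numeric value has no fractional part
IsWholeNumber <- function(a) {
  is.numeric(a) && length(a) == 1 && a %% 1 == 0
}

IsWholeNumber(4)     # TRUE
IsWholeNumber(4.4)   # FALSE
IsWholeNumber("4")   # FALSE: not numeric, so the later checks are skipped
```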


### Explicit return statements
The value returned by the function is usually the last statement it evaluates, but you can choose to return early by using `return()`. I think it's best to save the use of `return()` to signal that you can return early with a simpler solution. A common reason to do this is because the inputs are empty:

In [30]:
ComplicatedFunction <- function(x, y, z) {
  if (length(x) == 0 || length(y) == 0) {
    return(0)
  }
    
  # Complicated code here
}

Another reason is that you have an `if` statement with one complex block and one simple block:

In [31]:
f <- function() {
  if (x) {
    # Do 
    # something
    # that
    # takes
    # many
    # lines
    # to
    # express
  } else {
    # return something short
  }
}

But if the first block is very long, by the time you get to the else, you've forgotten the condition. One way to rewrite it is to use an early return for the simple case:

In [32]:
f <- function() {
  if (!x) {
    return(something_short)
  }

  # Do 
  # something
  # that
  # takes
  # many
  # lines
  # to
  # express
}
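A runnable sketch of the early-return pattern, using a hypothetical `WeightedMean()` function (the name, the `NA` result for empty input, and the error message are all assumptions for this illustration). The simple cases are dealt with first, so the main computation is not nested inside an `else`:

```r
WeightedMean <- function(x, w) {
  if (length(x) == 0) {
    return(NA_real_)   # simple case: nothing to average
  }
  if (length(x) != length(w)) {
    stop("x and w must have the same length")
  }

  # Main computation, reached only when the inputs are usable
  sum(x * w) / sum(w)
}

WeightedMean(c(1, 2, 3), c(1, 1, 2))   # 2.25
WeightedMean(numeric(0), numeric(0))   # NA
```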

## `ifelse()`

`ifelse(test_expression, yes, no)` returns a value with the same shape as `test_expression` which is filled with elements selected from either `yes` or `no` depending on whether the element of `test_expression` is `TRUE` or `FALSE`.

Example:

In [33]:
x <- c(6:-4)
ifelse(x >= 0, x, NA)

In [34]:
number <- 4
ifelse(number %% 2 == 0, "even", "odd")
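Because `ifelse()` works element by element, the same even/odd test extends to a whole vector without a loop. A small sketch with made-up input, compared against an equivalent loop built from a plain `if`:

```r
numbers <- 1:6   # made-up input
labels <- ifelse(numbers %% 2 == 0, "even", "odd")
labels   # "odd" "even" "odd" "even" "odd" "even"

# if() accepts only a single condition, so the equivalent loop is longer
labels_loop <- character(length(numbers))
for (i in seq_along(numbers)) {
  if (numbers[i] %% 2 == 0) {
    labels_loop[i] <- "even"
  } else {
    labels_loop[i] <- "odd"
  }
}
identical(labels, labels_loop)   # TRUE
```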

In [35]:
(a <- matrix(1:9, 3, 3))

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


In [36]:
ifelse(a %% 2 == 0, a, 0)

     [,1] [,2] [,3]
[1,]    0    4    0
[2,]    2    0    8
[3,]    0    6    0
