# Conditionals & Control Flow

### Equality
The most basic form of comparison is equality. <br> Let's briefly recap its syntax. The following statements all evaluate to TRUE.
```
3 == (2 + 1)
"intermediate" != "r"
TRUE != FALSE
"Rchitect" != "rchitect"
```
Notice from the last expression that R is case sensitive: "R" is not equal to "r". 

In [1]:
# Comparison of logicals
TRUE==FALSE

# Comparison of numerics
-6*14!=17-101

# Comparison of character strings
"useR"=="user"

# Compare a logical with a numeric
TRUE==1

### Greater and less than
Apart from equality operators, R has the less than and greater than operators: < and >. You can also add an equal sign to express less than or equal to (<=) or greater than or equal to (>=). <br>Have a look at the following R expressions, which all evaluate to FALSE:
```
(1 + 2) > 4
"dog" < "Cats"
TRUE <= FALSE
```
Remember that for string comparison, R determines the greater than relationship based on alphabetical order. Also, keep in mind that TRUE is treated as 1 for arithmetic, and FALSE is treated as 0. Therefore, FALSE < TRUE is TRUE.

In [2]:
# Comparison of numerics
-6*5+2>=-10+1

# Comparison of character strings
"raining"<="raining dogs"

# Comparison of logicals
TRUE>FALSE

In [3]:
# Comparing Vectors
linkedin <- c(16, 9, 13, 5, 2, 17, 14)

linkedin>15

linkedin<=5

In [4]:
# Comparing Matrices
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
views <- matrix(c(linkedin, facebook), nrow = 2, byrow = TRUE)

# When does views equal 13?
views == 13

# When is views less than or equal to 14?
views <= 14 

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
[1,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]
[1,] FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[2,] FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE


### & and |
The following expressions all evaluate to TRUE:
```
TRUE & TRUE
FALSE | TRUE
5 <= 5 & 2 < 3
3 < 4 | 7 < 6
```
Watch out: 3 < x < 7 to check if x is between 3 and 7 will not work; you'll need 3 < x & x < 7 for that.
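
The range check described above can be sketched as follows. The values of `x` and `v` are chosen purely for illustration and are not part of the original exercise.

```r
# Illustrative value (an assumption for this sketch)
x <- 5

# Combine two comparisons with & to test whether x lies strictly between 3 and 7
in_range <- 3 < x & x < 7
print(in_range)

# The same check works element-wise on a vector
v <- c(2, 5, 8)
in_range_v <- 3 < v & v < 7
print(in_range_v)
```

Because & is vectorised, the second check returns one logical value per element of `v`.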

### !
The following all evaluate to FALSE:
```
!TRUE
!(5 > 3)
!!FALSE
```

In [5]:
x <- 5
y <- 7
!(!(x < 4) & !!!(y > 12))

### The if statement
```
if (condition) {
  expr
}
```

In [6]:
medium <- "LinkedIn"
num_views <- 14

if (medium == "LinkedIn") {
  print("Showing LinkedIn information")
}

if (num_views > 15) {
  print("You're popular!")
}

[1] "Showing LinkedIn information"


### else & else if
```
if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else if (condition3) {
  expr3
} else {
  expr4
}
```

In [7]:
medium <- "LinkedIn"
num_views <- 14

# Control structure for medium
if (medium == "LinkedIn") {
  print("Showing LinkedIn information")
} else if (medium == "Facebook") {
  print("Showing Facebook information")
} else {
  print("Unknown medium")
}

# Control structure for num_views
if (num_views > 15) {
  print("You're popular!")
} else if (num_views <= 15 & num_views > 10) {
  print("Your number of views is average")
} else {
  print("Try to be more visible!")
}

[1] "Showing LinkedIn information"
[1] "Your number of views is average"


# Loops

### While Loop
```
while (condition) {
  expr
}
```

In [8]:
speed <- 88
while (speed > 30) {
  print(paste("Your speed is", speed))
  # Break the while loop when speed exceeds 90
  if (speed > 90) {
    break
  }
  speed <- speed + 3
}

[1] "Your speed is 88"
[1] "Your speed is 91"


### For Loop

There are two ways to write a for loop over a vector: iterate over the elements directly, or iterate over their indices.<br>
The following two loops are equivalent in R:

In [9]:
primes <- c(2, 3, 5, 7, 11, 13)

# loop version 1
for (p in primes) {
  print(p)
}

# loop version 2
for (i in seq_along(primes)) {
  print(primes[i])
}

[1] 2
[1] 3
[1] 5
[1] 7
[1] 11
[1] 13
[1] 2
[1] 3
[1] 5
[1] 7
[1] 11
[1] 13


### The next statement is used in place of continue in R
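
A minimal sketch of `next` inside a for loop. The loop bounds and the `odds` vector are illustrative assumptions.

```r
# Collect and print only the odd numbers from 1 to 6
odds <- c()
for (i in 1:6) {
  if (i %% 2 == 0) {
    next  # skip the rest of this iteration and move on to the next i
  }
  odds <- c(odds, i)
  print(i)
}
```

Unlike `break`, which leaves the loop entirely, `next` only abandons the current iteration.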

### strsplit() is used to split strings; split() in R divides data into groups

In [10]:
rquote <- "rs internals are irrefutably intriguing"
strsplit(rquote, split = "")

## Functions

#### Function documentation
Before even thinking of using an R function, you should clarify which arguments it expects. All the relevant details such as a description, usage, and arguments can be found in the documentation. <br>To consult the documentation on the sample() function, for example, you can use one of the following R commands:
```
help(sample)
?sample
```

A quick hack to see the arguments of the sample() function is the args() function.
```
args(sample)
```

In [11]:
?mean

args(mean)

### Write your own function
```
my_fun <- function(arg1, arg2) {
  body
}
```

Notice that this recipe uses the assignment operator (<-) just as if you were assigning a vector to a variable for example.<br> 
This is not a coincidence. Creating a function in R basically is the assignment of a function object to a variable! 
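
To illustrate that a function is an ordinary object bound to a name, here is a small sketch. The names `square` and `also_square` are hypothetical.

```r
# Assign a function object to the name square
square <- function(x) {
  x^2
}

# The object can be inspected like any other value
print(is.function(square))

# It can also be bound to a second name and called through it
also_square <- square
print(also_square(4))
```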

In [12]:
# A function pow_two
pow_two <- function(x){
  return (x*x)
}

# Use the function
pow_two(12)
pow_two(22)

# Another function sum_abs()
sum_abs <- function(a, b) {
  return(abs(a) + abs(b))
}

# Use the function
sum_abs(-2,3)
sum_abs(-5,-8)

### R passes arguments by value
What does this mean? Simply put, it means that an R function cannot change the variable that you input to that function. <br>Let's look at a simple example:
```
triple <- function(x) {
  x <- 3*x
  x
}
a <- 5
triple(a)
a
```

Inside the triple() function, the argument x gets overwritten with its value times three. <br>Afterwards this new x is returned. If you call this function with a variable a set equal to 5, you obtain 15. But did the value of a change? <br>If R were to pass a to triple() by reference, the override of the x inside the function would ripple through to the variable a, outside the function. However, R passes by value, so the R objects you pass to a function can never change unless you do an explicit assignment. <br>a remains equal to 5, even after calling triple(a).
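
The behaviour described above can be checked with a short sketch. It reuses triple() from the text; the helper names `unchanged` and `result` are assumptions added for illustration.

```r
triple <- function(x) {
  x <- 3 * x
  x
}

a <- 5
result <- triple(a)  # the function works on its own copy of the value
unchanged <- a       # a is still 5 after the call
print(unchanged)
print(result)

# Only an explicit assignment changes a
a <- triple(a)
print(a)
```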

### Load an R Package
There are basically two extremely important functions when it comes down to R packages:

- <b>install.packages()</b>, which installs a given package.
- <b>library()</b> which loads packages, i.e. attaches them to the search list on your R workspace.

Installing packages into a system-wide library may require administrator privileges; without them, R can usually install into a per-user library instead.

#### Different ways to load a package

Have a look at some more code chunks that (attempt to) load one or more packages:

- Chunk 1
```
library(data.table)
require(rjson)
```

- Chunk 2
```
library("data.table")
require(rjson)
```

- Chunk 3
```
library(data.table)
require(rjson, character.only = TRUE)
```

- Chunk 4
```
library(c("data.table", "rjson"))
```

Chunks 1 and 2 load both packages, provided they are installed: library() and require() accept the package name with or without quotes. Chunk 3 fails because character.only = TRUE makes require() treat rjson as a variable holding the package name, and no such variable exists. Chunk 4 fails because library() loads only one package per call.

## lapply

#### Use lapply with a built-in R function
```
lapply(X, FUN, ...)
```
To put it generally, lapply takes a vector or list X, and applies the function FUN to each of its members. <br>If FUN requires additional arguments, you pass them after you've specified X and FUN (...). <br>The output of lapply() is a list, the same length as X, where each element is the result of applying FUN on the corresponding element of X.
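
As a sketch of passing additional arguments through `...`, the example below forwards `collapse` to paste(). The `parts` list is hypothetical data.

```r
# Hypothetical input: a list of character vectors
parts <- list(c("a", "b"), c("c", "d", "e"))

# collapse = "-" is not consumed by lapply(); it is forwarded to paste()
joined <- lapply(parts, paste, collapse = "-")
str(joined)
```

The result is a list with one collapsed string per element of `parts`.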

In [13]:
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# Split names from birth year
split_math <- strsplit(pioneers, split = ":")

# Convert to lowercase strings: split_low
split_low <- lapply(split_math,tolower)

# Take a look at the structure of split_low
str(split_low)

List of 4
 $ : chr [1:2] "gauss" "1777"
 $ : chr [1:2] "bayes" "1702"
 $ : chr [1:2] "pascal" "1623"
 $ : chr [1:2] "pearson" "1857"


## lapply and anonymous functions
#### Named function
```triple <- function(x) { 3 * x }```

#### Anonymous function with same implementation
```function(x) { 3 * x }```

#### Use anonymous function inside lapply()
```lapply(list(1,2,3), function(x) { 3 * x })```

## sapply

sapply is a user-friendly wrapper around lapply. By default it returns a vector or matrix where possible; with simplify = "array" it can also return a higher-dimensional array, by applying simplify2array(). <br>
<b>sapply(x, f, simplify = FALSE, USE.NAMES = FALSE)</b> is the same as <b>lapply(x, f)</b>.
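
The relationship between the two functions can be checked with a small sketch. The input list `x` is an illustrative assumption.

```r
x <- list(1:3, 4:6)

# By default sapply() simplifies the result to a vector
simplified <- sapply(x, sum)
print(simplified)

# With simplification and naming switched off, the result matches lapply()
as_list <- sapply(x, sum, simplify = FALSE, USE.NAMES = FALSE)
print(identical(as_list, lapply(x, sum)))
```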

In [14]:
sapply(list(runif (10), runif (10)), 
       function(x) c(min = min(x), mean = mean(x), max = max(x)))
       
# runif generates random deviates between 0 & 1

           [,1]      [,2]
min  0.02222888 0.1040401
mean 0.60625702 0.4879974
max  0.97673704 0.8436870


## vapply
vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.<br>
```vapply(X, FUN, FUN.VALUE, …, USE.NAMES = TRUE)```
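
A minimal sketch of how FUN.VALUE acts as a safeguard. The `temps` list is hypothetical data, and the tryCatch() wrapper is only there so the failing call does not stop the script.

```r
# Hypothetical data: a list of numeric vectors
temps <- list(c(3, 7, 9), c(6, 9, 12), c(4, 8, 3))

# Each call to mean() must return exactly one numeric value
averages <- vapply(temps, mean, FUN.VALUE = numeric(1))
print(averages)

# range() returns two values per element, which violates the declared
# FUN.VALUE, so vapply() raises an error instead of changing shape silently
outcome <- tryCatch(
  vapply(temps, range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
print(outcome)
```

With sapply(), the second call would quietly return a matrix; vapply() makes the mismatch explicit.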

## Mathematical utilities

- <b>abs()</b>: Calculate the absolute value.
- <b>sum()</b>: Calculate the sum of all the values in a data structure.
- <b>mean()</b>: Calculate the arithmetic mean.
- <b>round()</b>: Round the values to 0 decimal places by default.

In [15]:
abs(-5)
sum(4,9,9,9,0)
mean(c(8, 8, 6))  # pass a vector: mean(8, 8, 6) would only average the first argument
round(7.24)

In [16]:
errors <- c(1.9, -2.6, 4.0, -9.5, -3.4, 7.3)

# Sum of absolute rounded values of errors
sum(round(abs(errors)))

## Data Utilities
R features a bunch of functions to juggle around with data structures:

- <b>seq()</b>: Generate sequences, by specifying the from, to, and by arguments.
- <b>rep()</b>: Replicate elements of vectors and lists.
- <b>sort()</b>: Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
- <b>rev()</b>: Reverse the elements in a data structure for which reversal is defined.
- <b>str()</b>: Display the structure of any R object.
- <b>append()</b>: Merge vectors or lists.
- <b>is.*()</b>: Check for the class of an R object.
- <b>as.*()</b>: Convert an R object from one class to another.
- <b>unlist()</b>: Flatten (possibly embedded) lists to produce a vector.
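
The utilities listed above can be tried out with a short sketch. All input values and variable names are illustrative assumptions.

```r
s <- seq(from = 1, to = 10, by = 3)    # 1 4 7 10
r <- rep(c("a", "b"), times = 2)       # "a" "b" "a" "b"
sorted <- sort(c(3, 1, 2))             # 1 2 3
reversed <- rev(sorted)                # 3 2 1
merged <- append(c(1, 2), c(3, 4))     # 1 2 3 4
flat <- unlist(list(1, list(2, 3)))    # 1 2 3

# is.*() checks a type, as.*() converts to it
is_num <- is.numeric("5")              # FALSE: "5" is a character string
num <- as.numeric("5")                 # 5

# str() gives a compact overview of any object
str(list(s = s, r = r, flat = flat))
```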

## Regular Expressions

### grepl & grep
In their most basic form, regular expressions can be used to see whether a pattern exists inside a character string or a vector of character strings. <br>
For this purpose, you can use:

- <b>grepl()</b>, which returns TRUE when a pattern is found in the corresponding character string.
- <b>grep()</b>, which returns a vector of indices of the character strings that contain the pattern.

Both functions need a pattern and an x argument, where pattern is the regular expression you want to match, and the x argument is the character vector in which matches are sought.

In [17]:
# The emails vector
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for "edu"
print(grepl(pattern="edu" , x=emails))

# Use grep() to match for "edu"
hits <- grep(pattern="edu" , x=emails)
hits

# Subset emails using hits
emails[hits]

[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE


### More on Regular Expressions
You can use the caret, <b>^</b>, and the dollar sign, <b>$</b>, to match content located at the start and end of a string, respectively. <br>
This could take us one step closer to a correct pattern for matching only the ".edu" email addresses from our list of emails. But there's more that can be added to make the pattern more robust:

- <b>@</b>, because a valid email must contain an at-sign.
- <b>.*</b>, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. You can use them to match any character between the at-sign and the ".edu" portion of an email address.
- <b>\\.edu$</b>, to match the ".edu" part of the email at the end of the string. The \\ part escapes the dot: it tells R that you want to use the . as an actual character.

In [18]:
# The emails vector
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use grepl() to match for .edu addresses more robustly
grepl(pattern="@.*\\.edu$" , x=emails)

# Use grep() to match for .edu addresses more robustly, save result to hits
hits <- grep(pattern="@.*\\.edu$" , x=emails)

# Subset emails using hits
emails[hits]

### sub() & gsub()
In sub() and gsub(), you can specify a replacement argument. <br>
Wherever the regular expression pattern is found in an element of the character vector x, the matched text is replaced with replacement.

- sub() only replaces the first match in each element.
- gsub() replaces all matches.
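
The difference between the two functions shows up when a string contains the pattern more than once. A minimal sketch, with `fruit` as an illustrative input:

```r
fruit <- "banana"

# sub() stops after the first match
first_only <- sub(pattern = "a", replacement = "o", x = fruit)
print(first_only)   # "bonana"

# gsub() replaces every match
all_matches <- gsub(pattern = "a", replacement = "o", x = fruit)
print(all_matches)  # "bonono"
```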

In [19]:
# The emails vector
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "global@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")

# Use sub() to convert the email domains to jp.edu
sub(pattern = "@.*\\.edu$" , replacement = "@jp.edu" , x=emails)

### More on Regular Expressions
- <b>.*</b>: A usual suspect! It can be read as "any character that is matched zero or more times".
- <b>\\s</b>: Match a whitespace character. The "s" is normally a literal character; escaping it (\\) makes it a metacharacter.
- <b>[0-9]+</b>: Match one or more digits (0 to 9); the + means "at least once".
- <b>([0-9]+)</b>: The parentheses are used to make parts of the matching string available to define the replacement. 

The \\1 in the replacement argument of sub() gets set to the string that is captured by the regular expression [0-9]+.

In [20]:
awards <- c("Won 1 Oscar.",
  "Won 1 Oscar. Another 9 wins & 24 nominations.",
  "1 win and 2 nominations.",
  "2 wins & 3 nominations.",
  "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
  "4 wins & 1 nomination.")

sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)

## Date and Time

In R, dates are represented by <b>Date</b> objects, while times are represented by <b>POSIXct</b> objects.<br> Under the hood, however, these dates and times are simple numerical values. <br>Date objects store the number of days since the 1st of January in 1970. <br>POSIXct objects on the other hand, store the number of seconds since the 1st of January in 1970.

The 1st of January in 1970 is the common origin for representing times and dates in a wide range of programming languages. This origin, known as the Unix epoch, is a convention rather than a technical necessity. <br>Of course, it's also possible to create dates and times before 1970; the corresponding numerical values are simply <b>negative</b> in this case.
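
The numeric representation can be inspected directly. The sketch below uses fixed dates and an explicit UTC time zone (both assumptions) so that the results do not depend on when or where it is run.

```r
# The origin itself is stored as 0 days
epoch <- as.Date("1970-01-01")
print(unclass(epoch))

# A date before the origin is stored as a negative number of days
day_before <- as.Date("1969-12-31")
print(unclass(day_before))

# One minute past the origin is stored as 60 seconds
one_minute <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
print(as.numeric(one_minute))
```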

In [21]:
# Get the current date
today <- Sys.Date()
today 

# See what today looks like under the hood
unclass(today)

# Get the current time
now <- Sys.time()
now 

# See what now looks like under the hood
unclass(now)

[1] "2019-05-22 13:48:50 UTC"

### Create and format dates
To create a Date object from a simple character string in R, you can use the <b>as.Date()</b> function.<br>
The character string has to obey a format that can be defined using a set of symbols (the examples correspond to 13 January, 1982):

- <b>%Y</b>: 4-digit year (1982)
- <b>%y</b>: 2-digit year (82)
- <b>%m</b>: 2-digit month (01)
- <b>%d</b>: 2-digit day of the month (13)
- <b>%A</b>: weekday (Wednesday)
- <b>%a</b>: abbreviated weekday (Wed)
- <b>%B</b>: month (January)
- <b>%b</b>: abbreviated month (Jan)

In [22]:
# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"

# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, format = "%b %d, '%y")
date2 <- as.Date(str2, format = "%Y-%m-%d")
date3 <- as.Date(str3, format = "%d/%B/%Y")

# Convert dates to formatted strings
format(date1, "%A")
format(date2, "%d")
format(date3, "%b %Y")

### Create and format times
Similar to working with dates, you can use <b>as.POSIXct()</b> to convert from a character string to a POSIXct object, and format() to convert from a POSIXct object to a character string.

- <b>%H</b>: hours as a decimal number (00-23)
- <b>%I</b>: hours as a decimal number (01-12)
- <b>%M</b>: minutes as a decimal number
- <b>%S</b>: seconds as a decimal number
- <b>%T</b>: shorthand notation for the typical format %H:%M:%S
- <b>%p</b>: AM/PM indicator

For a full list of conversion symbols, consult the <b>strptime</b> documentation using ```?strptime```.

In [23]:
# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"

# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2 <- as.POSIXct(str2)

# Convert times to formatted strings
format(time1, "%M")
format(time2, "%I:%M %p")