In [1]:
options(jupyter.rich_display = FALSE)

# Week 9: Fundamentals of R Programming II

## POP77001 Computer Programming for Social Scientists

### Tom Paskhalis

##### 8 November 2021

##### Module website: [bit.ly/POP77001](https://bit.ly/POP77001)

## Overview

- Conditional statements
- Loops and Iteration
- Function definition and function call
- Scoping in R
- Packages

<div style="text-align: center;">
    <img width="400" height="300" src="../imgs/winnie_the_pooh_assign.png">
</div>

## Algorithm flowchart

<div style="text-align: center;">
    <img width="400" height="300" src="../imgs/median_flowchart.png">
</div>

## Algorithm flowchart (R)

<div style="text-align: center;">
    <img width="400" height="300" src="../imgs/median_r_flowchart.png">
</div>

## Calculate median

In [2]:
a <- c(1,0,2,1) # Input vector (1-dimensional array)
a <- sort(a) # Sort vector
a

[1] 0 1 1 2

In [3]:
n <- length(a) # Calculate length of vector 'a'
n

[1] 4

In [4]:
m <- (n + 1) %/% 2 # Calculate mid-point, %/% is operator for integer division 
m

[1] 2

In [5]:
n %% 2 == 1 # Check whether the number of elements is odd, %% (modulo) gives remainder of division

[1] FALSE

In [6]:
mean(a[m:(m + 1)]) # Parentheses are needed: m:m+1 is parsed as (m:m)+1

[1] 1
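A note on operator precedence in the step above: in R the `:` operator binds more tightly than `+`, so `m:m + 1` selects a single element rather than the two middle ones. The sketch below uses a hypothetical vector `b` (not part of the original example) where the two spellings give different answers.

```r
m <- 2

m:m + 1      # parsed as (m:m) + 1, a single index: 3
m:(m + 1)    # the intended range of two indices: 2 3

# 'b' is an illustrative sorted vector of even length
b <- c(0, 1, 5, 9)

wrong <- mean(b[m:m + 1])    # mean of b[3] only
right <- mean(b[m:(m + 1)])  # mean of b[2] and b[3]

wrong  # 5
right  # 3
```

With `a <- c(0, 1, 1, 2)` both spellings return 1, because the two middle elements happen to be equal, which is why the difference is easy to miss.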

## Control flow in R

- *Control flow* is the order in which statements are executed or evaluated
- Main ways of control flow in R:
    - *Branching* (conditional) statements (e.g. `if`)
    - *Iteration* (*loops*) (e.g. `for`)
    - *Function calls* (e.g. `length()`)
    
Extra: [R documentation on control flow](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Control.html)

## Branching programs

<div style="text-align: center;">
    <img width="400" height="400" src="../imgs/branching_r_flowchart.png">
</div> 

## Conditional statements

<div style="text-align: center;">
    <img width="400" height="400" src="../imgs/branching_flowchart.png">
</div> 

## Conditional statements: `if`

- `if` - defines condition under which some code is executed 

```
if (<boolean_expression>) {
  <some_code>
}
```

In [7]:
a <- c(1, 0, 2, 1, 100)
a <- sort(a)
n <- length(a)
m <- (n + 1) %/% 2
if (n %% 2 == 1) {
  a[m]
}

[1] 1

## Conditional statements: `if - else`

- `if - else` - defines both condition under which some code is executed and alternative code to execute

```
if (<boolean_expression>) {
  <some_code>
} else {
  <some_other_code>
}
```

In [8]:
a <- c(1, 0, 2, 1)
a <- sort(a)
n <- length(a)
m <- (n + 1) %/% 2
if (n %% 2 == 1) {
  a[m]
} else {
  mean(a[m:(m + 1)]) # Parentheses are needed: m:m+1 is parsed as (m:m)+1
}

[1] 1

## Conditional statements: `if - else if - else`

- `if - else if - ... - else` - defines both condition under which some code is executed and several alternatives

```
if (<boolean_expression>) {
  <some_code>
} else if (<boolean_expression>) {
  <some_other_code>
} else if (<boolean_expression>) {
...
...
} else {
  <some_more_code>
}
```

## Example of longer conditional statement

In [9]:
x <- 42
if (x > 0) {
  print("Positive")
} else if (x < 0) {
  print("Negative")
} else {
  print("Zero")
}

[1] "Positive"


## Optimising conditional statements

- Branches of a conditional statement are evaluated in order, so it makes sense to test the most likely condition first

In [10]:
# Ask for user input and cast as double
num <- as.double(readline("Please, enter a number:"))
if (num %% 2 == 0) {
  print("Even")
} else if (num %% 2 == 1) {
  print("Odd")
} else {
  print("This is a real number")
}

Please, enter a number:43
[1] "Odd"


## Nesting conditional statements

- Conditional statements can be nested within each other
- But consider code legibility 📜, modularity ⚙️ and speed 🏎️ 

In [11]:
num <- as.integer(readline("Please, enter a number:")) # Ask for user input and cast as integer
if (num > 0) {
  if (num %% 2 == 0) {
    print("Positive even")
  } else {
    print("Positive odd")  
  }    
} else if (num < 0) {
  if (num %% 2 == 0) {
    print("Negative even") # Notice that odd/even check appears twice    
  } else {
    print("Negative odd") # Consider abstracting this as a function 
  }
} else {
  print("Zero")  
}

Please, enter a number:-43
[1] "Negative odd"


## `ifelse()` function

- R also provides a vectorized version of `if - else` construct
- It takes a vector as an input and returns another vector as an output

```
ifelse(<boolean_expression>, <if_true>, <if_false>)
```

In [12]:
num <- 1:10
num

 [1]  1  2  3  4  5  6  7  8  9 10

In [13]:
ifelse(num %% 2 == 0, "even", "odd")

 [1] "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even" "odd"  "even"
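Calls to `ifelse()` can also be nested to express more than two outcomes. A minimal sketch, using an illustrative input vector that is not part of the original example:

```r
# Illustrative input: one negative, one zero, one positive value
num <- c(-3, 0, 8)

# The inner ifelse() is only used where the outer condition is FALSE
sign_label <- ifelse(num > 0, "positive",
                     ifelse(num < 0, "negative", "zero"))
sign_label
```

As with nested `if` statements, deep nesting quickly hurts legibility, so this is best kept to two or three levels.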

## Iteration (looping)

<div style="text-align: center;">
    <img width="300" height="300" src="../imgs/iteration_flowchart.png">
</div> 

## Iteration: `while`

- `while` - defines a condition under which some code (loop body) is executed repeatedly

```
while (<boolean_expression>) {
  <some_code>
}
```

In [14]:
# Calculate a factorial with a decrementing counter
# E.g. 5! = 5 * 4 * 3 * 2 * 1 = 120
x <- 5
factorial <- 1
while (x > 0) {
  factorial <- factorial * x
  x <- x - 1
}
factorial

[1] 120

## Iteration: `for`

- `for` - defines elements and sequence over which some code is executed iteratively

```
for (<element> in <sequence>) {
  <some_code>
}
```

In [15]:
test <- c("t", "e", "s", "t")
for (i in test) {
  # cat() function concatenates and prints objects' representations
  cat(paste0(i, "!"), "")
}

t! e! s! t! 

## Iteration with conditional statements

In [16]:
# Find maximum value in a vector with exhaustive enumeration
v <- c(3, 27, 9, 42, 10, 2, 5)
max_val <- NA
for (i in v) {
  if (is.na(max_val) || i > max_val) { # || short-circuits, so the comparison with NA is skipped
    max_val <- i
  }
}
max_val

[1] 42

## Generating sequences for iteration

- `seq()` function that we encountered in subsetting can be used in looping
- As well as its cousins: `seq_len()` and `seq_along()`

```
seq(<from>, <to>, <by>)
seq_len(<length>)
seq_along(<object>)
```

## Generating sequences for iteration examples

In [17]:
# If by argument is omitted, it defaults to 1
s <- seq(1, 20)
s

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

In [18]:
# seq_len(<length>) is equivalent to seq(1, <length>) for positive lengths
seq_len(length(s))

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

In [19]:
seq_along(s)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

In [20]:
# The sequence that you are supplying to seq_along() doesn't have to be numeric
seq_along(letters[1:20])

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
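One reason to prefer `seq_along()` (or `seq_len()`) over `1:length(<object>)` is the behaviour for empty objects. The sketch below uses a hypothetical empty vector and counts how many times each loop body runs.

```r
empty <- character(0)

1:length(empty)   # counts down from 1 to 0: 1 0
seq_along(empty)  # an empty integer vector

# Loop body runs twice, even though there is nothing to iterate over
count_colon <- 0
for (i in 1:length(empty)) {
  count_colon <- count_colon + 1
}

# Loop body is never executed
count_along <- 0
for (i in seq_along(empty)) {
  count_along <- count_along + 1
}

count_colon  # 2
count_along  # 0
```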

## Generating sequences for iteration examples continued

In [21]:
# vector() function is useful for initializing empty vectors of known type and length
s2 <- vector(mode = "double", length = length(s))
for (i in seq_len(length(s))) {
    s2[i] <- s[i] * 2
}

In [22]:
s2

 [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

In [23]:
s3 <- vector(mode = "double", length = length(s))
for (i in seq_along(s)) {
    s3[i] <- s[i] * 3
}

In [24]:
s3

 [1]  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
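The loops above are useful for illustrating iteration, but since arithmetic in R is vectorized, the same result can usually be obtained without an explicit loop. A short sketch comparing the two approaches (the names `s2_loop` and `s2_vec` are illustrative):

```r
s <- seq(1, 20)

# Loop version: pre-allocate, then fill element by element
s2_loop <- vector(mode = "double", length = length(s))
for (i in seq_along(s)) {
  s2_loop[i] <- s[i] * 2
}

# Vectorized version: the multiplication is applied to every element at once
s2_vec <- s * 2

identical(s2_loop, s2_vec)
```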

## Iteration: `break` and `next`

- `break` - terminates the loop in which it is contained
- `next` - skips the rest of the current iteration and moves on to the next one

In [25]:
for (i in seq(1,6)) {
  if (i %% 2 == 0) {
    break
  }
  print(i)
}

[1] 1


In [26]:
for (i in seq(1,6)) {
  if (i %% 2 == 0) {
    next
  }
  print(i)
}

[1] 1
[1] 3
[1] 5
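`break` and `next` work in `while` loops as well. A small sketch with made-up tasks: the first loop stops as soon as a multiple of 7 is found, the second one skips even numbers when summing.

```r
# Find the first multiple of 7 above 50
candidate <- 51
while (TRUE) {
  if (candidate %% 7 == 0) {
    break  # Leave the loop entirely
  }
  candidate <- candidate + 1
}
candidate  # 56

# Sum only the odd numbers between 1 and 6
odd_sum <- 0
for (i in seq(1, 6)) {
  if (i %% 2 == 0) {
    next  # Skip the addition below for even numbers
  }
  odd_sum <- odd_sum + i
}
odd_sum  # 9
```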


## Functions in R

- Function call is the centerpiece of computation in R
- It involves function object and objects that are supplied as arguments
- In R we use function `function()` to create a function object
- Functions are also referred to as *closures* in some R documentation

```
<function_name> <- function(arg_1, arg_2, ..., arg_n) {
  <function_body>
}
```

In [27]:
foo <- function(arg) {
  # <function_body>
}

## Function components

- Body (`body()`) - code inside the function
- List of arguments (`formals()`) - controls how function is called
- Environment/scope/namespace (`environment()`) - location of function's definition and variables

## Function components example

In [28]:
is_positive <- function(num) {
  if (num > 0) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

In [29]:
body(is_positive)

{
    if (num > 0) {
        return(TRUE)
    }
    else {
        return(FALSE)
    }
}

In [30]:
formals(is_positive)

$num



In [31]:
environment(is_positive)

<environment: R_GlobalEnv>

## Function call

- Function is executed until:
    - Either `return()` function is encountered
    - There are no more expressions to evaluate
- Function call always returns a value:
    - Argument of `return()` function call
    - Value of last expression if no `return()` (implicit return)
- Function can return only one object
    - But you can combine multiple R objects in a list
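As a sketch of the last point, a function can bundle several results in a named list. The function name `describe_vector` and its fields are illustrative, not part of the original material.

```r
# Return several summary values as one list object
describe_vector <- function(v) {
  list(
    minimum = min(v),
    maximum = max(v),
    n = length(v)
  )
}

res <- describe_vector(c(3, 27, 9, 42))
res$minimum  # 3
res$maximum  # 42
res$n        # 4
```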

## Function call example

In [32]:
is_positive <- function(num) {
  if (num > 0) {
    res <- TRUE
  } else {
    res <- FALSE
  }
  return(res)
}

In [33]:
res_1 <- is_positive(5)
res_2 <- is_positive(-7)

In [34]:
print(res_1)
print(res_2)

[1] TRUE
[1] FALSE


## Implicit return example

In [35]:
is_positive <- function(num) {
  if (num > 0) {
    res <- TRUE
  } else {
    res <- FALSE
  }
  res
}

In [36]:
res_1 <- is_positive(5)
res_2 <- is_positive(-7)

In [37]:
print(res_1)
print(res_2)

[1] TRUE
[1] FALSE


## Implicit return example continued

In [38]:
# This function returns the same values as the two versions above,
# but only because an assignment invisibly returns the assigned value.
# Relying on this is bad programming style: the return value is very unintuitive
is_positive <- function(num) {
  if (num > 0) {
    res <- TRUE
  } else {
    res <- FALSE
  }
}

In [39]:
res_1 <- is_positive(5)
res_2 <- is_positive(-7)

In [40]:
print(res_1)
print(res_2)

[1] TRUE
[1] FALSE
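What makes this style unintuitive is that the value of an assignment is returned *invisibly*. The sketch below uses a simplified, hypothetical variant of the function (`is_positive_silent`) and base R's `withVisible()` to inspect the result.

```r
# Last expression is an assignment, so the value is returned invisibly
is_positive_silent <- function(num) {
  res <- num > 0
}

is_positive_silent(5)  # Nothing is printed at the console

out <- withVisible(is_positive_silent(5))
out$value    # TRUE
out$visible  # FALSE

print(is_positive_silent(5))  # Explicit print() still shows the value
```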


## Function arguments

- *Arguments* provide a way of giving input to a function
- Arguments in function definition are *formal arguments*
- Arguments in function invocations are *actual arguments*
- When a function is invoked (called) arguments are matched and bound to local variable names
- R matches arguments in 3 ways:
    1. by *exact name*
    2. by *partial name*
    3. by *position*
- It is a good idea to use positional (unnamed) matching only for the main (first one or two) arguments and to name the rest
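Partial matching works only when the abbreviation identifies exactly one formal argument. A sketch with a hypothetical function `scale_values()`, whose formal arguments `values` and `verbose` share the prefix `v`:

```r
scale_values <- function(values, verbose = FALSE, multiplier = 2) {
  if (isTRUE(verbose)) {
    message("Scaling by ", multiplier)
  }
  values * multiplier
}

# 'mult' uniquely identifies 'multiplier', so partial matching succeeds
scaled <- scale_values(c(1, 2), mult = 10)
scaled  # 10 20

# 'v' matches both 'values' and 'verbose', so R raises an error;
# tryCatch() stores the error message instead of stopping the script
ambiguous <- tryCatch(
  scale_values(v = c(1, 2)),
  error = function(e) conditionMessage(e)
)
ambiguous
```

Spelling argument names out in full avoids this kind of ambiguity and keeps calls readable.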

## Function arguments example

In [41]:
format_date <- function(day, month, year, reverse = TRUE) {
  if (isTRUE(reverse)) {
    formatted <- paste(
      as.character(year), as.character(month), as.character(day), sep = "-"
    )
  } else {
    formatted <- paste(
      as.character(day), as.character(month), as.character(year), sep = "-"
    )
  }
  return(formatted)
}

In [42]:
format_date(4, 10, 2021)

[1] "2021-10-4"

In [43]:
format_date(y = 2021, m = 10, d = 4) # Works through partial name matching, but rather unintuitive

[1] "2021-10-4"

In [44]:
format_date(y = 2021, m = 10, d = 4, FALSE) # FALSE is matched by position to 'reverse'; correct, but unintuitive

[1] "4-10-2021"

In [45]:
format_date(day = 4, month = 10, year = 2021, FALSE)

[1] "4-10-2021"

## Nested functions

In [46]:
which_integer <- function(num) {
  even_or_odd <- function(num) {
    if (num %% 2 == 0) {
      return("even")
    } else {
      return("odd")
    }
  }
  eo <- even_or_odd(num)
  if (num > 0) {
    return(paste0("positive ", eo))
  } else if (num < 0) {
    return(paste0("negative ", eo))
  } else {
    return("zero")
  }
}

In [47]:
which_integer(-43)

[1] "negative odd"

In [48]:
even_or_odd(-43)

ERROR: Error in even_or_odd(-43): could not find function "even_or_odd"


## R environment basics

- Variables (aka names) exist in an *environment* (aka namespace/scope in Python)
- The same R object can have different names
- Binding of objects to names (assignment) happens within a specific environment
- Most environments get created by function calls
- Approximate hierarchy of environments:
    - *Execution* environment of a function
    - *Global* environment of a script
    - *Package* environment of any loaded packages
    - *Base* environment of base R objects

## R environment example

In [49]:
x <- 42
# is equivalent to:
# Binding R object '42', double vector of length 1, to name 'x' in the global environment
assign("x", 42, envir = .GlobalEnv)
x

[1] 42

In [50]:
x <- 5
foo <- function() {
  x <- 12
  return(x)
}
y <- foo()
print(y)
print(x)

[1] 12
[1] 5
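The hierarchy of environments also determines where R looks for a name that is *not* defined inside a function. A minimal sketch with two hypothetical functions: `read_x()` has no local `x` and falls back on the enclosing environment, while `shadow_x()` creates its own local binding.

```r
x <- 5

# No local 'x': R finds the one in the environment where the function was defined
read_x <- function() {
  x
}

# Local 'x' shadows the outer one for the duration of the call
shadow_x <- function() {
  x <- 12
  x
}

read_x()    # 5
shadow_x()  # 12
x           # 5, the outer binding is unchanged
```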


##  Every operation is a function call

<div style="text-align: center;">
    <img width="700" height="700" src="../imgs/rstats_function.png">
</div>


## Examples of operators as function calls

In [51]:
`+`(3, 2) # Equivalent to: 3 + 2

[1] 5

In [52]:
`<-`(x, c(10, 12, 14)) # x <- c(10, 12, 14)
x

[1] 10 12 14

In [53]:
`[`(x, 3) # x[3]

[1] 14

In [54]:
`>`(x, 10) # x > 10

[1] FALSE  TRUE  TRUE
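Because operators are ordinary functions, they can be passed as arguments to other functions. A short sketch using base R's `Reduce()` and `sapply()`; the list `rows` is made up for illustration.

```r
# Fold `+` over a vector: ((((1 + 2) + 3) + 4) + 5)
total <- Reduce(`+`, 1:5)
total  # 15

# Use `[` to extract the second element of every vector in a list
rows <- list(c(10, 12, 14), c(1, 2, 3))
second <- sapply(rows, `[`, 2)
second  # 12 2
```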

## Anonymous functions

- R has no dedicated keyword for anonymous functions (aka lambda in Python), though R 4.1.0 added the `\(x)` shorthand for `function(x)`
- The result of `function()` does not have to be assigned to a variable
- Thus `function()` can be easily incorporated into other function calls

In [55]:
add_five <- function() {
  return(function(x) x + 5)
}
af <- add_five()

In [56]:
af # 'af' is just a function, which is yet to be invoked (called)

function(x) x + 5
<environment: 0x5570df70f380>

In [57]:
af(10) # Here we call a function and supply 10 as an argument

[1] 15

In [58]:
# Since arithmetic in R is vectorized, this example is obvious overkill (seq(10) ^ 2 would do just fine),
# but it shows a general approach for cases where we need to apply a non-vectorized function
mapply(function(x) x ^ 2, seq(10))

 [1]   1   4   9  16  25  36  49  64  81 100
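Two further sketches of anonymous functions in use: one passed to base R's `Filter()`, and one defined and called in a single expression. The variable names are illustrative.

```r
# Keep only the elements for which the anonymous function returns TRUE
evens <- Filter(function(v) v %% 2 == 0, 1:10)
evens  # 2 4 6 8 10

# Define a function and call it immediately, without ever naming it
immediate <- (function(x) x + 5)(10)
immediate  # 15
```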

## Packages

- A program can access the functionality of a package using the `library()` function
- Every package has its own namespace (which can be accessed with `::`)

```
library(<package_name>)
<package_name>::<object_name>
```

## Package loading example

In [59]:
# Package 'Matrix' ships with standard R distributions as a recommended package and doesn't have to be installed separately
library("Matrix")

In [60]:
# While it is possible to just use function sparseVector() after loading the library,
# it is good practice to state explicitly which package the object is coming from.
sv <- Matrix::sparseVector(x = c(1, 2, 3), i = c(3, 6, 9), length = 10)

In [61]:
sv

sparse vector (nnz/length = 3/10) of class "dsparseVector"
 [1] . . 1 . . 2 . . 3 .
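The `::` operator also works without a preceding `library()` call, as long as the package is installed. A minimal sketch using `tools`, a package that ships with R; the file name is made up for illustration.

```r
# Call a function from the 'tools' namespace without attaching the package
ext <- tools::file_ext("survey_results.csv")
ext  # "csv"
```

This keeps the search path short and makes it obvious where each function comes from.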

## Next

- Tutorial: Implementing conditional statements and functions
- Assignment 4: Due at 11:00 on Monday, 15th November (submission on Blackboard)
- Next week: Data wrangling in R