# Introduction to R Programming: Loops, Conditionals, & Functions

Matthew D. Turner, PhD  
Georgia State University

Some rights reserved: [cc by-nc-sa](https://creativecommons.org/licenses/by-nc-sa/4.0/). See the bottom of the document for details.
***
# Demonstrations: Loops, Decisions, and Functions
The processes demonstrated here will allow R programs to:

+ Repeat things many times, changing exactly what is done each time
+ Make decisions based on the values of variables
+ Encapsulate blocks of code

We will look at each of these in turn using particularly simple examples.

## 0. ASIDE: `cat`
We will be looking at output printed to the screen, so we need a command that prints. The main one is `cat`.

In [None]:
A <- 10          # Number
B <- "Hello"     # "String" (alphanumeric) variable
C <- 'there'     # Strings can be made with different quotes

In [None]:
cat(A, B, C)

In [None]:
cat(A, B, C)
cat(A, A, A)

Notice that when you call `cat` more than once, the output just keeps being added to the same line.

In [None]:
cat(A, B, C, "\n")   # The string "\n" is a NEWLINE
cat(A, A, A)

There are several **special characters** that are used to format output nicely:

+ \\n means **newline**
+ \\t means **tab**, used for aligning columns

These special characters need to be quoted as strings to be used with `cat`.

In [None]:
cat(A, "\n")
cat(B, "\n")
cat(C, "\n")

By default `cat` separates values by one space; we can use other things. There is a parameter we can set called `sep`.

In [None]:
cat(A, B, C, sep = "\n")  # sep defaults to a space, but can be changed

In [None]:
cat(A, B, C, sep = ",")  # A good trick for making CSVs

In [None]:
cat(A, B, C, "\n", sep = "\t")  # "\t" is a tab
cat(A, A, A, "\n", sep = "\t")
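
`cat` can also send its output to a file instead of the screen, using its `file` and `append` arguments. The sketch below writes the two rows from above as comma-separated lines and reads them back. The temporary file and the names `out_file` and `csv_lines` are assumptions made for illustration only.

```r
A <- 10
B <- "Hello"
C <- "there"

out_file <- tempfile(fileext = ".csv")   # hypothetical output location

# First row: sep = "," puts commas BETWEEN the values only
cat(A, B, C, sep = ",", file = out_file)
cat("\n", file = out_file, append = TRUE)    # end the line

# Second row: append = TRUE adds to the file instead of overwriting it
cat(A, A, A, sep = ",", file = out_file, append = TRUE)
cat("\n", file = out_file, append = TRUE)

csv_lines <- readLines(out_file)   # read the file back to check it
csv_lines
```

The newline is written in a separate call so that `sep` does not place a comma before it.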

## 1. Looping (Iteration)

### 1.1 Introduction to `for` Loops
Looping is just repeating instructions over and over again.

> Broadly speaking, there are two types of these special constructs or loops in modern programming languages. **Some loops execute for a prescribed number of times, as controlled by a counter or an index, incremented at each iteration cycle.** These are part of the for loop family.
>
>On the other hand, **some loops are based on the onset and verification of a logical condition. The condition is tested at the start or the end of the loop construct.** These variants belong to the while or repeat family of loops, respectively. ([From here](https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r))


[R supports both of these types of loops](https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r), but we will spend most effort today on the first type. See section 2.6 below for the second kind of loop.

In R, the first kind of loop is implemented as a **for loop**. 

```
for(variable in LIST) {
    do something here!
}
```

The name "for" is a poor choice; "foreach" would have been more accurate. (Some languages do call it that.)

Note: By tradition the index variable in a loop is most often called i, j, or k, but this is not required. You can use any variable name you wish.

In [None]:
for(i in 1:5){
    cat("Do something here!\n")
}

Although loops can do the same thing over and over again, the key to their usefulness is that they have a variable (`i` above) that changes on each pass through the process. As this variable changes, it can be used to change other things, too.

In [None]:
for(i in 1:5){
    cat("Loop iteration:\t", i, "\n")
}

The key thing to understand about a loop is that the variable that is changing (`i`) is what you use to change what happens on each pass.

In [None]:
# Make a table of the first 10 numbers and their squares

for(i in 1:10) {
  cat(i, i*i, "\n", sep='\t')
}

In [None]:
numbers <- c(1, 3, 5, 7, 100)

for(i in numbers){
    cat(i, i*i, "\n", sep='\t')
}

We do not need to limit `i` to holding numbers; it can hold other sorts of things, like strings.

In [None]:
names <- c('condition1','condition2','condition3','condition4')

for(i in names){
    cat(i, "\n")
}

#### 1.1.1 ASIDE: `cat` (Again!)

In [None]:
# What if we wanted to use commas?

for(i in names){
    cat(i, sep = ",")
}

Note that `cat`'s `sep` value is only used to separate multiple things given to `cat` _at the same time_. It is not automatically placed after (or before) items in independent calls to `cat`. So if we want a list with commas, we need to do something like this:

In [None]:
for(i in names){
    cat(i, ",")
}

Annoyingly, however, `cat` **does** place the `sep` character between the value of `i` and the string `","`. So we need to set `sep` to the empty string `""` to essentially turn it off:

In [None]:
for(i in names){
    cat(i, ", ", sep = "")  # Note the 2 changes here!
}

Note that this trick adds an extra comma at the end of the list. While this is not pretty, it usually does not mess anything up. If you use this method to send output to a CSV file, things will generally still work, although many CSV readers treat the trailing comma as an extra, empty column.
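
If the trailing comma matters, one possible alternative is to join the items before printing them. The sketch below uses `paste` with its `collapse` argument. The variable names `conds` and `one_line` are made up for this illustration.

```r
conds <- c("condition1", "condition2", "condition3", "condition4")

# paste() with collapse joins all elements into ONE string,
# placing the separator only BETWEEN items (no trailing comma)
one_line <- paste(conds, collapse = ", ")
cat(one_line, "\n")

# cat() does the same when given the whole vector in a single call
cat(conds, sep = ", ")
cat("\n")
```

Both versions avoid the loop entirely, because `sep` and `collapse` apply between the elements of a vector passed in one call.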

### 1.2 Two Statistical Examples Using Loops
Let's look at some statistical examples. As usual we will use the **data frame** `mtcars`.

We want to look and see how some other variables affect Miles per Gallon, `mpg`, _individually_.

#### Graphs
Sometimes we want to see how variables are related with graphs.

In [None]:
data(mtcars)

In [None]:
# Let's look at the first 4 variables in mtcars

summary(mtcars[1:4])
head(mtcars[1:4])
dim(mtcars[1:4])

An easy way to get an overview plot is to apply `plot` to the entire data frame, but this can be hard to read!

In [None]:
plot(mtcars)

The only real response (y) variable in this data set is `mpg` so we really just want the first row of this plot. Since that is so small, we will need to make each of these plots individually. A `for` loop will help us here.

To work things out we will make the plot of `mpg ~ hp` or miles-per-gallon on horsepower.

First, we introduce a method to get variable names from `mtcars`. 

In [None]:
names(mtcars)

In [None]:
# names is a vector, and we can pick out column names

n <- names(mtcars)
n[1]
n[4]

In [None]:
# Note that we do not need to reassign names(mtcars) to a variable

names(mtcars)[4]

In [None]:
# Most basic plot (in HUMAN form!) of MPG on HP

options(repr.plot.width = 3, repr.plot.height = 3)  # Set the figure size 

plot(mtcars$hp, mtcars$mpg, pch = 16)

In [None]:
# By column number

head(mtcars[1:4])  # Reminder

plot(mtcars[,4], mtcars[,1], pch = 16)

We lost the names of our variables, but we know how to get them back.

In [None]:
plot(mtcars[,4], mtcars[,1], xlab = names(mtcars)[4], 
     ylab = names(mtcars)[1], pch = 16)

In [None]:
options(repr.plot.width = 6, repr.plot.height = 3)

par(mfrow=c(1,3))  # Tell R to make a 1 by 3 array of plots

# The only change below this is to convert "4" into "i"

for(i in 2:4){
    plot(mtcars[,i], mtcars[,1], xlab = names(mtcars)[i], 
         ylab = names(mtcars)[1], pch = 16)
}

In [None]:
# mtcars has how many variables in it?

dim(mtcars)      # Returns rows and columns
length(mtcars)   # For a data frame, returns the columns only (different for other objects)

In [None]:
# Show all possible plots for MPG in mtcars

options(repr.plot.width = 6, repr.plot.height = 12)  # Make this bigger!

par(mfrow=c(5,2))  # Tell R to make a 5 by 2 array of plots

# Only one change below:

for(i in 2:11){
    plot(mtcars[,i], mtcars[,1], xlab = names(mtcars)[i], 
         ylab = names(mtcars)[1], pch = 16)
}

Not pretty, but it is a quick way to repeat an analysis for each column in `mtcars`.
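
The same loop can also run over column names instead of column numbers, which avoids hard-coding `2:11`. This is only a sketch: the names `predictors` and `v` and the smaller margins set through `mar` are choices made for this illustration, not part of the analysis above.

```r
data(mtcars)

# Every column name except the response variable
predictors <- setdiff(names(mtcars), "mpg")

# 5 by 2 array of plots; smaller margins so each panel has room
old_par <- par(mfrow = c(5, 2), mar = c(4, 4, 1, 1))

for(v in predictors){
    # mtcars[[v]] picks out a column by its name
    plot(mtcars[[v]], mtcars$mpg, xlab = v, ylab = "mpg", pch = 16)
}

par(old_par)   # restore the previous plot settings
```

Because the loop variable already holds the column name, it can be used directly as the axis label.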

#### Slopes of Regression Lines
What about the slopes of the regression lines?

In [None]:
# Fit linear regression of MPG on HP (HUMAN form)

mod1 <- lm(mtcars$mpg ~ mtcars$hp)

coef(mod1)    # This function returns the intercept and slope

In [None]:
coef(mod1)[2]    # Slope for horsepower; coef is just like names above

Stop here for a moment and notice something very important: in the expression `coef(mod1)[2]`, the `2` picks out the second element of the vector returned by `coef(mod1)`. This `2` refers to the slope, which is **always** the second item in this vector when the model has a single predictor. In the code below, this `2` must not change as `i` changes.

In [None]:
cat(names(mtcars)[4], coef(mod1)[2], "\n")

We can put all of this together using the numerical way to access columns.

In [None]:
mod1 <- lm(mtcars[,1] ~ mtcars[,4])          # Fit the model
cat(names(mtcars)[4], coef(mod1)[2], "\n")   # Print the slope

In [None]:
for(i in 2:11){
    mod1 <- lm(mtcars[,1] ~ mtcars[,i])
    cat(names(mtcars)[i], coef(mod1)[2], "\n", sep = "\t")
}
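
Printing is fine for a quick look, but the slopes are often needed later. A minimal sketch of one way to keep them is to fill a named vector inside the loop. The name `slopes` is an assumption for this example.

```r
data(mtcars)

slopes <- numeric(0)   # empty numeric vector to fill in

for(i in 2:11){
    mod <- lm(mtcars[,1] ~ mtcars[,i])
    # Store the slope under the name of the column that produced it
    slopes[names(mtcars)[i]] <- coef(mod)[2]
}

slopes          # all ten slopes, labeled by variable
slopes["hp"]    # look one up by name
```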

### 1.3 More Advanced: Loops Over Functions
The following example is a bit more advanced, but when you have more experience it shows you something **very** powerful. Study the documentation for R's `do.call` function (and tutorials online) for much more.

In [None]:
# ADVANCED: The do.call function can call a function by name.
#           However, the arguments passed to the function by
#           do.call **must** be an R list! This is not a problem
#           as most things can be made into a list by the "list"
#           function. Read ?do.call for details.

what2do <- c("mean", "median", "var", "sd")

x <- c(1,23,45,67,88,100,101)

for(i in what2do){
    cat(i, do.call(i, list(x)), "\n", sep="\t")  # Note the list function
}

One thing to notice here is that loops can be made to work over data sets, over variables in data frames, over files, over lists, and finally over lists of functions. This is a single abstraction that allows repetition over almost any construct in R.
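
The example above loops over the *names* of functions. As a sketch of the "lists of functions" idea, the functions themselves can be stored in a named list and called directly, without `do.call`. The list name `summaries` is made up for this illustration.

```r
x <- c(1, 23, 45, 67, 88, 100, 101)

# A named list whose elements are the functions themselves
summaries <- list(mean = mean, median = median, var = var, sd = sd)

for(nm in names(summaries)){
    f <- summaries[[nm]]              # pull one function out of the list
    cat(nm, f(x), "\n", sep = "\t")   # call it like any other function
}
```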

## 2. Conditionals and Decisions

### 2.1 Introduction
We need to allow R to make decisions based on the values of variables. We have been doing this in a simple way already. The main construct here is:

```
if(x == A){
    when comparison is TRUE do something
} else {
    when it is FALSE do a different thing
}
```

To make this work we need to know how to use R's comparison operators.

### 2.2 Comparisons
Remember the comparisons (`<`, `>`, `<=`, `>=`, `==`, and `!=`) from last time:

In [None]:
1 == 1     # Comparison uses == NOT = 

In [None]:
1 == 2     
1 != 2     # != means "is NOT equal to"

In [None]:
1 < 100

In [None]:
1 > 100

### 2.3 `if`-`else` Structures
These comparisons are used in conditional (if-else) statements.

In [None]:
if(1 == 1){
    cat("one equals one")
} else {
    cat("not so much")
}

In [None]:
x <- 33

if(x == 1){
    cat("x is equal to 1")
} else {
    cat("x is NOT equal to 1")
}

In [None]:
# If we don't want to do anything when the test fails, we can skip the ELSE

x <- 33

if(x == 1){
    cat("x is equal to 1")
}

### 2.4 `if` - `else` `if` - `else` Structures
We can also chain things together to make several comparisons. This is called an if-else-if. We literally just place the next `if` statement after the first `else`.

In math, there is a thing called the **Law of Trichotomy** which says that all (real) numbers are either **negative**, **positive**, or equal to **zero**. 

We can write this as an if-else if-else structure.

In [None]:
x <- 0

if (x < 0) {
    cat("Negative number")
} else if (x > 0) {
    cat("Positive number")
} else {
    cat("Zero")
}

### 2.5 Vector Form of `if`-`else` (Advanced)
Note that `if`-`else` statements and their kin only apply to single values. (So `x` above needs to be just one number.) However there is a special function that can be used to work with vector quantities, but it has some limits. The form is:

```r
ifelse(test_expression, x, y)
```

If the `test_expression` is `TRUE` then the value `x` is returned; if `test_expression` is `FALSE` then the value `y` is returned. This is applied to each element of a vector in turn, and the result is a vector of `x`'s and `y`'s. Note that in place of either `x` or `y` you can place another function to be evaluated.

In [None]:
x <- c(-100, 2, 10, -99)

ifelse(x > 0, 1, -1)

In [None]:
x <- c(1, 0, -1)

ifelse(x > 0, 1, -1)  # Note what happens to zero!

We can also make a vector form of the law of trichotomy. For this version, we want to get `-1` when the input number is negative, `1` when the input is positive, and `0` when the input is zero.

In [None]:
# This is a slightly more ADVANCED example, so if it is not 
# clear at first, don't worry about it

x <- c(1, 0, -1)

ifelse(x > 0, 1, ifelse(x < 0, -1, 0))  # Note what happens to zero!
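
`ifelse` is handy for recoding a whole column of a data frame at once. The sketch below labels each car in `mtcars` by its fuel economy. The cutoff of 20 and the name `efficiency` are arbitrary choices for illustration.

```r
data(mtcars)

# One label per car: "high" when mpg is above the cutoff, "low" otherwise
efficiency <- ifelse(mtcars$mpg > 20, "high", "low")

head(efficiency)    # first few labels
table(efficiency)   # how many cars fall in each group
```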

### 2.6  `while` Loops
Now that we have conditionals worked out, we can look at another kind of loop. `for` loops are good for tasks where you know in advance how many times you want to do them, but `while` loops use a conditional test to decide when to quit.

```
while(TEST_CONDITION){
  Do something here
  Update something or go on forever!
}
```

In [None]:
i <- 0

while(i <= 100){
    cat(i, "\n")
    i <- i + 10
}

In [None]:
# Equivalent for loop for the basic case

for(i in seq(0, 100, by = 10)){
    cat(i, "\n")
}

Note that in the `for` loop we have built the value of `i` into the definition. What if we change `i` to a different value like `22`?

Note the differences between these two loops: In the `for` loop we need to know the starting value of `i` to write the loop. In the `while` loop version, we could actually accept **any** value of `i` as our start and the loop would simply pick up at that value and increment by 10 on each pass. `while` loops are better for processes where the steps are not all known when you start.

In principle, however, you can usually make any of the basic looping structures do any sort of loop you desire. But the different types of loops make certain forms easier.
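
To make the point about starting values concrete, here is the `while` loop from above started at `22`. Only the first line changes. The vector `visited` is an addition for this sketch, used to record which values the loop passed through.

```r
i <- 22           # a different starting value
visited <- c()    # record each value of i as we go

while(i <= 100){
    cat(i, "\n")
    visited <- c(visited, i)
    i <- i + 10
}

visited
```

To get the same behavior from the `for` loop, the start would have to be written into the sequence itself, as in `seq(22, 100, by = 10)`.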


## 3. Functions
### 3.1 Introduction to Functions
Functions wrap up little bits of code to do things. This can be as simple as defining a z-score or as complicated as computing and returning regression coefficients (or much more).

The basic form of a function definition looks like this:

```r
name_of_function <- function(INPUT){
    do various things here that use INPUT
    return(OUTPUT)
}
```
The main feature of a function is that it takes an INPUT (or INPUTs), using these it does a computation, then it gives back ("returns") an OUTPUT, usually using the `return` function. What happens in between INPUT and OUTPUT happens inside of a sealed container. 

Once a function is defined, it is used by typing its name followed by its INPUT in parentheses:

```r
name_of_function(INPUT)
```
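
As a concrete instance of this template, here is a sketch of the z-score function mentioned at the start of this section. It assumes the usual definition: subtract the mean, then divide by the standard deviation. The names `zscore`, `scores`, and `z` are made up for illustration.

```r
zscore <- function(x){
    result <- (x - mean(x)) / sd(x)   # center, then scale
    return(result)
}

scores <- c(2, 4, 4, 4, 5, 5, 7, 9)
z <- zscore(scores)

z
mean(z)   # essentially zero
sd(z)     # one
```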

### 3.2 Examples

#### 3.2.1 The Square Function
By tradition one of the first functions people usually learn to program is the "square" function:

In [None]:
square <- function(x){
    result <- x*x
    return(result)
}

In [None]:
square(5)
square(10)

In [None]:
short.square <- function(x){
    return(x*x)
}

In [None]:
short.square(2)

Remember the table of squares from above?

In [None]:
for(i in 1:10) {
  cat(i, i*i, "\n", sep='\t')
}

In [None]:
for(i in 1:10) {
  cat(i, square(i), "\n", sep='\t') # Each time through square gets a new value
}

Clearly for this example it does not make things simpler. We usually use functions to wrap up more complex tasks.

Notice a few things:

+ Functions use **round** brackets or parentheses: `(` and `)`, **not** square ones for the INPUT.
+ Functions use the **curly brackets** `{` and `}` to surround their instructions.
+ When **defining** a function, you use a variable (like `x` above). This variable is replaced with whatever number you give as input to the function.
+ When you give a variable as input to a function, its **value** is what is used as input. Its name does not matter. (In programming and logic this is a **free parameter**, see [here](https://en.wikipedia.org/wiki/Free_variables_and_bound_variables).)

Additionally, 
+ Any names used inside a function do not affect things outside of the function, even when they are the same!

Here is an example of that last point. Note that `f` does not do anything really useful.

In [None]:
f <- function(x){
    A <- 10         # Set the variable A equal to 10
    return(A*x)     # Return A times x
}

In [None]:
A <- 222     # Use the same name as used inside of f

f(100)

A            # Note A's value OUTSIDE of f is unchanged

The example above shows the [concept of "scope"](https://en.wikipedia.org/wiki/Scope_%28computer_science%29) as the term is used in computer science.

#### 3.2.2 The `mtcars` Slope Function

Recall the example of obtaining the slopes from `mtcars` above:

In [None]:
i <- 3    # Pick one of the columns from mtcars

mod1 <- lm(mtcars[,1] ~ mtcars[,i])          # Fit the model
coef(mod1)[2]  

In [None]:
mtcars_slope <- function(i){
    mod1 <- lm(mtcars[,1] ~ mtcars[,i])
    return(coef(mod1)[2])                # Just the slope (its name stays attached)
}

In [None]:
mtcars_slope(3)  # Sometimes R carries other junk along with numbers

In [None]:
# Here is the loop from above, with the new function
# Compare this with the version above. It should be easier
# to understand.

for(i in 2:11){
    cat(names(mtcars)[i], mtcars_slope(i), "\n", sep = "\t")
}
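
`mtcars_slope` only works for `mtcars` with `mpg` as the response. A possible generalization, sketched below, takes the data frame and the variable names as inputs. The function name `slope_on` and its argument names are hypothetical. `reformulate` is a base R function that builds a formula from strings, and `unname` strips the label that `coef` attaches.

```r
slope_on <- function(data, response, predictor){
    f <- reformulate(predictor, response = response)   # builds e.g. mpg ~ hp
    mod <- lm(f, data = data)
    return(unname(coef(mod)[2]))                        # slope only, no label
}

data(mtcars)

slope_on(mtcars, "mpg", "hp")

for(v in c("cyl", "disp", "hp")){
    cat(v, slope_on(mtcars, "mpg", v), "\n", sep = "\t")
}
```

Passing the data frame as an input means the function no longer relies on a variable defined outside of it.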

#### 3.2.3 Trichotomy Function
Let's say we want to wrap up the trichotomy law from above into a function called `trichotomy` that works on individual numbers. This function will return the number `1` for input that is positive, `-1` for negatives, and `0` for inputs of zero. It can be implemented using `if`-`else if`-`else`.

In [None]:
trichotomy <- function(x){
    if (x < 0) {
        return(-1)
    } else if (x > 0) {
        return(1)
    } else {
        return(0)
    }
}

In [None]:
# Test

trichotomy(100)
trichotomy(-3.33)
trichotomy(0.000)

In [None]:
# This is used in real work; R implements it as the
# sign function:

sign(100)
sign(-3.33)
sign(0.000)
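
One difference is worth noting: `trichotomy` is built from `if`, so it handles one number at a time, while `sign` accepts a whole vector. A sketch of bridging the gap with a `for` loop follows. The names `values`, `results`, and `k` are chosen just for this example.

```r
trichotomy <- function(x){
    if (x < 0) {
        return(-1)
    } else if (x > 0) {
        return(1)
    } else {
        return(0)
    }
}

values  <- c(-2.5, 0, 7)
results <- numeric(length(values))   # one slot per input value

for(k in seq_along(values)){
    results[k] <- trichotomy(values[k])   # apply to one element at a time
}

results
sign(values)   # the built-in gives the same answer in one call
```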

#### 3.2.4 Skipping Elements in a List
In the following example we will use R's modulus operator. This is written as `%%` and it gives the remainder after integer (whole number) division.

In [None]:
10 %% 3   # 10 = 3 * 3 + 1

In [None]:
12 %% 4   # 12 = 3 * 4 + 0

The definition of an **even number** is that there is a remainder of 0 when the number is divided by 2.

In [None]:
  2 %% 2   # Evens
100 %% 2
 38 %% 2
233 %% 2   # Odds
  3 %% 2
 77 %% 2

In [None]:
for(i in 1:100){
    cat(i, "\t")
}

In [None]:
# Now we can combine this with an if to only print even numbers:

for(i in 1:100){
    if(i %% 2 == 0){
        cat(i, "\t")
    }
}

In [None]:
# Slightly more advanced trick: the modulus operator can test for divisibility
# by other values, too. Let's say we want to print only 5 values per line.
# Then when i %% 5 == 0, we need to make cat move to the next line.

for(i in 1:100){
    cat(i, "\t")
    if(i %% 5 == 0){
        cat("\n")
    }
}
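
The last loop combines a loop and a conditional. Wrapping it in a function brings in the third topic of this document as well. This is a sketch only: the function name `print_in_rows` and its `per_line` argument are hypothetical.

```r
print_in_rows <- function(values, per_line = 5){
    for(k in seq_along(values)){
        cat(values[k], "\t")
        if(k %% per_line == 0){
            cat("\n")            # row is full, move to the next line
        }
    }
    if(length(values) %% per_line != 0){
        cat("\n")                # finish a partly filled last row
    }
}

print_in_rows(1:12)                 # rows of 5 (the default)
print_in_rows(1:12, per_line = 4)   # rows of 4
```

The test uses the position `k` rather than the value itself, so the function also works for inputs that are not simply `1, 2, 3, ...`.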

***
Version 1.0  
2018.07.11

To contact the author, email [mturner46@gsu.edu](mailto:mturner46@gsu.edu). Please contact me with recommendations for improvement or if you find any errors. This work may be adapted for any non-commercial purpose within the bounds of the license.

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.