# Introduction to R Programming: Loops, Conditionals, & Functions

Matthew D. Turner, PhD  
Georgia State University

Some rights reserved: [cc by-nc-sa](https://creativecommons.org/licenses/by-nc-sa/4.0/). See the bottom of the document for details.
***
# Basic Exercises
This notebook is a collection of basic exercises for making loops, using conditionals, and writing functions. You should start here if you are completely new to these topics. The exercises below are of varying complexity, so if you find some to be too hard, you may want to proceed to the next section.

In the notebook below, empty cells with comments contain instructions for you to build simple programs in R code. Cells are executed by either using the key combination shift-enter (or shift-return on some computers) or by using the mouse to press the "Run" button at the top of the notebook.

Start by running the cells below. **Remember**: you need to run all of the cells with code, even ones where you do not make any changes. If something does not work, it could be due to skipping a cell. Also, you may press shift-enter to move through the notebook one cell at a time; this includes the text cells!

In [2]:
# Setting up the R environment

options(repr.plot.width = 4, repr.plot.height = 4)  # Set the figure size 

### 1. Simple Loop Exercises
Here we will use loops to display some data and do basic data filtering. Please note that using loops to filter data is not always optimal, but it does lend itself to easy examples for practice.
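As a point of comparison for the note above, R can usually filter a vector without an explicit loop by indexing it with a logical condition. The sketch below uses a small made-up vector `v` (it is not part of the exercises, and the names `kept_loop` and `kept_vec` are just for illustration) to show that a loop and logical indexing pick out the same values.

```r
# Hypothetical example data (not the exercise data)
v <- c(4, 11, 7, 15, 2)

# Loop version: collect the values greater than 6, one at a time
kept_loop <- c()
for(i in 1:length(v)){
    if(v[i] > 6){
        kept_loop <- c(kept_loop, v[i])
    }
}

# Vectorized version: index v with the logical vector (v > 6)
kept_vec <- v[v > 6]

cat(kept_loop, "\n")   # 11 7 15
cat(kept_vec, "\n")    # 11 7 15
```

The exercises below still ask for loops, since the goal here is to practice writing them.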

In [3]:
# Here is some data drawn from a mathematically defined (known) distribution
#  >> It has a true mean of 25 and a true standard deviation of 9 <<

d <- c(39.89503,26.84281,24.25861,28.87009,21.98997,21.62833,3.836511,33.59383,
       29.82358,10.65092,33.86132,37.29709,1.178438,9.14365,24.33669,18.59312,
       3.755524,35.13025,12.3738,22.3297,17.46969,26.15035,24.44927,33.22489,
       30.54006,17.60494,11.47602,50.5574,22.49891,17.17721,34.62822,21.65989,
       25.29415,22.08202,31.12628,28.16447,26.65634,26.81129,38.41278,21.12782,
       10.29205,25.77778,1.574856,14.65169,30.01658,32.61243,22.40103,41.40533,
       6.766324,34.11612,23.20139,25.23146,22.6141,16.48503,23.39496,27.68777,
       20.16894,19.71992,22.51938,18.84068,47.7751,33.34769,40.17129,25.83041,
       25.5106,29.01418,4.318843,21.87696,31.89288,17.99988,26.79175,9.075963,
       17.81769,29.30444,40.21988,39.85456,16.40074,24.58396,26.97367,20.57295,
       29.30642,20.47403,33.41683,9.886789,32.07099,27.86358,23.91271,8.362674,
       20.9204,27.89309,-0.5909693,30.50101,25.05787,25.68484,31.23876,11.40253,
       18.55688,5.459615,16.87926,27.25582)

Determine the number of data points in `d`. We will use this in some `for` loops below.

In [40]:
# How many elements in d?
#
# Hint: If you've never used it, look up the help on the "length" function 
#       first (?length). You should get used to using it.



Note that the `length` function in R gives different sorts of values for different sorts of objects. For a data frame, it gives the number of columns. For a matrix or vector, it gives the total number of elements, and for a list it gives the number of top-level items. So for a _vector_ like `d` above, it gives the total number of items.
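To make the different behaviors of `length` concrete, here is a small sketch using made-up objects (the names `v`, `m`, `l`, and `df` are only for illustration and are not used elsewhere in this notebook).

```r
# Small made-up objects for illustration
v  <- c(10, 20, 30)                           # vector with 3 elements
m  <- matrix(1:6, nrow = 2, ncol = 3)         # 2 x 3 matrix
l  <- list(a = 1, b = "two", c = 1:5)         # list with 3 top-level items
df <- data.frame(x = 1:4, y = c(2, 4, 6, 8))  # 4 rows, 2 columns

length(v)    # 3
length(m)    # 6 (rows times columns)
length(l)    # 3 (top-level items only)
length(df)   # 2 (columns, not rows)
nrow(df)     # 4 (use nrow to count the rows of a data frame)
```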

In [41]:
# Write a for loop that prints out the elements of d
#
# Hint: If you use tabs to arrange things, your columns might not line up
#       perfectly, but this is ok. If you want to use commas, see the demo 
#       notebook, section 1.1.1. But that is a little harder.
# Hint: If i = 3, then d[i] is the 3rd element of d, for instance. 



In [42]:
# Write a loop that prints out d, with 3 elements per line separated by tabs
#
# Hint: You will need an if.
# Hint: Remember the modulus: %%
# Hint: see the introductory demo, section 3.2.4



In [None]:
# Modify the FIRST loop above to print out only elements that are greater
# than the mean of d
#
# Hint: Do NOT try to print them 3 to a line unless you like a real, 
#       annoying challenge!
# Hint: You should get 53 numbers printed out



In R, the `quantile` function reports the values in data (like our `d`) that correspond to particular percentages of the distribution. The cell below computes the 25th and 75th percentiles of `d` (the 1st and 3rd quartiles) and puts them into the variables `Q1` and `Q3`. You will use these in the next exercises.

In [4]:
out <- quantile(d, c(0.25, 0.75))
Q1 <- out[1]
Q3 <- out[2]
cat(Q1, Q3, sep=", ")   # Show you the values

17.95433, 29.87183

We can use comparisons to see if elements of `d` are between these limits. In the example below, R prints some "fluff" along with the results (we mentioned this in the demo examples, too), which can be ignored. 

In [5]:
cat(d[1])   # Just looking at this value: is it between 17.95 and 29.87?

d[1] > Q1   # TRUE
d[1] < Q3   # FALSE

# These can be combined with the "and" (&) operator:

d[1] > Q1 & d[1] < Q3  # Is d[1] **INSIDE** the (Q1, Q3) interval?
                       # That is, are both statements simultaneously TRUE?

39.89503

In [None]:
# Start with the first for loop above, and modify it to only print out
# elements of d that are INSIDE the interval (Q1, Q3).
#
# Hint: If you think this is hard, you **might** be overthinking it.
# Hint: Did you remember to change the d[1] from the example in the cell above?



In [None]:
# Copy the loop just above, but add an else clause to cat the string
# "........" (a sequence of 8 dots) for each element of d that is NOT in 
# the interval.
#
# Hint: In case you wonder, "........" made things line up nicely for me
#       when I did this using tabs ("\t") to separate values. You may need
#       to adjust the number of dots for your screen.



In case you had problems with the previous two exercises, here is my solution to a **related** problem: the following code prints out the values that are **outside** the (Q1, Q3) interval, rather than inside, and prints dots for values inside the interval. Compare it to your result above: where you have dots I should have numbers, and _vice versa_.

In [6]:
# Print the points in d OUTSIDE of the (Q1, Q3) interval:

for(i in 1:length(d)){
    if(d[i] > Q1 & d[i] < Q3){
        cat("........", "\t")
    } else{
        cat(d[i], "\t")
    }
}

39.89503 	........ 	........ 	........ 	........ 	........ 	3.836511 	33.59383 	........ 	10.65092 	33.86132 	37.29709 	1.178438 	9.14365 	........ 	........ 	3.755524 	35.13025 	12.3738 	........ 	17.46969 	........ 	........ 	33.22489 	30.54006 	17.60494 	11.47602 	50.5574 	........ 	17.17721 	34.62822 	........ 	........ 	........ 	31.12628 	........ 	........ 	........ 	38.41278 	........ 	10.29205 	........ 	1.574856 	14.65169 	30.01658 	32.61243 	........ 	41.40533 	6.766324 	34.11612 	........ 	........ 	........ 	16.48503 	........ 	........ 	........ 	........ 	........ 	........ 	47.7751 	33.34769 	40.17129 	........ 	........ 	........ 	4.318843 	........ 	31.89288 	........ 	........ 	9.075963 	17.81769 	........ 	40.21988 	39.85456 	16.40074 	........ 	........ 	........ 	........ 	........ 	33.41683 	9.886789 	32.07099 	........ 	........ 	8.362674 	........ 	........ 	-0.5909693 	30.50101 	........ 	........ 	31.23876 	11.40253 	........ 	5.459615 	16.87926 	........ 	

The code that I wrote does not have to look exactly like your code. For instance, I wrote: `i in 1:length(d)` where you could have written `i in 1:100` or `i in 1:n` for some variable `n` that you defined above. Neither way is better or worse, just different.
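As an aside, a third spelling that you may see in other people's code is `seq_along(d)`. It is not used elsewhere in this notebook. The sketch below, with made-up vectors `v` and `empty`, suggests that it matches `1:length(...)` for ordinary vectors and differs only when the vector has no elements.

```r
v <- c(5, 8, 7)          # made-up example vector

1:length(v)              # 1 2 3
seq_along(v)             # 1 2 3

empty <- c()             # a vector with no elements
1:length(empty)          # 1 0  (counts down, so a for loop would run twice)
seq_along(empty)         # integer(0)  (a for loop would not run at all)
```

For the 100 values in `d` this makes no difference, so use whichever form you find easiest to read.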

### 2. Function Exercises
Here we will make some functions. 

#### 2.1 Z-Scores
In the first example we will make a simple z-score function. 

The z-score is defined as: $$z = \frac{x - \bar{x}}{s_x}\\[2ex]$$ where $x$ is any of the data points, $\bar{x}$ is the mean of the whole data set, and $s_x$ is the standard deviation of the set. In the following you will calculate this then wrap it up as a function.

In [7]:
# For the x provided, (1) write a formula to compute z from x, and 
#                     (2) print (or cat) z

x <- c(5, 8, 7, 9)



In [8]:
z = (x - mean(x))/sd(x)
cat(z)

-1.317465 0.439155 -0.146385 1.024695
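To connect the formula to the output above, here is a sketch that computes the same z-scores one step at a time, without `mean` or `sd`. The names `n`, `x_bar`, `s_x`, and `z_long` are chosen only for this illustration. It assumes the sample standard deviation, which divides by $n - 1$; this is also what R's `sd` function does.

```r
x <- c(5, 8, 7, 9)

n      <- length(x)                           # 4 data points
x_bar  <- sum(x) / n                          # mean: 29 / 4 = 7.25
s_x    <- sqrt(sum((x - x_bar)^2) / (n - 1))  # sample standard deviation
z_long <- (x - x_bar) / s_x                   # z-score of every element

cat(x_bar, s_x, "\n")   # 7.25 1.707825
cat(z_long, "\n")       # Should match the z-scores printed above
```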

To make a function you need to wrap your calculations inside of the following:

```r 
function_name <- function(INPUT){
    calculations                                 
    return(OUTPUT)
}
```
where the `calculations` set the value of the `OUTPUT`. In the above cell you actually did all of the calculations for `z`. So to make it a function, you just have to add the formula you wrote above to the middle of the function definition.

In [None]:
# (1) Copy the function definition from the text cell above (or retype it)
# (2) Change the function_name to z
# (3) Change the INPUT to x
# (4) Place your z formula where the calculations go
# (5) Change OUTPUT to z 



In [9]:
z <- function(x){
    z = (x - mean(x))/sd(x)
    return(z)
}

The next few cells will test your `z` function.

In [10]:
# The results here should be the same as above

z(x)  

In [23]:
# The following is an example of what math people call a "fixed point": these particular
# numbers are also their own z-scores. So if your function works, the output should be 
# the same as the input

y <- c(-1, 0, 1)

z(y)     

In [22]:
mean(y)
var(y)
sd(y)

In [11]:
# Here we make 100 random normal numbers. They are drawn from a distribution with mean
# of 100 and standard deviation of 15.

random_data <- rnorm(100, mean = 100, sd = 15)
cat(random_data, sep=",  ")

96.30601,  85.83765,  115.8075,  75.51509,  107.1422,  116.839,  109.3991,  116.7874,  98.76782,  95.17485,  82.82325,  114.5331,  136.9556,  105.0516,  108.2704,  125.9209,  84.36107,  109.9005,  127.0408,  111.7153,  103.1983,  107.7864,  88.96781,  99.40869,  106.8326,  84.70936,  106.9914,  85.54433,  107.877,  114.8528,  107.8855,  96.36306,  91.26947,  86.19991,  116.3781,  94.88841,  118.7197,  114.47,  93.00482,  109.1394,  106.3361,  103.9606,  104.6113,  95.24153,  63.89622,  102.9177,  108.5338,  83.85312,  102.2513,  79.91065,  113.9889,  104.1734,  108.3264,  86.83575,  91.28749,  104.1914,  87.80647,  76.99942,  129.2678,  79.11895,  94.40533,  86.07761,  95.20602,  110.1874,  96.99638,  103.4492,  77.60198,  101.0685,  114.6171,  126.0801,  89.86719,  101.7774,  92.36072,  94.18034,  85.77382,  99.05039,  101.7004,  96.88752,  116.9454,  128.2755,  83.10148,  93.11326,  76.77855,  92.03128,  85.52991,  94.19641,  88.67239,  109.6174,  97.64087,  123.9211,  102.6756,  100

For any normally distributed data, $N(\mu, \sigma)$, applying the $z$ transformation should give data that are $N(0,1)$, that is, normally distributed with a mean (center) of 0 and a standard deviation of 1. We can test this on the random data above:

In [12]:
random_z <- z(random_data)

mean(random_z)   # This should be very close to 0; very small ~ 1 x 10^-12 (1e-12) or less
sd(random_z)     # This should be very close to 1

When writing functions:

+ It is often easiest to do the calculation directly first (as you did for `z`) and then, once it works, transfer it into the function form.
+ If you do a calculation on a vector (like `x` above), then the function will naturally work on other vectors. 
+ Functions written in R by the developers tend to have a lot of extra parts: to check that inputs are correct, to look for problems, and generally to protect users from errors. Your functions will usually not have these features. If you start developing code for other people to use, you may want to learn how to do this sort of thing.
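As a rough idea of what such extra parts can look like, here is a minimal sketch of the z-score function with a few input checks added. The name `z_checked` and the particular checks are assumptions made for illustration; they are not exhaustive (for example, missing values are not handled) and they are not how any built-in R function is written.

```r
# z_checked is a hypothetical name; the checks are illustrative only
z_checked <- function(x){
    if(!is.numeric(x)){
        stop("x must be numeric")
    }
    if(length(x) < 2){
        stop("x must have at least 2 values to compute a standard deviation")
    }
    if(sd(x) == 0){
        stop("x has zero standard deviation, so z-scores are undefined")
    }
    z = (x - mean(x))/sd(x)
    return(z)
}

z_checked(c(5, 8, 7, 9))     # Same values as the unchecked z-score formula
# z_checked("hello")         # Would stop with the message: x must be numeric
```

The calculation itself is unchanged; the checks only turn confusing results (such as `NaN` values) into a readable error message.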

Functions are all about **abstraction**, that is, taking a process and making it more general than any specific example. Now that you have abstracted the z-score calculation into a function, you can apply it to any numeric vector.

In [None]:
# Here is some very simple data
x <- c(1,2,3,4,5,6)
y <- c(2.1, 2.9, 4.5, 5.9, 6.7, 7.1)

In [None]:
# Plot the data above



In [None]:
# Fit a regression model for y on x (y ~ x) and call the model "m1" 
# (that is a one following the m, not an "el")
#
# Hint: This is the simplest case of using lm, see ?lm for help
# Hint: Some of this was covered in the demo notebook



In [None]:
m1 <- lm(y~x)

In [None]:
# We can get the coefficients from m1 with the following code:

intercept <- coef(m1)[1]
slope     <- coef(m1)[2]

In [None]:
plot(x,y)

In [None]:
# Using Q1 and Q3 test if the first element of d is in between these values
#
# Hint: Combine the two comparisons with the "and" (&) operator



In [None]:
d[1] < Q3
d[1] > Q1

In [None]:
mean(d)
median(d)
quantile(d, c(0.25, 0.75))

In [None]:
for(i in 1:100){
    cat(i, "\t")   # Separate the values with tabs
}

In [None]:
cat(paste(letters, 100* 1:26), fill = TRUE, labels = paste0("{", 1:10, "}:"))

In [None]:
library(MASS)   # Provides fitdistr(), which is used below

In [None]:
d <- rnorm(100, mean = 25, sd = 9)

In [None]:
# mean = 25 / sd = 9

d <- c(39.89503,26.84281,24.25861,28.87009,21.98997,21.62833,3.836511,33.59383,
       29.82358,10.65092,33.86132,37.29709,1.178438,9.14365,24.33669,18.59312,
       3.755524,35.13025,12.3738,22.3297,17.46969,26.15035,24.44927,33.22489,
       30.54006,17.60494,11.47602,50.5574,22.49891,17.17721,34.62822,21.65989,
       25.29415,22.08202,31.12628,28.16447,26.65634,26.81129,38.41278,21.12782,
       10.29205,25.77778,1.574856,14.65169,30.01658,32.61243,22.40103,41.40533,
       6.766324,34.11612,23.20139,25.23146,22.6141,16.48503,23.39496,27.68777,
       20.16894,19.71992,22.51938,18.84068,47.7751,33.34769,40.17129,25.83041,
       25.5106,29.01418,4.318843,21.87696,31.89288,17.99988,26.79175,9.075963,
       17.81769,29.30444,40.21988,39.85456,16.40074,24.58396,26.97367,20.57295,
       29.30642,20.47403,33.41683,9.886789,32.07099,27.86358,23.91271,8.362674,
       20.9204,27.89309,-0.5909693,30.50101,25.05787,25.68484,31.23876,11.40253,
       18.55688,5.459615,16.87926,27.25582)

In [None]:

hist(d, col = "lavender")

In [None]:
dfit <- fitdistr(d, densfun = "normal")
dfit

In [None]:
mean(d)
sd(d)

In [None]:
str(dfit)

In [None]:
cat(rep(d), sep =',')

## Integer Math
Sometimes doing math with integers (whole numbers) and remainders is better than doing it with decimal points. For instance, we can use this to tell if numbers are even or odd. The definition of an even number is one that has zero remainder when divided by 2.

+ `%%` gives the remainder after division
+ `%/%` does the division, and gives only the whole number part

So for $a < b$ we can write: $b = n \cdot a + r$ where `b %/% a` gives $n$, and `b %% a` gives $r$.
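The relationship above can be checked directly in R. This is a small sketch with made-up values for `a` and `b`; the second pair is included only to show that the same identity also holds in R when `b` is negative.

```r
b <- 17
a <- 5

n <- b %/% a    # whole-number part of the division: 3
r <- b %% a     # remainder: 2

cat(n, r, "\n")             # 3 2
cat(n * a + r == b, "\n")   # TRUE

# The same check with a negative number
b2 <- -35
n2 <- b2 %/% 2  # -18
r2 <- b2 %% 2   # 1

cat(n2, r2, "\n")             # -18 1
cat(n2 * 2 + r2 == b2, "\n")  # TRUE
```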

It turns out that remainders tell us what we need to know to see if a number is even: for a whole number `k`, the remainder `k %% 2` is 0 exactly when `k` is even.

In [None]:
# How do you tell if a number is even?

  5 %% 2
-35 %% 2
 19 %% 2
-20 %% 2
  2 %% 2
  6 %% 2
  0 %% 2

In [None]:
for(i in 1:20){
    cat(i, i*i, "\n", sep="\t")
}

In [None]:
# Using an IF-ELSE (or just an IF) statement, make the above loop only
# cat the squares of EVEN numbers



In [None]:
for(i in 1:20){
    if(i %% 2 == 0){
        cat(i, i*i, "\n", sep="\t")
    }
}

In [None]:
??even

***
Version 1.0  
2018.07.11

To contact the author, email [mturner46@gsu.edu](mailto:mturner46@gsu.edu). Please contact me with recommendations for improvement or if you find any errors. This work may be adapted for any non-commercial purpose within the bounds of the license.

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.