# Writing Functions and Loops

Open RStudio.

Open a new R script and **save it as** `wpa_7_LastFirst.R` (where Last and First are your last and first names).

Be careful about capitalization, the order of last and first name, and using `_` instead of `-`.

At the top of your script, write the following (**with appropriate changes**):

In [1]:
# Assignment: WPA 7
# Name: Laura Fontanesi
# Date: 20 April 2020

## 1. Creating R Functions

R scripts are a way to organize and save data, complicated expressions, or sequences of operations for re-use.

Whenever we re-use a code snippet, copy-pasting it increases the chances of typos and other mistakes. Instead, we should think about how to generalize the code, so that it can be re-used later in the same script, or in other scripts, on slightly different data.

Functions are perfect for this purpose. We have already used many functions that other people defined and saved in packages that we could load in R.

Now we will see how to create our own functions:

In [2]:
percentage = function(x, total) {
    ratio = x/total
    perc = ratio*100
    
    return(perc)
}

In [3]:
percentage(5, 20)

In [4]:
a = percentage(45, 80)

a

In [5]:
a + percentage(35, 80)

In [6]:
print_percentage = function(x, total) {
    ratio = x/total
    perc = ratio*100
    
    print(paste(perc, '%'))
}

In [7]:
print_percentage(6, 20)

[1] "30 %"


In [8]:
a = print_percentage(45, 80)

a

[1] "56.25 %"
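
The two functions above differ in what they hand back: `percentage()` returns a number, whereas `print_percentage()` prints a character string (and, because `print()` invisibly returns its argument, that string is also what ends up in `a`). A minimal sketch of the consequence, re-defining both functions so the example is self-contained; the variable name `b` is introduced here only for illustration:

```r
# Returns a number that can be used in further calculations
percentage = function(x, total) {
    return(x / total * 100)
}

# Prints a character string; print() also returns that string invisibly
print_percentage = function(x, total) {
    print(paste(x / total * 100, '%'))
}

a = percentage(45, 80)
b = print_percentage(45, 80)

is.numeric(a)            # TRUE: a is a number
is.character(b)          # TRUE: b is text, so b + 1 would raise an error
a + percentage(35, 80)   # arithmetic works on the returned numbers
```

As a rule of thumb, returning the value keeps a function re-usable, while printing is mainly useful for displaying results.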


In [9]:
print_percentage_decimals = function(x, total, n_decimals) {
    ratio = x/total
    perc = ratio*100
    
    rounded_perc = round(perc, n_decimals)
    
    print(paste(rounded_perc, '%'))
}

In [10]:
print_percentage_decimals(45, 80, 1)
print_percentage_decimals(45, 80, 0)
print_percentage_decimals(79, 81, 4)

[1] "56.2 %"
[1] "56 %"
[1] "97.5309 %"


In [11]:
print_percentage_decimals = function(x, total, n_decimals=3) { # define a default argument
    ratio = x/total
    perc = ratio*100
    
    rounded_perc = round(perc, n_decimals)
    
    print(paste(rounded_perc, '%'))
}

In [12]:
print_percentage_decimals(79, 81)

[1] "97.531 %"


In [13]:
print_percentage_decimals(79, 81, 1)

[1] "97.5 %"
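
Arguments can also be passed by name, in which case their order no longer matters and a default such as `n_decimals=3` is used whenever the argument is left out. The sketch below illustrates this with a hypothetical variant, `format_percentage()`, that returns the string instead of printing it (the name and the returning behaviour are assumptions made for this example, not part of the functions above):

```r
# Hypothetical variant of print_percentage_decimals() that returns the string
format_percentage = function(x, total, n_decimals=3) {
    perc = x / total * 100
    rounded_perc = round(perc, n_decimals)

    return(paste(rounded_perc, '%'))
}

format_percentage(79, 81)                          # default n_decimals = 3
format_percentage(79, 81, n_decimals = 1)          # override the default by name
format_percentage(total = 81, x = 79, n_decimals = 1)  # named arguments, any order
```

The last two calls give the same result, because named arguments are matched by name rather than by position.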


## 2. Conditional statements

Conditional statements can help us improve our functions, so that different operations can be done depending on the input to the function.

In [14]:
x = -4

if (x > 0) {
    print("x is positive")
} else {
    x + 9
}

In [15]:
print_percentage_smart = function(x, total, n_decimals=0) { # define a default argument
    if (x <= total && x >= 0) {
        ratio = x/total
        perc = ratio*100

        rounded_perc = round(perc, n_decimals)
        print(paste(rounded_perc, '%'))
        
    } else {
        print("Invalid x value: x should be positive and smaller than its total.")
    }
}

In [16]:
print_percentage_smart(3, 40, 1)

[1] "7.5 %"


In [17]:
print_percentage_smart(-3, 40, 1)

[1] "Invalid x value: x should be positive and smaller than its total."


In [18]:
print_percentage_smart(42, 40, 1)

[1] "Invalid x value: x should be positive and smaller than its total."


In [19]:
print_percentage_smart = function(x, total, n_decimals=0) { # define a default argument
    if (x <= total && x >= 0) {
        ratio = x/total
        perc = ratio*100

        rounded_perc = round(perc, n_decimals)
        print(paste(rounded_perc, '%'))
        
    } else if (x > total) {
        print("Invalid x value: x should be smaller than its total.")
    } else {
        print("Invalid x value: x should be positive.")
    }
}

In [20]:
print_percentage_smart(3, 40, 1)

[1] "7.5 %"


In [21]:
print_percentage_smart(-3, 40, 1)

[1] "Invalid x value: x should be positive."


In [22]:
print_percentage_smart(42, 40, 1)

[1] "Invalid x value: x should be smaller than its total."
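
The versions above report invalid input by printing a message. One possible alternative, sketched here under the assumption that you want the function to return a number and to halt on invalid input, is to signal an error with base R's `stop()`. The function name `percentage_checked()` and the variable `result` are hypothetical and only used for this illustration:

```r
# Hypothetical variant: returns the percentage, or stops with an error message
percentage_checked = function(x, total) {
    if (x < 0) {
        stop("Invalid x value: x should be positive.")
    } else if (x > total) {
        stop("Invalid x value: x should be smaller than its total.")
    }

    return(x / total * 100)
}

percentage_checked(3, 40)

# tryCatch() lets us capture the error message instead of halting the script
result = tryCatch(percentage_checked(42, 40),
                  error = function(e) conditionMessage(e))
result
```

With this design, a valid call can be used directly in calculations, while an invalid call cannot silently pass unnoticed.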


## 3. Loops

Another construct that can help you repeat the same code on different inputs is a loop. We now look at 2 types of loops: `for` and `while` loops:

In [23]:
vector = seq(1, 10)
second_vector = c()

second_vector

NULL

In [24]:
for (i in vector) {
    second_vector = c(second_vector, i+2)
}

In [25]:
second_vector

In [26]:
count = 1
third_vector = c()

while (count <= 14) {
    
    third_vector = c(third_vector, count**2)
    count = count + 1
}

In [27]:
third_vector
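
The loops above grow a vector with `c()` on every iteration. Another common pattern, sketched below with the hypothetical names `values` and `doubled`, is to pre-create a vector of the right length and then fill it in by index:

```r
values = seq(1, 5)

# Pre-create a numeric vector of zeros with one slot per input value
doubled = numeric(length(values))

# seq_along(values) gives the indices 1, 2, ..., length(values)
for (i in seq_along(values)) {
    doubled[i] = values[i] * 2
}

doubled
```

Here the loop variable `i` is a position in the vector rather than the value itself, which is what makes the indexed assignment `doubled[i] = ...` possible.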

In [28]:
student_data = data.frame(student=seq(1, 40),
                          total_points=round(runif(n=40, min=0, max=18), 1))
head(student_data)

| | student | total_points |
|---|---|---|
| | `<int>` | `<dbl>` |
| 1 | 1 | 11.9 |
| 2 | 2 | 3.8 |
| 3 | 3 | 2.8 |
| 4 | 4 | 2.8 |
| 5 | 5 | 0.8 |
| 6 | 6 | 3.6 |


In [29]:
subset_students = c(5, 16, 3, 9, 37, 30, 25, 28)

In [30]:
for (student in subset_students) {
    print(student)
}

[1] 5
[1] 16
[1] 3
[1] 9
[1] 37
[1] 30
[1] 25
[1] 28


In [31]:
for (student in subset_students) {
    print(paste("Student", student))
}

[1] "Student 5"
[1] "Student 16"
[1] "Student 3"
[1] "Student 9"
[1] "Student 37"
[1] "Student 30"
[1] "Student 25"
[1] "Student 28"


In [32]:
for (i in subset_students) {
    
    points_student = student_data[student_data$student == i, "total_points"]
    grade = percentage(points_student, 18)
    rounded_grade = round(grade)
    
    print(paste("Student:", i, "-", "Grade:", rounded_grade, '%'))
}

[1] "Student: 5 - Grade: 4 %"
[1] "Student: 16 - Grade: 65 %"
[1] "Student: 3 - Grade: 16 %"
[1] "Student: 9 - Grade: 24 %"
[1] "Student: 37 - Grade: 61 %"
[1] "Student: 30 - Grade: 87 %"
[1] "Student: 25 - Grade: 53 %"
[1] "Student: 28 - Grade: 17 %"


In [33]:
something = 3

sequence = seq(3, 13)

for (x in sequence) {
    print(x + something)
}

[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16


In [34]:
x = 4
while(x < 10) {
    print(x)
    x = x + 1
}

[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9


The `for` loop over `subset_students` that printed each student's grade can equivalently be written as a `while` loop:

In [35]:
n_subset = length(subset_students)
n = 1

while (n <= n_subset) {
    student = subset_students[n]
    
    points_student = student_data[student_data$student == student, "total_points"]
    grade = percentage(points_student, 18)
    rounded_grade = round(grade)
    
    print(paste("Student:", student, "-", "Grade:", rounded_grade, '%'))
    
    n = n + 1 # crucial!
}

[1] "Student: 5 - Grade: 4 %"
[1] "Student: 16 - Grade: 65 %"
[1] "Student: 3 - Grade: 16 %"
[1] "Student: 9 - Grade: 24 %"
[1] "Student: 37 - Grade: 61 %"
[1] "Student: 30 - Grade: 87 %"
[1] "Student: 25 - Grade: 53 %"
[1] "Student: 28 - Grade: 17 %"


## 4. Now it's your turn

In this WPA, you will analyze data from another fake study. In this fake study, the researchers were interested in whether playing video games had cognitive benefits compared to other leisure activities. In the study, 90 university students were asked to do one of 3 leisure activities for 1 hour a day for one month: 30 participants were asked to play video games, 30 to read, and 30 to juggle. At the end of the month, each participant did 3 cognitive tests: a problem solving test (**logic**), a reflex/response test (**reflex**), and a written comprehension test (**comprehension**).

#### Datafile description

The data file has 90 rows and 7 columns. Here are the columns:

- `id`: The participant ID

- `age`: The age of the participant

- `gender`: The gender of the participant

- `activity`: Which leisure activity the participant was assigned for the last month ("reading", "juggling", "gaming")

- `logic`: Score out of 120 on a problem solving task. Higher is better.

- `reflex`: Score out of 25 on a reflex test. Higher indicates faster reflexes.

- `comprehension`: Score out of 100 on a reading comprehension test. Higher is better.

**Task A**

1. Load the `data_wpa7.txt` dataset in R (find it on Github) and save it as a new object called `leisure`. Inspect the dataset first.

2. Write a function called `feed_me()` that takes a string `food` as an argument and returns the sentence "I love to eat" followed by that food (for example, with `food = 'pizza'` it returns "I love to eat pizza"). Try your function by running `feed_me("apples")`; it should return "I love to eat apples".

3. Without using the `mean()` function, calculate the mean of the vector `vec_1 = seq(1, 100, 5)`.

4. Write a function called `my_mean()` that takes a vector `x` as an argument, and returns the mean of the vector `x`. Use your code for task A3 as your starting point. Test it on the vector from task A3.

5. Use your `my_mean()` function to calculate the mean `logic` score of participants in the `leisure` dataset and compare the result to that of the built-in `mean()` function (using `==`) to make sure you get the same result.

6. Create a loop that prints the squares of integers from 1 to 10.

7. Modify the previous code so that it saves the squared integers as a vector called `squares`. You'll need to pre-create a vector, and use indexing to update it.

**Task B**

1. Create a function called `standardize` that, given an input vector, returns its standardized version. Remember that to standardize a score (also called z-transforming it), you first subtract the mean score from the individual scores and then divide by the standard deviation.

2. Create a copy of the `leisure` dataset. Call this copy `z_leisure`. Standardize the `logic`, `reflex`, `age` and `comprehension` columns with the `standardize` function inside a `for` loop. In each iteration of the loop, you should standardize one of these 4 columns. You can first create a vector called `columns_to_standardize`, where you store the names of these columns, and use it later in the loop. You should not add the standardized scores as additional columns, but overwrite the original columns.
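
As a compact reference for task B1, the z-transformation described there can be written as follows, where $x_i$ is an individual score, $\bar{x}$ the mean of the scores, and $s$ their standard deviation (these symbols are notation chosen for this note):

$$
z_i = \frac{x_i - \bar{x}}{s}
$$

A standardized vector therefore has a mean of 0 and a standard deviation of 1, which is a quick way to check your function.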

**Task C**

1. Create a scatterplot of `age` and `reflex` of participants in the `leisure` dataset. Customize it and add a regression line.

2. Create a function called `my_plot()` that takes arguments `x` and `y` and creates a scatterplot with your customizations and the regression line.

3. Now test your `my_plot()` function on the `age` and `reflex` of participants in the `leisure` dataset.

**Task D**

1. Create a loop that returns the sum of the vector `1:10` (i.e., don't use the existing `sum` function).
2. Use this loop to create a function called `my_sum` that returns the sum of any vector `x`. Test it on the `logic` scores.
3. Modify the function you created in task D2 to instead calculate the mean of a vector. Call this new function `my_mean2` and compare it to both the `my_mean` function you created and the built-in `mean` function. (Bonus: Can you also think of a way to do this without using the `length` function?)

**Task E (extra)**

1. What is the probability of getting a significant p-value if the null hypothesis is true? Test this by conducting the following simulation:

  - Create a vector called `p_values` with 100 NA values. 
  - Draw a sample of size 10 from a normal distribution with mean = 0 and standard deviation = 1.
  - Do a one-sample t-test testing if the mean of the distribution is different from 0. Save the p-value from this test in the 1st position of `p_values`.
  - Repeat these steps with a loop to fill `p_values` with 100 p-values.
  - Create a histogram of `p_values` and calculate the proportion of p-values that are significant at the .05 level.

2. Create a function called `p_simulation` with 4 arguments: `sim` (the number of simulations), `samplesize` (the sample size), `mu_true` (the true mean), and `sd_true` (the true standard deviation). Your function should repeat the simulation from the previous question with the given arguments. That is, it should calculate `sim` p-values, each testing whether the mean of a sample of size `samplesize`, drawn from a normal distribution with mean = `mu_true` and standard deviation = `sd_true`, is significantly different from 0. The function should return a vector of p-values.

*Note*: to get the p-value of a t-test:
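
A minimal sketch of one way to do this with base R's `t.test()`, whose result stores the p-value in the `p.value` element. The sample `x`, the seed, and the variable names `test_result` and `p_value` are only illustrative assumptions:

```r
set.seed(1)  # illustrative seed, only to make this example reproducible

# Example sample: 10 draws from a standard normal distribution
x = rnorm(n = 10, mean = 0, sd = 1)

# One-sample t-test of whether the mean differs from 0
test_result = t.test(x, mu = 0)

# Extract the p-value from the test result
p_value = test_result$p.value
p_value
```
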

## Submit your assignment

Save and email your script to me at [laura.fontanesi@unibas.ch](mailto:laura.fontanesi@unibas.ch) by the end of **Friday**.