## R Functions ##

Functions are the basic building blocks of programming. They allow us to avoid writing the same code over and over again, and they make our code reusable and easier to manage when creating larger programs.

Here is a basic function that adds two numbers together, so that we can see how functions are structured in R:

In [1]:
add_num <- function(x, y) {
        x + y
}

In [2]:
add_num(5,6)

In [3]:
add_num(251,822)

We name the function just as we would any variable, but the right-hand side is different. We first tell R that we are assigning a `function`, followed immediately, in parentheses, by the arguments the function takes, in this case `x` and `y` as placeholders for any two numbers. Inside the curly braces we write what the function should actually do, in this case add `x` and `y` together. We could wrap the result in `return()`, but R automatically returns the value of the last expression evaluated in the function body. Then we test it: we can pass in any two numbers, and the function adds them and returns the sum.
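To make the point about `return` concrete, here is a minimal sketch. The names `add_num_explicit` and `safe_divide` are hypothetical and only serve as illustration. The first function behaves like `add_num`, and the second shows the case where an explicit `return()` is actually needed, namely leaving the function early.

```r
# Same behaviour as add_num, written with an explicit return()
add_num_explicit <- function(x, y) {
        return(x + y)
}

# return() also exits the function early; the code after it is skipped
safe_divide <- function(x, y) {
        if (y == 0) {
                return(NA)
        }
        x / y
}

add_num_explicit(5, 6)   # 11
safe_divide(10, 2)       # 5
safe_divide(10, 0)       # NA
```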

Now let's code a function that will take a vector, and return the values that are above a certain value (10 in this case).

In [4]:
above_10 <- function(x) {
        use <- x > 10
        x[use]
}

In [10]:
vec <- 1:50
print(above_10(vec))

 [1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
[26] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50


This only works for a threshold of 10. What if we wanted to choose that threshold ourselves?

In [11]:
above <- function(x,n) {
        use <- x > n
        x[use]
}

In [12]:
print(above(vec, 27))

 [1] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50


In both cases we create a `use` variable, which is just a logical (`TRUE`/`FALSE`) vector that tells us which elements of the input vector satisfy the condition (greater than `n`, or greater than 10 in the first version). We then subset `x` with that logical vector, so only the elements marked `TRUE` are returned.
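A small sketch of what happens inside these functions, using a short made-up vector so each intermediate value is easy to read:

```r
x <- c(3, 12, 8, 25)

use <- x > 10   # element-wise comparison gives a logical vector
use             # FALSE TRUE FALSE TRUE

x[use]          # keeps only the positions where use is TRUE: 12 25
sum(use)        # TRUE counts as 1, so this is the number of matches: 2
```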

As for more complex functions, let's say we have a matrix and we want to calculate the mean of each column.

In [13]:
columnmean <- function(m) {
        nc <- ncol(m)
        means <- numeric(nc)
        for(i in 1:nc) {
                means[i] <- mean(m[, i])
        }
        means
}    

In [21]:
m1 <- matrix(rnorm(36) * 100, nrow = 6) # 36 normally distributed values (mean 0, sd 100) in 6 rows, hence 6 columns

In [22]:
print(columnmean(m1))

[1] -75.730908 -21.827960  71.997776   1.108621  66.811140 -28.796938


Here we created a function that takes a matrix `m`. We created the `means` vector to store the results, iterated over the number of columns, stored the mean of each column in `means`, and finally returned `means` as the last expression.
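As a possible variation, the sketch below makes two changes. The function name `columnmean_safe`, the argument `removeNA`, and the matrix `m2` are assumptions introduced here for illustration. It loops with `seq_len(nc)`, which is empty for a matrix with zero columns (whereas `1:0` would count down through 1 and 0), and it optionally drops `NA` values. The result is compared against base R's `colMeans`.

```r
columnmean_safe <- function(m, removeNA = TRUE) {
        nc <- ncol(m)
        means <- numeric(nc)
        for (i in seq_len(nc)) {   # seq_len(0) is empty, unlike 1:0
                means[i] <- mean(m[, i], na.rm = removeNA)
        }
        means
}

# Small matrix with one missing value: columns are (1, 2, 3) and (4, NA, 6)
m2 <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 3)

columnmean_safe(m2)                                          # 2 5
all.equal(columnmean_safe(m2), colMeans(m2, na.rm = TRUE))   # TRUE
```

For real work `colMeans` is usually the simpler choice; the loop is kept here because it shows how a function body is put together.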

In general, functions belong to the class `function`, and we build them using the following structure:

```
f <- function(<arguments>) {
        ## Do Something interesting
}
```

Functions are first-class objects, which means they can be treated much like any other object in R.

- Functions can be passed as arguments to other functions.
- Functions can be nested, so you can define a function inside another function.

The return value of a function is the last expression in the function to be evaluated.
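Both properties can be sketched in a few lines. The names `apply_twice`, `add_one`, `make_power`, and `cube` are hypothetical and chosen only for this example.

```r
# A function passed as an argument to another function
apply_twice <- function(f, x) {
        f(f(x))
}
add_one <- function(x) x + 1
apply_twice(add_one, 5)   # 7

# A function defined inside another function and returned as its value
make_power <- function(n) {
        pow <- function(x) {
                x^n
        }
        pow   # last expression, so the inner function is the return value
}
cube <- make_power(3)
cube(2)       # 8
class(cube)   # "function"
```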

#### Function Arguments ####

Functions have named arguments which can potentially have default values.

- The formal arguments are the ones included in the function definition.
- The `formals` function returns a list of all the formal arguments of a function.

In [25]:
formals(add_num)

$x


$y



- Not every function call in R makes use of all the formal arguments.
- Function arguments can be missing or might have default values.
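A minimal sketch of default and missing arguments. `above_default` is a hypothetical variant of the `above` function from earlier with a default threshold, and `first_only` is likewise made up for illustration.

```r
# n has a default value, so callers may omit it
above_default <- function(x, n = 10) {
        use <- x > n
        x[use]
}

above_default(1:15)        # uses n = 10: 11 12 13 14 15
above_default(1:15, 13)    # overrides the default: 14 15
formals(above_default)$n   # the stored default: 10

# y has no default, but it is never used in the body,
# so leaving it out of the call does not cause an error
first_only <- function(x, y) {
        x * 2
}
first_only(4)              # 8
```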

#### Argument Matching ####

R function arguments can be matched positionally or by name, so the following calls to the function `sd` (standard deviation) are all valid.

In [27]:
mydata <- rnorm(100) # 100 random draws from a standard normal distribution (mean 0, sd 1)

In [28]:
sd(mydata)

In [29]:
sd(x = mydata)

In [30]:
sd(x = mydata, na.rm = FALSE)

In [31]:
sd(na.rm = FALSE, x = mydata)

In [32]:
sd(na.rm = FALSE, mydata)

Even though reordering named and unnamed arguments is legal, as in the last example, it is not recommended because it makes the call harder to read.

We can also mix positional matching and matching by name. When an argument is matched by name, it is 'taken out' of the argument list, and the remaining unnamed arguments are matched in the order in which they are listed in the function definition.

In [34]:
args(lm) # Linear model function

This means that the following two function calls are equivalent:

```
# assumes mydata is a data frame with columns x and y (not the vector defined above)
lm(data = mydata, y ~ x, model = FALSE, 1:100)
lm(y ~ x, mydata, 1:100, model = FALSE)
```

The second way to call the function is the more common and the recommended one.
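The same rule can be checked on a small function whose output shows which value ended up in which argument. `describe` and its argument names are hypothetical, introduced only for this sketch.

```r
describe <- function(first, second, third) {
        paste(first, second, third)
}

# Purely positional
describe("a", "b", "c")           # "a b c"

# third is matched by name and taken out of the list;
# the unnamed values then fill first and second in order
describe(third = "c", "a", "b")   # "a b c"
describe("a", third = "c", "b")   # "a b c"
```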

Named arguments are most useful when working directly on the command line, especially when a function has a long argument list and we want to use the defaults for everything except a few arguments near the end of the list.

Named arguments also help if you can remember the name of the argument but not its position on the list.

Function arguments can also be partially matched, as long as the abbreviation identifies exactly one argument. This is very useful for interactive work.

When matching the arguments supplied in a call, R proceeds in this order:

- Check for exact match for named argument
- Check for partial match
- Check for positional match
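The matching order above can be sketched with two small functions. `scale_values` and `ambiguous`, along with their argument names, are hypothetical and exist only for this example. The second function shows what happens when an abbreviation fits more than one argument.

```r
scale_values <- function(values, multiplier = 1, offset = 0) {
        values * multiplier + offset
}

# "mult" is not an exact name, but it partially matches multiplier only;
# 1:3 is then matched positionally to values
scale_values(1:3, mult = 2)         # 2 4 6

# "off" matches offset and "m" matches multiplier
scale_values(1:3, off = 1, m = 3)   # 4 7 10

# Both arguments start with "num", so the abbreviation is ambiguous
ambiguous <- function(number, numeric_flag = FALSE) number
res <- tryCatch(ambiguous(num = 1), error = function(e) "error")
res                                 # "error"
```

In scripts and packages it is generally safer to spell argument names out in full, since a partial match can stop working if a function later gains another argument with the same prefix.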

