## functions 

We have come across many functions already. It is useful to know where a function comes from. R looks up functions in a sequence of environments, which ensures that identical function names from different packages do not interfere with one another.  
The base R functions, like `mean`, `sample`, and `print`, come with every installation of R.  
Functions created by package developers become available after loading a package.  
User-defined functions are created by the user during an R session.  
To define your own function use the following syntax:  

In [None]:
my_function <- function(input_arguments) {
  # code that does stuff with input_arguments and returns something
  output <- input_arguments  # placeholder so the template runs as written
  return(output)
}
my_function(input_arguments = 5)

A working function: 

In [None]:
sum_two_numbers <- function(a, b) {
  output <- a + b
  return(output)
}

# After the function is defined you can use the function:  
sum_two_numbers(a = 5, b = 6)

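It is good practice to be explicit with `return()` and argument names. However, R returns the value of the last evaluated expression and matches unnamed arguments by position, so this also works:  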

In [None]:
sum_two_numbers <- function(a, b) {
  a + b
}

sum_two_numbers(5, 6)
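
For addition the order of the arguments does not matter, so the difference between positional and named arguments is easy to miss. The sketch below uses a hypothetical `subtract_two_numbers` function (not part of the original material) to show why naming arguments is the safer habit:

```r
# Hypothetical helper for illustration: the result depends on argument order.
subtract_two_numbers <- function(a, b) {
  a - b
}

positional <- subtract_two_numbers(10, 3)          # a = 10, b = 3
named      <- subtract_two_numbers(b = 3, a = 10)  # names decide, order does not
swapped    <- subtract_two_numbers(3, 10)          # a = 3, b = 10

positional
named
swapped
```

The first two calls give the same result, while the third silently gives a different one because the positions were swapped.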

We need to look at environments before we can continue with functions.  

### packages & environments & namespaces 

In R it is relatively straightforward to create your own package. If accepted by the maintainers of CRAN, the repository for R packages, it becomes available to anyone through a simple call to the `install.packages()` function. You can also put your package or scripts on GitHub and make them available that way.  
If you type a function name without parentheses, for example `mean`, R prints the function, ending with `<environment: namespace:base>`. This means the function lives in the base namespace and is available as soon as R is installed.  
Now try to make a call to `gather`. 


In [7]:
gather

ERROR: Error in eval(expr, envir, enclos): object 'gather' not found


Indeed, "object `gather` not found".  
Install the package once (this is already done for us), and then load the package in each new R session:

In [None]:
#install.packages("tidyr")

After installation, the package is located in your local `library` folder, which you can find by typing `.libPaths()`. However, we are working on a server in JupyterLab now, so this might not work here.

In [4]:
.libPaths()

In [None]:
dir(.libPaths())

After installing and loading the package, its functions are available. For example, the `gather` function is from the package `tidyr`:   

In [9]:
library(tidyr)

Now we type `gather`:  

In [10]:
gather

You get the source code, and at the end you see that the function is from the `tidyr` namespace. One final remark on environments: when you try to evaluate a function (or any object, for that matter), R searches through the environments in a fixed order and uses the first match it finds. To find out where and in what order R searches, you can use:  

In [11]:
searchpaths()
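
To make the lookup order concrete, here is a minimal sketch that uses only functions shipped with R (`mean` from base, `sd` from stats), so it does not depend on any extra package being installed. The user-defined `sd` is a deliberately silly stand-in, introduced only to show masking:

```r
# environment() returns the environment a function was defined in.
mean_home <- environmentName(environment(mean))  # the base namespace
sd_home   <- environmentName(environment(sd))    # the stats namespace

# `package::name` asks for a function from one specific namespace,
# regardless of what else is on the search path.
real_sd <- stats::sd(c(1, 2, 3))

# A user-defined `sd` is found before the one from the stats package ...
sd <- function(x) "my own sd"
masked_result <- sd(c(1, 2, 3))

# ... but the namespaced version remains reachable.
still_real_sd <- stats::sd(c(1, 2, 3))

rm(sd)  # remove the masking function again
```

Writing `stats::sd()` instead of `sd()` is a common way to stay safe when two loaded packages export the same function name.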

However, it is important to realize that if a call is made within a function, the function's own environment is searched first. Functions thus have their own environment, which only exists inside the function where the call is made. Here is an example of how this works. Make sure you understand it, as it is important: 

In [None]:
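a <- 1

my_fun <- function(input) {
  a <- 5               # local `a`, found first inside the function
  output <- a * input
  return(output)
}

my_fun(input = 1)      # uses the local a = 5

my_fun2 <- function(input) {
  # no local `a` here, so R looks outside the function and finds a = 1
  output <- a * input
  return(output)
}

my_fun2(input = 1)     # uses the global a = 1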

So `my_fun` finds `a` in its own environment, but `my_fun2` does not, so it searches outside the function environment.  
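
A related point is that an ordinary assignment inside a function does not change a variable of the same name outside it. The following sketch uses a hypothetical function `set_local_a` and an illustrative variable `a_outside` to check this:

```r
a_outside <- 1

# Hypothetical function: assigns to a name that also exists outside the function.
set_local_a <- function() {
  a_outside <- 5   # creates a new local variable; the outer one is untouched
  a_outside
}

value_inside <- set_local_a()

value_inside  # the value seen inside the function
a_outside     # the outer variable is unchanged
```
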
R is an interactive programming language. The danger with interactive programming is that you may have used a variable name before and forgotten about it, which easily happens during extensive data analysis projects. This is why it is good practice not to clutter your R memory with variables. Keep it clean by using functions. Observe how a variable defined inside a function is not stored in global R memory:

In [None]:
my_fun3 <- function(b0, b1, b2, x) {
  y <- b1 * x + b2 * x
  output <- y + b1
  return(y)
}

my_fun3(b0 = 2, b1 = 0.1, b2 = 0.5, x = 1:10)

b0
b1
b2
x
y
output

Notice how none of the variables exist! If you had not used a function, R memory would already be cluttered.  
There are more reasons why functions are useful: they are easy to reuse, and they are needed for functional programming, which we go into in the next section.  

By the way, you can see your R memory and the objects that live in it in the top right panel under `Global Environment`. You can also type `ls()`.  
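
Typing the name of a missing variable stops with an error, so a gentler way to check what exists is `exists()` together with `ls()`. This is a small sketch with a hypothetical function `line_values`; the variable names are chosen only for illustration:

```r
# Hypothetical function: `y_hat` only lives inside the function body.
line_values <- function(intercept, slope, x) {
  y_hat <- intercept + slope * x
  y_hat
}

line_result <- line_values(intercept = 2, slope = 0.5, x = 1:3)

y_hat_exists  <- exists("y_hat")        # FALSE: never stored outside the function
result_exists <- exists("line_result")  # TRUE: we assigned it ourselves
listed        <- "line_result" %in% ls()

y_hat_exists
result_exists
listed
```

Only the object that was assigned explicitly outside the function shows up.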

* exercise: create a function with input arguments your name, age, weight and height. In the function body calculate your BMI: `weight in kg` / `(height in m)^2`. Store your name, age, weight and BMI in a named list and return this list. 
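
If returning a named list is new to you, the pattern looks roughly like this. The example is deliberately unrelated to the exercise, and `summarise_vector` is a hypothetical helper:

```r
# Hypothetical helper: collect a few facts about a numeric vector in a named list.
summarise_vector <- function(values, label) {
  output <- list(
    label = label,
    n     = length(values),
    mean  = mean(values)
  )
  return(output)
}

vector_summary <- summarise_vector(values = c(2, 4, 6), label = "even numbers")

names(vector_summary)  # the names given inside list()
vector_summary$mean    # access one element by name
```
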

### functional programming: apply family  
Functional programming: "Functions that do stuff with or use other functions" is close enough for us.  
Two important functions to this end are `lapply` and `sapply`. They differ only in how they return the result: `lapply` always returns a list, while `sapply` tries to simplify the output to a vector or matrix. That simplification will not always be what you want, so try `sapply` first, then `lapply`.  
To quickly check the classes of the columns of a data frame:

In [None]:
sapply(my_dataframe, class)

As you can see, the arguments are: first the object to apply a function over, then the function that is called on each element of the object (in this case the columns). For lists it will be each slot:

In [None]:
sapply(my_list, class)
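
The names `my_dataframe` and `my_list` above are placeholders. As a minimal sketch, assuming a small made-up data frame and list, this is what the two calls return and how `lapply` differs:

```r
# Made-up example objects, for illustration only.
my_dataframe <- data.frame(
  id   = 1:3,
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
my_list <- list(numbers = 1:3, label = "x", flag = TRUE)

df_classes_vector <- sapply(my_dataframe, class)  # simplified to a named character vector
df_classes_list   <- lapply(my_dataframe, class)  # always a named list
list_classes      <- sapply(my_list, class)

df_classes_vector
df_classes_list
list_classes
```
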

You can also use `sapply` instead of loops:

In [None]:
sapply(1:10, function(i) { i * 2})

This might be tricky to see at first, but what happens is that `sapply` takes the columns, list entries, or vector elements and feeds them one by one to the function. In the last example we defined our own function and gave each element a name, `i`, which we can then use inside the function.  
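
To see that the `sapply` call really does the work of a loop, the sketch below writes the same computation both ways and compares the results. The variable names are illustrative:

```r
# Loop version: pre-allocate the result, then fill it element by element.
doubled_loop <- numeric(10)
for (i in 1:10) {
  doubled_loop[i] <- i * 2
}

# sapply version: the anonymous function receives each element as `i`.
doubled_sapply <- sapply(1:10, function(i) { i * 2 })

same_result <- identical(doubled_loop, doubled_sapply)
same_result
```
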

* exercise: Define a function that adds a value to each element of the vector `1:10`. Use `sapply` to do this. The value that you add should be an argument to the function.   

* exercise: calculate the summary of each `iris` column with the `summary()` function, using `sapply` or `lapply`.  

Hopefully you noticed that if you want to pass extra arguments to the function used inside an apply-type function, a straightforward way is to explicitly name the element that the apply function feeds in. You can do this by writing out the whole function: `function(x) { code, or a call to an existing function, that uses the argument x, which holds the element fed in }`
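
Besides writing out the whole function, `sapply` and `lapply` also forward any additional named arguments to the function they call. The sketch below shows both styles side by side, using `round()` and a made-up vector rather than the exercise itself:

```r
values <- c(1.234, 5.678)  # made-up numbers for illustration

# Style 1: write the whole function and name the fed-in element `x`.
rounded_anonymous <- sapply(values, function(x) round(x, digits = 1))

# Style 2: give the extra argument by name; sapply passes it on to round().
rounded_forwarded <- sapply(values, round, digits = 1)

identical(rounded_anonymous, rounded_forwarded)
```

The first style is more flexible, since the function body can contain several steps, while the second is shorter when a single existing function is enough.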