# Lecture 10.1: Writing Functions in R
<div style="border: 1px double black; padding: 10px; margin: 10px">

**After today's lecture you will understand:**
* how to write functions in R
</div>

This corresponds to Sections 19.1–19.6 of your book.



    




```r
library(tidyverse)
```

## Functions

R recognizes a function call by the `func()` construction: a function name followed by parentheses. Functions are collections of commands that perform a task. They take arguments, which specify which objects to operate on and which parameter values to use. You can use `help(func)` (or `?func`) to see what a function does and which arguments it expects, e.g. `help(sprintf)`.
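The help page is not the only way to see a function's arguments. Base R's `args()` and `formals()` show the arguments and their defaults directly in the console. Here is a small sketch using `rnorm()`. The variable name `rnorm_args` is only for illustration.

```r
# Show the argument list of rnorm(): n has no default, mean and sd do
args(rnorm)

# formals() returns the arguments together with their default values
rnorm_args <- formals(rnorm)
names(rnorm_args)  # "n" "mean" "sd"
rnorm_args$mean    # 0
rnorm_args$sd      # 1
```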

### Arguments

Functions often have multiple arguments. Some arguments have default values; others do not. Arguments without default values generally must be supplied when you call the function. Arguments can be passed by name or by position. For instance, each of the following calls draws five values from a standard normal distribution:




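```r
# generate 5 numbers from a Normal(0, 1) distribution in four equivalent ways
w = rnorm(5, mean = 0, sd = 1)      # n by position, the rest by name
x = rnorm(n = 5, mean = 0, sd = 1)  # everything by name
y = rnorm(5, 0, 1)                  # everything by position
z = rnorm(5)                        # mean and sd take their default values
round(cbind(x, y, z), 1)
```

| x | y | z |
|---:|---:|---:|
| -0.5 | -0.5 | -0.2 |
| -1.5 | 0.5 | -0.7 |
| -0.5 | -0.1 | 1.0 |
| -0.5 | 1.4 | 0.2 |
| 0.2 | 0.0 | 0.2 |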


`rnorm()` generates normally distributed random numbers. When you are unsure about a function's arguments, run `?rnorm`. Its help page shows three arguments: `n`, `mean`, and `sd`. Only `n` is required; `mean` and `sd` have default values (0 and 1). When you change a default value, it is good practice to name the argument.

Arguments passed by name need not be in order:

```r
w = rnorm(mean = 0, sd = 1, n = 5)
u = rnorm(mean = 0, sd = 1, 5) # This also works but is bad style.
round(rbind(u = u, w = w), 1)

# Unnamed arguments are matched, in order, to the arguments that remain
# after all named arguments have been assigned (here, 5 goes to n).
```

|   | 1 | 2 | 3 | 4 | 5 |
|---|---:|---:|---:|---:|---:|
| u | 0.9 | 0.1 | 0.1 | -0.2 | -2.9 |
| w | -0.8 | 0.4 | -1.2 | -0.3 | -1.4 |


### Style notes

Values for function arguments with default values should be passed by name, not by position.
Commonly used and required function arguments can be passed by position.
It’s never bad style to pass by name rather than by position.
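The sketch below applies these conventions with `rnorm()` and `seq()`. The required argument goes by position. Any argument whose default we change is named. The variable names are only for illustration.

```r
# Good style: required n by position, the changed default sd by name
draws_named <- rnorm(5, sd = 2)

# Also valid R, but a reader must remember that the third argument is sd
draws_positional <- rnorm(5, 0, 2)

# seq() has a default step size; name `by` when you change it
seq(1, 10, by = 3)  # 1 4 7 10
```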

### Writing Functions in R

You can create your own functions in R. Use functions for tasks that you repeat often, so that your scripts are easier to read and modify. A good rule of thumb is never to copy and paste a block of code more than twice; write a function instead.
It can also be a good practice to use functions to break complex processes into parts, especially if these parts are used with control flow statements such as loops or conditionals.
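Before the worked example, here is the general shape of a function definition. A function has a name, a list of arguments (optionally with defaults), and a body. The names `my_function`, `arg1`, and `arg2` are placeholders used only for this sketch.

```r
# General shape of a function definition (placeholder names)
my_function <- function(arg1, arg2 = "default") {
  # the body holds the commands to run
  paste(arg1, arg2)  # the value of the last expression is returned
}

my_function("hello")           # "hello default"
my_function("hello", "world")  # "hello world"
```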


We start with a preliminary exercise: standardizing a vector so that it has mean zero and standard deviation one. We can brute-force our way through by first centering the data (subtracting the mean) and then dividing by the standard deviation.

```r
x <- c(1, 5, -11, 20)
print(x)
```

```
[1]   1   5 -11  20
```


```r
x_centered <- x - mean(x)  # subtract the mean so the result has mean 0
print(x_centered)
var(x_centered)            # centering does not change the variance
```

```
[1]  -2.75   1.25 -14.75  16.25
```


```r
x_std <- x_centered / sd(x_centered) # divide by the standard deviation to get variance 1
print(x_std)
var(x_std)
```

```
[1] -0.21501223  0.09773283 -1.15324742  1.27052682
```


Now let's say you have to perform this task again for another vector. You can simply repeat the calculations above:

```r
y <- c(-12, 3, 14, 56)
y_centered <- y - mean(y)
y_std <- y_centered / sd(y_centered)
print(y_std)
var(y_std)
```

```
[1] -0.93379798 -0.41978074 -0.04283477  1.39641349
```


Alternatively, we can write a function that standardizes any vector we give it:

```r
# function to compute z-scores
z_score1 = function(x) {
  # inputs: x - a numeric vector
  # outputs: the z-scores for x
  xbar = mean(x)
  s = sd(x)
  z = (x - xbar) / s

  return(z)
}

stopifnot(z_score1(1:3) == -1:1) # throws an error unless every comparison is TRUE
```

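The `return()` call is not strictly necessary, because an R function returns the value of the last expression it evaluates. An explicit `return()` can still make complex functions more readable. It is also good practice to avoid creating intermediate objects that store values used only once, as the more compact version below does.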



```r
# function to compute z-scores
z_score2 = function(x) {
  # inputs: x - a numeric vector
  # outputs: the z-scores for x
  (x - mean(x)) / sd(x)
}
```

```r
x = rnorm(10, 3, 1) # generate 10 values from a Normal(3, 1) distribution
round(cbind(x, 'Z1' = z_score1(x), 'Z2' = z_score2(x)), 1)
```

| x | Z1 | Z2 |
|---:|---:|---:|
| 3.0 | -0.3 | -0.3 |
| 3.2 | 0.0 | 0.0 |
| 2.9 | -0.5 | -0.5 |
| 3.4 | 0.3 | 0.3 |
| 3.2 | -0.1 | -0.1 |
| 2.2 | -1.5 | -1.5 |
| 2.8 | -0.6 | -0.6 |
| 4.9 | 2.4 | 2.4 |
| 3.3 | 0.1 | 0.1 |
| 3.3 | 0.1 | 0.1 |
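An explicit `return()` is most useful when a function needs to exit early. The sketch below is a hypothetical variant, `z_score_safe()`, that handles a constant vector as a special case. For such input `z_score2()` would compute `0 / 0` and return `NaN`. Returning zeros in that case is a choice made for this illustration, not a standard definition.

```r
# Hypothetical variant of z_score2 that returns early for constant input
z_score_safe = function(x) {
  # inputs: x - a numeric vector
  # outputs: the z-scores for x; all zeros if every value of x is the same
  s = sd(x)
  if (isTRUE(s == 0)) {
    return(rep(0, length(x)))  # avoid 0 / 0, which would give NaN
  }
  (x - mean(x)) / s
}

z_score_safe(c(4, 4, 4))  # 0 0 0 instead of NaN NaN NaN
z_score_safe(1:3)         # -1 0 1
```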


### Default Parameters

We can set default values for parameters using the construction `parameter = xx` in the function definition.




```r
# function to compute z-scores, ignoring missing values by default
z_score3 = function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}
```

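```r
x = c(NA, x, NA) # add a missing value at each end
round(cbind(x, 'Z1' = z_score1(x), 'Z2' = z_score2(x), 'Z3' = z_score3(x)), 1)
```

| x | Z1 | Z2 | Z3 |
|---:|---:|---:|---:|
| NA | NA | NA | NA |
| 3.0 | NA | NA | -0.3 |
| 3.2 | NA | NA | 0.0 |
| 2.9 | NA | NA | -0.5 |
| 3.4 | NA | NA | 0.3 |
| 3.2 | NA | NA | -0.1 |
| 2.2 | NA | NA | -1.5 |
| 2.8 | NA | NA | -0.6 |
| 4.9 | NA | NA | 2.4 |
| 3.3 | NA | NA | 0.1 |

`mean()` and `sd()` return `NA` when their input contains missing values, unless `na.rm = TRUE`. As a result, `z_score1()` and `z_score2()` return `NA` for every element, while `z_score3()` drops the missing values by default.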
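A default is used only when the caller does not supply that argument. Passing the argument by name overrides it. The short vector `v` below is made up for this illustration.

```r
z_score3 = function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

v = c(NA, 1, 2, 3)
z_score3(v)                 # NA -1 0 1   (uses the default na.rm = TRUE)
z_score3(v, na.rm = FALSE)  # NA NA NA NA (default overridden by name)
```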


## Scope

Scoping refers to how R looks up the value associated with an object referred to by name. There are two types of scoping, lexical and dynamic. We will concern ourselves only with lexical scoping, which is what R functions use. There are four keys to understanding scoping:

- environments
- name masking
- variables vs functions
- dynamic lookup and lazy evaluation


An environment can be thought of as a context in which names are associated with objects. Each time a function is called, it generates a new environment for the computation.

Consider the following examples:

```r
?ls    # open the help page for ls()
ls()   # list the names of objects in the current (here, global) environment
```

```r
f1 = function() {
  f1_message = "I'm defined inside of f!"  # not called `message`, which is a base R function
  ls()
}
f1()  # returns "f1_message": ls() lists only the local environment
```

```r
exists('f1')         # TRUE; equivalent here to 'f1' %in% ls()
exists('f1_message') # FALSE: f1_message existed only inside the call to f1()
```

```r
environment() # here we are in the global environment
```

```
<environment: R_GlobalEnv>
```

```r
f2 = function() {
  environment() # here we are in the local environment: each call creates a new
                # environment for the purpose of that function call
}
f2()
```

```
<environment: 0x7fab30195bd0>
```
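To confirm that every call gets its own environment, we can compare the environments returned by two calls of the same function. `new_env` is a hypothetical name. `identical()` returns `TRUE` for two environments only when they are the same environment.

```r
new_env = function() environment()

e1 = new_env()
e2 = new_env()
identical(e1, e2)  # FALSE: each call created a fresh environment
```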

```r
rm(f1, f2)
```

Name masking depends on where, and in what order, `R` looks for object names.
When we call `f1` above, `R` looks up the name `f1` in the current environment, which happens to be the global environment. The call to `ls()`, however, happens within the environment created by the function call. It therefore returns only the objects defined in that local environment.

When a function is called, its new environment is nested within the environment where the function was defined, referred to as the “parent environment”. When an object is referenced, R first looks in the current environment. It then moves recursively up through parent environments until it finds a value bound to that name.
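Here is a minimal sketch of this upward search with one function defined inside another. `outer_fn` and `inner_fn` are hypothetical names. The name `a` is not defined in `inner_fn()`, so R finds it one level up, in the environment of the `outer_fn()` call where `inner_fn` was defined.

```r
outer_fn = function() {
  a = "found in outer_fn's environment"
  inner_fn = function() {
    # `a` is not defined here, so R looks in the parent environment:
    # the environment of the outer_fn() call, where inner_fn was defined
    a
  }
  inner_fn()
}

outer_fn()  # "found in outer_fn's environment"
```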



Name masking means that objects with the same name can exist in different environments; the one R finds first masks (hides) the others. Consider these examples:



```r
# Example 3: lexical scoping
y = x = 'I came from outside of f!'
f3 = function() {
  x = 'I came from inside of f!'
  list(x = x, y = y)  # x comes from inside f3, y from the global environment
}
f3()
```

```r
# Inside f3, x is found in f3's own local environment.
# y is not defined there, so R searches the parent environment (here, the global one) and keeps moving up.
# The x inside f3 is local to f3: it does not change the global x unless we explicitly write code to do that.
x  # still 'I came from outside of f!'
```

```r
# Example 4: masking
mean = function(x) {
  sum(x)  # our mean() masks base::mean() but actually returns the sum
}
mean(1:10)  # 55
```

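```r
base::mean(1:10)  # 5.5: the :: operator reaches the base version directly
```

```r
rm(mean)  # remove our version so that mean() refers to base::mean() again
```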
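The third key listed above, variables versus functions, shows up when a name is used in a call such as `c(...)`. During that search R skips any non-function objects with the same name. A variable called `c` therefore does not break the base function `c()`. Naming a variable `c` is still poor style; this is only a sketch of the lookup rule.

```r
c = 10         # a numeric variable named c in the global environment
res = c(c, 2)  # in call position R finds the function c(); as an argument, c is 10
res            # 10 2
rm(c)          # remove the variable again
```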

R also uses dynamic lookup: values are looked up when a function is called, not when it is created. In Example 3 above, `y` was defined in the global environment rather than within the function body. The value returned by `f3` therefore depends on the value of `y` in the global environment at the time `f3` is called. You should generally avoid this, but there are occasions where it can be useful.



```r
# Example 5: dynamic lookup
y = "I have been reinvented!"
f3()  # the y element now reads "I have been reinvented!"
```

When we created `f3`, `y` had a different value. Because of dynamic lookup, R finds `y` in the nearest environment that defines it at the moment `f3` is evaluated.
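The sketch below shows why relying on a global variable is usually avoided, and how passing the value as an argument removes the dependence. All names here are hypothetical.

```r
scale_by = 2
times_global = function(x) x * scale_by             # depends on a global variable
times_arg = function(x, scale_by = 2) x * scale_by  # self-contained

times_global(5)  # 10
scale_by = 100
times_global(5)  # 500: the result changed because the global variable changed
times_arg(5)     # 10: depends only on its own arguments
```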

Finally, lazy evaluation means R only evaluates function arguments if and when they are actually used.



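```r
# Example 6: lazy evaluation
f4 = function(x) {
  #x   # uncomment this line to use x, and the call below will raise the error
  45
}

f4(x = stop("Let's pass an error."))  # returns 45: x is never used, so stop() never runs
```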
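Lazy evaluation has two further consequences, sketched below with hypothetical function names. First, a default value is evaluated inside the function only when the argument is first used. Second, base R's `force()` evaluates an argument immediately, so any error it contains surfaces right away.

```r
# A default is evaluated only when it is first used
lazy_default = function(x, y = x * 2) {
  x = 10  # x changes before y is ever used
  y       # y's default is evaluated now, inside the function, using x = 10
}
lazy_default(1)  # 20, not 2

# force() evaluates an argument immediately
forced = function(x) {
  force(x)  # evaluating x here triggers the error it contains
  45
}
result = try(forced(stop("Now the error is raised.")), silent = TRUE)
class(result)  # "try-error"
```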

### Summary thus far
<div style="border: 1px double black; padding: 10px; margin: 10px">
    
**Functions**

* When in doubt, pass arguments to a function by name
    
* If you copy and paste a chunk of code more than twice, write a function
    
* Use comments to document each of your functions
    * purpose
    * inputs / arguments, including default values
    * outputs
    
* Scope: function bodies are executed in their own environment
    * dynamic lookup 
    * masking
    * lazy evaluation
    
    </div>    