# R Functions

**Custom Functions and Using Functions in R**

## Overview

In general, we use functions to do specific tasks for us in our code. When you first start writing functions, it can be tempting to make each one do many things at once. That is fine for your own personal code, but it usually isn't the best approach for functions you intend to share with other users. Each function should be designed with a single, clear purpose. You can always create more functions to do more tasks.


## Pure functions

In functional programming, we describe what to compute rather than how to change the program's state step by step. Programs are built from pure functions that take input and return output without side effects. Side effects can include:

- Changing global variables from inside the function
- Printing out stuff from inside the function
- Writing to files
- Plotting from inside the function

These behaviors are fine for quick exploration, but when we share code, they make it harder to use and test. Pure functions return an object; the user can then decide when to print, write to a file, or plot.

Avoid these behaviors in the functions that you write and intend to share.
If you have any print statements for debugging or code chunks where you test the functions by calling them outside the function definition, remove them before you share your code (or turn in your assignment).
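
As a minimal sketch of the difference, the two functions below do the same arithmetic. The names `counter`, `add_one_impure()`, and `add_one()` are made up for illustration. The first function prints and changes a global variable; the second only returns a value and leaves the rest to the caller.

```r
counter <- 0

# Impure: changes a global variable and prints from inside the function.
add_one_impure <- function() {
  counter <<- counter + 1   # side effect: modifies `counter` in the workspace
  print(counter)            # side effect: prints
}

# Pure: takes an input, returns an output, touches nothing else.
add_one <- function(n) {
  n + 1
}

add_one_impure()                # prints 1 and silently changes `counter`
new_value <- add_one(counter)   # `counter` is left alone; the result is returned
new_value                       # the caller decides when to print
```

The pure version is easier to test because its result depends only on its argument.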


## Existing Functions in base R

There are many functions that will be available to us when we start programming in R.

Type this command in RStudio's console:


In [None]:
help(package = 'base')

You will see a list of all functions and data (we'll talk more about data soon) in the base package of R.

### Some Common Base R Functions

#### Aggregate functions
*(these take a whole vector and return a single summary value)*

- `mean()` mean of a set of numbers  
- `median()` median of a set of numbers  
- `sd()` standard deviation of a set of numbers  
- `sum()` sum of a set of numbers  
- `length()` length of a vector  
- `max()` / `min()` maximum and minimum values of a vector  

#### Elementwise functions
*(these apply a transformation to each element of a vector)*

- `round()` round to a specified number of decimal places  
- `sqrt()` square root  
- `log()` natural logarithm  
- `exp()` exponential  
- `abs()` absolute value  

From the *Help* pane in RStudio, you can read the official documentation. You can also pull it up from a JupyterHub code cell by using the `help()` function or a `?`, as in `?mean`. Most help pages contain much more information than you need, so you will usually skim them to find out how to use the function.


In [None]:
vec <- c(-1, 0, 1)
sum(vec)  # aggregate: returns a single value
abs(vec)  # elementwise: returns one value per element

### Reading documentation

Let's take a look at the documentation for `mean()`.

At the very top, you’ll see a short **Description** of what the function is for. Below that, the **Usage** section shows how to call the function, including the default values of its arguments.

You’ll see something like:

- `mean(x, ...)`
- `mean.default(x, trim = 0, na.rm = FALSE, ...)`

**How to read this:**

- `x` is the main input.
- `trim = 0` means the default value of `trim` is `0`.
- `na.rm = FALSE` means the default is **not** to remove missing values.
- `...` means “additional optional arguments”.

The **Arguments** section is one of the most important parts.

For `mean()`, you’ll see which **arguments** or inputs it will accept when the function is called:

- **`x`**: the data to average
- **`trim`**: how much to trim from each end before averaging (useful for robust averages)
- **`na.rm`**: whether to remove missing values (`NA`) before computing
- **`...`**: extra options passed to other methods

The **Value** (return value) section indicates what the function will **return** or output.

For `mean()`, you should expect a **single value**, typically a number. The exact type depends on `x`: numeric input (integer or double) gives a double, while complex input gives a complex value. If `x` contains missing values and `na.rm = FALSE`, the result is `NA`.
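
To see how the documented arguments change the result, here is a small sketch with a made-up vector `x` that contains one extreme value and one missing value.

```r
x <- c(1, 2, 3, 4, 100, NA)

mean(x)                            # NA, because na.rm defaults to FALSE
mean(x, na.rm = TRUE)              # 22, the NA is dropped before averaging
mean(x, trim = 0.2, na.rm = TRUE)  # 3, the smallest and largest values are trimmed
```

With `trim = 0.2`, 20% of the remaining five values are removed from each end, so only 2, 3, and 4 are averaged.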

---

Many of the functions we use for summaries, like `mean()`, `median()`, `sum()`, and `length()`, are good examples of **pure functions** in the way we use them in R. A pure function takes inputs (its arguments), computes a result, and returns that result. It does not “reach out” and change other variables in your workspace.

That’s why, when I use a function like `mean()`, I don’t need to keep track of the local variables inside the function. Any intermediate values the function creates are temporary and disappear when the function call finishes. What does matter for my code are the function’s arguments (including default values like `na.rm = FALSE`) and the return value.
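
The sketch below illustrates this with a hypothetical `my_mean()` function (not the real implementation of `mean()`). Its local variables `total` and `n` exist only while the call runs, so a workspace variable that happens to share a name is not affected.

```r
# Hypothetical helper: `total` and `n` are local to the function.
my_mean <- function(x) {
  total <- sum(x)
  n <- length(x)
  total / n
}

total <- "untouched"            # a workspace variable with the same name
result <- my_mean(c(2, 4, 6))

result   # 4
total    # still "untouched"
```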


In [None]:
?mean

## Default values in functions

A **default value** is the value a parameter takes when the caller doesn't supply the corresponding argument in the function call.

### Syntax


In [None]:
my_fun <- function(x, option = default_value) {
  # code
}

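Notice that parameters without default values are listed to the left of those with default values in the definition. R does not require this order, but it is the usual convention because it lets callers supply the required arguments by position.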

### Example: a power function


In [None]:
power <- function(x, p = 2) {
  x ^ p
}

power(5)      # x == 5, p == 2 (uses the default)
power(5, 3)   # x == 5, p == 3
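
Arguments can also be matched by name instead of by position. The sketch below repeats the `power()` definition so that it runs on its own. The calls are only illustrative.

```r
power <- function(x, p = 2) {
  x ^ p
}

power(x = 5, p = 3)   # 125, both arguments named
power(p = 3, x = 5)   # 125, order does not matter when arguments are named
power(2, p = 10)      # 1024, x by position and p by name
```

Naming the optional argument, as in the last call, often makes the code easier to read.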

### Example: a greeting function

In [None]:
greet <- function(name = "friend", punctuation = "!") {
  paste0("Hi ", name, punctuation)
}

greet()
greet("johnnpickles")
greet("johnnpickles", " :)")

## Create your own functions

Why do you want to create your own function?

- To get computers to do tedious work for you.
- To organize your work and have a way for it to be reproduced.
- To create tools you can share with others.

You already have everything you need to write your own functions. Most of the time, R already has functions that do most of what you want, especially when you use packages like the `tidyverse`. Your job is often to combine those existing tools into a small function that does exactly what you need for your project.

When you write your own functions, you can use functions from packages inside them. If you later decide to share your work, you might bundle your functions into your own package (a collection of related functions).

R also supports more advanced programming styles, like creating new kinds of objects (called classes). We won’t focus on that in DATS 1001. In this course, we’ll focus on building your own toolbox by writing clear functions that you can reuse and build on.


### Examples of functions

#### 1. Absolute value

In [None]:
my_abs <- function(x){
  if (x < 0){
    return(-x)
  } else {
    return(x)
  }
}

In [None]:
my_abs(-7)
my_abs(7)

Or a bit more compactly:

In [None]:
my_abs <- function(x){
  if (x < 0){
    return(-x)
  }
  x
}

In [None]:
my_abs(-7)
my_abs(7)
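
Both versions above expect a single number, because `if` tests one condition at a time (in R 4.2 and later, a condition of length greater than one is an error). One possible elementwise variant, sketched here under the made-up name `my_abs_vec()`, uses `ifelse()` so that it works on whole vectors the way `abs()` does.

```r
# Elementwise sketch: ifelse() evaluates the test for every element of x.
my_abs_vec <- function(x) {
  ifelse(x < 0, -x, x)
}

my_abs_vec(c(-1, 0, 1))   # 1 0 1
my_abs_vec(-7)            # 7
```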

#### 2. Let's make our own function `my_sum()`.
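
One way this could look is sketched below. It is an assumption about the intended exercise, not an official solution. It adds up the elements of a vector with a `for` loop and returns the total without printing.

```r
# Sketch of my_sum(): accumulate a running total over the elements of x.
my_sum <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total
}

my_sum(c(-1, 0, 1))   # 0
my_sum(1:10)          # 55
```

In practice you would use the built-in `sum()`. Writing it by hand is only practice in defining a function.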

#### 3. Let's program `mock()`:

Let's make a function to mock a user. We will have them input some text and have the function output that text in a mocking style. First we need something that's the opposite of `paste0`:

In [None]:
text <- "Hello, world"
chars_vec <- strsplit(text, split = "")[[1]]
chars_vec

For more string tools, see the tidyverse package [`stringr`](https://stringr.tidyverse.org/).
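
A possible `mock()` is sketched below. It assumes that "mockingly" means alternating lower and upper case. That interpretation and the function body are guesses, not the course's official version. It splits the text into characters with `strsplit()`, changes the case of every other character, and glues the pieces back together with `paste0()`.

```r
# Sketch of mock(): alternate lower and upper case, character by character.
mock <- function(text) {
  chars <- strsplit(text, split = "")[[1]]
  make_upper <- seq_along(chars) %% 2 == 0      # TRUE at even positions
  chars[make_upper] <- toupper(chars[make_upper])
  chars[!make_upper] <- tolower(chars[!make_upper])
  paste0(chars, collapse = "")
}

mock("Hello, world")   # "hElLo, WoRlD"
```

The function returns the new string instead of printing it, so the caller decides what to do with the result.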

## Programming with the pipe (`|>`)

The pipe lets you write code left-to-right:

> “Take the thing on the left, and pass it into the function on the right.”

### Basic pattern


In [None]:
x |> f() |> g()

This is the same as:

In [None]:
g(f(x))

### Example

In [None]:
x <- c(1, 2, 3, 4, 100)

x |> mean()
x |> mean(trim = 0.2)

### Where does the input go?

In base R, `x |> f(a = 1)` becomes:


In [None]:
f(x, a = 1)
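
As a concrete check, the sketch below uses `round()` in place of the placeholder `f()`. The vector `x` is made up, and the native pipe needs R 4.1 or later.

```r
x <- c(1.234, 5.678)

piped  <- x |> round(digits = 1)   # x becomes the first argument of round()
direct <- round(x, digits = 1)

piped                       # 1.2 5.7
identical(piped, direct)    # TRUE
```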

### Debugging tip

In [None]:
input <- 100

# Nested calls: read from the inside out
output <- sqrt(log(input))

# Native pipe (R 4.1 or later): read from left to right
output <- input |> log() |> sqrt()

# magrittr pipe, loaded with the tidyverse
library(tidyverse)
output <- input %>% log() %>% sqrt()

If a pipe chain gets long, break it into steps:

In [None]:
input <- 100
x_log <- input |> log()
output <- x_log |> sqrt()

This is often easier to debug than a long pipe chain.