In [2]:
library(rlang)

### Code is data

To compute on code, you first need some way to capture it. The first function that captures code is `rlang::expr()`. You can think of it as returning exactly what you pass in:

In [6]:
expr(mean(x, na.rm = TRUE))

mean(x, na.rm = TRUE)

In [7]:
expr(10 + 100 + 1000)

10 + 100 + 1000

`expr()` lets you capture code that you’ve typed. To capture code passed to a function you need a different tool, because `expr()` always returns exactly what it was given, which here is the symbol `x`:

In [8]:
capture_it <- function(x) {
  expr(x)
}
capture_it(a + b + c)

x

Here you need to use a function specifically designed to capture user input in a function argument: `enexpr()`.

In [11]:
capture_it <- function(x) {
  enexpr(x)
}
capture_it(a + b + c)

a + b + c

A captured expression behaves much like a list, so you can modify it with `[[` and `$`:

In [17]:
f <- expr(f(x = 1, y = 2))

# Add a new argument
f$z <- 3
f
#> f(x = 1, y = 2, z = 3)


f(x = 1, y = 2, z = 3)

In [18]:
# Or remove an argument:
f[[2]] <- NULL
f
#> f(y = 2, z = 3)

f(y = 2, z = 3)

Note that the first element of the call is the function to be called, which means the first argument is in the second position.
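As a quick check of that layout, the sketch below indexes into a captured call. The variable names (`call_f`, `fn`, and so on) are illustrative choices, not part of the original notes.

```r
library(rlang)

call_f <- expr(f(x = 1, y = 2))

# Position 1 holds the function being called, stored as a symbol
fn <- call_f[[1]]

# Arguments start at position 2 and can also be reached by name
first_arg <- call_f[[2]]
same_arg <- call_f$x

# A call with two arguments therefore has length 3
n <- length(call_f)
```

This is why `f[[2]] <- NULL` above removes the argument `x` rather than the function name.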

### Code is a tree

A very convenient tool for understanding the tree-like structure of code is `lobstr::ast()`. Given some code, it displays the underlying tree structure. Function calls form the branches of the tree and are shown by rectangles. The leaves of the tree are symbols (like `a`) and constants (like `"b"`).

In [19]:
lobstr::ast(f(a, "b"))

█─f 
├─a 
└─"b" 

In [20]:
lobstr::ast(f1(f2(a, b), f3(1, f4(2))))

█─f1 
├─█─f2 
│ ├─a 
│ └─b 
└─█─f3 
  ├─1 
  └─█─f4 
    └─2 

Because all function forms in R can be written in prefix form (Section 5.8.2), every R expression can be displayed in this way:

In [21]:
lobstr::ast(1 + 2 * 3)

█─`+` 
├─1 
└─█─`*` 
  ├─2 
  └─3 
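One way to convince yourself that infix and prefix forms share a tree is to capture both and compare them. This is a small sketch; the names `infix` and `prefix` are only for illustration.

```r
library(rlang)

infix <- expr(1 + 2 * 3)
prefix <- expr(`+`(1, `*`(2, 3)))

# Both spellings are captured as the same call object
same_tree <- identical(infix, prefix)

# so they also evaluate to the same value
value <- eval(prefix)
```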

### Code can generate code

As well as seeing the tree from code typed by a human, you can also use code to create new trees. There are two main tools: `call2()` and unquoting.

`rlang::call2()` constructs a function call from its components: the function to call, and the arguments to call it with.

In [22]:
call2("f", 1, 2, 3)

f(1, 2, 3)

In [24]:
call2("+", 1, call2("*", 2, 3))

1 + 2 * 3
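`call2()` also accepts symbols and named arguments, which the examples above do not show. The sketch below assumes a variable called `x` purely for illustration and evaluates the generated call in a small environment.

```r
library(rlang)

# The string names the function, expr(x) supplies a symbol,
# and named arguments keep their names in the generated call
mean_call <- call2("mean", expr(x), na.rm = TRUE)

# Evaluate the generated call with a value for x
result <- eval(mean_call, env(x = c(1, NA, 3)))
```

Printing `mean_call` should show an ordinary-looking call to `mean()`, even though nobody typed it.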

This is often convenient to program with, but is a bit clunky for interactive use. An alternative technique is to build complex code trees by combining simpler code trees with a template. `expr()` and `enexpr()` have built-in support for this idea via `!!` (pronounced bang-bang), the unquote operator.

In [27]:
xx <- expr(x + x)
yy <- expr(y + y)

expr(!!xx / !!yy)

(x + x)/(y + y)

Notice that the output preserves the operator precedence, so we get `(x + x) / (y + y)`, not `x + x / y + y` (i.e. `x + (x / y) + y`). This is important to note, particularly if you’ve been thinking “wouldn’t this be easier to do by pasting strings?”.
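To make the comparison with string pasting concrete, here is a minimal sketch that builds the expression both ways and evaluates each. The values chosen for `x` and `y` are arbitrary, and `rlang::parse_expr()` is used only to turn the pasted string back into code.

```r
library(rlang)

xx <- expr(x + x)
yy <- expr(y + y)

# Pasting text loses the grouping: this parses as x + (x / y) + y
pasted <- parse_expr(paste(deparse(xx), "/", deparse(yy)))

# Unquoting inserts whole subtrees, so the grouping survives
unquoted <- expr(!!xx / !!yy)

vals <- env(x = 2, y = 4)
pasted_value <- eval(pasted, vals)      # 2 + 2 / 4 + 4
unquoted_value <- eval(unquoted, vals)  # (2 + 2) / (4 + 4)
```

The two results differ, which is the practical cost of treating code as text rather than as a tree.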

Unquoting gets even more useful when you wrap it up into a function, first using `enexpr()` to capture the user’s expression, then `expr()` and `!!` to create a new expression using a template. The example below shows how you might generate an expression that computes the coefficient of variation:

In [42]:
cv <- function(var) {
  var <- enexpr(var)
  # Coefficient of variation: standard deviation divided by the mean
  expr(sd(!!var) / mean(!!var))
}

cv(x)
#> sd(x)/mean(x)

cv(x + y)
#> sd(x + y)/mean(x + y)

In [34]:
# Unquoting also copes with unusual variable names
cv(`)`)

sd(`)`)/mean(`)`)

### Evaluation executes an expression in an environment

The primary tool for evaluating expressions is `base::eval()`, which takes an expression and an environment:

In [48]:
eval(expr(x + y), env(x = 1, y = 10))
#> [1] 11
eval(expr(x + y), env(x = 2, y = 100))
#> [1] 102

In [50]:
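# Note: this redefines cv() so that it adds the captured expression to itself
cv <- function(var) {
  var <- enexpr(var)
  expr(!!var + !!var)
}

cv(x)
#> x + x
eval(cv(x), env(x = 1))
#> [1] 2
cv(x + y)
#> x + y + (x + y)
eval(cv(x + y), env(x = 1, y = 2))
#> [1] 6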

If you omit the environment, `eval()` uses the current environment. Here that’s the global environment:

In [52]:
x <- 10
y <- 100
eval(expr(x + y))
#> [1] 110
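The "current environment" is not always the global one. As a rough illustration, the sketch below calls `eval()` without an environment from inside a function; `eval_inside()` is a hypothetical helper invented for this example.

```r
library(rlang)

x <- 10

# Hypothetical helper with its own local x
eval_inside <- function() {
  x <- 1
  # No environment given, so eval() uses this function's environment
  eval(expr(x + 1))
}

inner <- eval_inside()       # uses the local x
outer <- eval(expr(x + 1))   # uses the x defined at the top level
```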

One of the big advantages of evaluating code manually is that you can tweak the execution environment. There are two main reasons to do this:

- To temporarily override functions to implement a domain specific language.
- To add a data mask so you can refer to variables in a data frame as if they are variables in an environment.

### You can override functions to make a DSL

It’s fairly straightforward to understand customising the environment with different variable values. It’s less obvious that you can also rebind functions to do different things. The example below evaluates code in a special environment where the basic algebraic operators (`+`, `-`, `*`, `/`) have been overridden to work with strings instead of numbers:

In [60]:
string_math <- function(x) {
  e <- env(
    caller_env(),
    `+` = function(x, y) paste0(x, y),
    `*` = function(x, y) strrep(x, y),
    `-` = function(x, y) sub(paste0(y, "$"), "", x),
    `/` = function(x, y) substr(x, 1, nchar(x) / y)
  )

  eval(enexpr(x), e)
}
name <- "Hadley"
string_math("Hi" - "i" + "ello " + name)
#> [1] "Hello Hadley"
string_math("x-" * 3 + "y")
#> [1] "x-x-x-y"
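The overrides only apply inside the special environment. A stripped-down sketch of the same idea, using a single rebound operator and an illustrative name `concat_env`, shows that ordinary arithmetic is unaffected elsewhere:

```r
library(rlang)

# A child environment in which `+` concatenates strings
concat_env <- env(`+` = function(x, y) paste0(x, y))

# Inside that environment, `+` joins its arguments
inside <- eval(expr("a" + "b"), concat_env)

# Outside it, `+` is still ordinary addition
outside <- 1 + 2
```

Because the rebinding lives in its own environment, it disappears as soon as evaluation finishes, which is what makes this approach safe for small domain specific languages.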

### Data masks blur the line between data frames and environments

Rebinding functions is an extremely powerful technique, but it tends to require a lot of investment. A more immediately practical application is modifying evaluation to look for variables in a data frame instead of an environment. This idea powers the base `subset()` and `transform()` functions, as well as many tidyverse functions like `ggplot2::aes()` and `dplyr::mutate()`. It’s possible to use `eval()` for this, but there are a few potential pitfalls, so we’ll use `rlang::eval_tidy()` instead.

As well as an expression and an environment, `eval_tidy()` also takes a data mask, which is typically a data frame:

In [73]:
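# y is a random permutation of 1:5, so your values will differ
df <- data.frame(x = 1:5, y = sample(5))
df
eval_tidy(expr(x + y), df)
#> [1]  3  5  7  5 10

| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 1 |
| 5 | 5 |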

Evaluating with a data mask is a useful technique for interactive analysis because it allows you to write `x + y` rather than `df$x + df$y`.
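When a name exists both in the data frame and in the environment, the data mask wins. The sketch below illustrates this with made-up values, and also uses the `.data` and `.env` pronouns that `eval_tidy()` provides to state the intended source explicitly.

```r
library(rlang)

df <- data.frame(x = 1:3)
x <- 100
y <- 10

# x is found in the data mask first; y falls back to the environment
masked <- eval_tidy(expr(x + y), df)

# The pronouns make the intended source of each name explicit
explicit <- eval_tidy(expr(.data$x + .env$x), df)
```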

We can wrap this pattern up into a function by using `enexpr()`. This gives us a function very similar to `base::with()`:

In [74]:
with2 <- function(df, expr) {
  eval_tidy(enexpr(expr), df)
}

with2(df, x + y)
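As a sanity check on the comparison with `base::with()`, the following sketch runs both on the same small data frame. The data values are arbitrary.

```r
library(rlang)

with2 <- function(df, expr) {
  eval_tidy(enexpr(expr), df)
}

df <- data.frame(x = 1:3, y = c(10, 20, 30))

# Both functions evaluate x + y using the columns of df
from_with2 <- with2(df, x + y)
from_with <- with(df, x + y)
```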

### Quosures capture an expression with its environment

Unfortunately, `with2()` has a subtle bug. We can see the problem if we use `with2()` with an expression that mingles a variable from the data frame with a variable called `a` from the current environment:

In [75]:
with2 <- function(df, expr) {
  a <- 1000
  eval_tidy(enexpr(expr), df)
}

In [76]:
df <- data.frame(x = 1:3)
df
a <- 10
with2(df, x + a)
#> [1] 1001 1002 1003

| x |
|---|
| 1 |
| 2 |
| 3 |

The result uses the `a` defined inside `with2()` (1000) rather than the `a` we can see (10). We really want to evaluate the captured expression in the environment where it was written (where `a` is 10), not the environment inside of `with2()` (where `a` is 1000).

### `enquo()` is the sibling of `enexpr()` that bundles expression and environment

Fortunately we can solve this problem by using a new data structure: the quosure, which bundles an expression with an environment. `eval_tidy()` knows how to work with quosures, so all we need to do is switch out `enexpr()` for `enquo()`:

In [77]:
with2 <- function(df, expr) {
  a <- 1000
  eval_tidy(enquo(expr), df)
}

with2(df, x + a)
#> [1] 11 12 13

Whenever you use a data mask, you should use `enquo()` instead of `enexpr()`.
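To see what a quosure actually holds, you can pull its two parts out with `rlang::quo_get_expr()` and `rlang::quo_get_env()`. The sketch below uses a hypothetical helper, `capture_quo()`, that does nothing except return the quosure for its argument.

```r
library(rlang)

# Hypothetical helper that simply returns the quosure for its argument
capture_quo <- function(arg) enquo(arg)

a <- 10
q <- capture_quo(x + a)

# A quosure carries both pieces
q_expr <- quo_get_expr(q)  # the expression x + a
q_env <- quo_get_env(q)    # the environment where x + a was written

# eval_tidy() looks up x in the data mask and a in the quosure's environment
result <- eval_tidy(q, data.frame(x = 1:3))
```

Because the environment travels with the expression, it no longer matters which variables happen to exist inside the function that does the evaluating.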