Take the following code, which takes a variable `x`, multiplies it by 10, and saves the result to a new variable called `y`. It doesn’t work because we haven’t defined a variable called `x`:

In [2]:
y <- x * 10

ERROR: Error in eval(expr, envir, enclos): object 'x' not found


It would be nice if we could capture the intent of the code, without executing it. In other words, how can we separate our description of the action from the action itself? One way is to use `rlang::expr()`:

In [4]:
z <- rlang::expr(y <- x * 10)
z

y <- x * 10

`expr()` returns an expression, an object that captures the structure of the code without evaluating it (i.e. running it). If you have an expression, you can evaluate it with `base::eval()`:

In [6]:
x <- 4
eval(z)
y
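
Because the expression is just data, the same captured code can be evaluated more than once against different bindings. The following is a minimal sketch, assuming rlang is installed; the environment names `env_a` and `env_b` are made up for illustration:

```r
library(rlang)

# Capture the code once; nothing is evaluated at this point.
z <- expr(y <- x * 10)

# Evaluate the same expression in two separate environments,
# each with its own binding for `x`.
env_a <- new.env()
env_a$x <- 4
eval(z, env_a)

env_b <- new.env()
env_b$x <- 0.5
eval(z, env_b)

env_a$y  # 40
env_b$y  # 5
```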

In [7]:
library(rlang)
library(lobstr)

In [9]:
lobstr::ast(f(x, "y", 1))

█─f 
├─x 
├─"y" 
└─1 

In [11]:
lobstr::ast(f(g(1, 2), h(3, 4, i())))

█─f 
├─█─g 
│ ├─1 
│ └─2 
└─█─h 
  ├─3 
  ├─4 
  └─█─i 

There’s one important situation where whitespace does affect the AST:

In [13]:
lobstr::ast(y <- x)
lobstr::ast(y < -x)

█─`<-` 
├─y 
└─x 

█─`<` 
├─y 
└─█─`-` 
  └─x 

### Infix

In [28]:
lobstr::ast(y <- x * 10)

█─`<-` 
├─y 
└─█─`*` 
  ├─x 
  └─10 

In [30]:
lobstr::ast(`<-`(y, `*`(x, 10)))

█─`<-` 
├─y 
└─█─`*` 
  ├─x 
  └─10 

There really is no difference between the ASTs, and if you generate an expression with prefix calls, R will still print them in infix form:

In [31]:
expr(`<-`(y, `*`(x, 10)))


y <- x * 10
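
The equivalence can also be checked directly rather than by comparing printed trees. This small sketch uses only base R's `quote()`; the variable names `infix` and `prefix` are illustrative:

```r
# Capture the same code written in infix and in prefix form.
infix  <- quote(y <- x * 10)
prefix <- quote(`<-`(y, `*`(x, 10)))

# Both forms produce the same call object.
identical(infix, prefix)  # TRUE

# Deparsing the prefix form prints it in infix form.
deparse(prefix)           # "y <- x * 10"
```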

### Expressions

You can test for a constant with `rlang::is_syntactic_literal()`.
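
As a rough illustration of what counts as a constant, the sketch below runs the predicate on a few inputs (assuming rlang is installed). Note that a negative number is captured as a call to `-`, not as a constant:

```r
library(rlang)

is_syntactic_literal(expr(1))     # TRUE
is_syntactic_literal(expr("x"))   # TRUE
is_syntactic_literal(expr(NULL))  # TRUE
is_syntactic_literal(expr(x))     # FALSE: a symbol
is_syntactic_literal(expr(-1))    # FALSE: a call to `-`
is_syntactic_literal(c(1, 2))     # FALSE: longer than 1
```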

Constants are “self-quoting” in the sense that the expression used to represent a constant is the constant itself:

In [35]:
identical(expr(TRUE), TRUE)
#> [1] TRUE
identical(expr(1), 1)
#> [1] TRUE
identical(expr(2L), 2L)
#> [1] TRUE
identical(expr("x"), "x")
#> [1] TRUE

You can create a symbol in two ways: by capturing code that references an object with `expr()`, or turning a string into a symbol with `sym()`:

In [36]:
expr(x)
#> x
sym("x")
#> x

x

x

You can turn a symbol back into a string with `as.character()` or `as_string()`. `as_string()` has the advantage of clearly signalling that you’ll get a character vector of length 1.

In [38]:
as_string(expr(x))

You can recognise a symbol because it’s printed without quotes, and `str()` tells you that it’s a symbol:


In [40]:
str(expr(x))

 symbol x


### Calls

A call object represents a captured function call. Call objects are vectors: the first component is the name of the function to call (usually represented as a symbol), and the remaining elements are the arguments for that call. Call objects create branches in the AST, because calls can be nested inside other calls.

You can identify a call object when printed because it looks just like a function call. Unfortunately `typeof()` and `str()` print “language” for call objects, but `is_call()` returns `TRUE`:

In [41]:
lobstr::ast(read.table("important.csv", row.names = FALSE))

█─read.table 
├─"important.csv" 
└─row.names = FALSE 

In [48]:
x <- expr(read.table("important.csv", row.names = FALSE))

typeof(x)
#> [1] "language"
is_call(x)
#> [1] TRUE

### Subsetting

Calls generally behave like lists, i.e. you can use standard subsetting tools. The first element of the call object is the function to call, which is usually a symbol:

In [49]:
x[[1]]
#> read.table
is_symbol(x[[1]])
#> [1] TRUE

read.table

The primary exception to this rule occurs when you use `::` to call a function in a specific package. In that case the first element will be another call:

In [45]:
lobstr::ast(base::read.csv("important.csv"))

█─█─`::` 
│ ├─base 
│ └─read.csv 
└─"important.csv" 

You can extract individual arguments with `[[` or `$` (if named):

In [56]:

x[[2]]
#> [1] "important.csv"
x$row
#> [1] FALSE

You can determine the number of arguments in a call object by subtracting 1 from its length:

In [54]:
length(x) - 1
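
Since calls behave like lists, the same tools can modify them as well as read them. The sketch below uses only base R; the variable `cl` and the replacement file name `"other.csv"` are made up for illustration:

```r
# Capture a call without evaluating it.
cl <- quote(read.table("important.csv", row.names = FALSE))

# Number of arguments: the length minus the function position.
n_args <- length(cl) - 1
n_args  # 2

# Add a named argument and replace a positional one.
cl$header <- TRUE
cl[[2]] <- "other.csv"

names(cl)  # "" "" "row.names" "header"
```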

Extracting specific arguments from calls is challenging because of R’s flexible rules for argument matching: an argument could potentially be in any location, with the full name, with an abbreviated name, or with no name. To work around this problem, you can use `rlang::call_standardise()`, which standardises all arguments to use the full name (recent rlang releases deprecate it in favour of `rlang::call_match()`). Note that if the function uses `...` it’s not possible to standardise all arguments:

In [58]:
rlang::call_standardise(x)
#> read.table(file = "important.csv", row.names = FALSE)

read.table(file = "important.csv", row.names = FALSE)
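
Base R offers a comparable tool in `match.call()`, which matches a captured call against a function definition. The following sketch is one way to use it, not the approach taken above; the abbreviated argument name `row` is used deliberately to show it being expanded:

```r
# A call that uses a positional argument and an abbreviated name.
cl <- quote(read.table("important.csv", row = FALSE))

# Match the call against the definition of read.table().
std <- match.call(utils::read.table, cl)

# Every argument now carries its full name.
names(std)  # "" "file" "row.names"
```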

### Constructing

You can construct a call object from its components by using `rlang::call2()`. The first argument is the name of the function to call (either as a string, a symbol, or another call). The remaining arguments will be passed along to the call:

In [59]:
call2("mean", x = expr(x), na.rm = TRUE)
#> mean(x = x, na.rm = TRUE)
call2(expr(base::mean), x = expr(x), na.rm = TRUE)
#> base::mean(x = x, na.rm = TRUE)

mean(x = x, na.rm = TRUE)

base::mean(x = x, na.rm = TRUE)

Note that infix calls created in this way still print as usual.

In [61]:
call2("<-", expr(x), 10)

x <- 10

Using `call2()` to create complex expressions is a bit clunky.
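
The clunkiness shows up as soon as calls are nested. As a small sketch (assuming rlang is installed), building `1 + 2 * 3` by hand takes two `call2()` calls, while `expr()` captures it directly:

```r
library(rlang)

# Build the inner call first, then wrap it in the outer one.
nested <- call2("+", 1, call2("*", 2, 3))

nested        # 1 + 2 * 3
eval(nested)  # 7

# The result is the same object that expr() would capture.
identical(nested, expr(1 + 2 * 3))  # TRUE
```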

### Parsing and grammar

An expression like `1 + 2 * 3` is ambiguous: it could mean `(1 + 2) * 3` or `1 + (2 * 3)`. Programming languages use conventions called operator precedence to resolve this ambiguity. We can use `ast()` to see what R does:

In [2]:
lobstr::ast(1 + 2 * 3)

█─`+` 
├─1 
└─█─`*` 
  ├─2 
  └─3 

There’s one particularly surprising case in R: `!` has a much lower precedence (i.e. it binds less tightly) than you might expect. This allows you to write useful operations like:

In [3]:
lobstr::ast(!x %in% y)

█─`!` 
└─█─`%in%` 
  ├─x 
  └─y 
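
The effect of precedence can also be seen by evaluating the code. This sketch uses only base R with made-up values for `x` and `y`; the last line shows a related case, where unary minus binds less tightly than `^`:

```r
x <- c(1, 5)
y <- 1:3

# `!` is applied to the result of `%in%`, not to `x`.
!x %in% y                          # FALSE TRUE
identical(!x %in% y, !(x %in% y))  # TRUE

# Unary minus has lower precedence than `^`.
-2^2                               # -4
```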

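The full list of operators and their precedence is documented in `?Syntax`:

In [5]:
?Syntax

If there is any doubt, use parentheses, which appear in the AST as a call to the `(` function:

In [6]:
lobstr::ast((1 + 2) * 3)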

█─`*` 
├─█─`(` 
│ └─█─`+` 
│   ├─1 
│   └─2 
└─3 

### Associativity

In R, most operators are left-associative, i.e. the operations on the left are evaluated first:

In [7]:
lobstr::ast(1 + 2 + 3)

█─`+` 
├─█─`+` 
│ ├─1 
│ └─2 
└─3 

There are two exceptions: exponentiation and assignment.

In [8]:
lobstr::ast(2^2^3)

█─`^` 
├─2 
└─█─`^` 
  ├─2 
  └─3 

In [9]:
lobstr::ast(x <- y <- z)

█─`<-` 
├─x 
└─█─`<-` 
  ├─y 
  └─z 
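
Associativity changes results, not just tree shapes. A short base R check, with arbitrary numbers chosen for illustration:

```r
# Exponentiation is right-associative: 2^(2^3), not (2^2)^3.
2^2^3       # 256
(2^2)^3     # 64

# Subtraction is left-associative: (10 - 3) - 2.
10 - 3 - 2  # 5

# Assignment is right-associative, so both names get the value.
x <- y <- 3
```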

### Parsing and deparsing

Most of the time you type code into the console, and R takes care of turning the characters you’ve typed into an AST. But occasionally you have code stored in a string, and you want to parse it yourself. You can do so using `rlang::parse_expr()`:

In [10]:
x1 <- "y <- x + 10"
lobstr::ast(!!x1)
#> "y <- x + 10"

x2 <- rlang::parse_expr(x1)
x2
#> y <- x + 10
lobstr::ast(!!x2)

"y <- x + 10" 

y <- x + 10

█─`<-` 
├─y 
└─█─`+` 
  ├─x 
  └─10 

`parse_expr()` always returns a single expression. If you have multiple expressions separated by `;` or `\n`, you’ll need to use `rlang::parse_exprs()`. It returns a list of expressions:

In [11]:
x3 <- "a <- 1; a + 1"
rlang::parse_exprs(x3)

[[1]]
a <- 1

[[2]]
a + 1
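
Once parsed, the list can be evaluated one expression at a time. The sketch below (assuming rlang is installed) does this in a fresh environment so that `a` does not leak into the workspace; `env` and `result` are illustrative names:

```r
library(rlang)

exprs <- parse_exprs("a <- 1; a + 1")

# Evaluate each expression in turn, keeping the last value.
env <- new.env()
result <- NULL
for (e in exprs) {
  result <- eval(e, env)
}

result  # 2
env$a   # 1
```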


**Base R**

The base equivalent to `parse_exprs()` is `parse()`. It is a little harder to use because it’s specialised for parsing R code stored in files. You need to supply your string to the `text` argument. It returns an expression vector, which I recommend turning into a list:

In [13]:
as.list(parse(text = x1))

[[1]]
y <- x + 10


The inverse of parsing is deparsing: given an expression, you want the string that would generate it. This happens automatically when you print an expression, and you can get the string yourself with `rlang::expr_text()`:

In [18]:
z <- rlang::expr(y <- x + 10)
rlang::expr_text(z)
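
For simple code like this, deparsing and re-parsing round-trips cleanly. A minimal check, assuming rlang is installed (`txt` is an illustrative name):

```r
library(rlang)

z <- expr(y <- x + 10)

# Deparse the expression to a string.
txt <- expr_text(z)
txt  # "y <- x + 10"

# Parsing the string gives back the original expression.
identical(parse_expr(txt), z)  # TRUE
```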

Parsing and deparsing are not perfectly symmetric because parsing generates an abstract syntax tree. This means we lose backticks around ordinary names, comments, and whitespace:

In [21]:
cat(expr_text(expr({
  # This is a comment
  x <-             `x` + 1
})))

{
    x <- x + 1
}

### Walking the AST with recursive functions

To make this pattern easier to see, we’ll need two helper functions. First we define `expr_type()`, which will return “constant” for constants, “symbol” for symbols, “call” for calls, “pairlist” for pairlists, and the “type” of anything else:

In [22]:
expr_type <- function(x) {
  if (rlang::is_syntactic_literal(x)) {
    "constant"
  } else if (is.symbol(x)) {
    "symbol"
  } else if (is.call(x)) {
    "call"
  } else if (is.pairlist(x)) {
    "pairlist"
  } else {
    typeof(x)
  }
}

expr_type(expr("a"))
#> [1] "constant"
expr_type(expr(f(1, 2)))
#> [1] "call"

We’ll couple this with a wrapper around `switch()`:

In [24]:
switch_expr <- function(x, ...) {
  switch(expr_type(x), 
    ..., 
    stop("Don't know how to handle type ", typeof(x), call. = FALSE)  
  )
}

With these two functions in hand, the basic template for any function that walks the AST is as follows:

In [25]:
recurse_call <- function(x) {
  switch_expr(x, 
    # Base cases
    symbol = ,
    constant = ,
    
    # Recursive cases
    call = ,
    pairlist = 
  )
}
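
The empty arguments in the template rely on `switch()` fall-through: a case with no value uses the value of the next non-empty case. The sketch below illustrates that behaviour in isolation; the function `kind()` and its return strings are hypothetical:

```r
kind <- function(type) {
  switch(type,
    # `symbol` falls through to the value given for `constant`.
    symbol = ,
    constant = "base case",
    # `call` falls through to the value given for `pairlist`.
    call = ,
    pairlist = "recursive case",
    # The unnamed final argument is the default.
    stop("Don't know how to handle type ", type, call. = FALSE)
  )
}

kind("symbol")  # "base case"
kind("call")    # "recursive case"
```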

### Finding F and T

We’ll start simple with a function that determines whether a function uses the logical abbreviations `T` and `F`: it will return `TRUE` if it finds a logical abbreviation, and `FALSE` otherwise.

`TRUE` is parsed as a logical vector of length one, while `T` is parsed as a name. This tells us how to write our base cases for the recursive function: a constant is never a logical abbreviation, and a symbol is an abbreviation if it’s “F” or “T”:

In [28]:
library(lobstr)
ast(TRUE)
ast(T)
#> T

TRUE 

T 

In [29]:
logical_abbr_rec <- function(x) {
  switch_expr(x, 
    constant = FALSE,
    symbol = as_string(x) %in% c("F", "T")
  )
}

In [31]:
logical_abbr_rec(expr(TRUE))
#> [1] FALSE
logical_abbr_rec(expr(T))
#> [1] TRUE

Typically we’ll also make a wrapper that quotes its input (we’ll learn more about that in the next chapter), so we don’t need to use `expr()` every time.

In [32]:
logical_abbr <- function(x) {
  logical_abbr_rec(enexpr(x))
}

logical_abbr(T)
#> [1] TRUE
logical_abbr(FALSE)
#> [1] FALSE

Next we need to implement the recursive cases. Here it’s simple because we want to do the same thing for calls and for pairlists: recursively apply the function to each subcomponent, and return `TRUE` if any subcomponent contains a logical abbreviation. This is made easy by `purrr::some()`, which iterates over a list and returns `TRUE` if the predicate function is true for any element:

In [34]:
logical_abbr_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = FALSE,
    symbol = as_string(x) %in% c("F", "T"),
    
    # Recursive cases
    call = ,
    pairlist = purrr::some(x, logical_abbr_rec)
  )
}

logical_abbr(mean(x, na.rm = T))
#> [1] TRUE
logical_abbr(function(x, na.rm = T) FALSE)
#> [1] TRUE
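
For comparison, the same walk can be sketched in base R without the helper functions. This is a simplified variant, not the implementation above: the name `uses_logical_abbr()` is hypothetical, and it only descends into calls, so it does not inspect function formals or handle missing arguments:

```r
uses_logical_abbr <- function(x) {
  if (is.symbol(x)) {
    # Base case: a symbol is an abbreviation if it is T or F.
    as.character(x) %in% c("T", "F")
  } else if (is.call(x)) {
    # Recursive case: check every component of the call.
    any(vapply(as.list(x), uses_logical_abbr, logical(1)))
  } else {
    # Base case: constants are never abbreviations.
    FALSE
  }
}

uses_logical_abbr(quote(mean(x, na.rm = T)))     # TRUE
uses_logical_abbr(quote(mean(x, na.rm = TRUE)))  # FALSE
uses_logical_abbr(quote(f(g(h(F)))))             # TRUE
```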

### Finding all variables created by assignment

Assignment is a call object where the first element is the symbol `<-`, the second is the name of the variable, and the third is the value to be assigned.

Next, we need to decide what data structure we’re going to use for the results. Here I think it will be easiest if we return a character vector. If we return symbols, we’ll need to use a `list()` and that makes things a little more complicated.

With that in hand we can start by implementing the base cases and providing a helpful wrapper around the recursive function. The base cases here are really simple!

In [38]:
find_assign_rec <- function(x) {
  switch_expr(x, 
    constant = ,
    symbol = character()
  )  
}
find_assign <- function(x) find_assign_rec(enexpr(x))

find_assign("x")
#> character(0)
find_assign(x)
#> character(0)

Next we implement the recursive cases. This is made easier by a function that should exist in purrr, but currently doesn’t. `flat_map_chr()` expects `.f` to return a character vector of arbitrary length, and flattens all results into a single character vector.

In [99]:
flat_map_chr <- function(.x, .f, ...) {
  # purrr::map() returns a list with one character vector per element of .x;
  # purrr::flatten_chr() then flattens that list into a single character vector
  purrr::flatten_chr(purrr::map(.x, .f, ...))
}

flat_map_chr(letters[1:3], ~ rep(., sample(3, 1)))
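
The example above is random, so its output changes between runs. The sketch below shows the same flattening idea deterministically using only base R; `flat_map_chr_base()` is a hypothetical stand-in, not part of purrr:

```r
flat_map_chr_base <- function(.x, .f, ...) {
  # Apply .f to each element, then flatten the results into one vector.
  out <- unlist(lapply(.x, .f, ...), use.names = FALSE)
  # unlist() returns NULL for an empty list, so normalise that case.
  if (is.null(out)) character() else out
}

flat_map_chr_base(c("a", "b"), function(s) rep(s, 2))
# "a" "a" "b" "b"
flat_map_chr_base(list(), identity)
# character(0)
```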

The recursive case for pairlists is simple: we iterate over every element of the pairlist (i.e. each function argument) and combine the results. The case for calls is a little more complex. If this is a call to `<-`, we should return the second element of the call:

In [101]:
find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),
    
    # Recursive cases
    pairlist = flat_map_chr(as.list(x), find_assign_rec),
    call = {
      if (is_call(x, "<-")) {
        as_string(x[[2]])
      } else {
        flat_map_chr(as.list(x), find_assign_rec)
      }
    }
  )
}

In [103]:
find_assign(a <- 1)
#> [1] "a"
find_assign({
  a <- 1
  {
    b <- 2
  }
})
#> [1] "a" "b"

Now we need to make our function more robust by coming up with examples intended to break it. What happens when we assign to the same variable multiple times?

In [104]:
find_assign({
  a <- 1
  a <- 2
})
#> [1] "a" "a"

It’s easiest to fix this at the level of the wrapper function:

In [107]:
find_assign <- function(x) unique(find_assign_rec(enexpr(x)))

find_assign({
  a <- 1
  a <- 2
})
#> [1] "a"

What happens if we have nested calls to `<-`? Currently we only return the first. That’s because when `<-` occurs we immediately terminate recursion.

In [110]:
find_assign({
  a <- b <- c <- 1
})
#> [1] "a"
ast(a <- b <- c <- 1)


█─`<-` 
├─a 
└─█─`<-` 
  ├─b 
  └─█─`<-` 
    ├─c 
    └─1 

Instead we need to take a more rigorous approach. I think it’s best to keep the recursive function focused on the tree structure, so I’m going to extract out `find_assign_call()` into a separate function.

In [111]:
find_assign_call <- function(x) {
  if (is_call(x, "<-") && is_symbol(x[[2]])) {
    lhs <- as_string(x[[2]])
    children <- as.list(x)[-1]
  } else {
    lhs <- character()
    children <- as.list(x)
  }
  
  c(lhs, flat_map_chr(children, find_assign_rec))
}

find_assign_rec <- function(x) {
  switch_expr(x,
    # Base cases
    constant = ,
    symbol = character(),
    
    # Recursive cases
    pairlist = flat_map_chr(x, find_assign_rec),
    call = find_assign_call(x)
  )
}

find_assign(a <- b <- c <- 1)
#> [1] "a" "b" "c"
find_assign(system.time(x <- print(y <- 5)))
#> [1] "x" "y"
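
A few properties of assignment calls explain the checks in `find_assign_call()` and hint at its limits. The sketch below inspects them with base R only; the variable names are illustrative:

```r
# The parser rewrites `->` as `<-`, so right assignment is covered.
identical(quote(1 -> a), quote(a <- 1))  # TRUE

# The left-hand side is not always a symbol, which is why the
# function tests is_symbol(x[[2]]) before converting it to a string.
lhs <- quote(names(x) <- "a")[[2]]
is.symbol(lhs)  # FALSE
is.call(lhs)    # TRUE

# Assignment with `=` is a call to `=`, not `<-`, so a test for
# "<-" alone would not report it.
eq <- quote({ a = 1 })[[2]]
as.character(eq[[1]])  # "="
```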

### Pairlists

Pairlists are a remnant of R’s past and have been replaced by lists almost everywhere. The only place you are likely to see pairlists in R is when working with calls to the `function` function, as the formal arguments to a function are stored in a pairlist:

In [112]:
f <- expr(function(x, y = 10) x + y)

args <- f[[2]]
args
#> $x
#> 
#> 
#> $y
#> [1] 10
typeof(args)
#> [1] "pairlist"

$x


$y
[1] 10


### Missing arguments

There’s one special symbol that needs a little extra discussion: the empty symbol, which is used to represent missing arguments (not missing values!). You only need to care about the missing symbol if you’re programmatically creating functions with missing arguments. You can make an empty symbol with `missing_arg()` (or `expr()`):

In [113]:
missing_arg()
typeof(missing_arg())
#> [1] "symbol"

An empty symbol doesn’t print anything, so check if you have one with `rlang::is_missing()`:

In [114]:
is_missing(missing_arg())
#> [1] TRUE

And you’ll find them in the wild in function formals:

In [116]:
f <- expr(function(x, y = 10) x + y)
args <- f[[2]]
is_missing(args[[1]])
#> [1] TRUE

The empty symbol has a peculiar property: if you bind it to a variable, then access that variable, you will get an error:

In [117]:
m <- missing_arg()
m
#> Error in eval(expr, envir, enclos):
#>   argument "m" is missing, with no default

ERROR: Error in eval(expr, envir, enclos): argument "m" is missing, with no default


This is the same error you get when referring to a missing argument inside a function, and indeed this is the magic that powers missing arguments.
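
One situation where missing arguments matter is building a function programmatically. As a sketch using base R only, `alist()` accepts an empty argument, and `as.function()` treats the last element as the body and the rest as formals:

```r
# `x = ` is a missing argument, i.e. a formal with no default.
f <- as.function(alist(x = , y = 10, x + y))

names(formals(f))  # "x" "y"
f(1)               # 11
f(1, 2)            # 3
```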

### Expression vectors

Finally, we need to briefly discuss the expression vector. Expression vectors are produced by only two base functions: `expression()` and `parse()`:

In [118]:
exp1 <- parse(text = c("
x <- 4
x
"))
exp2 <- expression(x <- 4, x)

typeof(exp1)
#> [1] "expression"
typeof(exp2)
#> [1] "expression"

exp1
#> expression(x <- 4, x)
exp2
#> expression(x <- 4, x)

expression(x <- 4, x)

expression(x <- 4, x)

Like calls and pairlists, expression vectors behave like lists:

In [120]:
length(exp1)
#> [1] 2
exp1[[1]]
#> x <- 4

x <- 4
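
An expression vector can also be passed to `eval()` as a whole, which evaluates each element in order and returns the last value. A small base R sketch, using a separate environment (`env`, an illustrative name) to hold the result:

```r
exprs <- expression(x <- 4, x * 2)

# Evaluate all elements in order inside a fresh environment.
env <- new.env()
eval(exprs, env)  # 8
env$x             # 4

# Individual elements are ordinary calls.
is.call(exprs[[1]])  # TRUE
```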

In [136]:
# Greg's example using quosures
check_using_quo_get_expr <- function(data, test) {
  q_test <- enquo(test)
  eval(quo_get_expr(q_test), data)
  # Alternatively, use tidy evaluation and pass the quosure in directly:
  # eval_tidy(q_test, data)
}
check_using_quo_get_expr(list(left = 1, right = 2), left != right)
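
The commented-out alternative can be sketched as its own function. Passing the quosure to `eval_tidy()` keeps the caller's environment, so the test can mix names from `data` with ordinary variables. This assumes rlang is installed; `check_tidy()` and `threshold` are hypothetical names:

```r
library(rlang)

check_tidy <- function(data, test) {
  # enquo() captures the expression together with its environment.
  eval_tidy(enquo(test), data)
}

threshold <- 1
check_tidy(list(left = 1, right = 2), left != right)      # TRUE
check_tidy(list(left = 1, right = 2), right > threshold)  # TRUE
check_tidy(list(left = 1, right = 2), left == right)      # FALSE
```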