---
title: "R6 and Reference class performance tests"
output:
  html_document:
    theme: null
    css: mystyle.css
    toc: yes
    fig_retina: false
---

<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Reference and R6 class performance tests}
-->

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 3.9, fig.height = 3.5)
library(microbenchmark)
# Only print 3 significant digits
print_microbenchmark <- function (x, unit, order = NULL, ...) {
  s <- summary(x, unit = unit)
  cat("Unit: ", attr(s, "unit"), "\n", sep = "")

  timing_cols <- c("min", "lq", "median", "uq", "max")
  s[timing_cols] <- lapply(s[timing_cols], signif, digits = 3)
  s[timing_cols] <- lapply(s[timing_cols], format, big.mark = ",")

  print(s, ..., row.names = FALSE)
}
assignInNamespace("print.microbenchmark", print_microbenchmark,
  "microbenchmark")
```

This document compares the memory costs and speed of R's reference classes against R6 classes and simple environments. For most uses, R6 and reference classes have comparable features, but as we'll see, R6 classes are faster and lighter weight.

This document tests reference classes against R6 classes (in many variations), as well as against very simple reference objects: environments created by closures.

*****

First we'll load some packages which will be used below:

```{r}
library(microbenchmark)
options(microbenchmark.unit = "us")
library(pryr)  # For object_size function
library(R6)
```


```{r echo=FALSE}
library(ggplot2)
library(scales)

# Set up ggplot2 theme
my_theme <- theme_bw(base_size = 10) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank()
    )
```

*****

Class definitions
=================

We'll start by defining a number of classes or class-like entities, using reference classes, R6 classes, and simple environments that are created directly by functions. These classes will be used for the speed and memory tests that follow. This is a lot of boring code, so you may want to skip ahead to the results.

All of these classes have the same basic characteristics:

* A field named `x` that contains a number.
* A way of initializing the value of `x`.
* A method named `getx` for retrieving the value of `x`.
* A method named `inc` for incrementing the value of `x`.

The fields and methods are accessed with the `$` operator, so if we have an object named `obj`, we could use `obj$x` or `obj$getx()`.

## R reference class

```{r}
RC <- setRefClass("RC", 
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <- x,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)
```

In reference classes, the binding that points back to the object is named `.self`. Within a method, assignment can be done by using `.self`, as in `.self$x <- 10`, or by using `<<-`, as in `x <<- 10`.
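
As a minimal sketch of the two assignment styles, the chunk below defines a hypothetical class `RCAssignDemo` (it is only for illustration and is not used in the benchmarks) with one setter of each kind. Both setters modify the same field.

```{r}
library(methods)

# Hypothetical class for illustration only; not part of the benchmarks.
RCAssignDemo <- setRefClass("RCAssignDemo",
  fields = list(x = "numeric"),
  methods = list(
    set_with_self  = function(value) .self$x <- value,  # assign via .self
    set_with_arrow = function(value) x <<- value        # assign via <<-
  )
)

assign_demo <- RCAssignDemo$new(x = 1)

assign_demo$set_with_self(10)
assign_demo$x

assign_demo$set_with_arrow(20)
assign_demo$x
```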

To create an object, simply call `$new()` on the class:

```{r}
RC$new()
```

## R6 class

Creating an R6 class is similar to the reference class, except that there's no need to separate the fields and methods, and you can't specify the types of the fields.

```{r}
R6 <- R6Class("R6",
  public = list(
    x = NULL,
    initialize = function(x = 1) self$x <- x,
    getx = function() self$x,
    inc = function(n = 1) self$x <- self$x + n
  )
)
```

Whereas reference classes use `.self`, R6 classes use `self` (without the leading period). As with reference classes, objects are instantiated by calling `$new()`:

```{r}
R6$new()
```

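An R6 object is essentially just an environment. The methods for an R6 object have bindings (that is, they have names) in that environment. For non-portable classes, the methods also have that environment as their enclosing environment (they "run in" that environment); for portable classes, they run in a separate environment that contains `self`.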
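
The environment nature of R6 objects can be checked directly. The sketch below uses a hypothetical class `EnvDemo` (for illustration only) and shows that copying the object with `<-` creates a second name for the same environment rather than a new object.

```{r}
library(R6)

# Hypothetical class for illustration only; not part of the benchmarks.
EnvDemo <- R6Class("EnvDemo",
  public = list(
    x = 1,
    getx = function() self$x
  )
)

env_demo <- EnvDemo$new()
is.environment(env_demo)

# Members are bindings in the environment
ls(env_demo)

# Reference semantics: `env_alias` points to the same environment
env_alias <- env_demo
env_alias$x <- 5
env_demo$getx()
```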

## R6 class, without class attribute

By default, a class attribute is added to R6 objects. This attribute adds a slight performance penalty because R will attempt to use S3 dispatch when using `$` on the object.

It's possible to generate objects without the class attribute, by using `class=FALSE`:

```{r}
R6NoClass <- R6Class("R6NoClass",
  class = FALSE,
  public = list(
    x = NULL,
    initialize = function(x = 1) self$x <- x,
    getx = function() self$x,
    inc = function(n = 1) self$x <- self$x + n
  )
)
```

Note that without the class attribute, S3 method dispatch on the objects is not possible.
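
To make the consequence concrete, here is a small sketch with two hypothetical classes and a hypothetical S3 generic named `describe` (none of these are part of the benchmarks). The object with a class attribute reaches its own method, while the object created with `class=FALSE` falls through to the default method.

```{r}
library(R6)

# Hypothetical classes and generic, for illustration only.
WithClassDemo <- R6Class("WithClassDemo", public = list(x = 1))
NoClassDemo   <- R6Class("NoClassDemo", class = FALSE, public = list(x = 1))

describe <- function(obj) UseMethod("describe")
describe.WithClassDemo <- function(obj) "dispatched on WithClassDemo"
describe.default       <- function(obj) "default method"

describe(WithClassDemo$new())
describe(NoClassDemo$new())

inherits(NoClassDemo$new(), "NoClassDemo")
```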

## R6 class, non-portable

By default, R6 objects are *portable*. This means that inheritance works across classes that are defined in different packages. However, it also requires the use of `self$` and `private$` to access members, and this incurs a small performance penalty.

If `portable=FALSE` is used, members can be accessed without using `self$`, and assignment can be done with `<<-`:

```{r}
R6NonPortable <- R6Class("R6NonPortable",
  portable = FALSE,
  public = list(
    x = NULL,
    initialize = function(value = 1) x <<- value,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)
```


## R6 class, without class attribute, and non-portable

For comparison, we'll use an R6 class without a class attribute, and which is non-portable.

```{r}
R6NoClassNonPortable <- R6Class("R6NoClassNonPortable",
  portable = FALSE,
  class = FALSE,
  public = list(
    x = NULL,
    initialize = function(value = 1) x <<- value,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)
```


## R6 class, with public and private members

This is a variant of the default R6 class (portable, with a class attribute), but this version has both public and private members.

```{r}
R6Private <- R6Class("R6Private",
  private = list(x = NULL),
  public = list(
    initialize = function(x = 1) private$x <- x,
    getx = function() private$x,
    inc = function(n = 1) private$x <- private$x + n
  )
)
```

Instead of a single `self` object which refers to all items in an object, these objects have `self` (which refers to the public items) and `private`.

```{r}
R6Private$new()
```
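
The split between public and private members can be seen from outside the object. The sketch below uses a hypothetical class `PrivateDemo` (for illustration only): the private field is not a binding in the public environment, so `$` returns `NULL` for it, while a public method can still reach it through `private`.

```{r}
library(R6)

# Hypothetical class for illustration only; not part of the benchmarks.
PrivateDemo <- R6Class("PrivateDemo",
  private = list(x = 1),
  public = list(
    getx = function() private$x
  )
)

private_demo <- PrivateDemo$new()

# `x` is not visible among the public members
"x" %in% ls(private_demo)
private_demo$x

# ...but the public method can read it
private_demo$getx()
```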


## R6 class, with public and private members, no class attribute, and non-portable

For comparison, we'll add a version without a class attribute, and which is non-portable.

```{r}
R6PrivateNcNp <- R6Class("R6PrivateNcNp",
  portable = FALSE,
  class = FALSE,
  private = list(x = NULL),
  public = list(
    initialize = function(x = 1) private$x <- x,
    getx = function() x,
    inc = function(n = 1) x <<- x + n
  )
)
```


## Environment created by a closure, with class attribute

In R, environments are passed by reference. A simple way to create an object that's passed by reference is to use the environment created by the invocation of a function. The function below captures that environment, attaches a class to it, and returns it:

```{r}
ClosureEnvClass <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  getx <- function() x
  self <- environment()
  class(self) <- "ClosureEnvClass"
  self
}
```

Even though `x` isn't declared in the function body, it gets captured because it's an argument to the function.

```{r}
ls(ClosureEnvClass())
```

Objects created this way are very similar to those created by the `R6` generator we created above.
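
As a quick check of the reference behavior, the sketch below uses a hypothetical constructor `make_counter` (a stripped-down version of the same pattern, for illustration only). Two names bound to the same environment see the same state.

```{r}
# Hypothetical constructor for illustration only.
make_counter <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n   # modifies `x` in the captured environment
  getx <- function() x
  environment()
}

counter_a <- make_counter()
counter_b <- counter_a   # same environment, not a copy

counter_b$inc()
counter_a$getx()

# A separate call creates an independent environment
counter_c <- make_counter()
counter_c$getx()
```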


## Environment created by a closure, without class attribute

We can make an even simpler type of reference object than the previous one, by omitting the class attribute and the `self` object:

```{r}
ClosureEnvNoClass <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  getx <- function() x
  environment()
}
```

This is simply an environment with some objects in it.

```{r}
ls(ClosureEnvNoClass())
```

*****

Tests
=====

For all the timings using `microbenchmark()`, the results are reported in microseconds, and the most useful value is probably the median column.

## Memory footprint

```{r echo = FALSE}
# Utility functions for calculating sizes
obj_size <- function(expr, .env = parent.frame()) {
  size_n <- function(n = 1) {
    objs <- lapply(1:n, function(x) eval(expr, .env))
    as.numeric(do.call(object_size, objs))
  }

  data.frame(one = size_n(1), incremental = size_n(2) - size_n(1))
}

obj_sizes <- function(..., .env = parent.frame()) {
  exprs <- as.list(match.call(expand.dots = FALSE)$...)
  names(exprs) <- lapply(1:length(exprs),
    FUN = function(n) {
      name <- names(exprs)[n]
      if (is.null(name) || name == "") paste(deparse(exprs[[n]]), collapse = " ")
      else name
    })

  sizes <- mapply(obj_size, exprs, MoreArgs = list(.env = .env), SIMPLIFY = FALSE)
  do.call(rbind, sizes)
}
```


How much memory does a single instance of each object take, and how much memory does each additional object take? We'll use the functions `obj_size` and `obj_sizes` (shown at the bottom of this document) to calculate the sizes.

Sizes of each type of object, in bytes:

```{r}
sizes <- obj_sizes(
  RC$new(),
  R6$new(),
  R6NoClass$new(),
  R6NonPortable$new(),
  R6NoClassNonPortable$new(),
  R6Private$new(),
  R6PrivateNcNp$new(),
  ClosureEnvClass(),
  ClosureEnvNoClass()
)
sizes
```

The results are plotted below. Note that the plots have very different x scales.

```{r echo = FALSE, results = 'hold'}
objnames <- c("RC", "R6", "R6NoClass", "R6NonPortable",
              "R6NoClassNonPortable", "R6Private", "R6PrivateNcNp",
              "ClosureEnvClass", "ClosureEnvNoClass")

sizes$name <- factor(objnames, levels = rev(objnames))

ggplot(sizes, aes(y = name, x = one)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  geom_point(size = 2) +
  scale_x_continuous(limits = c(0, max(sizes$one[-1]) * 1.5),
                     expand = c(0, 0), oob = rescale_none) +
  scale_y_discrete(
    breaks = sizes$name,
    labels = c("RC (off chart)", "R6", "R6NoClass", "R6NonPortable",
               "R6NoClassNonPortable", "R6Private", "R6PrivateNcNp",
               "ClosureEnvClass", "ClosureEnvNoClass")) +
  my_theme +
  ggtitle("First object")

ggplot(sizes, aes(y = name, x = incremental)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  scale_x_continuous(limits = c(0, max(sizes$incremental) * 1.05),
                     expand = c(0, 0)) +
  geom_point(size = 2) +
  my_theme +
  ggtitle("Additional objects")
```


It looks like using a reference class takes up a huge amount of memory, but much of that is shared between reference classes. Adding an object from a different reference class doesn't require much more memory --- around 38KB:

```{r}
RC2 <- setRefClass("RC2",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 2) .self$x <- x,
    inc = function(n = 2) x <<- x * n
  )
)

# Calculate the size of a new RC2 object, over and above an RC object
as.numeric(object_size(RC$new(), RC2$new()) - object_size(RC$new()))
```

## Object instantiation speed

How much time does it take to create one of these objects? This shows the median time, in microseconds:

```{r}
# Function to extract the medians from microbenchmark results
mb_summary <- function(x) {
  res <- summary(x, unit="us")
  data.frame(name = res$expr, median = res$median)
}

speed <- microbenchmark(
  RC$new(),
  R6$new(),
  R6NoClass$new(),
  R6NonPortable$new(),
  R6NoClassNonPortable$new(),
  R6Private$new(),
  R6PrivateNcNp$new(),
  ClosureEnvClass(),
  ClosureEnvNoClass()
)
speed <- mb_summary(speed)
speed
```

The plot below shows the median instantiation time.

```{r echo = FALSE, results = 'hold', fig.width = 8}
speed$name <- factor(speed$name, rev(levels(speed$name)))

p <- ggplot(speed, aes(y = name, x = median)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  geom_point(size = 2) +
  scale_x_continuous(limits = c(0, max(speed$median) * 1.05), expand = c(0, 0)) +
  my_theme +
  ggtitle("Median time to instantiate object (\u0b5s)")

p
```

R reference classes are much slower to instantiate than the other types of classes. Instantiating R6 objects is roughly 10 times faster. Creating an environment with a simple closure is another 10-20 times faster still.


## Field access speed

How much time does it take to access a field in an object? First we'll make some objects:

```{r}
rc           <- RC$new()
r6           <- R6$new()
r6nc         <- R6NoClass$new()
r6np         <- R6NonPortable$new()
r6nc_np      <- R6NoClassNonPortable$new()
r6priv       <- R6Private$new()
r6priv_nc_np <- R6PrivateNcNp$new()
closure      <- ClosureEnvClass()
closure_nc   <- ClosureEnvNoClass()
```

And then get a value from these objects:

```{r}
speed <- microbenchmark(
  rc$x,
  r6$x,
  r6nc$x,
  r6np$x,
  r6nc_np$x,
  r6priv$x,
  r6priv_nc_np$x,
  closure$x,
  closure_nc$x
)
speed <- mb_summary(speed)
speed
```


```{r echo = FALSE, results = 'hold', fig.width = 8}
speed$name <- factor(speed$name, rev(levels(speed$name)))

p <- ggplot(speed, aes(y = name, x = median)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  geom_point(size = 2) +
  scale_x_continuous(limits = c(0, max(speed$median) * 1.05), expand = c(0, 0)) +
  my_theme +
  ggtitle("Median time to access field (\u0b5s)")

p
```

Accessing the field of a reference class is much slower than the other methods. 

There's also an obvious pattern where accessing the field of an environment (created by R6 or a closure) is slower when there is a class attribute. This is because, for the objects that have a class attribute, R attempts to look up an S3 method for `$`, and this lookup has a performance penalty, as we'll see below.


## Field setting speed

How much time does it take to set the value of a field in an object?

```{r}
speed <- microbenchmark(
  rc$x <- 4,
  r6$x <- 4,
  r6nc$x <- 4,
  r6np$x <- 4,
  r6nc_np$x <- 4,
  # r6priv$x <- 4,         # Can't set private field directly,
  # r6priv_nc_np$x <- 4,   # so we'll skip these two
  closure$x <- 4,
  closure_nc$x <- 4
)
speed <- mb_summary(speed)
speed
```

```{r echo = FALSE, results = 'hold', fig.width = 8}
speed$name <- factor(speed$name, rev(levels(speed$name)))

p <- ggplot(speed, aes(y = name, x = median)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  geom_point(size = 2) +
  scale_x_continuous(limits = c(0, max(speed$median) * 1.05), expand = c(0, 0)) +
  my_theme +
  ggtitle("Median time to set field (\u0b5s)")

p
```

Reference classes are significantly slower than the others, again. In this case, there's additional overhead due to type-checking the value.

Once more, the no-class objects are significantly faster than the others, again probably due to attempted S3 dispatch on the `` `$<-` `` function.
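
The type check on reference class fields can be observed directly. In this sketch, `TypedDemo` is a hypothetical class (for illustration only) with a `numeric` field; assigning a character value is rejected with an error, and the field keeps its old value.

```{r}
library(methods)

# Hypothetical class for illustration only; not part of the benchmarks.
TypedDemo <- setRefClass("TypedDemo", fields = list(x = "numeric"))
typed_demo <- TypedDemo$new(x = 1)

# A numeric value is accepted
typed_demo$x <- 4

# A character value fails the type check
assign_result <- tryCatch({
  typed_demo$x <- "four"
  "assigned"
}, error = function(e) "error")

assign_result
typed_demo$x
```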

## Speed of method call that accesses a field

How much overhead is there when calling a method from one of these objects? All of these `getx()` methods simply return the value of `x` in the object. When necessary, this method uses `self$x` (for R6 classes, when `portable=TRUE`), and in others, it just uses `x` (when `portable=FALSE`, and in reference classes).

```{r}
speed <- microbenchmark(
  rc$getx(),
  r6$getx(),
  r6nc$getx(),
  r6np$getx(),
  r6nc_np$getx(),
  r6priv$getx(),
  r6priv_nc_np$getx(),
  closure$getx(),
  closure_nc$getx()
)
speed <- mb_summary(speed)
speed
```

```{r echo = FALSE, results = 'hold', fig.width = 8}
speed$name <- factor(speed$name, rev(levels(speed$name)))

p <- ggplot(speed, aes(y = name, x = median)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  geom_point(size = 2) +
  my_theme +
  ggtitle("Median time to call method that accesses field (\u0b5s)")

p
```

The reference class is the slowest.

`r6` is also somewhat slower than the others. There are two reasons for this: first, it uses `self$x` which adds some time, and second, it has a class attribute, which slows down the access of both `r6$getx` and `self$x`.

One might expect `r6priv` to be the same speed as `r6`, but it is faster. Although accessing `r6priv$getx` is slow, because `r6priv` has a class attribute, accessing `private$x` is faster, because it does not have a class attribute.

The objects which can access `x` directly (without `self` or `private`) and which lack a class attribute are the fastest.

## Assignment using `self$x <-` vs. `x <<-`

With reference class objects, you can modify fields using the `<<-` operator, or by using the `.self` object. For example, compare the `setx()` methods of these two classes:

```{r}
RCself <- setRefClass("RCself",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function() .self$x <- 1,
    setx = function(n = 2) .self$x <- n
  )
)

RCnoself <- setRefClass("RCnoself",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function() x <<- 1,
    setx = function(n = 2) x <<- n
  )
)
```

Non-portable R6 classes are similar, except they use `self` instead of `.self`:

```{r}
R6self <- R6Class("R6self",
  portable = FALSE,
  public = list(
    x = 1,
    setx = function(n = 2) self$x <- n
  )
)

R6noself <- R6Class("R6noself",
  portable = FALSE,
  public = list(
    x = 1,
    setx = function(n = 2) x <<- n
  )
)
```


```{r}
rc_self   <- RCself$new()
rc_noself <- RCnoself$new()
r6_self   <- R6self$new()
r6_noself <- R6noself$new()

speed <- microbenchmark(
  rc_self$setx(),
  rc_noself$setx(),
  r6_self$setx(),
  r6_noself$setx()
)
speed <- mb_summary(speed)
speed
```

```{r echo = FALSE, results = 'hold', fig.width = 8}
speed$name <- factor(speed$name, rev(levels(speed$name)))

p <- ggplot(speed, aes(y = name, x = median)) +
  geom_segment(aes(yend = name), xend = 0, colour = "gray80") +
  geom_point(size = 2) +
  my_theme +
  ggtitle("Assignment to a field using self vs <<- (\u0b5s)")

p
```

For both reference classes and non-portable R6 classes, assignment through the object (`.self$x <-` for reference classes, `self$x <-` for R6) is somewhat slower than using `x <<-`.

Bear in mind that, by default, R6 classes are portable, and can't use assignment with `x <<-`.
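
The sketch below illustrates why, using a hypothetical portable class `PortableDemo` defined inside `local()` so that nothing leaks into the global environment. It assumes the default behavior of `R6Class()`, where methods of a portable class look up free variables in the environment in which the class was defined. There, `x <<- n` modifies an ordinary variable named `x` outside the object, and the field is left untouched.

```{r}
library(R6)

portable_demo <- local({
  x <- 0  # an ordinary variable where the class is defined, not a field

  # Hypothetical class for illustration only; portable by default.
  PortableDemo <- R6Class("PortableDemo",
    public = list(
      x = 1,
      set_wrong = function(n = 2) x <<- n,      # does not reach the field
      set_right = function(n = 2) self$x <- n   # modifies the field
    )
  )

  obj <- PortableDemo$new()
  obj$set_wrong(5)
  field_after_wrong <- obj$x

  obj$set_right(5)
  list(
    outer_x = x,
    field_after_wrong = field_after_wrong,
    field_after_right = obj$x
  )
})

portable_demo
```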

## Overhead from using `$` on objects with a class attribute

There is some overhead when using `$` on an object that has a class attribute. In the test below, we'll create three different kinds of objects:

1. An environment with no class attribute.
1. An environment with a class `"e2"`, but without a `$.e2` S3 method.
1. An environment with a class `"e3"`, which has a `$.e3` S3 method that simply returns `NULL`.

Each one of these environments will contain an object `x`.

```{r}
e1 <- new.env(hash = FALSE, parent = emptyenv())
e2 <- new.env(hash = FALSE, parent = emptyenv())
e3 <- new.env(hash = FALSE, parent = emptyenv())

e1$x <- 1
e2$x <- 1
e3$x <- 1

class(e2) <- "e2"
class(e3) <- "e3"

# Define an S3 method for class e3
`$.e3` <- function(x, name) {
  NULL
}
```

Now we can run timing tests for calling `$` on each type of object. Note that for the `e3` object, the `$` function does nothing --- it simply returns `NULL`.

```{r}
speed <- microbenchmark(
  e1$x,
  e2$x,
  e3$x
)
speed <- mb_summary(speed)
speed
```

Using `$` on `e2` and `e3` is much slower than on `e1`. This is because `e2` and `e3` have a class attribute. Even though there's no `$` method defined for `e2`, doing `e2$x` is still about 6 times slower than `e1$x`, simply because R looks for an appropriate S3 method.

`e3$x` is slightly faster than `e2$x`; this is probably because the `$.e3` function doesn't actually do anything other than return NULL.

If an object has a class attribute, R will attempt to look for a method every time `$` is called. This can slow things down considerably, if `$` is used often.
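
The dispatch itself, as opposed to its cost, is easy to confirm. The sketch below uses hypothetical names (`plain_env`, `classed_env`, and the class `"dispatch_demo"`) and the same kind of do-nothing `$` method as above: with the method in place, `$` no longer returns the stored value, even though the value is still in the environment.

```{r}
# Hypothetical objects and class name, for illustration only.
plain_env <- new.env()
plain_env$x <- 1

classed_env <- new.env()
classed_env$x <- 1
class(classed_env) <- "dispatch_demo"

# S3 method for `$` that ignores its arguments
`$.dispatch_demo` <- function(x, name) NULL

plain_env$x                     # built-in `$`
classed_env$x                   # goes through `$.dispatch_demo`
get("x", envir = classed_env)   # the value is still stored
```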

## Lists vs. environments, and `$` vs. `[[`

Lists could also be used for creating classes. How much time does it take to access items using `$` for lists vs. environments? We'll also compare using `obj$x` to `obj[['x']]`.

```{r}
lst <- list(x = 10)
env <- new.env()
env$x <- 10

# Leave the expressions unnamed so that each row is labeled by its own code
mb_summary(microbenchmark(
  lst$x,
  env$x,
  lst[['x']],
  env[['x']]
))
```

Performance is comparable across environments and lists.

The `[[` operator is slightly faster than `$`, probably because it doesn't need to convert the unevaluated symbol to a string.
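
For reference, the two operators return the same value for an exact name. One practical difference is that `[[` takes the name as a string, so it can come from a variable. The object names in this sketch are hypothetical and separate from the ones benchmarked above.

```{r}
# Hypothetical objects for illustration only.
demo_list <- list(x = 10)
demo_env <- new.env()
demo_env$x <- 10

field_name <- "x"   # `[[` accepts a computed name

identical(demo_list$x, demo_list[[field_name]])
identical(demo_env$x, demo_env[[field_name]])
```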

*****

Wrap-up
=======

R6 objects take less memory and are significantly faster than R's reference class objects, and they also have some options that provide for even more speed.

In these tests, the biggest speedup for R6 classes comes from not using a class attribute; this speeds up the use of `$`. Non-portable R6 classes can also access fields without `$` at all, which provides another modest speed boost. In most cases, these speed increases are negligible -- they are on the order of microseconds and will be noticeable only when tens or even hundreds of thousands of class member accesses are performed.


*****

Appendix
========

## Functions for calculating object sizes

```{r eval=FALSE}
# Utility functions for calculating sizes
obj_size <- function(expr, .env = parent.frame()) {
  size_n <- function(n = 1) {
    objs <- lapply(1:n, function(x) eval(expr, .env))
    as.numeric(do.call(object_size, objs))
  }

  data.frame(one = size_n(1), incremental = size_n(2) - size_n(1))
}

obj_sizes <- function(..., .env = parent.frame()) {
  exprs <- as.list(match.call(expand.dots = FALSE)$...)
  names(exprs) <- lapply(1:length(exprs),
    FUN = function(n) {
      name <- names(exprs)[n]
      if (is.null(name) || name == "") paste(deparse(exprs[[n]]), collapse = " ")
      else name
    })

  sizes <- mapply(obj_size, exprs, MoreArgs = list(.env = .env), SIMPLIFY = FALSE)
  do.call(rbind, sizes)
}
```


## System information

```{r}
sessionInfo()
```