Permalink
Fetching contributors…
Cannot retrieve contributors at this time
237 lines (171 sloc) 8.1 KB
---
title: "Portable and non-portable R6 classes"
---
```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
One limitation to R's reference classes is that class inheritance across package namespaces is limited. R6 avoids this problem when the `portable` option is enabled.
## The problem
Here is an example of the cross-package inheritance problem with reference classes: Suppose you have ClassA in pkgA, and ClassB in pkgB, which inherits from ClassA. ClassA has a method `foo` which calls a non-exported function `fun` in pkgA.
If ClassB inherits `foo`, it will try to call `fun` -- but since ClassB objects are created in pkgB namespace (which is an environment) instead of the pkgA namespace, it won't be able to find `fun`.
Something similar happens with R6 when the `portable=FALSE` option is used. For example:
```{r}
library(R6)
# Simulate packages by creating environments
pkgA <- new.env()
pkgB <- new.env()
# Create a function in pkgA but not pkgB
pkgA$fun <- function() 10
ClassA <- R6Class("ClassA",
portable = FALSE,
public = list(
foo = function() fun()
),
parent_env = pkgA
)
# ClassB inherits from ClassA
ClassB <- R6Class("ClassB",
portable = FALSE,
inherit = ClassA,
parent_env = pkgB
)
```
When we create an instance of ClassA, it works as expected:
```{r}
a <- ClassA$new()
a$foo()
```
But with ClassB, it can't find the `foo` function:
```{r eval=FALSE}
b <- ClassB$new()
b$foo()
#> Error in b$foo() : could not find function "fun"
```
## Portable R6
R6 supports inheritance across different packages, with the default `portable=TRUE` option. In this example, we'll again simulate different packages by creating separate parent environments for the classes.
```{r}
pkgA <- new.env()
pkgB <- new.env()
pkgA$fun <- function() {
"This function `fun` in pkgA"
}
ClassA <- R6Class("ClassA",
portable = TRUE, # The default
public = list(
foo = function() fun()
),
parent_env = pkgA
)
ClassB <- R6Class("ClassB",
portable = TRUE,
inherit = ClassA,
parent_env = pkgB
)
a <- ClassA$new()
a$foo()
b <- ClassB$new()
b$foo()
```
When a method is inherited from a superclass, that method also gets that class's environment. In other words, method "runs in" the superclass's environment. This makes it possible for inheritance to work across packages.
When a method is defined in the subclass, that method gets the subclass's environment. For example, here ClassC is a subclass of ClassA, and defines its own `foo` method which overrides the `foo` method from ClassA. It happens that the method looks the same as ClassA's -- it just calls `fun`. But this time it finds `pkgC$fun` instead of `pkgA$fun`. This is in contrast to ClassB, which inherited the `foo` method and environment from ClassA.
```{r}
pkgC <- new.env()
pkgC$fun <- function() {
"This function `fun` in pkgC"
}
ClassC <- R6Class("ClassC",
portable = TRUE,
inherit = ClassA,
public = list(
foo = function() fun()
),
parent_env = pkgC
)
cc <- ClassC$new()
# This method is defined in ClassC, so finds pkgC$fun
cc$foo()
```
## Using `self`
One important difference between non-portable and portable classes is that with non-portable classes, it's possible to access members with just the name of the member, and with portable classes, member access always requires using `self$` or `private$`. This is a consequence of the inheritance implementation.
Here's an example of a non-portable class with two methods: `sety`, which sets the private field `y` using the `<<-` operator, and `getxy`, which returns a vector with the values of fields `x` and `y`:
```{r}
NP <- R6Class("NP",
portable = FALSE,
public = list(
x = 1,
getxy = function() c(x, y),
sety = function(value) y <<- value
),
private = list(
y = NA
)
)
np <- NP$new()
np$sety(20)
np$getxy()
```
If we attempt the same with a portable class, it results in an error:
```{r eval=FALSE}
P <- R6Class("P",
portable = TRUE,
public = list(
x = 1,
getxy = function() c(x, y),
sety = function(value) y <<- value
),
private = list(
y = NA
)
)
p <- P$new()
# No error, but instead of setting private$y, this sets y in the global
# environment! This is because of the sematics of <<-.
p$sety(20)
y
#> [1] 20
p$getxy()
#> Error in p$getxy() : object 'y' not found
```
To make this work with a portable class, we need to use `self$x` and `private$y`:
```{r}
P2 <- R6Class("P2",
portable = TRUE,
public = list(
x = 1,
getxy = function() c(self$x, private$y),
sety = function(value) private$y <- value
),
private = list(
y = NA
)
)
p2 <- P2$new()
p2$sety(20)
p2$getxy()
```
There is a small performance penalty for using `self$x` as opposed to `x`. In most cases, this is negligible, but it can be noticeable in some situations where there are tens of thousands or more accesses per second. For more information, see the Performance vignette.
## Potential pitfalls with cross-package inheritance
Inheritance happens when an object is instantiated with `MyClass$new()`. At that time, members from the superclass get copied to the new object. This means that when you instantiate R6 object, it will essentially save some pieces of the superclass in the object.
Because of the way that packages are built in R, R6's inheritance behavior could potentially lead to surprising, hard-to-diagnose problems when packages change versions.
Suppose you have two packages, pkgA, containing `ClassA`, and pkgB, containing `ClassB`, and there is code in pkgB that instantiates `ClassB` in an object, `objB`, at build time. This is in contrast to instantiating `ClassB` at run-time, by calling a function. All of the code in the package is run when a *binary* package is built, and the resulting objects are saved in the package. (Generally, if the object can be accessed with `pkgB:::objB`, this means it was created at build time.)
When `objB` is created at package build time, pieces from the superclass, `pkgA::ClassA`, are saved inside of it. This is fine in and of itself. But imagine that pkgB was built and installed against pkgA 1.0, and then you upgrade to pkgA 2.0 without subsequently building and installing pkgB. Then `pkgB::objB` will contain some code from `pkgA::ClassA` 1.0, but the version of `pkgA::ClassA` that's installed will be 2.0. This can cause problems if `objB` inherited code which uses parts of `pkgA` that have changed -- but the problems may not be entirely obvious.
This scenario is entirely possible when installing packages from CRAN. It is very common for a package to be upgraded without upgrading all of its downstream dependencies. As far as I know, R does not have any mechanism to force downstream dependencies to be rebuilt when a package is upgraded on a user's computer.
If this problem happens, the remedy is to rebuild pkgB against pkgA 2.0. I don't know if CRAN rebuilds all downstream dependencies when a package is updated. If it doesn't, then it's possible for CRAN to have incompatible binary builds of pkgA and pkgB, and users would then have to install pkgB from source, with `install.packages("pkgB", type = "source")`.
To avoid this problem entirely, objects of `ClassB` must not be instantiated at build time. You can either instantiate them only in functions, or at package load time, by adding an `.onLoad` function to your package. For example:
```{r eval=FALSE}
ClassB <- R6Class("ClassB",
inherit = pkgA::ClassA,
public = list(x = 1)
)
# We'll fill this at load time
objB <- NULL
.onLoad <- function(libname, pkgname) {
# The namespace is locked after loading; we can still modify objB at this time.
objB <<- ClassB$new()
}
```
You might be wondering why `ClassB` (the class, not the instance of the class `objB`) doesn't save a copy of `pkgA::ClassA` inside of it when the package is built. This is because, for the `inherit` argument, `R6Class` saves the unevaluated expression, (`pkgA::ClassA`), and evaluates it when `$new()` is called.
## Wrap-up
In summary:
* Portable classes allow inheritance across different packages.
* Portable classes always require the use of `self` or `private` to access members. This can incur a small performance penalty, since using `self$x` is slower than just `x`.