Skip to content
Permalink
master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
# S3 {#s3}
```{r, include = FALSE}
source("common.R")
```
## Introduction
\index{S3}
S3 is R's first and simplest OO system. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can't take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it's the most commonly used system in CRAN packages.
S3 is very flexible, which means it allows you to do things that are quite ill-advised. If you're coming from a strict environment like Java this will seem pretty frightening, but it gives R programmers a tremendous amount of freedom. It may be very difficult to prevent people from doing something you don't want them to do, but your users will never be held back because there is something you haven't implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will therefore teach you the conventions you should (almost) always follow.
The goal of this chapter is to show you how the S3 system works, not how to use it effectively to create new classes and generics. I'd recommend coupling the theoretical knowledge from this chapter with the practical knowledge encoded in the [vctrs package](https://vctrs.r-lib.org).
### Outline {-}
* Section \@ref(s3-basics) gives a rapid overview of all the main components
of S3: classes, generics, and methods. You'll also learn about
`sloop::s3_dispatch()`, which we'll use throughout the chapter to explore
how S3 works.
* Section \@ref(s3-classes) goes into the details of creating a new S3 class,
including the three functions that should accompany most classes:
a constructor, a helper, and a validator.
* Section \@ref(s3-methods) describes how S3 generics and methods work,
including the basics of method dispatch.
* Section \@ref(object-styles) discusses the four main styles of S3 objects:
vector, record, data frame, and scalar.
* Section \@ref(s3-inheritance) demonstrates how inheritance works in S3,
and shows you what you need to make a class "subclassable".
* Section \@ref(s3-dispatch) concludes the chapter with a discussion of the
finer details of method dispatch including base types, internal generics,
group generics, and double dispatch.
### Prerequisites {-}
S3 classes are implemented using attributes, so make sure you're familiar with the details described in Section \@ref(attributes). We'll use existing base S3 vectors for examples and exploration, so make sure that you're familiar with the factor, Date, difftime, POSIXct, and POSIXlt classes described in Section \@ref(s3-atomic-vectors).
We'll use the [sloop](https://sloop.r-lib.org) package for its interactive helpers.
```{r setup, messages = FALSE}
library(sloop)
```
## Basics {#s3-basics}
\index{attributes!class}
\index{classes!S3}
\indexc{class()}
An S3 object is a base type with at least a `class` attribute (other attributes may be used to store other data). For example, take the factor. Its base type is the integer vector, it has a `class` attribute of "factor", and a `levels` attribute that stores the possible levels:
```{r}
f <- factor(c("a", "b", "c"))
typeof(f)
attributes(f)
```
You can get the underlying base type by `unclass()`ing it, which strips the class attribute, causing it to lose its special behaviour:
```{r}
unclass(f)
```
\index{generics}
\index{functions!generic}
An S3 object behaves differently from its underlying base type whenever it's passed to a __generic__ (short for generic function). The easiest way to tell if a function is a generic is to use `sloop::ftype()` and look for "generic" in the output:
```{r}
ftype(print)
ftype(str)
ftype(unclass)
```
A generic function defines an interface, which uses a different implementation depending on the class of an argument (almost always the first argument). Many base R functions are generic, including the important `print()`:
```{r}
print(f)
# stripping class reverts to integer behaviour
print(unclass(f))
```
Beware that `str()` is generic, and some S3 classes use that generic to hide the internal details. For example, the `POSIXlt` class used to represent date-time data is actually built on top of a list, a fact which is hidden by its `str()` method:
```{r}
time <- strptime(c("2017-01-01", "2020-05-04 03:21"), "%Y-%m-%d")
str(time)
str(unclass(time))
```
The generic is a middleman: its job is to define the interface (i.e. the arguments) then find the right implementation for the job. The implementation for a specific class is called a __method__, and the generic finds that method by performing __method dispatch__.
You can use `sloop::s3_dispatch()` to see the process of method dispatch:
```{r}
s3_dispatch(print(f))
```
\index{S3!methods}
We'll come back to the details of dispatch in Section \@ref(method-dispatch), for now note that S3 methods are functions with a special naming scheme, `generic.class()`. For example, the `factor` method for the `print()` generic is called `print.factor()`. You should never call the method directly, but instead rely on the generic to find it for you.
Generally, you can identify a method by the presence of `.` in the function name, but there are a number of important functions in base R that were written before S3, and hence use `.` to join words. If you're unsure, check with `sloop::ftype()`:
```{r}
ftype(t.test)
ftype(t.data.frame)
```
\index{S3!finding source}
Unlike most functions, you can't see the source code for most S3 methods[^base-s3] just by typing their names. That's because S3 methods are not usually exported: they live only inside the package, and are not available from the global environment. Instead, you can use `sloop::s3_get_method()`, which will work regardless of where the method lives:
```{r, error = TRUE}
weighted.mean.Date
s3_get_method(weighted.mean.Date)
```
[^base-s3]: The exceptions are methods found in the base package, like `t.data.frame`, and methods that you've created.
### Exercises
1. Describe the difference between `t.test()` and `t.data.frame()`.
When is each function called?
1. Make a list of commonly used base R functions that contain `.` in their
name but are not S3 methods.
1. What does the `as.data.frame.data.frame()` method do? Why is
it confusing? How could you avoid this confusion in your own
code?
1. Describe the difference in behaviour in these two calls.
```{r}
set.seed(1014)
some_days <- as.Date("2017-01-31") + sample(10, 5)
mean(some_days)
mean(unclass(some_days))
```
1. What class of object does the following code return? What base type is it
built on? What attributes does it use?
```{r}
x <- ecdf(rpois(100, 10))
x
```
1. What class of object does the following code return? What base type is it
built on? What attributes does it use?
```{r}
x <- table(rpois(100, 5))
x
```
## Classes {#s3-classes}
\index{S3!classes}
\index{attributes!class}
\indexc{class()}
If you have done object-oriented programming in other languages, you may be surprised to learn that S3 has no formal definition of a class: to make an object an instance of a class, you simply set the __class attribute__. You can do that during creation with `structure()`, or after the fact with `class<-()`:
```{r}
# Create and assign class in one step
x <- structure(list(), class = "my_class")
# Create, then set class
x <- list()
class(x) <- "my_class"
```
You can determine the class of an S3 object with `class(x)`, and see if an object is an instance of a class using `inherits(x, "classname")`.
```{r}
class(x)
inherits(x, "my_class")
inherits(x, "your_class")
```
The class name can be any string, but I recommend using only letters and `_`. Avoid `.` because (as mentioned earlier) it can be confused with the `.` separator between a generic name and a class name. When using a class in a package, I recommend including the package name in the class name. That ensures you won't accidentally clash with a class defined by another package.
S3 has no checks for correctness which means you can change the class of existing objects:
```{r, error = TRUE}
# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
print(mod)
# Turn it into a date (?!)
class(mod) <- "Date"
# Unsurprisingly this doesn't work very well
print(mod)
```
If you've used other OO languages, this might make you feel queasy, but in practice this flexibility causes few problems. R doesn't stop you from shooting yourself in the foot, but as long as you don't aim the gun at your toes and pull the trigger, you won't have a problem.
To avoid foot-bullet intersections when creating your own class, I recommend that you usually provide three functions:
* A low-level __constructor__, `new_myclass()`, that efficiently creates new
objects with the correct structure.
* A __validator__, `validate_myclass()`, that performs more computationally
expensive checks to ensure that the object has correct values.
* A user-friendly __helper__, `myclass()`, that provides a convenient way for
others to create objects of your class.
You don't need a validator for very simple classes, and you can skip the helper if the class is for internal use only, but you should always provide a constructor.
### Constructors {#s3-constructor}
\index{S3!constructors}
\index{constructors!S3}
S3 doesn't provide a formal definition of a class, so it has no built-in way to ensure that all objects of a given class have the same structure (i.e. the same base type and the same attributes with the same types). Instead, you must enforce a consistent structure by using a __constructor__.
The constructor should follow three principles:
* Be called `new_myclass()`.
* Have one argument for the base object, and one for each attribute.
* Check the type of the base object and the types of each attribute.
I'll illustrate these ideas by creating constructors for base classes[^base-constructors] that you're already familiar with. To start, lets make a constructor for the simplest S3 class: `Date`. A `Date` is just a double with a single attribute: its `class` is "Date". This makes for a very simple constructor:
[^base-constructors]: Recent versions of R have `.Date()`, `.difftime()`, `.POSIXct()`, and `.POSIXlt()` constructors but they are internal, not well documented, and do not follow the principles that I recommend.
\indexc{Date}
```{r}
new_Date <- function(x = double()) {
stopifnot(is.double(x))
structure(x, class = "Date")
}
new_Date(c(-1, 0, 1))
```
The purpose of constructors is to help you, the developer. That means you can keep them simple, and you don't need to optimise error messages for public consumption. If you expect users to also create objects, you should create a friendly helper function, called `class_name()`, which I'll describe shortly.
A slightly more complicated constructor is that for `difftime`, which is used to represent time differences. It is again built on a double, but has a `units` attribute that must take one of a small set of values:
\indexc{difftime}
```{r}
new_difftime <- function(x = double(), units = "secs") {
stopifnot(is.double(x))
units <- match.arg(units, c("secs", "mins", "hours", "days", "weeks"))
structure(x,
class = "difftime",
units = units
)
}
new_difftime(c(1, 10, 3600), "secs")
new_difftime(52, "weeks")
```
The constructor is a developer function: it will be called in many places, by an experienced user. That means it's OK to trade a little safety in return for performance, and you should avoid potentially time-consuming checks in the constructor.
### Validators
\index{S3!validators}
\index{validators!S3}
More complicated classes require more complicated checks for validity. Take factors, for example. A constructor only checks that types are correct, making it possible to create malformed factors:
\indexc{factor}
```{r, error = TRUE}
new_factor <- function(x = integer(), levels = character()) {
stopifnot(is.integer(x))
stopifnot(is.character(levels))
structure(
x,
levels = levels,
class = "factor"
)
}
new_factor(1:5, "a")
new_factor(0:1, "a")
```
Rather than encumbering the constructor with complicated checks, it's better to put them in a separate function. Doing so allows you to cheaply create new objects when you know that the values are correct, and easily re-use the checks in other places.
```{r, error = TRUE}
validate_factor <- function(x) {
values <- unclass(x)
levels <- attr(x, "levels")
if (!all(!is.na(values) & values > 0)) {
stop(
"All `x` values must be non-missing and greater than zero",
call. = FALSE
)
}
if (length(levels) < max(values)) {
stop(
"There must be at least as many `levels` as possible values in `x`",
call. = FALSE
)
}
x
}
validate_factor(new_factor(1:5, "a"))
validate_factor(new_factor(0:1, "a"))
```
This validator function is called primarily for its side-effects (throwing an error if the object is invalid) so you'd expect it to invisibly return its primary input (as described in Section \@ref(invisible)). However, it's useful for validation methods to return visibly, as we'll see next.
### Helpers
\index{S3!helpers}
\index{helpers!S3}
If you want users to construct objects from your class, you should also provide a helper method that makes their life as easy as possible. A helper should always:
* Have the same name as the class, e.g. `myclass()`.
* Finish by calling the constructor, and the validator, if it exists.
* Create carefully crafted error messages tailored towards an end-user.
* Have a thoughtfully crafted user interface with carefully chosen default
values and useful conversions.
The last bullet is the trickiest, and it's hard to give general advice. However, there are three common patterns:
* Sometimes all the helper needs to do is coerce its inputs to the desired
type. For example, `new_difftime()` is very strict, and violates the usual
convention that you can use an integer vector wherever you can use a
double vector:
```{r, error = TRUE}
new_difftime(1:10)
```
It's not the job of the constructor to be flexible, so here we create
a helper that just coerces the input to a double.
```{r}
difftime <- function(x = double(), units = "secs") {
x <- as.double(x)
new_difftime(x, units = units)
}
difftime(1:10)
```
\indexc{difftime}
* Often, the most natural representation of a complex object is a string.
For example, it's very convenient to specify factors with a character
vector. The code below shows a simple version of `factor()`: it takes a
character vector, and guesses that the levels should be the unique values.
This is not always correct (since some levels might not be seen in the
data), but it's a useful default.
```{r, error = TRUE}
factor <- function(x = character(), levels = unique(x)) {
ind <- match(x, levels)
validate_factor(new_factor(ind, levels))
}
factor(c("a", "a", "b"))
```
\indexc{factor}
* Some complex objects are most naturally specified by multiple simple
components. For example, I think it's natural to construct a date-time
by supplying the individual components (year, month, day etc). That leads
me to this `POSIXct()` helper that resembles the existing `ISODatetime()`
function[^efficient]:
```{r}
POSIXct <- function(year = integer(),
month = integer(),
day = integer(),
hour = 0L,
minute = 0L,
sec = 0,
tzone = "") {
ISOdatetime(year, month, day, hour, minute, sec, tz = tzone)
}
POSIXct(2020, 1, 1, tzone = "America/New_York")
```
\indexc{POSIXct}
[^efficient]: This helper is not efficient: behind the scenes `ISODatetime()` works by pasting the components into a string and then using `strptime()`. A more efficient equivalent is available in `lubridate::make_datetime()`.
For more complicated classes, you should feel free to go beyond these patterns to make life as easy as possible for your users.
### Exercises
1. Write a constructor for `data.frame` objects. What base type is a data
frame built on? What attributes does it use? What are the restrictions
placed on the individual elements? What about the names?
1. Enhance my `factor()` helper to have better behaviour when one or
more `values` is not found in `levels`. What does `base::factor()` do
in this situation?
1. Carefully read the source code of `factor()`. What does it do that
my constructor does not?
1. Factors have an optional "contrasts" attribute. Read the help for `C()`,
and briefly describe the purpose of the attribute. What type should it
have? Rewrite the `new_factor()` constructor to include this attribute.
1. Read the documentation for `utils::as.roman()`. How would you write a
constructor for this class? Does it need a validator? What might a helper
do?
## Generics and methods {#s3-methods}
\indexc{UseMethod()}
\index{S3!generics}
\index{generics!S3}
The job of an S3 generic is to perform method dispatch, i.e. find the specific implementation for a class. Method dispatch is performed by `UseMethod()`, which every generic calls[^internal-generic]. `UseMethod()` takes two arguments: the name of the generic function (required), and the argument to use for method dispatch (optional). If you omit the second argument, it will dispatch based on the first argument, which is almost always what is desired.
[^internal-generic]: The exception is internal generics, which are implemented in C, and are the topic of Section \@ref(internal-generics).
Most generics are very simple, and consist of only a call to `UseMethod()`. Take `mean()` for example:
```{r}
mean
```
Creating your own generic is similarly simple:
```{r}
my_new_generic <- function(x) {
UseMethod("my_new_generic")
}
```
(If you wonder why we have to repeat `my_new_generic` twice, think back to Section \@ref(first-class-functions).)
You don't pass any of the arguments of the generic to `UseMethod()`; it uses deep magic to pass to the method automatically. The precise process is complicated and frequently surprising, so you should avoid doing any computation in a generic. To learn the full details, carefully read the Technical Details section in `?UseMethod`.
### Method dispatch
\index{S3!method dispatch}
\index{method dispatch!S3}
How does `UseMethod()` work? It basically creates a vector of method names, `paste0("generic", ".", c(class(x), "default"))`, and then looks for each potential method in turn. We can see this in action with `sloop::s3_dispatch()`. You give it a call to an S3 generic, and it lists all the possible methods. For example, what method is called when you print a `Date` object?
```{r}
x <- Sys.Date()
s3_dispatch(print(x))
```
The output here is simple:
* `=>` indicates the method that is called, here `print.Date()`
* `*` indicates a method that is defined, but not called, here `print.default()`.
The "default" class is a special __pseudo-class__. This is not a real class, but is included to make it possible to define a standard fallback that is found whenever a class-specific method is not available.
The essence of method dispatch is quite simple, but as the chapter proceeds you'll see it get progressively more complicated to encompass inheritance, base types, internal generics, and group generics. The code below shows a couple of more complicated cases which we'll come back to in Sections \@ref(inheritance) and \@ref(s3-dispatch).
```{r}
x <- matrix(1:10, nrow = 2)
s3_dispatch(mean(x))
s3_dispatch(sum(Sys.time()))
```
### Finding methods
\index{S3!methods!locating}
`sloop::s3_dispatch()` lets you find the specific method used for a single call. What if you want to find all methods defined for a generic or associated with a class? That's the job of `sloop::s3_methods_generic()` and `sloop::s3_methods_class()`:
```{r}
s3_methods_generic("mean")
s3_methods_class("ordered")
```
### Creating methods {#s3-arguments}
\index{S3!methods!creating}
\index{methods!S3}
There are two wrinkles to be aware of when you create a new method:
* First, you should only ever write a method if you own the generic or the
class. R will allow you to define a method even if you don't, but it is
exceedingly bad manners. Instead, work with the author of either the
generic or the class to add the method in their code.
* A method must have the same arguments as its generic. This is enforced in
packages by `R CMD check`, but it's good practice even if you're not
creating a package.
There is one exception to this rule: if the generic has `...`, the method
can contain a superset of the arguments. This allows methods to take
arbitrary additional arguments. The downside of using `...`, however, is
that any misspelled arguments will be silently swallowed[^ellipsis],
as mentioned in Section \@ref(fun-dot-dot-dot).
[^ellipsis]: See <https://github.com/hadley/ellipsis> for an experimental way of warning when methods fail to use all the arguments in `...`, providing a potential resolution of this issue.
### Exercises
1. Read the source code for `t()` and `t.test()` and confirm that
`t.test()` is an S3 generic and not an S3 method. What happens if
you create an object with class `test` and call `t()` with it? Why?
```{r, results = FALSE}
x <- structure(1:10, class = "test")
t(x)
```
1. What generics does the `table` class have methods for?
1. What generics does the `ecdf` class have methods for?
1. Which base generic has the greatest number of defined methods?
1. Carefully read the documentation for `UseMethod()` and explain why the
following code returns the results that it does. What two usual rules
of function evaluation does `UseMethod()` violate?
```{r}
g <- function(x) {
x <- 10
y <- 10
UseMethod("g")
}
g.default <- function(x) c(x = x, y = y)
x <- 1
y <- 1
g(x)
```
1. What are the arguments to `[`? Why is this a hard question to answer?
## Object styles
\index{S3!object styles}
So far I've focussed on vector style classes like `Date` and `factor`. These have the key property that `length(x)` represents the number of observations in the vector. There are three variants that do not have this property:
* Record style objects use a list of equal-length vectors to represent
individual components of the object. The best example of this is `POSIXlt`,
which underneath the hood is a list of 11 date-time components like year,
month, and day. Record style classes override `length()` and subsetting
methods to conceal this implementation detail.
```{r}
x <- as.POSIXlt(ISOdatetime(2020, 1, 1, 0, 0, 1:3))
x
length(x)
length(unclass(x))
x[[1]] # the first date time
unclass(x)[[1]] # the first component, the number of seconds
```
\indexc{POSIXlt}
* Data frames are similar to record style objects in that both use lists of
equal length vectors. However, data frames are conceptually two dimensional,
and the individual components are readily exposed to the user. The number of
observations is the number of rows, not the length:
```{r}
x <- data.frame(x = 1:100, y = 1:100)
length(x)
nrow(x)
```
\indexc{Date}
* Scalar objects typically use a list to represent a single thing.
For example, an `lm` object is a list of length 12 but it represents one
model.
```{r}
mod <- lm(mpg ~ wt, data = mtcars)
length(mod)
```
Scalar objects can also be built on top of functions, calls, and
environments[^s3-pairlist]. This is less generally useful, but you can see
applications in `stats::ecdf()`, R6 (Chapter \@ref(r6)), and
`rlang::quo()` (Chapter \@ref(quasiquotation)).
\indexc{lm()}
[^s3-pairlist]: You can also build an object on top of a pairlist, but I have yet to find a good reason to do so.
Unfortunately, describing the appropriate use of each of these object styles is beyond the scope of this book. However, you can learn more from the documentation of the vctrs package (<https://vctrs.r-lib.org>); the package also provides constructors and helpers that make implementation of the different styles easier.
### Exercises
1. Categorise the objects returned by `lm()`, `factor()`, `table()`,
`as.Date()`, `as.POSIXct()` `ecdf()`, `ordered()`, `I()` into the
styles described above.
1. What would a constructor function for `lm` objects, `new_lm()`, look like?
Use `?lm` and experimentation to figure out the required fields and their
types.
## Inheritance {#s3-inheritance}
\index{S3!inheritance}
\index{S3!methods!inheriting}
\index{inheritance!S3}
S3 classes can share behaviour through a mechanism called __inheritance__. Inheritance is powered by three ideas:
* The class can be a character _vector_. For example, the `ordered` and
`POSIXct` classes have two components in their class:
```{r}
class(ordered("x"))
class(Sys.time())
```
\indexc{POSIXct}
* If a method is not found for the class in the first element of the
vector, R looks for a method for the second class (and so on):
```{r}
s3_dispatch(print(ordered("x")))
s3_dispatch(print(Sys.time()))
```
* A method can delegate work by calling `NextMethod()`. We'll come back to
that very shortly; for now, note that `s3_dispatch()` reports delegation
with `->`.
```{r}
s3_dispatch(ordered("x")[1])
s3_dispatch(Sys.time()[1])
```
Before we continue we need a bit of vocabulary to describe the relationship between the classes that appear together in a class vector. We'll say that `ordered` is a __subclass__ of `factor` because it always appears before it in the class vector, and, conversely, we'll say `factor` is a __superclass__ of `ordered`.
S3 imposes no restrictions on the relationship between sub- and superclasses but your life will be easier if you impose some. I recommend that you adhere to two simple principles when creating a subclass:
* The base type of the subclass should be that same as the superclass.
* The attributes of the subclass should be a superset of the attributes
of the superclass.
`POSIXt` does not adhere to these principles because `POSIXct` has type double, and `POSIXlt` has type list. This means that `POSIXt` is not a superclass, and illustrates that it's quite possible to use the S3 inheritance system to implement other styles of code sharing (here `POSIXt` plays a role more like an interface), but you'll need to figure out safe conventions yourself.
\indexc{POSIXt}
### `NextMethod()`
\indexc{NextMethod()}
`NextMethod()` is the hardest part of inheritance to understand, so we'll start with a concrete example for the most common use case: `[`. We'll start by creating a simple toy class: a `secret` class that hides its output when printed:
```{r}
new_secret <- function(x = double()) {
stopifnot(is.double(x))
structure(x, class = "secret")
}
print.secret <- function(x, ...) {
print(strrep("x", nchar(x)))
invisible(x)
}
x <- new_secret(c(15, 1, 456))
x
```
This works, but the default `[` method doesn't preserve the class:
```{r}
s3_dispatch(x[1])
x[1]
```
To fix this, we need to provide a `[.secret` method. How could we implement this method? The naive approach won't work because we'll get stuck in an infinite loop:
```{r}
`[.secret` <- function(x, i) {
new_secret(x[i])
}
```
Instead, we need some way to call the underlying `[` code, i.e. the implementation that would get called if we didn't have a `[.secret` method. One approach would be to `unclass()` the object:
```{r}
`[.secret` <- function(x, i) {
x <- unclass(x)
new_secret(x[i])
}
x[1]
```
This works, but is inefficient because it creates a copy of `x`. A better approach is to use `NextMethod()`, which concisely solves the problem of delegating to the method that would have been called if `[.secret` didn't exist:
```{r}
`[.secret` <- function(x, i) {
new_secret(NextMethod())
}
x[1]
```
We can see what's going on with `sloop::s3_dispatch()`:
```{r}
s3_dispatch(x[1])
```
The `=>` indicates that `[.secret` is called, but that `NextMethod()` delegates work to the underlying internal `[` method, as shown by the `->`.
As with `UseMethod()`, the precise semantics of `NextMethod()` are complex. In particular, it tracks the list of potential next methods with a special variable, which means that modifying the object that's being dispatched upon will have no impact on which method gets called next.
### Allowing subclassing {#s3-subclassing}
\index{S3!subclassing}
When you create a class, you need to decide if you want to allow subclasses, because it requires some changes to the constructor and careful thought in your methods.
To allow subclasses, the parent constructor needs to have `...` and `class` arguments:
```{r}
new_secret <- function(x, ..., class = character()) {
stopifnot(is.double(x))
structure(
x,
...,
class = c(class, "secret")
)
}
```
Then the subclass constructor can just call to the parent class constructor with additional arguments as needed. For example, imagine we want to create a supersecret class which also hides the number of characters:
```{r}
new_supersecret <- function(x) {
new_secret(x, class = "supersecret")
}
print.supersecret <- function(x, ...) {
print(rep("xxxxx", length(x)))
invisible(x)
}
x2 <- new_supersecret(c(15, 1, 456))
x2
```
To allow inheritance, you also need to think carefully about your methods, as you can no longer use the constructor. If you do, the method will always return the same class, regardless of the input. This forces whoever makes a subclass to do a lot of extra work.
Concretely, this means we need to revise the `[.secret` method. Currently it always returns a `secret()`, even when given a supersecret:
```{r}
`[.secret` <- function(x, ...) {
new_secret(NextMethod())
}
x2[1:3]
```
\indexc{vec\_restore()}
We want to make sure that `[.secret` returns the same class as `x` even if it's a subclass. As far as I can tell, there is no way to solve this problem using base R alone. Instead, you'll need to use the vctrs package, which provides a solution in the form of the `vctrs::vec_restore()` generic. This generic takes two inputs: an object which has lost subclass information, and a template object to use for restoration.
Typically `vec_restore()` methods are quite simple: you just call the constructor with appropriate arguments:
```{r}
vec_restore.secret <- function(x, to, ...) new_secret(x)
vec_restore.supersecret <- function(x, to, ...) new_supersecret(x)
```
(If your class has attributes, you'll need to pass them from `to` into the constructor.)
Now we can use `vec_restore()` in the `[.secret` method:
```{r}
`[.secret` <- function(x, ...) {
vctrs::vec_restore(NextMethod(), x)
}
x2[1:3]
```
(I only fully understood this issue quite recently, so at time of writing it is not used in the tidyverse. Hopefully by the time you're reading this, it will have rolled out, making it much easier to (e.g.) subclass tibbles.)
If you build your class using the tools provided by the vctrs package, `[` will gain this behaviour automatically. You will only need to provide your own `[` method if you use attributes that depend on the data or want non-standard subsetting behaviour. See `?vctrs::new_vctr` for details.
### Exercises
1. How does `[.Date` support subclasses? How does it fail to support
subclasses?
1. R has two classes for representing date time data, `POSIXct` and
`POSIXlt`, which both inherit from `POSIXt`. Which generics have
different behaviours for the two classes? Which generics share the same
behaviour?
1. What do you expect this code to return? What does it actually return?
Why?
```{r, eval = FALSE}
generic2 <- function(x) UseMethod("generic2")
generic2.a1 <- function(x) "a1"
generic2.a2 <- function(x) "a2"
generic2.b <- function(x) {
class(x) <- "a1"
NextMethod()
}
generic2(structure(list(), class = c("b", "a2")))
```
## Dispatch details {#s3-dispatch}
\index{S3!method dispatch}
This chapter concludes with a few additional details about method dispatch. It is safe to skip these details if you're new to S3.
### S3 and base types {#implicit-class}
\index{implicit class}
\index{S3!implicit class}
What happens when you call an S3 generic with a base object, i.e. an object with no class? You might think it would dispatch on what `class()` returns:
```{r}
class(matrix(1:5))
```
But unfortunately dispatch actually occurs on the __implicit class__, which has three components:
* The string "array" or "matrix" if the object has dimensions
* The result of `typeof()` with a few minor tweaks
* The string "numeric" if object is "integer" or "double"
There is no base function that will compute the implicit class, but you can use `sloop::s3_class()`
```{r}
s3_class(matrix(1:5))
```
This is used by `s3_dispatch()`:
```{r}
s3_dispatch(print(matrix(1:5)))
```
This means that the `class()` of an object does not uniquely determine its dispatch:
```{r}
x1 <- 1:5
class(x1)
s3_dispatch(mean(x1))
x2 <- structure(x1, class = "integer")
class(x2)
s3_dispatch(mean(x2))
```
### Internal generics {#internal-generics}
\index{generics!internal}
Some base functions, like `[`, `sum()`, and `cbind()`, are called __internal generics__ because they don't call `UseMethod()` but instead call the C functions `DispatchGroup()` or `DispatchOrEval()`. `s3_dispatch()` shows internal generics by including the name of the generic followed by `(internal)`:
```{r}
s3_dispatch(Sys.time()[1])
```
For performance reasons, internal generics do not dispatch to methods unless the class attribute has been set, which means that internal generics do not use the implicit class. Again, if you're ever confused about method dispatch, you can rely on `s3_dispatch()`.
### Group generics
\index{S3!group generics}
\index{generics!group}
Group generics are the most complicated part of S3 method dispatch because they involve both `NextMethod()` and internal generics. Like internal generics, they only exist in base R, and you cannot define your own group generic.
There are four group generics:
* __Math__: `abs()`, `sign()`, `sqrt()`, `floor()`, `cos()`, `sin()`, `log()`,
and more (see `?Math` for the complete list).
* __Ops__: `+`, `-`, `*`, `/`, `^`, `%%`, `%/%`, `&`, `|`, `!`, `==`, `!=`, `<`,
`<=`, `>=`, and `>`.
* __Summary__: `all()`, `any()`, `sum()`, `prod()`, `min()`, `max()`, and
`range()`.
* __Complex__: `Arg()`, `Conj()`, `Im()`, `Mod()`, `Re()`.
Defining a single group generic for your class overrides the default behaviour for all of the members of the group. Methods for group generics are looked for only if the methods for the specific generic do not exist:
```{r}
s3_dispatch(sum(Sys.time()))
```
Most group generics involve a call to `NextMethod()`. For example, take `difftime()` objects. If you look at the method dispatch for `abs()`, you'll see there's a `Math` group generic defined.
```{r}
y <- as.difftime(10, units = "mins")
s3_dispatch(abs(y))
```
`Math.difftime` basically looks like this:
```{r}
Math.difftime <- function(x, ...) {
new_difftime(NextMethod(), units = attr(x, "units"))
}
```
It dispatches to the next method, here the internal default, to perform the actual computation, then restore the class and attributes. (To better support subclasses of `difftime` this would need to call `vec_restore()`, as described in Section \@ref(s3-subclassing).)
Inside a group generic function a special variable `.Generic` provides the actual generic function called. This can be useful when producing error messages, and can sometimes be useful if you need to manually re-call the generic with different arguments.
### Double dispatch
\index{double dispatch}
\index{method dispatch!S3!double dispatch}
Generics in the Ops group, which includes the two-argument arithmetic and Boolean operators like `-` and `&`, implement a special type of method dispatch. They dispatch on the type of _both_ of the arguments, which is called __double dispatch__. This is necessary to preserve the commutative property of many operators, i.e. `a + b` should equal `b + a`. Take the following simple example:
```{r}
date <- as.Date("2017-01-01")
integer <- 1L
date + integer
integer + date
```
If `+` dispatched only on the first argument, it would return different values for the two cases. To overcome this problem, generics in the Ops group use a slightly different strategy from usual. Rather than doing a single method dispatch, they do two, one for each input. There are three possible outcomes of this lookup:
* The methods are the same, so it doesn't matter which method is used.
* The methods are different, and R falls back to the internal method with
a warning.
* One method is internal, in which case R calls the other method.
This approach is error prone so if you want to implement robust double dispatch for algebraic operators, I recommend using the vctrs package. See `?vctrs::vec_arith` for details.
### Exercises
1. Explain the differences in dispatch below:
```{r}
length.integer <- function(x) 10
x1 <- 1:5
class(x1)
s3_dispatch(length(x1))
x2 <- structure(x1, class = "integer")
class(x2)
s3_dispatch(length(x2))
```
1. What classes have a method for the `Math` group generic in base R? Read
the source code. How do the methods work?
1. `Math.difftime()` is more complicated than I described. Why?