# Object-Oriented Programming with S3 and R6 in R

Object-oriented programming (OOP) lets you specify relationships between functions and the objects that they can act on, helping you manage complexity in your code. This intermediate-level course introduces OOP in R using the S3 and R6 systems. S3 is a great day-to-day R programming tool that simplifies some of the functions that you write. R6 is especially useful for industry-specific analyses, working with web APIs, and building GUIs. The course concludes with an interview with Winston Chang, creator of the R6 package.


## Introduction to Object-Oriented Programming

Learn what object-oriented programming (OOP) consists of, when to use it, and what OOP systems are available in R. You'll also learn how R identifies different types of variable, using classes, types, and modes.



### Should I OOP?
Object-oriented programming (OOP) is very powerful, but not appropriate for every data analysis workflow. Which of the following scenarios are a good fit for using object-oriented programming?

1. Cleaning up a dirty dataset.
2. Writing an interface to the Internet Movie Database API.
3. Creating objects to work with cartographic data for spatial analysis.
4. Using ggplot2 to visualize your dataset.

Answer: 2 and 3

### You've Already Been Working With Objects
In the Introduction to R course, you already met several common R objects, such as numeric, logical, and character vectors, as well as data frames. One of the principles of OOP is that functions can behave differently for different kinds of object.

The summary() function is a good example of this. Since different types of variable need to be summarized in different ways, the output that is displayed to you varies depending upon what you pass into it.

In [3]:
# LETTERS is a built-in constant holding the 26 upper-case letters,
# so it does not need to be defined here.

a_numeric_vector <- rlnorm(50)
a_factor <- factor(
  sample(c(LETTERS[1:5], NA), 50, replace = TRUE)
)
a_data_frame <- data.frame(
  n = a_numeric_vector,
  f = a_factor
)
a_linear_model <- lm(dist ~ speed, cars)

# Call summary() on the numeric vector
summary(a_numeric_vector)

# Do the same for the other three objects

summary(a_factor)
summary(a_data_frame)
summary(a_linear_model)

# Functions can behave differently depending upon the type of input.

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1134  0.6788  0.8490  1.5057  2.0764  8.1382 

       n             f     
 Min.   :0.1134   A   : 4  
 1st Qu.:0.6788   B   : 6  
 Median :0.8490   C   :13  
 Mean   :1.5057   D   :11  
 3rd Qu.:2.0764   E   : 4  
 Max.   :8.1382   NA's:12  


Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max 
-29.069  -9.525  -2.272   9.215  43.201 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -17.5791     6.7584  -2.601   0.0123 *  
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared:  0.6511,	Adjusted R-squared:  0.6438 
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
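
One way to see why summary() produces such different output is to look up which method R would use for each class. The following is a minimal sketch, assuming a standard R session with the utils, stats, and datasets packages attached (the default). getS3method() is the utils helper used for the lookup, and the small example objects are made up for illustration.

```r
# summary() is a generic function: it hands the work to a method that is
# chosen according to the class of its first argument.
a_factor <- factor(c("A", "B", "A"))
a_data_frame <- data.frame(n = c(1.5, 2.5, 4), f = a_factor)
a_linear_model <- lm(dist ~ speed, cars)

# The class of each object determines which summary method is called
class(a_factor)        # "factor"
class(a_data_frame)    # "data.frame"
class(a_linear_model)  # "lm"

# Look up the method that summary() would use for each class
factor_method <- getS3method("summary", "factor")
data_frame_method <- getS3method("summary", "data.frame")
lm_method <- getS3method("summary", "lm")

# Calling the method directly gives the same result as calling summary()
identical(summary(a_factor), factor_method(a_factor))  # TRUE
```

Each lookup returns an ordinary function, which is why the same call to summary() can produce quartiles, counts, or a regression table.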


### Which Systems Should I Use?
R has many OOP frameworks, some of which are better than others. Test your ability to choose an appropriate framework by deciding which of the following statements are true.

1. Knowing how to use S3 is a fundamental R skill.
2. R6 and Reference Classes are powerful OOP frameworks.
3. If it’s good enough for ggplot2, it’s good enough for me. I should make regular use of proto.
4. I should use any framework with a number in its name.
5. S4 is useful for working with Bioconductor.

Answer: 1, 2 and 5

### What's my type?
You've just seen four functions that help you determine what type of variable you're working with. class() and typeof() are important and will come in handy often. mode() and storage.mode() mostly exist for compatibility with the S programming language.
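
As a quick illustration of how the four functions can disagree, the sketch below applies them to an integer vector and a double vector. The variable names are chosen for this example only.

```r
an_integer <- 1:3
a_double <- c(1.5, 2.5)

# For an integer vector, only mode() differs
class(an_integer)         # "integer"
typeof(an_integer)        # "integer"
mode(an_integer)          # "numeric"
storage.mode(an_integer)  # "integer"

# For a double vector, class() and mode() report "numeric",
# while typeof() and storage.mode() report the underlying "double"
class(a_double)           # "numeric"
typeof(a_double)          # "double"
mode(a_double)            # "numeric"
storage.mode(a_double)    # "double"
```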

In this exercise, you will look at what these functions return for different variable types. There are some rarer types that you may not have come across yet.

1. array: Generalization of a matrix with an arbitrary number of dimensions.
2. formula: Used by modeling and plotting functions to define relationships between variables.
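
If these two types are new to you, the short sketch below may help. It builds a small three-dimensional array and a formula, then inspects them; the variable names are illustrative.

```r
# A 2 x 3 x 4 array: like a matrix, but with a third dimension
an_array <- array(1:24, dim = c(2, 3, 4))
dim(an_array)        # 2 3 4
class(an_array)      # "array"
an_array[2, 3, 4]    # 24, the last element

# A formula relating a response y to a predictor x.
# The variables do not need to exist when the formula is created.
a_formula <- y ~ x
class(a_formula)     # "formula"
typeof(a_formula)    # "language"
all.vars(a_formula)  # "y" "x"
```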

Also note that there are three kinds of functions in R.

1. Most of the functions that you come across are called closures.
2. A few important functions, like length(), are known as builtin functions, which use a special evaluation mechanism to make them run faster.
3. Language constructs, like if and while, are also functions! They are known as special functions.
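
The three kinds of function listed above can be told apart with typeof(). The sketch below is a minimal check using mean, length, and `if` as representatives of each kind.

```r
# typeof() distinguishes the three kinds of function
typeof(mean)    # "closure"
typeof(length)  # "builtin"
typeof(`if`)    # "special"

# class() does not: all three are simply "function"
class(mean)     # "function"
class(length)   # "function"
class(`if`)     # "function"

# Builtin and special functions are together known as primitives
is.primitive(mean)    # FALSE
is.primitive(length)  # TRUE
is.primitive(`if`)    # TRUE

# Because `if` is a function, it can be called like one
`if`(TRUE, "yes", "no")  # "yes"
```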

In [4]:
type_info <- function(x) {
  c(
    class = class(x), 
    typeof = typeof(x), 
    mode = mode(x), 
    storage.mode = storage.mode(x)
  )
}

# Create list of example variables
some_vars <- list(
  an_integer_vector = rpois(24, lambda = 5),
  a_numeric_vector = rbeta(24, shape1 = 1, shape2 = 1),
  an_integer_array = array(rbinom(24, size = 8, prob = 0.5), dim = c(2, 3, 4)),
  a_numeric_array = array(rweibull(24, shape = 1, scale = 1), dim = c(2, 3, 4)),
  a_data_frame = data.frame(int = rgeom(24, prob = 0.5), num = runif(24)),
  a_factor = factor(month.abb),
  a_formula = y ~ x,
  a_closure_function = mean,
  a_builtin_function = length,
  a_special_function = `if`
)

# Loop over some_vars calling type_info() on each element to explore them
lapply(some_vars, type_info)

### Make it Classy (1)
As well as retrieving the class, the class() function can be used to override it. The syntax is:

class(x) <- "some_class"

This is particularly useful for lists, since lists can be used to combine other variables into more complex variables. (Remember the Lego analogy: individual variables are like Lego pieces, and you can use lists to build whatever you like.)
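
As a small sketch of this syntax applied to a list, the example below builds a hypothetical object describing a two-dimensional point and gives it a made-up class name, "point_2d". Neither the object nor the class name comes from the course.

```r
# Hypothetical example: a list describing a point in two dimensions
a_point <- list(x = 3, y = 4)
class(a_point)                 # "list"

# Override the class
class(a_point) <- "point_2d"
class(a_point)                 # "point_2d"
inherits(a_point, "point_2d")  # TRUE

# The elements are still accessible in the usual way
a_point$x                      # 3

# unclass() removes the class again, leaving a plain list
class(unclass(a_point))        # "list"
```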

In this exercise, you'll look at an object to store the state of a chess game, and override its class.

To make sense of the exercise, you need to know a little bit about chess.

1. There are two players in a chess game, named "white" and "black".
2. Each player has six types of piece: a king, a queen, bishops, knights, rooks, and pawns.
3. The position of each piece can be recorded using the column ("a" to "h") and the row (1 to 8).

In [5]:
chess <- list(white_king = "g1", white_queen = "h4", white_rooks = c("f1", "f6"),
              black_king = "g8", black_queen = "d7", black_rooks = c("a6", "f8"))

# Explore the structure of chess
str(chess)

# Override the class of chess
class(chess) <- "chess_game"

# Is chess still a list?
is.list(chess)

# How many pieces are left on the board?
length(unlist(chess))



List of 6
 $ white_king : chr "g1"
 $ white_queen: chr "h4"
 $ white_rooks: chr [1:2] "f1" "f6"
 $ black_king : chr "g8"
 $ black_queen: chr "d7"
 $ black_rooks: chr [1:2] "a6" "f8"


### Make it Classy (2)
In the last exercise, you overrode the class of an object representing a chess game.

To test your understanding, see if you can tell how this affects the four functions for interrogating variable type: class(), typeof(), mode(), and storage.mode().

Which of the following statements is true?


1. typeof() is a fundamental property that is overridden to match the class() of chess.

2. mode() is the S-language equivalent of class(), and is overridden to match the class() of chess.

3. storage.mode() is the S-language equivalent of class(), and is overridden to match the class() of chess.

4. The typeof(), mode(), and storage.mode() of chess are all still "list".

Answer: 4

In [6]:
typeof(chess)
mode(chess)
storage.mode(chess)
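
One way to understand this result is that the class is stored as an attribute on top of the underlying list, so the list itself is unchanged. The sketch below illustrates this with a cut-down, stand-alone version of the chess object (named chess_sketch here to keep it separate from the exercise variable).

```r
# A smaller stand-in for the chess object from the exercise
chess_sketch <- list(white_king = "g1", black_king = "g8")
class(chess_sketch) <- "chess_game"

# The class lives in the "class" attribute
attr(chess_sketch, "class")  # "chess_game"

# The underlying storage is still a list
typeof(chess_sketch)         # "list"
mode(chess_sketch)           # "list"
storage.mode(chess_sketch)   # "list"
is.list(chess_sketch)        # TRUE
```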