diff --git a/.Rbuildignore b/.Rbuildignore index 7fd38cc6..f9f3167d 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -30,3 +30,5 @@ script.R ^CRAN-SUBMISSION$ .vscode ^\.cache$ +^docs$ +^pkgdown$ diff --git a/.gitignore b/.gitignore index 6c04a916..85f03708 100644 --- a/.gitignore +++ b/.gitignore @@ -19,3 +19,4 @@ TAGS /Meta/ .vscode .cache +docs diff --git a/vignettes/FAQ.Rmd b/vignettes/FAQ.Rmd index 29d99340..2618f45b 100644 --- a/vignettes/FAQ.Rmd +++ b/vignettes/FAQ.Rmd @@ -5,6 +5,9 @@ vignette: > %\VignetteIndexEntry{FAQ} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} +editor: + markdown: + wrap: sentence --- ```{r, include = FALSE} @@ -20,15 +23,14 @@ If you have a question that you think would fit well here please [open an issue] #### 1. What are the underlying types of cpp11 objects? - | vector | element | - | --- | --- | - | cpp11::integers | int | - | cpp11::doubles | double | - | cpp11::logicals | cpp11::r_bool | - | cpp11::strings | cpp11::r_string | - | cpp11::raws | uint8_t | - | cpp11::list | SEXP | - +| vector | element | +|-----------------|-----------------| +| cpp11::integers | int | +| cpp11::doubles | double | +| cpp11::logicals | cpp11::r_bool | +| cpp11::strings | cpp11::r_string | +| cpp11::raws | uint8_t | +| cpp11::list | SEXP | #### 2. How do I add elements to a named list? @@ -133,7 +135,6 @@ my_true() my_both() ``` - #### 8. How do I create a new empty environment? To do this you need to call the `base::new.env()` function from C++. @@ -219,7 +220,6 @@ std::string my_string() { } ``` - #### 12. What are the types for C++ iterators? The iterators are `::iterator` classes contained inside the vector classes. @@ -228,9 +228,10 @@ For example the iterator for `cpp11::doubles` would be `cpp11::doubles::iterator #### 13. My code has `using namespace std`, why do I still have to include `std::` in the signatures of `[[cpp11::register]]` functions? 
The `using namespace std` directive will not be included in the generated code of the function signatures, so they still need to be fully qualified. -However you will _not_ need to qualify the type names within those functions. +However you will *not* need to qualify the type names within those functions. The following won't compile + ```{cpp11, eval = FALSE} #include #include @@ -243,8 +244,8 @@ string foobar() { } ``` - But this will compile and work as intended + ```{cpp11} #include #include @@ -262,7 +263,7 @@ std::string foobar() { In place modification breaks the normal semantics of R code. In general it should be avoided, which is why `cpp11::writable` classes always copy their data when constructed. -However if you are _positive_ in-place modification is necessary for your use case you can use the move constructor to do this. +However if you are *positive* in-place modification is necessary for your use case you can use the move constructor to do this. ```{cpp11} #include @@ -288,7 +289,8 @@ x `cpp11::unwind_protect()` is cpp11's way of safely calling R's C API. In short, it allows you to run a function that might throw an R error, catch the `longjmp()` of that error, promote it to an exception that is thrown and caught by a try/catch that cpp11 sets up for you at `.Call()` time (which allows destructors to run), and finally tells R to continue unwinding the stack now that the C++ objects have had a chance to destruct as needed. -Since `cpp11::unwind_protect()` takes an arbitrary function, you may be wondering if you should use it for your own custom needs. In general, we advise against this because this is an extremely advanced feature that is prone to subtle and hard to debug issues. +Since `cpp11::unwind_protect()` takes an arbitrary function, you may be wondering if you should use it for your own custom needs. +In general, we advise against this because this is an extremely advanced feature that is prone to subtle and hard to debug issues. 
##### Destructors @@ -310,7 +312,7 @@ A::~A() { void test_destructor_ok() { A a{}; cpp11::unwind_protect([&] { - Rf_error("oh no!"); + Rf_error("oh no!"); }); } @@ -318,7 +320,7 @@ void test_destructor_ok() { void test_destructor_bad() { cpp11::unwind_protect([&] { A a{}; - Rf_error("oh no!"); + Rf_error("oh no!"); }); } ``` @@ -334,11 +336,13 @@ test_destructor_bad() #> Error: oh no! ``` -In general, the only code that can be called within `unwind_protect()` is "pure" C code or C++ code that only uses POD (plain-old-data) types and no exceptions. If you mix complex C++ objects with R's C API within `unwind_protect()`, then any R errors will result in a jump that prevents your destructors from running. +In general, the only code that can be called within `unwind_protect()` is "pure" C code or C++ code that only uses POD (plain-old-data) types and no exceptions. +If you mix complex C++ objects with R's C API within `unwind_protect()`, then any R errors will result in a jump that prevents your destructors from running. ##### Nested `unwind_protect()` -Another issue that can arise has to do with _nested_ calls to `unwind_protect()`. It is very hard (if not impossible) to end up with invalidly nested `unwind_protect()` calls when using the typical cpp11 API, but you can manually create a scenario like the following: +Another issue that can arise has to do with *nested* calls to `unwind_protect()`. 
+It is very hard (if not impossible) to end up with invalidly nested `unwind_protect()` calls when using the typical cpp11 API, but you can manually create a scenario like the following: ```{cpp11} #include @@ -347,7 +351,7 @@ Another issue that can arise has to do with _nested_ calls to `unwind_protect()` void test_nested() { cpp11::unwind_protect([&] { cpp11::unwind_protect([&] { - Rf_error("oh no!"); + Rf_error("oh no!"); }); }); } @@ -355,11 +359,11 @@ void test_nested() { If you were to run `test_nested()` from R, it would likely crash or hang your R session due to the following chain of events: -- `test_nested()` sets up a try/catch to catch unwind exceptions -- The outer `unwind_protect()` is called. It uses the C function `R_UnwindProtect()` to call its lambda function. -- The inner `unwind_protect()` is called. It again uses `R_UnwindProtect()`, this time to call `Rf_error()`. -- `Rf_error()` performs a `longjmp()` which is caught by the inner `unwind_protect()` and promoted to an exception. -- That exception is thrown, but because we are in the outer call to `R_UnwindProtect()` (a C function), we end up throwing that exception _across_ C stack frames. This is _undefined behavior_, which is known to have caused R to crash on certain platforms. +- `test_nested()` sets up a try/catch to catch unwind exceptions +- The outer `unwind_protect()` is called. It uses the C function `R_UnwindProtect()` to call its lambda function. +- The inner `unwind_protect()` is called. It again uses `R_UnwindProtect()`, this time to call `Rf_error()`. +- `Rf_error()` performs a `longjmp()` which is caught by the inner `unwind_protect()` and promoted to an exception. +- That exception is thrown, but because we are in the outer call to `R_UnwindProtect()` (a C function), we end up throwing that exception *across* C stack frames. This is *undefined behavior*, which is known to have caused R to crash on certain platforms. 
You might think that you'd never do this, but the same scenario can also occur with a combination of 1 call to `unwind_protect()` combined with usage of the cpp11 API: @@ -395,32 +399,35 @@ void test_outer() { } ``` -This might seem unsafe because `cpp11::package()` uses `unwind_protect()` to call the R function for `test_inner()`, which then goes back into C++ to call `cpp11::stop()`, which itself uses `unwind_protect()`, so it seems like we are in a nested scenario, but this scenario does actually work. It makes more sense if we analyze it one step at a time: - -- Call the R function for `test_outer()` -- A try/catch is set up to catch unwind exceptions -- The C++ function for `test_outer()` is called -- `cpp11::package()` uses `unwind_protect()` to call the R function for `test_inner()` -- Call the R function for `test_inner()` -- A try/catch is set up to catch unwind exceptions (_this is the key!_) -- The C++ function for `test_inner()` is called -- `cpp11::stop("oh no!")` is called, which uses `unwind_protect()` to call `Rf_error()`, causing a `longjmp()`, which is caught by that `unwind_protect()` and promoted to an exception. -- That exception is thrown, but this time it is caught by the try/catch set up by `test_inner()` as we entered it from the R side. This prevents that exception from crossing the C++ -> C boundary. -- The try/catch calls `R_ContinueUnwind()`, which `longjmp()`s again, and now the `unwind_protect()` set up by `cpp11::package()` catches that, and promotes it to an exception. -- That exception is thrown and caught by the try/catch set up by `test_outer()`. -- The try/catch calls `R_ContinueUnwind()`, which `longjmp()`s again, and at this point we can safely let the `longjmp()` proceed to force an R error. 
+This might seem unsafe because `cpp11::package()` uses `unwind_protect()` to call the R function for `test_inner()`, which then goes back into C++ to call `cpp11::stop()`, which itself uses `unwind_protect()`, so it seems like we are in a nested scenario, but this scenario does actually work. +It makes more sense if we analyze it one step at a time: + +- Call the R function for `test_outer()` +- A try/catch is set up to catch unwind exceptions +- The C++ function for `test_outer()` is called +- `cpp11::package()` uses `unwind_protect()` to call the R function for `test_inner()` +- Call the R function for `test_inner()` +- A try/catch is set up to catch unwind exceptions (*this is the key!*) +- The C++ function for `test_inner()` is called +- `cpp11::stop("oh no!")` is called, which uses `unwind_protect()` to call `Rf_error()`, causing a `longjmp()`, which is caught by that `unwind_protect()` and promoted to an exception. +- That exception is thrown, but this time it is caught by the try/catch set up by `test_inner()` as we entered it from the R side. This prevents that exception from crossing the C++ -\> C boundary. +- The try/catch calls `R_ContinueUnwind()`, which `longjmp()`s again, and now the `unwind_protect()` set up by `cpp11::package()` catches that, and promotes it to an exception. +- That exception is thrown and caught by the try/catch set up by `test_outer()`. +- The try/catch calls `R_ContinueUnwind()`, which `longjmp()`s again, and at this point we can safely let the `longjmp()` proceed to force an R error. #### 16. Ok but I really want to call `cpp11::unwind_protect()` manually If you've read the above bullet and still feel like you need to call `unwind_protect()`, then you should keep in mind the following when writing the function to unwind-protect: -- You shouldn't create any C++ objects that have destructors. -- You shouldn't use any parts of the cpp11 API that may call `unwind_protect()`. 
-- You must be very careful not to call `unwind_protect()` in a nested manner. +- You shouldn't create any C++ objects that have destructors. +- You shouldn't use any parts of the cpp11 API that may call `unwind_protect()`. +- You must be very careful not to call `unwind_protect()` in a nested manner. In other words, if you only use plain-old-data types, are careful to never throw exceptions, and only use R's C API, then you can use `unwind_protect()`. -One place you may want to do this is when working with long character vectors. Unfortunately, due to the way cpp11 must protect the individual CHARSXP objects that make up a character vector, it can currently be quite slow to use the cpp11 API for this. Consider this example of extracting out individual elements with `x[i]` vs using the native R API: +One place you may want to do this is when working with long character vectors. +Unfortunately, due to the way cpp11 must protect the individual CHARSXP objects that make up a character vector, it can currently be quite slow to use the cpp11 API for this. 
+Consider this example of extracting out individual elements with `x[i]` vs using the native R API: ```{cpp11} #include @@ -432,7 +439,7 @@ cpp11::sexp test_extract_cpp11(cpp11::strings x) { for (R_xlen_t i = 0; i < size; ++i) { (void) x[i]; } - + return R_NilValue; } @@ -446,16 +453,17 @@ cpp11::sexp test_extract_r_api(cpp11::strings x) { (void) STRING_ELT(data, i); } }); - + return R_NilValue; } ``` + ```{r} set.seed(123) x <- sample(letters, 1e6, replace = TRUE) bench::mark( - test_extract_cpp11(x), + test_extract_cpp11(x), test_extract_r_api(x) ) ``` diff --git a/vignettes/converting.Rmd b/vignettes/converting.Rmd index 034627f6..4a30dfd2 100644 --- a/vignettes/converting.Rmd +++ b/vignettes/converting.Rmd @@ -5,6 +5,9 @@ vignette: > %\VignetteIndexEntry{Converting from Rcpp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} +editor: + markdown: + wrap: sentence --- ```{r, include = FALSE} @@ -16,44 +19,46 @@ knitr::opts_chunk$set( In many cases there is no need to convert a package from Rcpp. If the code is already written and you don't have a very compelling need to use cpp11 I would recommend you continue to use Rcpp. -However if you _do_ feel like your project will benefit from using cpp11 this vignette will provide some guidance and doing the conversion. +However, if you *do* feel like your project will benefit from using cpp11, this vignette will provide some guidance on doing the conversion. ## Getting started -1. Add cpp11 by calling `usethis::use_cpp11()`. +1. Add cpp11 by calling `usethis::use_cpp11()`. -1. Start converting function by function. +2. Start converting function by function. - Converting the code a bit at a time (and regularly running your tests) is the best way to do the conversion correctly and make progress. Doing a separate commit after converting each file (or possibly each function) can make finding any regressions with [git bisect](https://youtu.be/KKeucpfAuuA) much easier in the future. - - 1.
Convert `#include ` to `#include `. - 1. Convert all instances of `// [[Rcpp::export]]` to `[[cpp11::register]]`. - 1. Grep for `Rcpp::` and replace with the equivalent cpp11 function using the cheatsheets below. + Converting the code a bit at a time (and regularly running your tests) is the best way to do the conversion correctly and make progress. + Doing a separate commit after converting each file (or possibly each function) can make finding any regressions with [git bisect](https://youtu.be/KKeucpfAuuA) much easier in the future. -1. Remove Rcpp - 1. Remove Rcpp from the `LinkingTo` and `Imports` fields. - 1. Remove `@importFrom Rcpp sourceCpp`. - 1. Delete `src/RccpExports.cpp` and `R/RcppExports.R`. - 1. Delete `src/Makevars` if it only contains `PKG_CPPFLAGS=-DSTRICT_R_HEADERS`. - 1. Clean out old compiled code with `pkgbuild::clean_dll()`. - 1. Re-document the package to update the `NAMESPACE`. + 1. Convert `#include ` to `#include `. + 2. Convert all instances of `// [[Rcpp::export]]` to `[[cpp11::register]]`. + 3. Grep for `Rcpp::` and replace with the equivalent cpp11 function using the cheatsheets below. + +3. Remove Rcpp + + 1. Remove Rcpp from the `LinkingTo` and `Imports` fields. + 2. Remove `@importFrom Rcpp sourceCpp`. + 3. Delete `src/RcppExports.cpp` and `R/RcppExports.R`. + 4. Delete `src/Makevars` if it only contains `PKG_CPPFLAGS=-DSTRICT_R_HEADERS`. + 5. Clean out old compiled code with `pkgbuild::clean_dll()`. + 6. Re-document the package to update the `NAMESPACE`.
## Cheatsheet ### Vectors -| Rcpp | cpp11 (read-only) | cpp11 (writable) | -| --- | --- | --- | -| NumericVector | doubles | writable::doubles | -| NumericMatrix | doubles_matrix<> | writable::doubles_matrix<> | -| IntegerVector | integers | writable::integers | -| IntegerMatrix | integers_matrix<> | writable::integers_matrix<> | -| CharacterVector | strings | writable::strings | -| RawVector | raws | writable::raws | -| List | list | writable::list | -| RObject | sexp | | - -Note that each cpp11 vector class has a read-only and writeable version. +| Rcpp | cpp11 (read-only) | cpp11 (writable) | +|-----------------|---------------------|-------------------------------| +| NumericVector | doubles | writable::doubles | +| NumericMatrix | doubles_matrix\<\> | writable::doubles_matrix\<\> | +| IntegerVector | integers | writable::integers | +| IntegerMatrix | integers_matrix\<\> | writable::integers_matrix\<\> | +| CharacterVector | strings | writable::strings | +| RawVector | raws | writable::raws | +| List | list | writable::list | +| RObject | sexp | | + +Note that each cpp11 vector class has a read-only and writeable version. The default classes, e.g. `cpp11::doubles` are *read-only* classes that do not permit modification. If you want to modify the data you or create a new vector, use the writeable variant. @@ -65,26 +70,27 @@ See for more Rcpp also allows very flexible implicit conversions, e.g. if you pass a `REALSXP` to a function that takes a `Rcpp::IntegerVector()` it is implicitly converted to a `INTSXP`. These conversions are nice for usability, but require (implicit) duplication of the data, with the associated runtime costs. -cpp11 throws an error in these cases. If you want the implicit coercions you can add a call to `as.integer()` or `as.double()` as appropriate from R when you call the function. +cpp11 throws an error in these cases. 
+If you want the implicit coercions you can add a call to `as.integer()` or `as.double()` as appropriate from R when you call the function. ### Other objects -| Rcpp | cpp11 | -| --- | --- | -| XPtr | external_pointer | -| Environment | environment | -| Function | function | -| Environment (namespace) | package | +| Rcpp | cpp11 | +|-------------------------|------------------| +| XPtr | external_pointer | +| Environment | environment | +| Function | function | +| Environment (namespace) | package | ### Functions -| Rcpp | cpp11 | -| --- | --- | -|`wrap()` | `as_sexp()` | -|`as()` | `as_cpp()` | -|`stop()` | `stop()` | -|`checkUserInterrupt()` | `check_user_interrupt()` | -|`CharacterVector::create("a", "b", "c")` | `{"a", "b", "c"}` | +| Rcpp | cpp11 | +|------------------------------------------|--------------------------| +| `wrap()` | `as_sexp()` | +| `as()` | `as_cpp()` | +| `stop()` | `stop()` | +| `checkUserInterrupt()` | `check_user_interrupt()` | +| `CharacterVector::create("a", "b", "c")` | `{"a", "b", "c"}` | Note that `cpp11::stop()` and `cpp11::warning()` are thin wrappers around `Rf_stop()` and `Rf_warning()`. These are simple C functions with a `printf()` API, so they do not understand C++ objects like `std::string`. @@ -94,7 +100,7 @@ Therefore you need to call `obj.c_str()` when passing string data to them. Calling R functions from C++ is similar to using Rcpp. 
-```c++ +``` cpp // Rcpp ----------------------------------------------- Rcpp::Function as_tibble("as_tibble", Rcpp::Environment::namespace_env("tibble")); as_tibble(x, Rcpp::Named(".rows", num_rows), Rcpp::Named(".name_repair", name_repair)); @@ -108,24 +114,24 @@ as_tibble(x, ".rows"_nm = num_rows, ".name_repair"_nm = name_repair); ### Unsupported Rcpp features -- None of [Modules](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-modules.pdf) -- None of [Sugar](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-sugar.pdf) -- Some parts of [Attributes](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-attributes.pdf) - - No dependencies - - No random number generator restoration - - No support for roxygen2 comments - - No interfaces +- None of [Modules](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-modules.pdf) +- None of [Sugar](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-sugar.pdf) +- Some parts of [Attributes](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-attributes.pdf) + - No dependencies + - No random number generator restoration + - No support for roxygen2 comments + - No interfaces ### RNGs -Rcpp includes calls to `GetRNGstate()` and `PutRNGstate()` around the wrapped function. +Rcpp includes calls to `GetRNGstate()` and `PutRNGstate()` around the wrapped function. This ensures that if any C++ code calls the R API functions `unif_rand()`, `norm_rand()`, `exp_rand()`, or `R_unif_index()` the random seed state is set accordingly. -cpp11 does _not_ do this, so you must include the calls to `GetRNGstate()` and `PutRNGstate()` _yourself_ if you use any of those functions in your C++ code. +cpp11 does *not* do this, so you must include the calls to `GetRNGstate()` and `PutRNGstate()` *yourself* if you use any of those functions in your C++ code. See [R-exts 6.3 - Random number generation](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Random-numbers) for details on these functions. 
One convenient way to do safely is to use a simple class: -```cpp +``` cpp class local_rng { public: local_rng() { @@ -143,12 +149,12 @@ void foo() { } ``` - ## Common issues when converting ### STL includes -Rcpp.h includes a number of STL headers automatically, notably `` and ``, however the cpp11 headers generally do not. If you have errors like +Rcpp.h includes a number of STL headers automatically, notably `` and ``; however, the cpp11 headers generally do not. +If you have errors like ``` error: no type named 'string' in namespace 'std' @@ -189,7 +195,7 @@ If you are constructing a length 1 logical vector you may need to explicitly use This issue only occurs with the clang compiler, not gcc. When constructing vectors with more than one element this is not an issue -```cpp +``` cpp // bad cpp11::writable::logicals({FALSE}); diff --git a/vignettes/cpp11.Rmd b/vignettes/cpp11.Rmd index 878ff673..5f10fcc6 100644 --- a/vignettes/cpp11.Rmd +++ b/vignettes/cpp11.Rmd @@ -5,6 +5,9 @@ vignette: > %\VignetteIndexEntry{Get started with cpp11} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} +editor: + markdown: + wrap: sentence --- ```{r, include = FALSE} @@ -26,43 +29,43 @@ In this vignette you'll learn how to improve performance by rewriting key functi This magic comes by way of the [cpp11](https://github.com/r-lib/cpp11) package. cpp11 makes it very simple to connect C++ to R. -While it is _possible_ to write C or Fortran code for use in R, it will be painful by comparison. +While it is *possible* to write C or Fortran code for use in R, it will be painful by comparison. cpp11 provides a clean, approachable API that lets you write high-performance code, insulated from R's more complex C API. Typical bottlenecks that C++ can address include: -* Loops that can't be easily vectorised because subsequent iterations depend on previous ones.
-* Recursive functions, or problems which involve calling functions millions of times. - The overhead of calling a function in C++ is much lower than in R. +- Recursive functions, or problems which involve calling functions millions of times. + The overhead of calling a function in C++ is much lower than in R. -* Problems that require advanced data structures and algorithms that R doesn't provide. - Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. +- Problems that require advanced data structures and algorithms that R doesn't provide. + Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues. The aim of this vignette is to discuss only those aspects of C++ and cpp11 that are absolutely necessary to help you eliminate bottlenecks in your code. We won't spend much time on advanced features like object-oriented programming or templates because the focus is on writing small, self-contained functions, not big programs. A working knowledge of C++ is helpful, but not essential. Many good tutorials and references are freely available, including and . -For more advanced topics, the _Effective C++_ series by Scott Meyers is a popular choice. +For more advanced topics, the *Effective C++* series by Scott Meyers is a popular choice. ### Outline -* Section [intro](#intro) teaches you how to write C++ by converting simple R functions to their C++ equivalents. - You'll learn how C++ differs from R, and what the key scalar, vector, and matrix classes are called. +- Section [intro](#intro) teaches you how to write C++ by converting simple R functions to their C++ equivalents. + You'll learn how C++ differs from R, and what the key scalar, vector, and matrix classes are called. 
-* Section [cpp_source](#cpp-source) shows you how to use `cpp11::cpp_source()` to load a C++ file from disk in the same way you use `source()` to load a file of R code. +- Section [cpp_source](#cpp-source) shows you how to use `cpp11::cpp_source()` to load a C++ file from disk in the same way you use `source()` to load a file of R code. -* Section [classes](#classes) discusses how to modify attributes from cpp11, and mentions some of the other important classes. +- Section [classes](#classes) discusses how to modify attributes from cpp11, and mentions some of the other important classes. -* Section [na](#na) teaches you how to work with R's missing values in C++. +- Section [na](#na) teaches you how to work with R's missing values in C++. -* Section [stl](#stl) shows you how to use some of the most important data structures and algorithms from the standard template library, or STL, built-in to C++. +- Section [stl](#stl) shows you how to use some of the most important data structures and algorithms from the standard template library, or STL, built-in to C++. -* Section [case-studies](#case-studies) shows two real case studies where cpp11 was used to get considerable performance improvements. +- Section [case-studies](#case-studies) shows two real case studies where cpp11 was used to get considerable performance improvements. -* Section [package](#package) teaches you how to add C++ code to an R package. +- Section [package](#package) teaches you how to add C++ code to an R package. -* Section [more](#more) concludes the vignette with pointers to more resources to help you learn cpp11 and C++. +- Section [more](#more) concludes the vignette with pointers to more resources to help you learn cpp11 and C++. ### Prerequisites @@ -75,9 +78,9 @@ library(cpp11) You'll also need a working C++ compiler. To get it: -* On Windows, install [Rtools](https://cran.r-project.org/bin/windows/Rtools/). -* On Mac, install Xcode from the app store. 
-* On Linux, `sudo apt-get install r-base-dev` or similar. +- On Windows, install [Rtools](https://cran.r-project.org/bin/windows/Rtools/). +- On Mac, install Xcode from the app store. +- On Linux, `sudo apt-get install r-base-dev` or similar. ## Getting started with C++ {#intro} @@ -99,10 +102,10 @@ There's a lot going on underneath the hood but cpp11 takes care of all the detai The following sections will teach you the basics by translating simple R functions to their C++ equivalents. We'll start simple with a function that has no inputs and a scalar output, and then make it progressively more complicated: -* Scalar input and scalar output -* Vector input and scalar output -* Vector input and vector output -* Matrix input and vector output +- Scalar input and scalar output +- Vector input and scalar output +- Vector input and vector output +- Matrix input and vector output ### No inputs, scalar output @@ -115,7 +118,7 @@ one <- function() 1L The equivalent C++ function is: -```cpp +``` cpp int one() { return 1; } @@ -131,19 +134,18 @@ cpp_function('int one() { This small function illustrates a number of important differences between R and C++: -* The syntax to create a function looks like the syntax to call a function; - you don't use assignment to create functions as you do in R. +- The syntax to create a function looks like the syntax to call a function; you don't use assignment to create functions as you do in R. -* You must declare the type of output the function returns. - This function returns an `int` (a scalar integer). - The classes for the most common types of R vectors are: `doubles`, `integers`, `strings`, and `logicals`. +- You must declare the type of output the function returns. + This function returns an `int` (a scalar integer). + The classes for the most common types of R vectors are: `doubles`, `integers`, `strings`, and `logicals`. -* Scalars and vectors are different. 
- The scalar equivalents of numeric, integer, character, and logical vectors are: `double`, `int`, `String`, and `bool`. +- Scalars and vectors are different. + The scalar equivalents of numeric, integer, character, and logical vectors are: `double`, `int`, `std::string`, and `bool`. -* You must use an explicit `return` statement to return a value from a function. +- You must use an explicit `return` statement to return a value from a function. -* Every statement is terminated by a `;`. +- Every statement is terminated by a `;`. ### Scalar input, scalar output @@ -172,11 +174,12 @@ cpp_function('int sign_cpp(int x) { In the C++ version: -* We declare the type of each input in the same way we declare the type of the output. - While this makes the code a little more verbose, it also makes clear the type of input the function needs. +- We declare the type of each input in the same way we declare the type of the output. + While this makes the code a little more verbose, it also makes clear the type of input the function needs. -* The `if` syntax is identical --- while there are some big differences between R and C++, there are also lots of similarities! C++ also has a `while` statement that works the same way as R's. - As in R you can use `break` to exit the loop, but to skip one iteration you need to use `continue` instead of `next`. +- The `if` syntax is identical --- while there are some big differences between R and C++, there are also lots of similarities! + C++ also has a `while` statement that works the same way as R's. + As in R you can use `break` to exit the loop, but to skip one iteration you need to use `continue` instead of `next`. ### Vector input, scalar output @@ -210,21 +213,22 @@ cpp_function('double sum_cpp(doubles x) { The C++ version is similar, but: -* To find the length of the vector, we use the `.size()` method, which returns an integer. - C++ methods are called with `.` (i.e., a full stop).
+- To find the length of the vector, we use the `.size()` method, which returns an integer. + C++ methods are called with `.` (i.e., a full stop). -* The `for` statement has a different syntax: `for(init; check; increment)`. - This loop is initialised by creating a new variable called `i` with value 0. - Before each iteration we check that `i < n`, and terminate the loop if it's not. - After each iteration, we increment the value of `i` by one, using the special prefix operator `++` which increases the value of `i` by 1. +- The `for` statement has a different syntax: `for(init; check; increment)`. + This loop is initialised by creating a new variable called `i` with value 0. + Before each iteration we check that `i < n`, and terminate the loop if it's not. + After each iteration, we increment `i` by one using the special prefix operator `++`. -* In C++, vector indices start at 0, which means that the last element is at position `n - 1`. - I'll say this again because it's so important: __IN C++, VECTOR INDICES START AT 0__! This is a very common source of bugs when converting R functions to C++. +- In C++, vector indices start at 0, which means that the last element is at position `n - 1`. + I'll say this again because it's so important: **IN C++, VECTOR INDICES START AT 0**! + This is a very common source of bugs when converting R functions to C++. -* Use `=` for assignment, not `<-`. +- Use `=` for assignment, not `<-`. -* C++ provides operators that modify in-place: `total += x[i]` is equivalent to `total = total + x[i]`. - Similar in-place operators are `-=`, `*=`, and `/=`. +- C++ provides operators that modify in-place: `total += x[i]` is equivalent to `total = total + x[i]`. + Similar in-place operators are `-=`, `*=`, and `/=`. This is a good example of where C++ is much more efficient than R.
As shown by the following microbenchmark, `sum_cpp()` is competitive with the built-in (and highly optimised) `sum()`, while `sum_r()` is several orders of magnitude slower. @@ -266,12 +270,12 @@ cpp_function('doubles pdist_cpp(double x, doubles ys) { This function introduces a few new concepts: -* Because we are creating a new vector we need to use `writable::doubles` rather than the read-only `doubles`. +- Because we are creating a new vector we need to use `writable::doubles` rather than the read-only `doubles`. -* We create a new numeric vector of length `n` with a constructor: `cpp11::writable::doubles out(n)`. - Another useful way of making a vector is to copy an existing one: `cpp11::doubles zs(ys)`. +- We create a new numeric vector of length `n` with a constructor: `cpp11::writable::doubles out(n)`. + Another useful way of making a vector is to copy an existing one: `cpp11::doubles zs(ys)`. -* C++ uses `pow()`, not `^`, for exponentiation. +- C++ uses `pow()`, not `^`, for exponentiation. Note that because the R version is fully vectorised, it's already going to be fast. @@ -284,7 +288,7 @@ bench::mark( ``` On my computer, it takes around 5 ms with a 1 million element `y` vector. -The C++ function is about 2.5 times faster, ~2 ms, but assuming it took you 10 minutes to write the C++ function, you'd need to run it ~200,000 times to make rewriting worthwhile. +The C++ function is about 2.5 times faster, \~2 ms, but assuming it took you 10 minutes to write the C++ function, you'd need to run it \~200,000 times to make rewriting worthwhile. The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (`x - ys`), and allocating memory is an expensive operation. The C++ function avoids this overhead because it uses an intermediate scalar. 
@@ -302,19 +306,20 @@ This lets you take advantage of text editor support for C++ files (e.g., syntax Your stand-alone C++ file should have extension `.cpp`, and needs to start with: -```cpp +``` cpp #include "cpp11.hpp" using namespace cpp11; ``` And for each function that you want available within R, you need to prefix it with: -```cpp +``` cpp [[cpp11::register]] ``` If you're familiar with roxygen2, you might wonder how this relates to `@export`. -`cpp11::register` registers a C++ function to be called from R. `@export` controls whether a function is exported from a package and made available to the user. +`cpp11::register` registers a C++ function to be called from R. +`@export` controls whether a function is exported from a package and made available to the user. To compile the C++ code, use `cpp_source("path/to/file.cpp")`. This will create the matching R functions and add them to your current session. @@ -344,12 +349,9 @@ For the remainder of this vignette C++ code will be presented stand-alone rather If you want to try compiling and/or modifying the examples you should paste them into a C++ source file that includes the elements described above. This is easy to do in RMarkdown by using `{cpp11}` instead of `{r}` at the beginning of your code blocks. - ### Exercises -1. With the basics of C++ in hand, it's now a great time to practice by reading and writing some simple C++ functions. - For each of the following functions, read the code and figure out what the corresponding base R function is. - You might not understand every part of the code yet, but you should be able to figure out the basics of what the function does. +1. With the basics of C++ in hand, it's now a great time to practice by reading and writing some simple C++ functions. For each of the following functions, read the code and figure out what the corresponding base R function is. 
You might not understand every part of the code yet, but you should be able to figure out the basics of what the function does. ```{cpp11} #include "cpp11.hpp" @@ -406,21 +408,21 @@ int f4(cpp11::function pred, list x) { } ``` -1. To practice your function writing skills, convert the following functions - into C++. For now, assume the inputs have no missing values. +1. To practice your function writing skills, convert the following functions into C++. + For now, assume the inputs have no missing values. - 1. `all()`. + 1. `all()`. - 2. `cumprod()`, `cummin()`, `cummax()`. + 2. `cumprod()`, `cummin()`, `cummax()`. - 3. `diff()`. Start by assuming lag 1, and then generalise for lag `n`. + 3. `diff()`. + Start by assuming lag 1, and then generalise for lag `n`. - 4. `range()`. + 4. `range()`. - 5. `var()`. Read about the approaches you can take on - [Wikipedia](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance). - Whenever implementing a numerical algorithm, it's always good to check - what is already known about the problem. + 5. `var()`. + Read about the approaches you can take on [Wikipedia](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance). + Whenever implementing a numerical algorithm, it's always good to check what is already known about the problem. ## Other classes {#classes} @@ -491,13 +493,13 @@ call_with_one(paste) Calling R functions with positional arguments is obvious: -```cpp +``` cpp f("y", 1); ``` But you need a special syntax for named arguments: -```cpp +``` cpp using namespace cpp11::literals; f("x"_nm = "y", "value"_nm = 1); @@ -510,6 +512,7 @@ cpp11 also provides `.names()` as an alias for the `names` attribute. The following code snippet illustrates these methods. Note the use of `{}` [initializer list](https://en.cppreference.com/w/cpp/utility/initializer_list) syntax. 
This allows you to create an R vector from C++ scalar values: + ```{r attribs, engine = "cpp11"} #include "cpp11.hpp" using namespace cpp11; @@ -526,13 +529,15 @@ doubles attribs() { ``` ## Missing values {#na} + If you're working with missing values, you need to know two things: -* How R's missing values behave in C++'s scalars (e.g., `double`). +- How R's missing values behave in C++'s scalars (e.g., `double`). -* How to get and set missing values in vectors (e.g., `doubles`). +- How to get and set missing values in vectors (e.g., `doubles`). ### Scalars + The following code explores what happens when you take one of R's missing values, coerce it into a scalar, and then coerce back to an R vector. Note that this kind of experimentation is a useful way to figure out what any operation does. @@ -549,6 +554,7 @@ list scalar_missings() { return writable::list({as_sexp(int_s), as_sexp(chr_s), as_sexp(lgl_s), as_sexp(num_s)}); } ``` + ```{r} str(scalar_missings()) ``` @@ -557,6 +563,7 @@ With the exception of `bool`, things look pretty good here: all of the missing v However, as we'll see in the following sections, things are not quite as straightforward as they seem. #### Integers + With integers, missing values are stored as the smallest integer. If you don't do anything to them, they'll be preserved. But, since C++ doesn't know that the smallest integer has this special behaviour, if you do anything to it you're likely to get an incorrect value: for example, `cpp_eval('NA_INTEGER + 1')` gives -2147483647. @@ -564,6 +571,7 @@ But, since C++ doesn't know that the smallest integer has this special behaviour So if you want to work with missing values in integers, either use a length 1 `integers` or be very careful with your code. #### Doubles + With doubles, you may be able to get away with ignoring missing values and working with NaNs (not a number). This is because R's NA is a special type of IEEE 754 floating point number NaN. 
So any logical expression that involves a NaN (or in C++, NAN) always evaluates as FALSE: @@ -574,8 +582,8 @@ cpp_eval("NAN < 1") cpp_eval("NAN > 1") cpp_eval("NAN == NAN") ``` -(Here I'm using `cpp_eval()` which allows you to see the result of running a single C++ expression, making it excellent for this sort of interactive experimentation.) -But be careful when combining them with Boolean values: + +(Here I'm using `cpp_eval()` which allows you to see the result of running a single C++ expression, making it excellent for this sort of interactive experimentation.) But be careful when combining them with Boolean values: ```{r} cpp_eval("NAN && TRUE") @@ -583,6 +591,7 @@ cpp_eval("NAN || FALSE") ``` However, in numeric contexts NaNs will propagate NAs: + ```{r} cpp_eval("NAN + 1") cpp_eval("NAN - 1") @@ -591,14 +600,17 @@ cpp_eval("NAN * 1") ``` ### Strings + `String` is a scalar string class introduced by cpp11, so it knows how to deal with missing values. ### Boolean + C++'s `bool` has two possible values (`true` or `false`), a logical vector in R has three (`TRUE`, `FALSE`, and `NA`). If you coerce a length 1 logical vector, make sure it doesn't contain any missing values; otherwise they will be converted to TRUE. One way to fix this is to use `int` instead, as this can represent `TRUE`, `FALSE`, and `NA`. ### Vectors {#vectors-cpp11} + With vectors, you need to use a missing value specific to the type of vector, `NA_REAL`, `NA_INTEGER`, `NA_LOGICAL`, `NA_STRING`: ```{r, engine = "cpp11"} @@ -623,13 +635,13 @@ str(missing_sampler()) ### Exercises -1. Rewrite any of the functions from the first exercise to deal with missing values. - If `na_rm` is true, ignore the missing values. - If `na_rm` is false, return a missing value if the input contains any missing values. - Some good functions to practice with are `min()`, `max()`, `range()`, `mean()`, and `var()`. +1. Rewrite any of the functions from the first exercise to deal with missing values. 
+ If `na_rm` is true, ignore the missing values. + If `na_rm` is false, return a missing value if the input contains any missing values. + Some good functions to practice with are `min()`, `max()`, `range()`, `mean()`, and `var()`. -1. Rewrite `cumsum()` and `diff()` so they can handle missing values. - Note that these functions have slightly more complicated behaviour. +2. Rewrite `cumsum()` and `diff()` so they can handle missing values. + Note that these functions have slightly more complicated behaviour. ## Standard Template Library {#stl} @@ -647,9 +659,9 @@ Iterators are used extensively in the STL: many functions either accept or retur They are the next step up from basic loops, abstracting away the details of the underlying data structure. Iterators have three main operators: -1. Advance with `++`. -1. Get the value they refer to, or __dereference__, with `*`. -1. Compare with `==`. +1. Advance with `++`. +2. Get the value they refer to, or **dereference**, with `*`. +3. Compare with `==`. For example we could re-write our sum function using iterators: @@ -670,15 +682,13 @@ double sum2(doubles x) { The main changes are in the for loop: -* We start at `x.begin()` and loop until we get to `x.end()`. A small - optimization is to store the value of the end iterator so we don't need to - look it up each time. This only saves about 2 ns per iteration, so it's only - important when the calculations in the loop are very simple. +- We start at `x.begin()` and loop until we get to `x.end()`. + A small optimization is to store the value of the end iterator so we don't need to look it up each time. + This only saves about 2 ns per iteration, so it's only important when the calculations in the loop are very simple. -* Instead of indexing into x, we use the dereference operator to get its - current value: `*it`. +- Instead of indexing into x, we use the dereference operator to get its current value: `*it`. 
-* Notice we use `auto` rather than giving the type of the iterator. +- Notice we use `auto` rather than giving the type of the iterator. This code can be simplified still further through the use of a C++11 feature: range-based for loops. @@ -761,15 +771,15 @@ local({ The key points are: -* We step through two iterators (input and output) simultaneously. +- We step through two iterators (input and output) simultaneously. -* We can assign into an dereferenced iterator (`out_it`) to change the values in `out`. +- We can assign into a dereferenced iterator (`out_it`) to change the values in `out`. -* `upper_bound()` returns an iterator. - If we wanted the value of the `upper_bound()` we could dereference it; to figure out its location, we use the `distance()` function. +- `upper_bound()` returns an iterator. + If we wanted the value of the `upper_bound()` we could dereference it; to figure out its location, we use the `distance()` function. When in doubt, it is generally better to use algorithms from the STL than hand rolled loops. -In _Effective STL_, Scott Meyers gives three reasons: efficiency, correctness, and maintainability. +In *Effective STL*, Scott Meyers gives three reasons: efficiency, correctness, and maintainability. Algorithms from the STL are written by C++ experts to be extremely efficient, and they have been around for a long time so they are well tested. Using standard algorithms also makes the intent of your code more clear, helping to make it more readable and more maintainable. @@ -830,7 +840,8 @@ list rle_cpp(doubles x) { } ``` -(An alternative implementation would be to replace `i` with the iterator `lengths.rbegin()` which always points to the last element of the vector. You might want to try implementing that.) +(An alternative implementation would be to replace `i` with the iterator `lengths.rbegin()`, which always points to the last element of the vector. +You might want to try implementing that.) Other methods of a vector are described at .
@@ -867,6 +878,7 @@ logicals duplicated_cpp(integers x) { } ``` +````{=html} +```` ### Exercises To practice using the STL algorithms and data structures, implement the following using R functions in C++, using the hints provided: -1. `median.default()` using `partial_sort`. +1. `median.default()` using `partial_sort`. -1. `%in%` using `unordered_set` and the `find()` or `count()` methods. +2. `%in%` using `unordered_set` and the `find()` or `count()` methods. -1. `unique()` using an `unordered_set` (challenge: do it in one line!). +3. `unique()` using an `unordered_set` (challenge: do it in one line!). -1. `min()` using `std::min()`, or `max()` using `std::max()`. +4. `min()` using `std::min()`, or `max()` using `std::max()`. -1. `which.min()` using `min_element`, or `which.max()` using `max_element`. +5. `which.min()` using `min_element`, or `which.max()` using `max_element`. -1. `setdiff()`, `union()`, and `intersect()` for integers using sorted ranges - and `set_union`, `set_intersection` and `set_difference`. +6. `setdiff()`, `union()`, and `intersect()` for integers using sorted ranges and `set_union`, `set_intersection` and `set_difference`. ## Case studies {#case-studies} @@ -939,13 +951,14 @@ gibbs_r <- function(N, thin) { } ``` -This is relatively straightforward to convert to C++. We: +This is relatively straightforward to convert to C++. +We: -* Add type declarations to all variables. +- Add type declarations to all variables. -* Use `(` instead of `[` to index into the matrix. +- Use `(` instead of `[` to index into the matrix. -* Include "Rmath.h" and call the functions with `Rf_`. +- Include "Rmath.h" and call the functions with `Rf_`. ```{r, engine = "cpp11"} #include "cpp11/matrix.hpp" @@ -1022,7 +1035,8 @@ There are two ways we could attack this problem. If you have a good R vocabulary, you might immediately see how to vectorise the function (using `ifelse()`, `pmin()`, and `pmax()`). 
Alternatively, we could rewrite `vacc1a()` and `vacc1()` in C++, using our knowledge that loops and function calls have much lower overhead in C++. -Either approach is fairly straightforward. In R: +Either approach is fairly straightforward. +In R: ```{r} vacc2 <- function(age, female, ily) { @@ -1034,7 +1048,7 @@ vacc2 <- function(age, female, ily) { } ``` -(If you've worked R a lot you might recognise some potential bottlenecks in this code: `ifelse`, `pmin`, and `pmax` are known to be slow, and could be replaced with `p * 0.75 + p * 0.5 * female`, `p[p < 0] <- 0`, `p[p > 1] <- 1`. You might want to try timing those variations.) +(If you've worked with R a lot you might recognise some potential bottlenecks in this code: `ifelse`, `pmin`, and `pmax` are known to be slow, and could be replaced with `p * 0.75 + p * 0.5 * female`, `p[p < 0] <- 0`, and `p[p > 1] <- 1`. You might want to try timing those variations.) Or in C++: @@ -1077,7 +1091,8 @@ stopifnot( ) ``` -The original blog post forgot to do this, and introduced a bug in the C++ version: it used `0.004` instead of `0.04`. Finally, we can benchmark our three approaches: +The original blog post forgot to do this, and introduced a bug in the C++ version: it used `0.004` instead of `0.04`. +Finally, we can benchmark our three approaches: ```{r} bench::mark( @@ -1096,29 +1111,31 @@ I was a little surprised that the C++ was so much faster, but it is because the The same C++ code that is used with `cpp_source()` can also be bundled into a package. There are several benefits of moving code from a stand-alone C++ source file to a package: -1. Your code can be made available to users without C++ development tools. +1. Your code can be made available to users without C++ development tools. + +2. Multiple source files and their dependencies are handled automatically by the R package build system. -1. Multiple source files and their dependencies are handled automatically by the R package build system. +3.
Packages provide additional infrastructure for testing, documentation, and consistency. -1. Packages provide additional infrastructure for testing, documentation, and consistency. +To add `cpp11` to an existing package first put your C++ files in the `src/` directory of your package. -To add `cpp11` to an existing package first put your C++ files in the `src/` directory of your package. +Then the easiest way to configure everything is to call `usethis::use_cpp11()`. +Alternatively: -Then the easiest way to configure everything is to call `usethis::use_cpp11()`. Alternatively: +- Add this to your `DESCRIPTION` file: -* Add this to your `DESCRIPTION` file: - - ```yaml + ``` yaml LinkingTo: cpp11 ``` -* And add the following [roxygen](https://roxygen2.r-lib.org/) directive somewhere in your package's R files. (A common location is `R/pkgname-package.R`) +- And add the following [roxygen](https://roxygen2.r-lib.org/) directive somewhere in your package's R files. + (A common location is `R/pkgname-package.R`) - ```R + ``` r #' @useDynLib pkgname, .registration = TRUE ``` -* You'll then need to run [`devtools::document()`](https://devtools.r-lib.org/reference/document.html) to update your `NAMESPACE` file to include the `useDynLib` statement. +- You'll then need to run [`devtools::document()`](https://devtools.r-lib.org/reference/document.html) to update your `NAMESPACE` file to include the `useDynLib` statement. If you don't use `devtools::load_all()`, you'll also need to run `cpp11::cpp_register()` before building the package. This function scans the C++ files for `[[cpp11::register]]` attributes and generates the binding code required to make the functions available in R. @@ -1129,11 +1146,12 @@ Re-run `cpp11::cpp_register()` whenever functions are added, removed, or have th C++ is a large, complex language that takes years to master. 
If you would like to dive deeper or write more complex functions other resources I've found helpful in learning C++ are: -* [_Effective C++_](https://www.aristeia.com/books.html) and [_Effective STL_](https://www.aristeia.com/books.html) +- [*Effective C++*](https://www.aristeia.com/books.html) and [*Effective STL*](https://www.aristeia.com/books.html) -* [_C++ Annotations_](http://www.icce.rug.nl/documents/cplusplus/cplusplus.html), aimed at knowledgeable users of C (or any other language using a C-like grammar, like Perl or Java) who would like to know more about, or make the transition to, C++. +- [*C++ Annotations*](http://www.icce.rug.nl/documents/cplusplus/cplusplus.html), aimed at knowledgeable users of C (or any other language using a C-like grammar, like Perl or Java) who would like to know more about, or make the transition to, C++. -* [_Algorithm Libraries_](https://www.cs.helsinki.fi/u/tpkarkka/alglib/k06/), which provides a more technical, but still concise, description of important STL concepts. (Follow the links under notes.) +- [*Algorithm Libraries*](https://www.cs.helsinki.fi/u/tpkarkka/alglib/k06/), which provides a more technical, but still concise, description of important STL concepts. + (Follow the links under notes.) Writing performant code may also require you to rethink your basic approach: a solid understanding of basic data structures and algorithms is very helpful here. -That's beyond the scope of this vignette, but I'd suggest the [_Algorithm Design Manual_](https://www.algorist.com/) MIT's [_Introduction to Algorithms_](https://web.archive.org/web/20200604134756/https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/), _Algorithms_ by Robert Sedgewick and Kevin Wayne which has a free [online textbook](http://algs4.cs.princeton.edu/home/) and a matching [Coursera course](https://www.coursera.org/learn/algorithms-part1). 
+That's beyond the scope of this vignette, but I'd suggest the [*Algorithm Design Manual*](https://www.algorist.com/), MIT's [*Introduction to Algorithms*](https://web.archive.org/web/20200604134756/https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/), and *Algorithms* by Robert Sedgewick and Kevin Wayne, which has a free [online textbook](http://algs4.cs.princeton.edu/home/) and a matching [Coursera course](https://www.coursera.org/learn/algorithms-part1). diff --git a/vignettes/internals.Rmd b/vignettes/internals.Rmd index 2eca4989..66d86111 100644 --- a/vignettes/internals.Rmd +++ b/vignettes/internals.Rmd @@ -5,6 +5,9 @@ vignette: > %\VignetteIndexEntry{cpp11 internals} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} +editor: + markdown: + wrap: sentence --- ```{r, include = FALSE} @@ -20,28 +23,29 @@ The development repository for cpp11 is . First install any dependencies needed for development. -```r +``` r install.packages("remotes") remotes::install_deps(dependencies = TRUE) ``` You can load the package in an interactive R session -```r +``` r devtools::load_all() ``` Or run the cpp11 tests with -```r +``` r devtools::test() ``` -There are more extensive tests in the `cpp11test` directory. Generally when developing the C++ headers I run R with its working directory in the `cpp11test` directory and use `devtools::test()` to run the cpp11tests. +There are more extensive tests in the `cpp11test` directory. +Generally when developing the C++ headers I run R with its working directory in the `cpp11test` directory and use `devtools::test()` to run the cpp11test tests.
If you change the cpp11 headers you will need to install the new version of cpp11 and then clean and recompile the cpp11test package: -```r +``` r # Assuming your working directory is `cpp11test/` devtools::clean_dll() devtools::load_all() @@ -49,7 +53,7 @@ devtools::load_all() To calculate code coverage of the cpp11 package run the following from the `cpp11` root directory. -```r +``` r covr::report(cpp11_coverage()) ``` @@ -57,9 +61,11 @@ covr::report(cpp11_coverage()) This project uses [clang-format](https://clang.llvm.org/docs/ClangFormat.html) (version 10) to automatically format the c++ code. -You can run `make format` to re-format all code in the project. If your system does not have `clang-format` version 10, this can be installed using a [homebrew tap](https://github.com/r-lib/homebrew-taps) at the command line with `brew install r-lib/taps/clang-format@10`. +You can run `make format` to re-format all code in the project. +If your system does not have `clang-format` version 10, this can be installed using a [homebrew tap](https://github.com/r-lib/homebrew-taps) at the command line with `brew install r-lib/taps/clang-format@10`. -You may need to link the newly installed version 10. To do so, run `brew unlink clang-format` followed by `brew link clang-format@10`. +You may need to link the newly installed version 10. +To do so, run `brew unlink clang-format` followed by `brew link clang-format@10`. Alternatively many IDEs support automatically running `clang-format` every time files are written. @@ -67,25 +73,26 @@ Alternatively many IDEs support automatically running `clang-format` every time cpp11 is a header only library, so all source code exposed to users lives in [inst/include](https://github.com/r-lib/cpp11/tree/main/inst/include). R code used to register functions and for `cpp11::cpp_source()` is in [R/](https://github.com/r-lib/cpp11/tree/main/R). 
-Tests for _only_ the code in `R/` is in [tests/testthat/](https://github.com/r-lib/cpp11/tree/main/tests/testthat) +Tests for *only* the code in `R/` is in [tests/testthat/](https://github.com/r-lib/cpp11/tree/main/tests/testthat). The rest of the code is in a separate [cpp11test/](https://github.com/r-lib/cpp11/tree/main/cpp11test) package included in the source tree. Inside [cpp11test/src](https://github.com/r-lib/cpp11/tree/main/cpp11test/src) the files that start with `test-` are C++ tests using the [Catch](https://testthat.r-lib.org/reference/use_catch.html) support in testthat. In addition there are some regular R tests in [cpp11test/tests/testthat/](https://github.com/r-lib/cpp11/tree/main/cpp11test/tests/testthat). ## Naming conventions -- All header files are named with a `.hpp` extension. -- All source files are named with a `.cpp` extension. -- Public header files should be put in `inst/include/cpp11` -- Read only r_vector classes and free functions should be put in the `cpp11` namespace. -- Writable r_vector class should be put in the `cpp11::writable` namespace. -- Private classes and functions should be put in the `cpp11::internal` namespace. +- All header files are named with a `.hpp` extension. +- All source files are named with a `.cpp` extension. +- Public header files should be put in `inst/include/cpp11` +- Read only r_vector classes and free functions should be put in the `cpp11` namespace. +- Writable r_vector class should be put in the `cpp11::writable` namespace. +- Private classes and functions should be put in the `cpp11::internal` namespace. ## Vector classes -All of the basic r_vector classes are class templates, the base template is defined in [cpp11/r_vector.hpp](https://github.com/r-lib/cpp11/blob/main/inst/include/cpp11/r_vector.hpp) +All of the basic r_vector classes are class templates, the base template is defined in [cpp11/r_vector.hpp](https://github.com/r-lib/cpp11/blob/main/inst/include/cpp11/r_vector.hpp). 
The template parameter is the type of **value** the particular R vector stores, e.g. `double` for `cpp11::doubles`. -This differs from Rcpp, whose first template parameter is the R vector type, e.g. `REALSXP`. +This differs from Rcpp, whose first template parameter is the R vector type, e.g. +`REALSXP`. The file first has the class declarations, then function definitions further down in the file. Specializations for the various types are in separate files, e.g. [cpp11/doubles.hpp](https://github.com/r-lib/cpp11/blob/main/inst/include/cpp11/doubles.hpp), [cpp11/integers.hpp](https://github.com/r-lib/cpp11/blob/main/inst/include/cpp11/integers.hpp) @@ -107,7 +114,7 @@ The most common C++ types are included in the test suite and should work without Some useful links on SFINAE -- https://www.fluentcpp.com/2018/05/15/make-sfinae-pretty-1-what-value-sfinae-brings-to-code/, https://www.fluentcpp.com/2018/05/18/make-sfinae-pretty-2-hidden-beauty-sfinae/ +- https://www.fluentcpp.com/2018/05/15/make-sfinae-pretty-1-what-value-sfinae-brings-to-code/, https://www.fluentcpp.com/2018/05/18/make-sfinae-pretty-2-hidden-beauty-sfinae/ ## Protection @@ -115,7 +122,8 @@ Some useful links on SFINAE cpp11 uses an idea proposed by [Luke Tierney](https://github.com/RcppCore/Rcpp/issues/1081#issuecomment-630330838) to use a double linked list with the head preserved to protect objects cpp11 is protecting. -Each node in the list uses the head (`CAR`) part to point to the previous node, and the `CDR` part to point to the next node. The `TAG` is used to point to the object being protected. +Each node in the list uses the head (`CAR`) part to point to the previous node, and the `CDR` part to point to the next node. +The `TAG` is used to point to the object being protected. The head and tail of the list have `R_NilValue` as their `CAR` and `CDR` pointers respectively. 
Calling `preserved.insert()` with a regular R object will add a new node to the list and return a protect token corresponding to the node added. @@ -137,17 +145,17 @@ This exception is caught by the try/catch block defined in the `BEGIN_CPP11` mac The exception will cause any C++ destructors to run, freeing any resources held by C++ objects. After the try/catch block exits, the R error unwinding is then continued by `R_ContinueUnwind()` and a normal R error results. -We require R >=3.5 to use cpp11, but when it was created we wanted to support back to R 3.3, but `R_ContinueUnwind()` wasn't available until R 3.5. +We require R \>= 3.5 to use cpp11. When cpp11 was created we wanted to support R versions back to 3.3, but `R_ContinueUnwind()` wasn't available until R 3.5. Below are a few other options we considered to support older R versions: -1. Using `R_TopLevelExec()` works to avoid the C long jump, but because the code is always run in a top level context any errors or messages thrown cannot be caught by `tryCatch()` or similar techniques. -2. Using `R_TryCatch()` is not available prior to R 3.4, and also has a serious bug in R 3.4 (fixed in R 3.5). -3. Calling the R level `tryCatch()` function which contains an expression that runs a C function which then runs the C++ code would be an option, but implementing this is convoluted and it would impact performance, perhaps severely. -4. Have `cpp11::unwind_protect()` be a no-op for these versions. This means any resources held by C++ objects would leak, including `cpp11::r_vector` / `cpp11::sexp` objects. +1. Using `R_TopLevelExec()` works to avoid the C long jump, but because the code is always run in a top-level context any errors or messages thrown cannot be caught by `tryCatch()` or similar techniques. +2. `R_TryCatch()` is not available prior to R 3.4, and also has a serious bug in R 3.4 (fixed in R 3.5). +3.
Calling the R level `tryCatch()` function which contains an expression that runs a C function which then runs the C++ code would be an option, but implementing this is convoluted and it would impact performance, perhaps severely. +4. Have `cpp11::unwind_protect()` be a no-op for these versions. This means any resources held by C++ objects would leak, including `cpp11::r_vector` / `cpp11::sexp` objects. None of these options were perfect, here are some pros and cons for each. -1. Causes behavior changes and test failures, so it was ruled out. -2. Was also ruled out since we wanted to support back to R 3.3. -3. Was ruled out partially because the implementation would be somewhat tricky and more because performance would suffer greatly. -4. Is what we ended up doing before requiring R 3.5. It leaked protected objects when there were R API errors. +1. Causes behavior changes and test failures, so it was ruled out. +2. Was also ruled out since we wanted to support back to R 3.3. +3. Was ruled out partially because the implementation would be somewhat tricky and more because performance would suffer greatly. +4. Is what we ended up doing before requiring R 3.5. It leaked protected objects when there were R API errors. diff --git a/vignettes/motivations.Rmd b/vignettes/motivations.Rmd index a16297a8..67391d61 100644 --- a/vignettes/motivations.Rmd +++ b/vignettes/motivations.Rmd @@ -5,6 +5,9 @@ vignette: > %\VignetteIndexEntry{Motivations for cpp11} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} +editor: + markdown: + wrap: sentence --- ```{r, include = FALSE} @@ -28,7 +31,7 @@ should_run_benchmarks <- function(x) { # Motivations R and S have a long history of interacting with compiled languages. -In fact the original version of S written in the late 1970s was mainly a wrapper around FORTRAN routines. 
[(History-of-S)](https://www.r-project.org/conferences/useR-2006/Slides/Chambers.pdf) +In fact the original version of S written in the late 1970s was mainly a wrapper around FORTRAN routines [(History-of-S)](https://www.r-project.org/conferences/useR-2006/Slides/Chambers.pdf). Released in 2000, the [cxx](https://cran.r-project.org/package=cxx) package was an early prototype of C++ bindings to R. [Rcpp](https://cran.r-project.org/package=Rcpp) was first published to CRAN in 2008, and [Rcpp11](https://cran.r-project.org/package=Rcpp11) in 2014. Of these `Rcpp` has by far the widest adoption, with over 2000 reverse dependencies as of 2020. @@ -40,26 +43,28 @@ cpp11 is a ground up rewrite of C++ bindings to R with different design trade-of Changes that motivated cpp11 include: -- Enforcing [copy-on-write semantics](#copy-on-write-semantics). -- Improving the [safety](#improve-safety) of using the R API from C++ code. -- Supporting [ALTREP objects](#altrep-support). -- Using [UTF-8 strings](#utf-8-everywhere) everywhere. -- Applying newer [C++11 features](#c11-features). -- Having a more straightforward, [simpler implementation](#simpler-implementation). -- Faster [compilation time](#compilation-speed) with lower memory requirements. -- Being *completely* [header only](#header-only) to avoid ABI issues. -- Capable of [vendoring](#vendoring) if desired. -- More robust [protection](#protection) using a much more efficient linked list data structure. -- [Growing vectors](#growing-vectors) more efficiently. +- Enforcing [copy-on-write semantics](#copy-on-write-semantics). +- Improving the [safety](#improve-safety) of using the R API from C++ code. +- Supporting [ALTREP objects](#altrep-support). +- Using [UTF-8 strings](#utf-8-everywhere) everywhere. +- Applying newer [C++11 features](#c11-features). +- Having a more straightforward, [simpler implementation](#simpler-implementation). +- Faster [compilation time](#compilation-speed) with lower memory requirements. 
+- Being *completely* [header only](#header-only) to avoid ABI issues. +- Capable of [vendoring](#vendoring) if desired. +- More robust [protection](#protection) using a much more efficient linked list data structure. +- [Growing vectors](#growing-vectors) more efficiently. -## Copy-on-write semantics +## Copy-on-write semantics {#copy-on-write-semantics} R uses [copy-on-write](https://adv-r.hadley.nz/names-values.html#copy-on-modify) (also called copy-on-modify) semantics. Lets say you have two variables `x` and `y` that both point to the same underlying data. + ```{r} x <- c(1, 2, 3) y <- x ``` + If you modify `y`, R will first copy the values of `x` to a new position, then point `y` to the new location and only after the copy modify `y`. This allows `x` to retain the original values. @@ -71,18 +76,19 @@ x ``` C++ does not have copy-on-write built into the language, however it has related concepts, copy-by-value and copy-by-reference. -Copy-by-value works similarly to R, except that R only copies when something is changed, C++ _always_ copies. +Copy-by-value works similarly to R, except that R only copies when something is changed, C++ *always* copies. -```cpp +``` cpp int x = 42; int y = x; y = 0; // x is still == 42 ``` -Copy-by-reference does the opposite, both `x` and `y` always point to the *same* underlying value. In C++ you specify a reference with `&`. +Copy-by-reference does the opposite, both `x` and `y` always point to the *same* underlying value. +In C++ you specify a reference with `&`. -```cpp +``` cpp int x = 42; int &y = x; y = 0; @@ -109,7 +115,7 @@ NumericVector times_two_rcpp(NumericVector x) { } ``` -If you do this with regular R functions, you will see the value of `y` is `x` * 2, but the value of `x` is unchanged. +If you do this with regular R functions, you will see the value of `y` is `x` \* 2, but the value of `x` is unchanged. 
```{r} x <- c(1, 2, 3) @@ -154,7 +160,7 @@ z x ``` -## Improve safety +## Improve safety {#improve-safety} Internally R is written in C, not C++. In general C and C++ work well together, a large part of C++'s success is due to its high interoperability with C code. @@ -176,7 +182,8 @@ Crucially long jumps are *incompatible* with C++ [destructors](https://isocpp.or If a long jump occurs the destructors of any active C++ objects are not run, and therefore any resources (such as memory, file handles, etc.) managed by those objects will cause a [resource leak](https://en.wikipedia.org/wiki/Resource_leak). For example, the following unsafe code would leak the memory allocated in the C++ `std::vector` `x` when the R API function `Rf_allocVector()` fails (since you can't create a vector of `-1` size). -```cpp + +``` cpp std::vector x({1., 2., 3.}); SEXP y = PROTECT(Rf_allocVector(REALSXP, -1)); @@ -186,8 +193,7 @@ cpp11 provides two mechanisms to make interfacing with Rs C API and C++ code saf `cpp11::unwind_protect()` takes a functional object (a C++11 lamdba function or `std::function`) and converts any C long jumps encountered to C++ exceptions. Now instead of a C long jump happening when the `Rf_allocVector()` call fails, a C++ exception occurs, which *does* trigger the `std::vector` destructor, so that memory is automatically released. - -```cpp +``` cpp std::vector x({1., 2., 3.}); SEXP y; @@ -198,7 +204,7 @@ unwind_protect([]() { `cpp11::safe()` is a more concise way to wrap a particular R API function with `unwind_protect()`. -```cpp +``` cpp std::vector x({1., 2., 3.}); SEXP y = PROTECT(safe[Rf_allocVector](REALSXP, -1)); @@ -216,17 +222,17 @@ This is done without developer facing code changes. With both C and C++ sides of the coin covered we can safely use R's C API and C++ code together with C++ objects without leaking resources. 
-## Altrep support +## Altrep support {#altrep-support} [ALTREP](https://svn.r-project.org/R/branches/ALTREP/ALTREP.html) which stands for **ALT**ernative **REP**resntations is a feature introduced in R 3.5. ALTREP allows R internals and package authors to define alternative ways of representing data to R. One example of the use of altrep is the `:` operator. -Prior to R 3.5 `:` generated a full vector for the entire sequence. e.g. `1:1000` would require 1000 individual values. +Prior to R 3.5 `:` generated a full vector for the entire sequence. +e.g. `1:1000` would require 1000 individual values. As of R 3.5 this sequence is instead represented by an ALTREP vector, so *none* of the values actually exist in memory. Instead each time R access a particular value in the sequence that value is computed on-the-fly. -This saves memory and excution time, and allows users to use sequences which -would otherwise be too big to fit in memory. +This saves memory and execution time, and allows users to use sequences which would otherwise be too big to fit in memory. ```{r, R.options = list(max.print = 20)} 1:1e9 @@ -314,7 +320,7 @@ knitr::kable(readRDS("sum.Rds")) [cpp11test/src/sum.cpp](https://github.com/r-lib/cpp11/blob/main/cpp11test/src/sum.cpp) contains the code ran in these benchmarks. -## UTF-8 everywhere +## UTF-8 everywhere {#utf-8-everywhere} R has complicated support for Unicode strings and non-ASCII code pages, whose behavior often differs substantially on different operating systems, particularly Windows. Correctly dealing with this is challenging and often feels like whack a mole. @@ -328,19 +334,19 @@ Concretely cpp11 always uses `Rf_translateCharUTF8()` when obtaining `const char -## C++11 features +## C++11 features {#c11-features} C++11 provides a host of new features to the C++ language.
cpp11 uses a number of these including -- [move semantics](https://en.cppreference.com/w/cpp/language/move_constructor) -- [type traits](https://en.cppreference.com/w/cpp/header/type_traits) -- [initializer_list](https://en.cppreference.com/w/cpp/utility/initializer_list) -- [variadic templates / parameter packs](https://en.cppreference.com/w/cpp/language/parameter_pack) -- [user defined literals](https://en.cppreference.com/w/cpp/language/user_literal) -- [user defined attributes](https://en.cppreference.com/w/cpp/language/attributes) +- [move semantics](https://en.cppreference.com/w/cpp/language/move_constructor) +- [type traits](https://en.cppreference.com/w/cpp/header/type_traits) +- [initializer_list](https://en.cppreference.com/w/cpp/utility/initializer_list) +- [variadic templates / parameter packs](https://en.cppreference.com/w/cpp/language/parameter_pack) +- [user defined literals](https://en.cppreference.com/w/cpp/language/user_literal) +- [user defined attributes](https://en.cppreference.com/w/cpp/language/attributes) -## Simpler implementation +## Simpler implementation {#simpler-implementation} Rcpp is very ambitious, with a number of advanced features, including [modules](https://cran.r-project.org/package=Rcpp/vignettes/Rcpp-modules.pdf), [sugar](https://cran.r-project.org/package=Rcpp/vignettes/Rcpp-sugar.pdf) and extensive support for [attributes](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-attributes.pdf). While these are useful features, many R packages do not use one or any of these advanced features. @@ -369,14 +375,14 @@ git ls-files -- inst/include | while read f; do git blame -w --line-porcelain -- ``` This limited scope allows the implementation to be much simpler, the headers in Rcpp 1.0.4 have 74,658 lines of code (excluding blank or commented lines) in 379 files. -Some headers in Rcpp are automatically generated, removing these still gives you 25,249 lines of code in 357 files. 
+Some headers in Rcpp are automatically generated, removing these still gives you 25,249 lines of code in 357 files. In contrast the headers in cpp11 contain only 1,734 lines of code in 19 files. This reduction in complexity should make cpp11 an easier project to maintain and ensure correctness, particularly around interactions with the R garbage collector. -## Compilation speed +## Compilation speed {#compilation-speed} Rcpp always bundles all of its headers together, which causes slow compilation times and high peak memory usage when compiling. The headers in cpp11 are more easily decoupled, so you only can include only the particular headers you actually use in a source file. @@ -390,26 +396,22 @@ Here are some real examples of the reduction in compile time and peak memory usa gtime -f %M:%e R CMD INSTALL --libs-only --use-vanilla . ``` -| package | Rcpp compile time | cpp11 compile time | Rcpp peak memory | cpp11 peak memory | Rcpp commit | cpp11 commit | -| --- | --- | --- | --- | --- | --- | --- | -| haven | 17.42s | 7.13s | 428MB | 204MB | [a3cf75a4][haven] | [978cb034][haven] | -| readr | 124.13s | 81.08s | 969MB | 684MB | [ec0d8989][readr] | [aa89ff72][readr] | -| roxygen2 | 17.34s | 4.24s | 371MB | 109MB | [6f081b75][roxygen2] | [e8e1e22d][roxygen2] | -| tidyr | 14.25s | 3.34s | 363MB | 83MB | [3899ed51][tidyr] | [60f7c7d4][tidyr] | - -[haven]: https://github.com/tidyverse/haven/compare/a3cf75a4...978cb034 -[readr]: https://github.com/tidyverse/readr/compare/ec0d8989...aa89ff72 -[roxygen2]: https://github.com/r-lib/roxygen2/compare/6f081b75...e8e1e22d -[tidyr]: https://github.com/tidyverse/tidyr/compare/3899ed51...60f7c7d4 +| package | Rcpp compile time | cpp11 compile time | Rcpp peak memory | cpp11 peak memory | Rcpp commit | cpp11 commit | +|-----------|-----------|-----------|-----------|-----------|-----------|-----------| +| haven | 17.42s | 7.13s | 428MB | 204MB | [a3cf75a4](https://github.com/tidyverse/haven/compare/a3cf75a4...978cb034) | 
[978cb034](https://github.com/tidyverse/haven/compare/a3cf75a4...978cb034) | +| readr | 124.13s | 81.08s | 969MB | 684MB | [ec0d8989](https://github.com/tidyverse/readr/compare/ec0d8989...aa89ff72) | [aa89ff72](https://github.com/tidyverse/readr/compare/ec0d8989...aa89ff72) | +| roxygen2 | 17.34s | 4.24s | 371MB | 109MB | [6f081b75](https://github.com/r-lib/roxygen2/compare/6f081b75...e8e1e22d) | [e8e1e22d](https://github.com/r-lib/roxygen2/compare/6f081b75...e8e1e22d) | +| tidyr | 14.25s | 3.34s | 363MB | 83MB | [3899ed51](https://github.com/tidyverse/tidyr/compare/3899ed51...60f7c7d4) | [60f7c7d4](https://github.com/tidyverse/tidyr/compare/3899ed51...60f7c7d4) | -## Header only +## Header only {#header-only} Rcpp has long been a *mostly* [header only](https://en.wikipedia.org/wiki/Header-only) library, however is not a *completely* header only library. -There have been [cases](https://github.com/tidyverse/dplyr/issues/2308) when a package was first installed with version X of Rcpp, and then a newer version of Rcpp was later installed. Then when the original package X was loaded R would crash, because the [Application Binary Interface](https://en.wikipedia.org/wiki/Application_binary_interface) of Rcpp had changed between the two versions. +There have been [cases](https://github.com/tidyverse/dplyr/issues/2308) when a package was first installed with version X of Rcpp, and then a newer version of Rcpp was later installed. +Then when the original package X was loaded R would crash, because the [Application Binary Interface](https://en.wikipedia.org/wiki/Application_binary_interface) of Rcpp had changed between the two versions. Because cpp11 consists of exclusively headers this issue does not occur. -## Vendoring +## Vendoring {#vendoring} In the go community the concept of [vendoring](https://go.googlesource.com/proposal/+/master/design/25719-go15vendor.md) is widespread. Vendoring means that you copy the code for the dependencies into your project's source tree. 
@@ -421,14 +423,13 @@ Vendoring has advantages and drawbacks however. The advantage is that changes to the cpp11 project could never break your existing code. The drawbacks are both minor, your package size is now slightly larger, and major, you no longer get bugfixes and new features until you explicitly update cpp11. -I think the majority of packages should use `LinkingTo: cpp11` and _not_ vendor the cpp11 dependency. +I think the majority of packages should use `LinkingTo: cpp11` and *not* vendor the cpp11 dependency. However, vendoring can be appropriate for certain situations. -## Protection +## Protection {#protection} -cpp11 uses a custom double linked list data structure to track objects it is -managing. This structure is much more efficient for large numbers of objects -than using `R_PreserveObject()` / `R_ReleaseObjects()` as is done in Rcpp. +cpp11 uses a custom doubly linked list data structure to track objects it is managing. +This structure is much more efficient for large numbers of objects than using `R_PreserveObject()` / `R_ReleaseObject()` as is done in Rcpp. ```{r, message = FALSE, eval = should_run_benchmarks()} library(cpp11test) @@ -467,14 +468,14 @@ Whereas it is linear or worse with the number of objects being tracked for Rcpp. knitr::kable(b_release) ``` -## Growing vectors +## Growing vectors {#growing-vectors} One major difference in Rcpp and cpp11 is how vectors are grown. Rcpp vectors have a `push_back()` method, but unlike `std::vector()` no additional space is reserved when pushing. This makes calling `push_back()` repeatably very expensive, as the entire vector has to be copied each call. In contrast `cpp11` vectors grow efficiently, reserving extra space. -Because of this you can do ~10,000,000 vector appends with cpp11 in approximately the same amount of time that Rcpp does 10,000, as this benchmark demonstrates.
+Because of this you can do \~10,000,000 vector appends with cpp11 in approximately the same amount of time that Rcpp does 10,000, as this benchmark demonstrates. ```{r, message = FALSE, eval = should_run_benchmarks()} grid <- expand.grid(len = 10 ^ (0:7), pkg = "cpp11", stringsAsFactors = FALSE)