cppally is a high-performance header-only library providing a rich C++20 API for advanced R data manipulation. Leveraging C++20 Concepts, custom R-based classes, templated functions and Single-Instruction-Multiple-Data (SIMD) vectorisation, cppally enables type-safety, performance, flexible templates and readable code.
For information on using cppally, see Getting started with cppally.
I first want to thank the authors and contributors of the fantastic cpp11 R package, without which I would not have been inspired to write this package. I’d also like to thank the authors and contributors of Rcpp for developing this ecosystem that has laid much of the groundwork for C++ and R integration.
Install the CRAN release

install.packages("cppally")

or the development version
pak::pak("NicChr/cppally")

template <RMathType T>
[[cppally::register]]
r_dbl cpp_sum(r_vector<T> x){
  r_size_t n = x.length();
  r_dbl out(0);
  for (r_size_t i = 0; i < n; ++i){
    out += x.get(i);
  }
  return out;
}
Register the C++ function to R
cpp_source(code = '
  #include <cppally.hpp>
  using namespace cppally;
  template <RMathType T>
  [[cppally::register]]
  r_dbl cpp_sum(r_vector<T> x){
    r_size_t n = x.length();
    r_dbl out(0);
    for (r_size_t i = 0; i < n; ++i){
      out += x.get(i);
    }
    return out;
  }
')

cpp_sum(1:5)
#> [1] 15

NA values are handled like in R. In this case NA is returned if the vector contains one or more NA values.
cpp_sum(c(1, NA, 3))
#> [1] NA

cppally makes heavy use of templates for powerful generic programming. While this offers a flexible framework for writing generic functions, it comes at the cost of slower compile times and larger binary sizes.
Users can write their own templates and optionally register them to R. There are two main limitations to be aware of. The first is that templates must be written in header files if they are to be used across multiple compilation units. The second is that template specialisations cannot be called from R, so when calling C++ template functions from R, we always rely on automatic deduction from the function inputs. A workaround is discussed in the main vignette, Getting started with cppally.
cppally offers R-based C++ scalar types that are NA-aware. To achieve
this, multiple methods such as binary arithmetic operators have been
written to ensure NA is propagated correctly. While every attempt has
been made to make this as fast as possible, it adds some overhead and in
some cases can prevent effective vectorisation (e.g. via SIMD
instructions). If you find that this is slowing things down too much, you
can work with the underlying C/C++ types using unwrap_t<> and
unwrap().
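The trade-off described above can be sketched in plain C++. This is not cppally's implementation and it does not use unwrap(); na_aware_sum, raw_sum and kNaInt are hypothetical names. The one fact it relies on is that R represents NA_integer_ as INT_MIN.

```cpp
#include <cassert>
#include <climits>
#include <optional>
#include <vector>

// R stores NA_integer_ as INT_MIN; kNaInt is a hypothetical name for it.
constexpr int kNaInt = INT_MIN;

// NA-aware sum: every element is checked, which adds a branch per iteration
// and can get in the way of vectorisation. std::nullopt plays the role of NA.
std::optional<long long> na_aware_sum(const std::vector<int>& x) {
    long long out = 0;
    for (int v : x) {
        if (v == kNaInt) return std::nullopt;
        out += v;
    }
    return out;
}

// Raw sum over the underlying C type: no checks, so the loop is easy to
// vectorise, but an NA is silently treated as the number INT_MIN.
long long raw_sum(const std::vector<int>& x) {
    long long out = 0;
    for (int v : x) out += v;
    return out;
}
```

The raw version is only appropriate when the data are known to contain no NA values, or when NA has been handled separately beforehand.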
Like the excellent cpp11 package, cppally also handles automatic protection for R objects. For more info, see Automatic Protection.
For performance reasons, ALTREP materialisation is eager by default,
which means that ALTREP vectors are materialised on construction. To
preserve ALTREP compact representations, one can enable the package-wide
‘CPPALLY_PRESERVE_ALTREP’ flag. This can be done through
cppally::use_preserve_altrep_flag() or
cppally::cpp_source(..., preserve_altrep = TRUE). You can also
manually add the ‘-DCPPALLY_PRESERVE_ALTREP’ flag to Makevars.
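As a rough illustration of the trade-off between compact and materialised representations, consider the plain C++ sketch below. CompactSeq and its members are hypothetical; they are not part of cppally or of R's ALTREP API. R's compact integer sequences (such as 1:n) are the motivating case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compact sequence: first, first + 1, ... stored in O(1) memory.
struct CompactSeq {
    int first;
    std::size_t length;

    // Element access is computed on demand rather than read from memory.
    int get(std::size_t i) const {
        assert(i < length);
        return first + static_cast<int>(i);
    }

    // Materialisation allocates and fills a full vector once.
    std::vector<int> materialise() const {
        std::vector<int> out(length);
        for (std::size_t i = 0; i < length; ++i) out[i] = get(i);
        return out;
    }
};
```

Once materialised, elements are read straight from contiguous memory, which suits tight loops; the compact form saves memory instead.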
Using the R C API alongside cppally is strongly discouraged for the following reasons.
If R throws an error via Rf_error(), a ‘longjmp’ will occur, meaning
C++ destructors won’t run and memory that should have been released will
not be released.
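For contrast, the sketch below shows the behaviour that a ‘longjmp’ bypasses: when an error travels as a C++ exception, the stack is unwound and destructors run. Resource and both functions are hypothetical. The example deliberately does not call longjmp, because jumping over objects with non-trivial destructors is undefined behaviour in C++.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource that records its release in a counter.
struct Resource {
    int& released;
    explicit Resource(int& counter) : released(counter) {}
    ~Resource() { ++released; }
};

// The error is raised as a C++ exception, so `guard` is destroyed on the way out.
void fail_with_exception(int& released) {
    Resource guard(released);
    throw std::runtime_error("error raised as a C++ exception");
}

// Returns how many resources were released after the error was caught.
int count_releases_after_error() {
    int released = 0;
    try {
        fail_with_exception(released);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run ~Resource() by this point.
    }
    return released;
}
```

Here count_releases_after_error() returns 1. Had control left fail_with_exception() via longjmp instead, the destructor would not be guaranteed to run.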
Furthermore, due to the way cppally caches vector names, using R C API
functions like Rf_setAttrib() will set the vector’s names without
informing cppally, leading to synchronisation issues. cppally needs to
keep the names cache in sync with the R names attribute and the only way
it can do that is by detecting changes to the names via
cppally::r_vector::set_names() or cppally::attr::set_attr() (or
cppally::set_old_names()).
Attribute manipulation is possible, and helpers can be found in the
attr namespace (cppally::attr::).
To avoid the overhead associated with automatic protection entirely, one
can use view types such as r_str_view, a non-owning class for R
strings. For more info on views, see Automatic Protection.
Copy-on-modify can be enabled via cppally::use_copy_on_modify() or by
setting the CPPALLY_COPY_ON_MODIFY Makevars flag directly. When this is
enabled, every in-place modification first checks whether the object being
modified is referenced or owned by another object. If it is, a copy is
taken and the copy is modified; otherwise the object is modified directly.
This safety check is inherently single-threaded, which effectively disables almost all parallelisation. Enable it if preventing accidental modification is a high priority; leave it disabled when performance matters more.
By default, copy-on-modify is disabled and hence all element setting is
done in-place via r_vector::set(). It is up to the user to ensure that
a fresh vector is created before further manipulation or that it’s safe
to modify the existing vector.
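The difference between the two modes can be sketched in plain C++, using a std::shared_ptr reference count as a stand-in for R's reference tracking. Buffer, set_copy_on_modify and set_in_place are hypothetical names, and cppally's actual check may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical shared buffer; the reference count stands in for R's tracking.
using Buffer = std::shared_ptr<std::vector<int>>;

// Copy-on-modify: detach from other owners before writing.
void set_copy_on_modify(Buffer& handle, std::size_t i, int value) {
    assert(handle != nullptr);
    if (handle.use_count() > 1) {
        // Someone else references the data, so write to a private copy.
        handle = std::make_shared<std::vector<int>>(*handle);
    }
    handle->at(i) = value;
}

// In-place: write directly; every other owner sees the change.
void set_in_place(Buffer& handle, std::size_t i, int value) {
    assert(handle != nullptr);
    handle->at(i) = value;
}
```

Note that use_count() is only approximate when other threads may copy the handle concurrently, which mirrors why a check of this kind is effectively single-threaded.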
Any coercion that results in complete information loss is an error (partial is allowed, e.g. double -> int).
For example, string -> int may not be possible without complete information loss:

as<r_int>(r_str("a"))
#> Error:
#> ! Implicit NA coercion detected from r_str to r_int, please ensure data can be coerced without complete loss of information
This is in contrast to R which returns an NA with a warning
as.integer("a")
#> Warning: NAs introduced by coercion
#> [1] NA

The benefit of cppally’s approach is that when registering C++ functions to R, inputs can be supplied flexibly without unexpected behaviour.
Let’s say you have a function foo that expects an r_int but you give
it an r_dbl without realising - this will implicitly coerce to r_int
without throwing an error.
[[cppally::register]]
r_int foo(r_int x){
  return x;
}

foo(1.2345)
#> [1] 1

The double 1.2345 was implicitly converted to 1, an example of
partial lossy coercion. cppally allows this.
What cppally doesn’t allow is total lossy coercion, which can result in ambiguity. Take the following example of counting the occurrence of a value in a vector.
[[cppally::register]]
int count_val(r_vector<r_dbl> x, r_dbl value){
  return x.count(value);
}

x <- c(rep(0, 20), rep(1, 30), rep(NA, 40))
count_val(x, 0)
#> [1] 20
count_val(x, 1)
#> [1] 30
count_val(x, NA)
#> [1] 40
# value can also implicitly coerce a string to a double
count_val(x, "0")
#> [1] 20
count_val(x, "1")
#> [1] 30

So far so good. But what happens if we implicitly coerced to NA and then counted occurrences?
count_val(x, as.double("1. 0")) # Wrong
#> Warning in count_val(x, as.double("1. 0")): NAs introduced by coercion
#> [1] 40

as.double("1. 0") was coerced to NA, so count_val() counted the
number of NA values and returned 40. We didn’t ask for a count of NA,
so the result should have been 0.
This is the main issue with allowing total lossy coercion to NA -
count_val() can’t distinguish between a true NA and an NA that has
been produced from a lossy coercion.
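The distinction can be sketched in plain C++, with std::nullopt playing the role of NA. parse_double and strict_as_double are hypothetical names and this is not cppally's coercion code; it only shows one way to keep a genuine NA separate from a failed conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <stdexcept>
#include <string>

// Lossy parse: returns "NA" (std::nullopt) whenever the text is not a
// complete number, so a failed conversion looks like a genuine NA.
std::optional<double> parse_double(const std::string& s) {
    try {
        std::size_t pos = 0;
        double value = std::stod(s, &pos);
        if (pos != s.size()) return std::nullopt;  // trailing junk, e.g. "1. 0"
        return value;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range
    }
}

// Strict coercion: a genuine NA stays NA, but a value that would *become*
// NA through conversion is reported as an error instead.
std::optional<double> strict_as_double(const std::optional<std::string>& s) {
    if (!s) return std::nullopt;
    std::optional<double> out = parse_double(*s);
    if (!out) {
        throw std::invalid_argument("coercion would lose all information");
    }
    return out;
}
```

With the strict version, a caller that receives std::nullopt knows the input really was NA, so a count of NA values cannot be triggered by a malformed string.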
With cppally this ambiguity is impossible.
count_val(x, "1. 0")
#> Error:
#> ! Implicit NA coercion detected from r_str to r_dbl, please ensure data can be coerced without complete loss of information

All indexing is 0-based including subsetting vectors.
On the C++ side, 64-bit integers are fully supported, including vectors. To return 64-bit integers to R we need the bit64 package to be loaded. cppally delegates the handling of 64-bit integer vectors to bit64 by marking them with the “integer64” class.
library(bit64)

[[cppally::register]]
r_int64 as_int64(r_int x){
  return as<r_int64>(x);
}

as_int64(.Machine$integer.max) + 1L
#> integer64
#> [1] 2147483648

Please note that other signed 64-bit integer types like int64_t,
R_xlen_t and cppally’s r_size_t will convert to 64-bit integer
vectors when returned to R.

as<r_size_t>(r_int(0))
#> integer64
#> [1] 0
as<R_xlen_t>(r_int(0))
#> integer64
#> [1] 0
as<int64_t>(r_int(0))
#> integer64
#> [1] 0
The cppally version of R’s R_NilValue is r_null, which is of type
r_sexp. To avoid additional meta-programming tactics for dealing with
r_null, vectors are allowed to hold r_null, which makes programming
with R attributes easier. This means r_vector<T> objects can be
r_null. To detect this, use the is_null() member function.
r_vector<r_int>(r_null)
#> NULL
r_vector<r_int>(r_null).is_null()
#> [1] TRUE
Because cppally is a template-heavy library, binary sizes can sometimes get large. This is primarily an issue on Windows, where compilation fails if a single .o file gets too big. In this case, consider adding the following flag to Makevars.win:
PKG_CXXFLAGS = -Wa,-mbig-obj
To benefit from OMP SIMD vectorisation and parallelisation, it is recommended to add these flags to Makevars
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS)
And these flags to Makevars.win (including the Windows-specific binary size flag)
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) -Wa,-mbig-obj
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS)
At the moment C++20 is not fully supported via RStudio, so I would recommend using vscode with the C/C++ for Visual Studio Code extension. Positron may also be an option but since I haven’t used it, I can’t speak to its capabilities.
While I personally use vscode for C++ code and RStudio for R code and package development, you can also use vscode (or Positron) for both these things, but again, I haven’t personally used vscode for writing R code so I can’t say much about it.
To get vscode’s IntelliSense to work correctly, you will likely need to set some parameters in c_cpp_properties.json.
My json file looks like this:
{
  "configurations": [
    {
      "name": "Win32",
      "includePath": [
        "${workspaceFolder}/**",
        "${workspaceFolder}/src",
        "${workspaceFolder}/inst/include",
        "C:/Program Files/R/R-4.*/include",
        "${env:LOCALAPPDATA}/R/win-library/4.*/cpp11/include",
        "${env:LOCALAPPDATA}/R/win-library/4.*/Rcpp/include",
        "${env:LOCALAPPDATA}/R/win-library/4.*/cppally/include"
      ],
      "defines": [
        "_DEBUG",
        "UNICODE",
        "_UNICODE",
        "STRICT_R_HEADERS"
      ],
      "compilerPath": "C:\\rtools45\\x86_64-w64-mingw32.static.posix\\bin\\g++.exe",
      "cppStandard": "gnu++20",
      "intelliSenseMode": "gcc-x64"
    }
  ],
  "version": 4
}

As your R installation path may differ, you can find the exact path with

normalizePath(Sys.getenv("R_HOME"), winslash = "/")

Your R libraries can be found with

.libPaths()

The compiler bundled with RTools is likely found here
cxx <- system2(file.path(R.home("bin"), "R"),
               c("CMD", "config", "CXX20"), stdout = TRUE)
cxx_bin <- trimws(strsplit(cxx, " ")[[1]][1])
Sys.which(cxx_bin)
#> g++
#> "C:\\rtools45\\X86_64~1.POS\\bin\\G__~1.EXE"

Once you have both paths, set compilerPath and the R include path in c_cpp_properties.json accordingly.