Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using C++ interface for later::later() can trigger R's GC on the wrong thread #39

Closed
wch opened this issue Jan 24, 2018 · 5 comments · Fixed by #40
Closed

Using C++ interface for later::later() can trigger R's GC on the wrong thread #39

wch opened this issue Jan 24, 2018 · 5 comments · Fixed by #40

Comments

@wch
Copy link
Member

wch commented Jan 24, 2018

This is a bug that @jcheng5 and I have been tracking down. Here's an example that will crash R most of the time on Mac and Linux :

# Requires background-thread branch of httpuv
# devtools::install_github("rstudio/httpuv@background-thread")

library(promises)
promise(~later::later(~resolve(NULL), 5)) %...>% { message("Done") }

app <- list(
  call = function(req) {
    list(
      status = 200L,
      headers = list(
        "Content-Type" = "text/plain"
      ),
      body = "Hello"
    )
  }
)

httpuv::startServer("0.0.0.0", 8000, app)
# Must open a web browser to http://127.0.0.1:8000 and reload before "Done" is printed

This will crash with weird errors like

Error in tryCatch(evalq((function (hash = TRUE, parent = parent.frame(),  : 
  object 'classes' not found

and

Error in !missing(finally) : 179 arguments passed to '!' which requires 1

Using the special builds of R, RDsan and RDassertthread from https://github.com/wch/r-debug, I found that the cause of these weird errors is an R GC event occurring on the wrong thread. The GC is triggered when the background thread (as opposed to the main R thread) calls the C++ function later::later(), and schedules a C function to be run. This can cause the std::priority_queue<Callback> in later::CallbackRegistry to reorganize itself, which can cause the Rcpp::Function to be copied, which can trigger a GC on the wrong thread.

This can be see in the stack trace here:
https://gist.github.com/wch/1705437725df17e16e412946143c0d88#file-gistfile2-txt-L21-L60

With RDassertthread, it crashes every time, with:

R: memory.c:3325: R_PreserveObject: Assertion `pthread_equal(pthread_self(), __R_thread_id__)' failed.
Aborted

Using RDassertthread -d gdb provides a stack trace.

@wch
Copy link
Member Author

wch commented Jan 24, 2018

Now that I've looked at it some more, I think that maybe the root of the problem is that the Rcpp::Function's reference counter is not thread-safe -- unlike shared_ptr, where the reference counter is thread-safe. This can lead an incorrect reference count if two threads are creating or deleting Function objects that point to the same underlying R object, and that can trigger a GC at the wrong time.

The implication of this is that it is unsafe to store Rcpp objects anywhere that another thread may copy or destroy them. This includes putting them in a vector that another thread calls push_back() on, because growing vector would involve copying the Rcpp object and manipulating its reference counter on the wrong thread.


Here's a test script I wrote, trying to reproduce part the original issue: triggering a GC by growing a vector of Functions. However, it never triggers a GC when growing the vector, and I believe that it is because it all happens on the main R thread.

Rcpp::cppFunction("
  void copy_fun(Function f, int n = 1000) {
    std::vector<Function> fv;
    Function ident(\"identity\");
    Rcerr << \"Pushing...\";
    fv.push_back(f);
    for (int i=1; i<n; i++) {
      fv.push_back(ident);
    }
    Rcerr << \"done.  \";
  }
")


e <- new.env()
reg.finalizer(e, function(e) cat("Finalizer!\n"))
rm(e)

copy_fun(identity, 1e7)

@wch
Copy link
Member Author

wch commented Jan 24, 2018

Here's an example that has a background thread that manipulates a std::vector that contains a Rcpp::Function object, and causes crashes. The vector manipulations don't directly touch the Rcpp::Function object -- they simply grow the vector with other objects -- yet the refcount of the Function is altered on the background thread.

This crashes reliably for me, and when I run it with RDassertthread, it exits with:

R: memory.c:3325: R_PreserveObject: Assertion `pthread_equal(pthread_self(), __R_thread_id__)' failed.
Aborted
Rcpp::cppFunction(
  depends = "BH",
  includes = "
    #include <pthread.h>
    #include <boost/function.hpp>

    // Rcpp::Function objects can be used as Tasks
    typedef boost::function<void(void)> Task;

    void dummy_function() {}
    Task dummy_task = &dummy_function;

    // Function to be executed on second thread
    void* run_bg_thread(void *data) {
      std::vector<Task>* vec = (std::vector<Task>*)data;

      // Grow the vector by pushing dummy_task onto it repeatedly.
      // When the vector grows, it will copy the Rcpp::Function that was added
      // as the first element of vec, and its ref count will be changed.
      for (int i=0; i<1e6; i++) {
        vec->push_back(dummy_task);
      }

      return NULL;
    }
  ",
  "
  void go(Function f) {
    // Create a vector containing f. We'll give it to the other thread.
    std::vector<Task>* vec = new std::vector<Task>;
    vec->push_back(f);

    // Start the thread.
    pthread_t thread;
    pthread_create(&thread, NULL, *run_bg_thread, (void*)vec);

    // Start doing other stuff with f, so that refcount is manipulated.
    for (int i=0; i<1e6; i++) {
      Function tmp = f;
    }

    // wait for thread to finish
    pthread_join(thread,NULL);
  }
  "
)

fun <- function() {}
go(fun)

@wch
Copy link
Member Author

wch commented Jan 24, 2018

@jimhester has educated me on how Rcpp objects work. They are not smart pointers to SEXPs, as I had assumed; instead, each Rcpp object simply wraps an SEXP. That means that when they copied or destroyed, it uses R's reference counting mechanism, which definitely isn't thread-safe.

https://github.com/RcppCore/Rcpp/blob/c3b26f6/inst/include/Rcpp/storage/PreserveStorage.h#L36-L41

@jcheng5
Copy link
Member

jcheng5 commented Jan 24, 2018

Does that mean they can get arbitrarily large and expensive to copy? Should later be working harder to minimize the number of copies of Rcpp::Function?

@jcheng5
Copy link
Member

jcheng5 commented Jan 24, 2018

(Mostly meaning passing them by const reference)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants