Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in scheduler causes test failures #40

Closed
wch opened this issue Apr 7, 2017 · 1 comment
Closed

Race condition in scheduler causes test failures #40

wch opened this issue Apr 7, 2017 · 1 comment

Comments

@wch
Copy link
Collaborator

@wch wch commented Apr 7, 2017

After much investigation and the help of some R-Core members, it turns out that the seemingly-
random errors we've had on R-devel were caused by a race condition in the pool package.

Specifically, the NaiveScheduler's executeTasks method could get invoked by a this finalizer when running the same executeTasks method. The error can be triggered reliably by inserting a gc() in the middle of executeTasks().

If a gc() is inserted right after line 74, I reliably get this output, on R 3.3.3 and R-devel:

> source("testthat.R")
1. Error: fetch: only validates after validationInterval (@test-fetch.R#35) ----
could not find function "task"
1: naiveScheduler$protect({
       obj <- poolCheckout(pool)
       t0 <- Sys.time()
       pool_metadata <- attr(obj, "pool_metadata", exact = TRUE)
       lastValidated_t0 <- pool_metadata$lastValidated
       poolReturn(obj)
       obj <- poolCheckout(pool)
       t1 <- Sys.time()
       pool_metadata <- attr(obj, "pool_metadata", exact = TRUE)
       lastValidated_t1 <- pool_metadata$lastValidated
       if ((t1 - t0) * 1000 < pool$validationInterval) {
           expect_identical(lastValidated_t0, lastValidated_t1)
       }
       checkCounts(pool, free = 0, taken = 1)
       poolReturn(obj)
       checkCounts(pool, free = 1, taken = 0)
       obj <- poolCheckout(pool)
       t2 <- Sys.time()
       pool_metadata <- attr(obj, "pool_metadata", exact = TRUE)
       lastValidated_t2 <- pool_metadata$lastValidated
       if ((t2 - t0) * 1000 < pool$validationInterval) {
           expect_identical(lastValidated_t0, lastValidated_t2)
       }
       checkCounts(pool, free = 0, taken = 1)
       poolReturn(obj)
       checkCounts(pool, free = 1, taken = 0)
       Sys.sleep((pool$validationInterval + 100)/1000)
       obj <- poolCheckout(pool)
       t3 <- Sys.time()
       pool_metadata <- attr(obj, "pool_metadata", exact = TRUE)
       lastValidated_t3 <- pool_metadata$lastValidated
       if ((t3 - t0) * 1000 > pool$validationInterval) {
           expect_false(identical(lastValidated_t0, lastValidated_t3))
       }
       checkCounts(pool, free = 0, taken = 1)
       poolReturn(obj)
       checkCounts(pool, free = 1, taken = 0)
       obj <- poolCheckout(pool)
       t4 <- Sys.time()
       pool_metadata <- attr(obj, "pool_metadata", exact = TRUE)
       lastValidated_t4 <- pool_metadata$lastValidated
       if ((t4 - t3) * 1000 < pool$validationInterval) {
           expect_identical(lastValidated_t3, lastValidated_t4)
       }
       checkCounts(pool, free = 0, taken = 1)
       poolReturn(obj)
       checkCounts(pool, free = 1, taken = 0)
   }) at testthat/test-fetch.R:35
2: private$refCount$release() at testthat/test-fetch.R:35
3: private$callback()
4: self$protect({
       task()
   })
5: force(expr)

2. Error: pool scheduler: schedules things in the right order (@test-scheduling.R#13) 
could not find function "task"
1: naiveScheduler$protect({
       scheduleTask(1000, function() {
           results <<- c(results, 3L)
       })
       scheduleTask(100, function() {
           results <<- c(results, 2L)
       })
       scheduleTask(10, function() {
           results <<- c(results, 1L)
       })
   }) at testthat/test-scheduling.R:13
2: private$refCount$release() at testthat/test-scheduling.R:13
3: private$callback()
4: self$protect({
       task()
   })
5: force(expr)

testthat results ================================================================
OK: 204 SKIPPED: 0 FAILED: 2
1. Error: fetch: only validates after validationInterval (@test-fetch.R#35) 
2. Error: pool scheduler: schedules things in the right order (@test-scheduling.R#13) 

The problem is that there's a window of time between when the counter used by protect is decremented and when it enters another protect block. If the finalizer fires in that window (as we did explicitly with the gc() above), then bad things can happen.

This was referenced Jun 21, 2017
@bborgesr
Copy link
Contributor

@bborgesr bborgesr commented Jul 3, 2017

Closed by #44

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.