Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Private event loops #84

Merged
merged 44 commits into from
Jul 26, 2019
Merged

Private event loops #84

merged 44 commits into from
Jul 26, 2019

Conversation

wch
Copy link
Member

@wch wch commented Feb 18, 2019

Opening this draft PR for discussion purposes.

Thoughts:

It would be useful to be tell later that a particular private event loop will no longer be used, so that it can clear the callback queue, and perhaps error if that loop is run again. This will help prevent queued (but never called) callbacks from preventing GC of objects that they reference, as is the case is rstudio/websocket#42. I think this should be safe for callbacks that are R functions; I'm less sure that it's safe in general for C/C++ functions.

In most cases, you'd want to use it like this, where the lifetime of the private event loop is completely contained within the with_private_loop() call:

with_private_loop({
   later( function() { ... }, delay )

  while (something)
    run_now()
})

In this case, it makes sense to destroy the private loop when it exits.

In other cases, it's useful to save a reference to the private loop so that it can be used later. The pattern (using the current API in this PR) is like this:

my_loop <- NULL
with_private_loop({
  my_loop <<- current_loop()
})

# Do some stuff...

later( function() { ... }, delay , loop = my_loop)

# Do some other stuff...

while (something)
  run_now(loop = my_loop)

You wouldn't want the private loop to be destroyed in this case. I think it makes sense to add an option to with_private_loop() that allows the user to control that. Something like:

with_private_loop({ ... }, destroy_on_exit = FALSE)

But this brings up related issues: First, the API for this is a little weird: why call with_private_loop({ ... }) if I'm just going to save a reference to the loop and use it somewhere else? Why not just have a function like new_loop() which just returns the loop directly?

Second, the current API makes it easy to unknowingly call a function that runs the global loop. For example:

f <- function() {
  run_now()
}

my_loop <- NULL
with_private_loop({ my_loop <<- current_loop() })

# Do some stuff...

later( function() { ... }, delay , loop = my_loop)

f()  # Oops

while (something)
  run_now(loop = my_loop)

It would be a little safer to do something like:

my_loop <- new_loop()

# Do some stuff...

with_loop(my_loop, {
  later( function() { ... }, delay)

  f()  # OK

  while (something)
    run_now()
})

It might make sense to have two interfaces: one which creates a loop and automatically destroys it at the end:

with_temporary_loop({
  ...
})

And one which allows you to pass in a loop object and does not destroy it automatically:

my_loop <- new_loop()

with_loop(my_loop, {
  ...
})

The later and run_now functions would not need to have a loop parameter, and there would be no need for a current_loop() function.

Also, the object returned by new_loop() could be an environment with a finalizer which destroys the loop (and empties the queue).


Other thoughts:

  • Do we have to handle re-entrancy differently when there are multiple loops?
  • It would also be helpful to have functions to introspect what event loops still exist, and if they're still being run. In some cases, a private event loop will keep scheduling itself on the global loop and keep running in the background, and it would be nice to be able tell when that happens.
  • Promises created with one event loop may not properly when another event loop is being run. We'll want to document this. See this gist for an example.
  • A common pattern for private event loops is to create a promise chain, then call run_now until the promise is resolved. One problem is what to do about interrupts received during the run_now. The interrupt might happen (A) when one of the steps of the chain is running, or (B) when one of the steps is not running (like when it's waiting for a data to come in.) In case A, the current behavior is to simply stop, without In order to have a synchronous-like behavior in this situation, we need to have the interrupt break out of the chain (and cause the finally to execute, if present). See https://gist.github.com/wch/7e55d4b9a8f2d640c8d66ca166858b6d for examples.

@wch wch force-pushed the joe/feature/private-event-loops2 branch from bbf3f28 to 2b208f9 Compare March 14, 2019 21:43
wch added 6 commits March 15, 2019 09:54
A loop handle is an environment which, when GC’d, will delete the queue, which in turn allows the items in the queue to be GC’d. Without this, if a loop ID was lost, then its queue would exist forever, preventing resource cleanup.
Some compilers give errors when a subclass of boost::operators is an abstract class.
@wch wch force-pushed the joe/feature/private-event-loops2 branch from 55b553e to 5dac018 Compare March 15, 2019 14:55
@wch
Copy link
Member Author

wch commented Mar 15, 2019

This PR now has these new functions:

  • create_loop()
  • with_loop(loop, expr)
  • with_temp_loop(expr)
  • destroy_loop(loop)
  • current_loop()
  • global_loop()

The loop handles are environments with finalizers -- the finalizers call destroy_loop, so that if no one has any references to the handle, the callbacks in the event loop (and things they refer to) can be GC'd.

There's also one more un-exported function, list_queue(loop), which returns a list of queued
callbacks for the specified loop. This is useful for debugging. For example:

later(function() message("foo"), 2)
later(function() message("foo"), 1)
str(later:::list_queue())
#> List of 2
#>  $ :List of 3
#>   ..$ id      : num 1
#>   ..$ when    : num 0.999
#>   ..$ callback:function ()  
#>   .. ..- attr(*, "srcref")= 'srcref' int [1:8] 4 7 4 31 7 31 4 4
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7f9fffef05d8> 
#>  $ :List of 3
#>   ..$ id      : num 0
#>   ..$ when    : num 2
#>   ..$ callback:function ()  
#>   .. ..- attr(*, "srcref")= 'srcref' int [1:8] 3 7 3 31 7 31 3 3
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7f9fffef05d8>

If there are any C/C++ functions in the event loop, they will be shown as "C/C++ function":

library(Rcpp)

cppFunction(
  includes = '
    #include <later_api.h>
    void c_message(void* data) {
      fprintf(stderr, "bar\\n");
    }
  ',
  code = '
    void schedule_c_fun() {
      later::later(c_message, NULL, 1);
    }
  ',
  depends = "later"
)

library(later)
later(function() message("foo"), 2)
schedule_c_fun()
str(later:::list_queue())
#> List of 2
#>  $ :List of 3
#>   ..$ id      : num 1
#>   ..$ when    : num 0.998
#>   ..$ callback: chr "C/C++ function"
#>  $ :List of 3
#>   ..$ id      : num 0
#>   ..$ when    : num 2
#>   ..$ callback:function ()  
#>   .. ..- attr(*, "srcref")= 'srcref' int [1:8] 19 7 19 31 7 31 19 19
#>   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fe54d4e7f78>

@jcheng5 jcheng5 self-requested a review March 15, 2019 16:23
@jcheng5
Copy link
Member

jcheng5 commented Mar 19, 2019

I think the APIs in inst/include/later.h needs to be modified to take a loop ID as well

DESCRIPTION Outdated
@@ -1,7 +1,7 @@
Package: later
Type: Package
Title: Utilities for Delaying Function Execution
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought: might be worth modifying the title/description to say something about event loops?

NAMESPACE Show resolved Hide resolved

id <- .globals$next_id
.globals$next_id <- id + 1L
createCallbackRegistry(id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just occurred to me that loop IDs may be stored in objects that get serialized. Should we do something to make loop handles less... serializable? (Would be inherently problematic for them to be deserialized in a different process) Maybe like an extptr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump. Can we prevent these from being serialized? Or make the id a combination of the integer and Sys.getpid()?

R/later.R Outdated
loop$id <- id
lockBinding("id", loop)
# Automatically destroy the loop when the handle is GC'd
reg.finalizer(loop, destroy_loop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬

Copy link
Member Author

@wch wch Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some research, it seems like we should get rid of this and just warn instead.

https://docs.oracle.com/javase/9/docs/api/java/lang/Object.html#finalize--

And we had this difficult issue way back: rstudio/pool#40

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some other discussion, we decided to make the C++ function deleteCallbackRegistries simply be a no-op when a re-entrant call is made.

src/later.cpp Outdated
// Gets a callback registry by ID. If registry doesn't exist, it will be created.
boost::shared_ptr<CallbackRegistry> getCallbackRegistry(int loop) {
ASSERT_MAIN_THREAD()
// TODO: clean up
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what this is referring to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool idle();

bool execCallbacks(double timeoutSecs = 0, bool runAll = true, int loop = GLOBAL_LOOP);
bool idle(int loop);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should loop default to GLOBAL_LOOP here too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, maybe this header file isn't public...?


return List::create(
_["id"] = callbackNum,
_["when"] = when.diff_secs(Timestamp()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the Timestamp() here, as it makes comparison between different rRepresentation objects unreliable. Could this either be the absolute time, or at least have the subtrahend be passed as an argument to rRepresentation()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's kind of an ugly bit. Getting the absolute time is a bit tricky to do cross-platform and I figured that since the list_queue() function isn't exported and is only for debugging, I'd just do it the easy way.

src/later.cpp Outdated
@@ -155,5 +200,6 @@ double next_op_secs() {

extern "C" void execLaterNative(void (*func)(void*), void* data, double delaySecs) {
ensureInitialized();
doExecLater(func, data, delaySecs);
int loop = GLOBAL_LOOP;
doExecLater(getCallbackRegistry(GLOBAL_LOOP), func, data, delaySecs, loop == GLOBAL_LOOP);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't call getCallbackRegistry here, it's ASSERT_MAIN_THREAD. I think we're going to have to guard the callbackRegistries somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it'd be great if we had a unit test that identified this issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  • We need to protect accesses to the callbackRegistries to be thread safe.
  • We need to figure out what to do if a background thread tries to schedule a callback on the main thread, but with a loop that no longer exists.
  • Do we need lock hierarchies?

@wch wch force-pushed the joe/feature/private-event-loops2 branch from c416178 to f974f10 Compare June 20, 2019 16:25
wch added 2 commits June 20, 2019 11:27
This is needed for uint64_t on some compilers.
@wch wch force-pushed the joe/feature/private-event-loops2 branch from f8db0d5 to 0c1501a Compare June 20, 2019 17:07
@wch wch marked this pull request as ready for review July 23, 2019 17:49
R/later.R Outdated Show resolved Hide resolved
R/later.R Show resolved Hide resolved
TODO.md Outdated Show resolved Hide resolved
Copy link
Member

@jcheng5 jcheng5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the nits I mentioned, then I think it's ready.

wch and others added 2 commits July 26, 2019 12:27
Co-Authored-By: Joe Cheng <joe@rstudio.com>
@jcheng5 jcheng5 self-requested a review July 26, 2019 17:45
@wch wch merged commit e379210 into master Jul 26, 2019
@wch wch deleted the joe/feature/private-event-loops2 branch July 31, 2019 17:54
@wch wch restored the joe/feature/private-event-loops2 branch July 31, 2019 17:54
@wch wch deleted the joe/feature/private-event-loops2 branch August 2, 2019 17:30
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Dec 31, 2019
## later 1.0.0

* Added private event loops: these are event loops that can be run
  independently from the global event loop. These are useful when you
  have code that schedules callbacks with `later()`, and you want to
  call `run_now()` block and wait for those callbacks to execute
  before continuing. Without private event loops, if you call
  `run_now()` to wait until a particular callback has finished, you
  might inadvertantly run other callbacks that were scheduled by other
  code. With private event loops, you can create a private loop,
  schedule a callback on it, then call `run_now()` on that loop until
  it executes, all without interfering with the global
  loop. ([#84](r-lib/later#84))
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jan 14, 2020
## later 1.0.0

* Added private event loops: these are event loops that can be run
  independently from the global event loop. These are useful when you
  have code that schedules callbacks with `later()`, and you want to
  call `run_now()` block and wait for those callbacks to execute
  before continuing. Without private event loops, if you call
  `run_now()` to wait until a particular callback has finished, you
  might inadvertantly run other callbacks that were scheduled by other
  code. With private event loops, you can create a private loop,
  schedule a callback on it, then call `run_now()` on that loop until
  it executes, all without interfering with the global
  loop. ([#84](r-lib/later#84))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants