Create a background R session and keep running code in it #56

gaborcsardi · 2018-05-25T15:24:18Z

This is basically ready, but it does need more testing, especially for failure cases, e.g. when the remote R process crashes.

gaborcsardi · 2018-05-25T15:35:14Z

❯ bench::mark("callr::r" = h(), "callr::r_session" = f(), "callr::r_session2" = f2(), local = g(), check = FALSE)
# A tibble: 4 x 14
  expression           min     mean   median      max `itr/sec` mem_alloc  n_gc
  <chr>           <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
1 callr::r        165.47ms 170.33ms 170.09ms 175.44ms    5.87e0    45.9KB     0
2 callr::r_sessi…   5.42ms   5.93ms   5.79ms   7.25ms    1.69e2    13.8KB     0
3 callr::r_sessi…   3.21ms   3.55ms   3.41ms   5.67ms    2.81e2      704B     1
4 local              184ns 281.17ns    203ns   5.47µs    3.56e6        0B     0
# ... with 6 more variables: n_itr <int>, total_time <bch:tm>, result <list>,
#   memory <list>, time <list>, gc <list>

The first r_session is the current implementation. The second is a planned, better implementation that avoids temporary files on disk; its running time here is only an estimate. local is an ordinary R function call.

hadley

Is there a way for the process to know who its parent is?

hadley · 2018-05-25T15:39:27Z

R/r-session.R

+#' has started, and the elapsed time since the current computation has
+#' started. The latter is NA if there is no active computation.
+#'
+#' `rs$get_state()` return the state of the R session. Possible values:


Do we need a way to get the status code if the process has finished?

It inherits all methods from processx::process, so you have rs$get_exit_status().
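
A minimal sketch of that reply, assuming the callr package with `r_session` is installed (the example is skipped otherwise). `kill()`, `is_alive()` and `get_exit_status()` are the methods inherited from `processx::process`; the exact status value depends on the platform and on how the session ended, so none is shown here.

```r
# Sketch only: runs when callr (with r_session) is available.
if (requireNamespace("callr", quietly = TRUE)) {
  rs <- callr::r_session$new()     # start a background R session
  rs$run(function() 1 + 1)         # evaluate something in it
  rs$kill()                        # terminate the session
  alive <- rs$is_alive()           # FALSE once the process is gone
  status <- rs$get_exit_status()   # inherited from processx::process
  print(status)
}
```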

gaborcsardi · 2018-05-25T15:45:16Z

Is there a way for the process to know who its parent is?

At the system level yes, but I don't think that we have an R API for it. Would that be useful?
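
To illustrate "at the system level": a Unix-only sketch that asks the `ps` command-line tool for the parent PID of the current R process. `get_ppid()` is a hypothetical helper written for this example, not part of callr or processx, and it assumes a `ps` that supports `-o ppid= -p <pid>`.

```r
# Hypothetical helper: look up the parent PID via ps(1). Unix only.
get_ppid <- function(pid = Sys.getpid()) {
  out <- system2("ps", c("-o", "ppid=", "-p", pid), stdout = TRUE)
  as.integer(trimws(out))
}

if (.Platform$OS.type == "unix" && nzchar(Sys.which("ps"))) {
  cat("my pid:", Sys.getpid(), " parent pid:", get_ppid(), "\n")
}
```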

codecov-io · 2018-05-25T15:59:53Z

Codecov Report

Merging #56 into master will decrease coverage by 6.19%.
The diff coverage is 83.39%.

@@            Coverage Diff            @@
##           master      #56     +/-   ##
=========================================
- Coverage   97.17%   90.98%   -6.2%     
=========================================
  Files          15       16      +1     
  Lines         390      610    +220     
=========================================
+ Hits          379      555    +176     
- Misses         11       55     +44

Impacted Files	Coverage Δ
R/r-process.R	`95.83% <ø> (-4.17%)`	⬇️
R/setup.R	`96.15% <100%> (-1.38%)`	⬇️
R/utils.R	`90.62% <100%> (+3.12%)`	⬆️
R/result.R	`91.11% <100%> (-1.05%)`	⬇️
R/r-session.R	`80.43% <80.43%> (ø)`
R/script.R	`91.46% <88.88%> (-6.27%)`	⬇️
R/options.R	`100% <0%> (ø)`	⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9b092d9...5244bea.

gaborcsardi · 2018-05-25T16:18:06Z

One question here is what to do with standard output and standard error by default. They probably should not be ignored, so we can write them to a file or to a pipe. The pipe is easier to handle in the parent, but the parent needs to keep reading from it, otherwise the child blocks once the pipe buffer is full. And then it is easy to create a deadlock.

lionel- · 2018-05-25T16:53:13Z

Would it make sense to leave that up to the function passed to the process? e.g. it could be wrapped with purrr::quietly(). This would assume that only warnings and messages write to stderr. Would it be possible to reliably capture stderr without going through sink()?

jimhester · 2018-05-25T17:12:15Z

R/r-session.R

+      while (1) {
+        data <- processx::conn_write(con, data)
+        if (!length(data)) break;
+        Sys.sleep(.1)


I wonder if the sleep time here and on L274 should be a parameter of the object?

Could be, but this sleep is actually not very important. It is a safety net that is almost never needed. Basically it is only needed if the write buffer is full, but the write buffer is at least 4k or 8k, and we don't send messages longer than that, and only ever keep at most one message in the buffer. So it is not really important to make this user configurable, I think.

jimhester · 2018-05-25T17:16:03Z

R/r-session.R

+
+rs_finish <- function(self, private, grace) {
+  close(self$get_input_connection())
+  self$poll_io(grace)


Is 200 (ms?) enough time to ensure that everything closes gracefully?

200ms, yes. It should be, unless the R session has a .Last that takes longer to run. For this odd case, the grace period is user configurable.

jimhester · 2018-05-25T17:17:01Z

R/r-session.R

+rs_finish <- function(self, private, grace) {
+  close(self$get_input_connection())
+  self$poll_io(grace)
+  self$kill()


What signal does this send to the child process? SIGKILL, SIGTERM, SIGINT or SIGQUIT?

SIGKILL. On Windows there is nothing else, so it is better to behave the same way across platforms.

jimhester · 2018-05-25T17:21:09Z

R/r-session.R

+}
+
+rs__write_for_sure <- function(self, private, text) {
+  while (1) {


Maybe this could be

while(length(self$write_input(text)) > 0) { Sys.sleep(.1) }

Although perhaps you feel that obscures what function is doing the work.

You need to save the return value of write_input(), though. That's how write_input() works: it writes as much as it can, and returns the leftover that it could not write.

Oh I see, so you would need to do something like

while(length(text <- self$write_input(text)) > 0) { Sys.sleep(.1) }

Which isn't much of an improvement, so it is fine as is.
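
The write-and-keep-the-leftover pattern discussed in this thread can be sketched in base R. `make_writer()` is a hypothetical stand-in for `write_input()` that accepts a limited number of bytes per call; it is an assumption made so the loop can run without a real process.

```r
# Hypothetical stand-in for write_input(): accepts at most `capacity`
# bytes per call and returns the part it could not write.
make_writer <- function(capacity) {
  written <- raw()
  list(
    write = function(data) {
      n <- min(length(data), capacity)
      written <<- c(written, data[seq_len(n)])
      data[seq_len(length(data) - n) + n]   # leftover, possibly empty
    },
    contents = function() written
  )
}

# Keep writing until nothing is left, sleeping between attempts.
write_for_sure <- function(writer, data, pause = 0.1) {
  repeat {
    data <- writer$write(data)   # must keep the return value
    if (!length(data)) break
    Sys.sleep(pause)
  }
  invisible(NULL)
}

w <- make_writer(capacity = 4)
write_for_sure(w, charToRaw("hello world"), pause = 0)
rawToChar(w$contents())
```

The loop only works because each iteration replaces `data` with the leftover; discarding the return value would resend the whole message every time.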

gaborcsardi · 2018-05-25T17:59:59Z

Would it make sense to leave that up to the function passed to the process?

You need to select where the stdout and stderr go when you create the process.

I would avoid using sinks, because that's just another weak link, it makes the child logic more difficult, and I don't think it buys you much. Ultimately you want to be able to get the stdout and stderr in the parent, and either a file or a pipe is a better solution for that.

gaborcsardi · 2018-05-25T18:32:16Z

@lionel-

This would assume that only warnings and messages write to stderr

We already catch the errors and pass them to the parent, but we should probably catch the warnings as well.

gaborcsardi · 2018-05-25T21:21:41Z

So, after some thinking, I think by default we could write stdout and stderr to temp files. Then after a run or call + get_result we can just read the files from the right position to the end, to get the stdout and stderr of the evaluated function.

The only glitch with this solution is that the files will keep growing on disk until the session is done. But I don't think we can do much about that.
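
A base R sketch of "read the file from the right position to the end". `read_from()` is a hypothetical helper, and the `cat()` calls merely simulate a child process appending to its stdout file; how callr tracks the position internally is not shown in this thread.

```r
log_file <- tempfile()
file.create(log_file)

# Hypothetical helper: return the text appended to `path` after byte
# offset `from`, plus the new offset to remember for the next call.
read_from <- function(path, from) {
  size <- file.size(path)
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = from)
  bytes <- readBin(con, what = "raw", n = size - from)
  list(text = rawToChar(bytes), pos = size)
}

cat("output of call 1\n", file = log_file, append = TRUE)
first <- read_from(log_file, 0)

cat("output of call 2\n", file = log_file, append = TRUE)
second <- read_from(log_file, first$pos)   # only the new part
```

The file itself is never truncated, which is the growth issue mentioned above; only the saved offset moves forward.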

In the (distant?) future, when we have a proper event loop running on a background thread, we can write stdout and stderr to pipes, and the event loop thread will read them out regularly in the background, without bothering the main R thread.

lionel- · 2018-05-28T08:33:01Z

About capturing conditions, I've been thinking about a maybe object that you could deref() (either get the value or rethrow conditions) or purrr::modify() (change the result unless the error field is active, i.e. Haskell's fmap). maybe would be the return value of functions modified with purrr::safely(). I'm not sure how warnings and messages would fit with this yet, either a different object type or having optional maybe fields:

maybe <- function(expr, error = TRUE, warnings = FALSE, messages = FALSE) {
  ...
}

x <- maybe({ warning("foo"); stop("bar") }, warnings = TRUE)
x
#> <maybe>
#> $result
#> NULL
#>
#> $error
#> <simpleError in force(expr): bar>
#>
#> $warnings
#> $warnings[[1]]
#> <simpleWarning in force(expr): foo>

has_failed(x)
#> [1] TRUE

deref(x)
#> Error: bar
#> In addition: Warning message:
#> foo

identical(x, purrr::modify(x, toupper))
#> [1] TRUE

Maybe that could be the return value of process calls?
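
One way the proposal above could be prototyped in base R. This is a hypothetical sketch, not an existing purrr or callr API: it keeps only the `warnings` switch from the proposed signature, omits the `error` and `messages` arguments and the `purrr::modify()` method, and its printed output will not match the mock-up above exactly.

```r
# Hypothetical sketch of maybe(): capture the result or the error,
# and optionally collect warnings instead of emitting them.
maybe <- function(expr, warnings = FALSE) {
  caught <- list()
  result <- NULL
  error <- tryCatch(
    withCallingHandlers(
      {
        result <- expr   # forces the lazily evaluated argument
        NULL             # no error happened
      },
      warning = function(w) {
        if (warnings) {
          caught[[length(caught) + 1L]] <<- w
          invokeRestart("muffleWarning")
        }
      }
    ),
    error = function(e) e
  )
  structure(
    list(result = result, error = error, warnings = caught),
    class = "maybe"
  )
}

has_failed <- function(x) !is.null(x$error)

# Re-signal the stored warnings, then rethrow the error or return the value.
deref <- function(x) {
  for (w in x$warnings) warning(w)
  if (has_failed(x)) stop(x$error)
  x$result
}

x <- maybe({ warning("foo"); stop("bar") }, warnings = TRUE)
has_failed(x)
```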

gaborcsardi · 2018-05-28T18:59:46Z

@lionel- For this we need to change the API, so we would need to introduce new functions. Which is fine, but I am wondering if the higher level utils belong in some other package, and if we should keep callr close to the system calls. E.g. callr could just return the result / error, the warnings, the stdout and stderr.

gaborcsardi · 2018-05-31T10:43:56Z

OK, now we use separate temporary files for each $call() to get the stdout and stderr. It turns out that it is possible to re-route stdout and stderr on the fly, and processx now supports this.

gaborcsardi · 2018-06-04T08:24:47Z

I'll merge this now; we can continue to improve it afterwards.

gaborcsardi added 5 commits May 25, 2018 12:02

r_session, first working version on Unix

1612f1f

Need feature/pollable-connection branch from processx

5ccdf1f

r_session$wait -> wait_for_call, avoid process$wait clash

206f754

processx::process already has a wait method...

Test case for r_session$run()

c72475c

Docs for r_session

52eaf62

gaborcsardi requested review from hadley, jimhester and lionel- May 25, 2018 15:24

gaborcsardi changed the title ~~Keep a background R session and keep running code in it~~ Create a background R session and keep running code in it May 25, 2018

hadley reviewed May 25, 2018

Better cleanup in tests

f5a223f

jimhester reviewed May 25, 2018

gaborcsardi added 3 commits May 26, 2018 10:44

r_session tests for stdout and stderr

f60a493

Simplify r_session with processx poll connection

1fe02ca

Catch errors, interrupts as well

24cc242

This does not really matter for the one-shot functions (except that their exit code is now different), but it does matter for r_session.

gaborcsardi force-pushed the feature/session branch from 7851718 to 24cc242 Compare May 27, 2018 23:45

Need newer processx

12a4706

gaborcsardi added 2 commits May 31, 2018 11:57

r_session: use temporary files to get stdout/stderr

c0e6b7f

There is one file for each call.

Fix and test for $interrupt()

b2f1c31

gaborcsardi force-pushed the feature/session branch from 2554d31 to b2f1c31 Compare May 31, 2018 10:57

gaborcsardi added 10 commits June 2, 2018 09:47

Use processx master

9a73004

It has everything merged now.

Fix r_session interrupts

e3e8433

Session: make sure finish() kills process

550c533

Make session work with stdout/stderr pipes

3828227

Remove some dead code

b6ac746

Session: set process to idle on error

8fc630a

When we read out the result.

r_session$run() handles interrupts

0592cb3

r_session: print method

59ea0ca

r_session: better error messages

778055f

When session is not in the right state for an operation.

r_session: get_state() checks if alive

5244bea

In case it crashed, or was $kill()-ed.

gaborcsardi merged commit 0bc8610 into master Jun 4, 2018

gaborcsardi mentioned this pull request Jun 4, 2018

r_session #54

Closed

wlandau mentioned this pull request Mar 26, 2019

Optional heartbeating for the master process mschubert/clustermq#131

Closed

gaborcsardi deleted the feature/session branch June 21, 2019 09:53

wlandau mentioned this pull request Dec 10, 2023

Allow daemons to exit immediately when the connection terminates? shikokuchuo/mirai#87

Closed
