Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with forked (multicore) code in ark #3817

Open
joshuafayallen opened this issue Jul 2, 2024 · 11 comments
Open

Problem with forked (multicore) code in ark #3817

joshuafayallen opened this issue Jul 2, 2024 · 11 comments
Labels
bug Something isn't working lang: r support
Milestone

Comments

@joshuafayallen
Copy link

Positron Version:

Positron Version: 2024.06.1 (Universal) build 27
Code - OSS Version: 1.90.0
Commit: a893e5b
Date: 2024-06-26T02:08:06.673Z
Electron: 29.4.0
Chromium: 122.0.6261.156
Node.js: 20.9.0
V8: 12.2.281.27-electron.0
OS: Darwin arm64 23.5.0

Steps to reproduce the issue:

Hi all I am not sure if this is just an issue on my end. But I was doing some analysis and kept running into a peculiar issue. For whatever reason positron keeps giving me an error. Whereas Vscode running radian does not.

#devtools::install_github("susanathey/MCPanel")
#devtools::install_github("apoorvalal/ebal")
#devtools::install_github("apoorvalal/synthdid")

library(synthdid)


data(PENN)


check = panel_estimate(PENN, unit_id = 'country', time_id = 'year',
outcome = 'log_gdp', treatment = 'dem', mccores = 5, infmethod = 'bootstrap'

What did you expect to happen?

check
#>                   DID Synthetic Control (SC) Synthetic DID (SDID)
#> Est        0.17587873            -0.02096869         -0.008799165
#> Std. Error 0.07857483             0.05651538          0.005963059
#>            Matrix Completion
#> Est              -0.01729725
#> Std. Error        0.01561886

Created on 2024-07-02 with reprex v2.1.0

Were there any error messages in the output or Developer Tools console?

Error in bootstrap_sample():
! 'list' object cannot be coerced to type 'double'
Hide Traceback

  1. └─synthdid::panel_estimate(...)
  2. └─base::mapply(...)
  3. └─synthdid (local) `<fn>`(dots[[1L]][[1L]], dots[[2L]][[1L]])
    
  4.   ├─stats::vcov(e, method = infmethod, ncores = mccores, replications = reps)
    
  5.   └─synthdid:::vcov.synthdid_estimate(...)
    
  6.     └─synthdid:::bootstrap_se(object, replications, ncores)
    
  7.       ├─stats::sd(bootstrap_sample(estimate, replications, ncores))
    
  8.       │ └─stats::var(...)
    
  9.       │   └─base::is.data.frame(x)
    
  10.       └─synthdid:::bootstrap_sample(estimate, replications, ncores)
    
@joshuafayallen joshuafayallen added the bug Something isn't working label Jul 2, 2024
@joshuafayallen joshuafayallen changed the title Weird behavior when using mclapply Weird behavior when using mapply Jul 2, 2024
@joshuafayallen joshuafayallen changed the title Weird behavior when using mapply Weird behavior when using mcreplicate Jul 2, 2024
@juliasilge
Copy link
Contributor

I can reproduce this problem only in Positron (the code runs fine in RStudio). Here is the full traceback:

Error in `bootstrap_sample()`:
! 'list' object cannot be coerced to type 'double'
Hide Traceback
     ▆
  1. └─synthdid::panel_estimate(...)
  2.   └─base::mapply(...)
  3.     └─synthdid (local) `<fn>`(dots[[1L]][[1L]], dots[[2L]][[1L]])
  4.       ├─stats::vcov(e, method = infmethod, ncores = mccores, replications = reps)
  5.       └─synthdid:::vcov.synthdid_estimate(...)
  6.         └─synthdid:::bootstrap_se(object, replications, ncores)
  7.           ├─stats::sd(bootstrap_sample(estimate, replications, ncores))
  8.           │ └─stats::var(...)
  9.           │   └─base::is.data.frame(x)
 10.           └─synthdid:::bootstrap_sample(estimate, replications, ncores)

@juliasilge juliasilge changed the title Weird behavior when using mcreplicate "'list' object cannot be coerced to type 'double'" running R code Jul 3, 2024
@DavisVaughan
Copy link
Contributor

I see this when debugging it, there's a call to mcreplicate, which uses mclapply. Not entirely sure what's going on with the 5th core not returning a value

Screenshot 2024-07-03 at 3 17 22 PM

@DavisVaughan
Copy link
Contributor

Minimal reprex

fn <- function(i) {
  if (i == 3) {
    cat("hiya")
  }
  i
}

parallel::mclapply(
  X = 1:10,
  FUN = fn,
  mc.cores = 5
)

Something about cat() while on the multicore works freaks the worker out, and I think it crashes? Note that I only call cat() on iteration 3 but it forces iteration 8 (same worker) to also not return a value.

> fn <- function(i) {
-   if (i == 3) {
-     cat("hiya")
-   }
-   i
- }
- 
- parallel::mclapply(
-   X = 1:10,
-   FUN = fn,
-   mc.cores = 5
- )
Warning message:
In parallel::mclapply(X = 1:10, FUN = fn, mc.cores = 5) :
  scheduled core 3 did not deliver a result, all values of the job will be affected
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
NULL

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

[[7]]
[1] 7

[[8]]
NULL

[[9]]
[1] 9

[[10]]
[1] 10

@joshuafayallen
Copy link
Author

This may be anticipated but when I render a reprex it behaves as expected

fn <- function(i) {
  if (i == 3) {
    cat("hiya")
  }
  i
}

parallel::mclapply(
  X = 1:10,
  FUN = fn,
  mc.cores = 5
)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
#> 
#> [[4]]
#> [1] 4
#> 
#> [[5]]
#> [1] 5
#> 
#> [[6]]
#> [1] 6
#> 
#> [[7]]
#> [1] 7
#> 
#> [[8]]
#> [1] 8
#> 
#> [[9]]
#> [1] 9
#> 
#> [[10]]
#> [1] 10

Created on 2024-07-03 with reprex v2.1.0

Wheareas outside of a reprex I get the same error as Davis

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Jul 3, 2024

I think it is going to be very unsafe to use forked (multicore) code in ark, at least until we can take a closer look at this and see if it is even possible to do this safely. It is highly likely that forking is going to interfere with the way that we communicate to our frontends through Jupyter comms. The cat() call writes to stdout() and it seems like that isn't working correctly, though I'm not quite sure why yet.

It should be safe to use multi-process parallelism, i.e. PSOCK clusters created with parallel::makePSOCKcluster() or future::multisession()


R docs in ?mcfork even say

It is strongly discouraged to use mcfork and the higher-level functions which rely on it (e.g., mcparallel, mclapply and pvec) in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes)

And I know RStudio has been fighting issues like this for a long time

@DavisVaughan
Copy link
Contributor

@joshuafayallen if you control the call to mcreplicate and set refresh = FALSE to avoid the cat() call, it seems to work for this specific case, but I can't make any promises about everything else also working right.

# doesnt work
mcreplicate::mc_replicate(
  n = 5,
  expr = 1 + 1,
  mc.cores = 5
)

# works
mcreplicate::mc_replicate(
  n = 5,
  expr = 1 + 1,
  mc.cores = 5,
  refresh = FALSE
)

@joshuafayallen
Copy link
Author

@DavisVaughan Thank you so much!

@jennybc
Copy link
Member

jennybc commented Jul 3, 2024

This may be anticipated but when I render a reprex ...

Yeah reprex takes this off to an entirely separate process, so it's a totally different exercise, even when the reprex-ing happens in Positron.

@joshuafayallen
Copy link
Author

@jennybc Yeah that makes sense. After spending a few hours when I went to render the reprex I thought I was losing my mind

@kevinushey
Copy link
Contributor

Is there any hope of supporting this in ark via things like https://docs.rs/libc/latest/libc/fn.pthread_atfork.html to close the relevant communication channels in forked sub-processes? RStudio does something somewhat similar; we try to keep track of which process is the "main" process after a fork occurs, and let "child" processes continue execution but forbid certain things that only the main process should do.

@lionel-
Copy link
Contributor

lionel- commented Jul 5, 2024

Or use the atfork hook to fail/crash with a proper error message? I'd be tempted to only support truly cross-platform and well specified approaches in ark. R can always be invoked in a separate process if fork-parallelism is needed.

@juliasilge juliasilge changed the title "'list' object cannot be coerced to type 'double'" running R code Problem with forked (multicore) code in ark Jul 5, 2024
@juliasilge juliasilge added this to the Future milestone Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lang: r support
Projects
None yet
Development

No branches or pull requests

6 participants