Worker process information useful to gather #142

HenrikBengtsson opened this issue Apr 15, 2017 · 0 comments
When processing futures in external R processes, it may be useful for the main/calling R process to have access to some information about the worker R process even before the future is resolved (cf. discussion in Issue #93), e.g.

  • R version
  • operating system
  • hostname and process ID (PID)
  • current working directory
  • ...

This information could be gathered as:

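worker_info <- function() {
  list(
    # R version details plus the OS family ("unix" or "windows")
    r = c(R.version, os.type = .Platform$OS.type),
    # OS name, release, hostname (nodename), user, etc.
    system = as.list(Sys.info()),
    # Process ID of the R process evaluating this function
    process = list(pid = Sys.getpid()),
    # Current working directory of that process
    workdir = getwd()
  )
}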

Example:

> info <- worker_info()
> str(info)
List of 3
 $ r      :List of 15
  ..$ platform      : chr "x86_64-pc-linux-gnu"
  ..$ arch          : chr "x86_64"
  ..$ os            : chr "linux-gnu"
  ..$ system        : chr "x86_64, linux-gnu"
  ..$ status        : chr ""
  ..$ major         : chr "3"
  ..$ minor         : chr "3.3"
  ..$ year          : chr "2017"
  ..$ month         : chr "03"
  ..$ day           : chr "06"
  ..$ svn rev       : chr "72310"
  ..$ language      : chr "R"
  ..$ version.string: chr "R version 3.3.3 (2017-03-06)"
  ..$ nickname      : chr "Another Canoe"
  ..$ os.type       : chr "unix"
 $ system :List of 8
  ..$ sysname       : chr "Linux"
  ..$ release       : chr "4.4.0-72-generic"
  ..$ version       : chr "#93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 2017"
  ..$ nodename      : chr "hb-x1"
  ..$ machine       : chr "x86_64"
  ..$ login         : chr "unknown"
  ..$ user          : chr "hb"
  ..$ effective_user: chr "hb"
 $ process:List of 1
  ..$ pid: int 4246
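
As an illustration of how the main process might use such a record, here is a minimal sketch that compares two records. The helpers same_r_version() and same_host() are hypothetical names introduced only for this example; they are not part of the future package. The "remote" record is simulated by editing a copy of the local one.

```r
worker_info <- function() {
  list(
    r = c(R.version, os.type = .Platform$OS.type),
    system = as.list(Sys.info()),
    process = list(pid = Sys.getpid()),
    workdir = getwd()
  )
}

# Hypothetical helper: TRUE if both records report the same R version string.
same_r_version <- function(a, b) {
  identical(a$r$version.string, b$r$version.string)
}

# Hypothetical helper: TRUE if both records report the same hostname.
same_host <- function(a, b) {
  identical(a$system$nodename, b$system$nodename)
}

main <- worker_info()

# Simulate a record from a worker on another machine (assumption for the demo).
other <- main
other$system$nodename <- "some-other-host"

same_r_version(main, other)
same_host(main, other)
```

A check like this could, for instance, warn when a worker runs a different R version than the main process.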

This information should be exposed in the Future API as an element of a Future object, e.g.

> f <- future(Sys.sleep(300))
> info <- f$worker

For persistent clusters, such as the ones created by parallel::makePSOCKcluster(), this information could be gathered once at setup, e.g. by plan(cluster, workers = cl). The function future::makeClusterPSOCK(), or more specifically future::makeNodePSOCK(), could even collect this information when setting up each worker, and plan(cluster, workers = cl) could then add it only if missing. Gathering it at setup would also have the advantage of serving as a first validation that the worker and the master can communicate properly (beyond setting up the connection). The disadvantage is a small additional overhead, but since these workers are persistent, that is, they serve many futures, that should be a minor problem.
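
A rough sketch of the "gather once at setup" idea, using only the base parallel package. This is not how the future package implements it. It only shows that a single parallel::clusterCall() round trip both collects the information and confirms that each worker responds. Keeping the records in a separate list named infos is an assumption made for this example.

```r
library(parallel)

worker_info <- function() {
  list(
    r = c(R.version, os.type = .Platform$OS.type),
    system = as.list(Sys.info()),
    process = list(pid = Sys.getpid()),
    workdir = getwd()
  )
}

# Start two persistent PSOCK workers on the local machine.
cl <- makePSOCKcluster(2L)

# One round trip per worker: evaluates worker_info() on each node and
# returns a list with one record per worker, in cluster order.
infos <- clusterCall(cl, worker_info)

# The worker PIDs are now known to the main process without any
# further communication.
pids <- vapply(infos, function(info) info$process$pid, integer(1L))
print(pids)

stopCluster(cl)
```

Because the records are collected once per worker rather than once per future, the extra cost is paid only at setup.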

HenrikBengtsson added a commit that referenced this issue Apr 21, 2017
retrieving session information including the process ID from the
corresponding R process.  The same information is also collected by
plan(cluster) and plan(multisession) if not already available, e.g. when
parallel::makeCluster() is used instead.  This makes it possible to find
session information for a future that is not yet resolved.

(Issue #142)
HenrikBengtsson added a commit that referenced this issue Apr 25, 2017
… for makeClusterPSOCK(); introduced race condition - need more thoughts [#142]