# parallel R lang

_installation_

see: https://stackoverflow.com/questions/76158663/r-kernel-on-vscode-jupyter-notebook-not-appearing

- make sure R is installed, then run `R` in a terminal to open the R console
- `install.packages('IRkernel')`
- `IRkernel::installspec()`
- open VS Code, create a new `.ipynb` file, and select the `R` kernel

_parallelism_

see: https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html

- R is single-threaded by default
- for multi-threading, call C code through R's foreign function interface (FFI), e.g. `.Call`, and do the threading on the C side
- for multi-processing, use the `parallel` package (ships with R) or similar packages
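
a minimal sketch of the multi-processing route, assuming a socket (PSOCK) cluster from `parallel` is acceptable. unlike `mclapply`, which relies on forking and cannot use more than one core on Windows, `makeCluster` + `parLapply` starts separate R processes on every platform. the `square` task and the cap of 2 workers are made up for illustration.

```r
library(parallel)

# hypothetical task, used only for illustration
square <- function(x) x^2

# detectCores() can return NA on some platforms, so fall back to 1
available <- detectCores()
if (is.na(available)) available <- 1L
n_workers <- max(1L, min(2L, available))

# start a cluster of separate R processes and spread the inputs over them
cl <- makeCluster(n_workers)
res_parallel <- parLapply(cl, 1:8, square)
worker_pids <- unique(unlist(clusterCall(cl, Sys.getpid)))
stopCluster(cl)

# the parallel result should match the plain sequential one
res_serial <- lapply(1:8, square)
cat("workers:", n_workers, "\n")
cat("distinct worker processes:", length(worker_pids), "\n")
cat("same result as lapply:", identical(res_parallel, res_serial), "\n")
```

the workers are fresh R sessions, so anything they need beyond the function and its arguments (packages, global variables) has to be sent explicitly, e.g. with `clusterExport` or `clusterEvalQ`.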


```r
# install pacman if missing (the package name must be a quoted string)
if (!require(pacman)) {
  install.packages("pacman", repos = "http://cran.us.r-project.org")
}
library(pacman)
pacman::p_load(parallel)

# four light tasks (100 ms) followed by four heavy tasks (1000 ms)
wait_times <- c(rep(100, 4), rep(1000, 4))

# sleep for wait_ms milliseconds, then report which process ran the task
dummy_task <- function(wait_ms) {
  Sys.sleep(wait_ms / 1000)
  Sys.getpid()
}

# note: mclapply forks, so mc.cores > 1 is not supported on Windows
num_cores <- 2
cat("number of cores:", num_cores, "\n")

# lapply (sequential baseline)
p1 <- proc.time()
res <- lapply(wait_times, dummy_task)
elapsed <- proc.time() - p1
procs <- unlist(res) - min(unlist(res))  # pids relative to the smallest one
cat("process ids in lapply:", procs, "\n")
cat("lapply time:", elapsed[3], "\n")

# mclapply (default: prescheduled)
p1 <- proc.time()
res <- mclapply(wait_times, dummy_task, mc.cores = num_cores)
elapsed <- proc.time() - p1
procs <- unlist(res) - min(unlist(res))
cat("process ids in mclapply:", procs, "\n")
cat("mclapply time:", elapsed[3], "\n")

# mclapply dynamic scheduling
p1 <- proc.time()
res <- mclapply(wait_times, dummy_task, mc.cores = num_cores, mc.preschedule = FALSE)
elapsed <- proc.time() - p1
procs <- unlist(res) - min(unlist(res))
cat("process ids in mclapply (dynamic scheduling):", procs, "\n")
cat("mclapply (dynamic scheduling) time:", elapsed[3], "\n")
```

output:

```
number of cores: 2 
process ids in lapply: 0 0 0 0 0 0 0 0 
lapply time: 4.437 
process ids in mclapply: 0 1 0 1 0 1 0 1 
mclapply time: 2.221 
process ids in mclapply (dynamic scheduling): 0 1 2 3 5 6 7 8 
mclapply (dynamic scheduling) time: 2.254 
```


_task scheduler_

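- default scheduler (`mc.preschedule = TRUE`):
	- fork `mc.cores` worker processes up front, then assign tasks to them in round-robin order.
	- load balancing can be poor: tasks are assigned before their run time is known, so heavy tasks might land on the same process.
	- scheduling itself is cheap: static, simple, and only one fork per worker.

- dynamic scheduler (`mc.preschedule = FALSE`):
	- fork a new process for each task, with at most `mc.cores` running at once.
	- balances uneven tasks better, but bad if the per-process overhead is larger than the task time.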
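
the load-balancing difference can be sketched without any sleeping or forking. this is an idealized model, not how `mclapply` is implemented: each task is assumed to take exactly its listed time and starting a process is assumed to cost nothing. the function names `static_makespan` and `dynamic_makespan` and the `alternating` workload are made up for illustration.

```r
# idealized model: times are in milliseconds, process startup is free

# static (prescheduled): task i goes to worker ((i - 1) mod n_workers) + 1
static_makespan <- function(times, n_workers) {
  worker <- (seq_along(times) - 1) %% n_workers + 1
  max(tapply(times, worker, sum))
}

# dynamic: each task, in order, goes to whichever worker becomes free first
dynamic_makespan <- function(times, n_workers) {
  free_at <- numeric(n_workers)
  for (t_ms in times) {
    w <- which.min(free_at)
    free_at[w] <- free_at[w] + t_ms
  }
  max(free_at)
}

# heavy and light tasks alternate, so round-robin puts every heavy task on worker 1
alternating <- rep(c(1000, 100), 4)
cat("static: ", static_makespan(alternating, 2), "ms\n")   # 4000
cat("dynamic:", dynamic_makespan(alternating, 2), "ms\n")  # 2200
```

with the `wait_times` from the benchmark above (four light tasks, then four heavy ones) both functions give 2200 ms, which is in line with the two measured `mclapply` times being nearly equal: that ordering happens to split evenly under round-robin.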
