## Turbo Charged Code: Parallel Programming

Some problems can be solved faster using multiple cores on your machine. This chapter shows you how to write R code that runs in parallel.

### How many cores does this machine have?
The parallel package provides detectCores(), which reports the number of cores on a machine. Use it to find out how many cores this machine has.

In [2]:
# Load the parallel package
# (parallel ships with base R, so no install.packages() call is needed)
library(parallel)

# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()

# Print no_of_cores
print(no_of_cores)


[1] 12
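
By default, detectCores() counts logical cores, so hyper-threaded machines report more cores than they physically have. The sketch below compares the two counts and picks a conservative worker count. The helper safe_workers() is hypothetical (it is not part of the parallel package), and leaving one core free is a rule of thumb rather than a requirement.

```r
library(parallel)

# Logical cores (the default) count hyper-threads as separate cores
logical_cores <- detectCores()

# Physical cores only; may be NA where the platform cannot report it
physical_cores <- detectCores(logical = FALSE)

# Hypothetical helper: leave one core free, never drop below one worker
safe_workers <- function(n) {
  if (is.na(n)) return(1L)
  max(1L, as.integer(n) - 1L)
}

no_of_workers <- safe_workers(physical_cores)
print(no_of_workers)
```

The resulting no_of_workers could then be passed to makeCluster() instead of a hard-coded number.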


### Moving to parApply
To run code in parallel using the parallel package, the basic workflow has three steps.

1. Create a cluster using makeCluster(). 
2. Do some work.
3. Stop the cluster using stopCluster().

The simplest way to make a cluster is to pass a number to makeCluster(). This creates a cluster of the default type with that many worker processes, typically one per core. The object dd is a matrix with 100 rows and 10 columns. To compute its column medians in parallel, swap apply() for parApply(). The arguments are the same, except that parApply() takes a cluster argument before the usual apply() arguments.

In [5]:
dd <- matrix(rnorm(1000), ncol = 10)

# Determine the number of available cores
detectCores()

# Create a cluster via makeCluster
cl <- makeCluster(2)

# Parallelize this code
# apply(dd, 2, median)
parApply(cl, dd, 2, median)

# Stop the cluster
stopCluster(cl)
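
If the work step fails with an error, stopCluster() is never reached and the worker processes keep running. One way to guard against that, sketched here under the assumption that the cluster is only needed for a single call, is to wrap the three steps in a function and register stopCluster() with on.exit(). The wrapper par_col_medians() is a hypothetical name used for illustration.

```r
library(parallel)

# Hypothetical wrapper: column medians computed on a temporary cluster
par_col_medians <- function(m, workers = 2) {
  cl <- makeCluster(workers)
  # Stop the cluster when the function exits, even after an error
  on.exit(stopCluster(cl), add = TRUE)
  parApply(cl, m, 2, median)
}

dd <- matrix(rnorm(1000), ncol = 10)

# Compare the parallel result with the serial one
all.equal(par_col_medians(dd), apply(dd, 2, median))
```

Comparing against the serial apply() result is a cheap sanity check that the parallel version computes the same thing.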

### Using parSapply()
We previously played the following game:

1. Initialize: total = 0.
2. Roll a single die and add it to total.
3. If total is even, reset total to zero.
4. If total is greater than 10, the game finishes; otherwise, go back to step 2.

The game can be simulated using the play() function.

In [6]:
play <- function() {
  total <- no_of_rolls <- 0
  while(total < 10) {
    total <- total + sample(1:6, 1)

    # If total is even, reset it to 0
    if (total %% 2 == 0) total <- 0
    no_of_rolls <- no_of_rolls + 1
  }
  no_of_rolls
}

# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)

# Export the play() function to the cluster
clusterExport(cl, "play")

# Re-write sapply as parSapply
res <- parSapply(cl, 1:100, function(i) play())

# Stop the cluster
stopCluster(cl)
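
Because play() draws random numbers, each run of the parallel simulation gives different results, and calling set.seed() in the main session does not seed the workers. A minimal sketch of one way to make the runs reproducible uses clusterSetRNGStream() from the parallel package; the seed value 42 is arbitrary, and reproducibility here assumes the same number of workers on each run.

```r
library(parallel)

# Same dice game as above, repeated so the sketch runs on its own
play <- function() {
  total <- no_of_rolls <- 0
  while (total < 10) {
    total <- total + sample(1:6, 1)
    if (total %% 2 == 0) total <- 0
    no_of_rolls <- no_of_rolls + 1
  }
  no_of_rolls
}

cl <- makeCluster(2)
clusterExport(cl, "play")

# Seed every worker with its own reproducible random number stream
clusterSetRNGStream(cl, iseed = 42)
run1 <- parSapply(cl, 1:100, function(i) play())

# Resetting the same seed replays the same streams
clusterSetRNGStream(cl, iseed = 42)
run2 <- parSapply(cl, 1:100, function(i) play())

stopCluster(cl)

# Check whether the two runs match
identical(run1, run2)
```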

### Timing parSapply()
The dice game is embarrassingly parallel: every game is independent of the others, so the work splits cleanly across cores. Simulations of this kind usually (but not always) produce a good speed-up.
As before, we can use microbenchmark() or system.time(). For simplicity, we'll use system.time() in this exercise.

In [9]:
# Set the number of games to play
no_of_games <- 1e5

## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))

# Create a 4 core cluster object and export the play() function to it.
cl <- makeCluster(4)
clusterExport(cl, "play")

## Time parallel version
system.time(par <- parSapply(cl, 1:no_of_games, function(i) play()))

## Stop cluster
stopCluster(cl)

   user  system elapsed 
    5.3     0.0     5.3 

   user  system elapsed 
   0.02    0.03    1.97 
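
When reading the two outputs, compare the elapsed column. The user time of the parallel run is close to zero, most likely because the computation happens in the worker processes rather than in the session that calls system.time(). The short calculation below turns the two elapsed times into a speed-up and a parallel efficiency; the numbers are copied from the output above and will differ from machine to machine.

```r
# Elapsed times copied from the two system.time() outputs above
serial_elapsed <- 5.3
parallel_elapsed <- 1.97
no_of_workers <- 4

# Speed-up: how many times faster the parallel run finished
speed_up <- serial_elapsed / parallel_elapsed

# Efficiency: fraction of the ideal 4x speed-up actually achieved
efficiency <- speed_up / no_of_workers

round(c(speed_up = speed_up, efficiency = efficiency), 2)
#> speed_up efficiency
#>     2.69       0.67
```

Four workers give roughly a 2.7x speed-up rather than 4x, which is consistent with some time being spent on sending tasks to the workers and collecting their results.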