## Chapter 18: Parallel Computing

Briefly, parallel computing means running code on multiple processors (or multiple cores of the same processor) at the same time. In general this is difficult, because the work has to be split into pieces and each piece needs access to the data it operates on, so much depends on where data is stored and retrieved. The Julia documentation on parallel computing is a good place to start.

The following is a simple function that counts the number of heads out of n coin flips:

In [None]:
function countHeads(n::Int)
    c::Int = 0
    for i=1:n
        c += rand(Bool)
    end
    c
end

This finds the fraction of heads from 2 billion coin flips. 

In [None]:
@time countHeads(2*10^9)/(2*10^9)

The `Distributed` standard library provides tools for running code on several processes, which lets us use the multiple cores in a CPU:

In [3]:
using Distributed

The following adds one worker process, which the operating system can run on a separate core:

In [None]:
addprocs(1)

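`nprocs()` reports the number of processes, counting the main process as well as any workers. The output below was recorded before any workers had been added; after `addprocs(1)` the count is 2.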

In [16]:
nprocs()

1
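To make the bookkeeping concrete, here is a minimal sketch of how `nprocs`, `nworkers`, and `workers` relate. It assumes a fresh Julia session with no workers yet, and the choice of two workers is arbitrary.

```julia
using Distributed

# In a fresh session there is only the main process, which has id 1.
before = nprocs()

# Start two local worker processes (an arbitrary number for illustration).
new_ids = addprocs(2)

# nprocs() counts the main process plus the workers;
# nworkers() counts only the workers.
println("processes before: ", before)
println("processes after:  ", nprocs())
println("number of workers: ", nworkers())
println("worker ids:        ", workers())
```

The ids returned by `addprocs` are the ones later used to address individual workers.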

The following is the same function as above, but `@everywhere` makes it available on all processes:

In [None]:
@everywhere function countHeads(n::Int)
   c::Int = 0
   for i=1:n
       c += rand(Bool)
   end
   c
end

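Here's a simple way to send two calls of the function to the worker processes. (In Julia 1.3 and later, `@spawnat :any` is the recommended replacement for `@spawn`.)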

In [None]:
a= @spawn countHeads(10^9)
b= @spawn countHeads(10^9)

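Note that these calls returned almost immediately. That's because `@spawn` schedules the work on a worker and returns a `Future` right away, without waiting for the result. The following uses `fetch` to wait for both results and adds them: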

In [None]:
@time fetch(a)+fetch(b)

Note that this is faster than the original, but not by much: there is overhead in splitting the work up and then bringing the results back together. Also, as we add more cores, spawning and fetching each piece by hand becomes cumbersome. We're going to see an alternative way, after adding an appropriate number of worker processes for the computer.
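One way to avoid writing a separate `@spawn` line per task is to spawn one task on each worker in a loop and then fetch all of the results. This is a minimal sketch, not the approach used in the rest of the chapter; it assumes that two local workers can be started if the session has none, and the names `flips_per_worker` and `futures` are illustrative.

```julia
using Distributed

# Assumption: start two local workers if the session has none yet.
nprocs() == 1 && addprocs(2)

@everywhere function countHeads(n::Int)
    c::Int = 0
    for i = 1:n
        c += rand(Bool)
    end
    c
end

flips_per_worker = 10^6

# @spawnat returns a Future immediately; the work runs on worker w in the background.
futures = [@spawnat w countHeads(flips_per_worker) for w in workers()]

# fetch blocks until each result is available.
total_heads = sum(fetch, futures)
println(total_heads / (flips_per_worker * length(futures)))
```

The printed fraction should be close to 0.5, and the same code works unchanged for any number of workers.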

The following gives the information about the individual cores in the CPU.

In [15]:
Sys.cpu_info()

8-element Array{Base.Sys.CPUinfo,1}:
 Intel(R) Core(TM) i5-1030NG7 CPU @ 1.10GHz: 
        speed         user         nice          sys         idle          irq
     1100 MHz      72853 s          0 s      45243 s     103692 s          0 s
 Intel(R) Core(TM) i5-1030NG7 CPU @ 1.10GHz: 
        speed         user         nice          sys         idle          irq
     1100 MHz      17351 s          0 s       9988 s     194398 s          0 s
 Intel(R) Core(TM) i5-1030NG7 CPU @ 1.10GHz: 
        speed         user         nice          sys         idle          irq
     1100 MHz      67626 s          0 s      35683 s     118428 s          0 s
 Intel(R) Core(TM) i5-1030NG7 CPU @ 1.10GHz: 
        speed         user         nice          sys         idle          irq
     1100 MHz      16812 s          0 s       8814 s     196110 s          0 s
 Intel(R) Core(TM) i5-1030NG7 CPU @ 1.10GHz: 
        speed         user         nice          sys         idle          irq
     1100 MHz      619

Calling `addprocs()` with no arguments adds one worker process for each available (logical) core:

In [17]:
addprocs()

8-element Array{Int64,1}:
 2
 3
 4
 5
 6
 7
 8
 9

In [None]:
@time let
 nheads = @distributed (+) for i = 1:2*10^9
   Int(rand(Bool))
 end
end

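The `@distributed (+)` loop above splits the range across the workers and combines their partial counts with `+`. As you can see when running it, this reduces the run time compared with the serial version.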
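Because the coin flips are random, it can be hard to see exactly what `@distributed (+)` computes. The following sketch uses a deterministic loop body instead, so the reduced value can be checked by hand; it runs with or without worker processes.

```julia
using Distributed

# Each iteration yields the value of its last expression (here i^2);
# the (+) reducer combines those values across all workers.
sum_of_squares = @distributed (+) for i = 1:100
    i^2
end

println(sum_of_squares)  # 338350, the same as sum(i^2 for i in 1:100)
```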

#### 18.2 Writing a parallel card simulator

We will now write a parallel version of the `PlayingCards` simulation:

In [None]:
include("../julia-files/PlayingCards.jl")
using .PlayingCards, Random

Here's the original runTrials function:

In [None]:
function runTrials(trials::Int,f::Function)
    local deck=map(Card,1:52)
    local numhands=0
    for i=1:trials
        shuffle!(deck)
        h = Hand(deck[1:5])
        if(f(h))
            numhands+=1
        end
    end
    numhands
end

In [None]:
@time runTrials(10_000_000,isFullHouse)

Here's a parallel version of this. There are two important aspects:
* use `@everywhere` on all modules and functions that the workers need
* switch the `for` loop to a `@distributed` loop with a `(+)` reduction

In [None]:
@everywhere include("../julia-files/PlayingCards.jl")
@everywhere using .PlayingCards, Random

In [None]:
@everywhere function paraCountHands(trials::Integer,f::Function)
  local deck=map(Card,1:52)
  function checkHand(f::Function) ## shuffle the deck then check the hand.
    shuffle!(deck)
    f(Hand(deck[1:5]))
  end
  @distributed (+) for i = 1:trials
    Int(checkHand(f))
  end  
end

In [None]:
@time fh = paraCountHands(10_000_000,isFullHouse)

This has cut the time by a significant amount. 

#### 18.3 A parallel map function

In [None]:
num_coins = 1_000_000_000*ones(Int64,12)

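Here's a parallel map function, `pmap`, which applies a function to each element of a collection using the available workers.

If running this gives an error, the likely cause is that the workers added by `addprocs()` were started after `countHeads` was defined with `@everywhere`, so they do not have the function. Go back above and rerun the `@everywhere countHeads` cell.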

In [None]:
@time pmap(countHeads,num_coins)

And here is the regular version of the `map` function

In [None]:
@time map(countHeads,num_coins)
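`pmap` returns its results in the same order as the inputs, just as `map` does, even though the individual calls may finish in a different order on the workers. Here is a small deterministic check, using a hypothetical function `slowSquare` (not part of the chapter) and assuming that two local workers can be started if the session has none:

```julia
using Distributed

# Assumption: start two local workers if the session has none yet.
nprocs() == 1 && addprocs(2)

# A deliberately slow function, so that sharing the calls among workers pays off.
@everywhere function slowSquare(x)
    sleep(0.1)
    x^2
end

serial = map(slowSquare, 1:8)
parallel = pmap(slowSquare, 1:8)

# The two results agree element by element.
println(parallel == serial)
```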

#### 18.4 Shared Arrays

In [None]:
using Plots

One of the hard things to code in parallel is the case where all of the workers need access to the same data, so the work cannot simply be split into independent pieces. This example smooths a large array and uses a `SharedArray`, which all processes on the same machine can access.

In [None]:
arr = [50+50*sin(x/1_000_000)+25*rand() for x=1:10_000_000];

In [None]:
plot(arr[1:5000:end])

The following function computes a windowed mean: for index `i`, it returns the mean of the elements within `width` positions of `i`.

In [None]:
function windowMean(arr::Vector{T},i::Integer,width::Integer) where T <: Real
  ## find a range of the window, making sure that it doesn't go beyond the bounds of the array
  window = max(1,i-width):min(i+width,length(arr))  
  sum(arr[window])/(last(window)-first(window)+1)
end
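To see what a windowed mean returns near the ends of an array, here is a small worked example. It uses a variant, named `windowMeanView` here for illustration, that averages a view of the window instead of a copy; this is an optional refinement, not the chapter's definition.

```julia
using Statistics

# Same window logic as windowMean, but mean(@view ...) avoids copying the subarray.
function windowMeanView(arr::AbstractVector{T}, i::Integer, width::Integer) where T <: Real
    window = max(1, i - width):min(i + width, length(arr))
    mean(@view arr[window])
end

v = [1.0, 2.0, 3.0, 4.0, 5.0]
println(windowMeanView(v, 1, 1))  # window is 1:2, mean 1.5
println(windowMeanView(v, 3, 1))  # window is 2:4, mean 3.0
println(windowMeanView(v, 5, 1))  # window is 4:5, mean 4.5
```

At the two ends the window is clipped to the array bounds, so fewer elements are averaged there.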

This now smooths the array, storing the results in `smoothed_array`

In [None]:
smoothed_array = zeros(Float64,length(arr));
@time let
  for i=1:length(arr)
    smoothed_array[i]=windowMean(arr,i,100)
  end
end

In [None]:
plot(arr[1:5000:end])
plot!(smoothed_array[1:5000:end])

In [None]:
@everywhere function windowMean(arr::Vector{T},i::Integer,width::Integer) where T <: Real
  ## find a range of the window, making sure that it doesn't go beyond the bounds of the array
  window = max(1,i-width):min(i+width,length(arr))  
  sum(arr[window])/(last(window)-first(window)+1)
end

In [None]:
@everywhere using SharedArrays

In [None]:
## create the arrays once, on the main process; with @everywhere each process would build its own copy with different random noise
arr = [50+50*sin(x/1_000_000)+25*rand() for x=1:10_000_000];
orig_arr = SharedVector(arr);
s_arr = SharedVector(zeros(Float64,length(orig_arr)));

In [None]:
@time let
  @sync @distributed for i=1:length(orig_arr)
    s_arr[i]=windowMean(arr,i,100)
  end
end
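The pattern in the loop above, `@sync @distributed` writing into a shared array, can be checked on a small deterministic case. This sketch assumes two local workers on the same machine and uses the illustrative names `n` and `squares`. Without a reducer, `@distributed` returns immediately, so `@sync` is what makes the main process wait for the workers to finish.

```julia
using Distributed

# Assumption: start two local workers if the session has none yet.
nprocs() == 1 && addprocs(2)
@everywhere using SharedArrays

n = 10
# A shared vector of length n, mapped into the memory of every local process.
squares = SharedArray{Float64,1}((n,))

# Each worker fills in its own part of the index range.
@sync @distributed for i = 1:n
    squares[i] = i^2
end

println(collect(squares))
```

Every element is written by exactly one iteration, so no two workers touch the same index.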

In [None]:
plot(orig_arr[1:5000:end])
plot!(s_arr[1:5000:end])