## Add available processors

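Julia starts with a single process, the main Julia server (process id 1). The `9` shown below is the main process plus 8 workers that were already attached when this cell was run.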

In [13]:
ENV["LINES"] = 10
using Distributed
nprocs()

9

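To avoid over-allocating workers, we first check whether only the main process is running and, if so, add workers. Called without a count, `addprocs` starts one worker per available CPU thread (`Sys.CPU_THREADS`); `exeflags="--project"` starts each worker with the `--project` flag so it picks up the project environment. The `false` below means workers were already present, so nothing was added.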

In [14]:
nprocs() == 1 && addprocs(;exeflags="--project")

false
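
If you want a fixed number of workers rather than the default, `addprocs` also accepts a count. A minimal sketch, assuming 3 local workers are wanted; the variable name `n_wanted` is illustrative and not part of the notebook:

```julia
using Distributed

n_wanted = 3  # illustrative target (an assumption); adjust to your machine

# Add workers only when the main process is alone,
# so re-running the cell does not keep adding more.
if nprocs() == 1
    addprocs(n_wanted; exeflags="--project")
end

println("processes: ", nprocs(), ", workers: ", nworkers())
```

In a fresh session this leaves one main process plus `n_wanted` workers; in a session that already has workers it changes nothing.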

Let's check which workers we have. The main Julia server (process id 1) is not counted as a worker.

In [15]:
workers()

8-element Vector{Int64}:
 2
 3
 4
 ⋮
 8
 9

## Remote Calls and References

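We use the `@spawn` macro to dynamically assign a task to an available remote worker. (Since Julia 1.3, `Distributed.@spawn` is deprecated in favor of the equivalent `@spawnat :any`.)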

In [16]:
ref = @spawn sin(π/5)

Future(2, 1, 10, nothing)

The call returns a `Future` because the value may only become available later, for example after the computation finishes or after network/communication delays. The call is non-blocking: the main Julia process does not wait for the task to finish. You can make Julia wait with the `@sync` macro, but waiting for each remote call before starting other independent work defeats the purpose of parallelization.

To get the value later, call `fetch`. This is a blocking call: Julia waits until the value is available.

In [17]:
fetch(ref)

0.5877852522924731
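
The difference between the non-blocking call and the blocking `fetch` can be sketched with the function forms of the same API. This is a minimal sketch, assuming two local workers are acceptable; it uses `@spawnat :any`, `remotecall`, and `remotecall_fetch` from `Distributed` rather than the `@spawn` macro used in this notebook:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # two local workers, only for this sketch

# Non-blocking: both calls return a Future immediately.
fut_any = @spawnat :any sin(π/5)       # Julia picks the worker
w = first(workers())
fut_w = remotecall(sin, w, π/5)        # we pick the worker

# Blocking: fetch waits until each value is available.
a = fetch(fut_any)
b = fetch(fut_w)

# remotecall_fetch combines the call and the fetch in one step.
c = remotecall_fetch(sin, w, π/5)

println((a, b, c))
```

`remotecall_fetch` is convenient when the result is needed right away; keeping the `Future` around is what lets other work proceed in the meantime.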

Here's an example of a loop with remote calls. For simplicity we use `sin` as the task; in practice, remote calls pay off for tasks that take long enough to be worth running in parallel in the background.

We first declare an `Array` of `Future`s, which the loop then populates with the `Future` returned by each `@spawn`.

In [18]:
n=10
refs = Array{Future,1}(undef,n)
for i = 1:n
    refs[i] = @spawn sin(i)
end

`refs` now contains an `Array` of `Future`s.

In [19]:
refs

10-element Vector{Future}:
 Future(3, 1, 12, nothing)
 Future(4, 1, 13, nothing)
 Future(5, 1, 14, nothing)
 ⋮
 Future(3, 1, 20, nothing)
 Future(4, 1, 21, nothing)

We can map `fetch` over the `Future`s and aggregate (reduce) the results by summing them.

In [20]:
reduce(+,map(fetch,refs))

1.4111883712180104

A more elegant way to do this is to use `@distributed`, as shown below.

In [21]:
res=@distributed (vcat) for i=1:n
    sin(i)
end

10-element Vector{Float64}:
  0.8414709848078965
  0.9092974268256817
  0.1411200080598672
  ⋮
  0.4121184852417566
 -0.5440211108893698

In [22]:
reduce(+,res)

1.4111883712180104

You can also replace `(vcat)` with `(+)` in `@distributed` to sum the results directly.

In [23]:
res = @distributed (+) for i in 1:10
    println("processing: ",i)
    sin(i)
end

      From worker 4:	processing: 5
      From worker 3:	processing: 3
      From worker 3:	processing: 4


1.4111883712180104

      From worker 8:	processing: 9
      From worker 2:	processing: 1
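
When `@distributed` is used without a reducer, it returns immediately without waiting for the loop to finish, so it is usually prefixed with `@sync`. A minimal sketch of that form, assuming local workers on one machine and the standard-library `SharedArrays` package (not used elsewhere in this notebook) to collect the results:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # two local workers, only for this sketch
using SharedArrays

# A SharedArray is visible to all local processes, so each worker
# can write its own iterations into it.
out = SharedArray{Float64}(10)

# Without a reducer the loop runs asynchronously; @sync waits for it.
@sync @distributed for i in 1:10
    out[i] = sin(i)
end

println(sum(out))
```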


In [48]:
[@spawn sin(i) for i in 1:10]  .|> fetch   |> sum

1.4111883712180104
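
Another way to express the same computation is `pmap`, which hands items to workers one at a time as they become free and returns the results in input order. A minimal sketch, assuming two local workers; `pmap` is not used in the original notebook and is shown only as an alternative:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # two local workers, only for this sketch

# Each element of 1:10 is sent to whichever worker is idle.
vals = pmap(sin, 1:10)
total = sum(vals)

println(total)
```

For a task as cheap as `sin`, the communication cost per item dominates; `pmap` is a better fit when each call does a substantial amount of work.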

Here's another example, which concatenates one `DataFrame` per iteration, each holding the worker id and the corresponding task result.

Notice the pattern of task assignment: `@distributed` splits the range into contiguous chunks, one per worker. Here workers 2 and 3 each process two iterations and the remaining workers process one each.

In [25]:
@everywhere using DataFrames
res=@distributed (vcat) for i = 1:10
    println((i,sin(i)))
    DataFrame(worker=myid(), vals=sin(i))
end
res

      From worker 9:	processing: 10
      From worker 2:	(1, 0.8414709848078965)
      From worker 3:	(3, 0.1411200080598672)
      From worker 8:	(9, 0.4121184852417566)
      From worker 6:	(7, 0.6569865987187891)
      From worker 9:	(10, -0.5440211108893698)
      From worker 4:	(5, -0.9589242746631385)
      From worker 5:	(6, -0.27941549819892586)
      From worker 7:	(8, 0.9893582466233818)
      From worker 2:	(2, 0.9092974268256817)
      From worker 3:	(4, -0.7568024953079282)


| Row | worker | vals |
| --- | --- | --- |
|  | Int64 | Float64 |
| 1 | 2 | 0.841471 |
| 2 | 2 | 0.909297 |
| 3 | 3 | 0.14112 |
| 4 | 3 | -0.756802 |
| 5 | 4 | -0.958924 |
| 6 | 5 | -0.279415 |
| 7 | 6 | 0.656987 |
| 8 | 7 | 0.989358 |
| 9 | 8 | 0.412118 |
| 10 | 9 | -0.544021 |
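
The assignment of iterations to workers can also be inspected without `DataFrames`. A minimal sketch, assuming two local workers; the name `assignments` is illustrative:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # two local workers, only for this sketch

# Each iteration reports its index together with the id of the process that ran it.
assignments = @distributed (vcat) for i in 1:10
    [(i, myid())]
end

# Group the indices by worker to see which iterations each one handled.
for w in workers()
    idxs = [i for (i, p) in assignments if p == w]
    println("worker ", w, " handled ", idxs)
end
```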


## Channel

In [26]:
function producer(c::Channel)     
    for n=1:10
       put!(c, n*n)
    end
end

producer (generic function with 1 method)

In [27]:
task = Channel(producer)

Channel{Any}(0) (1 item available)

In [28]:
take!(task)

1

In [29]:
for tsk in Channel(producer)
    @spawn println("received task: ",tsk)
end

      From worker 5:	received task: 1
      From worker 7:	received task: 9
      From worker 2:	received task: 36
      From worker 3:	received task: 49
      From worker 4:	received task: 64


In [30]:
[ @spawn tsk for tsk in Channel(producer)] .|> fetch |> sum

      From worker 6:	received task: 4
      From worker 8:	received task: 16
      From worker 9:	received task: 25
      From worker 5:	received task: 81
      From worker 6:	received task: 100


385
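
The cells above rely on a `Channel` bound to a producer function: the producer blocks on `put!` until a consumer calls `take!`, and the channel closes when the producer returns, which ends any loop iterating over it. A minimal single-process sketch of the same idea, assuming a typed channel with a small buffer (the buffer size of 4 is an arbitrary choice for illustration):

```julia
# Producer written as a do-block; the channel closes when the block finishes.
squares = Channel{Int}(4) do c
    for n in 1:10
        put!(c, n * n)   # blocks while the buffer is full
    end
end

first_item = take!(squares)   # consume one value by hand
rest = collect(squares)       # drain the remaining values until the channel closes

println(first_item + sum(rest))
```

The total matches the sum of the first ten squares computed with remote calls above.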

## Multi-threading

In [31]:
@everywhere using Base.Threads

In [32]:
nthreads()

4

In [33]:
@threads for i = 1:10
    id = threadid()
    println("threadid: ",id)
end

threadid: 1
threadid: 1
threadid: 1
threadid: 4
threadid: 3
threadid: 3
threadid: 4
threadid: 2
threadid: 2
threadid: 2


In [34]:
@sync @distributed for i=1:nprocs()
    @threads for j=1:nthreads()
        id = Threads.threadid()
        println("threadid: ",id)
    end
end

      From worker 9:	threadid: 4
      From worker 3:	threadid: 2
      From worker 3:	threadid: 4
      From worker 9:	threadid: 3
      From worker 3:	threadid: 3
      From worker 9:	threadid: 2
      From worker 4:	threadid: 4
      From worker 3:	threadid: 1
      From worker 4:	threadid: 2
      From worker 4:	threadid: 3
      From worker 9:	threadid: 1
      From worker 8:	threadid: 2
      From worker 8:	threadid: 4
      From worker 6:	threadid: 3
      From worker 8:	threadid: 3
      From worker 6:	threadid: 4
      From worker 6:	threadid: 2
      From worker 4:	threadid: 1
      From worker 9:	threadid: 1
      From worker 9:	threadid: 3
      From worker 9:	threadid: 2
      From worker 9:	threadid: 4
      From worker 6:	threadid: 1
      From worker 8:	threadid: 1
      From worker 7:	threadid: 2
      From worker 7:	threadid: 4
      From worker 7:	threadid: 3
      From worker 5:	threadid: 1
      From worker 7:	threadid: 1
      From worker 5:	threadid: 4
      From worker 5:	threadid: 3
      Fr

Task (done) @0x000000011dc7aa90
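
Beyond printing thread ids, threads are typically used to split real work inside one process. A minimal sketch, assuming Julia was started with more than one thread (for example `julia -t 4`; it also runs with a single thread). `threaded_sum` is an illustrative helper, not part of the notebook, and the macro is written as `Threads.@spawn` because `Distributed` also exports a macro named `@spawn`:

```julia
using Base.Threads

# Split xs into one chunk per thread, sum each chunk in its own task,
# then combine the partial sums. No shared mutable state is needed.
function threaded_sum(f, xs)
    chunk_len = cld(length(xs), nthreads())
    chunks = Iterators.partition(xs, chunk_len)
    tasks = [Threads.@spawn sum(f, chunk) for chunk in chunks]
    return sum(fetch, tasks)
end

total = threaded_sum(sin, 1:10)
println(total)
```

Returning partial sums from each task avoids the data race that would occur if all threads updated one shared accumulator.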

## Monte-Carlo Simulation to estimate $\pi$
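
The estimator used in this section follows from a ratio of areas. As a short derivation, with $s$ the number of random points that land inside the circle out of $n$ points drawn uniformly in the enclosing square:

$$
\frac{\text{area of circle}}{\text{area of square}} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4} \approx \frac{s}{n}
\quad\Longrightarrow\quad
\hat{\pi} = \frac{4s}{n}
$$

Because each point is an independent hit-or-miss trial with success probability $\pi/4$, the standard error of $\hat{\pi}$ is $\sqrt{\pi(4-\pi)/n} \approx 1.64/\sqrt{n}$, so each additional digit of accuracy costs roughly 100 times more samples.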

In [35]:
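#==========================#
# Monte Carlo simulation
# (area of circle) / (area of square) = π r^2 / 4 r^2 ≈ s/n,
# so π ≈ 4s/n, where s of the n random points fall inside the circle
#==========================#

# Draw a random point in the unit square; return 1 if it lies
# inside the quarter unit circle, 0 otherwise.
@everywhere function isInside()
    x = rand()
    y = rand()
    x^2 + y^2 < 1 ? 1 : 0
end;

# Parallel estimate: iterations are split across workers and the hits summed with (+).
@everywhere function ppi(n)
    s=@distributed (+) for i = 1:n
        isInside()
    end
    4s/n
end;

# Serial estimate for comparison.
# Note: this name shadows `Base.pi` in this session; `π` still refers to the constant.
function pi(n)
    s=0.0
    for i = 1:n
        s+=isInside()
    end
    4s/n
end;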


In [36]:
@time ppi(10^9)

  3.775918 seconds (82.53 k allocations: 4.929 MiB, 0.45% compilation time)


3.141545476

In [42]:
@time pi(10^9)

 13.829326 seconds


3.141555412
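
To judge how close such an estimate should be to $\pi$, the hit fraction also gives a standard error: with hit fraction $p$, the estimate $4p$ has standard error $4\sqrt{p(1-p)/n}$. A serial sketch, assuming a seeded `MersenneTwister` for reproducibility; `estimate_pi` is an illustrative name and not a function from the notebook:

```julia
using Random

# Returns the Monte Carlo estimate of π and its estimated standard error.
function estimate_pi(n; rng = Random.default_rng())
    hits = 0
    for _ in 1:n
        x, y = rand(rng), rand(rng)
        hits += (x^2 + y^2 < 1)   # Bool adds as 0 or 1
    end
    p = hits / n                      # fraction of points inside the quarter circle
    est = 4p
    se = 4 * sqrt(p * (1 - p) / n)    # binomial standard error, scaled by 4
    return est, se
end

est, se = estimate_pi(10^6; rng = MersenneTwister(42))
println("estimate = ", est, " ± ", se)
```

Both estimates reported above for $n = 10^9$ differ from $\pi$ by less than $5 \times 10^{-5}$, which is the size of error this formula predicts for that sample count.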

## Cross-validation in parallel

In [38]:
@everywhere using RDatasets
@everywhere using Statistics
@everywhere using DecisionTree
@everywhere using Random

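# Randomly split the rows of `data` into a train and a test set;
# `at` is the fraction of rows used for training.
@everywhere function partitionTrainTest(data, at = 0.7)
    n = nrow(data)
    idx = shuffle(1:n)
    train_idx = view(idx, 1:floor(Int, at*n))
    test_idx = view(idx, (floor(Int, at*n)+1):n)
    return (data[train_idx,:], data[test_idx,:])
end


# Train a random forest on one random 70/30 split of iris
# and return its accuracy on the held-out rows.
@everywhere function irisAcc()
    iris = dataset("datasets", "iris")
    train,test = partitionTrainTest(iris, 0.7) # 70% train
    xtrain = train[:, 1:4] |> Matrix
    ytrain = train[:, 5] |> Vector{String}
    xtest = test[:, 1:4] |> Matrix
    ytest = test[:, 5] |> Vector{String}
    # arguments: labels, features, n_subfeatures=2, n_trees=4, partial_sampling=0.5, max_depth=6
    model = build_forest(ytrain, xtrain, 2, 4, 0.5, 6)
    pred = apply_forest(model, xtest)
    sum(ytest .== pred) / length(pred)
end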

In [39]:
irisAcc()

0.9555555555555556

In [47]:
function mserial(n)
    sm=0.0
    for i=1:n
         sm += irisAcc()
    end
    print("Overall Acc:",sm/n*100.0)
end
@time mserial(1000)

Overall Acc:94.47555555555466  0.498136 seconds (1.08 M allocations: 174.963 MiB, 7.14% gc time)


In [46]:
function mparallel(n)
    s=@distributed (+) for i=1:n
        irisAcc()
    end
    return s/n*100.0
end
@time acc = mparallel(1000)

  0.269780 seconds (32.39 k allocations: 1.789 MiB, 16.06% compilation time)


94.69777777777763

In [45]:
println("Overall Acc:",acc)

Overall Acc:94.59333333333319
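
Summing with `(+)` yields only the mean accuracy. If the spread across runs matters as well, the individual scores can be collected first. A minimal sketch, assuming two local workers; `toy_score` is a hypothetical stand-in for `irisAcc()` so that the example runs without `RDatasets` or `DecisionTree`, and `summarize_runs` is an illustrative helper:

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # two local workers, only for this sketch
using Statistics

# Hypothetical stand-in for irisAcc(): a random score in [0.9, 1.0).
@everywhere toy_score() = 0.9 + 0.1 * rand()

# Run `score` n times in parallel and summarize the individual results.
function summarize_runs(score, n)
    scores = pmap(_ -> score(), 1:n)
    return (mean = mean(scores), std = std(scores))
end

stats = summarize_runs(toy_score, 200)
println("mean = ", stats.mean, ", std = ", stats.std)
```

With the real `irisAcc` defined on all workers via `@everywhere`, the call would be `summarize_runs(irisAcc, 1000)`.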
