<div class="alert alert-info">
    <h1> About these notebooks </h1>
    <p> If you opened this notebook in Binder, it is running on a server that was launched <b>just</b> for you. Your changes will be reset once the server restarts due to inactivity, so don't rely on it for anything you want to last. Likewise, feel free to try and tweak everything you want, since you won't affect the original repository. And <b>if you have never used a Jupyter notebook: </b> to run a cell just press <tt>Shift + Enter.</tt> </p>
    <p>Enjoy!</p>
</div>

<div class="alert alert-danger">
    <h1> About the parallelization results</h1>
    <p> This notebook is designed to explore parallelization, so it makes little sense to run it on just one CPU. </p>
    <ul><li><b>If you are running it on Binder</b>, four threads are already enabled and you don't need to take any further action. </li>
        <li><b>If you are running it on your own computer</b>, you may have to enable them manually, since by default Julia starts with a single thread. <a href="https://docs.julialang.org/en/v1/manual/parallel-computing/#man-multithreading-1">Here you can find instructions for your machine</a>.  You may have to restart the Notebook for the change to take effect.</li></ul>
</div>

# 5.1 Julia and parallel computing

One of the aspects of Julia that makes it especially suitable for scientific computing is that support for parallelism is built into the very core of the language. Besides, different approaches to parallelism are possible: from very basic instructions, such as which data to send to which processor, to nonchalant macros that hand everything to all the threads and let them work it out. In this notebook we will briefly explore the second scenario, which is especially useful when we have a lot of (almost) independent processes that can run without conflicting with each other.

But first, run the following cell to check the number of threads the Notebook can use. If it is only one, check the instructions in the red box above.

In [None]:
Threads.nthreads()

# 5.2 The `@threads` macro

We will start by using `Threads.@threads`. Place this macro before a `for` loop and it will split the iterations among the available threads. Cool, right? This is very well suited for the so-called *embarrassingly parallel* tasks; i.e., when little or no manipulation is needed to separate the problem into a number of parallel tasks. Check the following code, taken from Julia's official documentation, which in each iteration writes the ID of the thread working on `a[j]` to `a[j]` itself:

In [None]:
using Base.Threads

a = zeros(1,10)
@threads for j = 1:10
    a[1,j] = Threads.threadid()
end

a

Nested loops also work seamlessly:

In [None]:
using Base.Threads

a = zeros(3,10)
@threads for j = 1:10
    for i = 1:3
        a[i,j] = Threads.threadid()
    end
end

a

Neat! Do you feel like implementing this into the Game of Life iteration?

#### Exercise 6:

Make a multithreaded version of the function below, which implements a single iteration of the Game of Life for the case of zero boundary conditions.

**Hint**: To see if your implementation is right you can run the test in the next cell. 

In [None]:
""" Game of Life iteration for the zero boundary condition case"""
function game_it(A)
    M, N = size(A)
    B = zeros(Int64, M, N)
    for j in 2:N-1
        for i in 2:M-1
            sum_neigh = A[i-1,j-1] + A[i-1,j] + A[i-1,j+1] + A[i,j-1] + A[i,j+1] + A[i+1,j-1] + A[i+1,j] + A[i+1,j+1]
            if sum_neigh == 3 || (sum_neigh == 2 && A[i,j] == 1)
                B[i,j] = 1
            end
        end
    end
    return B
end

""" Game of Life iteration for the zero boundary condition case, with parallelization"""
function game_it_parallel(A)
    M, N = size(A)
    B = zeros(Int64, M, N)
    
    # Your code goes here
    
    return B
end

Let's run some tests to see if it works as it should:

In [None]:
function test_implementation()
    for i in 1:100
        A = rand([0,1], 100,100)
        @assert game_it(A) == game_it_parallel(A) # @assert checks whether the condition holds; if not, it throws an error
    end
    println("Test passed")
end

test_implementation()

So the implementation is at least correct. Is it also faster? Let's check:

#### Exercise 7

Run the benchmarks and the plot. What do you observe? Does it make sense? Try to explain the results.

In [None]:
using BenchmarkTools: @benchmark

A = rand([0,1],1000,1000)
# $A interpolates the global variable, so the benchmark measures only the function call
bench1 = @benchmark game_it($A)

In [None]:
bench2 = @benchmark game_it_parallel($A)

In [None]:
t1 = bench1.times/1e6
t2 = bench2.times/1e6

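bins = range(0, maximum([t1; t2]), length = 20)
using Plots
# normalize = :probability makes each bar height the fraction of trials in that bin
histogram(t1, label = "serial", alpha = 0.8, bins = bins, normalize = :probability)
histogram!(t2, label = "parallel", alpha = 0.8, bins = bins, normalize = :probability)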

xlabel!("time (ms)")
ylabel!("Proportion of trials")

# 5.3 The dangers of `@threads`

This macro looks pretty useful: with virtually no effort we transformed our code from serial to parallel. Too good to be true! And indeed, in some sense, it is: `@threads` is only meant to be used for reasonably simple tasks. Check how `@threads` fails miserably in the following example:

In [None]:
function parallel_sum(A)
    s = 0
    @threads for i in 1:length(A)
        s += A[i]
    end
    s
end

A = ones(10000)

parallel_sum(A)

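This problem is a race condition: several threads read `s`, add to it and write it back at the same time, so some of them update `s` starting from an outdated value and their additions get lost. That is why the result is usually smaller than the expected 10000 when more than one thread is running. Of course there are ways to handle this: in the [official documentation](https://docs.julialang.org/en/v1/manual/parallel-computing/#Atomic-Operations-1) you can find a lot of info if you are interested.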
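
As a rough illustration, here are two possible ways to repair the sum. Neither is claimed to be the recommended one. The names `atomic_sum` and `chunked_sum` are made up for this sketch. The first uses an atomic accumulator from `Base.Threads`, and assumes the elements can be converted to `Float64`. The second gives each loop iteration its own slice of the array and its own slot for a partial result, and assumes `A` is an ordinary 1-based `Vector`.

```julia
using Base.Threads

# Sketch 1 (hypothetical name): every update of `s` is one indivisible step.
function atomic_sum(A)
    s = Atomic{Float64}(0.0)            # thread-safe accumulator
    @threads for i in eachindex(A)
        atomic_add!(s, Float64(A[i]))   # atomic read-modify-write
    end
    return s[]                          # s[] reads the stored value
end

# Sketch 2 (hypothetical name): each iteration sums its own chunk,
# so no two iterations ever write to the same memory location.
function chunked_sum(A)
    nchunks = nthreads()
    partial = zeros(Float64, nchunks)   # one slot per chunk
    chunk = cld(length(A), nchunks)     # chunk size, rounded up
    @threads for c in 1:nchunks
        lo = (c - 1) * chunk + 1
        hi = min(c * chunk, length(A))
        acc = 0.0                       # local to this iteration
        for i in lo:hi
            acc += A[i]
        end
        partial[c] = acc
    end
    return sum(partial)                 # combine the partial results serially
end

A = ones(10000)
atomic_sum(A), chunked_sum(A)
```

The atomic version stays closest to the original loop, but all threads still compete for the same variable, so it may well be slower than the serial sum. The chunked version avoids that contention at the cost of a little bookkeeping.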

# 5.4 So, should I use `@threads`?

Well, if your use case won't cause memory conflicts (for example in a Monte Carlo simulation, where the runs are pretty much independent), you may well use it - and profit from it!
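
To make the Monte Carlo case concrete, here is a minimal sketch that estimates π with several independent runs. The function names `estimate_pi` and `monte_carlo_pi`, the number of runs and the seeding scheme are all assumptions made for this example. Each run creates its own random number generator and writes only to its own entry of the result array, so the iterations never touch shared state.

```julia
using Base.Threads
using Random

# Hypothetical helper: one independent run with its own generator.
function estimate_pi(rng, n)
    hits = 0
    for _ in 1:n
        x, y = rand(rng), rand(rng)
        hits += (x^2 + y^2 <= 1)    # count points inside the quarter circle
    end
    return 4 * hits / n
end

# Hypothetical driver: `nruns` independent runs of `n` points each.
function monte_carlo_pi(nruns, n)
    estimates = zeros(nruns)
    @threads for k in 1:nruns
        rng = MersenneTwister(k)    # per-run generator; seeding with k is an assumption
        estimates[k] = estimate_pi(rng, n)   # each run writes only its own slot
    end
    return estimates
end

estimates = monte_carlo_pi(8, 100_000)
sum(estimates) / length(estimates)  # average of the independent estimates
```

Because every run is seeded explicitly, the estimates come out the same regardless of how many threads are used, which makes it easy to compare a serial and a threaded execution.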

There are of course more advanced parallelization options available in Julia, but I hope that these simple examples helped to demonstrate how simple yet powerful Julia can be. 