# Parallel Computing
Julia offers several ways to run computations in parallel: built-in multi-threading and distributed computing across multiple processes. We'll touch upon the basics here to give you an idea of what's possible.

### Threading

Threading is built into Julia nowadays. We'll ignore tasks here and go straight to speeding up computations. First, check how many threads this notebook's Julia session has available:

In [4]:
Threads.nthreads()

8
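The thread count is fixed when Julia starts, so it cannot be raised from inside a running session. A minimal sketch of a check that tells you when you are running single-threaded; the wording of the messages is my own, while `--threads` and `JULIA_NUM_THREADS` are the standard ways to set the count at startup:

```julia
# Number of threads this Julia session was started with.
n = Threads.nthreads()

if n == 1
    # With one thread, Threads.@threads loops still run, just not in parallel.
    @warn "Only one thread available; start Julia with `julia --threads auto` or set JULIA_NUM_THREADS before launching."
else
    @info "Running with $n threads"
end
```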

Each thread has its own id, which `Threads.threadid()` returns. Let's fill an array in parallel and record which thread wrote each element.

In [5]:
a = zeros(Threads.nthreads()*2)
Threads.@threads for i = 1:length(a)
   a[i] = Threads.threadid()
end
a

16-element Vector{Float64}:
 1.0
 1.0
 2.0
 2.0
 3.0
 3.0
 4.0
 4.0
 5.0
 5.0
 6.0
 6.0
 7.0
 7.0
 8.0
 8.0

However, threads are not simple, because they introduce so-called race conditions. Each thread does its own thing without synchronizing with the other threads. They can all modify the same value, or read values out of order, leading to unpredictable results.

In [14]:
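total = 0  # plain Int shared by all threads (named `total` to avoid shadowing Base.sum)
Threads.@threads for i in 1:10000
  global total += 1  # unsynchronized read-modify-write: updates get lost
end
total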

7405

You can prevent this by making the sum an `Atomic` value, so that each update is applied as one indivisible step and only one thread modifies it at a time. Another way is to synchronize the threads with a lock, but that introduces more overhead.

In [15]:
total = Threads.Atomic{Int}(0)  # atomic counter (named `total` to avoid shadowing Base.sum)
Threads.@threads for i in 1:10000
    Threads.atomic_add!(total, 1)  # indivisible increment
end
total


Base.Threads.Atomic{Int64}(10000)
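For comparison, here is a sketch of the two alternatives to an atomic counter: guarding the update with a lock, and giving each task its own partial sum that is combined at the end. The function names `locked_count` and `chunked_count` are made up for this illustration, and `n` is assumed to be at least 1.

```julia
# Variant 1: synchronize with a lock. Correct, but every increment
# has to acquire the lock, so threads spend most of their time waiting.
function locked_count(n)
    total = 0
    lk = ReentrantLock()
    Threads.@threads for i in 1:n
        lock(lk) do
            total += 1
        end
    end
    return total
end

# Variant 2: no shared state at all. Split the range into one chunk per
# thread, count each chunk in its own task, then add the partial results.
function chunked_count(n)
    chunks = Iterators.partition(1:n, cld(n, Threads.nthreads()))
    tasks = map(chunks) do chunk
        Threads.@spawn begin
            s = 0
            for _ in chunk
                s += 1
            end
            s
        end
    end
    return reduce(+, map(fetch, tasks))
end

locked_count(10_000), chunked_count(10_000)
```

Both variants give the same result on every run. The chunked version usually scales better because the threads never touch the same variable.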

### Distributed
Instead of running threads, you can also run multiple Julia processes and let them communicate (or combine both approaches).
Threads share the memory of a single process, but separate processes do not: each process has its own memory, so data has to be sent between them explicitly.

https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1


Let's add two new worker processes, which can be used for computations.

In [16]:
using Distributed

In [17]:
addprocs(2)

2-element Vector{Int64}:
 2
 3
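Since every process has its own memory, the master has to ask a worker explicitly to run something and send back the result. A small sketch using `remotecall_fetch`; the guard around `addprocs` is my assumption, so that the snippet also runs in a fresh session:

```julia
using Distributed

# Assumption: add two workers only if this session has none yet.
nprocs() == 1 && addprocs(2)

# Ask every worker for its own process id. `myid` runs on the worker,
# and only the returned integer is sent back to the master.
for w in workers()
    id = remotecall_fetch(myid, w)
    println("worker ", w, " reports myid() = ", id)
end
```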

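We can use the `@distributed` macro to distribute a for loop over all worker processes. Each worker gets its own copy of the variables used in the loop, so writes to an ordinary Array never reach the master process. To write into the same array from every process, we use a `SharedArray` from the `SharedArrays` standard library. Note that without a reducer, `@distributed` returns a `Task` immediately and runs the loop asynchronously.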

In [31]:
using SharedArrays

a = SharedArray{Float64}(10)
@info a  # all zeros before the loop has run

@distributed for i = 1:10
    a[i] = i
end

┌ Info: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
└ @ Main In[31]:4


Task (runnable) @0x0000000119ac28c0

In [30]:
a

10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
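The loop above returns a `Task` right away, so reading `a` before that task has finished (or before the loop cell was run at all) can still show zeros. A minimal sketch that waits for the loop with `@sync`, and that also shows the reducer form of `@distributed`, which blocks until the result is available. The names `b` and `sumsq` are chosen for this illustration only:

```julia
using Distributed

# Assumption: add two workers only if this session has none yet.
nprocs() == 1 && addprocs(2)
@everywhere using SharedArrays

b = SharedArray{Float64}(10)

# @sync blocks until every worker has finished its part of the loop.
@sync @distributed for i in 1:10
    b[i] = i
end

# With a reducer, @distributed combines the last value of each iteration
# and returns the final result instead of a Task.
sumsq = @distributed (+) for i in 1:10
    i^2
end
```

After this cell, `b` holds the values 1.0 through 10.0 and `sumsq` is 385.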

For longer-running tasks, we can use `pmap`. It takes a function and an iterable, and hands the elements to the workers one at a time. A function defined only on the master process is unknown to the workers, so we use `@everywhere` to define it on all processes.

In [44]:
addprocs(100)  # don't repeat this cell too much!

@everywhere function slowtask(_)
    sleep(5)
    getpid()
end

A = rand(100)

@time pmap(slowtask, A)

  5.741226 seconds (97.52 k allocations: 3.866 MiB, 0.61% gc time, 1.50% compilation time)


100-element Vector{Int32}:
 40089
 40082
 40090
 40095
 40086
 40092
 40084
 40085
 40087
 40083
 40094
 40091
 40121
     ⋮
 40165
 40176
 40163
 40175
 40172
 40173
 40171
 40181
 40178
 40180
 40179
 40177

In [45]:
rmprocs(workers())

Task (done) @0x000000011fa3ecb0
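The timing above (about 5.7 seconds for 100 five-second tasks on 100 workers) follows from every worker handling one task at the same time. With fewer workers than tasks, the wall time grows to roughly the number of rounds times the task duration. A scaled-down sketch, where `quicktask` is a hypothetical stand-in for `slowtask` and the worker count of 2 is arbitrary:

```julia
using Distributed

ws = addprocs(2)  # two extra workers, just for this example

# Define the function on every process so the workers can call it.
@everywhere quicktask(x) = (sleep(0.5); x^2)

t0 = time()
results = pmap(quicktask, 1:4)  # results come back in input order
elapsed = time() - t0

rmprocs(ws)  # remove only the workers added above

println("results = ", results, ", elapsed ≈ ", round(elapsed, digits = 2), " s")
```

If these two are the only workers, the four half-second tasks need two rounds, so expect about one second plus startup and compilation overhead.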