# Parallel and Distributed Computing

<div style="border:2px solid gray; padding:10px; width: 95%;">

💡 **Distributed Parallelism vs Shared-Memory Parallelism**

- **Distributed Parallelism**: A computing paradigm where a collection of independent computers (nodes), typically interconnected through a network, work together on a task. Each node operates using its own local memory and communicates with other nodes to achieve a common goal.

- **Shared-Memory Parallelism**: A computing model where multiple processors (cores) within a single machine access a common shared memory space, allowing for high-speed data exchange and coordination between the processors.

</div>

## Shared-Memory Parallelism

### Out of the Box

Many of Julia's linear algebra routines call a multithreaded BLAS library (OpenBLAS by default, or MKL when loaded through MKL.jl), so operations such as dense matrix multiplication already use multiple cores without any extra code.

**Example:** Parallel matrix multiplication.

In [1]:
# Create two large random matrices
A = rand(10000, 10000)
B = rand(10000, 10000)


10000×10000 Matrix{Float64}:
 0.0880308  0.462746   0.0349101   …  0.814143  0.711997  0.362411
 0.551543   0.991685   0.478472       0.876353  0.143858  0.163324
 0.682887   0.728967   0.0768402      0.1669    0.31213   0.672503
 0.907196   0.114813   0.882163       0.576486  0.147196  0.307042
 0.545289   0.854711   0.266293       0.389206  0.540322  0.921728
 0.541738   0.745926   0.00142832  …  0.207691  0.265996  0.71208
 0.942256   0.486721   0.439421       0.363547  0.800713  0.0898456
 0.0213028  0.846825   0.380308       0.651782  0.231789  0.377306
 0.0188436  0.264061   0.14944        0.915886  0.944509  0.239387
 0.546724   0.272917   0.67564        0.497031  0.101275  0.321173
 ⋮                                 ⋱                      
 0.0209981  0.362316   0.760614       0.879661  0.690113  0.058412
 0.873728   0.0935441  0.211232       0.88765   0.776647  0.0422653
 0.74894    0.681661   0.149213       0.240969  0.602014  0.478466
 0.645563   0.544908   0.298986       0.

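This multiplication is dispatched to BLAS, which runs it in parallel on multiple CPU cores. The number of threads BLAS uses can be queried with `LinearAlgebra.BLAS.get_num_threads()`: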

In [2]:
C = A * B  


10000×10000 Matrix{Float64}:
 2515.78  2495.12  2501.34  2503.07  …  2505.85  2467.27  2508.26  2482.7
 2505.42  2478.15  2483.36  2499.53     2493.8   2461.16  2509.98  2471.6
 2491.55  2492.19  2493.18  2491.78     2508.48  2487.95  2510.86  2485.45
 2505.19  2495.36  2513.08  2491.19     2519.35  2490.09  2528.75  2493.04
 2530.33  2518.57  2512.8   2516.23     2523.52  2509.21  2541.35  2506.34
 2479.6   2493.09  2482.24  2483.6   …  2492.3   2465.81  2508.81  2470.98
 2526.61  2509.97  2537.84  2507.06     2519.31  2505.89  2545.45  2507.47
 2527.82  2504.72  2526.28  2500.57     2525.36  2497.77  2536.79  2505.71
 2513.64  2513.99  2502.87  2491.04     2511.22  2497.12  2526.45  2498.72
 2512.05  2527.01  2508.93  2490.9      2510.17  2493.03  2525.36  2494.2
    ⋮                                ⋱                             
 2555.22  2547.32  2556.49  2540.53     2550.73  2531.81  2568.95  2534.09
 2487.69  2480.9   2464.52  2458.69     2498.29  2466.57  2494.53  2483.58
 2509.
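
How many threads BLAS actually uses depends on its configuration, not only on the number of cores. The following is a minimal sketch, assuming Julia 1.6 or later, where `BLAS.get_num_threads` is available. The matrix size and the names `X`, `Y`, `t_single`, and `t_multi` are illustrative and not part of the original notebook.

```julia
using LinearAlgebra

# Query the number of threads the BLAS backend is currently configured to use
original_threads = BLAS.get_num_threads()
println("BLAS threads: ", original_threads)

# Small matrices keep this sketch fast (size chosen for illustration only)
X = rand(500, 500)
Y = rand(500, 500)

# Time the product with a single BLAS thread ...
BLAS.set_num_threads(1)
X * Y                                  # warm-up run, excludes compilation time
t_single = @elapsed X * Y

# ... and again with the original thread count
BLAS.set_num_threads(original_threads)
t_multi = @elapsed X * Y

println("1 thread: $(t_single) s, $(original_threads) threads: $(t_multi) s")
```

Timings vary between machines, and for small matrices the multithreaded run is not necessarily faster.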

## Distributed Parallelism

The `Distributed` package in Julia provides functionality for parallel and distributed computing, including:

- Management of worker processes.
- Remote execution of functions.
- Inter-process communication.
- Parallel execution of loops and tasks.
- Data movement and aggregation across workers.
- Asynchronous programming support.
- Error handling in a distributed environment.
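
A few of the capabilities listed above can be sketched in a handful of lines. This is a minimal illustration, assuming two local worker processes are sufficient. The variable names (`demo_workers`, `w`, `squares`, and so on) are chosen here and do not appear elsewhere in the notebook.

```julia
using Distributed

# Assumption: two local worker processes are enough for this illustration
demo_workers = addprocs(2)
w = first(demo_workers)

# Remote execution: run `+` on worker `w` and wait for the result
sum_on_worker = remotecall_fetch(+, w, 1, 2)

# Asynchronous execution: `@spawnat` returns a Future immediately,
# `fetch` blocks until the value is available
future_id = @spawnat w myid()
id_on_worker = fetch(future_id)

# Parallel map: distribute the function calls over the available workers
squares = pmap(x -> x^2, 1:5)

println((sum_on_worker, id_on_worker == w, squares))

# Remove the demo workers again so that later cells start from a clean state
rmprocs(demo_workers)
```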

In [3]:
using Pkg
Pkg.add("Distributed")  # Distributed ships with Julia as a standard library, so this is usually a no-op

   Resolving package versions...
  No Changes to `~/Documents/Work/Training/point8/data-science-learning-paths/Project.toml`
  No Changes to `~/Documents/Work/Training/point8/data-science-learning-paths/Manifest.toml`


### An Embarrassingly Parallel Example

<div style="border:2px solid gray; padding:10px; width: 95%;">


💡 **Estimating $\pi$ via Monte Carlo approximation**

Curious why this works? Read more on [how to calculate $\pi$ via Monte Carlo approximation](https://curiosity-driven.org/pi-approximation).

</div>
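
The idea can be stated in one line. A point $(x, y)$ drawn uniformly from the unit square lands inside the quarter of the unit circle with probability equal to the area of that quarter circle, so the fraction of hits estimates $\pi/4$. Here $N$ is the number of samples and $N_{\text{inside}}$ the number of hits (notation introduced for this sketch):

$$
P\left(x^2 + y^2 \le 1\right) = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4\,\frac{N_{\text{inside}}}{N}
$$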

In [21]:
function calculate_pi(n)
    inside = 0
    for i = 1:n
        x = rand()
        y = rand()
        inside += (x^2 + y^2) <= 1.0 ? 1 : 0
    end
    return 4 * inside / n
end

calculate_pi (generic function with 1 method)

In [16]:
calculate_pi(10^10)  # integer argument, so that 1:n is an integer range

3.1415852016

In [6]:
using Distributed

# Add one worker process per logical CPU core (Sys.CPU_THREADS)
addprocs(Sys.CPU_THREADS)


4-element Vector{Int64}:
 2
 3
 4
 5
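
The state of the worker pool can be inspected at any time. The following is a small sketch, assuming two additional workers are added only for the demonstration. The names `extra`, `procs_with_extra`, and `workers_with_extra` are illustrative.

```julia
using Distributed

# Assumption: two extra workers, added only for this illustration
extra = addprocs(2)

println("processes:  ", nprocs())     # main process plus all workers
println("workers:    ", nworkers())   # worker processes only
println("worker ids: ", workers())

# Remember the counts, then remove the extra workers again
procs_with_extra = nprocs()
workers_with_extra = nworkers()
rmprocs(extra)
```

With at least one worker present, `nprocs()` is always one larger than `nworkers()`, because the main process (id 1) is not counted as a worker.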

The `@everywhere` macro evaluates an expression on every process, that is, on the main process and on all worker processes. In the output of the next cell, the greeting is therefore printed once by process 1 and once by each worker.

In [7]:
@everywhere println("Hello from process $(myid())")


Hello from process 1
      From worker 4:	Hello from process 4
      From worker 3:	Hello from process 3
      From worker 2:	Hello from process 2
      From worker 5:	Hello from process 5


Each process has its own separate workspace and does not automatically see the functions and variables defined on the main process. Prefixing a function definition with `@everywhere` defines the function on all processes, so that the workers can call it.

In [24]:
@everywhere begin
    """
        count_inside(n::Int)

    Generate `n` random points in the unit square and count how many fall inside the quarter of the unit circle.
    A point (x, y) counts as inside if x^2 + y^2 <= 1.

    # Arguments
    - `n::Int`: The number of random points to generate.

    # Returns
    - `Int`: The count of points that fall inside the unit circle.
    """
    function count_inside(n::Int)
        inside = 0
        for i = 1:n
            x = rand()
            y = rand()
            inside += (x^2 + y^2) <= 1.0 ? 1 : 0
        end
        return inside
    end
end
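
The separation between workspaces can be observed directly. This is a minimal sketch, assuming a single extra worker is added for the demonstration. The functions `main_only_fn` and `shared_fn` are hypothetical names used only here.

```julia
using Distributed

# Assumption: a single extra worker, added only for this illustration
extra = addprocs(1)
w = first(extra)

# Defined on the main process only
main_only_fn(x) = x + 1

# Defined on every process, including the new worker
@everywhere shared_fn(x) = 2x

# Ask worker `w` whether a global name is defined in its own workspace
shared_known = remotecall_fetch(s -> isdefined(Main, s), w, :shared_fn)
main_only_known = remotecall_fetch(s -> isdefined(Main, s), w, :main_only_fn)

println("shared_fn known on worker:    ", shared_known)      # true
println("main_only_fn known on worker: ", main_only_known)   # false

rmprocs(extra)
```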


In [27]:

"""
    calculate_pi_parallel(total_points::Int)

Calculate an estimate of π using the Monte Carlo method, in parallel.

The function distributes the task of generating random points and checking whether they fall
inside the quarter of a unit circle across multiple worker processes. It then collects the
results from all workers to calculate the final estimate of π.

# Arguments
- `total_points::Int`: The total number of random points to use for the estimation.

# Returns
- `Float64`: An estimate of π.
"""
function calculate_pi_parallel(total_points::Int)
    # Split the work across the workers
    points_per_worker = div(total_points, nworkers())
    remaining_points = total_points % nworkers()
    
    # Use @distributed for parallel reduction, summing up the results from each worker
    inside_total = @distributed (+) for i = 1:nworkers()
        # Handle any remaining points in the last worker
        if i == nworkers()
            count_inside(points_per_worker + remaining_points)
        else
            count_inside(points_per_worker)
        end
    end
    
    # Calculate pi using the aggregated result
    return 4 * inside_total / total_points
end

calculate_pi_parallel
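
The work split used in `calculate_pi_parallel` can be checked in isolation. The helper `chunk_sizes` below is a hypothetical function written for this illustration. It mirrors the same arithmetic: every chunk gets the base size and the last chunk also takes the remainder.

```julia
# Hypothetical helper mirroring the split used in `calculate_pi_parallel`:
# every chunk gets the same base size, the last chunk also takes the remainder
function chunk_sizes(total_points::Int, n_chunks::Int)
    base, remainder = divrem(total_points, n_chunks)
    sizes = fill(base, n_chunks)
    sizes[end] += remainder
    return sizes
end

sizes = chunk_sizes(10, 4)
println(sizes)        # [2, 2, 2, 4]
println(sum(sizes))   # 10
```

Because the chunk sizes always sum to `total_points`, no sample is lost or counted twice, whatever the number of workers.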

In [30]:

n = Int(1e10)  
pi_estimate = calculate_pi_parallel(n)

println("Estimate of π: $pi_estimate")

Estimate of π: 3.1415767032
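
Both estimates carry statistical noise. As a rough check, each sample is a Bernoulli trial with success probability $p = \pi/4$, so the standard error of the estimate $\hat{\pi}$ (notation introduced here) after $N$ samples is approximately:

$$
\sigma_{\hat{\pi}} = 4\sqrt{\frac{p\,(1-p)}{N}} \approx \frac{1.64}{\sqrt{N}},
\qquad
N = 10^{10} \;\Rightarrow\; \sigma_{\hat{\pi}} \approx 1.6 \times 10^{-5}
$$

The serial and parallel results above deviate from $\pi$ by roughly $0.7 \times 10^{-5}$ and $1.6 \times 10^{-5}$ respectively, which is consistent with this estimate.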


---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2024 [Point 8 GmbH](https://point-8.de)_