# Parallel and Distributed Computing

<div style="border:2px solid gray; padding:10px; width: 95%;">

💡 **Distributed Parallelism vs Shared-Memory Parallelism**

- **Distributed Parallelism**: A computing paradigm where a collection of independent computers (nodes), typically interconnected through a network, work together on a task. Each node operates using its own local memory and communicates with other nodes to achieve a common goal.

- **Shared-Memory Parallelism**: A computing model where multiple processors (cores) within a single machine access a common shared memory space, allowing for high-speed data exchange and coordination between the processors.

</div>
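
As a rough illustration of the shared-memory model, the sketch below sums a vector with Julia's built-in threads. The function name `threaded_sum` is hypothetical, and the example assumes Julia was started with more than one thread (for example `julia -t 4`); with a single thread it still runs, just without any parallelism.

```julia
using Base.Threads

# Hypothetical helper: each thread sums one contiguous chunk of the shared vector
function threaded_sum(xs::Vector{Float64})
    nchunks = nthreads()
    partial = zeros(Float64, nchunks)        # one slot per chunk, so no two tasks write the same entry
    chunk_len = cld(length(xs), nchunks)
    @threads for c in 1:nchunks
        lo = (c - 1) * chunk_len + 1
        hi = min(c * chunk_len, length(xs))
        s = 0.0
        for i in lo:hi
            s += xs[i]
        end
        partial[c] = s
    end
    return sum(partial)
end

xs = collect(1.0:1000.0)
println("Threads available: ", nthreads())
println("Sum: ", threaded_sum(xs))
```

All threads read the same `xs` array directly, which is exactly what distinguishes this model from the distributed one, where data has to be sent between processes.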

## Shared-Memory Parallelism

### Out of the Box

Julia's linear algebra routines call a multithreaded BLAS library (OpenBLAS by default, or MKL via the `MKL.jl` package), so many matrix operations already use multiple cores without any extra code.

**Example:** Parallel matrix multiplication.

In [None]:
# Create two large random matrices (each 10000×10000 Float64 matrix needs about 800 MB)
A = rand(10000, 10000)
B = rand(10000, 10000)


This multiplication is dispatched to the multithreaded BLAS library, so it runs in parallel on multiple CPU cores (how many depends on the BLAS thread setting):

In [None]:
@time C = A * B  
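
To see how much of the speed comes from BLAS threading, one option is to compare the default setting with a single BLAS thread. This is only a sketch: the matrix size of 1000 is an arbitrary choice to keep the run short, and the timings will vary by machine.

```julia
using LinearAlgebra

original = BLAS.get_num_threads()
println("BLAS threads: ", original)

M = rand(1000, 1000)           # smaller than the matrices above to keep the comparison quick

BLAS.set_num_threads(1)
@time M * M                    # single-threaded BLAS

BLAS.set_num_threads(original)
@time M * M                    # back to the original thread count
```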


## Distributed Parallelism

The `Distributed` standard library in Julia provides functionality for parallel and distributed computing, including:

- Management of worker processes.
- Remote execution of functions.
- Inter-process communication.
- Parallel execution of loops and tasks.
- Data movement and aggregation across workers.
- Asynchronous programming support.
- Error handling in a distributed environment.
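
A few of these capabilities can be sketched in a handful of lines. The helper `square` is hypothetical, and the choice of two local workers is arbitrary; the workers are removed again at the end so the sketch does not affect later cells.

```julia
using Distributed

# Worker management: start two local workers (an arbitrary number for this sketch)
new_workers = addprocs(2)

# Define a (hypothetical) helper on every process
@everywhere square(x) = x^2

# Remote execution: run square(7) on one worker and fetch the result
future = remotecall(square, first(new_workers), 7)
println(fetch(future))            # 49

# Parallel map over a collection
squares = pmap(square, 1:5)
println(squares)                  # [1, 4, 9, 16, 25]

# Parallel loop with a (+) reduction
total = @distributed (+) for i in 1:100
    square(i)
end
println(total)                    # 338350

# Clean up the workers started above
rmprocs(new_workers)
```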

In [None]:
# Distributed ships with Julia as a standard library, so no installation is needed
using Distributed

### An Embarrassingly Parallel Example

<div style="border:2px solid gray; padding:10px; width: 95%;">


💡 **Estimating $\pi$ via Monte Carlo approximation**

Curious why this works? Read more on [how to calculate $\pi$ via Monte Carlo approximation](https://curiosity-driven.org/pi-approximation).

</div>
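
In short, the idea can be written as follows, assuming points $(x, y)$ are drawn uniformly from the unit square $[0, 1) \times [0, 1)$, with $N$ the number of points and $N_{\text{inside}}$ the number that satisfy $x^2 + y^2 \le 1$ (these symbols are introduced here for illustration):

$$
P\left(x^2 + y^2 \le 1\right) = \frac{\text{area of quarter circle}}{\text{area of unit square}} = \frac{\pi / 4}{1} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{inside}}}{N}
$$

The approximation improves as $N$ grows, which is why the examples below use a very large number of points.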

In [None]:
function calculate_pi(n)
    inside = 0
    for i = 1:n
        x = rand()
        y = rand()
        inside += (x^2 + y^2) <= 1.0 ? 1 : 0
    end
    return 4 * inside / n
end

In [None]:
@time calculate_pi(Int(1e10))

In [None]:
using Distributed

# Add one worker process per logical CPU thread
addprocs(Sys.CPU_THREADS)
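
To confirm what `addprocs` did, the process counts and worker IDs can be inspected. This is a small sketch using standard `Distributed` functions; the numbers printed depend on the machine.

```julia
using Distributed

println("Total processes:  ", nprocs())    # main process plus workers
println("Worker processes: ", nworkers())
println("Worker IDs:       ", workers())
```

The main process always has ID 1. When no workers have been added, `nworkers()` still reports 1 because the main process then acts as the only worker.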


The `@everywhere` macro evaluates an expression on every process, that is, on the main process and on all workers. This makes it the standard way to load packages or define functions and variables that the workers need.

In [None]:
@everywhere println("Hello from process $(myid())")


Prefixing a function definition with `@everywhere` defines the function on all processes. This is necessary because each process has its own separate workspace: functions and variables defined on the main process are not automatically available on the workers.

In [None]:
@everywhere begin
    """
        count_inside(n::Int)

    Generate `n` random points uniformly in the unit square [0, 1) × [0, 1) and count how many fall inside the unit circle.
    A point (x, y) is inside the unit circle if x^2 + y^2 <= 1.

    # Arguments
    - `n::Int`: The number of random points to generate.

    # Returns
    - `Int`: The count of points that fall inside the unit circle.
    """
    function count_inside(n::Int)
        inside = 0
        for i = 1:n
            x = rand()
            y = rand()
            inside += (x^2 + y^2) <= 1.0 ? 1 : 0
        end
        return inside
    end
end
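
The need for `@everywhere` on function definitions can be checked directly. In the sketch below, the names `local_only`, `everywhere_defined`, and `runs_on_worker` are hypothetical, and a worker is added only if none exists yet.

```julia
using Distributed

# Make sure there is at least one worker to talk to
nprocs() == 1 && addprocs(1)
w = first(workers())

# Defined on the main process only
local_only(x) = x + 1

# Defined on the main process and on every worker
@everywhere everywhere_defined(x) = x + 1

# Returns true if worker `w` can execute `f(1)`, false if the remote call fails
function runs_on_worker(f, w)
    try
        remotecall_fetch(f, w, 1)
        return true
    catch
        return false
    end
end

println("local_only runs on worker:         ", runs_on_worker(local_only, w))
println("everywhere_defined runs on worker: ", runs_on_worker(everywhere_defined, w))
```

The first call is expected to fail because the worker has never seen a definition of `local_only`, while the second succeeds.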


In [None]:

"""
    calculate_pi_parallel(total_points::Int)

Calculate an estimate of π using the Monte Carlo method, in parallel.

The function distributes the task of generating random points and checking whether they fall
inside the quarter of a unit circle across multiple worker processes. It then collects the
results from all workers to calculate the final estimate of π.

# Arguments
- `total_points::Int`: The total number of random points to use for the estimation.

# Returns
- `Float64`: An estimate of π.
"""
function calculate_pi_parallel(total_points::Int)
    # Split the work across the workers
    points_per_worker = div(total_points, nworkers())
    remaining_points = total_points % nworkers()
    
    # Use @distributed for parallel reduction, summing up the results from each worker
    inside_total = @distributed (+) for i = 1:nworkers()
        # Handle any remaining points in the last worker
        if i == nworkers()
            count_inside(points_per_worker + remaining_points)
        else
            count_inside(points_per_worker)
        end
    end
    
    # Calculate pi using the aggregated result
    return 4 * inside_total / total_points
end

In [None]:

n = Int(1e10)  
@time pi_estimate = calculate_pi_parallel(n)

println("Estimate of π: $pi_estimate")
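
As an alternative formulation, the same estimate can be sketched with `pmap` instead of `@distributed`. This is not the approach used above, only a variant under stated assumptions: the names `count_inside_sketch` and `calculate_pi_pmap` are hypothetical, the leftover points are spread over the first chunks rather than added to the last one, and two workers are started only if none exist.

```julia
using Distributed

# Start two workers only if none are running yet (arbitrary number for this sketch)
nprocs() == 1 && addprocs(2)

# Same counting logic as above, under a different name to avoid clashing with it
@everywhere function count_inside_sketch(n::Int)
    inside = 0
    for _ in 1:n
        inside += (rand()^2 + rand()^2 <= 1.0) ? 1 : 0
    end
    return inside
end

function calculate_pi_pmap(total_points::Int)
    k = nworkers()
    base, extra = divrem(total_points, k)
    # The first `extra` chunks get one additional point, so the chunks sum to total_points
    chunks = [base + (i <= extra ? 1 : 0) for i in 1:k]
    return 4 * sum(pmap(count_inside_sketch, chunks)) / total_points
end

println("Estimate of π: ", calculate_pi_pmap(10^6))
```

With one chunk per worker the two versions do essentially the same work; `pmap` becomes more attractive when there are many chunks of uneven cost, because it hands out chunks to workers as they become free.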

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2024 [Point 8 GmbH](https://point-8.de)_