# Lab 2. Parallel programming (Cheat sheet)

### Instructions
rubric={mechanics:3}

Follow the [general lab instructions](https://ubc-mds.github.io/resources_pages/general_lab_instructions/).

How many processes can you run in parallel on your machine?

In [48]:
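# import multiprocessing as mp  # standard-library version
import multiprocess as mp  # dill-based fork of multiprocessing that can also send functions defined in a notebook to worker processes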

print("Number of processors: ", mp.cpu_count())

Number of processors:  8
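`mp.cpu_count()` reports logical CPUs, so each hyperthread counts as one CPU. That number can be larger than the number of CPUs the current process is actually allowed to run on, for example when CPU affinity is restricted with `taskset` or a container's cpuset. A minimal sketch, using only the standard library, that compares the two where `os.sched_getaffinity` is available (e.g. Linux) and falls back to the logical count elsewhere:

```python
import multiprocessing
import os

# Logical CPUs in the machine; each hyperthread counts as one CPU.
logical_cpus = multiprocessing.cpu_count()

# CPUs this process may actually be scheduled on. os.sched_getaffinity exists
# only on some platforms (e.g. Linux), so fall back to the logical count elsewhere.
if hasattr(os, "sched_getaffinity"):
    usable_cpus = len(os.sched_getaffinity(0))
else:
    usable_cpus = logical_cpus

print("Logical CPUs:", logical_cpus)
print("CPUs usable by this process:", usable_cpus)
```

Starting more processes than `usable_cpus` is allowed, but the extra processes then take turns on the same cores instead of running simultaneously.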


## T.3 Estimating Pi Using the Monte Carlo Method

rubric={accuracy:4, quality:2}

We can estimate pi by throwing thousands of imaginary darts into a “dartboard” represented by a unit circle. The relationship between the number of darts falling inside the circle’s edge and the number falling outside it will allow us to approximate pi.

This is an ideal first problem, as we can split the total workload evenly across a number of processes, each one running on a separate CPU. Because every process receives an equal share of the work, the processes should finish at roughly the same time, so we can investigate the speedups available as we add more CPUs and hyperthreads to the problem.

We throw 10,000 darts into the unit square, and a percentage of them fall into the quarter of the unit circle that’s drawn. This estimate is rather poor: 10,000 dart throws do not reliably give us a three-decimal-place result. If you run your own code, you’ll see this estimate vary between roughly 3.0 and 3.2 from run to run.
To be confident of the first three decimal places, we need to generate 10,000,000 random dart throws. This is massively inefficient (and better methods for estimating pi exist), but it is a convenient way to demonstrate the benefits of parallelization using multiprocessing.
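To see why 10,000 darts are not enough, here is a small pure-Python sketch that repeats the 10,000-dart estimate 20 times and then computes the theoretical standard error. The seed, the number of repetitions and the helper name `estimate_pi` are arbitrary choices for illustration. Each dart lands inside the quarter circle with probability $p = \pi/4$, so the estimate $4 \cdot \text{hits}/N$ has standard error $4\sqrt{p(1-p)/N}$ (the sketch uses `math.pi` only to evaluate that formula).

```python
import math
import random
import statistics


def estimate_pi(nbr_darts, rng):
    """Throw `nbr_darts` darts into the unit square and return 4 * (fraction inside)."""
    hits = 0
    for _ in range(nbr_darts):
        x, y = rng.random(), rng.random()
        hits += x * x + y * y <= 1  # a True comparison adds 1 to the count
    return 4 * hits / nbr_darts


rng = random.Random(0)  # fixed seed so the run is reproducible
estimates = [estimate_pi(10_000, rng) for _ in range(20)]
print(f"10,000 darts x 20 runs: min={min(estimates):.4f}, "
      f"max={max(estimates):.4f}, stdev={statistics.stdev(estimates):.4f}")

# Each dart is a Bernoulli trial with success probability p = pi / 4, so the
# estimate 4 * hits / N has standard error 4 * sqrt(p * (1 - p) / N).
p = math.pi / 4
for n in (10_000, 10_000_000):
    print(f"N = {n:>10,}: standard error ~ {4 * math.sqrt(p * (1 - p) / n):.5f}")
```

Because the standard error shrinks with $1/\sqrt{N}$, going from $10^4$ to $10^7$ darts reduces it by a factor of $\sqrt{1000} \approx 32$.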

With the Monte Carlo method, we use the Pythagorean theorem to test whether a dart has landed inside our circle:

$$
    x^2 + y^2 \leq 1^2 = 1
$$

![Alt text](image.png)

$\pi$ can be estimated as follows.

$$
\frac{\pi r^2}{4} : r^2 = \text{area of quarter circle} : \text{area of square}
$$

$$
\pi = 4 \times \frac{\text{area of quarter circle}}{\text{area of square}}
$$
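Because the darts land uniformly in the unit square ($r = 1$), the fraction of darts that lands inside the quarter circle approximates the ratio of the two areas. This turns the area formula into a formula over dart counts, which is what the code computes:

$$
\pi \approx 4 \times \frac{N_{\text{inside}}}{N_{\text{total}}}
$$

For example, the sequential run in this notebook reports `Estimated pi 3.14138636` for $N_{\text{total}} = 10^8$, which corresponds to $N_{\text{inside}} = 3.14138636 \times 10^8 / 4 = 78{,}534{,}659$ darts.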

In [2]:
import os
import random
import time
from multiprocessing import Pool

def estimate_nbr_points_in_quarter_circle(nbr_estimates):
    """Monte Carlo estimate of the number of points in a quarter circle using pure Python"""
    print(f"Executing estimate_nbr_points_in_quarter_circle with {nbr_estimates:,} on pid {os.getpid()}")
    nbr_trials_in_quarter_unit_circle = 0
    for step in range(int(nbr_estimates)):
        x = random.uniform(0, 1)
        y = random.uniform(0, 1)
        is_in_unit_circle = # YOUR CODE HERE
        nbr_trials_in_quarter_unit_circle += is_in_unit_circle

    return nbr_trials_in_quarter_unit_circle

In [3]:
# `is_in_unit_circle` should test whether the point (x, y) lies within distance 1 of the origin.
# Comparing the squared distance avoids a square root:  x**2 + y**2 <= 1**2 = 1

In [42]:
true_false = 0 
true_false += True 
print(true_false)
true_false = 0
true_false += False 
print(true_false)

1
0
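This works because `bool` is a subclass of `int` in Python: `True` behaves as `1` and `False` as `0` in arithmetic. A short sketch of how that lets a comparison be added straight into a hit counter (the three sample points are made up for illustration):

```python
# bool is a subclass of int, so True == 1 and False == 0 in arithmetic.
print(isinstance(True, int))  # True
print(True + True)            # 2

# Count points inside the quarter unit circle by summing the comparison results.
points = [(0.1, 0.2), (0.9, 0.9), (0.5, 0.5)]
hits = sum(x * x + y * y <= 1 for x, y in points)
print(hits)                   # 2, because (0.9, 0.9) is outside: 0.81 + 0.81 > 1
```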


In [44]:
nbr_samples_in_total = 1e8
print(nbr_samples_in_total)

100000000.0
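`1e8` is a float literal. That is why the estimator converts with `int(nbr_estimates)` before calling `range` (which only accepts integers), and why its log line shows `100,000,000.0`. A quick check:

```python
n_float = 1e8     # scientific notation always gives a float
n_int = 10 ** 8   # an integer power gives an int

print(type(n_float).__name__, type(n_int).__name__)  # float int

try:
    range(n_float)  # range() only accepts integers
except TypeError as err:
    print("TypeError:", err)

print(int(n_float) == n_int)  # True: 1e8 is exactly representable as a float
print(f"{n_float:,}")         # 100,000,000.0
print(f"{n_int:,}")           # 100,000,000
```

The same applies when splitting the work across processes: `nbr_samples_in_total / nbr_parallel_blocks` produces a float, while `//` on integers keeps the result an integer.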


In [5]:
nbr_samples_in_total = 1e8

t1 = time.time()
nbr_in_quarter_unit_circles = estimate_nbr_points_in_quarter_circle(nbr_samples_in_total)
pi_estimate = nbr_in_quarter_unit_circles*4/float(nbr_samples_in_total)
print("Estimated pi", pi_estimate)
print("Delta:", time.time() - t1)

Executing estimate_nbr_points_in_quarter_circle with 100,000,000.0 on pid 19201
Estimated pi 3.14138636
Delta: 67.95222115516663


Please implement a parallelized program to estimate pi. 

In [6]:
nbr_parallel_blocks = 10  # or mp.cpu_count()

t1 = time.time()  # restart the timer for the parallel run
with Pool(processes=nbr_parallel_blocks) as P:
    # YOUR CODE HERE

    # Use `P.map`:
    #   function: `estimate_nbr_points_in_quarter_circle`
    #   iterable: `[nbr_samples, nbr_samples, ..., nbr_samples]` with len == nbr_parallel_blocks (== 10),
    #             where nbr_samples = nbr_samples_in_total / nbr_parallel_blocks
    #   it returns `[nbr_in_quarter_unit_circle, ..., nbr_in_quarter_unit_circle]`,
    #             also of len == nbr_parallel_blocks
    # Then compute `pi_estimate = nbr_in_quarter_unit_circles * 4 / float(nbr_samples_in_total)`,
    #   where `nbr_in_quarter_unit_circles` is the `sum` of the `nbr_parallel_blocks` counts
    print("Estimated pi", pi_estimate)
    print("Delta:", time.time() - t1)
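One possible way to fill in the cell above, written as a standalone script rather than notebook cells. The names `count_in_quarter_circle` and `estimate_pi_parallel` are hypothetical helpers, not part of the lab skeleton. Two assumptions to note: it uses integer division so that every block gets the same whole number of darts (any remainder is dropped, and the estimate divides by the darts actually thrown), and it keeps the `Pool` under an `if __name__ == "__main__":` guard, which the `spawn` start method (the default on macOS and Windows) requires when running a script. In a notebook on those platforms, workers started with `spawn` cannot look up functions defined in a cell, which is why the lab imports `multiprocess` instead.

```python
import random
from multiprocessing import Pool


def count_in_quarter_circle(nbr_samples):
    """Throw `nbr_samples` random darts and count those inside the quarter unit circle."""
    rng = random.Random()  # independent generator per call, seeded from OS randomness
    hits = 0
    for _ in range(int(nbr_samples)):
        x, y = rng.random(), rng.random()
        hits += x * x + y * y <= 1  # True counts as 1, False as 0
    return hits


def estimate_pi_parallel(nbr_samples_in_total, nbr_parallel_blocks):
    """Split the darts evenly across worker processes and combine their counts."""
    nbr_samples_per_block = int(nbr_samples_in_total) // nbr_parallel_blocks
    blocks = [nbr_samples_per_block] * nbr_parallel_blocks  # one work item per process
    with Pool(processes=nbr_parallel_blocks) as pool:
        counts = pool.map(count_in_quarter_circle, blocks)  # blocks run in parallel
    nbr_thrown = nbr_samples_per_block * nbr_parallel_blocks
    return sum(counts) * 4 / nbr_thrown


if __name__ == "__main__":
    import time

    NBR_SAMPLES_IN_TOTAL = 10**6  # raise to 1e8 to compare with the sequential run
    t1 = time.time()
    pi_estimate = estimate_pi_parallel(NBR_SAMPLES_IN_TOTAL, 4)
    print("Estimated pi", pi_estimate)
    print("Delta:", time.time() - t1)
```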

#### Optional

Implement it using `joblib` (see the [joblib parallel tutorial](https://joblib.readthedocs.io/en/latest/parallel.html)).

In [8]:
from joblib import Parallel, delayed

# YOUR CODE HERE

In [9]:
# solution

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.


Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19229
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19230
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19232
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19233
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19235
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19231
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19234
Executing estimate_nbr_points_in_quarter_circle with 12,500,000.0 on pid 19236


[Parallel(n_jobs=8)]: Done   2 out of   8 | elapsed:   24.9s remaining:  1.2min
[Parallel(n_jobs=8)]: Done   8 out of   8 | elapsed:   25.3s finished


## Exercise 1

#### Generators and decorators

#### 1.1

rubric={accuracy:2, quality:1}

Write a **decorator** that measures the time it takes to run a given function. In this exercise:

- `start_time` is recorded just before the decorated function is called, and `end_time` just after it returns.
- The `execution_time` variable is calculated by subtracting `start_time` from `end_time`, giving the elapsed time.
- The execution time is printed to the console, indicating which function was measured and how long it took. This is done with the format string `f"Function {func.__name__} took {execution_time:.4f} seconds to execute"`; the `.4f` format specifier prints the time with four decimal places.
- Finally, the original function's result is returned.

The `@execution_time` syntax applies the `execution_time` decorator to the `calculate_multiply` function; it is equivalent to writing `calculate_multiply = execution_time(calculate_multiply)` after the function definition.
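A minimal sketch of such a decorator. It differs from the skeleton in two small, optional ways: it uses `time.perf_counter()`, which is intended for measuring short intervals, instead of `time.time()`, and it applies `functools.wraps` so that the wrapper keeps the decorated function's `__name__` and docstring.

```python
import functools
import time


def execution_time(func):
    """Decorator that prints how long each call to `func` takes."""
    @functools.wraps(func)  # copy func.__name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()  # clock intended for measuring short intervals
        result = func(*args, **kwargs)    # run the decorated function
        end_time = time.perf_counter()
        execution_time = end_time - start_time  # local name; shadows the decorator only inside wrapper
        print(f"Function {func.__name__} took {execution_time:.4f} seconds to execute")
        return result                     # return the original result unchanged
    return wrapper


@execution_time
def calculate_multiply(numbers):
    total = 1
    for x in numbers:
        total *= x
    return total


print("Result:", calculate_multiply([1, 2, 3, 4, 5]))  # Result: 120
```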

In [10]:
import time

def execution_time(func):
    def wrapper(*args, **kwargs):
        # YOUR CODE HERE
        # your_time = time.time()
        # 1. measure start time
        # 2. get a `result` by running `func(*args, **kwargs)`
        # 3. measure end time
        # 4. execution_time = end - start; print `execution_time`
        return result
    return wrapper

In [26]:
# Example usage
@execution_time
def calculate_multiply(numbers):
    total = 1
    for x in numbers:
        total *= x
    return total

# Call the decorated function
result = calculate_multiply([1, 2, 3, 4, 5])
print("Result:", result)

Function calculate_multiply took 0.0000 seconds to execute
Result: 120


#### 1.2

rubric={reasoning:2}

What are the advantages of a Python generator?

YOUR ANSWER HERE. 
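A small sketch that illustrates two commonly cited properties of generators, lazy evaluation and constant memory use, with a made-up `squares` generator. Exact byte counts from `sys.getsizeof` depend on the Python version and platform, so only the comparison between the list and the generator matters.

```python
import itertools
import sys


def squares(n):
    """Yield i * i for i in range(n), computing one value per request."""
    for i in range(n):
        yield i * i


squares_list = [i * i for i in range(1_000_000)]  # all one million values stored at once
squares_gen = squares(1_000_000)                   # nothing has been computed yet

print(sys.getsizeof(squares_list))  # grows with the number of elements
print(sys.getsizeof(squares_gen))   # small and independent of n

# Values are produced on demand, so even a huge range is cheap to start consuming.
print(list(itertools.islice(squares(10**18), 5)))  # [0, 1, 4, 9, 16]

# Aggregating directly from a generator never builds the intermediate list.
print(sum(squares(10)))  # 285
```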

## Exercise 2

#### Parallel programming

#### 2.1

rubric={accuracy:2,quality:1}

Use `map`, `filter` and `reduce` to calculate the sum of squares of the even numbers in a list.
Note: this exercise can be completed in one line of code, but this is not required.

In [13]:
from functools import reduce
# Input
numbers = [1, 2, 3, 4, 5, 6]

# Your code here
#   1. `filter` : keep the even numbers
#   2. `map`    : square them
#   3. `reduce` : sum the squares

# filter(function: x % 2 == 0, iterable: numbers)
# map(function: x ** 2, iterable: the filtered numbers)
# reduce(function: x + y, iterable: the mapped squares)

# Expected Output
# 56
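A sketch of the three-step pipeline, followed by the same computation as a single expression. Note that squaring in Python is written `x ** 2` or `x * x`; `x ^ 2` is bitwise XOR.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

evens = filter(lambda x: x % 2 == 0, numbers)        # 2, 4, 6
squares = map(lambda x: x * x, evens)               # 4, 16, 36
total = reduce(lambda acc, x: acc + x, squares, 0)  # 4 + 16 + 36
print(total)  # 56

# The same pipeline as one expression
total_one_line = reduce(lambda acc, x: acc + x,
                        map(lambda x: x * x, filter(lambda x: x % 2 == 0, numbers)), 0)
print(total_one_line)  # 56
```

The initial value `0` passed to `reduce` is optional here, but it keeps `reduce` from raising a `TypeError` if no even numbers are left after filtering.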

#### 2.2

rubric={accuracy:4,quality:2}

Given a list of vector pairs, each represented as a tuple of two equal-length vectors (tuples), calculate the dot product for each pair and then the sum of these dot products using `map` and `reduce`.

A dot product, also known as the inner product, is a mathematical operation that takes two vectors and returns a scalar value. In the context of this exercise, we calculate the dot product for each pair of vectors in the vector_pairs list. The dot product of two vectors, A and B, is calculated as follows:

1. Pairwise multiplication: Multiply the corresponding components (elements) of the two vectors. For example, for vectors A and B with elements A[0], A[1], A[2], ..., A[n-1] and B[0], B[1], B[2], ..., B[n-1], calculate A[0] * B[0], A[1] * B[1], A[2] * B[2], and so on.

2. Summation: Add up all the results from step 1 to obtain the dot product.

Please implement the dot product yourself without using `numpy`.
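A minimal sketch of the two steps with `map` and `reduce`, using `operator.mul` and `operator.add` in place of lambdas and, for brevity, only the first two pairs of the lab's `vector_pairs` data. `map` accepts several iterables and passes one element from each to the function, which is exactly the pairwise multiplication of step 1. The length check is an addition of this sketch, not a lab requirement.

```python
import operator
from functools import reduce


def dot_product(vector1, vector2):
    """Dot product via map (pairwise products) and reduce (summation)."""
    if len(vector1) != len(vector2):
        raise ValueError("vectors must have the same length")
    products = map(operator.mul, vector1, vector2)  # step 1: A[i] * B[i] for each i
    return reduce(operator.add, products, 0)        # step 2: add up the products


# The first two pairs of the lab's `vector_pairs` data
vector_pairs = [
    ((2, 3, 1, 5, 2, 4), (4, -1, 2, 0, 3, 1)),
    ((0, 5, -3, 2, 1, 2), (1, 2, 3, 6, 7, 1)),
]

# map: one dot product per pair (each pair is unpacked into dot_product's two arguments)
dot_products = map(lambda pair: dot_product(*pair), vector_pairs)
# reduce: sum the per-pair dot products
total_dot_product = reduce(operator.add, dot_products, 0)
print("Total Dot Product:", total_dot_product)  # 17 + 22 = 39
```

`itertools.starmap(dot_product, vector_pairs)` is an equivalent way to write the outer `map`.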

In [15]:
from functools import reduce

# # Input: list of vector pairs, each pair represented as a tuple of two equal-length vectors (tuples). Calculate the dot product for each pair of vectors.
# # Example: dot_product(*vector_pairs[0]) should return the dot product between the two vectors in the first tuple, ((2, 3, 1, 5, 2, 4), (4, -1, 2, 0, 3, 1)).
vector_pairs = [
    ((2, 3, 1, 5, 2, 4), (4, -1, 2, 0, 3, 1)),
    ((0, 5, -3, 2, 1, 2), (1, 2, 3, 6, 7, 1)),
    ((-1, 0, 2, 1, 3, -4, 0, 2), (5, 2, -1, 3, -1, 0, 2, 1)),
    ((2, 4, 3, 7, -1, 0, 6, 5), (1, 1, 1, 0, 0, 1, 1, -1)),
    ((3, 1, -2, -1, 1, 0, -3, 4), (2, -3, 0, 2, 1, -1, 0, 2)),
    ((-2, 0, -1, 2, 2, 2, 1, 3), (0, 1, 2, -3, 4, 2, -1, 2)),
    ((1, 3, 5, 2, 4, 0, 1, 3), (0, -1, 2, -1, 0, 0, 4, 3)),
    ((2, 2, -1, 1, 0, 1, -3, -1), (-2, 1, -2, -1, 0, -1, -3, 2)),
    ((-2, -1, -1, 0, 2, -3, 1, 3), (0, 0, 1, 2, 1, -2, -1, 1)),
    ((-1, 2, 0, -3, -2, 1, 2, 0), (-1, -2, -1, -2, 0, 1, 0, 1)),
    ((3, 1, 3, -1, -2, -2, 2, 3), (1, 0, -2, 3, 1, -2, 0, 0)),
    ((-1, 0, -3, -2, 1, -1, 0, -2), (-3, -1, 1, 1, 1, 1, 0, -2)),
    ((2, -2, -1, 0, 1, 1, 2, 0), (1, -2, 2, 0, -2, -3, -1, -2)),
    ((0, 1, 1, 2, -1, 3, 1, -2), (2, -2, -2, -1, 0, -1, 2, 0)),
    ((-2, -1, 1, 0, 0, 2, 0, -3), (2, 0, 3, -1, 1, 2, 2, 1)),
    ((-2, 0, 2, 2, 2, 3, 0, -1), (-1, -3, 1, -2, 2, 0, 2, -1)),
    ((1, -3, -2, -2, -1, 1, -1, 2), (1, 0, 3, 2, -2, -3, -1, 1)),
    ((-1, 0, -1, -3, -1, 2, -2, -2), (-2, -1, -3, -2, 0, 0, -1, 1)),
    ((3, -2, 1, -2, -1, 2, 1, 2), (-3, 1, -2, 2, -1, 0, 0, 0)),
    ((2, -3, -3, -1, -2, -3, 2, -1), (1, -1, -1, 0, -3, 1, -2, 2))
]

# Define a function to calculate the dot product of two vectors
def dot_product(vector1, vector2):
    # YOUR CODE HERE (you need to use map-reduce here; if your solution does not use map-reduce, you lose 1 point)

# Use map to calculate dot products for each vector pair
dot_products = # YOUR CODE HERE (hint: use map)

# Use reduce to calculate the total dot product of all vector pairs
total_dot_product = # YOUR CODE HERE (hint: use reduce)

# Output the total dot product
print("Total Dot Product:", total_dot_product)


In [27]:
# # Define a function to calculate the dot product of two vectors
# def dot_product(vector1, vector2):
#     # YOUR CODE HERE (you need to use map-reduce here; if your solution does not use map-reduce, you lose 1 point)

# NOTE: this is NOT a map-reduce solution (it uses a loop and the built-in `sum`):
def dot_product(vector1, vector2):
    result = []
    for x, y in zip(vector1, vector2):
        result.append(x*y)
    print(result)
    print(sum(result))
    return sum(result)

# dot_product()
vector1 = vector_pairs[0][0]
vector2 = vector_pairs[0][1]
print(vector1)
print(vector2)

dot_product(vector1, vector2)

(2, 3, 1, 5, 2, 4)
(4, -1, 2, 0, 3, 1)
[8, -3, 2, 0, 6, 4]
17


17

In [17]:
# Then:

# your `map` for `dot_product`:
#   function:  multiply the i-th elements of the two vectors
#   iterables: the two vectors

# your `reduce` for `dot_product`:
#   function:  a sum, e.g. x + y
#   iterable:  your `map` results

# Since the final `reduce` sums dot products, `dot_product` should return a single number
# (the sum of its products); mapping it over `vector_pairs` then yields a list of scalars.

#### 2.3

rubric={accuracy:4,quality:2}

In this exercise, you will create a frequency table of words from a list of words.

You are given a sequential implementation of the word-counting functions. Please implement a parallelized program that does the same word counting.

You may not use ready-made counters such as `collections.Counter`; you must build the frequency dictionary yourself.

In [19]:
# import nltk and load data
import nltk
from toolz import reduce
nltk.download('gutenberg')


[nltk_data] Downloading package gutenberg to
[nltk_data]     /Users/jungyeul/nltk_data...
[nltk_data]   Package gutenberg is already up-to-date!


True

In [28]:
# load a word list
emma = nltk.corpus.gutenberg.words('austen-emma.txt')
emma

['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']', ...]

In [47]:
import time

# Sample list of words
# word_list = emma[:10]
word_list = ['a', 'b', 'c', 'a', 'a']

# Map function: convert words to (word, 1) pairs
def map_function(word):
    # time.sleep(0.5)
    print(word, 1)
    return (word, 1)

# Reduce function: sum the counts for each word
def reduce_function(counts, word_count):
    # time.sleep(0.5)
    # `counts` = {};
    # `word_count` = item in mapped: (word, 1)
    print(word_count)
    word, count = word_count
    if word in counts:
        counts[word] += count
    else:
        counts[word] = count
    return counts

# Map step: Apply map_function to each word in the list
mapped = map(map_function, word_list)

# Reduce step: Apply reduce_function to calculate word frequencies
word_frequencies = reduce(reduce_function, mapped, {})

print(word_frequencies)
# # this program may take 3~5 minutes to run


a 1
('a', 1)
b 1
('b', 1)
c 1
('c', 1)
a 1
('a', 1)
a 1
('a', 1)
{'a': 3, 'b': 1, 'c': 1}


In [22]:
import multiprocessing
import time
from functools import reduce


# Map function: convert words to (word, 1) pairs
def map_function_parallel(word):
    time.sleep(0.5)
    # YOUR CODE HERE
    # SHOULD BE same as before;

# Reduce function: sum the counts for each word
def reduce_function_parallel(counts, word_count):
    time.sleep(0.5)
    # YOUR CODE HERE
    # SHOULD BE same as before; 

# Parallel Map function: Apply map_function to each word in the list using multiple processes
def parallel_map(word_list, num_processes):

    with multiprocessing.Pool(processes=num_processes) as P:
        # YOUR CODE HERE
        # mapped = P.map(...)
        #   function    : ? 
        #   iterables   : ?

    return mapped

# Parallel Reduce function: Apply reduce_function to calculate word frequencies using multiple processes
def reduce_worker(chunk):
    return reduce(reduce_function_parallel, chunk, {})

# aggregate results produced by parallel reduce processes
def aggregate_function(agg,result):
    # YOUR CODE HERE
    # SHOULD be similar to `reduce_function`
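For reference, a self-contained sketch of the whole pipeline as a script. The helper names (`to_pair`, `add_pair`, `reduce_chunk`, `merge_counts`, `split_into_chunks`, `parallel_word_count`) are hypothetical, and the chunking uses ceiling division so it never produces a zero step or more chunks than processes. The `Pool` runs under an `if __name__ == "__main__":` guard, as the `spawn` start method requires when running a script. For work this cheap, starting processes and pickling data cost far more than the counting itself, which is consistent with the parallel timing reported in this notebook.

```python
from functools import reduce
from multiprocessing import Pool


def to_pair(word):
    """Map step: turn a word into a (word, 1) pair."""
    return (word, 1)


def add_pair(counts, word_count):
    """Reduce step: fold one (word, count) pair into a frequency dict."""
    word, count = word_count
    counts[word] = counts.get(word, 0) + count
    return counts


def reduce_chunk(chunk):
    """Reduce one chunk of pairs into its own partial frequency dict."""
    return reduce(add_pair, chunk, {})


def merge_counts(agg, partial):
    """Aggregate step: add one partial frequency dict into the running total."""
    for word, count in partial.items():
        agg[word] = agg.get(word, 0) + count
    return agg


def split_into_chunks(items, n_chunks):
    """Split `items` into at most `n_chunks` contiguous, non-empty slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, never 0
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_word_count(words, num_processes):
    """Count word frequencies with a parallel map, a parallel reduce, and a final merge."""
    with Pool(processes=num_processes) as pool:
        pairs = pool.map(to_pair, words)  # [(word, 1), ...]
        partials = pool.map(reduce_chunk, split_into_chunks(pairs, num_processes))
    return reduce(merge_counts, partials, {})  # combine the partial dicts


if __name__ == "__main__":
    print(parallel_word_count(["a", "b", "c", "a", "a"], 2))  # {'a': 3, 'b': 1, 'c': 1}
```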

Please verify your implementation by running the following blocks.

In [24]:
import multiprocess  # dill-based fork of `multiprocessing`; the name must be imported before use

num_processes = multiprocess.cpu_count()  # adjust the number of processes as needed

# Sample list of words
word_list = emma[:10]  # use a larger slice for a more meaningful comparison

# Calculate word frequencies using parallel map and reduce
start_time_parallel = time.time()
mapped = parallel_map(word_list, num_processes)  # create a list of (word, 1) pairs
chunk_size = max(1, len(mapped) // num_processes)  # never 0, even with fewer words than processes
chunks = [mapped[i:i + chunk_size] for i in range(0, len(mapped), chunk_size)]  # break the pairs into chunks for parallel reduction

# perform reduce across multiple cores
with multiprocess.Pool(processes=num_processes) as P:
    results = P.map(reduce_worker, chunks)

# aggregate results - no need to parallelize this part
word_frequencies_parallel = reduce(aggregate_function, results, {})
parallel_time = time.time() - start_time_parallel



# Calculate word frequencies sequentially (for comparison)
start_time_sequential = time.time()
mapped = map(map_function, word_list)
word_frequencies_sequential = reduce(reduce_function, mapped, {})
sequential_time = time.time() - start_time_sequential

print("Sequential Word Frequencies:", word_frequencies_sequential)
print("Parallel Word Frequencies:", word_frequencies_parallel)

print("Sequential Time:", sequential_time, "seconds")
print("Parallel Time:", parallel_time, "seconds")


Sequential Word Frequencies: {'[': 1, 'Emma': 1, 'by': 1, 'Jane': 1, 'Austen': 1, '1816': 1, ']': 1, 'VOLUME': 1, 'I': 1, 'CHAPTER': 1}
Parallel Word Frequencies: {'[': 1, 'Emma': 1, 'by': 1, 'Jane': 1, 'Austen': 1, '1816': 1, ']': 1, 'VOLUME': 1, 'I': 1, 'CHAPTER': 1}
Sequential Time: 0.00010991096496582031 seconds
Parallel Time: 2.1481637954711914 seconds


### Optional

#### 2.4

rubric={reasoning:3}

Is a parallelized program always faster than a non-parallelized one?
What factors do we need to consider before parallelizing a program? (Name at least two.)

YOUR ANSWER HERE