# Multiprocessing using Pools 
A simple framework for assessing the impact of multiprocessing on runtime on a multi-core machine. 

In [1]:
import time
import math
import multiprocessing
from multiprocessing import Pool

# A function for timing a job that uses a pool of processes.
#  f is a function that takes a single argument
#  data is an array of arguments on which f will be mapped
#  pool_size is the number of processes in the pool. 
def pool_process(f, data, pool_size):
    tp1 = time.time()
    pool = Pool(processes=pool_size) # initialize the Pool.
    result = pool.map(f, data)       # map f to the data using the Pool of processes to do the work 
    pool.close() # No more tasks can be submitted to the pool.
    pool.join()  # Wait for the pool processing to complete. 
#     print("Results", result)
    print("Overall Time:", int(time.time()-tp1))
 

The Pool object offers a convenient means of parallelizing the execution of a function across multiple input values by distributing the input data across processes (data parallelism). Its methods allow tasks to be offloaded to the worker processes.
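As a minimal sketch of the data-parallel pattern described above, the following applies a function to a sequence of inputs with `Pool.map`. The names `square` and `parallel_map` are hypothetical and not part of the original notebook. Using the pool as a context manager is one possible way to make sure it is cleaned up.

```python
from multiprocessing import Pool


def square(x):
    """Toy task (hypothetical). Defined at module level so it can be pickled for the workers."""
    return x * x


def parallel_map(f, data, pool_size):
    """Apply f to every item of data using pool_size worker processes."""
    # map() blocks until all results are collected, in input order;
    # leaving the with-block then shuts the pool down.
    with Pool(processes=pool_size) as pool:
        return pool.map(f, data)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this code on
    # platforms that start workers by re-importing the main module.
    print(parallel_map(square, range(10), 2))
```

On platforms that start workers with the spawn method, a function defined directly in a notebook cell may not be usable by the workers, which is a likely reason for keeping task functions in an importable module, as this notebook does.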

In [2]:
import miscFunc # If miscFunc.py is changed, the kernel needs to be restarted.

This verbose version of my_func shows which process in the pool is running each task.

    def my_func_verbose(x):
        s = math.sqrt(x)
        print("Task", multiprocessing.current_process(), x, s)
        return s

In [3]:
dataRange = range(20)

Use the pool_process function to apply my_func to the data in dataRange.


In [4]:
dataRange = range(20)
pool_process(miscFunc.my_func, dataRange, 1)

Overall Time: 0


## A naive function for checking primes 

In [5]:
def check_prime(num):
    t1 = time.time()
    res = False
    # Numbers less than or equal to 1 are not prime, so only test num > 1.
    if num > 1:
        # Check every candidate factor from 2 up to num - 1.
        for i in range(2,num):
            if (num % i) == 0:
                print(num,"is not a prime number")
                print(i,"times",num//i,"is",num)
                print("Time:", int(time.time()-t1))
                break
        else:
            # The loop finished without finding a factor.
            print(num,"is a prime number")
            print("Time:", time.time()-t1)
            res = True
    return res


In [6]:
check_prime(15488801)

15488801 is a prime number
Time: 1.23512601852417


True

In [7]:
check_prime(15488803)

15488803 is not a prime number
11 times 1408073 is 15488803
Time: 0


False
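The naive check above tests every candidate divisor below num, which is why the prime 15488801 takes over a second while the composite 15488803 returns almost immediately. As a hedged sketch of one common refinement (not the notebook's method), trial division can stop at the integer square root, because any composite number has a factor no larger than its square root. The name `is_prime_sqrt` is hypothetical.

```python
import math


def is_prime_sqrt(num):
    """Trial division up to the integer square root of num."""
    if num < 2:
        return False          # 0, 1 and negative numbers are not prime
    if num < 4:
        return True           # 2 and 3 are prime
    if num % 2 == 0:
        return False          # even numbers above 2 are composite
    # Only odd candidates up to isqrt(num) need to be tested.
    for i in range(3, math.isqrt(num) + 1, 2):
        if num % i == 0:
            return False
    return True


print(is_prime_sqrt(15488803))
```

The notebook keeps the naive version on purpose, since a slow task makes the effect of the pool size easier to observe.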

In [8]:
# Range of numbers from 1 to 99999 (range excludes the upper bound) to check for primality
check_work = range(1,100000)

In [9]:
# This denotes the number of logical cores present in the machine
multiprocessing.cpu_count()

8

In [10]:
# This denotes the number of physical cores present in the machine
import psutil 
psutil.cpu_count(logical = False)

4
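A minimal sketch of how such counts might feed a default pool size, using only the standard library. `default_pool_size` is a hypothetical helper, and capping the count at a caller-supplied maximum is an assumption made for illustration, not something the notebook does.

```python
import os


def default_pool_size(max_workers=None):
    """Pick a worker count from the logical CPU count (hypothetical helper)."""
    # os.cpu_count() reports logical CPUs and may return None if undetermined.
    logical = os.cpu_count() or 1
    if max_workers is None:
        return logical
    # Never return fewer than one worker.
    return max(1, min(logical, max_workers))


print("Logical CPUs:", os.cpu_count())
print("Pool size capped at 4:", default_pool_size(4))
```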

In [11]:
import cpn
# Calculating using a pool of 1 process
pool_process(cpn.check_prime, check_work, 1)

Overall Time: 53


In [12]:
# Calculating using a pool of 2 processes
pool_process(cpn.check_prime, check_work, 2)

Overall Time: 36


In [13]:
# Calculating using a pool of 3 processes
pool_process(cpn.check_prime, check_work, 3)

Overall Time: 28


In [14]:
# Calculating using a pool of 4 processes
pool_process(cpn.check_prime, check_work, 4)

Overall Time: 24


- The results above show that splitting the tasks across multiple physical cores speeds up processing. Runtime falls from 53 s with 1 process to 36 s with 2 (about 32% less), then to 28 s with 3 and 24 s with 4, so each additional process yields a smaller gain.
- Memory is still a shared resource here. It is reached through a hierarchy of caches, each with its own bandwidth and latency. On a multi-core CPU, a key factor is how often different cores write to the same memory. The processors here are optimised for integer and floating-point performance and share some resources between cores, which is what makes multiprocessing possible.
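To make the comparison concrete, the sketch below computes the speed-up (time with 1 process divided by time with n processes) and the parallel efficiency (speed-up divided by n) from the four timings reported above. The variable names are illustrative, and the timings are whole seconds as printed by pool_process, so the ratios are approximate.

```python
# Wall-clock seconds reported above, keyed by pool size.
timings = {1: 53, 2: 36, 3: 28, 4: 24}

baseline = timings[1]
# Speed-up relative to the single-process run.
speedup = {n: baseline / t for n, t in timings.items()}
# Efficiency: the fraction of ideal linear scaling actually achieved.
efficiency = {n: speedup[n] / n for n in timings}

for n in sorted(timings):
    print(f"{n} process(es): speed-up {speedup[n]:.2f}x, efficiency {efficiency[n]:.0%}")
```

On these numbers, four processes give a speed-up of roughly 2.2x, or about 55% of ideal linear scaling.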