# Python tutorial
Instructor: Stan Sobolevsky  
Founding Partner at www.indatlabs.com  
Associate Professor Of Practice And Director Of Urban Complexity Lab
at New York University (CUSP NYU)
sobolevsky@indatlabs.com

# Multiprocessing

In [1]:
import multiprocessing
import time

You have probably heard of distributed and parallel computing and the opportunities they provide for scaling computational power. For example, they are often used to train deep learning models that cannot feasibly be fit on a single machine. Let us look at how parallel execution of multiple computational processes can be handled in Python.

First, consider computing the sum of the integers from 0 to n-1 for several values of n, one after another.

In [2]:
#define a function implementing a computation process (summing the integers from 0 to n-1)
def processFunction(n):
    starttime=time.time() #measure execution time
    S=0
    for i in range(n):
        S+=i
    stoptime=time.time() 
    #S=sum([i for i in range(n)])
    print('Computing for %d finalized with result %d; finalized in %.6f sec' % (n, S, stoptime-starttime))
    return S

args = [10000000,100000,1,100] #arguments to pass to the computation processes

for a in args: #run all computations one after another
    processFunction(a)

Computing for 10000000 finalized with result 49999995000000; finalized in 1.254162 sec
Computing for 100000 finalized with result 4999950000; finalized in 0.015702 sec
Computing for 1 finalized with result 0; finalized in 0.000002 sec
Computing for 100 finalized with result 4950; finalized in 0.000014 sec


Now run the same computations as separate processes that execute in parallel.

In [3]:
#define a function implementing a computation process (summing the integers from 0 to n-1)
def processFunction(n):
    starttime=time.time()
    S=0
    for i in range(n):
        S+=i
    stoptime=time.time() 
    #S=sum([i for i in range(n)])
    print('Computing for %d finalized with result %d; finalized in %.6f sec' % (n, S, stoptime-starttime))
    return S

args = [10000000,100000,1,100] #arguments to pass to the computation processes

for a in args: #launch all processes
    p = multiprocessing.Process(target=processFunction, args=(a,)) #create a process
    p.start() #start a process

Computing for 1 finalized with result 0; finalized in 0.000009 sec
Computing for 100 finalized with result 4950; finalized in 0.000037 sec
Computing for 100000 finalized with result 4999950000; finalized in 0.018272 sec
Computing for 10000000 finalized with result 49999995000000; finalized in 1.441842 sec


As we can see, the processes do not finish in the order they were launched; the fastest ones finish first. So they indeed ran in parallel, not sequentially.
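The loop above starts each process and moves on without waiting for it. The following is a minimal sketch of how the parent could keep the `Process` objects and call `join()` to block until every worker has exited. It assumes a POSIX system where the `fork` start method is available (functions defined interactively in a notebook rely on it); `sum_to` and `run_all` are hypothetical names used only for illustration. Note that `Process` discards the value returned by the target function, so only the exit code is available to the parent.

```python
import multiprocessing


def sum_to(n):
    """Worker: sum the integers 0..n-1 and print the result."""
    total = 0
    for i in range(n):
        total += i
    print('n=%d sum=%d' % (n, total))


def run_all(args):
    """Start one process per argument, wait for all, return exit codes."""
    # Assumption: the 'fork' start method (POSIX only).
    ctx = multiprocessing.get_context('fork')
    procs = [ctx.Process(target=sum_to, args=(a,)) for a in args]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # block until this worker has exited
    return [p.exitcode for p in procs]


if __name__ == '__main__':
    print(run_all([1000000, 100000, 1, 100]))
```

An exit code of 0 means the worker finished without an uncaught exception.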

However, processes often do not run in isolation as in the example above. It is often useful to share resources between them. Consider processes that each increment a common value a given number of times.

In [4]:
def processFunction(n,S):
    starttime=time.time()
    for i in range(n):
        S.value+=1
    stoptime=time.time() 
    print('Computing for %d finalized with result %d; finalized in %.6f sec' % (n, S.value, stoptime-starttime))
    return S.value
    

args = [1000000,100000,1,100] #arguments to pass to the computation processes


S=multiprocessing.Value('i', 0)

for a in args: #launch all processes
    p = multiprocessing.Process(target=processFunction, args=(a,S))
    p.start()

S.value

Computing for 1 finalized with result 74; finalized in 0.000066 sec
Computing for 100 finalized with result 831; finalized in 0.002852 sec


317

Computing for 100000 finalized with result 100227; finalized in 1.323428 sec
Computing for 1000000 finalized with result 1004272; finalized in 5.873719 sec


It is evident that the processes got mixed up, e.g. the one that had to increment 100 times finished when the value was already 831. What is more interesting, however, is that the total number of increments is wrong: the last process reports 1004272 rather than 1000000+100000+1+100=1100101.

This is because +=1 is not an atomic operation: it reads the value, adds one, and writes the result back, and another process can interleave between these steps. We can prevent this from happening by using a Lock.
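To make the lost update concrete, the sketch below replays one unlucky interleaving by hand, without any real processes. The function name `lost_update_demo` and the particular schedule are illustrative assumptions, not something taken from the run above.

```python
def lost_update_demo():
    """Replay two workers that each perform one 'value += 1' as separate
    read and write steps, interleaved so that one update is lost."""
    shared = 0

    # Both workers read before either one writes.
    read_a = shared      # worker A reads 0
    read_b = shared      # worker B reads 0 (A has not written yet)

    shared = read_a + 1  # worker A writes 1
    shared = read_b + 1  # worker B also writes 1, overwriting A's update

    return shared


print(lost_update_demo())  # 1, although two increments were performed
```

Holding a lock around the read and the write forces worker B to read only after worker A has written, so both increments are kept.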

In [5]:
def processFunction(n,S,l):
    starttime=time.time()
    for i in range(n):
        l.acquire()
        S.value+=1
        l.release()
    stoptime=time.time() 
    print('Computing for %d finalized with result %d; finalized in %.6f sec' % (n, S.value, stoptime-starttime))
    return S.value
    

args = [1000000,100000,1,100] #arguments to pass to the computation processes

lock = multiprocessing.Lock()

S=multiprocessing.Value('i', 0)

for a in args: #launch all processes
    p = multiprocessing.Process(target=processFunction, args=(a,S,lock))
    p.start()

S.value

Computing for 1 finalized with result 89; finalized in 0.000088 sec


169

Computing for 100 finalized with result 650; finalized in 0.003237 sec
Computing for 100000 finalized with result 197628; finalized in 2.096407 sec
Computing for 1000000 finalized with result 1100101; finalized in 9.422921 sec
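In the run above the last worker reports 1100101 = 1000000+100000+1+100, so no increment was lost. The 169 displayed by `S.value` is only a snapshot, because the main process read the value while the workers were still running. The sketch below waits with `join()` before reading, and uses the lock that a `multiprocessing.Value` carries by default (reached through `get_lock()`) instead of a separate `Lock`. It assumes a POSIX system where the `fork` start method is available; `add_n` and `locked_total` are hypothetical names, and the workloads are smaller than in the cells above so that it runs quickly.

```python
import multiprocessing


def add_n(n, counter):
    """Increment the shared counter n times, one locked step at a time."""
    for _ in range(n):
        with counter.get_lock():  # acquire, increment, release
            counter.value += 1


def locked_total(args):
    """Run one worker per argument and return the final counter value."""
    # Assumption: the 'fork' start method (POSIX only).
    ctx = multiprocessing.get_context('fork')
    counter = ctx.Value('i', 0)  # created with its own lock by default
    procs = [ctx.Process(target=add_n, args=(n, counter)) for n in args]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait, so the value read below is final
    return counter.value


if __name__ == '__main__':
    print(locked_total([10000, 1000, 1, 100]))  # 11101
```

The `with` block releases the lock even if the body raises, which an explicit `acquire()`/`release()` pair does not guarantee.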


In [7]:
#we can also hold the lock for the entire process to prevent any interruption; the processes then run one by one rather than in parallel
def processFunction(n,S,l):
    l.acquire()
    starttime=time.time()
    for i in range(n):
        S.value+=1
    stoptime=time.time() 
    print('Computing for %d finalized with result %d; finalized in %.6f sec' % (n, S.value, stoptime-starttime))
    l.release()
    return S.value
    

args = [1000000,100000,1,100] #arguments to pass to the computation processes

lock = multiprocessing.Lock()

S=multiprocessing.Value('i', 0)

for a in args: #launch all processes
    p = multiprocessing.Process(target=processFunction, args=(a,S,lock))
    p.start()

S.value

0

Computing for 1000000 finalized with result 1000000; finalized in 5.006873 sec
Computing for 100000 finalized with result 1100000; finalized in 0.520134 sec
Computing for 1 finalized with result 1100001; finalized in 0.000122 sec
Computing for 100 finalized with result 1100101; finalized in 0.000752 sec


This was just a brief introduction to the basic functionality of multiprocessing. Next we will consider how this and some more advanced machinery can be applied to agent-based simulation.