# Week 10

### Multithreading in Python

__Process:__

A process is a program in execution (running). The operating system manages each process through a process control block (PCB), which stores information such as the process ID, process state, process priority, saved CPU register values, etc.

__Thread:__

A thread is a unit of execution within a process, and a process can contain multiple threads. A thread can be in one of three states: running, ready, or blocked. Each thread is managed by a thread control block (TCB).
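
The running/ready/blocked states are an operating-system view; Python's `threading` module does not report them directly. The closest public check is `Thread.is_alive()`. The sketch below (the helper name `worker_states` is hypothetical) parks a worker in a blocked state on a `threading.Event` and samples `is_alive()` before the thread starts, while it is blocked, and after it has finished.

```python
import threading

def worker_states():
    """Return is_alive() snapshots: before start, while blocked, after join."""
    go = threading.Event()
    # the worker's run() is just go.wait(), so it blocks until the event is set
    worker = threading.Thread(target=go.wait, name="Blocked Worker")

    before_start = worker.is_alive()   # created but not started yet
    worker.start()
    while_blocked = worker.is_alive()  # started, now waiting on the event
    go.set()                           # unblock the worker
    worker.join()                      # wait for run() to return
    after_join = worker.is_alive()     # the thread has terminated
    return before_start, while_blocked, after_join

if __name__ == "__main__":
    print(worker_states())  # → (False, True, False)
```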


```
        Single Thread Process        
+==================================+            
|   Code       Data        Files   |
+==================================+
| Registers    Counters     Stack  |
+==================================+
|                                  |
|         Thread Execution         |
|                                  |
+==================================+


            Multi-Threaded Process        
+=========================================+            
|     Code          Data          Files   |
+=============+=============+=============+
|  Registers  |  Registers  |  Registers  |
|   Counters  |   Counters  |   Counters  |
|    Stack    |    Stack    |    Stack    |
+=============+=============+=============+
|             |             |             |
|   Thread 1  |   Thread 2  |   Thread 3  |
|             |             |             |
+=========================================+

```

* Multiple threads within a process share the same data space with the main thread, so they can share information and communicate with each other more easily than separate processes can.
* Threads are sometimes called lightweight processes: they need less memory overhead and are cheaper to create and maintain than processes.
* __Kernel threads__ are managed by the operating system kernel, while __user-space threads__ are implemented in user space (for example, by a library) without kernel involvement. CPython's `threading` module uses native OS threads.
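
To illustrate the shared data space, here is a small sketch (the names `shared_results`, `append_square`, and `run_threads` are hypothetical) in which several threads append to one module-level list and the main thread sees every update. Later in these notes, a global list updated in a child *process* stays empty in the parent, which is the contrast this is meant to show.

```python
import threading

# a module-level list that every thread in this process can see
shared_results = []
results_lock = threading.Lock()

def append_square(num):
    # guard the shared list so updates from different threads don't interleave
    with results_lock:
        shared_results.append(num * num)

def run_threads(numbers):
    shared_results.clear()  # start from an empty list on each run
    threads = [threading.Thread(target=append_square, args=(n,)) for n in numbers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # threads may finish in any order, so sort before returning
    return sorted(shared_results)

if __name__ == "__main__":
    print(run_threads([1, 2, 3, 4]))  # → [1, 4, 9, 16]
```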

```python
# Creating threads directly with the threading module
import os
import threading

def print_cube(num):
    print("Cube: {}".format(num * num * num))
    print("Thread: {}".format(threading.current_thread().name))
    print("Process: {}".format(os.getpid()))

def print_square(num):
    print("Square: {}".format(num * num))
    print("Thread: {}".format(threading.current_thread().name))
    print("Process: {}".format(os.getpid()))

if __name__ == "__main__":
    # creating threads (they do not run until start() is called)
    t1 = threading.Thread(target=print_square, args=(5,), name="Square Thread")
    t2 = threading.Thread(target=print_cube, args=(5,), name="Cube Thread")

    # starting thread 1
    t1.start()
    # starting thread 2
    t2.start()

    # wait until thread 1 is completely executed
    t1.join()
    # wait until thread 2 is completely executed
    t2.join()

    # both threads completely executed
    print("Done!")
```

```
Square: 25
Thread: Square Thread
Process: 59
Cube: 125
Thread: Cube Thread
Process: 59
Done!
```

Both threads print the same process ID (59 in this run) because they run inside a single process.
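
The standard library also offers a higher-level interface, `concurrent.futures.ThreadPoolExecutor`, which manages the threads for you. A minimal sketch of the square/cube example with it (the function names are illustrative, not from the original cell): `submit()` schedules a call on a pool thread and `result()` blocks until that call has finished, much like `join()`.

```python
from concurrent.futures import ThreadPoolExecutor

def square(num):
    return num * num

def cube(num):
    return num * num * num

def run_with_pool(num):
    # leaving the with-block shuts the pool down and waits for its threads
    with ThreadPoolExecutor(max_workers=2) as pool:
        square_future = pool.submit(square, num)
        cube_future = pool.submit(cube, num)
        # result() blocks until that task is done, and returns its value
        return square_future.result(), cube_future.result()

if __name__ == "__main__":
    print(run_with_pool(5))  # → (25, 125)
```

Unlike a bare `Thread`, a future also carries the function's return value (or its exception) back to the caller.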


```python
# Thread program by extending the Thread class
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, thread_id, name, delay):
        super().__init__(name=name)
        self.thread_id = thread_id
        self.delay = delay

    def run(self):
        # run() is the method that start() invokes in the new thread
        print("Starting " + self.name)
        print_time(self.name, self.delay, 5)
        print("Exiting " + self.name)

def print_time(thread_name, delay, counter):
    # print a timestamp `counter` times, sleeping `delay` seconds before each
    while counter:
        time.sleep(delay)
        print("%s: %s" % (thread_name, time.ctime(time.time())))
        counter -= 1

# create new threads
thread1 = MyThread(1, "Thread-1", 1)
thread2 = MyThread(2, "Thread-2", 2)

# start the new threads, then wait for both to finish
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print("Exiting Main Thread")
```

```
Starting Thread-1
Starting Thread-2
Thread-1: Sat May 15 10:27:14 2021
Thread-1: Sat May 15 10:27:15 2021
Thread-2: Sat May 15 10:27:15 2021
Thread-1: Sat May 15 10:27:16 2021
Thread-1: Sat May 15 10:27:17 2021
Thread-2: Sat May 15 10:27:17 2021
Thread-1: Sat May 15 10:27:18 2021
Exiting Thread-1
Thread-2: Sat May 15 10:27:19 2021
Thread-2: Sat May 15 10:27:21 2021
Thread-2: Sat May 15 10:27:23 2021
Exiting Thread-2
Exiting Main Thread
```


### Thread Synchronization

```python
# Program to simulate a banking transaction (without synchronization)
import threading

# global variable balance, shared by all threads
balance = 0

def deposit_money():
    """
    This function increments the bank balance
    """
    global balance
    # `balance += 1` is a read, an add, and a write: it is not atomic
    balance += 1

def branch1_deposit():
    """
    This function is used by branch 1
    """
    for _ in range(100000):
        deposit_money()

def headOffice_deposit():
    """
    This function is used by head office: it resets the balance
    and runs two deposit threads concurrently
    """
    global balance
    balance = 0

    # creating threads
    t1 = threading.Thread(target=branch1_deposit)
    t2 = threading.Thread(target=branch1_deposit)

    # start threads
    t1.start()
    t2.start()

    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        headOffice_deposit()
        print("Simulation {0}: balance = {1}".format(i + 1, balance))
```

```
Simulation 1: balance = 162418
Simulation 2: balance = 200000
Simulation 3: balance = 162554
Simulation 4: balance = 200000
Simulation 5: balance = 127826
Simulation 6: balance = 200000
Simulation 7: balance = 164073
Simulation 8: balance = 173749
Simulation 9: balance = 200000
Simulation 10: balance = 180789
```


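Ideally, every simulation should end with a balance of 200000, but the results are unpredictable because of a __race condition__. How often the race shows up depends on the Python version and interpreter (on some newer CPython releases this particular loop may print 200000 every time), but the code is still not thread-safe.

Sometimes both threads read the global variable `balance` before either has written back its update, so one increment is lost. Ex: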
1.   balance = 1
2.   Thread 1 reads balance=1
3.   Thread 2 reads balance=1
4.   Thread 1 adds 1 to balance
5.   Thread 2 adds 1 to balance
6.   Thread 1 writes balance=2
7.   Thread 2 writes balance=2

whereas the correct value is 3. So once a thread has read the value, no other thread should read it until the first thread has written back its updated value.
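
The interleaving in the steps above can be reproduced deterministically. In this sketch (the function `lost_update_demo` is hypothetical), a `threading.Barrier` forces both threads to finish their read before either writes, which is exactly steps 2 to 7. In the real banking program no barrier exists; the scheduler produces this interleaving only some of the time, which is why the balances vary from run to run.

```python
import threading

def lost_update_demo():
    """Force the read/read/write/write interleaving and return the final balance."""
    state = {"balance": 1}
    # neither thread may write until both have read
    both_have_read = threading.Barrier(2)

    def increment():
        local = state["balance"]      # read: both threads see 1
        both_have_read.wait()         # wait until the other thread has read too
        state["balance"] = local + 1  # write: both threads write 1 + 1

    threads = [threading.Thread(target=increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["balance"]

if __name__ == "__main__":
    print(lost_update_demo())  # → 2, although two increments should give 3
```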

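A __Lock__ (also called a mutex) is a synchronization object that lets only one thread or process at a time access a shared resource in a parallel programming environment. It behaves like a __Semaphore__ whose counter is 1; a general semaphore lets up to N holders in at once.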
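
To make the lock/semaphore distinction concrete, here is a sketch using `threading.Semaphore(limit)`. The helper `peak_concurrency` and its parameters are hypothetical: it records the highest number of workers that were inside the guarded section at the same time. With `limit=1` the semaphore behaves like a lock.

```python
import threading
import time

def peak_concurrency(num_workers=6, limit=2):
    """Run workers behind Semaphore(limit); return the most that were
    inside the guarded section simultaneously."""
    slots = threading.Semaphore(limit)
    counter_lock = threading.Lock()   # protects `active` and `peak`
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with slots:                   # at most `limit` threads get past here
            with counter_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.05)          # simulate work while holding a slot
            with counter_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

if __name__ == "__main__":
    print(peak_concurrency())         # never more than 2
    print(peak_concurrency(limit=1))  # → 1
```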


```python
# Program to simulate a banking transaction (with a lock)
import threading

# global variable balance, shared by all threads
balance = 0

def deposit_money():
    """
    This function increments the bank balance
    """
    global balance
    balance += 1

def branch1_deposit(lock):
    """
    This function is used by branch 1
    """
    for _ in range(100000):
        # `with lock:` acquires the lock and always releases it afterwards,
        # equivalent to lock.acquire() ... lock.release()
        with lock:
            deposit_money()

def headOffice_deposit():
    """
    This function is used by head office
    """
    global balance
    balance = 0

    # creating a lock
    lock = threading.Lock()

    # creating threads
    t1 = threading.Thread(target=branch1_deposit, args=(lock,))
    t2 = threading.Thread(target=branch1_deposit, args=(lock,))

    # start threads
    t1.start()
    t2.start()

    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        headOffice_deposit()
        print("Simulation {0}: balance = {1}".format(i + 1, balance))
```

```
Simulation 1: balance = 200000
Simulation 2: balance = 200000
Simulation 3: balance = 200000
Simulation 4: balance = 200000
Simulation 5: balance = 200000
Simulation 6: balance = 200000
Simulation 7: balance = 200000
Simulation 8: balance = 200000
Simulation 9: balance = 200000
Simulation 10: balance = 200000
```


1.   balance = 1
2.   Thread 1 acquires the lock and reads balance=1
3.   Thread 2 tries to acquire the lock but is blocked
4.   Thread 1 adds 1 to balance
5.   Thread 1 writes balance=2 and releases the lock
6.   Thread 2 acquires the lock and reads balance=2
7.   Thread 2 adds 1 to balance
8.   Thread 2 writes balance=3 and releases the lock
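
Step 3, where the second thread is blocked, can be observed directly. In this sketch (the function `second_thread_is_blocked` is hypothetical), the main thread plays Thread 1 and holds the lock while a worker calls `lock.acquire(timeout=0.1)`; the timeout is only there so the demo gives up instead of waiting forever. Once the lock is released, a second attempt succeeds.

```python
import threading

def second_thread_is_blocked():
    """Return whether a worker could take the lock: first while the main
    thread holds it, then after the main thread has released it."""
    lock = threading.Lock()
    attempts = []

    def try_lock():
        # wait at most 0.1 s instead of blocking forever
        got_it = lock.acquire(timeout=0.1)
        attempts.append(got_it)
        if got_it:
            lock.release()

    with lock:                   # main thread holds the lock (like Thread 1)
        worker = threading.Thread(target=try_lock)
        worker.start()
        worker.join()            # the worker times out: it was blocked
    worker = threading.Thread(target=try_lock)
    worker.start()
    worker.join()                # the lock is free now, so this succeeds
    return attempts

if __name__ == "__main__":
    print(second_thread_is_blocked())  # → [False, True]
```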

__Advantages:__

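* It doesn’t block the user: a long-running task can run in one thread while another thread (for example, a GUI) stays responsive.
* Better use of system resources is possible, since threads can make progress concurrently. In CPython, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so the gain comes mainly from I/O-bound work (network, disk, `time.sleep`) rather than CPU-bound computation.
* Enhanced performance on multi-processor machines for code that releases the GIL (such as blocking I/O and some C extensions); for CPU-bound pure-Python code, multiprocessing is usually the better choice.
* Multi-threaded servers and interactive GUIs rely heavily on multithreading.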
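
A quick way to see the I/O-bound benefit is to time several blocking waits run one after another versus in threads. This is a sketch under the assumption that `time.sleep` stands in for real I/O (it releases the GIL while waiting); `simulated_io` and `elapsed` are hypothetical names. The threaded run should take roughly one sleep's duration rather than the sum of all of them, though exact timings depend on the machine.

```python
import threading
import time

def simulated_io(seconds):
    # time.sleep releases the GIL, much like waiting on a socket or file
    time.sleep(seconds)

def elapsed(num_tasks=5, seconds=0.2, use_threads=True):
    """Return wall-clock seconds to run num_tasks simulated I/O waits."""
    start = time.perf_counter()
    if use_threads:
        threads = [threading.Thread(target=simulated_io, args=(seconds,))
                   for _ in range(num_tasks)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(num_tasks):
            simulated_io(seconds)
    return time.perf_counter() - start

if __name__ == "__main__":
    print("sequential: {:.2f} s".format(elapsed(use_threads=False)))
    print("threaded:   {:.2f} s".format(elapsed(use_threads=True)))
```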

__Disadvantages:__

* As the number of threads increases, complexity increases.
* Synchronization of shared resources (objects, data) is necessary.
* Debugging is difficult, and results are sometimes unpredictable.
* A bad design can cause deadlocks, or starvation, where some threads are never served.
* Creating and synchronizing threads still costs CPU time and memory.
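
A classic deadlock occurs when two threads take two locks in opposite order. The sketch below (all function names are hypothetical) sets that up safely: barriers make sure each thread holds its first lock before trying the second, and `acquire(timeout=...)` lets each thread give up instead of hanging forever, so both attempts fail. The second function shows the usual fix, acquiring the locks in one consistent order in every thread.

```python
import threading

def opposite_order_attempts(timeout=0.2):
    """Two threads lock (a, b) and (b, a). Returns whether each thread got
    its second lock before timing out."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)
    done_trying = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            holding_first.wait()              # both threads now hold one lock
            got = second.acquire(timeout=timeout)
            results[name] = got
            if got:
                second.release()
            done_trying.wait()                # keep `first` until both have tried

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    for t in (t1, t2):
        t.start()
    for t in (t1, t2):
        t.join()
    return results

def same_order_counter(iterations=1000):
    """Both threads always take lock_a before lock_b, so they cannot deadlock."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    counter = {"value": 0}

    def worker():
        for _ in range(iterations):
            with lock_a:
                with lock_b:
                    counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]

if __name__ == "__main__":
    print(opposite_order_attempts())  # → {'t1': False, 't2': False}
    print(same_order_counter())       # → 2000
```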

*(Source: geeksforgeeks.org)*

### Multiprocessing in Python

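__Multiprocessing__ refers to the ability of a system to support more than one process at the same time. Applications in a multiprocessing system are broken into smaller routines that run independently. Each process has its own Python interpreter and memory space (and therefore its own GIL), so processes can run CPU-bound code in parallel. However, multiprocessing is more memory- and resource-intensive than multithreading.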
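
Breaking work into independent routines is exactly what `multiprocessing.Pool` automates. A minimal sketch (the names `square` and `squares_in_pool` are illustrative): `Pool.map` splits the input across worker processes and returns the results in input order. The `if __name__ == "__main__":` guard matters here, because on platforms that start processes with "spawn" each worker re-imports the main module.

```python
import multiprocessing

def square(num):
    # runs inside a worker process
    return num * num

def squares_in_pool(numbers, processes=2):
    # leaving the with-block terminates the pool's worker processes
    with multiprocessing.Pool(processes=processes) as pool:
        # map() blocks until all results are ready, kept in input order
        return pool.map(square, numbers)

if __name__ == "__main__":
    print(squares_in_pool([1, 2, 3, 4]))  # → [1, 4, 9, 16]
```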

```python
import multiprocessing

# an ordinary global list: each process works on its own separate copy
cube_results = []

def cube_list(mylist):
    """
    function to cube a given list
    """
    global cube_results
    # append cubes of mylist to the global list cube_results
    for num in mylist:
        cube_results.append(num * num * num)
    # print global list cube_results (as seen by this child process)
    print("Result(in process: Cube): {0}".format(cube_results))


def square_list(mylist, result, square_sum):
    """
    function to square a given list
    """
    # store squares of mylist in the shared result array
    for idx, num in enumerate(mylist):
        result[idx] = num * num

    # store the sum of squares in the shared value
    square_sum.value = sum(result)

    # print result Array
    print("Result(in process: Square): {}".format(result[:]))

    # print square_sum Value
    print("Sum of squares(in process: Square): {}".format(square_sum.value))

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]

    # creating a shared Array of int data type with space for 4 integers
    result = multiprocessing.Array('i', 4)

    # creating a shared Value of int data type (initialized to 0)
    square_sum = multiprocessing.Value('i')

    # creating new processes
    p1 = multiprocessing.Process(target=square_list, args=(mylist, result, square_sum))
    # global variables will not help in inter-process communication
    p2 = multiprocessing.Process(target=cube_list, args=(mylist,))

    # starting processes
    p1.start()
    p2.start()

    # wait until processes are finished
    p1.join()
    p2.join()

    # print result array
    print("Square Result(in main program): {}".format(result[:]))
    print("Cube Result(in main program): {}".format(cube_results[:]))

    # print square_sum Value
    print("Sum of squares(in main program): {}".format(square_sum.value))
```

```
Result(in process: Square): [1, 4, 9, 16]
Sum of squares(in process: Square): 30
Result(in process: Cube): [1, 8, 27, 64]
Square Result(in main program): [1, 4, 9, 16]
Cube Result(in main program): []
Sum of squares(in main program): 30
```

The shared `Array` and `Value` carry the squares back to the main program, while the ordinary global list `cube_results` stays empty there because the child process only modified its own copy.
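
One standard way to get the cubes back to the parent is a `multiprocessing.Queue`. This is a sketch of a variant of `cube_list` (renamed `cube_list_to_queue`, a hypothetical name, along with `cubes_via_queue`): the child puts its result list on the queue and the parent reads it. The parent calls `get()` before `join()`, as the `multiprocessing` programming guidelines recommend, so a child waiting to flush its queued data is never joined first.

```python
import multiprocessing

def cube_list_to_queue(mylist, queue):
    # compute the cubes in the child and send them back through the queue
    queue.put([num * num * num for num in mylist])

def cubes_via_queue(mylist):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=cube_list_to_queue, args=(mylist, queue))
    p.start()
    results = queue.get()  # read first, then join
    p.join()
    return results

if __name__ == "__main__":
    print("Cube Result(in main program): {}".format(cubes_via_queue([1, 2, 3, 4])))
    # → Cube Result(in main program): [1, 8, 27, 64]
```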


### Process Synchronization

Source: https://www.geeksforgeeks.org/synchronization-pooling-processes-python/

```python
# Python program to illustrate
# the concept of race condition
# in multiprocessing
import multiprocessing

# function to withdraw from account
def withdraw(balance):
    for _ in range(10000):
        balance.value = balance.value - 1

# function to deposit to account
def deposit(balance):
    for _ in range(10000):
        balance.value = balance.value + 1

def perform_transactions():

    # initial balance (in shared memory)
    balance = multiprocessing.Value('i', 100)

    # creating new processes
    p1 = multiprocessing.Process(target=withdraw, args=(balance,))
    p2 = multiprocessing.Process(target=deposit, args=(balance,))

    # starting processes
    p1.start()
    p2.start()

    # wait until processes are finished
    p1.join()
    p2.join()

    # print final balance
    print("Final balance = {}".format(balance.value))

if __name__ == "__main__":
    for _ in range(10):

        # perform same transaction process 10 times
        perform_transactions()
```

```
Final balance = 610
Final balance = -1015
Final balance = -335
Final balance = -1575
Final balance = 1088
Final balance = 1344
Final balance = 68
Final balance = -824
Final balance = 1579
Final balance = -1750
```


```python
import multiprocessing

# function to withdraw from account
def withdraw(balance, lock):
    for _ in range(10000):
        # only one process at a time may read-modify-write the balance
        with lock:
            balance.value = balance.value - 1

# function to deposit to account
def deposit(balance, lock):
    for _ in range(10000):
        with lock:
            balance.value = balance.value + 1

def perform_transactions():

    # initial balance (in shared memory)
    balance = multiprocessing.Value('i', 100)

    # creating a lock object
    lock = multiprocessing.Lock()

    # creating new processes
    p1 = multiprocessing.Process(target=withdraw, args=(balance, lock))
    p2 = multiprocessing.Process(target=deposit, args=(balance, lock))

    # starting processes
    p1.start()
    p2.start()

    # wait until processes are finished
    p1.join()
    p2.join()

    # print final balance
    print("Final balance = {}".format(balance.value))

if __name__ == "__main__":
    for _ in range(10):

        # perform same transaction process 10 times
        perform_transactions()
```

```
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
Final balance = 100
```
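
A `multiprocessing.Value` created with the default `lock=True` already wraps its data with a lock, available through `get_lock()`. As a sketch of a variant of the cell above (same function names, but they return the balance instead of printing it), the separate `Lock` object can be replaced by that built-in lock. Note that the built-in lock does not make `balance.value -= 1` atomic on its own; it must still be held around the whole read-modify-write.

```python
import multiprocessing

def withdraw(balance):
    for _ in range(10000):
        # hold the Value's own lock around the read-modify-write
        with balance.get_lock():
            balance.value -= 1

def deposit(balance):
    for _ in range(10000):
        with balance.get_lock():
            balance.value += 1

def perform_transactions():
    # lock=True (the default) gives the shared int its own lock
    balance = multiprocessing.Value('i', 100)
    p1 = multiprocessing.Process(target=withdraw, args=(balance,))
    p2 = multiprocessing.Process(target=deposit, args=(balance,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return balance.value

if __name__ == "__main__":
    for _ in range(3):
        print("Final balance = {}".format(perform_transactions()))  # → Final balance = 100
```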
