### Thread vs process


#### Process
An instance of a program (e.g. a Python interpreter).
##### Advantages
* Takes advantage of multiple CPUs and cores.
* Separate memory space: memory is not shared between processes.
* Great for CPU-bound processing.
* Each new process starts independently of the others.
* Processes are interruptible/killable.
* One GIL per process, which avoids the GIL limitation.

##### Disadvantages
* Heavyweight.
* Starting a process is slower than starting a thread.
* Uses more memory.
* IPC (inter-process communication) is more complicated since memory is not shared.

#### Threads
An entity within a process that can be scheduled for execution (also known as a lightweight process).
A process can spawn multiple threads.
##### Advantages
* All threads within a process share the same memory.
* Lightweight.
* Starting a thread is faster than starting a process.
* Great for I/O-bound tasks.
##### Disadvantages
* Threading is limited by the GIL: only one thread can execute Python code at a time.
* Little to no speedup for CPU-bound tasks.
* Not interruptible/killable.
* Watch out for race conditions. (A race condition occurs when two or more threads try to modify the same variable at the same time.)

#### What is the GIL?
Global interpreter lock.
* A lock that allows only one thread at a time to execute Python code.
* Needed in CPython because its memory management is not thread-safe.
##### Solutions
* Use multiprocessing.
* Use a different, free-threaded Python implementation (Jython, IronPython).
* Use Python as a wrapper for third-party libraries (C/C++) such as NumPy and SciPy.
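
The GIL is released while a thread waits on blocking I/O, which is why threads still help with I/O-bound work. A minimal sketch, using time.sleep as a stand-in for a network or disk wait (the fake_io and run_threads helpers and the timings are assumptions made for illustration):

```python
import time
from threading import Thread


def fake_io(delay):
    # time.sleep releases the GIL, much like a blocking socket or file read
    time.sleep(delay)


def run_threads(num_threads, delay):
    threads = [Thread(target=fake_io, args=(delay,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(num_threads=5, delay=0.2)
# Run one after another, the waits would add up to about 5 * 0.2 = 1.0 s;
# with threads they overlap
print(f'5 threads finished in {elapsed:.2f} s')
```

A CPU-bound function in place of fake_io would not overlap in the same way, because each thread would need to hold the GIL while computing.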

In [1]:
from multiprocessing import Process
import os

In [2]:
def square_it():
    for i in range(1000):
        i*i

In [3]:
processes = []
num_process = os.cpu_count()

In [4]:
# create the processes
for i in range(num_process):
    p = Process(target=square_it)
    processes.append(p)

In [5]:
for i in processes:
    i.start()

In [6]:
# wait for all processes to finish; join() blocks the main thread until
# each process is done
for i in processes:
    i.join()

In [7]:
print('end main')

end main


## Multithreading

In [8]:
from threading import Thread
import os
import time

In [9]:
def square_it():
    for i in range(1000):
        i*i

In [10]:
number_threads = 10
threads = []

In [11]:
for i in range(number_threads):
    t = Thread(target=square_it)
    threads.append(t)

In [12]:
# start-
for t in threads:
    t.start()

In [13]:
# join
for t in threads:
    t.join()
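
Thread also accepts an args tuple, so each thread can be given its own input and a place to store its result. A small sketch (the square_sum helper and the results list are names introduced here for illustration):

```python
from threading import Thread


def square_sum(n, results, index):
    # Each thread writes only to its own slot, so no lock is needed here
    results[index] = sum(i * i for i in range(n))


number_threads = 4
results = [None] * number_threads
threads = [
    Thread(target=square_sum, args=(1000, results, i))
    for i in range(number_threads)
]

for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```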

### We can share data between threads because they share memory

In [14]:
from threading import Thread
import os
import time

In [15]:
database_value = 0

In [16]:
def increase():
    global database_value
    localcopy = database_value
    # processing
    localcopy+=1
    time.sleep(0.1)
    database_value = localcopy

In [17]:
if __name__ == "__main__":
    print(f'start value {database_value}')
    thread1 = Thread(target=increase)
    thread2 = Thread(target=increase)
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    print(f'end value {database_value}')
    print('end main')
    

start value 0
end value 1
end main


In [18]:
# But the value was not increased to 2.
# This is a race condition: both threads read 0 before either one wrote back 1.
# Use a Lock object to prevent this.
from threading import Thread,Lock
import time

In [19]:
database_value = 0

In [20]:
def increase(lock):
    global database_value
    with lock:
        localcopy = database_value
        # processing
        localcopy+=1
        time.sleep(0.1)
        database_value = localcopy

In [21]:
if __name__ == "__main__":
    print(f'start value {database_value}')
    lock = Lock()
    thread1 = Thread(target=increase,args=(lock,))
    thread2 = Thread(target=increase,args=(lock,))
    
    thread1.start()
    thread2.start()
    
    thread1.join()
    thread2.join()
    
    print(f'end value {database_value}')
    print('end main')
    

start value 0
end value 2
end main


### We are getting the correct value now.
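
The with lock: block is shorthand for calling lock.acquire() and then lock.release() in a finally clause. A sketch of the explicit form, with more threads and iterations to make the effect visible (the state dictionary, the increase_many helper, and the counts are arbitrary choices for illustration):

```python
from threading import Thread, Lock


def increase_many(state, lock, times):
    for _ in range(times):
        lock.acquire()
        try:
            # Only one thread at a time can run this read-modify-write step
            state['value'] += 1
        finally:
            # Always release the lock, even if the protected code raises
            lock.release()


state = {'value': 0}
lock = Lock()
threads = [
    Thread(target=increase_many, args=(state, lock, 10_000))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"end value {state['value']}")
```

The context-manager form is usually preferable because the release cannot be forgotten.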

In [22]:
### Using a Queue for data processing: queues provide thread-safe and process-safe data exchange.

In [23]:
from threading import Thread, Lock,current_thread
from queue import Queue
import time

In [24]:
# A queue is a linear data structure that follows the FIFO (first in, first out) principle,
# e.g. a queue of customers in a line.
if __name__ == '__main__':
    q = Queue()
    q.put(1)
    q.put(2)
    q.put(3)

    # 3 2 1 --> front
    first = q.get()
    print(first)
    print(q.empty())
    # In a threading environment, call q.task_done() after processing each item
    # obtained with q.get(); q.join() blocks until every item is marked done.
    q.task_done()
    # Drain the remaining items so that q.join() can return
    while not q.empty():
        q.get()
        q.task_done()
    q.join()
    print('end')

1
False
end

### Example

In [None]:
def worker(q):
    while True:
        value=q.get()
        #processing
        print(f'in {current_thread().name} got {value}')
        q.task_done()

In [None]:
if __name__ == "__main__":
    q = Queue()
    num_threads = 10
    for i in range(num_threads):
        thread = Thread(target=worker, args=(q,))
        # daemon thread
        thread.daemon = True
        thread.start()
        
    for i in range(1,20):
        q.put(i)
        
    q.join()
    print('endmain')

## Daemon thread

A daemon thread is a background thread that does not keep the program alive: when the main thread exits, any daemon threads still running are terminated. That is why the workers above can run an infinite while True loop. They were started as daemon threads, so they die when the main thread ends.
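
An alternative to daemon threads is to stop the workers explicitly. One common convention, sketched here as an assumption rather than as part of the example above, is to put a sentinel value on the queue for each worker (STOP is a name chosen for this sketch):

```python
from queue import Queue
from threading import Thread

STOP = None  # sentinel meaning "no more work" (a convention chosen for this sketch)


def worker(q, results):
    while True:
        value = q.get()
        if value is STOP:
            q.task_done()
            break  # leave the loop so the thread can finish normally
        results.append(value * value)  # list.append is thread-safe in CPython
        q.task_done()


q = Queue()
results = []
num_threads = 4
threads = [Thread(target=worker, args=(q, results)) for _ in range(num_threads)]
for t in threads:
    t.start()

for i in range(1, 20):
    q.put(i)
for _ in threads:
    q.put(STOP)  # one sentinel per worker

q.join()
for t in threads:
    t.join()

print(sorted(results))
```

With this approach every worker exits its loop on its own, so the threads can be joined and no work is cut off when the main thread ends.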

### Multiprocess

In [25]:
from multiprocessing import Process, Value, Array, Lock
import os
import time

#### Processes do not live in the same memory space, so they do not have access to the same global data. Because of that, they need special shared memory objects to share data. There are two shared memory objects:
* Value
* Array

In [None]:
#sharing a value
def add100(number,lock):
    for i in range(100):
        time.sleep(0.01)
        with lock:
            number.value+=1
        
if __name__=="__main__":
    shared_number= Value('i',0)
    lock = Lock()
    print(f'Number at the beginning is {shared_number.value}')
    p1 = Process(target=add100, args=(shared_number,lock))
    p2 = Process(target=add100, args=(shared_number,lock))
    
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()
    
    print(f'Number at the end is {shared_number.value}')

In [29]:
# sharing an array
def add100(number,lock):
    for i in range(len(number)):
        with lock:
            number[i] +=1

if __name__ == '__main__':
    lock = Lock()
    shared_array = Array('d',[0.0,100.0,200.0])
    print(f'array at the beginning is {shared_array[:]}')

    p1 = Process(target=add100, args=(shared_array,lock))
    p2 = Process(target=add100, args=(shared_array,lock))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print(f'array at the end is {shared_array[:]}')

array at the beginning is [0.0, 100.0, 200.0]
array at the end is [2.0, 102.0, 202.0]


### Using a Queue for process-safe data exchange

In [33]:
from multiprocessing import Queue

In [37]:
def square(numbers, queue):
    for i in numbers:
        queue.put(i*i)
def make_negative(numbers,queue):
    for i in numbers:
        queue.put(-1*i)
    

if __name__ == "__main__":
    numbers = range(1,6)
    q=Queue()
    
    p1 = Process(target=square, args=(numbers,q))
    p2 = Process(target=make_negative, args=(numbers,q))
    
    p1.start()
    p2.start()
    
    
    p1.join()
    p2.join()
    
    while not q.empty():
        print(q.get())

1
4
9
16
25
-1
-2
-3
-4
-5


## Process pool

#### A process pool can be used to manage multiple processes. A process pool object controls a pool of worker processes to which jobs can be submitted. It manages the available processes for you and splits the data into smaller chunks, which are then processed in parallel by different processes.

In [39]:
from multiprocessing import Pool

In [41]:
def cube(number):
    return number*number*number

if __name__ == "__main__":
    pool = Pool()
    # important pool methods:
    # map, apply, join, close
    numbers = range(10)
    # Pool() starts one worker process per available core by default; map splits
    # the iterable into chunks and submits them to the workers.
    result = pool.map(cube,numbers)
    # apply runs the function once in a worker, with the arguments given as a tuple.
    pool.apply(cube,(numbers[0],))
    pool.close()
    pool.join()
    print(result)
    

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
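
The same pool can also be written as a context manager, with the chunk size set explicitly and a single job submitted without blocking. A hedged sketch; the worker count, chunksize, and timeout values are arbitrary choices for illustration:

```python
from multiprocessing import Pool


def cube(number):
    return number * number * number


if __name__ == "__main__":
    # Leaving the with-block terminates the worker processes
    with Pool(processes=2) as pool:
        # chunksize sets how many items are handed to a worker per task
        result = pool.map(cube, range(10), chunksize=5)
        # apply_async returns immediately; get() blocks until the result is ready
        async_result = pool.apply_async(cube, (3,))
        print(async_result.get(timeout=10))
    print(result)
```

Both results are collected inside the with-block, because leaving it stops the workers without waiting for outstanding jobs.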
