<a href="https://colab.research.google.com/github/shanvelc/module4/blob/main/M4_AST_02_OpenMP_C.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Assignment 2: OpenMP

## Learning Objectives

At the end of the experiment, you will be able to:

* understand parallelization in Python
* implement multiprocessing using OpenMP-style constructs (via Pymp)

In [None]:
#@title Walkthrough Video
from IPython.display import HTML
HTML("""<video width="420" height="240" controls>
<source src="https://cdn.chn.talentsprint.com/content/OpenMP.mp4">
</video>""")

## Information

OpenMP is an Application Program Interface (API) jointly defined by a group of major computer hardware and software vendors. It provides a portable, scalable model for developers of shared-memory parallel applications.

OpenMP is among the easier parallel programming models to pick up: it requires little learning overhead, and most key parallel design patterns can be learned with it.

**Concurrency:** A condition of a system in which multiple
tasks are logically active at one time.

**Parallelism:** A condition of a system in which multiple
tasks are actually active at one time.

![Concurrent versus parallel execution](https://cdn.iisc.talentsprint.com/CDS/Images/OpenMP_Concurrent_Parallel.JPG)

### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "" #@param {type:"string"}

In [None]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M4_AST_02_OpenMP_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://cds-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



#### Importing required packages

In [None]:
import concurrent.futures
# The concurrent.futures module provides a high-level interface for asynchronously executing callables.
import time
from multiprocessing import Pool
# Pool is for parallelizing the execution of a function across multiple input values
import numpy as np

Let us define a function that sleeps for a given number of seconds and run it with a process pool executor from `concurrent.futures`.

In [None]:
start = time.perf_counter() # returns fractional seconds from a high-resolution clock

# function to delay the execution of a program
def do_something(seconds):
    print('Sleeping {} second(s)...'.format(seconds))
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'

# executor.map submits each item of the iterable to the pool as a separate task (chunksize defaults to 1)
with concurrent.futures.ProcessPoolExecutor() as executor:
    secs = [5, 4, 3, 2, 1]
    results = executor.map(do_something, secs)

finish = time.perf_counter()
print('Finished in {} second(s)'.format(finish-start))
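
`executor.map` returns an iterator that yields results in the order of the inputs, regardless of which task finishes first. A minimal sketch of collecting those results is shown here. It uses `ThreadPoolExecutor` as a stand-in, an assumption made so that the snippet runs in any environment; it shares the `Executor` interface with `ProcessPoolExecutor`. `nap` is a hypothetical helper with shortened sleep times.

```python
import concurrent.futures
import time

def nap(seconds):
    """Hypothetical stand-in for do_something, with shorter sleeps."""
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'

secs = [0.3, 0.2, 0.1]
with concurrent.futures.ThreadPoolExecutor() as executor:
    # map() yields results in input order, not in completion order
    results = list(executor.map(nap, secs))

for line in results:
    print(line)
```

Even though the 0.1-second task finishes first, its result is printed last because it was the last input.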

### Parallelization in Python

CPython does not handle CPU-bound multithreading well because of the Global Interpreter Lock (GIL). The GIL ensures that only one thread executes Python bytecode at a time, so threads cannot speed up compute-heavy code. Instead, the usual approach is to use multiple independent processes to perform the computations. This sidesteps the GIL, as each process has its own interpreter and its own GIL that does not block the others. This is typically done using the `multiprocessing` module.

The `Pool` object gives us a set of parallel workers we can use to parallelize our calculations. In particular, it has a `map` method, with the same calling convention as the built-in `map()` function, that applies a function to every input in parallel.

Let’s try map() out with a test function that just runs sleep.

In [None]:
# function that sleeps for 0.1 seconds
def sleeping(arg):
    time.sleep(0.1)

%timeit list(map(sleeping, range(24)))

`%timeit` is an IPython magic function that times a piece of code (a single statement or a single function call).

To learn more about `%timeit`, click [here](https://ipython.org/ipython-doc/dev/interactive/magics.html#magic-timeit).
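
Outside IPython, the standard-library `timeit` module offers similar functionality. The following is a small sketch, assuming a shorter loop (3 calls instead of 24) to keep the run time low; `short_nap` is a hypothetical copy of `sleeping`.

```python
import time
import timeit

def short_nap(arg):
    """Hypothetical copy of sleeping(): pause for 0.1 seconds."""
    time.sleep(0.1)

# run the statement once per repeat and keep the best of 3 repeats
timings = timeit.repeat(lambda: list(map(short_nap, range(3))), number=1, repeat=3)
best = min(timings)
print(f'best of 3: {best:.3f} s')
```

Taking the minimum over several repeats reduces the influence of unrelated system activity on the measurement.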

Now let’s try it in parallel:

In [None]:
pool = Pool(4)

In [None]:
%timeit pool.map(sleeping, range(24))

The `multiprocessing` module has a major limitation: it can only send functions and data to the worker processes if they can be "pickled" (read: serialized) by Python's `pickle` module. For instance, lambdas and nested functions will not work, and functions defined interactively in `__main__` can fail depending on how the platform starts its worker processes. "Pickling" simply can't handle a lot of different types of Python objects.
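
The limitation can be checked directly with the `pickle` module, without starting any workers. In this short sketch, `can_pickle` is a hypothetical helper introduced only for illustration.

```python
import math
import pickle

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print(can_pickle(math.sqrt))        # True: importable by name from a module
print(can_pickle(lambda x: x * 2))  # False: a lambda has no importable name
```

Functions are pickled by reference to their module and name, which is why an anonymous function cannot be sent to a worker this way.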

Fortunately, there is a fork of the `multiprocessing` module called *multiprocess* that avoids most of these problems. *multiprocess* uses `dill` instead of `pickle` to serialize Python objects (read: send your data and functions to the Python workers), and can handle lambdas and interactively defined functions. Usage is identical:

In [None]:
!pip install multiprocess
from multiprocess import Pool

In [None]:
# shut down the old workers
pool.close()

pool = Pool(4)
%timeit pool.map(lambda x: time.sleep(0.1), range(24))
pool.close()

This is a general-purpose parallelization recipe that you can use for your Python projects.

In [None]:
# function to square the number
def square(x):
    return x**2

In [None]:
# make sure to always use multiprocess

number_of_cores = 4
# start your parallel workers at the beginning of your script
pool = Pool(number_of_cores)

start = time.perf_counter()

# execute the computations in parallel
result = pool.map(square, range(24))
result2 = pool.map(square, range(24))

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')

# turn off your parallel workers at the end of your script
pool.close()
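
A variant of this recipe with the standard-library `multiprocessing` module is sketched below. It assumes a POSIX system where the `fork` start method is available (as on Colab), and it uses the built-in `pow` through `starmap` so that nothing defined in `__main__` has to be pickled. The `with` block is a design choice made for this sketch: it shuts the workers down even if the computation raises.

```python
import multiprocessing as mp

# assumption: the 'fork' start method exists (Linux/macOS); it is not available on Windows
ctx = mp.get_context("fork")

with ctx.Pool(4) as pool:
    # starmap unpacks each tuple into pow(base, exponent)
    squares = pool.starmap(pow, [(x, 2) for x in range(24)])

print(squares[:5])  # [0, 1, 4, 9, 16]
```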

### Multiprocessing in Python using OpenMP

#### OpenMP
OpenMP's programming model rests on two principles. The first is that all work is carried out by threads. The second is the fork-join model: the program contains parallel regions in which one or more threads can run.

![Fork-join model](https://cdn.iisc.talentsprint.com/CDS/Images/Fork_join.png)

The figure above illustrates the fork-join paradigm, in which three regions of the program permit parallel execution of the variously colored blocks. Sequential execution is displayed on the top, while its equivalent fork-join execution is on the bottom.
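
The fork-join pattern can be sketched in plain Python with the `threading` module. This is only an illustration of the control flow, not OpenMP itself; `work` and `partial_sums` are hypothetical names, and the task (summing 0..99 in four blocks) is chosen for simplicity.

```python
import threading

num_workers = 4
partial_sums = [0] * num_workers

def work(worker_id):
    # each worker sums its own quarter of the numbers 0..99
    lo, hi = worker_id * 25, (worker_id + 1) * 25
    partial_sums[worker_id] = sum(range(lo, hi))

# fork: start one thread per block of work
threads = [threading.Thread(target=work, args=(k,)) for k in range(num_workers)]
for t in threads:
    t.start()

# join: wait for every thread before resuming sequential execution
for t in threads:
    t.join()

total = sum(partial_sums)
print(total)  # 4950
```

Each thread writes to its own slot of `partial_sums`, so no lock is needed; the final sum runs sequentially after the join.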

#### Pymp
Because the goal of Pymp is to bring OpenMP-like functionality to Python, Pymp and OpenMP naturally share some concepts. A single master thread forks into multiple threads, which share data, then synchronize (join) and are destroyed.

As with OpenMP applications, when Pymp Python code hits a parallel region, processes – termed child processes – are forked and are in a state that is nearly the same as the “master process.” Note that these are forked processes and not threads, as is typical with OpenMP applications. As for the shared memory, according to the Pymp website, “… the memory is not copied, but referenced. Only when a process writes into a part of the memory [does] it gets its own copy of the corresponding memory region. This keeps the processing overhead low (but of course not as low as for OpenMP threads).”

In [None]:
# install the pymp
!pip -qq install pymp-pypi

To keep things simple, start with serial code that fills a single array.

In [None]:
# creating an array of zeros
ex_array = np.zeros((100,), dtype='uint8')
for index in range(0, 100):
    # assigning 1
    ex_array[index] = 1
    print('Yay! {} done!'.format(index))

Now let's write the Pymp version of the same code, starting by importing `pymp`:

In [None]:
import pymp

In [None]:
ex_array = pymp.shared.array((100,), dtype='uint8')
with pymp.Parallel(4) as p:
    for index in p.range(0, 100):
        ex_array[index] = 1
        # The parallel print function takes care of asynchronous output.
        p.print('Yay! {} done!'.format(index))

#### OpenMP variables

Every parallel context provides its number of threads and the current thread's thread_num in the same way OpenMP does:

In [None]:
with pymp.Parallel(4) as p:
    p.print(p.num_threads, p.thread_num)

The original thread entering the parallel context always has `thread_num` 0.

#### Variable scopes

The only implemented variable scopes are firstprivate, shared and private.

- All variables that are declared before the `pymp.Parallel` call are implicitly firstprivate
- All variables from the `pymp.shared` module are shared
- All variables created within a `pymp.Parallel` context are private.
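
These scopes follow from how forked processes treat memory. The sketch below uses the standard-library `multiprocessing` module rather than pymp, and assumes a POSIX system where the `fork` start method is available. `private_list`, `shared_counter` and `child` are hypothetical names: an ordinary object created before the fork behaves like a firstprivate variable, while an explicitly shared value is visible to both processes.

```python
import multiprocessing as mp

# assumption: the 'fork' start method exists (Linux/macOS)
ctx = mp.get_context("fork")

private_list = [0]                  # ordinary object: each process gets its own copy
shared_counter = ctx.Value("i", 0)  # lives in shared memory

def child():
    private_list[0] = 99            # changes only the child's copy
    with shared_counter.get_lock():
        shared_counter.value += 1   # visible to the parent

proc = ctx.Process(target=child)
proc.start()
proc.join()

print(private_list[0], shared_counter.value)  # 0 1
```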

The package `pymp.shared` provides a NumPy array wrapper accepting the standard datatype strings, as well as shared list, dict, queue, lock and rlock objects wrapped from `multiprocessing`. The high-performance shared-memory (ctypes) data structures are array, lock and rlock; the other data structures are synchronized via a *multiprocessing.Manager* and hence a little slower.

All data structures must be synchronized manually, if required, by using a lock. The parallel context offers one for your convenience:

In [None]:
# int array
incremental_array = pymp.shared.array((1,), dtype='uint8')
print(incremental_array)
# list
no_of_threads = pymp.shared.list()

with pymp.Parallel(4) as p:
    for index in p.range(0, 100):
        with p.lock:
            no_of_threads.append(p.thread_num)
            incremental_array[0] += 1
print(incremental_array)
print(no_of_threads)
# distinct thread numbers that took part in the loop
set(no_of_threads)

In [None]:
incremental_array
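
The lock matters because `+= 1` is a read-modify-write sequence: two workers can read the same old value and one update is lost. The same locking pattern is sketched here with the standard `threading` module instead of pymp, so that it runs without extra packages; `add_many` and `counter` are hypothetical names.

```python
import threading

counter = [0]            # one-element list so workers can update it in place
lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        # only one worker at a time may perform the read-modify-write
        with lock:
            counter[0] += 1

threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter[0])  # 4000
```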

#### Nested loops

When `pymp.config.nested` is `True`, it is possible to nest parallel contexts with the expected semantics.

**Run the cell below to enable nesting, and then execute the try/except block.**

In [None]:
pymp.config.nested = True

In [None]:
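# nested parallel contexts: 4 outer workers, each forking 2 inner workers
try:
    with pymp.Parallel(4) as p1:
        with pymp.Parallel(2) as p2:
            # use the inner context's synchronized print
            p2.print(p1.thread_num, p2.thread_num)
except Exception as err:
    print("It's an error!", err)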

#### Laplace Solver Example

The common [Laplace solver](https://www.codeproject.com/Articles/1087025/Using-Python-to-Solve-Computational-Physics-Proble) is a slightly more detailed example. The code is definitely not the most efficient, since it uses explicit Python loops.

**Note:** The Laplace solver is used here only as an example for comparing computation times.

In [None]:
nx = 1201
ny = 1201

# Solution and previous solution arrays
sol = np.zeros((nx,ny))

# make a copy of an array
soln = sol.copy()

for j in range(0,ny-1):
    sol[0,j] = 10.0
    sol[nx-1,j] = 1.0

for i in range(0,nx-1):
    sol[i,0] = 0.0
    sol[i,ny-1] = 0.0

In [None]:
# Iterate
start_time = time.perf_counter()
for kloop in range(1,10):
    soln = sol.copy()
    for i in range(1,nx-1):
        for j in range (1,ny-1):
            sol[i,j] = 0.25 * (soln[i,j-1] + soln[i,j+1] + soln[i-1,j] + soln[i+1,j])
end_time = time.perf_counter()

print('Elapsed wall clock time = %g seconds.' % (end_time-start_time) )
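
For comparison, one sweep of the same update can be written with NumPy slicing instead of explicit loops. This is a sketch on a small 6 x 6 grid, chosen here only to keep the check fast; `jacobi_step_loops` and `jacobi_step_sliced` are hypothetical helper names, and the boundary values mirror those used above.

```python
import numpy as np

def jacobi_step_loops(sol):
    """One sweep with explicit loops, reading from a copy of the old values."""
    prev = sol.copy()
    new = sol.copy()
    nx, ny = sol.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            new[i, j] = 0.25 * (prev[i, j-1] + prev[i, j+1] + prev[i-1, j] + prev[i+1, j])
    return new

def jacobi_step_sliced(sol):
    """The same sweep expressed with shifted array slices."""
    new = sol.copy()
    new[1:-1, 1:-1] = 0.25 * (sol[1:-1, :-2] + sol[1:-1, 2:] + sol[:-2, 1:-1] + sol[2:, 1:-1])
    return new

grid = np.zeros((6, 6))
grid[0, :] = 10.0    # first row
grid[-1, :] = 1.0    # last row
grid[:, 0] = 0.0     # first column
grid[:, -1] = 0.0    # last column

print(np.allclose(jacobi_step_loops(grid), jacobi_step_sliced(grid)))  # True
```

The sliced version performs the whole interior update in compiled NumPy code, which is typically much faster than the nested Python loops.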

The same Laplace solver implemented with Pymp:

In [None]:
# Solution and previous solution arrays
sol = pymp.shared.array((nx,ny))
soln = pymp.shared.array((nx,ny))

for j in range(0,ny-1):
    sol[0,j] = 10.0
    sol[nx-1,j] = 1.0

for i in range(0,nx-1):
    sol[i,0] = 0.0
    sol[i,ny-1] = 0.0

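# Iterate
start_time = time.perf_counter()
for kloop in range(1,10):
    # snapshot the previous solution before forking so every worker reads the same values
    soln = sol.copy()
    with pymp.Parallel(4) as p:
        # split only the rows across workers; each worker sweeps all columns of its rows
        for i in p.range(1,nx-1):
            for j in range (1,ny-1):
                sol[i,j] = 0.25 * (soln[i,j-1] + soln[i,j+1] + soln[i-1,j] + soln[i+1,j])
    # leaving the with-block joins the workers before the next iteration starts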

end_time = time.perf_counter()
print('Elapsed wall clock time = %g seconds.' % (end_time-start_time) )

### Please answer the questions below to complete the experiment:




In [None]:
# @title Complete the following statement by selecting the most appropriate option: Fork-join model is a parallel design pattern in which { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "" #@param ["","all the parallel processes stack one over the other and sequentially merge at the centre of the long stack","the parallel executions run in series as separate non sequential processes and merge at the end","the execution branches off in parallel at designated points in the program and merges at a subsequent point to resume sequential execution"]

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")