<a href="https://colab.research.google.com/github/pallavibekal/IISC--Parallel-Computing/blob/main/2200092_M2_AST_07_Numba_C.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Assignment 7: Introduction to Numba

## Learning Objectives

At the end of the experiment, you will be able to:

* use the jit decorator to improve performance
* understand the difference between Numba’s compilation modes
* understand the limitations of Numba with examples
* vectorize code for use as a ufunc

## Information

#### Numba in a Nutshell

Numba is a Python module which translates a subset of Python and NumPy code into high-speed machine code. Numba allows the compilation of selected portions of pure Python code to native code, and generates optimized machine code using the LLVM (Low Level Virtual Machine) compiler infrastructure.

With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time (JIT) optimized to achieve performance similar to C, C++ and Fortran, without having to switch languages or Python interpreters.

**High-Level architecture of Numba**

The Numba translation process can be broken down into a set of important steps, ranging from bytecode analysis to the final machine code generation. The picture below illustrates this process, where the green boxes correspond to the frontend of the Numba compiler and the blue boxes belong to the backend.

![Image](https://cdn.iisc.talentsprint.com/CDS/Images/numba.png)

To know more about Numba click [here](https://towardsdatascience.com/speed-up-your-algorithms-part-2-numba-293e554c5cc1)


### Setup Steps:

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2200092" #@param {type:"string"}

In [None]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "9686800288" #@param {type:"string"}

In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()
  
notebook= "M2_AST_07_Numba_C" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")  
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None
    
    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:        
        print(r["err"])
        return None   
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://cds.iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: submission_id
    

def getAdditional():
  try:
    if not Additional: 
      raise NameError
    else:
      return Additional  
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None
  
  
# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None
  
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None
  

def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError 
    else: 
      return Answer
  except NameError:
    print ("Please answer Question")
    return None
  

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup() 
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


Importing necessary packages

In [None]:
from numba import jit, vectorize # Importing only the Numba decorators used in this notebook
import numpy as np # Importing numpy package under a name np

Let us first write a small Python function that sums all the elements of a given array, and then see how Numba can speed it up.

In [None]:
# Python version code
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of the array
    # Explicit Python loops are a slow way to sum an array (not Pythonic style)
    total = 0 # Defining a variable
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # calculating the sum         
    return total # returning the sum of elements of an array

In [None]:
A = np.random.random((200,200)) # Generating a numpy array
ArraySum(A) # Calling the ArraySum function

19910.105270415013

Now let us time the execution of ArraySum function while calculating the sum of elements in array 'A'

In [None]:
# timing the execution
%timeit ArraySum(A)

100 loops, best of 5: 12.3 ms per loop


To know more about the timeit function click [here](https://docs.python.org/3/library/timeit.html)
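Outside IPython the `%timeit` magic is not available, but the standard-library `timeit` module gives comparable measurements. The following is a minimal sketch; `nested_sum` and `rows` are hypothetical names used only for illustration, and the timings will differ from machine to machine.

```python
import timeit


def nested_sum(rows):
    """Sum a list of lists with explicit loops (illustrative helper)."""
    total = 0.0
    for row in rows:
        for value in row:
            total += value
    return total


# A 200 x 200 grid of floats stored as plain Python lists
rows = [[float(i + j) for i in range(200)] for j in range(200)]

# repeat=5 takes five measurements; number=10 calls the function ten times per measurement
timings = timeit.repeat(lambda: nested_sum(rows), repeat=5, number=10)
best_per_call = min(timings) / 10
print(f"best of 5: {best_per_call * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats, as `%timeit` does, reduces the influence of other processes running on the machine.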

Now let us see how to speed up execution of ArraySum function while calculating the sum of elements in array 'A' using numba

**Jit as function call**

In [None]:
sum_array_numba = jit()(ArraySum) # Wrapping ArraySum with jit; compilation happens on the first call

The function **sum_array_numba** is a version of **ArraySum** that is “targeted” for JIT-compilation.

In [None]:
# Timing the execution of sum_array_numba function 

%timeit sum_array_numba(A)

The slowest run took 5107.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 5: 60 µs per loop


From the timings above, we can see that the JIT-compiled version runs much faster than the pure Python loop (about 60 µs versus 12.3 ms). Now let us write the NumPy version of the code to calculate the sum of elements in an array and time it.

In [None]:
A.sum() # Using NumPy's built-in sum method to add up the elements of the array (a better idea; Pythonic style)

19910.10527041481

In [None]:
# Timing the code
%timeit A.sum()

The slowest run took 47.19 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 16.9 µs per loop


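To know more about Python's built-in sum function click [here](https://docs.python.org/3/library/functions.html#sum). Note that `A.sum()` above calls the NumPy array method rather than the built-in function.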

In the above code, we have created a JIT-compiled version of **ArraySum** via the call **jit()(ArraySum)**. In practice this would typically be done using the alternative **decorator** syntax.

To know more about Python decorators click [here](https://link.medium.com/rixEI1907db)
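The two notations are equivalent: placing `@decorator` above a function definition is shorthand for passing the function to the decorator and rebinding the name. A minimal pure-Python sketch of that equivalence; the `announce` decorator and both `add_*` functions are hypothetical and stand in for `jit` and `ArraySum`:

```python
def announce(func):
    """Illustrative decorator that counts calls before delegating to func."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def add_plain(x, y):
    return x + y

add_wrapped = announce(add_plain)   # function-call form, like jit()(ArraySum)


@announce                           # decorator form, like @jit
def add_decorated(x, y):
    return x + y


print(add_wrapped(2, 3), add_decorated(2, 3))
```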

**Decorator Notation**

To target a function for JIT compilation, we put **@jit** before the ArraySum function definition.

In [None]:
@jit
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of a array
    # This is a bad idea of calculating sum of elements in array(Not  Pythonic style)
    total = 0 # Defining a variable
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # calculating the sum         
    return total # returning the sum of elements of an array

In [None]:
# Timing the execution
%timeit ArraySum(A)

The slowest run took 2121.90 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 61.5 µs per loop


#### Think for a While!! 

- How does Numba get the code to run quickly?

Numba examines the Python bytecode and then translates it into an 'intermediate representation'. We can view this using the inspect_types method.

In [None]:
ArraySum.inspect_types() # Inspecting the types

ArraySum (array(float64, 2d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-12-75c42a977f09>
# --- LINE 1 --- 

@jit

# --- LINE 2 --- 

# Defining a function

# --- LINE 3 --- 

def ArraySum(array):

    # --- LINE 4 --- 
    # label 0
    #   array = arg(0, name=array)  :: array(float64, 2d, C)
    #   $4load_attr.1 = getattr(value=array, attr=shape)  :: UniTuple(int64 x 2)
    #   $6unpack_sequence.4 = exhaust_iter(value=$4load_attr.1, count=2)  :: UniTuple(int64 x 2)
    #   del $4load_attr.1
    #   $6unpack_sequence.2 = static_getitem(value=$6unpack_sequence.4, index=0, index_var=None)  :: int64
    #   $6unpack_sequence.3 = static_getitem(value=$6unpack_sequence.4, index=1, index_var=None)  :: int64
    #   del $6unpack_sequence.4
    #   m = $6unpack_sequence.2  :: int64
    #   del $6unpack_sequence.2
    #   n = $6unpack_sequence.3  :: int64
    #   del $6unpack_sequence.3

    m, n = array.shape # shape of a array

From the above results, we can infer that 
- every line of Python code is preceded by several lines of Numba IR (intermediate representation) code, which give a glimpse into what Numba is doing to the Python code behind the scenes. 
- at the end of most lines there are type annotations that show how Numba is treating variables and function calls.

### Compilation modes

There are two important modes: nopython and object. The nopython mode completely avoids the Python interpreter and translates the full code to native instructions that can run without the help of Python. However, if that mode is not available (for example, when the code uses unsupported Python features or external libraries), a plain @jit in the Numba version used here falls back to object mode, where it relies on the Python interpreter for the code it cannot compile. Naturally, nopython mode offers the best performance gains.

**nopython mode**

In [None]:
@jit(nopython=True)
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of the array
    # Explicit Python loops are a slow way to sum an array (not Pythonic style)
    total = 0 # Defining a variable
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # calculating the sum         
    return total # returning the sum of elements of an array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

The slowest run took 1965.56 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 61.9 µs per loop


#### Compilation flags for jit

There are two other main compilation flags for @jit

**a. cache mode**

If we do not want to pay the compilation cost on every run, we can use cache mode. Numba then saves the compiled function to disk in the \__pycache\__ directory (much as Python stores pyc files), so the function stays fast even between sessions.

In [None]:
@jit(cache=True)
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of the array
    # Explicit Python loops are a slow way to sum an array (not Pythonic style)
    total = 0 # Defining a variable
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # calculating the sum         
    return total # returning the sum of elements of an array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

The slowest run took 2171.02 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 61.9 µs per loop


**b. nogil mode**

Whenever Numba optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is no longer necessary to hold Python’s global interpreter lock (GIL). Numba will release the GIL when entering such a compiled function if you pass nogil=True.

To know more about the global interpreter lock click [here](https://docs.python.org/3/glossary.html#term-global-interpreter-lock)

In [None]:
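# Releasing the GIL so that the compiled function can run in several threads at once
@jit(nogil=True) # Option to release the gil
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of the array
    # Explicit Python loops are a slow way to sum an array (not Pythonic style)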
    total = 0 # Defining a variable
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # calculating the sum         
    return total # returning the sum of elements of an array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

The slowest run took 2016.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 5: 61.7 µs per loop


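Now let us add fastmath=True, which relaxes strict IEEE 754 floating-point rules (for example, it lets the compiler reorder additions) to trade some accuracy for speed in some computations, and time it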

In [None]:
@jit(fastmath=True)
# Defining a function
def ArraySum(array):
    m, n = array.shape # shape of the array
    # Explicit Python loops are a slow way to sum an array (not Pythonic style)
    total = 0 # Defining a variable
    for j in range(m): # iterating over rows
        for i in range(n): # iterating over columns
            total += array[j, i] # calculating the sum         
    return total # returning the sum of elements of an array

In [None]:
# Calling the above defined function and timing it
%timeit ArraySum(A)

The slowest run took 17290.38 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.79 µs per loop


#### ParallelAccelerator

- ParallelAccelerator is a special compiler pass contributed by Intel Labs
    - Todd A. Anderson, Ehsan Totoni, Paul Liu
    - Based on similar contribution to Julia 
- Automatically generates multithreaded code in a Numba-compiled function:
    - Array expressions and reductions
    - Random functions
    - Dot products
    - Explicit loops indicated with prange() call
    
To know more about Parallel Accelerator click [here](https://numba.pydata.org/numba-doc/dev/user/parallel.html)


Now let us add the parallel=True option to @jit to use a multi-core CPU via threading and to perform automatic parallelization

In [None]:
# without using parallel tag

@jit
def f(x): # Defining a function
    return np.cos(x) ** 2 + np.sin(x) ** 2 # calculating the value

In [None]:
data = np.random.random((10000000))

In [None]:
%timeit f(data)

1 loop, best of 5: 346 ms per loop


In [None]:
# Using parallel tag
@jit(parallel=True)
def f(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

In [None]:
%timeit f(data)



1 loop, best of 5: 343 ms per loop


Before we dive deeper into Numba, let us try to understand a few limitations of Numba

In [None]:
# Example 1
@jit
def hello(n):
    return ["hell0", 44] * 4

In [None]:
%timeit hello(1)

Compilation is falling back to object mode WITH looplifting enabled because Function "hello" failed type inference due to: No implementation of function Function(<built-in function mul>) found for signature:
 
 >>> mul(LiteralList((Literal[str](hell0), Literal[int](44))), Literal[int](4))
 
There are 12 candidate implementations:
  - Of which 6 did not match due to:
  Overload in function 'MulList.generic': File: numba/core/typing/listdecl.py: Line 152.
    With argument(s): '(Poison<LiteralList((Literal[str](hell0), Literal[int](44)))>, int64)':
   Rejected as the implementation raised a specific error:
     TypingError: Poison type used in arguments; got Poison<LiteralList((Literal[str](hell0), Literal[int](44)))>
  raised from /usr/local/lib/python3.7/dist-packages/numba/core/types/functions.py:235
  - Of which 5 did not match due to:
  Overload of function 'mul': File: <numerous>: Line N/A.
    With argument(s): '(LiteralList((Literal[str](hell0), Literal[int](44))), Literal[int](4

The slowest run took 170920.71 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 344 ns per loop


The above code returns the desired output, but with a warning that compilation is falling back to object mode. Now let us write similar code in nopython mode to see the limitation.

In [None]:
# Example 1
@jit(nopython=True)
def hello(n):
    return ["hell0", 44] 

In [None]:
# Example 2
@jit(nopython=True)
def display():
    data = {"numbers":[1, 3, 4], "evens":[2, 4, 6]}
    return data["numbers"]

To know more about limitations of Numba click [here](https://www.oreilly.com/library/view/python-high-performance/9781787282896/6e5cc5c4-ad53-4657-b502-6630dd9efced.xhtml)

#### Universal Functions (Ufuncs)

- Ufuncs are a core concept in NumPy for array-oriented computing.
- A function with scalar inputs is broadcast across the elements of the input arrays:
    - np.add([1, 2, 3], 5) = [6, 7, 8]
- Parallelism is present, by construction. Numba will generate loops and can automatically multi-thread if requested.

To know more about NumPy ufuncs click [here](https://numpy.org/doc/stable/reference/ufuncs.html)

In [None]:
# Numpy ufuncs
print(np.add(4, 5)) # Adding two numbers
print(np.add([1, 4, 5], 6)) # Adding 6 to the elements in the list
print(np.add(1, [3, 4])) # Adding 1 to the elements in the list
print(np.add.accumulate([4, 5, 7, 2, 4])) # Accumulate the result of applying the operator to all elements.

9
[ 7 10 11]
[4 5]
[ 4  9 16 18 22]


In [None]:
# Numba ufuncs
# Function to add two values
@vectorize("(int64, int64)")
def add(x, y):
    # adding the values
    return x + y

In [None]:
print(add(4, 5)) # Adding two numbers
print(add([1, 4, 5], 6)) # Adding 6 to the elements in the list
print(add(1, [3, 4])) # Adding 1 to the elements in the list
print(add.accumulate([4, 5, 7, 2, 4])) # Accumulate the result of applying the operator to all elements.

9
[ 7 10 11]
[4 5]
[ 4  9 16 18 22]


To know more about vectorize decorator click [here](https://numba.pydata.org/numba-doc/dev/user/vectorize.html)

#### Research Question

1. Write code to approximate $\pi$ by Monte Carlo, and compare the speed with and without Numba when the sample size is large.

    To know about $\pi$ by Monte Carlo click [here](https://medium.com/cantors-paradise/estimating-%CF%80-using-monte-carlo-simulations-3459a84b5ef9)

### Please answer the questions below to complete the experiment:




In [None]:
# @title Select the FALSE statement below: { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "nopython=True compiles the decorated function so that it will run entirely with the involvement of the Python interpreter" #@param ["","Just-in-time (JIT) compilation means compilation of a function at execution time, as opposed to compilation of a function in a separate step before running the program code", "nopython=True compiles the decorated function so that it will run entirely with the involvement of the Python interpreter", "Numba is a library that performs JIT compilation that translates pure python code to optimized machine code at runtime using the LLVM industry standard compiler"]

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "na" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Somewhat Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 3645
Date of submission:  11 Sep 2021
Time of submission:  23:03:36
View your submissions: https://cds.iisc.talentsprint.com/notebook_submissions
