# **NUMBA**
[NUMBA Documentation](https://numba.readthedocs.io/en/stable/)

**What is NUMBA?**

NUMBA is a powerful Python library used for optimizing and accelerating numerical computations. It specializes in just-in-time (JIT) compilation, which means it translates Python functions into machine code at runtime, leading to significant speedups compared to traditional Python execution.

# **Key Features:**

1. Just-in-Time (JIT) Compilation: NUMBA dynamically compiles Python functions to machine code, resulting in faster execution. This is particularly beneficial for numerical computations where performance is crucial.
2. Decorators for Function Acceleration: NUMBA provides decorators like @jit that you can apply to Python functions to instruct NUMBA to compile them for optimized performance.
3. Support for Numerical Types: NUMBA supports various numerical types such as integers, floats, and complex numbers.
4. Integration with NumPy: NUMBA seamlessly integrates with NumPy, a popular numerical computing library in Python. This means one can accelerate their existing NumPy code by adding NUMBA decorators to their functions.
5. Parallel Execution: NUMBA supports parallel execution of code on multiple CPU cores or even GPUs. This enables the user to take advantage of parallelism to further speed up their computations, especially for tasks involving large datasets or complex algorithms.
6. Ease of Use: NUMBA can be used with only basic Python knowledge, since most functions need nothing more than a decorator.

# **Installation**

pip install numba

This installs Numba and sets up the environment needed to run the examples below.


# **Diagnostics in NUMBA**

[Numba Diagnostics](https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics)

**What are Diagnostics?**

Diagnostics are tools and features that help analyze and optimize the performance of compiled code. Together with profiling, they help identify errors, debug the code, and troubleshoot issues that arise during compilation or execution.

**Use of Diagnostics**



1. Profiling - This includes measuring execution times of functions and understanding memory usage.
2. Meaningful error messages - This includes providing informative error messages and warnings during compilation and execution.
3. Debugging - This includes debugging code by providing information about the compilation process, including line numbers, variable types, and optimization stages.



In [None]:
!pip install numpy



In [None]:
import numpy as np
import time

A = np.random.rand(100, 100)
B = np.random.rand(100, 100)
C = np.zeros((100, 100))

start_time = time.time()
num_executions = 0

for i in range(100):
    for j in range(100):
        for k in range(100):
            C[i][j] += A[i][k] * B[k][j]
            num_executions += 1

end_time = time.time()
execution_time = end_time - start_time
print("Number of executions of the line of code:", num_executions)
print("Execution time: {:.6f} seconds".format(execution_time))

Number of executions of the line of code: 1000000
Execution time: 1.973450 seconds


### **Sample code using for loops in nopython mode (nopython=True)**

In [None]:
import numpy as np
from numba import prange, jit
import time

@jit(nopython=True, parallel=True)
def matrix_mul(A, B):
    rows_A, cols_A = A.shape
    rows_B, cols_B = B.shape

    C = np.zeros((rows_A, cols_B))

    for i in prange(rows_A):
        for j in prange(cols_B):
            for k in prange(cols_A):
                C[i][j] += A[i][k] * B[k][j]
    return C

start_time = time.time()
A = np.random.rand(5, 5)
B = np.random.rand(5, 5)
C = matrix_mul(A, B)
end_time = time.time()
execution_time = end_time - start_time
print("Execution time: {:.6f} seconds".format(execution_time))
print('\nResult: \n', C)

Execution time: 1.114748 seconds

Result: 
 [[1.66523292 1.44073188 1.50643627 1.25258289 0.79583439]
 [2.04264979 1.81689585 1.82406899 1.58781296 1.52281961]
 [1.30247635 1.27670653 1.27587483 1.00976042 1.08711463]
 [2.67430027 2.42565991 2.5245097  2.08145241 1.89493203]
 [1.57343789 1.46454058 1.34209443 1.20051057 1.1092336 ]]


### **Sample code using for loops with nopython=False**

In [None]:
import numpy as np
from numba import prange, jit
import time

@jit(nopython=False, parallel=True)
def matrix_mul(A, B):
  rows_A, cols_A = A.shape
  rows_B, cols_B = B.shape

  C = np.zeros((rows_A, cols_B))

  for i in prange(rows_A):
    for j in prange(cols_B):
      for k in prange(cols_A):
        C[i][j] += A[i][k] * B[k][j]

  return C

start_time = time.time()
A = np.random.rand(5, 5)
B = np.random.rand(5, 5)
C = matrix_mul(A, B)
end_time = time.time()
execution_time = end_time - start_time
print("Execution time: {:.6f} seconds".format(execution_time))
print('\nResult: \n', C)


Execution time: 1.079202 seconds

Result: 
 [[2.35990832 1.10290646 1.52361174 1.30498967 1.89225645]
 [1.77751358 0.81817857 1.03788781 1.05353666 1.04390853]
 [1.27849061 0.67481473 0.90808839 0.8843647  0.95246578]
 [2.21154523 0.97375101 1.09121882 1.33523135 1.30408477]
 [2.93460294 1.4905548  1.76861435 1.77684032 2.20153892]]


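In the code above, the Numba decorator @jit takes the keyword arguments nopython and parallel. With nopython=True, Numba compiles the whole function to machine code without involving the Python interpreter; nopython=False historically allowed Numba to fall back to the slower object mode when that was not possible (from Numba 0.59 onwards, nopython=True is the default). With parallel=True, Numba tries to parallelize the function automatically, including loops written with prange. Note that both timings above include the compilation time, because the function is compiled on its first call.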

### **Using Prange**

In [None]:
import cProfile
import numpy as np
from numba import prange, jit
import time

@jit(nopython=True, parallel=True)
def matrix_mul(A, B):
  rows_A, cols_A = A.shape
  rows_B, cols_B = B.shape

  C = np.zeros((rows_A, cols_B))

  for i in prange(rows_A):
    for j in prange(cols_B):
      for k in prange(cols_A):
        C[i][j] += A[i][k] * B[k][j]

  return C

start_time = time.time()
A = np.random.rand(5, 5)
B = np.random.rand(5, 5)
C = matrix_mul(A, B)
end_time = time.time()
execution_time = end_time - start_time
print("Execution time: {:.6f} seconds".format(execution_time))
print('\nResult: \n', C)
cProfile.run('matrix_mul(A, B)')

Execution time: 0.960954 seconds

Result: 
 [[1.32957628 1.43724936 1.20425947 0.68876728 1.26749742]
 [1.27157914 1.58222414 0.56477194 0.67318835 1.07186891]
 [1.21219102 1.55435008 0.57961429 0.58149866 0.89090795]
 [1.12536815 1.46023993 0.61263766 0.78507714 1.01741634]
 [1.66955412 1.91579661 0.74602997 0.92502417 1.52828929]]
         5 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <ipython-input-15-e5eae5ba16fd>:6(matrix_mul)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 serialize.py:30(_numba_unpickle)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




In [None]:
!pip install line_profiler



In [None]:
%%writefile harshith.py
import numpy as np
from numba import prange, jit
import time

@profile
def matrix_mul(A, B):
  rows_A, cols_A = A.shape
  rows_B, cols_B = B.shape

  C = np.zeros((rows_A, cols_B))

  for i in prange(rows_A):
    for j in prange(cols_B):
      for k in prange(cols_A):
        C[i][j] += A[i][k] * B[k][j]

  return C

start_time = time.time()
A = np.random.rand(5, 5)
B = np.random.rand(5, 5)
C = matrix_mul(A, B)
end_time = time.time()
execution_time = end_time - start_time
print("Execution time: {:.6f} seconds".format(execution_time))
print('\nResult: \n', C)

Overwriting harshith.py


In [None]:
!kernprof -l harshith.py

Execution time: 0.000786 seconds

Result: 
 [[1.71713507 0.60067351 1.02303811 1.11876789 1.68303027]
 [2.46806613 1.28387016 1.32622231 1.79139634 2.36357834]
 [0.7445736  0.3445523  0.50744407 0.36034433 0.69746194]
 [1.41549689 0.71025133 0.75670762 1.12908796 1.15806736]
 [1.60744284 1.10489936 0.62010194 1.62123415 1.37756766]]
Wrote profile results to harshith.py.lprof
Inspect results with:
python3 -m line_profiler -rmt "harshith.py.lprof"


In [None]:
!python3 -m line_profiler -rmt "harshith.py.lprof"

Timer unit: 1e-06 s

Total time: 0.000487168 s
File: harshith.py
Function: matrix_mul at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
     5                                           @profile
     6                                           def matrix_mul(A, B):
     7         1          4.7      4.7      1.0    rows_A, cols_A = A.shape
     8         1          0.8      0.8      0.2    rows_B, cols_B = B.

# **Numba Diagnostics**
A short introduction to the diagnostics generated by the NUMBA compiler.

A very useful document for understanding what parallelizing compilers do is the following survey:

[Compiler Transformations for High-Performance Computing](https://www.google.com/url?q=https%3A%2F%2Fengineering.purdue.edu%2F%7Eeigenman%2Fapp%2Fbacon-compiling4hpc.pdf), published in ACM Computing Surveys in December 1994. The survey was written by Bacon, Graham and Sharp. We strongly encourage you to read the document before starting to use this notebook. It will give you a framework to understand what the NUMBA parallelizing compiler does to Python code.

The next code cells are taken from: [Numba's documentation on diagnostics](https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics)

The parallel option for jit() can produce diagnostic information about the transforms undertaken in automatically parallelizing the decorated code. This information can be accessed in two ways: by setting the environment variable NUMBA_PARALLEL_DIAGNOSTICS, or by calling parallel_diagnostics() on the compiled function. Both methods give the same information and print to STDOUT. The level of verbosity is controlled by an integer between 1 and 4 inclusive, 1 being the least verbose and 4 the most. For example:

In [None]:
!pip install numba



In [None]:
print('Hello Numba')

Hello Numba


Notice that when we use NUMBA, we are using two special features that are not part of the regular Python language:

* The decorator njit, which instructs Numba to compile the function that follows in nopython mode. The option parallel=True additionally asks Numba to try to parallelize that function.
* The function prange, which has a special meaning in NUMBA: it indicates that the for loop where prange is used should be treated as a parallel loop, not as a regular sequential loop.

Depending on the code in the function, NUMBA might or might not be able to produce parallel code.

The diagnostics option in NUMBA provides feedback to the user about the success or failure of the attempt to parallelize.

In [None]:
from numba import njit,prange
import numpy as np
@njit(parallel=True)
def test(x):
    n = x.shape[0]
    a = np.sin(x)
    b = np.cos(a * a)
    acc = 0
    for i in prange(n - 2):
        for j in prange(n - 1):
            acc += b[i] + b[j + 1]
    return acc

In [None]:
t = np.arange(10)
print(t)
test(t)
test.parallel_diagnostics(level=4)

[0 1 2 3 4 5 6 7 8 9]
 
 Parallel Accelerator Optimizing:  Function test, <ipython-input-3-d35ef36e144b>
 (3)  


Parallel loop listing for  Function test, <ipython-input-3-d35ef36e144b> (3) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def test(x):                          | 
    n = x.shape[0]                    | 
    a = np.sin(x)---------------------| #0
    b = np.cos(a * a)-----------------| #1
    acc = 0                           | 
    for i in prange(n - 2):-----------| #3
        for j in prange(n - 1):-------| #2
            acc += b[i] + b[j + 1]    | 
    return acc                        | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #0 and #1:
    - fusion succeeded: parallel for-loop #1 is fused into for-loop #0.
  Trying to fuse loops #0 and #3:
    - fusion failed: loop dimension misma

To aid users unfamiliar with the transforms undertaken when the parallel option is used, and to assist in the understanding of the subsequent sections, the following definitions are provided:

* **Loop fusion**

  Loop fusion is a technique whereby loops with equivalent bounds may be combined under certain conditions to produce a loop with a larger body (aiming to improve data locality).

* **Loop serialization**

  Loop serialization occurs when any number of prange driven loops are present inside another prange driven loop. In this case the outermost of all the prange loops executes in parallel and any inner prange loops (nested or otherwise) are treated as standard range based loops. Essentially, nested parallelism does not occur.

* **Loop invariant code motion**

  Loop invariant code motion is an optimization technique that analyses a loop to look for statements that can be moved outside the loop body without changing the result of executing the loop, these statements are then “hoisted” out of the loop to save repeated computation.

* **Allocation hoisting**

  Allocation hoisting is a specialized case of loop invariant code motion that is possible due to the design of some common NumPy allocation methods. Explanation of this technique is best driven by an example:


In [None]:
@njit(parallel=True)
def test(n):
    results = np.zeros((n, 50, 50))
    for i in prange(n):
        temp = np.zeros((50, 50))
        for j in range(50):
            temp[j, j] = i
        results[i] = temp
    return results

n = 5
results = test(n)
print(results)
# test.parallel_diagnostics(level=4)

[[[0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[1. 0. 0. ... 0. 0. 0.]
  [0. 1. 0. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 1. 0. 0.]
  [0. 0. 0. ... 0. 1. 0.]
  [0. 0. 0. ... 0. 0. 1.]]

 [[2. 0. 0. ... 0. 0. 0.]
  [0. 2. 0. ... 0. 0. 0.]
  [0. 0. 2. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 2. 0. 0.]
  [0. 0. 0. ... 0. 2. 0.]
  [0. 0. 0. ... 0. 0. 2.]]

 [[3. 0. 0. ... 0. 0. 0.]
  [0. 3. 0. ... 0. 0. 0.]
  [0. 0. 3. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 3. 0. 0.]
  [0. 0. 0. ... 0. 3. 0.]
  [0. 0. 0. ... 0. 0. 3.]]

 [[4. 0. 0. ... 0. 0. 0.]
  [0. 4. 0. ... 0. 0. 0.]
  [0. 0. 4. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 4. 0. 0.]
  [0. 0. 0. ... 0. 4. 0.]
  [0. 0. 0. ... 0. 0. 4.]]]


In [None]:
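# Approximate view of how Numba rewrites the function internally.
@njit(parallel=True)
def test(n):
    results = np.zeros((n, 50, 50))
    for i in prange(n):
        temp = np.empty((50, 50))  # np.zeros() is rewritten as np.empty() ...
        temp[:] = 0                # ... followed by a zero initialisation
        for j in range(50):
            temp[j, j] = i
        results[i] = temp
    return results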

### **After Hoisting**

In [None]:
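# Illustrative only: Numba performs this hoisting internally.
@njit(parallel=True)
def test(n):
    results = np.zeros((n, 50, 50))
    temp = np.empty((50, 50))      # the allocation is hoisted out of the loop
    for i in prange(n):
        temp[:] = 0                # the zero initialisation stays in the loop
        for j in range(50):
            temp[j, j] = i
        results[i] = temp
    return results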

It can be seen that the np.zeros allocation is split into an allocation and an assignment, and then the allocation is hoisted out of the loop in i. This produces more efficient code, as the allocation only occurs once.
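The effect of the transformation can be checked with a sequential NumPy sketch. This is not Numba's internal implementation; it only shows that allocating once and re-zeroing the buffer in each iteration gives the same result as allocating a fresh array every time. The function names and the size parameter are hypothetical.

```python
import numpy as np


def fill_per_iteration(n, size=50):
    results = np.zeros((n, size, size))
    for i in range(n):
        temp = np.zeros((size, size))  # new allocation in every iteration
        for j in range(size):
            temp[j, j] = i
        results[i] = temp              # copies the values of temp into results
    return results


def fill_hoisted(n, size=50):
    results = np.zeros((n, size, size))
    temp = np.empty((size, size))      # allocated once, outside the loop
    for i in range(n):
        temp[:] = 0                    # the zero-fill has to stay inside the loop
        for j in range(size):
            temp[j, j] = i
        results[i] = temp
    return results


same = np.array_equal(fill_per_iteration(5), fill_hoisted(5))
print("Results identical:", same)
```

Reusing the buffer is safe here because results[i] = temp copies the values, so later iterations cannot overwrite earlier slices of results.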

# **The parallel diagnostics report sections**

1. **Code annotation**

  This is the first section and contains the source code of the decorated function with loops that have parallel semantics identified and enumerated. The loop #ID column on the right of the source code lines up with identified parallel loops. From the example, #0 is np.sin, #1 is np.cos and #2 and #3 are prange():

In [None]:
@njit(parallel=True)
def test(x):
    n = x.shape[0]
    a = np.sin(x)
    b = np.cos(a * a)
    acc = 0
    for i in prange(n - 2):
        for j in prange(n - 1):
            acc += b[i] + b[j + 1]
    return acc

It is worth noting that the loop IDs are enumerated in the order they are discovered which is not necessarily the same order as present in the source. Further, it should also be noted that the parallel transforms use a static counter for loop ID indexing. As a consequence it is possible for the loop ID index to not start at 0 due to use of the same counter for internal optimizations/transforms taking place that are invisible to the user.

2. **Fusing loops**

  This section describes the attempts made at fusing discovered loops noting which succeeded and which failed. In the case of failure to fuse a reason is given (e.g. dependency on other data). From the example:



In [None]:
@njit(parallel=True)
def fused_test(x):
    n = x.shape[0]
    a = np.sin(x)
    b = np.cos(a * a)
    acc = 0
    for i in prange(n - 2):
        for j in prange(n - 1):
            acc += b[i] + b[j + 1]
    return acc

x = np.arange(10)
fused_test(x)
fused_test.parallel_diagnostics(level=4)

 
 Parallel Accelerator Optimizing:  Function fused_test, <ipython-
input-20-70044bb7d342> (1)  


Parallel loop listing for  Function fused_test, <ipython-input-20-70044bb7d342> (1) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def fused_test(x):                    | 
    n = x.shape[0]                    | 
    a = np.sin(x)---------------------| #24
    b = np.cos(a * a)-----------------| #25
    acc = 0                           | 
    for i in prange(n - 2):-----------| #27
        for j in prange(n - 1): ------| #26
            acc += b[i] + b[j + 1]    | 
    return acc                        | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #24 and #25:
    - fusion succeeded: parallel for-loop #25 is fused into for-loop #24.
  Trying to fuse loops #24 and #27:
    - fusion failed: loop dimension mis

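The discussion below uses the loop numbering from the Numba documentation (#0 for np.sin, #1 for np.cos, #2 for the inner prange and #3 for the outer prange). The outputs in this notebook show larger IDs for the same loops (#24 to #27 above) because of the static counter mentioned earlier. It can be seen that fusion of loops #0 and #1 was attempted and this succeeded (both are based on the same dimensions of x). Following the successful fusion of #0 and #1, fusion was attempted between #0 (now including the fused #1 loop) and #3. This fusion failed because there is a loop dimension mismatch: #0 is size x.shape whereas #3 is size x.shape[0] - 2.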
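To make the idea of fusion concrete, here is a plain Python sketch of what combining the np.sin and np.cos loops amounts to. It is a hand-written illustration under the assumption that both loops run over the same index range; it is not the code that Numba generates, and the function names are hypothetical.

```python
import math

import numpy as np


def unfused(x):
    n = x.shape[0]
    a = np.empty(n)
    b = np.empty(n)
    for i in range(n):               # first loop: a = sin(x)
        a[i] = math.sin(x[i])
    for i in range(n):               # second loop: b = cos(a * a)
        b[i] = math.cos(a[i] * a[i])
    return b


def fused(x):
    n = x.shape[0]
    b = np.empty(n)
    for i in range(n):               # one loop with a larger body
        a_i = math.sin(x[i])         # the intermediate value stays local
        b[i] = math.cos(a_i * a_i)
    return b


x = np.arange(10, dtype=np.float64)
print(np.allclose(unfused(x), fused(x)))
```

The fused version touches each element once and never stores the full intermediate array a, which is the data locality benefit mentioned in the definition of loop fusion.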

3. **Before Optimization**

  This section shows the structure of the parallel regions in the code before any optimization has taken place, but with loops associated with their final parallel region (this is to make before/after optimization output directly comparable). Multiple parallel regions may exist if there are loops which cannot be fused, in this case code within each region will execute in parallel, but each parallel region will run sequentially. From the example:



In [None]:
@njit(parallel=True)
def test(x):
    n = x.shape[0]
    a = np.sin(x)
    b = np.cos(a * a)
    acc = 0
    for i in prange(n - 2):
        for j in prange(n - 1):
            acc += b[i] + b[j + 1]
    return acc

x = np.arange(10)
test(x)
test.parallel_diagnostics(level=4)

 
 Parallel Accelerator Optimizing:  Function test, <ipython-
input-22-81c91a65f043> (1)  


Parallel loop listing for  Function test, <ipython-input-22-81c91a65f043> (1) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def test(x):                          | 
    n = x.shape[0]                    | 
    a = np.sin(x)---------------------| #32
    b = np.cos(a * a)-----------------| #33
    acc = 0                           | 
    for i in prange(n - 2):-----------| #35
        for j in prange(n - 1): ------| #34
            acc += b[i] + b[j + 1]    | 
    return acc                        | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #32 and #33:
    - fusion succeeded: parallel for-loop #33 is fused into for-loop #32.
  Trying to fuse loops #32 and #35:
    - fusion failed: loop dimension mismatched in a

As alluded to by the Fusing loops section, there are necessarily two parallel regions in the code. The first contains loops #0 and #1, and the second contains #3 and #2. All loops are marked parallel, as no optimization has taken place yet.

4. **After Optimization**

  This section shows the structure of the parallel regions in the code after optimization has taken place. Again, parallel regions are enumerated with their corresponding loops but this time loops which are fused or serialized are noted and a summary is presented. From the example:

In [None]:
@njit(parallel=True)
def test(x):
    n = x.shape[0]
    a = np.sin(x)
    b = np.cos(a * a)
    acc = 0
    for i in prange(n - 2):
        for j in prange(n - 1):
            acc += b[i] + b[j + 1]
    return acc

x = np.arange(10)
test(x)
test.parallel_diagnostics(level=4)

 
 Parallel Accelerator Optimizing:  Function test, <ipython-
input-23-81c91a65f043> (1)  


Parallel loop listing for  Function test, <ipython-input-23-81c91a65f043> (1) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def test(x):                          | 
    n = x.shape[0]                    | 
    a = np.sin(x)---------------------| #36
    b = np.cos(a * a)-----------------| #37
    acc = 0                           | 
    for i in prange(n - 2):-----------| #39
        for j in prange(n - 1): ------| #38
            acc += b[i] + b[j + 1]    | 
    return acc                        | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #36 and #37:
    - fusion succeeded: parallel for-loop #37 is fused into for-loop #36.
  Trying to fuse loops #36 and #39:
    - fusion failed: loop dimension mismatched in a

It can be noted that parallel region 0 contains loop #0 and, as seen in the fusing loops section, loop #1 is fused into loop #0. It can also be noted that parallel region 1 contains loop #3 and that loop #2 (the inner prange()) has been serialized for execution in the body of loop #3.

5. **Loop invariant code motion**

  This section shows for each loop, after optimization has occurred:

  * The instructions that failed to be hoisted and the reason for failure (dependency/impure).

  * The instructions that were hoisted.

  * Any allocation hoisting that may have occurred.

  From the example:

In [None]:
@njit(parallel=True)
def test(x):
    n = x.shape[0]
    a = np.sin(x)
    b = np.cos(a * a)
    acc = 0
    for i in prange(n - 2):
        for j in prange(n - 1):
            acc += b[i] + b[j + 1]
    return acc

x = np.arange(10)
test(x)
test.parallel_diagnostics(level=4)

 
 Parallel Accelerator Optimizing:  Function test, <ipython-
input-24-81c91a65f043> (1)  


Parallel loop listing for  Function test, <ipython-input-24-81c91a65f043> (1) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def test(x):                          | 
    n = x.shape[0]                    | 
    a = np.sin(x)---------------------| #40
    b = np.cos(a * a)-----------------| #41
    acc = 0                           | 
    for i in prange(n - 2):-----------| #43
        for j in prange(n - 1): ------| #42
            acc += b[i] + b[j + 1]    | 
    return acc                        | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #40 and #41:
    - fusion succeeded: parallel for-loop #41 is fused into for-loop #40.
  Trying to fuse loops #40 and #43:
    - fusion failed: loop dimension mismatched in a

The first thing to note is that this information is for advanced users as it refers to the Numba IR of the function being transformed. As an example, the expression a * a in the example source partly translates to the expression $arg_out_var.17 = $expr_out_var.9 * $expr_out_var.9 in the IR, this clearly cannot be hoisted out of loop #0 because it is not loop invariant! Whereas in loop #3, the expression $const58.3 = const(int, 1) comes from the source b[j + 1], the number 1 is clearly a constant and so can be hoisted out of the loop.
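At the source level, loop invariant code motion can be pictured with a small plain Python sketch. This is a hand-written analogy, not Numba IR, and the function and variable names are hypothetical.

```python
import math


def before_hoisting(values, scale):
    out = []
    for v in values:
        factor = math.sqrt(scale) + 1.0  # same value in every iteration
        out.append(v * factor)           # depends on v, so it cannot be hoisted
    return out


def after_hoisting(values, scale):
    factor = math.sqrt(scale) + 1.0      # computed once, before the loop
    out = []
    for v in values:
        out.append(v * factor)
    return out


values = [1.0, 2.0, 3.0]
print(before_hoisting(values, 4.0) == after_hoisting(values, 4.0))
```

The expression for factor does not depend on the loop variable, so it is loop invariant. The multiplication v * factor depends on v, which is the same reason a * a cannot be hoisted in the example above.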

# **Sample Code Snippets**

In [None]:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def square_elements(arr):
    for i in prange(arr.shape[0]):
        arr[i] = arr[i] ** 2

arr = np.array([1, 2, 3, 4, 5])
square_elements(arr)
print(arr)
square_elements.parallel_diagnostics(level=4)

[ 1  4  9 16 25]
 
 Parallel Accelerator Optimizing:  Function square_elements, <ipython-
input-4-946b419b96e4> (4)  


Parallel loop listing for  Function square_elements, <ipython-input-4-946b419b96e4> (4) 
--------------------------------------|loop #ID
@njit(parallel=True)                  | 
def square_elements(arr):             | 
    for i in prange(arr.shape[0]):----| #3
        arr[i] = arr[i] ** 2          | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
-----------------------------------------------------

In [None]:
from numba import njit, prange
import numpy as np

@njit(parallel=True)
def parallel_multiply(a, b):
    result = np.zeros_like(a)
    for i in prange(a.shape[0]):
        result[i] = a[i] * b[i]
    return result

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
result = parallel_multiply(a, b)
print("Result:", result)
parallel_multiply.parallel_diagnostics(level=4)

Result: [5 8 9 8 5]
 
 Parallel Accelerator Optimizing:  Function parallel_multiply, <ipython-
input-5-2320cfa799ca> (4)  


Parallel loop listing for  Function parallel_multiply, <ipython-input-5-2320cfa799ca> (4) 
------------------------------------|loop #ID
@njit(parallel=True)                | 
def parallel_multiply(a, b):        | 
    result = np.zeros_like(a)       | 
    for i in prange(a.shape[0]):----| #4
        result[i] = a[i] * b[i]     | 
    return result                   | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
-----------------------------------------------------------

In [None]:
from numba import njit, prange
import numpy as np

@njit(parallel=True)
def parallel_sort(arr):
    for i in prange(len(arr) - 1):
        for j in range(i + 1, len(arr)):
            if arr[i] > arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
    return arr

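# Caution: the prange iterations in parallel_sort are not independent (different
# values of i can swap the same elements at the same time), and Numba may not
# detect this race, so a correctly sorted result is not guaranteed.
arr = np.array([5, 2, 9, 1, 5, 6, 3])
result = parallel_sort(arr.copy())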
print("Sorted Array:", result)
parallel_sort.parallel_diagnostics(level=4)

Sorted Array: [1 2 3 5 5 6 9]
 
 Parallel Accelerator Optimizing:  Function parallel_sort, <ipython-
input-6-e346758eb564> (4)  


Parallel loop listing for  Function parallel_sort, <ipython-input-6-e346758eb564> (4) 
---------------------------------------------------|loop #ID
@njit(parallel=True)                               | 
def parallel_sort(arr):                            | 
    for i in prange(len(arr) - 1):-----------------| #5
        for j in range(i + 1, len(arr)):           | 
            if arr[i] > arr[j]:                    | 
                arr[i], arr[j] = arr[j], arr[i]    | 
    return arr                                     | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------