**To-do list before starting**
1. pip install numba


In [1]:
# importing the necessary libraries
import numpy as np
import time
from numba import njit

2.1.1 Implement the loop-unrolling example in Python. Ignore the use of pointers and use a `while` loop instead of a `for` loop. The reason is that Python `for` loops may be optimized automatically in the background, which could mask the effect of manual unrolling.

In [10]:
# Data
N = 100_000_000
X = np.random.rand(N)
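A quick sanity check on the data, as a sketch with a hypothetical smaller size `N_SMALL` and a seeded `default_rng` generator (both assumptions, not in the original). `np.random.rand` returns `float64` values of 8 bytes each, so a 10^8-element array takes about 800 MB. The values are uniform in [0, 1), so their mean is close to 0.5 and the full sum should be close to N / 2, which matches the sums printed further down.

```python
import numpy as np

N_SMALL = 1_000_000                     # hypothetical, much smaller than the notebook's N
rng = np.random.default_rng(seed=0)     # seeded generator so runs are reproducible
X_small = rng.random(N_SMALL)           # uniform floats in [0, 1), like np.random.rand

print(X_small.dtype, X_small.itemsize)  # float64 8
print(f"10**8 elements need about {10**8 * X_small.itemsize / 1e6:.0f} MB")
print(X_small.mean())                   # close to 0.5, so the full sum is close to N / 2
```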

In [11]:
# Normal Loop, using while
@njit
def sum_normal(x):
    i = 0
    s = 0.0
    n = len(x)
    while i < n:
        s += x[i]
        i += 1
    return s
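For comparison, the `for`-loop form that the task asks us to avoid could look like the sketch below (`sum_for` is a hypothetical name). The decorator is left off so the sketch runs without Numba; adding `@njit` and timing it alongside `sum_normal` would show whether the loop form actually makes a difference under Numba on a given machine.

```python
import numpy as np

def sum_for(x):
    # Same reduction as sum_normal, but the loop index comes from range()
    s = 0.0
    for i in range(len(x)):
        s += x[i]
    return s

x = np.random.default_rng(seed=1).random(1_000)
print(abs(sum_for(x) - np.sum(x)) < 1e-9)  # True: same total up to rounding
```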



In [12]:
# Unroll the loop by 2 steps
@njit
def sum_unroll_2(x):
    i = 0
    s = 0.0
    n = len(x)
    while i < n - 1:
        s += x[i] 
        s += x[i + 1]
        i += 2
    while i < n:   # Handle the remaining element if n is odd
        s += x[i]
        i += 1
    return s
    

In [13]:
# Unroll the loop by 4 steps
@njit
def sum_unroll_4(x):
    i = 0
    s = 0.0
    n = len(x)
    while i < n - 3:
        s += x[i] 
        s += x[i + 1]
        s += x[i + 2]
        s += x[i + 3]
        i += 4
    while i < n:   # Handle the remaining elements if n is not a multiple of 4
        s += x[i]
        i += 1
    return s
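The remainder loops are the easiest part to get wrong, so a quick check is to run the unrolled sum on every length from 0 to 9, which covers all remainders modulo 2 and 4. The body below is copied from `sum_unroll_4` without `@njit` so the sketch runs on its own, and `sum_sequential` is a hypothetical reference loop. Because both add the elements in the same order into one accumulator, the results can be compared for exact equality.

```python
import numpy as np

def sum_unroll_4(x):
    # Body copied from the notebook cell, without @njit
    i = 0
    s = 0.0
    n = len(x)
    while i < n - 3:
        s += x[i]
        s += x[i + 1]
        s += x[i + 2]
        s += x[i + 3]
        i += 4
    while i < n:  # remainder: 0 to 3 elements
        s += x[i]
        i += 1
    return s

def sum_sequential(x):
    # Reference: one element per iteration, same addition order
    s = 0.0
    for v in x:
        s += v
    return s

rng = np.random.default_rng(seed=2)
for n in range(10):  # covers every remainder n % 4 (and n % 2)
    x = rng.random(n)
    assert sum_unroll_4(x) == sum_sequential(x), n
print("all lengths 0-9 match")
```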

In [21]:
# Let's measure the execution time of each function
def measure_time(func, x):
    func(x[:1])  # warm-up call so Numba's JIT compilation is not included in the timing
    start_time = time.perf_counter()  # high-resolution clock for short durations
    result = func(x)
    end_time = time.perf_counter()
    print(f"{func.__name__} Sum: {result}, Time taken: {end_time - start_time:.6f} seconds")

measure_time(sum_normal, X)
measure_time(sum_unroll_2, X)
measure_time(sum_unroll_4, X)

sum_normal Sum: 49998577.607762985, Time taken: 0.120654 seconds
sum_unroll_2 Sum: 49998577.607762985, Time taken: 0.115361 seconds
sum_unroll_4 Sum: 49998577.607762985, Time taken: 0.111831 seconds
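A single timed run is sensitive to noise, and the three timings above differ by less than 10%. The first call to an `@njit` function also includes compilation. A steadier measurement, sketched with a hypothetical helper `best_time`, warms up once and then keeps the minimum of several `time.perf_counter()` timings. The `repeat=5` default is an arbitrary choice.

```python
import time
import numpy as np

def best_time(func, x, repeat=5):
    """Return (result, best wall-clock seconds) over `repeat` timed runs."""
    result = func(x)                      # warm-up: triggers JIT compilation for @njit functions
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()       # high-resolution clock for short durations
        result = func(x)
        best = min(best, time.perf_counter() - start)
    return result, best

# Usage with a plain NumPy function; in the notebook, pass sum_normal, sum_unroll_2, ...
x = np.random.default_rng(seed=3).random(1_000_000)
result, seconds = best_time(np.sum, x)
print(f"np.sum: {result:.3f} in {seconds:.6f} s")
```

Taking the minimum rather than the mean discards runs slowed down by other activity on the machine, which matters when the differences being measured are only a few percent.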


Importance of Numba:
-> Numba compiles Python functions to machine code. Without Numba, the overhead of the Python interpreter dominates the runtime and makes the effect of this optimization invisible.
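To see the interpreter overhead directly, the sketch below runs the `while` loop and the 2-way unrolled loop as plain Python (no `@njit`; the `_py` names are hypothetical) and times them next to `np.sum` on a smaller array. Every iteration pays for bytecode dispatch and for boxing `x[i]` into a Python object, which typically swamps the few loop-control instructions that unrolling removes. No timings are given here because they depend on the machine.

```python
import time
import numpy as np

def sum_while_py(x):
    # sum_normal's loop, executed by the CPython interpreter
    i = 0
    s = 0.0
    n = len(x)
    while i < n:
        s += x[i]
        i += 1
    return s

def sum_unroll_2_py(x):
    # sum_unroll_2's loop, also interpreted
    i = 0
    s = 0.0
    n = len(x)
    while i < n - 1:
        s += x[i]
        s += x[i + 1]
        i += 2
    while i < n:  # leftover element when n is odd
        s += x[i]
        i += 1
    return s

x = np.random.default_rng(seed=5).random(1_000_000)
for func in (sum_while_py, sum_unroll_2_py, np.sum):
    start = time.perf_counter()
    result = func(x)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: {result:.3f} in {elapsed:.4f} s")
```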
```
Result we got:
    normal      → slowest
    unroll2     → faster
    unroll4     → fastest
```
Because:
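- fewer loop-condition checks, so fewer branches to predict
- better use of the instruction pipeline, since each iteration issues several additions back to back
- the compiler may find the loop body easier to schedule and optimize (vectorizing a floating-point sum additionally requires that reordering the additions be allowed)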
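The first point can be made concrete by counting how often each `while` condition is evaluated. In this sketch (the helper `condition_checks` is hypothetical), the unrolled loop with factor `k` tests its condition ⌊n/k⌋ + 1 times and the remainder loop tests its condition (n mod k) + 1 times, while `sum_normal` has a single loop with n + 1 checks.

```python
def condition_checks(n, k):
    """Number of `while` condition evaluations when summing n elements with unroll factor k."""
    if k == 1:
        return n + 1               # sum_normal: one check per element plus the final exit check
    main = n // k + 1              # unrolled loop: one check per k elements plus the exit check
    remainder = n % k + 1          # cleanup loop: one check per leftover element plus the exit check
    return main + remainder

n = 100_000_000
for k in (1, 2, 4):
    print(k, condition_checks(n, k))
# 1 100000001
# 2 50000002
# 4 25000002
```

These branches are very predictable (the loop continues on every check except the last), so each saved check costs little, which fits the small timing differences measured above.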


*Note:*
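Manual loop unrolling reduces loop-control overhead: fewer index comparisons, increments, and branches per element. When compiled with Numba, the gain here is measurable but modest (about 7% for `sum_unroll_4` versus `sum_normal` in the run above), because the CPU executes several additions per iteration and fewer branch instructions. All three sums are identical because each version adds the elements in the same order into the single accumulator `s`; that same dependency chain limits instruction-level parallelism, since each addition must wait for the previous one.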
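A common variant, not part of the original exercise, splits the sum across several independent accumulators so that consecutive additions no longer depend on each other. The sketch below (the name `sum_unroll_4_acc` is hypothetical) uses four. It changes the order of the floating-point additions, so its result can differ from `sum_normal` in the last few digits. The `try`/`except` fallback is only there so the sketch also runs, slowly, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if Numba is unavailable
    def njit(func):
        return func

@njit
def sum_unroll_4_acc(x):
    n = len(x)
    s0 = 0.0
    s1 = 0.0
    s2 = 0.0
    s3 = 0.0
    i = 0
    while i < n - 3:
        # four independent dependency chains, so the additions can overlap in the pipeline
        s0 += x[i]
        s1 += x[i + 1]
        s2 += x[i + 2]
        s3 += x[i + 3]
        i += 4
    while i < n:  # leftover 0 to 3 elements
        s0 += x[i]
        i += 1
    return (s0 + s1) + (s2 + s3)

x = np.random.default_rng(seed=4).random(100_003)
print(abs(sum_unroll_4_acc(x) - np.sum(x)))  # small rounding difference only
```

Another option, if reordering is acceptable, is `@njit(fastmath=True)` on the single-accumulator functions, which lets LLVM reassociate the additions and vectorize the reduction itself; timing it against `sum_unroll_4_acc` with `measure_time` would show which approach works better on a given machine.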