# Python Example for Loop Unrolling

In this notebook, we perform the same task in Python that we performed in C using Cache-Aware Optimization.

> <b>Note:</b> This example is not meant to discourage Python programming! Python is robust and quite useful, which is why many CS majors prefer it. We will cover Virtual Memory-Aware Programming later this semester, so if you prefer Python, you can still apply your Computer Architecture knowledge to improve your Python code. This example simply shows that loop unrolling, combined with inlining the unrolled calls, can substantially reduce execution time in Python. In Python, every loop iteration and every function call carries interpreter overhead.

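```python
def func( count, value ):
    # The per-element work is a single addition, so most of the cost of
    # each call is Python's function-call overhead
    return count + value
```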
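`func` performs only one addition, so most of the time spent in it is likely call overhead rather than arithmetic. The sketch below times the same additions with and without the call. It assumes CPython and the standard-library `timeit` module. The helpers `call_version` and `inline_version` are hypothetical and are not part of this notebook.

```python
import timeit


def func(count, value):
    return count + value


def call_version(n):
    # n additions, each performed through a call to func
    total = 0
    for i in range(n):
        total = func(i, total)
    return total


def inline_version(n):
    # The same n additions with func's body written inline
    total = 0
    for i in range(n):
        total = i + total
    return total


n = 20_000
# Both versions must compute the same value before timing them means anything
assert call_version(n) == inline_version(n)

# Best of 3 repeats, each running the function 10 times
t_call = min(timeit.repeat(lambda: call_version(n), number=10, repeat=3))
t_inline = min(timeit.repeat(lambda: inline_version(n), number=10, repeat=3))
print(f"with call: {t_call:.4f} s   inlined: {t_inline:.4f} s")
```

On a typical CPython build the inlined loop should finish noticeably faster. The exact ratio depends on the interpreter version and the machine.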

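```python
def no_opt( array_size, the_array ):
    # Baseline: sweep the whole array five times (count is the outer loop),
    # reading and writing the_array[idx] through the list on every step
    sum_val = 0

    for count in range(0, 5):

        for idx in range(0, array_size):

            the_array[idx] = func( count, the_array[idx] )
            sum_val += the_array[idx]
```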

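```python
def reg_opt( array_size, the_array ):
    # "Register" optimization: swap the loops so each element is read once
    # into a local variable (arr_idx), updated five times, and written back
    # once, instead of being indexed in the list on every update
    sum_val = 0

    for idx in range(0, array_size):

        arr_idx = the_array[idx]

        for count in range(0, 5):

            arr_idx = func( count, arr_idx )
            sum_val += arr_idx

        the_array[idx] = arr_idx
```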
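Swapping the loops and caching the element in a local variable should not change the results. Each element still receives the same five additions in the same order. The check below assumes both functions are changed to `return sum_val`; the notebook's versions discard the sum. The variants `no_opt_sum` and `reg_opt_sum` are hypothetical and exist only for this check. The check confirms that both produce the same array and the same running sum.

```python
def func(count, value):
    return count + value


def no_opt_sum(array_size, the_array):
    # Same loops as no_opt, but the running sum is returned
    sum_val = 0
    for count in range(0, 5):
        for idx in range(0, array_size):
            the_array[idx] = func(count, the_array[idx])
            sum_val += the_array[idx]
    return sum_val


def reg_opt_sum(array_size, the_array):
    # Same loops as reg_opt, but the running sum is returned
    sum_val = 0
    for idx in range(0, array_size):
        arr_idx = the_array[idx]
        for count in range(0, 5):
            arr_idx = func(count, arr_idx)
            sum_val += arr_idx
        the_array[idx] = arr_idx
    return sum_val


a = [1, 2, 3]
b = [1, 2, 3]
s_a = no_opt_sum(len(a), a)
s_b = reg_opt_sum(len(b), b)
print(a, s_a)  # [11, 12, 13] 90
print(b, s_b)  # [11, 12, 13] 90
```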

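```python
def unroll_opt( array_size, the_array ):
    # Loop unrolling: the five-iteration count loop from reg_opt is replaced
    # by five straight-line calls, removing the inner range() loop overhead.
    # Unlike no_opt and reg_opt, this version never adds to sum_val, so it
    # also skips the running-sum work.
    sum_val = 0

    for idx in range(0, array_size):

        arr_idx = the_array[idx]

        arr_idx = func( 0, arr_idx )
        arr_idx = func( 1, arr_idx )
        arr_idx = func( 2, arr_idx )
        arr_idx = func( 3, arr_idx )
        arr_idx = func( 4, arr_idx )

        the_array[idx] = arr_idx
```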
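`unroll_opt` drops the `sum_val += arr_idx` step that `reg_opt` performs, so part of its measured speedup comes from doing less work. Below is a sketch of an unrolled variant that keeps the running sum and therefore does the same work as `reg_opt`. The name `unroll_opt_sum` is hypothetical, and this variant was not part of the timed runs in this notebook.

```python
def func(count, value):
    return count + value


def unroll_opt_sum(array_size, the_array):
    # Inner five-iteration loop unrolled by hand; unlike unroll_opt,
    # every intermediate value is still added to the running sum
    sum_val = 0
    for idx in range(0, array_size):
        arr_idx = the_array[idx]

        arr_idx = func(0, arr_idx)
        sum_val += arr_idx
        arr_idx = func(1, arr_idx)
        sum_val += arr_idx
        arr_idx = func(2, arr_idx)
        sum_val += arr_idx
        arr_idx = func(3, arr_idx)
        sum_val += arr_idx
        arr_idx = func(4, arr_idx)
        sum_val += arr_idx

        the_array[idx] = arr_idx
    return sum_val


data = [1, 2, 3]
print(unroll_opt_sum(len(data), data), data)  # 90 [11, 12, 13]
```

Timing this variant alongside `reg_opt` would isolate the effect of unrolling alone.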

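```python
def inline_opt( array_size, the_array ):
    # Inlining on top of unrolling: the body of func (count + value) is
    # written out directly, removing five function calls per element.
    # Like unroll_opt, this version does not accumulate sum_val.
    sum_val = 0

    for idx in range(0, array_size):

        arr_idx = the_array[idx]

        arr_idx = 0 + arr_idx
        arr_idx = 1 + arr_idx
        arr_idx = 2 + arr_idx
        arr_idx = 3 + arr_idx
        arr_idx = 4 + arr_idx

        the_array[idx] = arr_idx
```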
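The standard-library `dis` module lists a function's bytecode, which can confirm that inlining actually removes the calls. The sketch below counts call instructions in each version. It assumes the call opcode is named `CALL_FUNCTION` (CPython before 3.11) or `CALL` (CPython 3.11 and later). The helper `count_calls` is hypothetical.

```python
import dis


def func(count, value):
    return count + value


def unroll_opt(array_size, the_array):
    # Same loop body as unroll_opt above (unused sum_val omitted)
    for idx in range(0, array_size):
        arr_idx = the_array[idx]
        arr_idx = func(0, arr_idx)
        arr_idx = func(1, arr_idx)
        arr_idx = func(2, arr_idx)
        arr_idx = func(3, arr_idx)
        arr_idx = func(4, arr_idx)
        the_array[idx] = arr_idx


def inline_opt(array_size, the_array):
    # Same loop body as inline_opt above (unused sum_val omitted)
    for idx in range(0, array_size):
        arr_idx = the_array[idx]
        arr_idx = 0 + arr_idx
        arr_idx = 1 + arr_idx
        arr_idx = 2 + arr_idx
        arr_idx = 3 + arr_idx
        arr_idx = 4 + arr_idx
        the_array[idx] = arr_idx


def count_calls(fn):
    # Count call instructions in fn's compiled bytecode
    return sum(1 for ins in dis.get_instructions(fn)
               if ins.opname in ("CALL", "CALL_FUNCTION"))


print("unroll_opt call sites:", count_calls(unroll_opt))
print("inline_opt call sites:", count_calls(inline_opt))
```

Both versions share the `range()` call. `unroll_opt` should show five extra call sites, one for each `func` call, and each of them executes once per array element.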

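```python
def test_opt( array_test_size ):
    # %timeit is an IPython magic, so this only runs in Jupyter/IPython.
    # -r1 times a single run, which is why every result below reports a
    # standard deviation of 0 ns. All four versions update the same list.
    the_array = [0] * array_test_size

    print("No Opt")
    %timeit -r1 no_opt( array_test_size, the_array )

    print("Reg Opt")
    %timeit -r1 reg_opt( array_test_size, the_array )

    print("Unroll Opt")
    %timeit -r1 unroll_opt( array_test_size, the_array )

    print("Inline Opt")
    %timeit -r1 inline_opt( array_test_size, the_array )
```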
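`%timeit` only works inside Jupyter or IPython. The sketch below is a similar harness for a plain Python script, assuming the standard-library `timeit` module. It gives each version a fresh list and takes the best of several repeats instead of a single run (`-r1`), which should make the comparison less sensitive to noise. The names `time_all` and `VERSIONS` are hypothetical, and only two of the four versions are included for brevity.

```python
import timeit


def func(count, value):
    return count + value


def reg_opt(array_size, the_array):
    # Same logic as reg_opt in the notebook
    sum_val = 0
    for idx in range(0, array_size):
        arr_idx = the_array[idx]
        for count in range(0, 5):
            arr_idx = func(count, arr_idx)
            sum_val += arr_idx
        the_array[idx] = arr_idx


def inline_opt(array_size, the_array):
    # Same logic as inline_opt in the notebook (unused sum_val omitted)
    for idx in range(0, array_size):
        arr_idx = the_array[idx]
        arr_idx = 0 + arr_idx
        arr_idx = 1 + arr_idx
        arr_idx = 2 + arr_idx
        arr_idx = 3 + arr_idx
        arr_idx = 4 + arr_idx
        the_array[idx] = arr_idx


def time_all(versions, array_size, number=100, repeat=5):
    """Return {name: best seconds per call} for each version."""
    results = {}
    for name, fn in versions.items():
        the_array = [0] * array_size  # fresh list for every version
        times = timeit.repeat(lambda: fn(array_size, the_array),
                              number=number, repeat=repeat)
        # The minimum of several repeats is least affected by background noise
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    VERSIONS = {"Reg Opt": reg_opt, "Inline Opt": inline_opt}
    for name, secs in time_all(VERSIONS, 1024).items():
        print(f"{name}: {secs * 1e6:.0f} microseconds per call")
```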

```python
test_size = 1024
test_opt( test_size )
```

```
No Opt
478 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
Reg Opt
471 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
Unroll Opt
268 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
Inline Opt
124 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 10000 loops each)
```


```python
test_size = 2048
test_opt( test_size )
```

```
No Opt
976 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
Reg Opt
963 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
Unroll Opt
543 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
Inline Opt
246 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
```


```python
test_size = 16384
test_opt( test_size )
```

```
No Opt
7.94 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
Reg Opt
7.58 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
Unroll Opt
4.23 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)
Inline Opt
1.97 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1000 loops each)
```
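Dividing each `No Opt` time by the other versions' times gives their speedups. The short calculation below uses only the timings reported above, with the millisecond values converted to microseconds. The helper `speedups` is hypothetical.

```python
# Timings reported above, in microseconds per call (7.94 ms = 7940 us, etc.)
timings = {
    1024: {"No Opt": 478, "Reg Opt": 471, "Unroll Opt": 268, "Inline Opt": 124},
    2048: {"No Opt": 976, "Reg Opt": 963, "Unroll Opt": 543, "Inline Opt": 246},
    16384: {"No Opt": 7940, "Reg Opt": 7580, "Unroll Opt": 4230, "Inline Opt": 1970},
}


def speedups(t):
    # Speedup of each version relative to No Opt (higher is faster)
    base = t["No Opt"]
    return {name: round(base / us, 2) for name, us in t.items()}


for size, t in timings.items():
    print(size, speedups(t))
```

Across the three sizes, Reg Opt is at most about 5% faster than No Opt. Unroll Opt is roughly 1.8 to 1.9× faster, and Inline Opt roughly 3.9 to 4.0× faster. The last two also skip the `sum_val` accumulation, so part of their gain comes from doing less work.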
