# Some code optimization exercises - Removing for loops!

The following cells contain examples of Python code written with explicit for loops. Time these functions (using `timeit`), rewrite each for loop using NumPy calls, and then time the functions again.
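As a minimal sketch of the timing step, the standard-library `timeit` module can be called directly from Python (in a notebook, the `%timeit` magic is an alternative). The function `example_loop` below is a hypothetical stand-in, not one of the exercise functions.

```python
import math
import timeit

def example_loop(n=1000):
    # Hypothetical stand-in for one of the exercise functions
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total

# Repeat the measurement 5 times, calling the function 100 times in each repeat
n_calls = 100
timings = timeit.repeat(example_loop, number=n_calls, repeat=5)

# The minimum is usually the least noisy estimate of the per-call cost
best_per_call = min(timings) / n_calls
print(f"best time per call: {best_per_call:.3e} s")
```

Timing the original and the rewritten function in the same way, with the same `number` and `repeat`, keeps the comparison fair.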

## EXERCISE 1.1

Investigate the time taken to compute the two functions below. Which is faster? Why do you think this is?
Hint: NumPy is really inefficient when it is called on one scalar value at a time.

Then rewrite this code to be faster. What speed-up do you gain?
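The scalar overhead can be seen in isolation with a sketch like the one below. It uses square roots rather than the sine from the exercise, and the variable names are illustrative assumptions. Timings depend on the machine, so none are quoted here.

```python
import math
import timeit

import numpy

value = 2.0
values = numpy.arange(1, 1000, dtype=float)

# One call on one scalar: math.sqrt versus numpy.sqrt
t_math = timeit.timeit(lambda: math.sqrt(value), number=10000)
t_numpy = timeit.timeit(lambda: numpy.sqrt(value), number=10000)
print(f"math.sqrt on a scalar:  {t_math:.4f} s for 10000 calls")
print(f"numpy.sqrt on a scalar: {t_numpy:.4f} s for 10000 calls")

# One call on a whole array: the loop over elements runs inside NumPy
roots_vectorized = numpy.sqrt(values)

# The same result computed element by element, for comparison
roots_loop = numpy.array([math.sqrt(v) for v in values])
```

The two result arrays should agree; only the time taken to produce them differs.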



In [None]:
import numpy, math

def compute_sin_tseries():
    tseries = numpy.arange(1, 10000, 1./100.)
    sin_tseries = numpy.zeros(len(tseries))
    for i in range(len(tseries)):
        sin_tseries[i] = numpy.sin(tseries[i])
    return sin_tseries

def compute_sin_tseries2():
    tseries = numpy.arange(1, 10000, 1./100.)
    sin_tseries = numpy.zeros(len(tseries))
    for i in range(len(tseries)):
        sin_tseries[i] = math.sin(tseries[i])
    return sin_tseries

## EXERCISE 1.2

As with exercise 1.1, please optimize this code and quantify how much faster you can make it. Remember to check that the output doesn't change after optimizing!
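One way to confirm that an optimized function still gives the same output is to compare it with the original using `numpy.allclose`, which tolerates small floating-point differences. The two functions below are hypothetical examples (squaring an array), not the exercise code.

```python
import numpy

def square_loop(values):
    # Hypothetical reference version with an explicit loop
    result = numpy.zeros(len(values))
    for i in range(len(values)):
        result[i] = values[i] * values[i]
    return result

def square_vectorized(values):
    # Hypothetical optimized version: one array operation
    return values * values

values = numpy.arange(1, 100, 0.5)
outputs_match = numpy.allclose(square_loop(values), square_vectorized(values))
print(outputs_match)
```

Running a check like this before timing avoids measuring a fast function that gives the wrong answer.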

In [None]:
import math, numpy

def compute_exp_tseries():
    tseries = numpy.arange(1, 10000, 1./100.)
    exp_tseries = numpy.zeros(len(tseries))
    for i in range(len(tseries)):
        exp_tseries[i] = math.e ** tseries[i]
    return exp_tseries

## EXERCISE 1.3

As before: optimize and show how much faster you can make this!

Note that there are two for loops here. It is possible to collapse both into a single vectorized call, but simply removing one of the for loops will make the code both readable and reasonably optimized. Feel free to try the single-call version for yourself, but it is *not at all* trivial.
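As a sketch of what removing only the inner loop can look like, the example below takes the maximum of each row rather than the sum, so it is not a solution to the exercise. The function name `max_of_each_row` is an assumption made for illustration.

```python
import numpy as np

def max_of_each_row(input_data):
    # Hypothetical example: keep the loop over rows,
    # but let NumPy handle the loop over the elements within each row
    output = np.zeros(len(input_data), dtype=float)
    for i in range(len(input_data)):
        output[i] = np.max(input_data[i])
    return output

example = np.array([[0, 1, 2, 3],
                    [2, 2, 2, 2],
                    [10, 11, 12, 13]])
row_maxima = max_of_each_row(example)
print(row_maxima)
```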

In [None]:
import numpy as np

def sum_2d_array_to_1d(input_data):
    """
    This function takes as input a 2-dimensional array, for example:
    [[0,1,2,3],
     [2,2,2,2],
     [1,1,1,1],
     [10,11,12,13]]
    
    It should sum over each of the *rows* in turn and return the sum of each as a new array
    
    [6, 8, 4, 46]
    """
    output = np.zeros(len(input_data), dtype=float)
    for i in range(len(input_data)):
        # Looping over each row
        current_sum = 0
        for j in range(len(input_data[i])):
            # Looping over each element in each row, e.g. 10->11->12->13 if the bottom row
            current_sum += input_data[i][j]
        output[i] = current_sum
    return output

# Short example to test that it works
input_data = np.array([[0,1,2,3],
                       [2,2,2,2],
                       [1,1,1,1],
                       [10,11,12,13]])

print(sum_2d_array_to_1d(input_data))

# *This* is the large example to use timeit on and to make fast to run
input_data = np.random.random(size=[2000,2000])
print(sum_2d_array_to_1d(input_data))

## EXERCISE 1.4

This example performs a cross-correlation. Hmm, now where have we seen that before? Does this feel like something that would be **very, very important** for the coursework? Of course, the unoptimized code below would work; I just think it would take about 2 hours *per run* on the coursework data.
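For reference, the quantity computed by the loops below at offset `i` is the sum over `j` of `signal[j] * data[i + j]`. The sketch here uses made-up five-element data to show one way of checking a result: `numpy.correlate` in `"valid"` mode evaluates the same sum at every offset where the signal fits entirely inside the data. It is offered as a cross-check under that assumption, not as the intended coursework solution.

```python
import numpy

# Made-up toy inputs, small enough to verify by hand
signal = numpy.array([1.0, 0.0, -1.0])
data = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Explicit double loop over every offset where the signal fits in the data
n_offsets = len(data) - len(signal) + 1
loop_result = numpy.zeros(n_offsets)
for i in range(n_offsets):
    for j in range(len(signal)):
        loop_result[i] += signal[j] * data[i + j]

# Library routine computing the same sums
library_result = numpy.correlate(data, signal, mode="valid")
print(loop_result, library_result)
```

Note that the loop in the exercise runs over `len(data) - len(signal)` offsets, one fewer than `"valid"` mode returns, so the final library value has no counterpart there.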


In [None]:
import numpy

def compute_cross_correlation():
    signal = numpy.random.random(1024)
    data = numpy.random.random(1024*10)
    cross_correlation = []
    # Slide the signal along the data, one offset at a time
    for i in range(len(data) - len(signal)):
        curr_cross_corr = 0
        # Sum the element-wise products at this offset
        for j in range(len(signal)):
            curr_cross_corr += signal[j] * data[i+j]
        cross_correlation.append(curr_cross_corr)
    return cross_correlation
