# A long(ish) example showing how to optimize a numerical integration code

In [None]:
import sys
!{sys.executable} -m pip install line_profiler
!{sys.executable} -m pip install memory_profiler

%load_ext line_profiler
%load_ext memory_profiler


Here's some example code which integrates

$\frac{\cos(x)}{x}$

from $x = 1$ to $x = 10000$ (the limits used in `main_function` below).
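For reference, this integral has a closed form in terms of the cosine integral $\mathrm{Ci}(x) = -\int_x^\infty \frac{\cos t}{t}\,dt$. That gives us something to check the numerical answer against:

$$
\int_a^b \frac{\cos x}{x}\,dx = \int_a^\infty \frac{\cos x}{x}\,dx - \int_b^\infty \frac{\cos x}{x}\,dx = \mathrm{Ci}(b) - \mathrm{Ci}(a)
$$

For large $x$, $\mathrm{Ci}(x) \approx \sin(x)/x$, so with $a = 1$ and a large upper limit $b$ the result should be close to $-\mathrm{Ci}(1) \approx -0.337$, to within about $10^{-3}$.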

We will do this using the simple rectangle (midpoint) method for numerical integration: https://en.wikipedia.org/wiki/Riemann_sum. I know you all know the theory behind this, and how to code up such an example, as I taught it to you all in MATLAB last year!
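As a reminder of how the midpoint version of the rectangle method works: split $[a, b]$ into $N$ strips of width $\Delta t = (b - a)/N$, evaluate the integrand at the centre of each strip, and add up the rectangle areas, $\int_a^b f(x)\,dx \approx \Delta t \sum_{i=0}^{N-1} f\left(a + (i + \tfrac{1}{2})\Delta t\right)$. Below is a minimal plain-Python sketch. The helper name `midpoint_rule` is hypothetical, and we test it on $\int_0^\pi \sin x\,dx = 2$, whose answer we know exactly.

```python
import math

def midpoint_rule(f, a, b, n):
    """Approximate the integral of f from a to b using n midpoint rectangles.

    Hypothetical helper for illustration; not part of the code below.
    """
    delta_t = (b - a) / n
    total = 0.0
    for i in range(n):
        # Evaluate f at the centre of the i-th strip
        total += f(a + (i + 0.5) * delta_t)
    # Every rectangle has the same width, so multiply once at the end
    return total * delta_t

# The integral of sin(x) from 0 to pi is exactly 2
approx = midpoint_rule(math.sin, 0.0, math.pi, 1000)
print(approx)
```

The code below does the same thing, but fixes the step `delta_t` rather than the number of strips and generates the midpoints with `numpy.arange`.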

In [None]:
# NOTE: We write this as a set of functions. Functions make it easier to isolate different parts of
#       the code and to check each component individually. Using a class and class methods is also
#       fine, and we mix both here to show this ... This might not be the most aesthetic way of doing
#       it, though! However, this does serve as a reminder: don't write long blocks of code; split
#       them up into functions!
import numpy
import math

def compute_cosx(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        cosx[idx] = math.cos(tval)
    return cosx

def compute_invx(tseries):
    """
    Computes 1/x for all values in tseries
    """
    invx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        invx[idx] = 1 / tval
    return invx

def compute_seriesproduct(series1, series2):
    """
    Multiplies each element in series1 with the corresponding element in series2.
    This returns an array of the multiplied elements.
    """
    # Ensure the two arrays are the same length
    assert len(series1) == len(series2)
    seriesprod = numpy.zeros(len(series1))
    for idx in range(len(series1)):
        seriesprod[idx] = series1[idx] * series2[idx]
    return seriesprod

def compute_seriessum(series):
    """
    Computes the sum of all values in series
    """
    sumvals = 0
    for idx in range(len(series)):
        sumvals = sumvals + series[idx]
    return sumvals


class Integrator():
    def generate_integral(self):
        """
        Computes the integral of cos(x) / x over tseries using the midpoint rule
        """
        cosx = compute_cosx(self.tseries)
        invx = compute_invx(self.tseries)
        prod = compute_seriesproduct(cosx, invx)
        summed_prod = compute_seriessum(prod)
        return summed_prod * self.delta_t


    def __init__(self, tmin, tmax, delta_t):
        """
        Initializes the class and timeseries
        """
        self.tmin = tmin
        self.tmax = tmax
        self.delta_t = delta_t
        tseries = numpy.arange(self.tmin, self.tmax, self.delta_t)
        # We shift tseries by delta_t / 2 to ensure that we are using the midpoint rule (see wikipedia page)
        tseries = tseries + self.delta_t / 2.
        self.tseries = tseries


def main_function():
    intgr = Integrator(1, 10000, 1./300.)
    return intgr.generate_integral()

print(main_function())

## Starting to understand the speed/optimality of the code

Our first step in understanding the code is to time it. This can be done in the following way:

In [None]:
%timeit main_function()

`%timeit` runs the code a number of times and reports the mean and standard deviation of the timings. How many times it runs the code depends on how long a single run takes (though you can override this with the `-n` and `-r` options). Here we can see that our code took approximately a second to run. If we only need to run it once, that is of course trivial.
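`%timeit` is built on Python's standard-library `timeit` module, which you can also use directly in a plain script. A minimal sketch, where `slow_sum` is a hypothetical stand-in for whatever function you want to time: `timeit.repeat` calls the function `number` times per measurement and repeats that measurement `repeat` times. Taking the minimum is the usual choice, since the fastest run is the one least disturbed by other activity on the machine.

```python
import timeit
import numpy

def slow_sum(series):
    # Python-level loop, written like compute_seriessum above
    total = 0.0
    for value in series:
        total += value
    return total

data = numpy.arange(1.0, 1000., 0.01)

# 3 measurements, each timing 5 consecutive calls
times = timeit.repeat(lambda: slow_sum(data), number=5, repeat=3)
best = min(times) / 5  # per-call time of the fastest measurement
print(f"best of 3: {best * 1e3:.2f} ms per call")
```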

**Important rule of optimising: Don't waste time optimising a block of code unless it is slowing down your work. If others are using your code, though, you must think about how they might use it: might it become a problem in the future? There are certainly some things in the code above that are unnecessarily slow, and which could have been made faster just by writing the code better in the first place.**

Let's assume, though, that we might need to run this code thousands (or hundreds of thousands) of times. In that case, let's see what we can do to make it faster.

## Profiling code

Python and Jupyter notebooks have some neat tools for profiling code. Profiling means measuring how long individual blocks of code take to run. Most profilers measure the fraction of time spent inside each individual *function*. Therefore, splitting your code into a number of separate functions helps both to make the code easier to read and understand, and to identify any bottlenecks.
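IPython's `%prun` magic is a wrapper around Python's standard-library `cProfile` profiler, so outside a notebook you can get the same per-function table directly. A sketch, with hypothetical functions `compute_square` and `pipeline` standing in for real code:

```python
import cProfile
import io
import pstats
import numpy

def compute_square(series):
    # Deliberately slow Python loop (hypothetical example function)
    out = numpy.zeros(len(series))
    for idx, val in enumerate(series):
        out[idx] = val * val
    return out

def pipeline():
    series = numpy.arange(1.0, 100., 0.01)
    return compute_square(series).sum()

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by time spent inside each function itself and keep the top 5 rows
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("tottime")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```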

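Below we run IPython's built-in profiler, `%prun`, on our code above. Here `-l 10` limits the report to 10 lines, `-q` suppresses the pop-up pager, and `-T prun0` saves the report to the text file `prun0`, which we then print: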

In [None]:
%prun -l 10 -q -T prun0 main_function()

print(open('prun0', 'r').read())

The `tottime` column is the total time spent inside each function itself, excluding time spent in the functions it calls (the `cumtime` column includes that). We can see that the majority of the time is spent computing `cos(x)` and `1/x` and doing the series product and sum.

To introduce all of our profiling tools in one place, let's also look at line-by-line and memory profiling. The following cell uses `%lprun` (from `line_profiler`) to produce line-by-line timings; `-f` names the function whose lines should be timed, and `-T` saves the report to a text file:

In [None]:
# Test input for profiling compute_cosx further down this cell
timeseries = numpy.arange(1.0, 1000., 0.01)

%lprun -T lprof0 -f main_function main_function()

print(open('lprof0', 'r').read())

# And we can also profile the sub-functions, such as compute_cosx

%lprun -T lprof0 -f compute_cosx compute_cosx(timeseries)

print(open('lprof0', 'r').read())

Finally, although we will not use it in this class, we can also profile the *memory usage* of a function in a similar way, using `%mprun` from `memory_profiler`. Unfortunately, this only works for functions defined in an external file, so we need to write our code to a file and import it from there. Here's an example:

In [None]:
%%file mprun_demo.py
import numpy
def invx_demo(tseries):
    """
    Computes 1/x for all values in tseries
    """
    invx = 1 / tseries
    return invx


In [None]:
from mprun_demo import invx_demo
timeseries = numpy.arange(1.0,10000.,0.01)

%mprun -f invx_demo invx_demo(timeseries)
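If you only need the overall memory footprint of a call, and don't want to write your code to a file, the standard-library `tracemalloc` module is one alternative. This is a different tool from `memory_profiler`: it counts bytes allocated through Python's memory allocators (recent NumPy versions also report their array buffers to it) rather than the whole process's memory, and it gives totals rather than a line-by-line table. A sketch reusing the `invx_demo` idea:

```python
import tracemalloc
import numpy

def invx_demo(tseries):
    # Same vectorized 1/x as in mprun_demo.py
    return 1 / tseries

tracemalloc.start()
timeseries = numpy.arange(1.0, 10000., 0.01)
result = invx_demo(timeseries)
# current = bytes still allocated now, peak = maximum since start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```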

## Using these tools and information to speed things up

Okay, now that we understand the tools available to us, let's see whether we can improve things. The first place where the code is slow is the `compute_cosx` function. Here's a major rule of Python optimization:

* Avoid for loops wherever possible

In this case, rather than using a Python for loop to compute `cos(x)` at every index, let's use numpy to compute it at all indices in one call. Yes, internally it will still need to loop over all values of `x` and compute `cos(x)` for each one, but that loop runs deep inside compiled numpy code rather than in the much slower Python interpreter. In short:

* Use numpy routines on vectors to avoid for loops where possible.
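To see the effect of this rule in isolation, here's a sketch that times a Python-loop `cos` (written like the original `compute_cosx`) against a single `numpy.cos` call on the same array. The names `cos_loop` and `cos_vectorized` are hypothetical, and the speed-up you see will depend on your machine. We first check that both give the same answer:

```python
import math
import timeit
import numpy

tseries = numpy.arange(1.0, 1000., 0.01)

def cos_loop(tseries):
    # One interpreted math.cos call per element
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        cosx[idx] = math.cos(tval)
    return cosx

def cos_vectorized(tseries):
    # Single call; the loop runs inside numpy's compiled code
    return numpy.cos(tseries)

# Both versions must agree before we compare speeds
assert numpy.allclose(cos_loop(tseries), cos_vectorized(tseries))

t_loop = min(timeit.repeat(lambda: cos_loop(tseries), number=3, repeat=3)) / 3
t_vec = min(timeit.repeat(lambda: cos_vectorized(tseries), number=3, repeat=3)) / 3
print(f"loop: {t_loop * 1e3:.1f} ms, numpy: {t_vec * 1e3:.2f} ms, "
      f"speed-up: {t_loop / t_vec:.0f}x")
```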

So we can replace our `compute_cosx` function with:

In [None]:
def compute_cosx(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    return numpy.cos(tseries)


Because `generate_integral` looks up `compute_cosx` by name each time it is called, running this cell *after* the one above simply replaces the old function, so we can just run our profiler again:

In [None]:
%prun -l 10 -q -T prun0 main_function()

print(open('prun0', 'r').read())

Now we can see that quite a bit of time is being spent in the remaining three `compute_*` functions. Let's optimize these as well by replacing the for loops with vectorized numpy calls:

In [None]:
def compute_invx(tseries):
    """
    Computes 1/x for all values in tseries
    """
    return 1. / tseries

def compute_seriesproduct(series1, series2):
    """
    Multiplies each element in series1 with the corresponding element in series2.
    This returns an array of the multiplied elements.
    """
    # Ensure the two arrays are the same length
    assert(len(series1)==len(series2))
    return series1 * series2

def compute_seriessum(series):
    """
    Computes the sum of all values in series
    """
    return series.sum()

%prun -l 10 -q -T prun0 main_function()

print(open('prun0', 'r').read())
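Speed is only useful if the answers are unchanged, so it's worth checking the vectorized functions against the loop versions they replace. A sketch of such a check, using hypothetical loop versions `invx_loop` and `sum_loop`: element-wise operations like `1 / x` should agree to within rounding (`numpy.allclose`), but a sum can legitimately differ in the last few digits, because numpy adds the values in a different order (pairwise summation) from a simple left-to-right loop. We therefore compare the sums with a relative tolerance:

```python
import math
import numpy

def invx_loop(tseries):
    # Loop version, like the original compute_invx
    invx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        invx[idx] = 1 / tval
    return invx

def sum_loop(series):
    # Loop version, like the original compute_seriessum
    total = 0.0
    for value in series:
        total = total + value
    return total

tseries = numpy.arange(1.0, 1000., 0.01) + 0.005

# Element-wise results should agree to within floating-point rounding
assert numpy.allclose(invx_loop(tseries), 1. / tseries)

# Sums may differ slightly because of the different summation order
loop_total = sum_loop(tseries)
numpy_total = tseries.sum()
print(loop_total, numpy_total, loop_total - numpy_total)
assert math.isclose(loop_total, numpy_total, rel_tol=1e-9)
```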


The code is now limited by the vectorized computation of `cos(x)`, which is not easy to make much faster! Now, with these optimizations, let's see how fast our code runs:

In [None]:
# Please note: The times shown here, and for the functions above, are based on Ian's ARM MacBook.
# That is a very different machine from the standard Colab/SciServer virtual machines you will be running on.
# Expect my MacBook to be faster, and perhaps to show different performance gains!

%timeit main_function()
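Once every step is a single numpy call, the whole integral fits in one short function. For comparison, here's a sketch (the name `integrate_midpoint` is hypothetical, not part of the original code) that builds the same midpoint grid as `Integrator.__init__` and takes the integrand as an argument. We test it on the shorter interval $[1, 10]$, where the exact answer is $\mathrm{Ci}(10) - \mathrm{Ci}(1) \approx -0.3829$, with $\mathrm{Ci}$ the cosine integral:

```python
import numpy

def integrate_midpoint(func, tmin, tmax, delta_t):
    """Midpoint-rule integral of a vectorized func from tmin to tmax with step delta_t.

    Hypothetical compact alternative to the Integrator class above.
    """
    # Same grid as Integrator.__init__: left edges shifted to the midpoints
    tseries = numpy.arange(tmin, tmax, delta_t) + delta_t / 2.
    return func(tseries).sum() * delta_t

result = integrate_midpoint(lambda x: numpy.cos(x) / x, 1, 10, 1./300.)
print(result)
```

Whether you prefer this or the class-plus-helpers structure above is largely a matter of taste; the run time should be essentially the same, since both spend nearly all their time in the same numpy calls.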