# A longer Cython example

Let's have a look at a longer example, one we already used in the optimization class. The code below integrates

$\cos(x) \times \frac{1}{x}$

from $x = 1$ to $x=1000$.

We will do this numerically using the simple rectangular method (a Riemann sum), specifically the midpoint rule: https://en.wikipedia.org/wiki/Riemann_sum
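As a compact reference for what the longer code below computes, here is a minimal vectorized sketch of the midpoint rule. The helper name `midpoint_integral` is hypothetical and not part of the lecture code; it assumes the integrand accepts a numpy array.

```python
import numpy

def midpoint_integral(func, tmin, tmax, delta_t):
    """
    Approximates the integral of func from tmin to tmax using
    rectangles of width delta_t evaluated at their midpoints.
    (Hypothetical helper, for illustration only.)
    """
    # Left edge of each rectangle, shifted by half a step to get the midpoint
    midpoints = numpy.arange(tmin, tmax, delta_t) + delta_t / 2.
    # Sum of rectangle heights multiplied by the common width
    return func(midpoints).sum() * delta_t

approx = midpoint_integral(lambda x: numpy.cos(x) / x, 1, 1000, 1. / 300.)
print(approx)
```

The step-by-step functions in the next cell do the same thing, but split into separate loops so that each stage can be profiled and optimized on its own.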

In [1]:
%load_ext cython

In [2]:
import numpy
import math

def compute_cosx(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    cosx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        cosx[idx] = math.cos(tseries[idx])
    return cosx

def compute_invx(tseries):
    """
    Computes 1/x for all values in tseries
    """
    invx = numpy.zeros(len(tseries))
    for idx, tval in enumerate(tseries):
        invx[idx] = 1 / tseries[idx]
    return invx

def compute_seriesproduct(series1, series2):
    """
    Multiplies each element in series1 with the corresponding element in series2.
    This returns an array of the multiplied elements.
    """
    # Ensure the two arrays are the same length
    assert(len(series1)==len(series2))
    seriesprod = numpy.zeros(len(series1))
    for idx in range(len(series1)):
        seriesprod[idx] = series1[idx] * series2[idx]
    return seriesprod

def compute_seriessum(series):
    """
    Computes the sum of all values in series
    """
    sumvals = 0
    for idx in range(len(series)):
        sumvals = sumvals + series[idx]
    return sumvals


class Integrator():
    def generate_integral(self):
        """
        Integral function goes here
        """
        cosx = compute_cosx(self.tseries)
        invx = compute_invx(self.tseries)
        prod = compute_seriesproduct(cosx, invx)
        summed_prod = compute_seriessum(prod)
        return summed_prod * self.delta_t


    def __init__(self, tmin, tmax, delta_t):
        """
        Initializes the class and timeseries
        """
        self.tmin = tmin
        self.tmax = tmax
        self.delta_t = delta_t
        tseries = numpy.arange(self.tmin, self.tmax, self.delta_t)
        # We shift tseries by delta_t / 2 to ensure that we are using the midpoint rule (see wikipedia page)
        tseries = tseries + self.delta_t / 2.
        self.tseries = tseries


def main_function():
    intgr = Integrator(1, 1000, 1./300.)
    return intgr.generate_integral()

print(main_function())

-0.33657824671576514


In [3]:
%timeit main_function()

106 ms ± 768 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


From before, we already know that `compute_cosx`, `compute_invx`, `compute_seriesproduct` and `compute_seriessum` are the slow parts of this code. Let's rewrite these in Cython using the rules from part 1:

In [4]:
%%cython -a
import math
import numpy

from libc.math cimport cos # This imports C's cos function from the math library
from cython import wraparound, boundscheck, cdivision

@boundscheck(False)
@wraparound(False)
@cdivision(True)
def compute_cosx_cython(double [::1] timeseries):
    """
    Computes cos(t) for all values in tseries
    """
    cdef int n = timeseries.size # How many values in the timeseries
    cdef int idx
    cdef double[::1] cosx = numpy.zeros(n) # Create an array to store the cos(x) values
    for idx in range(n):
        cosx[idx] = cos(timeseries[idx])
    return cosx

@boundscheck(False)
@wraparound(False)
@cdivision(True)
def compute_invx_cython(double [::1] timeseries):
    """
    Computes 1/x for all values in tseries
    """
    cdef int idx
    cdef int n = timeseries.size # How many values in the timeseries
    cdef double[::1] invx = numpy.zeros(n) # Create an array to store the 1/x values
    for idx in range(n):
        invx[idx] = 1. / timeseries[idx]
    return invx

@boundscheck(False)
@wraparound(False)
@cdivision(True)
def compute_seriesproduct_cython(double [::1] series1, double [::1] series2):
    """
    Multiplies each element in series1 with the corresponding element in series2.
    This returns an array of the multiplied elements.
    """
    cdef int idx
    cdef int n = series1.size # How many values in the timeseries
    cdef double[::1] seriesprod = numpy.zeros(n)
    for idx in range(n):
        seriesprod[idx] = series1[idx] * series2[idx]
    return seriesprod

@boundscheck(False)
@wraparound(False)
@cdivision(True)
def compute_seriessum_cython(double [::1] series):
    """
    Computes the sum of all values in series
    """
    cdef int idx
    cdef int n = series.size
    cdef double sumvals = 0.
    for idx in range(n):
        sumvals += series[idx]
    return sumvals



And let's also define the optimized numpy functions we used in lecture 09b:

In [5]:
import numpy
def compute_cosx_numpy(tseries):
    """
    Computes cos(t) for all values in tseries
    """
    return numpy.cos(tseries)

def compute_invx_numpy(tseries):
    """
    Computes 1/x for all values in tseries
    """
    return 1. / tseries

def compute_seriesproduct_numpy(series1, series2):
    """
    Multiplies each element in series1 with the corresponding element in series2.
    This returns an array of the multiplied elements.
    """
    # Ensure the two arrays are the same length
    assert(len(series1)==len(series2))
    return series1 * series2

def compute_seriessum_numpy(series):
    """
    Computes the sum of all values in series
    """
    return series.sum()


And then define a class that can integrate using *any* of the three sets of functions: the Cython ones, the numpy ones, or the earlier pure-Python ones.

In [27]:
class Integrator():
    def generate_integral(self, method='python'):
        """
        Integral function goes here
        """
        if method=='cython':
            cosx = compute_cosx_cython(self.tseries)
            invx = compute_invx_cython(self.tseries)
            prod = compute_seriesproduct_cython(cosx, invx)
            summed_prod = compute_seriessum_cython(prod)
        elif method=='python':
            cosx = compute_cosx(self.tseries)
            invx = compute_invx(self.tseries)
            prod = compute_seriesproduct(cosx, invx)
            summed_prod = compute_seriessum(prod)
        elif method=='numpy':
            cosx = compute_cosx_numpy(self.tseries)
            invx = compute_invx_numpy(self.tseries)
            prod = compute_seriesproduct_numpy(cosx, invx)
            summed_prod = compute_seriessum_numpy(prod)
        else:
            raise ValueError(f"Don't understand method {method}")
        return summed_prod * self.delta_t


    def __init__(self, tmin, tmax, delta_t):
        """
        Initializes the class and timeseries
        """
        self.tmin = tmin
        self.tmax = tmax
        self.delta_t = delta_t
        tseries = numpy.arange(self.tmin, self.tmax, self.delta_t)
        # We shift tseries by delta_t / 2 to ensure that we are using the midpoint rule (see wikipedia page)
        tseries = tseries + self.delta_t / 2.
        self.tseries = tseries


def main_function(method='cython'):
    intgr = Integrator(1, 1000, 1./30000.)
    return intgr.generate_integral(method=method)

print(main_function())

-0.33657760745388066


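Then we compare runtimes and profile the code, using the methods from the previous optimization notebook. Note that `main_function` now uses a step of `1./30000.`, which is 100 times smaller than in the first cell, so these timings are not directly comparable with the 106 ms measured earlier.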

In [7]:
%timeit main_function(method='cython')
%prun -l 10 -q -T prun0 main_function(method='cython')
print(open('prun0', 'r').read())


157 ms ± 4.98 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
 
*** Profile printout saved to text file 'prun0'.
         11 function calls in 0.151 seconds

   Ordered by: internal time
   List reduced from 11 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.069    0.069    0.069    0.069 {_cython_magic_c399608e6c3b85605bd850f6c85e6800.compute_cosx_cython}
        1    0.017    0.017    0.030    0.030 1225634365.py:26(__init__)
        1    0.015    0.015    0.015    0.015 {_cython_magic_c399608e6c3b85605bd850f6c85e6800.compute_seriesproduct_cython}
        1    0.014    0.014    0.014    0.014 {_cython_magic_c399608e6c3b85605bd850f6c85e6800.compute_seriessum_cython}
        1    0.013    0.013    0.013    0.013 {_cython_magic_c399608e6c3b85605bd850f6c85e6800.compute_invx_cython}
        1    0.013    0.013    0.013    0.013 {built-in method numpy.arange}
        1    0.008    0.008    0.149    0.149 12256343

In [22]:
%timeit main_function(method='python')
%prun -l 10 -q -T prun0 main_function(method='python')
print(open('prun0', 'r').read())


KeyboardInterrupt: 

In [28]:
%timeit main_function(method='numpy')
%prun -l 10 -q -T prun0 main_function(method='numpy')
print(open('prun0', 'r').read())


150 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 
*** Profile printout saved to text file 'prun0'.
         16 function calls in 0.149 seconds

   Ordered by: internal time
   List reduced from 15 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.072    0.072    0.072    0.072 2202265963.py:2(compute_cosx_numpy)
        1    0.017    0.017    0.017    0.017 2202265963.py:14(compute_seriesproduct_numpy)
        1    0.017    0.017    0.031    0.031 1225634365.py:26(__init__)
        1    0.015    0.015    0.015    0.015 2202265963.py:8(compute_invx_numpy)
        1    0.014    0.014    0.014    0.014 {built-in method numpy.arange}
        1    0.008    0.008    0.146    0.146 1225634365.py:39(main_function)
        1    0.004    0.004    0.004    0.004 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.003    0.003    0.149    0.149 <string>:1(<module>)
        1    0.000    0.000    0.149    

With the timings shown above, our Cython code runs at about the same speed as the numpy optimized code from before (157 ms versus 150 ms per loop), so it is competitive with numpy but not clearly faster here. Given the added complexity of writing it, and the fact that any speed differential will be less noticeable for larger arrays, in most cases the numpy code is more than good enough.
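Comparisons like this can also be run outside a notebook, where the `%timeit` magic is not available. The following is a minimal sketch using the standard-library `timeit` module. It assumes a shorter series than the one above so that the pure-Python loop finishes quickly, and the names `seriessum_loop` and `seriessum_numpy` are illustrative stand-ins for the summation functions defined earlier.

```python
import timeit
import numpy

def seriessum_loop(series):
    """Sums the series with an explicit Python loop (slow)."""
    total = 0.
    for value in series:
        total += value
    return total

def seriessum_numpy(series):
    """Sums the series with numpy's built-in reduction (fast)."""
    return series.sum()

# A shorter series than in the notebook, chosen so the loop version stays quick
series = numpy.cos(numpy.arange(1, 100, 1. / 300.))

for func in (seriessum_loop, seriessum_numpy):
    # Best of 3 repeats, each repeat averaging over 5 calls
    best = min(timeit.repeat(lambda: func(series), number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1e3:.3f} ms per call")
```

Taking the minimum over the repeats reduces the influence of other activity on the machine; the absolute numbers will differ from system to system.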

As a reminder:

 * The use-case of Cython is primarily to optimize code for which there is no numpy optimized version.
 * Writing (fast) Cython code does require more effort (and more Googling) than writing Python code. However, a moderate speed increase can sometimes be achieved without this.
 * Our Cython magic function is converting our code written in Cython into C and then compiling it. It is possible to write the C code directly, but that is more effort. It is also possible to approach this from the other side: write pure C or C++ code, and then use Cython to interface directly with the C code (this can even be possible with Fortran). This can be used if you have pre-existing C code, or a library, that you want to use in Python.