# Numba vs Cython for Implementing CMR Operations
> A basic comparison of Cython and Numba for speeding up operations central to simulation of the Context Maintenance and Retrieval (CMR) model

A central selling point of `cymr` as an implementation of the Context Maintenance and Retrieval (CMR) model is that it uses Cython to accelerate model execution relative to a base Python implementation.

However, there are some downsides to using Cython. The main one is that it requires knowledge of Cython-specific syntax and concepts to understand and use effectively.
This can be a barrier for contributors and users who are not familiar with concepts associated with low-level programming languages or otherwise prefer to develop in regular Python.
With libraries designed for model-based research, users are frequently interested in modifying the code to test new ideas or to adapt the model to their own research questions.
In this case, the barrier to entry for Cython can be a significant drawback.

Still, given the computational demands of fitting a model like CMR to data, the speedup provided by Cython is significant enough that it is worth it to use Cython for the core model code.
In this notebook, we provide evidence that the speedup provided by Cython for key operations in `cymr` is substantial.
But we also show that these gains can be obtained using a different approach: using Numba to compile applicable functions "just in time".
Numba translates Python functions to optimized machine code at runtime.
Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.
The key advantage of Numba is that it requires no special syntax or concepts and no separate compilation step.
Instead, as long as a function sticks to the subset of Python and NumPy that Numba supports, you can just apply one of the Numba decorators to your Python function, and Numba does the rest.
This might provide a more accessible alternative to Cython for providing fast implementations of model code while keeping a codebase accessible to users without familiarity with Cython.
Even for experienced Cython users, Numba might still provide an alternative approach to speeding up code that can be easier to use and maintain.

In this demonstration, I focus on the function `integrate_context` in `cymr`'s `operations.pyx`.
It uses a helper function, `calc_rho`, which is not exposed to Python, to integrate contextual input into a context vector.
I first translate the Cython code for both functions directly into regular Python.
To confirm that the Cython versions are faster than plain Python, I compare the Cython implementation against this translation.
Next, to show how Numba can be used to provide a fast implementation of the same method, I apply the `@njit` decorator from Numba to the Python translation of the Cython code.
Finally, I compare the speed of the Numba-compiled Python code to the Cython implementation.
This speed comparison shows that the Numba-compiled Python code is faster still than the Cython implementation (roughly 1.8 times in the timings below), and that both are several times faster than the base Python implementation.

A fuller comparison might convert the entire set of operations defined in `operations.pyx` to Numba-compiled Python code and then run corresponding tests defined already in the `cymr` test suite.
Perhaps within more complex functions distributed across multiple files, the speedup provided by Cython might turn out to be more substantial -- though I have no particular reason to expect this.
However, these initial comparisons provide a proof of concept that Numba is a viable alternative to Cython for speeding up functions central to CMR that might motivate further exploration.
I also hope to show that Numba makes fewer tradeoffs than Cython in terms of accessibility and ease of use.

## `integrate_context`
The function `integrate_context` in `operations.pyx` uses a hidden function `calc_rho` to integrate contextual input into a context vector.
We import the compiled function for our experiments here, and also reproduce the underlying code for reference.

In [1]:
from operations import integrate_context as cython_integrate_context

### Implementation Using Cython

```cython
@cython.profile(False)
cdef inline double calc_rho(double cdot, double B):
    """
    Calculate context integration scaling factor.
    
    Parameters
    ----------
    cdot
        Dot product between :math:`c` and :math:`c^{IN}`.
    
    B
        Beta parameter weighting :math:`c^{IN}`.
    
    Returns
    -------
    rho
        Scaling factor for :math:`c`.   
    """
    rho = sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)
    return rho


cpdef integrate_context(double [:] c, double [:] c_in, double B, int [:] c_ind):
    """
    Integrate context input.
    
    Parameters
    ----------
    c
        Context state :math:`c`.
    
    c_in
        Input to context :math:`c^{IN}`
    
    B
        :math:`\beta` parameter weighting :math:`c^{IN}`.
    
    c_ind
        Start and end indices of context to update.
    """
    cdef double cdot = 0
    cdef int i
    for i in range(c_ind[0], c_ind[1]):
        cdot += c[i] * c_in[i]
    rho = calc_rho(cdot, B)

    for i in range(c_ind[0], c_ind[1]):
        c[i] = rho * c[i] + B * c_in[i]
```

### Implementation Using Base NumPy
Unlike in Cython, type annotations are optional here, both in plain Python and once we start using Numba.
Type annotations are still useful, but in this case we omit them to show how little is required to prototype a function that runs as fast as the Cython version.

In [2]:
import numpy as np

def calc_rho(cdot, B):
    rho = np.sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)
    return rho

def integrate_context(c, c_in, B, c_ind):
    cdot = 0
    
    for i in range(c_ind[0], c_ind[1]):
        cdot += c[i] * c_in[i]
    rho = calc_rho(cdot, B)

    for i in range(c_ind[0], c_ind[1]):
        c[i] = rho * c[i] + B * c_in[i]
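
As a quick sanity check on the translation, the same update can also be written with NumPy array operations. The sketch below is not part of `cymr`: `vectorized_integrate_context` is a hypothetical name, it updates the whole vector rather than a `c_ind` sub-range, and it returns a new array instead of modifying `c` in place. It also illustrates a property that follows from the form of `calc_rho`: when `c` and `c_in` both have unit length, the updated context has unit length too.

```python
import numpy as np


def calc_rho(cdot, B):
    # Same scaling factor as in the translation above.
    return np.sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)


def vectorized_integrate_context(c, c_in, B):
    """Hypothetical array-based variant; returns a new context vector."""
    cdot = np.dot(c, c_in)
    rho = calc_rho(cdot, B)
    return rho * c + B * c_in


# Two unit-length vectors, chosen only for illustration.
c = np.array([1.0, 0.0, 0.0])
c_in = np.array([0.6, 0.8, 0.0])

c_new = vectorized_integrate_context(c, c_in, B=0.5)
new_norm = np.linalg.norm(c_new)
print(new_norm)  # expected to be 1 up to floating-point error
```

The loop-based version is kept for the comparisons that follow because it mirrors the Cython code line by line.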

### Adding Numba Compilation

In [3]:
from numba import njit

@njit
def numba_calc_rho(cdot, B):
    rho = np.sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)
    return rho

@njit
def numba_integrate_context(c, c_in, B, c_ind):
    cdot = 0
    
    for i in range(c_ind[0], c_ind[1]):
        cdot += c[i] * c_in[i]
    # In nopython mode a jitted function cannot call a plain Python function,
    # so we call the Numba-compiled version of calc_rho.
    rho = numba_calc_rho(cdot, B)

    for i in range(c_ind[0], c_ind[1]):
        c[i] = rho * c[i] + B * c_in[i]

### Speed Comparison
We run each function once before the speed test, both to confirm that it works and because the first call to a JIT-compiled function is slower than later calls, since it includes the compilation step.
The cost of the compilation step is negligible when the function is called many times, but it can be significant when the function is only called once.
Even in this case, compilation results can be cached to disk (for example with `@njit(cache=True)`) so that this cost is only paid once, similar to how Cython compiles code just once.
We don't do any demonstration of caching here, though.
We use the `timeit` module to time each function call, allowing the module to configure the number of loops and repetitions to get a good estimate of the time required to run each function.
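
Outside a notebook the `%timeit` magic is not available, but the standard-library `timeit` module can be driven directly. The sketch below assumes the loop-based `integrate_context` from above and uses `Timer.autorange` to pick the loop count automatically; the helper name `time_per_call` is hypothetical. Unlike `%timeit`, which reports a mean and standard deviation, this helper reports the best run.

```python
import timeit

import numpy as np


def integrate_context(c, c_in, B, c_ind):
    # Loop-based translation, as defined earlier in the notebook.
    cdot = 0
    for i in range(c_ind[0], c_ind[1]):
        cdot += c[i] * c_in[i]
    rho = np.sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)
    for i in range(c_ind[0], c_ind[1]):
        c[i] = rho * c[i] + B * c_in[i]


def time_per_call(func, *args, repeat=3):
    """Return the best observed time per call, in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    number, _ = timer.autorange()  # loop count whose total run time is >= 0.2 s
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number


c = np.array([1.0, 0.0, 0.0])
c_in = np.array([0.6, 0.8, 0.0])
c_ind = np.array([0, 3], dtype=np.int32)

per_call = time_per_call(integrate_context, c, c_in, 0.5, c_ind)
print(f"{per_call * 1e6:.2f} µs per call")
```

The same helper could be pointed at the Cython or Numba versions, provided the Numba function has already been called once so that compilation time is not included.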

In [4]:
c1 = np.array([0.0, 0.09128709, 0.18257419, 0.27386128, 0.36514837, 0.8660254])
c2 = np.array([0.15655607, 0.24875946, 0.34096284, 0.43316622, 0.52536961, 0.57767384])
B = 0.5
c_ind = np.array([0, 6], dtype=np.int32)

integrate_context(c1, c2, B, c_ind)
cython_integrate_context(c1, c2, B, c_ind)
numba_integrate_context(c1, c2, B, c_ind)
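
Because `integrate_context` updates `c` in place, calling the three versions one after another on the same `c1` applies three successive updates rather than comparing them. One way to check that implementations agree is to run each on its own copy of the inputs. The helper below is a sketch, not part of `cymr`: `assert_same_update` is a hypothetical name, and the slice-based variant stands in for a second implementation so that the block runs on its own. In this notebook, the list of functions would instead hold `integrate_context`, `cython_integrate_context`, and `numba_integrate_context`.

```python
import numpy as np


def integrate_context(c, c_in, B, c_ind):
    # Loop-based translation, as defined earlier in the notebook.
    cdot = 0
    for i in range(c_ind[0], c_ind[1]):
        cdot += c[i] * c_in[i]
    rho = np.sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)
    for i in range(c_ind[0], c_ind[1]):
        c[i] = rho * c[i] + B * c_in[i]


def sliced_integrate_context(c, c_in, B, c_ind):
    # Hypothetical second implementation using array slices.
    s = slice(c_ind[0], c_ind[1])
    cdot = np.dot(c[s], c_in[s])
    rho = np.sqrt(1 + (B * B) * ((cdot * cdot) - 1)) - (B * cdot)
    c[s] = rho * c[s] + B * c_in[s]


def assert_same_update(funcs, c, c_in, B, c_ind):
    """Run each function on a fresh copy of c and compare the results."""
    results = []
    for func in funcs:
        c_copy = c.copy()  # every implementation starts from the same state
        func(c_copy, c_in, B, c_ind)
        results.append(c_copy)
    for other in results[1:]:
        np.testing.assert_allclose(other, results[0])
    return results[0]


c1 = np.array([0.0, 0.09128709, 0.18257419, 0.27386128, 0.36514837, 0.8660254])
c2 = np.array([0.15655607, 0.24875946, 0.34096284, 0.43316622, 0.52536961, 0.57767384])
c_ind = np.array([0, 6], dtype=np.int32)

updated = assert_same_update(
    [integrate_context, sliced_integrate_context], c1, c2, 0.5, c_ind
)
```

Working on copies also leaves `c1` untouched, so the timing cells start from a known state.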

In [5]:
%timeit integrate_context(c1, c2, B, c_ind)

5.92 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [6]:
%timeit cython_integrate_context(c1, c2, B, c_ind)

1.02 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [7]:
%timeit numba_integrate_context(c1, c2, B, c_ind)

564 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


This speed comparison shows that the Numba-compiled Python code is faster still than the Cython implementation (564 ns versus 1.02 µs per call), and that both are several times faster than the base Python implementation (5.92 µs per call).
Outcomes like these are why I think Numba is worth considering as an alternative to Cython for speeding up most of the code we use to do much of our research.