In this notebook, I'll measure how fast NumPy is on a more complex operation and look at what our alternatives are. Let's get started.

In [32]:
import numpy as np

## Pure numpy

In [33]:
def complex_function(x: np.ndarray):
    """
    Compute exp(sin(x) + cos(x)) element-wise on a NumPy array.

    Parameters:
    x (np.ndarray): Input array of floats.

    Returns:
    np.ndarray: Resulting array after applying the operations.
    """
    # Ensure input is a NumPy array
    if not isinstance(x, np.ndarray):
        raise TypeError("Input must be a NumPy array.")

    # Perform some operations
    result = np.exp(np.sin(x) + np.cos(x))
    return result

In [34]:
x = np.arange(1e6)

In [35]:
%%timeit
complex_function(x)

21.9 ms ± 504 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
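
The `%%timeit` magic only exists inside IPython/Jupyter. As a rough sketch, a similar measurement can be made in a plain script with the standard-library `timeit` module. The run and loop counts below simply mirror the output above, and `per_loop_ms` and `mean_ms` are illustrative names, not part of the original notebook.

```python
import timeit

import numpy as np


def complex_function(x: np.ndarray) -> np.ndarray:
    """Same expression as the notebook cell above."""
    return np.exp(np.sin(x) + np.cos(x))


x = np.arange(1e6)

# Mirror the notebook output: 7 runs of 10 loops each.
# timeit.repeat returns the total time of each run, so divide by `number`.
number = 10
run_totals = timeit.repeat(lambda: complex_function(x), repeat=7, number=number)
per_loop_ms = [1e3 * total / number for total in run_totals]

mean_ms = sum(per_loop_ms) / len(per_loop_ms)
print(f"{mean_ms:.1f} ms per loop (mean of {len(per_loop_ms)} runs, {number} loops each)")
```

The absolute numbers depend on the machine, so they should only be compared against other timings taken on the same hardware.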


In our base case, pure NumPy takes about 21.9 ms for an array of 1 million elements. Let's see whether NumPy's own `vectorize` helper can do any better.

## Numpy vectorize

In [36]:
import math


def vec_func(x: float):
    """
    A scalar function, meant to be wrapped with np.vectorize, that computes
    exp(sin(x) + cos(x)) for a single float value.

    Parameters:
    x (float): Input value.

    Returns:
    float: exp(sin(x) + cos(x)).
    """
    result = math.sin(x) + math.cos(x)
    result = math.exp(result)  # Exponential of the sum
    return result


vec_f = np.vectorize(vec_func)

In [37]:
%%timeit
vec_f(x)

223 ms ± 4.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
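
The roughly tenfold slowdown is expected: the NumPy documentation notes that `np.vectorize` is provided for convenience rather than performance and is essentially a Python-level loop. The sketch below illustrates this on a smaller array; `scalar_func`, `x_small`, and the `via_*` names are made up for illustration.

```python
import math

import numpy as np


def scalar_func(v: float) -> float:
    """Scalar version of exp(sin(v) + cos(v)), like vec_func above."""
    return math.exp(math.sin(v) + math.cos(v))


x_small = np.arange(1e4)  # smaller than the benchmark array to keep this quick

# np.vectorize calls the Python function once per element...
via_vectorize = np.vectorize(scalar_func)(x_small)

# ...which is conceptually what this explicit Python loop does.
via_loop = np.array([scalar_func(v) for v in x_small])

# The ufunc expression runs its loops in compiled code instead.
via_ufuncs = np.exp(np.sin(x_small) + np.cos(x_small))

print(np.allclose(via_vectorize, via_loop))
print(np.allclose(via_vectorize, via_ufuncs))
```

All three give the same values; only the place where the loop runs (Python interpreter versus compiled code) differs.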


## Numba

Let's try something better. Maybe Numba will work out well.

In [45]:
from numba import vectorize, float64


@vectorize([float64(float64)])
def numba_func(x: float):
    """
    A Numba-compiled ufunc that computes exp(sin(x) + cos(x)) for one element.

    Parameters:
    x (float): Input value.

    Returns:
    float: Exponential of the sum of the sine and cosine of the input value.
    """
    result = np.exp(np.sin(x) + np.cos(x))  # Combine sine and cosine functions
    return result

In [46]:
%%timeit
numba_func(x)

15.8 ms ± 136 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Numba gets us an improvement of roughly 28% (21.9 ms down to 15.8 ms), which is pretty impressive for so little effort.

## Cython

In [21]:
%load_ext cython

In [17]:
%%cython --annotate
import numpy as np
cimport cython

from libc.math cimport sin, cos, exp
from cython.parallel import prange

cimport numpy as cnp

cnp.import_array()
DTYPE = np.float64

ctypedef cnp.float64_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
def cython_complex_function(cnp.ndarray[DTYPE_t, ndim=1] f):
    """
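    A Cython function that computes exp(sin(f) + cos(f)) element-wise.
    The loop uses prange, which only runs in parallel when the module is
    compiled and linked with OpenMP enabled (e.g. -fopenmp with GCC);
    otherwise it behaves like a serial range.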

    Parameters:
    f (np.ndarray): Input array of floats.

    Returns:
    np.ndarray: Resulting array after applying the operations.
    """
    cdef double s
    cdef double c
    cdef int i
    cdef int n = f.shape[0]
    # Allocate the array directly with cnp.ndarray
    cdef cnp.ndarray[DTYPE_t, ndim=1] arr = cnp.ndarray(shape=(n,), dtype=DTYPE, order='C')

    # Use prange for parallel execution with OpenMP
    for i in prange(n, nogil=True):
        s, c = sin(f[i]), cos(f[i])
        arr[i] = exp(s + c)

    return arr


In [40]:
from cython_func import cython_complex_function

In [41]:
%%timeit
cython_complex_function(x)

16.2 ms ± 242 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The results are pretty much the same as Numba's (16.2 ms vs. 15.8 ms), but writing and compiling the Cython code took noticeably more effort.

## NumExpr

In [42]:
import numexpr as ne

In [43]:
%%timeit
ne.evaluate("exp(sin(x) + cos(x))", local_dict={"x": x})

1.82 ms ± 17.3 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [44]:
np.all(
    ne.evaluate("exp(sin(x) + cos(x))", local_dict={"x": x})
    == cython_complex_function(x)
)

np.True_
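
The exact `==` comparison happens to pass here, but independently compiled implementations of `sin`, `cos`, and `exp` may differ in the last few bits, so a tolerance-based check is usually more robust. Here is a minimal sketch using only NumPy and the standard library. `reference` and `candidate` are illustrative stand-ins for the NumExpr and Cython results, and the tolerance is an assumption rather than a recommended value.

```python
import math

import numpy as np

x = np.arange(1e4)

# Two independent routes to exp(sin(x) + cos(x)).
reference = np.exp(np.sin(x) + np.cos(x))
candidate = np.array([math.exp(math.sin(v) + math.cos(v)) for v in x])

# Tolerance-based equality: still passes if results differ by rounding error.
print(np.allclose(reference, candidate, rtol=1e-12, atol=0.0))

# Largest relative deviation, useful for diagnosing a failed check.
max_rel_dev = np.max(np.abs(reference - candidate) / np.abs(reference))
print(max_rel_dev)
```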