# Cython in Jupyter notebooks

To use Cython in a Jupyter notebook, the Cython extension has to be loaded first with the `%load_ext cython` magic command.

In [1]:
import cython
%load_ext cython

## Python

To illustrate the performance difference between a pure Python function and a Cython implementation, consider a function that computes the list of the first $k_{\rm max}$ prime numbers by trial division: each candidate $n$ is divided by the primes found so far.

In [2]:
from array import array

In [3]:
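def primes(kmax, p=None):
    # p stores the primes found so far; the caller may pass a preallocated array
    if p is None:
        p = array('i', [0]*kmax)
    result = []
    k, n = 0, 2
    while k < kmax:
        # trial division of n by the primes found so far
        i = 0
        while i < k and n % p[i] != 0:
            i += 1
        if i == k:  # no divisor found, so n is prime
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result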

Checking the results for the first 20 prime numbers.

In [4]:
primes(20)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

Note that trial division by every previously found prime is not the most efficient way to check whether $n$ is prime.
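
One common improvement, sketched below in plain Python, is to stop trial division as soon as the square of the divisor exceeds $n$, since a composite $n$ always has a prime factor $q$ with $q^2 \le n$. This is not the method used in this notebook; the function name `primes_sqrt` is hypothetical and the sketch simply reuses the structure of `primes`.

```python
from array import array


def primes_sqrt(kmax):
    """Return the first kmax primes by trial division, stopping at sqrt(n).

    Hypothetical variant of primes(): a composite n always has a prime
    factor q with q * q <= n, so larger divisors never need to be tried.
    """
    p = array('i', [0] * kmax)  # primes found so far
    result = []
    k, n = 0, 2
    while k < kmax:
        is_prime = True
        i = 0
        # Only try stored primes whose square does not exceed n.
        while i < k and p[i] * p[i] <= n:
            if n % p[i] == 0:
                is_prime = False
                break
            i += 1
        if is_prime:
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result
```

Calling `primes_sqrt(20)` returns the same list as `primes(20)`, but each prime candidate is now divided only by the primes up to its square root instead of by all primes found so far.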

In [5]:
%timeit primes(1_000)

21.8 ms ± 737 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
p = array('i', [0]*10_000)
%timeit primes(10_000, p)

2.21 s ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Cython

Cython 3.x supports two forms of syntax: the "classic" Cython syntax, which uses `cdef` declarations, and a pure Python syntax based on Python type annotations.

### Classic syntax

The Cython implementation differs little from the pure Python one: C type declarations have been added for the function's argument and for the variables `n`, `k`, `i`, and `p`. Note that the size of a C array declared with `cdef` must be a compile-time constant, hence the upper limit on `kmax`.

In [7]:
%%cython
def c_primes(int kmax):
    cdef int n, k, i
    cdef int p[10_000]  # fixed-size C array: its size must be a compile-time constant
    if kmax > 10_000:   # cap kmax at the array size
        kmax = 10_000
    result = []
    k, n = 0, 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i += 1
        if i == k:
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result

Checking the results for the first 20 prime numbers.

In [8]:
c_primes(20)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

In [9]:
%timeit c_primes(1_000)

784 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [10]:
%timeit c_primes(10_000)

74.3 ms ± 505 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


The Cython implementation is roughly 28 to 30 times faster than the pure Python implementation.
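
The speedup factors follow directly from the mean `%timeit` results above. The short plain-Python sketch below divides the pure Python timings by the classic-syntax Cython timings; the helper name `speedups` is only illustrative.

```python
# Mean timings reported by %timeit above, in seconds, keyed by kmax.
python_times = {1_000: 21.8e-3, 10_000: 2.21}
cython_times = {1_000: 784e-6, 10_000: 74.3e-3}


def speedups(slow, fast):
    """Return the ratio slow/fast for every kmax measured in both dicts."""
    return {kmax: slow[kmax] / fast[kmax] for kmax in slow if kmax in fast}


for kmax, factor in speedups(python_times, cython_times).items():
    print(f"kmax = {kmax:>6,}: {factor:.1f}x faster")
# prints:
# kmax =  1,000: 27.8x faster
# kmax = 10,000: 29.7x faster
```

The timings of the other Cython variants can be substituted for `cython_times` to compare them in the same way.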

### Pure Python syntax

Using this syntax, we only have to annotate the function arguments and local variables, using Cython types such as `cython.int` where C types are wanted.

In [11]:
%%cython
import cython

def cp_primes(kmax: int):
    n: cython.int
    k: cython.int
    i: cython.int
    p: cython.int[10_000]  # fixed-size C array, as in c_primes
    if kmax > 10_000:      # cap kmax at the array size
        kmax = 10_000
    result = []
    k, n = 0, 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i += 1
        if i == k:
            p[k] = n
            k += 1
            result.append(n)
        n += 1
    return result

Checking the results for the first 20 prime numbers.

In [12]:
cp_primes(20)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

In [13]:
%timeit cp_primes(1_000)

877 µs ± 9.93 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [14]:
%timeit cp_primes(10_000)

76.3 ms ± 508 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


The two forms of syntax perform similarly: for 10,000 primes the timings differ by less than 3%, although the pure Python syntax is about 12% slower for 1,000 primes.

## Dynamic memory allocation

The Cython implementation can be made more flexible by allocating the array `p` dynamically with `malloc`, so that its size follows `kmax` instead of being fixed at compile time.

### Classic syntax

In [15]:
%%cython
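from libc.stdlib cimport malloc, free

def c_malloc_primes(int kmax=100):
    cdef int n, k, i
    # heap-allocated buffer whose size is chosen at run time
    cdef int *p = <int *> malloc(kmax*sizeof(int))
    if not p:
        raise MemoryError()
    result = []
    try:
        k, n = 0, 2
        while k < kmax:
            i = 0
            while i < k and n % p[i] != 0:
                i += 1
            if i == k:
                p[k] = n
                k += 1
                result.append(n)
            n += 1
    finally:
        # release the buffer even if an exception is raised
        free(p)
    return result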

Checking the results for the first 20 prime numbers.

In [16]:
c_malloc_primes(20)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

Dynamic allocation has no noticeable impact on performance: the timings are within 4% of those of `c_primes`.

In [17]:
%timeit c_malloc_primes(1_000)

797 µs ± 5.49 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [18]:
%timeit c_malloc_primes(10_000)

76.9 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


### Pure Python syntax

In [19]:
%%cython
import cython
from cython.cimports.libc.stdlib import malloc, free

def cp_malloc_primes(kmax: int):
    n: cython.int
    k: cython.int
    i: cython.int
    # heap-allocated buffer sized at run time, so no upper limit on kmax is needed
    p: cython.p_int = cython.cast(cython.p_int, malloc(kmax*cython.sizeof(cython.int)))
    if not p:
        raise MemoryError()
    result = []
    try:
        k, n = 0, 2
        while k < kmax:
            i = 0
            while i < k and n % p[i] != 0:
                i += 1
            if i == k:
                p[k] = n
                k += 1
                result.append(n)
            n += 1
    finally:
        # release the buffer even if an exception is raised
        free(p)
    return result

In [20]:
cp_malloc_primes(20)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

As with the classic syntax, dynamic allocation has no noticeable impact on performance: the timings match those of `cp_primes`.

In [21]:
%timeit cp_malloc_primes(1_000)

880 µs ± 9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [22]:
%timeit cp_malloc_primes(10_000)

76.2 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Although performance is unchanged, this version is more flexible: the number of primes is no longer capped at 10,000 and is limited only by the available memory.