# Cython Basics

In this example, we demonstrate basic Cython functionality. Note that running Cython code through the `%%cython` notebook magic, as we do here, is not recommended for real projects. We use it here only for convenience.

In [2]:
%load_ext Cython

## Cython data types

### Integral types

In [3]:
%%cython

cdef:
    int i = 0
    unsigned long j = 1
    signed short k = -3
    long long ll = 1LL
    bint flag = True  # bint is a C int that Cython converts to/from Python's bool

print(i, j, k, ll, flag)

0 1 -3 1 True
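
Unlike Python's arbitrary-precision `int`, the C integer types declared above have fixed widths. As a rough illustration outside Cython, the standard-library `ctypes` module (not used in the original notebook) exposes the same C types. The sketch below assumes a platform where `short` is 16 bits.

```python
import ctypes

# Width in bytes of a few C integer types on this platform.
for c_type in (ctypes.c_short, ctypes.c_int, ctypes.c_long, ctypes.c_longlong):
    print(c_type.__name__, ctypes.sizeof(c_type))

# ctypes does no overflow checking, so an out-of-range value wraps around,
# which is what a fixed-width C integer typically does as well.
wrapped = ctypes.c_short(40000).value
print(wrapped)  # -25536 when short is 16 bits

# A Python int never wraps; it simply grows.
big = 2 ** 64 + 1
print(big)
```

This is why the choice of C type matters once a variable is declared with `cdef`: the value must fit the declared width.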


### Floating point types

In [4]:
%%cython

cdef:
    float a = 1.0
    double b = -1.0
    long double c = 1e100
print(a, b, c)

1.0 -1.0 1e+100
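
A Python `float` is a C `double`, whereas a Cython `float` is single precision. The values above print unchanged because 1.0 is exactly representable in both. As a hedged illustration, the sketch below uses the standard-library `struct` module (an assumption, not part of the notebook) and a hypothetical helper `to_c_float` to show what rounding to 32 bits does to a value such as 0.1.

```python
import struct

def to_c_float(value):
    """Round a Python float (a C double) to the nearest 32-bit C float."""
    return struct.unpack("f", struct.pack("f", value))[0]

x = 0.1
print(x)                       # 0.1
print(to_c_float(x))           # 0.10000000149011612
print(to_c_float(1.0) == 1.0)  # True: 1.0 is exactly representable
```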


### String types

In [5]:
%%cython

cdef str s = "asdf"
cdef bytes b = b"jkl;"
print(s, b)

asdf b'jkl;'


### Other statically-declarable Python types

In [6]:
%%cython

import datetime
cimport cpython.datetime # We'll cover the `cimport` keyword later

import array
cimport cpython.array

cdef:
    list lst = [1]
    dict dd = {'a': 'b'}
    set ss = set([1])
    frozenset fs = frozenset([1])
    cpython.datetime.datetime dt = datetime.datetime.now()
    cpython.array.array aa = array.array('i', [1, 2, 3])
    
print(lst, dd, ss, fs, dt, aa)

[1] {'a': 'b'} {1} frozenset({1}) 2020-12-15 19:34:59.295030 array('i', [1, 2, 3])


## Use C++ Containers

There is virtually no documentation for `libcpp`. The most reliable way to see what is available is to read the declaration files at https://github.com/cython/cython/tree/master/Cython/Includes/libcpp.

In [9]:
%%cython --cplus 
from libcpp.vector cimport vector

cdef vector[int] my_vector
my_vector.push_back(1) # Repeated push_back may reallocate; call .reserve() first when the size is known.
my_vector.push_back(2)

for i in my_vector:
    print(i)

1
2


## Cython functions
* Cython supports 3 kinds of functions:
  * Python `def` functions -- compiled Python functions that work with Python objects. Without type declarations they usually give little performance improvement.
  * C-level `cdef` functions -- low-overhead C-level functions that can take and return C types. They can only be called from within Cython code.
  * Hybrid `cpdef` functions -- C-level functions with auto-generated Python wrappers, callable from both Cython and Python.

### `cdef` functions: C-functions with Python-like syntax

In [11]:
%%cython

cdef untyped(a, b):
    return a + b

print(untyped(1, 2), untyped('a', 'b'))

cdef int typed(double a, double b):
    return <int>(a + b) # Type casting in Cython.

print(typed(1, 2), typed(3.14, 2.72))

3 ab
3 5
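
The `<int>` cast behaves like a C cast from `double` to `int`: it discards the fractional part, which is why `typed(3.14, 2.72)` prints 5 rather than 6. Python's built-in `int()` truncates the same way, so a pure-Python stand-in (the name `typed_py` is hypothetical) can sketch the behavior:

```python
import math

def typed_py(a, b):
    """Pure-Python stand-in for `typed`: add two floats, then truncate."""
    return int(float(a) + float(b))

print(typed_py(1, 2), typed_py(3.14, 2.72))  # 3 5
print(typed_py(-3.14, -2.72))                # -5: truncated toward zero
print(math.floor(-3.14 + -2.72))             # -6: floor rounds down instead
```

Truncation and flooring agree for positive values and differ for negative ones, which is worth keeping in mind when casting.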


### `cdef` variables / functions not visible to Python outside defining scope

In [13]:
print(untyped(1, 2), typed(1, 2)) # Raises NameError: cdef functions are not visible from Python

NameError: name 'untyped' is not defined

### `cpdef` functions: two functions in one!

In [14]:
%%cython -a
    
# cpdef functions are just like cdef functions with
# an implicitly defined Python wrapper for free.
cpdef int cpdef_func(int y, int z):
    return y + z

# Call directly from other Cython code:
print(cpdef_func(1, 2))

3


## Cython and NumPy, buffers, fused types

We'll re-implement the `scipy.stats.poisson.entropy()` method using Cython and its NumPy support, and see what speedup we get for our efforts.

In [19]:
import numpy as np

### Python (NumPy) implementation

In [18]:
def shannon_entropy_py(p_x):
    return - np.sum(p_x * np.log(p_x))
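
The function above computes the Shannon entropy, H = -sum_i p_i * log(p_i), with vectorized NumPy calls. For reference, the same quantity can be written as an explicit element-by-element loop using only the standard library. This is a minimal sketch: the names `poisson_pmf`, `shannon_entropy_loop`, and `pmf_list` are hypothetical and not part of the notebook.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability mass P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def shannon_entropy_loop(p_x):
    """H = -sum_i p_i * log(p_i), accumulated one element at a time."""
    res = 0.0
    for p in p_x:
        res += p * math.log(p)
    return -res

# Same setup as the comparison later on: Poisson(10.0), first 100 outcomes.
pmf_list = [poisson_pmf(k, 10.0) for k in range(100)]
entropy = shannon_entropy_loop(pmf_list)
print(entropy)  # approximately 2.5614
```

A loop like this is slow in the Python interpreter, but it is the shape of code that typed Cython can turn into a tight C loop.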

## Cythonized version

In [19]:
%%cython -a

import numpy as np
cimport numpy as cnp

def shannon_entropy_cy(cnp.ndarray p_x):
    return - np.sum(p_x * np.log(p_x))

## What's `cimport`?

Cython introduces a new keyword, `cimport`, that allows interfacing with other Cython code _at the C-level_ and _at compile time_.

This contrasts with Python's regular `import` statement, which interfaces with other _Python_ modules at _runtime_.

The statement `cimport numpy as cnp` allows Cython to interface with NumPy arrays at the C level and at compile time for improved performance.

The `cimport numpy` statement causes the Cython compiler to look for a `numpy.pxd` Cython declaration file at compile time.  This is where we can find the C-level declarations of the NumPy C-API.  Here we're using the `ndarray` object from `numpy.pxd` to declare the argument of `shannon_entropy_cy()`.

## Scipy.stats comparison

In [20]:
from scipy.stats import poisson
poi = poisson(10.0)
n = 100
pmf = poi.pmf(np.arange(n))

In [21]:
print(poi.entropy())
print(shannon_entropy_py(pmf))
print(shannon_entropy_cy(pmf))

2.5614099352749125
2.5614099352749125
2.5614099352749125


In [22]:
%%timeit
poi.entropy()

960 µs ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [23]:
%%timeit
shannon_entropy_py(pmf)

6.29 µs ± 36 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [24]:
%%timeit
shannon_entropy_cy(pmf)

6.23 µs ± 66.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [25]:
%%cython -a

cimport numpy as cnp
from libc.math cimport log as clog

cpdef shannon_entropy_v1(cnp.ndarray p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * clog(p_x[i])
    return -res

In [26]:
%%timeit
shannon_entropy_v1(pmf)

32.2 µs ± 361 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [27]:
%%cython -a

cimport numpy as cnp
from libc.math cimport log as clog

cpdef shannon_entropy_v2(cnp.ndarray[double] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * clog(p_x[i])
    return -res

In [28]:
%%timeit
shannon_entropy_v2(pmf)

1.84 µs ± 30.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


## The `cnp.ndarray[double]` syntax

The `cnp.ndarray[double]` declaration describes a NumPy array _buffer_ object.  Cython knows how to interact with this array-like object efficiently.  The `double` in square brackets is the (scalar) dtype of the array elements.
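
Cython's buffer support builds on Python's buffer protocol, and an array whose element type does not match the declared dtype is rejected when the function is called. As a rough pure-Python analogy, the standard-library `memoryview` exposes the element format of any buffer. The helpers `describe_buffer` and `require_doubles` below are hypothetical and only illustrate the idea.

```python
from array import array

def describe_buffer(obj):
    """Return (format code, item size in bytes, number of dimensions) of a buffer."""
    view = memoryview(obj)
    return view.format, view.itemsize, view.ndim

def require_doubles(obj):
    """Hypothetical check in the spirit of the dtype match on `cnp.ndarray[double]`."""
    if memoryview(obj).format != "d":
        raise ValueError("buffer does not hold C doubles")
    return obj

doubles = array("d", [0.25, 0.25, 0.5])
ints = array("i", [1, 2, 3])

print(describe_buffer(doubles))  # ('d', 8, 1): a 1-D buffer of C doubles
print(describe_buffer(ints)[0])  # 'i': C ints, not doubles

require_doubles(doubles)         # passes silently
try:
    require_doubles(ints)
except ValueError as exc:
    print("rejected:", exc)
```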

In [29]:
%%cython -a

cimport cython
cimport numpy as cnp
from libc.math cimport log

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef shannon_entropy_v3(cnp.ndarray[double] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * log(p_x[i])
    return -res

In [30]:
%%timeit
shannon_entropy_v3(pmf)

1.91 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


## The cimported `cython` magic module

Cython allows us to control its compile-time semantics and behavior via the magic `cython` module that we `cimport`.

We can then use Cython directives like:

```python
@cython.boundscheck(False)
@cython.wraparound(False)
def func(...):
  ...
```

These directives control how code is generated.

In this case, we're telling Cython not to generate code for bounds checking or for wraparound (negative index) handling.
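
To make concrete what is being switched off, here is a plain-Python sketch of the two checks. The function `checked_get` is a hypothetical spelling-out, not Cython's generated code. With both directives disabled, an out-of-range or negative index is no longer caught and leads to undefined behavior, so disabling them is only safe when every index is known to lie in `[0, n)`.

```python
values = [0.1, 0.2, 0.3]

# Wraparound: a negative index counts from the end.
print(values[-1])  # 0.3

# Bounds checking: an out-of-range index raises instead of reading stray memory.
try:
    values[3]
except IndexError as exc:
    print("IndexError:", exc)

def checked_get(seq, i):
    """Hypothetical spelling-out of the two checks performed on each access."""
    n = len(seq)
    if i < 0:           # wraparound handling
        i += n
    if not 0 <= i < n:  # bounds check
        raise IndexError("index out of range")
    return seq[i]

print(checked_get(values, -1))  # 0.3
```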

## Typed memoryview syntax

In [36]:
%%cython -a

cimport cython
from libc.math cimport log

@cython.boundscheck(False)
@cython.wraparound(False)
def shannon_entropy_mv(double[::1] p_x):
    cdef double res = 0.0
    cdef int n = p_x.shape[0]
    cdef int i
    for i in range(n):
        res += p_x[i] * log(p_x[i])
    return -res

In [37]:
%%timeit
shannon_entropy_mv(pmf)

1.18 µs ± 7.55 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


### A big BUT

However, not every NumPy array can be passed to or from Cython without trouble. The storage of a NumPy array is NOT necessarily contiguous (slices and transposes are typically strided views), so we often need a copy into a contiguous layout, such as the `np.asfortranarray` call below.
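
The idea of a non-contiguous view can be sketched with the standard library alone, using `array` and `memoryview` as stand-ins for NumPy arrays (an assumption made so the example runs without NumPy). In NumPy, `np.ascontiguousarray` and `np.asfortranarray` perform the corresponding copy when one is needed, and a `double[::1]` argument accepts only C-contiguous data.

```python
from array import array

data = array("d", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
full = memoryview(data)
every_other = full[::2]  # strided view over the same memory, no copy

print(full.c_contiguous)         # True
print(every_other.c_contiguous)  # False
print(every_other.strides)       # (16,): skips one 8-byte double per step

# Copying the elements into a fresh array yields a contiguous buffer again.
compact = array("d", every_other.tolist())
print(memoryview(compact).c_contiguous)  # True
print(compact.tolist())                  # [0.1, 0.3, 0.5]
```

The copy costs time and memory, so it is worth checking contiguity first and copying only when required.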

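Note that the following cell failed to compile: in the run that produced the error output below, the `cimport numpy as np` line was commented out, so `np` was not available as a cimported module.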

In [33]:
%%cython 
import numpy as np
cimport numpy as np

matrix = np.random.randn(10000, 2)

cdef np.ndarray[double, ndim=2, mode='fortran'] arg = np.asfortranarray(matrix, dtype=np.float64)


Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
# cimport numpy as np

matrix = np.random.randn(10000, 2)

cdef np.ndarray[double, ndim=2, mode='fortran'] arg = np.asfortranarray(matrix, dtype=np.float64)
    ^
------------------------------------------------------------

/home/user/.cache/ipython/cython/_cython_magic_75cadbb9976b9f5487cb28715a333baa.pyx:6:5: 'np' is not a cimported module


TypeError: object of type 'NoneType' has no len()