## Cython 

### Cython functions

- **def** functions can be called from Python and Cython.
- **cdef** functions can be called from Cython and C.
- **cpdef** functions cause Cython to generate both a cdef function and a def function.

In [1]:
%load_ext Cython

In [2]:
%%cython
def hello_def():
    print('Hello def')

# a cdef function compiles, but it cannot be called from Python
cdef hello_cdef():
    print('Hello cdef')
    
cpdef hello_cpdef():
    print('Hello cpdef')

Content of stderr:
 static PyObject *__pyx_f_54_cython_magic_7520a14d59bd0a6c205fcff9794a567c650d210d_hello_cdef(void) {
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In [3]:
hello_def()

Hello def


In [4]:
hello_cdef()

NameError: name 'hello_cdef' is not defined

In [5]:
hello_cpdef()

Hello cpdef


In [9]:
%%cython

# A cdef function is not visible from Python, so expose it through a def wrapper
cdef hello_cdef():
    print('Hello cdef')

def wrapper_call_hello_cdef():
    hello_cdef()

# Calling it from inside the Cython module works
hello_cdef()

In [10]:
hello_cdef()

NameError: name 'hello_cdef' is not defined

In [None]:
wrapper_call_hello_cdef()

**What advantages do `cdef` functions offer?**
- `cdef` functions can take arguments of any type, including non-Python types such as pointers.
- They can declare a C return type.
- They are quicker to call than `def` functions because the call translates directly to a C function call.

**Why would I need a `cdef` function?**
- To pass non-Python types in or out.
- To pass it to C as a function pointer.
- For speed, when it is called often from other Cython code rather than from Python.

A speed comparison of the different function types:

https://notes-on-cython.readthedocs.io/en/latest/fibo_speed.html
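
As a point of reference for such comparisons, the sketch below times a pure-Python recursive Fibonacci with the standard `timeit` module. The names `fib` and `time_call` and the argument `20` are illustrative choices, not taken from the linked page; the same harness could be pointed at a `def`, `cpdef`, or wrapped `cdef` version compiled with Cython.

```python
import timeit


def fib(n):
    """Naive recursive Fibonacci, used here as a pure-Python baseline."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def time_call(func, arg, repeats=5):
    """Return the best wall-clock time (in seconds) over `repeats` single calls."""
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeats))


if __name__ == "__main__":
    print(f"fib(20) = {fib(20)}")
    print(f"best of 5: {time_call(fib, 20):.6f} s")
```

Taking the minimum over several repeats reduces the influence of other activity on the machine.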

### Type definitions

C variables are declared with the **cdef** statement.

Cython types:
https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#types

In [11]:
%%cython
cdef int a_global_variable

def func():
    cdef int i, j, k
    cdef float f
    cdef float[42] g
    cdef float *h
    # Declaring values, arrays and pointers on a single line is deprecated:
    # cdef float f, g[42], *h

    i = j = 5

In [12]:
func()
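
To get a feel for what these C types imply, the following sketch uses the standard `ctypes` module as a rough stand-in. It is only an analogy, not how Cython works internally: `ctypes` objects live in Python, whereas `cdef` variables are plain C variables. Unlike a local C array, the `ctypes` array is zero-initialised.

```python
import ctypes

# Rough ctypes counterparts of the Cython declarations above.
i = ctypes.c_int(5)                                  # cdef int i
f = ctypes.c_float(0.1)                              # cdef float f
g = (ctypes.c_float * 42)()                          # cdef float[42] g
h = ctypes.cast(g, ctypes.POINTER(ctypes.c_float))   # cdef float *h = g

# A C int has a fixed width, so values wrap instead of growing like a Python int.
bits = 8 * ctypes.sizeof(ctypes.c_int)
wrapped = ctypes.c_int(2 ** (bits - 1)).value
print(bits, wrapped)

# A C float is single precision, so 0.1 is stored less exactly than in a Python float.
print(f.value == 0.1, len(g))

# The pointer and the array refer to the same memory.
g[3] = 1.5
print(h[3])
```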

### Typed Memoryviews

Typed memoryviews allow efficient access to memory buffers, such as those underlying NumPy arrays, without incurring any Python overhead.

They are especially useful when working with NumPy arrays.
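
Typed memoryviews build on the same buffer protocol as Python's built-in `memoryview`. The sketch below shows the idea in plain Python with an `array.array` standing in for a NumPy array: a flat buffer is viewed as a 4x4 matrix without copying. It is an illustration of the concept only; the Python version still pays interpreter overhead on every element access, which is exactly what the Cython version avoids.

```python
from array import array

# A flat buffer of 16 C ints, viewed as a 4x4 matrix without copying.
buf = array("i", range(16))
flat = memoryview(buf)
# cast() has to go through a byte format before the view can be reshaped.
mv = flat.cast("B").cast("i", shape=[4, 4])


def sum2d(view):
    """Sum a 2-D memoryview element by element."""
    rows, cols = view.shape
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += view[r, c]
    return total


print(mv.shape, sum2d(mv))

# The view shares memory with the buffer: a write to buf is visible through mv.
buf[6] = 100
print(mv[1, 2])
```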

In [34]:
%%cython

import numpy as np

# Memoryview on a NumPy array
np_array = np.arange(16, dtype=np.dtype("i")).reshape((4, 4))
cdef int [:, :] np_array_mv = np_array
print(np_array)
print(np_array_mv)

# Memoryview on a C array
cdef int[4][4] c_array
cdef int [:, :] c_array_mv = c_array
print(c_array)
print(c_array_mv)

# Copy the values from one memoryview into another (numpy-style)
c_array_mv[...] = np_array_mv

# Cython decorators can only be followed by function definitions
# @cython.boundscheck(False)  # Deactivate bounds checking

# A function using a memoryview does not usually need the GIL
cpdef int sum2d(int[:, :] arr) nogil:
    cdef size_t i, j, I, J
    cdef int total = 0
    I = arr.shape[0]
    J = arr.shape[1]
    for i in range(I):
        for j in range(J):
            total += arr[i, j]
    return total

print(f"Memoryview sum of NumPy array is {sum2d(np_array)}")
print(f"Memoryview sum of C array is {sum2d(c_array)}")
print(f"Memoryview sum of NumPy array is {sum2d(np_array_mv)}")
print(f"Memoryview sum of C array is {sum2d(c_array_mv)}")

### Using parallelism with Cython

```python
cython.parallel.prange([start,] stop[, step][, nogil=False][, schedule=None[, chunksize=None]][, num_threads=None])
```

- `start` - The index indicating the start of the loop.
- `stop` - The index indicating when to stop the loop.
- `step` - An integer giving the step of the sequence. It must not be 0.
- `nogil` - This function can only be used with the GIL released. If `nogil` is true, the loop will be wrapped in a nogil section.
- `schedule` - The schedule is passed to OpenMP and can be one of the following: `static`, `dynamic`, `guided`, `runtime`.
- `num_threads` - Indicates how many threads the team should consist of. If not given, OpenMP will decide how many threads to use.
- `chunksize` - Indicates the chunk size to be used for dividing the iterations among threads. This is optional and only valid for `static`, `dynamic` and `guided` scheduling.
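
To make `schedule` and `chunksize` more concrete, here is a small plain-Python sketch of how a static schedule with a chunk size distributes loop iterations: chunks of consecutive indices are dealt out to the threads in round-robin order. The helper `static_schedule` is hypothetical and only mimics the assignment; it does not run anything in parallel.

```python
def static_schedule(n, num_threads, chunksize):
    """Return, per thread, the loop indices a static schedule would assign.

    Chunks of `chunksize` consecutive iterations are dealt out to the
    threads in round-robin order, as OpenMP's schedule(static, chunk) does.
    """
    assignment = [[] for _ in range(num_threads)]
    for chunk_index, start in enumerate(range(0, n, chunksize)):
        thread = chunk_index % num_threads
        assignment[thread].extend(range(start, min(start + chunksize, n)))
    return assignment


for thread, indices in enumerate(static_schedule(10, 3, 2)):
    print(f"thread {thread}: {indices}")
```

With `dynamic` or `guided` scheduling the chunks are instead handed out at run time as threads become free, so the assignment is not known in advance.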

**Parallel reduction basic example**

In [14]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
from cython.parallel import prange

cdef int i
cdef int n = 30
cdef int total = 0

for i in prange(n, nogil=True):
    total += i

print(total)

435
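
Inside a `prange` loop, an in-place update such as `total += i` is treated as a reduction: each thread accumulates into a private copy, and the copies are combined when the loop ends. The plain-Python sketch below shows only that pattern. The helpers `partial_sum` and `parallel_sum` are hypothetical, and because of the GIL the Python threads do not actually run the additions in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(indices):
    """Accumulate a worker-private total over one slice of the loop."""
    total = 0
    for i in indices:
        total += i
    return total


def parallel_sum(n, num_threads=4):
    """Split range(n) across workers, then combine the private totals."""
    slices = [range(t, n, num_threads) for t in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(partial_sum, slices))


print(parallel_sum(30))
```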


**Parallel processing basic example**

In [15]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
from cython.parallel import prange
import numpy as np
cimport numpy as np



cdef test_cython(int n):
    cdef int i
    arr = np.array([x for x in range(n)], dtype=np.float32)
    print(arr)
    cdef float[:] arr_mv = arr
    result = np.zeros((n,), dtype=np.float32)
    cdef float[:] result_mv = result
    # Square every element in parallel; prange needs the loop bound n
    for i in prange(n, nogil=True):
        result_mv[i] = arr_mv[i] * arr_mv[i]
    return result

cdef int n = 10

print(test_cython(n))

In [23]:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
from cython.parallel import prange
import numpy as np
cimport numpy as np


cdef test_cython_cdef(n):
    cdef int N = n
    cdef int i
    arr = np.array([x for x in range(n)], dtype=np.float32)
    cdef float[:] arr_mv = arr
    result = np.zeros((n,), dtype=np.float32)
    cdef float[:] result_mv = result
    for i in prange(N, nogil=True):
        result_mv[i] = arr_mv[i] * arr_mv[i]
    return result

cdef int n = 1000000000

import time
start_time = time.time()
test_cython_cdef(n)
end_time = time.time()
elapsed_time = end_time - start_time
print(f"Elapsed time: {elapsed_time} seconds")



Elapsed time: 87.28633642196655 seconds
