<h1>How to <i style="color:red;">OPTIMIZE</i> Python Code using Cython!</h1>

## <i>by Jack Camier, Python Developer</i>

## Inspiration for this talk came from several sources:

## Pythonprogramming.net
https://pythonprogramming.net/
https://www.youtube.com/watch?v=mXuEoqK4bEc

## - Stefan Behnel, core Cython developer
https://www.youtube.com/watch?v=lxh4lsvEBhI

## - Stefan Behnel, Fast Async Code with Cython and AsyncIO
https://dev.tube/video/-hk6nEQEMLs

## - Cython documentation
https://cython.readthedocs.io/en/latest/src/quickstart/overview.html

<hr>

## <i>Don't confuse Cython, the subject of this talk, with CPython, which is the commonly used implementation of Python written in C</i>

<hr>

## One of the major criticisms of Python from developers who work in other languages is that it is too slow compared to C or C++

In [None]:
%%HTML
<img src="turtle_slow.jpg" width="400"/>

## With Cython, that argument is basically <i>null</i> and <i>void</i> because we can take key sections of the code and compile them as C extensions
## Also, in most cases, users can't perceive a difference of a few milliseconds
## The difference shows up most in large mathematical computations (data science, etc.)
## Other options besides Cython are multiprocessing for CPU-bound tasks or asyncio for I/O-bound tasks.
https://docs.python.org/2/library/multiprocessing.html

https://docs.python.org/3/library/asyncio.html

# In most cases, we get the best of both worlds in Cython. We get the speed of C with very little refactoring (if needed) and can still write Pythonic code
<h1 style="color:blue;"><i>You can have your cake and eat it too!</i></h1>

In [None]:
%%HTML
<img src="grfld_eating_cake.jpg" width="400"/>

## How does this work?

## Python is a dynamically typed language, meaning you don't have to declare the data types of your variables, unlike in statically typed languages such as Java, Scala, C, C++, C#, etc.
## Here is an example of a simple function `multiplyTwoNumbers` written in C
## In each of these situations we need to declare the data types of the variables as well as the return values

## Here is the implementation in Python (dynamically typed):

In [None]:
def multiply_two_numbers (x, y):
    result = x*y
    print(f"The product of {x} multiplied by {y} is: {result}\n")

multiply_two_numbers(10, 20)
multiply_two_numbers(20, 30)
multiply_two_numbers(50, 2)

## It is obviously a lot simpler. No compiling, no declaration of variables, no brackets { }... 
## This is why we love Python!!! 
## We can just focus on writing great code, quickly.
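## The flip side of dynamic typing is that the interpreter has to work out the types at run time, on every call. A minimal sketch of this (the function name `multiply` is just for illustration):

```python
def multiply(x, y):
    # No declared types: Python decides how to multiply when the call happens.
    return x * y

print(type(multiply(10, 20)).__name__)   # int
print(type(multiply(1.5, 2)).__name__)   # float
print(type(multiply("ab", 3)).__name__)  # str
```

## The same function handles integers, floats and even strings. That run-time lookup is the work that type declarations can remove.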
<hr>

## Under the hood in Cython... "the source code gets translated into optimized C/C++ code and compiled as Python extension modules. This allows for both very fast program execution and tight integration with external C libraries, while keeping up the high programmer productivity for which the Python language is well known."
https://cython.readthedocs.io/en/latest/src/quickstart/overview.html

# First steps:
## To be able to run Cython, you need a C compiler:
## gcc for Linux
## MinGW for Windows
## clang - Apple’s XCode for Mac

http://docs.cython.org/en/latest/src/quickstart/install.html

# Step two:
# `pip install Cython`
<hr>

## You can import Cython's Jupyter extension...

In [None]:
%load_ext cython

## ... or use regular imports

In [None]:
import sys
import time
import Cython
import numpy as np
import timeit
import subprocess, os

In [None]:
print("Python %d.%d.%d %s %s" % sys.version_info)
print("Cython %s" % Cython.__version__)
print("Numpy %s" % np.__version__)
print(subprocess.check_output([os.environ.get('CC', 'cc'), "--version"]).decode())

# Demo:
<hr>

In [None]:
# Import Python sine function - return the sine of x radians
from math import sin
sin(7)

## Here we pass the Python code to the Cython compiler and have it compiled to C, using the magic command

## `%%cython`

In [None]:
%%cython
from math import sin
print(sin(7))

## We can also assign C functions to Python names
## For example, `libc.math.sin` (C) to `sine_func` (Python)

In [None]:
%%cython
cimport libc.math
sine_func = libc.math.sin

## Now call the C function assigned to `sine_func`

In [None]:
sine_func(7) 

## Here is another implementation, letting Cython do the C compiling for us. Notice that we are also declaring our data types here.
## A C `double` is the same type Python uses internally for its `float`: double precision, about 15 significant decimal digits, versus about seven for a C `float`

In [None]:
%%cython
cimport libc.math

def square_sin(double x):
    cdef double x_square = x*x
    return libc.math.sin(x_square)

In [None]:
square_sin(7)
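
## To check the precision of `double` versus `float`, here is a small sketch using only the standard library. It round-trips a value through a 4-byte C float and an 8-byte C double with `struct`; the variable names are illustrative.

```python
import struct
import sys

# A Python float is stored as a C double, so these describe both.
print("decimal digits always preserved:", sys.float_info.dig)
print("mantissa bits:", sys.float_info.mant_dig)

value = 0.123456789012345678

# Pack into a 4-byte C float and back: only about seven digits survive.
as_c_float = struct.unpack("f", struct.pack("f", value))[0]
# Pack into an 8-byte C double and back: nothing is lost.
as_c_double = struct.unpack("d", struct.pack("d", value))[0]

print("original :", repr(value))
print("as float :", repr(as_c_float))
print("as double:", repr(as_c_double))
```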

## Okay, let's see how much we can speed up a function with Cython
## Here is a simple Python function

In [None]:
def test_speed(x):
    y = 0
    for i in range(x):
        y += i
    return y

test_speed(5)

## Let's use timeit and run the above function 100 times

In [None]:
python_time = timeit.timeit('test_speed(5)', setup='from __main__ import test_speed', number=100)
print(f"It took {python_time} seconds in Python")

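## Okay, how about 1,000,000 iterations? The `%%timeit` magic picks the loop count automatically, and for a function this small it usually lands around a million loops per run...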

In [None]:
%%timeit
test_speed(5)

## Let's do this same function using Cython

In [None]:
%%cython
def test_speed_cy(x):
    y = 0
    for i in range(x):
        y += i
    return y

In [None]:
python_time_cy = timeit.timeit('test_speed_cy(5)', setup='from __main__ import test_speed_cy', number=100)
print(f"It took {python_time_cy} seconds with Cython")

## Now, let's use the `-o` option of the `%timeit` magic to capture the timing results in a variable

In [None]:
timeit_result = %timeit -o test_speed(5)
timeit_result_cy = %timeit -o test_speed_cy(5)

In [None]:
times_faster = round(timeit_result.average/timeit_result_cy.average, 2)
print(f"Cython is {times_faster}x faster")

## Pass the `-a` option to the `%%cython` magic to see an annotated view of how the code is compiled. We also add type declarations here to improve speed: typed variables become plain C variables, so the interpreter no longer has to check each variable's type at run time

In [None]:
%%cython -a
def test_speed_cy_decl(int x):
    cdef int i, y
    y = 0
    for i in range(x):
        y += i
    return y

In [None]:
timeit_result_cy_decl = %timeit -o test_speed_cy_decl(5)

In [None]:
times_faster_decl = round(timeit_result.average/timeit_result_cy_decl.average, 2)
print(f"Cython with type declaration is {times_faster_decl}x faster")

## Let's look at another example

In [None]:
def fibonacci_py(n):
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b

In [None]:
timeit_fib = %timeit -o fibonacci_py(70) 

In [None]:
%%cython
def fibonacci_cy(int n):
    cdef int _
    # fibonacci(70) does not fit in a 32-bit C int, so use a 64-bit type
    cdef long long a = 0, b = 1
    for _ in range(1, n):
        a, b = b, a + b
    return b

In [None]:
timeit_fib_cy = %timeit -o fibonacci_cy(70)

In [None]:
times_faster_fib = round(timeit_fib.average/timeit_fib_cy.average, 2)
print(f"Cython is {times_faster_fib}x faster")

## Wow that is a great speed improvement!
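## One caveat when declaring C integer types: they have a fixed width, so a large result can silently wrap around. A minimal sketch using the standard `ctypes` module to mimic 32-bit and 64-bit C integers (it repeats `fibonacci_py` from above; the other variable names are illustrative):

```python
import ctypes

def fibonacci_py(n):
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b

exact = fibonacci_py(70)                # Python ints never overflow
as_int32 = ctypes.c_int32(exact).value  # what a 32-bit C int would hold
as_int64 = ctypes.c_int64(exact).value  # what a 64-bit C integer would hold

print("exact :", exact)
print("32-bit:", as_int32, "| matches:", as_int32 == exact)
print("64-bit:", as_int64, "| matches:", as_int64 == exact)
```

## So when a result can grow past roughly 2.1 billion, a 64-bit type such as `long long` is the safer declaration.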
<hr>

## To use Cython on your local machine, follow these steps:
## 1) Save your source code in a file with the extension .pyx
### - For example: `fibonacci.pyx`
## 2) Create a `setup.py`, which works like a Makefile for Python
### - it calls `cythonize` on your `.pyx` file so the extension module can be built
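
## A minimal `setup.py` sketch, assuming the source file is named `fibonacci.pyx` and sits in the same directory. The `language_level` directive is an optional assumption here, and your project may need more settings.

```python
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    # cythonize() translates the .pyx file to C and returns the
    # extension modules for setuptools to compile
    ext_modules=cythonize(
        "fibonacci.pyx",
        compiler_directives={"language_level": "3"},  # treat the source as Python 3
    ),
)
```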

## To build your Cython file, run this command:
## `python setup.py build_ext --inplace`
## This should create a compiled extension module in your local directory: a `.so` file on Unix/Mac or a `.pyd` file on Windows. The file name starts with your module name and may also include a platform tag
## It will also create a file with a .c extension, which holds the generated C source
## Then you can import the module. I have already done these steps, so I will go ahead and import the module.

In [None]:
import fibonacci

## <i>Cython Static Type Declarations:</i>

## `cdef` is used for Cython functions and variables that are intended to be pure ‘C’. All types must be declared.
## `cdef` functions are not visible to Python code that imports the module, since they are compiled to plain C functions.

## `cpdef` functions combine both `def` and `cdef` by creating two functions: a `cdef` for C types and a `def` for Python types. This creates early bindings (mappings to C) so that `cpdef` functions may be as fast as possible when using C fundamental types (by using `cdef`). `cpdef` functions use dynamic binding when passed Python objects, and this might run much slower, perhaps as slow as `def` declared functions. They also add some overhead to each call.
https://notes-on-cython.readthedocs.io/en/latest/function_declarations.html

## I am not a Cython expert, but given these rules, I tend to just use `cdef` whenever I am trying to achieve speed optimizations.

## <i style="color:red;">Also, just adding `cdef` doesn't necessarily speed up your code every time it is used! So, use it when it makes sense </i>

# Another use case example:

## Optimizing Web Frameworks

In [None]:
import asyncio

## <i>asyncio is a library to write concurrent code using the async/await syntax.</i>
https://docs.python.org/3/library/asyncio.html

In [None]:
async def factorial(name, number):
    f = 1
    for i in range(2, number + 1):
        print(f"Task {name}: Compute factorial({i})...")
        await asyncio.sleep(1)
        f *= i
    print(f"Task {name}: factorial({number}) = {f}")
    
# this code is needed to run in Jupyter notebook    
loop = asyncio.get_event_loop() 

async def main():
    s = time.perf_counter_ns()
    # Schedule three calls *concurrently*:
    await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )
    elapsed_sec = (time.perf_counter_ns() - s)/1000000000
    elapsed_mil = (time.perf_counter_ns() - s)/1000000
    print(f"Async executed in {elapsed_sec:0.2f} seconds or {elapsed_mil:0.2f} ms.")
    
loop.create_task(main())

In [None]:
%%cython
import asyncio
import time

async def factorial(name, int number):
    cdef int f = 1
    cdef int i
    for i in range(2, number + 1):
        print(f"Task {name}: Compute factorial({i})...")
        await asyncio.sleep(1)
        f *= i
    print(f"Task {name}: factorial({number}) = {f}")
    
# this code is needed to run in Jupyter notebook    
loop = asyncio.get_event_loop() 

async def main():
    cdef double s = time.perf_counter_ns()
    # Schedule three calls *concurrently*:
    await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )
    cdef double elapsed_sec = (time.perf_counter_ns() - s)/1000000000
    cdef double elapsed_mil = (time.perf_counter_ns() - s)/1000000
    print(f"Async executed in {elapsed_sec:0.2f} seconds or {elapsed_mil:0.2f} ms.")

loop.create_task(main())

## In this situation I didn't see any improvement, since this asyncio example is I/O-bound rather than CPU-bound: almost all of the time is spent waiting in `asyncio.sleep`. Maybe with some more time I could have improved it slightly, but the gain will always be limited by the duration of the external tasks.
## As mentioned earlier, Cython seems best for speeding up CPU-bound computation.
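## To see why, here is a plain-Python sketch that times the same three tasks. The `delay` parameter is an addition for illustration (shortened so it runs quickly), and `asyncio.run` assumes a plain script; in a Jupyter cell you would use `await main()` instead.

```python
import asyncio
import time

async def factorial(name, number, delay=0.05):
    f = 1
    for i in range(2, number + 1):
        await asyncio.sleep(delay)  # stands in for waiting on I/O
        f *= i                      # the only CPU work: one multiplication
    return f

async def main():
    start = time.perf_counter()
    # Run the three tasks concurrently, as in the example above
    results = await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(f"results={results}, elapsed={elapsed:.3f} s")
```

## The elapsed time is roughly the three waits of the longest task, however fast the multiplications are, so compiling them to C has almost nothing to speed up.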
## The link for this talk is found at:

https://github.com/jcamier/dfw_cython

## Thank you