# Speeding

Cython is a Python-like language that:

* improves Python's performance (speedups of up to about 1000x are possible in favorable cases such as tight numeric loops),
* wraps external code: C, C++, Fortran, and others.

The `cython` command:

* generates an optimized C or C++ source file from a Cython source file;
* the C/C++ source is then compiled into a Python extension module.

Other features:

* built-in support for NumPy,
* integration with IPython,
* C's performance combined with Python's ease of use.

See http://www.cython.org/

Cython is both a language (a superset of Python) and a Python library. 

With Cython, we start from a regular Python program and we add annotations about the type of the variables. Then, Cython translates that code to C and compiles the result to a Python extension module. Finally, we can use this compiled module in any Python program.

While dynamic typing comes with a performance cost in Python, statically-typed variables in Cython generally lead to faster code execution.

Performance gains are most significant in CPU-bound programs, notably in tight Python loops. By contrast, I/O-bound programs are not expected to benefit much from a Cython implementation.


In [None]:
def fib(n):
    a,b = 1,1
    for i in range(n):
        a, b = a+b, a
    return a

In [None]:
fib

In [None]:
%load_ext cython

In [None]:
%%cython
def cfib(int n):
    cdef int i, a, b
    a,b = 1,1
    for i in range(n):
        a, b = a+b, a
    return a

```python
def cfib(int n):
    cdef int i, a, b
    a,b = 1,1
    for i in range(n):
        a, b = a+b, a
    return a
```

* We added the `int` type to the argument `n`.

* We used `cdef` to declare `i`, `a`, and `b` as C integers.
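One caveat of `cdef int` is worth checking: a C `int` has a fixed width, whereas a Python integer grows as needed. The sketch below emulates this in pure Python. It assumes a 32-bit `int` that wraps around on overflow, which is typical but not guaranteed by the C standard. `fib_int32` is a hypothetical helper written for this illustration, not part of Cython.

```python
import ctypes


def fib(n):
    """Pure-Python version: Python integers never overflow."""
    a, b = 1, 1
    for _ in range(n):
        a, b = a + b, a
    return a


def fib_int32(n):
    """Same loop, with every sum wrapped to a signed 32-bit integer.

    Assumption: this mimics a typical 32-bit C int with wraparound.
    """
    a, b = 1, 1
    for _ in range(n):
        a, b = ctypes.c_int32(a + b).value, a
    return a


# Small inputs agree; larger ones diverge once the value exceeds 2**31 - 1
for n in (10, 44, 45):
    print(n, fib(n), fib_int32(n))
```

Under these assumptions, a typed `cfib` can only be compared with `fib` for small `n`. For large `n` the two functions no longer compute the same number, so a timing comparison measures different work.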

In [None]:
%%cython
make an eror

In [None]:
help("cdef")

In [None]:
cfib

In [None]:
fib(10)

In [None]:
cfib(10)

In [None]:
%%capture
# Number to compute
test_size = 100000
# Normal python
t1 = %timeit -n1 -r1 -o fib(test_size)
# Cython library
t2 = %timeit -n1 -r1 -o cfib(test_size)

In [None]:
print("%sx speedup" % (t1.best // t2.best))
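Outside IPython, the `%timeit` magic is not available. The following sketch shows one way to take a similar measurement with the standard `timeit` module. `best_time` is a hypothetical helper, and the commented-out line assumes that a compiled `cfib` has been imported.

```python
import timeit


def fib(n):
    a, b = 1, 1
    for _ in range(n):
        a, b = a + b, a
    return a


def best_time(func, arg, repeat=3):
    """Best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeat))


t_python = best_time(fib, 10000)
print("fib(10000): %.6f s" % t_python)

# With a compiled cfib available, the speedup would be computed as:
# print("%.1fx speedup" % (t_python / best_time(cfib, 10000)))
```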

A C compiler is required.

# With numpy!

Generating the Mandelbrot fractal.

In [None]:
import numpy as np

def mandelbrot_python(m, size, iterations):
    for i in range(size):
        for j in range(size):
            c = -2 + 3./size*j + 1j*(1.5-3./size*i) 
            z= 0
            for n in range(iterations):
                if np.abs(z) <= 10:
                    z = z*z + c
                    m[i, j] = n
                else:
                    break

In [None]:
mandelbrot_python

In [None]:
size = 200
iterations = 100

In [None]:
%%timeit -n1 -r1 m = np.zeros((size, size),dtype=np.int32) 
mandelbrot_python(m, size, iterations)

In [None]:
%%cython
import numpy as np
def mandelbrot_cython(int[:,::1] m, int size, int iterations):
    cdef int i, j, n
    cdef complex z, c

    for i in range(size):
        for j in range(size):
            c = -2 + 3./size*j + 1j*(1.5-3./size*i)
            z= 0
            for n in range(iterations):
                if z.real**2 + z.imag**2 <= 100:
                    z = z*z + c
                    m[i, j] = n
                else:
                    break

In [None]:
%%timeit -n1 -r1 m = np.zeros((size, size),dtype=np.int32) 
mandelbrot_cython(m, size, iterations)

# Wait. Does this really work?

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

m = np.zeros((size, size),dtype=np.int32) 
mandelbrot_cython(m, size, iterations)
plt.imshow(np.log(m), cmap=plt.cm.hot)
plt.xticks([]); plt.yticks([])

All we have done is specify the types of the local variables and function arguments, and bypass NumPy's `np.abs()` function when computing the absolute value of `z`. These changes have helped Cython to generate more optimized C code from Python code.

# How it works

The `cdef` keyword declares a variable as a statically-typed C variable. C variables lead to faster code execution because the overhead from Python's dynamic typing is mitigated. Function arguments can also be declared as statically-typed C variables.

In general, variables used inside tight loops should be declared with `cdef`. To check that our code is well-optimized, we can use annotations: we add the `-a` flag after the `%%cython` magic, and the non-optimized lines are shown in a gradient of yellow (white lines are faster, yellow lines are slower). The color depends on the relative number of Python API calls at each line.

There are two ways of declaring NumPy arrays as C variables with Cython: using array buffers or using typed memory views. Here, we used typed memory views.

Typed memory views allow efficient access to data buffers with a NumPy-like indexing syntax. For example, we can use `int[:,::1]` to declare a C-ordered 2D NumPy array with integer values, with `::1` meaning a contiguous layout in this dimension. Typed memory views can be indexed just like NumPy arrays.

However, memory views do not implement element-wise operations like NumPy. Thus, memory views act as convenient data containers within tight `for` loops. For element-wise NumPy-like operations, array buffers should be used instead.
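To make "C-ordered" and "contiguous in the last dimension" concrete, here is a small sketch using Python's built-in `memoryview`. This is not Cython's typed memory view, but both are built on the buffer protocol, so the layout notions are the same. The array contents and shape are arbitrary values chosen for illustration.

```python
from array import array

rows, cols = 2, 3
flat = array('i', range(rows * cols))  # C ints stored back to back

# Reinterpret the same buffer as a 2D view.
# memoryview.cast requires going through a byte format first.
view = memoryview(flat).cast('B').cast('i', shape=[rows, cols])

print(view.c_contiguous)  # whether the layout is C-ordered
print(view.strides)       # bytes to step per row and per column
print(view.tolist())

# Row-major layout: element (i, j) lives at flat index i * cols + j
i, j = 1, 2
print(view.tolist()[i][j] == flat[i * cols + j])
```

The last stride equals the item size, which is what `::1` asserts in `int[:,::1]`: consecutive elements of a row are adjacent in memory.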


We could achieve a significant performance speedup by replacing the call to `np.abs` with a faster expression. The reason is that `np.abs` is a NumPy function with a slight call overhead. It is designed to work with relatively large arrays, not scalar values. This overhead results in a significant performance hit in a tight loop such as here. This bottleneck can be spotted with Cython annotations.
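The replacement works because `|z| <= 10` holds exactly when `z.real**2 + z.imag**2 <= 100`, so the square root can be skipped. A quick pure-Python check, with sample values chosen only for illustration:

```python
z_values = [0j, 3 + 4j, 6 + 8j, 7 + 8j, -10 + 0j, 0.5 - 9.9j]

agree = []
for z in z_values:
    with_abs = abs(z) <= 10                         # computes a square root
    without_sqrt = z.real**2 + z.imag**2 <= 100     # compares squared modulus
    agree.append(with_abs == without_sqrt)
    print(z, with_abs, without_sqrt)

print(all(agree))
```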

Using Cython from IPython is very convenient with the %%cython cell magic. However, it is sometimes necessary to create a reusable C extension module with Cython. This is actually what IPython's %%cython cell magic does under the hood.

# Step by step

## Step 1

Write a standalone Cython script in a `.pyx` file. 

This should correspond exactly to the entire contents of a %%cython cell magic.


In [None]:
%%writefile fib.pyx
def fcfib(int n):
    cdef int i, a, b
    a,b = 1,1
    for i in range(n):
        a, b = a+b, a
    return a

## Step 2

Create a setup.py file that we will use to compile the Cython module.


In [None]:
%%writefile setup.py
# setuptools replaces distutils, which was removed from the standard library in Python 3.12
from setuptools import setup, Extension
from Cython.Distutils import build_ext
    
setup(cmdclass = {'build_ext': build_ext},
      ext_modules = [Extension("cython_fibonacci", ["fib.pyx"])])

## Step 3

Execute this setup script with Python:

In [None]:
! python setup.py build_ext --inplace

Two files have been created during the build process:

1. the C source file,
2. a compiled Python extension.

The file extension is `.pyd` on Windows (DLL files) and `.so` on UNIX.

In [None]:
! ls *.so
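The `ls *.so` command only works on UNIX. A portable sketch of the same check asks the interpreter which suffixes it accepts for extension modules and looks for matching files in the current directory:

```python
import glob
import importlib.machinery

# Suffixes this interpreter accepts for compiled extension modules,
# for example ".so" variants on UNIX or ".pyd" variants on Windows
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)

# Cross-platform equivalent of `ls *.so` (a set avoids duplicate matches)
built = sorted({path for suffix in suffixes for path in glob.glob("*" + suffix)})
print(built)
```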

## Step 4

Finally, we can load the compiled module as usual (using `from cython_fibonacci import fcfib`).

In [None]:
from cython_fibonacci import fcfib

We can now use the optimized function:

In [None]:
fcfib(10)

In [None]:
fcfib


With this technique, Cython code can also be integrated within a Python package. 
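When a package ships a compiled module, a common pattern is to fall back to a pure-Python implementation if the extension has not been built. The sketch below assumes the module and function names used in this notebook (`cython_fibonacci`, `fcfib`). The fallback function is an illustration, not something Cython generates.

```python
try:
    # Compiled extension built from fib.pyx (present only after the build step)
    from cython_fibonacci import fcfib
except ImportError:
    # Pure-Python fallback with the same name; results match for small n
    def fcfib(n):
        a, b = 1, 1
        for _ in range(n):
            a, b = a + b, a
        return a

print(fcfib(10))
```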

Here are a few references:

* Distributing Cython modules, explained at http://docs.cython.org/src/userguide/source_files_and_compilation.html
* Compilation with Cython, explained at http://docs.cython.org/src/reference/compilation.html

# Interactive plots?

http://bokeh.pydata.org/en/0.10.0/docs/gallery.html

In [None]:
# Preparation
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from collections import OrderedDict

In [None]:
from bokeh.sampledata.iris import flowers

colormap = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
flowers['color'] = flowers['species'].map(lambda x: colormap[x])

p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Petal Width'

p.circle(flowers["petal_length"], flowers["petal_width"], color=flowers["color"], fill_alpha=0.2, size=10)
output_notebook()
show(p)

In [None]:
from bokeh._legacy_charts import Donut, show, output_file
from bokeh.sampledata.olympics2014 import data

# throw the data into a pandas data frame
df = pd.io.json.json_normalize(data['data'])

# keep countries with more than 8 medals in total, sorted by medal count
df = df[df['medals.total'] > 8]
df = df.sort_values("medals.total", ascending=False)

# get the countries and we group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values

# build a dict containing the grouped data
medals = OrderedDict()
medals['bronze'] = bronze
medals['silver'] = silver
medals['gold'] = gold
medals = pd.DataFrame(medals)

donut = Donut(medals, countries)
output_notebook()
show(donut)

# Parallel

In [None]:
! conda install -y ipyparallel

Now open a terminal and create a cluster.

```bash
$ ipcluster start -n 4
```

The first step is to import the `ipyparallel` module and then create a `Client` instance:

In [2]:
import ipyparallel as ipp
rc = ipp.Client()
rc

<ipyparallel.client.client.Client at 0x7fab2ca3dc88>

In [3]:
rc.ids

[0, 1, 2, 3]

In [4]:
import os
ar = rc[:].apply_async(os.getpid)
pid_map = ar.get_dict()
pid_map

{0: 486, 1: 487, 2: 488, 3: 489}

In [5]:
dview = rc[:]

Blocking execution

In [None]:
dview = rc[:]  # A DirectView of all engines
dview.block = True
dview['a'] = 5
dview['b'] = 10
dview.apply(lambda x: a+b+x, 27)

[42, 42, 42, 42]

Magic

In [6]:
%px print('hi')

[stdout:0] hi
[stdout:1] hi
[stdout:2] hi
[stdout:3] hi


In [7]:
# OOPS
%px print('hi'

CompositeError: one or more exceptions from call to method: execute
[0:execute]: SyntaxError: unexpected EOF while parsing (<ipython-input-2-a77ac1d8bdf6>, line 1)
[1:execute]: SyntaxError: unexpected EOF while parsing (<ipython-input-2-a77ac1d8bdf6>, line 1)
[2:execute]: SyntaxError: unexpected EOF while parsing (<ipython-input-2-a77ac1d8bdf6>, line 1)
[3:execute]: SyntaxError: unexpected EOF while parsing (<ipython-input-2-a77ac1d8bdf6>, line 1)

In [8]:
import numpy
%px numpy.random.rand(1)

CompositeError: one or more exceptions from call to method: execute
[0:execute]: NameError: name 'numpy' is not defined
[1:execute]: NameError: name 'numpy' is not defined
[2:execute]: NameError: name 'numpy' is not defined
[3:execute]: NameError: name 'numpy' is not defined

In [9]:
with rc[:].sync_imports():
    import numpy

importing numpy on engine(s)
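The `NameError` above arises because each engine is a separate process with its own namespace: importing a module in the notebook does not import it on the engines. The following pure-Python sketch models this with plain dictionaries standing in for engine namespaces. It does not use ipyparallel, and `run_everywhere` is a hypothetical helper. `math` is used instead of `numpy` to keep the example stdlib-only.

```python
# Four stand-in "engines", each with its own initially empty namespace
engines = [{} for _ in range(4)]


def run_everywhere(code):
    """Run `code` in every namespace and report the outcome per engine."""
    outcomes = []
    for namespace in engines:
        try:
            exec(code, namespace)
            outcomes.append("ok")
        except NameError as exc:
            outcomes.append("NameError: %s" % exc)
    return outcomes


import math  # imported locally only, like `import numpy` in the notebook

before = run_everywhere("x = math.sqrt(16)")
run_everywhere("import math")  # roughly what sync_imports arranges remotely
after = run_everywhere("x = math.sqrt(16)")

print(before)
print(after)
```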


In [10]:
%%px 
a = numpy.random.rand(2,2)
numpy.linalg.eigvals(a)

Out[0:4]: array([ 0.43836421,  0.96818569])
Out[1:4]: array([ 0.17273788,  0.33252851])
Out[2:4]: array([ 0.10277511,  0.46678834])
Out[3:4]: array([ 1.10829684, -0.25781599])

In [12]:
%%px --targets 1
print("I am number 1")

I am number 1


Non-blocking execution: asynchronous

In [27]:
%%px --noblock
import time
import random
pause = random.randint(1,5)
time.sleep(pause)
now = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(time.time()))
print("[%s] Completed after %s seconds " % (now,pause) )

<AsyncResult: execute>

In [28]:
%pxresult

[stdout:0] [2015-12-02 22:26:41] Completed after 4 seconds 
[stdout:1] [2015-12-02 22:26:38] Completed after 1 seconds 
[stdout:2] [2015-12-02 22:26:40] Completed after 3 seconds 
[stdout:3] [2015-12-02 22:26:39] Completed after 2 seconds 
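The non-blocking pattern (submit the work, get a handle back immediately, collect the results later) can be tried without a cluster. The sketch below is an analogy built on the standard `concurrent.futures` module, not ipyparallel code: a `Future` plays the role of the `AsyncResult`. The pause values are arbitrary and kept short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work(pause):
    """Sleep for `pause` seconds, then report completion."""
    time.sleep(pause)
    return "Completed after %s seconds" % pause


pauses = [0.04, 0.01, 0.03, 0.02]
with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns immediately with a Future for each task
    futures = [pool.submit(work, pause) for pause in pauses]
    # Collecting the results blocks until each task has finished
    results = [future.result() for future in futures]

for index, line in enumerate(results):
    print("[task:%d] %s" % (index, line))
```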


In [34]:
%%px --targets ::2
print("I am even")

[stdout:0] I am even
[stdout:2] I am even


In [36]:
%%px --targets 1:3
print("In the middle")

[stdout:1] In the middle
[stdout:2] In the middle
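The `--targets` values above follow Python's slice notation applied to the list of engine ids. A plain-Python illustration with the four ids from this session:

```python
engine_ids = [0, 1, 2, 3]

even = engine_ids[::2]     # every second engine, starting at 0
middle = engine_ids[1:3]   # engines 1 and 2; the stop index is excluded

print(even)    # [0, 2]
print(middle)  # [1, 2]
```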


ipyparallel also offers:

* dependencies between tasks,
* a controller and engines that are separated
    - they can run on different hosts,
* integration with MPI,
* a PBS mode,
* a load balancer,
* a scheduler,
* retries.

```ipython
# for a visible LAN controller listening on an external port:
rc = ipp.Client('tcp://192.168.1.16:10101')
# or to connect with a specific profile you have set up:
rc = ipp.Client(profile='mpi')
```

https://ipyparallel.readthedocs.org/en/latest/index.html