[Cython]: http://cython.org/
[docs]: http://docs.cython.org/en/latest/src/quickstart/index.html
[wiki]: https://github.com/cython/cython/wiki

[tutorial]: http://conference.scipy.org/proceedings/SciPy2009/paper_1/full_text.pdf
[numerical calculations]: http://conference.scipy.org/proceedings/SciPy2009/paper_2/full_text.pdf
[tinyr]: https://code.google.com/archive/p/tinyr/


# Cython

We will now try [Cython] as a way to speed up our RBush ([docs], [wiki]).
Cython is a superset of Python in which special syntax can be used to call C-level functions and to compile the Python/Cython code to C.
Cython claims that, by going through its compiler, even pure-Python code can benefit (if only slightly) from C-level optimizations.

Two documents should help us start this journey: the Cython [tutorial] by *S. Behnel, R.W. Bradshaw & D.S. Seljebotn* and a benchmark on [numerical calculations] by *Dag Sverre Seljebotn*.
We also take [tinyr], an R-tree implementation, as our starting point.

If you're using Anaconda, Cython can be installed from the default channel.

[Sage]: http://www.sagemath.org/

## Cython tutorial

From the tutorial we learn the following.

Cython code can be compiled in several ways:
* with a `setup.py` script (what we will eventually use);
* with `pyximport`, which imports Cython `pyx` files as if they were `py` modules and compiles them in the background;
* by pre-compiling the code with the `cython` command-line utility (mostly for debugging/tests);
* in [Sage] notebooks, which allow inline Cython code (mostly for experimentation).
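
For the `setup.py` route, a minimal build script could look like the sketch below. The package name, module name and file path are assumptions made for illustration (they mirror a `rbush/core_search.pyx` layout), not the project's actual build configuration.

```python
# setup.py: minimal sketch of a Cython build script (names are assumptions)
from setuptools import Extension, setup
from Cython.Build import cythonize

extensions = [
    # One Extension per .pyx module that should be compiled.
    Extension("rbush.core_search", sources=["rbush/core_search.pyx"]),
]

setup(
    name="rbush",
    # cythonize() translates the .pyx sources to C and returns
    # the extensions ready to be compiled.
    ext_modules=cythonize(extensions, compiler_directives={"language_level": "3"}),
)
```

With such a file in place, `python setup.py build_ext --inplace` compiles the module next to its source so that it can be imported directly.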



# Profiling

We're now at py-rbush commit `6b48a144f0d602c90cc32d9c82771a131242fedc`: the Cython functions for `insert` and `search` have been adapted using pieces of `tinyr`, and it is now time to profile them.

The new version using `cython` is doing much better than our previous attempt with `numba` (wherever possible\*), but it is still doing worse (~10x) than the brute-force\*\* method, which is written entirely in numba. Since that method is brute force, and the whole purpose of a tree structure is to reduce (logarithmically) the time consumed, we have to figure out what is going on and fix it until we get a substantial improvement (~100x?) for a typical use case (1M items).
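
As a rough, back-of-the-envelope sketch of what a tree should buy us, the snippet below compares the number of box tests in a linear scan with an idealized tree descent for 1M items. The node capacity of 9 is an assumption made for illustration, and the single root-to-leaf path is a best case; real searches visit more nodes whenever boxes overlap.

```python
import math

n_items = 1_000_000   # typical use case mentioned above
max_entries = 9       # assumed node capacity (illustrative, not taken from the code)

# Height of a fully packed tree: each level divides the candidates by max_entries.
height = math.ceil(math.log(n_items, max_entries))

# Idealized descent: test every entry of one node per level.
tree_tests = height * max_entries
brute_force_tests = n_items  # a linear scan tests every box once

print(f"tree height: {height}")
print(f"box tests, ideal tree: {tree_tests}, brute force: {brute_force_tests}")
```

Even if a real query visits many times more nodes than this ideal case, the tree should still beat the linear scan comfortably, which is why the current ~10x slowdown points to overhead in our implementation rather than in the algorithm.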

To figure that out, we have to profile the functions in use, and we will probably end up going down to line profiling.

\*: we couldn't use numba (version 0.36) everywhere in `rbush` because the library still has some limitations: it does not accept lists with non-homogeneous (*i.e.*, different-sized) items, and when items were abstracted into (numba) classes the performance gain was quite low.

\*\*: the brute-force method searches the items (boxes) linearly, one by one, across the entire dataset.
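
To make the brute-force baseline concrete, here is a minimal pure-Python sketch of such a linear search. The function names and the `(xmin, ymin, xmax, ymax)` box layout are assumptions made for illustration; the version we actually benchmark is compiled with numba.

```python
def intersects(a, b):
    """Return True if boxes a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def brute_force_search(boxes, query):
    """Test every box against the query, one by one, and collect the matching indices."""
    return [i for i, box in enumerate(boxes) if intersects(box, query)]


boxes = [(0, 0, 1, 1), (2, 2, 3, 3), (0.5, 0.5, 2.5, 2.5)]
hits = brute_force_search(boxes, (0.9, 0.9, 1.1, 1.1))
print(hits)  # indices of the boxes overlapping the query
```

The cost grows linearly with the number of boxes, which is exactly what the tree is supposed to avoid.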

## python --> cython

Python code is typically profiled using the stdlib `profile` or `cProfile` [profilers]. We can also easily inspect the performance of our code line-by-line using the [line_profiler] package.

Profiling Cython code is pretty close to profiling pure-Python code: we just have to add some pragmas according to how/what we want to inspect; their [profiling tutorial] gives a nice first-steps guide.
Some googling on the subject informs us that the [line_profiler may be used with cython][so_lp].

So, here is the summary (so far, from what I got):
* we can use `cProfile` to profile an entire (`pyx`) module;
* we can exclude functions we are not interested in by decorating them accordingly;
* we can use line-tracing (with cython/cProfile);
* we can annotate the line-tracing with the `coverage` package;
* we can use Kern's `line_profiler` to get per-line timings.

[profilers]: https://docs.python.org/3/library/profile.html
[line_profiler]: https://github.com/rkern/line_profiler
[profiling tutorial]: http://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html
[so_lp]: https://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line

## The simplest first

The first step I will take is to simply enable profiling with nothing more than the minimal settings.
This means we'll do as with pure-Python code: we use cProfile while running our test case.

The only addition is to include the pragma
```python
# cython: profile=True
```
at the top of the (`pyx`) module being profiled (file `core_funcs.pyx`).

In [1]:
from rbush.data import generate_data_array
import rbush

from rbush.core_search import search,search_node

def searchme(dt):
    for i in range(len(dt)):
        _ = r.search(*tuple(dt[i]))

data = generate_data_array(10000)
r = rbush.RBush()
r.load(data)
dt = generate_data_array(1000)

In [2]:
import cProfile
import pstats

# import pyximport
# pyximport.install()

cProfile.runctx('searchme(dt)', globals(), locals(), 'profile.prof')

s = pstats.Stats('profile.prof')
s.strip_dirs().sort_stats('time').print_stats()

Mon Feb  5 22:23:23 2018    profile.prof

         2005 function calls in 3.267 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    3.238    0.003    3.238    0.003 {rbush.core_search.search}
        1    0.026    0.026    3.267    3.267 <ipython-input-1-47ca2b5ed306>:6(searchme)
     1000    0.002    0.000    3.241    0.003 core.py:142(search)
        1    0.000    0.000    3.267    3.267 {built-in method builtins.exec}
        1    0.000    0.000    3.267    3.267 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




<pstats.Stats at 0x10d32cda0>

Notice that we just profiled everything in our code.
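
When the report gets long, `pstats` can also narrow down what is printed, without touching the code being profiled. The sketch below uses toy stand-in functions (`slow_part` and `driver` are hypothetical names, not part of `rbush`) to show sorting by cumulative time and restricting the output with a regular expression.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Stand-in for the expensive call we care about.
    return sum(i * i for i in range(n))


def driver():
    # Stand-in for the loop that calls the expensive function many times.
    return [slow_part(2000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.runcall(driver)

# Send the report to a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)

# Sort by cumulative time and print only entries whose name matches "slow_part".
stats.strip_dirs().sort_stats("cumulative").print_stats("slow_part")
report = buffer.getvalue()
print(report)
```

Here `tottime` is the time spent inside a function itself, while `cumtime` also includes the time spent in everything it calls.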

If we want, we can exclude functions we are not interested in by decorating them with:
```python
cimport cython

@cython.profile(False)
def non_interesting_function():
    # ...
    pass
```

# Let's see what coverage is about

As I understand it, line tracing combined with the annotation done by `coverage` helps to spot sharp edges. Let's check.

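Now, instead of `# cython: profile=True`, we'll use `# cython: linetrace=True` at the top of our Cython module.
Note that line tracing is only active when the module is compiled with the C macro `CYTHON_TRACE=1` defined.
We also have to add a `.coveragerc` file to our current directory with the content: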
```
[run]
plugins = Cython.Coverage
```

Run the profiling script (above), then render the coverage report (`coverage.xml`) as annotated HTML for easier inspection:
```bash
# cython  --annotate-coverage coverage.xml  rbush/core_search.pyx
```

In [3]:
from IPython.display import HTML
from IPython.display import display

display(HTML('core_search.html'))

# Line profiling

Line profiling (using `line_profiler`) is a bit more complicated, and not entirely clear to me. Here is an attempt based on these references:
* https://stackoverflow.com/questions/28301931/how-to-profile-cython-functions-line-by-line
* http://nbviewer.jupyter.org/gist/tillahoffmann/296501acea231cbdf5e7
* http://docs.cython.org/en/latest/src/reference/compilation.html
* http://blog.yclin.me/gsoc/2016/07/23/Cython-IPython/

In [4]:
#Load Robert Kern's line profiler
%load_ext line_profiler
import line_profiler

#Set compiler directives (cf. http://docs.cython.org/src/reference/compilation.html)
# from Cython.Compiler.Options import directive_defaults
# directive_defaults['linetrace'] = True
# directive_defaults['binding'] = True

In [5]:
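# Import the submodule explicitly: a bare `import Cython` does not load Cython.Compiler
from Cython.Compiler.Options import get_directive_defaults
directive_defaults = get_directive_defaults()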
directive_defaults['linetrace'] = True
directive_defaults['binding'] = True

In [6]:
%load_ext Cython

In [8]:
# %%cython -a -f --compile-args=-DCYTHON_TRACE=1

# ...cython code goes here...

In [9]:
#Print profiling statistics using the `line_profiler` API
profile = line_profiler.LineProfiler(searchme)

profile.add_module(rbush.core_search)
profile.runcall(searchme, dt)
# profile.runctx('searchme(dt)', globals(), locals())
profile.print_stats()

Timer unit: 1e-06 s

Total time: 2.77013 s
File: <ipython-input-1-47ca2b5ed306>
Function: searchme at line 6

Line #      Hits         Time  Per Hit   % Time  Line Contents
     6                                           def searchme(dt):
     7      1001         1420      1.4      0.1      for i in range(len(dt)):
     8      1000      2768707   2768.7     99.9          _ = r.search(*tuple(dt[i]))
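
To read this report, it helps to redo the arithmetic behind its columns. The sketch below only reuses the figures printed above; the variable names are mine.

```python
# Figures copied from the line_profiler report above.
timer_unit = 1e-06  # seconds per timer tick ("Timer unit")
rows = {7: (1001, 1420), 8: (1000, 2768707)}  # line number -> (hits, time in ticks)

total_ticks = sum(ticks for _, ticks in rows.values())
total_seconds = total_ticks * timer_unit  # the "Total time" header

for line, (hits, ticks) in rows.items():
    per_hit = ticks / hits                 # "Per Hit" column, in timer units
    percent = 100 * ticks / total_ticks    # "% Time" column
    print(f"line {line}: {per_hit:.1f} us per hit, {percent:.1f}% of the time")

print(f"total: {total_seconds:.5f} s")
```

Each call to `r.search` therefore costs about 2.8 ms, and essentially all of the time is spent inside it rather than in the Python loop around it.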

