# Python Binding with F2PY for a Euclidean Distance Matrix Written in Fortran90

In this notebook we implement a Euclidean distance matrix subroutine in Fortran. Thanks to F2PY, we can easily import it into Python and call it on NumPy arrays.

In [1]:
import numpy as np

In [2]:
# NumPy implementation of the Euclidean distance matrix,
# used for comparison
def euclidean_numpy(x, y):
    """Squared Euclidean distance matrix.

    Inputs:
    x: (N, F) numpy array
    y: (M, F) numpy array

    Output:
    (N, M) squared Euclidean distance matrix:
    r_ij = sum_k (x_ik - y_jk)^2
    """

    # Row-wise squared norms: x2 is a column (N, 1), y2 is a row (1, M)
    x2 = np.einsum('ij,ij->i', x, x)[:, np.newaxis]
    y2 = np.einsum('ij,ij->i', y, y)[:, np.newaxis].T

    xy = np.dot(x, y.T)

    return np.abs(x2 + y2 - 2. * xy)
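
The function above relies on the expansion `|x_i - y_j|^2 = |x_i|^2 + |y_j|^2 - 2 x_i . y_j`. As a sanity check, here is a minimal sketch that compares that expansion against an explicit double loop on small random arrays. The names `squared_distances_loop`, `squared_distances_expanded`, `x_small` and `y_small` are introduced only for this illustration.

```python
import numpy as np


def squared_distances_loop(x, y):
    """Reference implementation: r[i, j] = sum_k (x[i, k] - y[j, k])**2."""
    r = np.empty((x.shape[0], y.shape[0]))
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            diff = x[i] - y[j]
            r[i, j] = np.dot(diff, diff)
    return r


def squared_distances_expanded(x, y):
    """Same matrix via |x_i|^2 + |y_j|^2 - 2 x_i . y_j."""
    x2 = np.einsum('ij,ij->i', x, x)[:, np.newaxis]  # shape (N, 1)
    y2 = np.einsum('ij,ij->i', y, y)[np.newaxis, :]  # shape (1, M)
    # abs() removes tiny negative values caused by round-off
    return np.abs(x2 + y2 - 2.0 * np.dot(x, y.T))


rng = np.random.default_rng(0)
x_small = rng.random((6, 3))
y_small = rng.random((4, 3))

r_loop = squared_distances_loop(x_small, y_small)
r_expanded = squared_distances_expanded(x_small, y_small)
max_diff = np.abs(r_loop - r_expanded).max()
print(r_expanded.shape, max_diff)
```

The two results should agree up to floating-point round-off. The expanded form is preferred in practice because it replaces the double loop with a single matrix product.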

The Fortran implementation is in `./distance-f90`. It contains the source files `cityblock.f90` and `euclidean.f90`, and a `setup.py` that compiles them into `.so` files importable from Python.

The following command builds the `.so` files:
```bash
python setup.py build_ext -i
```
This produces the folder `metrics`, which contains `cbdm.cpython-36m-x86_64-linux-gnu.so` and `edm.cpython-36m-x86_64-linux-gnu.so`. The `setup.py` file has some extra lines that copy the `metrics` folder with the compiled libraries into the directory where the notebooks are. This is not really part of the setup. In general, the location of the installed modules is controlled by command-line options like `--prefix`:
```bash
python setup.py install --prefix /some/dir
```

Building with `setup.py` runs a command equivalent to
```bash
f2py -c euclidean.f90 -m edm --f90flags='-fopenmp -O3' -lgomp
```
The `.so` files can be compiled with that command directly, so using `setup.py` is not mandatory.

We can run those commands in a terminal or within the notebook using the following cell (uncomment it first):

In [3]:
# %%bash
# cd distance-f90
# python setup.py build_ext -i

The Fortran subroutine doesn't return the distance matrix. Instead, it takes an array as an argument and writes the matrix into it. To give it the same interface as the other implementations, we use the following wrapper:

In [4]:
from metrics.edm import euclidean_distance_matrix


def euclidean_fortran(x, y):
    """Wrapper for the squared Euclidean distance matrix
    implemented in Fortran.

    Inputs:
    x: (N, F) numpy array
    y: (N, F) numpy array

    Output:
    (N, N) squared Euclidean distance matrix:
    r_ij = sum_k (x_ik - y_jk)^2
    """
    nsamples, nfeat = x.shape
    
    dist_matrix = np.empty([nsamples, nsamples], order='F')
    
    euclidean_distance_matrix(x.T, y.T, nsamples, nfeat, dist_matrix)

    return dist_matrix
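
The wrapper passes `x.T` and allocates the output with `order='F'` because Fortran stores arrays in column-major order. The following minimal sketch, using only NumPy, shows that transposing a C-ordered array yields a Fortran-contiguous view of the same memory, so no data is copied before the call. It assumes, as the wrapper's call suggests, that the Fortran routine expects its inputs with shape `(nfeat, nsamples)`. The names `x_demo`, `xt` and `out` are illustrative only.

```python
import numpy as np

# A C-ordered (row-major) array with shape (nsamples, nfeat) = (3, 2)
x_demo = np.arange(6, dtype=np.float64).reshape(3, 2)

# Its transpose is a view with shape (2, 3) that is Fortran-contiguous
xt = x_demo.T
print(x_demo.flags['C_CONTIGUOUS'], xt.flags['F_CONTIGUOUS'])  # True True
print(np.shares_memory(x_demo, xt))                            # True

# Output buffer allocated in column-major order, as in the wrapper
out = np.empty((3, 3), order='F')
print(out.flags['F_CONTIGUOUS'])                               # True
```

Because the transpose is only a view, the layout conversion costs nothing. Passing a C-ordered array where a Fortran-ordered one is expected would typically force a copy instead.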

In [5]:
nsamples = 5000
nfeat = 50

x = 10. * np.random.random([nsamples, nfeat])

np.abs(euclidean_fortran(x, x) - euclidean_numpy(x, x)).max()

3.524291969370097e-12

In [6]:
%timeit euclidean_numpy(x, x)
%timeit euclidean_fortran(x, x)

491 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
151 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


## Conclusions
1. Implementing functions in Fortran and binding them to Python with F2PY can result in significant speedups: here the Fortran version is about 3 times faster than the NumPy one (151 ms vs. 491 ms).
2. Move only the small, compute-intensive parts of a Python program to Fortran90. This keeps the code cleaner.
3. Keep in mind the types and memory order of the multidimensional arrays passed to Fortran functions.
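
To illustrate the last point, here is a minimal sketch of a helper that coerces an input to the type and layout a Fortran routine would typically expect. It assumes the routine works in double precision, which matches the `float64` arrays used above but is not confirmed by the Fortran source here. The helper name `as_fortran_input` is hypothetical and not part of the `metrics` module.

```python
import numpy as np


def as_fortran_input(a):
    """Hypothetical helper: return `a` as a float64, Fortran-contiguous array.

    Assumes the Fortran side uses double precision. A copy is made only
    when the dtype or the memory layout does not already match.
    """
    return np.require(a, dtype=np.float64, requirements=['F'])


# A single-precision, C-ordered array needs conversion
a32 = np.ones((4, 3), dtype=np.float32)
b = as_fortran_input(a32)
print(b.dtype, b.flags['F_CONTIGUOUS'])

# An array that already conforms is passed through without copying
c = np.asfortranarray(np.ones((4, 3)))
d = as_fortran_input(c)
print(np.shares_memory(c, d))
```

Doing the conversion explicitly on the Python side makes any copy visible, rather than leaving it to the binding layer.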