# Wrapping a MATLAB package in Python

For work, I had to use some code that is only available in MATLAB. Most of my workflow, though, is in Python, and I didn't want to mix two languages. As any normal person would do, I started looking around for some tutorial on how to wrap MATLAB cleanly in Python. I found none that were satisfying, so I figured it out by myself using MATLAB's documentation.

It took me a bit, because my MATLAB is rusty, and the docs aren't exactly great. So I thought I'd do something good for a change, and write a tutorial on how to wrap MATLAB code with Python. I actually did something better, because when you're done with this, you should have a recipe to easily wrap any MATLAB code in a few minutes. Mind that the tutorial presupposes knowledge of the import system, OOP, operator overloading and decorators.

The first MATLAB package I ever wrapped is [FALCON](https://github.com/sjbeckett/FALCON), written by [Stephen Beckett](http://sjbeckett.github.io/), so I'll use it as an example to guide you through its wrapping. The code is contained in the [../falcon](https://github.com/ganileni/py_falcon/tree/master/falcon) directory.

First of all, you need a working copy of MATLAB installed on your machine. Once you have it, install the [MATLAB Engine API for Python](https://www.mathworks.com/help/matlab/matlab-engine-for-python.html). Here are the [installation instructions](https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html) but, in short, you can install it with this:
```
$ cd $MATLABROOT/extern/engines/python
$ python setup.py install
```
where `$MATLABROOT` is your MATLAB installation directory (e.g. on my system it's `~/.MathWorks/MATLAB/R2018a`). This will put a new package called `matlab` in your Python installation. Of course, I advise doing this in a virtualenv or conda environment, and not in your system Python installation.

After you're done with this, you can start a MATLAB engine from Python like this:

In [1]:
import matlab
from matlab import engine
# start the engine
eng=engine.start_matlab()

and you can use MATLAB's built-ins directly. For example, to create a MATLAB array from a list:

In [2]:
arr=[1,2]
matlab_arr=matlab.double(arr)
matlab_arr

matlab.double([[1.0,2.0]])

Note that you cannot convert a NumPy array directly to a MATLAB array (at least with the R2018a engine used here):

In [3]:
# this code will raise ValueError
import numpy as np
arr=np.array([1,2])
matl_arr=matlab.double(arr)

ValueError: initializer must be a rectangular nested sequence

In [4]:
# convert to list instead before feeding it
import numpy as np
arr=np.array([1,2])
matl_arr=matlab.double(arr.tolist())
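
If inputs can arrive as NumPy arrays, tuples, or plain lists, the conversion step can be wrapped in a small helper. The following is a minimal sketch: `to_nested_list` is a hypothetical name, not part of the MATLAB engine, and it only relies on the `tolist()` method that NumPy arrays provide.

```python
def to_nested_list(obj):
    """Convert a NumPy array or nested sequence to nested lists of floats.

    Hypothetical helper: its output is meant to be passed to matlab.double().
    """
    # NumPy arrays (and similar objects) expose tolist(); use it when present
    if hasattr(obj, "tolist"):
        obj = obj.tolist()
    # recurse into sequences, convert scalars to float
    if isinstance(obj, (list, tuple)):
        return [to_nested_list(item) for item in obj]
    return float(obj)
```

With it, the cell above would become `matl_arr=matlab.double(to_nested_list(arr))`, and the same call would also accept tuples of tuples.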

You can pass the array to matlab functions and it will work correctly:

In [5]:
eng.sqrt(matl_arr)

matlab.double([[1.0,1.4142135623730951]])

Any attribute requested from the engine is passed to MATLAB directly, so e.g.
```
Python> eng.sqrt(x)
```

which is equivalent in Python to:
```
Python> getattr(eng,'sqrt').__call__(x)
```

will behave like this MATLAB code:
```
MATLAB> sqrt(x)
```
You can also pass strings to be executed by the engine with
```
Python> eng.eval('code')
```
which is equivalent to
```
MATLAB> code
```
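
To get a feel for this kind of name-based forwarding, here is a small sketch in pure Python. `MathForwarder` is a made-up stand-in that forwards attribute names to Python's `math` module; it only illustrates the dispatch idea and is not how the MATLAB engine is implemented internally.

```python
import math


class MathForwarder:
    """Stand-in that forwards unknown attribute names to Python's math module.

    This only illustrates name-based dispatch; it is not the MATLAB engine.
    """

    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        return getattr(math, name)


fwd = MathForwarder()
print(fwd.sqrt(2.0))              # attribute syntax
print(getattr(fwd, "sqrt")(2.0))  # same lookup, spelled out
```

Both calls resolve the name `sqrt` at run time, which is why any MATLAB function name can be used on the engine without being declared in Python first.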
When you're done you can stop the engine with this:

In [6]:
eng.quit()

In order to use custom MATLAB code, and not just the built-ins, you just call it from the engine like you would in MATLAB. So if we are using e.g. FALCON, we know that its main routine is in a file called `PERFORM_NESTED_TEST.m`. In MATLAB, only the first function in a `.m` file can be called from outside that file, and it is looked up by the file's name, so the function and the file should share the same name. So, in MATLAB, we would do:
```
MATLAB> PERFORM_NESTED_TEST(args)
```
so in Python it becomes:
```
Python> eng.PERFORM_NESTED_TEST(args)
```
Now, this code would fail, giving us:
```
MatlabExecutionError: Undefined function 'PERFORM_NESTED_TEST' for input arguments of type 'double'.
```
Let's show it:

In [7]:
# load a test matrix from falcon's source code dir
FALCON_DIR='../falcon/'
test_matrix=np.loadtxt(FALCON_DIR+'test.csv',delimiter=',')
# convert to matlab double array
test_matlab_matrix=matlab.double(test_matrix.tolist())

In [8]:
# this should raise MatlabExecutionError
eng=engine.start_matlab()
eng.PERFORM_NESTED_TEST(test_matlab_matrix)

MatlabExecutionError: Undefined function 'PERFORM_NESTED_TEST' for input arguments of type 'double'.


In [9]:
eng.quit()

The code fails because the \*.m files containing the MATLAB code must be in Python's working directory when the engine is started, in order to be recognized\*. In order to make the code work, we have to change the working directory:

\* *Note: this is because when the engine is started Python adds its current working directory to the engine's search path. We could work around this by changing the working directory with MATLAB commands, but that is even more hacky than this approach, because it would involve OS-based path management.*

In [10]:
import os

CURRENT_DIR=os.getcwd()
os.chdir(FALCON_DIR)

eng=engine.start_matlab()

In [11]:
# positional arguments needed for the function
args=[False,False,['NODF'],eng.eval('[]'),eng.eval('[]'),0]

# now it works:
result=eng.PERFORM_NESTED_TEST(test_matlab_matrix, *args)
print(result)

{'binary': False, 'sorting': False, 'MEASURE': ['NODF'], 'nulls': matlab.double([]), 'ensNum': matlab.double([]), 'plot': 0, 'Matrix': {'Matrix': matlab.double([[3.0,0.0,0.0,0.0,3.0],[0.0,2.0,0.0,7.0,1.0],[0.0,0.0,1.0,0.0,2.0],[2.0,6.0,0.0,0.0,0.0],[5.0,0.0,0.0,0.0,3.0]]), 'fill': 11.0, 'connectance': 0.44}, 'NestedConfig': {'DegreeMatrix': matlab.double([[1.0,0.0,2.0,0.0,7.0],[3.0,5.0,0.0,0.0,0.0],[3.0,3.0,0.0,0.0,0.0],[2.0,0.0,0.0,1.0,0.0],[0.0,2.0,6.0,0.0,0.0]]), 'Degreeindex_rows': matlab.double([[2.0,5.0,1.0,3.0,4.0]]), 'Degreeindex_cols': matlab.double([[5.0,1.0,2.0,3.0,4.0]])}, 'Qua_t1': {'EnsembleSize': 1000.0, 'SignificanceTable': matlab.double([[0.0,0.0,0.0,1.0,1.0]]), 'measures': [{'MEASURE': 'NODF', 'NANcount': 0.0, 'Measure': 15.0, 'pvalue': 1.0, 'pvalueCorrected': 0.0, 'Mean': 15.0, 'StandardDeviation': 0.0, 'sampleZscore': nan, 'Median': 15.0, 'minimum': 15.0, 'maximum': 15.0, 'NormalisedTemperature': 1.0, 'NestednessUpOrDown': 'Up'}]}, 'Qua_t2': {'EnsembleSize': 1000.0,

Note that the calculation still works even if we change Python's working directory back. All that matters is that the MATLAB engine has the folder containing the code in its search path:

In [12]:
os.chdir(CURRENT_DIR)

# if you see no error, that means it worked
result=eng.PERFORM_NESTED_TEST(test_matlab_matrix, *args)
eng.quit()

This leads to our first ingredient for a module that wraps MATLAB code. We will write a context manager that temporarily switches the cwd to where the MATLAB code is:

In [13]:
class TempWD():
    """context manager to temporarily switch cwd to `dir_path`"""

    def __init__(self, dir_path):
        self.dir_path=dir_path
        self.cwd=None

    def __enter__(self):
        # on entry, remember the current working dir, then change it
        self.cwd=os.getcwd()
        os.chdir(self.dir_path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # on exit, switch back to previous wd
        os.chdir(self.cwd)
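
The same behaviour can also be written as a generator with `contextlib.contextmanager` from the standard library. This is only an alternative sketch, and `temp_wd` is a hypothetical name; the `try`/`finally` makes it explicit that the previous directory is restored even if the body raises.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_wd(dir_path):
    """Temporarily switch the working directory to `dir_path`."""
    previous = os.getcwd()
    os.chdir(dir_path)
    try:
        yield
    finally:
        # runs on normal exit and when the body raises
        os.chdir(previous)
```

It is used the same way, e.g. `with temp_wd(FALCON_DIR): ...`. On Python 3.11 and later, the standard library's `contextlib.chdir` does the same job.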

Now we just need to start the engine under the context manager:

In [14]:
with TempWD(FALCON_DIR) as wd_manager:
    eng=engine.start_matlab()

# now the engine finds the source file
# again, no error means it worked correctly.
result=eng.PERFORM_NESTED_TEST(test_matlab_matrix, *args)
eng.quit()

Let's write a nice and clean function to start MATLAB without having to type all that context manager stuff:

In [15]:
def start_matlab():
    """starts matlab. It uses TempWD to temporarily switch cwd to the path where the matlab source files are,
    so that the path gets added to matlab PATH at startup and the sources are found by the engine. """
    # when you start matlab, cwd should be the dir where the matlab files are!
    with TempWD(FALCON_DIR) as wd_manager:
        eng=matlab.engine.start_matlab()
    return eng

In [16]:
# let's check if it works:
eng=start_matlab()
result=eng.PERFORM_NESTED_TEST(test_matlab_matrix, *args)
eng.quit()
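
For comparison, the search-path requirement discussed earlier can also be met after the engine has started, by calling MATLAB's own `addpath` through the engine like any other MATLAB function. The sketch below is an alternative to the working-directory approach used in this tutorial, not a replacement for it; `add_source_dir` is a hypothetical helper name, and `eng` is assumed to be an object returned by `matlab.engine.start_matlab()`.

```python
import os


def add_source_dir(eng, source_dir):
    """Add `source_dir` to the MATLAB search path of a running engine.

    `eng` is assumed to be an object returned by matlab.engine.start_matlab().
    """
    # the engine process has its own working directory, so pass an absolute path
    abs_dir = os.path.abspath(source_dir)
    # nargout=0 tells the engine not to expect a return value
    eng.addpath(abs_dir, nargout=0)
    return abs_dir
```

The trade-off is that the path handling now happens on the Python side with `os.path`, which is the kind of OS-based path management the working-directory approach avoids.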

Now we can proceed to write a function that wraps `PERFORM_NESTED_TEST` so that it looks more pythonic. The main difficulty here is that we have to properly parse and convert the arguments to pass to MATLAB. The parsing is explained in more detail in the comments. I chose to leave the argument names unchanged, even though camelCase is not PEP 8 style, so that they match FALCON's original documentation.

In [17]:
allowed_measures=['DISCREPANCY', 'JDMnestedness', 'MANHATTAN_DISTANCE', 'NODF', 'NTC', 'SPECTRAL_RADIUS', 'WNODF']
def nested_test(matrix, bintest=1, sortVar=False, functhand=['NODF'], nullmodels=[], EnsembleNumber=[], plotON=0,
                eng=None):
    """Wrapper for FALCON's PERFORM_NESTED_TEST function (the main routine that allows all calculations). See `PERFORM_NESTED_TEST.m` docstrings and code for further info.
    Args:
        matrix: the matrix to test
        bintest: whether the matrix is binary (1), quantitative (0), or both (2), as in the case of spectral radius for example.
        sortVar: specifies whether to sort the matrix for maximal packaging (1) or not (0). It is applied both to input and null matrices, but only makes a difference to NODF, DISCREPANCY and MANHATTAN DISTANCE scores and tests.
        functhand:  specifies function name of measure(s) to perform. It is important that this argument is a list.
        nullmodels: specifies which null tests to run. [] performs all that can be done based on whether the test is binary/quantitative. Binary null tests are positively numbered, e.g. (1,2,3), whilst quantitative null tests are negatively numbered, e.g. (-1,-2,-3). To run binary null tests 1 and 3, for example, you should use the argument [1, 3].
        EnsembleNumber: To use the adaptive method this should be set as [], else the fixed solver is invoked which performs the set number of nulls in its ensemble e.g. if argument was 50, 50 null models would be performed.
        plotON: Ignored. In MATLAB, determines whether a plot should be displayed to the user about how the test measurement compares to those found in the null ensemble. 1 indicates the plot should be made, 0 indicates it should not.
        eng: matlab engine to use.
    """
    # matrices need to be fed as lists to the matlab fcn
    if not isinstance(matrix, list):
        matrix=np.asarray(matrix).tolist()
    for measure in functhand:
        if measure not in allowed_measures:
            raise ValueError('Error in functhand argument: {} is not a supported measure.'.format(measure))
    # convert to proper matlab type
    if bintest==1:
        # booleans if binary matrix
        matrix=matlab.logical(matrix)
    else:
        # doubles for quantitative matrix
        matrix=matlab.double(matrix)
    # nullmodels lists the null models to compute to calculate test significance.
    # PERFORM_NESTED_TEST only accepts a sorted MATLAB vector for this parameter.
    if isinstance(nullmodels, list):
        if not nullmodels:
            # correct way to initialize an empty vector
            nullmodels=eng.eval('[]')
        else:
            # convert to sorted double vector
            nullmodels=matlab.double(sorted(list(nullmodels)))
    if not EnsembleNumber:
        # here too we need to give an empty vector, the function won't accept EnsembleNumber == 0
        EnsembleNumber=eng.eval('[]')
    # this needs to be converted to int
    sortVar=int(sortVar)
    # plotON is always 0, so we don't have to deal with MATLAB trying to access the graphical frontend
    # final note: MATLAB only takes positional arguments!
    result=eng.PERFORM_NESTED_TEST(matrix, bintest, sortVar, functhand, nullmodels, EnsembleNumber, 0)
    return result
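
The pure-Python part of this argument handling can be factored out so that it can be checked without a MATLAB installation. The following is a sketch under that assumption: `check_measures` and `prepare_nullmodels` are hypothetical helper names, and the rejection of a bare string is an addition of this sketch, motivated by the docstring's remark that `functhand` must be a list.

```python
ALLOWED_MEASURES = ('DISCREPANCY', 'JDMnestedness', 'MANHATTAN_DISTANCE',
                    'NODF', 'NTC', 'SPECTRAL_RADIUS', 'WNODF')


def check_measures(functhand):
    """Return the measures as a list, raising ValueError on unsupported input."""
    # a bare string would be iterated character by character, so reject it
    if isinstance(functhand, str):
        raise ValueError('functhand must be a list of measure names, not a string')
    for measure in functhand:
        if measure not in ALLOWED_MEASURES:
            raise ValueError('{} is not a supported measure.'.format(measure))
    return list(functhand)


def prepare_nullmodels(nullmodels):
    """Return the null-model indices as a sorted list of floats (possibly empty)."""
    return sorted(float(n) for n in nullmodels)
```

Inside the wrapper, the results would still need to be turned into MATLAB types (for example with `matlab.double`), since these helpers deliberately stop short of touching the engine.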

Let's test that it all works:

In [18]:
eng=start_matlab()
result=nested_test(test_matlab_matrix, eng=eng)
eng.quit()

Now one more trick: we want to write a module that masks the MATLAB backend as much as possible. So we'll write a wrapper around the `start_matlab()` function that does "lazy loading" of the MATLAB engine when it is called.

In [19]:
class EngineWrapper():
    """Wraps a matlab.engine() instance to do lazy loading"""

    def __init__(self):
        self.eng=None
        # this tracks whether the engine is running
        self.is_started=False

    # `__getattr__` is only called when normal attribute lookup fails,
    # so `eng`, `is_started` and `quit` never end up here
    def __getattr__(self, item):
        # if engine not running:
        if self.is_started is False:
            # start the engine, put it into self.eng
            self.eng=start_matlab()
            # make a note that engine is running
            self.is_started = True
        # and then return the corresponding attribute of self.eng
        return getattr(self.eng, item)

    # stop the engine and reset the state, so that a later call can restart it
    def quit(self):
        if self.is_started:
            self.eng.quit()
            self.eng=None
            self.is_started=False

    # also override `__del__` method by making the wrapper close the engine
    def __del__(self):
        self.quit()
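
The lazy-loading behaviour can be observed without MATLAB by swapping the engine for a stand-in. The sketch below is illustrative only: `LazyProxy` and `FakeEngine` are made-up names, and `FakeEngine` merely counts how many times it gets created.

```python
class LazyProxy:
    """Build the wrapped object on first attribute access (illustrative sketch)."""

    def __init__(self, factory):
        # plain assignment stores these on the instance, so normal lookup finds them
        self._factory = factory
        self._target = None

    def __getattr__(self, name):
        # reached only for names not found on the proxy itself
        if self._target is None:
            self._target = self._factory()
        return getattr(self._target, name)


class FakeEngine:
    """Stand-in for a MATLAB engine that counts how often it is created."""
    started = 0

    def __init__(self):
        FakeEngine.started += 1

    def sqrt(self, x):
        return x ** 0.5


proxy = LazyProxy(FakeEngine)
print(FakeEngine.started)   # nothing has been started yet
proxy.sqrt(4.0)
proxy.sqrt(9.0)
print(FakeEngine.started)   # the factory ran once, on the first call
```

Creating the proxy costs nothing; the expensive object only appears when one of its attributes is actually needed.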

Now we can put `_engine` as a global variable in our module, holding one "default engine" used for calculations. We can then write a decorator that feeds `_engine` to `nested_test()`. This keeps `import` of the module very fast: the engine (which eats up ~30MB of memory and loads in about one second, at least when it's loaded from an SSD by a 7th gen Core i7 on my machine) is started on the first call to the function, without any need for an explicit call.

In [20]:
from functools import wraps

# will use as global variable
_engine=EngineWrapper()

def lazy_load_engine(func):
    """Decorator that adds lazy loading of the engine to MATLAB wrapper functions.
    The wrappers must use the `eng` keyword argument as a matlab.engine object."""

    @wraps(func) # to preserve signature
    def wrapper(*args,**kwargs):
        # _engine now points explicitly to the global module variable
        global _engine
        # override the `eng` argument of the function
        kwargs['eng']=_engine
        # finally call it
        return func(*args, **kwargs)
    
    return wrapper

# decorate function
nested_test_lazyload=lazy_load_engine(nested_test)

# decorated function automatically loads engine on first call
result=nested_test_lazyload(test_matlab_matrix)

# you still have to close the engine at the end.
# if you shut down Python all its child processes are killed,
# including the MATLAB engine, but explicitly closing it is cleaner
_engine.quit()
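
Note that the decorator above always overrides `eng`, even when the caller passes one. If callers should be able to supply their own engine, one possible variant is sketched below. It is an assumption of this sketch rather than what the module does: `with_default_engine` is a hypothetical name, and a plain string stands in for a real engine so that the example runs without MATLAB.

```python
from functools import wraps


def with_default_engine(default_engine):
    """Decorator factory: pass `default_engine` as `eng` unless the caller supplies one."""
    def decorator(func):
        @wraps(func)  # keeps the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            # only fill in `eng` when the caller left it out or passed None
            if kwargs.get('eng') is None:
                kwargs['eng'] = default_engine
            return func(*args, **kwargs)
        return wrapper
    return decorator


@with_default_engine('default-engine')  # a string stands in for a real engine
def which_engine(x, eng=None):
    """Return the engine that was used, for demonstration."""
    return eng
```

Here `which_engine(1)` receives the default, while `which_engine(1, eng=my_engine)` keeps whatever the caller passed.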

All we have to do now is to put everything in the `__init__.py` file in the folder where the code is, and the module will load everything automatically. Check out the example in the [`test.ipynb`](https://github.com/ganileni/py_falcon/blob/master/examples/test.ipynb) notebook.

**Note:** what I actually did was to put two files in the [`../falcon`](https://github.com/ganileni/py_falcon/tree/master/falcon) directory. One is called [`wrapper.py`](https://github.com/ganileni/py_falcon/blob/master/falcon/wrapper.py), and contains all the machinery that is independent of the MATLAB module we're using. The other, [`__init__.py`](https://github.com/ganileni/py_falcon/blob/master/falcon/__init__.py), contains an import statement at the top:
```
from wrapper import *
```
and then only the actual function wrappers that are dependent on FALCON. So if you want to wrap another MATLAB module, just copy [`wrapper.py`](https://github.com/ganileni/py_falcon/blob/master/falcon/wrapper.py) into the folder where the code is, and then write a small wrapper for the MATLAB functions that you need in `__init__.py`.