# Welcome to the algorithm creation tutorial!

The Algorithm class is a powerful tool that provides common ground for data
processing pipelines as well as persistence functionality. If you need to
define and distribute standalone algorithms that are fully compatible with
Medusa native methods and third-party packages, this is your tutorial.
This notebook covers the main features, functions and classes involved in
the definition of an algorithm through illustrative examples.

In this notebook you will learn:
- What the ProcessingMethod class is
- How to wrap functions and external classes in the ProcessingMethod class
- How to define a processing pipeline
- How to create an algorithm

Do not forget to check the documentation if you do not understand something!

## Introduction

Medusa has been designed to facilitate the implementation of signal processing
algorithms, meeting the needs of researchers and developers from different
fields. This includes not only cutting-edge, ready-to-use signal processing
methods, but also high-level features that ensure the persistence and
reproducibility of the algorithms created within Medusa, all while remaining
interoperable with existing frameworks such as sklearn and mne. The Algorithm
class meets these requirements, providing a unified methodology to create, save
and share complex data processing algorithms.

## Imports

Import the modules that will be used in this notebook

In [1]:
# General imports
import numpy as np

# External processing methods
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Medusa imports
from medusa import components

## Data

First, we will create some data that simulates a set of features extracted
from a biosignal recording, together with the corresponding labels.

In [2]:
# Create train dataset
x_train = np.random.rand(1000).reshape((200, 5))
x_train[:100, :] = x_train[:100, :] + 1
x_train[100:, :] = x_train[100:, :] - 1

y_train = np.zeros((200,))
y_train[:100] = 1

# Create test dataset
x_test = np.random.rand(50).reshape((10, 5))
x_test[:5, :] = x_test[:5, :] + 1
x_test[5:, :] = x_test[5:, :] - 1

y_test = np.zeros((10,))
y_test[:5] = 1

print('Train set: ')
print('\tX shape: %s' % str(x_train.shape))
print('\ty shape: %s' % str(y_train.shape))

print('Test set: ')
print('\tX shape: %s' % str(x_test.shape))
print('\ty shape: %s' % str(y_test.shape))

print()
print('Test labels: %s' % str(y_test))

Train set: 
	X shape: (200, 5)
	y shape: (200,)
Test set: 
	X shape: (10, 5)
	y shape: (10,)

Test labels: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]


## Processing methods

The next step is to instantiate the methods that will compose the algorithm.
Only methods that inherit from medusa.components.ProcessingMethod can be added
to the Algorithm class. The Medusa framework includes a wide variety of
ready-to-use signal processing methods. Nevertheless, function and class
wrappers have also been designed to ensure full interoperability with external
packages.

To show these functionalities, we will implement a simple algorithm based on a
linear discriminant analysis (LDA) using the sklearn package. First, we will
implement a function to scale the input data, just to show how to use the
ProcessingFuncWrapper class. Afterwards, we will use ProcessingClassWrapper to
wrap the LDA class from sklearn in a ProcessingMethod. In practice, you could
also create your own wrapper class inheriting from ProcessingMethod, but this
would prevent you from distributing the algorithm in a standalone fashion: you
would need to distribute the code of the class along with the algorithm file.

An important issue is the definition of the functions that each processing
method exposes to the algorithm. The Algorithm class needs to know which
functions of each method can be executed and what their outputs are. We have
to define these parameters in the constructors of ProcessingFuncWrapper,
ProcessingClassWrapper and ProcessingMethod. Be careful: these definitions will
be used later to compile the processing pipelines!
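
To make the idea of exposed functions and named outputs more concrete, here is
a minimal sketch in plain Python. It is not Medusa's implementation: the class
ExposedMethodSketch, its call method and the toy Doubler class are hypothetical
names introduced only for illustration. It assumes that a wrapper stores, for
each exposed function, the list of names given to its return values.

```python
class ExposedMethodSketch:
    """Hypothetical stand-in (not Medusa code) for a wrapper that declares
    which functions can be called and how their outputs are named."""

    def __init__(self, target, **exposed):
        # exposed maps a function name to the list of its output names,
        # e.g. fit=[] (no outputs) or predict=['y'] (one output called 'y')
        self.target = target
        self.exposed = exposed

    def call(self, func_name, **kwargs):
        # Only functions declared in the constructor can be executed
        if func_name not in self.exposed:
            raise KeyError('Function %s is not exposed' % func_name)
        out = getattr(self.target, func_name)(**kwargs)
        names = self.exposed[func_name]
        if not names:
            return {}
        # Normalize a single return value to a tuple before naming outputs
        values = out if isinstance(out, tuple) else (out,)
        if len(values) != len(names):
            raise ValueError('Expected %i outputs, got %i'
                             % (len(names), len(values)))
        return dict(zip(names, values))


class Doubler:
    """Toy class with one exposed and one non-exposed function."""

    def apply(self, x):
        return [2 * v for v in x]

    def reset(self):
        pass


wrapped = ExposedMethodSketch(Doubler(), apply=['x'])
result = wrapped.call('apply', x=[1, 2, 3])
print(result)
```

Because every output has a name, a later step can refer to it unambiguously,
which is what the pipeline connections rely on.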

In [3]:
# 1. Data scaler
def scale(x, n):
    # Multiply the input features by a constant factor n
    return n * x


scaler = components.ProcessingFuncWrapper(scale, outputs=['x'])

# 2. LDA classifier from sklearn using ProcessingClassWrapper
lda = components.ProcessingClassWrapper(LinearDiscriminantAnalysis(),
                                        fit=[], predict=['y'])

print('OK!')

OK!


## Instantiate algorithm

Once the methods have been instantiated, we will create the algorithm and add
them, specifying a unique id for each one of them.

In [4]:
# Algorithm
alg = components.Algorithm()

# Add methods
alg.add_method('scaler', scaler)
alg.add_method('lda', lda)

print('OK!')

OK!


## Design the processing pipelines

An algorithm can have several processing pipelines that can be executed
independently. Typical examples are separate fit and apply stages for the
methods, or the division of the algorithm into several independent stages
(e.g., preprocessing, feature extraction, feature selection and feature
classification). This gives the Algorithm class the flexibility to meet the
needs of researchers and developers. Remember that the pipeline itself
doesn't check the consistency of the connections. This is done by the
algorithm, which compiles the pipeline when the add_pipeline function is called.
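
The kind of consistency check performed at compile time can be illustrated
with a small sketch in plain Python. This is only an assumption about what
such a check might involve, not Medusa's actual compilation code: the function
find_connection_errors and the data layout used here are hypothetical. Each
step is described by its method-function key and by the connections it takes
from earlier steps.

```python
def find_connection_errors(steps, declared_outputs):
    """Return a list of messages describing inconsistent connections.

    steps: list of (func_key, connections), where connections is a list of
        (source_step_index, output_name) tuples. Step 0 is the input.
    declared_outputs: dict mapping each func_key to its output names.
    """
    errors = []
    for uid, (func_key, connections) in enumerate(steps):
        if func_key not in declared_outputs:
            errors.append('step %i: unknown function %s' % (uid, func_key))
        for src_uid, out_name in connections:
            # A step can only use outputs of steps that run before it
            if src_uid < 0 or src_uid >= uid:
                errors.append('step %i: invalid source step %i'
                              % (uid, src_uid))
                continue
            src_key = steps[src_uid][0]
            if out_name not in declared_outputs.get(src_key, []):
                errors.append("step %i: %s has no output '%s'"
                              % (uid, src_key, out_name))
    return errors


declared_outputs = {
    'input': ['x'],
    'scaler:scale': ['x'],
    'lda:predict': ['y'],
}
good_pipe = [
    ('input', []),
    ('scaler:scale', [(0, 'x')]),
    ('lda:predict', [(1, 'x')]),
]
bad_pipe = [
    ('input', []),
    ('scaler:scale', [(0, 'x')]),
    ('lda:predict', [(1, 'y')]),  # the scaler exposes 'x', not 'y'
]

good_errors = find_connection_errors(good_pipe, declared_outputs)
bad_errors = find_connection_errors(bad_pipe, declared_outputs)
print(good_errors)
print(bad_errors)
```

Checking the connections once, when the pipeline is added, means that mistakes
surface before any data is processed.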

In [5]:
# Train pipe
train_pipe = components.Pipeline()
uid_0 = train_pipe.input(['x', 'y'])
uid_1 = train_pipe.add(method_func_key='scaler:scale',
                       x=train_pipe.conn_to(uid_0, 'x'), n=2)
# Fit the LDA on the scaled features, as the test pipeline does
uid_2 = train_pipe.add(method_func_key='lda:fit',
                       X=train_pipe.conn_to(uid_1, 'x'),
                       y=train_pipe.conn_to(uid_0, 'y'))
# Test pipe
test_pipe = components.Pipeline()
uid_0 = test_pipe.input(['x'])
uid_1 = test_pipe.add(method_func_key='scaler:scale',
                      x=test_pipe.conn_to(uid_0, 'x'), n=2)
uid_2 = test_pipe.add(method_func_key='lda:predict',
                      X=test_pipe.conn_to(uid_1, 'x'))


# Add pipelines
alg.add_pipeline('train', train_pipe)
alg.add_pipeline('test', test_pipe)

## Execute pipelines

Now, we will execute the pipelines. To do this, just call the exec_pipeline
method of Algorithm, providing the correct input keys and values. Remember
that the algorithm maintains its state between executions, so we can fit the
implemented classifier in one pipeline and test it in another.

Function exec_pipeline returns an ordered dict with the results of each step
of the pipeline. Each position is a dict with the method-function key, the
results assigned to each of the exposed outputs, and a simple performance
analysis that, for the moment, only measures the execution time.
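
The shape of such a result can be sketched with the standard library alone.
The following is an illustrative assumption, not Medusa's implementation: the
function run_steps and the toy steps are hypothetical, and the entry name
'method_func_key' is a guess. The keys 'res', 'perf' and 'run_time' are the
ones used in the cell that follows.

```python
import time
from collections import OrderedDict


def run_steps(steps, **inputs):
    """Run the steps in order and collect results and timings per position."""
    results = OrderedDict()
    # Position 0 holds the pipeline inputs
    results[0] = {'method_func_key': 'input',
                  'res': dict(inputs),
                  'perf': {'run_time': 0.0}}
    for uid, (key, func, out_name) in enumerate(steps, start=1):
        t0 = time.perf_counter()
        value = func(results)  # each step reads what it needs from results
        run_time = time.perf_counter() - t0
        results[uid] = {'method_func_key': key,
                        'res': {out_name: value},
                        'perf': {'run_time': run_time}}
    return results


# Toy steps: scale the input, then threshold the scaled values
steps = [
    ('scaler:scale',
     lambda r: [2 * v for v in r[0]['res']['x']], 'x'),
    ('threshold:predict',
     lambda r: [float(v > 1) for v in r[1]['res']['x']], 'y'),
]

res = run_steps(steps, x=[0.2, 0.9])
print(res[2]['res']['y'])
print('Run time of step 1: %.6f s' % res[1]['perf']['run_time'])
```

Indexing by position and then by 'res' or 'perf' mirrors how the results are
read in this notebook.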

In [6]:
# Execute pipelines
train_res = alg.exec_pipeline('train', x=x_train, y=y_train)
test_res = alg.exec_pipeline('test', x=x_test)

print()
print('Execution time lda:fit: %.4f s' % train_res[2]['perf']['run_time'])
print('Result of lda_predict: %s' % str(test_res[2]['res']['y']))


Execution time lda:fit: 0.0024 s
Result of lda_predict: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]


## Persistence

The Algorithm class includes persistence options to save the algorithm in
its current state. Medusa uses dill as its serialization tool and thus shares
the advantages and disadvantages of that tool.

It is possible to come across classes that are not directly serializable with
dill (e.g., keras models). In such cases, override the methods
'to_pickleable_obj' and 'from_pickleable_obj' of the ProcessingMethod class.
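
The idea behind these two methods can be sketched as follows. This is a
hedged illustration, not Medusa code: the class LockedCounter is hypothetical,
the signatures shown (an instance method returning a dict and a classmethod
rebuilding the object) are assumptions that may differ from the real
ProcessingMethod interface, and the standard pickle module stands in for dill.

```python
import pickle
import threading


class LockedCounter:
    """Hypothetical method that holds a lock, which cannot be pickled."""

    def __init__(self, count=0):
        self.count = count
        self.lock = threading.Lock()

    def to_pickleable_obj(self):
        # Keep only plain data; the lock is left out
        return {'count': self.count}

    @classmethod
    def from_pickleable_obj(cls, obj):
        # Rebuild the instance, which creates a fresh lock
        return cls(count=obj['count'])


counter = LockedCounter(count=3)

# Direct serialization fails because of the lock attribute
try:
    pickle.dumps(counter)
    direct_ok = True
except (TypeError, pickle.PicklingError):
    direct_ok = False

# Serializing the plain representation works
payload = pickle.dumps(counter.to_pickleable_obj())
restored = LockedCounter.from_pickleable_obj(pickle.loads(payload))
print(direct_ok, restored.count)
```

The same pattern applies to any object whose state can be reduced to plain
data and rebuilt on load.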

Execute the next cell to save and load the previous algorithm.

In [7]:
# Save algorithm
alg.save('alg.pkl')

# Load algorithm
loaded_alg = components.Algorithm.load('alg.pkl')
test_res = loaded_alg.exec_pipeline('test', x=x_test)

print()
print('Execution time lda:fit: %.4f s' % train_res[2]['perf']['run_time'])
print('Result of lda_predict: %s' % str(test_res[2]['res']['y']))


Execution time lda:fit: 0.0024 s
Result of lda_predict: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]


## Standalone algorithms

Congratulations! The file alg.pkl in your working directory contains a
standalone version of our mini-example. To load and use it in a different
script or machine, use the following code:

    >>> from medusa import components
    >>> alg = components.Algorithm.load('alg.pkl')

Standalone algorithms are very useful for developers and scientists who design
ad hoc algorithms for a certain problem, database, etc., and want to share them
quickly and easily.

Remember that an algorithm can only be distributed as a single file if all of
its methods are accessible on the destination machine.
For example, our tiny example can only be loaded in Python environments that
have sklearn installed. This shouldn't be a problem, even for the most
complex examples, given the huge number of data processing packages available
nowadays. Additionally, note that dill is able to deserialize functions from
scratch, which means that you don't need the code of our little function
'scale'.
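
Before loading a shared algorithm file, it can be useful to verify that the
packages it depends on are importable on the destination machine. The sketch
below uses only the standard library. The helper missing_packages is a
hypothetical name, and the list of required packages is something you would
have to supply yourself (for this tutorial, presumably 'medusa' and
'sklearn').

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# 'json' ships with Python; the second name is deliberately made up
required = ['json', 'package_that_does_not_exist_xyz']
missing = missing_packages(required)
print(missing)
```

If the returned list is not empty, install the missing packages before calling
the load function.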

In the rare case that the available packages and dill functionality don't suit
your needs, you have two options to distribute your algorithm: distribute your
code along with the algorithm file, or publish your own package on PyPI so that
your methods can be easily installed on any computer.

## Conclusion

That's all for now! You now have a comprehensive picture of the functions and
classes involved in the creation of Medusa algorithms. As you can see, Medusa
lets you build full signal processing pipelines in a flexible and easy way with
just a few lines of code!

See you in the next tutorial!