# Welcome to the algorithm creation tutorial!

The Algorithm class is a powerful tool that provides common ground for data
processing pipelines as well as persistence functionalities. This notebook will
cover the main features, functions and classes involved in the definition of
an algorithm through illustrative examples.

In this notebook you will learn:
    - What the ProcessingMethod class is
    - How to wrap functions and external classes in the ProcessingMethod class
    - How to define a processing pipeline
    - How to use the Algorithm class

Do not forget to check the documentation if you do not understand something!

## Introduction

Medusa has been designed to facilitate the implementation of signal processing
algorithms, meeting the needs of researchers and developers from different
fields. This includes not only cutting-edge, ready-to-use signal processing
methods, but also high-level features that ensure the persistence and
reproducibility of the algorithms created within Medusa. All of this is
achieved while maintaining interoperability with existing frameworks such as
sklearn, tensorflow, mne, etc. The Algorithm class meets these requirements,
providing a novel methodology to create, save and share complex data
processing algorithms.

## Imports

Import the modules that will be used in this notebook.

In [1]:
# General imports
import numpy as np

# External processing methods
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Medusa imports
import medusa.components as mds_cmp

## Data

First, we will create some data, simulating a set of features extracted from a
biosignal recording, together with the corresponding labels.

In [2]:
# Create train dataset
x_train = np.random.rand(1000).reshape((200, 5))
x_train[:100, :] = x_train[:100, :] + 1
x_train[100:, :] = x_train[100:, :] - 1

y_train = np.zeros((200,))
y_train[:100] = 1

# Create test dataset
x_test = np.random.rand(50).reshape((10, 5))
x_test[:5, :] = x_test[:5, :] + 1
x_test[5:, :] = x_test[5:, :] - 1

y_test = np.zeros((10,))
y_test[:5] = 1

print('Train set: ')
print('\tX shape: %s' % str(x_train.shape))
print('\ty shape: %s' % str(y_train.shape))

print('Test set: ')
print('\tX shape: %s' % str(x_test.shape))
print('\ty shape: %s' % str(y_test.shape))

print()
print('Test labels: %s' % str(y_test))

Train set: 
	X shape: (200, 5)
	y shape: (200,)
Test set: 
	X shape: (10, 5)
	y shape: (10,)

Test labels: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]


## Processing methods

The next step is to instantiate the methods that will compose the algorithm.
Only methods that inherit from medusa.components.ProcessingMethod can be added
to the Algorithm class. The Medusa framework includes a wide variety of
ready-to-use signal processing methods. In addition, function and class
wrappers are provided to ensure full interoperability with external packages.

To show these functionalities, we will implement 2 versions of linear
discriminant analysis (LDA) using the sklearn package. First, we will implement
a function to scale the data, just to show how to use ProcessingFuncWrapper.
Afterwards, we will use the ProcessingClassWrapper class to wrap the LDA class
in a ProcessingMethod. Finally, we will design our own wrapper for the sklearn
class by subclassing ProcessingMethod directly. Both LDA versions should give
the same results. In practice, the second approach is preferred over
ProcessingClassWrapper because, on very rare occasions, the automatic attribute
inheritance may cause unexpected problems.

An important point is the definition of the functions that each processing
method exposes to the algorithm. The Algorithm class needs to know which
functions of each method can be executed and what their outputs are. These
parameters are defined in the constructors of ProcessingFuncWrapper,
ProcessingClassWrapper and ProcessingMethod. Be careful: these definitions will
be used later to compile the processing pipelines!
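
To make the idea of exposed functions and named outputs more concrete, here is
a minimal, self-contained sketch. It does not use Medusa and is not how Medusa
implements its wrappers: the ExposedFunction class is a hypothetical helper
that only illustrates how a list of output names can turn the return value of
a function into a dictionary that later steps can refer to by key.

```python
class ExposedFunction:
    """Hypothetical helper (not part of Medusa): names the outputs of a callable."""

    def __init__(self, func, outputs):
        self.func = func
        self.outputs = list(outputs)

    def __call__(self, **kwargs):
        result = self.func(**kwargs)
        # No declared outputs: the call is used only for its side effects
        if not self.outputs:
            return {}
        # A single declared output receives the whole return value
        if len(self.outputs) == 1:
            return {self.outputs[0]: result}
        # Several declared outputs: one returned value per declared name
        if len(result) != len(self.outputs):
            raise ValueError('Expected %d outputs, got %d'
                             % (len(self.outputs), len(result)))
        return dict(zip(self.outputs, result))


def scale(x, n):
    # Same idea as the scale function of this notebook, on plain lists
    return [n * v for v in x]


scaler = ExposedFunction(scale, outputs=['x'])
scaled = scaler(x=[1.0, 2.0], n=2)
print(scaled)  # {'x': [2.0, 4.0]}
```

Declaring an empty list of outputs, as done for fit in this notebook, would
correspond here to a call whose return value is ignored.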

In [3]:
# 1. Data scaler
def scale(x, n):
    return n * x
scaler = mds_cmp.ProcessingFuncWrapper(scale, outputs=['x'])

# 2. LDA classifier from sklearn using ProcessingClassWrapper
lda1 = mds_cmp.ProcessingClassWrapper(LinearDiscriminantAnalysis(),
                                      fit=[], predict=['y'])

# 3. Wrapper for sklearn LDA classifier
class LDAWrapper(mds_cmp.ProcessingMethod):

    def __init__(self):
        super().__init__(fit=[], predict=['y'])
        self.clf = LinearDiscriminantAnalysis()

    def fit(self, x, y):
        self.clf.fit(x, y)

    def predict(self, x):
        return self.clf.predict(x)

lda2 = LDAWrapper()

print('OK!')

OK!


## Instantiate algorithm

Once the methods have been instantiated, we will create the algorithm and add
them, specifying a unique id for each one of them.

In [4]:
# Algorithm
alg = mds_cmp.Algorithm()

# Add methods
alg.add_method('scaler', scaler)
alg.add_method('lda1', lda1)
alg.add_method('lda2', lda2)

print('OK!')

OK!


## Design the processing pipelines

An algorithm can have several processing pipelines that can be executed
independently. Typical examples are separate pipelines for the fit and apply
stages of the methods, or the splitting of the algorithm into several
independent stages (e.g., preprocessing, feature extraction, feature selection
and feature classification). This gives the Algorithm class the flexibility to
meet the needs of researchers and developers. Remember that the pipeline itself
doesn't check the consistency of the connections. This is done by the
algorithm, which compiles the pipeline on the fly.
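
As a rough illustration of what checking the consistency of the connections
can mean, the following sketch validates a toy pipeline description. The data
layout and the check_connections function are assumptions made for this
example only. They do not reproduce the way Medusa compiles pipelines.

```python
def check_connections(steps):
    """Validate a toy pipeline; raise ValueError on the first problem found.

    Each step is a dict with two keys (hypothetical layout):
      'outputs': names of the outputs that the step exposes
      'inputs':  mapping of argument name -> (source step index, output name)
    """
    for uid, step in enumerate(steps):
        for arg, (src_uid, src_key) in step['inputs'].items():
            # A step may only read from steps defined before it
            if not 0 <= src_uid < uid:
                raise ValueError('Step %d, argument %r: invalid source step %d'
                                 % (uid, arg, src_uid))
            # The requested output must be exposed by the source step
            if src_key not in steps[src_uid]['outputs']:
                raise ValueError('Step %d, argument %r: step %d has no output %r'
                                 % (uid, arg, src_uid, src_key))
    return True


# A three-step chain: input -> scale -> predict
steps = [
    {'outputs': ['x'], 'inputs': {}},
    {'outputs': ['x'], 'inputs': {'x': (0, 'x')}},
    {'outputs': ['y'], 'inputs': {'x': (1, 'x')}},
]
print(check_connections(steps))  # True
```

Running a check of this kind before execution means that a mistyped output
key is reported at once instead of failing in the middle of a long run.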

In [5]:
# Train pipe
train_pipe = mds_cmp.Pipeline()
uid_0 = train_pipe.input(['x', 'y'])
uid_1 = train_pipe.add(method_func_key='scaler:scale',
                       x=train_pipe.conn_to(uid_0, 'x'), n=2)
# lda1 is fit on the scaled data, matching the input it receives in the test pipe
uid_2 = train_pipe.add(method_func_key='lda1:fit',
                       X=train_pipe.conn_to(uid_1, 'x'),
                       y=train_pipe.conn_to(uid_0, 'y'))
uid_3 = train_pipe.add(method_func_key='lda2:fit',
                       x=train_pipe.conn_to(uid_1, 'x'),
                       y=train_pipe.conn_to(uid_0, 'y'))

# Test pipe
test_pipe = mds_cmp.Pipeline()
uid_0 = test_pipe.input(['x'])
uid_1 = test_pipe.add(method_func_key='scaler:scale',
                      x=test_pipe.conn_to(uid_0, 'x'), n=2)
uid_2 = test_pipe.add(method_func_key='lda1:predict',
                      X=test_pipe.conn_to(uid_1, 'x'))
uid_3 = test_pipe.add(method_func_key='lda2:predict',
                      x=test_pipe.conn_to(uid_1, 'x'))


# Add pipelines
alg.add_pipeline('train', train_pipe)
alg.add_pipeline('test', test_pipe)

## Execute pipelines

Now, we will execute the pipelines. To do this, just call the exec_pipeline
method of Algorithm, providing the correct input keys and values. Remember
that the algorithm maintains its state between executions, so we can fit the
implemented classifiers with one pipeline and test them with another.

The exec_pipeline function returns 2 tuples. The first tuple contains the
processing method that has been applied in each step of the algorithm. The
second tuple contains the results of these steps, assigned to each of the
exposed outputs.
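
Since the two tuples are aligned step by step, they can be walked together
with zip. The sketch below uses mock data instead of a real exec_pipeline
call. The key strings and the content of the first (input) entry are
hypothetical placeholders, and only the idea of one dictionary of named
outputs per step is taken from this notebook.

```python
# Mock return values: the key strings are hypothetical placeholders
res_keys = ('input', 'scaler:scale', 'lda1:predict', 'lda2:predict')
res_vals = (
    {'x': [[0.5, 0.2]]},
    {'x': [[1.0, 0.4]]},
    {'y': [1.0]},
    {'y': [1.0]},
)


def collect_output(keys, values, name):
    """Return {step key: value} for every step that exposes output `name`."""
    # `v and ...` skips steps whose result is empty or None
    return {k: v[name] for k, v in zip(keys, values) if v and name in v}


predictions = collect_output(res_keys, res_vals, 'y')
print(predictions)  # {'lda1:predict': [1.0], 'lda2:predict': [1.0]}
```

Looking up results by output name in this way avoids hard-coding step indexes
when a pipeline gains or loses a step.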

In [6]:
# Execute pipelines
train_res_keys, train_res_val = alg.exec_pipeline('train', x=x_train, y=y_train)
test_res_keys, test_res_val = alg.exec_pipeline('test', x=x_test)

print()
print('Prediction of LDA 1: %s' % str(test_res_val[2]['y']))
print('Prediction of LDA 2: %s' % str(test_res_val[3]['y']))

Finished 'exec_pipeline' in 0.0041 secs
Finished 'exec_pipeline' in 0.0005 secs

Prediction of LDA 1: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]
Prediction of LDA 2: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]
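
To compare the predictions with the test labels, an accuracy score can be
computed. This is a minimal sketch in plain Python: the accuracy function is
a helper written for this example, and the values are copied from the outputs
printed in this notebook.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that are equal to the corresponding label."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError('predictions and labels must be non-empty '
                         'and have the same length')
    hits = sum(1 for p, t in zip(predictions, labels) if p == t)
    return hits / len(labels)


# Values copied from the outputs printed in this notebook
y_test = [1., 1., 1., 1., 1., 0., 0., 0., 0., 0.]
lda_pred = [1., 1., 1., 1., 1., 0., 0., 0., 0., 0.]

print('Accuracy: %.2f' % accuracy(lda_pred, y_test))  # Accuracy: 1.00
```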


## Persistence

The Algorithm class includes persistence options to save the algorithm in
its current state, including the fitted methods. Execute the next cell to save
the previous algorithm and load it again.
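
The sketch below illustrates the general idea of persisting a fitted state
with the pickle module of the Python standard library. The .pkl file name
used in this notebook suggests a pickle-style format, but that is an
assumption, and the ThresholdClassifier class is a hypothetical stand-in that
is not part of Medusa.

```python
import os
import pickle
import tempfile


class ThresholdClassifier:
    """Hypothetical stateful method: learns a decision threshold in fit()."""

    def __init__(self, threshold=None):
        self.threshold = threshold

    def fit(self, x, y):
        # The threshold is the midpoint between the two class means
        pos = [v for v, label in zip(x, y) if label == 1]
        neg = [v for v, label in zip(x, y) if label == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict(self, x):
        return [1 if v > self.threshold else 0 for v in x]


clf = ThresholdClassifier()
clf.fit([1.5, 1.2, -0.5, -0.8], [1, 1, 0, 0])

# Save the fitted state to a temporary file, then rebuild the classifier
path = os.path.join(tempfile.mkdtemp(), 'clf_state.pkl')
with open(path, 'wb') as f:
    pickle.dump({'threshold': clf.threshold}, f)
with open(path, 'rb') as f:
    restored = ThresholdClassifier(**pickle.load(f))

print(restored.predict([2.0, -1.0]))  # [1, 0]
```

The restored object predicts without being fitted again, which is the same
behaviour expected from a loaded algorithm. Note that pickle files should
only be loaded from trusted sources.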

In [7]:
# Save algorithm
alg.save('alg.pkl')

# Load algorithm
loaded_alg = mds_cmp.Algorithm.load('alg.pkl')
test_res_keys, test_res_val = loaded_alg.exec_pipeline('test', x=x_test)

print()
print('Prediction of LDA 1: %s' % str(test_res_val[2]['y']))
print('Prediction of LDA 2: %s' % str(test_res_val[3]['y']))

Finished 'exec_pipeline' in 0.0002 secs

Prediction of LDA 1: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]
Prediction of LDA 2: [1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]


## Conclusion

That's all for now! You now have an overview of the functions and classes
involved in the creation of Medusa algorithms. As you can see, Medusa lets you
build full signal processing pipelines in a flexible way with only a few lines
of code!

See you in the next tutorial.