This tutorial shows how to transform your data with the DataTransformer and MatrixTransformer classes, and how to write your own classes for data transformation.

## 1. MatrixTransformer

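```python
import numpy as np

from reskit.normalizations import mean_norm
from reskit.core import MatrixTransformer

# Three random 5x5 matrices and a label for each of them.
matrix_0 = np.random.rand(5, 5)
matrix_1 = np.random.rand(5, 5)
matrix_2 = np.random.rand(5, 5)
y = np.array([0, 0, 1])

# Stack the matrices into one array of shape (3, 5, 5).
X = np.array([matrix_0,
              matrix_1,
              matrix_2])

# Expected result: mean_norm applied to each matrix by hand.
output = np.array([mean_norm(matrix_0),
                   mean_norm(matrix_1),
                   mean_norm(matrix_2)])

# The same transformation done by MatrixTransformer.
result = MatrixTransformer(
            func=mean_norm).fit_transform(X)

# Compare the two results (with a floating-point tolerance).
np.allclose(output, result)
```

This returns `True`: MatrixTransformer applies `func` to every matrix in `X`, which gives the same result as applying `mean_norm` to each matrix by hand.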

If your data has a specific structure, it is useful and convenient to write your own function for processing it.
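As a sketch of such a function, the hypothetical `threshold_matrix` below zeroes every entry under a cut-off. The loop applies it to each matrix in a stack separately, mirroring the per-matrix result that `MatrixTransformer(func=mean_norm)` produced above. The function name and the 0.5 threshold are illustrative assumptions, and the code uses plain NumPy rather than reskit.

```python
import numpy as np


def threshold_matrix(matrix, threshold=0.5):
    """Hypothetical processing function: zero out entries below `threshold`."""
    result = matrix.copy()          # avoid modifying the caller's array
    result[result < threshold] = 0.0
    return result


# A stack of three 5x5 matrices, shaped like X in the MatrixTransformer example.
rng = np.random.default_rng(0)
X = rng.random((3, 5, 5))

# Apply the function to every matrix separately and restack the results.
X_thresholded = np.array([threshold_matrix(m) for m in X])

print(X_thresholded.shape)  # (3, 5, 5)
```

A function written this way takes one matrix and returns one matrix, which is the shape of `func` used with MatrixTransformer in the example above.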

## 2. DataTransformer

In some cases, it is useful to store additional information in X and use it to create the final feature set X.
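A minimal sketch of this idea, assuming X is a Python dictionary: the field names `matrices` and `ages` and the function `make_features` are hypothetical, chosen only for illustration. The function combines the upper triangle of each matrix with the extra per-sample value to build the final feature array. This is plain NumPy, not reskit's DataTransformer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_nodes = 3, 5

# Hypothetical dictionary-style X: the matrices plus additional per-sample information.
X = {
    "matrices": rng.random((n_subjects, n_nodes, n_nodes)),
    "ages": np.array([25.0, 31.0, 47.0]),
}


def make_features(X):
    """Build the final feature array from several fields of X."""
    # Indices of the strictly upper triangle (assuming symmetric matrices,
    # this holds every edge exactly once).
    rows, cols = np.triu_indices(X["matrices"].shape[1], k=1)
    # One row of edge values per sample: shape (n_subjects, 10) for 5 nodes.
    edges = X["matrices"][:, rows, cols]
    # Append the extra information as a final column.
    return np.hstack([edges, X["ages"][:, np.newaxis]])


features = make_features(X)
print(features.shape)  # (3, 11)
```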

Global and local functions can each have their own parameters. To pass a parameter to the global function, prefix its name with "global__". All other parameters you pass to DataTransformer are forwarded to the local function.
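The prefix rule can be illustrated with a small hypothetical helper, `split_params`, which separates `global__`-prefixed keyword arguments (with the prefix stripped) from everything else. It only sketches the routing convention described above; it is not reskit's code, and the parameter names in the usage line are made up.

```python
def split_params(params, prefix="global__"):
    """Split keyword parameters into (global_params, local_params)."""
    global_params = {}
    local_params = {}
    for name, value in params.items():
        if name.startswith(prefix):
            # Strip the prefix so the global function receives its own parameter name.
            global_params[name[len(prefix):]] = value
        else:
            # Anything without the prefix goes to the local function.
            local_params[name] = value
    return global_params, local_params


global_params, local_params = split_params({"global__threshold": 0.5, "k": 3})
print(global_params)  # {'threshold': 0.5}
print(local_params)   # {'k': 3}
```

A double underscore is a convenient separator because it rarely appears in ordinary parameter names; scikit-learn uses the same `step__param` style for pipeline parameters.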

## 3. Using transformers with Pipeliner

If X is a dictionary, we have to convert it into an array before using it in ordinary sklearn machine learning pipelines. Usually we choose just one field of the dictionary and use it as the X array, but sometimes we want to assemble the X array from several fields. For this case, DataTransformer has a collect parameter: if you pass it a list of fields of the X dictionary, DataTransformer stacks the arrays from these fields horizontally into one X array. For example, bag_of_edges and degrees features created for a set of graphs can be stacked into a single X array this way.
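Assuming each field holds one row per sample, the stacking step amounts to `np.hstack` over the selected fields. In the sketch below, the helper `collect_fields` and the way `bag_of_edges` and `degrees` are computed (upper-triangle edge weights and weighted node degrees of each graph) are illustrative assumptions, not reskit's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_graphs, n_nodes = 3, 4

# Symmetric weighted adjacency matrices for three hypothetical graphs.
A = rng.random((n_graphs, n_nodes, n_nodes))
A = (A + A.transpose(0, 2, 1)) / 2
A[:, np.arange(n_nodes), np.arange(n_nodes)] = 0.0  # no self-loops

rows, cols = np.triu_indices(n_nodes, k=1)
X = {
    "adjacency": A,
    "bag_of_edges": A[:, rows, cols],   # shape (3, 6): each edge weight once
    "degrees": A.sum(axis=2),           # shape (3, 4): weighted degree of each node
}


def collect_fields(X, fields):
    """Stack the 2-D arrays stored under `fields` side by side into one X array."""
    return np.hstack([X[field] for field in fields])


X_array = collect_fields(X, ["bag_of_edges", "degrees"])
print(X_array.shape)  # (3, 10)
```

Each row of `X_array` still describes one graph, so the result can be fed to any sklearn estimator that expects a 2-D feature matrix.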

## 4. Your own transformer

If you need more flexibility in transformation, you can implement your own transformer. The simplest example:

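```python
from sklearn.base import TransformerMixin
from sklearn.base import BaseEstimator


class MyTransformer(BaseEstimator, TransformerMixin):

    def __init__(self):
        # Store constructor parameters here, if there are any.
        pass

    def fit(self, X, y=None, **fit_params):
        #
        # Put code here if the transformer needs
        # to learn anything from the data.
        #
        # Usually nothing is needed here,
        # so just return self.
        #
        return self

    def transform(self, X):
        #
        # Put your transformation here.
        #
        return X
```

TransformerMixin provides `fit_transform`, and BaseEstimator provides `get_params` and `set_params`, so such a class can be used as a step in sklearn pipelines.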
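When a transformer does need to learn something, `fit` can store it as an attribute with a trailing underscore (the sklearn naming convention) and `transform` can reuse it. The hypothetical `TrainMeanNorm` below learns the mean of all training matrices and divides every matrix by that single value; it is a sketch, not part of reskit.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class TrainMeanNorm(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: divide matrices by the mean learned in fit()."""

    def fit(self, X, y=None, **fit_params):
        # Learn one scalar from the training data only.
        self.mean_ = np.mean(X)
        return self

    def transform(self, X):
        # Reuse the stored training mean, so test data is scaled the same way.
        return np.asarray(X) / self.mean_


X_train = np.full((2, 3, 3), 4.0)
X_test = np.full((1, 3, 3), 8.0)

norm = TrainMeanNorm().fit(X_train)
print(norm.transform(X_test)[0, 0, 0])  # 2.0
print(TrainMeanNorm().fit_transform(X_train)[0, 0, 0])  # 1.0
```

Learning the scale in `fit` and only applying it in `transform` keeps information from test data out of the fitted state, unlike a per-matrix function such as `mean_norm`, which needs nothing from `fit`.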