# Implementing a Transform

A transformation is a function that is applied to a data set and generates a new, transformed data set.
In `librep`, transforms are classes based on the `scikit-learn` API.
Every transformation must inherit from the `librep.base.transform.Transform` abstract class and implement the `transform` method and, optionally, the `fit` method.

The `fit` method analyses the data set and extracts useful information that may be used by `transform`. It receives the following parameters:

- `X`: an array-like object with the samples
- `y` (optional): an array-like object with the label of each sample
- Additional parameters may be passed as keyword arguments.

The `fit` method returns a self-reference (*i.e.* `return self`).

The `transform` method transforms the samples and returns an array-like object with the transformed samples. It receives the parameter `X`, which is an array-like object with the samples.

Options for customizing the transformation must be passed to the class constructor, that is, the `__init__` method.
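
The contract described above can be sketched as follows. This is a minimal illustration, not `librep` code: the `Transform` class here is a simplified stand-in for `librep.base.transform.Transform` (its real `fit_transform` may differ), and `MeanCenter` with its `scale_by_std` option is a hypothetical transform invented for the example.

```python
import numpy as np


class Transform:
    """Simplified stand-in for librep.base.transform.Transform (assumption)."""

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X):
        raise NotImplementedError

    def fit_transform(self, X, y=None, **fit_params):
        self.fit(X, y, **fit_params)
        return self.transform(X)


class MeanCenter(Transform):
    """Hypothetical transform: subtracts the per-column mean learned in fit."""

    def __init__(self, scale_by_std: bool = False):
        # Customization options are received in the constructor
        self.scale_by_std = scale_by_std
        self.mean_ = None
        self.std_ = None

    def fit(self, X, y=None, **fit_params):
        # Extract the information that transform will need later
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self  # self-reference, so calls can be chained

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        centered = X - self.mean_  # builds a new array
        if self.scale_by_std:
            centered = centered / self.std_
        return centered


X = np.array([[1.0, 10.0], [3.0, 30.0]])
centered = MeanCenter().fit(X).transform(X)
print(centered)
```

Because `fit` returns `self`, the `fit(X).transform(X)` chain works without an intermediate variable.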

Transforms try to follow these principles:

- Non-deterministic transforms, that is, those that involve random computations, should always accept a `seed` parameter and use it to make the transform deterministic.
- Customization options must be passed via the constructor and used accordingly.
- Transforms assume that the data already comes in the desired format.
- The `transform` method does not alter the input object (`X`). It generates a new array with the transformed samples based on `X`.
- Optionally, transforms may implement the `__str__` and `__repr__` methods.
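
As an illustration of the seeding and no-mutation principles, here is a minimal sketch of a non-deterministic transform. `GaussianNoise` and its `scale` parameter are hypothetical names chosen for this example; in `librep` the class would inherit from `Transform`, which is left out here so the snippet runs on its own.

```python
import numpy as np


class GaussianNoise:  # in librep this would inherit from Transform
    """Hypothetical transform: adds Gaussian noise to every sample."""

    def __init__(self, scale: float = 0.1, seed: int = None):
        # Both options, including the seed, come in via the constructor
        self.scale = scale
        self.seed = seed

    def transform(self, X):
        # A generator created from the seed makes the output reproducible
        rng = np.random.default_rng(self.seed)
        X = np.asarray(X, dtype=float)
        noise = rng.normal(0.0, self.scale, size=X.shape)
        return X + noise  # new array, X itself is not modified

    def __repr__(self) -> str:
        return f"GaussianNoise(scale={self.scale}, seed={self.seed})"


X = np.zeros((3, 2))
first = GaussianNoise(seed=42).transform(X)
second = GaussianNoise(seed=42).transform(X)
print(np.array_equal(first, second))  # True
```

One design choice to be aware of: because the generator is created inside `transform`, repeated calls on the same object also produce the same noise. Creating it once in `__init__` would instead give a different, but still reproducible, sequence on each call.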

## A simple transform

Below is an example of a transformation called `MyTransform` that adds an integer `value` (passed as a parameter to the class constructor) to every sample of the dataset. It only implements the `transform` method.

In [2]:
# Default imports
import numpy as np

# Base class
from librep.base.transform import Transform
# Typing definitions (from input)
from librep.config.type_definitions import ArrayLike


class MyTransform(Transform):
    def __init__(self, value: int):
        self.value = value

    def transform(self, X: ArrayLike):
        datas = []
        # Iterate over each sample of X
        for x in X:
            summed_x = x + self.value
            datas.append(summed_x)
        return np.array(datas)

    # This text is returned when an object of this class is converted to a string (e.g. with str() or print()).
    def __str__(self) -> str:
        return f"MyTransform with: value={self.value}"

    def __repr__(self) -> str:
        return str(self)

We can instantiate our transform and test it on a synthetic dataset.

In [3]:
# Instantiating the transform object

my_transform = MyTransform(value=10)
my_transform

MyTransform with: value=10

In [4]:
# Let's generate synthetic data (4x4 matrix)

array = np.arange(16).reshape(4, 4)
array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [5]:
# Let's apply the transform

new_array = my_transform.fit_transform(array)
new_array

array([[10, 11, 12, 13],
       [14, 15, 16, 17],
       [18, 19, 20, 21],
       [22, 23, 24, 25]])
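
The explicit loop in `MyTransform.transform` can also be written as a single vectorized NumPy expression. The sketch below compares both forms on the same data, assuming `X` is (or can be converted to) a NumPy array; it does not use `librep` itself.

```python
import numpy as np

value = 10
array = np.arange(16).reshape(4, 4)

# Loop version, equivalent to the body of MyTransform.transform
looped = np.array([x + value for x in array])

# Vectorized version: broadcasting adds `value` to every element at once
vectorized = np.asarray(array) + value

print(np.array_equal(looped, vectorized))  # True
print(array[0])  # [0 1 2 3], the input is left unchanged
```

Both versions build a new array, so either one respects the rule that `transform` must not alter `X`.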

## A more complex transform

Let's implement a MinMax scaler. The `fit` method finds the minimum and maximum values of the dataset, and `transform` uses this information to scale the dataset, that is, for each sample `x`, it calculates: $ \frac{x - min}{max - min} $

In [6]:
class MinMaxTransform(Transform):
    def __init__(self):
        self.min_val = None
        self.max_val = None

    def fit(self, X: ArrayLike, y: ArrayLike = None):
        # Learn the global minimum and maximum of the dataset
        self.min_val = np.min(X)
        self.max_val = np.max(X)
        return self

    def transform(self, X: ArrayLike):
        # Scale the samples using the range found in fit
        return (X - self.min_val) / (self.max_val - self.min_val)

    def __str__(self) -> str:
        return "My MinMax Scaler"

    def __repr__(self) -> str:
        return str(self)

In [7]:
# Instantiating the transform object
minmax_transform = MinMaxTransform()
minmax_transform

My MinMax Scaler

In [8]:
# Let's generate synthetic data (4x4 matrix)

array = np.arange(16).reshape(4, 4)
array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [9]:
# Let's apply the transform

new_array = minmax_transform.fit_transform(array)
new_array

array([[0.        , 0.06666667, 0.13333333, 0.2       ],
       [0.26666667, 0.33333333, 0.4       , 0.46666667],
       [0.53333333, 0.6       , 0.66666667, 0.73333333],
       [0.8       , 0.86666667, 0.93333333, 1.        ]])
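
The dataset above has a minimum of 0, which hides the role of the minimum in the denominator. The sketch below checks the standard min-max definition, $ \frac{x - min}{max - min} $, on data whose minimum is not 0, and then applies the same fitted range to unseen samples. It uses a standalone helper, `min_max_scale`, which is a hypothetical function written for this example rather than part of `librep`.

```python
import numpy as np


def min_max_scale(X, min_val, max_val):
    """Scale X with a previously computed range (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    return (X - min_val) / (max_val - min_val)


# "fit": compute the range on the training data (values 10..25)
train = np.arange(10, 26).reshape(4, 4)
min_val, max_val = train.min(), train.max()

# "transform": the training data is mapped exactly onto [0, 1]
scaled = min_max_scale(train, min_val, max_val)
print(scaled.min(), scaled.max())  # 0.0 1.0

# Unseen samples outside the fitted range fall outside [0, 1]
unseen_scaled = min_max_scale(np.array([[5, 30]]), min_val, max_val)
print(unseen_scaled)
```

Note that a constant dataset (where the maximum equals the minimum) would lead to a division by zero; handling that case is left out of this sketch.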