# Pre- and post-processing operators

The `operators` module contains helpers to manipulate NumPy `ndarray` objects. These operators are useful for pre- or post-processing data. Note that they cannot handle PyTorch `Tensor` objects. To nest operators in a neural network, see the following file.

In [1]:
import os
import sys

sys.path.append(os.path.join(os.path.abspath(""), ".."))

import numpy as np
import pandas as pd

from torch import nn

from nnbma.networks import FullyConnected
from nnbma.operators import (
    log10,
    pow10,
    asinh,
    Normalizer,
    NormTypes,
    SequentialOperator,
)

from functions import Fexample as F

## Introductory examples

In [2]:
n_features = 5
n_entries = 10

mean = np.random.normal(0, 1, size=n_features)
std = np.abs(np.random.normal(0, 1, size=n_features)) + 1
x = np.random.normal(mean, std, size=(n_entries, n_features)).astype("float32")

print(f"Data shape: {x.shape}")
print(x)

Data shape: (10, 5)
[[ 3.4063318e+00 -1.1354405e+00  9.8771578e-01  7.0870233e-01
   1.7290703e+00]
 [ 2.1967297e+00 -1.9723588e+00  1.1380010e-03 -2.0811489e+00
   2.4284778e+00]
 [ 1.6692861e+00 -9.6962959e-01  1.2388496e+00  2.8925421e+00
   5.9123330e+00]
 [ 1.6724726e+00  1.3535130e+00  3.5991206e+00  6.4644545e-01
   1.2336152e+00]
 [ 8.2049608e-01 -1.2848368e+00 -6.3905917e-02 -1.0490260e+00
  -2.0751235e+00]
 [ 1.1257614e+00 -1.9223274e-01  1.5219371e+00 -2.8947442e+00
   2.0434897e-01]
 [ 1.9933865e+00  2.5302145e-01 -1.3949286e+00  2.5250411e+00
   1.4091047e+00]
 [ 1.1154927e+00 -1.2004021e+00 -7.3979467e-02  2.7410027e-01
  -2.2426999e+00]
 [-2.4341707e-01  9.9058968e-01  1.4949166e+00 -6.7498440e-01
   1.8345610e+00]
 [-6.9475919e-01 -2.5285777e-01  5.9505379e-01  2.4864352e+00
  -3.8048300e-01]]


### Rescaling

You can rescale your data, for instance with `log10` if the values span several orders of magnitude. Alternatively, you can use `asinh` if the data contains both positive and negative values (which is the case here).

In [3]:
y = log10(x)
print(y)

[[ 0.532287           nan -0.00536801 -0.14953615  0.23781267]
 [ 0.34177664         nan -2.9438577          nan  0.38533413]
 [ 0.2225308          nan  0.0930186   0.4612797   0.7717589 ]
 [ 0.22335902  0.13146244  0.55619645 -0.18946813  0.0911797 ]
 [-0.08592349         nan         nan         nan         nan]
 [ 0.05144635         nan  0.18239672         nan -0.6896276 ]
 [ 0.29959154 -0.5968427          nan  0.40226847  0.14894328]
 [ 0.04746673         nan         nan -0.5620906          nan]
 [        nan -0.0041062   0.17461695         nan  0.26353216]
 [        nan         nan -0.22544378  0.39557716         nan]]


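The `nan` entries correspond to negative inputs, for which `log10` is undefined. NumPy also emits a `RuntimeWarning` when this happens.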


In [4]:
y = asinh(x)
print(y)

[[ 1.9396644e+00 -9.7397798e-01  8.7266058e-01  6.5978122e-01
   1.3154666e+00]
 [ 1.5283064e+00 -1.4312052e+00  1.1380007e-03 -1.4793483e+00
   1.6203358e+00]
 [ 1.2851425e+00 -8.5973459e-01  1.0406084e+00  1.7839062e+00
   2.4772642e+00]
 [ 1.2867789e+00  1.1106617e+00  1.9926003e+00  6.0824162e-01
   1.0373164e+00]
 [ 7.4859309e-01 -1.0691719e+00 -6.3862495e-02 -9.1561884e-01
  -1.4767357e+00]
 [ 9.6756536e-01 -1.9106807e-01  1.2068704e+00 -1.7846254e+00
   2.0295283e-01]
 [ 1.4406739e+00  2.5039664e-01 -1.1350307e+00  1.6564913e+00
   1.1432626e+00]
 [ 9.6072841e-01 -1.0162306e+00 -7.3912151e-02  2.7077913e-01
  -1.5471891e+00]
 [-2.4107517e-01  8.7470382e-01  1.1919401e+00 -6.3205260e-01
   1.3671030e+00]
 [-6.4836782e-01 -2.5023797e-01  5.6457895e-01  1.6421815e+00
  -3.7185386e-01]]
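
To illustrate why `asinh` suits signed data, here is a minimal sketch that uses plain NumPy functions (`np.log10`, `np.arcsinh`) instead of the `nnbma` operators, on the assumption that the operators apply the same element-wise functions. It checks that `log10` yields `nan` for negative inputs, while `asinh` stays finite, is odd, is close to the identity near zero, and grows like a logarithm for large magnitudes.

```python
import numpy as np

# Values spanning several orders of magnitude, with both signs.
x = np.array([-1e4, -10.0, -0.1, 0.1, 10.0, 1e4])

# log10 is undefined for negative inputs: NumPy returns NaN there.
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning
    y_log = np.log10(x)

# asinh is defined on the whole real line and is an odd function.
y_asinh = np.arcsinh(x)

# For large |x|, asinh(x) is close to sign(x) * ln(2|x|), so it compresses
# large magnitudes much as a logarithm does, while keeping the sign.
approx = np.sign(x) * np.log(2 * np.abs(x))

print(np.isnan(y_log))
print(y_asinh)
print(approx)
```

The approximation is only meaningful for the large-magnitude entries. Near zero, `asinh(x)` is approximately `x`, so small values are left almost unchanged.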


### Normalization

You will usually want to normalize your data. The following options are available:
- `NONE`: No normalization
- `MEAN0`: Center the columns, i.e., set their means to 0
- `STD1`: Scale the columns, i.e., set their variances to 1
- `MEAN0STD1`: Center and scale the columns, i.e., set their means to 0 and their variances to 1
- `MIN0MAX1`: Apply a MinMax normalization, i.e., set the minimum value of each column to 0 and the maximum to 1
- `MIN1MAX1`: Apply an alternative MinMax normalization, i.e., set the minimum value of each column to -1 and the maximum to 1
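
The options above can be sketched with the textbook column-wise formulas in plain NumPy. This is an illustration of what each option is described as doing, not the `Normalizer` implementation. In particular, using the sample standard deviation (`ddof=1`, the pandas default) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 5))

# Column-wise statistics (axis=0 reduces over the rows).
mean = x.mean(axis=0)
std = x.std(axis=0, ddof=1)  # assumption: sample standard deviation
xmin, xmax = x.min(axis=0), x.max(axis=0)

mean0 = x - mean                                # MEAN0
std1 = x / std                                  # STD1
mean0std1 = (x - mean) / std                    # MEAN0STD1
min0max1 = (x - xmin) / (xmax - xmin)           # MIN0MAX1
min1max1 = 2 * (x - xmin) / (xmax - xmin) - 1   # MIN1MAX1

print(mean0std1.mean(axis=0))
print(min1max1.min(axis=0), min1max1.max(axis=0))
```

The min-max variants divide by `xmax - xmin`, so a constant column would need special handling. The sketch does not cover that case.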

In [5]:
norm = Normalizer(pd.DataFrame(x), norm_type=NormTypes.MIN1MAX1)

y = norm(x)
print(y)

[[ 1.00000000e+00 -4.96722460e-01 -4.58065867e-02  2.45297432e-01
  -2.59339809e-02]
 [ 4.10107136e-01 -1.00000000e+00 -4.40907955e-01 -7.18833566e-01
   1.45593882e-01]
 [ 1.52886152e-01 -3.97012770e-01  5.47666550e-02  1.00000000e+00
   1.00000000e+00]
 [ 1.54440045e-01  1.00000000e+00  1.00000000e+00  2.23782420e-01
  -1.47443056e-01]
 [-2.61047721e-01 -5.86561322e-01 -4.66956556e-01 -3.62147272e-01
  -9.58902359e-01]
 [-1.12177372e-01  7.04717636e-02  1.68136597e-01 -1.00000000e+00
  -3.99867833e-01]
 [ 3.10941696e-01  3.38223577e-01 -1.00000000e+00  8.72997165e-01
  -1.04404747e-01]
 [-1.17185175e-01 -5.35786867e-01 -4.70990777e-01  9.51055288e-02
  -1.00000000e+00]
 [-7.79891670e-01  7.81757474e-01  1.57315493e-01 -2.32884109e-01
  -6.27040863e-05]
 [-1.00000000e+00  3.40151787e-02 -2.03058541e-01  8.59655499e-01
  -5.43296337e-01]]


In [6]:
norm = Normalizer(pd.DataFrame(x), norm_type=NormTypes.MEAN0STD1)

y = norm(x)
print(y)

[[ 1.771317   -0.6508456   0.1475014   0.21452552  0.30665794]
 [ 0.75111127 -1.4352963  -0.5907225  -1.1924846   0.60300183]
 [ 0.30625358 -0.4954296   0.33541664  1.3159049   2.079136  ]
 [ 0.3089411   1.6820718   2.1015303   0.18312742  0.0967301 ]
 [-0.409635   -0.79087603 -0.6393927  -0.6719524  -1.305206  ]
 [-0.1521674   0.23323111  0.54724175 -1.6028064  -0.33937728]
 [ 0.5796071   0.6505717  -1.6353533   1.1305625   0.17108625]
 [-0.16082823 -0.7117347  -0.6469304  -0.00465803 -1.3762091 ]
 [-1.3069632   1.3419006   0.52702314 -0.4833114   0.3513551 ]
 [-1.6876354   0.17640676 -0.14631471  1.1110923  -0.5871748 ]]


## Embedding within a network

These operators can be attached before and after a network. This is convenient because users who did not train the network do not need to know how the data was preprocessed.

In [7]:
preprocessing = SequentialOperator([asinh, norm])
postprocessing = pow10

net = FullyConnected(
    [n_features, 10, 10, 1],
    nn.ReLU(),
    inputs_transformer=preprocessing,
    outputs_transformer=postprocessing,
)
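
Conceptually, chaining operators amounts to function composition: each operator is applied to the output of the previous one. The sketch below shows the idea with a hypothetical `Chain` class and a hypothetical `standardize` helper. Neither is part of `nnbma`, and `SequentialOperator` may differ in its details.

```python
import numpy as np


class Chain:
    """Hypothetical stand-in: applies a list of callables in order."""

    def __init__(self, operators):
        self.operators = list(operators)

    def __call__(self, x):
        for op in self.operators:
            x = op(x)
        return x


def standardize(x):
    # Hypothetical helper: center and scale each column of x.
    return (x - x.mean(axis=0)) / x.std(axis=0)


pre = Chain([np.arcsinh, standardize])

x = np.array([[-100.0, 1.0], [0.0, 10.0], [100.0, 100.0]])
y = pre(x)  # same as standardize(np.arcsinh(x))
print(y)
```

Order matters in such a chain. In the cell above, `norm` was built from the raw `x` but is applied after `asinh`. If the normalization statistics are meant to describe the transformed data, the normalizer would presumably need to be built from `asinh(x)` instead.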

Usually, you can evaluate the network as if it were a function (`net(x)`). By default, however, this does not apply the pre- and post-processing. To apply them, call the `evaluate` method, which behaves much like a direct call but accepts additional options:

In [8]:
y = net.evaluate(x, transform_inputs=True, transform_outputs=True)
print(y)

[[1.5116196 ]
 [1.5584499 ]
 [1.4138592 ]
 [1.7413094 ]
 [1.9702724 ]
 [2.2404044 ]
 [0.88946015]
 [1.695295  ]
 [1.290986  ]
 [1.5530138 ]]
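
The effect of the two flags can be pictured as wrapping the model call between the input and output transforms. The following sketch uses a hypothetical free function `evaluate` and a toy model. It only mirrors the behavior described above and is not the `nnbma` implementation.

```python
import numpy as np


def evaluate(model, x, pre=None, post=None,
             transform_inputs=False, transform_outputs=False):
    """Hypothetical sketch: call a model with optional transforms."""
    if transform_inputs and pre is not None:
        x = pre(x)
    y = model(x)
    if transform_outputs and post is not None:
        y = post(y)
    return y


def model(x):
    # Toy stand-in for a trained network: sums the features of each row.
    return x.sum(axis=1, keepdims=True)


def pow10_sketch(y):
    # Hypothetical equivalent of a power-of-ten post-processing step.
    return 10.0 ** y


x = np.array([[0.0, 1.0], [2.0, -2.0]])

raw = evaluate(model, x, pre=np.arcsinh, post=pow10_sketch)
full = evaluate(model, x, pre=np.arcsinh, post=pow10_sketch,
                transform_inputs=True, transform_outputs=True)
print(raw)
print(full)
```

With a power of ten as the output transform, the transformed outputs are always positive, which matches the values printed above.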
