# Polynomial expansion of inputs

This notebook introduces the `PolynomialExpansion` module, which builds polynomial features from its inputs in a differentiable way.

In [1]:
import os
import sys

sys.path.append(os.path.join(os.path.abspath(''), ".."))

import torch

from nnbma.networks import FullyConnected, PolynomialNetwork
from nnbma.layers import PolynomialExpansion

## `PolynomialExpansion` module

`PolynomialExpansion` is a torch `Module` that creates all (non-constant) monomials of a set of inputs up to a given degree. For instance, for three inputs and degree 2, we have:

$\mathrm{poly}((x_1,\,x_2,\,x_3)) = (x_1,\,x_2,\,x_3,\,x_1^2,\,x_1x_2,\,x_1x_3,\,x_2^2,\,x_2x_3,\,x_3^2)$.

Here's the corresponding expansion:

In [2]:
input_features = 3
order = 2

layer = PolynomialExpansion(input_features, order)

x = torch.tensor([2., 3., 5.]) # Must have x.shape[-1] = input_features
print("Input:", x)

y = layer(x)
print("Output:", y)

Input: tensor([2., 3., 5.])
Output: tensor([ 2.,  3.,  5.,  4.,  6., 10.,  9., 15., 25.])
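
The ordering of the monomials shown above can be reproduced in plain Python. The following is a minimal sketch, not the package's implementation: `poly_expand` is a hypothetical helper, and it assumes that monomials are listed degree by degree in the order produced by `itertools.combinations_with_replacement`, which happens to match the output above.

```python
from itertools import combinations_with_replacement
from math import prod


def poly_expand(x, degree):
    """Return all non-constant monomials of the entries of x up to `degree`."""
    features = []
    for d in range(1, degree + 1):
        # Each index tuple (i_1 <= ... <= i_d) identifies one monomial of degree d.
        for idx in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in idx))
    return features


print(poly_expand([2.0, 3.0, 5.0], 2))
# [2.0, 3.0, 5.0, 4.0, 6.0, 10.0, 9.0, 15.0, 25.0]
```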


Like every module in this package, it works with batched inputs along the leading axes:

In [3]:
x = torch.tensor([
    [
        [2., 3., 5.],
        [1., 1., 1.],
    ], [
        [1., 0., 0.],
        [0., 2., 3.],
    ]
]) # Must have x.shape[-1] = input_features
print("Input:", x)

y = layer(x)
print("Output:", y)

Input: tensor([[[2., 3., 5.],
         [1., 1., 1.]],

        [[1., 0., 0.],
         [0., 2., 3.]]])
Output: tensor([[[ 2.,  3.,  5.,  4.,  6., 10.,  9., 15., 25.],
         [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]],

        [[ 1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  2.,  3.,  0.,  0.,  0.,  4.,  6.,  9.]]])


Unlike its classical use as a preprocessing step, this expansion is fully differentiable with respect to its inputs, so it can be integrated into a neural network. If the expansion were instead applied before the network, differentiating the network with respect to its inputs would yield derivatives with respect to the expanded features rather than the real inputs.

In [4]:
x = torch.tensor([2., 3., 5.], requires_grad=True) # Must have x.shape[-1] = input_features
print("Input:", x)

y = layer(x)
print("Output:", y)

Input: tensor([2., 3., 5.], requires_grad=True)
Output: tensor([ 2.,  3.,  5.,  4.,  6., 10.,  9., 15., 25.],
       grad_fn=<SqueezeBackward1>)
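
To make the difference between differentiating with respect to the real inputs and with respect to the expanded features concrete, here is a minimal pure-Python sketch of the chain rule through the expansion. It is an illustration under stated assumptions rather than the package's code: `monomial_indices`, `jacobian` and `grad_wrt_inputs` are hypothetical helpers, and the downstream function is simply a weighted sum of the expanded features.

```python
from itertools import combinations_with_replacement
from math import prod


def monomial_indices(n_features, degree):
    """Index tuples of all non-constant monomials up to `degree`."""
    return [
        idx
        for d in range(1, degree + 1)
        for idx in combinations_with_replacement(range(n_features), d)
    ]


def jacobian(x, degree):
    """J[k][j] = d(monomial_k) / d(x_j), obtained with the product rule."""
    rows = []
    for idx in monomial_indices(len(x), degree):
        row = []
        for j in range(len(x)):
            # One term per occurrence of x_j in the monomial:
            # drop that factor and multiply the remaining ones.
            row.append(sum(
                prod(x[i] for p, i in enumerate(idx) if p != q)
                for q, i_q in enumerate(idx) if i_q == j
            ))
        rows.append(row)
    return rows


def grad_wrt_inputs(x, weights, degree):
    """Gradient of f(x) = sum_k weights[k] * monomial_k(x) w.r.t. the real inputs."""
    jac = jacobian(x, degree)
    return [sum(w * row[j] for w, row in zip(weights, jac)) for j in range(len(x))]


x = [2.0, 3.0, 5.0]
weights = [1.0] * 9  # gradient of f with respect to the 9 expanded features

print(grad_wrt_inputs(x, weights, degree=2))  # [13.0, 14.0, 16.0]
```

With unit weights, the gradient with respect to the expanded features is a vector of ones, while the gradient with respect to the real inputs has first component $1 + 2x_1 + x_2 + x_3 = 13$ at this point.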


## `PolynomialNetwork` module

`PolynomialNetwork` is a convenience class that places a `PolynomialExpansion` at the input of a network inheriting from `NeuralNetwork`.

In [5]:
subnet = FullyConnected(
    # `expanded_features` gives the number of polynomial features that the
    # subnetwork receives as input, given the number of real input features
    # and the maximum order.
    [PolynomialExpansion.expanded_features(order, input_features), 10, 10, 1],
    torch.nn.ReLU(),
)

net = PolynomialNetwork(
    input_features,
    order,
    subnet,
)
net.eval()
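
The input width of the subnetwork follows from a standard counting argument: there are $\binom{n+d}{d}$ monomials of degree at most $d$ in $n$ variables, one of which is the constant. The sketch below computes that count in plain Python and cross-checks it by enumeration. `count_expanded_features` is a hypothetical helper written for illustration; it is not the package's `expanded_features`.

```python
from itertools import combinations_with_replacement
from math import comb


def count_expanded_features(n_features, degree):
    """Number of non-constant monomials of degree <= `degree` in `n_features` variables."""
    return comb(n_features + degree, degree) - 1


def count_by_enumeration(n_features, degree):
    """Same count, obtained by listing every monomial explicitly."""
    return sum(
        1
        for d in range(1, degree + 1)
        for _ in combinations_with_replacement(range(n_features), d)
    )


print(count_expanded_features(3, 2), count_by_enumeration(3, 2))  # 9 9
print(count_expanded_features(2, 2), count_by_enumeration(2, 2))  # 5 5
```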

In [6]:
x = torch.tensor([2., 3., 5.], requires_grad=True) # Must have x.shape[-1] = input_features
print("Input:", x)

y = net(x)
print("Output:", y)

Input: tensor([2., 3., 5.], requires_grad=True)
Output: tensor([0.0873], grad_fn=<SqueezeBackward1>)


## Standardization of polynomial features

The outputs of a `PolynomialExpansion` are not standardized, even for standardized inputs. For example, if an input feature $x$ is standardized (meaning that $\mu_{x}=0$ and $\sigma_{x}=1$), then $x^2$ is no longer standardized, since $\mu_{x^2}=\sigma_{x}^2+\mu_{x}^2=1$ and, in general, $\sigma_{x^2}\neq1$.

There is no general closed-form expression for the moments of the polynomial features, because they depend on the distribution of the data. They therefore have to be estimated from the data.
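
The dependence on the distribution can be checked numerically. The sketch below, which uses only the Python standard library, draws two standardized samples (one Gaussian, one uniform) and estimates the mean and standard deviation of $x^2$ for each. The helper `moments`, the seed and the sample size are arbitrary choices made for this illustration.

```python
import random
from math import sqrt


def moments(values):
    """Return the mean and the population standard deviation of a list."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, sqrt(var)


rng = random.Random(0)
n_samples = 100_000

# Two distributions with zero mean and unit variance.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
half_width = sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
uniform = [rng.uniform(-half_width, half_width) for _ in range(n_samples)]

for name, sample in [("gaussian", gaussian), ("uniform", uniform)]:
    mean_sq, std_sq = moments([v * v for v in sample])
    print(f"{name}: mean(x^2) = {mean_sq:.2f}, std(x^2) = {std_sq:.2f}")
```

In both cases the mean of $x^2$ is close to 1, but the theoretical standard deviation is $\sqrt{2}\approx1.41$ for the Gaussian and $\sqrt{0.8}\approx0.89$ for the uniform distribution, so the same standardized input leads to differently scaled squared features.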

In [7]:
x = torch.normal(0, torch.ones(100, 2))
order = 2

layer = PolynomialExpansion(x.size(-1), order)
print(layer.means, layer.stds, sep="\n")

Parameter containing:
tensor([0., 0., 0., 0., 0.])
Parameter containing:
tensor([1., 1., 1., 1., 1.])


In [8]:
print(x.mean(dim=0), layer(x).mean(dim=0))

tensor([0.0141, 0.0136]) tensor([ 1.4075e-02,  1.3608e-02,  1.3235e-02, -6.1836e-05,  9.3748e-03])


In [9]:
layer.update_standardization(x, reset=True)
print(layer.means, layer.stds, sep="\n")

Parameter containing:
tensor([ 1.4075e-02,  1.3608e-02,  1.3235e-02, -6.1836e-05,  9.3748e-03])
Parameter containing:
tensor([0.1142, 0.0959, 0.0212, 0.0111, 0.0129])


For large datasets, the standardization can also be updated batch by batch:

In [10]:
layer.update_standardization(x[:50], reset=True)
print(layer.means, layer.stds, "", sep="\n")
layer.update_standardization(x[50:])
print(layer.means, layer.stds, sep="\n")

Parameter containing:
tensor([0.0241, 0.0224, 0.0118, 0.0014, 0.0101])
Parameter containing:
tensor([0.1058, 0.0980, 0.0140, 0.0113, 0.0143])

Parameter containing:
tensor([ 1.4075e-02,  1.3608e-02,  1.3235e-02, -6.1836e-05,  9.3748e-03])
Parameter containing:
tensor([0.1142, 0.0959, 0.0212, 0.0111, 0.0129])
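
A batch-wise update of this kind is possible because a mean and a standard deviation can be merged exactly from per-batch statistics. The following pure-Python sketch shows one standard way to do so, by merging counts, means and sums of squared deviations. It is only an illustration: `RunningMoments` is a hypothetical class, it uses the population (biased) standard deviation, and it makes no claim about how `update_standardization` is implemented internally.

```python
from math import sqrt


class RunningMoments:
    """Accumulate the mean and standard deviation of a scalar feature over batches."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, batch):
        n_b = len(batch)
        if n_b == 0:
            return
        mean_b = sum(batch) / n_b
        m2_b = sum((v - mean_b) ** 2 for v in batch)

        # Merge the statistics of the new batch with those already accumulated.
        total = self.count + n_b
        delta = mean_b - self.mean
        self.mean += delta * n_b / total
        self.m2 += m2_b + delta ** 2 * self.count * n_b / total
        self.count = total

    @property
    def std(self):
        return sqrt(self.m2 / self.count)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

full = RunningMoments()
full.update(data)

batched = RunningMoments()
batched.update(data[:3])
batched.update(data[3:])

print(full.mean, batched.mean)  # 3.5 3.5
print(abs(full.std - batched.std) < 1e-12)  # True
```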


Note that `PolynomialNetwork` also provides a convenience method named `update_standardization`.