In [None]:
# run this cell if you have just uploaded this notebook to Google Colaboratory
# a GPU runtime is preferable (the package does not support TPU runtimes yet)

!pip install neural-semigroups

In [12]:
# this is a simple example for semigroups of n=4 elements

cardinality = 4

In [13]:
# we want to build a neural network for such a task:
# given n equations between n or fewer variables
# reconstruct a semigroup of n elements satisfying these equations
# (or we can reformulate:
# given n cells from a Cayley table
# fill in the unknown cells to get an associative Cayley table
# )

# dropout rate is the fraction of Cayley table cells to hide

dropout_rate = 1 - 1 / cardinality
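
The choice of rate above can be checked with a little arithmetic: a Cayley table has $n^2$ cells, so hiding a fraction $1 - 1/n$ of them leaves $n$ cells visible, matching the "given n cells" formulation. The sketch below is only illustrative; the helper name `known_cells` is hypothetical, and it assumes the rate is read as the fraction of cells hidden.

```python
from fractions import Fraction


def known_cells(cardinality: int) -> Fraction:
    """Number of cells left visible when a fraction 1 - 1/n is hidden."""
    total_cells = cardinality ** 2
    # exact rational arithmetic avoids floating-point rounding
    dropout_rate = 1 - Fraction(1, cardinality)
    return total_cells * (1 - dropout_rate)


print(known_cells(4))  # 4
```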

We model each input Cayley table as a three-index tensor $a_{ijk}$ such that

$a_{ijk}=P\left\{e_ie_j=e_k\right\}$

where $e_i$ are elements of a semigroup.

In our training data all $a_{ijk}$ are either zeros or ones, so probability distributions involved are degenerate.

When we need to hide the cell with indices $i,j$ of an original Cayley table, we set

$a_{ijk}=\dfrac1n$

for every $k$, where $n$ is the semigroup's cardinality. This makes the probability distribution of the product $e_ie_j$ discrete uniform.
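
The encoding described above can be illustrated in plain Python. This is a minimal sketch and not the package's implementation: the helpers `to_cube` and `hide_cell` are hypothetical names, and addition modulo 4 is used only as a convenient associative table.

```python
from typing import List

Table = List[List[int]]
Cube = List[List[List[float]]]


def to_cube(table: Table) -> Cube:
    """One-hot encode a Cayley table: cube[i][j][k] = P{e_i e_j = e_k}."""
    n = len(table)
    return [
        [[1.0 if table[i][j] == k else 0.0 for k in range(n)] for j in range(n)]
        for i in range(n)
    ]


def hide_cell(cube: Cube, i: int, j: int) -> None:
    """Replace the distribution of e_i e_j with the discrete uniform one."""
    n = len(cube)
    cube[i][j] = [1.0 / n] * n


# addition modulo 4, chosen purely as an illustration
table = [[(i + j) % 4 for j in range(4)] for i in range(4)]
cube = to_cube(table)
hide_cell(cube, 1, 2)
print(cube[1][1])  # [0.0, 0.0, 1.0, 0.0]
print(cube[1][2])  # [0.25, 0.25, 0.25, 0.25]
```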

In [14]:
from neural_semigroups import Magma
from neural_semigroups.utils import corrupt_input

# this function helps us to formulate our main task
# x is a full associative Cayley table
def transform(x):
    # we want our network to be independent of isomorphisms
    # consider this a case for "data augmentation"
    # (applying symmetries to input data to enrich them
    # and exploit underlying symmetries of the data domain of origin)
    new_y = Magma(
        Magma(x[0]).random_isomorphism()
    ).probabilistic_cube
    # new_y is the full table, new_x is a partial table
    new_x = corrupt_input(
        new_y.view(1, cardinality, cardinality, cardinality),
        dropout_rate=dropout_rate
    ).view(cardinality, cardinality, cardinality)
    return new_x, new_y
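
To make the augmentation step concrete, here is a minimal sketch of what relabelling a Cayley table by a permutation does. It is not the package's `random_isomorphism`; the helper names `apply_permutation`, `random_relabelling` and `is_associative` are hypothetical, and the example table (addition modulo 4) is an assumption made for illustration.

```python
import random
from typing import List

Table = List[List[int]]


def is_associative(table: Table) -> bool:
    """Check (e_i e_j) e_k == e_i (e_j e_k) for all triples."""
    n = len(table)
    return all(
        table[table[i][j]][k] == table[i][table[j][k]]
        for i in range(n) for j in range(n) for k in range(n)
    )


def apply_permutation(table: Table, perm: List[int]) -> Table:
    """Relabel elements: e_i becomes e_perm[i] in rows, columns and values."""
    n = len(table)
    new_table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            new_table[perm[i]][perm[j]] = perm[table[i][j]]
    return new_table


def random_relabelling(table: Table, rng: random.Random) -> Table:
    """Apply a uniformly random permutation of the element labels."""
    perm = list(range(len(table)))
    rng.shuffle(perm)
    return apply_permutation(table, perm)


table = [[(i + j) % 4 for j in range(4)] for i in range(4)]
relabelled = random_relabelling(table, random.Random(0))
print(is_associative(relabelled))  # True
```

Relabelling changes the table entries but not the algebraic structure, which is why associativity survives any permutation.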

In [15]:
# we use catalogues of semigroups from smallsemi package
# https://www.gap-system.org/Packages/smallsemi.html

from neural_semigroups.smallsemi_dataset import Smallsemi

data = Smallsemi(
    root=".",
    download=True,
    cardinality=cardinality,
    transform=transform
)

In [18]:
from torch.utils.data.dataset import random_split
from torch.utils.data import DataLoader

# for this case we split all available data into three subsets:
# for training, validating after each epoch and for testing the final model
data_size = len(data)
test_size = len(data) // 3
data_loaders = tuple(
    DataLoader(data_split, batch_size=32)
    for data_split
    in random_split(data, [data_size - 2 * test_size, test_size, test_size])
)

A possible choice of loss function is a special [associator loss](https://neural-semigroups.readthedocs.io/en/latest/package-documentation.html#associator-loss). When the network produces an output that differs from the original (uncorrupted) table but is associative, the classical DAE loss penalizes it, while the associator loss does not.

In [4]:
from neural_semigroups.associator_loss import AssociatorLoss
from torch import Tensor

# the target is deliberately ignored: only the prediction is scored
def loss(prediction: Tensor, target: Tensor) -> Tensor:
    return AssociatorLoss()(prediction)
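
The idea behind scoring associativity alone can be sketched without the package. The function below is an assumption for illustration and not necessarily the formula `AssociatorLoss` uses: it treats the cells as independent distributions, computes $P\{(e_ie_j)e_k=e_m\}$ and $P\{e_i(e_je_k)=e_m\}$ from a probabilistic cube, and averages their squared difference. The names `to_cube` and `associator_discrepancy` are hypothetical.

```python
from typing import List

Table = List[List[int]]
Cube = List[List[List[float]]]


def to_cube(table: Table) -> Cube:
    """One-hot encode a Cayley table: cube[i][j][k] = P{e_i e_j = e_k}."""
    n = len(table)
    return [
        [[1.0 if table[i][j] == k else 0.0 for k in range(n)] for j in range(n)]
        for i in range(n)
    ]


def associator_discrepancy(cube: Cube) -> float:
    """Mean squared gap between the two bracketings of e_i e_j e_k."""
    n = len(cube)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for m in range(n):
                    # P{(e_i e_j) e_k = e_m}
                    left = sum(cube[i][j][l] * cube[l][k][m] for l in range(n))
                    # P{e_i (e_j e_k) = e_m}
                    right = sum(cube[j][k][l] * cube[i][l][m] for l in range(n))
                    total += (left - right) ** 2
    return total / n ** 4


addition = [[(i + j) % 4 for j in range(4)] for i in range(4)]
subtraction = [[(i - j) % 4 for j in range(4)] for i in range(4)]
print(associator_discrepancy(to_cube(addition)))         # 0.0
print(associator_discrepancy(to_cube(subtraction)) > 0)  # True
```

No reference table appears anywhere in this computation, so any associative output scores zero, whichever semigroup it represents.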

We choose a (denoising) autoencoder as the architecture of our neural network. It takes a corrupted input tensor and applies four linear transformations that preserve the dimension (unlike a common autoencoder, which narrows it), each followed by a `ReLU` non-linearity and batch normalization: two to 'encode' and two to 'decode'. See the package code for details.

One might consider these $n^3\rightarrow n^3$ transformations as basis changes in a free algebra on the elements of a semigroup.

In [22]:
from neural_semigroups import MagmaDAE

dae = MagmaDAE(
    cardinality=cardinality,
    hidden_dims=2 * [cardinality ** 3],
    dropout_rate=dropout_rate
)

In [23]:
dae

MagmaDAE(
  (encoder_layers): Sequential(
    (linear00): Linear(in_features=64, out_features=64, bias=True)
    (relu00): ReLU()
    (bn00): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (linear01): Linear(in_features=64, out_features=64, bias=True)
    (relu01): ReLU()
    (bn01): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (decoder_layers): Sequential(
    (linear10): Linear(in_features=64, out_features=64, bias=True)
    (relu10): ReLU()
    (bn10): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (linear11): Linear(in_features=64, out_features=64, bias=True)
    (relu11): ReLU()
    (bn11): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

This architecture is relatively light, with fewer than 20K weights.

In [28]:
sum(p.numel() for p in dae.parameters())

17152
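
The total can be reproduced by hand from the layer listing printed above: each of the four blocks has a `Linear(64, 64)` layer with a weight matrix and a bias vector, plus a `BatchNorm1d(64)` layer with a learnable scale and shift (its running statistics are buffers, not parameters). The helper names in this sketch are hypothetical.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus bias vector of a fully connected layer."""
    return in_features * out_features + out_features


def batch_norm_params(num_features: int) -> int:
    """Learnable scale and shift of an affine batch normalization layer."""
    return 2 * num_features


width = 4 ** 3  # cardinality ** 3 = 64 features per layer
blocks = 4      # two 'encoder' blocks and two 'decoder' blocks
total = blocks * (linear_params(width, width) + batch_norm_params(width))
print(total)  # 17152
```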

In [7]:
%load_ext tensorboard

In [8]:
%tensorboard --logdir runs

In [29]:
!rm -rf runs

In [30]:
from neural_semigroups.training_helpers import learning_pipeline
from ignite.metrics.loss import Loss
from neural_semigroups.training_helpers import associative_ratio, guessed_ratio

params = {"learning_rate": 0.001, "epochs": 1000}
metrics = {
    "loss": Loss(loss),
    "associative_ratio": Loss(associative_ratio),
    "guessed_ratio": Loss(guessed_ratio)
}
learning_pipeline(params, dae, loss, metrics, data_loaders)

As a result, the network produced an associative table in about 60% of the test cases. By contrast, filling in the unknown cells at random, given only a handful of known ones, is highly unlikely to yield an associative Cayley table.