In [None]:
# use this if you've just uploaded this notebook to Google Colaboratory
# a GPU runtime is recommended (TPU runtimes are not supported by the package yet)
!pip install neural-semigroups

If you have a Cayley database, you can build a machine learning model for the following task:

Given a partially filled Cayley table of a semigroup, restore the full one.

Note that a partially filled table can sometimes be completed to a full associative table in several ways. We consider all such solutions equally valid.
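
To make the ambiguity concrete, here is a minimal brute-force sketch in plain Python. It is not part of the package: the helper names `is_associative` and `associative_completions` are hypothetical, and the use of `-1` for an unknown cell is a convention chosen for this illustration.

```python
from itertools import product


def is_associative(table):
    """Check (x*y)*z == x*(y*z) for all triples of a full Cayley table."""
    n = len(table)
    return all(
        table[table[x][y]][z] == table[x][table[y][z]]
        for x, y, z in product(range(n), repeat=3)
    )


def associative_completions(partial):
    """Brute-force all associative fillings of a table; -1 marks an unknown cell."""
    n = len(partial)
    holes = [(i, j) for i in range(n) for j in range(n) if partial[i][j] == -1]
    found = []
    for values in product(range(n), repeat=len(holes)):
        table = [row[:] for row in partial]
        for (i, j), value in zip(holes, values):
            table[i][j] = value
        if is_associative(table):
            found.append(table)
    return found


# a two-element table with a single hidden cell
completions = associative_completions([[0, 0], [0, -1]])
```

Both possible values of the hidden cell give an associative table here, so this tiny puzzle already has two valid answers. The search tries $n^h$ fillings for $h$ hidden cells, so it is only practical for very small examples.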

In the `neural-semigroups` package, we use `torch` to build deep learning models.

First of all, we need some training and validation data.
In this example, we take semigroups of 5 elements and hold 100 Cayley tables (each representing a different class of equivalent semigroups) as our training data, and another 100 tables as validation data.
Training thus uses roughly 10% of all available tables of 5 elements (there are 1160 of them up to equivalence).

Here we construct `torch` data loaders that feed the training pipeline 512 tables at a time.
This number (the batch size) can be tuned to improve the model's quality.

In [None]:
from neural_semigroups.training_helpers import get_loaders

cardinality = 5
dropout_rate = 0.5
data_loaders = get_loaders(
    cardinality=cardinality,
    batch_size=512,
    train_size=100,
    validation_size=100,
    dropout_rate=dropout_rate
)

Note that for the training set we:
* take 100 representatives of different equivalence classes
* augment the data by adding all equivalent tables
* as a result, train on 16100 tables from 100 equivalence classes
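
The augmentation step can be sketched in plain Python. This assumes that "equivalent" means isomorphic or anti-isomorphic, that is, elements renamed by a permutation and optionally the table transposed. The function name `equivalent_tables` is hypothetical and is not the package's API.

```python
from itertools import permutations


def equivalent_tables(table):
    """Collect all distinct tables isomorphic or anti-isomorphic to `table`."""
    n = len(table)
    # transposing the table reverses the order of multiplication
    transposed = [[table[j][i] for j in range(n)] for i in range(n)]
    result = set()
    for source in (table, transposed):
        for perm in permutations(range(n)):
            renamed = [[0] * n for _ in range(n)]
            for i in range(n):
                for j in range(n):
                    # rename both arguments and the product with the same permutation
                    renamed[perm[i]][perm[j]] = perm[source[i][j]]
            result.add(tuple(map(tuple, renamed)))
    return result


# the two-element left-zero semigroup: x * y = x
left_zero = [[0, 0], [1, 1]]
augmented = equivalent_tables(left_zero)
```

Under this assumption a class of 5-element tables contains at most $2 \cdot 5! = 240$ distinct tables, and fewer when the semigroup has symmetries, which is compatible with the average of 161 tables per class implied by the figures above.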

For validation (for early stopping during training) we simply use 100 tables from different classes.

From each of the remaining 960 equivalence classes, one table goes into a test dataset on which the trained model is finally evaluated.

We model each input Cayley table as a three-index tensor $a_{ijk}$ such that

$a_{ijk}=P\left\{e_ie_j=e_k\right\}$

where $e_i$ are elements of a semigroup.

In our training data all $a_{ijk}$ are either zeros or ones, so the probability distributions involved are degenerate.

When we need to hide a cell with indices $i,j$ from an original Cayley table, we set, for every $k$,

$a_{ijk}=\dfrac1n$

where $n$ is the semigroup's cardinality. Thus the probability distribution of the product $e_ie_j$ becomes discrete uniform.
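
As an illustration, the encoding can be written out with nested lists. This is a sketch, not the package's implementation: `to_probabilities` is a hypothetical helper, and marking a hidden cell with `-1` is a convention assumed here.

```python
def to_probabilities(table):
    """Turn a Cayley table into a[i][j][k] = P{e_i e_j = e_k}; -1 marks a hidden cell."""
    n = len(table)
    uniform = [1 / n] * n
    return [
        [
            # a hidden cell becomes a uniform distribution, a known cell a one-hot one
            uniform[:] if table[i][j] == -1
            else [1.0 if k == table[i][j] else 0.0 for k in range(n)]
            for j in range(n)
        ]
        for i in range(n)
    ]


tensor = to_probabilities([[0, 0], [0, -1]])
known_cell = tensor[0][1]    # e_0 e_1 = e_0 is given
hidden_cell = tensor[1][1]   # e_1 e_1 is hidden
```

Here `known_cell` is the degenerate distribution `[1.0, 0.0]` and `hidden_cell` is the uniform distribution `[0.5, 0.5]`.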

We choose a simple denoising autoencoder as the architecture for our neural network. It takes an input tensor of zeros and ones, hides 50% of the input cells in the manner described above, and applies a linear transformation into a higher dimension ($n^5$, which runs contrary to the common bottleneck idea of autoencoders) followed by a `ReLU` non-linearity. Then another linear transformation to the same dimension with `ReLU` is applied, and a final one returns to the original $n^3$ dimension. Batch normalization is applied as well. See the package code for details.

In [None]:
from neural_semigroups.denoising_autoencoder import MagmaDAE
from neural_semigroups.constants import CURRENT_DEVICE

dae = MagmaDAE(
    cardinality=cardinality,
    hidden_dims=2 * [cardinality ** 5],
    dropout_rate=dropout_rate
)

In total, our model has about 20 million parameters.

In [None]:
sum(p.numel() for p in dae.parameters())

Since both the input and the output are probability distributions, we can use the KL divergence as a measure of their similarity and as the loss function for our network.

In [None]:
from torch.nn.functional import kl_div
from torch import Tensor
import torch

def loss(prediction: Tensor, target: Tensor) -> Tensor:
    # `kl_div` expects log-probabilities as its first argument
    return kl_div(torch.log(prediction), target, reduction="batchmean")
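
For a single cell, the quantity being summed can be worked out by hand. The sketch below is plain Python rather than `torch`, and the helper `kl_divergence` and the example distributions are illustrative only. It shows that for a one-hot target the divergence reduces to minus the logarithm of the probability assigned to the true product.

```python
from math import log


def kl_divergence(target, prediction):
    """D_KL(target || prediction) for two discrete distributions given as lists."""
    return sum(
        t * log(t / p)
        for t, p in zip(target, prediction)
        if t > 0  # terms with zero target probability contribute nothing
    )


target = [0.0, 1.0, 0.0, 0.0, 0.0]             # the true product is e_1
confident = [0.025, 0.9, 0.025, 0.025, 0.025]  # most mass on the right answer
undecided = [0.2] * 5                          # uniform guess

loss_confident = kl_divergence(target, confident)  # equals -log(0.9)
loss_undecided = kl_divergence(target, undecided)  # equals log(5)
```

A confident correct prediction is penalized far less than a uniform one. With `reduction="batchmean"`, `torch` sums such terms over all cells and divides by the batch size.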

In the next cells we run `tensorboard` to show the training and validation curves while the model trains.

Metrics on the test dataset are shown as single points (one for the whole dataset).

In [3]:
%reload_ext tensorboard

In [4]:
%tensorboard --logdir runs

In [None]:
from ignite.metrics.loss import Loss
from neural_semigroups.training_helpers import (
    associative_ratio,
    guessed_ratio,
    learning_pipeline,
)

dae = MagmaDAE(
    cardinality=cardinality,
    hidden_dims=2 * [cardinality ** 5],
    dropout_rate=dropout_rate
)
params = {"learning_rate": 0.0001, "epochs": 1000}
metrics = {
    "loss": Loss(loss),
    "associative_ratio": Loss(associative_ratio),
    "guessed_ratio": Loss(guessed_ratio)
}
learning_pipeline(params, dae, loss, metrics, data_loaders)
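
The two ratio metrics passed to the pipeline can be read as follows. This plain-Python sketch only illustrates the idea on ordinary nested lists. The names `associative_share` and `guessed_share` are hypothetical stand-ins, and the package's own `associative_ratio` and `guessed_ratio` work on tensors and may differ in details.

```python
from itertools import product


def is_associative(table):
    """Check (x*y)*z == x*(y*z) for all triples of a full Cayley table."""
    n = len(table)
    return all(
        table[table[x][y]][z] == table[x][table[y][z]]
        for x, y, z in product(range(n), repeat=3)
    )


def associative_share(predictions):
    """Fraction of predicted tables that are associative."""
    return sum(map(is_associative, predictions)) / len(predictions)


def guessed_share(predictions, targets):
    """Fraction of predicted tables identical to their targets."""
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(predictions)


targets = [[[0, 0], [0, 1]], [[0, 0], [0, 1]], [[0, 1], [1, 0]]]
predictions = [
    [[0, 0], [0, 1]],  # exact hit
    [[0, 0], [0, 0]],  # associative, but not the target
    [[1, 0], [0, 0]],  # not associative
]
```

In this toy batch two of three predictions are associative and one of three matches its target, so the first share is at least as large as the second whenever all targets are themselves associative.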

One can observe that although we ask the network to reproduce the input exactly, it is not very successful at this task (the final guessed ratio is about 10%). But the network clearly learns something about associativity (about 60% of the tables it generates on the test set are associative). That suggests concentrating not on guessing the undistorted input but on generating something associative.

A possible loss function to minimize is a special [associator loss](https://neural-semigroups.readthedocs.io/en/latest/package-documentation.html#associator-loss). When the network produces an output that differs from the input but is associative, the classical DAE loss penalizes it, but the associator loss does not.

In [None]:
from neural_semigroups.associator_loss import AssociatorLoss

def loss(prediction: Tensor, target: Tensor) -> Tensor:
    # `target` is ignored: only the associativity of the prediction matters
    return AssociatorLoss()(prediction)
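
One natural way to measure how far a probabilistic table is from being associative is sketched below in plain Python. It is an illustration of the idea only and not necessarily the formula the package uses (see the linked documentation for that). The names `associator_gap` and `one_hot` are hypothetical, and the products are treated as independent random variables, which is an assumption.

```python
def associator_gap(a):
    """Sum of squared differences between the distributions of
    (e_i e_j) e_k and e_i (e_j e_k) for a probabilistic table a[i][j][k]."""
    n = len(a)
    r = range(n)
    gap = 0.0
    for i in r:
        for j in r:
            for k in r:
                for l in r:
                    # P{(e_i e_j) e_k = e_l}, marginalizing over e_i e_j = e_m
                    left = sum(a[i][j][m] * a[m][k][l] for m in r)
                    # P{e_i (e_j e_k) = e_l}, marginalizing over e_j e_k = e_m
                    right = sum(a[j][k][m] * a[i][m][l] for m in r)
                    gap += (left - right) ** 2
    return gap


def one_hot(table):
    """Encode a full Cayley table as degenerate distributions."""
    n = len(table)
    return [[[1.0 if k == table[i][j] else 0.0 for k in range(n)]
             for j in range(n)] for i in range(n)]


gap_associative = associator_gap(one_hot([[0, 0], [0, 1]]))
gap_non_associative = associator_gap(one_hot([[1, 0], [0, 0]]))
```

The gap is exactly zero for the associative table and positive for the non-associative one. Since it depends only on the prediction, any associative output is left unpunished, whatever the original input was.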

In [None]:
dae = MagmaDAE(
    cardinality=cardinality,
    hidden_dims=2 * [cardinality ** 5],
    dropout_rate=dropout_rate
)
learning_pipeline(params, dae, loss, metrics, data_loaders)

Using the associator loss leads to better results. Moreover, the model starts guessing the original input more often, which is somewhat unexpected.

To see how its success depends on the choice of the 100 training tables, we repeat the whole pipeline 10 times.

In [None]:
for i in range(10):
    torch.manual_seed(i)
    data_loaders = get_loaders(
        cardinality=cardinality,
        batch_size=512,
        train_size=100,
        validation_size=100,
        dropout_rate=dropout_rate
    )
    dae = MagmaDAE(
        cardinality=cardinality,
        hidden_dims=2 * [cardinality ** 5],
        dropout_rate=dropout_rate
    )
    learning_pipeline(params, dae, loss, metrics, data_loaders)

We see that the model generalizes well (it was trained only on one tenth of all equivalence classes), although the overall quality depends on the data selected for training.

Now let's see how it works on several examples of puzzles. Let's take one of the real tables from the database.

In [None]:
from neural_semigroups import CayleyDatabase

cayley_db = CayleyDatabase(cardinality)
cayley_db.model = dae
cayley_db.database[1100]

Then we can put `-1` in some cells, creating a puzzle, and give it to the model.

In [None]:
guess, proba = cayley_db.fill_in_with_model([
  [-1, 0, 0, 0, -1],
  [0, -1, 1, 1, -1],
  [0, 1, -1, 1, -1],
  [0, 1, 1, -1, -1],
  [0, 1, 1, 1, -1]]
)

The model found a table that differs from the original one.

In [None]:
guess

But it is still a possible completion since it is associative.

In [None]:
from neural_semigroups import Magma

Magma(guess).is_associative
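
Strictly speaking, a valid completion must also agree with every cell that was given in the puzzle. A minimal check is sketched below in plain Python. `is_completion` is a hypothetical helper, both tables are assumed to be plain nested lists, and `candidate` is a made-up filled table standing in for the model's `guess`.

```python
def is_completion(puzzle, candidate):
    """True when `candidate` agrees with every known (non -1) cell of `puzzle`."""
    return all(
        known == -1 or known == filled
        for puzzle_row, candidate_row in zip(puzzle, candidate)
        for known, filled in zip(puzzle_row, candidate_row)
    )


puzzle = [
    [-1, 0, 0, 0, -1],
    [0, -1, 1, 1, -1],
    [0, 1, -1, 1, -1],
    [0, 1, 1, -1, -1],
    [0, 1, 1, 1, -1],
]
# a hypothetical filled table, used here instead of the model's output
candidate = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
agrees = is_completion(puzzle, candidate)
```

A table that passes both this check and the associativity check solves the puzzle, even if it differs from the table the puzzle was made from.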

The model also returns the probabilities of its guess. They can be examined in cases where the model errs.

In [None]:
proba
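
One way to examine such probabilities is to look for the cells where the model is least sure. The sketch below assumes they form an $n \times n \times n$ nested structure in which entry `[i][j][k]` is the probability that $e_ie_j=e_k$. That layout is an assumption about `proba`, the helper `least_certain_cells` is hypothetical, and the numbers are made up for illustration.

```python
def least_certain_cells(proba, count=3):
    """Return (confidence, i, j) for the cells whose top probability is lowest."""
    cells = [
        (max(distribution), i, j)
        for i, row in enumerate(proba)
        for j, distribution in enumerate(row)
    ]
    return sorted(cells)[:count]


# hypothetical probabilities for a two-element table
example = [
    [[0.9, 0.1], [0.99, 0.01]],
    [[0.8, 0.2], [0.55, 0.45]],
]
ranking = least_certain_cells(example, count=2)
```

In this toy example the cell `(1, 1)` comes first, since its best option has a probability of only 0.55. Such cells are the natural places to look when a filled table turns out not to be associative.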