In [None]:
# use this if you've just uploaded this notebook to Google Colaboratory
# don't forget to restart your runtime after the package installation
# better use a GPU runtime (TPU ones are not supported by the package yet)
!pip install neural-semigroups

Given a database of Cayley tables, you can train a machine learning model for the following task:

Given a partially filled Cayley table of a semigroup, restore the full one.

Note that a partially filled table can sometimes be completed to a full associative table in several ways. We consider all such completions equally valid.
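
To make the ambiguity concrete, here is a minimal sketch in plain `numpy` (the helper `is_associative` is our own illustration, not part of `neural-semigroups`). It checks associativity of a Cayley table and shows a two-element puzzle whose hidden cell admits two valid completions.

```python
import numpy as np


def is_associative(table: np.ndarray) -> bool:
    """Check (a*b)*c == a*(b*c) for every triple of a Cayley table."""
    left = table[table, :]   # left[a, b, c]  = table[table[a, b], c]
    right = table[:, table]  # right[a, b, c] = table[a, table[b, c]]
    return bool(np.array_equal(left, right))


# A two-element puzzle with the bottom-right cell hidden:
#   [[0, 0],
#    [0, ?]]
completions = [np.array([[0, 0], [0, value]]) for value in (0, 1)]
valid = [is_associative(t) for t in completions]
print(valid)  # [True, True]
```

Both completions are associative, so either one counts as a solution of this puzzle.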

The `neural-semigroups` package uses `torch` to build deep learning models.

First of all, we need to get some training and validation data.
In this example, we take semigroups of 5 elements and hold out 100 Cayley tables (each representing a different class of equivalent semigroups) as training data and another 100 tables as validation data.
There are 1160 such tables up to equivalence, so we train on roughly a tenth of them, and the remaining 960 tables form the test set.

Here we construct `DataLoaders` for `torch` which will feed a training pipeline with 512 tables at a time.
The batch size is a hyperparameter that can be tuned to improve the model's quality.

In [1]:
from neural_semigroups.training_helpers import get_loaders

cardinality = 5
data_loaders = get_loaders(
    cardinality=cardinality,
    batch_size=512,
    train_size=100,
    validation_size=100
)

augmenting by equivalent tables: 100%|██████████| 100/100 [00:00<00:00, 221.84it/s]
generating train cubes: 100%|██████████| 15190/15190 [00:00<00:00, 45048.70it/s]
generating validation cubes: 100%|██████████| 100/100 [00:00<00:00, 36276.63it/s]
generating test cubes: 100%|██████████| 960/960 [00:00<00:00, 43487.29it/s]


Note that for the training set we:
* take 100 representatives of different equivalence classes
* augment the data by adding all equivalent tables
* as a result, train on 15190 tables from 100 equivalence classes (see the `generating train cubes` counter above)

For validation we simply use 100 tables from different classes.

We model each input Cayley table as a three-index tensor $a_{ijk}$ such that

$a_{ijk}=P\left\{e_ie_j=e_k\right\}$

where $e_i$ are elements of a semigroup.

In our training data all $a_{ijk}$ are either zeros or ones, so the probability distributions involved are degenerate.

When we need to hide the cell with indices $i,j$ of an original Cayley table, we set

$a_{ijk}=\dfrac1n$ for all $k$,

where $n$ is the semigroup's cardinality. This makes the distribution of the product $e_ie_j$ discrete uniform.
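
The encoding above can be sketched in a few lines of `numpy`. The function `table_to_cube` is a hypothetical helper written for this illustration (the package has its own conversion code). It assumes hidden cells are marked with `-1`.

```python
import numpy as np


def table_to_cube(table) -> np.ndarray:
    """Turn an n x n Cayley table (with -1 for hidden cells) into a_{ijk}."""
    table = np.asarray(table)
    n = table.shape[0]
    # start from the uniform distribution 1/n in every cell
    cube = np.full((n, n, n), 1.0 / n)
    # known cells get a degenerate (one-hot) distribution
    known = table >= 0
    cube[known] = np.eye(n)[table[known]]
    return cube


puzzle = [[0, 0], [0, -1]]
cube = table_to_cube(puzzle)
print(cube[0, 0])  # one-hot distribution of a known cell
print(cube[1, 1])  # uniform distribution of the hidden cell
```

Every slice `cube[i, j]` sums to one, so each cell holds a proper probability distribution over the possible products.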

We choose a simple denoising autoencoder (DAE) as the architecture of our neural network. It takes an input tensor of zeros and ones, hides 50% of the input cells in the manner described above, and applies a linear transformation into a higher dimension ($n^5$, contrary to the usual autoencoder idea of a bottleneck) followed by a `ReLU` non-linearity. Another linear transformation to the same dimension with `ReLU` follows, and a final one returns to the original $n^3$ dimension. Batch normalization is applied as well. See the package code for details.
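
The corruption step can be illustrated as follows. This is a sketch under an assumption: each cell is hidden independently with probability equal to the corruption rate. The package may instead hide an exact share of cells, and `corrupt` is our own name, not a package function.

```python
import numpy as np


def corrupt(cube: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Replace the distributions of randomly chosen cells by the uniform one."""
    n = cube.shape[0]
    hidden = rng.random((n, n)) < rate  # assumption: independent choice per cell
    corrupted = cube.copy()
    corrupted[hidden] = 1.0 / n
    return corrupted


rng = np.random.default_rng(0)
table = np.array([[0, 0], [0, 1]])
cube = np.eye(2)[table]  # one-hot encoding of a full table
noisy = corrupt(cube, rate=0.5, rng=rng)
```

The network then has to reconstruct the original degenerate distributions from `noisy`.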

In [2]:
from neural_semigroups import MagmaDAE
from neural_semigroups.constants import CURRENT_DEVICE

dae = MagmaDAE(
    cardinality=cardinality,
    hidden_dims=[
        cardinality ** 5,
        cardinality ** 5
    ],
    corruption_rate=0.5
)

In total, our model has about 20 million parameters.

In [3]:
sum(p.numel() for p in dae.parameters())

20341000
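
As a back-of-the-envelope check, the number above can be reproduced with plain arithmetic. The layer layout below is an assumption that happens to match the count exactly: four linear maps $n^3 \to n^5 \to n^5 \to n^5 \to n^3$, each followed by a batch normalization layer with a learnable scale and shift. It has not been checked against the package source, which remains the authoritative reference.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batch_norm_params(n_features: int) -> int:
    """Learnable scale and shift of a batch normalization layer."""
    return 2 * n_features


n = 5
# assumed layer widths (hypothetical, see the note above)
dims = [n ** 3, n ** 5, n ** 5, n ** 5, n ** 3]
total = sum(
    linear_params(n_in, n_out) + batch_norm_params(n_out)
    for n_in, n_out in zip(dims, dims[1:])
)
print(total)  # 20341000
```

Almost all parameters sit in the two $n^5 \times n^5$ weight matrices, about 9.8 million each.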

During training we minimize a special [associator loss](https://neural-semigroups.readthedocs.io/en/latest/package-documentation.html#associator-loss) on the output of the DAE. This loss depends only on the prediction, so the `target` argument of the wrapper below is ignored.

In [4]:
import torch
from torch import Tensor
from neural_semigroups import AssociatorLoss

def loss(prediction: Tensor, target: Tensor) -> Tensor:
    return AssociatorLoss()(prediction)
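
To give some intuition for what such a loss measures, here is a `numpy` sketch of one possible associativity penalty for a probabilistic table $a_{ijk}$. It is not the package's `AssociatorLoss` (see the linked documentation for the actual definition). The function name `associator_gap`, the mean squared difference, and the treatment of products as independent are all assumptions made for this illustration.

```python
import numpy as np


def associator_gap(cube: np.ndarray) -> float:
    """Mean squared gap between the distributions of (e_i e_j) e_k and e_i (e_j e_k)."""
    # left[i, j, k, l] = sum_m a[i, j, m] * a[m, k, l]
    left = np.einsum("ijm,mkl->ijkl", cube, cube)
    # right[i, j, k, l] = sum_m a[j, k, m] * a[i, m, l]
    right = np.einsum("jkm,iml->ijkl", cube, cube)
    return float(np.mean((left - right) ** 2))


associative = np.eye(2)[np.array([[0, 0], [0, 1]])]
non_associative = np.eye(2)[np.array([[0, 1], [0, 0]])]
print(associator_gap(associative))      # 0.0
print(associator_gap(non_associative))  # a positive number
```

For a table of zeros and ones the gap vanishes exactly when the table is associative, which is why a penalty of this kind needs no target table.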

Now it's time to run a pipeline! Here you can tune the learning schedule for better results.

You can construct your own pipeline if you don't want to import one provided by the package.

In the next four cells we load the `tensorboard` extension, stop any running instance, remove old logs, and launch `tensorboard` to show training and validation curves during the training process.

In [5]:
%load_ext tensorboard

In [6]:
!pkill tensorboard

In [7]:
!rm -rf ./runs/

In [8]:
%tensorboard --logdir runs --host 0.0.0.0

Launching TensorBoard...

In [9]:
from neural_semigroups.training_helpers import learning_pipeline

params = {"learning_rate": 0.001, "epochs": 1000}
learning_pipeline(params, cardinality, dae, loss, data_loaders)

HBox(children=(FloatProgress(value=0.0, max=1000.0), HTML(value='')))

Now we restore the best saved model by loading the newest file from the `checkpoints` directory.

In [22]:
from neural_semigroups.utils import get_newest_file

dae.load_state_dict(torch.load(get_newest_file("checkpoints")))

<All keys matched successfully>

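And here is the report of results. For it we took 1000 random Cayley tables of 5 elements (from different equivalence classes, as always) and constructed 'puzzles' from them. The model solves 97% of the puzzles with one hidden cell and about 80% of those with nine or ten hidden cells, guessing roughly 95% of the hidden cells at every level.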

The difficulty level of a puzzle is the number of hidden cells. A puzzle is considered solved if the model returns a full associative table.
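
A puzzle of a given level can be sketched like this. The helper `make_puzzle` is hypothetical (the package generates puzzles internally when building the report), and the table is one of the 5-element tables from the database.

```python
import numpy as np


def make_puzzle(table, level: int, rng: np.random.Generator) -> np.ndarray:
    """Hide `level` distinct, randomly chosen cells by writing -1 into them."""
    puzzle = np.array(table, copy=True)
    cells = rng.choice(puzzle.size, size=level, replace=False)
    puzzle.flat[cells] = -1
    return puzzle


table = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 2, 1, 1],
    [0, 1, 1, 3, 1],
    [0, 1, 1, 1, 4],
])
puzzle = make_puzzle(table, level=3, rng=np.random.default_rng(0))
print(puzzle)
```

The original table is left untouched, so the visible cells of the puzzle can be compared against it afterwards.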

We see that the model generalizes well (it was trained only on one tenth of all equivalence classes).

In [23]:
from neural_semigroups.utils import print_report
from neural_semigroups import CayleyDatabase

cayley_db = CayleyDatabase(cardinality)
cayley_db.model = dae
print_report(cayley_db.testing_report)

generating and solving puzzles: 100%|██████████| 1000/1000 [02:02<00:00,  8.16it/s]


| level | puzzles | solved | solved (%) | hidden cells | guessed | guessed (%) |
|---|---|---|---|---|---|---|
| 1 | 1000 | 970 | 97 | 1000 | 970 | 97 |
| 2 | 1000 | 935 | 93 | 2000 | 1929 | 96 |
| 3 | 1000 | 890 | 89 | 3000 | 2864 | 95 |
| 4 | 1000 | 871 | 87 | 4000 | 3833 | 95 |
| 5 | 1000 | 852 | 85 | 5000 | 4779 | 95 |
| 6 | 1000 | 863 | 86 | 6000 | 5762 | 96 |
| 7 | 1000 | 842 | 84 | 7000 | 6715 | 95 |
| 8 | 1000 | 819 | 81 | 8000 | 7662 | 95 |
| 9 | 1000 | 804 | 80 | 9000 | 8570 | 95 |
| 10 | 1000 | 815 | 81 | 10000 | 9544 | 95 |


As a sanity check, we can look at the quality of a model that always fills in the same number, e.g. zero. Zero is a reasonable choice since in the `smallsemi` database items are sorted in such a way that zero occurs most often. The NN model outperforms this baseline by a large margin: the baseline solves 57% of the puzzles at level 1 and only 15% at level 10.

In [20]:
from neural_semigroups.constant_baseline import ConstantBaseline

constant_baseline = CayleyDatabase(cardinality)
constant_baseline.model = ConstantBaseline(cardinality)
print_report(constant_baseline.testing_report)

generating and solving puzzles: 100%|██████████| 1000/1000 [00:10<00:00, 96.68it/s]


| level | puzzles | solved | solved (%) | hidden cells | guessed | guessed (%) |
|---|---|---|---|---|---|---|
| 1 | 1000 | 574 | 57 | 1000 | 574 | 57 |
| 2 | 1000 | 392 | 39 | 2000 | 1134 | 56 |
| 3 | 1000 | 296 | 29 | 3000 | 1681 | 56 |
| 4 | 1000 | 225 | 22 | 4000 | 2191 | 54 |
| 5 | 1000 | 209 | 20 | 5000 | 2774 | 55 |
| 6 | 1000 | 188 | 18 | 6000 | 3346 | 55 |
| 7 | 1000 | 161 | 16 | 7000 | 3853 | 55 |
| 8 | 1000 | 165 | 16 | 8000 | 4442 | 55 |
| 9 | 1000 | 163 | 16 | 9000 | 4991 | 55 |
| 10 | 1000 | 155 | 15 | 10000 | 5433 | 54 |


Now let's see how it works on several example puzzles. Let's take one of the real tables from the database.

In [13]:
cayley_db.database[1100]

array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1],
       [0, 1, 2, 1, 1],
       [0, 1, 1, 3, 1],
       [0, 1, 1, 1, 4]])

We can then replace some cells with `-1` to create a puzzle and give it to the model.

In [14]:
guess, proba = cayley_db.fill_in_with_model([
  [-1, 0, 0, 0, -1],
  [0, -1, 1, 1, -1],
  [0, 1, -1, 1, -1],
  [0, 1, 1, -1, -1],
  [0, 1, 1, 1, -1]]
)

The model did not recover the original table.

In [15]:
guess

array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [0, 1, 1, 1, 1]])

But its guess is still a valid completion, since it is associative.

In [16]:
from neural_semigroups import Magma

Magma(guess).is_associative

True

The model also returns the probabilities behind its guess. They can be examined in cases where the model errs.
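
One way to examine such probabilities is sketched below on a made-up two-element example (the numbers are invented for illustration). We assume here that the guessed table is the most probable value in each cell, which agrees with the outputs shown in this notebook but is not checked against the package code.

```python
import numpy as np

# toy probabilities with the same layout as `proba`: cell (i, j) -> distribution
proba = np.array([
    [[0.90, 0.10], [0.99, 0.01]],
    [[0.98, 0.02], [0.45, 0.55]],
])

guess = proba.argmax(axis=-1)    # most probable value in each cell
confidence = proba.max(axis=-1)  # probability assigned to that value
# the cell where the model is least sure of its answer
least_sure = np.unravel_index(confidence.argmin(), confidence.shape)
print(guess)
print(tuple(int(i) for i in least_sure))  # (1, 1)
```

Cells with low confidence are natural candidates to inspect first when a returned table turns out not to be associative.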

In [17]:
proba

array([[[9.96360302e-01, 1.59832533e-03, 6.84402825e-04, 6.66551583e-04,
         6.90414279e-04],
        [9.99996006e-01, 9.99999997e-07, 9.99999997e-07, 9.99999997e-07,
         9.99999997e-07],
        [9.99996006e-01, 9.99999997e-07, 9.99999997e-07, 9.99999997e-07,
         9.99999997e-07],
        [9.99996006e-01, 9.99999997e-07, 9.99999997e-07, 9.99999997e-07,
         9.99999997e-07],
        [9.99038517e-01, 1.81942669e-04, 1.86081888e-04, 1.85037803e-04,
         4.08407650e-04]],

       [[9.99996006e-01, 9.99999997e-07, 9.99999997e-07, 9.99999997e-07,
         9.99999997e-07],
        [1.19513526e-04, 9.99522924e-01, 1.23277903e-04, 1.15676194e-04,
         1.18578835e-04],
        [9.99999997e-07, 9.99996006e-01, 9.99999997e-07, 9.99999997e-07,
         9.99999997e-07],
        [9.99999997e-07, 9.99996006e-01, 9.99999997e-07, 9.99999997e-07,
         9.99999997e-07],
        [1.58640934e-04, 9.99158084e-01, 1.66137295e-04, 1.59894160e-04,
         3.57232086e-04]],

      

In contrast, the constant baseline always fills in the `-1`s with zeros:

In [24]:
ans = constant_baseline.fill_in_with_model([
  [-1, 0, 0, 0, -1],
  [0, -1, 1, 1, -1],
  [0, 1, -1, 1, -1],
  [0, 1, 1, -1, -1],
  [0, 1, 1, 1, -1]]
)[0]
ans

array([[0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 1, 0, 1, 0],
       [0, 1, 1, 0, 0],
       [0, 1, 1, 1, 0]])

In [25]:
Magma(ans).is_associative

False
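
To see why the baseline's answer fails, we can search for a triple of elements that violates associativity. The helper `find_violation` is our own sketch, not a package function. The two tables are copied from the outputs above.

```python
from itertools import product

import numpy as np


def find_violation(table: np.ndarray):
    """Return the first triple (a, b, c) with (a*b)*c != a*(b*c), or None."""
    n = len(table)
    for a, b, c in product(range(n), repeat=3):
        if table[table[a, b], c] != table[a, table[b, c]]:
            return a, b, c
    return None


ans = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
])
guess = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
])
print(find_violation(ans))    # (1, 2, 2)
print(find_violation(guess))  # None
```

For the triple $(1, 2, 2)$ the baseline table gives $(e_1e_2)e_2 = e_1e_2 = e_1$ but $e_1(e_2e_2) = e_1e_0 = e_0$, so it is not a semigroup.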