## Data

In [1]:
# !pip install --upgrade -q git+https://github.com/shuiruge/neural-ode.git@master

import numpy as np
import tensorflow as tf

from hopfield import (DiscreteTimeHopfieldLayer, DenseRecon, ModernDenseRecon,
                      Conv2dRecon, RBMRecon, HebbianRBMRecon)

from mnist import load_mnist

tf.random.set_seed(42)
tf.keras.backend.clear_session()
print(tf.__version__)

2.3.0


In [2]:
# global configurations

IMAGE_SIZE = (32, 32)
BINARIZE = True

In [3]:
(x_train, _), _ = load_mnist(image_size=IMAGE_SIZE, binarize=BINARIZE)

## Model

In [4]:
def create_model(model_type: str):
    if model_type == 'dense':
        model = tf.keras.Sequential([
            DiscreteTimeHopfieldLayer(
                DenseRecon(),
                max_steps=100,
                reg_factor=1),
        ])
    elif model_type == 'ca':
        model = tf.keras.Sequential([
            DiscreteTimeHopfieldLayer(
                Conv2dRecon(
                    filters=64,
                    kernel_size=5,
                    flatten=True,
                ),
                max_steps=100,
                reg_factor=1),
        ])
    elif model_type == 'modern_dense':
        model = tf.keras.Sequential([
            DiscreteTimeHopfieldLayer(
                ModernDenseRecon(
                    memory=x_train[:100],
                    #n=100,
                    activation=tf.nn.softmax,
                ),
                max_steps=100,
                reg_factor=1),
        ])
    elif model_type == 'rbm':
        model = tf.keras.Sequential([
            DiscreteTimeHopfieldLayer(
                RBMRecon(latent_dim=256),
                max_steps=100,
                reg_factor=1),
        ])
    elif model_type == 'hrbm':
        model = tf.keras.Sequential([
            DiscreteTimeHopfieldLayer(
                HebbianRBMRecon(latent_dim=256),
                max_steps=100,
                reg_factor=1),
        ])
    else:
        raise ValueError(f'Unknown model type: "{model_type}".')
    model.compile(optimizer='adam')
    return model

In [5]:
model = create_model('rbm')
X = x_train[:200].numpy()
ds0 = tf.data.Dataset.from_tensor_slices(X)
ds = ds0.shuffle(10000).repeat(5000).batch(128)
model.fit(ds)



<tensorflow.python.keras.callbacks.History at 0x7f72c8c3ad90>

In [6]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
discrete_time_hopfield_layer (None, 1024)              1026025   
Total params: 1,026,025
Trainable params: 1,026,024
Non-trainable params: 1
_________________________________________________________________


In [22]:
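# Alternative noise model: additive Gaussian noise.
# noised_X = X + np.random.normal(size=X.shape) * 0.2
# Bit-flipping noise: negate each entry of X (valued in {-1, 1}) with probability 0.15.
noised_X = np.where(np.random.random(size=X.shape) < 0.15, -X, X)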
recon_X = model.predict(noised_X)

for layer in model.layers:
    try:
        print('Relax steps:', layer.final_step.numpy())
    except AttributeError:
        pass

orig_err = noised_X - X
err = recon_X - X
print(f'{np.quantile(np.abs(orig_err), 0.99)} => '
      f'{np.quantile(np.abs(err), 0.99)}')

Relax steps: 8
2.0 => 0.0


## Conclusions and Discussions

### Resource Occupations

#### Time

1. The dense version is much faster than the CNN version.

#### Space

1. The CNN version needs only ~10^2 parameters, whereas the dense version needs ~10^7.

1. To reduce the number of variables in the dense version, use [pruning](https://stackoverflow.com/a/56451791/1218716) after training.
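
As a rough illustration of what pruning does to a weight matrix, here is a minimal magnitude-pruning sketch in NumPy. It is a generic illustration and not necessarily the method described in the linked answer; `magnitude_prune`, the `sparsity` argument, and the random matrix `W` are hypothetical stand-ins for a trained dense kernel.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of entries with the smallest magnitudes."""
    weights = np.asarray(weights, dtype=float)
    k = int(sparsity * weights.size)  # number of entries to zero out
    if k == 0:
        return weights.copy()
    # Threshold is the k-th smallest absolute value.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # hypothetical stand-in for a trained kernel
W_pruned = magnitude_prune(W, sparsity=0.75)
kept_fraction = np.count_nonzero(W_pruned) / W.size
```

Only the surviving non-zero entries need to be stored, which is where the space saving would come from.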

### De-noising

1. However, the CNN version is not robust to bit-flipping, while the dense version remains very robust to it. The CNN version's failure under bit-flipping hints that its information is not sparsely (distributedly) stored, so it cannot re-construct the original bit from the information stored in its local neighbors alone. (Notice that bit-flipping creates non-smooth, and thus always large, differences.) To see this, run the re-constructor on bit-flipped inputs and compare the 0.99-quantile of the re-construction error for the dense and CNN versions. Increasing the number of filters does not remove the failure.

1. The dense version achieves 99% re-construction even with 40% bit-flipping.
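
The comparison described above can be set up roughly as follows. This sketch mirrors the noise and error metric used in the prediction cell; the helper names `flip_bits` and `error_quantile` and the random data `X_demo` are hypothetical, and no trained model is involved here.

```python
import numpy as np

def flip_bits(x, flip_prob, rng):
    """Negate each entry of a {-1, 1}-valued array with probability flip_prob."""
    mask = rng.random(size=x.shape) < flip_prob
    return np.where(mask, -x, x)

def error_quantile(recon, target, q=0.99):
    """q-quantile of the absolute element-wise re-construction error."""
    return np.quantile(np.abs(recon - target), q)

rng = np.random.default_rng(0)
X_demo = rng.choice([-1.0, 1.0], size=(200, 1024))  # hypothetical binarized data
noised = flip_bits(X_demo, flip_prob=0.4, rng=rng)

# Error of the noisy input itself; a model's output would replace `noised` here.
baseline = error_quantile(noised, X_demo)
# Error of a perfect re-construction, for reference.
perfect = error_quantile(X_demo, X_demo)
```

To compare models, pass `model.predict(noised)` as `recon` for each of the dense and CNN versions and check how far the result falls below `baseline`.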

### Binarization

1. Binarization is also essential to the CNN version: non-binarized inputs are not de-noised. The essence of binarization may be traced to the simplicity it leads to. Indeed, the final loss without binarization is greater (0.03X -> 0.04X).

1. Changing X from {-1, 1} to {0, 1} causes errors in de-noising. The cause is not yet understood.
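
For reference, the two encodings are related by a simple affine map, and the bit-flip operation differs between them. The sketch below only illustrates that difference; the helper names are hypothetical, and whether this is related to the de-noising error noted above is not established here.

```python
import numpy as np

def to_unit(x_pm):
    """Map a {-1, 1}-valued array to {0, 1}."""
    return (x_pm + 1) / 2

def to_sign(x_01):
    """Map a {0, 1}-valued array to {-1, 1}."""
    return 2 * x_01 - 1

x_pm = np.array([-1.0, 1.0, 1.0, -1.0])
x_01 = to_unit(x_pm)

# Negation flips bits only in the {-1, 1} encoding.
flipped_pm = -x_pm
# In the {0, 1} encoding the equivalent flip is 1 - x; negation would
# leave zeros unchanged and map ones to -1, outside the value set.
flipped_01 = 1 - x_01
```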

### Discrete Time

1. The discrete time version is much, much faster at predicting, without losing the attributes the continuous version has.

1. Asynchronous updating decreases the performance.
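
To make the distinction concrete, here is a minimal sketch of synchronous versus asynchronous updating on a classic Hebbian Hopfield network. It is an assumption-laden toy (one stored pattern, sign activation) rather than the update used by `DiscreteTimeHopfieldLayer`, and it does not reproduce the performance difference noted above; it only shows how the two schemes differ.

```python
import numpy as np

def sync_step(W, x):
    """Update all units at once from the same previous state."""
    return np.where(W @ x >= 0, 1.0, -1.0)

def async_sweep(W, x, rng):
    """Update units one at a time in random order, each seeing the latest state."""
    x = x.copy()
    for i in rng.permutation(len(x)):
        x[i] = 1.0 if W[i] @ x >= 0 else -1.0
    return x

rng = np.random.default_rng(0)
pattern = rng.choice([-1.0, 1.0], size=32)
W = np.outer(pattern, pattern)  # Hebbian weights storing one pattern
np.fill_diagonal(W, 0.0)        # no self-connections

noisy = pattern.copy()
noisy[:4] *= -1                 # flip 4 of 32 bits

x_sync = sync_step(W, noisy)
x_async = async_sweep(W, noisy, rng)
```

In this easy case both schemes recover the stored pattern; the synchronous step does so with one matrix product, while the asynchronous sweep needs a sequential loop over units.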

### Discrete State

1. Using discrete states together with discrete time improves performance significantly.

### Relation between Continuous and Discrete Time

The continuous time version, i.e. the ODE version, and the discrete time version, i.e. the iterative equation version, are related, since both of them can be regarded as root-finding. That is, find the root $x_{\star}$ s.t. $x_{\star} - f(x_{\star}) = 0$. This root is the fixed point, or relaxation phase. The ODE version uses the gradient descent method, and the iterative equation version applies the definition directly, iterating $x \leftarrow f(x)$. Both of them will find one of the many roots, which is ensured by their Lyapunov functions.
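
A toy example may help to see the relation. The sketch below compares the iteration $x \leftarrow f(x)$ with Euler integration of an ODE, using a scalar map `f(x) = cos(x)`. The ODE form $dx/dt = f(x) - x$ is an assumption chosen for illustration (the notebook does not state the ODE explicitly), and all function names are hypothetical.

```python
import math

def f(x):
    return math.cos(x)  # toy map with a single fixed point near 0.739

def iterate(f, x0, steps=200):
    """Discrete time: apply x <- f(x) repeatedly."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

def euler_flow(f, x0, dt=0.1, steps=2000):
    """Continuous time, assumed form dx/dt = f(x) - x, integrated with Euler steps."""
    x = x0
    for _ in range(steps):
        x = x + dt * (f(x) - x)
    return x

x_iter = iterate(f, 1.0)
x_flow = euler_flow(f, 1.0)
```

Both procedures settle on the same root of $x - f(x) = 0$. Under the assumed ODE form, an Euler step with `dt = 1` reduces exactly to the discrete iteration, since `x + (f(x) - x)` equals `f(x)`.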

### Modern Version

1. If the kernel is initialized from a given memory, learning is redundant, and the performance is, as expected, splendid.

1. However, if the kernel is initialized with the Glorot initializer as usual, learning is hard, and the performance is terrible.

1. If the memory is a subset of the training data, then the performance is terrible again.

1. If the weights are pruned, then the memory is destroyed too. However, this does not happen in the "traditional" version.
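
For context, the retrieval rule described in the referenced blog post can be sketched in NumPy as a softmax-weighted average of the stored patterns. This is a minimal illustration under the assumption that the kernel holds the memory directly; it is not the implementation of `ModernDenseRecon`, and `modern_hopfield_step`, `beta`, and the random `memory` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def modern_hopfield_step(memory, x, beta=1.0):
    """One update: softmax-weighted average of the stored patterns.

    memory has shape (num_patterns, dim); x has shape (dim,).
    """
    return memory.T @ softmax(beta * (memory @ x))

rng = np.random.default_rng(0)
memory = rng.choice([-1.0, 1.0], size=(5, 64))  # hypothetical stored patterns

query = memory[0].copy()
query[:6] *= -1  # corrupt 6 of 64 bits

retrieved = np.sign(modern_hopfield_step(memory, query))
```

Because the stored patterns themselves are the weights, no training is needed for retrieval in this sketch, and zeroing entries of `memory` would directly alter the stored patterns.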

## References

1. Cellular automata as convolutional neural networks (arXiv: 1809.02942).

1. Blog: [Hopfield Networks is All You Need](https://ml-jku.github.io/hopfield-layers/).