# Loading modules

In [1]:
import pytorch_lightning as pl
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.loggers import TensorBoardLogger

import torch

from iterativenn.utils.DataModules import MNISTRepeatedSequenceDataModule
from iterativenn.nn_modules.Sequential2D import Sequential2D
from iterativenn.lit_modules import IteratedModel
from iterativenn.utils.logger_factory import LoggerFacade

import logging
import warnings

# Introduction



I want to write some sample code that trains a Sequential2D with as little jiggery-pokery as possible. It should look as close to standard PyTorch code as possible: the same code that trains a standard PyTorch model should also train a Sequential2D. In particular, I want PyTorch Lightning to work with it in the same way it works with standard PyTorch models.



I want to be able to create a model from a configuration dictionary that looks like this:

In [2]:
# Each block given to the Sequential2D is responsible for taking a vector of the right dimension as input and returning a vector of the right dimension.
# The Sequential2D can check and enforce this, but doing the right thing remains the block's responsibility.

# It is the responsibility of the factory to make sure that the blocks are compatible with each other.  This is a bit tricky, because some blocks take multi-dimensional
# tensors as input.  The factory should be able to check that the input and output dimensions are compatible, and that the number of channels is compatible.

# 'Input' blocks should be on the diagonal, although that is not strictly necessary.  A non-diagonal block could be an input block, but that would be a bit strange because it would
# overwrite some other block's output.
cfg = {
    "sequential2D": {
        "in_features": [784, 200, 10], # 784 + 200 + 10 = 994
        "out_features": [784, 140, 50, 20], # 784 + 140 + 50 + 20 = 994
        "block_types": [
            ['Input', None, None, None],
            ['LSTM', 'Linear', 'MaskedLinear', None],
            ['MaskedLinear.from_description', 'MaskedLinear.from_description', None, 'Conv1D'],
        ],
        "block_kwargs": [
            [{'type':'MNIST'}, None, None, None],
            [{'LSTM_arg':'LSTM_value'}, None, None, None],
            [{'block_type':'S=15', 'initialization_type':'G=0.2,0.7', 'trainable':True},
             {'block_type':'S', 'initialization_type':'C=0.3', 'trainable':'non-zero'}, 
             None, {'Conv1D_arg':'Conv1D_value'}]
        ],
        "start_y_index":1 # Not correct... how to do?
    }
}
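
The comments above say the factory should check that the grid of blocks is consistent with the feature lists. A minimal sketch of such a check follows, using the key names of the dictionary above (the configurations further down use `in_features_list` and `out_features_list` instead). The function name `check_grid` is hypothetical and not part of `iterativenn`. The sketch also assumes that the total input and output widths must match, since the output is fed back in as the next input.

```python
def check_grid(cfg):
    """Return a list of problems found in a Sequential2D-style config dictionary."""
    n_in = len(cfg["in_features"])
    n_out = len(cfg["out_features"])
    problems = []
    # Assumption: the output vector is fed back as the next input, so widths must agree.
    if sum(cfg["in_features"]) != sum(cfg["out_features"]):
        problems.append("total input width differs from total output width")
    # Rows index input groups, columns index output groups.
    for key in ("block_types", "block_kwargs"):
        grid = cfg[key]
        if len(grid) != n_in:
            problems.append(f"{key}: expected {n_in} rows, found {len(grid)}")
        for i, row in enumerate(grid):
            if len(row) != n_out:
                problems.append(f"{key}[{i}]: expected {n_out} entries, found {len(row)}")
    return problems


# A small toy configuration (not the MNIST one above).
good = {
    "in_features": [4, 2],
    "out_features": [4, 1, 1],
    "block_types": [["Input", None, None], ["Linear", "Linear", None]],
    "block_kwargs": [[None, None, None], [None, None, None]],
}
# The same configuration with one entry missing from the first kwargs row.
bad = dict(good, block_kwargs=[[None, None], [None, None, None]])

print(check_grid(good))
print(check_grid(bad))
```

Returning a list of messages rather than raising on the first problem makes it easier to fix a hand-written configuration in one pass.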

# Global parameters

In [3]:
global_max_epochs = 10
global_optimizer = 'SGD'

# Notes on the model factory

Note, this code is now in iterativenn/src/iterativenn/nn_modules/Sequential2D.py


I want to create a factory that generates the model of interest from a configuration dictionary. To give us a starting point, let's consider the following linear operator

$$
A = \begin{bmatrix}
  784 \times 784 & 784 \times 200 & 784 \times 10 \\
  140 \times 784 & 140 \times 200 & 140 \times 10  \\
  50 \times 784 & 50 \times 200 & 50 \times 10  \\
  20 \times 784 & 20 \times 200 & 20 \times 10  \\
\end{bmatrix}
\in \mathbb{R}^{994 \times 994}
$$

$$
X A^T = (A X^T)^T
$$
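
The identity is the transpose rule $(PQ)^T = Q^T P^T$ applied to $A X^T$. It matters here because `torch.nn.Linear` applies its weight to row vectors as $x A^T$, so each row of $X$ can be treated as one sample. Writing $X = [X_0 \; X_1 \; X_2]$ for the three input column groups and $A_{j,i}$ for the block of $A$ in block row $j$ and block column $i$ (notation introduced here only for illustration), the product splits block-wise:

$$
(A X^T)^T = (X^T)^T A^T = X A^T,
\qquad
(X A^T)_j = \sum_{i=0}^{2} X_i A_{j,i}^T, \quad j = 0, \dots, 3.
$$

For example, with a batch of $b$ rows, $X_1 \in \mathbb{R}^{b \times 200}$ and $A_{1,1} \in \mathbb{R}^{140 \times 200}$, so $X_1 A_{1,1}^T \in \mathbb{R}^{b \times 140}$ contributes to the second output group.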



which has the following form when thinking in terms of functions

$$
F = \begin{bmatrix}
  f_{0,0}: 784 \rightarrow 784 & f_{1,0}: 200 \rightarrow 784 & f_{2,0}: 10 \rightarrow 784 \\
  f_{0,1}: 784 \rightarrow 140 & f_{1,1}: 200 \rightarrow 140 & f_{2,1}: 10 \rightarrow 140 \\
  f_{0,2}: 784 \rightarrow 50  & f_{1,2}: 200 \rightarrow 50  & f_{2,2}: 10 \rightarrow 50  \\
  f_{0,3}: 784 \rightarrow 20  & f_{1,3}: 200 \rightarrow 20  & f_{2,3}: 10 \rightarrow 20  \\
\end{bmatrix}
\in \mathbb{R}^{994 \times 994}
$$

and



Each row of $X$ is the concatenation of some $x$ (input), $h$ (hidden), and $y$ (output) columns.  What makes some columns special?

- $x$ columns are set by the outside world. They are the input to the model. 
- $h$ columns are set by the model. They are the hidden state of the model and don't really matter to anyone except the model.
- $y$ columns are set by the model. They are the output of the model and are the only thing that matters to the outside world.  They feed back into the model by way of back-propagation.
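
To make the column groups concrete, the sketch below computes which column indices belong to $x$, $h$, and $y$ for the widths 784, 200, and 10 used in this section. The helper name `column_ranges` is hypothetical. The same kind of ranges appear as `idx_list` entries in the callback configurations later in this notebook.

```python
from itertools import accumulate


def column_ranges(sizes):
    """Split range(sum(sizes)) into consecutive ranges, one per group width."""
    ends = list(accumulate(sizes))
    starts = [0] + ends[:-1]
    return [range(start, end) for start, end in zip(starts, ends)]


# Column groups for x (input), h (hidden), and y (output).
x_idx, h_idx, y_idx = column_ranges([784, 200, 10])
print(x_idx, h_idx, y_idx)
```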

One way to handle $x$ is to make a special 

$$
F^T = \begin{bmatrix}
  f_{0,0}: 784 \rightarrow 784 & f_{0,1}: 784 \rightarrow 140 & f_{0,2}: 784 \rightarrow 50 & f_{0,3}: 784 \rightarrow 20\\
  f_{1,0}: 200 \rightarrow 784 & f_{1,1}: 200 \rightarrow 140 & f_{1,2}: 200 \rightarrow 50 & f_{1,3}: 200 \rightarrow 20\\
  f_{2,0}: 10  \rightarrow 784 & f_{2,1}: 10  \rightarrow 140 & f_{2,2}: 10  \rightarrow 50 & f_{2,3}: 10  \rightarrow 20\\
\end{bmatrix}
\in \mathbb{R}^{994 \times 994}
$$


The code below implements this idea.

```python
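# Sequential2D is imported at the top of this notebook. MaskedLinear is assumed to be
# imported from the iterativenn package as well; its import is not shown here.
in_features_list = [784, 200, 10]
out_features_list = [784, 140, 50, 20]
# blocks[i][j] maps input group i (in_features_list[i]) to output group j
# (out_features_list[j]), matching the layout of F^T above.
blocks = [
    [MaskedLinear(784, 784), MaskedLinear(784, 140), MaskedLinear(784, 50), MaskedLinear(784, 20)],
    [MaskedLinear(200, 784), MaskedLinear(200, 140), MaskedLinear(200, 50), MaskedLinear(200, 20)],
    [MaskedLinear(10,  784), MaskedLinear(10,  140), MaskedLinear(10,  50), MaskedLinear(10,  20)],
]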
model = Sequential2D(in_features_list, out_features_list, blocks)
```
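
The block grid above can be read as follows, assuming (as block-matrix multiplication suggests) that the contributions to each output group are summed over the input groups. This is a pure-Python sketch of that reading, not the actual `Sequential2D` implementation. `apply_blocks` is a hypothetical name, blocks are plain callables on lists rather than `MaskedLinear` modules, and a `None` entry is treated as a zero block.

```python
def apply_blocks(in_sizes, out_sizes, blocks, x):
    """Apply a grid of blocks to x, where blocks[i][j] maps input group i to output group j."""
    # Split the flat input into one piece per input group.
    pieces, start = [], 0
    for size in in_sizes:
        pieces.append(x[start:start + size])
        start += size
    if start != len(x):
        raise ValueError("input length does not match in_sizes")

    out = []
    for j, out_size in enumerate(out_sizes):
        acc = [0.0] * out_size
        for i, piece in enumerate(pieces):
            block = blocks[i][j]
            if block is None:  # assumption: a missing block contributes nothing
                continue
            y = block(piece)
            if len(y) != out_size:
                raise ValueError(f"block ({i}, {j}) returned the wrong size")
            acc = [a + b for a, b in zip(acc, y)]
        out.extend(acc)
    return out


# Toy grid: two input groups (widths 2 and 1), two output groups (widths 1 and 2).
toy_blocks = [
    [lambda v: [sum(v)], None],
    [None, lambda v: [v[0], -v[0]]],
]
result = apply_blocks([2, 1], [1, 2], toy_blocks, [1.0, 2.0, 3.0])
print(result)
```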

# Training

## My model using factory

In [4]:
def factory_run(cfg, name):
    log_name = name
    logger = TensorBoardLogger("outputs", name=log_name, version='main')
    logger = LoggerFacade(logger, 'tensorboard', 'info')
    sequential2D = Sequential2D.from_config(cfg["sequential2D"])
    callbacks = IteratedModel.ConfigCallbacks(cfg["callbacks"])
    model = IteratedModel.IteratedModel(sequential2D, 
                                        callbacks,
                                        optimizer=global_optimizer)
    data_module = MNISTRepeatedSequenceDataModule(min_copies=2, max_copies=2, seed=1234)
    # This can be used to remove all of the extra output from the training
    logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
    # Initialize a trainer
    trainer = Trainer(
        accelerator='auto',
        devices=1 if torch.cuda.is_available() else None,  # limiting for iPython runs
        max_epochs=global_max_epochs,
        log_every_n_steps=1,
        enable_progress_bar=False,
        logger=logger,
    )

    with torch.no_grad():
        data_module.prepare_data()
        data_module.setup('fit')
        batch = next(iter(data_module.train_dataloader()))
        loss = model.training_step(batch, 0, do_logging=False)
        print(f"loss before training: {loss}")

    with warnings.catch_warnings():
        # There are warnings that I don't care about at the moment and that are not relevant to the example.
        warnings.simplefilter("ignore")
        trainer.fit(model, data_module)

    with torch.no_grad():
        data_module.prepare_data()
        data_module.setup('fit')
        batch = next(iter(data_module.train_dataloader()))
        loss = model.training_step(batch, 0, do_logging=False)
        print(f"loss after training: {loss}")

    return sequential2D

In [5]:
cfg = {
    "sequential2D": {
        "in_features_list": [28*28, 100, 10], 
        "out_features_list": [28*28, 100, 10], 
        "block_types": [
            [None, 'Linear', None],
            [None, None, 'Linear'],
            [None, None, None],
        ],
        "block_kwargs": [
            [None, None, None],
            [None, None, None],
            [None, None, None],
        ]
    },
    "callbacks": {
        "loss": {
            "func": "CrossEntropyLoss",
            "idx_list" : range(28*28+100, 28*28+100+10),
            "sequence_position": 'last',
        },
        "initialization": {
            "func": "zeros",
            "size": 28*28+100+10,
        },
        "data": {
            "func": "insert",
            "idx_list": range(28*28),
            "flatten_input": True,            
        },
        "output": {
            "func": "max",
            "idx_list" : range(28*28+100, 28*28+100+10)
        },
    }
}
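
One plausible reading of the four callbacks above is sketched below in plain Python: the state starts at zeros, each element of the input sequence is inserted at the data indices, the model is applied once per element, and the prediction is the position of the largest value among the output indices. This is an assumption about how the pieces fit together, not the actual `IteratedModel` code. `run_iterated` and the toy `step` function are hypothetical.

```python
def run_iterated(step, x_sequence, size, data_idx, output_idx):
    """Iterate `step` over a sequence of inputs and return the predicted class index."""
    state = [0.0] * size                        # "initialization": zeros
    for x in x_sequence:
        for k, value in zip(data_idx, x):       # "data": insert the input into the state
            state[k] = value
        state = step(state)                     # one application of the model
    scores = [state[k] for k in output_idx]     # "output": read the output columns
    return max(range(len(scores)), key=scores.__getitem__)


def step(state):
    """Toy model on 4 values: keep the 2 inputs and accumulate them in the 2 outputs."""
    return [state[0], state[1], state[0] + state[2], state[1] + state[3]]


prediction = run_iterated(
    step,
    x_sequence=[[1.0, 0.0], [0.0, 3.0]],
    size=4,
    data_idx=range(2),
    output_idx=range(2, 4),
)
print(prediction)
```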

In [6]:
previous_model = factory_run(cfg, "factory_MLP")

loss before training: 2.299802303314209
loss after training: 1.892664909362793


In [7]:
# Save the whole model object (pickled), not just its state_dict
torch.save(previous_model, "previous_model.pt")

## Trivially growing the model

In [8]:
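# Load the pickled module back. Note: PyTorch 2.6 and later default torch.load to
# weights_only=True, so loading a whole module there requires weights_only=False.
previous_model = torch.load("previous_model.pt")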

In [9]:
cfg = {
    "sequential2D": {
        "in_features_list": [28*28+100+10, 10], 
        "out_features_list": [28*28+100+10, 10], 
        "block_types": [
            ['Module', None],
            [None, None],
        ],

        "block_kwargs": [
            [{'module':previous_model}, None],
            [None, None],
        ],
    },
    "callbacks": {
        "loss": {
            "func": "CrossEntropyLoss",
            "idx_list" : range(28*28+100, 28*28+100+10),
            "sequence_position": 'last',
        },
        "initialization": {
            "func": "zeros",
            "size": 28*28+100+10+10,
        },
        "data": {
            "func": "insert",
            "idx_list": range(28*28),
            "flatten_input": True,
        },
        "output": {
            "func": "max",
            "idx_list" : range(28*28+100, 28*28+100+10)
        },
    }
}

In [10]:
tmp_model = factory_run(cfg, "grow_MLP")

loss before training: 1.892664909362793


loss after training: 1.5198802947998047


## Non-trivial but non-trainable growth of the model

In [11]:
previous_model = torch.load("previous_model.pt")

In [12]:
default_block_kwargs = {'block_type':'W', 'initialization_type':'C=0.0', 'trainable':False, 'bias':False}

cfg = {
    "sequential2D": {
        "in_features_list": [28*28+100+10, 10], 
        "out_features_list": [28*28+100+10, 10], 
        "block_types": [
            ['Module', 'MaskedLinear.from_description'],
            ['MaskedLinear.from_description', 'MaskedLinear.from_description'],
        ],
        "block_kwargs": [
            [{'module':previous_model}, default_block_kwargs],
            [default_block_kwargs, default_block_kwargs],
        ],
    },
    "callbacks": {
        "loss": {
            "func": "CrossEntropyLoss",
            "idx_list" : range(28*28+100, 28*28+100+10),
            "sequence_position": 'last',
        },
        "initialization": {
            "func": "zeros",
            "size": 28*28+100+10+10,
        },
        "data": {
            "func": "insert",
            "idx_list": range(28*28),
            "flatten_input": True,
        },
        "output": {
            "func": "max",
            "idx_list" : range(28*28+100, 28*28+100+10)
        },
    }

}

In [13]:
tmp_model = factory_run(cfg, "grow_MLP_no_train")

loss before training: 1.892664909362793


loss after training: 1.5198802947998047
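
Both growth runs above start from the loss the original model ended with (1.892664909362793). That is what one would expect if the new blocks contribute nothing at the start, so that the grown model computes the same function on the old columns. The toy sketch below illustrates the idea in plain Python. `grow` and its `weight` parameter are hypothetical: every new block is modelled as a dense block whose entries all equal `weight`, which is a simplification of the `MaskedLinear` blocks used in the configuration.

```python
def grow(f, old_size, extra, weight=0.0):
    """Embed f (a map on old_size values) in a larger map on old_size + extra values."""
    def grown(state):
        old, new = state[:old_size], state[old_size:]
        # Top-left block is the old model; the top-right block maps new -> old.
        top = [v + weight * sum(new) for v in f(old)]
        # The bottom blocks map old -> new and new -> new.
        bottom = [weight * (sum(old) + sum(new))] * extra
        return top + bottom
    return grown


def old_model(values):
    """Stand-in for the previously trained model."""
    return [2.0 * v for v in values]


state = [1.0, 2.0, 5.0]
preserved = grow(old_model, old_size=2, extra=1)(state)              # zero new blocks
perturbed = grow(old_model, old_size=2, extra=1, weight=0.5)(state)  # non-zero new blocks
print(preserved)
print(perturbed)
```

With zero-valued new blocks the first two outputs equal `old_model([1.0, 2.0])`; a non-zero weight changes them.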


## Non-trivial and trainable growth of the model

### Zero initialization

In [14]:
previous_model = torch.load("previous_model.pt")

In [15]:
default_block_kwargs = {'block_type':'W', 'initialization_type':'G=0.0,0.0', 'trainable':True, 'bias':False}

cfg = {
    "sequential2D": {
        "in_features_list": [28*28+100+10, 10], 
        "out_features_list": [28*28+100+10, 10], 
        "block_types": [
            ['Module', 'MaskedLinear.from_description'],
            ['MaskedLinear.from_description', 'MaskedLinear.from_description'],
        ],
        "block_kwargs": [
            [{'module':previous_model}, default_block_kwargs],
            [default_block_kwargs, default_block_kwargs],
        ],
    },
    "callbacks": {
        "loss": {
            "func": "CrossEntropyLoss",
            "idx_list" : range(28*28+100, 28*28+100+10),
            "sequence_position": 'last',
        },
        "initialization": {
            "func": "zeros",
            "size": 28*28+100+10+10,
        },
        "data": {
            "func": "insert",
            "idx_list": range(28*28),
            "flatten_input": True,
        },
        "output": {
            "func": "max",
            "idx_list" : range(28*28+100, 28*28+100+10)
        },
    }

}

In [16]:
tmp_model = factory_run(cfg, "grow_MLP_train_init_std_0")

loss before training: 1.892664909362793


loss after training: 1.5198802947998047


### Random initialization

In [17]:
previous_model = torch.load("previous_model.pt")

In [18]:
default_block_kwargs = {'block_type':'W', 'initialization_type':'G=0.0,0.1', 'trainable':True, 'bias':False}

cfg = {
    "sequential2D": {
        "in_features_list": [28*28+100+10, 10], 
        "out_features_list": [28*28+100+10, 10], 
        "block_types": [
            ['Module', 'MaskedLinear.from_description'],
            ['MaskedLinear.from_description', 'MaskedLinear.from_description'],
        ],
        "block_kwargs": [
            [{'module':previous_model}, default_block_kwargs],
            [default_block_kwargs, default_block_kwargs],
        ],
    },
    "callbacks": {
        "loss": {
            "func": "CrossEntropyLoss",
            "idx_list" : range(28*28+100, 28*28+100+10),
            "sequence_position": 'last',
        },
        "initialization": {
            "func": "zeros",
            "size": 28*28+100+10+10,
        },
        "data": {
            "func": "insert",
            "idx_list": range(28*28),
            "flatten_input": True,
        },
        "output": {
            "func": "max",
            "idx_list" : range(28*28+100, 28*28+100+10)
        },
    }

}

In [19]:
tmp_model = factory_run(cfg, "grow_MLP_train_init_std_0.1")

loss before training: 1.9122493267059326


loss after training: 1.4130362272262573
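
The `initialization_type` strings used above appear to encode a letter plus numeric arguments. Judging from the run names (`init_std_0`, `init_std_0.1`), `G=mean,std` looks like a Gaussian and `C=value` a constant, but that reading is an assumption. The sketch below parses such strings and draws values accordingly. `parse_description` and `sample_weights` are hypothetical helpers, not the `MaskedLinear.from_description` implementation.

```python
import random


def parse_description(text):
    """Split a string such as 'G=0.0,0.1' into ('G', [0.0, 0.1])."""
    name, _, args = text.partition("=")
    values = [float(a) for a in args.split(",")] if args else []
    return name, values


def sample_weights(text, count, seed=0):
    """Draw `count` initial values for a description string (assumed semantics)."""
    name, values = parse_description(text)
    rng = random.Random(seed)
    if name == "G":  # assumed: Gaussian with the given mean and standard deviation
        mean, std = values
        return [rng.gauss(mean, std) for _ in range(count)]
    if name == "C":  # assumed: constant value
        (constant,) = values
        return [constant] * count
    raise ValueError(f"unknown initialization type: {name!r}")


print(parse_description("G=0.0,0.1"))
print(sample_weights("G=0.0,0.0", 3))
print(sample_weights("C=0.0", 2))
```

Under this reading, `G=0.0,0.0` and `C=0.0` both produce all-zero blocks, which would be consistent with the identical losses reported for those two runs.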


### Random initialization with Linear model

In [20]:
previous_model = torch.load("previous_model.pt")

In [21]:
with torch.no_grad():
    model01 = torch.nn.Linear(28*28+100+10, 10, bias=False)
    model01.weight.normal_(0.0, 0.1)
    model10 = torch.nn.Linear(10, 28*28+100+10, bias=False)
    model10.weight.normal_(0.0, 0.1)
    model11 = torch.nn.Linear(10, 10, bias=False)
    model11.weight.normal_(0.0, 0.1)

cfg = {
    "sequential2D": {
        "in_features_list": [28*28+100+10, 10], 
        "out_features_list": [28*28+100+10, 10], 
        "block_types": [
            ['Module', 'Module'],
            ['Module', 'Module'],
        ],
        "block_kwargs": [
            [{'module':previous_model}, {'module':model01}],
            [{'module':model10}, {'module':model11}],
        ],
    },
    "callbacks": {
        "loss": {
            "func": "CrossEntropyLoss",
            "idx_list" : range(28*28+100-5, 28*28+100+10),  # NOTE: 15 indices here, unlike the 10 used in the other configurations
            "sequence_position": 'last',
        },
        "initialization": {
            "func": "zeros",
            "size": 28*28+100+10+10,
        },
        "data": {
            "func": "insert",
            "idx_list": range(28*28),
            "flatten_input": True,
        },
        "output": {
            "func": "max",
            "idx_list" : range(28*28+100, 28*28+100+10)
        },
    }

}

In [22]:
tmp_model = factory_run(cfg, "grow_MLP_train_Linear")

loss before training: 2.8078207969665527


loss after training: 1.5807777643203735
