# CONVOLUTIONAL GRAMMAR

The convolutional grammar is built on two modules, ConvGrammarV2 and ConvTranslator.

### GENERATE A RANDOM NETWORK


This grammar uses a set of production rules to generate a random network structure, without yet specifying dimensions such as the number of features or the kernel size.

The user must specify the desired number of layers in the network and the minimum number of spatial (convolutional) layers.

Tree construction is sequential and follows these rules:
1) The first node is always \<start\>.
2) If the previous node is \<start\>, the sampled node can only be convolutional.
3) If the previous node is convolutional and the minimum number of spatial layers has not been reached, the sampled node can only be convolutional.
4) If the previous node is convolutional and the minimum number of spatial layers has been reached, the sampled node is convolutional or flatten with equal probability.
5) If the previous node is flatten, the sampled node can only be linear.
6) If the previous node is linear, the sampled node is linear or dropout with equal probability.
7) If the previous node is dropout, the sampled node can only be linear.
8) The last node is always \<end\>.
9) If there is no flatten node between a convolution and the end node, one is added.
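
The rules above can be sketched in a few lines of plain Python. This is only an illustration of the sampling logic, not the code of `ImageProductionRules`: the function name `grow_tree_sketch` is hypothetical, and it assumes that `n_layers` counts the nodes between \<start\> and \<end\> and that the flatten node added by rule 9 comes on top of that count.

```python
import random


def grow_tree_sketch(n_layers, min_spatial_layers, rng=random):
    """Sample a layer sequence following rules 1-9 (illustrative sketch)."""
    names = ["<start>"]                              # rule 1
    n_spatial = 0
    for _ in range(n_layers):
        prev = names[-1]
        if prev == "<start>":                        # rule 2
            node = "conv2d"
        elif prev == "conv2d":
            if n_spatial < min_spatial_layers:       # rule 3
                node = "conv2d"
            else:                                    # rule 4
                node = rng.choice(["conv2d", "flatten"])
        elif prev == "flatten":                      # rule 5
            node = "linear"
        elif prev == "linear":                       # rule 6
            node = rng.choice(["linear", "dropout"])
        else:                                        # rule 7: prev is dropout
            node = "linear"
        if node == "conv2d":
            n_spatial += 1
        names.append(node)
    if names[-1] == "conv2d":                        # rule 9
        names.append("flatten")
    names.append("<end>")                            # rule 8
    # Same layout as the empty tree: [name, first slot, second slot]
    return [[name, None, None] for name in names]


for node in grow_tree_sketch(n_layers=15, min_spatial_layers=7):
    print(node)
```

Because rules 4 and 6 are random choices, each call can return a different sequence; only the constraints (convolutions first, one flatten, then linear and dropout nodes) are fixed.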

In [1]:
from ConvGrammarV2 import ImageProductionRules

n_layers = 15
min_spatial_layers = 7

production_rules = ImageProductionRules(n_layers = n_layers,
                                        min_spatial_layers = min_spatial_layers)

empty_tree = production_rules.grow_tree()

for node in empty_tree:
    print(node)

['<start>', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['conv2d', None, None]
['flatten', None, None]
['linear', None, None]
['<end>', None, None]


### FILL THE RANDOM NETWORK

The __ImageGrammar__ class does two things:
1) Generates an empty tree, as before.
2) Fills the empty tree entries with reasonable values.

It takes the following parameters:
- input dimension (e.g. (64,64))
- input channels (e.g. 3 for RGB)
- output dimension (e.g. 1 for regression or binary classification)
- number of layers
- minimum number of spatial layers
- hidden\_in (the number of output channels of the first convolution)
- hidden\_out (the number of output channels of the last convolution)
- shrinkage objective (the spatial size the image should have after the spatial layers, just before flattening, e.g. (4,4))

Layer parameters are set according to the following rules:

__\<start\>__: the first position holds the number of _channels_ of the input image and the second holds the image _input dimensions_.

__spatial layers (conv2d)__: these layers require _hidden\_in_ and _hidden\_out_. The number of output features of each convolution is chosen so that it moves linearly from _hidden\_in_ to _hidden\_out_. The _shrinkage\_objective_ must also be specified: it is the spatial size the image should reach through the convolutional layers before flattening. The first position of each layer holds the number of output features and the second holds the kernel size. Note that kernel sizes are calculated so that all layers have almost the same kernel size.

__flatten__: flattens all dimensions except the batch dimension, turning the output image of the last convolution into a single vector. The first position holds the _number of features_ resulting from flattening; the second position is empty.

__linear__: the number of features of each linear module is set so that it moves linearly from the flattened dimension toward the output dimension. The first position holds the number of _output neurons_; the second position is empty.

__dropout__: the _dropout rate_ is randomly sampled from a Beta distribution with most of its probability mass below 0.5. This rate is in the first position; the second position is empty.

__\<end\>__: this node is the final dense layer, which outputs the target number of features. This number is in the first position; the second is empty.
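
As a rough illustration of the convolutional rules, the sketch below interpolates the channel counts and splits the required shrinkage into near-equal odd kernel sizes. The helper names `channel_schedule` and `kernel_schedule` are hypothetical and this is not necessarily how `ImageGrammar` computes them; it assumes stride 1 and no padding, so a kernel of size k removes k - 1 pixels per side length. With the settings of the example that follows (11 convolutions, 128 to 64 channels, 64x64 input, 4x4 target) it happens to give the same channel counts and kernel sizes.

```python
def channel_schedule(hidden_in, hidden_out, n_conv):
    """Linearly interpolate output channels from hidden_in to hidden_out."""
    if n_conv == 1:
        return [hidden_out]
    return [round(hidden_in + (hidden_out - hidden_in) * i / (n_conv - 1))
            for i in range(n_conv)]


def kernel_schedule(input_size, target_size, n_conv):
    """Split the total shrinkage into near-equal odd kernel sizes.

    Assumes stride 1 and no padding: a kernel of size k removes k - 1 pixels.
    """
    total = input_size - target_size
    if total < 0 or total % 2:
        raise ValueError("this sketch needs an even, non-negative shrinkage")
    # Work in units of 2 pixels so that every kernel size stays odd.
    units, extra = divmod(total // 2, n_conv)
    # The first `extra` layers take one more unit than the remaining ones.
    return [2 * (units + (1 if i < extra else 0)) + 1 for i in range(n_conv)]


channels = channel_schedule(hidden_in=128, hidden_out=64, n_conv=11)
kernels = kernel_schedule(input_size=64, target_size=4, n_conv=11)
flat_features = channels[-1] * 4 * 4   # channels x height x width after the last conv

print(channels)        # [128, 122, 115, 109, 102, 96, 90, 83, 77, 70, 64]
print(kernels)         # [7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5]
print(flat_features)   # 1024
```

Working in units of two pixels is a design choice of the sketch: it keeps every kernel odd while letting sizes differ by at most 2 between layers.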

In [2]:
from ConvGrammarV2 import ImageGrammar

imgram = ImageGrammar(input_dim=(64,64),
                       channels=3,
                       output_dim=1,
                       n_layers=15,
                       min_spatial_layers=10,
                       hidden_in=128,
                       hidden_out=64,
                       shrinkage_objective=(4,4))

net = imgram.produceNetwork()
for layer in net:
    print(layer)

('<start>', 3, (64, 64))
('conv2d', 128, (7, 7))
('conv2d', 122, (7, 7))
('conv2d', 115, (7, 7))
('conv2d', 109, (7, 7))
('conv2d', 102, (7, 7))
('conv2d', 96, (7, 7))
('conv2d', 90, (7, 7))
('conv2d', 83, (7, 7))
('conv2d', 77, (5, 5))
('conv2d', 70, (5, 5))
('conv2d', 64, (5, 5))
('flatten', 1024, None)
('linear', 768, None)
('linear', 512, None)
('dropout', 0.09147014725415402, None)
('<end>', 1, None)
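
The printed tree can be checked by hand by following the tensor shape from node to node. The sketch below does this in plain Python; `trace_shapes` is a hypothetical helper, not part of ConvGrammarV2, and it assumes stride 1 and no padding for every convolution.

```python
import math


def trace_shapes(tree):
    """Return (node name, shape after the node) for a filled tree.

    Assumes every convolution uses stride 1 and no padding.
    """
    _, channels, (height, width) = tree[0]           # <start> node
    shape = (channels, height, width)
    shapes = [("<start>", shape)]
    for name, value, extra in tree[1:]:
        if name == "conv2d":
            kernel_h, kernel_w = extra
            shape = (value, shape[1] - kernel_h + 1, shape[2] - kernel_w + 1)
        elif name == "flatten":
            shape = (math.prod(shape),)
        elif name in ("linear", "<end>"):
            shape = (value,)
        # dropout leaves the shape unchanged
        shapes.append((name, shape))
    return shapes


# The tree printed above, copied by hand.
net_example = [
    ('<start>', 3, (64, 64)),
    ('conv2d', 128, (7, 7)), ('conv2d', 122, (7, 7)), ('conv2d', 115, (7, 7)),
    ('conv2d', 109, (7, 7)), ('conv2d', 102, (7, 7)), ('conv2d', 96, (7, 7)),
    ('conv2d', 90, (7, 7)), ('conv2d', 83, (7, 7)), ('conv2d', 77, (5, 5)),
    ('conv2d', 70, (5, 5)), ('conv2d', 64, (5, 5)),
    ('flatten', 1024, None),
    ('linear', 768, None), ('linear', 512, None),
    ('dropout', 0.09147014725415402, None),
    ('<end>', 1, None),
]

for name, shape in trace_shapes(net_example):
    print(name, shape)
```

The last convolution leaves a 64 x 4 x 4 tensor, which matches both the shrinkage objective (4,4) and the 1024 features stored in the flatten node.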


### TRANSLATE THE NETWORK INTO A WORKING TORCH MODEL

To translate the previous network into a working PyTorch architecture, we pass it to TranslatedNetwork, which only requires the final activation and the activation used for all inner layers.

In [3]:
from ConvTranslator import TranslatedNetwork
from torch import nn

model = TranslatedNetwork(network_tree=net,
                          default_activation=nn.ReLU(),
                          default_final_activation=nn.Sigmoid())

print(model.model)

Sequential(
  (0): Conv2d(3, 128, kernel_size=(7, 7), stride=(1, 1))
  (1): ReLU()
  (2): Conv2d(128, 122, kernel_size=(7, 7), stride=(1, 1))
  (3): ReLU()
  (4): Conv2d(122, 115, kernel_size=(7, 7), stride=(1, 1))
  (5): ReLU()
  (6): Conv2d(115, 109, kernel_size=(7, 7), stride=(1, 1))
  (7): ReLU()
  (8): Conv2d(109, 102, kernel_size=(7, 7), stride=(1, 1))
  (9): ReLU()
  (10): Conv2d(102, 96, kernel_size=(7, 7), stride=(1, 1))
  (11): ReLU()
  (12): Conv2d(96, 90, kernel_size=(7, 7), stride=(1, 1))
  (13): ReLU()
  (14): Conv2d(90, 83, kernel_size=(7, 7), stride=(1, 1))
  (15): ReLU()
  (16): Conv2d(83, 77, kernel_size=(5, 5), stride=(1, 1))
  (17): ReLU()
  (18): Conv2d(77, 70, kernel_size=(5, 5), stride=(1, 1))
  (19): ReLU()
  (20): Conv2d(70, 64, kernel_size=(5, 5), stride=(1, 1))
  (21): ReLU()
  (22): Flatten(start_dim=1, end_dim=-1)
  (23): Linear(in_features=1024, out_features=768, bias=True)
  (24): ReLU()
  (25): Linear(in_features=768, out_features=512, bias=True)
  (26):

## WORKING EXAMPLE - SYNTHETIC DATA, 50 RANDOM NETWORKS EVALUATION

In [4]:
import torch

n_samples_per_group = 100
train_size = 70

mu_group_1 = 1  # all gaussians have mean 1
sigma_group_1 = 4  # independent gaussians with standard deviation 4 (variance 16)
group_1_data = torch.randn((n_samples_per_group, 3, 32, 32)) * sigma_group_1 + mu_group_1
train_1_data, train_1_targets = group_1_data[:train_size], torch.ones((train_size, 1))
test_1_data, test_1_targets = group_1_data[train_size:], torch.ones((n_samples_per_group - train_size, 1))

mu_group_2 = -1  # all gaussians have mean -1
sigma_group_2 = 2  # independent gaussians with standard deviation 2 (variance 4)
group_2_data = torch.randn((n_samples_per_group, 3, 32, 32)) * sigma_group_2 + mu_group_2
train_2_data, train_2_targets = group_2_data[:train_size], torch.zeros((train_size, 1))
test_2_data, test_2_targets = group_2_data[train_size:], torch.zeros((n_samples_per_group - train_size, 1))


In [5]:
train_data = torch.cat([train_1_data, train_2_data], dim=0)
train_targets = torch.cat([train_1_targets, train_2_targets], dim=0)
shuffle_train_index = torch.randperm(n=train_size*2)
train_data = train_data[shuffle_train_index]
train_targets = train_targets[shuffle_train_index]

In [6]:
test_data = torch.cat([test_1_data, test_2_data], dim=0)
test_targets = torch.cat([test_1_targets, test_2_targets], dim=0)
shuffle_test_index = torch.randperm(n=n_samples_per_group*2 - train_size*2)
test_data = test_data[shuffle_test_index]
test_targets = test_targets[shuffle_test_index]

In [7]:
imgram = ImageGrammar(input_dim=(32,32),
                       channels=3,
                       output_dim=1,
                       n_layers=10,
                       min_spatial_layers=5,
                       hidden_in=128,
                       hidden_out=64,
                       shrinkage_objective=(4,4))

default_activation = nn.ReLU()
default_final_activation = nn.Sigmoid()

networks = []
for i in range(50):
    network_tree = imgram.produceNetwork()
    networks.append(TranslatedNetwork(network_tree=network_tree,
                                      default_activation=default_activation,
                                      default_final_activation=default_final_activation))





In [8]:
from Trainer import AutoTrainer
from torch.optim import AdamW

auto_trainer = AutoTrainer(train_data=train_data,
                           train_labels=train_targets,
                           test_data=test_data,
                           test_labels=test_targets,
                           criterion=nn.BCELoss(reduction='sum'),
                           optimizer=AdamW,
                           num_epochs=100,
                           lr=0.01,
                           batch_size=64)


In [9]:
performance_list = {'avg_train_loss': [],
                    'avg_test_loss': [],
                    'test_accuracy': []}

# Training is left commented out because running it on a CPU is slow.
# for network in networks:
#     avg_test_loss, avg_train_loss, test_accuracy = auto_trainer.train(network)
#     performance_list['avg_train_loss'].append(avg_train_loss)
#     performance_list['avg_test_loss'].append(avg_test_loss)
#     performance_list['test_accuracy'].append(test_accuracy)