# Implementation of Easy Ensemble
```
Tatsuhito Hasegawa, Kazuma Kondo,
"Easy Ensemble: Simple Deep Ensemble Learning for Sensor-Based Human Activity Recognition",
arXiv:2203.04153,
https://arxiv.org/abs/2203.04153
```

## Rule
Most CNN modules can be converted to EE style by modifying each layer as follows:

1. Set the `groups` hyperparameter of each convolution layer to $N$ so that it becomes a group convolution (the same applies to point-wise convolution layers).
2. Change each normalization layer to group normalization with the group parameter set to $N$.
3. Because a fully-connected layer is equivalent to a one-dimensional convolution layer with a kernel size of 1, replace each fully-connected layer with a reshape followed by a group convolution layer.
4. Multiply the output $\boldsymbol{z}$ of E by $\frac{1}{N}$.

Notably, the activation function, pooling layer, depth-wise convolution layer, and shortcut connection operate on each channel independently; therefore, they do not need to be changed. Using these simple procedures, most CNN architectures can be converted to the EE style.

## PyTorch example
1. Set the `groups` hyperparameter of each convolution layer to $N$ so that it becomes a group convolution (the same applies to point-wise convolution layers).

```python
    from torch import nn
    N = 4  # the number of ensembles
    general_conv = nn.Conv1d(3, 64, kernel_size=3, padding=1, bias=False, groups=1)  # old style
    ee_conv = nn.Conv1d(3 * N, 64 * N, kernel_size=3, padding=1, bias=False, groups=N)  # EE style
```
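The following is a minimal sketch, not taken from the original repository, that checks why step 1 yields $N$ independent members: a grouped convolution produces the same output as $N$ separate convolutions applied to the corresponding channel slices. The channel sizes (3 to 8), the input shape, and the variable names are assumptions chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
N = 4  # number of ensemble members (illustrative value)

# One grouped convolution that holds N independent 3 -> 8 channel convolutions.
grouped = nn.Conv1d(3 * N, 8 * N, kernel_size=3, padding=1, bias=False, groups=N)

x = torch.randn(2, 3 * N, 16)  # (batch, channels, time)
y_grouped = grouped(x)

# Rebuild the same computation with N separate convolutions that reuse
# the matching slices of the grouped weight tensor.
outputs = []
for i in range(N):
    member = nn.Conv1d(3, 8, kernel_size=3, padding=1, bias=False)
    with torch.no_grad():
        member.weight.copy_(grouped.weight[8 * i : 8 * (i + 1)])
    outputs.append(member(x[:, 3 * i : 3 * (i + 1)]))
y_separate = torch.cat(outputs, dim=1)

# The two results agree up to floating-point error.
print(torch.allclose(y_grouped, y_separate, atol=1e-5))
```

Each member only ever sees its own slice of input channels, so no information is shared between members inside the layer.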

2. Change each normalization layer to group normalization with the group parameter set to $N$.

```python
    general_norm = nn.GroupNorm(1, 64, affine=False)  # old style (Layer normalization)
    ee_norm = nn.GroupNorm(N, 64 * N, affine=False)  # EE style (Group normalization)
```
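As a small sanity check (a sketch under assumed tensor shapes, not part of the original code), the snippet below perturbs only the channels of the first member and confirms that `nn.GroupNorm` with $N$ groups leaves the normalized outputs of the other members unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
N = 4  # number of ensemble members (illustrative value)
ee_norm = nn.GroupNorm(N, 64 * N, affine=False)

x = torch.randn(2, 64 * N, 16)  # (batch, channels, time)
y = ee_norm(x)

# Perturb only the 64 channels that belong to the first member.
x_perturbed = x.clone()
x_perturbed[:, :64] += 10.0 * torch.randn(2, 64, 16)
y_perturbed = ee_norm(x_perturbed)

# Statistics are computed per group, so only the first member's output moves.
first_changed = not torch.allclose(y[:, :64], y_perturbed[:, :64])
others_unchanged = torch.allclose(y[:, 64:], y_perturbed[:, 64:])
print(first_changed, others_unchanged)
```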

3. Because a fully-connected layer is equivalent to a one-dimensional convolution layer with a kernel size of 1, replace each fully-connected layer with a reshape followed by a group convolution layer.
```python
    class View(nn.Module):  # minimal reshape module (see https://discuss.pytorch.org/t/how-to-build-a-view-layer-in-pytorch-for-sequential-models/53958)
        def __init__(self, *shape):
            super().__init__()
            self.shape = shape
        def forward(self, x):
            return x.view(*self.shape)

    general_linear = nn.Linear(64, 128, bias=False)  # old style (Linear: 64 to 128)
    ee_linear = nn.Sequential(  # EE style (Linear: (64 to 128) * N)
        View(-1, 64 * N, 1),   # reshape (batch, 64 * N) to (batch, 64 * N, 1)
        nn.Conv1d(64 * N, 128 * N, kernel_size=1, bias=False, groups=N),
        View(-1, 128 * N)      # this layer can also be substituted with nn.Flatten
    )
```
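To illustrate the equivalence claimed in step 3, here is a hedged sketch (variable names and the batch size are assumptions) showing that a grouped convolution with kernel size 1 matches $N$ separate linear layers applied to each member's feature slice.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
N = 4  # number of ensemble members (illustrative value)
ee_fc = nn.Conv1d(64 * N, 128 * N, kernel_size=1, bias=False, groups=N)

x = torch.randn(2, 64 * N)  # (batch, features)
# Add a length-1 time axis for the convolution, then remove it again.
y_grouped = ee_fc(x.unsqueeze(-1)).squeeze(-1)  # (batch, 128 * N)

# Apply each member's (128, 64) weight slice as an ordinary linear layer.
outputs = []
for i in range(N):
    w = ee_fc.weight[128 * i : 128 * (i + 1), :, 0]
    outputs.append(F.linear(x[:, 64 * i : 64 * (i + 1)], w))
y_separate = torch.cat(outputs, dim=1)

# The two results agree up to floating-point error.
print(torch.allclose(y_grouped, y_separate, atol=1e-5))
```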

## Example implementation of VGG in PyTorch
VGG is a CNN model composed only of simple convolution modules.
We describe an example implementation by translating a common VGG into an EE-style VGG.
VGG is well suited to illustrating how to implement EE because it does not include complex techniques.

```
K. Simonyan and A. Zisserman,
“Very deep convolutional networks for large-scale image recognition,”
in Proc. of the International Conference on Learning Representations, May 2015, pp. 1–14.
https://arxiv.org/abs/1409.1556
```


```python
from torch import nn
import torch.nn.functional as F

class VGG(nn.Module):
    def __init__(self, in_channels=3, num_classes=6, nb_fils=64, reps=[1,1,2,2,2], groups=1):
        super(VGG, self).__init__()
        self.groups = groups
        self.encoder = nn.Sequential(
            cbrp1d(in_channels, nb_fils, reps[0], groups=groups),
            cbrp1d(nb_fils, nb_fils * 2, reps[1], groups=groups),
            cbrp1d(nb_fils * 2, nb_fils * 4, reps[2], groups=groups),
            cbrp1d(nb_fils * 4, nb_fils * 8, reps[3], groups=groups),
            cbrp1d(nb_fils * 8, nb_fils * 8, reps[4], groups=groups)
            )
        self.output_channels = nb_fils * 8
        self.classifier = nn.Sequential(
            nn.Linear(self.output_channels, num_classes, bias=False)
        )

    def forward(self, x):
        x = self.encoder(x)
        x = F.avg_pool1d(x, kernel_size=x.size()[2:])  # GAP
        x = x.view(x.size(0), -1)  # Flatten
        x = self.classifier(x)
        return x

# n-times (convolution, layernorm, relu) and maxpooling
def cbrp1d(in_fils, out_fils, rep, groups=1):
    layers = []
    i_f, o_f = in_fils, out_fils
    for _ in range(rep):
        layers.append(nn.Conv1d(i_f, o_f, kernel_size=3, padding=1, bias=False, groups=groups))
        layers.append(nn.GroupNorm(groups, o_f, affine=False))
        layers.append(nn.ReLU(inplace=True))
        i_f = o_f
    layers.append(nn.MaxPool1d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Original VGG is created (groups=1)
vgg = VGG(3, 6, 64, [1,1,2,2,2], groups=1)

# EE VGG is created (groups=N)
N = 4
EE_vgg = VGG(3 * N, 6, 64 * N, [1,1,2,2,2], groups=N)
```

The above implementation of VGG is based on the original VGG architecture.
We modified the original model in the following two ways:

1. Insert layer normalization after each convolution.
2. Replace the flatten layer with global average pooling followed by a flatten layer.

These two changes are not related to EE.

To translate the original VGG to EE style, we only modified the `cbrp1d` module by adding the `groups` hyperparameter to `Conv1d` and `GroupNorm`.
If both `groups` parameters are set to 1, this module works as the original VGG.
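
The VGG code above does not include the $\frac{1}{N}$ scaling from rule 4. The sketch below shows one possible reading of that rule, assuming that E denotes the encoder and $\boldsymbol{z}$ its pooled, concatenated feature vector. Under that assumption, scaling $\boldsymbol{z}$ by $\frac{1}{N}$ before a single linear classifier gives the mean of the per-member logits. The sizes and variable names are illustrative, and whether the authors apply the scaling at exactly this point is not confirmed by this document.

```python
import torch
from torch import nn

torch.manual_seed(0)
N, feat, num_classes = 4, 512, 6  # illustrative sizes (feat = features per member)

# A single classifier over the concatenated features, like VGG.classifier above.
classifier = nn.Linear(feat * N, num_classes, bias=False)

z = torch.randn(2, feat * N)  # stand-in for the pooled encoder output
logits = classifier(z / N)    # assumed placement of the 1/N scaling

# Equivalent view: each member owns one slice of the weight matrix, and the
# scaled output equals the mean of the per-member logits.
member_logits = [
    z[:, feat * i : feat * (i + 1)] @ classifier.weight[:, feat * i : feat * (i + 1)].T
    for i in range(N)
]
mean_logits = torch.stack(member_logits).mean(dim=0)

print(torch.allclose(logits, mean_logits, atol=1e-5))
```

Without the scaling, the same classifier would return the sum of the member logits rather than their mean.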



