## Dataset

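We used a subset of the WaterDrop dataset released by DeepMind with *Learning to Simulate Complex Physics with Graph Networks* (Sanchez-Gonzalez et al., 2020). Its trajectories only cover the specific case of a water droplet in a vacuum, but that is fine for us, since that is exactly what we wanted to model!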

TODO: download the dataset and extract it into the `data` folder.
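A minimal download sketch for that step. It assumes the bucket URL and file names used by the `download_dataset.sh` script in DeepMind's `learning_to_simulate` repository (`metadata.json` plus `train`, `valid` and `test` TFRecord files); the helper names `dataset_urls` and `download_dataset` and the `data/WaterDrop` layout are hypothetical choices of ours.

```python
import os
import urllib.request

# Assumption: URL and file layout follow DeepMind's download_dataset.sh script.
BASE_URL = "https://storage.googleapis.com/learning-to-simulate-complex-physics/Datasets"
FILES = ("metadata.json", "train.tfrecord", "valid.tfrecord", "test.tfrecord")


def dataset_urls(name="WaterDrop"):
    """Return the URL of every file belonging to the named dataset."""
    return [f"{BASE_URL}/{name}/{filename}" for filename in FILES]


def download_dataset(name="WaterDrop", data_dir="data"):
    """Download all dataset files into data_dir/name, skipping files already present."""
    out_dir = os.path.join(data_dir, name)
    os.makedirs(out_dir, exist_ok=True)
    for url in dataset_urls(name):
        target = os.path.join(out_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
    return out_dir


# Usage (downloads several files, so it is left commented out here):
# download_dataset("WaterDrop", "data")
```

Under these assumptions the files are plain TFRecords rather than an archive, so this sketch has no extraction step; parsing them (for example with TensorFlow's `tf.data.TFRecordDataset`) is a separate task.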

## Preprocessing

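1. Add noise to the training inputs to mitigate error accumulation over long rollouts. During a rollout the model is fed its own slightly wrong predictions, so small errors compound; training on perturbed inputs teaches it to cope with this. We use a simple approach to make the model more robust to noisy inputs: during training, we corrupt the model's input velocities with random-walk noise $\mathcal{N}(0, \sigma_v = 0.0003)$, adjusting the input positions accordingly, so that the training distribution is closer to the distribution generated during rollouts.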
2. Normalize all input and target vectors elementwise to zero mean and unit variance, using statistics computed online during training. Preliminary experiments showed that normalization led to faster training, though converged performance was not noticeably improved.
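The random-walk noise of step 1 can be sketched in PyTorch as follows. This is a minimal sketch, assuming positions shaped `(num_particles, seq_len, dim)` and interpreting $\sigma_v$ as the standard deviation of the accumulated velocity noise at the last input step (the convention in DeepMind's open-source code); if $\sigma_v$ is meant per step, use it directly as `step_std`. The function name `random_walk_noise` is hypothetical.

```python
import torch


def random_walk_noise(position_sequence, noise_std_last_step=3e-4):
    """Position noise obtained by integrating a random walk in the velocities.

    position_sequence: tensor of shape (num_particles, seq_len, dim), seq_len >= 2.
    """
    num_particles, seq_len, dim = position_sequence.shape
    num_velocities = seq_len - 1
    if num_velocities < 1:
        raise ValueError("need at least two positions to define a velocity")
    # Variances add along the walk: num_velocities * step_std**2 = last_std**2.
    step_std = noise_std_last_step / num_velocities ** 0.5
    velocity_noise = step_std * torch.randn(
        num_particles, num_velocities, dim,
        dtype=position_sequence.dtype, device=position_sequence.device)
    # Random walk: each velocity's noise is the sum of all previous step noises.
    velocity_noise = torch.cumsum(velocity_noise, dim=1)
    # Integrate into positions with an Euler step (dt = 1). The first position gets
    # no noise, since it only serves to compute the first velocity.
    return torch.cat(
        [torch.zeros_like(velocity_noise[:, :1]), torch.cumsum(velocity_noise, dim=1)],
        dim=1)


positions = torch.zeros(3, 6, 2)  # 3 particles, 6 past positions, 2D
noisy_positions = positions + random_walk_noise(positions)
```

Because the noise is built on velocities and then integrated, finite differences of `noisy_positions` reproduce the random-walk velocity noise exactly. How the training target is adjusted for this noise is a separate design choice not shown here.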
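Step 2 (elementwise normalization with statistics computed online) can be sketched as a small module that accumulates running sums while in training mode. This is one possible implementation, not necessarily the one used here; the class name `OnlineNormalizer`, the `eps` floor on the variance and the `inverse` helper are our own assumptions.

```python
import torch
from torch import nn


class OnlineNormalizer(nn.Module):
    """Elementwise zero-mean / unit-variance normalization with running statistics."""

    def __init__(self, size, eps=1e-8):
        super().__init__()
        self.eps = eps
        # Buffers are stored in the state_dict but are not updated by the optimizer.
        self.register_buffer("count", torch.zeros(()))
        self.register_buffer("running_sum", torch.zeros(size))
        self.register_buffer("running_sum_sq", torch.zeros(size))

    @torch.no_grad()
    def accumulate(self, x):
        x = x.reshape(-1, x.shape[-1])  # treat all leading dims as samples
        self.count += x.shape[0]
        self.running_sum += x.sum(dim=0)
        self.running_sum_sq += (x * x).sum(dim=0)

    def mean(self):
        return self.running_sum / self.count.clamp(min=1)

    def std(self):
        var = self.running_sum_sq / self.count.clamp(min=1) - self.mean() ** 2
        return torch.sqrt(var.clamp(min=self.eps))  # guard against zero variance

    def forward(self, x):
        # Statistics are only updated in training mode (module.train()).
        if self.training:
            self.accumulate(x)
        return (x - self.mean()) / self.std()

    def inverse(self, y):
        """Map normalized values (e.g. decoder outputs) back to physical units."""
        return y * self.std() + self.mean()


velocity_normalizer = OnlineNormalizer(size=2)  # e.g. 2D velocities
normalized = velocity_normalizer(torch.randn(100, 2) * 0.01)
```

Typically one such normalizer is kept per feature type (input velocities, target accelerations), and `inverse` turns the decoder's normalized prediction back into an acceleration. Some implementations also stop accumulating after a fixed number of updates so that the statistics settle.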

## GNN model

### MLP

MLPs are used in many places throughout the architecture: most notably, the encoder and the decoder are both MLPs, as are the edge and node update functions in the processor. We define a reusable class so that they are easy to instantiate.

All MLPs have two hidden layers (with ReLU activations), followed by a non-activated output layer. Every layer has size 128, except the decoder's output layer, which matches the dimensionality of its target. All MLPs except the output decoder are followed by a LayerNorm layer.

In [None]:
import math

import torch
from torch import nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Two ReLU hidden layers, a non-activated output layer, and an optional LayerNorm."""

    def __init__(self, input_dim, output_dim, hidden_dim=128, layer_norm=True):
        super().__init__()
        self.layer1 = nn.Linear(input_dim, hidden_dim)
        self.layer2 = nn.Linear(hidden_dim, hidden_dim)
        self.layer3 = nn.Linear(hidden_dim, output_dim)
        # Create the LayerNorm once here so its affine parameters are registered and
        # trained; it normalizes over the last (feature) dimension. The output decoder
        # passes layer_norm=False and gets an identity instead.
        self.norm = nn.LayerNorm(output_dim) if layer_norm else nn.Identity()
        # Apply the custom initialization (nn.Linear otherwise keeps its default).
        self.reset_parameters()

    def reset_parameters(self):
        for layer in (self.layer1, self.layer2, self.layer3):
            # Weights ~ N(0, std = 1/sqrt(in_features)): LeCun-style fan-in scaling.
            # It keeps the variance of each layer's outputs close to that of its
            # inputs, which helps prevent activations from exploding or vanishing.
            # Glorot and Bengio (2010), "Understanding the difficulty of training deep
            # feedforward neural networks", propose the related choice
            # std = sqrt(2 / (fan_in + fan_out)).
            nn.init.normal_(layer.weight, mean=0.0, std=1.0 / math.sqrt(layer.in_features))
            # Zero biases; the network learns appropriate offsets during training.
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)  # non-activated output layer
        return self.norm(x)
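For comparison, the same layer layout can be written with `nn.Sequential`, which makes the decoder exception (no trailing LayerNorm) easy to see when the encoder and decoder are built side by side. This is a sketch only: `build_mlp` is a hypothetical helper, the feature sizes are illustrative rather than the ones used in this project, and it keeps PyTorch's default `nn.Linear` initialization instead of the fan-in scheme in `reset_parameters`.

```python
import torch
from torch import nn


def build_mlp(input_dim, output_dim, hidden_dim=128, layer_norm=True):
    """Two ReLU hidden layers, a non-activated output layer, optional LayerNorm."""
    layers = [
        nn.Linear(input_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, output_dim),  # non-activated output layer
    ]
    if layer_norm:
        layers.append(nn.LayerNorm(output_dim))
    return nn.Sequential(*layers)


# Hypothetical sizes for illustration: 30 input node features, 128 latent, 2D output.
NODE_FEATURES, LATENT, OUT_DIM = 30, 128, 2
encoder = build_mlp(NODE_FEATURES, LATENT)              # ends with LayerNorm
decoder = build_mlp(LATENT, OUT_DIM, layer_norm=False)  # output decoder: no LayerNorm

x = torch.randn(7, NODE_FEATURES)  # features for 7 particles
prediction = decoder(encoder(x))   # shape (7, 2)
```

The explicit class is handier when the custom initialization or extra logic is needed; the `nn.Sequential` form is shorter and prints the full layer stack when inspected.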