# Report Final assignment Hydraulic cross sections
## 3EBX0 - Machine learning in science
## Group 13 (also on kaggle)
#### Quincy Salden (1749900), Julian Gootzen (1676512), Sebastiaan Broers (1265091)
#### June 23rd 2024

## Table of Contents
1. Introduction
2. C - Physical dimensions analysis
3. D - Symmetry group reflection
4. E - Neural network design and training
5. F - Network fitting
6. Conclusion

### C - Physical dimensions analysis
The input in this problem is a 40x40 square 2D grid of pixels. Each pixel represents an area in $[m^2]$ and carries one additional piece of information: it is either open (value = 1) or closed (value = 0).

The output is the hydraulic throughput $S$, which has unit $[m^4]$ and is given by $S = \mu L Q / \Delta P$, where $\mu$ is the dynamic viscosity of the fluid in $[Pa \cdot s]$, $L$ the channel length in $[m]$, $\Delta P$ the pressure drop over the channel in $[Pa]$ and $Q$ the volumetric flow rate in $[m^3/s]$.
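
As a quick consistency check on the stated unit, the units in the definition of $S$ can be multiplied out. This is a short worked derivation that uses only the quantities defined above:

$$[S] = \frac{[\mu]\,[L]\,[Q]}{[\Delta P]} = \frac{\mathrm{Pa \cdot s} \cdot \mathrm{m} \cdot \mathrm{m^3 \cdot s^{-1}}}{\mathrm{Pa}} = \mathrm{m^4}$$

The pressure and time units cancel, which is why $S$ is a purely geometric quantity.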

To rescale the input parameters and achieve dimensional homogeneity, the pixel values can be squared or multiplied with one or more neighbouring pixels, so that the resulting unit is $[m^4]$.

The value of $S$ can vary over multiple orders of magnitude. The limiting cases are when all pixels are open or when all are closed. When all are closed, the value of $S$ is trivially 0. The upper bound is found by opening all pixels, which yields a value of order $10^4$ - $10^5$ $m^4$. The output data range is therefore taken broadly as $(0, 10^5)$ $m^4$.

In this problem multiple symmetries can be identified. Rotating the grid by any multiple of 90 degrees gives 4 equivalent states with the same value of $S$. Furthermore, flipping the system along the x or y axis does not change the outcome. Additionally, $S$ is independent of the fluid properties, channel length and pressure drop, so these parameters can vary freely without altering $S$.

Since the input data is given as 40x40 matrices, flip and rotation operations can be applied to any of these matrices as desired.

### D - Symmetry group reflection

The symmetry group of this problem is the dihedral group $D_4$, which combines the flip and rotation operations in a 2D setting. Because there are 4 rotations and a mirror image for each of them, this group has 8 elements.

#### Data augmentation 
For data augmentation, the rotational symmetry can be used to create three additional matrices per image, rotated one, two or three times by 90 degrees. Combined with the flip operation this leads to up to 8 unique matrices per image once duplicates are excluded (duplicates occur when the image is itself symmetric).
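
The augmentation described here can be sketched with NumPy as follows. This is a minimal illustration rather than the code used for the reported results: the helper names `d4_orientations` and `unique_orientations` are hypothetical, and a small 3x3 array stands in for a 40x40 input.

```python
import numpy as np

def d4_orientations(grid):
    """Return the 8 D4 orientations of a square 2D array (hypothetical helper)."""
    rotations = [np.rot90(grid, k) for k in range(4)]
    mirrored = [np.rot90(np.fliplr(grid), k) for k in range(4)]
    return rotations + mirrored

def unique_orientations(grid):
    """Drop duplicate orientations, which occur when the grid is itself symmetric."""
    unique = []
    for candidate in d4_orientations(grid):
        if not any(np.array_equal(candidate, kept) for kept in unique):
            unique.append(candidate)
    return unique

# Small asymmetric example grid (a stand-in for a 40x40 input)
example = np.array([[1, 1, 0],
                    [0, 1, 0],
                    [0, 0, 0]])
augmented = unique_orientations(example)
```

Every orientation keeps the same number of open pixels, and each augmented matrix would be paired with the unchanged target value $S$ of the original image.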


#### Disambiguation
To disambiguate the data, we first split each 40x40 grid into eight octants, with some overlap on the diagonals. We then find the octant that contains the most holes, i.e. where the sum of the pixel values is maximal. Once the maximal octant is found, we use the symmetries, i.e. rotation and reflection through the y = -x axis, to ensure that the maximal octant always ends up in the same spot in the upper left quadrant (the bottom of the two octants in that quadrant). Through this disambiguation the model only sees one canonical orientation of each set of equivalent grids, which should make the problem easier to learn.

#### Hardwiring 
Hardwiring the symmetries exploits the property that a CNN is approximately translationally invariant, due to its convolutional layers (the Conv2D function) in combination with max pooling. Parameters like the stride and filter size of the Conv2D function can be tuned to balance running speed against accuracy. Rotation and flip operations can be incorporated in the hardwiring such that all 8 orientations are run through the network. Dropout can be added to randomly zero out activations during training, so that the CNN generalizes better to situations it has not seen before. Furthermore, a symmetric net can be obtained by applying the network to all 8 orientations, summing the outputs and dividing by 8. This is known as group averaging.
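
Written out, group averaging over $D_4$ takes the following form, where $f$ denotes the network and $g \cdot x$ the input grid $x$ transformed by the group element $g$ (this notation is introduced here for illustration only):

$$\bar{f}(x) = \frac{1}{8} \sum_{g \in D_4} f(g \cdot x)$$

Applying any $h \in D_4$ to $x$ only reorders the terms of the sum, so $\bar{f}(h \cdot x) = \bar{f}(x)$ and the averaged output is invariant by construction.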

### E - Neural network design and training
First, a naïve neural network was trained, which did not lead to optimal results but gave some useful insights.

Then, all symmetries were hardwired directly into the network, which led to an RMSPE value of 0.04699, still insufficient. At this point, the CNN consisted of 6 Conv2D (C) layers and 3 max pooling (M) layers alternating in the sequence C-C-M-C-C-M-C-C-M. The Conv2D layers use a kernel size of 5 and a stride of 1. The padding is half of the kernel size, rounded down. Furthermore, the 8 orientations are all included in the hardwired network through flip and rotation functions. Lastly, flattening and group averaging are performed.

In [None]:
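import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class ApplySymmetry(nn.Module):
    """Stacks all 8 D4 orientations of a batch (N, C, H, W) along the batch axis."""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # 4 rotations of the input and 4 rotations of its mirror image
        x_lr = torch.flip(x, dims=[-1])
        orientations = [torch.rot90(x, k, dims=(-2, -1)) for k in range(4)]
        orientations += [torch.rot90(x_lr, k, dims=(-2, -1)) for k in range(4)]

        # Resulting shape: (8 * N, C, H, W)
        return torch.cat(orientations, dim=0)


class GroupAverage(nn.Module):
    """Averages the network output over the 8 orientations made by ApplySymmetry."""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # (8 * N, 1) -> (8, N, 1) -> mean over the 8 orientations -> (N, 1)
        return x.reshape(8, -1, x.size(-1)).mean(dim=0)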

class SquareRoot(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.sqrt(x)

class HydraullicModel(nn.Module):
    
    width = 256
    
    def __init__(self, width=width):
        super(HydraullicModel, self).__init__()
        ksize = 5
        # Define the convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=ksize, stride=1, padding=ksize//2, bias=False)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=ksize, stride=1, padding=ksize//2, bias=False)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=ksize, stride=1, padding=ksize//2, bias=False)
        self.conv4 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=ksize, stride=1, padding=ksize//2, bias=False)
        self.conv5 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=ksize, stride=1, padding=ksize//2, bias=False)
        self.conv6 = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=ksize, stride=1, padding=ksize//2, bias=False)
        
        # Pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Flatten layer
        self.flatten = nn.Flatten()
        
        # Linear Layer
        self.fc1 = nn.Linear(400,1,bias=False)
        

      
    def forward(self, x, width=width):
#        ApplySymmetry()
        x = F.relu(self.conv1(x))  
        x = F.relu(self.conv2(x)) 
        x = self.maxpool(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.maxpool(x)
        x = F.relu(self.conv5(x))   
        x = F.relu(self.conv6(x))
        x = self.maxpool(x)
       
        x = self.flatten(x)
        x = self.fc1(x)
#        GroupAverage()

        # SquareRoot()
        return x
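
The input size of the final linear layer, `nn.Linear(400, 1)`, follows from the layer settings above. The short sketch below retraces that arithmetic in plain Python. The helper `conv_output_size` is a hypothetical name, and it uses the standard output-size formula for convolution and pooling layers without dilation.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 40  # the input grid is 40x40
for block in range(3):
    # Two 5x5 convolutions with stride 1 and padding 2 keep the size unchanged
    for _ in range(2):
        size = conv_output_size(size, kernel_size=5, stride=1, padding=2)
    # 2x2 max pooling with stride 2 halves the size
    size = conv_output_size(size, kernel_size=2, stride=2)

out_channels = 16  # output channels of conv6
flattened_features = out_channels * size * size
```

The spatial size goes from 40 to 20 to 10 to 5, so flattening the 16 channels of 5x5 feature maps gives the 400 inputs of `fc1`.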

It was concluded that hardwiring is inefficient and that disambiguation would likely lead to better and faster results. Disambiguation was therefore implemented in the code, using the method described in section D.

In [None]:
in_train_data, out_train_data, in_test_data = load_data(
    'pub_input.npy', 'pub_out.npy', 'pri_in.npy'
)

def disambiguate(x):
    """ 
    rotates / reflects s.t. the target octant of the top left quadrant
    (row index <= column index) has the most 1s.
    """
    half_size = x.shape[0] // 2
    sums = []
    for k in range(4):
        # Top left quadrant of the grid after k rotations by 90 degrees
        quadrant = np.rot90(x, k=k)[:half_size, :half_size]
        sums.append(np.triu(quadrant).sum())  # octant with row <= column
        sums.append(np.tril(quadrant).sum())  # octant with row >= column

    max_sums_arg = np.argmax(sums)

    # Even indices only need a rotation, odd indices also need a transpose
    x = np.rot90(x, max_sums_arg // 2)
    if max_sums_arg % 2 == 1:
        x = np.transpose(x)

    return x


train_data_list = []
test_data_list = []
for x in in_train_data:
    new_x = disambiguate(x)
    train_data_list.append(new_x)
for x in in_test_data:
    new_x = disambiguate(x)
    test_data_list.append(new_x)  
in_train_data = np.array(train_data_list)
in_test_data = np.array(test_data_list)

### F - Network fitting
In order to protect the network from overtraining, the average of the latest validation losses is compared to the average of earlier validation losses. For example, in the following training run the average validation loss of the 3 latest epochs is compared to the average validation loss of the 3 epochs from 20 to 18 epochs back. Or in code:

In [None]:
if epoch > 20 and mean_val_loss > mean(loss_dict['val'][-20:-17]):
    break

The condition that the number of epochs should be larger than 20 is in place to prevent errors in the second condition, which needs enough recorded epochs to look back on. This condition alone still leaves a possibility for overtraining, since no comparison is made between the training and validation loss. Therefore another condition is added, which stops training once the validation loss exceeds the training loss by more than 5%:

In [3]:
if mean_val_loss > mean_train_loss * 1.05:
    break
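
Taken together, the two stopping conditions can be collected in one helper. The following is a minimal sketch and not the code used for the reported runs: the function name `should_stop` and its parameters `recent`, `lookback`, `min_epochs` and `tolerance` are hypothetical, and `mean` is assumed to be `statistics.mean`.

```python
from statistics import mean

def should_stop(train_losses, val_losses, recent=3, lookback=20,
                min_epochs=20, tolerance=1.05):
    """Return True when either early-stopping condition is met (hypothetical helper).

    Assumes lookback > recent and min_epochs >= lookback, so that the
    look-back slice below is never empty.
    """
    epoch = len(val_losses) - 1  # zero-based index of the latest epoch
    if epoch <= min_epochs:
        return False  # not enough history to compare against

    mean_val_loss = mean(val_losses[-recent:])
    mean_train_loss = mean(train_losses[-recent:])

    # Condition 1: validation loss is worse than it was about `lookback` epochs ago
    earlier_val_loss = mean(val_losses[-lookback:-lookback + recent])
    if mean_val_loss > earlier_val_loss:
        return True

    # Condition 2: validation loss exceeds training loss by more than the tolerance
    return mean_val_loss > mean_train_loss * tolerance
```

Inside a training loop this would be called once per epoch, for instance as `if should_stop(loss_dict['train'], loss_dict['val']): break`.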


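In order to train our network without putting too much strain on our university laptops, the training process was adapted so that it can run unattended on Kaggle. The function below keeps training fresh models until a time budget of 8 hours is used up, and stores each model together with its loss history.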

In [None]:
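import time
from datetime import datetime
from statistics import mean  # assumption: mean() is the standard-library mean

import numpy as np
import pandas as pd
import torch
from tqdm import tqdm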

def train_model_repeated(train_loader,
                val_loader, 
                model,
                nepochs=1,
                lr=1e-4,
                loss_fn=rmspe,
                batch_size = 32,
                plot_result=True):
    
    start_time = time.time()
    duration = 60*60*8
    data_dict = {"nepochs": [] , "val_loss" : [] , "train_loss" : [] , "best_loss" : []}
    results = []
    while time.time() - start_time < duration:
        
        model = HydraullicModel().to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        loss_dict = {"train": [], "val": []}

        # Set up the main tqdm loop for epochs
        epochs_tqdm = tqdm(range(nepochs), desc='Training Model')
        

        for epoch in epochs_tqdm:
            model.train()
            epoch_loss_sum = 0

            # Train loop without tqdm for individual batches
            for x_batch, y_batch in train_loader:
                y_batch = y_batch.unsqueeze(1)
                y_pred = model(x_batch)
                loss = loss_fn(y_pred, y_batch)
                epoch_loss_sum += loss.item()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            avg_train_loss = epoch_loss_sum / len(train_loader)
            loss_dict["train"].append(avg_train_loss)

            # Validation loop without tqdm for individual batches
            # model.eval()
            val_loss_sum = 0
            with torch.no_grad():
                for x_val, y_val in val_loader:
                    y_val = y_val.unsqueeze(1)
                    y_pred = model(x_val)
                    # print(y_val)
                    # print(y_pred)
                    loss = loss_fn(y_pred, y_val)
                    val_loss_sum += loss.item()

            avg_val_loss = val_loss_sum / len(val_loader)
            loss_dict["val"].append(avg_val_loss)
            
            # Update tqdm bar only once per epoch with the average losses
            epochs_tqdm.set_description(f'Epoch {epoch+1}/{nepochs} - Train Loss: {avg_train_loss:.4f} - Val Loss: {avg_val_loss:.4f}')

            # Early stopping: compare the mean losses over the last 15 epochs
            mean_val_loss = mean(loss_dict['val'][-15:])
            mean_train_loss = mean(loss_dict['train'][-15:])

            if epoch > 30 and mean_val_loss > mean_train_loss*1.05:
                data_dict["nepochs"].append(epoch)
                break
            # Record the epoch count when the run reaches the last epoch
            if epoch+1 == nepochs:
                data_dict["nepochs"].append(epoch+1)
                break
        data_dict['val_loss'].append(mean(loss_dict['val'][-5:]))
        data_dict['train_loss'].append(mean(loss_dict['train'][-5:]))
            
        if plot_result:
            plot_losses(loss_dict)
        
        results.append((model, loss_dict))

    
    return results , data_dict

# Score each trained model by the mean of its last 5 training and validation losses
mean_loss_list = []
for index in range(len(results)):
    loss_dictionary = results[index][1]
    mean_loss_list.append(mean([mean(loss_dictionary['train'][-5:]) , mean(loss_dictionary['val'][-5:])]))

# Select the best model once, after all scores are known
best_model = results[np.argmin(mean_loss_list)][0]
data_dict["best_loss"].append(min(mean_loss_list))

    
current_time = datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
torch.save(best_model , f"model_{current_time}")
df = pd.DataFrame(list(data_dict.items()), columns=['Key', 'Value'])
df.to_csv(f'/kaggle/working/data_{current_time}')
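
The loss function `rmspe` that is passed to the training function is not defined in this report. Assuming it follows the usual definition of the root mean square percentage error, with $y_i$ the true throughput, $\hat{y}_i$ the prediction and $N$ the number of samples in a batch (symbols introduced here for illustration), it reads:

$$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{\hat{y}_i - y_i}{y_i} \right)^2}$$

Under this definition the error is measured relative to the true value, so samples with small and large $S$ contribute comparably even though $S$ spans several orders of magnitude.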

## Conclusion
After trying different versions of the neural network, the best RMSPE value that was found is 0.03595. Possible future improvements are a deeper network, or one that makes smarter use of the symmetries of the problem.