# Introduction

Underwater image enhancement is a challenging computer vision task because the water medium absorbs and scatters light and distorts color. These effects degrade image quality and hurt the performance of downstream tasks such as object detection, classification, and navigation in underwater environments. In this project, we address these challenges with a deep learning-based enhancement technique built on an enhanced WaterNet model (WaterNet+, implemented as WaterNetPlus in the code). The model is designed to restore underwater images by correcting color shifts, enhancing contrast, and recovering structural details. We trained and evaluated it on the UIEB dataset, which contains a diverse set of real-world underwater images with corresponding reference images. The goal is to improve visual clarity and make underwater images more suitable for further computer vision applications.

# Step 1: Access the Dataset


The dataset is publicly available on Google Drive. The commands below download the archive with gdown and extract it into the UIEB_Dataset folder:

In [1]:
!gdown --fuzzy "https://drive.google.com/uc?id=1adx6d7BNMc7KqHA5hLaInRkiZiYDGoZO"
!unzip -q UIEB_RAW_REF_splitted-zip.zip -d UIEB_Dataset

Downloading...
From (original): https://drive.google.com/uc?id=1adx6d7BNMc7KqHA5hLaInRkiZiYDGoZO
From (redirected): https://drive.google.com/uc?id=1adx6d7BNMc7KqHA5hLaInRkiZiYDGoZO&confirm=t&uuid=6eb6c5eb-7b30-4445-9a70-a468264d55d6
To: /content/UIEB_RAW_REF_splitted-zip.zip
100% 1.49G/1.49G [00:24<00:00, 59.8MB/s]


# Step 2: Import Necessary Libraries

All essential libraries and modules required for constructing, training, and evaluating the deep learning model using PyTorch are imported.

In [2]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import matplotlib.pyplot as plt
from tqdm import tqdm
import numpy as np
from torchvision.datasets import ImageFolder
from torchvision.utils import save_image

# Step 3: Dataset Preparation

The next section defines a custom dataset class, UIEBDataset, which is designed to load and preprocess paired input-target images from the UIEB dataset. Paths to the input and target image directories, along with an optional transformation function, are provided to the class. Methods for loading images, applying transformations (e.g., resizing, normalization), and returning processed images as tensors are included. The dataset is organized into separate directories for training and validation. DataLoader objects are created for both datasets to facilitate efficient batching, shuffling, and data loading during model training and evaluation.

In [3]:
class UIEBDataset(Dataset):
    def __init__(self, input_dir, target_dir, transform=None):
        self.input_dir = input_dir
        self.target_dir = target_dir
        self.input_images = sorted(os.listdir(input_dir))
        self.target_images = sorted(os.listdir(target_dir))
        self.transform = transform

    def __len__(self):
        return len(self.input_images)

    def __getitem__(self, idx):
        input_path = os.path.join(self.input_dir, self.input_images[idx])
        target_path = os.path.join(self.target_dir, self.target_images[idx])

        input_img = Image.open(input_path).convert('RGB')
        target_img = Image.open(target_path).convert('RGB')

        if self.transform:
            input_img = self.transform(input_img)
            target_img = self.transform(target_img)

        return input_img, target_img
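
Because UIEBDataset pairs images purely by their position in two sorted directory listings, a missing or extra file in either folder would silently misalign every later pair. A minimal sanity check could look like the sketch below. It assumes that each input and its reference share the same filename, which is an assumption about the folder layout and not something the notebook states, and `find_unpaired` is a hypothetical helper, not part of the original code.

```python
def find_unpaired(input_names, target_names):
    """Return (missing_targets, missing_inputs) for two filename lists.

    Assumes an input and its reference image share the same filename,
    which is an assumption about the folder layout.
    """
    input_set, target_set = set(input_names), set(target_names)
    missing_targets = sorted(input_set - target_set)  # inputs without a reference
    missing_inputs = sorted(target_set - input_set)   # references without an input
    return missing_targets, missing_inputs


# Toy listings standing in for os.listdir(input_dir) and os.listdir(target_dir).
inputs = ["a.png", "b.png", "c.png"]
targets = ["a.png", "c.png", "d.png"]
print(find_unpaired(inputs, targets))
```

If both returned lists are empty, index-based pairing and name-based pairing agree.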


The directory paths for the training and validation datasets are set in this cell. They point to the main folders of the UIEB dataset as extracted under /content/UIEB_Dataset in Step 1. Each split has two subdirectories: input, containing the original underwater images, and target, containing the enhanced reference versions. These paths are used when loading data for training and validation.

In [4]:
train_dir = "/content/UIEB_Dataset/UIEB_RAW_REF_splitted/train"
train_input_dir = os.path.join(train_dir, "input")
train_target_dir = os.path.join(train_dir, "target")

val_dir = "/content/UIEB_Dataset/UIEB_RAW_REF_splitted/val"
val_input_dir = os.path.join(val_dir, "input")
val_target_dir = os.path.join(val_dir, "target")


A series of image transformations is applied in this cell to prepare the data for model training. The transformations include resizing images to 256x256 pixels, converting them into tensor format, and normalizing their pixel values to a range of [-1, 1]. These preprocessing steps help standardize the input data and improve model performance. Following this, instances of the custom dataset class UIEBDataset are created for both the training and validation sets, incorporating the specified transformations to ensure consistent data handling during the training and evaluation phases.

In [5]:
transform = transforms.Compose([
      transforms.Resize((256, 256)),
      transforms.ToTensor(),
      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # for normalize to [-1, 1]
    ])

train_dataset = UIEBDataset(train_input_dir, train_target_dir, transform=transform)
val_dataset = UIEBDataset(val_input_dir, val_target_dir, transform=transform)
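
To make the [-1, 1] range concrete: ToTensor produces values in [0, 1], and Normalize with mean 0.5 and standard deviation 0.5 applies (x - 0.5) / 0.5 to each channel. The sketch below reproduces that arithmetic on plain floats. The `normalize` and `denormalize` helpers are illustrative names, not torchvision functions, and the clipping in `denormalize` is an added assumption for display purposes.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel rule applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, clipped to [0, 1] so the result is displayable."""
    return min(1.0, max(0.0, y * std + mean))


for value in (0.0, 0.5, 1.0):
    print(value, "->", normalize(value), "->", denormalize(normalize(value)))
```

The same inverse, (y + 1) / 2, is what the visualization cell in Step 8 applies before calling imshow.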


DataLoader objects are instantiated in this cell to facilitate efficient data handling during training and validation. The training DataLoader is configured with a batch size of 8 and shuffling enabled to ensure that the data is randomly sampled in each epoch, promoting better generalization. The validation DataLoader uses the same batch size but disables shuffling to maintain a consistent order for evaluation. These DataLoaders enable streamlined batching and loading of the datasets during the model’s training and validation processes.

In [6]:
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=8, shuffle=False)


# Step 4: Enhanced WaterNet Model

The following code defines an enhanced deep learning model architecture designed for image restoration tasks, such as underwater image enhancement. The model incorporates advanced components to improve feature representation and learning capacity.

Specifically, it includes a Squeeze-and-Excitation (SE) attention block that adaptively recalibrates channel-wise feature responses, enhancing the model’s ability to focus on important image details. Residual blocks are used to facilitate efficient gradient flow and deeper network design by adding shortcut connections, which help the model learn more complex mappings without degradation.

The overall architecture, named WaterNetPlus, consists of an encoder to extract features, a middle processing stage to refine them, and a decoder to reconstruct the enhanced image output. The design leverages convolutional layers, nonlinear activations, batch normalization, and attention mechanisms to effectively restore degraded images. This modular and hierarchical structure allows the model to capture both low-level and high-level image features for improved restoration performance.

Each component is described in more detail below.

SEBlock (Squeeze-and-Excitation Block):
This module introduces a channel-wise attention mechanism that adaptively recalibrates feature responses. It achieves this by first aggregating spatial information through global average pooling, then passing the result through a small bottleneck network with nonlinear activations, and finally applying a sigmoid function to generate per-channel weights. These weights are used to emphasize the most informative features in each channel, improving the model’s focus on relevant details.

ResidualBlock with SE Attention:
This block applies two 3x3 convolutional layers, each followed by batch normalization, with a ReLU activation between them. The SEBlock then reweights the channels of the result. Finally, a residual connection adds the block’s input to this output and a last ReLU is applied, which helps preserve the original information and eases training by mitigating vanishing gradients.

WaterNetPlus Model Architecture:
The overall architecture is divided into three main components:

Encoder: Extracts features from the input image via an initial convolutional layer followed by three ResidualBlocks equipped with SE attention.

Middle: Processes the encoded features further with an additional convolution and one ResidualBlock to refine the representations.

Decoder: Reconstructs the output image from the refined features using a convolutional layer and a Tanh activation function, which bounds the output values to the range [-1, 1], matching the normalization applied to the target images.

In [7]:
# Enhanced version(WaterNet+):

# SE Attention Block
class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super(SEBlock, self).__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        w = self.fc(x)
        return x * w

# Residual Block with SE Attention
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.se = SEBlock(channels)  # SE Attention Block

        # Batch Normalization
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)  # SE attention after second conv
        out += identity
        out = self.relu(out)
        return out

# WaterNet+ Model with improved architecture
class WaterNetPlus(nn.Module):
    def __init__(self):
        super(WaterNetPlus, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            ResidualBlock(64),
            ResidualBlock(64)  # Additional Residual Block
        )
        self.middle = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            ResidualBlock(64)
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh()  # [-1, 1] for final output
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.middle(x)
        x = self.decoder(x)
        return x
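
The SE recalibration can be traced by hand on a single feature map. The NumPy sketch below mirrors the steps of SEBlock under the observation that a 1x1 convolution applied to a 1x1 spatial map is just a matrix-vector product. The random weights and the 8x8 spatial size are placeholders chosen for illustration, and `se_recalibrate` is a hypothetical helper, not the trained module.

```python
import numpy as np


def se_recalibrate(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on a single (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    squeezed = x.mean(axis=(1, 2))                      # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)        # bottleneck + ReLU -> (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # sigmoid -> (C,)
    return x * gates[:, None, None], gates              # rescale every channel


rng = np.random.default_rng(0)
C, r = 64, 16  # same channel count and reduction as the notebook's SEBlock
x = rng.standard_normal((C, 8, 8))
w1, b1 = rng.standard_normal((C // r, C)) * 0.1, np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)) * 0.1, np.zeros(C)
out, gates = se_recalibrate(x, w1, b1, w2, b2)
print(out.shape, gates.shape)
```

Each channel is multiplied by one scalar between 0 and 1, so the block can attenuate channels but never changes the spatial layout of a feature map.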


This cell checks whether a CUDA-capable GPU is available and sets the device accordingly: "cuda" if a compatible GPU is present, otherwise the CPU. The chosen device is then printed so that it is clear which hardware will run the model.

In [8]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


Using device: cuda


In [9]:
model_enhanced = WaterNetPlus().to(device)

In [10]:
model = model_enhanced
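
As a rough size check, the number of learnable parameters can be worked out by hand from the layer definitions in Step 4. The sketch below is plain arithmetic, not a call into PyTorch; the helper names are illustrative, and the count excludes batch normalization running statistics, which are buffers and not parameters.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a Conv2d(c_in, c_out, kernel_size=k)."""
    return c_in * c_out * k * k + c_out


def residual_block_params(c, reduction=16):
    convs = 2 * conv_params(c, c, 3)   # conv1 and conv2
    norms = 2 * (2 * c)                # BatchNorm weight + bias, twice
    se = conv_params(c, c // reduction, 1) + conv_params(c // reduction, c, 1)
    return convs + norms + se


encoder = conv_params(3, 64, 3) + 3 * residual_block_params(64)
middle = conv_params(64, 64, 3) + residual_block_params(64)
decoder = conv_params(64, 3, 3)
total = encoder + middle + decoder
print(total, "parameters,", total * 4 / 1e6, "MB as float32")
```

This gives 339,219 parameters, or about 1.36 MB in float32, which is of the same order as the 1.39M checkpoint download shown in Step 7. The remaining difference is presumably buffers and file overhead.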

# Step 5: Image Quality Evaluation Metrics

The following code focuses on evaluating image restoration quality by computing two widely used metrics: **PSNR** (Peak Signal-to-Noise Ratio) and **SSIM** (Structural Similarity Index Measure). These metrics quantify how closely the restored (predicted) images resemble their original (target) counterparts, serving as objective measures of image fidelity.

Conceptual Overview:
PSNR measures the ratio between the maximum possible power of a signal (image) and the power of corrupting noise that affects the fidelity of its representation. Higher PSNR values generally indicate better quality, meaning the restored image is closer to the original.

SSIM assesses image similarity based on luminance, contrast, and structural information. It is designed to be more perceptually relevant than PSNR by considering how humans perceive changes in image quality. SSIM values range from -1 to 1, where 1 indicates perfect similarity.

**Code Explanation:**
The two functions, **calculate_psnr** and **calculate_ssim**, take as inputs the predicted images and their corresponding ground truth targets. Both tensors are first converted from PyTorch tensors to NumPy arrays after detaching from the computation graph and moving to the CPU.

Because these arrays have channel-first format (C, H, W), they are transposed to the common image format (H, W, C) to be compatible with the metric functions from the skimage library.

For each image pair, the respective metric is calculated and accumulated. Finally, the average value over the entire batch is returned as the evaluation score.

This approach enables batch-wise assessment of restoration performance during or after training deep learning models for tasks like denoising, enhancement, or super-resolution.

In [11]:
from skimage.metrics import peak_signal_noise_ratio as psnr_metric
from skimage.metrics import structural_similarity as ssim_metric

def calculate_psnr(pred, target):
    pred = pred.detach().cpu().numpy()
    target = target.detach().cpu().numpy()
    psnr_val = 0
    for p, t in zip(pred, target):
        p = np.transpose(p, (1, 2, 0))  # (H, W, C)
        t = np.transpose(t, (1, 2, 0))
        psnr_val += psnr_metric(t, p, data_range=2.0)
    return psnr_val / pred.shape[0]  # avg PSNR

def calculate_ssim(pred, target):
    pred = pred.detach().cpu().numpy()
    target = target.detach().cpu().numpy()
    ssim_val = 0
    for p, t in zip(pred, target):
        p = np.transpose(p, (1, 2, 0))  # (H, W, C)
        t = np.transpose(t, (1, 2, 0))
        ssim_val += ssim_metric(t, p, channel_axis=2, data_range=2.0)
    return ssim_val / pred.shape[0]  # avg SSIM
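
The data_range=2.0 argument reflects that the images live in [-1, 1], so the peak-to-peak range is 2. For a given mean squared error, PSNR is 10 * log10(data_range^2 / MSE). The sketch below evaluates that formula directly and also shows why the mean of per-image PSNR values, which is what calculate_psnr returns, is generally not the same as the PSNR of the mean MSE. The three MSE values are made up for illustration, and `psnr_from_mse` is a hypothetical helper.

```python
import math


def psnr_from_mse(mse, data_range=2.0):
    """PSNR in dB for a given mean squared error and dynamic range."""
    return 10.0 * math.log10(data_range ** 2 / mse)


# Per-image MSE values (made up for illustration).
per_image_mse = [0.01, 0.02, 0.09]
mean_of_psnr = sum(psnr_from_mse(m) for m in per_image_mse) / len(per_image_mse)
psnr_of_mean = psnr_from_mse(sum(per_image_mse) / len(per_image_mse))
print(round(mean_of_psnr, 2), round(psnr_of_mean, 2))
```

Because PSNR is a convex function of MSE, the mean of per-image PSNR is never lower than the PSNR computed from the averaged MSE. This is one plausible reason why the PSNR reported during training is somewhat higher than the value implied by the validation loss alone.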


# Step 6: Model Training and Validation Procedure

The following function (**train_model**) manages the complete training and validation process of the deep learning model. It iterates through a specified number of epochs, during which the model learns to map input images to their corresponding target images by minimizing a loss function.

In each epoch, the model is set to training mode to enable weight updates. Batches of training data are processed sequentially: the inputs are passed through the model, the loss between predictions and targets is computed, and gradients are backpropagated to optimize the model parameters via the selected optimizer.

After each training epoch, the model switches to evaluation mode, where it processes validation data without updating weights. During validation, the loss is calculated to monitor how well the model generalizes to unseen data. Additionally, two quantitative image quality metrics, **PSNR** and **SSIM**, are computed to provide more detailed insight into the fidelity of the reconstructed images.

A **learning rate scheduler** (StepLR) multiplies the learning rate by 0.1 every 10 epochs, which helps the optimization settle as training progresses.

The function keeps track of the best validation loss achieved and saves the model’s parameters when a new minimum is observed, ensuring that the best-performing model is preserved for later use.

In [12]:
def train_model(model, train_loader, val_loader, optimizer, criterion, num_epochs, save_dir="./"):
    os.makedirs(save_dir, exist_ok=True)  # make sure the checkpoint directory exists
    best_val_loss = float('inf')

    # Learning Rate Scheduler
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # Decrease LR every 10 epochs
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0

        for inputs, targets in tqdm(train_loader, desc=f"Epoch {epoch+1} Training", leave=False):
            inputs = inputs.to(device)
            targets = targets.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        train_loss = running_loss / len(train_loader)

        model.eval()
        val_loss = 0.0
        total_psnr = 0.0
        total_ssim = 0.0

        with torch.no_grad():
            for inputs, targets in tqdm(val_loader, desc=f"Epoch {epoch+1} Validation", leave=False):
                inputs = inputs.to(device)
                targets = targets.to(device)

                outputs = model(inputs)
                loss = criterion(outputs, targets)
                val_loss += loss.item()

                total_psnr += calculate_psnr(outputs, targets)
                total_ssim += calculate_ssim(outputs, targets)

        val_loss /= len(val_loader)
        avg_psnr = total_psnr / len(val_loader)
        avg_ssim = total_ssim / len(val_loader)

        print(f"Epoch [{epoch+1}/{num_epochs}] "
              f"Train Loss: {train_loss:.4f} "
              f"Val Loss: {val_loss:.4f} "
              f"PSNR: {avg_psnr:.2f} "
              f"SSIM: {avg_ssim:.4f}")

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            model_path = os.path.join(save_dir, "best_waternet.pth")
            torch.save(model.state_dict(), model_path)
            print(f"Best model saved at epoch {epoch+1} with Val Loss: {val_loss:.4f}")

        scheduler.step()  # Decrease LR

The loss function is defined as Mean Squared Error (MSELoss), which measures the average squared difference between the predicted and target outputs. The optimization algorithm selected is Adam, initialized with a learning rate of 0.001 to update the model parameters during training. Additionally, a directory path is specified for saving the trained model checkpoints and related files.

In [13]:
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

save_dir = "/content/Waternet_Model"
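
With an initial learning rate of 1e-3 and the StepLR settings used in train_model (step_size=10, gamma=0.1), the learning rate in each epoch follows directly from the decay rule. The sketch below computes it in closed form without PyTorch; `step_lr` is a hypothetical helper that assumes the scheduler is stepped once at the end of every epoch, as in the training loop above.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate used during 0-based `epoch` under a step-decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 20, 30, 40, 49):
    print(f"epoch {epoch + 1:2d}: lr = {step_lr(1e-3, epoch):.0e}")
```

Over 50 epochs the rate falls from 1e-3 to 1e-7, so the last two decades of training make only very small parameter updates.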


The training process is set to run for 50 epochs, enabling the model to iteratively learn from the training data.

In [None]:
num_epochs = 50

train_model(model, train_loader, val_loader, optimizer, criterion, num_epochs, save_dir)



Epoch [1/50] Train Loss: 0.0865 Val Loss: 0.0642 PSNR: 18.79 SSIM: 0.5193
Best model saved at epoch 1 with Val Loss: 0.0642




Epoch [2/50] Train Loss: 0.0565 Val Loss: 0.0548 PSNR: 19.51 SSIM: 0.5748
Best model saved at epoch 2 with Val Loss: 0.0548




Epoch [3/50] Train Loss: 0.0494 Val Loss: 0.0494 PSNR: 20.10 SSIM: 0.6012
Best model saved at epoch 3 with Val Loss: 0.0494




Epoch [4/50] Train Loss: 0.0443 Val Loss: 0.0463 PSNR: 20.30 SSIM: 0.6044
Best model saved at epoch 4 with Val Loss: 0.0463




Epoch [5/50] Train Loss: 0.0422 Val Loss: 0.0398 PSNR: 20.99 SSIM: 0.6437
Best model saved at epoch 5 with Val Loss: 0.0398




Epoch [6/50] Train Loss: 0.0403 Val Loss: 0.0426 PSNR: 20.85 SSIM: 0.6357




Epoch [7/50] Train Loss: 0.0394 Val Loss: 0.0384 PSNR: 21.25 SSIM: 0.6583
Best model saved at epoch 7 with Val Loss: 0.0384




Epoch [8/50] Train Loss: 0.0373 Val Loss: 0.0358 PSNR: 21.51 SSIM: 0.6599
Best model saved at epoch 8 with Val Loss: 0.0358




Epoch [9/50] Train Loss: 0.0355 Val Loss: 0.0386 PSNR: 21.23 SSIM: 0.6494




Epoch [10/50] Train Loss: 0.0349 Val Loss: 0.0333 PSNR: 21.97 SSIM: 0.6809
Best model saved at epoch 10 with Val Loss: 0.0333




Epoch [11/50] Train Loss: 0.0305 Val Loss: 0.0318 PSNR: 22.23 SSIM: 0.6893
Best model saved at epoch 11 with Val Loss: 0.0318




Epoch [12/50] Train Loss: 0.0296 Val Loss: 0.0312 PSNR: 22.37 SSIM: 0.6924
Best model saved at epoch 12 with Val Loss: 0.0312




Epoch [13/50] Train Loss: 0.0290 Val Loss: 0.0317 PSNR: 22.26 SSIM: 0.6877




Epoch [14/50] Train Loss: 0.0286 Val Loss: 0.0311 PSNR: 22.36 SSIM: 0.6927
Best model saved at epoch 14 with Val Loss: 0.0311




Epoch [15/50] Train Loss: 0.0287 Val Loss: 0.0305 PSNR: 22.44 SSIM: 0.6955
Best model saved at epoch 15 with Val Loss: 0.0305




Epoch [16/50] Train Loss: 0.0287 Val Loss: 0.0305 PSNR: 22.49 SSIM: 0.6962




Epoch [17/50] Train Loss: 0.0282 Val Loss: 0.0305 PSNR: 22.51 SSIM: 0.6977
Best model saved at epoch 17 with Val Loss: 0.0305




Epoch [18/50] Train Loss: 0.0281 Val Loss: 0.0307 PSNR: 22.44 SSIM: 0.6957




Epoch [19/50] Train Loss: 0.0283 Val Loss: 0.0302 PSNR: 22.56 SSIM: 0.6959
Best model saved at epoch 19 with Val Loss: 0.0302




Epoch [20/50] Train Loss: 0.0281 Val Loss: 0.0310 PSNR: 22.44 SSIM: 0.6944




Epoch [21/50] Train Loss: 0.0271 Val Loss: 0.0302 PSNR: 22.58 SSIM: 0.6983
Best model saved at epoch 21 with Val Loss: 0.0302




Epoch [22/50] Train Loss: 0.0265 Val Loss: 0.0300 PSNR: 22.62 SSIM: 0.6993
Best model saved at epoch 22 with Val Loss: 0.0300




Epoch [23/50] Train Loss: 0.0269 Val Loss: 0.0301 PSNR: 22.62 SSIM: 0.6985




Epoch [24/50] Train Loss: 0.0267 Val Loss: 0.0299 PSNR: 22.62 SSIM: 0.6997
Best model saved at epoch 24 with Val Loss: 0.0299




Epoch [25/50] Train Loss: 0.0266 Val Loss: 0.0299 PSNR: 22.62 SSIM: 0.6990




Epoch [26/50] Train Loss: 0.0266 Val Loss: 0.0299 PSNR: 22.63 SSIM: 0.7000




Epoch [27/50] Train Loss: 0.0263 Val Loss: 0.0298 PSNR: 22.65 SSIM: 0.7006
Best model saved at epoch 27 with Val Loss: 0.0298




Epoch [28/50] Train Loss: 0.0264 Val Loss: 0.0298 PSNR: 22.66 SSIM: 0.7001
Best model saved at epoch 28 with Val Loss: 0.0298




Epoch [29/50] Train Loss: 0.0266 Val Loss: 0.0298 PSNR: 22.65 SSIM: 0.7010
Best model saved at epoch 29 with Val Loss: 0.0298




Epoch [30/50] Train Loss: 0.0262 Val Loss: 0.0300 PSNR: 22.65 SSIM: 0.6989




Epoch [31/50] Train Loss: 0.0266 Val Loss: 0.0299 PSNR: 22.63 SSIM: 0.6997




Epoch [32/50] Train Loss: 0.0262 Val Loss: 0.0298 PSNR: 22.65 SSIM: 0.7003




Epoch [33/50] Train Loss: 0.0259 Val Loss: 0.0299 PSNR: 22.63 SSIM: 0.6991




Epoch [34/50] Train Loss: 0.0262 Val Loss: 0.0299 PSNR: 22.65 SSIM: 0.7002




Epoch [35/50] Train Loss: 0.0265 Val Loss: 0.0298 PSNR: 22.67 SSIM: 0.7006




Epoch [36/50] Train Loss: 0.0261 Val Loss: 0.0297 PSNR: 22.67 SSIM: 0.7011
Best model saved at epoch 36 with Val Loss: 0.0297




Epoch [37/50] Train Loss: 0.0263 Val Loss: 0.0297 PSNR: 22.66 SSIM: 0.7007




Epoch [38/50] Train Loss: 0.0264 Val Loss: 0.0298 PSNR: 22.67 SSIM: 0.7010




Epoch [39/50] Train Loss: 0.0263 Val Loss: 0.0299 PSNR: 22.63 SSIM: 0.7002




Epoch [40/50] Train Loss: 0.0259 Val Loss: 0.0300 PSNR: 22.65 SSIM: 0.6986




Epoch [41/50] Train Loss: 0.0262 Val Loss: 0.0299 PSNR: 22.66 SSIM: 0.6994




Epoch [42/50] Train Loss: 0.0262 Val Loss: 0.0298 PSNR: 22.66 SSIM: 0.7011




Epoch [43/50] Train Loss: 0.0258 Val Loss: 0.0297 PSNR: 22.65 SSIM: 0.7006




Epoch [44/50] Train Loss: 0.0263 Val Loss: 0.0297 PSNR: 22.65 SSIM: 0.7007




Epoch [45/50] Train Loss: 0.0263 Val Loss: 0.0297 PSNR: 22.67 SSIM: 0.7011




Epoch [46/50] Train Loss: 0.0261 Val Loss: 0.0298 PSNR: 22.63 SSIM: 0.6999




Epoch [47/50] Train Loss: 0.0259 Val Loss: 0.0298 PSNR: 22.68 SSIM: 0.7001




Epoch [48/50] Train Loss: 0.0260 Val Loss: 0.0298 PSNR: 22.69 SSIM: 0.7009




Epoch [49/50] Train Loss: 0.0262 Val Loss: 0.0297 PSNR: 22.68 SSIM: 0.7014


                                                                    

Epoch [50/50] Train Loss: 0.0259 Val Loss: 0.0298 PSNR: 22.68 SSIM: 0.7003




# Step 7: Model Evaluation on Test Dataset

The trained model is evaluated with the function below, using mean squared error as the loss. Inside a no-gradient context, which avoids building the computation graph, the model processes each batch of inputs and its outputs are compared with the ground-truth targets. PSNR and SSIM are computed alongside the loss. All three quantities are averaged over the batches and printed. Note that this notebook later calls the function on the validation loader rather than on a separate held-out test split, so the reported numbers describe validation performance.

In [14]:
def test_model(model, test_loader):
    total_psnr = 0.0
    total_ssim = 0.0
    total_loss = 0.0
    criterion = torch.nn.MSELoss()  # Loss function for evaluation

    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs = inputs.to(device)
            targets = targets.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, targets)
            total_loss += loss.item()

            psnr = calculate_psnr(outputs, targets)
            ssim = calculate_ssim(outputs, targets)

            total_psnr += psnr
            total_ssim += ssim

    avg_loss = total_loss / len(test_loader)
    avg_psnr = total_psnr / len(test_loader)
    avg_ssim = total_ssim / len(test_loader)

    print(f"Test Loss: {avg_loss:.4f}")
    print(f"Average PSNR: {avg_psnr:.2f}")
    print(f"Average SSIM: {avg_ssim:.4f}")
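
One detail of this averaging is that every batch counts equally, even if the last batch holds fewer images than the others. When the dataset size is not a multiple of the batch size, a sample-weighted mean gives a slightly different number. The sketch below contrasts the two; the scores and batch sizes are hypothetical, and both helpers are illustrative names, not part of the notebook.

```python
def batch_mean(batch_scores):
    """Average of per-batch means, as in test_model (every batch counts equally)."""
    return sum(batch_scores) / len(batch_scores)


def sample_mean(batch_scores, batch_sizes):
    """Average over samples: each batch mean is weighted by its batch size."""
    weighted = sum(s * n for s, n in zip(batch_scores, batch_sizes))
    return weighted / sum(batch_sizes)


# Hypothetical per-batch PSNR means; the last batch holds only 2 images.
scores = [22.0, 23.0, 30.0]
sizes = [8, 8, 2]
print(batch_mean(scores), sample_mean(scores, sizes))
```

When all batches have the same size, the two averages coincide.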

The path to the saved best-performing checkpoint is stored in saved_model_path_local. The computing device is selected again (GPU if available, otherwise CPU), and a fresh WaterNetPlus instance is created on that device. If the checkpoint exists locally, its parameters are loaded; otherwise a copy is downloaded from Google Drive with gdown and loaded instead. Finally, the model is set to evaluation mode, which switches layers such as batch normalization to inference behavior (running statistics rather than per-batch statistics).

In [20]:
import os

saved_model_path_local = save_dir + "/" + "best_waternet.pth"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = WaterNetPlus().to(device)

if os.path.exists(saved_model_path_local):
  model.load_state_dict(torch.load(saved_model_path_local, map_location=device))
else:
  # download from drive:
  !gdown --fuzzy "https://drive.google.com/uc?id=1DiT9wS7kbYPpbSbxYVqUIM87FALwFOMp"
  saved_model_path_local = "best_waternet.pth"
  model.load_state_dict(torch.load(saved_model_path_local, map_location=device))




model.eval()

Downloading...
From: https://drive.google.com/uc?id=1DiT9wS7kbYPpbSbxYVqUIM87FALwFOMp
To: /content/best_waternet.pth
100% 1.39M/1.39M [00:00<00:00, 11.8MB/s]


WaterNetPlus(
  (encoder): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (se): SEBlock(
        (fc): Sequential(
          (0): AdaptiveAvgPool2d(output_size=1)
          (1): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
          (2): ReLU(inplace=True)
          (3): Conv2d(4, 64, kernel_size=(1, 1), stride=(1, 1))
          (4): Sigmoid()
        )
      )
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu): ReLU(inplace=True)
      (con

In [None]:
test_model(model, val_loader)

Test Loss: 0.0297
Average PSNR: 22.67
Average SSIM: 0.7011


On the validation set, the WaterNet+ model reaches an MSE of 0.0297, an average PSNR of 22.67 dB, and an average SSIM of 0.7011. PSNR reflects pixel-wise fidelity to the reference images, and a value in this range indicates moderate agreement. SSIM equals 1 for identical images, so a value of about 0.70 suggests that much of the structural content is preserved while leaving room for improvement.

These figures complement the visual inspection in the next step. Because the same validation split was also used to select the best checkpoint, they should be read as validation results rather than as an estimate of performance on fully unseen data.



# Step 8: Visual Evaluation of Model Performance on Validation Samples

A random subset of 10 samples is selected from the validation dataset to visually assess the model’s performance. For each chosen sample, the input image and its corresponding target (ground truth) are retrieved. The input image is passed through the trained model in evaluation mode to generate the enhanced output image. All images—the original input, the model’s output, and the target—are then processed for visualization by converting tensor data to NumPy arrays and normalizing the pixel values to a displayable range.

These images are displayed side-by-side in a grid layout with three columns representing the input, the model’s output, and the target, respectively. Each row corresponds to one randomly selected sample, allowing for a clear visual comparison across multiple examples. Axis labels are removed to emphasize the image content, and titles are added for clarity.

In [None]:
import random

random_indices = random.sample(range(len(val_dataset)), 10)

fig, axes = plt.subplots(nrows=10, ncols=3, figsize=(12, 40))

for i, idx in enumerate(random_indices):
    image, target = val_dataset[idx]

    image_tensor = image.unsqueeze(0).to(device)
    target_tensor = target.unsqueeze(0).to(device)

    with torch.no_grad():
        output = model(image_tensor)

    # Output
    output_image = output.squeeze().cpu().numpy()
    output_image = np.transpose(output_image, (1, 2, 0))
    output_image = (output_image + 1) / 2.0

    # Target
    target_image = target_tensor.squeeze().cpu().numpy()
    target_image = np.transpose(target_image, (1, 2, 0))
    target_image = (target_image + 1) / 2.0

    # Input
    input_image = image.cpu().numpy()
    input_image = np.transpose(input_image, (1, 2, 0))
    input_image = (input_image + 1) / 2.0


    axes[i, 0].imshow(input_image)
    axes[i, 0].set_title(f"Input {i+1}", fontsize=18)
    axes[i, 0].axis('off')

    axes[i, 1].imshow(output_image)
    axes[i, 1].set_title(f"Output {i+1}", fontsize=18)
    axes[i, 1].axis('off')

    axes[i, 2].imshow(target_image)
    axes[i, 2].set_title(f"Target {i+1}", fontsize=18)
    axes[i, 2].axis('off')

plt.tight_layout()
plt.show()



# Conclusion

This study applied an enhanced deep learning model, WaterNet+, to underwater image restoration. The model combines residual blocks with Squeeze-and-Excitation (SE) attention, which reweights feature channels so that informative ones are emphasized during reconstruction. The UIEB dataset, consisting of paired degraded and reference images, was used to train and validate the model.

Training used a mean squared error loss, the Adam optimizer, and a step-decay learning rate schedule, and the validation loss decreased steadily before plateauing at about 0.030. On the validation set, the best checkpoint reached a Peak Signal-to-Noise Ratio (PSNR) of 22.67 dB and a Structural Similarity Index (SSIM) of 0.7011. Qualitative side-by-side comparisons were also produced to inspect how well the model restores fine details and reduces underwater distortions.

Overall, this work demonstrates the promising potential of combining advanced convolutional architectures with attention mechanisms for challenging image restoration problems, particularly in underwater scenarios where visibility is often compromised.