<a href="https://colab.research.google.com/github/ruih12/ec601-team/blob/main/model_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Perceptual Loss for Regularization
Perceptual loss compares images in the feature space of a pretrained network instead of pixel by pixel. Used alongside a pixel-wise loss, it rewards generated images whose high-level features (edges, textures, shapes) match those of the target. This can improve the texture and color fidelity of generated images.

Implementation Code:

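```python
import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        # torchvision >= 0.13 uses `weights=` instead of the deprecated `pretrained=True`
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        # Keep modules 0-35 (conv1_1 ... relu5_4); only the final max-pool (index 36) is dropped
        self.layers = nn.Sequential(*list(vgg[:36])).eval()
        for param in self.layers.parameters():
            param.requires_grad = False  # Freeze VGG19 weights
        # ImageNet statistics the pretrained VGG19 expects its inputs to be normalized with
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, generated, target):
        # Assumes RGB tensors in [0, 1] with shape (N, 3, H, W)
        gen_features = self.layers((generated - self.mean) / self.std)
        target_features = self.layers((target - self.mean) / self.std)
        return nn.functional.l1_loss(gen_features, target_features)
```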


# Explanation:

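VGG19 Feature Extractor: We use a pretrained VGG19 model to extract features from the generated and target images. The extractor keeps the first 36 modules of `vgg19.features`, which runs through the ReLU after the last convolution and drops only the final max-pooling layer. Features this deep encode high-level structure and texture rather than exact pixel values, which is what makes the comparison perceptual.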
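The slice `vgg[:36]` is easier to check with layer names attached. The sketch below rebuilds the module order of torchvision's `vgg19().features` from the standard VGG19 configuration (16 convolutions, each followed by a ReLU, plus 5 max-pooling layers). The `convB_K`, `reluB_K`, and `poolB` labels are the usual naming convention, assumed here for readability; torchvision itself only exposes integer indices.

```python
# Standard VGG19 configuration: numbers are conv output channels, "M" is a max-pool
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg19_feature_names():
    """Return a readable name for each module index in vgg19().features."""
    names = []
    block, conv = 1, 1
    for v in VGG19_CFG:
        if v == "M":
            names.append(f"pool{block}")
            block += 1
            conv = 1
        else:
            # Each convolution is followed by its own ReLU module
            names.append(f"conv{block}_{conv}")
            names.append(f"relu{block}_{conv}")
            conv += 1
    return names

names = vgg19_feature_names()
print(len(names))   # 37
print(names[35])    # relu5_4  (last module kept by vgg[:36])
print(names[36])    # pool5    (the only module dropped)
```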

Feature Extraction: For each forward pass, the generated and target images are passed through the VGG19 layers, producing feature maps.

Loss Calculation: The perceptual loss is the L1 distance between the feature maps of the generated and target images. Minimizing it pushes generated images toward the target's high-level features, which tends to improve texture and detail.
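To see the mechanics without downloading VGG19 weights, the sketch below swaps in a small, randomly initialized convolutional stack (`feature_extractor`, a hypothetical stand-in that is not part of the original code). It only illustrates that the loss is computed on feature maps, is zero for identical images, and sends gradients to the generated image while the frozen extractor receives none.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the truncated VGG19: two frozen conv + ReLU stages
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
).eval()
for p in feature_extractor.parameters():
    p.requires_grad = False

def feature_l1(generated, target):
    # L1 distance between feature maps, as in PerceptualLoss.forward
    return F.l1_loss(feature_extractor(generated), feature_extractor(target))

target = torch.rand(2, 3, 32, 32)
print(feature_l1(target.clone(), target).item())  # 0.0 for identical images

# Perturb the "generated" image and backpropagate through the frozen extractor
noisy = (target + 0.1 * torch.randn_like(target)).requires_grad_(True)
loss = feature_l1(noisy, target)
loss.backward()
print(loss.item() > 0)                                              # True
print(noisy.grad is not None)                                       # True: gradient reaches the image
print(all(p.grad is None for p in feature_extractor.parameters()))  # True: extractor stays frozen
```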

**Integration in Training**: Add this loss in the main training loop alongside other losses:

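```python
# Instantiate perceptual loss
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
perceptual_loss_fn = PerceptualLoss().to(device)

# In the training loop (`model`, `optimizer`, and `dataloader` are defined elsewhere)
for images, targets in dataloader:
    images, targets = images.to(device), targets.to(device)

    optimizer.zero_grad()  # Clear gradients left over from the previous step
    generated_images = model(images)
    perceptual_loss = perceptual_loss_fn(generated_images, targets)

    # `other_losses` is a placeholder for the remaining loss terms computed from generated_images
    total_loss = perceptual_loss + other_losses
    total_loss.backward()
    optimizer.step()
```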
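The loop above adds the perceptual term to the other losses without weights. The terms are often scaled relative to each other instead. The self-contained step below assumes a toy one-layer generator, a pixel-wise L1 as the other loss, a frozen stand-in extractor, and a hypothetical weight `lambda_perc = 0.1`, none of which come from the original code. It checks that the optimizer updates the generator while the extractor's weights stay fixed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy generator and frozen stand-in feature extractor (both hypothetical)
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
extractor = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU()).eval()
for p in extractor.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_perc = 0.1  # Hypothetical weight, not from the original; tune per task

images = torch.rand(4, 3, 16, 16)
targets = torch.rand(4, 3, 16, 16)

model_before = model.weight.detach().clone()
extractor_before = extractor[0].weight.detach().clone()

optimizer.zero_grad()
generated = model(images)
pixel_loss = F.l1_loss(generated, targets)
perceptual_loss = F.l1_loss(extractor(generated), extractor(targets))
total_loss = pixel_loss + lambda_perc * perceptual_loss  # Weighted combination
total_loss.backward()
optimizer.step()

print(not torch.equal(model.weight, model_before))         # True: generator updated
print(torch.equal(extractor[0].weight, extractor_before))  # True: extractor unchanged
```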


# 2. Attention Mechanism
Adding attention layers to the model's architecture lets the generator weigh every spatial position against every other one, so it can learn to emphasize dynamic areas such as the eyes and mouth. Below is a self-attention layer, similar to the block used in Self-Attention GAN (SAGAN), added to the generator model.

```python
class AttentionLayer(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Query and key use in_channels // 8 channels, so in_channels must be at least 8
        self.query_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        batch, channels, height, width = x.size()
        n = height * width  # Number of spatial positions
        query = self.query_conv(x).view(batch, -1, n).permute(0, 2, 1)  # (B, N, C//8)
        key = self.key_conv(x).view(batch, -1, n)                       # (B, C//8, N)
        attention = self.softmax(torch.bmm(query, key))                 # (B, N, N), rows sum to 1
        value = self.value_conv(x).view(batch, -1, n)                   # (B, C, N)

        out = torch.bmm(value, attention.permute(0, 2, 1))              # (B, C, N)
        out = out.view(batch, channels, height, width)
        return out + x  # Residual connection

class EnhancedGenerator(nn.Module):
    def __init__(self, base_generator):
        super().__init__()
        self.base = base_generator
        # in_channels must match the number of channels output by base.layer1
        self.attention = AttentionLayer(in_channels=64)

    def forward(self, x):
        # `layer1` and `layer2` are placeholder names; replace with the base generator's actual stages
        x = self.base.layer1(x)
        x = self.attention(x)
        x = self.base.layer2(x)
        return x
```


# Explanation:

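Query, Key, and Value Convolutions: Three 1×1 convolutions project the input feature map into query, key, and value tensors. The query and key are reduced to `in_channels // 8` channels, which makes the similarity computation cheaper; the value keeps all `in_channels` channels so the output can be added back to the input.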

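Softmax Attention: Multiplying the queries by the keys gives an (H·W) × (H·W) similarity matrix that scores every spatial position against every other one. A softmax over the last dimension turns each row into weights that sum to 1, so every position receives a weighting over all positions in the feature map.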

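Attention Output: Each output position is the attention-weighted sum of the value vectors at all positions. Because the weights are learned, the layer can concentrate on dynamic regions such as the eyes and mouth, although nothing in it targets those regions explicitly. The result is added to the original input as a residual connection to retain the base features.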
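The reshapes in `AttentionLayer.forward` can be traced for a single sample with plain NumPy. A 1×1 convolution applies the same linear map over channels at every pixel, so this sketch replaces the three convolutions with random matrices `Wq`, `Wk`, and `Wv` (hypothetical names, biases omitted) and otherwise follows the same steps.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 16, 4, 5
N = H * W                                # Number of spatial positions

x = rng.standard_normal((C, H, W))
Wq = rng.standard_normal((C // 8, C))    # Stands in for query_conv
Wk = rng.standard_normal((C // 8, C))    # Stands in for key_conv
Wv = rng.standard_normal((C, C))         # Stands in for value_conv

flat = x.reshape(C, N)                   # (C, N): one column per position
query = (Wq @ flat).T                    # (N, C//8), like .permute(0, 2, 1)
key = Wk @ flat                          # (C//8, N)
value = Wv @ flat                        # (C, N)

scores = query @ key                     # (N, N): similarity of every pair of positions
scores -= scores.max(axis=-1, keepdims=True)         # Numerical stability
attention = np.exp(scores)
attention /= attention.sum(axis=-1, keepdims=True)   # Softmax over the last dimension

out = value @ attention.T                # (C, N): column i = sum_j attention[i, j] * value[:, j]
out = out.reshape(C, H, W) + x           # Residual connection

print(attention.shape)                             # (20, 20)
print(np.allclose(attention.sum(axis=-1), 1.0))    # True
print(out.shape)                                   # (16, 4, 5)
```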

Integration in Generator Model: Wrap the base generator and insert the attention layer between two of its stages; `in_channels` must equal the number of channels produced by the stage that feeds it.
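The attention map stores one weight for every pair of spatial positions, so an H×W feature map produces (H·W)² entries per sample, which grows with the fourth power of the side length. The hypothetical helper below estimates that size, assuming float32 storage and ignoring every other activation kept for the backward pass.

```python
def attention_map_bytes(height, width, batch=1, bytes_per_element=4):
    """Size of the (H*W) x (H*W) attention map, float32 by default."""
    n = height * width
    return batch * n * n * bytes_per_element

for side in (16, 32, 64, 128):
    mib = attention_map_bytes(side, side) / 2**20
    print(f"{side}x{side}: {mib:.2f} MiB per sample")
# 16x16: 0.25 MiB per sample
# 32x32: 4.00 MiB per sample
# 64x64: 64.00 MiB per sample
# 128x128: 1024.00 MiB per sample
```

These numbers are one reason self-attention is usually inserted at a low-resolution stage of the generator rather than near the output.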

# 3. Temporal Consistency Loss
Temporal consistency loss encourages coherence between consecutive frames in the generated video, reducing jitter and flicker so that transitions across frames are smoother.

Implementation Code:

```python
class TemporalConsistencyLoss(nn.Module):
    def forward(self, current_frame, previous_frame):
        # L1 distance between two consecutive generated frames
        return nn.functional.l1_loss(current_frame, previous_frame)

# In the training loop for video generation
temporal_loss_fn = TemporalConsistencyLoss()

# Assume `frames` is a list of generated frame tensors (same shape) for one sequence
temporal_loss = 0.0
for i in range(1, len(frames)):
    temporal_loss += temporal_loss_fn(frames[i], frames[i - 1])  # Loss between consecutive frames

# `other_losses` is a placeholder for the remaining loss terms
total_loss = temporal_loss + other_losses
```


# Explanation:

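Temporal Consistency Loss: Measures the L1 distance between consecutive generated frames, penalizing sudden changes and reducing flicker. It penalizes every change, including intended motion such as lip and eye movement, so its weight relative to the other losses must stay low enough that it does not suppress real motion.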
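A static clip drives this loss to zero, so on its own it rewards a generator that stops moving. One common alternative, swapped in here and not part of the original code, matches the generated frame-to-frame differences to those of the ground-truth clip, so only motion the target does not contain is penalized. The sketch assumes generated and target clips are aligned tensors of shape (T, C, H, W).

```python
import torch
import torch.nn.functional as F

def naive_temporal_loss(frames):
    # Original formulation, averaged over pairs: L1 between consecutive generated frames
    return F.l1_loss(frames[1:], frames[:-1])

def difference_matching_loss(generated, target):
    # Alternative: match generated temporal differences to the target's
    gen_diff = generated[1:] - generated[:-1]
    target_diff = target[1:] - target[:-1]
    return F.l1_loss(gen_diff, target_diff)

torch.manual_seed(0)
target = torch.rand(5, 3, 8, 8)             # Ground-truth clip with real motion

perfect = target.clone()                    # Generator reproduces the target exactly
frozen = target[:1].repeat(5, 1, 1, 1)      # Generator outputs a static clip

print(naive_temporal_loss(perfect).item() > 0)              # True: real motion is penalized
print(naive_temporal_loss(frozen).item())                   # 0.0: static output is rewarded
print(difference_matching_loss(perfect, target).item())     # 0.0
print(difference_matching_loss(frozen, target).item() > 0)  # True: missing motion is penalized
```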

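Summing Temporal Loss Across Frames: Accumulating the loss over all consecutive frame pairs in a generated sequence penalizes abrupt changes anywhere in the clip. Because the sum grows with sequence length, dividing it by the number of pairs keeps the loss on a comparable scale for clips of different lengths.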

Integration in Training Loop: The temporal loss is added alongside other losses. For video sequences, it should be applied to every consecutive frame pair in a mini-batch of generated frames.
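If the generated clips are held in one tensor of shape (B, T, C, H, W) instead of a Python list, the loop over frame pairs can be replaced by slicing along the time dimension. The sketch below assumes that layout (the original snippet uses a list of frames), checks that the vectorized sum matches a per-pair loop, and also computes the per-pair mean.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C, H, W = 2, 6, 3, 8, 8
clips = torch.rand(B, T, C, H, W)   # Hypothetical batch of generated clips

# Loop version, as in the training-loop snippet (frames indexed along dim 1)
loop_sum = sum(F.l1_loss(clips[:, t], clips[:, t - 1]) for t in range(1, T))

# Vectorized version: all consecutive pairs at once
diffs = clips[:, 1:] - clips[:, :-1]            # (B, T-1, C, H, W)
per_pair = diffs.abs().mean(dim=(0, 2, 3, 4))   # Mean L1 for each of the T-1 pairs
vectorized_sum = per_pair.sum()
vectorized_mean = per_pair.mean()               # Equals loop_sum / (T - 1)

print(torch.allclose(loop_sum, vectorized_sum))              # True
print(torch.allclose(vectorized_mean, loop_sum / (T - 1)))   # True
```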