# Domain Adaptation with Deep Domain Confusion (DDC) for Transfer Learning

## Introduction

In this notebook, you will implement the key components of the paper "Deep Domain Confusion: Maximizing for Domain Invariance" by Tzeng et al. This approach addresses the problem of domain adaptation in deep neural networks by introducing:

1. An adaptation layer within a standard CNN architecture
2. A domain confusion loss to learn domain-invariant features
3. A method to simultaneously optimize for classification accuracy and domain invariance

Throughout this assignment, you will gain hands-on experience with:
- Understanding domain adaptation in the context of deep learning
- Implementing Maximum Mean Discrepancy (MMD) as a metric for domain similarity
- Modifying standard CNN architectures for transfer learning
- Training models with multiple objective functions
- Evaluating domain adaptation performance

## Prerequisites

Before proceeding, make sure you:
- Have read the original paper: "Deep Domain Confusion: Maximizing for Domain Invariance"
- Are familiar with PyTorch and basic deep learning concepts

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
import math
import os
from sklearn.manifold import TSNE

# Check if CUDA is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## Part 1: Understanding the Office-31 Dataset

The Office-31 dataset is a benchmark dataset for visual domain adaptation. It contains 31 object categories in three distinct domains:
- **Amazon (A)**: Images from amazon.com
- **Webcam (W)**: Low-resolution images taken by a webcam
- **DSLR (D)**: High-resolution images taken by a DSLR camera

Let's implement a dataloader for this dataset:

In [None]:
# TODO: Implement the Office31Dataset class
class Office31Dataset(torch.utils.data.Dataset):
    """
    Dataset class for the Office-31 dataset
    
    Args:
        root_dir (string): Directory with all the images
        domain (string): Which domain to use ('amazon', 'webcam', or 'dslr')
        transform (callable, optional): Optional transform to be applied on a sample
    """
    def __init__(self, root_dir, domain, transform=None):
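        # Your code here
        self.root_dir = root_dir
        self.domain = domain
        self.transform = transform
        # Hint: collect the (image_path, label) pairs for the chosen domain here

    def __len__(self):
        # Your code here
        # Hint: return the number of images in the chosen domain,
        # not len(self.root_dir), which is only the length of the path string
        pass

    def __getitem__(self, idx):
        # Your code here
        # Hint: load the image at position idx, apply self.transform if it is set,
        # and return an (image, label) pair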
        pass
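
As a starting point for `Office31Dataset`, here is a minimal sketch of how the sample index might be built. It assumes the folder layout `root_dir/<domain>/images/<class_name>/<image file>`, which this notebook does not specify, so adjust the path logic to match your copy of the data. The helper name `index_office31` and the extension list are hypothetical.

```python
from pathlib import Path

# Assumed set of image file extensions; extend if your copy uses others
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_office31(root_dir, domain):
    """Return (samples, class_names) for one domain.

    Assumes the layout root_dir/<domain>/images/<class_name>/<file>;
    this layout is an assumption, not something the notebook guarantees.
    """
    image_root = Path(root_dir) / domain / "images"
    # Sort class folders so that label indices are stable across runs and domains
    class_names = sorted(p.name for p in image_root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for path in sorted((image_root / name).iterdir()):
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((str(path), class_to_idx[name]))
    return samples, class_names
```

With such an index, `__len__` can return `len(samples)`, and `__getitem__` can open the file at `samples[idx][0]`, apply the transform, and return the image together with `samples[idx][1]`.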

In [None]:
# TODO: Define image transformations for data preprocessing
def get_data_transforms():
    """
    Define transforms for training and testing
    """
    # Your code here - make sure to normalize according to ImageNet statistics 
    # since we'll be using a pretrained model
    
    pass

In [None]:
# TODO: Implement data loaders for source and target domains
def get_office31_dataloaders(source_domain, target_domain, batch_size=32):
    """
    Create data loaders for the source and target domains
    
    Args:
        source_domain (string): Source domain name
        target_domain (string): Target domain name
        batch_size (int): Batch size
        
    Returns:
        source_loader: DataLoader for the source domain
        target_loader: DataLoader for the target domain
    """
    # Your code here
    pass
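
The three Office-31 domains contain different numbers of images, so the source and target loaders generally yield different numbers of batches per epoch. One common way to handle this, sketched below, is to restart the shorter loader whenever it runs out. The helper name `paired_batches` is hypothetical; it works with any sized, re-iterable object, including a `DataLoader`.

```python
def paired_batches(source_loader, target_loader):
    """Yield (source_batch, target_batch) pairs for one pass over the longer loader.

    The shorter loader is restarted whenever it is exhausted, so a shuffling
    DataLoader draws a fresh order on each restart.
    """
    source_iter, target_iter = iter(source_loader), iter(target_loader)
    num_steps = max(len(source_loader), len(target_loader))
    for _ in range(num_steps):
        try:
            source_batch = next(source_iter)
        except StopIteration:
            source_iter = iter(source_loader)
            source_batch = next(source_iter)
        try:
            target_batch = next(target_iter)
        except StopIteration:
            target_iter = iter(target_loader)
            target_batch = next(target_iter)
        yield source_batch, target_batch
```

Inside a training loop this would be used as `for source_batch, target_batch in paired_batches(source_loader, target_loader): ...`.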

## Part 2: Implementing the Maximum Mean Discrepancy (MMD) Loss

The key to the DDC approach is the use of Maximum Mean Discrepancy (MMD) as a metric to measure the distance between the source and target feature distributions. Let's implement the MMD loss function:

In [None]:
# TODO: Implement the MMD (Maximum Mean Discrepancy) loss
def mmd_loss(source_features, target_features, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
    """
    Calculate the MMD (Maximum Mean Discrepancy) between source and target features
    
    Args:
        source_features (torch.Tensor): Features from source domain (batch_size, feature_dim)
        target_features (torch.Tensor): Features from target domain (batch_size, feature_dim)
        kernel_mul (float): Kernel multiplier for RBF kernel
        kernel_num (int): Number of kernels
        fix_sigma (float): Fixed sigma value for the RBF kernel
        
    Returns:
        mmd_value (torch.Tensor): MMD loss value
    """
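    # Your code here
    # Hint 1: MMD compares the mean embeddings of the source and target features
    # Hint 2: Use a Gaussian (RBF) kernel with multiple bandwidths (sigma values)
    # Hint 3: The formula is: MMD^2(X, Y) = ||E[φ(X)] - E[φ(Y)]||^2, which expands to
    #         E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)] for a kernel k
    # Hint 4: With a linear kernel (φ(x) = x) this reduces to the squared distance
    #         between the feature means:
    # source_mean = torch.mean(source_features, dim=0)
    # target_mean = torch.mean(target_features, dim=0)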
    pass
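
To make the hints concrete, here is a minimal sketch of two squared-MMD estimates: a linear-kernel version, which is the squared distance between the feature means, and a multi-bandwidth Gaussian version that uses the kernel expansion. The bandwidth heuristic (mean pairwise squared distance, scaled by powers of `kernel_mul`) is a common convention assumed here, not something this notebook prescribes, and both function names are hypothetical. The Gaussian version is a biased estimate because it keeps the diagonal kernel terms.

```python
import torch


def linear_mmd2(source_features, target_features):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    delta = source_features.mean(dim=0) - target_features.mean(dim=0)
    return delta.dot(delta)


def gaussian_mmd2(source_features, target_features, kernel_mul=2.0, kernel_num=5, fix_sigma=None):
    """Biased squared-MMD estimate using a sum of Gaussian (RBF) kernels."""
    n_source = source_features.size(0)
    total = torch.cat([source_features, target_features], dim=0)
    # Pairwise squared Euclidean distances between all samples, shape (n, n)
    diff = total.unsqueeze(0) - total.unsqueeze(1)
    sq_dist = diff.pow(2).sum(dim=2)

    if fix_sigma is not None:
        bandwidth = torch.as_tensor(float(fix_sigma), dtype=total.dtype, device=total.device)
    else:
        # Assumed heuristic: mean off-diagonal squared distance, treated as a constant
        n = total.size(0)
        bandwidth = sq_dist.detach().sum() / (n * n - n)
    # Centre the geometric series of bandwidths around the base value
    bandwidth = (bandwidth / kernel_mul ** (kernel_num // 2)).clamp_min(1e-8)
    kernels = sum(torch.exp(-sq_dist / (bandwidth * kernel_mul ** i)) for i in range(kernel_num))

    # Kernel expansion: E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]
    k_ss = kernels[:n_source, :n_source].mean()
    k_tt = kernels[n_source:, n_source:].mean()
    k_st = kernels[:n_source, n_source:].mean()
    return k_ss + k_tt - 2.0 * k_st
```

Both functions return a scalar tensor that stays attached to the autograd graph, so either can be added to the classification loss during training.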

## Part 3: Building the Deep Domain Confusion (DDC) Network

Now, let's implement the DDC network by modifying a pre-trained AlexNet model. We'll add an adaptation layer and use both classification and domain confusion losses for training.

In [None]:
# TODO: Implement the DDC network architecture
class DDCNet(nn.Module):
    """
    Deep Domain Confusion Network based on AlexNet
    
    Args:
        num_classes (int): Number of classes in the dataset
        adaptation_layer_dim (int): Dimension of the adaptation layer
    """
    def __init__(self, num_classes=31, adaptation_layer_dim=256):
        super(DDCNet, self).__init__()
        
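        # Load an ImageNet-pretrained AlexNet model
        # (torchvision >= 0.13 uses `weights=`; older releases use `pretrained=True`)
        self.alexnet = torchvision.models.alexnet(
            weights=torchvision.models.AlexNet_Weights.IMAGENET1K_V1)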
        
        # Convolutional feature extractor (conv1-conv5); the fully connected
        # layers fc6 and fc7 live in self.alexnet.classifier
        self.features = self.alexnet.features
        
        # Create adaptation layer (typically after fc7, whose output is 4096-dimensional)
        # TODO: Add the adaptation layer as described in the paper
        # self.adaptation_layer = nn.Linear(4096, adaptation_layer_dim)
        
        # Create classifier layers
        # TODO: Modify the classifier to output num_classes
        # self.classifier = nn.Linear(adaptation_layer_dim, num_classes)
        
        # Initialize weights of the new layers
        # TODO: Initialize the weights of the adaptation layer and new classifier layers
        # self.adaptation_layer.weight.data.normal_(0, 0.01)
        # self.adaptation_layer.bias.data.zero_()
        # self.classifier.weight.data.normal_(0, 0.01)
        # self.classifier.bias.data.zero_()
        
    def forward(self, source_data, target_data=None):
        """
        Forward pass through the network
        
        Args:
            source_data (torch.Tensor): Source domain data
            target_data (torch.Tensor, optional): Target domain data
            
        Returns:
            source_preds: Class predictions for source data
            (source_features, target_features): Features for domain confusion loss (if target_data is provided)
        """
        # TODO: Implement the forward pass
        # Remember to extract features at the adaptation layer for computing MMD loss
        pass
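
The following is a minimal sketch of the adaptation layer and classifier arrangement, written as a standalone head that consumes fc7-sized (4096-dimensional) features. It deliberately leaves out the AlexNet backbone so that it runs without downloading weights. The class name `AdaptationHead` and its argument names are hypothetical, and the initialisation values simply mirror the hints in the skeleton.

```python
import torch
import torch.nn as nn


class AdaptationHead(nn.Module):
    """Low-dimensional adaptation layer followed by a classifier (illustrative only)."""

    def __init__(self, in_dim=4096, adaptation_layer_dim=256, num_classes=31):
        super().__init__()
        self.adaptation_layer = nn.Linear(in_dim, adaptation_layer_dim)
        self.classifier = nn.Linear(adaptation_layer_dim, num_classes)
        # New layers start from small random weights and zero biases
        for layer in (self.adaptation_layer, self.classifier):
            nn.init.normal_(layer.weight, mean=0.0, std=0.01)
            nn.init.zeros_(layer.bias)

    def forward(self, source_fc7, target_fc7=None):
        source_features = self.adaptation_layer(source_fc7)
        source_preds = self.classifier(source_features)
        if target_fc7 is None:
            # Inference path: only class scores are needed
            return source_preds
        # Training path: also return both feature sets for the MMD term
        target_features = self.adaptation_layer(target_fc7)
        return source_preds, (source_features, target_features)
```

In torchvision's AlexNet, fc7-level activations should correspond to applying `features`, `avgpool`, a flatten, and then the first six modules of `classifier` (up to and including the ReLU after the second linear layer).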

## Part 4: Training the DDC Network

Now, let's implement the training procedure for the DDC network, which involves optimizing both classification and domain confusion losses:

In [None]:
# TODO: Implement the training function for DDC
def train_ddc(model, source_loader, target_loader, num_epochs=20, learning_rate=0.001, 
              lambda_mmd=0.25, beta1=0.9, beta2=0.999):
    """
    Train the DDC network
    
    Args:
        model (DDCNet): The DDC network
        source_loader (DataLoader): DataLoader for source domain
        target_loader (DataLoader): DataLoader for target domain
        num_epochs (int): Number of training epochs
        learning_rate (float): Learning rate
        lambda_mmd (float): Weight for the MMD loss
        beta1, beta2 (float): Beta parameters for Adam optimizer
        
    Returns:
        model (DDCNet): Trained model
        history (dict): Training history (losses and accuracies)
    """
    # TODO: Set up optimizer, loss function, and training loop
    # Make sure to optimize both classification and domain confusion losses
    
    # Initialize optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, betas=(beta1, beta2))
    
    # Initialize loss function
    criterion = nn.CrossEntropyLoss()
    
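    # Initialize history (these keys are the ones plotted in main())
    history = {'train_class_loss': [], 'train_mmd_loss': [], 'train_total_loss': [],
               'train_accuracy': [], 'target_accuracy': []}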
    
    # Training loop
    for epoch in range(num_epochs):
        # TODO: Implement training loop
        # 1. Forward pass
        # 2. Compute classification loss
        # 3. Compute domain confusion loss
        # 4. Compute total loss
        # 5. Backward pass
        # 6. Update weights
        # 7. Compute accuracy
        # 8. Store loss and accuracy
        pass
    
    return model, history
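
Steps 1 to 7 of the loop can be sketched as a single function. This is an illustration under stated assumptions rather than a reference solution: `model` is assumed to follow the `forward(source_data, target_data)` contract from Part 3, `mmd_fn` is any callable that returns a scalar discrepancy tensor (for example your `mmd_loss`), and the name `ddc_train_step` is hypothetical.

```python
import torch
import torch.nn as nn


def ddc_train_step(model, optimizer, criterion, mmd_fn,
                   source_x, source_y, target_x, lambda_mmd=0.25):
    """Run one optimisation step on the joint loss and return scalar statistics."""
    model.train()
    optimizer.zero_grad()
    # 1. Forward pass on both domains; target labels are never used
    source_preds, (source_features, target_features) = model(source_x, target_x)
    # 2-4. Classification loss on source, discrepancy between domains, weighted sum
    class_loss = criterion(source_preds, source_y)
    mmd = mmd_fn(source_features, target_features)
    total_loss = class_loss + lambda_mmd * mmd
    # 5-6. Backward pass and parameter update
    total_loss.backward()
    optimizer.step()
    # 7. Source accuracy for this batch
    accuracy = (source_preds.argmax(dim=1) == source_y).float().mean().item()
    return {
        "class_loss": class_loss.item(),
        "mmd_loss": mmd.item(),
        "total_loss": total_loss.item(),
        "accuracy": accuracy,
    }
```

Averaging the returned values over all batches of an epoch gives the per-epoch numbers to append to `history`.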

In [None]:
# TODO: Implement the testing function for DDC
def test_ddc(model, target_loader):
    """
    Test the DDC network on the target domain
    
    Args:
        model (DDCNet): The trained DDC network
        target_loader (DataLoader): DataLoader for target domain
        
    Returns:
        accuracy (float): Classification accuracy on the target domain
    """
    # TODO: Implement the testing procedure
    # 1. Initialize accuracy counter
    # 2. Iterate over target domain data
    # 3. Compute predictions and accuracy
    # 4. Return accuracy
    
    pass
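
A minimal sketch of the evaluation steps listed above. It assumes the model returns only class scores when called with a single batch (as in the Part 3 docstring) and that the loader yields `(images, labels)` pairs. The function name `evaluate_accuracy` is hypothetical.

```python
import torch


@torch.no_grad()
def evaluate_accuracy(model, loader, device="cpu"):
    """Return the fraction of correctly classified samples in `loader`."""
    was_training = model.training
    model.eval()  # disable dropout for evaluation
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    model.train(was_training)  # restore the previous mode
    return correct / max(total, 1)
```

Because target labels are used here only for measurement, the same function can be called after every epoch to track target accuracy without influencing training.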

## Part 5: Ablation Studies and Experiments

Now that we have implemented the DDC network, let's conduct some experiments to understand the contribution of each component:

In [None]:
# TODO: Implement experiments to analyze the contribution of different components
def run_experiments():
    """
    Run experiments to analyze different aspects of the DDC network
    
    Experiments to consider:
    1. Effect of adaptation layer position
    2. Effect of adaptation layer dimension
    3. Effect of lambda_mmd (weight for the MMD loss)
    4. Comparison with fine-tuning a pre-trained model without domain adaptation
    """
    # TODO: Implement the experiments listed in the docstring above
    
    pass
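
One way to organise the experiments on adaptation layer dimension and `lambda_mmd` is a plain grid over the two hyperparameters, with each configuration handed to the training routine. The sketch below only generates the configurations. The candidate values are arbitrary placeholders rather than values taken from the paper, and `make_experiment_grid` is a hypothetical helper.

```python
from itertools import product


def make_experiment_grid(adaptation_dims, lambda_values):
    """Return one config dict per (adaptation_layer_dim, lambda_mmd) combination."""
    return [
        {"adaptation_layer_dim": dim, "lambda_mmd": lam}
        for dim, lam in product(adaptation_dims, lambda_values)
    ]


# Placeholder values; lambda_mmd = 0.0 removes the domain confusion term
# and so serves as a baseline trained on the classification loss alone
grid = make_experiment_grid(adaptation_dims=[64, 256, 1024],
                            lambda_values=[0.0, 0.25, 1.0])
```

Each entry of `grid` can be unpacked into the model constructor and the training call, with the resulting target accuracy stored next to the configuration for later comparison.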

## Part 6: Visualizing Domain Adaptation

Finally, let's visualize how the features change during the domain adaptation process:

In [None]:
# TODO: Implement feature visualization
def visualize_features(model, source_loader, target_loader, epoch=0):
    """
    Visualize features from source and target domains using t-SNE
    
    Args:
        model (DDCNet): The DDC network
        source_loader (DataLoader): DataLoader for source domain
        target_loader (DataLoader): DataLoader for target domain
        epoch (int): Current training epoch
    """
    # TODO: Implement feature visualization with t-SNE
    tsne = TSNE(n_components=2, random_state=42)
    
    # Extract features from the adaptation layer
    # TODO: Implement feature extraction; each variable should end up as a
    # NumPy array of shape (num_samples, feature_dim)
    source_features = []
    target_features = []
    
    # Apply t-SNE for dimensionality reduction
    combined_features = np.vstack((source_features, target_features))
    t_sne_features = tsne.fit_transform(combined_features)

    # Plot the source and target features in different colors
    # TODO: Implement visualization 
    plt.figure(figsize=(10, 8))
    plt.scatter(t_sne_features[:len(source_features), 0], t_sne_features[:len(source_features), 1], c='blue', label='Source')
    plt.scatter(t_sne_features[len(source_features):, 0], t_sne_features[len(source_features):, 1], c='red', label='Target')
    plt.legend()
    plt.title(f'Feature Distribution (Epoch {epoch})')
    plt.xlabel('t-SNE Dimension 1')
    plt.ylabel('t-SNE Dimension 2')
    plt.savefig(f'feature_distribution_epoch_{epoch}.png')
    plt.close()
    
    pass
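
The feature extraction step can be sketched as a small helper that runs a feature function over a loader and stacks the results. Here `feature_fn` stands for any callable that maps a batch of images to adaptation-layer activations; both it and the name `collect_features` are hypothetical, and the loader is assumed to yield `(images, labels)` pairs.

```python
import numpy as np
import torch


@torch.no_grad()
def collect_features(feature_fn, loader, device="cpu"):
    """Stack feature_fn(images) over all batches into one (num_samples, feature_dim) array."""
    chunks = []
    for images, _ in loader:
        # Move activations to the CPU before converting them to NumPy
        chunks.append(feature_fn(images.to(device)).cpu().numpy())
    return np.concatenate(chunks, axis=0)
```

Calling this once per domain yields two arrays whose `len()` is the number of samples, which is what the slicing in the plotting code expects.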

## Part 7: Implementing the Complete Pipeline

Now let's put everything together and implement the complete DDC training and evaluation pipeline:

In [None]:
# TODO: Implement the main function to run the complete pipeline
def main():
    # Set random seed for reproducibility
    torch.manual_seed(42)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(42)
    
    # Define hyperparameters
    source_domain = 'amazon'
    target_domain = 'webcam'
    num_classes = 31
    batch_size = 32
    num_epochs = 20
    learning_rate = 0.001
    lambda_mmd = 0.25
    adaptation_layer_dim = 256
    
    # Create data loaders
    source_loader, target_loader = get_office31_dataloaders(
        source_domain, target_domain, batch_size)
    
    # Create the DDC network
    model = DDCNet(num_classes=num_classes, 
                  adaptation_layer_dim=adaptation_layer_dim).to(device)
    
    # Train the model
    trained_model, history = train_ddc(
        model, source_loader, target_loader, num_epochs, 
        learning_rate, lambda_mmd)
    
    # Test the model
    target_accuracy = test_ddc(trained_model, target_loader)
    print(f"Target domain accuracy: {target_accuracy:.4f}")
    
    # Plot training history
    plt.figure(figsize=(12, 4))
    
    plt.subplot(1, 2, 1)
    plt.plot(history['train_class_loss'], label='Classification Loss')
    plt.plot(history['train_mmd_loss'], label='MMD Loss')
    plt.plot(history['train_total_loss'], label='Total Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    
    plt.subplot(1, 2, 2)
    plt.plot(history['train_accuracy'], label='Source Accuracy')
    plt.plot(history['target_accuracy'], label='Target Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.tight_layout()
    plt.show()
    
    # Run additional experiments
    run_experiments()

if __name__ == "__main__":
    main()

## Challenges and Extensions (for Advanced Students)

1. **Multi-Domain Adaptation**: Extend the DDC approach to handle multiple source or target domains.
2. **Different Network Architectures**: Implement the DDC approach with more modern architectures like ResNet or DenseNet.
3. **Alternative Domain Discrepancy Measures**: Replace MMD with other domain discrepancy measures like CORAL (Correlation Alignment) or adversarial training.
4. **Partial Domain Adaptation**: Modify the approach to handle the case where the target domain contains only a subset of the source domain classes.
5. **Parameter Sensitivity Analysis**: Conduct a thorough analysis of how different hyperparameters affect the performance of the DDC approach.

## References

1. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., & Darrell, T. (2014). Deep Domain Confusion: Maximizing for Domain Invariance. arXiv preprint arXiv:1412.3474.
2. Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar), 723-773.
3. Saenko, K., Kulis, B., Fritz, M., & Darrell, T. (2010). Adapting visual category models to new domains. In European conference on computer vision (pp. 213-226).