semi-supervised domain adaptation, the minimax entropy approach alternates between:

**Maximizing entropy** forces the classifier to remain uncertain when predicting the unlabeled target samples. This discourages it from making confident but incorrect predictions, preventing bias toward the source domain.

**Minimizing entropy** ensures the feature extractor learns domain-invariant features, aligning the distributions of the source and target domains and making the classifier confident where it should be.


minimax entropy-based domain adaptation is feature-based. It focuses on aligning feature distributions between the source and target domains to reduce domain mismatch. By alternately maximizing and minimizing entropy, the method ensures that the features are both diverse and structured, enabling better generalization across domains

Imports

In [32]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split, Dataset
import torch.nn.functional as F

from torchvision import datasets, transforms
from torchvision.models import resnet50, ResNet50_Weights

import numpy as np
import matplotlib.pyplot as plt

**Data Processing**

In [33]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("mei1963/domainnet")

print("Path to dataset files:", path)


Path to dataset files: /kaggle/input/domainnet


In [34]:
def walk_through_dir(dir_path):
  """
  Walks through dir_path returning its contents.
  Args:
    dir_path (str or pathlib.Path): target directory

  Returns:
    A print out of:
      number of subdiretories in dir_path
      number of images (files) in each subdirectory
      name of each subdirectory
  """
  for dirpath, dirnames, filenames in os.walk(dir_path):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

In [35]:
# walk_through_dir(path)

In [36]:
source_path = os.path.join(path, "DomainNet/real")
target_path = os.path.join(path, "DomainNet/sketch")

In [37]:
simple_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

augmented_transform = transforms.Compose([
    # Random augmentations
    transforms.RandomHorizontalFlip(p=0.5),  # Randomly flip images horizontally with a 50% chance
    transforms.RandomRotation(degrees=15),   # Randomly rotate images by up to ±15 degrees
    transforms.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),  # Crop to 224x224 with random scale

    # Color jitter to introduce brightness/contrast variation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),

    # Convert to tensor and normalize
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])


In [38]:
source_dataset = datasets.ImageFolder(root=source_path, transform=augmented_transform)
target_dataset = datasets.ImageFolder(root=target_path, transform=simple_transform)


In [39]:
#Making label and unlabed datasets for semi-supervised

# Calculate sizes for the split
unlabeled_size = int(0.9 * len(target_dataset))
labeled_size = len(target_dataset) - unlabeled_size

# Split the dataset
target_dataset_unlabeled, target_dataset_labeled = random_split(target_dataset, [unlabeled_size, labeled_size])

In [40]:
class AugmentedDataset(Dataset):
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, label = self.dataset[idx]  # Get original image and label
        image = self.transform(image)  # Apply augmentation
        return image, label

In [41]:
target_dataset_labeled_augmented = AugmentedDataset(target_dataset_labeled, augmented_transform)


In [42]:
len(target_dataset_labeled), len(target_dataset_unlabeled)

(7039, 63347)

In [43]:
dataloader_source = DataLoader(source_dataset, batch_size=32, shuffle=True)
dataloader_target_labeled = DataLoader(target_dataset_labeled, batch_size=32, shuffle=True)
dataloader_target_unlabeled = DataLoader(target_dataset_unlabeled, batch_size=32, shuffle=True)

In [44]:
class_names = source_dataset.classes
# print(class_names)
len(class_names)

345

In [45]:
# img, label = next(iter(dataloader_target))

# # Batch size will now be 1, try changing the batch_size parameter above and see what happens
# print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
# print(f"Label shape: {label.shape}")

Functions

In [46]:
def entropy_maximization(predictions):
    """
    Maximizes entropy for the given predictions.
    Args:
        predictions (torch.Tensor): Class probabilities (softmax outputs).
    Returns:
        torch.Tensor: Entropy loss.
    """
    # Compute entropy for each sample (with clamping for numerical stability)
    entropy = -torch.sum(predictions * torch.log(torch.clamp(predictions, min=1e-6, max=1.0)), dim=1)

    # Maximize entropy (minimize negative entropy)
    entropy_loss = torch.mean(entropy)
    return entropy_loss


def entropy_minimization(predictions):
    """
    Minimizes entropy for the given predictions.
    Args:
        predictions (torch.Tensor): Class probabilities (softmax outputs).
    Returns:
        torch.Tensor: Entropy loss.
    """
    # Compute entropy for each sample (with clamping for numerical stability)
    entropy = -torch.sum(predictions * torch.log(torch.clamp(predictions, min=1e-6, max=1.0)), dim=1)

    # Minimize entropy (maximize negative entropy)
    entropy_loss = -torch.mean(entropy)
    return entropy_loss


def generate_pseudo_labels(predictions, confidence_threshold=0.9):
    """
    Generate pseudo-labels for target domain data based on model predictions.
    Args:
        predictions (torch.Tensor): Class probabilities (softmax outputs).
        confidence_threshold (float): Minimum confidence to assign a pseudo-label.
    Returns:
        torch.Tensor: Pseudo-labels for confident predictions.
    """
    # Get the predicted class and confidence for each sample
    confidences, pseudo_labels = torch.max(predictions, dim=1)

    # Filter pseudo-labels based on confidence threshold
    mask = confidences >= confidence_threshold
    invalid_label = torch.tensor(-1, device=predictions.device)  # Define once
    pseudo_labels = torch.where(mask, pseudo_labels, invalid_label)  # Replace uncertain samples with -1

    return pseudo_labels


**Feature extractor**

In [47]:
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)

for param in model.parameters():
    param.requires_grad = False


model.fc = nn.Identity()

**Classifier**

In [48]:
class Classifier(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super(Classifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)  # Fully connected layer
        self.relu = nn.ReLU()                       # Activation function
        self.dropout = nn.Dropout(p=0.5)            # Dropout for regularization
        self.fc2 = nn.Linear(hidden_dim, num_classes)  # Output layer for class probabilities
        # No Softmax here; raw logits will be returned

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        # x = self.dropout(x)  # Apply dropout
        x = self.fc2(x)       # Output logits
        return x              # Raw logits, suitable for CrossEntropyLoss


**Training Loop**

In [49]:
feature_encoder = model.to('cuda')
classifier = Classifier(input_dim=2048, num_classes=345).to('cuda')
optimizer_encoder = optim.Adam(feature_encoder.parameters(), lr=0.0005)
optimizer_classifier = optim.Adam(classifier.parameters(), lr=0.0005)

In [50]:
num_epochs = 10

torch.autograd.set_detect_anomaly(True)


for epoch in range(num_epochs):
    feature_encoder.train()
    classifier.train()

    total_supervised_loss_source = 0
    total_supervised_loss_target = 0
    total_entropy_loss_encoder = 0
    total_entropy_loss_classifier = 0

    batch_count = 0

    for (source_data, labeled_target_data, unlabeled_target_data) in zip(dataloader_source, dataloader_target_labeled, dataloader_target_unlabeled):
        batch_count += 1

        # source domain labeled
        source_images, source_labels = source_data
        source_images, source_labels = source_images.to('cuda'), source_labels.to('cuda')
        features_source = feature_encoder(source_images)
        predictions_source = classifier(features_source)
        supervised_loss_source = F.cross_entropy(predictions_source, source_labels)
        total_supervised_loss_source += supervised_loss_source.item()


        # target domain labeled
        labeled_target_images, labels_target_labeled = labeled_target_data
        labeled_target_images, labels_target_labeled = labeled_target_images.to('cuda'), labels_target_labeled.to('cuda')
        features_target_labeled = feature_encoder(labeled_target_images)
        predictions_target_labeled = classifier(features_target_labeled)
        supervised_loss_target  = F.cross_entropy(predictions_target_labeled, labels_target_labeled)
        total_supervised_loss_target += supervised_loss_target.item()


        # target domain unlabeled
        unlabeled_target_images, _ = unlabeled_target_data
        unlabeled_target_images = unlabeled_target_images.to('cuda')
        features_target_unlabeled = feature_encoder(unlabeled_target_images)
        predictions_target_unlabeled = classifier(features_target_unlabeled)
        pseudo_labels = generate_pseudo_labels(predictions_target_unlabeled)
        entropy_loss_encoder = entropy_maximization(predictions_target_unlabeled)
        entropy_loss_classifier = entropy_minimization(predictions_target_unlabeled)
        total_entropy_loss_encoder += entropy_loss_encoder.item()
        total_entropy_loss_classifier += entropy_loss_classifier.item()


        # Combine all losses for a single backward pass
        scaled_entropy_loss_encoder = 0.01 * entropy_loss_encoder
        scaled_entropy_loss_classifier = 0.01 * entropy_loss_classifier
        total_loss = (
            supervised_loss_source
            + supervised_loss_target
            + scaled_entropy_loss_encoder
            + scaled_entropy_loss_classifier
        )


        # Optimize both encoder and classifier together
        optimizer_encoder.zero_grad()
        optimizer_classifier.zero_grad()
        total_loss.backward()
        optimizer_encoder.step()
        optimizer_classifier.step()

    # Print epoch-level statistics
    print(f"Epoch {epoch + 1}/{num_epochs}:")
    print(f"  Source Supervised Loss: {total_supervised_loss_source / batch_count:.4f}")
    print(f"  Target Supervised Loss: {total_supervised_loss_target / batch_count:.4f}")
    print(f"  Encoder Entropy Loss: {total_entropy_loss_encoder / batch_count:.4f}")
    print(f"  Classifier Entropy Loss: {total_entropy_loss_classifier / batch_count:.4f}")

Epoch 1/10:
  Source Supervised Loss: 4.6682
  Target Supervised Loss: 5.0883
  Encoder Entropy Loss: -7782.7270
  Classifier Entropy Loss: 7782.7270
Epoch 2/10:
  Source Supervised Loss: 2.4713
  Target Supervised Loss: 3.6579
  Encoder Entropy Loss: -24775.7129
  Classifier Entropy Loss: 24775.7129
Epoch 3/10:
  Source Supervised Loss: 1.8579
  Target Supervised Loss: 3.0330
  Encoder Entropy Loss: -31384.7094
  Classifier Entropy Loss: 31384.7094
Epoch 4/10:
  Source Supervised Loss: 1.6433
  Target Supervised Loss: 2.6431
  Encoder Entropy Loss: -34079.8426
  Classifier Entropy Loss: 34079.8426
Epoch 5/10:
  Source Supervised Loss: 1.4783
  Target Supervised Loss: 2.3760
  Encoder Entropy Loss: -35529.7149
  Classifier Entropy Loss: 35529.7149
Epoch 6/10:
  Source Supervised Loss: 1.3915
  Target Supervised Loss: 2.1382
  Encoder Entropy Loss: -38047.6513
  Classifier Entropy Loss: 38047.6513
Epoch 7/10:
  Source Supervised Loss: 1.3931
  Target Supervised Loss: 1.9565
  Encoder En

**Conclusion**

results confirm steady progress in supervised learning, with both source and target supervised losses decreasing consistently. This means the model is effectively leveraging labeled data from both domains. Meanwhile, your encoder and classifier entropy losses are growing, which aligns with the expected behavior in a Minimax Entropy framework. However, the rapid growth of entropy losses still warrants careful attention.