# **Task 3: Evaluation for Domain Generalization**
In this task, we will evaluate the selected model (EfficientNet-B3) for its ability to generalize to out-of-distribution (OOD) data. Specifically, you will
start by evaluating the model on two datasets that reflect different domain shifts:


*   PACS dataset: contains images across four distinct domains: Photos, Art paintings, Cartoons, and
Sketches. Note that this dataset introduces shifts in style and representation that
differ from typical training data, providing a natural environment to test the models’ generalization capabilities. Given the domain shifts present in PACS, we expect some degree of performance drop compared to the
benchmarks set in Task 2.

*   CIFAR-100 Splits dataset or SVHN: PACS, however, represents only one form of shift, Covariate Shift. We also want
to analyze how much a Semantic Shift would hurt performance. This task is trickier, and you are encouraged
to read up on how to probe for this type of shift. Since we used CIFAR-10
as our baseline in Task 1, you can use either the CIFAR-100 Splits dataset (as detailed in the paper),
or SVHN.

Your goal is to evaluate the models on these two datasets (one exhibiting Covariate Shift, and the
other exhibiting Concept/Semantic Shift) to obtain a rough measure of how each model handles domain
generalization prior to controlling for specific types of shifts, such as shape, color, or texture. This will serve as an
initial, broad assessment of how well the models generalize to unseen domains, setting the stage for more focused
analyses in the following tasks.
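
One common way to probe for semantic shift is to check whether the classifier's confidence separates in-distribution inputs from inputs of unseen classes. The sketch below is an assumption about how such a probe could look, not the method prescribed by this task: it scores each sample by its maximum softmax probability and measures the separation with AUROC. The function names and the toy logits are made up for illustration.

```python
import numpy as np

def max_softmax_score(logits):
    """Return the maximum softmax probability for each row of an (N, C) logit array."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def auroc(id_scores, ood_scores):
    """Probability that a random in-distribution score exceeds a random OOD score (ties count 0.5)."""
    id_scores = np.asarray(id_scores, dtype=float)[:, None]
    ood_scores = np.asarray(ood_scores, dtype=float)[None, :]
    wins = (id_scores > ood_scores).mean()
    ties = (id_scores == ood_scores).mean()
    return float(wins + 0.5 * ties)

# Toy logits, invented for this example: confident on ID data, near-uniform on OOD data
id_logits = np.array([[6.0, 0.0, 0.0], [0.0, 5.0, 1.0]])
ood_logits = np.array([[1.0, 1.2, 0.9], [0.5, 0.4, 0.6]])
score = auroc(max_softmax_score(id_logits), max_softmax_score(ood_logits))
print(f"AUROC: {score:.2f}")
```

An AUROC near 0.5 would mean the confidence scores do not distinguish the two sets, while a value near 1.0 would mean they separate cleanly. With a real model, the logits would come from forward passes over the in-distribution test set and the shifted dataset.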

In [1]:
pip install timm

Collecting timm
  Downloading timm-1.0.9-py3-none-any.whl.metadata (42 kB)
Downloading timm-1.0.9-py3-none-any.whl (2.3 MB)
Installing collected packages: timm
Successfully installed timm-1.0.9


In [2]:
import tensorflow as tf
tf.test.gpu_device_name()

'/device:GPU:0'

# **Defining the Efficient Net Model**

In [3]:
import timm
import torch
import torch.nn as nn


class EfficientNetB3Model(nn.Module):
    def __init__(self, num_classes, pretrained=True):
        super(EfficientNetB3Model, self).__init__()
        self.enetb3 = timm.create_model('efficientnet_b3', pretrained=pretrained)
        self.enetb3.classifier = nn.Linear(self.enetb3.classifier.in_features, num_classes)



    def forward(self, x):
        x = self.enetb3(x)
        return x

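def load_efficientnetb3_model(num_classes, device, task):
    model = EfficientNetB3Model(num_classes)
    # task='nopath' keeps the pretrained weights; any other value loads a fine-tuned checkpoint
    if task != 'nopath':
        # map_location lets a checkpoint saved on GPU load on a CPU-only machine
        model.load_state_dict(torch.load(f'fine_tuned_enetb3_{task}.pth', map_location=device))
    model = model.to(device)
    return model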
print("sanity check")

sanity check


# **Verifying the Model**

In [4]:
import torch
# from efficientnet_b3_model import load_efficientnetb3_model

def verify_efficientnetb3_model():
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"Using device: {device}")

    num_classes = 10

    model = load_efficientnetb3_model(num_classes, device, task='nopath')
    print(f"Model loaded successfully. Number of classes: {num_classes}")

    print("\nModel Architecture:")
    print(model)

    batch_size = 1
    dummy_input = torch.randn(batch_size, 3, 112, 112).to(device)
    print(f"\nDummy input shape: {dummy_input.shape}")

    try:
        with torch.no_grad():
            output = model(dummy_input)
        print("Forward pass successful!")
        print(f"Output shape: {output.shape}")

        expected_shape = (batch_size, num_classes)
        assert output.shape == expected_shape, f"Expected output shape {expected_shape}, but got {output.shape}"
        print("Output shape is correct.")

        if device.type == 'cuda':
          torch.cuda.empty_cache()
        elif device.type == 'mps':
          torch.mps.empty_cache()

    except Exception as e:
        print(f"Error during forward pass: {str(e)}")
        return

    print("\nModel verification completed successfully!")
print("sanity check")

sanity check


In [5]:
verify_efficientnetb3_model()

Using device: cpu



Model loaded successfully. Number of classes: 10

Model Architecture:
EfficientNetB3Model(
  (enetb3): EfficientNet(
    (conv_stem): Conv2d(3, 40, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNormAct2d(
      40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
      (drop): Identity()
      (act): SiLU(inplace=True)
    )
    (blocks): Sequential(
      (0): Sequential(
        (0): DepthwiseSeparableConv(
          (conv_dw): Conv2d(40, 40, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=40, bias=False)
          (bn1): BatchNormAct2d(
            40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
            (drop): Identity()
            (act): SiLU(inplace=True)
          )
          (aa): Identity()
          (se): SqueezeExcite(
            (conv_reduce): Conv2d(40, 10, kernel_size=(1, 1), stride=(1, 1))
            (act1): SiLU(inplace=True)
            (conv_expand): Conv2d(10, 40, kernel_size=(1, 1), s

# **Loading the SVHN Dataset**

In [6]:
import torchvision.transforms as transforms
from torchvision.datasets import SVHN
from torch.utils.data import DataLoader

def get_data_loaders_svhn(batch_size=64):
    # Define image transformations
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Load SVHN dataset
    train_dataset = SVHN(root='./data', split='train', download=True, transform=transform)
    test_dataset = SVHN(root='./data', split='test', download=True, transform=transform)

    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)

    return train_loader, test_loader, 10

print("sanity check")

sanity check


# **Fine Tuning the Model to the SVHN Dataset**

In [7]:
import timm
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision.datasets import SVHN
from torch.utils.data import DataLoader
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler
import time
# from efficientnet_b3_model import load_efficientnetb3_model
# from data_svhn import get_data_loaders_svhn

def train_model(model, train_loader, criterion, optimizer, device, num_epochs=5):
    model.train()
    scaler = GradScaler()

    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        start_time = time.time()

        for batch_idx, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            with autocast():
                outputs = model(inputs)
                loss = criterion(outputs, labels)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

            running_loss += loss.item()

            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

            if (batch_idx + 1) % 20 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Batch [{batch_idx + 1}/{len(train_loader)}], Loss: {loss.item():.4f}")

        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = 100 * correct / total
        print(f"Epoch [{epoch + 1}/{num_epochs}] completed in {time.time() - start_time:.2f} seconds. "
              f"Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%")

print("sanity check")

sanity check


In [8]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_loader, test_loader, num_classes = get_data_loaders_svhn()
model = load_efficientnetb3_model(num_classes, device, task='nopath')

for name, param in model.named_parameters():
    print(name, param.requires_grad)

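# Placeholder filter: "some_specific_layer" does not match any EfficientNet parameter name,
# so nothing is frozen here and the whole network is fine-tuned.
for name, param in model.named_parameters():
    if "some_specific_layer" in name:
        param.requires_grad = False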

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=2)

torch.save(model.state_dict(), 'fine_tuned_enetb3_task31.pth')

print("hogya")  # "hogya" is Hindi/Urdu for "done"

Downloading http://ufldl.stanford.edu/housenumbers/train_32x32.mat to ./data/train_32x32.mat


100%|██████████| 182040794/182040794 [00:08<00:00, 22745419.32it/s]


Downloading http://ufldl.stanford.edu/housenumbers/test_32x32.mat to ./data/test_32x32.mat


100%|██████████| 64275384/64275384 [00:03<00:00, 18476840.61it/s]


enetb3.conv_stem.weight True
enetb3.bn1.weight True
enetb3.bn1.bias True
enetb3.blocks.0.0.conv_dw.weight True
enetb3.blocks.0.0.bn1.weight True
enetb3.blocks.0.0.bn1.bias True
enetb3.blocks.0.0.se.conv_reduce.weight True
enetb3.blocks.0.0.se.conv_reduce.bias True
enetb3.blocks.0.0.se.conv_expand.weight True
enetb3.blocks.0.0.se.conv_expand.bias True
enetb3.blocks.0.0.conv_pw.weight True
enetb3.blocks.0.0.bn2.weight True
enetb3.blocks.0.0.bn2.bias True
enetb3.blocks.0.1.conv_dw.weight True
enetb3.blocks.0.1.bn1.weight True
enetb3.blocks.0.1.bn1.bias True
enetb3.blocks.0.1.se.conv_reduce.weight True
enetb3.blocks.0.1.se.conv_reduce.bias True
enetb3.blocks.0.1.se.conv_expand.weight True
enetb3.blocks.0.1.se.conv_expand.bias True
enetb3.blocks.0.1.conv_pw.weight True
enetb3.blocks.0.1.bn2.weight True
enetb3.blocks.0.1.bn2.bias True
enetb3.blocks.1.0.conv_pw.weight True
enetb3.blocks.1.0.bn1.weight True
enetb3.blocks.1.0.bn1.bias True
enetb3.blocks.1.0.conv_dw.weight True
enetb3.blocks.1.0



Epoch [1/2], Batch [20/1145], Loss: 2.2374
Epoch [1/2], Batch [40/1145], Loss: 1.9775
Epoch [1/2], Batch [60/1145], Loss: 1.8339
Epoch [1/2], Batch [80/1145], Loss: 1.5946
Epoch [1/2], Batch [100/1145], Loss: 0.9882
Epoch [1/2], Batch [120/1145], Loss: 1.0239
Epoch [1/2], Batch [140/1145], Loss: 0.5709
Epoch [1/2], Batch [160/1145], Loss: 0.5121
Epoch [1/2], Batch [180/1145], Loss: 0.3686
Epoch [1/2], Batch [200/1145], Loss: 0.3475
Epoch [1/2], Batch [220/1145], Loss: 0.3130
Epoch [1/2], Batch [240/1145], Loss: 0.3509
Epoch [1/2], Batch [260/1145], Loss: 0.3476
Epoch [1/2], Batch [280/1145], Loss: 0.2343
Epoch [1/2], Batch [300/1145], Loss: 0.4184
Epoch [1/2], Batch [320/1145], Loss: 0.3402
Epoch [1/2], Batch [340/1145], Loss: 0.2623
Epoch [1/2], Batch [360/1145], Loss: 0.1755
Epoch [1/2], Batch [380/1145], Loss: 0.3345
Epoch [1/2], Batch [400/1145], Loss: 0.3167
Epoch [1/2], Batch [420/1145], Loss: 0.2103
Epoch [1/2], Batch [440/1145], Loss: 0.1728
Epoch [1/2], Batch [460/1145], Loss:
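
The freezing loop in the cell above filters on the string `some_specific_layer`, which does not match names such as `enetb3.conv_stem.weight`, so every parameter stays trainable. If partial freezing is wanted, the filter has to match real name prefixes. A minimal sketch, assuming a hypothetical helper `freeze_by_prefix` and using stand-in parameter objects so that it runs without loading the model:

```python
from types import SimpleNamespace

def freeze_by_prefix(named_params, prefixes):
    """Set requires_grad=False on parameters whose name starts with any given prefix.

    `named_params` is any iterable of (name, param) pairs, such as
    model.named_parameters(). Returns the list of frozen names.
    """
    prefixes = tuple(prefixes)
    frozen = []
    for name, param in named_params:
        if name.startswith(prefixes):
            param.requires_grad = False
            frozen.append(name)
    return frozen

# Stand-in parameters that mimic the names printed by the notebook
names = [
    "enetb3.conv_stem.weight",
    "enetb3.blocks.0.0.conv_dw.weight",
    "enetb3.classifier.weight",
]
params = [(n, SimpleNamespace(requires_grad=True)) for n in names]
frozen = freeze_by_prefix(params, ["enetb3.conv_stem", "enetb3.blocks.0."])
print(frozen)
```

With the real model this would be called as `freeze_by_prefix(model.named_parameters(), [...])` before the optimizer is built, since the optimizer in this cell already filters on `requires_grad`.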

# **Evaluating the Model on the SVHN Dataset**

In [9]:
import torch
# from efficientnet_b3_model import load_efficientnetb3_model
# from data_svhn import get_data_loaders_svhn
from sklearn.metrics import confusion_matrix
import numpy as np

def evaluate_model(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    all_labels = []
    all_predicted = []

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            all_labels.extend(labels.cpu().numpy())
            all_predicted.extend(predicted.cpu().numpy())

    # Rows of the confusion matrix are true labels, columns are predictions
    conf_matrix = confusion_matrix(all_labels, all_predicted)
    # Per-class accuracy: diagonal over row sums, kept as a column vector
    classwise_accuracies = (conf_matrix.diagonal() / conf_matrix.sum(axis=1)).reshape(-1, 1)
    accuracy = 100 * correct / total

    print("Confusion Matrix")
    print(conf_matrix)
    print(f"Accuracy on SVHN test set: {accuracy:.2f}%")
    print("Classwise Accuracies:")
    print(classwise_accuracies)

print("sanity check")

sanity check


In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load SVHN dataset
train_loader, test_loader, num_classes = get_data_loaders_svhn()

# Load model
model = load_efficientnetb3_model(num_classes, device,task='task31')

evaluate_model(model, test_loader, device)


Using downloaded and verified file: ./data/train_32x32.mat
Using downloaded and verified file: ./data/test_32x32.mat




Confusion Matrix
[[1701    5    3    8    1    4   12    3    0    7]
 [  18 4973   15   13   18    6    6   44    5    1]
 [   4   20 4040   30    4    2    2   29    5   13]
 [   5   26   10 2721    2   28    3   10   19   58]
 [   3   35   12    9 2456    1    2    3    2    0]
 [   2    9    3   44    3 2269   40    2    4    8]
 [  14    4    1   12    8    6 1902    4   22    4]
 [   1   35    6    5    5    2    2 1960    1    2]
 [  15    5    0    6    6    4   29    1 1566   28]
 [  33    8    7    8    1    4    2    0    3 1529]]
Accuracy on SVHN test set: 96.49%
Classwise Accuracies:
[[0.97534404]
 [0.97528927]
 [0.97372861]
 [0.94413602]
 [0.97344431]
 [0.95176174]
 [0.96206373]
 [0.97077761]
 [0.94337349]
 [0.95862069]]
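
The off-diagonal entries of the confusion matrix show which digits are mistaken for which. As a sketch (the helper `top_confusions` is not part of the notebook), the largest off-diagonal counts can be listed directly from the matrix printed above:

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true label, columns: predicted label)
svhn_conf = np.array([
    [1701,    5,    3,    8,    1,    4,   12,    3,    0,    7],
    [  18, 4973,   15,   13,   18,    6,    6,   44,    5,    1],
    [   4,   20, 4040,   30,    4,    2,    2,   29,    5,   13],
    [   5,   26,   10, 2721,    2,   28,    3,   10,   19,   58],
    [   3,   35,   12,    9, 2456,    1,    2,    3,    2,    0],
    [   2,    9,    3,   44,    3, 2269,   40,    2,    4,    8],
    [  14,    4,    1,   12,    8,    6, 1902,    4,   22,    4],
    [   1,   35,    6,    5,    5,    2,    2, 1960,    1,    2],
    [  15,    5,    0,    6,    6,    4,   29,    1, 1566,   28],
    [  33,    8,    7,    8,    1,    4,    2,    0,    3, 1529],
])

def top_confusions(conf, k=3):
    """Return the k largest off-diagonal entries as (true_label, predicted_label, count)."""
    off_diag = conf.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c])) for r, c in zip(rows, cols)]

overall = svhn_conf.trace() / svhn_conf.sum()
confusions = top_confusions(svhn_conf)
print(f"Overall accuracy: {100 * overall:.2f}%")
for true_label, pred_label, count in confusions:
    print(f"true {true_label} predicted as {pred_label}: {count} times")
```

The trace divided by the total reproduces the overall accuracy reported by `evaluate_model`, which is a quick consistency check on the copied matrix.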


# **Loading the PACS Dataset**

In [11]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [15]:
import os
import torch
from torch.utils.data import Dataset
from PIL import Image
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


class PACS(Dataset):
    def __init__(self, root_dir, domain, transform=None):
        self.root_dir = root_dir
        self.domain = domain
        self.transform = transform
        self.categories = sorted(os.listdir(os.path.join(root_dir, domain)))
        self.images = []
        self.labels = []

        for category in self.categories:
            category_dir = os.path.join(root_dir, domain, category)
            for image_file in os.listdir(category_dir):
                image_path = os.path.join(category_dir, image_file)
                self.images.append(image_path)
                self.labels.append(self.categories.index(category))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image_path = self.images[index]
        label = self.labels[index]
        # Force 3 channels so grayscale or RGBA files work with the 3-channel Normalize
        image = Image.open(image_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, label


def get_data_loaders_pacs(batch_size=64):
    # Define image transformations
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Load PACS dataset
    train_dataset = PACS(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/pacs_data/pacs_data', domain='photo',transform=transform)
    test_dataset_art = PACS(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/pacs_data/pacs_data', domain='art_painting', transform=transform)
    test_dataset_cartoon= PACS(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/pacs_data/pacs_data', domain='cartoon', transform=transform)
    test_dataset_sketches=PACS(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/pacs_data/pacs_data', domain='sketch', transform=transform)

    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
    test_loader_art = DataLoader(test_dataset_art, batch_size=batch_size, shuffle=False, num_workers=4)
    test_loader_cartoon = DataLoader(test_dataset_cartoon, batch_size=batch_size, shuffle=False, num_workers=4)
    test_loader_sketches = DataLoader(test_dataset_sketches, batch_size=batch_size, shuffle=False, num_workers=4)

    return train_loader, test_loader_art, test_loader_cartoon, test_loader_sketches, 7

print("sanity check")

sanity check
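
The `PACS` class derives labels from `sorted(os.listdir(...))`, so the label of each class depends on everything found in the domain folder. A stray file there would raise an error when it is listed as a category, and a stray directory such as `.ipynb_checkpoints` would silently become an extra class and shift the label indices. The sketch below, with a hypothetical helper `build_class_index`, keeps only visible subdirectories. It builds a temporary folder tree so that it can run on its own, and the class names in it are placeholders for the demonstration.

```python
import os
import tempfile

def build_class_index(domain_dir):
    """Map each visible subdirectory name to an integer label, in sorted order."""
    classes = sorted(
        entry for entry in os.listdir(domain_dir)
        if not entry.startswith(".") and os.path.isdir(os.path.join(domain_dir, entry))
    )
    return {name: idx for idx, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    # Two class folders, one hidden folder, and one stray file
    for name in ["dog", "horse", ".ipynb_checkpoints"]:
        os.makedirs(os.path.join(root, name))
    open(os.path.join(root, "README.txt"), "w").close()
    class_to_idx = build_class_index(root)

print(class_to_idx)
```

Printing this mapping once per domain also makes it possible to check that all four domains assign the same index to the same class, which the evaluation relies on.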


# **Fine Tuning the Model to the PACS Dataset**

In [16]:
import timm
import torch
import torch.nn as nn
import torchvision.transforms as transforms
# from data_pacs import PACS
from torch.utils.data import DataLoader
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler
import time
# from efficientnet_b3_model import load_efficientnetb3_model
# from data_pacs import get_data_loaders_pacs

def train_model(model, train_loader, criterion, optimizer, device, num_epochs=5):
    model.train()
    scaler = GradScaler()

    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        start_time = time.time()

        for batch_idx, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            with autocast():
                outputs = model(inputs)
                loss = criterion(outputs, labels)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

            running_loss += loss.item()

            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

            if (batch_idx + 1) % 20 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Batch [{batch_idx + 1}/{len(train_loader)}], Loss: {loss.item():.4f}")

        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = 100 * correct / total
        print(f"Epoch [{epoch + 1}/{num_epochs}] completed in {time.time() - start_time:.2f} seconds. "
              f"Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%")

print ("sanity check")


sanity check


In [17]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_loader, test_loader_art, test_loader_cartoon, test_loader_sketches, num_classes = get_data_loaders_pacs()
model = load_efficientnetb3_model(num_classes, device, task='nopath')

for name, param in model.named_parameters():
    print(name, param.requires_grad)

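# Placeholder filter: "some_specific_layer" does not match any EfficientNet parameter name,
# so nothing is frozen here and the whole network is fine-tuned.
for name, param in model.named_parameters():
    if "some_specific_layer" in name:
        param.requires_grad = False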

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=2)

torch.save(model.state_dict(), 'fine_tuned_enetb3_task32.pth')

print("hogya")  # "hogya" is Hindi/Urdu for "done"



enetb3.conv_stem.weight True
enetb3.bn1.weight True
enetb3.bn1.bias True
enetb3.blocks.0.0.conv_dw.weight True
enetb3.blocks.0.0.bn1.weight True
enetb3.blocks.0.0.bn1.bias True
enetb3.blocks.0.0.se.conv_reduce.weight True
enetb3.blocks.0.0.se.conv_reduce.bias True
enetb3.blocks.0.0.se.conv_expand.weight True
enetb3.blocks.0.0.se.conv_expand.bias True
enetb3.blocks.0.0.conv_pw.weight True
enetb3.blocks.0.0.bn2.weight True
enetb3.blocks.0.0.bn2.bias True
enetb3.blocks.0.1.conv_dw.weight True
enetb3.blocks.0.1.bn1.weight True
enetb3.blocks.0.1.bn1.bias True
enetb3.blocks.0.1.se.conv_reduce.weight True
enetb3.blocks.0.1.se.conv_reduce.bias True
enetb3.blocks.0.1.se.conv_expand.weight True
enetb3.blocks.0.1.se.conv_expand.bias True
enetb3.blocks.0.1.conv_pw.weight True
enetb3.blocks.0.1.bn2.weight True
enetb3.blocks.0.1.bn2.bias True
enetb3.blocks.1.0.conv_pw.weight True
enetb3.blocks.1.0.bn1.weight True
enetb3.blocks.1.0.bn1.bias True
enetb3.blocks.1.0.conv_dw.weight True
enetb3.blocks.1.0



Epoch [1/2], Batch [20/27], Loss: 1.0420
Epoch [1/2] completed in 154.58 seconds. Loss: 1.2902, Accuracy: 76.41%
Epoch [2/2], Batch [20/27], Loss: 0.2286
Epoch [2/2] completed in 14.17 seconds. Loss: 0.3418, Accuracy: 98.74%
hogya


# **Evaluating the Model on the PACS Dataset**

In [18]:
import torch
# from efficientnet_b3_model import load_efficientnetb3_model
# from data_pacs import get_data_loaders_pacs
from sklearn.metrics import confusion_matrix
import numpy as np

def evaluate_model(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    all_labels = []
    all_predicted = []

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            all_labels.extend(labels.cpu().numpy())
            all_predicted.extend(predicted.cpu().numpy())

    # Rows of the confusion matrix are true labels, columns are predictions
    conf_matrix = confusion_matrix(all_labels, all_predicted)
    # Per-class accuracy: diagonal over row sums, kept as a column vector
    classwise_accuracies = (conf_matrix.diagonal() / conf_matrix.sum(axis=1)).reshape(-1, 1)
    accuracy = 100 * correct / total

    print("Confusion Matrix")
    print(conf_matrix)
    print(f"Accuracy on PACS test set: {accuracy:.2f}%")
    print("Classwise Accuracies:")
    print(classwise_accuracies)

print("sanity check")

sanity check


In [19]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load PACS dataset
train_loader, test_loader_art, test_loader_cartoon, test_loader_sketches, num_classes = get_data_loaders_pacs()

# Load model
model = load_efficientnetb3_model(num_classes, device,task='task32')

print("Evaluating on Art Dataset")
evaluate_model(model, test_loader_art, device)

print("Evaluating on Cartoon Dataset")
evaluate_model(model, test_loader_cartoon, device)

print("Evaluating on Sketches Dataset")
evaluate_model(model, test_loader_sketches, device)

Evaluating on Art Dataset




Confusion Matrix
[[264   2  29  11  69   1   3]
 [  6 149  51  20  27   1   1]
 [  5   1 252  17   8   2   0]
 [  0   0   5 166   7   6   0]
 [ 12   1  15   4 167   2   0]
 [  2   1  19  27   5 241   0]
 [ 74  11  60  49 131  11 113]]
Accuracy on PACS test set: 66.02%
Classwise Accuracies:
[[0.69656992]
 [0.58431373]
 [0.88421053]
 [0.90217391]
 [0.83084577]
 [0.81694915]
 [0.25167038]]
Evaluating on Cartoon Dataset




Confusion Matrix
[[ 33   0   7 335  10   1   3]
 [  6  38   4 402   7   0   0]
 [  4   0 164 177   0   1   0]
 [  0   0   0 135   0   0   0]
 [  2   0  33 162 127   0   0]
 [  4   0   1  90   2 191   0]
 [ 18   0  12 342  11   7  15]]
Accuracy on PACS test set: 29.99%
Classwise Accuracies:
[[0.0848329 ]
 [0.08315098]
 [0.47398844]
 [1.        ]
 [0.39197531]
 [0.66319444]
 [0.03703704]]
Evaluating on Sketches Dataset




Confusion Matrix
[[ 10   1  63 675  17   6   0]
 [  5   4  16 687  19   9   0]
 [  4   0 208 538   3   0   0]
 [  0   0   1 607   0   0   0]
 [  2   0 116 621  72   5   0]
 [  3   0   3  32   1  41   0]
 [ 10   0  15 135   0   0   0]]
Accuracy on PACS test set: 23.98%
Classwise Accuracies:
[[0.01295337]
 [0.00540541]
 [0.27622842]
 [0.99835526]
 [0.08823529]
 [0.5125    ]
 [0.        ]]
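
The cartoon and sketch matrices above are dominated by a single column: most images, whatever their true class, are predicted as class index 3 (the notebook does not print the index-to-name mapping, so the class name is left open here). The sketch below, whose helper names are illustrative and not from the notebook, quantifies this for each target domain by computing the overall accuracy and the share of all predictions that land on the most-predicted class.

```python
import numpy as np

# Confusion matrices copied from the output above (rows: true class, columns: predicted class)
pacs_conf = {
    "art_painting": np.array([
        [264,   2,  29,  11,  69,   1,   3],
        [  6, 149,  51,  20,  27,   1,   1],
        [  5,   1, 252,  17,   8,   2,   0],
        [  0,   0,   5, 166,   7,   6,   0],
        [ 12,   1,  15,   4, 167,   2,   0],
        [  2,   1,  19,  27,   5, 241,   0],
        [ 74,  11,  60,  49, 131,  11, 113],
    ]),
    "cartoon": np.array([
        [ 33,   0,   7, 335,  10,   1,   3],
        [  6,  38,   4, 402,   7,   0,   0],
        [  4,   0, 164, 177,   0,   1,   0],
        [  0,   0,   0, 135,   0,   0,   0],
        [  2,   0,  33, 162, 127,   0,   0],
        [  4,   0,   1,  90,   2, 191,   0],
        [ 18,   0,  12, 342,  11,   7,  15],
    ]),
    "sketch": np.array([
        [ 10,   1,  63, 675,  17,   6,   0],
        [  5,   4,  16, 687,  19,   9,   0],
        [  4,   0, 208, 538,   3,   0,   0],
        [  0,   0,   1, 607,   0,   0,   0],
        [  2,   0, 116, 621,  72,   5,   0],
        [  3,   0,   3,  32,   1,  41,   0],
        [ 10,   0,  15, 135,   0,   0,   0],
    ]),
}

def summarize(conf):
    """Return (accuracy, most_predicted_class, share_of_predictions_on_that_class)."""
    accuracy = conf.trace() / conf.sum()
    predicted_counts = conf.sum(axis=0)  # column sums: how often each class is predicted
    top_class = int(predicted_counts.argmax())
    share = predicted_counts[top_class] / conf.sum()
    return float(accuracy), top_class, float(share)

summary = {domain: summarize(conf) for domain, conf in pacs_conf.items()}
for domain, (acc, top_class, share) in summary.items():
    print(f"{domain}: accuracy {100 * acc:.2f}%, "
          f"class {top_class} receives {100 * share:.1f}% of predictions")
```

A high share on one class alongside near-perfect accuracy for that class (as for index 3 on cartoon and sketch) indicates that the per-class number reflects a collapsed prediction rather than genuine recognition, so the classwise accuracies should be read together with the column sums.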
