# **Task 4: Inductive Biases of Models: Semantic Biases**
In this task, you will evaluate the selected models for inductive biases that arise from image semantics. To evaluate
these biases systematically, we will focus on three key factors: shape bias, texture bias, and color bias. For example,
shape bias refers to the tendency of the model (here, EfficientNet-B3) to rely on object shape over other visual cues like texture or color. This can
be quantified by comparing the shape accuracy (the model’s accuracy when presented with shape-reliant data)
to the total accuracy (the model’s overall accuracy when all visual cues are available). Shape bias can be formally
evaluated using the ratio:

$$\text{Shape Bias} = \frac{\text{Shape Accuracy}}{\text{Total Accuracy}}$$

Similarly, texture bias and color bias can be measured by evaluating model performance on datasets where texture
and color, respectively, dominate the discriminative features.
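As a quick illustration of the ratio above, the sketch below computes it from two accuracies. The function name `bias_ratio` and the sample numbers are hypothetical and chosen only for illustration; they are not results from this assignment.

```python
def bias_ratio(cue_accuracy: float, total_accuracy: float) -> float:
    """Fraction of the overall accuracy that is retained on a cue-specific dataset."""
    if total_accuracy <= 0:
        raise ValueError("total_accuracy must be positive")
    return cue_accuracy / total_accuracy


# Hypothetical numbers: 45% on a silhouette set, 90% on the original test set
print(bias_ratio(45.0, 90.0))  # prints 0.5
```

Both accuracies must be on the same scale (both percentages or both fractions), since only their ratio matters.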
To conduct this analysis, you will need to create or source separate datasets to evaluate each bias individually. For
instance:

*   A shape bias dataset may consist of line drawings or silhouettes of objects, where only the object’s shape
is preserved, and other features like texture and color are removed.

*   A texture bias dataset could include texture-based alterations of the objects, where textures are preserved
but shapes are distorted.

*   A color bias dataset may contain images where color is emphasized or altered in ways that challenge the
model’s reliance on this feature. This may require some extra thought.

We will use these separate datasets to compute our measures individually: e.g., the
performance on the shape bias dataset will be a proxy measure for our shape bias (since we have emphasized the
role of shape cues by taking away other pieces of information).
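One way to obtain such datasets is to derive them from the original images. The sketch below works on raw `uint8` arrays and rests on stated assumptions: `remove_color` uses the ITU-R BT.601 luma weights, and `edge_map` thresholds a finite-difference gradient magnitude as a crude stand-in for a proper edge detector such as Canny. Both function names and the threshold value are hypothetical.

```python
import numpy as np


def remove_color(image: np.ndarray) -> np.ndarray:
    """Grayscale an HxWx3 uint8 image but keep three identical channels."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    gray = np.rint(image.astype(np.float64) @ weights).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)


def edge_map(image: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Binary edge image from the gradient magnitude (NOT Canny, only a stand-in)."""
    gray = remove_color(image)[..., 0].astype(np.float64)
    grad_rows, grad_cols = np.gradient(gray)
    magnitude = np.hypot(grad_rows, grad_cols)
    edges = np.where(magnitude > threshold, 255, 0).astype(np.uint8)
    return np.stack([edges, edges, edges], axis=-1)


# Toy image: a bright red square on a black background
toy = np.zeros((8, 8, 3), dtype=np.uint8)
toy[2:6, 2:6, 0] = 255

gray_toy = remove_color(toy)   # color removed, shape and brightness kept
edge_toy = edge_map(toy)       # only the outline of the square remains
print(gray_toy.shape, edge_toy.shape)
```

Keeping three channels in both outputs means the derived images can be fed to a model that expects RGB input without changing its first layer.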

By evaluating the models on these specialized datasets, you can identify which visual cues each model
relies on the most. This analysis will provide deeper insights into the underlying reasons for the performance
drops observed earlier, allowing you to pinpoint what might have gone wrong or right in the OOD evaluations.
Understanding these biases is crucial for improving models to achieve more human-like generalization capabilities.
You can also take cues from the previous sections and justify why
an existing dataset is a good representative for a certain type of bias (e.g. silhouettes reflecting shape bias).

In [29]:
pip install timm



In [30]:
import tensorflow as tf
tf.test.gpu_device_name()
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# **Defining the EfficientNet Model**

In [31]:
import timm
import torch
import torch.nn as nn


class EfficientNetB3Model(nn.Module):
    def __init__(self, num_classes, pretrained=True):
        super(EfficientNetB3Model, self).__init__()
        self.enetb3 = timm.create_model('efficientnet_b3', pretrained=pretrained)
        self.enetb3.classifier = nn.Linear(self.enetb3.classifier.in_features, num_classes)



    def forward(self, x):
        x = self.enetb3(x)
        return x

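def load_efficientnetb3_model(num_classes, device, task):
    model = EfficientNetB3Model(num_classes)
    # task='nopath' keeps the pretrained backbone with a fresh classifier head;
    # any other value loads the matching fine-tuned checkpoint.
    if task != 'nopath':
        # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
        model.load_state_dict(torch.load(f'fine_tuned_enetb3_{task}.pth', map_location=device))
    model = model.to(device)
    return model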
print("sanity check")

sanity check


# **Verifying the Model**

In [32]:
import torch
# from efficientnet_b3_model import load_efficientnetb3_model

def verify_efficientnetb3_model():
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"Using device: {device}")

    num_classes = 10

    model = load_efficientnetb3_model(num_classes, device, task='nopath')
    print(f"Model loaded successfully. Number of classes: {num_classes}")

    print("\nModel Architecture:")
    print(model)

    batch_size = 1
    dummy_input = torch.randn(batch_size, 3, 112, 112).to(device)
    print(f"\nDummy input shape: {dummy_input.shape}")

    try:
        with torch.no_grad():
            output = model(dummy_input)
        print("Forward pass successful!")
        print(f"Output shape: {output.shape}")

        expected_shape = (batch_size, num_classes)
        assert output.shape == expected_shape, f"Expected output shape {expected_shape}, but got {output.shape}"
        print("Output shape is correct.")

        if device.type == 'cuda':
          torch.cuda.empty_cache()
        elif device.type == 'mps':
          torch.mps.empty_cache()

    except Exception as e:
        print(f"Error during forward pass: {str(e)}")
        return

    print("\nModel verification completed successfully!")
print("sanity check")

sanity check


In [33]:
verify_efficientnetb3_model()

Using device: cpu
Model loaded successfully. Number of classes: 10

Model Architecture:
EfficientNetB3Model(
  (enetb3): EfficientNet(
    (conv_stem): Conv2d(3, 40, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNormAct2d(
      40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
      (drop): Identity()
      (act): SiLU(inplace=True)
    )
    (blocks): Sequential(
      (0): Sequential(
        (0): DepthwiseSeparableConv(
          (conv_dw): Conv2d(40, 40, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=40, bias=False)
          (bn1): BatchNormAct2d(
            40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True
            (drop): Identity()
            (act): SiLU(inplace=True)
          )
          (aa): Identity()
          (se): SqueezeExcite(
            (conv_reduce): Conv2d(40, 10, kernel_size=(1, 1), stride=(1, 1))
            (act1): SiLU(inplace=True)
            (conv_expand): Conv2d(10, 40, ker

# **Loading the Datasets**

In [34]:
# from google.colab import drive
# drive.mount('/content/drive')

In [35]:
import os
import torch
from torch.utils.data import Dataset
from PIL import Image
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


class Task4(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.categories = sorted(os.listdir(root_dir))
        self.images = []
        self.labels = []

        # Each subfolder is one class; its position in the sorted list is the label
        for label, category in enumerate(self.categories):
            category_dir = os.path.join(root_dir, category)
            for image_file in os.listdir(category_dir):
                image_path = os.path.join(category_dir, image_file)
                self.images.append(image_path)
                self.labels.append(label)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image_path = self.images[index]
        label = self.labels[index]
        image = Image.open(image_path)
        if self.transform:
            image = self.transform(image)
        return image, label


def get_data_loaders_task4(batch_size,task):
    # Define image transformations
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.Lambda(lambda img: img.convert("RGB")),  # Ensure 3 channels
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Load the Animal10 training set
    train_dataset = Task4(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/animal10', transform=transform)

    # Pick the test set that matches the requested evaluation
    if task == 'og':
        test_dataset = Task4(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/small_animal_dataset_updated', transform=transform)
    elif task == 'shape':
        test_dataset = Task4(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/canny/val', transform=transform)
    elif task == 'texture':
        test_dataset = Task4(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/Stylized_images', transform=transform)
    elif task == 'color':
        test_dataset = Task4(root_dir='/content/drive/MyDrive/Advanced_ML/PA1/grayscale_images', transform=transform)
    else:
        # Fail early instead of hitting an undefined test_dataset below
        raise ValueError(f"Unknown task: {task!r}")



    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
    return train_loader,  test_loader, 10

print("sanity check")

sanity check
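The `Task4` class above assumes one subfolder per class and uses each folder's position in the sorted folder list as its label. The sketch below reproduces that indexing with the standard library only. It additionally skips hidden entries and non-directories, which is an assumption about what a more defensive loader might do rather than something the class above does; `index_image_folder` is a hypothetical helper.

```python
import os
import tempfile


def index_image_folder(root_dir):
    """Return (paths, labels, class_names) for a root/<class>/<file> layout."""
    class_names = sorted(
        name for name in os.listdir(root_dir)
        if not name.startswith(".") and os.path.isdir(os.path.join(root_dir, name))
    )
    paths, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(root_dir, name)
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.startswith("."):
                continue  # skip hidden files such as .DS_Store
            paths.append(os.path.join(class_dir, file_name))
            labels.append(label)
    return paths, labels, class_names


# Build a tiny throwaway directory tree to show the label assignment
with tempfile.TemporaryDirectory() as root:
    for class_name, files in {"dog": ["a.jpg", "b.jpg"], "cat": ["c.jpg"]}.items():
        os.makedirs(os.path.join(root, class_name))
        for file_name in files:
            open(os.path.join(root, class_name, file_name), "w").close()
    paths, labels, class_names = index_image_folder(root)

print(class_names, labels)  # prints ['cat', 'dog'] [0, 1, 1]
```

Because labels follow alphabetical folder order, the training and test directories need identical class folder names for the label indices to line up.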


# **Fine-Tuning the Model on the Original Dataset**

In [36]:
import timm
import torch
import torch.nn as nn
import torchvision.transforms as transforms
# from data_task4 import Task4
from torch.utils.data import DataLoader
import torch.optim as optim
from torch.cuda.amp import autocast, GradScaler
import time
# from efficientnet_b3_model import load_efficientnetb3_model
# from data_task4 import get_data_loaders_task4

def train_model(model, train_loader, criterion, optimizer, device, num_epochs=5):
    model.train()
    scaler = GradScaler()

    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        start_time = time.time()

        for batch_idx, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            with autocast():
                outputs = model(inputs)
                loss = criterion(outputs, labels)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

            running_loss += loss.item()

            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

            if (batch_idx + 1) % 20 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Batch [{batch_idx + 1}/{len(train_loader)}], Loss: {loss.item():.4f}")

        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = 100 * correct / total
        print(f"Epoch [{epoch + 1}/{num_epochs}] completed in {time.time() - start_time:.2f} seconds. "
              f"Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%")

print ("sanity check")


sanity check


In [37]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_loader, test_loader, num_classes = get_data_loaders_task4(64,task='shape')
model = load_efficientnetb3_model(num_classes, device, task='nopath')

for name, param in model.named_parameters():
    print(name, param.requires_grad)

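# Placeholder: "some_specific_layer" matches no parameter name, so nothing is frozen
# and the whole network is fine-tuned. Substitute a real name prefix to freeze layers.
for name, param in model.named_parameters():
    if "some_specific_layer" in name:
        param.requires_grad = False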

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)

train_model(model, train_loader, criterion, optimizer, device, num_epochs=2)

torch.save(model.state_dict(), 'fine_tuned_enetb3_task4.pth')

print("done")

enetb3.conv_stem.weight True
enetb3.bn1.weight True
enetb3.bn1.bias True
enetb3.blocks.0.0.conv_dw.weight True
enetb3.blocks.0.0.bn1.weight True
enetb3.blocks.0.0.bn1.bias True
enetb3.blocks.0.0.se.conv_reduce.weight True
enetb3.blocks.0.0.se.conv_reduce.bias True
enetb3.blocks.0.0.se.conv_expand.weight True
enetb3.blocks.0.0.se.conv_expand.bias True
enetb3.blocks.0.0.conv_pw.weight True
enetb3.blocks.0.0.bn2.weight True
enetb3.blocks.0.0.bn2.bias True
enetb3.blocks.0.1.conv_dw.weight True
enetb3.blocks.0.1.bn1.weight True
enetb3.blocks.0.1.bn1.bias True
enetb3.blocks.0.1.se.conv_reduce.weight True
enetb3.blocks.0.1.se.conv_reduce.bias True
enetb3.blocks.0.1.se.conv_expand.weight True
enetb3.blocks.0.1.se.conv_expand.bias True
enetb3.blocks.0.1.conv_pw.weight True
enetb3.blocks.0.1.bn2.weight True
enetb3.blocks.0.1.bn2.bias True
enetb3.blocks.1.0.conv_pw.weight True
enetb3.blocks.1.0.bn1.weight True
enetb3.blocks.1.0.bn1.bias True
enetb3.blocks.1.0.conv_dw.weight True
enetb3.blocks.1.0



Epoch [1/2], Batch [20/410], Loss: 1.6456
Epoch [1/2], Batch [40/410], Loss: 0.9754
Epoch [1/2], Batch [60/410], Loss: 0.5822
Epoch [1/2], Batch [80/410], Loss: 0.2129
Epoch [1/2], Batch [100/410], Loss: 0.2714
Epoch [1/2], Batch [120/410], Loss: 0.1115
Epoch [1/2], Batch [140/410], Loss: 0.1127
Epoch [1/2], Batch [160/410], Loss: 0.1814
Epoch [1/2], Batch [180/410], Loss: 0.0520
Epoch [1/2], Batch [200/410], Loss: 0.1597
Epoch [1/2], Batch [220/410], Loss: 0.0806
Epoch [1/2], Batch [240/410], Loss: 0.0635
Epoch [1/2], Batch [260/410], Loss: 0.1328
Epoch [1/2], Batch [280/410], Loss: 0.0392
Epoch [1/2], Batch [300/410], Loss: 0.0382
Epoch [1/2], Batch [320/410], Loss: 0.0764
Epoch [1/2], Batch [340/410], Loss: 0.1402
Epoch [1/2], Batch [360/410], Loss: 0.2627
Epoch [1/2], Batch [380/410], Loss: 0.1207
Epoch [1/2], Batch [400/410], Loss: 0.0146
Epoch [1/2] completed in 2636.86 seconds. Loss: 0.3213, Accuracy: 93.03%
Epoch [2/2], Batch [20/410], Loss: 0.0611
Epoch [2/2], Batch [40/410], 

# **Evaluation**

In [38]:
import torch
# from efficientnet_b3_model import load_efficientnetb3_model
# from data_task4 import get_data_loaders_task4
from sklearn.metrics import confusion_matrix
import numpy as np

def evaluate_model(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    all_labels = []
    all_predicted = []

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            all_labels.extend(labels.cpu().numpy())
            all_predicted.extend(predicted.cpu().numpy())

    num_classes = 10
    # Fix the label set so the matrix stays 10x10 even if a class never appears
    conf_matrix = confusion_matrix(all_labels, all_predicted, labels=list(range(num_classes)))
    # Per-class accuracy: diagonal count divided by the row total (true-class count)
    classwise_accuracies = (conf_matrix.diagonal() / conf_matrix.sum(axis=1)).reshape(-1, 1)
    accuracy = 100 * correct / total

    print("Confusion Matrix")
    print(conf_matrix)
    print(f"Accuracy on Augmented Animal10 test set: {accuracy:.2f}%")
    print("Classwise Accuracies:")
    print(classwise_accuracies)
    return accuracy

print("sanity check")

sanity check


In [39]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model
model = load_efficientnetb3_model(num_classes, device,task='task4')
print("sanity check")



sanity check


# **Evaluation on Original Dataset**

In [40]:
train_loader, test_loader, num_classes = get_data_loaders_task4(64,task='og')
print("Evaluation on Original Dataset")
og_acc=evaluate_model(model, test_loader, device)

Evaluation on Original Dataset




Confusion Matrix
[[ 93   0   0   0   0   0   0   0   0   0]
 [  0 100   0   0   0   0   0   0   0   0]
 [  0   0  89   0   0   0   0   0   0   0]
 [  0   0   0 104   0   0   0   0   0   0]
 [  0   0   0   0 114   0   0   0   0   0]
 [  0   0   0   0   0 126   0   0   0   0]
 [  0   0   0   0   0   0  85   0   0   0]
 [  0   0   0   0   0   0   0  94   0   0]
 [  0   0   0   1   0   0   0   0  84   0]
 [  0   0   0   0   0   0   0   0   0  96]]
Accuracy on Augmented Animal10 test set: 99.90%
Classwise Accuracies:
[[1.        ]
 [1.        ]
 [1.        ]
 [1.        ]
 [1.        ]
 [1.        ]
 [1.        ]
 [1.        ]
 [0.98823529]
 [1.        ]]


In [41]:
print('Accuracy on Original Dataset: ', og_acc/100)

Accuracy on Original Dataset:  0.9989858012170386


# **Shape Bias**

In [42]:
train_loader, test_loader1, num_classes = get_data_loaders_task4(64,task='shape')
print("Evaluation on Canny Dataset")
shape_acc=evaluate_model(model, test_loader1, device)

Evaluation on Canny Dataset




Confusion Matrix
[[16  1  1 15  1  3  0  0 62  1]
 [ 4 18  1  2  0  0  0  0 75  0]
 [ 4  1 18  9  1  0  1  2 62  2]
 [ 0  0  0 60  0  0  0  0 40  0]
 [ 0  0  0 24 27  1  0  0 48  0]
 [ 1  0  0 22  2 29  0  1 44  1]
 [ 2  1  3 10  3  0 17  2 62  0]
 [ 3  2  1 15 11  3  4  9 52  0]
 [ 0  0  0  6  0  0  0  0 94  0]
 [ 2  0  0 15  5  3  0  2 70  3]]
Accuracy on Augmented Animal10 test set: 29.10%
Classwise Accuracies:
[[0.16]
 [0.18]
 [0.18]
 [0.6 ]
 [0.27]
 [0.29]
 [0.17]
 [0.09]
 [0.94]
 [0.03]]


In [43]:
print('Accuracy on Canny Dataset: ',shape_acc/100)
shape_bias=shape_acc/og_acc
print('Shape Bias: ',shape_bias)

Accuracy on Canny Dataset:  0.29100000000000004
Shape Bias:  0.2912954314720812
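The Canny confusion matrix above suggests that most errors collapse into a single predicted class. One way to check this is to count, for each column, the predictions that fall off the diagonal. The sketch below does so using the matrix copied from the output above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the Canny evaluation output (rows: true, columns: predicted)
conf_matrix = np.array([
    [16,  1,  1, 15,  1,  3,  0,  0, 62,  1],
    [ 4, 18,  1,  2,  0,  0,  0,  0, 75,  0],
    [ 4,  1, 18,  9,  1,  0,  1,  2, 62,  2],
    [ 0,  0,  0, 60,  0,  0,  0,  0, 40,  0],
    [ 0,  0,  0, 24, 27,  1,  0,  0, 48,  0],
    [ 1,  0,  0, 22,  2, 29,  0,  1, 44,  1],
    [ 2,  1,  3, 10,  3,  0, 17,  2, 62,  0],
    [ 3,  2,  1, 15, 11,  3,  4,  9, 52,  0],
    [ 0,  0,  0,  6,  0,  0,  0,  0, 94,  0],
    [ 2,  0,  0, 15,  5,  3,  0,  2, 70,  3],
])

# Wrong predictions landing in each class: column total minus the diagonal entry
false_positives = conf_matrix.sum(axis=0) - conf_matrix.diagonal()
total_errors = false_positives.sum()
worst_class = int(false_positives.argmax())

print(false_positives)
print(f"class {worst_class} absorbs {false_positives[worst_class] / total_errors:.1%} of all errors")
```

On this matrix, class index 8 receives 515 of the 709 misclassified images, so the high per-class accuracy reported for that class (0.94) likely reflects a default prediction on edge maps rather than genuine shape recognition.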


# **Texture Bias**

In [44]:
train_loader, test_loader2, num_classes = get_data_loaders_task4(64,task='texture')
print("Evaluation on Stylized Dataset")
texture_acc=evaluate_model(model, test_loader2, device)

Evaluation on Stylized Dataset




Confusion Matrix
[[30  1  0  1  2  1  0  0  1  0]
 [ 2 36  0  2  1  0  0  0  3  0]
 [ 2  1 25  1  0  0  1  0  3  0]
 [ 0  0  0 42  0  1  0  0  3  0]
 [ 0  0  0  1 27  0  0  0  0  0]
 [ 0  0  0  3  1 22  0  0  2  1]
 [ 1  1  0  1  2  1 27  0  1  0]
 [ 0  1  0  1  3  0  0 28  0  0]
 [ 0  0  0  1  1  0  0  0 33  0]
 [ 0  0  0  1  3  1  0  1  2 24]]
Accuracy on Augmented Animal10 test set: 84.00%
Classwise Accuracies:
[[0.83333333]
 [0.81818182]
 [0.75757576]
 [0.91304348]
 [0.96428571]
 [0.75862069]
 [0.79411765]
 [0.84848485]
 [0.94285714]
 [0.75      ]]


In [45]:
print("Accuracy on Stylized Dataset: ", texture_acc/100)
texture_bias=texture_acc/og_acc
print("Texture Bias: ", texture_bias)

Accuracy on Stylized Dataset:  0.84
Texture Bias:  0.8408527918781725


# **Color Bias**

In [46]:
train_loader, test_loader3, num_classes = get_data_loaders_task4(64,task='color')
print("Evaluation on Grayscaled Dataset")
color_acc=evaluate_model(model, test_loader3, device)

Evaluation on Grayscaled Dataset




Confusion Matrix
[[50  0  0  0  0  0  0  0  0  0]
 [ 0 50  0  0  0  0  0  0  0  0]
 [ 0  0 50  0  0  0  0  0  0  0]
 [ 0  0  0 50  0  0  0  0  0  0]
 [ 1  0  0  0 49  0  0  0  0  0]
 [ 0  0  0  0  0 50  0  0  0  0]
 [ 0  0  0  0  0  0 50  0  0  0]
 [ 0  0  0  0  0  0  0 50  0  0]
 [ 0  0  0  0  0  0  0  0 50  0]
 [ 0  0  0  0  0  0  0  0  0 50]]
Accuracy on Augmented Animal10 test set: 99.80%
Classwise Accuracies:
[[1.  ]
 [1.  ]
 [1.  ]
 [1.  ]
 [0.98]
 [1.  ]
 [1.  ]
 [1.  ]
 [1.  ]
 [1.  ]]


In [47]:
print("Accuracy on Grayscaled Dataset: ", color_acc/100)
color_bias=color_acc/og_acc
print("Color Bias: ", color_bias)

Accuracy on Grayscaled Dataset:  0.998
Color Bias:  0.999013197969543
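Collecting the accuracies reported above in one place makes the three measurements easier to compare. The sketch below recomputes the ratios and the drop in percentage points, sorted from the largest loss to the smallest. Under the ratio used in this notebook, a value near 1 means accuracy is largely retained on that dataset; for the grayscale set this would suggest little dependence on color, though that reading is an interpretation and not something the outputs state directly.

```python
# Accuracies in percent, copied from the evaluation outputs above
original_acc = 99.89858012170386
cue_accuracies = {
    "shape (Canny)": 29.1,
    "texture (stylized)": 84.0,
    "color (grayscale)": 99.8,
}

# Ratio of cue-specific accuracy to accuracy on the original test set
ratios = {name: acc / original_acc for name, acc in cue_accuracies.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    drop = original_acc - cue_accuracies[name]
    print(f"{name:<20} ratio={ratio:.4f}  drop={drop:.2f} points")
```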
