### Task 3 – Fusion Architecture Comparison (RGB + LiDAR)

In [None]:
!pip install --upgrade wandb

In [2]:
import wandb
wandb.login()

wandb: Currently logged in as: jain5 (jain5-university-of-potsdam) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


True

import sys
sys.path.append("/kaggle/input/src-cilp-assessment")

In [6]:
from src.models import LateFusionClassifier
print("✅ src imported correctly")

✅ src imported correctly


In [7]:
import os

DATA_ROOT = "/kaggle/input/cilp-assessment-data/assessment"
print("DATA_ROOT exists:", os.path.exists(DATA_ROOT))
print("Cubes RGB:", len(os.listdir(os.path.join(DATA_ROOT, "cubes", "rgb"))))
print("Cubes LiDAR:", len(os.listdir(os.path.join(DATA_ROOT, "cubes", "lidar"))))
print("Spheres RGB:", len(os.listdir(os.path.join(DATA_ROOT, "spheres", "rgb"))))
print("Spheres LiDAR:", len(os.listdir(os.path.join(DATA_ROOT, "spheres", "lidar"))))

DATA_ROOT exists: True
Cubes RGB: 9999
Cubes LiDAR: 9999
Spheres RGB: 9999
Spheres LiDAR: 9999


## Imports and setup

In [8]:
import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
import numpy as np
from pathlib import Path

## Dataset and Dataloaders

In [9]:
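class SimpleCILPDataset(Dataset):
    """Paired RGB/LiDAR samples for cube (0) vs. sphere (1) classification.

    Each class folder must contain rgb/*.png and lidar/*.npy files that share
    a file stem. Stems are shuffled with a fixed seed and split 80/20, so a
    "train" and a "val" instance built with the same seed are disjoint.
    """

    def __init__(self, root, split="train", transform=None, seed=42):
        self.transform = transform
        self.samples = []

        # Fresh RNG per instance: train and val reproduce the same permutation
        rng = np.random.RandomState(seed)

        for label_name, label_id in [("cubes", 0), ("spheres", 1)]:
            rgb_dir = Path(root) / label_name / "rgb"
            lidar_dir = Path(root) / label_name / "lidar"

            # Index files by stem so each RGB image is matched to its LiDAR scan
            rgb = {p.stem: p for p in rgb_dir.glob("*.png")}
            lidar = {p.stem: p for p in lidar_dir.glob("*.npy")}

            # Keep only stems present in both modalities; sort before shuffling
            # so the permutation does not depend on filesystem order
            common = sorted(set(rgb) & set(lidar))
            rng.shuffle(common)

            split_idx = int(0.8 * len(common))
            selected = common[:split_idx] if split == "train" else common[split_idx:]

            for stem in selected:
                self.samples.append((rgb[stem], lidar[stem], label_id))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        rgb_path, lidar_path, label = self.samples[idx]

        rgb = Image.open(rgb_path).convert("RGB")
        if self.transform:
            rgb = self.transform(rgb)

        # LiDAR grids are loaded as-is (64x64 here) and cast to float32
        lidar = torch.tensor(np.load(lidar_path), dtype=torch.float32)
        label = torch.tensor(label, dtype=torch.long)

        return rgb, lidar, label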
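You can check the split logic in `SimpleCILPDataset` without touching the data folders. The sketch below copies it into a hypothetical helper, `paired_split`, for a single class folder and runs it on made-up stems. It shows that `train` and `val` built with the same seed are disjoint and together cover every stem that has both an RGB and a LiDAR file. In the class itself the second class keeps drawing from the same RNG, so its permutation differs, but the argument is the same.

```python
import numpy as np


def paired_split(rgb_stems, lidar_stems, split="train", seed=42, train_frac=0.8):
    """Hypothetical helper mirroring SimpleCILPDataset's split for one class."""
    rng = np.random.RandomState(seed)  # fresh RNG, as in the dataset class
    # Only stems present in both modalities become samples
    common = sorted(set(rgb_stems) & set(lidar_stems))
    rng.shuffle(common)
    split_idx = int(train_frac * len(common))
    return common[:split_idx] if split == "train" else common[split_idx:]


# Made-up stems: "s3" has no LiDAR scan, so it is dropped from both splits
rgb_stems = [f"s{i}" for i in range(10)]
lidar_stems = [f"s{i}" for i in range(10) if i != 3]

train_stems = paired_split(rgb_stems, lidar_stems, "train")
val_stems = paired_split(rgb_stems, lidar_stems, "val")
print(len(train_stems), len(val_stems))           # 7 2
print(set(train_stems).isdisjoint(val_stems))     # True
```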

In [10]:
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

train_dataset = SimpleCILPDataset(DATA_ROOT, split="train", transform=transform)
val_dataset   = SimpleCILPDataset(DATA_ROOT, split="val", transform=transform)

print("Train samples:", len(train_dataset))
print("Val samples:", len(val_dataset))

rgb, lidar, label = train_dataset[0]
print("RGB:", rgb.shape)
print("LiDAR:", lidar.shape)
print("Label:", label)


Train samples: 15998
Val samples: 4000
RGB: torch.Size([3, 128, 128])
LiDAR: torch.Size([64, 64])
Label: tensor(0)


In [11]:
lidar_input_dim = 64 * 64
print("LiDAR input dim:", lidar_input_dim)

LiDAR input dim: 4096
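`lidar_input_dim` is hard-coded as 64 × 64 = 4096. Deriving it from a sample keeps it in sync if the LiDAR resolution changes. The sketch assumes, without confirmation from `src.models`, that the LiDAR branch is a fully connected encoder over the flattened grid. `lidar_encoder` and the random batch are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Stand-in for one batch of LiDAR grids, shaped like the loader output [B, 64, 64]
lidar_batch = torch.randn(32, 64, 64)

# Derive the flattened size from one sample instead of hard-coding 64 * 64
lidar_input_dim = lidar_batch[0].numel()

# Hypothetical MLP encoder: flatten all dims except the batch dim,
# then project to a 128-d embedding
lidar_encoder = nn.Sequential(
    nn.Flatten(start_dim=1),
    nn.Linear(lidar_input_dim, 128),
    nn.ReLU(),
)

embedding = lidar_encoder(lidar_batch)
print(lidar_input_dim, tuple(embedding.shape))  # 4096 (32, 128)
```

With the notebook's own sample, `lidar.numel()` on the tensor from `train_dataset[0]` gives the same 4096.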


In [12]:
import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

True
Tesla T4


In [13]:
from torch.utils.data import DataLoader

BATCH_SIZE = 32  # safe for T4/P100

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2,
    pin_memory=True
)

val_loader = DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=2,
    pin_memory=True
)

In [14]:
import torch
import torch.nn as nn
import torch.optim as optim

# Fall back to CPU when no GPU is attached
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = LateFusionClassifier(
    lidar_input_dim=lidar_input_dim,
    embedding_dim=128,
    num_classes=2
).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)


In [15]:
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

wandb.init(
    project="cilp-extended-assessment",
    name="late-fusion-baseline",
    config={
        "task": "task_3_fusion_comparison",
        "fusion_strategy": "late",
        "model_architecture": model.__class__.__name__,
        "embedding_size": 128,
        "batch_size": BATCH_SIZE,
        "learning_rate": optimizer.param_groups[0]["lr"],
        "optimizer": optimizer.__class__.__name__,
        "epochs": 10,
        "num_parameters": num_params,
        "dataset": "cilp-assessment",
    }
)
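The four `wandb.init` calls repeat the same config dict. This first one also hard-codes `"epochs": 10` because `EPOCHS` is only defined later. A hypothetical helper, `make_run_config`, could build the dict in one place. It includes only the keys this notebook already logs, and the example model is a throwaway `nn.Linear`.

```python
import torch
import torch.nn as nn


def make_run_config(model, optimizer, fusion_strategy, *, batch_size, epochs,
                    embedding_size=128):
    """Hypothetical helper: build the shared W&B config for one fusion run."""
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return {
        "task": "task_3_fusion_comparison",
        "fusion_strategy": fusion_strategy,
        "model_architecture": model.__class__.__name__,
        "embedding_size": embedding_size,
        "batch_size": batch_size,
        "learning_rate": optimizer.param_groups[0]["lr"],
        "optimizer": optimizer.__class__.__name__,
        "epochs": epochs,
        "num_parameters": num_params,
        "dataset": "cilp-assessment",
    }


toy = nn.Linear(4, 2)  # 4*2 weights + 2 biases = 10 trainable parameters
cfg = make_run_config(toy, torch.optim.Adam(toy.parameters(), lr=1e-3),
                      "late", batch_size=32, epochs=10)
print(cfg["num_parameters"], cfg["optimizer"])  # 10 Adam
```

The result would be passed as `config=` to `wandb.init`, exactly as the dict literal is now.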


In [16]:
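def run_epoch(model, loader, training=True):
    """Run one pass over `loader` and return (average loss, accuracy).

    Uses the global `criterion`, `optimizer` and `device`, so `optimizer`
    must be rebound whenever a new model is created.
    """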
    if training:
        model.train()
    else:
        model.eval()

    total_loss = 0.0
    correct = 0
    total = 0

    with torch.set_grad_enabled(training):
        for rgb, lidar, labels in loader:
            rgb = rgb.to(device)
            lidar = lidar.to(device)
            labels = labels.to(device)

            outputs = model(rgb, lidar)
            loss = criterion(outputs, labels)

            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            total_loss += loss.item() * labels.size(0)
            preds = outputs.argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    avg_loss = total_loss / total
    accuracy = correct / total
    return avg_loss, accuracy
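`run_epoch` reads `criterion`, `optimizer` and `device` from the notebook's globals. This works only because every later cell rebinds `optimizer` after creating a new model. Below is a sketch of a variant that takes them as arguments, so it cannot silently step a stale optimizer. The two-input `ToyFusion` model and the random tensors are placeholders, not the fusion models from `src.models`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch_explicit(model, loader, criterion, device, optimizer=None):
    """One pass over `loader`; trains if an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train(training)  # train(False) is the same as eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for rgb, lidar, labels in loader:
            rgb, lidar, labels = rgb.to(device), lidar.to(device), labels.to(device)
            outputs = model(rgb, lidar)
            loss = criterion(outputs, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Weight by batch size so the average is per sample
            total_loss += loss.item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / total, correct / total


class ToyFusion(nn.Module):
    """Placeholder two-input model: flattens both inputs and concatenates them."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3 * 4 * 4 + 8 * 8, 2)

    def forward(self, rgb, lidar):
        return self.head(torch.cat([rgb.flatten(1), lidar.flatten(1)], dim=1))


torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(64, 3, 4, 4), torch.randn(64, 8, 8),
                         torch.randint(0, 2, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16)
toy_model = ToyFusion()
toy_opt = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tr_loss, tr_acc = run_epoch_explicit(toy_model, toy_loader, loss_fn, "cpu", toy_opt)
ev_loss, ev_acc = run_epoch_explicit(toy_model, toy_loader, loss_fn, "cpu")
```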

In [17]:
import time
start_time = time.time()

### Late Fusion Baseline Training Loop

In [18]:
EPOCHS = 10

for epoch in range(1, EPOCHS + 1):
    train_loss, train_acc = run_epoch(model, train_loader, training=True)
    val_loss, val_acc = run_epoch(model, val_loader, training=False)

    print(
        f"Epoch {epoch}: "
        f"Train Loss={train_loss:.4f}, Train Acc={train_acc:.4f} | "
        f"Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}"
    )

    current_lr = optimizer.param_groups[0]["lr"]
    wandb.log(
        {
            "epoch": epoch,
            "train_loss": train_loss,
            "train_acc": train_acc,
            "val_loss": val_loss,
            "val_acc": val_acc,
            "learning_rate": current_lr,
        }
    )

Epoch 1: Train Loss=0.8112, Train Acc=0.7717 | Val Loss=0.2197, Val Acc=0.9280
Epoch 2: Train Loss=0.0989, Train Acc=0.9703 | Val Loss=0.0224, Val Acc=0.9960
Epoch 3: Train Loss=0.0287, Train Acc=0.9915 | Val Loss=0.0176, Val Acc=0.9945
Epoch 4: Train Loss=0.0237, Train Acc=0.9929 | Val Loss=0.0113, Val Acc=0.9970
Epoch 5: Train Loss=0.0155, Train Acc=0.9958 | Val Loss=0.0105, Val Acc=0.9972
Epoch 6: Train Loss=0.0167, Train Acc=0.9946 | Val Loss=0.0343, Val Acc=0.9918
Epoch 7: Train Loss=0.0146, Train Acc=0.9955 | Val Loss=0.0148, Val Acc=0.9948
Epoch 8: Train Loss=0.0088, Train Acc=0.9971 | Val Loss=0.0066, Val Acc=0.9982
Epoch 9: Train Loss=0.0138, Train Acc=0.9958 | Val Loss=0.0067, Val Acc=0.9975
Epoch 10: Train Loss=0.0151, Train Acc=0.9955 | Val Loss=0.0143, Val Acc=0.9948


In [19]:
total_training_time = time.time() - start_time
time_per_epoch = total_training_time / EPOCHS

wandb.log({
    "total_training_time_sec": total_training_time,
    "time_per_epoch_sec": time_per_epoch,
})

In [20]:
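# max_memory_allocated() reports the peak since the kernel started (or since the
# last torch.cuda.reset_peak_memory_stats() call), not a per-run value
if torch.cuda.is_available():
    max_mem_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
    wandb.log({"max_gpu_memory_mb": max_mem_mb})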
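Unless `torch.cuda.reset_peak_memory_stats()` is called, `torch.cuda.max_memory_allocated()` returns the peak since the process started. Later runs in the same kernel therefore inherit earlier peaks. Below is a minimal sketch of a hypothetical `measure_run` wrapper. It resets the peak counter before a training function runs and synchronizes the GPU before reading the clock. On a machine without CUDA it reports only wall time.

```python
import time

import torch


def measure_run(train_fn, device=None):
    """Hypothetical wrapper: run train_fn, return (result, seconds, peak_mb or None)."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize(device)
        # Peak tracking restarts from the memory currently allocated
        torch.cuda.reset_peak_memory_stats(device)
    start = time.perf_counter()
    result = train_fn()
    if use_cuda:
        torch.cuda.synchronize(device)  # wait for queued kernels before stopping the clock
    elapsed = time.perf_counter() - start
    peak_mb = torch.cuda.max_memory_allocated(device) / (1024 ** 2) if use_cuda else None
    return result, elapsed, peak_mb


result, seconds, peak_mb = measure_run(lambda: sum(range(1000)))
print(result, peak_mb if peak_mb is not None else "no CUDA")
```

To get a per-model peak, wrap each model's epoch loop in a function and pass it to `measure_run`; that value could then be logged instead of the kernel-wide one. The reset does not free anything. Tensors still held from an earlier run remain counted, so delete them first.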


In [21]:
wandb.finish()

Run history:

| Metric | Trend |
| --- | --- |
| epoch | ▁▂▃▃▄▅▆▆▇█ |
| learning_rate | ▁▁▁▁▁▁▁▁▁▁ |
| max_gpu_memory_mb | ▁ |
| time_per_epoch_sec | ▁ |
| total_training_time_sec | ▁ |
| train_acc | ▁▇████████ |
| train_loss | █▂▁▁▁▁▁▁▁▁ |
| val_acc | ▁████▇████ |
| val_loss | █▂▁▁▁▂▁▁▁▁ |

Run summary:

| Metric | Value |
| --- | --- |
| epoch | 10.0 |
| learning_rate | 0.001 |
| max_gpu_memory_mb | 309.7207 |
| time_per_epoch_sec | 34.35891 |
| total_training_time_sec | 343.58907 |
| train_acc | 0.9955 |
| train_loss | 0.01509 |
| val_acc | 0.99475 |
| val_loss | 0.0143 |


In [23]:
from src.models import IntermediateFusionAdd

model = IntermediateFusionAdd(
    embedding_dim=128,
    num_classes=2
).to(device)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3
)

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

wandb.init(
    project="cilp-extended-assessment",
    name="intermediate-fusion-addition",
    config={
        "task": "task_3_fusion_comparison",
        "fusion_strategy": "intermediate_addition",
        "model_architecture": model.__class__.__name__,
        "embedding_size": 128,
        "batch_size": BATCH_SIZE,
        "learning_rate": optimizer.param_groups[0]["lr"],
        "optimizer": optimizer.__class__.__name__,
        "epochs": EPOCHS,
        "num_parameters": num_params,
        "dataset": "cilp-assessment",
    }
)

In [24]:
rgb, lidar, _ = next(iter(train_loader))
print(rgb.shape)    # expected: [B, 3, 128, 128]
print(lidar.shape)  # expected: [B, 64, 64]

torch.Size([32, 3, 128, 128])
torch.Size([32, 64, 64])


In [25]:
import time
start_time = time.time()

### Intermediate Fusion Addition Training Loop

In [26]:
EPOCHS = 10

for epoch in range(1, EPOCHS + 1):
    train_loss, train_acc = run_epoch(model, train_loader, training=True)
    val_loss, val_acc = run_epoch(model, val_loader, training=False)

    print(
        f"Epoch {epoch}: "
        f"Train Loss={train_loss:.4f}, Train Acc={train_acc:.4f} | "
        f"Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}"
    )

    current_lr = optimizer.param_groups[0]["lr"]
    wandb.log({
        "epoch": epoch,
        "train_loss": train_loss,
        "train_acc": train_acc,
        "val_loss": val_loss,
        "val_acc": val_acc,
        "learning_rate": current_lr,
    })

Epoch 1: Train Loss=0.5312, Train Acc=0.7292 | Val Loss=0.3413, Val Acc=0.8698
Epoch 2: Train Loss=0.2567, Train Acc=0.9087 | Val Loss=0.1543, Val Acc=0.9577
Epoch 3: Train Loss=0.1111, Train Acc=0.9697 | Val Loss=0.0649, Val Acc=0.9840
Epoch 4: Train Loss=0.0347, Train Acc=0.9922 | Val Loss=0.0195, Val Acc=0.9942
Epoch 5: Train Loss=0.0123, Train Acc=0.9980 | Val Loss=0.0084, Val Acc=0.9988
Epoch 6: Train Loss=0.0056, Train Acc=0.9994 | Val Loss=0.0037, Val Acc=0.9995
Epoch 7: Train Loss=0.0042, Train Acc=0.9994 | Val Loss=0.0014, Val Acc=1.0000
Epoch 8: Train Loss=0.0013, Train Acc=0.9999 | Val Loss=0.0012, Val Acc=1.0000
Epoch 9: Train Loss=0.0007, Train Acc=1.0000 | Val Loss=0.0005, Val Acc=1.0000
Epoch 10: Train Loss=0.0386, Train Acc=0.9901 | Val Loss=0.0157, Val Acc=0.9945


In [27]:
total_training_time = time.time() - start_time
time_per_epoch = total_training_time / EPOCHS

wandb.log({
    "total_training_time_sec": total_training_time,
    "time_per_epoch_sec": time_per_epoch,
})

In [28]:
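# The peak counter was not reset after the previous run, so this is the
# running peak of the whole kernel, not of this model alone
if torch.cuda.is_available():
    max_mem_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
    wandb.log({"max_gpu_memory_mb": max_mem_mb})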


In [29]:
wandb.finish()

Run history:

| Metric | Trend |
| --- | --- |
| epoch | ▁▂▃▃▄▅▆▆▇█ |
| learning_rate | ▁▁▁▁▁▁▁▁▁▁ |
| max_gpu_memory_mb | ▁ |
| time_per_epoch_sec | ▁ |
| total_training_time_sec | ▁ |
| train_acc | ▁▆▇███████ |
| train_loss | █▄▂▁▁▁▁▁▁▁ |
| val_acc | ▁▆▇███████ |
| val_loss | █▄▂▁▁▁▁▁▁▁ |

Run summary:

| Metric | Value |
| --- | --- |
| epoch | 10.0 |
| learning_rate | 0.001 |
| max_gpu_memory_mb | 331.59961 |
| time_per_epoch_sec | 27.71247 |
| total_training_time_sec | 277.12469 |
| train_acc | 0.99006 |
| train_loss | 0.03857 |
| val_acc | 0.9945 |
| val_loss | 0.01571 |


In [30]:
from src.models import IntermediateFusionConcat

model = IntermediateFusionConcat(
    lidar_input_dim=lidar_input_dim,
    embedding_dim=128,
    num_classes=2
).to(device)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3
)

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

wandb.init(
    project="cilp-extended-assessment",
    name="intermediate-fusion-concat",
    config={
        "task": "task_3_fusion_comparison",
        "fusion_strategy": "intermediate_concat",
        "model_architecture": model.__class__.__name__,
        "embedding_size": 128,
        "batch_size": BATCH_SIZE,
        "learning_rate": optimizer.param_groups[0]["lr"],
        "optimizer": optimizer.__class__.__name__,
        "epochs": EPOCHS,
        "num_parameters": num_params,
        "dataset": "cilp-assessment",
    }
)

In [31]:
import time
start_time = time.time()

### Intermediate Fusion Concat Training Loop

In [32]:
EPOCHS = 10

for epoch in range(1, EPOCHS + 1):
    train_loss, train_acc = run_epoch(model, train_loader, training=True)
    val_loss, val_acc = run_epoch(model, val_loader, training=False)

    print(
        f"Epoch {epoch}: "
        f"Train Loss={train_loss:.4f}, Train Acc={train_acc:.4f} | "
        f"Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}"
    )

    current_lr = optimizer.param_groups[0]["lr"]
    wandb.log({
        "epoch": epoch,
        "train_loss": train_loss,
        "train_acc": train_acc,
        "val_loss": val_loss,
        "val_acc": val_acc,
        "learning_rate": current_lr,
    })

Epoch 1: Train Loss=0.6643, Train Acc=0.7027 | Val Loss=0.4230, Val Acc=0.8005
Epoch 2: Train Loss=0.3039, Train Acc=0.8788 | Val Loss=0.2163, Val Acc=0.9215
Epoch 3: Train Loss=0.2068, Train Acc=0.9294 | Val Loss=0.2045, Val Acc=0.9287
Epoch 4: Train Loss=0.1760, Train Acc=0.9413 | Val Loss=0.1348, Val Acc=0.9537
Epoch 5: Train Loss=0.1627, Train Acc=0.9457 | Val Loss=0.1496, Val Acc=0.9510
Epoch 6: Train Loss=0.1411, Train Acc=0.9559 | Val Loss=0.1496, Val Acc=0.9455
Epoch 7: Train Loss=0.1276, Train Acc=0.9601 | Val Loss=0.1051, Val Acc=0.9630
Epoch 8: Train Loss=0.1087, Train Acc=0.9666 | Val Loss=0.0632, Val Acc=0.9790
Epoch 9: Train Loss=0.0598, Train Acc=0.9810 | Val Loss=0.0273, Val Acc=0.9890
Epoch 10: Train Loss=0.0261, Train Acc=0.9913 | Val Loss=0.0178, Val Acc=0.9955


In [33]:
total_training_time = time.time() - start_time
time_per_epoch = total_training_time / EPOCHS

wandb.log({
    "total_training_time_sec": total_training_time,
    "time_per_epoch_sec": time_per_epoch,
})

In [34]:
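# The peak counter was not reset after the previous run, so this is the
# running peak of the whole kernel, not of this model alone
if torch.cuda.is_available():
    max_mem_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
    wandb.log({"max_gpu_memory_mb": max_mem_mb})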


In [35]:
wandb.finish()

Run history:

| Metric | Trend |
| --- | --- |
| epoch | ▁▂▃▃▄▅▆▆▇█ |
| learning_rate | ▁▁▁▁▁▁▁▁▁▁ |
| max_gpu_memory_mb | ▁ |
| time_per_epoch_sec | ▁ |
| total_training_time_sec | ▁ |
| train_acc | ▁▅▆▇▇▇▇▇██ |
| train_loss | █▄▃▃▂▂▂▂▁▁ |
| val_acc | ▁▅▆▇▆▆▇▇██ |
| val_loss | █▄▄▃▃▃▃▂▁▁ |

Run summary:

| Metric | Value |
| --- | --- |
| epoch | 10.0 |
| learning_rate | 0.001 |
| max_gpu_memory_mb | 331.59961 |
| time_per_epoch_sec | 28.50228 |
| total_training_time_sec | 285.02278 |
| train_acc | 0.99131 |
| train_loss | 0.02607 |
| val_acc | 0.9955 |
| val_loss | 0.0178 |


In [36]:
from src.models import IntermediateFusionHadamard

model = IntermediateFusionHadamard(
    lidar_input_dim=lidar_input_dim,
    embedding_dim=128,
    num_classes=2
).to(device)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3
)

num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

wandb.init(
    project="cilp-extended-assessment",
    name="intermediate-fusion-hadamard",
    config={
        "task": "task_3_fusion_comparison",
        "fusion_strategy": "intermediate_hadamard",
        "model_architecture": model.__class__.__name__,
        "embedding_size": 128,
        "batch_size": BATCH_SIZE,
        "learning_rate": optimizer.param_groups[0]["lr"],
        "optimizer": optimizer.__class__.__name__,
        "epochs": EPOCHS,
        "num_parameters": num_params,
        "dataset": "cilp-assessment",
    }
)

In [37]:
import time
start_time = time.time()

### Intermediate Fusion Hadamard Training Loop

In [38]:
EPOCHS = 10

for epoch in range(1, EPOCHS + 1):
    train_loss, train_acc = run_epoch(model, train_loader, training=True)
    val_loss, val_acc = run_epoch(model, val_loader, training=False)

    print(
        f"Epoch {epoch}: "
        f"Train Loss={train_loss:.4f}, Train Acc={train_acc:.4f} | "
        f"Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}"
    )

    current_lr = optimizer.param_groups[0]["lr"]
    wandb.log({
        "epoch": epoch,
        "train_loss": train_loss,
        "train_acc": train_acc,
        "val_loss": val_loss,
        "val_acc": val_acc,
        "learning_rate": current_lr,
    })

Epoch 1: Train Loss=0.3417, Train Acc=0.8446 | Val Loss=0.0637, Val Acc=0.9742
Epoch 2: Train Loss=0.0480, Train Acc=0.9815 | Val Loss=0.1359, Val Acc=0.9493
Epoch 3: Train Loss=0.0247, Train Acc=0.9917 | Val Loss=0.0081, Val Acc=0.9970
Epoch 4: Train Loss=0.0294, Train Acc=0.9901 | Val Loss=0.0255, Val Acc=0.9902
Epoch 5: Train Loss=0.0082, Train Acc=0.9971 | Val Loss=0.0099, Val Acc=0.9970
Epoch 6: Train Loss=0.0117, Train Acc=0.9962 | Val Loss=0.0083, Val Acc=0.9962
Epoch 7: Train Loss=0.0084, Train Acc=0.9971 | Val Loss=0.0072, Val Acc=0.9980
Epoch 8: Train Loss=0.0130, Train Acc=0.9951 | Val Loss=0.0405, Val Acc=0.9905
Epoch 9: Train Loss=0.0158, Train Acc=0.9947 | Val Loss=0.0166, Val Acc=0.9950
Epoch 10: Train Loss=0.0054, Train Acc=0.9984 | Val Loss=0.0059, Val Acc=0.9988


In [39]:
total_training_time = time.time() - start_time
time_per_epoch = total_training_time / EPOCHS

wandb.log({
    "total_training_time_sec": total_training_time,
    "time_per_epoch_sec": time_per_epoch,
})

In [40]:
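# The peak counter was not reset after the previous run, so this is the
# running peak of the whole kernel, not of this model alone
if torch.cuda.is_available():
    max_mem_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
    wandb.log({"max_gpu_memory_mb": max_mem_mb})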


In [41]:
wandb.finish()

Run history:

| Metric | Trend |
| --- | --- |
| epoch | ▁▂▃▃▄▅▆▆▇█ |
| learning_rate | ▁▁▁▁▁▁▁▁▁▁ |
| max_gpu_memory_mb | ▁ |
| time_per_epoch_sec | ▁ |
| total_training_time_sec | ▁ |
| train_acc | ▁▇████████ |
| train_loss | █▂▁▂▁▁▁▁▁▁ |
| val_acc | ▅▁█▇███▇▇█ |
| val_loss | ▄█▁▂▁▁▁▃▂▁ |

Run summary:

| Metric | Value |
| --- | --- |
| epoch | 10.0 |
| learning_rate | 0.001 |
| max_gpu_memory_mb | 331.59961 |
| time_per_epoch_sec | 27.65967 |
| total_training_time_sec | 276.59673 |
| train_acc | 0.99844 |
| train_loss | 0.00535 |
| val_acc | 0.99875 |
| val_loss | 0.00587 |


## Fusion Architecture Comparison

In this task, I implemented one late-fusion baseline and three **intermediate** fusion variants (concatenation, addition and Hadamard product) that combine RGB and LiDAR features for cube vs. sphere classification. To keep the comparison fair, all models were trained with the same hyperparameters: Adam optimizer, learning rate 0.001, batch size 32, embedding size 128 and 10 epochs. Each model was trained once, and the notebook does not set a PyTorch seed, so weight initialization and batch order differ between runs. Experiments were tracked with Weights & Biases in the project `cilp-extended-assessment`, with the config field `task` set to `task_3_fusion_comparison`.
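The three intermediate strategies differ in how the two 128-dimensional embeddings are merged before the shared layers. The following minimal sketch applies those merge operations to random embeddings. The `fuse` helper and the tensors are hypothetical; the layer layout of `src.models` is not shown in this notebook.

```python
import torch


def fuse(rgb_emb, lidar_emb, strategy):
    """Hypothetical helper: merge two [B, D] embeddings with one fusion operator."""
    if strategy == "concat":
        return torch.cat([rgb_emb, lidar_emb], dim=1)  # [B, 2D]: both vectors side by side
    if strategy == "add":
        return rgb_emb + lidar_emb                     # [B, D]: element-wise sum
    if strategy == "hadamard":
        return rgb_emb * lidar_emb                     # [B, D]: element-wise product
    raise ValueError(f"unknown strategy: {strategy}")


rgb_emb = torch.randn(32, 128)
lidar_emb = torch.randn(32, 128)
for strategy in ("concat", "add", "hadamard"):
    print(strategy, tuple(fuse(rgb_emb, lidar_emb, strategy).shape))
```

Concatenation doubles the width seen by the next layer, which is one plausible contributor to the concat model's higher parameter count. Addition and the Hadamard product keep the width at D.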

### Quantitative results

All four models converged to very low training and validation losses and achieved near‑perfect accuracy on the validation set. The table below summarizes the key metrics required by the assignment.

| Metric | Late fusion (baseline) | Intermediate – concat | Intermediate – addition | Intermediate – Hadamard |
| --- | --- | --- | --- | --- |
| Final train loss | 0.0151 | 0.0261 | 0.0386 | 0.00535 |
| Final val loss | 0.0143 | 0.0178 | 0.0157 | 0.00587 |
| Final train accuracy | 0.9955 | 0.9913 | 0.9901 | 0.9984 |
| Final val accuracy | 0.9948 | 0.9955 | 0.9945 | 0.9988 |
| Parameters (count) | 1,224,642 | 1,290,434 | 1,123,222 | 1,208,258 |
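| Max GPU memory (MB) | 309.7 | 331.6 | 331.6 | 331.6 |
| Time per epoch (sec) | 34.36 | 28.50 | 27.71 | 27.66 |
| Total training time (sec) | 343.6 | 285.0 | 277.1 | 276.6 |

Note on GPU memory: the CUDA peak-memory counter was not reset between runs, so the three intermediate values are the kernel's running peak rather than per-model measurements.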
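The efficiency columns can also be expressed relative to the late-fusion baseline. The numbers below are copied from the table; the snippet only does the arithmetic.

```python
params = {"late": 1_224_642, "concat": 1_290_434, "addition": 1_123_222, "hadamard": 1_208_258}
sec_per_epoch = {"late": 34.36, "concat": 28.50, "addition": 27.71, "hadamard": 27.66}

summary = {}
for name in ("concat", "addition", "hadamard"):
    param_diff = params[name] - params["late"]                  # absolute change in parameters
    rel_params = param_diff / params["late"]                    # relative change in parameters
    speedup = 1 - sec_per_epoch[name] / sec_per_epoch["late"]   # fraction of time per epoch saved
    summary[name] = (param_diff, rel_params, speedup)
    print(f"{name:9s} params {param_diff:+,} ({rel_params:+.1%}), "
          f"time per epoch {speedup:.1%} lower than late fusion")
```

All three intermediate models need roughly 17% to 20% less time per epoch. Addition saves about 8% of the parameters, while Hadamard is only 16,384 parameters smaller than late fusion.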

### Which architecture performed best?

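All four architectures met the performance requirements. The intermediate-fusion Hadamard model achieved the lowest final validation loss (≈0.0059), the highest final validation accuracy (≈99.9%) and the lowest training loss. It also trained about 20% faster per epoch than the late-fusion baseline (27.66 s vs. 34.36 s). The intermediate addition variant was just as fast (27.71 s per epoch). However, its final validation loss (0.0157) was slightly higher than that of late fusion (0.0143), and its final validation accuracy (0.9945) was marginally below late fusion, concat and Hadamard. The final-epoch numbers understate the addition model: it reached 1.0000 validation accuracy in epochs 7 to 9 before a loss spike in epoch 10. The concat model, by contrast, was still improving when training stopped.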
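With 4,000 validation images, these accuracy gaps correspond to small absolute counts. The snippet converts the logged final validation accuracies into numbers of misclassified images. It also computes the binomial standard error sqrt(p(1 - p)/n) as a rough noise scale for a single run, which assumes the validation images behave as independent draws.

```python
import math

N_VAL = 4000  # validation samples, as printed earlier in the notebook
val_acc = {"late": 0.99475, "concat": 0.9955, "addition": 0.9945, "hadamard": 0.99875}

errors, std_err = {}, {}
for name, acc in val_acc.items():
    errors[name] = round(N_VAL * (1 - acc))             # misclassified validation images
    std_err[name] = math.sqrt(acc * (1 - acc) / N_VAL)  # binomial standard error of accuracy
    print(f"{name:9s} errors={errors[name]:2d}  std. error={std_err[name]:.2%}")
```

Late fusion, concat and addition differ by at most four images (18 to 22 errors), about one standard error. Hadamard's 5 errors sit 13 to 17 images below the others, a considerably clearer gap.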

### Trade‑offs and discussion

From a parameter-efficiency perspective, intermediate addition is the lightest model (≈1.12M parameters), followed by intermediate Hadamard and the late-fusion baseline; intermediate concat is the heaviest (≈1.29M). The Hadamard model has slightly fewer parameters than late fusion yet reached higher final validation accuracy and lower validation loss. This suggests that sharing the layers after the fusion point can use parameters more efficiently. Addition, with the fewest parameters, came close to late fusion but did not match it on final validation metrics. Late fusion had the longest time per epoch, possibly because each modality is processed almost independently until the final layers. It was also the first run in the session, so one-off start-up costs may be included in its timing. GPU memory cannot be ranked from these runs: the peak-memory counter was not reset between runs, so only the late-fusion value (309.7 MB) is a per-model measurement. Intermediate concat achieves strong final accuracy but has the highest parameter count and converged the slowest (it did not pass 97% validation accuracy until epoch 8). It is therefore less attractive when memory or training budget is constrained.

### Recommendations

For this dataset, intermediate fusion with a Hadamard product offers the best balance of accuracy, loss and training time, and I therefore selected it as my preferred fusion strategy for later tasks. When parameter count is more critical (e.g., deployment on resource-limited devices), intermediate addition is a reasonable compromise, with competitive accuracy at the lowest parameter count. Late fusion remains useful when the modality-specific encoders need to be reused independently or when a clear separation between unimodal branches is desired. In this experiment it was the slowest per epoch and trailed the Hadamard variant on every validation metric, although its final validation loss was lower than that of the addition and concat variants. Each model was trained only once, and late fusion, concat and addition differ by at most four misclassified validation images (18 to 22 out of 4,000). Repeated runs with different seeds would therefore be needed to rank those three reliably.
