# Setup & Test - DINO + CIFAR-100

This notebook:
1. Tests the refactored `src` modules
2. Loads **DINO ViT-S/16** using `src.model`
3. Loads **CIFAR-100** using `src.data`
4. Verifies that a forward pass and the FL partitioning work

## 1. Install Required Dependencies

In [1]:
!git clone https://github.com/emanueleR3/AML-Project-2.git

fatal: destination path 'AML-Project-2' already exists and is not an empty directory.


In [3]:
# Install required packages (run if needed)
!pip install -q -r AML-Project-2/requirements.txt

In [4]:
import sys
sys.path.append('AML-Project-2')

In [5]:
import os
import torch
import torchvision

from src.utils import set_seed, get_device, count_parameters
from src.data import load_cifar100, create_dataloader, partition_iid, partition_non_iid
from src.model import DINOClassifier, load_dino_backbone

# Set seed for reproducibility
# (assumes set_seed takes the seed as its first argument; 42 matches the partitioning seed used later)
set_seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")

device = get_device()
print(f"Device: {device}")

PyTorch version: 2.9.0+cpu
Torchvision version: 0.24.0+cpu
Device: cpu


## 2. Download DINO ViT-S/16 Model

Load the pretrained DINO ViT-S/16 backbone from the official Facebook Research GitHub repository (https://github.com/facebookresearch/dino) and wrap it in a 100-class classifier.

In [6]:
# Load DINO ViT-S/16 using refactored model.py
print("Loading DINO ViT-S/16 model...")

model = DINOClassifier(model_name='dino_vits16', num_classes=100, device=device)
model.eval()

print("\nDINO Classifier loaded successfully!")
print(f"Backbone frozen: {model.freeze_backbone}")
print(f"Output classes: 100")

Loading DINO ViT-S/16 model...


Using cache found in /root/.cache/torch/hub/facebookresearch_dino_main



DINO Classifier loaded successfully!
Backbone frozen: True
Output classes: 100


In [7]:
# Model information using refactored utils
total_params = count_parameters(model, trainable_only=False)
trainable_params = count_parameters(model, trainable_only=True)

print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Trainable %: {100 * trainable_params / total_params:.2f}%")

Total parameters: 21,704,164
Trainable parameters: 38,500
Trainable %: 0.18%
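
The trainable count above is consistent with a single linear layer on top of the 384-dimensional ViT-S/16 embedding. The sketch below reproduces the arithmetic. That the head is one linear layer from 384 features to 100 classes is an assumption inferred from the numbers, not something this notebook confirms.

```python
# Assumption: the classification head is a single linear layer mapping
# the ViT-S/16 embedding (384 dims) to the 100 CIFAR-100 classes.
embed_dim = 384
num_classes = 100

head_params = embed_dim * num_classes + num_classes  # weights + biases
total_params = 21_704_164  # total reported by count_parameters above
backbone_params = total_params - head_params  # everything that stays frozen

print(f"Head parameters: {head_params:,}")
print(f"Backbone parameters: {backbone_params:,}")
print(f"Trainable %: {100 * head_params / total_params:.2f}%")
```

If the head had hidden layers or no bias term, the trainable count would differ from 38,500, so this check is a quick way to confirm that only the head is being trained.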


## 3. Download CIFAR-100 Dataset

This section downloads the CIFAR-100 dataset using torchvision. The dataset contains 60,000 32x32 color images in 100 classes (600 per class), split into 50,000 training images and 10,000 test images.

In [8]:
# Define data directory
data_dir = './data'
print(f"Data directory: {os.path.abspath(data_dir)}")

Data directory: /content/data


In [9]:
# Load CIFAR-100 using refactored data.py (with DINO transforms)
print("Loading CIFAR-100 dataset...")
train_dataset, test_dataset = load_cifar100(data_dir=data_dir, image_size=224)

print(f"Training set: {len(train_dataset)} images")
print(f"Test set: {len(test_dataset)} images")
print(f"Image size after transform: 224x224 (DINO input size)")

Loading CIFAR-100 dataset...


100%|██████████| 169M/169M [00:02<00:00, 61.2MB/s] 


Training set: 50000 images
Test set: 10000 images
Image size after transform: 224x224 (DINO input size)


In [10]:
# Display dataset information
print("\n" + "="*50)
print("CIFAR-100 Dataset Information")
print("="*50)
print(f"Number of classes: {len(train_dataset.classes)}")
print(f"Original size: 32x32x3")
print(f"Transformed size: 224x224x3 (for DINO)")
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"\nSample classes: {train_dataset.classes[:10]}...")


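==================================================
CIFAR-100 Dataset Information
==================================================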
Number of classes: 100
Original size: 32x32x3
Transformed size: 224x224x3 (for DINO)
Training samples: 50000
Test samples: 10000

Sample classes: ['apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle', 'bicycle', 'bottle']...
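
The resize from 32x32 to 224x224 matters because ViT-S/16 splits its input into non-overlapping 16x16 patches. As a rough illustration, the sketch below counts the patches at each resolution. The helper name `num_patch_tokens` is hypothetical, and the count leaves out the extra class token that the model prepends.

```python
def num_patch_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of non-overlapping square patches a ViT sees per image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side

native = num_patch_tokens(32)    # CIFAR-100 at its original resolution
resized = num_patch_tokens(224)  # after the DINO transform
print(f"32x32 input: {native} patches")
print(f"224x224 input: {resized} patches")
```

At the native resolution the model would see only a handful of patches per image, far from the 224x224 inputs it was pretrained on.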


## 4. Test Forward Pass

In [11]:
# Test model with a batch
test_loader = create_dataloader(test_dataset, batch_size=8, shuffle=False, num_workers=0)
images, labels = next(iter(test_loader))

print(f"Batch shape: {images.shape}")
print(f"Labels: {labels}")

# Forward pass
model.to(device)
images = images.to(device)
with torch.no_grad():
    outputs = model(images)

print(f"\nOutput shape: {outputs.shape}")
print(f"Output range: [{outputs.min():.2f}, {outputs.max():.2f}]")
print("\n✓ Forward pass successful!")



Batch shape: torch.Size([8, 3, 224, 224])
Labels: tensor([49, 33, 72, 51, 71, 92, 15, 14])

Output shape: torch.Size([8, 100])
Output range: [-9.45, 7.22]

✓ Forward pass successful!


## 5. Test FL Partitioning (M1)

In [12]:
# Test IID partitioning
print("Testing IID partitioning...")
num_clients = 10
client_datasets_iid = partition_iid(train_dataset, num_clients=num_clients, seed=42)

print(f"Number of clients: {num_clients}")
for i in range(min(3, num_clients)):
    print(f"Client {i}: {len(client_datasets_iid[i])} samples")

# Test non-IID partitioning
print("\nTesting non-IID partitioning...")
num_classes_per_client = 10
client_datasets_noniid = partition_non_iid(
    train_dataset, 
    num_clients=num_clients, 
    num_classes_per_client=num_classes_per_client,
    seed=42
)

for i in range(min(3, num_clients)):
    # Get labels for this client
    subset = client_datasets_noniid[i]
    client_labels = [train_dataset.targets[idx] for idx in subset.indices]
    unique_classes = len(set(client_labels))
    print(f"Client {i}: {len(subset)} samples, {unique_classes} unique classes")

print("\n✓ FL partitioning works!")

Testing IID partitioning...
Number of clients: 10
Client 0: 5000 samples
Client 1: 5000 samples
Client 2: 5000 samples

Testing non-IID partitioning...
Client 0: 5000 samples, 10 unique classes
Client 1: 5000 samples, 10 unique classes
Client 2: 5000 samples, 10 unique classes

✓ FL partitioning works!
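
For intuition, the two partitioning schemes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `src.data`. The function names `split_iid` and `split_by_class` are hypothetical, and the non-IID variant assumes each client receives a disjoint block of classes, which is one scheme consistent with the counts printed above (10 classes x 500 training images = 5000 samples per client).

```python
import random
from collections import defaultdict

def split_iid(labels, num_clients, seed=42):
    """Shuffle all sample indices and deal them out evenly."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[i::num_clients] for i in range(num_clients)]

def split_by_class(labels, num_clients, classes_per_client, seed=42):
    """Give each client a disjoint block of classes (hypothetical scheme)."""
    classes = sorted(set(labels))
    if num_clients * classes_per_client > len(classes):
        raise ValueError("not enough classes for disjoint blocks")
    rng = random.Random(seed)
    rng.shuffle(classes)
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    clients = []
    for c in range(num_clients):
        block = classes[c * classes_per_client:(c + 1) * classes_per_client]
        clients.append([i for cls in block for i in by_class[cls]])
    return clients

# Toy labels shaped like the CIFAR-100 training set: 100 classes x 500 images.
labels = [cls for cls in range(100) for _ in range(500)]
iid = split_iid(labels, num_clients=10)
non_iid = split_by_class(labels, num_clients=10, classes_per_client=10)

# Samples and distinct classes held by the first client under each scheme.
print(len(iid[0]), len({labels[i] for i in iid[0]}))
print(len(non_iid[0]), len({labels[i] for i in non_iid[0]}))
```

Printing the number of distinct classes per client, as the notebook cell does for the non-IID split, is also a useful check on the IID split, where each client should see nearly every class.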
