

# Example: Train Barlow Twins on CIFAR10
<a href="https://colab.research.google.com/github/melhaud/proj18/blob/main/examples/barlowtwins-on-cifar10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>

In this tutorial, we will train a Barlow Twins model using lightly. The model,
augmentations and training procedure are from
[Barlow Twins: Self-Supervised Learning via Redundancy Reduction](https://arxiv.org/abs/2103.03230).

The paper explores a rather simple training procedure for self-supervised learning: unlike contrastive methods such as SimCLR, it needs no negative pairs.


# Imports

In [2]:
import os
import torch
import torch.nn as nn
import torchvision
import pytorch_lightning as pl
import lightly
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import pandas as pd

from tqdm.notebook import tqdm
from lightly.data import LightlyDataset
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader, Subset
import sys


sys.path.append('../src')
from utils import custom_collate_fn, get_classes
from my_resnet import resnet20
%matplotlib inline


# Configuration

Configuration parameters for the experiment.

In [1]:
num_workers = 2
batch_size = 128
seed = 1

max_epochs = 150
input_size = 32 # image height; images are assumed to be square

# Let's set the seed for our experiments

pl.seed_everything(seed)


# Set up data augmentations and data loaders

In [None]:
cifar10_train = CIFAR10("../data/cifar10", download=True, train=True)
cifar10_test = CIFAR10("../data/cifar10", download=True, train=False)

classes_ids_train = get_classes(cifar10_train) # this step can take a while
classes_ids_test = get_classes(cifar10_test)
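The `get_classes` helper lives in our `src/utils` and is not shown here. Judging by how its result is indexed below (`classes_ids_train['dog']`), it maps each class name to the dataset indices of that class. The sketch below is only an assumption about that behaviour, not the actual implementation: `group_indices_by_class` is a hypothetical name, and the toy lists stand in for the `targets` and `classes` attributes that torchvision's `CIFAR10` exposes.

```python
from collections import defaultdict


def group_indices_by_class(targets, class_names):
    """Map each class name to the list of sample indices carrying that label.

    Hypothetical stand-in for `get_classes`; `targets` is a sequence of
    integer labels and `class_names[label]` gives the readable name.
    """
    indices = defaultdict(list)
    for idx, label in enumerate(targets):
        indices[class_names[label]].append(idx)
    return dict(indices)


# Toy stand-ins for `dataset.targets` and `dataset.classes`.
toy_targets = [0, 1, 0, 2, 1]
toy_classes = ["cat", "dog", "ship"]
toy_index = group_indices_by_class(toy_targets, toy_classes)
print(toy_index["dog"])  # indices that can be passed to torch.utils.data.Subset
```

Working on the label list alone avoids decoding any images, which may be noticeably faster than iterating over the dataset itself.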

In [None]:
collate_fn = lightly.data.SimCLRCollateFunction(
    input_size=input_size,
    vf_prob=0.5,
    rr_prob=0.5
)

# We create a torchvision transformation for embedding the dataset after 
# training
test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((input_size, input_size)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.RandomVerticalFlip(),
    torchvision.transforms.RandomRotation(degrees=(-10, 10)),
    torchvision.transforms.ToTensor(),
])

dataset_train_simclr = LightlyDataset.from_torch_dataset(Subset(cifar10_train, classes_ids_train['dog']))

dataloader_train_simclr = torch.utils.data.DataLoader(
    dataset_train_simclr,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=collate_fn,
    drop_last=True,
    num_workers=num_workers
)


dataset_test = LightlyDataset.from_torch_dataset(Subset(cifar10_test, classes_ids_test['dog']))

dataloader_test = torch.utils.data.DataLoader(
    dataset_test,
    batch_size=batch_size,
    shuffle=False,
    drop_last=False,
    collate_fn=collate_fn,
    num_workers=num_workers
)

# Create the Barlow Twins model

Now we create the Barlow Twins model. We implement it as a PyTorch Lightning Module
and use a custom ResNet-20 backbone provided by Nikita Balabin. Lightly provides implementations
of the Barlow Twins projection head and loss function in the `BarlowTwinsProjectionHead`
and `BarlowTwinsLoss` classes. We can simply import them and combine the building
blocks in the module. We will import the constructed model from our `src`.
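To make the objective more concrete, here is a minimal NumPy sketch of the loss as described in the paper: both views are standardized along the batch dimension, their cross-correlation matrix is computed, and the loss pushes the diagonal towards 1 and the off-diagonal entries towards 0. This is an illustration only and not lightly's `BarlowTwinsLoss`; the function name, the `eps` term and the toy inputs are our own assumptions, and `lambda_param=5e-3` follows the value reported in the paper.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_param=5e-3, eps=1e-12):
    """Illustrative Barlow Twins loss for two (batch, dim) embedding arrays."""
    n = z_a.shape[0]
    # Standardize every feature dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_param * off_diag


rng = np.random.default_rng(0)
view = rng.normal(size=(256, 8))
loss_identical = barlow_twins_loss(view, view)
loss_unrelated = barlow_twins_loss(view, rng.normal(size=(256, 8)))
print(loss_identical < loss_unrelated)
```

Two identical views give a loss close to zero, while unrelated views are penalized mainly through the diagonal term.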

In [3]:
from barlow_twins_model import BarlowTwins


We first check if a GPU is available and then train the module
using the PyTorch Lightning Trainer.



In [None]:
gpus = 1 if torch.cuda.is_available() else 0

resnet_backbone = resnet20(num_classes=1)
model = BarlowTwins(resnet_backbone, img_size=input_size)
trainer = pl.Trainer(
    max_epochs=50, gpus=gpus, progress_bar_refresh_rate=10
)
trainer.fit(model, dataloader_train_simclr)

# Generate embeddings for the train and test sets

In [None]:
from utils import generate_embeddings

model.eval()
embeddings_test, filenames_test = generate_embeddings(model, dataloader_test)
embeddings_train, filenames_train = generate_embeddings(model, dataloader_train_simclr)

In [None]:
print(f'Shape of TEST embeddings {embeddings_test.shape}')
print(f'Shape of TRAIN embeddings {embeddings_train.shape}')

The cells above call the `generate_embeddings` helper from `utils` to embed
our train and test images using the model we just trained.
Note that only the backbone is needed to generate embeddings;
the projection head is only required for training.
Make sure to put the model into eval mode for this part, as `model.eval()` does above!
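As a rough sketch of what such a helper could look like, the block below runs only a backbone in eval mode, without gradient tracking, and concatenates the per-batch features. It is not the `generate_embeddings` function from our `utils`: `embed_with_backbone`, the toy linear backbone and the random batches are assumptions made for illustration, and a real loader would also yield labels and filenames.

```python
import torch
import torch.nn as nn


@torch.no_grad()  # no gradients are needed for inference
def embed_with_backbone(backbone, batches):
    """Return a (num_images, feature_dim) tensor of backbone features."""
    backbone.eval()  # freeze batch-norm statistics and disable dropout
    chunks = []
    for images in batches:
        # Flatten any trailing spatial dimensions into one feature vector.
        features = backbone(images).flatten(start_dim=1)
        chunks.append(features)
    return torch.cat(chunks, dim=0)


# Toy stand-ins: a linear "backbone" and three batches of 32x32 RGB images.
toy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 16))
toy_batches = [torch.randn(4, 3, 32, 32) for _ in range(3)]
toy_embeddings = embed_with_backbone(toy_backbone, toy_batches)
print(toy_embeddings.shape)
```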



# Calculate Hausdorff distance between point clouds

In [None]:
from scipy.spatial.distance import directed_hausdorff

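# directed_hausdorff is one-sided: the largest distance from a train embedding
# to its nearest test embedding
hausdorff_dist = directed_hausdorff(embeddings_train, embeddings_test)[0]

print(f'Directed Hausdorff distance (train -> test): {hausdorff_dist:.3f}')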
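Note that the directed distance is not symmetric: swapping the two point clouds can give a different value. The symmetric Hausdorff distance is the maximum of both directions. The sketch below shows this with plain NumPy on two toy clouds; the function names and the toy points are our own, and the dense pairwise matrix is only meant for small inputs.

```python
import numpy as np


def directed_hausdorff_np(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return dists.min(axis=1).max()


def symmetric_hausdorff(a, b):
    """Maximum of the two directed Hausdorff distances."""
    return max(directed_hausdorff_np(a, b), directed_hausdorff_np(b, a))


cloud_a = np.array([[0.0, 0.0], [1.0, 0.0]])
cloud_b = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])

print(directed_hausdorff_np(cloud_a, cloud_b))  # 0.0: every point of a is also in b
print(directed_hausdorff_np(cloud_b, cloud_a))  # 3.0: (4, 0) is 3 away from (1, 0)
print(symmetric_hausdorff(cloud_a, cloud_b))    # 3.0
```

With the embeddings from this notebook, the same maximum can be obtained by calling `directed_hausdorff` in both directions and taking the larger of the two values.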