# 1 Data collection

Two datasets are used: a small version of the COCO dataset with 21,837 images, and an animals dataset with 17,178 images in 12 categories.

## 1.1 Animals dataset

We download this dataset from Kaggle (1.4 GB).

In [1]:
!pip install -q kaggle
from google.colab import files

You have to upload a file called `kaggle.json`, which holds your Kaggle API credentials. To obtain it, follow the first two steps described in https://www.kaggle.com/general/74235

In [2]:
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"riccardodemonteita","key":"[redacted]"}'}

In [3]:
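!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# restrict permissions so that the API key is not readable by other users
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets list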

ref                                                             title                                           size  lastUpdated          downloadCount  voteCount  usabilityRating  
--------------------------------------------------------------  ---------------------------------------------  -----  -------------------  -------------  ---------  ---------------  
meirnizri/covid19-dataset                                       COVID-19 Dataset                                 5MB  2022-11-13 15:47:17          14442        413  1.0              
thedevastator/analyzing-credit-card-spending-habits-in-india    Credit Card Spending Habits in India           319KB  2022-12-14 07:30:37           1279         49  1.0              
michals22/coffee-dataset                                        Coffee dataset                                  24KB  2022-12-15 20:02:12           3433         77  1.0              
thedevastator/unlock-profits-with-e-commerce-sales-data         E-Commerce Sales Data

In [4]:
!kaggle datasets download -d piyushkumar18/animal-image-classification-dataset

Downloading animal-image-classification-dataset.zip to /content
 99% 1.45G/1.47G [00:13<00:00, 98.4MB/s]
100% 1.47G/1.47G [00:13<00:00, 120MB/s] 


The data have been downloaded. To unzip them:

In [5]:
!mkdir /content/animal_data
!unzip -qq /content/animal-image-classification-dataset.zip -d /content/animal_data/

Now we split the dataset in two: a training set (used to train the classifier) and a validation/test set. First you have to upload `val_animals.txt`, which lists the paths of the validation/test images. From it we obtain two lists of paths, one per split.

In [69]:
files.upload();

Saving val_animals.txt to val_animals.txt
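
The notebook does not show how `val_animals.txt` was produced. As a rough sketch, assuming the file simply lists one image path per line, a fixed number of images per class could be held out as follows. The function name `make_val_list`, the seed and the toy paths are illustrative assumptions, not the procedure actually used here.

```python
import random

def make_val_list(paths_by_class, n_val_per_class, seed=0):
    """Pick n_val_per_class paths from every class, reproducibly.

    paths_by_class: dict mapping a class name to a list of image paths.
    Returns a flat list with the held-out paths.
    """
    rng = random.Random(seed)  # local generator, the global random state is untouched
    val = []
    for name in sorted(paths_by_class):
        # sort first: directory listings come in arbitrary order
        candidates = sorted(paths_by_class[name])
        val.extend(rng.sample(candidates, n_val_per_class))
    return val

# Toy example with made-up paths (hypothetical, for illustration only)
toy = {
    "cats": [f"/data/cats/{i}.jpg" for i in range(10)],
    "dogs": [f"/data/dogs/{i}.jpg" for i in range(10)],
}
val_list = make_val_list(toy, n_val_per_class=3)
print(len(val_list))
```

The resulting list could then be written to a text file, one path per line, which is the layout the next cell expects when it reads the file.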


In [7]:
import os

#set to None to use all the images (14K)
max_img_per_class = 100

path = "/content/animal_data"
animal_path = path + "/Animal Image Dataset"

animals = ["butterfly", "cats", "cow", "dogs", "elephant", "hen", "horse", "monkey", "panda", "sheep", "spider", "squirrel"]

train_paths = []
#labels
train_labels = []
val_paths = []
val_labels = []

#collect the paths of the validation/test images
with open("/content/val_animals.txt") as file:
  val_paths = [line.rstrip() for line in file]

#collect the corresponding labels: index of the first class name found in the path
for val_path in val_paths:
  for i, animal in enumerate(animals):
    if animal in val_path:
      val_labels.append(i)
      break

#set of validation paths, for fast membership tests
val_set = set(val_paths)

#build the training dataset: max_img_per_class images for each class
#(excluding the ones in the validation dataset); if None, all the images
for i, animal in enumerate(animals):
  counter = 0

  for image in os.listdir(animal_path + "/" + animal):

    #stop as soon as the class has enough images
    if max_img_per_class is not None and counter == max_img_per_class:
      break

    image_path = animal_path + "/" + animal + "/" + image

    if image_path not in val_set:
      train_paths.append(image_path)
      train_labels.append(i)
      counter += 1


print(f"# training images: {len(train_paths)}\n# val/test images: {len(val_paths)}")

# training images: 1200
# val/test images: 2400
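
With `max_img_per_class = 100` and 12 classes, the 1,200 training images reported above correspond to 100 images per class. A quick sanity check of the split could look like the sketch below. The helper `check_split` and the toy lists are hypothetical stand-ins for `train_paths`, `train_labels` and `val_paths`.

```python
from collections import Counter

def check_split(train_paths, train_labels, val_paths):
    """Return (number of paths shared by both splits, per-class training counts)."""
    overlap = set(train_paths) & set(val_paths)
    return len(overlap), Counter(train_labels)

# Toy data (made-up paths) standing in for the real lists
toy_train = ["/d/cats/1.jpg", "/d/cats/2.jpg", "/d/dogs/1.jpg"]
toy_labels = [1, 1, 3]
toy_val = ["/d/cats/3.jpg", "/d/dogs/2.jpg"]

n_overlap, per_class = check_split(toy_train, toy_labels, toy_val)
print(n_overlap, dict(per_class))
```

On the real lists, an overlap of zero and equal counts for all classes would confirm that the split behaves as described.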


## 1.2 COCO dataset

To download it, we use fastai.

In [8]:
!pip install fastai==2.4

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting fastai==2.4
  Downloading fastai-2.4-py3-none-any.whl (187 kB)
Collecting torch<1.10,>=1.7.0
  Downloading torch-1.9.1-cp38-cp38-manylinux1_x86_64.whl (831.4 MB)
Collecting fastcore<1.4,>=1.3.8
  Downloading fastcore-1.3.29-py3-none-any.whl (55 kB)
Collecting torchvision>=0.8.2
  Downloading torchvision-0.14.1-cp38-cp38-manylinux1_x86_64.whl (24.2 MB)
  Downloading torchvision-0.14.0-cp38-cp38-manylinux1_x86_64.whl (24.3 MB)
  Downloading torchvision-0.13.1-cp38-cp38-manylinux1_x86_64.whl (19.1 MB)

In [9]:
from fastai.data.external import untar_data, URLs
import os
import glob
import numpy as np

In [10]:
coco_path = untar_data(URLs.COCO_SAMPLE)
coco_path = str(coco_path) + "/train_sample"

paths = glob.glob(coco_path+"/*.jpg")
paths = np.array(paths)
num_images_coco = len(paths)
print(f"# coco images: {num_images_coco}")

# coco images: 21837


# 2 Datasets and Dataloaders

In [11]:
from PIL import Image
from pathlib import Path
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
from skimage.color import rgb2lab, lab2rgb

import torch
from torch import nn, optim
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader

## 2.1 Training Dataset

We select a random subset of 14,800 COCO images.

In [12]:
idxs = np.random.permutation(num_images_coco)

n_train_samples_coco = 14800

coco_train_idxs = idxs[:n_train_samples_coco]

coco_train_paths = paths[coco_train_idxs]
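
The subset above changes at every run because the permutation is not seeded. If a reproducible subset is wanted, one option is a seeded NumPy generator, sketched below. The helper name `sample_subset` and the seed value are assumptions made for illustration.

```python
import numpy as np

def sample_subset(n_total, n_samples, seed=0):
    """Return n_samples distinct indices in [0, n_total), reproducibly."""
    rng = np.random.default_rng(seed)  # independent, seeded generator
    return rng.permutation(n_total)[:n_samples]

# Same sizes as in the notebook: 14,800 indices out of 21,837 images
idx_a = sample_subset(21837, 14800, seed=0)
idx_b = sample_subset(21837, 14800, seed=0)
print(len(idx_a), np.array_equal(idx_a, idx_b))
```

Since `glob.glob` returns paths in arbitrary order, the list of paths would also need to be sorted before indexing for the selection to be identical across runs.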

In [13]:
animals_paths = np.array(train_paths)

We take the union of the two datasets:

In [14]:
training_paths = np.concatenate([coco_train_paths, animals_paths])

In [15]:
print(f"# training images: {training_paths.shape[0]}")

# training images: 16000


In [16]:
idxs = np.random.permutation(training_paths.shape[0])

training_paths = training_paths[idxs]

In [17]:
SIZE = 256

train_transform = transforms.Compose([
                transforms.Resize((SIZE, SIZE),  transforms.InterpolationMode.BILINEAR),
                transforms.RandomHorizontalFlip(),
            ])

In [18]:
class GrayToColorDataset(Dataset):

  def __init__(self, paths, transform = None):
    
    self.paths = paths
    self.transform = transform

  def __len__(self):

    return len(self.paths)

  def __getitem__(self, idx):

    img_rgb = Image.open(self.paths[idx]).convert("RGB")
    #the transform is optional (default None)
    if self.transform is not None:
      img_rgb = self.transform(img_rgb)
    img_rgb = np.array(img_rgb)

    #RGB -> Lab
    img_lab = rgb2lab(img_rgb).astype("float32")
    #HxWx3 array -> 3xHxW tensor (float input, so ToTensor does not rescale the values)
    img_lab = transforms.ToTensor()(img_lab)

    #to have values approximately in range [-1,1]
    L = img_lab[0,:]/50. - 1.
    ab = img_lab[[1,2],:] / 110.

    return (L.unsqueeze(0),ab)
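
The scaling used in `__getitem__` has to be inverted before a predicted image can be displayed. The sketch below shows the forward and inverse mapping on NumPy arrays in channel-last layout (the layout used by `skimage`), whereas the dataset returns channel-first tensors. The function names `normalize_lab` and `denormalize_lab` are hypothetical, and the input is synthetic rather than a real image.

```python
import numpy as np

def normalize_lab(img_lab):
    """Split an (H, W, 3) Lab image into L in [-1, 1] and ab divided by 110."""
    L = img_lab[..., 0:1] / 50.0 - 1.0
    ab = img_lab[..., 1:3] / 110.0
    return L, ab

def denormalize_lab(L, ab):
    """Inverse of normalize_lab: rebuild the (H, W, 3) Lab image."""
    return np.concatenate([(L + 1.0) * 50.0, ab * 110.0], axis=-1)

# Synthetic Lab values (not a real image): L in [0, 100], a and b in [-110, 110]
rng = np.random.default_rng(0)
lab = np.concatenate(
    [rng.uniform(0, 100, (4, 4, 1)), rng.uniform(-110, 110, (4, 4, 2))], axis=-1
)

L, ab = normalize_lab(lab)
restored = denormalize_lab(L, ab)
print(np.allclose(restored, lab))
```

The restored array could then be passed to `lab2rgb` (imported earlier from `skimage.color`) to obtain an RGB image.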


In [19]:
train_dataset = GrayToColorDataset(training_paths, train_transform)

In [21]:

PIN_MEMORY = True
N_WORKERS = 2
BATCH_SIZE = 32

train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, num_workers=N_WORKERS,
                            pin_memory=PIN_MEMORY, shuffle = True)

## 2.2 Test Dataset

# 3 cGAN models

## 3.1 Generator: U-Net

In [22]:
class UNetDown(nn.Module):

  def __init__(self, in_channels, out_channels, kernel_size = 4, normalization_type = None, dropout = 0.0, activation = None):

    super(UNetDown, self).__init__()

    #when batch/instance normalization follows the convolution, the bias is redundant

    use_bias = normalization_type is None
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size, 2, 1, bias = use_bias)]

    if not use_bias:
      if normalization_type == "instance":

        layers.append(nn.InstanceNorm2d(out_channels))

      else:
        #any other value selects batch normalization

        layers.append(nn.BatchNorm2d(out_channels))

    #default activation is LeakyReLU; activation = "ReLU" selects ReLU
    if activation is None:
      layers.append(nn.LeakyReLU(negative_slope = 0.2))

    elif activation == "ReLU":

      layers.append(nn.ReLU())

    if dropout:

      layers.append(nn.Dropout(p = dropout))

    self.model = nn.Sequential(*layers)


  def forward(self, x):

    return self.model(x)


In [23]:
class UNetUp(nn.Module):

  def __init__(self, in_channels, out_channels, kernel_size = 4,  normalization_type = None, dropout = 0.0):

    super(UNetUp, self).__init__()

    use_bias = normalization_type is None

    layers = [nn.ConvTranspose2d(in_channels, out_channels, kernel_size, 2, 1, bias = use_bias)]

    if not use_bias:
      if normalization_type == "instance":

        layers.append(nn.InstanceNorm2d(out_channels))

      else:

        layers.append( nn.BatchNorm2d(out_channels))

    layers.append(nn.ReLU())

    if dropout:

      layers.append(nn.Dropout(p = dropout))

    self.model = nn.Sequential(*layers)


  def forward(self, x, skip = None):
      x = self.model(x)
      if skip is not None:

        x = torch.cat((skip, x), 1)

      return x

In [27]:
class GeneratorUNet(nn.Module):

  def __init__(self, in_channels = 1, out_channels = 2, num_down = 8, ngf = 64, normalization_type = None):

    super(GeneratorUNet, self).__init__()

    self.downs = nn.ModuleList()
    self.ups = nn.ModuleList()
    

    features =[ngf]

    for i in range(3):

      features.append(features[i]*2)

    features.append(features[-1])
    #64, 128, 256, 512, 512

    if num_down > 5:

      features += [ngf * 8 for i in range(num_down - 5)]
    #for num_down = 8: 64, 128, 256, 512, 512, 512, 512, 512 (1x1 for input size 256x256)


    #ENCODER (CONTRACTING PATH)

    #outermost down block: no normalization and no dropout, only downconv
    self.downs.append(UNetDown(in_channels, ngf, 4))

    in_channels = ngf #new in_channels for the next down block
    
    for i,n_features in enumerate(features[1:len(features)-1]):
      #no dropout
      self.downs.append(UNetDown(in_channels, n_features, 4, normalization_type, 0.0))
      in_channels = n_features

    
    #innermost down block: no normalization and no dropout, only downconv
    self.downs.append(UNetDown(in_channels, features[-1], 4, activation = "ReLU"))
    

    #DECODER (EXPANSIVE PATH)
    i_channels = in_channels
    for i, n_features in enumerate((features[-2::-1])):
      
      #i == 0: innermost (bottleneck) up block, fed directly by the last down block
      #afterwards the input is the concatenation [skip, x], hence twice the channels
      i_channels = in_channels if i == 0  else i_channels * 2

      #dropout only for i = 1, 2, 3: none for the first up block
      #nor for those with i > 3 (the last three when num_down = 8)
      dropout = 0.0 if (i == 0 or i  > 3) else 0.5

      self.ups.append(UNetUp(i_channels, n_features, 4, normalization_type, dropout))
      i_channels = n_features
    
    
    self.final = nn.Sequential(
        nn.ConvTranspose2d(ngf*2,out_channels, kernel_size=4, stride=2, padding=1),
        nn.Tanh()
    )



  def forward(self, x):

    skip_connections = list()

    #encoder
    for down in self.downs:

      x = down(x)
      skip_connections.append(x)

    #decoder with skip connections
    for i, up in enumerate(self.ups):
      
      x = up(x, skip_connections[-i-2])

    return self.final(x)

In [26]:
#to be deleted

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

G = GeneratorUNet(1,2,8,64, "batchnorm").to(device)

G.eval()

print(G(train_dataset[0][0].unsqueeze(0).to(device)).shape)

[64, 128, 256, 512, 512]
torch.Size([1, 2, 256, 256])
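
To follow the channel bookkeeping of `GeneratorUNet` without instantiating any layer, the plan can be reproduced in plain Python. This is only a sketch that mirrors the constructor logic as read from the class above. The helper `unet_plan` is hypothetical and does not appear in the notebook.

```python
def unet_plan(num_down=8, ngf=64, size=256):
    """Mirror the channel bookkeeping of GeneratorUNet without building layers."""
    # Encoder output channels: ngf, 2*ngf, 4*ngf, 8*ngf, 8*ngf, then 8*ngf repeated
    features = [ngf * 2**i for i in range(4)] + [ngf * 8]
    if num_down > 5:
        features += [ngf * 8] * (num_down - 5)

    # Each down block halves the spatial size: (channels, size) per block
    enc = [(f, size // 2 ** (i + 1)) for i, f in enumerate(features)]

    # Decoder: (in_channels, out_channels) of every up block
    dec = []
    in_ch = features[-1]
    for i, out_ch in enumerate(features[-2::-1]):
        # after the first up block the input is the concatenation [skip, x]
        in_ch = features[-1] if i == 0 else in_ch * 2
        dec.append((in_ch, out_ch))
        in_ch = out_ch
    return enc, dec

enc, dec = unet_plan()
for channels, side in enc:
    print(f"down: {channels:4d} channels, {side}x{side}")
for in_ch, out_ch in dec:
    print(f"up:   {in_ch:4d} -> {out_ch}")
```

With the default arguments the last up block outputs 64 channels which, concatenated with the first skip connection, give the `ngf*2` input channels expected by the final transposed convolution.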


## 3.2 Discriminator: PatchGAN

The discriminator is a PatchGAN for $N \times N$ patches with $N=70$: given a $256 \times 256$ input, the output is a $30 \times 30$ map, where each value scores one $70 \times 70$ patch of the input.

In [28]:
class PatchDiscriminator(nn.Module):

  def __init__(self, in_channels = 3, ndf = 64, n_down = 5):

    super(PatchDiscriminator, self).__init__()

    features = [ndf * 2**i for i in range(n_down-1)]

    layers = []



    for i in range(len(features)):
      #first block: bias and no batch normalization
      use_bias = i < 1
      #last block uses stride 1, the others stride 2
      stride = 2 if i < (len(features)-1) else 1
      layers.append(nn.Conv2d(in_channels, features[i], 4, stride, 1, bias = use_bias))

      if not use_bias:

        layers.append(nn.BatchNorm2d(features[i]))

      layers.append(nn.LeakyReLU(0.2))

      in_channels = features[i]
    
    layers.append(nn.Conv2d(in_channels, 1, 4, 1, 1))

    self.model = nn.Sequential(*layers)

  def forward(self, x):

    return self.model(x)

In [66]:
#to be deleted
D = PatchDiscriminator()

print(D)

prova = torch.randn(1,3, 256,256)
print(D(prova).shape)

PatchDiscriminator(
  (model): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.2)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
  )
)
torch.Size([1, 1, 30, 30])
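
The patch size of 70 and the $30 \times 30$ output quoted for this discriminator can be checked by hand from the kernel sizes and strides of the five convolutions printed above. The sketch below uses the standard formulas for the output size and the receptive field of stacked convolutions. The helper names are illustrative.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1

def receptive_field(layers):
    """Receptive field of one output unit; layers is a list of (kernel, stride)."""
    rf = 1
    # walk from the output back to the input
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# (kernel, stride) of the five convolutions; padding is 1 everywhere
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

size = 256
for kernel, stride in layers:
    size = conv_out(size, kernel, stride, padding=1)

print(size, receptive_field(layers))  # 30 70
```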


# 4 GAN loss

The following class implements the GAN loss. The discriminator maximizes
\begin{equation}
\mathbb{E}_{x,y}[\log D(x,y)]+\mathbb{E}_{x,z}[\log(1-D(x,G(z, x)))]
\end{equation}

The generator instead maximizes

\begin{equation}
\mathbb{E}_{x,z}[\log D(x,G(z,x))]
\end{equation}

In [63]:
class GANLoss():

  def __init__(self, device):

    self.criteria = nn.BCEWithLogitsLoss()
    self.real = 1.
    self.fake = 0.
    self.device = device

  def __call__(self, input, label_type):
    
    label = torch.tensor(self.real if label_type else self.fake)
    
    labels = label.expand_as(input).to(self.device)
    
    return self.criteria(input, labels)
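
Because the class relies on `nn.BCEWithLogitsLoss`, the discriminator output is treated as a logit and each call returns the negative of one expectation in the formulas above. With a target of 1 the loss is $-\log \sigma(\cdot)$ and with a target of 0 it is $-\log(1-\sigma(\cdot))$, both averaged over the output map. The NumPy sketch below checks this relation numerically. The function `bce_with_logits` is a hand-written stand-in for illustration, not the PyTorch implementation.

```python
import numpy as np

def bce_with_logits(logits, target):
    """Mean binary cross-entropy on raw logits with a constant target (1.0 or 0.0)."""
    logits = np.asarray(logits, dtype=np.float64)
    # numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
    loss = np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([[5.0, 1.0], [2.0, 1.0]])

real_term = bce_with_logits(logits, 1.0)  # equals -mean(log(sigmoid(x)))
fake_term = bce_with_logits(logits, 0.0)  # equals -mean(log(1 - sigmoid(x)))

print(real_term, -np.log(sigmoid(logits)).mean())
print(fake_term, -np.log(1.0 - sigmoid(logits)).mean())
```

For these four logits the real-label term is about 0.19. Minimizing the returned values is therefore equivalent to maximizing the corresponding log terms.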

In [67]:
#to be deleted
loss_fn = GANLoss(device)
sig = nn.Sigmoid()
prova = torch.tensor([ [ [[5, 1.0], [2, 1]]  ]])

print(loss_fn(prova.to(device), True))
print(-torch.mean(torch.log(sig(prova))))

tensor(1.)
tensor([[[[1., 1.],
          [1., 1.]]]], device='cuda:0')
tensor(0.1900, device='cuda:0')
tensor(0.1900)
