# Cross-domain product recommendation with GAN

### Background

The aim is to improve a recommender system based on generative adversarial networks (GANs) that relies solely on product images.

Imagine you are [Zalando](https://www.zalando.ch) and you have a customer browsing through the shoes. The shopper likes a pair of shoes and heads to the checkout. Before the shopper buys the shoes, you would like to show them what else you have in stock that they might like. Your product recommendation algorithm recommends a dress, so you show this to the shopper. They agree, the dress fits their style perfectly, so they throw it in the shopping cart as well! Good job!

### Goal

In this project, I will be using GANs to generate cross-domain product recommendations. The two domains I will be working with are shoes and dresses.

## A primer on generative adversarial networks (GANs)

Here is a super quick introduction to GANs and CycleGANs. 

### GANs

Even though we haven't discussed GANs in the lecture, you should be able to understand them quite quickly.

GANs are designed and trained using the same tools you learned in the lecture. A major difference is that these methods are unsupervised: we don't have any natural labels. Instead, GANs use 2 neural networks that "fight" against each other and learn by competing (hence "adversarial"). 

![Basic GAN architecture](imgs/GAN.png)

The two networks are a generator and a discriminator. The generator is a network that generates data. In this case, you will have a generator that generates images belonging to a category, for example shoes. The discriminator then has to distinguish between real images of shoes and the fake images of shoes that the generator created. The generator and discriminator are trained based on their successes. The generator tries to learn to generate features that fool the discriminator. The discriminator tries to learn how to tell fakes apart from the real thing. This is why we don't need labels!
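
The adversarial game described above can be sketched as a single training step. This is only a minimal illustration, not the project's code: the tiny fully connected networks, the 16-dimensional noise vector, and the random tensor standing in for a batch of real images are all assumptions chosen to keep the example short.

```python
import torch
from torch import nn

torch.manual_seed(0)

noise_dim, data_dim, batch = 16, 64, 8  # hypothetical sizes

# Toy stand-ins; the actual project uses convolutional networks
generator = nn.Sequential(
    nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim), nn.Tanh())
discriminator = nn.Sequential(
    nn.Linear(data_dim, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

real = torch.rand(batch, data_dim) * 2 - 1  # placeholder for real images
ones = torch.ones(batch, 1)                 # target "real"
zeros = torch.zeros(batch, 1)               # target "fake"

# 1) Discriminator step: learn to answer 1 for real and 0 for fake
fake = generator(torch.randn(batch, noise_dim))
loss_d = bce(discriminator(real), ones) + bce(discriminator(fake.detach()), zeros)
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# 2) Generator step: try to make the discriminator answer 1 for fakes
loss_g = bce(discriminator(fake), ones)
opt_g.zero_grad()
loss_g.backward()
opt_g.step()

print(loss_d.item(), loss_g.item())
```

Note that no labels from a dataset appear anywhere: the only targets are the constants 1 ("real") and 0 ("fake").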

Up until now, none of this should sound too strange to you. In terms of the architecture of the neural networks, we almost only use things from the lectures. The discriminator will be a standard convolutional neural network. That is, it takes an image as input and spits out a prediction ('real' or 'fake') at the other end. The generator is a neural network that is flipped. Since the output of the generator needs to be an image, the last layer needs to produce a WxHx3 image. So rather than using the downsampling convolutions that we taught in the lecture, you'll need to use upsampling convolutions. They use the same convolutional technique, but the activation maps get wider and taller rather than smaller.
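
To make the "flipped" idea concrete, you can compare the output-size formulas of a strided convolution and a transposed (upsampling) convolution. The formulas below are the standard ones for dilation 1; the kernel size, stride, and padding values are illustrative assumptions and not necessarily the ones used in `models.py`.

```python
def conv_out_size(n, kernel, stride, padding):
    """Spatial size after a standard (downsampling) convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out_size(n, kernel, stride, padding, output_padding=0):
    """Spatial size after a transposed (upsampling) convolution."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# A 64-pixel-wide activation map is halved by the convolution ...
down = conv_out_size(64, kernel=3, stride=2, padding=1)
# ... and doubled again by the matching transposed convolution
up = conv_transpose_out_size(down, kernel=3, stride=2, padding=1, output_padding=1)
print(down, up)  # 32 64
```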

### CycleGAN

![CycleGAN](imgs/cyclegan.png)

A CycleGAN is a special type of GAN that is used for style-transfer. It has a few features that distinguish it from other GANs that we will outline quickly. These features are:

  1. 2 sets of generators and discriminators.
  2. The inputs to the generators are images. 
  3. The loss is cycle-consistent.

#### 1) 2 sets of generators and discriminators

Since we are interested in making cross-domain recommendations, we need a generator that generates images of shoes and one that generates images of dresses. (Note, there are architectures that can generate both dresses and shoes with one generator, but they don't work well, so we will stick with 2.) Then, for each generator you will need a discriminator that distinguishes between real and fake shoes and one that distinguishes between real and fake dresses.

#### 2) The inputs to the generators are images

In the early GANs, the input to the generator was usually a vector of random noise, which was used to generate a random image. This made it possible to generate many different images in the same domain because different noise would lead to a different output. In CycleGAN, the input to the generators is an image. If we want to recommend a shoe for a given dress, we use an image of a dress as input for the generator and generate an image of a shoe. Conversely, if we feed an image of a shoe into the other generator, we want to get an image of a dress as output. Therefore, both the input and output of the 2 generators are images.

#### 3) The loss is cycle-consistent

![cycle-consistency](imgs/cycle-consistency.png)

The final difference is that the loss is **cycle-consistent**. This means that **we want the recommendations to be cyclical. If we recommend shoes X for dress X, then it would make sense to get dress X recommended for shoes X**. You can think of this in terms of translation: if you translate a sentence from English to German and back to English, you would ideally like to get the input sentence back. The same is true for recommendations. 

To accomplish this, we do the following: say we have a generator `G_DS` that translates dresses to shoes and a generator `G_SD` that translates shoes to dresses. We also have an image of a dress, `real_D`, that we would like a shoe recommendation for. We pass the image to the first generator and get an output image of a shoe: `real_D` -> `G_DS` -> `fake_S`. Then we pass this image to the other generator to generate an image of a dress: `fake_S` -> `G_SD` -> `fake_D`. Finally, we compare `fake_D` to `real_D` and train the weights based on the pixel discrepancy. 
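
The chain `real_D` -> `G_DS` -> `fake_S` -> `G_SD` -> `fake_D` can be written down in a few lines. The sketch below is a toy under stated assumptions: the two generators are replaced by single convolution layers and `real_D` is a random tensor instead of a dress image. The pixel discrepancy is measured with an L1 loss, the same choice as `criterion_cycle` in the training code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical one-layer stand-ins for the two generators
G_DS = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # dress -> shoe
G_SD = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # shoe -> dress

real_D = torch.rand(1, 3, 64, 64)  # placeholder for a real dress image

fake_S = G_DS(real_D)  # real_D -> G_DS -> fake_S
fake_D = G_SD(fake_S)  # fake_S -> G_SD -> fake_D

# Mean absolute pixel difference between reconstruction and original
cycle_loss = nn.functional.l1_loss(fake_D, real_D)
cycle_loss.backward()  # gradients flow back into both generators

print(cycle_loss.item())
```

Because the loss is computed at the end of the chain, a single backward pass produces gradients for both generators, which is why they can share one optimizer.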

## Interpreting the output

The losses of a GAN are harder to interpret than those of normal supervised classification or regression tasks. **The best way to evaluate whether your GAN is learning is through visual inspection.**

In this script, we **write figures generated by the GAN to `output_imgs_path` (see lines 213-217). Each output figure contains 3 images**. The first (left) is the input image to the first generator. The second (middle) is the output of the first generator, that is, the translation of the first image into the second domain. The third (right) is the reconstruction of the first image: we pass the second image to the second generator so that it is translated back to the first domain. 

![output-images](imgs/eg_output.png)

So, how can we use these images to evaluate the training procedure? The first thing to look for is whether the second (middle) and third (right) images are recognisable. That is, can you clearly identify the images as shoes and dresses? Are the edges reasonably well defined? Does the dress have at most 2 sleeves, one neck, a body, etc.? Do the shoes have a heel, a toe, a shoe hole, etc.? The next thing to look for is whether the style of the first image was translated onto the second image. Since the images are 64 by 64, there won't be a great amount of detail. But are the colors similar? Are the notable attributes translated (roughly)? **Finally, the third image should look close to the first image**. Ideally, the third image is a reconstruction of the first; in practice, you will never get a perfect reconstruction. But is it somewhat similar?
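
As a rough numeric complement to visual inspection (not a replacement for it), you could measure how far the right panel is from the left panel. The helper below is a hypothetical sketch that assumes the figure has been loaded as a `height x (3 * width) x 3` array; here a synthetic strip stands in for a saved output image.

```python
import numpy as np


def reconstruction_error(strip):
    """Mean absolute pixel difference between the left and right panels."""
    left, _middle, right = np.hsplit(strip, 3)
    return np.abs(left.astype(np.float64) - right.astype(np.float64)).mean()


# Synthetic 64 x 192 RGB strip standing in for a saved output figure
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
translated = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
strip = np.hstack([original, translated, original])

print(reconstruction_error(strip))  # 0.0 for a perfect reconstruction
```

A falling value over training suggests the cycle is closing, but it says nothing about whether the middle image looks like a plausible product, so keep looking at the images.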




## Tasks
There are a few directions you can take this project:
  
  1. The **loss functions** in GANs are _really_ important. Can you improve the performance by tweaking or coming up with a new loss function? (Maybe Karras et al, 2017 (on dropbox) can be adapted?)  
    
  2. Karras et al (2017) created a great GAN called **progressive growing GAN**. The idea is that the GAN first learns to produce small images before scaling up to larger ones. So first, it learns how to generate 4x4 images. Then, once it is sufficiently good at that, it uses that as a starting point and generates 8x8 images. Then 16x16. And so on, until it can generate high-resolution images. Can you implement progressive growing in CycleGAN?    
  
  3. Another new method (Hicsonmez et al, 2020 (on dropbox)) called **GANILLA** just hit the market. Can you use GANILLA to generate cross-domain recommendations? How does it compare to CycleGAN? Make sure to search around before programming too much yourself (GANILLA is on GitHub).

## Getting started

The code presented in this notebook was taken from [here](https://github.com/aitorzip/PyTorch-CycleGAN). It offers a basic and readable implementation of the CycleGAN algorithm. You can also check out [this tutorial](https://www.tensorflow.org/tutorials/generative/cyclegan) for TensorFlow code. The TensorFlow code is quite good, so use whichever you feel more comfortable with. It has a few features that you won't need (e.g. crop and jitter preprocessing).

To familiarize yourself with the code, you can try to implement some of the tips and tricks learned in the lectures to see if you can improve the performance. The code is very clean and should be easy to adapt.

A small tip before starting: make sure that the saving and loading of the weights works. In my experience, this API is somewhat unreliable. It will save you time if you can continue training from some checkpoint.
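
One way to act on this tip is a quick round-trip check before starting a long training run. The sketch below is only an illustration: a small `nn.Linear` layer stands in for a generator, and the file name `net_i0.pth` is a hypothetical example.

```python
import os
import tempfile

import torch
from torch import nn

net = nn.Linear(4, 2)       # hypothetical stand-in for a generator
restored = nn.Linear(4, 2)  # fresh network with different random weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'net_i0.pth')
    torch.save(net.state_dict(), path)
    # map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine
    restored.load_state_dict(torch.load(path, map_location='cpu'))

# Every tensor in the restored state dict should match the saved one
round_trip_ok = all(
    torch.equal(value, restored.state_dict()[key])
    for key, value in net.state_dict().items()
)
print(round_trip_ok)
```

If the same check passes with your real networks and your real output directory, resuming from a checkpoint should be safe.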


***

## CycleGAN for product recommendations

Alright, let's get started. Import the modules you need. `models`, `utils`, and `datasets`, which are found in `/Dropbox/.../\#\ Group\ projects/Product\ recommendations/code`, contain helper functions.

In [1]:
import sys
# Append the path to the folder holding the scripts 
# if it is not already the working directory
sys.path.append('.')

import os
import itertools
from math import floor

import imageio

import numpy as np
import time

import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.autograd import Variable
from PIL import Image
import torch

from models import Generator, Discriminator
from utils import ReplayBuffer, LambdaLR, weights_init_normal, safe_mkdirs, tensor2image, as_np
from datasets import ImageDataset


Here you can define some training parameters. Adjust these as needed. Also, feel free to add your own functionalities.

In [2]:
load_iter = 0      # Starting iteration (if 0 train from scratch)
n_epochs = 200     # Number of epochs of training
batch_size = 1     # Size of the batches
lr = 0.0002        # Initial learning rate
decay_epoch = 100  # Epoch to start linearly decaying the learning rate to 0
size = 128         # Size of the data crop (squared assumed)

input_nc = 3       # Number of channels of input data
output_nc = 3      # Number of channels of output data
n_cpu = 2          # Number of cpu threads to use during batch generation
image_size = 64    # Size of images

log_interval = 200         # Interval to print output
model_save_interval = 200  # Interval to save model weights
image_save_interval = 200  # Interval at which to log visual progress

# Paths to directories
ROOT = '.'  # Change if necessary
data_path = os.path.join(ROOT, 'data')
output_path = os.path.join(ROOT, 'output_my')
output_imgs_path = os.path.join(output_path, 'imgs')
output_weights_path = os.path.join(output_path, 'weights')
safe_mkdirs(output_path)
safe_mkdirs(output_imgs_path)
safe_mkdirs(output_weights_path)

Then we get to the main chunk where we initialize and train the CycleGAN. Basic functionality is implemented.

In [3]:
# Use CUDA if available. It should be available on Colab.
use_cuda = torch.cuda.is_available()

In [4]:
# From paper: We use 6 residual blocks for 128 × 128 training images, 
# and 9 residual blocks for 256 × 256 or higher-resolution training images

# In our code: the default setting is 3
n_res_block = 3

In [5]:
# Initialize networks
# Domain A: dresses
# Domain B: shoes
netG_A2B = Generator(input_nc, output_nc,n_residual_blocks=n_res_block)
netG_B2A = Generator(output_nc, input_nc,n_residual_blocks=n_res_block)
netD_A = Discriminator(input_nc)
netD_B = Discriminator(output_nc)

In [6]:
# To cuda
if use_cuda:
    netG_A2B.cuda()
    netG_B2A.cuda()
    netD_A.cuda()
    netD_B.cuda()

# Initialize or load pretrained weights
# TODO: make sure this works. I have found the save/load_state_dict API to be buggy,
# so check that it works on your machine before you spend a lot of time
# training a model. If you save corrupted weights, you can't restore them.
if load_iter == 0:
    netG_A2B.apply(weights_init_normal)
    netG_B2A.apply(weights_init_normal)
    netD_A.apply(weights_init_normal)
    netD_B.apply(weights_init_normal)
else:
    # The file names must match the ones written by torch.save in the training loop
    netG_A2B.load_state_dict(torch.load(os.path.join(output_weights_path, 'netG_A2B_i{}.pth'.format(load_iter))))
    netG_B2A.load_state_dict(torch.load(os.path.join(output_weights_path, 'netG_B2A_i{}.pth'.format(load_iter))))
    netD_A.load_state_dict(torch.load(os.path.join(output_weights_path, 'netD_A_i{}.pth'.format(load_iter))))
    netD_B.load_state_dict(torch.load(os.path.join(output_weights_path, 'netD_B_i{}.pth'.format(load_iter))))

    netG_A2B.train()
    netG_B2A.train()
    netD_A.train()
    netD_B.train()

In [7]:
# Losses
# TODO: Defining the correct losses makes or breaks the performance of GANs.
#       Can you think of a loss that would improve the results?

# Notice that there are three losses here
# 1) criterion_GAN: This is the standard GAN loss. Whether the discriminator
#    could correctly predict whether the image was real or fake is used to train the networks.
#    In paper: the negative log likelihood objective is replaced by a least-squares loss. 
#    This loss is more stable during training and generates higher quality results.
# 2) criterion_cycle: cycle-consistency discussed in the intro text.
# 3) criterion_identity: if you put an image of a shoe into the dress to
#    shoe generator, you should get the same image of the shoe back.

# PyTorch's nn.MSELoss loss function
# https://blog.csdn.net/hao5335156/article/details/81029791
# PyTorch study notes (6): PyTorch's eighteen loss functions
# https://zhuanlan.zhihu.com/p/61379965
# TORCH.NN
# https://pytorch.org/docs/stable/nn.html

criterion_GAN = torch.nn.MSELoss()
criterion_cycle = torch.nn.L1Loss()
criterion_identity = torch.nn.L1Loss()

In [8]:
# Optimizers & LR schedulers (with decay)
optimizer_G = torch.optim.Adam(itertools.chain(netG_A2B.parameters(), netG_B2A.parameters()),
                                lr=lr, betas=(0.5, 0.999))
optimizer_D_A = torch.optim.Adam(netD_A.parameters(), lr=lr, betas=(0.5, 0.999))
optimizer_D_B = torch.optim.Adam(netD_B.parameters(), lr=lr, betas=(0.5, 0.999))

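# 99554 is a hard-coded number of iterations per epoch; adjust it to
# len(dataloader) for your dataset so that resuming starts at the right epoch.
start_epoch = floor(load_iter/99554)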
lr_scheduler_G = torch.optim.lr_scheduler.LambdaLR(optimizer_G, lr_lambda=LambdaLR(n_epochs, start_epoch, decay_epoch).step)
lr_scheduler_D_A = torch.optim.lr_scheduler.LambdaLR(optimizer_D_A, lr_lambda=LambdaLR(n_epochs, start_epoch, decay_epoch).step)
lr_scheduler_D_B = torch.optim.lr_scheduler.LambdaLR(optimizer_D_B, lr_lambda=LambdaLR(n_epochs, start_epoch, decay_epoch).step)
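
The schedule used above keeps the learning rate constant until `decay_epoch` and then decays it linearly to 0 at `n_epochs`. The function below is a sketch of that rule, assuming the `LambdaLR` helper in `utils` follows the usual CycleGAN formula; `linear_decay_factor` is a hypothetical name, so check `utils.py` for the actual implementation.

```python
def linear_decay_factor(epoch, n_epochs=200, start_epoch=0, decay_epoch=100):
    """Multiplier applied to the initial learning rate at a given epoch."""
    progress = max(0, epoch + start_epoch - decay_epoch)
    return 1.0 - progress / (n_epochs - decay_epoch)


initial_lr = 0.0002
for epoch in (0, 100, 150, 199):
    print(epoch, initial_lr * linear_decay_factor(epoch))
```

With the default settings, the multiplier is 1.0 up to epoch 100, 0.5 at epoch 150, and close to 0 in the last epoch.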

In [9]:
# Inputs & targets memory allocation
Tensor = torch.cuda.FloatTensor if use_cuda else torch.Tensor
input_A = Tensor(batch_size, input_nc, image_size, image_size)
input_B = Tensor(batch_size, output_nc, image_size, image_size)
target_real = Variable(Tensor(batch_size).fill_(1.0), requires_grad=False)
target_fake = Variable(Tensor(batch_size).fill_(0.0), requires_grad=False)

In [10]:
# Initialize replay buffer
# The replay buffer helps stabilize training by saving images of previous
# iterations and using them as training data in later iterations. This helps
# avoid over training on current data.

# In paper: the default max replay buffer size is 50
max_replay = 100

fake_A_buffer = ReplayBuffer(max_size=max_replay)
fake_B_buffer = ReplayBuffer(max_size=max_replay)
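
To see what `push_and_pop` is expected to do, here is a simplified replay buffer written in plain Python. It is a sketch of the image-pool idea under the assumption that `utils.ReplayBuffer` behaves similarly; the class name `SimpleReplayBuffer` and the integer "images" are made up for illustration.

```python
import random


class SimpleReplayBuffer:
    """Keeps up to max_size past fakes and sometimes returns an old one."""

    def __init__(self, max_size=50, seed=None):
        self.max_size = max_size
        self.data = []
        self.rng = random.Random(seed)

    def push_and_pop(self, item):
        # While the buffer is filling up, store the new item and return it
        if len(self.data) < self.max_size:
            self.data.append(item)
            return item
        # Once full: half of the time, swap the new item for a random stored one
        if self.rng.random() > 0.5:
            idx = self.rng.randrange(self.max_size)
            old = self.data[idx]
            self.data[idx] = item
            return old
        # Otherwise return the new item and leave the buffer unchanged
        return item


buffer = SimpleReplayBuffer(max_size=3, seed=0)
returned = [buffer.push_and_pop(i) for i in range(10)]
print(returned)
```

The discriminator therefore sees a mix of current and older fakes, which keeps it from adapting only to the generator's latest output.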

In [12]:
# Meaning of the Normalize parameters in torchvision transforms
# https://blog.csdn.net/york1996/article/details/82711593

# Dataset loader
transforms_ = [ transforms.Resize(image_size, Image.BICUBIC), 
                transforms.ToTensor(),
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) ]
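
With a mean and standard deviation of 0.5 per channel, `Normalize` maps pixel values from the `[0, 1]` range produced by `ToTensor` to `[-1, 1]`. The arithmetic is shown below on plain floats; the function names are illustrative and not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel operation performed by transforms.Normalize."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, needed before saving a tensor as an image."""
    return y * std + mean


print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0
```

The inverse mapping matters when you turn generator outputs back into images for saving.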

In [13]:
# For training set
# `ImageDataset` is defined in datasets.py. Check it out to see what it does.
dataloader = DataLoader(ImageDataset(data_path, transforms_=transforms_, unaligned=True), 
                        batch_size=batch_size, shuffle=True, num_workers=n_cpu)

# For test set
dataloader_test = DataLoader(ImageDataset(data_path, transforms_=transforms_, mode='test'),
                            batch_size=1, shuffle=False, num_workers=n_cpu)

In [15]:
prev_time = time.time()
iter = load_iter

# Training 
for epoch in range(start_epoch, n_epochs):
    print("Current epoch :", epoch+1)
    for i, batch in enumerate(dataloader):
        if i % 100 == 0:
            print(i)
        # Set model input
        real_A = Variable(input_A.copy_(batch['A']))
        real_B = Variable(input_B.copy_(batch['B']))

        # Forward pass through each of the generators and discriminators
        # Generators A2B and B2A ##############################################
        optimizer_G.zero_grad()

        # Identity loss
        # G_A2B(B) should equal B if real B is fed
        same_B = netG_A2B(real_B)
        loss_identity_B = criterion_identity(same_B, real_B)
        # G_B2A(A) should equal A if real A is fed
        same_A = netG_B2A(real_A)
        loss_identity_A = criterion_identity(same_A, real_A)

        # GAN loss
        fake_B = netG_A2B(real_A)
        pred_fake = netD_B(fake_B)
        loss_GAN_A2B = criterion_GAN(pred_fake, target_real)

        fake_A = netG_B2A(real_B)
        pred_fake = netD_A(fake_A)
        loss_GAN_B2A = criterion_GAN(pred_fake, target_real)

        # Cycle loss
        recovered_A = netG_B2A(fake_B)
        loss_cycle_ABA = criterion_cycle(recovered_A, real_A)

        recovered_B = netG_A2B(fake_A)
        loss_cycle_BAB = criterion_cycle(recovered_B, real_B)

        # Total loss
        loss_G = (loss_identity_A + loss_identity_B) * 5.0
        loss_G += (loss_GAN_A2B + loss_GAN_B2A) * 1.0
        loss_G += (loss_cycle_ABA + loss_cycle_BAB) * 10.0
        
        # Calculate the gradient of the generators
        loss_G.backward()
        
        # Update the weights of the generators
        optimizer_G.step()

        # Discriminator A #####################################################
        optimizer_D_A.zero_grad()

        # Real loss
        pred_real = netD_A(real_A)
        loss_D_real = criterion_GAN(pred_real, target_real)

        # Fake loss
        fake_A = fake_A_buffer.push_and_pop(fake_A)
        pred_fake = netD_A(fake_A.detach())
        loss_D_fake = criterion_GAN(pred_fake, target_fake)

        # Total loss
        loss_D_A = (loss_D_real + loss_D_fake) * 0.5

        # Calculate the gradient of discriminator A
        loss_D_A.backward()

        # Update the weights of discriminator A
        optimizer_D_A.step()

        # Discriminator B #####################################################
        optimizer_D_B.zero_grad()

        # Real loss
        pred_real = netD_B(real_B)
        loss_D_real = criterion_GAN(pred_real, target_real)
        
        # Fake loss
        fake_B = fake_B_buffer.push_and_pop(fake_B)
        pred_fake = netD_B(fake_B.detach())
        loss_D_fake = criterion_GAN(pred_fake, target_fake)

        # Total loss
        loss_D_B = (loss_D_real + loss_D_fake) * 0.5

        # Calculate the gradient of discriminator B
        loss_D_B.backward()

        # Update the weights of discriminator B
        optimizer_D_B.step()

        # Track performance
        if iter % log_interval == 0:
            print('---------------------')
            print('GAN loss:', as_np(loss_GAN_A2B), as_np(loss_GAN_B2A))
            print('Identity loss:', as_np(loss_identity_A), as_np(loss_identity_B))
            print('Cycle loss:', as_np(loss_cycle_ABA), as_np(loss_cycle_BAB))
            print('D loss:', as_np(loss_D_A), as_np(loss_D_B))
            print('time:', time.time() - prev_time)
            prev_time = time.time()

        # Print outputs
        if iter % image_save_interval == 0:
            # Integer division keeps the folder names as '0', '1', ... instead of '0.0', '1.0', ...
            output_path_ = os.path.join(output_imgs_path, str(iter // image_save_interval))
            safe_mkdirs(output_path_)

            for j, batch_ in enumerate(dataloader_test):

                if j < 60:
                    real_A_test = Variable(input_A.copy_(batch_['A']))
                    real_B_test = Variable(input_B.copy_(batch_['B']))

                    fake_AB_test = netG_A2B(real_A_test)
                    fake_BA_test = netG_B2A(real_B_test)

                    recovered_ABA_test = netG_B2A(fake_AB_test)
                    recovered_BAB_test = netG_A2B(fake_BA_test)

                    fn = os.path.join(output_path_, str(j))
                    A_test = np.hstack([tensor2image(real_A_test[0]), tensor2image(fake_AB_test[0]), tensor2image(recovered_ABA_test[0])])
                    B_test = np.hstack([tensor2image(real_B_test[0]), tensor2image(fake_BA_test[0]), tensor2image(recovered_BAB_test[0])])
                    imageio.imwrite(fn + '_A.jpg', A_test)
                    imageio.imwrite(fn + '_B.jpg', B_test)
                    
                    #imageio.imwrite(fn + '.A.jpg', tensor2image(real_A_test[0]))
                    #imageio.imwrite(fn + '.B.jpg', tensor2image(real_B_test[0]))
                    #imageio.imwrite(fn + '.BA.jpg', tensor2image(fake_BA_test[0]))
                    #imageio.imwrite(fn + '.AB.jpg', tensor2image(fake_AB_test[0]))
                    #imageio.imwrite(fn + '.ABA.jpg', tensor2image(recovered_ABA_test[0]))
                    #imageio.imwrite(fn + '.BAB.jpg', tensor2image(recovered_BAB_test[0]))
                else:
                    # Only the first 60 test images are saved, so stop iterating here
                    break

        # Save model checkpoints
        if iter % model_save_interval == 0:
            torch.save(netG_A2B.state_dict(), os.path.join(output_weights_path, 'netG_A2B_i{}.pth'.format(iter)))
            torch.save(netG_B2A.state_dict(), os.path.join(output_weights_path, 'netG_B2A_i{}.pth'.format(iter)))
            torch.save(netD_A.state_dict(), os.path.join(output_weights_path, 'netD_A_i{}.pth'.format(iter)))
            torch.save(netD_B.state_dict(), os.path.join(output_weights_path, 'netD_B_i{}.pth'.format(iter)))

        iter += 1

    # Update learning rates
    lr_scheduler_G.step()
    lr_scheduler_D_A.step()
    lr_scheduler_D_B.step()

Current epoch : 1
0
---------------------
GAN loss: 0.12517482 0.1724944
Identity loss: 0.4677845 0.8535755
Cycle loss: 0.43628576 0.855853
D loss: 2.3718777 1.1639683
time: 1.5386250019073486
100
200
---------------------
GAN loss: 0.7022245 0.20585868
Identity loss: 0.09330419 0.14339297
Cycle loss: 0.100808226 0.13514327
D loss: 0.28619793 0.37287968
time: 283.69341015815735


KeyboardInterrupt: 
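
To relate the logged numbers to the generator objective, you can recombine them with the weights from the training loop (identity x5, GAN x1, cycle x10). The values below are copied from the first log block above; the snippet is only a worked example and not part of the training code.

```python
# Values copied from the first log block (iteration 0)
loss_gan = (0.12517482, 0.1724944)
loss_identity = (0.4677845, 0.8535755)
loss_cycle = (0.43628576, 0.855853)

# Same weights as in the training loop: identity x5, GAN x1, cycle x10
loss_G = sum(loss_identity) * 5.0 + sum(loss_gan) * 1.0 + sum(loss_cycle) * 10.0
print(round(loss_G, 3))
```

At this point in training, the cycle and identity terms make up almost all of `loss_G`, so the adversarial term has comparatively little influence on the generator update.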

In [17]:
k=0
for i, batch in enumerate(dataloader):
    k=i
        
print(k)

113219


Too many images:   
        train A: 98552  
        train B: 113220  
We should use a subset, e.g. 10000 images per collection.
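
One simple way to get such a subset is to sample a fixed number of file names per collection with a fixed seed, so the subset stays the same between runs. The sketch below uses made-up file names; `sample_subset` is a hypothetical helper and how the sampled list is passed to `ImageDataset` depends on its implementation.

```python
import random


def sample_subset(file_names, n=10000, seed=0):
    """Return a reproducible random subset of at most n file names."""
    file_names = sorted(file_names)  # fixed order so the seed is meaningful
    if len(file_names) <= n:
        return file_names
    return random.Random(seed).sample(file_names, n)


# Hypothetical file lists with the sizes reported above
train_A = ['A_{:06d}.jpg'.format(i) for i in range(98552)]
train_B = ['B_{:06d}.jpg'.format(i) for i in range(113220)]

subset_A = sample_subset(train_A)
subset_B = sample_subset(train_B)
print(len(subset_A), len(subset_B))  # 10000 10000
```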