<a href="https://colab.research.google.com/github/karfly/learning-deep-learning/blob/master/04_dense/seminar_photo2map.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

# Photo -> Map
Disclaimer: this notebook is an adapted version of [this repository](https://github.com/GunhoChoi/Kind-PyTorch-Tutorial).

Previously, we used neural networks for **sparse** predictions: large input (image) -> small output (e.g. a vector with 10 elements, one per CIFAR10 class). Today we'll use deep learning to make **dense** predictions (large input (image) -> large output (image)) for the **image-to-image translation problem**. *Image-to-image translation* is a wide class of problems where both the input and the output are images (e.g. satellite photo -> map, image stylization, [sketch -> cat portrait](https://affinelayer.com/pixsrv/), etc.). There are many good models for dense predictions, but we'll use **UNet** because it offers a great trade-off between simplicity and quality.
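The difference between sparse and dense predictions is easiest to see in the output shapes. Below is a minimal sketch with two hypothetical toy models (not part of this notebook): a classifier that collapses the whole image into 10 scores, and a fully-convolutional net that keeps the spatial size and predicts a value for every pixel and channel.

```python
import torch
from torch import nn

x = torch.randn(4, 3, 64, 64)  # a batch of 4 RGB images, shape (N, C, H, W)

# sparse prediction: the whole image is summarized into 10 class scores
classifier = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # (N, 16, H, W) -> (N, 16, 1, 1)
    nn.Flatten(),             # -> (N, 16)
    nn.Linear(16, 10),        # -> (N, 10)
)

# dense prediction: one output value per pixel and channel, spatial size is preserved
dense_model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)

print(classifier(x).shape)   # torch.Size([4, 10])
print(dense_model(x).shape)  # torch.Size([4, 3, 64, 64])
```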

But before we start, let's look at our dataset.

<img src="https://github.com/karfly/learning-deep-learning/blob/master/04_dense/static/photo2map.jpg?raw=true" width=700 align="center"/>

## Task 1. Dataset
We'll create a dataset of **satellite photo - map** pairs (see the example above). To download the dataset, uncomment and execute the cell below:

In [None]:
# ! wget https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/maps.tar.gz
# ! tar -xzvf maps.tar.gz
# ! mkdir maps/train/0 && mv maps/train/*.jpg maps/train/0
# ! mkdir maps/val/0 && mv maps/val/*.jpg maps/val/0
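If `!` shell commands are not available (e.g. on Windows), the `mkdir`/`mv` step can be done from Python instead. Below is a minimal sketch using only the standard library; `move_images_into_class_dir` is a hypothetical helper name. It moves every `*.jpg` of a split into a single dummy class folder `0`, because ImageFolder expects one subdirectory per class.

```python
import glob
import os
import shutil


def move_images_into_class_dir(split_dir, class_name="0"):
    """Move every *.jpg directly inside `split_dir` into `split_dir/class_name`."""
    class_dir = os.path.join(split_dir, class_name)
    os.makedirs(class_dir, exist_ok=True)
    for path in glob.glob(os.path.join(split_dir, "*.jpg")):
        shutil.move(path, class_dir)  # moving into an existing directory keeps the file name
    return class_dir
```

Usage, after unpacking the archive: `for split in ("train", "val"): move_images_into_class_dir(os.path.join("./maps", split))`. Running it twice is harmless, since already moved images are no longer matched by the glob.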

Imports:

In [None]:
import os
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
# from tqdm.notebook import tqdm    # run this in Colab
from tqdm import tqdm               # or this in Jupyter instead

import torch
from torch import nn
from torch.utils.data import DataLoader

import torchvision
from torchvision import transforms

Parameters:

In [None]:
experiment_title = "unet"

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

batch_size = 4
image_size = 256

data_dir = "./maps"

transform = transforms.Compose([
    transforms.Resize(size=image_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),    # converts input image to [-1, 1]
])

After downloading and unpacking, you'll find the directory `maps` with 2 subdirectories: `train` and `val`. Each image contains a pair (photo - map), so we'll have to **crop each image to obtain the input and the target**. Let's use PyTorch's [ImageFolder](https://pytorch.org/docs/stable/torchvision/datasets.html#torchvision.datasets.ImageFolder) dataset: 

In [None]:
train_dataset = torchvision.datasets.ImageFolder(os.path.join(data_dir, "train"), transform=transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

Let's draw a sample from the dataset:

In [None]:
from einops import rearrange    # do "!pip install einops" in case einops is not installed

def imshow(img):
    img = img / 2 + 0.5  # unnormalize to [0, 1]
    img = img.cpu().numpy()
    
#     img_reshaped = np.transpose(img, (1, 2, 0))     # torch returns an image of shape (3, H, W), you need to convert it to (H, W, 3)

    # instead, below we use einops.rearrange. You can read about this neat library here: https://github.com/arogozhnikov/einops/blob/master/docs/1-einops-basics.ipynb
    img_reshaped = rearrange(img, 'c h w -> h w c')
    
    plt.imshow(img_reshaped)
    plt.show()
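`plt.imshow` expects images as `(H, W, C)`, while PyTorch stores them as `(C, H, W)`. The einops call `rearrange(img, 'c h w -> h w c')` used in `imshow` is one way to reorder the axes; the sketch below shows two equivalent alternatives, NumPy's `np.transpose` and PyTorch's `Tensor.permute`, on a tiny tensor.

```python
import numpy as np
import torch

img = torch.arange(24, dtype=torch.float32).reshape(3, 2, 4)  # (C, H, W)

hwc_numpy = np.transpose(img.numpy(), (1, 2, 0))  # NumPy: axis order (H, W, C)
hwc_torch = img.permute(1, 2, 0)                  # PyTorch: same reordering (a view, no copy)

print(hwc_numpy.shape)  # (2, 4, 3)
```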

In [None]:
(image, _) = train_dataset[0]
imshow(image)

As we can see, the input and the target are in the same image. Let's write a wrapper around ImageFolder that returns them separately:
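After `transform`, each sample has shape `(3, 256, 512)`: two 256x256 images side by side along the width. Splitting a tensor into two equal halves along a dimension is a one-liner with `torch.chunk`; here is a sketch on a dummy tensor whose left half is zeros and right half is ones. Which half is the photo and which is the map is best checked with `imshow` on the real data.

```python
import torch

# dummy sample of shape (3, H, 2 * W)
sample = torch.cat([torch.zeros(3, 256, 256), torch.ones(3, 256, 256)], dim=2)

# split the last (width) dimension into two equal halves
left, right = torch.chunk(sample, chunks=2, dim=2)
# equivalent slicing: w = sample.shape[2] // 2; sample[:, :, :w], sample[:, :, w:]
print(left.shape, right.shape)  # torch.Size([3, 256, 256]) torch.Size([3, 256, 256])
```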

In [None]:
class PhotoMapDataset(torchvision.datasets.ImageFolder):
    def __getitem__(self, index):
        path, _ = self.samples[index]
        sample = self.loader(path)
        
        if self.transform is not None:
            sample = self.transform(sample)
            
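        # sample is of shape (3, H, W): the photo and the map are placed side by side along W
        
        # your code here: split sample along the width (last) dimension into photo_image and map_image
        photo_image, map_image = None, None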
        
        return photo_image, map_image

So now we have:

In [None]:
train_dataset = PhotoMapDataset(root=os.path.join(data_dir, "train"), transform=transform)
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)

In [None]:
photo_image, map_image = train_dataset[0]
imshow(photo_image)
imshow(map_image)

Okay, we're done with the data. Let's move on to defining the model.

## Task 2. U-Net
UNet is a very popular fully-convolutional architecture. You can find its structure below (for more details, refer to the [original paper](https://arxiv.org/abs/1505.04597)):


<img src="https://i.imgur.com/z0ceKX1.png" width=800 align="center"/>

Let's build U-Net!

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [None]:
# UnetDownBlock: Conv + ReLU + Conv + ReLU [+ MaxPool]

class UnetDownBlock(nn.Module):
    def __init__(self, in_channels, out_channels, pooling=True):
        super().__init__()
        
        # your code here
        
    def forward(self, x):
        
        # your code here (keep the features before pooling: they feed the skip connection of the up path)
        
        return x, x_before_pooling
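The down block only needs two kinds of layers. A quick shape check on a dummy tensor, assuming padded 3x3 convolutions: `padding=1` keeps H and W unchanged, while `nn.MaxPool2d(kernel_size=2)` (stride defaults to the kernel size) halves them. The tensor before pooling is what `forward` should return as `x_before_pooling`.

```python
import torch
from torch import nn

x = torch.randn(1, 3, 64, 64)

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # 3x3 kernel + padding=1: H and W preserved
pool = nn.MaxPool2d(kernel_size=2)                 # halves H and W

features = torch.relu(conv(x))
print(features.shape)        # torch.Size([1, 16, 64, 64])
print(pool(features).shape)  # torch.Size([1, 16, 32, 32])
```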

In [None]:
# UnetUpBlock: upsampling + concat + Conv + ReLU

class UnetUpBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        
        # your code here
        
    def forward(self, x, x_bridge):
        
        # your code here
        
        return x
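The up block first doubles the spatial size of the deeper features and then concatenates them with the skip connection along the channel dimension (`dim=1`). One common upsampling choice is `nn.ConvTranspose2d(kernel_size=2, stride=2)`, which matches the paper's 2x2 up-convolution that halves the number of channels; `nn.Upsample(scale_factor=2)` followed by a convolution is another. A shape sketch with hypothetical sizes:

```python
import torch
from torch import nn

x = torch.randn(1, 128, 32, 32)        # features coming from the deeper level
x_bridge = torch.randn(1, 64, 64, 64)  # skip connection from the encoder

up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)  # doubles H and W, halves channels
x_up = up(x)                                               # (1, 64, 64, 64)
x_cat = torch.cat([x_up, x_bridge], dim=1)                 # concatenate along channels
print(x_up.shape, x_cat.shape)  # torch.Size([1, 64, 64, 64]) torch.Size([1, 128, 64, 64])
```

The convolutions after the concatenation therefore take twice as many input channels as the upsampled tensor has.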

In [None]:
class Unet(nn.Module):
    def __init__(self, in_channels, out_channels, depth=3, base_n_filters=64):
        super().__init__()
        
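        # your code here (store in_channels, out_channels, depth and base_n_filters as attributes: __repr__ below uses them)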
        
    def forward(self, x):
        
        # your code here
        
        return x
    
    def __repr__(self):
        message = '{}(in_channels={}, out_channels={}, depth={}, base_n_filters={})'.format(
            self.__class__.__name__,
            self.in_channels, self.out_channels, self.depth, self.base_n_filters
        )
        return message

In [None]:
model = Unet(3, 3, depth=4, base_n_filters=64).to(device)
model
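Before training, it's worth checking that the model really makes dense predictions: the output must have the same batch size and the same height and width as the input. A minimal sketch; `check_dense_model` is a hypothetical helper, and it is tested here on a single convolution as a stand-in for the U-Net.

```python
import torch
from torch import nn


def check_dense_model(model, in_channels=3, image_size=256, device="cpu"):
    """Run a dummy batch through `model` and check that it predicts one value per input pixel."""
    model = model.to(device)
    x = torch.randn(2, in_channels, image_size, image_size, device=device)
    with torch.no_grad():  # no training here, so no autograd graph is needed
        y = model(x)
    assert y.shape[0] == x.shape[0] and y.shape[2:] == x.shape[2:], f"unexpected output shape {tuple(y.shape)}"
    n_params = sum(p.numel() for p in model.parameters())
    print(f"output shape: {tuple(y.shape)}, parameters: {n_params:,}")
    return y.shape, n_params
```

Usage: `check_dense_model(model, device=device)` right after building the U-Net.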

## Train loop

Optimization setup:

In [None]:
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=0.0002)
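`nn.MSELoss()` with the default `reduction="mean"` averages the squared difference over every element: all pixels, channels and samples in the batch. A tiny check against the manual formula:

```python
import torch
from torch import nn

pred = torch.tensor([[0.0, 0.5], [1.0, -1.0]])
target = torch.tensor([[0.0, 1.0], [0.0, -1.0]])

mse = nn.MSELoss()(pred, target)
manual = ((pred - target) ** 2).mean()  # (0 + 0.25 + 1 + 0) / 4 = 0.3125
print(mse.item(), manual.item())  # 0.3125 0.3125
```

`nn.L1Loss()` is a drop-in alternative; the pix2pix paper prefers L1 over L2 for image-to-image translation because it encourages less blurry outputs.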

Setting up SummaryWriter for TensorBoard:

In [None]:
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime

experiment_name = "{}@{}".format(experiment_title, datetime.now().strftime("%d.%m.%Y-%H:%M:%S"))
print('experiment_name:', experiment_name)
writer = SummaryWriter(log_dir=os.path.join("./tb", experiment_name))

Train loop:

In [None]:
# you might want to create valid_dataset and valid_dataloader here (see Task 3 below)

In [None]:
n_epochs = 25

for epoch in range(n_epochs):
    model.train()
    for n_iters, batch in enumerate(tqdm(train_dataloader)):
        global_step = epoch * len(train_dataloader) + n_iters

        # unpack batch and move it to the device
        photo_image_batch, map_image_batch = batch
        photo_image_batch, map_image_batch = photo_image_batch.to(device), map_image_batch.to(device)
        
        # forward
        map_image_pred_batch = model(photo_image_batch)

        loss = criterion(map_image_pred_batch, map_image_batch)
        
        # optimize
        opt.zero_grad()
        loss.backward()
        opt.step()
        
        # dump statistics
        writer.add_scalar("train/loss", loss.item(), global_step=global_step)
        
        if n_iters % 50 == 0:
            # all images are normalized to [-1, 1]: map them back to [0, 1] before logging
            writer.add_image('train/photo_image',
                             torchvision.utils.make_grid(photo_image_batch) * 0.5 + 0.5, global_step)
            writer.add_image('train/map_image_pred',
                             (torchvision.utils.make_grid(map_image_pred_batch.detach()) * 0.5 + 0.5).clamp(0, 1),
                             global_step)
            writer.add_image('train/map_image_gt',
                             torchvision.utils.make_grid(map_image_batch) * 0.5 + 0.5, global_step)
    
    # YOUR CODE HERE
    
    print("Epoch {} done.".format(epoch))

## Run tensorboard

To look at your logs in TensorBoard, open a terminal and run the command below (this notebook writes its logs to `./tb`):
```bash
$ tensorboard --logdir PATH_TO_YOUR_LOG_DIR
```

Then open `localhost:6006` in your browser and you'll see beautiful graphs! Always use TensorBoard to monitor your experiments: it's very important to check how training is going.

When running in Colab, uncomment the following cells instead to start TensorBoard inside the notebook:

In [None]:
# %load_ext tensorboard

In [None]:
# logs_dir = os.path.join("./tb", experiment_name)
# %tensorboard --logdir {logs_dir}

## Task 3. Validation

As you remember, our dataset also contains `val` images. To make sure that we didn't overfit to `train`, we should evaluate the model on the validation set. You're free to choose how to add validation to the existing notebook:
1. Insert validation into the train loop (validate every epoch)
2. Validate once after training

I highly recommend implementing the first option, with nice TensorBoard logs. Have fun! :)
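Whichever option you choose, the core of validation is the same: switch the model to eval mode, disable gradient tracking and average the loss over the validation set. A minimal sketch, assuming validation batches are `(photo, map)` pairs like the training ones; `evaluate` is a hypothetical helper, not a PyTorch function.

```python
import torch


def evaluate(model, dataloader, criterion, device):
    """Return the average loss of `model` over `dataloader` without updating its weights."""
    was_training = model.training
    model.eval()  # e.g. BatchNorm uses running statistics, dropout is off
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():  # no autograd graph: faster and uses less memory
        for photo_batch, map_batch in dataloader:
            photo_batch, map_batch = photo_batch.to(device), map_batch.to(device)
            pred_batch = model(photo_batch)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(pred_batch, map_batch).item() * photo_batch.size(0)
            n_samples += photo_batch.size(0)
    model.train(was_training)  # restore the previous mode
    return total_loss / n_samples
```

For option 1 it could be called at the `# YOUR CODE HERE` placeholder of the train loop, e.g. `valid_loss = evaluate(model, valid_dataloader, criterion, device)` followed by `writer.add_scalar("valid/loss", valid_loss, global_step=epoch)`, where `valid_dataloader` is built like `train_dataloader` but from `PhotoMapDataset(root=os.path.join(data_dir, "val"), transform=transform)` with `shuffle=False`.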