# Advanced Topics in Normalizing Flows - 1x1 convolution

**Filled notebook:** 

<center width="100%"> 

[![View on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/DL2/template/TemplateNotebook.ipynb)

   
[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/phlippe/uvadlc_notebooks/blob/master/docs/tutorial_notebooks/DL2/template/TemplateNotebook.ipynb)

</center>
    
**Pre-trained models:** 

<center width="100%"> 

[![View files on Github](https://img.shields.io/static/v1.svg?logo=github&label=Repo&message=View%20On%20Github&color=lightgrey)](https://github.com/phlippe/saved_models/tree/main/DL2/template/)      

</center>

**Authors:**
Cyril Hsu

### Introduction
[Glow](https://arxiv.org/abs/1807.03039), a flow-based generative model, extends the previous invertible generative models [NICE](https://arxiv.org/abs/1410.8516) and [RealNVP](https://arxiv.org/abs/1605.08803). It simplifies the architecture by replacing the fixed reverse permutation of the channel ordering with **Invertible 1x1 Convolutions**. Glow is well known as one of the first flow-based models to work on high-resolution images and to enable manipulation in latent space. Let's have a look at the interactive [demonstration](https://openai.com/blog/glow/) from OpenAI.

<center width="100%"><img src="face_demo.gif" width="350px" style="padding: 20px"></center>

Glow consists of a series of steps of flow. Each step of flow comprises **Actnorm** followed by an **Invertible 1×1 Convolution**, and finally a **Coupling Layer**.
<center width="100%"><img src="glow_bb.png" width="350px" style="padding: 20px"></center>

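**Actnorm** performs an affine transformation with a scale and bias parameter per channel. This is similar to batch normalization, but it also works with a mini-batch size of 1. The parameters are initialized in a data-dependent way: the post-actnorm activations of an initial mini-batch have zero mean and unit variance per channel. After that, they are trained as regular parameters.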

An **Invertible 1×1 Convolution** with an equal number of input and output channels is a generalization of a permutation of the channel ordering. Recall that between the layers of a RealNVP flow, the ordering of channels is reversed so that every data dimension eventually gets a chance to be transformed. Glow replaces this fixed permutation with a learned invertible 1x1 convolution.
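A permutation is a special case of this operation. The check below builds the channel-reversing permutation matrix used between RealNVP layers and applies it as the weight of a 1x1 convolution. It is a small sketch that uses only standard PyTorch calls, and the variable names are illustrative. The output is exactly the channel-reversed input.

```python
import torch
from torch.nn import functional as F

c = 4
x = torch.randn(1, c, 2, 2)

# Permutation matrix that reverses the channel order, as between RealNVP layers:
# row i has a single 1 in column c - 1 - i, so output channel i = input channel c - 1 - i
perm = torch.eye(c).flip(0)

# Use it as the weight of a 1x1 convolution: shape (out_channels, in_channels, 1, 1)
out = F.conv2d(x, perm.unsqueeze(2).unsqueeze(3))

# The result equals the channel-reversed input
print(torch.allclose(out, x.flip(1)))  # True
```

A permutation matrix has $|\det| = 1$, so a fixed permutation contributes nothing to the log-determinant. A learned $W$ generally does contribute, which is why the invertible 1x1 convolution has to track $\log|\det(W)|$.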

The **Coupling Layer** is a powerful invertible transformation whose forward function, reverse function, and log-determinant are all computationally efficient. Its design is the same as in RealNVP.
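The following sketch shows what "computationally efficient" means in practice. It is a minimal affine coupling layer and a simplified stand-in, not the coupling layer from the NFs tutorial. The hypothetical `AffineCoupling` class, its hidden width, and the `tanh` bounding of the log-scale are our own assumptions; implementations differ in how they bound the scale. Half of the channels pass through unchanged and parameterize an elementwise affine map of the other half. As a result, the reverse pass never has to invert the network, and the log-determinant is simply the sum of the log-scales. The sketch assumes an even number of channels.

```python
import torch
from torch import nn


class AffineCoupling(nn.Module):
    """Hypothetical minimal affine coupling layer (assumes an even channel count)."""

    def __init__(self, in_channel, hidden=32):
        super().__init__()
        # Small CNN mapping x_a to a log-scale and a shift for x_b
        self.net = nn.Sequential(
            nn.Conv2d(in_channel // 2, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, in_channel, 3, padding=1),
        )

    def forward(self, input, logdet, reverse=False):
        # x_a passes through unchanged and conditions the transform of x_b
        x_a, x_b = input.chunk(2, dim=1)
        log_s, t = self.net(x_a).chunk(2, dim=1)
        log_s = torch.tanh(log_s)  # assumption: bound the log-scale for stability

        if not reverse:
            y_b = x_b * torch.exp(log_s) + t
            logdet = logdet + log_s.sum(dim=(1, 2, 3))
        else:
            # Inverting only needs the same net output, never an inverse network
            y_b = (x_b - t) * torch.exp(-log_s)
            logdet = logdet - log_s.sum(dim=(1, 2, 3))

        return torch.cat([x_a, y_b], dim=1), logdet
```

Because the Jacobian is triangular (the `x_a` half is copied), its log-determinant is just `log_s` summed over the transformed elements, computed per sample.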

<center width="100%"><img src="assets/glow_comp.png" width="800px" style="padding: 20px"></center>

In this tutorial, we focus on the implementation of the invertible 1x1 convolution layer.

### Invertible 1x1 convolution

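Consider an input of shape $h\times w\times c$ to which we apply a 1x1 convolution with $c$ filters, so the output tensor also has shape $h\times w\times c$. Each layer therefore has a weight matrix $W$ with $c\times c$ values.
The forward operation acts just like a typical convolution, while the inverse operation is simply a 1x1 convolution with weights $W^{-1}$. The same $c\times c$ matrix is applied independently at each of the $h\cdot w$ spatial positions, so the log-determinant of the Jacobian is

$$
\log \left| \det\left(\frac{\partial\, \text{conv2d}(\mathbf{x}; W)}{\partial \mathbf{x}}\right)\right| = h \cdot w \cdot \log |\det(W)|.
$$
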
<center width="100%"><img src="1x1.png" width="500px" style="padding: 20px"></center>
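For an $h\times w\times c$ input, the log-determinant of a 1x1 convolution is $h\cdot w\cdot\log|\det(W)|$, because the same matrix acts at every pixel. This can be checked numerically on a tiny input. The sketch below is our own check, using arbitrary sizes $c=3$, $h=w=2$ and float64 for accuracy. It computes the full $12\times 12$ Jacobian with `torch.autograd.functional.jacobian` and compares its log-determinant with the formula. It also confirms that convolving with $W^{-1}$ undoes the layer.

```python
import torch
from torch.autograd.functional import jacobian
from torch.nn import functional as F

torch.manual_seed(0)
c, h, w = 3, 2, 2
weight = torch.randn(c, c, dtype=torch.float64)  # arbitrary (almost surely invertible) W
x = torch.randn(1, c, h, w, dtype=torch.float64)


def conv1x1(inp):
    # 1x1 convolution: the same c x c matrix multiplies every pixel's channel vector
    return F.conv2d(inp, weight.unsqueeze(2).unsqueeze(3))


# Jacobian of the map R^(c*h*w) -> R^(c*h*w), reshaped into a square matrix
J = jacobian(conv1x1, x).reshape(c * h * w, c * h * w)

full = torch.linalg.slogdet(J)[1]                  # log|det| of the full Jacobian
formula = h * w * torch.linalg.slogdet(weight)[1]  # h * w * log|det(W)|
print(torch.allclose(full, formula))  # True

# The inverse is a 1x1 convolution with W^{-1}
x_rec = F.conv2d(conv1x1(x), torch.inverse(weight).unsqueeze(2).unsqueeze(3))
print(torch.allclose(x_rec, x))  # True
```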

Enough theory! Now let's take a look at the code.

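```python
import torch
from torch import nn
from torch.nn import functional as F


class InvConv2d(nn.Module):
    def __init__(self, in_channel):
        super().__init__()

        weight = torch.randn(in_channel, in_channel)
        # Use the orthogonal Q factor of a QR decomposition as the initial weight;
        # an orthogonal matrix is always invertible (|det| = 1)
        q, _ = torch.linalg.qr(weight)
        # Reshape to (out_channels, in_channels, 1, 1), the layout F.conv2d expects
        weight = q.unsqueeze(2).unsqueeze(3)
        self.weight = nn.Parameter(weight)

    def forward(self, input, logdet, reverse=False):
        _, _, height, width = input.shape
        # View the 4D kernel as a C x C matrix (indexing avoids squeeze() collapsing C == 1)
        w_mat = self.weight[:, :, 0, 0]

        # The same matrix acts on all height * width pixels, so the
        # log-determinant is height * width * log|det(W)| (slogdet is more stable)
        dlogdet = height * width * torch.linalg.slogdet(w_mat)[1]

        if not reverse:
            out = F.conv2d(input, self.weight)
            logdet = logdet + dlogdet

        else:
            # The inverse is a 1x1 convolution with the inverse weight matrix
            w_inv = torch.inverse(w_mat).unsqueeze(2).unsqueeze(3)
            out = F.conv2d(input, w_inv)
            logdet = logdet - dlogdet

        return out, logdet
```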

Note that computing $\det(W)$ costs $\mathcal{O}(c^3)$, which can become expensive for layers with many channels. As suggested in the Glow paper, this cost can be reduced to $\mathcal{O}(c)$ by parameterizing $W$ through its LU decomposition. See this [implementation](https://github.com/rosinality/glow-pytorch/blob/master/model.py#L88) for an example.

The idea is to parameterize $W$ directly in its LU decomposition:

$$
W = PL(U + \text{diag}(s)),
$$

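where $P$ is a permutation matrix, $L$ is a lower triangular matrix with ones on the diagonal, $U$ is an upper triangular matrix with zeros on the diagonal, and $s$ is a vector. In the Glow paper, $P$ is computed once at initialization and then kept fixed, while $L$, $U$, and $s$ are optimized.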

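The determinant factorizes: $|\det(P)| = 1$, $\det(L) = 1$, and $\det(U + \text{diag}(s)) = \prod_i s_i$, because a triangular matrix's determinant is the product of its diagonal. The log-determinant is therefore simply:

$$
\log |\det(W)| = \sum_i \log |s_i|
$$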
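Below is a minimal sketch of this parameterization, loosely following the idea of the linked implementation but not copied from it. The class name `InvConv2dLU` and its helper `weight_matrix` are hypothetical, and it assumes PyTorch 1.13 or newer, where `torch.linalg.lu` is available. Following common practice, $s$ is stored as a fixed sign times `exp(log_s)`. Since no entry of $s$ can then reach zero, $W$ stays invertible in exact arithmetic. For simplicity, the reverse pass still inverts $W$ directly, which remains $\mathcal{O}(c^3)$.

```python
import torch
from torch import nn
from torch.nn import functional as F


class InvConv2dLU(nn.Module):
    def __init__(self, in_channel):
        super().__init__()
        # Start from a random orthogonal matrix and LU-factorize it: q = P @ L @ U
        q, _ = torch.linalg.qr(torch.randn(in_channel, in_channel))
        p, l, u = torch.linalg.lu(q)
        s = torch.diagonal(u)

        # Fixed (non-trainable) pieces: P, strictly-lower mask, identity, sign(s)
        self.register_buffer("p", p)
        self.register_buffer("l_mask", torch.tril(torch.ones_like(q), diagonal=-1))
        self.register_buffer("eye", torch.eye(in_channel))
        self.register_buffer("s_sign", torch.sign(s))

        # Trainable pieces: strictly lower L, strictly upper U, and log|s|
        self.l = nn.Parameter(l * self.l_mask)
        self.u = nn.Parameter(torch.triu(u, diagonal=1))
        self.log_s = nn.Parameter(torch.log(torch.abs(s)))

    def weight_matrix(self):
        # W = P L (U + diag(s)), with unit diagonal on L and s = sign * exp(log_s)
        l = self.l * self.l_mask + self.eye
        u = self.u * self.l_mask.T + torch.diag(self.s_sign * torch.exp(self.log_s))
        return self.p @ l @ u

    def forward(self, input, logdet, reverse=False):
        _, _, height, width = input.shape
        w = self.weight_matrix()
        # log|det W| = sum_i log|s_i|: O(c) instead of an O(c^3) determinant
        dlogdet = height * width * torch.sum(self.log_s)

        if not reverse:
            return F.conv2d(input, w.unsqueeze(2).unsqueeze(3)), logdet + dlogdet
        # This sketch still inverts W directly (O(c^3)) in the reverse pass
        w_inv = torch.inverse(w)
        return F.conv2d(input, w_inv.unsqueeze(2).unsqueeze(3)), logdet - dlogdet
```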

Please check out the link above for the implementation.

#### A small pitfall
As you might notice, running the **1x1 convolution** in reverse requires inverting the weight $W$. An error can therefore occur if $W$ becomes singular (non-invertible) during training, although this rarely happens.

To the best of our knowledge, there is no elegant solution to this, but there is an easy workaround: if it happens during training, restart from the most recent checkpoint.
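You can get an early warning before that point with one cheap diagnostic, which is our own suggestion and not part of the original tutorial. Track the condition number of $W$ during training, for example right before saving a checkpoint. The helpers below are hypothetical, and the threshold `max_cond` is an arbitrary choice. A very large or infinite condition number means $W$ is close to singular, so the reverse pass becomes numerically unreliable.

```python
import torch


def weight_condition_number(weight):
    """Hypothetical helper: condition number of a 1x1-conv weight of shape (C, C, 1, 1)."""
    # torch.linalg.cond uses the 2-norm by default (ratio of extreme singular values);
    # it is infinite for an exactly singular matrix
    return torch.linalg.cond(weight[:, :, 0, 0].detach()).item()


def is_safely_invertible(weight, max_cond=1e4):
    """Hypothetical check; max_cond is an arbitrary threshold, tune it for your model."""
    return weight_condition_number(weight) < max_cond
```

In a training loop, one might, for example, skip saving a checkpoint when `is_safely_invertible(layer.weight)` returns `False` for the `InvConv2d` layer defined above.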

### A complete flow block

Now that we have the **Invertible 1x1 Convolution**, together with the aforementioned **Actnorm** and **Coupling Layer**, we are ready to try out the power of Glow by plugging the complete block into the model from the NFs tutorial!

```python
import torch
from torch import nn


class ActNorm(nn.Module):
    def __init__(self, in_channel):
        super().__init__()

        # Per-channel bias and scale, broadcast over (batch, height, width)
        self.loc = nn.Parameter(torch.zeros(1, in_channel, 1, 1))
        self.scale = nn.Parameter(torch.ones(1, in_channel, 1, 1))
        # Flag for the one-time, data-dependent initialization
        self.register_buffer("initialized", torch.tensor(0, dtype=torch.uint8))

    def initialize(self, input):
        # Choose loc and scale so that the output for this first batch has
        # zero mean and unit variance per channel
        with torch.no_grad():
            mean = input.mean(dim=(0, 2, 3), keepdim=True)
            std = input.std(dim=(0, 2, 3), keepdim=True)

            self.loc.copy_(-mean)
            self.scale.copy_(1 / (std + 1e-6))

    def forward(self, input, logdet, reverse=False):
        _, _, height, width = input.shape

        if self.initialized.item() == 0:
            self.initialize(input)
            self.initialized.fill_(1)

        # Each per-channel scale is applied at all height * width positions
        dlogdet = height * width * torch.sum(torch.log(torch.abs(self.scale)))

        if not reverse:
            # Out-of-place update, so the caller's logdet tensor is not modified
            return self.scale * (input + self.loc), logdet + dlogdet

        else:
            return input / self.scale - self.loc, logdet - dlogdet
```

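A sample implementation of **Actnorm** is provided above. Its scale and bias are initialized from the statistics of the first batch it receives. The first call should therefore be a forward (not reverse) pass on real data.

For the **Coupling Layer**, please refer to the implementation in the NFs tutorial.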
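Assembling a step of flow mostly means chaining the layers. There is one subtlety: in reverse mode, the layers must be undone in the opposite order. Below is a minimal sketch of a hypothetical `FlowStep` container for layers that share the `(input, logdet, reverse)` interface used in this tutorial. The coupling layer lives in the NFs tutorial, so the example uses two hypothetical toy layers instead, `ToyScale` and `ToyShift`. These do not commute and therefore expose any ordering mistake.

```python
import torch
from torch import nn


class FlowStep(nn.Module):
    """Hypothetical container chaining layers with the (input, logdet, reverse) interface."""

    def __init__(self, *layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)

    def forward(self, input, logdet, reverse=False):
        layers = list(self.layers)
        if reverse:
            # The inverse of f3(f2(f1(x))) applies the inverses in the opposite order
            layers = layers[::-1]
        for layer in layers:
            input, logdet = layer(input, logdet, reverse=reverse)
        return input, logdet


class ToyScale(nn.Module):
    """Hypothetical stand-in layer: multiplies every element by a constant."""

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, input, logdet, reverse=False):
        # Each of the C * H * W elements per sample is scaled by the same factor
        dlogdet = input[0].numel() * torch.log(torch.tensor(abs(self.factor)))
        if not reverse:
            return input * self.factor, logdet + dlogdet
        return input / self.factor, logdet - dlogdet


class ToyShift(nn.Module):
    """Hypothetical stand-in layer: adds a constant (log-determinant 0)."""

    def __init__(self, shift):
        super().__init__()
        self.shift = shift

    def forward(self, input, logdet, reverse=False):
        if not reverse:
            return input + self.shift, logdet
        return input - self.shift, logdet


# Forward computes 2 * x + 1; the reverse pass must undo the shift before the scale
step = FlowStep(ToyScale(2.0), ToyShift(1.0))
x = torch.randn(2, 3, 4, 4)
y, logdet = step(x, torch.zeros(2))
x_rec, logdet_rec = step(y, logdet, reverse=True)
```

With the layers from this tutorial, a Glow step of flow would be `FlowStep(ActNorm(c), InvConv2d(c), coupling)`. Here `coupling` is a coupling layer adapted to the same interface, which is an assumption, since the NFs tutorial's layer may use a different signature.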

### Conclusion

We have learned about an advanced flow layer from the Glow model, the **Invertible 1x1 Convolution**. It turns the standard 1x1 convolution into a learned, invertible generalization of a channel permutation.

### References

* [Glow: Generative Flow with Invertible 1x1 Convolutions](https://arxiv.org/abs/1807.03039)
* [Glow: Better Reversible Generative Models](https://openai.com/blog/glow/)
* [rosinality/glow-pytorch](https://github.com/rosinality/glow-pytorch)
* [Materials from NTU Speech Lab](https://reurl.cc/9O8bka)