# Deep Learning
## Formative assessment
### Week 8: Normalising flows

#### Instructions

In this notebook, you will write code to implement a complete RealNVP normalising flow model, including checkerboard and channel-wise masking, and combining all the components into a multiscale architecture. You will train the normalising flow on the CIFAR-10 dataset.

Some code cells are provided for you in the notebook. You should avoid editing provided code, and make sure to execute the cells in order to avoid unexpected errors. Some cells begin with the line:

`#### GRADED CELL ####`

These cells require you to write your own code to complete them.

#### Let's get started!

We'll start by running some imports, and loading the dataset. 

In [None]:
#### PACKAGE IMPORTS ####

# Run this cell first to import all required packages. Do not make any imports elsewhere in the notebook

import keras
from keras import ops
from keras import Model, Sequential
from keras.layers import Layer, Input, Conv2D, BatchNormalization
from keras.regularizers import l2
import torch
import numpy as np
import matplotlib.pyplot as plt

<img src="figures/cifar10.png" title="CIFAR-10" style="width: 700px;"/> 

#### The CIFAR-10 dataset
In this assignment, you will use the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). This image dataset has 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. 

* A. Krizhevsky (2009), "Learning Multiple Layers of Features from Tiny Images", technical report.

Your goal is to develop a RealNVP normalising flow generative model, trained on this dataset. This assignment will roughly follow the architecture described in the original RealNVP paper:

* Dinh, L., Sohl-Dickstein, J. & Bengio, S. (2017), "Density estimation using Real NVP", in *5th International Conference on Learning Representations (ICLR)*, Toulon, France, April 24-26, 2017.

An important conceptual point to bear in mind throughout this assignment is that we follow the original paper in treating the forward transformation as acting on the input image. This is in contrast to the usual convention for bijectors, where the forward transformation is used for sampling and the inverse transformation for computing log probabilities.

#### Load and preprocess the dataset

In [None]:
# Load the dataset from keras.datasets

(images_train, labels_train), (images_val, labels_val) = keras.datasets.cifar10.load_data()

In [None]:
# Define a list for the labels

word_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

In [None]:
# Display a few images and labels

plt.figure(figsize=(15,8))
inx = np.random.choice(images_train.shape[0], 32, replace=False)
for n, i in enumerate(inx):
    ax = plt.subplot(4, 8, n+1)
    plt.imshow(images_train[i])
    plt.title(word_labels[int(np.squeeze(labels_train[i]))])
    plt.axis('off')

You should now write a `get_datasets` function to load the data into PyTorch DataLoaders, and preprocess the data ready for training.

* The function takes arguments `images_train`, `images_val`, and `batch_size`
* Your function should convert the image dtype to `float32`, and rescale so the pixel values lie in the range `[0, 1]`
* Create two `torch.utils.data.Dataset` objects, one for training and one for validation
  * Only the images should be loaded into the Datasets; the labels will not be used
  * The Datasets should return a tuple `(img, img)` containing two copies of the image
* The corresponding DataLoaders should shuffle the training Dataset and batch both Datasets using the `batch_size` argument
* Your function should return both DataLoaders in a tuple `(train_dl, valid_dl)`

In [None]:
#### GRADED CELL ####

# Complete the following function.
# Make sure not to change the function name or arguments.

def get_datasets(images_train, images_val, batch_size):
    """
    This function takes the training and validation images, as well as a
    batch_size argument. It should load and preprocess the data as specified above.
    Your function should then return the DataLoaders in the tuple (train_dl, valid_dl)
    """
    
    

In [None]:
# Run your function to create the DataLoaders

batch_size = 64
train_dl, valid_dl = get_datasets(images_train, images_val, batch_size)

#### Custom model for log-scale and shift
Recall the equations for the affine coupling layer:

$$
\begin{align}
\left. 
\begin{array}{rcl}
\mathbf{z}_{1:d} &= &\mathbf{x}_{1:d},\\
\mathbf{z}_{d+1:D} &=& \mathbf{x}_{d+1:D}\odot \exp(s(\mathbf{x}_{1:d})) + t(\mathbf{x}_{1:d}).
\end{array}
\right\}\qquad\text{forward pass}
\\[1ex]
\left. 
\begin{array}{rcl}
\mathbf{x}_{1:d} &=& \mathbf{z}_{1:d},\label{realnvp_inv_acl1}\\
\mathbf{x}_{d+1:D} &=& (\mathbf{z}_{d+1:D} - t(\mathbf{z}_{1:d})) \odot \exp(-s(\mathbf{z}_{1:d})).
\end{array}
\right\}\qquad\text{inverse pass}
\end{align}
$$
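
As a quick sanity check of these equations, here is a minimal NumPy sketch on flat vectors. The linear maps `W_s` and `W_t` are hypothetical stand-ins for the networks $s$ and $t$ (they are not the ResNet built later in this notebook), and the sizes `d` and `D` are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 4                          # illustrative split point and dimension
W_s = rng.normal(size=(d, D - d))    # hypothetical weights standing in for s
W_t = rng.normal(size=(d, D - d))    # hypothetical weights standing in for t

def s(x1):
    return np.tanh(x1 @ W_s)         # log-scale as a function of x_{1:d}

def t(x1):
    return x1 @ W_t                  # shift as a function of x_{1:d}

def forward(x):
    x1, x2 = x[:, :d], x[:, d:]
    log_s = s(x1)
    z2 = x2 * np.exp(log_s) + t(x1)
    log_det = log_s.sum(axis=1)      # log |det J| of the forward pass
    return np.concatenate([x1, z2], axis=1), log_det

def inverse(z):
    z1, z2 = z[:, :d], z[:, d:]
    x2 = (z2 - t(z1)) * np.exp(-s(z1))
    return np.concatenate([z1, x2], axis=1)

x = rng.normal(size=(3, D))
z, log_det = forward(x)
x_rec = inverse(z)                   # recovers x up to floating point error
```

The inverse never needs to invert $s$ or $t$ themselves, because $\mathbf{z}_{1:d} = \mathbf{x}_{1:d}$ means both functions can simply be re-evaluated on the same input.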

We will use a custom CNN residual network model to compute the shift and log-scale parameters used in this layer.

You should now complete the following class to create a custom layer as the residual block for use in this custom model. 

* The class initializer takes `num_filters`, `kernel_size` and `l2reg_coeff` arguments, and optional keyword arguments
  * Any keyword arguments should be passed up to the base class initializer
  * The required arguments should be set as class attributes, to be available to other methods
* The class should implement a `build` method, that creates the model layers
  * There should be two `Conv2D` layers. The first has `num_filters` filters, and the second has the same number of filters as the layer inputs
  * Both `Conv2D` layers should use the `kernel_size` argument to set the kernel size, and should use a ReLU activation and `"SAME"` padding
  * Both `Conv2D` layers should also use $l^2$ kernel regularisation, using the `l2reg_coeff` argument
  * This method should also create two `BatchNormalization` layers
* In the `call` method, the layer inputs should be processed as follows:
  * First, the inputs are passed through the first `Conv2D` layer (with `num_filters` filters)
  * Then they are passed through a `BatchNormalization` layer
  * Then they are processed by the other `Conv2D` layer, and then the other `BatchNormalization` layer
  * Finally, this output should then be added to the original layer input and returned

In [None]:
#### GRADED CELL ####

# Complete the following class. 
# Make sure to not change the class name or provided methods and signatures.

class Conv2DResidualBlock(Layer):
    
    def __init__(self, num_filters, kernel_size, l2reg_coeff, **kwargs):
        """
        Class initializer takes num_filters, kernel_size and l2reg_coeff as arguments, and
        optional keyword arguments that should be passed to the base Layer class initializer.
        """
        
        

In [None]:
# Create residual block layers using your class

resnet_block1 = Conv2DResidualBlock(64, (3, 3), 5e-5, name='resnet1')
resnet_block2 = Conv2DResidualBlock(64, (3, 3), 5e-5, name='resnet2')

In [None]:
# Build and call the first residual block

resnet_block1(keras.random.normal((1, 32, 32, 3))).shape

In [None]:
# Build and call the second residual block

resnet_block2(keras.random.normal((1, 32, 32, 3))).shape

You should now complete the following `get_shift_and_log_scale_resnet` function that builds the full shift and log-scale network, using the `Conv2DResidualBlock` class above.

* This function takes `input_shape`, `kernel_size` and `l2reg_coeff` as arguments, as well as `residual_blocks`, which is a list of `Conv2DResidualBlock` objects
* The function should use the functional API to build the multi-output model
* The model should use the `input_shape` in the function argument to set the shape in the `Input` layer
* The inputs should be processed sequentially by the layers contained in the `residual_blocks` list
* There should then be a final `Conv2D` layer that processes this output, using the `kernel_size` argument, and $l^2$ kernel regularization using `l2reg_coeff`
  * This `Conv2D` layer should have twice as many filters as the input has channels
  * It should use `"SAME"` padding and have no activation function
* The output of this layer should then be split into two equal-sized Tensors along the final channel axis. These two Tensors are the shift and log-scale Tensors, and should each have the same shape as the model input
* Finally, you should then apply the `tanh` nonlinearity to the log_scale Tensor
* The outputs to the model should be the list of Tensors `[shift, log_scale]`

_Hint: use_ [`keras.ops.split`](https://keras.io/api/ops/numpy/#split-function) _to create the output Tensors_.
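
The split-then-`tanh` step can be illustrated with NumPy, whose `np.split` takes the same `(x, indices_or_sections, axis)` arguments as `keras.ops.split`. The array `h` below is a random stand-in for the output of the final `Conv2D` layer, assuming a 3-channel input so that the layer has 6 filters.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(1, 6, 6, 6))              # stand-in for the final Conv2D output

# Two equal-sized pieces along the channel axis
shift, log_scale = np.split(h, 2, axis=-1)

# tanh bounds the log-scale to (-1, 1), so exp(log_scale) stays in (1/e, e)
log_scale = np.tanh(log_scale)
```

Bounding the log-scale in this way keeps the scale factor away from zero and from very large values.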

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def get_shift_and_log_scale_resnet(input_shape, kernel_size, l2reg_coeff, residual_blocks):
    """
    This function should build the CNN shift and log-scale ResNet model according to the 
    above specification, using the functional API. The model should be multi-output, where
    the output Tensors are [shift, log_scale].
    Your function should return the model.
    """
    
    

In [None]:
# Create a shift and log-scale model using your function

shift_and_log_scale = get_shift_and_log_scale_resnet((6, 6, 3), (3, 3), 5e-5, 
                                                     [resnet_block1, resnet_block2])

In [None]:
# Print the model summary

shift_and_log_scale.summary()

In [None]:
# Check the output shapes are as expected

print(shift_and_log_scale(keras.random.normal((1, 6, 6, 3)))[0].shape)
print(shift_and_log_scale(keras.random.normal((1, 6, 6, 3)))[1].shape)

#### Binary masks

Now that the shift and log-scale model code is complete, you can use it in the implementation of the affine coupling layer. Recall that the affine coupling layer transformations can be rewritten in the following form, using a binary mask $b$:

$$
\begin{align}
\mathbf{z} &= b\odot \mathbf{x} + (1-b)\odot(\mathbf{x}\odot \exp(s(b\odot \mathbf{x})) + t(b\odot\mathbf{x})) & \text{(forward pass)}\\
\mathbf{x} &= b\odot \mathbf{z} + (1-b)\odot((\mathbf{z}-t(b\odot\mathbf{z}))\odot \exp(-s(b\odot \mathbf{z}))) & \text{(inverse pass)}
\end{align}
$$
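
The masked form is invertible because the entries where $b = 1$ pass through unchanged, so $b\odot\mathbf{z} = b\odot\mathbf{x}$ and the functions $s$ and $t$ receive identical inputs in both directions. A small NumPy sketch of this, using hypothetical linear maps in place of the shift and log-scale network:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6
b = np.array([1., 1., 1., 0., 0., 0.])   # illustrative mask; b = 1 entries pass through
W_s = rng.normal(size=(D, D))            # hypothetical weights standing in for s
W_t = rng.normal(size=(D, D))            # hypothetical weights standing in for t

def s(u):
    return np.tanh(u @ W_s)

def t(u):
    return u @ W_t

def forward(x):
    return b * x + (1 - b) * (x * np.exp(s(b * x)) + t(b * x))

def inverse(z):
    return b * z + (1 - b) * ((z - t(b * z)) * np.exp(-s(b * z)))

x = rng.normal(size=(4, D))
z = forward(x)
x_rec = inverse(z)
```

Note that $s$ and $t$ here output values for every entry, and the factor $(1-b)$ discards the ones that are not needed.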

The following two functions will be used to create the binary masks in this layer.

First, you should complete the following function that builds the channel-wise binary mask. 

* This function takes a single integer `num_channels` as an input
* You can assume that `num_channels` is even
* The function should return a rank-3 Tensor with singleton entries for height and width dimensions
* In the channel axis, the first `num_channels // 2` entries should be zero, and the final `num_channels // 2` entries should be one
* The `dtype` of the returned Tensor should be `float32`
* The function should return the binary mask Tensor

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def channel_binary_mask(num_channels):
    """
    This function should build and return the channel-wise binary mask as described above,
    with zeros for the first half of the channels, and ones for the remainder.
    Your function should return the binary mask Tensor.
    """
    
    

In [None]:
# Run your function to see an example channel-wise binary mask

channel_binary_mask(6)

The following function creates the checkerboard binary mask.

* The function takes `shape` as an input, which is an integer tuple of length 2, corresponding to the height and width dimensions
* You can assume both height and width dimensions are even integers
* The function should return a rank-3 Tensor with a singleton entry in the channel dimension
* In the spatial dimensions, the entry at index `[0, 0]` should be zero. The remaining entries should be filled with ones and zeros in a checkerboard pattern
* The `dtype` of the returned Tensor should be `float32`
* The function should return the binary mask Tensor
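
One way to picture the checkerboard pattern is as the parity of the row index plus the column index. The NumPy sketch below illustrates that idea only; the helper name `checkerboard_np` is hypothetical, and the graded function should return a Keras-compatible Tensor rather than a NumPy array.

```python
import numpy as np

def checkerboard_np(shape):
    """Illustrative NumPy checkerboard with a zero at spatial index [0, 0]."""
    height, width = shape
    rows = np.arange(height)[:, None]           # column vector of row indices
    cols = np.arange(width)[None, :]            # row vector of column indices
    mask = (rows + cols) % 2                    # parity alternates in both directions
    return mask.astype(np.float32)[..., None]   # add a singleton channel axis

mask = checkerboard_np((4, 4))
alt_mask = 1.0 - mask                           # the complementary checkerboard
```

The complementary mask `1 - mask` is what successive coupling layers alternate with, so that every entry is transformed by at least one layer.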

In [None]:
#### GRADED CELL ####

# Complete the following function. 
# Make sure to not change the function name or arguments.

def checkerboard_binary_mask(shape):
    """
    This function should build and return the spatial checkerboard binary mask as 
    described above, with a zero in the [0, 0] entry in the spatial dimensions.
    Your function should return the binary mask Tensor.
    """
    
    

In [None]:
# Run your function to see an example checkerboard binary mask

ops.squeeze(checkerboard_binary_mask((6, 6)))

#### Affine coupling layer

The following is the same class that we implemented in this week's coding tutorial (with the minor modification of adding the `training` keyword argument for use with the `BatchNormalization` layers). It will work with either of the binary masks above.

In [None]:
class AffineCouplingLayer(Layer):

    def __init__(self, shift_and_log_scale_fn, mask, **kwargs):
        super().__init__(**kwargs)
        self.shift_and_log_scale_fn = shift_and_log_scale_fn
        self.b = ops.cast(mask, 'float32')

    def build(self, input_shape):
        self.event_dims = list(range(1, len(input_shape)))

    def call(self, x, training=None):
        t, log_s = self.shift_and_log_scale_fn(self.b * x, training=training)
        z = self.b * x + (1 - self.b) * (x * ops.exp(log_s) + t) 
        
        self.add_loss(-ops.mean(ops.sum(log_s * (1 - self.b), axis=self.event_dims)))
        return z

    def inverse(self, z, training=None):
        t, log_s = self.shift_and_log_scale_fn(self.b * z, training=training)
        x = self.b * z + (1 - self.b) * ((z - t) * ops.exp(-log_s))
        return x

In [None]:
# Create an affine coupling layer with a checkerboard mask

mask = checkerboard_binary_mask((6, 6))
affine_coupling_layer = AffineCouplingLayer(shift_and_log_scale, mask)

In [None]:
# View an example layer output - look at only one channel dimension for easier viewing

output = affine_coupling_layer(ops.ones((1, 6, 6, 3)))
print(ops.squeeze(output)[...,0])

In the above example, you should find that the unmasked elements (entries where the mask is equal to one) are unchanged.

#### Combining the affine coupling layers

Recall that the affine coupling layers are combined into groups of 3 or 4 in the RealNVP architecture. Within each group, successive affine coupling layers are applied with alternating masks.

You should now complete the following `AffineCouplingLayerBlock` custom layer to build one of these blocks. 

* The class initialiser takes `shift_and_log_scale_fns` and `mask` as arguments
* The `shift_and_log_scale_fns` is a list or tuple of shift and log scale objects (as used in the `AffineCouplingLayer` above)
* The `mask` argument is either a channel-wise or checkerboard mask Tensor
* The layer should consist of successive `AffineCouplingLayer` objects chained together
  * The first affine coupling layer should use the mask passed in the `mask` argument
  * Following affine coupling layers should use alternating masks (of the same type)
* As well as implementing the `call` method, your layer should also implement an `inverse` method that computes the inverse transformation through the affine coupling layers
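
The chaining logic can be sketched independently of Keras. In the NumPy example below, `make_coupling` is a hypothetical helper that builds one toy masked coupling layer from random linear maps; the point is only that the masks alternate between `mask` and `1 - mask`, and that the inverse applies the layers in reverse order.

```python
import numpy as np

D = 4

def make_coupling(mask, seed):
    """Return (forward, inverse) functions for one toy masked affine coupling layer."""
    r = np.random.default_rng(seed)
    W_s, W_t = r.normal(size=(D, D)), r.normal(size=(D, D))  # hypothetical weights

    def s(u):
        return np.tanh(u @ W_s)

    def t(u):
        return u @ W_t

    def fwd(x):
        return mask * x + (1 - mask) * (x * np.exp(s(mask * x)) + t(mask * x))

    def inv(z):
        return mask * z + (1 - mask) * ((z - t(mask * z)) * np.exp(-s(mask * z)))

    return fwd, inv

mask = np.array([0., 1., 0., 1.])
masks = [mask, 1 - mask, mask]                   # alternating masks of the same type
layers = [make_coupling(m, seed) for seed, m in enumerate(masks)]

def block_forward(x):
    for fwd, _ in layers:
        x = fwd(x)
    return x

def block_inverse(z):
    for _, inv in reversed(layers):              # undo the last layer first
        z = inv(z)
    return z

x = np.random.default_rng(2).normal(size=(5, D))
z = block_forward(x)
x_rec = block_inverse(z)
```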

In [None]:
#### GRADED CELL ####

# Complete the following class. 
# Make sure to not change the class name or provided methods and signatures.

class AffineCouplingLayerBlock(Layer):

    def __init__(self, shift_and_log_scale_fns, mask, **kwargs):
        """
        Class initializer takes shift_and_log_scale_fns and mask as arguments, and 
        optional keyword arguments that should be passed to the base Layer class initializer.
        """
        
        

In [None]:
# Create an affine coupling layer block

resnet_block1 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block2 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_1 = get_shift_and_log_scale_resnet((32, 32, 3), (3, 3), 5e-5, 
                                                     [resnet_block1, resnet_block2])

resnet_block3 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block4 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_2 = get_shift_and_log_scale_resnet((32, 32, 3), (3, 3), 5e-5, 
                                                     [resnet_block3, resnet_block4])

resnet_block5 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block6 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_3 = get_shift_and_log_scale_resnet((32, 32, 3), (3, 3), 5e-5, 
                                                     [resnet_block5, resnet_block6])

mask = checkerboard_binary_mask((32, 32))

acl_block_1 = AffineCouplingLayerBlock([shift_and_log_scale_1, shift_and_log_scale_2, shift_and_log_scale_3], mask)

#### The squeeze operation

In the RealNVP architecture, after an affine coupling layer block with checkerboard masking (as above), there is a squeeze operation, where the spatial dimensions of the layer are divided into $2\times 2\times c$ subsquares, and reshaped into $1\times 1\times 4c$.

The squeezing operation is also a bijective operation. The `call` method has been completed for you in the custom `Squeeze` layer below. Note that the log Jacobian determinant of the squeeze operation is zero, so there is no contribution from this layer to the negative log-likelihood loss.
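
To see what the provided `call` method does to individual pixels, here is a NumPy transcription of the same reshape and transpose steps applied to a tiny single-channel input. The function name `squeeze_np` is an illustrative choice.

```python
import numpy as np

def squeeze_np(x):
    """NumPy transcription of the reshape/transpose steps in Squeeze.call."""
    b, h, w, c = x.shape
    y = x.reshape(b, h // 2, 2, w // 2, 2, c)
    y = y.transpose(0, 1, 3, 2, 4, 5)
    return y.reshape(b, h // 2, w // 2, 4 * c)

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
z = squeeze_np(x)

# The top-left 2x2 block [[0, 1], [4, 5]] becomes the 4 channels at position [0, 0]
print(z[0, 0, 0])   # [0. 1. 4. 5.]
```

The output contains exactly the same values as the input in a different order. The Jacobian is therefore a permutation matrix, whose determinant has absolute value one, which is why the log Jacobian determinant is zero.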

You should now complete the `inverse` method in the custom layer below. As with the `AffineCouplingLayer` class, this method should compute the inverse of the `call` computation.

In [None]:
#### GRADED CELL ####

# Complete the following class. 
# Make sure to not change the class name or provided methods and signatures.

class Squeeze(Layer):
    
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, x):
        input_shape = ops.shape(x)
        batch_size, height, width, channels = input_shape
        h = ops.reshape(x, (batch_size, height // 2, 2, width // 2, 2, channels))
        h = ops.transpose(h, axes=[0, 1, 3, 2, 4, 5])
        z = ops.reshape(h, (batch_size, height // 2, width // 2, 4 * channels))
        return z

    def inverse(self, z):
        """
        This method should compute the inverse of the call method,
        as described above.
        """
        
        

In [None]:
# Test the Squeeze bijector

squeeze = Squeeze()
squeeze(ops.ones((10, 32, 32, 3))).shape

In [None]:
# Test the inverse operation

squeeze.inverse(ops.ones((10, 4, 4, 96))).shape

#### Multiscale architecture

You are now ready to bring all of the components together in the complete multiscale architecture. The RealNVP model that we will build will factor out latent variables to downscale the input only once. This model is visualised in the following diagram:

<img src="figures/realnvp_model.png" alt="RealNVP model" style="width: 800px;"/>

We have already instantiated the first of these affine coupling layer blocks above. The following two cells instantiate the remaining two blocks.

In [None]:
# Create the second affine coupling layer block

resnet_block7 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block8 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_4 = get_shift_and_log_scale_resnet((16, 16, 12), (3, 3), 5e-5, 
                                                     [resnet_block7, resnet_block8])

resnet_block9 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block10 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_5 = get_shift_and_log_scale_resnet((16, 16, 12), (3, 3), 5e-5, 
                                                     [resnet_block9, resnet_block10])

resnet_block11 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block12 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_6 = get_shift_and_log_scale_resnet((16, 16, 12), (3, 3), 5e-5, 
                                                     [resnet_block11, resnet_block12])

mask = channel_binary_mask(12)

acl_block_2 = AffineCouplingLayerBlock([shift_and_log_scale_4, shift_and_log_scale_5, shift_and_log_scale_6], mask)

In [None]:
# Create the third affine coupling layer block

resnet_block13 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block14 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_7 = get_shift_and_log_scale_resnet((16, 16, 6), (3, 3), 5e-5, 
                                                     [resnet_block13, resnet_block14])

resnet_block15 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block16 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_8 = get_shift_and_log_scale_resnet((16, 16, 6), (3, 3), 5e-5, 
                                                     [resnet_block15, resnet_block16])

resnet_block17 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block18 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_9 = get_shift_and_log_scale_resnet((16, 16, 6), (3, 3), 5e-5, 
                                                     [resnet_block17, resnet_block18])

resnet_block19 = Conv2DResidualBlock(64, (3, 3), 5e-5)
resnet_block20 = Conv2DResidualBlock(64, (3, 3), 5e-5)
shift_and_log_scale_10 = get_shift_and_log_scale_resnet((16, 16, 6), (3, 3), 5e-5, 
                                                        [resnet_block19, resnet_block20])

mask = checkerboard_binary_mask((16, 16))

acl_block_3 = AffineCouplingLayerBlock([shift_and_log_scale_7, shift_and_log_scale_8, 
                                        shift_and_log_scale_9, shift_and_log_scale_10],
                                       mask)

You should now implement the multiscale architecture in the following custom layer. This layer should be constructed to be able to operate on a batch of Tensors of shape `(B, H, W, C)`, with the only assumption being that `H` and `W` are both even.

* The initializer takes the three affine coupling layer blocks that are part of the above architecture:
  * `acl_block_1` is a block with 3 affine coupling layers with checkerboard masking
  * `acl_block_2` is a block with 3 affine coupling layers with channel masking
  * `acl_block_3` is a block with 4 affine coupling layers with checkerboard masking
* You should implement the `call` and `inverse` methods
* The forward transformation should operate on the inputs as depicted above:
  * It should pass the inputs through the first ACL block, a `Squeeze` operation, and then the second ACL block
  * It should then split the Tensor in half along the channel axis
  * The first half of the channel dimensions should be used as latent variables. Call this $\mathbf{z}^{(1)}$
  * The second half of the channel dimensions (call this $\mathbf{h}^{(1)}$) should be further processed through the third ACL block to produce $\mathbf{z}^{(2)}$
  * The final latent variable $\mathbf{z} = (\mathbf{z}^{(1)}, \mathbf{z}^{(2)})$ should be concatenated along the channel dimension. This should be returned by the `call` method
* The `inverse` method should perform precisely the inverse of the `call` method
  
_Hint: use_ `keras.ops.split` _and_ `keras.ops.concatenate` _to factor out (and recombine) latent variables in the forward and inverse passes._
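
It can help to trace the tensor shapes through the forward transformation before writing the layer. The NumPy sketch below does only that: `identity` is a placeholder standing in for each ACL block (which preserves shape), and `squeeze_np` is an illustrative NumPy version of the `Squeeze` forward pass.

```python
import numpy as np

def squeeze_np(x):
    b, h, w, c = x.shape
    y = x.reshape(b, h // 2, 2, w // 2, 2, c)
    y = y.transpose(0, 1, 3, 2, 4, 5)
    return y.reshape(b, h // 2, w // 2, 4 * c)

def identity(h):
    return h                                # placeholder for a shape-preserving ACL block

x = np.zeros((2, 32, 32, 3), dtype=np.float32)
h = identity(x)                             # ACL block 1: (2, 32, 32, 3)
h = squeeze_np(h)                           # squeeze:     (2, 16, 16, 12)
h = identity(h)                             # ACL block 2: (2, 16, 16, 12)
z1, h1 = np.split(h, 2, axis=-1)            # each half:   (2, 16, 16, 6)
z2 = identity(h1)                           # ACL block 3: (2, 16, 16, 6)
z = np.concatenate([z1, z2], axis=-1)       # latent:      (2, 16, 16, 12)
```

This is consistent with the shapes used when the blocks were instantiated above: `(32, 32, 3)` for the first block, `(16, 16, 12)` for the second and `(16, 16, 6)` for the third.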

In [None]:
#### GRADED CELL ####

# Complete the following class. 
# Make sure to not change the class name or provided methods and signatures.

class RealNVPMultiScale(Layer):
    
    def __init__(self, acl_block_1, acl_block_2, acl_block_3, **kwargs):
        """
        The initializer takes three instances of the AffineCouplingLayerBlock class.
        """
        
        

    def call(self, x, training=None):
        """
        This function computes the forward transformation as described above.
        It takes an input image batch x, and returns a latent variable batch z.
        """
        
        
        
    def inverse(self, z, training=None):
        """
        This function computes the inverse transformation as described above.
        It takes a latent variable batch z, and returns an input image batch x.
        """
        
        

In [None]:
# Build the RealNVP model

realnvp = RealNVPMultiScale(acl_block_1, acl_block_2, acl_block_3)

#### Data preprocessing layer

We will also preprocess the image data before sending it through the RealNVP model. To do this, for a Tensor $\mathbf{x}$ of pixel values in $[0, 1]^D$, we transform $\mathbf{x}$ according to the following (all operations performed elementwise):

$$
T(\mathbf{x}) = \text{logit}\left(\alpha + (1 - 2\alpha)\mathbf{x}\right),
$$

where $\alpha$ is a parameter, and the logit function is the inverse of the sigmoid function, which is given by 

$$
\text{logit}(p) = \log (p) - \log (1 - p).
$$

You should now complete the following class to construct this invertible layer.

* The initializer takes the parameter `alpha` as an input, which you can assume to take a small positive value ($\ll0.5$)
* The layer's `call` method should compute $T(\mathbf{x})$ in the forward pass
* You should also implement an `inverse` method, which computes the inverse of the `call` method
* This layer is a preprocessing step and not part of the normalising flow itself, so the layer should not add a log Jacobian determinant loss contribution
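
Since the logit is the inverse of the sigmoid, the inverse of $T$ is $T^{-1}(\mathbf{y}) = (\sigma(\mathbf{y}) - \alpha) / (1 - 2\alpha)$, where $\sigma$ is the sigmoid function. A NumPy sketch of the pair, with illustrative function names and `alpha = 0.05`:

```python
import numpy as np

alpha = 0.05

def preprocess_np(x):
    p = alpha + (1 - 2 * alpha) * x          # maps [0, 1] into [alpha, 1 - alpha]
    return np.log(p) - np.log(1 - p)         # logit(p)

def preprocess_inverse_np(y):
    p = 1.0 / (1.0 + np.exp(-y))             # the sigmoid inverts the logit
    return (p - alpha) / (1 - 2 * alpha)

x = np.linspace(0.0, 1.0, 5)                 # includes the endpoints 0 and 1
y = preprocess_np(x)
x_rec = preprocess_inverse_np(y)
```

The role of $\alpha$ is visible at the endpoints: without it, pixel values of exactly 0 or 1 would map to $\mp\infty$ under the logit.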

In [None]:
#### GRADED CELL ####

# Complete the following class. 
# Make sure to not change the class name.

class PreProcessor(Layer):
    """
    Extra data preprocessing step implemented as a custom Layer. The layer
    should implement the __init__, call and inverse methods as described above.
    """

    

In [None]:
# Create an instance of the preprocess bijector

preprocess = PreProcessor(alpha=0.05)

#### Define and train the RealNVP model

The following is the same custom loss function that we implemented in this week's coding tutorial. We will make use of it again to implement the latent space distribution loss term.

In [None]:
# Define the custom loss function

def normal_pdf_loss(y_true, y_pred):
    event_dims = list(range(1, len(y_pred.shape)))
    const = 0.5 * ops.log(2. * np.pi)
    return ops.sum(const + ops.square(y_pred)/2., axis=event_dims)
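
This loss is the negative log-density of a standard normal distribution, summed over the event dimensions. Together with the log Jacobian determinant terms that the affine coupling layers add via `add_loss`, it gives the negative log-likelihood from the change of variables formula, $-\log p_X(\mathbf{x}) = -\log p_Z(f(\mathbf{x})) - \log\left|\det J_f(\mathbf{x})\right|$. The NumPy sketch below (the name `normal_nll_np` is illustrative) mirrors the function above and compares it with the density written out explicitly.

```python
import math
import numpy as np

def normal_nll_np(z):
    """NumPy mirror of normal_pdf_loss, ignoring the unused y_true argument."""
    event_dims = tuple(range(1, z.ndim))
    const = 0.5 * math.log(2.0 * math.pi)
    return np.sum(const + np.square(z) / 2.0, axis=event_dims)

rng = np.random.default_rng(0)
z = rng.normal(size=(2, 16, 16, 12))
nll = normal_nll_np(z)                       # one value per example in the batch

# Explicit standard normal density for comparison
pdf = np.exp(-np.square(z) / 2.0) / math.sqrt(2.0 * math.pi)
nll_explicit = -np.log(pdf).sum(axis=(1, 2, 3))
```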

In [None]:
# Define the model for training

inputs = Input(shape=(32, 32, 3))
h = preprocess(inputs)
outputs = realnvp(h)
realnvp_model = Model(inputs=inputs, outputs=outputs)

In [None]:
# Compile and train the model

optimizer = keras.optimizers.Adam()
realnvp_model.compile(loss=normal_pdf_loss, optimizer=optimizer)
realnvp_model.fit(train_dl, validation_data=valid_dl, epochs=20)

#### Generate some samples

In [None]:
# Sample from the model

n_images = 12

h = keras.random.normal((n_images, 16, 16, 12))
for layer in reversed(realnvp_model.layers[1:]):
    h = layer.inverse(h)
    
samples = ops.convert_to_numpy(h)

In [None]:
# Display the samples

f, axs = plt.subplots(2, n_images // 2, figsize=(17, 6))
for k, image in enumerate(samples):
    i = k % 2
    j = k // 2
    axs[i, j].imshow(np.clip(image, 0., 1.))
    axs[i, j].axis('off')
f.subplots_adjust(wspace=0.03, hspace=0.03)

Congratulations on completing this week's assignment! In this assignment you have developed a full implementation of the RealNVP architecture, including the affine coupling layers with channel-wise and checkerboard masking, CNN ResNet networks for the shift and log scale functions, the squeeze operation and multiscale architecture. For optimal performance, the model should be larger and trained for longer. The architecture in the original paper also contains some additional features.