## Conditional GANs

Generative Adversarial Networks (GANs) allow us to generate image, video, or audio data from a random input. Typically, the random input is sampled from a normal distribution before undergoing a series of transformations that convert it into something plausible (image, video, audio, etc.).

However, a simple DCGAN does not allow us to control the appearance (e.g., the class) of the samples we are generating. For instance, with a GAN that generates handwritten digits, a simple DCGAN would not let us choose the class of the digits being generated. To control what we generate, we need to condition the GAN's output on a semantic input, such as the class of an image.

Here, we will build a Conditional GAN that can generate handwritten digits conditioned on a specific class.

The applications of GANs are highly varied, given the wide range of data types they can handle. Essentially, in a GAN, all we need is a "real" dataset and the ability to approximate a function that allows us to generate new instances (via the generator network) capable of "fooling" a network that tries to distinguish between real and fake instances (the discriminator network). On this basis, for a wide variety of data types, GANs can help us generate sufficiently similar instances of those data types.

Here are some applications of GANs:

1. Image Generation: Imagine we have a dataset of images we want to use for training a model, but the number of available images is limited. We can augment the dataset using GANs to create new instances for each class in the original dataset.

2. Image Upscaling: GANs can be trained to enhance the resolution of an image. For this, we would need a dataset of paired low-resolution and high-resolution images to train the GAN.

3. "Vector Operations": Similar to how word embeddings (e.g., word2vec) have semantically meaningful directions (e.g., the vector for "queen" can be approximated from the vectors for "king," "man," and "woman"), arithmetic on a GAN's latent vectors can produce images that result from "operations" on other images. For example, aging a face.

4. Text-to-Image: As another variant of image-related applications, GANs (like GigaGAN) can be used to generate images from text descriptions. However, models based on diffusion, such as those powering Midjourney or DALL-E, are often preferred for this task.

5. Other Data Types: These examples are just a glimpse of the data types that can be generated using GANs. In principle, GANs can also generate other types of data, such as videos, audio, or text. Many machine learning models require large volumes of data to generalize well (a concept we've seen repeatedly in this course). GANs can help generate new data instances to feed these models. Here are a few examples:

   - Fraud Detection: Fraudulent transactions are, fortunately, rare (for example, among online transactions), and this scarcity makes training fraud detection models challenging. If we have 10 million transactions for training and only 0.01% (1,000) are fraudulent, the class imbalance is severe, making it difficult to avoid overfitting to the few fraudulent examples. GANs can be used to increase the amount of fraudulent data, thereby addressing class imbalance in the dataset used to train the fraud detection model.

   - Anomaly Detection in Industry: Detecting defective parts can be crucial in industries (e.g., for safety reasons). Just as it can be difficult to obtain fraudulent instances, acquiring anomalous instances in industrial settings can also be challenging. To train better anomaly detection models, GANs can be used to augment the data with anomalous instances.
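
To make the class imbalance in the fraud example concrete, the short calculation below counts the fraudulent examples and shows the accuracy reached by a trivial classifier that labels every transaction as legitimate. It is only a sketch based on the figures quoted above; the variable names are illustrative.

```python
# Figures from the fraud detection example: 10 million transactions, 0.01% fraudulent.
n_transactions = 10_000_000
n_fraud = n_transactions // 10_000  # 0.01% is 1 in 10,000
n_legit = n_transactions - n_fraud

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = n_legit / n_transactions

print(f"fraudulent: {n_fraud:,}, legitimate: {n_legit:,}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```

Plain accuracy is therefore a misleading metric in this setting, which is one reason to rebalance the training set, for instance with synthetic minority examples.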

As we can see, most applications of GANs revolve around data augmentation.

In [1]:
!pip install -q git+https://github.com/tensorflow/docs

  Preparing metadata (setup.py) ... done
  Building wheel for tensorflow-docs (setup.py) ... done


## 0. Libraries

In [2]:
import keras

from keras import layers
from keras import ops
from tensorflow_docs.vis import embed
import tensorflow as tf
import numpy as np
import imageio
from sklearn.preprocessing import OneHotEncoder
import matplotlib.pyplot as plt

## 0.1. Definition of constants and hyperparameters

In [3]:
batch_size = 64
num_channels = 1
num_classes = 10
image_size = 28
latent_dim = 128

**About latent_dim**: Each 28x28 grayscale image can be seen as a vector with 784 dimensions (one per pixel). However, images can also be represented in a space with fewer dimensions, known as the latent space. Each point in the latent space maps to a point in image space. Thus, given a point (i.e., a vector) from the latent space, the generator can produce an image. The value of `latent_dim` is simply the number of dimensions of the latent space, in our case 128. This 128-dimensional space is much more compact than the 784-dimensional space needed to represent each "complete" image.
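
As a rough illustration of the sizes involved, the sketch below samples a batch of latent vectors from a standard normal distribution and compares the latent dimensionality with the number of pixel values in one image. It uses NumPy only and repeats the constants from the cell above; the fixed seed is an arbitrary choice for reproducibility.

```python
import numpy as np

# Constants repeated from the hyperparameter cell.
batch_size = 64
latent_dim = 128
image_size = 28
num_channels = 1

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# One latent vector per image we want to generate.
latent_vectors = rng.standard_normal((batch_size, latent_dim)).astype("float32")

# Number of values needed to store one full image.
pixels_per_image = image_size * image_size * num_channels

print(latent_vectors.shape)  # (64, 128)
print(pixels_per_image)      # 784
```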


## 1. Dataset loading and preprocessing

We will use the dataset from: https://www.kaggle.com/datasets/jordidelatorreuoc/handwritten-digits-with-writer-characteristics/data

We will fetch the 28x28 version, analyze the contents of the dataset, identify the variables of interest (images in 28x28 format and labels), and load them into a TensorFlow dataset. The pixel values must be normalized to the range [0,1]. The classes should be one-hot encoded, e.g., class 9 as [0,0,0,0,0,0,0,0,0,1], and the images must have a trailing channel dimension so that they can be processed later with TensorFlow.

The expected output dimensions are:

- Shape of images: (13580, 28, 28, 1)
- Shape of labels: (13580, 10)

The output of the process should be a `dataset` variable of type `tf.data.Dataset`.

The dataset contains six files, three of which provide information about the dataset. The file `Images(28x28).npy` contains the images in 28x28 pixel format, while the file `Images(500x500).npy` contains the images in 500x500 pixel format. Finally, the file `WriterInfo.npy` contains the labels (in the first column) as well as information about the person who wrote each number (which is actually the unique aspect of this dataset).
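
Before running the preprocessing on the real files, the steps described above (normalize, add a channel dimension, one-hot encode) can be sketched on a small synthetic stand-in. The arrays `fake_images` and `fake_labels` are hypothetical and are not part of the dataset. The sketch also uses `np.eye` for the one-hot encoding instead of scikit-learn's `OneHotEncoder`, which only works when the labels are integers from 0 to 9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 5 fake 28x28 images with uint8 pixel values.
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([9, 0, 3, 3, 7])

# Normalize to [0, 1] and add a trailing channel dimension.
x = fake_images.astype("float32") / 255.0
x = x.reshape(-1, 28, 28, 1)

# One-hot encode: row i of the identity matrix is the encoding of class i.
y = np.eye(10, dtype="float32")[fake_labels]

print(x.shape, y.shape)  # (5, 28, 28, 1) (5, 10)
print(y[0])              # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
```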

In [5]:
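# Load the images, normalize the values to [0, 1], and add a trailing channel dimension.
images = np.load('/kaggle/input/handwritten-digits-with-writer-characteristics/HDW+/Images(28x28).npy')
# Cast to float32 (the default float type in Keras/TensorFlow) before scaling.
images = images.astype("float32") / 255.0
# -1 lets NumPy infer the number of images (13580 for this dataset).
images = images.reshape(-1, image_size, image_size, num_channels)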

In [6]:
# Load labels
labels = np.load('/kaggle/input/handwritten-digits-with-writer-characteristics/HDW+/WriterInfo.npy')[:,0]

In [7]:
# Show the number of instances per digit
unique, counts = np.unique(labels, return_counts=True)
print(np.asarray((unique, counts)).T)

[[   0 1358]
 [   1 1358]
 [   2 1358]
 [   3 1358]
 [   4 1358]
 [   5 1358]
 [   6 1358]
 [   7 1358]
 [   8 1358]
 [   9 1358]]


In [8]:
# OneHot encoding
ohe = OneHotEncoder(sparse_output=False)
labels_onehot = ohe.fit_transform(labels.reshape(-1, 1))

In [9]:
# Show an example
print(labels[1000])
print(labels_onehot[1000])

2
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]


In [11]:
# Check that the shapes match the expected output
print(f"Shape of images: {images.shape}")
print(f"Shape of labels: {labels_onehot.shape}")

Shape of images: (13580, 28, 28, 1)
Shape of labels: (13580, 10)


In [12]:
dataset = tf.data.Dataset.from_tensor_slices((images, labels_onehot))
dataset = dataset.batch(batch_size)
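
As a quick sanity check on the batching, the calculation below (a sketch that repeats the sample count and batch size used above) works out how many batches one pass over the dataset yields. `Dataset.batch` keeps the final partial batch by default (`drop_remainder=False`), so the last batch is smaller than `batch_size`.

```python
import math

n_samples = 13580
batch_size = 64

# Number of batches per epoch, counting the final partial batch.
n_batches = math.ceil(n_samples / batch_size)
# Size of that final batch.
last_batch_size = n_samples - (n_batches - 1) * batch_size

print(n_batches)        # 213
print(last_batch_size)  # 12
```

Any training code that builds label or noise tensors should therefore read the batch size from the incoming batch rather than assume it is always 64.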

## 2. Number of input channels for the generator and the discriminator

In a regular (non-conditional) GAN, we start by sampling noise (of a fixed dimension) from a normal distribution. In our case, we also need to account for class labels: the one-hot class vector is appended to the generator's input (the noise vector) and to the discriminator's input (the real or generated image).

In [13]:
generator_in_channels = latent_dim + num_classes
discriminator_in_channels = num_channels + num_classes
print(generator_in_channels, discriminator_in_channels)

138 11


`generator_in_channels`: Specifies the number of dimensions the generator's input will have. This should correspond to the dimensions of the latent space (from which random vectors are drawn to feed the generator) plus the dimensions of the vector indicating the class to which each image belongs.

`discriminator_in_channels`: Specifies the number of channels the discriminator's input will have. This should correspond to the number of channels in each image (in our case, only one, since these are grayscale images), plus, again, the dimensions of the vector indicating the class to which each image belongs. This is because it is not enough for the generator to simply learn to create plausible digits; since this is a conditional GAN, we want it to generate the specific digit it was asked for, and the discriminator can only enforce this if it sees the label alongside the image.
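
The sketch below shows one common way to build inputs with these channel counts, using NumPy and random stand-in data. This is an assumption for illustration: the notebook's own training code may combine the tensors differently. For the generator, the one-hot label is concatenated to the noise vector. For the discriminator, the label is repeated at every pixel position and appended as extra channels.

```python
import numpy as np

batch_size, latent_dim, num_classes = 64, 128, 10
image_size, num_channels = 28, 1

rng = np.random.default_rng(0)

# Random stand-ins for noise, images, and labels (illustrative only).
noise = rng.standard_normal((batch_size, latent_dim)).astype("float32")
images = rng.random((batch_size, image_size, image_size, num_channels), dtype=np.float32)
class_ids = rng.integers(0, num_classes, size=batch_size)
one_hot = np.eye(num_classes, dtype="float32")[class_ids]

# Generator input: noise and label side by side.
generator_input = np.concatenate([noise, one_hot], axis=1)

# Discriminator input: repeat the label at every pixel as extra channels.
label_maps = np.broadcast_to(
    one_hot[:, None, None, :],
    (batch_size, image_size, image_size, num_classes),
)
discriminator_input = np.concatenate([images, label_maps], axis=-1)

print(generator_input.shape)      # (64, 138)
print(discriminator_input.shape)  # (64, 28, 28, 11)
```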