# License

Copyright 2020 Hamaad Musharaf Shah

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# Automatic feature engineering using Generative Adversarial Networks
## Author: Hamaad Shah

---

The purpose of deep learning is to learn a representation of high-dimensional and noisy data using a sequence of differentiable functions, i.e., geometric transformations, that can then be used for supervised learning tasks, among others. It has had great success in discriminative models, while generative models have fared worse due to the limitations of explicit maximum likelihood estimation (MLE). Adversarial learning, as presented in the Generative Adversarial Network (GAN), aims to overcome these problems by using implicit MLE.

We will use the MNIST computer vision dataset for these experiments. A GAN learns in a remarkably different way from explicit MLE. Our purpose is to show that the representation learnt by a GAN in an unsupervised manner can be used for supervised learning tasks. Unlabelled data is inexpensive to obtain in large quantities. Training a feature extractor in an unsupervised manner is therefore a powerful first step towards later training supervised learning models, which may not have access to a large amount of labelled data.

In [None]:
import inspect

import numpy as np

np.set_printoptions(suppress=True)

import pandas as pd

import tensorflow as tf

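gpu_devices = tf.config.list_physical_devices(device_type="GPU")
# Enable memory growth only when a GPU is present; indexing an empty list raises IndexError.
if gpu_devices:
    tf.config.experimental.set_memory_growth(device=gpu_devices[0], enable=True)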

from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

from scipy.stats import norm

import matplotlib.pyplot as plt

%matplotlib inline
from PIL import Image
import plotnine

print("TensorFlow version:", tf.__version__)
print("GPU:", gpu_devices)

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = (
    x_train.astype(dtype="float32") / 255.0,
    x_test.astype(dtype="float32") / 255.0,
)

y_train = y_train.ravel()
y_test = y_test.ravel()

class_per_label_size = 100
sampled_class_ids = np.concatenate(
    [
        [
            np.random.choice(
                a=np.arange(start=0, stop=y_train.shape[0], step=1)[
                    y_train == class_label
                ],
                size=class_per_label_size,
                replace=False,
            )
        ]
        for class_label in np.unique(ar=y_train)
    ],
    axis=0,
)
sampled_x_train = np.concatenate(
    [x_train[these_class_ids, :, :] for these_class_ids in sampled_class_ids], axis=0
)
sampled_y_train = np.concatenate(
    [y_train[these_class_ids] for these_class_ids in sampled_class_ids], axis=0
)

## Generative Adversarial Network

---

There are 2 main components to a GAN, the generator and the discriminator, that play an adversarial game against each other. In doing so the generator learns how to create realistic synthetic samples from noise, i.e., the latent space $z$, while the discriminator learns how to distinguish between a real sample and a synthetic sample. 

The representation learnt by the discriminator can later on be used for other supervised learning tasks, i.e., automatic feature engineering or representation learning. This can also be viewed through the lens of transfer learning. A GAN can also be used for semi-supervised learning which we will get to in another paper where we will look into using variational autoencoders, ladder networks and adversarial autoencoders for this purpose.

### Computer Vision

---

We will use the MNIST dataset for this purpose where the raw data is a 2 dimensional tensor of pixel intensities per image. The image is our unit of analysis: We will predict the probability of each class for each image. This is a multiclass classification task and we will use the accuracy score to assess model performance on the test fold.

![](pixel_lattice.png)

One example of handcrafted feature engineering for this computer vision task would be to use Gabor filters.
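To make the idea of a handcrafted feature concrete, the sketch below builds a single real-valued Gabor kernel with NumPy and applies it to one image patch. The function name `gabor_kernel`, the parameter values (`size`, `sigma`, `theta`, `wavelength`) and the striped patch are assumptions chosen for illustration; they are not part of the experiments in this notebook.

```python
import numpy as np


def gabor_kernel(size=7, sigma=2.0, theta=0.0, wavelength=4.0):
    """Real part of a Gabor filter: a Gaussian envelope times a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half : half + 1, -half : half + 1]
    # Rotate coordinates so the cosine wave runs along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + y_rot**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier


kernel = gabor_kernel()
# Hypothetical 7x7 patch containing vertical stripes with a period of 4 pixels.
patch = np.tile(np.cos(2.0 * np.pi * np.arange(-3, 4) / 4.0), (7, 1))
# One handcrafted feature value: the filter response at the patch centre.
response = float(np.sum(kernel * patch))
```

A bank of such kernels at several orientations and wavelengths would give a fixed, hand-designed feature vector per image, in contrast to the learnt representation studied here.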

### Generator

---

Assume that we have a prior belief on where the latent space $z$ lies: $p(z)$. Given a draw from this latent space the generator $G$, a deep learner parameterized by $\theta_{G}$, outputs a synthetic sample.

$$
G(z|\theta_{G}): z \rightarrow x_{synthetic}
$$ 

### Discriminator

---

The discriminator $D$ is another deep learner parameterized by $\theta_{D}$ and it aims to classify if a sample is real or synthetic, i.e., if a sample is from the real data distribution,

$$
P_{\text{data}}
$$ 

or the synthetic data distribution.

$$
P_{G}
$$

Let us denote the discriminator $D$ as follows.

$$
D(x|\theta_{D}): x \rightarrow [0, 1]
$$ 

Here we assume that the positive examples are from the real data distribution while the negative examples are from the synthetic data distribution.
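The discriminator network defined later in this notebook ends in a linear unit and is trained with `BinaryCrossentropy(from_logits=True)`, so its raw output is a logit rather than a probability. A minimal sketch, assuming some made-up logit values, of how such a logit maps to $D(x) \in [0, 1]$ through the sigmoid function:

```python
import numpy as np


def sigmoid(logits):
    """Map real-valued logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical discriminator logits for three samples.
logits = np.array([-2.0, 0.0, 3.0])
prob_real = sigmoid(logits)  # read as D(x): probability that x is real
prob_synthetic = 1.0 - prob_real
```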

### Game: Optimality

---

A GAN simultaneously trains the discriminator to correctly classify real and synthetic examples while training the generator to create synthetic examples such that the discriminator incorrectly classifies real and synthetic examples. This 2 player minimax game has the following objective function.

$$
\min_{G(z|\theta_{G})} \max_{D(x|\theta_{D})} V(D(x|\theta_{D}), G(z|\theta_{G})) = \mathbb{E}_{x \sim p_{\text{data}}(x)} \log{D(x|\theta_{D})} + \mathbb{E}_{z \sim p(z)} \log{(1 - D(G(z|\theta_{G})|\theta_{D}))}
$$

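Note that the above expression is the objective function of the discriminator: because $x = G(z|\theta_{G})$ with $z \sim p(z)$ is distributed according to $p_{G}$, the second expectation can be written over $x \sim p_{G}(x)$.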

$$
\mathbb{E}_{x \sim p_{\text{data}}(x)} \log{D(x|\theta_{D})} + \mathbb{E}_{x \sim p_{G}(x)} \log{(1 - D(x|\theta_{D}))}
$$

It is clear from how the game has been set up that we are trying to obtain a solution $\theta_{D}$ for $D$ such that it maximizes $V(D, G)$ while simultaneously we are trying to obtain a solution $\theta_{G}$ for $G$ such that it minimizes $V(D, G)$.

In practice we do not update $D$ and $G$ in the same step. We train them alternately: update $D$, then update $G$ while holding $D$ fixed. We repeat this for a fixed number of steps.
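The link between $V(D, G)$ and the binary cross-entropy loss used in the training code can be sketched numerically. The snippet below is an illustration under assumptions: the probabilities in `d_real` and `d_fake` are made up, and the helper names are hypothetical. With equally sized real and synthetic batches, the discriminator's mean cross-entropy equals $-V/2$, so minimizing it maximizes $V$. The training loop later in this notebook passes labels of 1 for synthetic samples in the generator step, which corresponds to minimizing $-\log D(G(z))$ (the non-saturating variant suggested by Goodfellow et al., 2014) rather than $\log(1 - D(G(z)))$.

```python
import numpy as np


def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator probabilities."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))


def binary_cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))


# Hypothetical discriminator outputs on equally sized real and synthetic batches.
d_real = np.array([0.9, 0.8, 0.7, 0.6])
d_fake = np.array([0.2, 0.3, 0.1, 0.4])

v = value_function(d_real, d_fake)

# Discriminator step: real samples are labelled 1, synthetic samples 0.
labels = np.concatenate([np.ones_like(d_real), np.zeros_like(d_fake)])
disc_loss = binary_cross_entropy(labels, np.concatenate([d_real, d_fake]))

# Generator step with labels set to 1 for the synthetic samples.
gen_loss = binary_cross_entropy(np.ones_like(d_fake), d_fake)
```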

If the synthetic samples taken from the generator $G$ are realistic then implicitly we have learnt the distribution $P_{G}$. In other words, $P_{G}$ can be seen as a good estimation of $P_{\text{data}}$. The optimal solution will be as follows.

$$
P_{G}=P_{\text{data}}
$$

To show this let us find the optimal discriminator $D^\ast$ given a generator $G$ and sample $x$. 

\begin{align*}
V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}(x)} \log{D(x|\theta_{D})} + \mathbb{E}_{x \sim p_{G}(x)} \log{(1 - D(x|\theta_{D}))} \\
&= \int_{x} p_{\text{data}}(x) \log{D(x|\theta_{D})} dx + \int_{x} p_{G}(x) \log{(1 - D(x|\theta_{D}))} dx \\
&= \int_{x} \underbrace{p_{\text{data}}(x) \log{D(x|\theta_{D})} + p_{G}(x) \log{(1 - D(x|\theta_{D}))}}_{J(D(x|\theta_{D}))} dx
\end{align*}

Let us take a closer look at the discriminator's objective function for a sample $x$.

\begin{align*}
J(D(x|\theta_{D})) &= p_{\text{data}}(x) \log{D(x|\theta_{D})} + p_{G}(x) \log{(1 - D(x|\theta_{D}))} \\
\frac{\partial J(D(x|\theta_{D}))}{\partial D(x|\theta_{D})} &= \frac{p_{\text{data}}(x)}{D(x|\theta_{D})} - \frac{p_{G}(x)}{(1 - D(x|\theta_{D}))} \\
0 &= \frac{p_{\text{data}}(x)}{D^\ast(x|\theta_{D^\ast})} - \frac{p_{G}(x)}{(1 - D^\ast(x|\theta_{D^\ast}))} \\
p_{\text{data}}(x)(1 - D^\ast(x|\theta_{D^\ast})) &= p_{G}(x)D^\ast(x|\theta_{D^\ast}) \\
p_{\text{data}}(x) - p_{\text{data}}(x)D^\ast(x|\theta_{D^\ast}) &= p_{G}(x)D^\ast(x|\theta_{D^\ast}) \\
p_{G}(x)D^\ast(x|\theta_{D^\ast}) + p_{\text{data}}(x)D^\ast(x|\theta_{D^\ast}) &= p_{\text{data}}(x) \\
D^\ast(x|\theta_{D^\ast}) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)} 
\end{align*}
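The closed form for $D^\ast$ can be checked numerically at a single point $x$. The sketch below assumes made-up density values `p_data = 0.3` and `p_g = 0.1`, evaluates the pointwise objective $J$ on a grid of candidate discriminator outputs, and compares the grid maximizer with the formula.

```python
import numpy as np

# Hypothetical density values of p_data and p_G at a single point x.
p_data, p_g = 0.3, 0.1

# Evaluate the pointwise objective J on a fine grid of candidate D(x) values.
d_grid = np.linspace(0.001, 0.999, 999)
j_values = p_data * np.log(d_grid) + p_g * np.log(1.0 - d_grid)

d_best = d_grid[np.argmax(j_values)]
d_closed_form = p_data / (p_data + p_g)
```

Because $J$ is concave in $D(x)$, the grid maximizer lands on the grid point closest to the closed-form value.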

We have found the optimal discriminator given a generator. Let us focus now on the generator's objective function which is essentially to minimize the discriminator's objective function.

\begin{align*}
J(G(x|\theta_{G})) &= \mathbb{E}_{x \sim p_{\text{data}}(x)} \log{D^\ast(x|\theta_{D^\ast})} + \mathbb{E}_{x \sim p_{G}(x)} \log{(1 - D^\ast(x|\theta_{D^\ast}))} \\
&= \mathbb{E}_{x \sim p_{\text{data}}(x)} \log{\bigg( \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}} \bigg) + \mathbb{E}_{x \sim p_{G}(x)} \log{\bigg(1 - \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}\bigg)} \\
&= \mathbb{E}_{x \sim p_{\text{data}}(x)} \log{\bigg( \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}} \bigg) + \mathbb{E}_{x \sim p_{G}(x)} \log{\bigg(\frac{p_{G}(x)}{p_{\text{data}}(x) + p_{G}(x)}\bigg)} \\
&= \int_{x} p_{\text{data}}(x) \log{\bigg( \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}} \bigg) dx + \int_{x} p_{G}(x) \log{\bigg(\frac{p_{G}(x)}{p_{\text{data}}(x) + p_{G}(x)}\bigg)} dx
\end{align*}

We will note the Kullback–Leibler (KL) divergences in the above objective function for the generator.

$$
D_{KL}(P||Q) = \int_{x} p(x) \log\bigg(\frac{p(x)}{q(x)}\bigg) dx
$$

Recall the definition of a $\lambda$ divergence.

$$
D_{\lambda}(P||Q) = \lambda D_{KL}(P||\lambda P + (1 - \lambda) Q) + (1 - \lambda) D_{KL}(Q||\lambda P + (1 - \lambda) Q)
$$

If $\lambda$ takes the value of 0.5 this is then called the Jensen-Shannon (JS) divergence. This divergence is symmetric and non-negative, and it is zero if and only if $P = Q$.

$$
D_{JS}(P||Q) = 0.5 D_{KL}\bigg(P\bigg|\bigg|\frac{P + Q}{2}\bigg) + 0.5 D_{KL}\bigg(Q\bigg|\bigg|\frac{P + Q}{2}\bigg)
$$
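For discrete distributions these divergences reduce to sums, which makes the definitions easy to check. The following is a minimal sketch with hypothetical helper names and made-up distributions `p` and `q`; it assumes all probabilities are strictly positive so that the logarithms are finite.

```python
import numpy as np


def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions with strictly positive entries."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def lambda_divergence(p, q, lam=0.5):
    """Lambda divergence; lam = 0.5 gives the Jensen-Shannon divergence."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mixture = lam * p + (1.0 - lam) * q
    return lam * kl_divergence(p, mixture) + (1.0 - lam) * kl_divergence(q, mixture)


# Hypothetical distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
js_pq = lambda_divergence(p, q)
js_qp = lambda_divergence(q, p)
js_pp = lambda_divergence(p, p)
```

Here `js_pq` and `js_qp` agree, illustrating the symmetry, and `js_pp` is zero.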

Keeping this in mind let us take a look again at the objective function of the generator.

\begin{align*}
J(G(x|\theta_{G})) &= \int_{x} p_{\text{data}}(x) \log{\bigg( \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}} \bigg) dx + \int_{x} p_{G}(x) \log{\bigg(\frac{p_{G}(x)}{p_{\text{data}}(x) + p_{G}(x)}\bigg)} dx \\
&= \int_{x} p_{\text{data}}(x) \log{\bigg(\frac{2}{2}\frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}} \bigg) dx + \int_{x} p_{G}(x) \log{\bigg(\frac{2}{2}\frac{p_{G}(x)}{p_{\text{data}}(x) + p_{G}(x)}\bigg)} dx \\
&= \int_{x} p_{\text{data}}(x) \log{\bigg(\frac{1}{2}\frac{1}{0.5}\frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{G}(x)}} \bigg) dx + \int_{x} p_{G}(x) \log{\bigg(\frac{1}{2}\frac{1}{0.5}\frac{p_{G}(x)}{p_{\text{data}}(x) + p_{G}(x)}\bigg)} dx \\
&= \int_{x} p_{\text{data}}(x) \bigg[ \log(0.5) + \log{\bigg(\frac{p_{\text{data}}(x)}{0.5 (p_{\text{data}}(x) + p_{G}(x))}} \bigg) \bigg] dx \\ &+ \int_{x} p_{G}(x) \bigg[\log(0.5) + \log{\bigg(\frac{p_{G}(x)}{0.5 (p_{\text{data}}(x) + p_{G}(x))}\bigg) \bigg] } dx \\
&= \log\bigg(\frac{1}{4}\bigg) + \int_{x} p_{\text{data}}(x) \bigg[\log{\bigg(\frac{p_{\text{data}}(x)}{0.5 (p_{\text{data}}(x) + p_{G}(x))}} \bigg) \bigg] dx \\ 
&+ \int_{x} p_{G}(x) \bigg[\log{\bigg(\frac{p_{G}(x)}{0.5 (p_{\text{data}}(x) + p_{G}(x))}\bigg) \bigg] } dx \\
&= -\log(4) + D_{KL}\bigg(P_{\text{data}}\bigg|\bigg|\frac{P_{\text{data}} + P_{G}}{2}\bigg) + D_{KL}\bigg(P_{G}\bigg|\bigg|\frac{P_{\text{data}} + P_{G}}{2}\bigg) \\
&= -\log(4) + 2 \bigg(0.5 D_{KL}\bigg(P_{\text{data}}\bigg|\bigg|\frac{P_{\text{data}} + P_{G}}{2}\bigg) + 0.5 D_{KL}\bigg(P_{G}\bigg|\bigg|\frac{P_{\text{data}} + P_{G}}{2}\bigg)\bigg) \\
&= -\log(4) + 2D_{JS}(P_{\text{data}}||P_{G}) 
\end{align*}

It is clear from the objective function of the generator above that the global minimum value attained is $-\log(4)$ which occurs when the following holds.

$$
P_{G}=P_{\text{data}}
$$

When the above holds the Jensen-Shannon divergence, i.e., $D_{JS}(P_{\text{data}}||P_{G})$, will be zero. Hence we have shown that the optimal solution is as follows.

$$
P_{G}=P_{\text{data}}
$$
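The identity $J(G) = -\log(4) + 2D_{JS}(P_{\text{data}}||P_{G})$ can be verified numerically for discrete distributions. This is a sketch under assumptions: the two distributions are made up, the helper names are hypothetical, and all probabilities are strictly positive.

```python
import numpy as np


def generator_objective(p_data, p_g):
    """V(D*, G) for discrete distributions, using D* = p_data / (p_data + p_g)."""
    d_star = p_data / (p_data + p_g)
    return float(np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star)))


def js_divergence(p, q):
    """Jensen-Shannon divergence for strictly positive discrete distributions."""
    m = 0.5 * (p + q)
    return float(0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m)))


p_data = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical data distribution
p_g = np.array([0.4, 0.3, 0.2, 0.1])  # hypothetical generator distribution

value_mismatch = generator_objective(p_data, p_g)
value_match = generator_objective(p_data, p_data)
```

When the generator distribution matches the data distribution, the objective equals $-\log(4)$; any mismatch raises it by twice the JS divergence.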

### Game: Convergence

---

Assuming that the discriminator is allowed to reach its optimum given a generator, then $P_{G}$ can be shown to converge to $P_{\text{data}}$. 

Consider the following objective function. For any fixed discriminator $D$, $U(D, P_{G})$ is linear, and therefore convex, in $P_{G}$; we found earlier that with the optimal discriminator its global minimum is $-\log(4)$.

$$
U(D^\ast, P_{G}) = \mathbb{E}_{x \sim p_{\text{data}}(x)} \log{D^\ast(x|\theta_{D^\ast})} + \mathbb{E}_{x \sim p_{G}(x)} \log{(1 - D^\ast(x|\theta_{D^\ast}))}
$$

Gradient descent is used by the generator to move towards the global minimum given an optimal discriminator. We will show that the gradient of the generator exists given an optimal discriminator, i.e., $\nabla_{P_{G}} U(D^\ast, P_{G})$, such that convergence of $P_{G}$ to $P_{\text{data}}$ is guaranteed.

Note that the following is a supremum of a set of convex functions where the set is indexed by the discriminator $D$: $U(D^\ast, P_{G})=\sup_{D} U(D, P_{G})$. Remember that the supremum is the least upper bound.

Let us recall a few definitions regarding gradients and subgradients. A vector $g \in \mathbb{R}^K$ is a subgradient of a function $f: \mathbb{R}^K \rightarrow \mathbb{R}$ at a point $x \in \operatorname{dom}(f)$ if $\forall z \in \operatorname{dom}(f)$, the following relationship holds:

$$
f(z) \geq f(x) + g^{T}(z - x)
$$

If $f$ is convex and differentiable then its gradient at a point $x$ is also the subgradient. Most importantly, a subgradient can exist even if $f$ is not differentiable.
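A standard one-dimensional example is $f(x) = |x|$, which is convex but not differentiable at $x = 0$; there, every $g \in [-1, 1]$ is a subgradient. The sketch below checks the defining inequality on a grid of test points. The helper `is_subgradient` and the small numerical tolerance are assumptions made for illustration.

```python
import numpy as np


def is_subgradient(f, g, x, test_points):
    """Check f(z) >= f(x) + g * (z - x) at every test point z (scalar case)."""
    return bool(np.all(f(test_points) >= f(x) + g * (test_points - x) - 1e-12))


z = np.linspace(-2.0, 2.0, 401)

# f(x) = |x| is convex but not differentiable at x = 0.
inside = is_subgradient(np.abs, 0.5, 0.0, z)  # g inside [-1, 1]
outside = is_subgradient(np.abs, 1.5, 0.0, z)  # g outside [-1, 1]
```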

The subgradients of the supremum of a set of convex functions include the subgradient of the function at the point where the supremum is attained. Since $U(D, P_{G})$ is convex in $P_{G}$ for every fixed $D$, the gradient taken at the optimal discriminator $D^\ast$ is a subgradient of the supremum. Writing $\partial_{P_{G}}$ for the set of subgradients with respect to $P_{G}$:

\begin{align*}
&U(D^\ast, P_{G})=\sup_{D} U(D, P_{G}) \\
&\nabla_{P_{G}} U(D^\ast, P_{G}) \in \partial_{P_{G}} \sup_{D} U(D, P_{G})
\end{align*}

The gradient of the generator, $\nabla_{P_{G}} U(D^\ast, P_{G})$, is used to make incremental improvements to the objective function of the generator, $U(D^\ast, P_{G})$, given an optimal discriminator, $D^\ast$. Therefore, with sufficiently small updates and a discriminator that is kept at its optimum, $P_{G}$ converges to $P_{\text{data}}$.
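This argument can be illustrated on a toy problem. The sketch below is not the training procedure used in this notebook: it assumes a discrete distribution over four outcomes, parameterizes $P_{G}$ directly by softmax logits `theta` (a hypothetical stand-in for $\theta_{G}$), and plugs in the closed-form optimal discriminator at every step. Differentiating $U(D^\ast, P_{G})$ with respect to $p_{G}(x)$ gives $\log(1 - D^\ast(x))$, which is what the loop uses.

```python
import numpy as np


def softmax(theta):
    shifted = np.exp(theta - np.max(theta))
    return shifted / np.sum(shifted)


def objective(p_data, p_g):
    """U(D*, P_G) with the optimal discriminator plugged in."""
    return float(
        np.sum(p_data * np.log(p_data / (p_data + p_g)))
        + np.sum(p_g * np.log(p_g / (p_data + p_g)))
    )


p_data = np.array([0.1, 0.2, 0.3, 0.4])  # hypothetical data distribution
theta = np.zeros(4)  # generator logits, so P_G starts out uniform
learning_rate = 1.0  # assumed step size
history = []

for _ in range(2000):
    p_g = softmax(theta)
    history.append(objective(p_data, p_g))
    # Gradient of U with respect to p_G at the optimal discriminator:
    # log(1 - D*(x)) = log(p_G / (p_data + p_G)).
    grad_p = np.log(p_g / (p_data + p_g))
    # Chain rule through the softmax parameterization.
    grad_theta = p_g * (grad_p - np.sum(p_g * grad_p))
    theta -= learning_rate * grad_theta

p_g = softmax(theta)
```

In this idealized setting `p_g` approaches `p_data` and the recorded objective approaches $-\log(4)$. In a real GAN the discriminator is only approximately optimal and the generator is a neural network, so this behaviour is not guaranteed.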

In [None]:
class GenerativeAdversarialNetworkDiscriminator(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.feature_extractor = tf.keras.Sequential(
            layers=[
                tf.keras.layers.Conv2D(
                    filters=64,
                    kernel_size=(5, 5),
                    padding="same",
                    strides=(2, 2),
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                    activation="tanh",
                ),
                tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
                tf.keras.layers.Conv2D(
                    filters=64,
                    kernel_size=(5, 5),
                    padding="same",
                    strides=(2, 2),
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                    activation="tanh",
                ),
                tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
                tf.keras.layers.Flatten(data_format="channels_last"),
                tf.keras.layers.Dense(
                    units=1024,
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                    activation="tanh",
                ),
            ]
        )

        self.discriminator = tf.keras.Sequential(
            layers=[
                tf.keras.layers.Dense(
                    units=1,
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                    activation="linear",
                ),
            ]
        )

    def call(self, x):
        encoding = self.feature_extractor(x)
        return self.discriminator(encoding)

    def encoder(self, x):
        return self.feature_extractor(x)


class GenerativeAdversarialNetworkGenerator(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.generator = tf.keras.Sequential(
            layers=[
                tf.keras.layers.Dense(
                    units=1024,
                    activation="tanh",
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                ),
                tf.keras.layers.Dense(
                    units=128 * 7 * 7,
                    activation="tanh",
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                ),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Reshape(target_shape=(7, 7, 128)),
                tf.keras.layers.Conv2DTranspose(
                    filters=64,
                    kernel_size=(5, 5),
                    strides=(2, 2),
                    padding="same",
                    activation="tanh",
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                ),
                tf.keras.layers.Conv2D(
                    filters=64,
                    kernel_size=(5, 5),
                    strides=(1, 1),
                    padding="same",
                    activation="tanh",
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                ),
                tf.keras.layers.Conv2DTranspose(
                    filters=64,
                    kernel_size=(5, 5),
                    strides=(2, 2),
                    padding="same",
                    activation="tanh",
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                ),
                tf.keras.layers.Conv2D(
                    filters=1,
                    kernel_size=(5, 5),
                    strides=(1, 1),
                    padding="same",
                    activation="sigmoid",
                    kernel_initializer="glorot_normal",
                    kernel_regularizer=tf.keras.regularizers.l2(1e-8),
                ),
            ]
        )

    def call(self, z):
        return self.generator(z)

In [None]:
num_epochs = 50
batch_size = 100
z_size = 2
generator = GenerativeAdversarialNetworkGenerator()
discriminator = GenerativeAdversarialNetworkDiscriminator()

train_ds = (
    tf.data.Dataset.from_tensor_slices(
        tensors=(
            x_train.reshape(
                x_train.shape[0],
                x_train.shape[1],
                x_train.shape[2],
                1,
            )
        )
    )
    .shuffle(buffer_size=10000, reshuffle_each_iteration=True)
    .batch(batch_size=batch_size)
)
test_ds = tf.data.Dataset.from_tensor_slices(
    tensors=(
        x_test.reshape(
            x_test.shape[0],
            x_test.shape[1],
            x_test.shape[2],
            1,
        )
    )
).batch(batch_size=batch_size)

reconstruction_loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer_disc = tf.keras.optimizers.Adam(
    learning_rate=1e-4, amsgrad=True, clipvalue=1.0
)
optimizer_gen = tf.keras.optimizers.Adam(
    learning_rate=1e-4, amsgrad=True, clipvalue=1.0
)
train_disc_loss = tf.keras.metrics.Mean(name="train_disc_loss")
train_gen_loss = tf.keras.metrics.Mean(name="train_gen_loss")
test_disc_loss = tf.keras.metrics.Mean(name="test_disc_loss")
test_gen_loss = tf.keras.metrics.Mean(name="test_gen_loss")


@tf.function
def train_step(x, truth_disc, z, truth_gen):
    with tf.GradientTape() as tape:
        disc_preds = discriminator(x=x, training=True)
        disc_loss = reconstruction_loss(y_true=truth_disc, y_pred=disc_preds)
    gradients_disc = tape.gradient(disc_loss, discriminator.trainable_variables)

    optimizer_disc.apply_gradients(
        grads_and_vars=zip(
            gradients_disc,
            discriminator.trainable_variables,
        )
    )
    train_disc_loss(disc_loss)

    with tf.GradientTape() as tape:
        gen_preds = generator(z=z, training=True)
        gen_preds = discriminator(x=gen_preds, training=True)
        gen_loss = reconstruction_loss(y_true=truth_gen, y_pred=gen_preds)
    gradients_gen = tape.gradient(gen_loss, generator.trainable_variables)
    optimizer_gen.apply_gradients(
        grads_and_vars=zip(
            gradients_gen,
            generator.trainable_variables,
        )
    )
    train_gen_loss(gen_loss)


@tf.function
def test_step(x, truth_disc, z, truth_gen):
    disc_preds = discriminator(x=x, training=False)
    disc_loss = reconstruction_loss(y_true=truth_disc, y_pred=disc_preds)
    test_disc_loss(disc_loss)

    gen_preds = generator(z=z, training=False)
    gen_preds = discriminator(x=gen_preds, training=False)
    gen_loss = reconstruction_loss(y_true=truth_gen, y_pred=gen_preds)
    test_gen_loss(gen_loss)


for epoch in range(num_epochs):
    train_disc_loss.reset_states()
    train_gen_loss.reset_states()
    test_disc_loss.reset_states()
    test_gen_loss.reset_states()

    for real_data in train_ds:
        train_step(
            x=np.concatenate(
                (
                    real_data,
                    generator(
                        z=np.random.uniform(
                            low=-1.0, high=1.0, size=(batch_size, z_size)
                        ),
                        training=True,
                    ),
                ),
                axis=0,
            ),
            truth_disc=np.concatenate(
                (np.ones(shape=(batch_size, 1)), np.zeros(shape=(batch_size, 1))),
                axis=0,
            )
            + (0.05 * np.random.random(size=(batch_size * 2, 1))),
            z=np.random.uniform(low=-1.0, high=1.0, size=(batch_size, z_size)),
            truth_gen=np.ones(shape=(batch_size, 1)),
        )

    for real_data in test_ds:
        test_step(
            x=np.concatenate(
                (
                    real_data,
                    generator(
                        z=np.random.uniform(
                            low=-1.0, high=1.0, size=(batch_size, z_size)
                        ),
                        training=False,
                    ),
                ),
                axis=0,
            ),
            truth_disc=np.concatenate(
                (np.ones(shape=(batch_size, 1)), np.zeros(shape=(batch_size, 1))),
                axis=0,
            )
            + (0.05 * np.random.random(size=(batch_size * 2, 1))),
            z=np.random.uniform(low=-1.0, high=1.0, size=(batch_size, z_size)),
            truth_gen=np.ones(shape=(batch_size, 1)),
        )

    img = tf.keras.preprocessing.image.array_to_img(
        x=generator(
            z=np.random.uniform(low=-1.0, high=1.0, size=(1, z_size)),
            training=False,
        )[0, :, :, :]
        * 255.0,
        scale=False,
    )
    img.save(fp="generated_image.png")
    pil_im = Image.open("generated_image.png", "r")
    plt.imshow(X=np.asarray(a=pil_im), cmap="Greys_r")
    plt.show()
    print("Epoch:", epoch + 1)
    print("Train discriminator loss:", train_disc_loss.result())
    print("Train generator loss:", train_gen_loss.result())
    print("Test discriminator loss:", test_disc_loss.result())
    print("Test generator loss:", test_gen_loss.result())

In [None]:
pipe_dcgan = Pipeline(
    steps=[
        ("scaler", MinMaxScaler(feature_range=(0.0, 1.0))),
        (
            "classifier",
            linear_model.LogisticRegression(max_iter=10000, random_state=666),
        ),
    ]
)

with tf.device(device_name="/CPU:0"):
    pipe_dcgan.fit(
        X=discriminator.encoder(
            sampled_x_train.reshape(
                sampled_x_train.shape[0],
                sampled_x_train.shape[1],
                sampled_x_train.shape[2],
                1,
            )
        ),
        y=sampled_y_train,
    )
    acc_dcgan = pipe_dcgan.score(
        X=discriminator.encoder(
            x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
        ),
        y=y_test,
    )

print(
    "The accuracy score for the MNIST classification task with DCGAN: %.6f%%."
    % (acc_dcgan * 100)
)

In [None]:
n = 50
data_size = 28
figure = np.zeros(shape=(data_size * n, data_size * n))
grid_x = np.linspace(start=-1.0, stop=1.0, num=n)
grid_y = np.linspace(start=-1.0, stop=1.0, num=n)

with tf.device(device_name="/CPU:0"):
    for i, xi in enumerate(grid_x):
        for j, yi in enumerate(grid_y):
            figure[
                i * data_size : (i + 1) * data_size,
                j * data_size : (j + 1) * data_size,
            ] = (
                generator(np.array(object=[[xi, yi]]))
                .numpy()
                .reshape(data_size, data_size)
            )

plt.figure(figsize=(20, 20))
plt.imshow(X=figure, cmap="Greys_r")
plt.title(
    label="Deep Convolutional Generative Adversarial Network (DCGAN) with a 2-dimensional latent manifold\nGenerating new images on the 2-dimensional latent manifold",
    fontsize=20,
)
plt.xlabel(xlabel="Latent dimension 1", fontsize=24)
plt.ylabel(ylabel="Latent dimension 2", fontsize=24)
plt.savefig(fname="DCGAN_Generated_Images.png")
plt.show()

### Results

---

In these experiments we show the ability of the generator to create realistic synthetic examples for the MNIST dataset.

Finally, we show that a classifier trained on the representation learnt by the discriminator attains results competitive with other representation learning methods for the MNIST dataset, such as a wide variety of autoencoders.

### Results: Generating new data

---

![](DCGAN_Generated_Images.png)


### Results: GAN for representation learning

---

* The accuracy score for the MNIST classification task with DCGAN: 94.01%.


## Conclusion

---

We have shown how to use GANs to learn a good representation of raw data, i.e., 1 or 2 dimensional tensors per unit of analysis, that can then perhaps be used for supervised learning tasks in the domain of computer vision. This moves us away from manual handcrafted feature engineering towards automatic feature engineering, i.e., representation learning.

## References

---

1. Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning (MIT Press).
2. Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow (O'Reilly).
3. Radford, A., Metz, L. and Chintala, S. (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (https://arxiv.org/abs/1511.06434).
4. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014). Generative Adversarial Networks (https://arxiv.org/abs/1406.2661).
5. http://scikit-learn.org/stable/#
6. https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1
7. https://stackoverflow.com/questions/42177658/how-to-switch-backend-with-keras-from-tensorflow-to-theano
8. https://blog.keras.io/building-autoencoders-in-keras.html
9. https://keras.io
10. https://github.com/fchollet/keras/blob/master/examples/mnist_acgan.py#L24
11. https://en.wikipedia.org/wiki/Kullback–Leibler_divergence
12. https://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf