# Problem 1

Recall that for the original GAN, the discriminator $f_\phi(x)$ is trained with the following optimization:

$$\max_\phi p(x) \log [f_\phi(x)] + q_\theta(x) \log [1 - f_\phi(x)]$$

where $p(x)$ is the true data distribution, $q_\theta(x)$ is the generator's learned distribution, and $f_\phi(x) \in [0, 1]$.

1. Assuming $q_\theta(x)$ is fixed, what is the solution to this optimization problem? "Solution" here means the function $f_\phi(x)$ that maximizes the above objective. Your expression should be in terms of $p(x)$ and $q_\theta(x)$.
1. Assume that $q_\theta(x) = p(x)$, i.e. the generator has learned to fit the data distribution perfectly. What is the optimal discriminator in this case?
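
One way to sanity-check a candidate answer is to maximize the objective numerically at a single point $x$, treating $p(x)$ and $q_\theta(x)$ as fixed positive scalars. The sketch below is only an illustration: the helper names `pointwise_objective` and `best_f_on_grid`, the grid resolution, and the example values of `p` and `q` are all assumptions, not part of the assignment.

```python
import math

def pointwise_objective(f, p, q):
    """Value of p * log(f) + q * log(1 - f) for scalars p, q and 0 < f < 1."""
    return p * math.log(f) + q * math.log(1.0 - f)

def best_f_on_grid(p, q, num_points=9999):
    """Grid search for the f in (0, 1) that maximizes the pointwise objective."""
    # Interior grid only, so that log(f) and log(1 - f) stay finite.
    grid = [(i + 1) / (num_points + 1) for i in range(num_points)]
    return max(grid, key=lambda f: pointwise_objective(f, p, q))

# Hypothetical density values at a single point x (illustrative only).
for p, q in [(0.3, 0.1), (0.2, 0.2), (0.05, 0.4)]:
    print(f"p={p}, q={q}, numerical maximizer f={best_f_on_grid(p, q):.4f}")
```

Comparing the printed maximizers against a closed-form expression in `p` and `q` gives a quick check on the derivation; it does not replace it.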

# Problem 2

Consider the following min-max optimization problem: $\min_x \max_y xy$.

1. Does the function $xy$ have a stationary point (i.e. a point where $\frac{\partial xy}{\partial x} = 0$ and $\frac{\partial xy}{\partial y} = 0$)? If so, what is it?

1. Consider tackling this optimization problem with alternating gradient descent (i.e. alternately taking a gradient descent step on $x$ to minimize and a gradient ascent step on $y$ to maximize). Assume that the learning rate is the same for both steps and that optimization begins from the point $x = 1, y = 1$. Will optimization reach a stationary point? Feel free to support your answer theoretically and/or empirically (i.e. by implementing it in code).
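
For the empirical route, a minimal sketch of the alternating updates might look like the following. The function name `alternating_gd`, the learning rate of 0.1, and the step count are assumptions chosen for illustration, and the $y$ step uses the already-updated $x$, which is one reading of "alternating".

```python
import math

def alternating_gd(x=1.0, y=1.0, lr=0.1, steps=1000):
    """Alternate a descent step on x and an ascent step on y for f(x, y) = x * y."""
    trajectory = [(x, y)]
    for _ in range(steps):
        x = x - lr * y  # df/dx = y: gradient descent step on x
        y = y + lr * x  # df/dy = x: gradient ascent step on y, using the new x
        trajectory.append((x, y))
    return trajectory

trajectory = alternating_gd()
# Distance of each iterate from the origin.
distances = [math.hypot(x, y) for x, y in trajectory]
print(f"start distance: {distances[0]:.3f}")
print(f"min distance:   {min(distances):.3f}")
print(f"final distance: {distances[-1]:.3f}")
```

Plotting `trajectory` or `distances` for a few learning rates shows how the iterates move relative to the origin, which can then be compared with a theoretical argument.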

# Problem 3

The code below is taken from [the example in the textbook](http://d2l.ai/chapter_generative-adversarial-networks/gan.html). I've modified the example so that the true data distribution is a mixture of two Gaussians rather than a single Gaussian. Feel free to use the PyTorch code in this problem instead, but make sure you keep my modification to the data distribution.

The code below uses the original GAN loss. A somewhat popular alternative loss function is the ["Least-Squares GAN (LS-GAN)"](https://arxiv.org/pdf/1611.04076.pdf), which is defined as follows:
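
In the notation of the linked paper (these symbols are the paper's, not this notebook's), $D$ is the discriminator, $G$ is the generator, $p_z(z)$ is the latent distribution, $a$ and $b$ are the target values for generated and real data, and $c$ is the value the generator wants the discriminator to output on generated data. The values $a = 0$, $b = 1$, $c = 1$ are one common choice and are an assumption here, not something this assignment specifies:

$$\min_D \frac{1}{2} \mathbb{E}_{x \sim p(x)}\left[(D(x) - b)^2\right] + \frac{1}{2} \mathbb{E}_{z \sim p_z(z)}\left[(D(G(z)) - a)^2\right]$$

$$\min_G \frac{1}{2} \mathbb{E}_{z \sim p_z(z)}\left[(D(G(z)) - c)^2\right]$$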


Modify the code so that it uses the LS-GAN objective instead of the original GAN objective. Does it converge to a different solution?

## <font color=red>Need to change the code below to PyTorch ...</font>

In [1]:
import numpy as np
import matplotlib
%matplotlib inline
from d2l import torch as d2l

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Do not change this part! It is different from the example in the textbook.
X = np.random.normal(0.0, 1, (1000, 2))
A = np.array([[1, 2], [-0.1, 0.5]])
b = np.array([1, 2])
data = np.concatenate([np.dot(X, A) + b, np.random.normal(0.0, 0.5, (100, 2)) + np.array([0, 7])], 0)
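# Shuffle while `data` is still a NumPy array: `np.random.shuffle` can
# duplicate rows when applied to a multi-dimensional torch tensor.
np.random.shuffle(data)
# `nn.Linear` uses float32 weights by default, so convert from float64.
data = torch.tensor(data, dtype=torch.float32)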

ModuleNotFoundError: No module named 'd2l'

In [11]:
batch_size = 8
data_iter = d2l.load_array((data,), batch_size)

## generator
net_G = nn.Sequential(nn.Linear(2, 2))

## discriminator
net_D = nn.Sequential(nn.Linear(2, 5), nn.Tanh(), nn.Linear(5, 3), nn.Tanh(), nn.Linear(3, 1))

In [12]:
def update_D(X, Z, net_D, net_G, loss, trainer_D):
    """Update discriminator."""
    batch_size = X.shape[0]
    ones = torch.ones((batch_size,), device=X.device)
    zeros = torch.zeros((batch_size,), device=X.device)
    trainer_D.zero_grad()
    real_Y = net_D(X)
    fake_X = net_G(Z)
    # Do not need to compute gradient for `net_G`, detach it from
    # computing gradients.
    fake_Y = net_D(fake_X.detach())
    loss_D = (loss(real_Y, ones.reshape(real_Y.shape)) +
              loss(fake_Y, zeros.reshape(fake_Y.shape))) / 2
    loss_D.backward()
    trainer_D.step()
    return loss_D

In [4]:
# batch_size = 8
# data_iter = d2l.load_array((data,), batch_size)

# net_G = nn.Sequential()
# net_G.add(nn.Dense(2))

# net_D = nn.Sequential()
# net_D.add(nn.Dense(5, activation='tanh'), nn.Dense(3, activation='tanh'),
#           nn.Dense(1))

# def update_D(X, Z, net_D, net_G, loss, trainer_D):
#     """Update discriminator."""
#     batch_size = X.shape[0]
#     ones = np.ones((batch_size,), ctx=X.ctx)
#     zeros = np.zeros((batch_size,), ctx=X.ctx)
#     with autograd.record():
#         real_Y = net_D(X)
#         fake_X = net_G(Z)
#         # Do not need to compute gradient for `net_G`, detach it from
#         # computing gradients.
#         fake_Y = net_D(fake_X.detach())
#         loss_D = (loss(real_Y, ones) + loss(fake_Y, zeros)) / 2
#     loss_D.backward()
#     trainer_D.step(batch_size)
#     return float(loss_D.sum())

def update_G(Z, net_D, net_G, loss, trainer_G):
    """Update generator."""
    batch_size = Z.shape[0]
    ones = torch.ones((batch_size,), device=Z.device)
    trainer_G.zero_grad()
    # We could reuse `fake_X` from `update_D` to save computation
    fake_X = net_G(Z)
    # Recomputing `fake_Y` is needed since `net_D` is changed
    fake_Y = net_D(fake_X)
    loss_G = loss(fake_Y, ones.reshape(fake_Y.shape))
    loss_G.backward()
    trainer_G.step()
    return loss_G

def train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G, latent_dim, data):
    # Sum over the batch; the per-example average is taken when plotting.
    loss = nn.BCEWithLogitsLoss(reduction='sum')
    for w in net_D.parameters():
        nn.init.normal_(w, 0, 0.02)
    for w in net_G.parameters():
        nn.init.normal_(w, 0, 0.02)
    trainer_D = optim.Adam(net_D.parameters(), lr=lr_D)
    trainer_G = optim.Adam(net_G.parameters(), lr=lr_G)
    animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                            xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
                            legend=['discriminator', 'generator'])
    animator.fig.subplots_adjust(hspace=0.3)
    for epoch in range(num_epochs):
        # Train one epoch
        timer = d2l.Timer()
        metric = d2l.Accumulator(3)  # loss_D, loss_G, num_examples
        # `d2l.load_array((data,), ...)` yields one-element batches, so unpack
        for (X,) in data_iter:
            batch_size = X.shape[0]
            Z = torch.normal(0, 1, size=(batch_size, latent_dim))
            metric.add(update_D(X, Z, net_D, net_G, loss, trainer_D),
                       update_G(Z, net_D, net_G, loss, trainer_G), batch_size)
        # Visualize generated examples
        Z = torch.normal(0, 1, size=(1000, latent_dim))
        fake_X = net_G(Z).detach().numpy()
        animator.axes[1].cla()
        animator.axes[1].scatter(data[:, 0], data[:, 1], alpha=0.2)
        animator.axes[1].scatter(fake_X[:, 0], fake_X[:, 1], alpha=0.2)
        animator.axes[1].legend(['real', 'generated'])
        # Show the losses
        loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
        animator.add(epoch + 1, (loss_D, loss_G))
    print(f'loss_D {loss_D:.3f}, loss_G {loss_G:.3f}, '
          f'{metric[2] / timer.stop():.1f} examples/sec')

lr_D, lr_G, latent_dim, num_epochs = 0.05, 0.005, 2, 20
train(net_D, net_G, data_iter, num_epochs, lr_D, lr_G, latent_dim, data)