# Simplest BEGAN
A simple example using vanilla neural networks. The main difference in BEGAN is that the discriminator is an autoencoder that aims to reconstruct real samples well and generated samples poorly, while the generator aims to produce adversarial samples that the discriminator cannot differentiate from real images.

This example uses only plain fully connected layers; refer to the examples for more serious models.

### Advantages
* State of the art for face generation (as of 2017), generating faces up to 128x128
* No batchnorm or dropout needed to stabilize training
* Automatically balances image diversity and quality (more quality means more mode collapse)
* Provides a convergence measure (losses on DCGANs are normally meaningless)
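
As a minimal sketch of the convergence measure mentioned in the last point, the function below computes the same quantity that the training loop in this notebook prints: the real reconstruction loss plus the absolute balance error. The function name and the loss values are hypothetical and chosen only for illustration.

```python
import numpy as np


def convergence_measure(loss_real, loss_fake, gamma):
    """Global convergence measure: L(x) + |gamma * L(x) - L(G(z))|."""
    return loss_real + np.abs(gamma * loss_real - loss_fake)


# Hypothetical reconstruction losses, for illustration only
balanced = convergence_measure(loss_real=10.0, loss_fake=5.0, gamma=0.5)
unbalanced = convergence_measure(loss_real=10.0, loss_fake=9.0, gamma=0.5)
print(balanced, unbalanced)
```

A lower value means both a better reconstruction of real images and a generator/discriminator balance closer to the target ratio.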

### References
* [Blog](https://blog.heuritech.com/2017/04/11/began-state-of-the-art-generation-of-faces-with-generative-adversarial-networks/)
* [Paper](https://arxiv.org/pdf/1703.10717.pdf)
* [Code this example is largely copied from](https://github.com/wiseodd/generative-models/tree/master/GAN/boundary_equilibrium_gan)
* [Implementation with faces on Tensorflow](https://github.com/artcg/BEGAN)
* [Implementation with faces on Pytorch](https://github.com/sunshineatnoon/Paper-Implementations/tree/master/BEGAN)
* [Wasserstein metric](https://en.wikipedia.org/wiki/Wasserstein_metric)
* [Pix2Pix with Began](https://github.com/taey16/pix2pixBEGAN.pytorch)

### Datasets to play with
* [CelebA dataset](https://drive.google.com/open?id=0B7EVK8r0v71pZjFTYXZWM3FlRnM)
* [Pedestrians dataset](http://mmlab.ie.cuhk.edu.hk/projects/luoWTiccv2013DDN/index.html)

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

# Include modules from other directories
import sys
sys.path.append('../tensorflow/')
import model_util as util
import models
import anim_util as anim

import os
os.environ["CUDA_VISIBLE_DEVICES"] = str(0)


mb_size = 32
X_dim = 784
z_dim = 64
h_dim = 128
lr = 1e-3
m = 5
lam = 1e-3
diversity_ratio = 0.5
k_curr = 0

mnist = input_data.read_data_sets('../../MNIST_data', one_hot=True)

Extracting ../../MNIST_data/train-images-idx3-ubyte.gz
Extracting ../../MNIST_data/train-labels-idx1-ubyte.gz
Extracting ../../MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ../../MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
def plot(samples):
    fig = plt.figure(figsize=(4, 4))
    gs = gridspec.GridSpec(4, 4)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')

    return fig


def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=xavier_stddev)

def sample_z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])

### Define Model Inputs

In [3]:
# Create model inputs
X = tf.placeholder(tf.float32, shape=[None, X_dim])
z = tf.placeholder(tf.float32, shape=[None, z_dim])
k = tf.placeholder(tf.float32)

### Define BEGAN Model
As mentioned before, in BEGAN models the discriminator is an autoencoder which has its own loss, "Dist - Reconstruction loss"; this is not the discriminator loss.
![alt text](began_arch.png "Began Architecture")
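
To make the autoencoder's reconstruction loss concrete, here is a small NumPy sketch of the quantity that `D` returns in this notebook: the squared error summed over pixels, then averaged over the batch. The function name and the toy arrays are assumptions made for illustration; the real model computes this on TensorFlow tensors.

```python
import numpy as np


def reconstruction_loss(x, x_recon):
    """Mean over the batch of the per-sample summed squared error."""
    per_sample = np.sum((x - x_recon) ** 2, axis=1)
    return per_sample.mean()


# Toy batch of two "images" with two pixels each (hypothetical values)
x = np.array([[0.0, 1.0], [1.0, 1.0]])
x_recon = np.array([[0.0, 0.0], [1.0, 0.0]])
loss = reconstruction_loss(x, x_recon)
perfect = reconstruction_loss(x, x)
print(loss, perfect)
```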

### Main Idea
Matching the distributions of the reconstruction losses (for generated and real images) can be a suitable proxy for matching the data distributions.

The actual training loss is then derived from the Wasserstein distance between the reconstruction-loss distributions of real and generated data.
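
For reference, the resulting objective can be written as follows, using the paper's notation. This is a summary under the simplifying assumption that the same noise sample $z$ is shown in both losses; the notebook corresponds to $\eta = 2$ (squared error), with $\mathcal{L}(x)$ as `D_real`, $\mathcal{L}(G(z))$ as `D_fake`, $k_t$ as `k`, $\lambda_k$ as `lam` and $\gamma$ as `diversity_ratio`.

$$
\begin{aligned}
\mathcal{L}(v) &= \lVert v - D(v) \rVert^{\eta}, \quad \eta \in \{1, 2\} \\
\mathcal{L}_D &= \mathcal{L}(x) - k_t \, \mathcal{L}(G(z)) \\
\mathcal{L}_G &= \mathcal{L}(G(z)) \\
k_{t+1} &= k_t + \lambda_k \left( \gamma \, \mathcal{L}(x) - \mathcal{L}(G(z)) \right)
\end{aligned}
$$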

### Gamma Parameter
BEGAN has a hyperparameter, gamma (`diversity_ratio` in the code), that balances generation quality against diversity. Higher quality means more mode collapse.
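
The sketch below shows how gamma enters the update of the balancing term `k`. The function name and the loss values are hypothetical. The clipping to `[0, 1]` is an assumption taken from the range the paper gives for `k`; the training loop in this notebook does not clip.

```python
import numpy as np


def update_k(k, loss_real, loss_fake, gamma, lam):
    """One proportional-control step for k, clipped to [0, 1] (assumption)."""
    k_next = k + lam * (gamma * loss_real - loss_fake)
    return float(np.clip(k_next, 0.0, 1.0))


k = 0.0
# Fake loss below gamma * real loss: k grows, so D puts more weight on fakes
k_up = update_k(k, loss_real=10.0, loss_fake=2.0, gamma=0.5, lam=1e-3)
# Fake loss above gamma * real loss: k would go negative and is clipped at 0
k_down = update_k(k, loss_real=10.0, loss_fake=8.0, gamma=0.5, lam=1e-3)
print(k_up, k_down)
```

With a smaller gamma the target for the fake reconstruction loss is lower relative to the real one, which is why gamma acts as a quality versus diversity knob.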

In [4]:
def G(z):
    with tf.variable_scope("G"):
        G_h1 = tf.nn.relu(util.linear_std(z, h_dim, 'g1', stddev=0.002))
        #G_h1 = tf.nn.relu(tf.matmul(z, G_W1) + G_b1)
        #G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
        G_log_prob = util.linear_std(G_h1, X_dim, 'g2', stddev=0.002)
        G_prob = tf.nn.sigmoid(G_log_prob)    
        return G_prob


def D(X, reuse=None):
    with tf.variable_scope("D") as scope:
        if reuse:
            scope.reuse_variables()
        # Autoencoder
        #D_h1 = tf.nn.relu(tf.matmul(X, D_W1) + D_b1)
        D_h1 = tf.nn.relu(util.linear_std(X, h_dim, 'd1', stddev=0.002))
        #X_recon = tf.matmul(D_h1, D_W2) + D_b2
        X_recon = util.linear_std(D_h1, X_dim, 'd2', stddev=0.002)

        # Reconstruction Loss
        return tf.reduce_mean(tf.reduce_sum((X - X_recon)**2, 1))


# Generator
G_sample = G(z)

# Reconstruction Loss (Autoencoder loss)
D_real = D(X)
D_fake = D(G_sample, True)

### Define loss

In [5]:
# Get adversarial loss
D_loss = D_real - k*D_fake
G_loss = D_fake

### Get trainable parameters for Generator and Discriminator

In [6]:
# Avoid shadowing the built-in vars(); split variables by their scope prefix
t_vars = tf.trainable_variables()
theta_D = [v for v in t_vars if v.name.startswith('D/')]
theta_G = [v for v in t_vars if v.name.startswith('G/')]

### Define optimizers
Note that only two optimizers are needed: one for the discriminator and one for the generator.

In [7]:
D_solver = (tf.train.AdamOptimizer(learning_rate=lr)
            .minimize(D_loss, var_list=theta_D))
G_solver = (tf.train.AdamOptimizer(learning_rate=lr)
            .minimize(G_loss, var_list=theta_G))

### Create session

In [8]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

if not os.path.exists('out/'):
    os.makedirs('out/')


### Train

In [9]:
for it in range(100000):
    # Get batch of images
    X_mb, _ = mnist.train.next_batch(mb_size)

    # Optimize Discriminator
    _, D_real_curr = sess.run(
        [D_solver, D_real],
        feed_dict={X: X_mb, z: sample_z(mb_size, z_dim), k: k_curr}
    )

    # Optimize Generator
    _, D_fake_curr = sess.run(
        [G_solver, D_fake],
        feed_dict={X: X_mb, z: sample_z(mb_size, z_dim)}
    )

    # Update the adaptive term k that balances the discriminator and generator losses
    k_curr = k_curr + lam * (diversity_ratio*D_real_curr - D_fake_curr)

    if it % 1000 == 0:
        measure = D_real_curr + np.abs(diversity_ratio*D_real_curr - D_fake_curr)

        print('Iter-{}; Convergence measure: {:.4}'.format(it//1000, measure))

        samples = sess.run(G_sample, feed_dict={z: sample_z(16, z_dim)})

        fig = plot(samples)
        plt.savefig('out/{}.png'.format(str(it//1000).zfill(3)), bbox_inches='tight')        
        plt.close(fig)

Iter-0; Convergence measure: 240.0
Iter-1; Convergence measure: 18.42
Iter-2; Convergence measure: 17.01
Iter-3; Convergence measure: 13.67
Iter-4; Convergence measure: 15.91
Iter-5; Convergence measure: 17.01
Iter-6; Convergence measure: 18.05
Iter-7; Convergence measure: 18.96
Iter-8; Convergence measure: 15.04
Iter-9; Convergence measure: 15.02
Iter-10; Convergence measure: 16.53
Iter-11; Convergence measure: 15.25
Iter-12; Convergence measure: 15.61
Iter-13; Convergence measure: 17.62
Iter-14; Convergence measure: 14.63
Iter-15; Convergence measure: 15.47
Iter-16; Convergence measure: 14.8
Iter-17; Convergence measure: 14.77
Iter-18; Convergence measure: 18.14
Iter-19; Convergence measure: 14.61
Iter-20; Convergence measure: 14.81
Iter-21; Convergence measure: 15.4
Iter-22; Convergence measure: 17.67
Iter-23; Convergence measure: 14.87
Iter-24; Convergence measure: 14.61
Iter-25; Convergence measure: 15.07
Iter-26; Convergence measure: 14.66
Iter-27; Convergence measure: 15.47
Iter