# Generate synthetic faces with specific attributes

The aim of this blog post is to generate synthetic faces with specific attributes. As far as I know, no one has done this before. This is not a tutorial on GANs: the complete code can be found here, and what follows is just a summary.
For this purpose, we are going to use Generative Adversarial Networks (GANs), generative neural network models that aim to produce images that look like real data.

## Data

We will use the CelebA data set (link here), which is composed of roughly 200k images. We will use 180k of these to train our generative model and 20k to evaluate it according to the FID score (link here).
The training images have been pre-processed so that they have a fixed size of $64 \times 64$ and are cropped so that the faces are centered.
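
For reference, the FID (Fréchet Inception Distance) compares the statistics of Inception-v3 features extracted from real and generated images. In its standard definition, with $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denoting the mean and covariance of the features of the real and generated images (symbols introduced here for illustration only):

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
$$

Lower values mean that the two feature distributions are closer; identical statistics give a FID of 0.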

## Special layers

GANs are intrinsically difficult to train due to their tendency to diverge. So, before diving into the code, I am going to briefly introduce three types of layers I added to the GAN to obtain better results:
- **Pixel normalization**: a normalization technique whose aim is to control the magnitude of the activations of the generator. It normalizes the feature vector in each pixel to unit length after each convolutional layer in the generator. This layer helps to obtain better images.
- **Mini-Batch Standard Deviation**: this layer is used to increase the diversity of the generated images. It appends to the discriminator's feature maps a statistic of the standard deviation across the mini-batch, so the discriminator can detect batches with little variety.
- **Spectral Normalization**: a weight normalization that stabilizes the training of the discriminator by constraining the largest singular value of each layer's weight matrix. This limits the exploding gradient problem and mode collapse, and helps the GAN to converge with a smoother loss.
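
To make the three ideas concrete, here is a minimal NumPy sketch of what each operation computes. These are not the `PixelNormalization`, `MinibatchStdev` and `SNConv2D` layers used later in the post (those are Keras layers from the complete code). The function names, the `eps` values and the number of power iterations are assumptions chosen for illustration.

```python
import numpy as np

def pixel_norm(x, eps=1e-8):
    """Rescale the feature vector at every pixel by its root mean square.

    x has shape (batch, height, width, channels).
    """
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def minibatch_stddev(x, eps=1e-8):
    """Append one constant feature map holding the average std across the batch."""
    std = np.sqrt(np.var(x, axis=0) + eps)   # std of each feature over the batch
    mean_std = np.mean(std)                  # summarize it as a single scalar
    extra = np.full(x.shape[:-1] + (1,), mean_std, dtype=x.dtype)
    return np.concatenate([x, extra], axis=-1)

def spectral_normalize(w, n_iter=50, seed=0):
    """Divide a 2-D weight matrix by an estimate of its largest singular value.

    The estimate comes from power iteration. Real layers usually run a single
    iteration per training step and keep `u` between steps; many iterations
    are used here only to keep the sketch self-contained.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = w @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ w @ v                        # estimated largest singular value
    return w / sigma
```

After `pixel_norm`, the mean squared activation at every pixel is approximately 1; after `spectral_normalize`, the largest singular value of the matrix is approximately 1.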

## ACGAN

Generative Adversarial Networks (GANs) are a deep learning architecture for training powerful generator models. A generator model is capable of generating new artificial samples that plausibly could have come from an existing distribution of samples. GANs are comprised of both generator and discriminator models. The generator is responsible for generating new samples from the domain (real faces in our case), and the discriminator is responsible for classifying whether samples are real or fake. Importantly, the performance of the discriminator model is used to update both the model weights of the discriminator itself and the generator model. This means that the generator never actually sees examples from the domain and is adapted based on how well the discriminator performs.

However, we need a way to condition the generated images on the attributes we want. For this purpose, we are going to use the Auxiliary Classifier GAN (ACGAN), an extension of the GAN architecture that is able to achieve our task.
The ACGAN's discriminator receives an image as input and must predict both whether the image is real or fake and the attribute vector of the image.
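
The two predictions translate into a loss with two terms. The sketch below, in plain NumPy, shows the quantity that is minimized when both outputs use binary cross-entropy with equal weights, as in the models compiled later in this post. The function names are hypothetical and the labels are assumed to be encoded as 0/1.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

def acgan_discriminator_loss(y_real_fake, p_real_fake, attrs, p_attrs):
    """Sum of the real/fake loss and the attribute-classification loss."""
    source_loss = binary_crossentropy(y_real_fake, p_real_fake)
    attribute_loss = binary_crossentropy(attrs, p_attrs)
    return source_loss + attribute_loss
```

A discriminator that outputs 0.5 everywhere gets a loss of $2 \ln 2 \approx 1.386$, a useful baseline when reading the training logs.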

Below I am going to show the code for the definition of the network of the `discriminator`, `generator` and the `acgan` combined model.

### Discriminator

In [1]:
# Keras imports used by the model definitions in this post (assuming tf.keras).
# SNConv2D, MinibatchStdev and PixelNormalization are the custom layers
# introduced above; their definitions are in the complete code.
from tensorflow.keras.layers import (Input, Dense, LeakyReLU, Reshape,
                                     UpSampling2D, GlobalAveragePooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def discriminator(n_attr, filters=128, kernel_size=3, in_shape=(64, 64, 3)):

    in_img = Input(shape=in_shape)
    
    # 64x64x3 -> 32x32xFILTERS
    fe = SNConv2D(filters=filters, kernel_size=kernel_size, strides=(2,2), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(in_img)
    fe = LeakyReLU(alpha=0.1)(fe)
    fe = MinibatchStdev()(fe)
    
    # 32x32 -> 16x16, FILTERS * 2 channels
    fe = SNConv2D(filters=filters * 2, kernel_size=kernel_size, strides=(2,2), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(fe)
    fe = LeakyReLU(alpha=0.1)(fe)
   
    # 16x16 -> 8x8, FILTERS * 4 channels
    fe = SNConv2D(filters=filters * 4, kernel_size=kernel_size, strides=(2,2), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(fe)
    fe = LeakyReLU(alpha=0.1)(fe)
   
    # 8x8 -> 4x4, FILTERS * 8 channels
    fe = SNConv2D(filters=filters * 8, kernel_size=kernel_size, strides=(2,2), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(fe)
    fe = LeakyReLU(alpha=0.1)(fe)
    
    # 4x4x(FILTERS * 8) -> vector with FILTERS * 8 features
    fe = GlobalAveragePooling2D()(fe)
    
    # output 1: probability that the image is real
    out1 = Dense(1, activation='sigmoid')(fe)
    
    # output 2: one independent probability per attribute
    out2 = Dense(n_attr, activation='sigmoid')(fe)
    
    # define model
    model = Model(in_img, [out1, out2])
    
    # compile model (`lr` is a deprecated alias of `learning_rate`)
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss=["binary_crossentropy", "binary_crossentropy"], optimizer=opt, metrics=['accuracy'])
    
    return model
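
As a sanity check on the tensor sizes, the following sketch traces the output shape of the four strided convolutions. It assumes Keras `'same'` padding, for which the output size is `ceil(input / stride)`. The helper name `trace_discriminator_shapes` is hypothetical and not part of the original code, and the extra channel that a mini-batch standard deviation layer may append is ignored.

```python
import math

def trace_discriminator_shapes(size=64, filters=128, n_blocks=4, stride=2):
    """Return the (height, width, channels) produced by each strided convolution."""
    shapes = []
    for block in range(n_blocks):
        size = math.ceil(size / stride)          # 'same' padding with stride 2 halves the size
        shapes.append((size, size, filters * 2 ** block))
    return shapes

for shape in trace_discriminator_shapes():
    print(shape)
# (32, 32, 128)
# (16, 16, 256)
# (8, 8, 512)
# (4, 4, 1024)
```

Global average pooling then reduces the final `4 x 4 x 1024` tensor to a vector of 1024 features, which feeds the two dense outputs.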

### Generator

In [2]:
def generator(latent_dim, n_attr, filters=128, kernel_size=4):

    in_gen = Input(shape=(latent_dim + n_attr,))
    gen = Dense(4 * 4 * filters * 8)(in_gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    gen = Reshape((4, 4, filters * 8))(gen)
    gen = PixelNormalization()(gen)
    
    # 4x4 -> 8x8
    gen = UpSampling2D()(gen)
    gen = SNConv2D(filters=filters * 4, kernel_size=kernel_size, strides=(1,1), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(gen)
    gen = PixelNormalization()(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    
    # 8x8 -> 16x16
    gen = UpSampling2D()(gen)
    gen = SNConv2D(filters=filters * 2, kernel_size=kernel_size, strides=(1,1), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(gen)
    gen = PixelNormalization()(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    
    # 16x16 -> 32x32
    gen = UpSampling2D()(gen)
    gen = SNConv2D(filters=filters, kernel_size=kernel_size, strides=(1,1), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(gen)
    gen = PixelNormalization()(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    
    # 32x32 -> 64x64
    gen = UpSampling2D()(gen)
    gen = SNConv2D(filters=filters // 2, kernel_size=kernel_size, strides=(1,1), padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(gen)
    gen = PixelNormalization()(gen)
    gen = LeakyReLU(alpha=0.2)(gen)
    
    # 64x64x(FILTERS / 2) -> 64x64x3, tanh keeps pixel values in [-1, 1]
    image = SNConv2D(filters=3, kernel_size=4, strides=(1, 1), activation='tanh', padding='same', kernel_initializer="orthogonal", spectral_normalization=True)(gen)
    
    # define model
    model = Model(in_gen, image)
    return model

### ACGAN combined model

In [None]:
# define the combined generator and discriminator model, for updating the generator
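def acgan(g_model, d_model):
    img = g_model.output
    # freeze the discriminator inside the combined model, so that only the generator is updated
    d_model.trainable = False
    # feed the generated image to the discriminator
    valid, target_label = d_model(img)
    
    # the combined model maps the generator input (noise + attributes) to the discriminator outputs
    noise_plus_label = g_model.input
    model = Model(noise_plus_label, [valid, target_label])

    # `lr` is a deprecated alias of `learning_rate`
    opt = Adam(learning_rate=0.0002, beta_1=0.5)
    model.compile(loss=["binary_crossentropy", "binary_crossentropy"], optimizer=opt)
    return model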

## Training

The function used to train the model is shown below.

In [7]:
import numpy as np
import tensorflow as tf

def train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs, batch_size):
    # note: n_attr is read from the notebook's global scope (configuration cell below)
    bat_per_epo = dataset[0].shape[0] // batch_size
    half_batch = batch_size // 2
    for i in range(n_epochs):
        for j in range(bat_per_epo):
            # update the discriminator on a half batch of real images
            [X_real, labels_real], y_real = generate_real_samples(dataset, half_batch)
            d_metrics1 = d_model.train_on_batch(X_real, [y_real, labels_real])
            d_loss1 = d_metrics1[0]  # first entry is the total loss

            # update the discriminator on a half batch of generated images
            [X_fake, labels], y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            d_metrics2 = d_model.train_on_batch(X_fake, [y_fake, labels])
            d_loss2 = d_metrics2[0]

            # update the generator through the combined model: the targets are
            # "real" (ones) plus the attributes that were requested
            [z_input, z_labels] = generate_latent_points(latent_dim, batch_size, n_attr, n_classes=2)
            y_gan = np.ones((batch_size, 1))
            concat = tf.concat([z_input, z_labels], axis=-1)
            g_metrics = gan_model.train_on_batch(concat, [y_gan, z_labels])
            g_loss = g_metrics[0]
            
        # summarize the losses of the last batch of this epoch
        print('>%d, %d/%d, d1=%.3f, d2=%.3f g=%.3f' % (i+1, j+1, bat_per_epo, d_loss1, d_loss2, g_loss))
        
        # save the generator model once per epoch, numbered by epoch
        g_model.save('cgan_generator' + str(i+1) + '.h5')
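
The training loop relies on three helpers, `generate_latent_points`, `generate_real_samples` and `generate_fake_samples`, whose definitions are in the complete code and are not shown in this post. The sketch below is one plausible way to write them so that they match the calls above, not the original implementation. It assumes that `dataset` is a tuple `(images, attributes)`, that attributes are encoded as 0/1, and that attribute vectors for fake images are sampled uniformly at random. The `n_attr` keyword of `generate_fake_samples` is an addition made here to keep the sketch self-contained. Both arrays are cast to `float32` so that they can be concatenated without a dtype mismatch.

```python
import numpy as np

def generate_latent_points(latent_dim, n_samples, n_attr, n_classes=2):
    """Sample Gaussian noise and random attribute vectors for the generator."""
    z_input = np.random.randn(n_samples, latent_dim).astype("float32")
    z_labels = np.random.randint(0, n_classes, size=(n_samples, n_attr)).astype("float32")
    return [z_input, z_labels]

def generate_real_samples(dataset, n_samples):
    """Pick random real images with their attributes; the target is 1 (real)."""
    images, labels = dataset
    idx = np.random.randint(0, images.shape[0], n_samples)
    y = np.ones((n_samples, 1), dtype="float32")
    return [images[idx], labels[idx]], y

def generate_fake_samples(g_model, latent_dim, n_samples, n_attr=40):
    """Generate images from random inputs; the target is 0 (fake)."""
    z_input, z_labels = generate_latent_points(latent_dim, n_samples, n_attr)
    # the generator takes noise and attributes concatenated in a single vector
    X_fake = g_model.predict(np.concatenate([z_input, z_labels], axis=-1))
    y = np.zeros((n_samples, 1), dtype="float32")
    return [X_fake, z_labels], y
```

Sampling attributes uniformly can request combinations that never occur in real faces; drawing attribute vectors from the training set instead is a common alternative.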

As mentioned before, we are going to use 180k training images, together with all 40 CelebA attributes. I used a batch size of 16 because this was the largest value I could fit on my GPU, a latent space of size 128, and I trained for 100 epochs.
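
To give a feeling for what these numbers imply, the short calculation below derives the number of training steps from the same formulas used in the `train` function. The variable names mirror those of the post; `total_steps` and `real_seen_per_epoch` are introduced here for illustration.

```python
n_images = 180_000
batch_size = 16
n_epochs = 100

bat_per_epo = n_images // batch_size              # steps per epoch
half_batch = batch_size // 2                      # real (and fake) images per discriminator update
total_steps = bat_per_epo * n_epochs              # generator updates over the whole training
real_seen_per_epoch = bat_per_epo * half_batch    # real images shown to the discriminator per epoch

print(bat_per_epo, half_batch, total_steps, real_seen_per_epoch)
# 11250 8 1125000 90000
```

Since each step shows only a half batch of real images to the discriminator, one "epoch" in this loop draws 90,000 real images rather than the full 180,000.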

In [13]:
# number of images to use for training
how_many = 180000

# size of the latent space
latent_dim = 128

# number of images per batch
batch_size = 16

# number of epochs
n_epochs = 100

# number of attributes per image
n_attr = 40

# just 2 possible values for each attribute
n_classes = 2

In [14]:
# create the discriminator
d_model = discriminator(n_attr)

# create the generator
g_model = generator(latent_dim, n_attr)

# create the gan combined model
gan_model = acgan(g_model, d_model)

# load dataset
dataset = load_dataset(how_many)

# train model
train(g_model, d_model, gan_model, dataset, latent_dim, n_epochs, batch_size);

## Results

Let's see some results after roughly 48 hours of training.

<img src="1.png">