StarGAN-v2

Objective:

To train a single generator G which generates diverse images of each domain y corresponding to a given image x.

Technical Terms used in the paper:

  1. Diverse Images: Different output images every time for a given source image and domain. (The original StarGAN cannot do this, since it learns a deterministic mapping per domain.)
  2. Domain: A set of images forming a visually distinctive category. Ex: Gender of a Person
  3. Style: Unique Appearance of each image. Ex: Hairstyle, Makeup etc.
  4. Latent Space: A space over which compressed data is represented in order to remove extraneous information and expose fundamental similarities between datapoints.

The network attempts to generate domain-specific style vectors in the learned style space of each domain and trains G to reflect these vectors in the output image.

Components of the Network

  1. Generator G: Takes an input image x and a style code s and generates an output image. Since s is designed to represent the style of a specific domain y, there is no need to provide y to G, allowing a single G to synthesize images of all domains.
  2. Mapping Network F: Takes a domain y and a latent code z (Gaussian noise) and generates the domain-specific style code s. Diverse style codes can be produced by randomly sampling the latent vector z and the domain y.
  3. Style Encoder E: Takes an image x and a domain y and extracts the style code s of x. The Style Encoder can produce different style codes from different reference images.
  4. Discriminator D: Consists of multiple output branches, with each branch Dy classifying whether or not an image is a real image belonging to domain y. (A minimal interface sketch of these four components follows.)
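Below is a minimal PyTorch sketch of how the four components fit together. The module bodies are toy stand-ins (the real networks are deep convolutional stacks); the dimensions follow the paper's defaults, and names like `Fnet` and `IMG` are illustrative assumptions, not this repository's code.

```python
import torch
import torch.nn as nn

# Toy dimensions; the paper uses 16-dim latent codes and 64-dim style codes.
LATENT_DIM, STYLE_DIM, NUM_DOMAINS, IMG = 16, 64, 2, 32

class Generator(nn.Module):
    """G(x, s): translate image x according to style code s."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
        self.affine = nn.Linear(STYLE_DIM, 3)  # style injected as a per-channel shift (stand-in for AdaIN)
    def forward(self, x, s):
        return torch.tanh(self.conv(x) + self.affine(s)[:, :, None, None])

class MappingNetwork(nn.Module):
    """F(z, y): map Gaussian noise z to a style code for domain y (one head per domain)."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(LATENT_DIM, STYLE_DIM) for _ in range(NUM_DOMAINS))
    def forward(self, z, y):
        out = torch.stack([h(z) for h in self.heads], dim=1)  # (B, domains, style)
        return out[torch.arange(z.size(0)), y]

class StyleEncoder(nn.Module):
    """E(x, y): extract the style code of image x for domain y."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(3 * IMG * IMG, NUM_DOMAINS * STYLE_DIM))
    def forward(self, x, y):
        out = self.body(x).view(-1, NUM_DOMAINS, STYLE_DIM)
        return out[torch.arange(x.size(0)), y]

class Discriminator(nn.Module):
    """D(x, y): real/fake logit from the output branch for domain y."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(3 * IMG * IMG, NUM_DOMAINS))
    def forward(self, x, y):
        return self.body(x)[torch.arange(x.size(0)), y]

# Wiring: Fnet stands for the paper's F (renamed to avoid torch.nn.functional's usual alias).
G, Fnet, E, D = Generator(), MappingNetwork(), StyleEncoder(), Discriminator()
x = torch.randn(4, 3, IMG, IMG)         # input images
y = torch.randint(NUM_DOMAINS, (4,))    # target domains
z = torch.randn(4, LATENT_DIM)          # latent codes
s_tilde = Fnet(z, y)                    # š: domain-specific style from noise
print(G(x, s_tilde).shape, E(x, y).shape, D(x, y).shape)
```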

Network Architecture

(architecture diagram)

Training Objectives

Notation:

  • Original Image - x
  • Original Domain - y
  • Target Domain - ỹ
  • Style Code of the Target Domain predicted by the Mapping Network - š
  • Style Code of the Original Image predicted by the Style Encoder - ŝ
  • Loss - 𝓛
  1. Adversarial Objective 𝓛adv:
    • Sample a latent code z and a target domain ỹ randomly, and generate a style code š = F(z, ỹ).
    • Generate an output image G(x, š) using the generated style code.
    • Learn using the adversarial loss 𝓛adv = 𝔼x,y[log Dy(x)] + 𝔼x,ỹ,z[log(1 − Dỹ(G(x, š)))]. When training the Generator, the log Dy(x) term is outside its control, so G minimises the expected value of log(1 − Dỹ(G(x, š))): we want the discriminator to classify the generated image as real with as high a probability as possible, and since log is monotonically increasing, minimising this term maximises that probability. When training the Discriminator, however, we maximise the loss, which pushes Dy(x) up since x truly belongs to domain y.
  2. Style Reconstruction 𝓛sty:
    • Minimise the style reconstruction loss 𝓛sty = 𝔼x,ỹ,z[‖š − E(G(x, š))‖1], i.e., train the Style Encoder so that, applied to the generated image, it recovers the style code š that was used, and push the Generator towards actually using the provided style code.
  3. Style Diversification 𝓛ds:
    • Maximise the difference 𝓛ds = 𝔼x,ỹ,z1,z2[‖G(x, š1) − G(x, š2)‖1] between images generated using two different style codes š1 and š2 produced from two different latent codes z1 and z2.
  4. Source Characteristics 𝓛cyc:
    • Minimise the cycle-consistency loss 𝓛cyc = 𝔼[‖x − G(G(x, š), ŝ)‖1], where ŝ = E(x, y) is the style code of the original image predicted by the Style Encoder, i.e., ensure that the Generator preserves the characteristics of the original image. (A combined loss sketch in code follows this list.)
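Here is a sketch of the four losses, continuing the toy networks above. The non-saturating logistic form and the L1 distances follow the equations just given; the exact adversarial formulation and loss weights in this repository may differ.

```python
import torch
import torch.nn.functional as nnf

def generator_losses(G, Fnet, E, D, x, y, y_trg, z1, z2):
    """Losses minimised by G, F, and E for one batch (sketch)."""
    s1, s2 = Fnet(z1, y_trg), Fnet(z2, y_trg)  # two style codes š1, š2 for the target domain
    fake = G(x, s1)

    # Adversarial (non-saturating form): make D classify the fake as real.
    adv = -nnf.logsigmoid(D(fake, y_trg)).mean()

    # Style reconstruction: E applied to the fake should recover š1.
    sty = (s1 - E(fake, y_trg)).abs().mean()

    # Style diversification: push outputs of the two style codes apart (maximised, hence subtracted).
    ds = (fake - G(x, s2)).abs().mean()

    # Cycle consistency: translating back with the original style ŝ should recover x.
    cyc = (x - G(fake, E(x, y))).abs().mean()

    lam_sty, lam_ds, lam_cyc = 1.0, 1.0, 1.0   # hyperparameter placeholders
    return adv + lam_sty * sty - lam_ds * ds + lam_cyc * cyc
```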

Full Objective:

minG,F,E maxD 𝓛adv + λsty 𝓛sty - λds 𝓛ds + λcyc 𝓛cyc

  • The λ's are hyperparameters
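One way to realise this min-max objective is alternating updates, sketched below on the toy networks: D takes a step maximising 𝓛adv, then G, F, and E take a step minimising the full objective. The optimiser choice and learning rate here are placeholders, not this repository's settings.

```python
import torch
import torch.nn.functional as nnf

opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_g = torch.optim.Adam([*G.parameters(), *Fnet.parameters(), *E.parameters()], lr=1e-4)

def train_step(x, y, y_trg, z1, z2):
    # Discriminator step: maximise 𝓛adv (equivalently, minimise its negation).
    with torch.no_grad():                                # fakes are constants for the D update
        fake = G(x, Fnet(z1, y_trg))
    d_loss = (-nnf.logsigmoid(D(x, y)).mean()            # real image, branch y
              - nnf.logsigmoid(-D(fake, y_trg)).mean())  # fake image, branch ỹ
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator / Mapping / Encoder step: minimise the full objective.
    g_loss = generator_losses(G, Fnet, E, D, x, y, y_trg, z1, z2)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(x, y, torch.randint(NUM_DOMAINS, (4,)),
           torch.randn(4, LATENT_DIM), torch.randn(4, LATENT_DIM))
```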

Evaluating the Model:

  • Fréchet Inception Distance (FID): a metric used to assess the quality of images created by the generator of a generative adversarial network (GAN). FID compares the distribution of generated images with the distribution of the real images that were used to train the generator. (Lower is better.)
  • Learned Perceptual Image Patch Similarity (LPIPS): a measure of diversity in the generated images, computed as the average perceptual distance between outputs. (Higher is better.)
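A sketch of the diversity measurement, assuming the third-party lpips package (pip install lpips); the averaging over output pairs is an illustrative choice. FID can be computed separately from two image directories with the pytorch-fid package (python -m pytorch_fid real_dir fake_dir).

```python
import itertools
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')  # perceptual distance network; expects images in [-1, 1]

def mean_pairwise_lpips(images):
    """Average LPIPS over all pairs of outputs for one input; higher means more diverse."""
    dists = [loss_fn(a.unsqueeze(0), b.unsqueeze(0)).item()
             for a, b in itertools.combinations(images, 2)]
    return sum(dists) / len(dists)

fakes = [torch.rand(3, 64, 64) * 2 - 1 for _ in range(4)]  # stand-ins for generated outputs
print(mean_pairwise_lpips(fakes))
```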

Network Architecture:

Layers:

(see: AdaptiveWing Loss for Robust Face Alignment via Heatmap Regression)

Output

(sample output: Trial_3)
