# Arbitrary Facial Attribute Editing: Only Change What You Want (AttGAN)

## Summary
* The paper argues that trying to separate all attribute information from the latent representation is excessive.
    * Such a constraint may result in loss of information.
* For attribute editing, all that is needed is the correct change of attributes; no constraint on the latent representation is required.
* Instead, the constraint is placed on the generated image: an attribute classifier checks that the edited image actually exhibits the target attributes.
* A reconstruction loss preserves details unrelated to the attributes.
* An adversarial loss makes the generated images realistic.


## Attribute GAN (AttGAN)
* Model parts
    * Encoder $G_{enc}$
    * Decoder $G_{dec}$
    * Discriminator $D$ 
    * Attribute classifier $C$
    * $D$ and $C$ share layers.
* Data streams
    * $x^a$ is the input image; $a$ is its binary attribute vector.
    * Encoder $G_{enc}$ encodes input $x^a$ to latent $z$
    * Decoder $G_{dec}$ decodes $z$ conditioned on target attributes $b$ to $\hat{x^b}$
    * Decoder $G_{dec}$ decodes $z$ conditioned on actual attributes $a$ to $\hat{x^a}$
    * Attribute classifier $C$ predicts $b'$ from $\hat{x^b}$
* Learning
    * Attribute classification loss
        * The weights of $G_{enc}$ and $G_{dec}$ are updated so that $\hat{x^b}$ minimizes the cross entropy between the predicted attributes $b' = C(\hat{x^b})$ and the target attributes $b$.
        * The weights of $C$ are updated to minimize the cross entropy between the attributes $a' = C(x^a)$ predicted on real training images and their actual attributes $a$.
    * Reconstruction loss
        * The weights of $G_{enc}$ and $G_{dec}$ are updated to minimize the $l_1$ loss between input sample $x^a$ and the decoded image conditioned on actual attributes, $\hat{x^a}$.
    * Adversarial loss
        * $D$ is updated to output high values for (real) input samples $x^a$ and low values for (fake) samples $\hat{x^b}$.
        * $G_{enc}$ and $G_{dec}$ are updated so that $D$ outputs high values for (fake) $\hat{x^b}$.
        * WGAN-GP is used, so $D$ is (softly) constrained to be 1-Lipschitz via a gradient penalty.
    * The different losses are scaled by hyperparameters and summed into one objective for $G_{enc}$/$G_{dec}$ and one for $D$/$C$.
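
The loss terms above can be sketched in plain Python. This is a minimal, hedged illustration rather than the paper's implementation: images, attribute probabilities and critic scores are flat lists of floats, the gradient in the penalty is estimated by central differences instead of autodiff, and the weights `LAMBDA_REC`, `LAMBDA_CLS_G`, `LAMBDA_CLS_C`, `LAMBDA_GP` as well as all function names are hypothetical placeholders.

```python
import math
import random

# Hypothetical loss weights, chosen only for illustration.
LAMBDA_REC = 100.0   # reconstruction (l1) term for G_enc / G_dec
LAMBDA_CLS_G = 10.0  # attribute classification term for G_enc / G_dec
LAMBDA_CLS_C = 1.0   # attribute classification term for C
LAMBDA_GP = 10.0     # gradient penalty weight for D


def attribute_bce(probs, targets, eps=1e-7):
    """Mean binary cross entropy between predicted attribute
    probabilities (e.g. b' = C(x_hat_b)) and binary targets (e.g. b)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(targets)


def l1_reconstruction(x, x_rec):
    """Mean absolute error between x^a and its reconstruction x_hat^a."""
    return sum(abs(u - v) for u, v in zip(x, x_rec)) / len(x)


def critic_loss(d_real, d_fake):
    """WGAN critic loss: minimizing it raises D on reals, lowers it on fakes."""
    return sum(d_fake) / len(d_fake) - sum(d_real) / len(d_real)


def generator_adv_loss(d_fake):
    """Generator adversarial loss: minimizing it raises D on x_hat^b."""
    return -sum(d_fake) / len(d_fake)


def gradient_penalty(critic, x_real, x_fake, h=1e-4, rng=random):
    """(||grad D(x_interp)||_2 - 1)^2 for one real/fake pair, where x_interp
    is a random interpolate. The gradient is estimated by central
    differences here; training code would use autodiff."""
    eps = rng.random()
    x_int = [eps * r + (1 - eps) * f for r, f in zip(x_real, x_fake)]
    grad = []
    for i in range(len(x_int)):
        up, down = list(x_int), list(x_int)
        up[i] += h
        down[i] -= h
        grad.append((critic(up) - critic(down)) / (2 * h))
    norm = math.sqrt(sum(g * g for g in grad))
    return (norm - 1.0) ** 2


def generator_total(x_a, x_rec, probs_b, b, d_fake):
    """Objective minimized by G_enc and G_dec."""
    return (LAMBDA_REC * l1_reconstruction(x_a, x_rec)
            + LAMBDA_CLS_G * attribute_bce(probs_b, b)
            + generator_adv_loss(d_fake))


def discriminator_total(d_real, d_fake, gp, probs_a, a):
    """Objective minimized by the shared D / C network."""
    return (critic_loss(d_real, d_fake)
            + LAMBDA_GP * gp
            + LAMBDA_CLS_C * attribute_bce(probs_a, a))
```

In a real training loop each total would be minimized only with respect to its own parameters: `generator_total` for $G_{enc}$ and $G_{dec}$, and `discriminator_total` for the shared $D$/$C$ layers, alternating between the two.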


### Details For Training
TODO
* How are attributes fed into the generator? Tiled into feature maps?
* How are target attributes sampled during training?


### Attribute Intensity Manipulation
* Attributes are more naturally represented as continuous intensities than as the binary values assumed above.
* $z$ is split into $z_{int}$, which models the intensity of each attribute, and $z'$, which plays the same role as $z$ above.
* $z_{int}$ is constrained to follow $U(0, 1)$. This applies only to $z_{int}$, not to the rest of the latent code.
    * This is enforced via adversarial training (as in adversarial autoencoders, AAE). Note: presumably with a separate discriminator for this?
* Intensity-adjusted target attributes are computed as $b_{int} = z_{int} \odot (2b - 1)$, i.e. each binary attribute is mapped to $\pm 1$ and scaled by its intensity.
    * Reconstructions get the same treatment with the actual attributes: $a_{int} = z_{int} \odot (2a - 1)$.
    * This is what's fed into the generator along with $z'$.
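
The intensity-adjusted conditioning can be sketched as follows. This is a hedged illustration only: it assumes, hypothetically, that $z_{int}$ occupies the first `n_attrs` entries of a flat latent list, and the function names `split_latent`, `signed_intensity` and `decoder_conditioning` are illustrative.

```python
def split_latent(z, n_attrs):
    """Split a flat latent list into (z_int, z_rest).

    Hypothetical layout: the first n_attrs entries are z_int (one
    intensity per attribute); the remaining entries are z'."""
    return z[:n_attrs], z[n_attrs:]


def signed_intensity(z_int, attrs):
    """Compute z_int * (2 * attrs - 1) elementwise.

    A binary attribute 1 maps to +z_int and 0 maps to -z_int, so with
    z_int in [0, 1] every entry lies in [-1, 1]."""
    return [zi * (2 * bit - 1) for zi, bit in zip(z_int, attrs)]


def decoder_conditioning(z, attrs, n_attrs):
    """Return (z', attrs_int), the pair fed to G_dec.

    Pass target attributes b for editing, or actual attributes a for
    reconstruction."""
    z_int, z_rest = split_latent(z, n_attrs)
    return z_rest, signed_intensity(z_int, attrs)


# Example: two attributes, latent of length 4.
z = [0.25, 0.9, 3.0, -1.0]
z_rest, b_int = decoder_conditioning(z, [1, 0], n_attrs=2)
print(z_rest, b_int)  # [3.0, -1.0] [0.25, -0.9]
```

Because the sign comes from $b$ and the magnitude from $z_{int}$, editing at test time could keep $b$ fixed and vary the intensity entries to strengthen or weaken an attribute.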
    
## Experiments
TODO

## TODO/discussion: 
* Similarities to StarGAN?
* Similarities to CycleGAN and/or UNIT?
    * Both also have a reconstruction loss.
    * The discriminator plays the role of a classifier.
    * Maybe this is more explicit in AttGAN.