# Evaluation & Loss Functions

***

## 1. [Disentangled Representation Learning for Non-Parallel Text Style Transfer](https://arxiv.org/pdf/1808.04339.pdf)

- Code source: [_torch reimplementation_](https://github.com/h3lio5/linguistic-style-transfer-pytorch)

### 1.1 Loss Functions

- Source: [_model.py_](https://github.com/h3lio5/linguistic-style-transfer-pytorch/blob/master/linguistic_style_transfer_pytorch/model.py)

- _Math_ : $J_{TOT} = J_{VAE} + \lambda_{mul(s)} J_{mul(s)} + \lambda_{adv(s)} J_{adv(s)} + \lambda_{mul(c)} J_{mul(c)} + \lambda_{adv(c)} J_{adv(c)}$
    
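    where $J_{VAE} = J_{AE} + KL$ is the standard VAE objective: the reconstruction (autoencoding) loss $J_{AE}$ plus the KL regularization term, and each $\lambda$ weights the corresponding auxiliary loss.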
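
As a minimal sketch of how the terms combine, the snippet below computes $J_{TOT}$ as the weighted sum above in plain Python. The function name `total_loss`, the dictionary keys, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper or the repository.

```python
def total_loss(j_ae, kl, aux_losses, weights):
    """Weighted sum J_TOT = J_VAE + sum_k lambda_k * J_k, with J_VAE = J_AE + KL.

    aux_losses and weights are dicts keyed by hypothetical term names
    ("mul_s", "adv_s", "mul_c", "adv_c").
    """
    j_vae = j_ae + kl
    return j_vae + sum(weights[name] * aux_losses[name] for name in aux_losses)


# Illustrative values only (not from the paper or the repository).
aux = {"mul_s": 0.5, "adv_s": 0.25, "mul_c": 2.0, "adv_c": 1.0}
lam = {"mul_s": 10.0, "adv_s": 1.0, "mul_c": 3.0, "adv_c": 0.5}
j_tot = total_loss(j_ae=4.0, kl=0.5, aux_losses=aux, weights=lam)
print(j_tot)
```

Keeping the weights in one mapping makes it easy to switch a term off (set its weight to zero) when checking how much each auxiliary loss contributes.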

#### 1.1.1 Multitask loss
- _Style classifier_ : predicts the style label from the style embedding -- $J_{mul(s)}$
- _Content classifier_ : predicts the bag-of-words (BoW) representation of the sentence from the content embedding -- $J_{mul(c)}$

# Content multitask loss
def get_content_mul_loss(self, content_emb, content_bow):
        """
        This loss quantifies the amount of content information preserved
        in the content space
        Returns:
        cross entropy loss of the content classifier
        """
        # predictions
        preds = nn.Softmax(dim=1)(
            self.content_classifier(self.dropout(content_emb)))
        # label smoothing
        smoothed_content_bow = content_bow * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.content_bow_dim
        # calculate cross entropy loss
        content_mul_loss = nn.BCELoss()(preds, smoothed_content_bow)

        return content_mul_loss

# Style multitask loss
def get_style_mul_loss(self, style_emb, style_labels):
        """
        This loss quantifies the amount of style information preserved
        in the style space
        Returns:
        cross entropy loss of the style classifier
        """
        # predictions
        preds = nn.Softmax(dim=1)(
            self.style_classifier(self.dropout(style_emb)))
        # label smoothing
        smoothed_style_labels = style_labels * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.num_style
        # calculate cross entropy loss
        style_mul_loss = nn.BCELoss()(preds, smoothed_style_labels)
        
        return style_mul_loss
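
Both functions above smooth the targets as `y * (1 - eps) + eps / K` before applying binary cross entropy. The plain-Python sketch below reproduces that arithmetic on a two-class toy example so it can be checked by hand. The helper names, the value `eps=0.1`, and the example predictions are assumptions for illustration, not values from the repository's configuration.

```python
import math


def smooth_labels(labels, eps):
    """Label smoothing as in the snippets above: y * (1 - eps) + eps / K."""
    k = len(labels)
    return [y * (1.0 - eps) + eps / k for y in labels]


def binary_cross_entropy(preds, targets):
    """Element-wise BCE averaged over all entries (the default 'mean' reduction of nn.BCELoss)."""
    terms = [-(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
             for p, t in zip(preds, targets)]
    return sum(terms) / len(terms)


one_hot = [1.0, 0.0]                       # two styles, true style is index 0
smoothed = smooth_labels(one_hot, eps=0.1)  # approximately [0.95, 0.05]
preds = [0.9, 0.1]                         # softmax output of a hypothetical classifier
loss = binary_cross_entropy(preds, smoothed)
print(smoothed, loss)
```

With smoothed targets the loss is minimized when the prediction equals the smoothed target rather than the hard one-hot label, which discourages overconfident classifiers.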

#### 1.1.2 Discriminator Loss

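- _Style discriminator_ : trained to predict the style label from the content embedding -- $J_{adv(s)}$
- _Style generator_ : the encoder, trained to increase the entropy of the style discriminator's predictions so that the content space carries as little style information as possible
- _Content discriminator, generator_ : work in the same way, with the discriminator predicting the BoW content from the style embedding -- $J_{adv(c)}$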

# Content discriminator predictions (computed from the style embedding)
def get_content_disc_preds(self, style_emb):
        """
        Returns predictions about the content using style embedding
        as input
        output shape : [batch_size,content_bow_dim]
        """
        # predictions
        # Note: detach the style embedding since we don't want the gradient to flow
        #       all the way to the encoder. content_disc_loss is used only to update the
        #       parameters of the discriminator network
        preds = nn.Softmax(dim=1)(self.content_disc(
            self.dropout(style_emb.detach())))

        return preds

# Adversarial content discriminator loss
def get_content_disc_loss(self, content_disc_preds, content_bow):
        """
        It essentially quantifies the amount of information about content
        contained in the style space
        Returns:
        cross entropy loss of content discriminator
        """
        # label smoothing
        smoothed_content_bow = content_bow * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.content_bow_dim
        # calculate cross entropy loss
        content_disc_loss = nn.BCELoss()(content_disc_preds, smoothed_content_bow)

        return content_disc_loss
    
# Style discriminator predictions (computed from the content embedding)
def get_style_disc_preds(self, content_emb):
        """
        Returns predictions about style using content embeddings
        as input
        output shape: [batch_size,num_style]
        """
        # predictions
        # Note: detach the content embedding since we don't want the gradient to flow
        #       all the way to the encoder. style_disc_loss is used only to update the
        #       parameters of the discriminator network
        preds = nn.Softmax(dim=1)(self.style_disc(
            self.dropout(content_emb.detach())))

        return preds
    
# Adversarial style discriminator loss
def get_style_disc_loss(self, style_disc_preds, style_labels):
        """
        It essentially quantifies the amount of information about style
        contained in the content space
        Returns:
        cross entropy loss of style discriminator
        """
        # label smoothing
        smoothed_style_labels = style_labels * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.num_style
        # calculate cross entropy loss
        style_disc_loss = nn.BCELoss()(style_disc_preds, smoothed_style_labels)

        return style_disc_loss
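
The bullets above say the generator side is trained to increase the entropy of the discriminator's predictions. The snippets shown here cover only the discriminator losses, so the following is an assumed form of that entropy term, written in plain Python. `prediction_entropy` and `eps` are hypothetical names, not identifiers from the repository.

```python
import math


def prediction_entropy(probs, eps=1e-8):
    """Shannon entropy -sum(p * log(p)) of one predicted distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


confident = [0.98, 0.02]   # discriminator can read the style off the content embedding
uncertain = [0.5, 0.5]     # discriminator is reduced to guessing

print(prediction_entropy(confident))
print(prediction_entropy(uncertain))   # close to log(2), the maximum for two classes
```

Higher entropy means the discriminator's predictions are closer to uniform, which is the outcome the encoder is pushed toward: an embedding from which the other attribute cannot be recovered.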

#### 1.1.3 KL regularization

- Regularizes the approximate posterior $q(z|x)$ produced by the encoder toward the prior $p(z)$ -- $KL$
- Measures how far the encoder's latent distribution diverges from that prior
- This model, like most VAEs, assumes a Gaussian latent distribution with a standard normal prior $\mathcal{N}(0, I)$

# KL divergence loss

def get_kl_loss(self, mu, log_var):
        """
        Args:
            mu: batch of means of the Gaussian distribution followed by the latent variables
            log_var: batch of log variances of the Gaussian distribution followed by the latent variables
        Returns:
            KL divergence from the standard normal prior, averaged over the batch (scalar tensor)
        """
        kl_loss = torch.mean((-0.5*torch.sum(1+log_var -
                                             log_var.exp()-mu.pow(2), dim=1)))
        return kl_loss
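
For a diagonal Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$ and a standard normal prior, the KL term has the closed form $-\frac{1}{2}\sum_j (1 + \log\sigma_j^2 - \sigma_j^2 - \mu_j^2)$, which is what `get_kl_loss` computes per example before averaging over the batch. The plain-Python sketch below evaluates the same expression on small inputs so the behavior is easy to check by hand. The helper names `kl_to_standard_normal` and `batch_kl` are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return -0.5 * sum(1.0 + lv - math.exp(lv) - m ** 2
                      for m, lv in zip(mu, log_var))


def batch_kl(mus, log_vars):
    """Mean of the per-example KL, mirroring torch.mean(torch.sum(..., dim=1))."""
    per_example = [kl_to_standard_normal(m, lv) for m, lv in zip(mus, log_vars)]
    return sum(per_example) / len(per_example)


# Posterior equal to the prior: the KL term vanishes.
kl_same = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Shifting one mean by 1 at unit variance costs 0.5 nats.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
# Averaging the two examples as a batch.
kl_batch = batch_kl([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(kl_shifted, kl_batch)
```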

#### 1.1.4 VAE autoencoding loss

- Loss of reconstructing the input sentence from the VAE decoder's output -- $J_{AE}$
- Added to the KL regularization term above to form $J_{VAE}$

# Autoencoding/'reconstruction' loss

def get_recon_loss(self, output_logits, input_sentences):
        """
        Args:
            output_logits: logits of output sentences at each time step, shape = (max_seq_length,batch_size,vocab_size)
            input_sentences: batch of token indices of input sentences, shape = (batch_size,max_seq_length)
        Returns:
            reconstruction loss calculated using cross entropy loss function
        """

        loss = nn.CrossEntropyLoss(ignore_index=0)
        recon_loss = loss(
            output_logits.view(-1, mconfig.vocab_size), input_sentences.view(-1))

        return recon_loss
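
Two details of `get_recon_loss` are easy to miss. First, `ignore_index=0` drops padding positions from the average. Second, the docstring gives the logits as time-major `(max_seq_length, batch_size, vocab_size)` while the targets are batch-major `(batch_size, max_seq_length)`. If those shapes hold, flattening both directly can pair logit rows with the wrong targets unless the logits are first brought to batch-major order. Whether the repository transposes elsewhere is not visible from this snippet, so the plain-Python sketch below only illustrates the alignment and the padding mask, assuming index 0 is the padding token. All helper names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def to_batch_major(logits_tm):
    """Transpose [time][batch][vocab] into [batch][time][vocab]."""
    return [list(sent) for sent in zip(*logits_tm)]


def masked_cross_entropy(logits_bm, targets, ignore_index=0):
    """Mean token-level cross entropy, skipping targets equal to ignore_index."""
    total, count = 0.0, 0
    for sent_logits, sent_targets in zip(logits_bm, targets):
        for step_logits, target in zip(sent_logits, sent_targets):
            if target == ignore_index:
                continue  # padding position: contributes nothing
            total -= log_softmax(step_logits)[target]
            count += 1
    return total / count


# Toy batch: 2 sentences, 3 time steps, vocabulary of 4 (index 0 = padding).
targets = [[1, 2, 0],
           [3, 0, 0]]
# Time-major logits that strongly favour the correct token at each position.
logits_tm = [[[10.0 if v == targets[b][t] else 0.0 for v in range(4)]
              for b in range(2)]
             for t in range(3)]

# Transposed to batch-major first: every non-padding prediction matches its target.
aligned = masked_cross_entropy(to_batch_major(logits_tm), targets)

# Flattened in time-major order without transposing: rows meet the wrong targets.
flat_logits = [row for time_step in logits_tm for row in time_step]
flat_targets = [tok for sent in targets for tok in sent]
misaligned = masked_cross_entropy([flat_logits], [flat_targets])

print(aligned, misaligned)
```

The same logits give a near-zero loss when aligned and a large one when misaligned, so comparing the two orderings on a toy batch is a quick sanity check before training.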

***

### 1.2 Evaluation Metrics