# Evaluation & Loss Functions

__Contents__ :

1. [Disentangled Representation Learning](#paper_1), John et al. 2018
    - Loss functions: [multitask](#paper_1_mult), [discriminator](#paper_1_disc), [KL regularization](#paper_1_KL), [reconstruction](#paper_1_recon)
    - [Evaluation metrics](#paper_1_eval):
        - _Style transfer_ : pretrained classifier
        - _Content preservation_ : cosine similarity and word overlap for content, Kneser-Ney (KN) smoothed language model for fluency
    
2. [Adversarially Regularized Autoencoders](#paper_2), Zhao et al. 2018

3. [Style Transformer](#paper_3), Dai et al. 2019
    - _Evaluation_ :
        - [Requirements](#paper_3_eval)
        - Metrics: [style](#paper_3_eval_style) (classification), [content](#paper_3_eval_content) (BLEU), [fluency](#paper_3_eval_fluency) (PPL)

***
***

<a name="paper_1"></a>
## 1. [Disentangled Representation Learning for Non-Parallel Text Style Transfer](https://arxiv.org/pdf/1808.04339.pdf)

- _Authors_ : John et al. 2018
- _Source code_ : [torch reimplementation](https://github.com/h3lio5/linguistic-style-transfer-pytorch)

### 1.1 Loss

- _Source_ : [model.py](https://github.com/h3lio5/linguistic-style-transfer-pytorch/blob/master/linguistic_style_transfer_pytorch/model.py)

- _Math_ : $J_{TOT} = J_{VAE} + \lambda_{mul(s)} J_{mul(s)} + \lambda_{adv(s)} J_{adv(s)} + \lambda_{mul(c)} J_{mul(c)} + \lambda_{adv(c)} J_{adv(c)}$
    
    where $J_{VAE} = J_{AE} + KL$ is the standard VAE objective: an expected reconstruction loss $J_{AE}$ plus a KL regularizer
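
As a worked example of the formula above, the sketch below combines already-computed loss terms into $J_{TOT}$. The function name `total_loss`, the `weights` dictionary, and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the settings used in the paper or the reimplementation.

```python
def total_loss(j_ae, kl, j_mul_s, j_adv_s, j_mul_c, j_adv_c, weights):
    """Weighted sum J_TOT as written in the formula above.

    `weights` maps each term to its lambda (hypothetical values here).
    """
    j_vae = j_ae + kl  # J_VAE = J_AE + KL
    return (j_vae
            + weights["mul_s"] * j_mul_s
            + weights["adv_s"] * j_adv_s
            + weights["mul_c"] * j_mul_c
            + weights["adv_c"] * j_adv_c)


# Made-up loss values and weights, for illustration only
weights = {"mul_s": 2.0, "adv_s": 0.5, "mul_c": 1.0, "adv_c": 0.25}
j_tot = total_loss(j_ae=2.0, kl=0.5,
                   j_mul_s=0.25, j_adv_s=0.5,
                   j_mul_c=1.0, j_adv_c=2.0,
                   weights=weights)
print(j_tot)  # 2.5 + 0.5 + 0.25 + 1.0 + 0.5 = 4.75
```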

In [None]:
# Requirements

import torch
import torch.nn as nn
from linguistic_style_transfer_pytorch.config import ModelConfig, GeneralConfig
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
import math

mconfig = ModelConfig()
gconfig = GeneralConfig()

<a name="paper_1_mult"></a>
#### 1.1.1 Multitask loss
- _Style classifier_ : trained model that predicts the style label -- $J_{mul(s)}$
- _Content classifier_ : predicts bag of words (BoW) representation of sentence -- $J_{mul(c)}$

In [6]:
# Content multitask loss
def get_content_mul_loss(self, content_emb, content_bow):
        """
        This loss quantifies the amount of content information preserved
        in the content space
        Returns:
        cross entropy loss of the content classifier
        """
        # predictions
        preds = nn.Softmax(dim=1)(
            self.content_classifier(self.dropout(content_emb)))
        # label smoothing
        smoothed_content_bow = content_bow * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.content_bow_dim
        # binary cross entropy (nn.BCELoss) between softmax outputs and smoothed targets
        content_mul_loss = nn.BCELoss()(preds, smoothed_content_bow)

        return content_mul_loss

# Style multitask loss
def get_style_mul_loss(self, style_emb, style_labels):
        """
        This loss quantifies the amount of style information preserved
        in the style space
        Returns:
        cross entropy loss of the style classifier
        """
        # predictions
        preds = nn.Softmax(dim=1)(
            self.style_classifier(self.dropout(style_emb)))
        # label smoothing
        smoothed_style_labels = style_labels * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.num_style
        # binary cross entropy (nn.BCELoss) between softmax outputs and smoothed targets
        style_mul_loss = nn.BCELoss()(preds, smoothed_style_labels)
        
        return style_mul_loss
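
Both multitask losses smooth their targets before computing the loss. The plain-Python sketch below isolates that step, assuming the rule is $y \cdot (1 - \epsilon) + \epsilon / K$ as in the code above; `smooth_labels` and `epsilon` are illustrative names standing in for the tensor expression and `mconfig.label_smoothing`.

```python
def smooth_labels(labels, epsilon):
    """Return y * (1 - epsilon) + epsilon / K for a label vector of length K."""
    k = len(labels)
    return [y * (1 - epsilon) + epsilon / k for y in labels]


one_hot = [0.0, 1.0]  # two styles, the second one is the true label
smoothed = smooth_labels(one_hot, epsilon=0.1)
print(smoothed)  # roughly [0.05, 0.95]; a one-hot input still sums to 1
```

With `epsilon = 0` the labels are returned unchanged, so the smoothing strength is controlled by a single configuration value.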

<a name="paper_1_disc"></a>
#### 1.1.2 Discriminator Loss

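- _Style discriminator_ : trained to predict the style label from the content embedding -- $J_{adv(s)}$
- _Style generator_ : the encoder, trained to increase the entropy of the discriminator's predictions, so that the content space carries as little style information as possible
- _Content discriminator, generator_ : work in the same way, predicting the BoW content from the style embedding -- $J_{adv(c)}$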

In [4]:
# Adversarial content generator/'predictor'
def get_content_disc_preds(self, style_emb):
        """
        Returns predictions about the content using style embedding
        as input
        output shape : [batch_size,content_bow_dim]
        """
        # predictions
        # Note: detach the style embedding since we don't want the gradient to flow
        #       all the way to the encoder. content_disc_loss is used only to change the
        #       parameters of the discriminator network
        preds = nn.Softmax(dim=1)(self.content_disc(
            self.dropout(style_emb.detach())))

        return preds

# Adversarial content discriminator loss
def get_content_disc_loss(self, content_disc_preds, content_bow):
        """
        It essentially quantifies the amount of information about content
        contained in the style space
        Returns:
        cross entropy loss of content discriminator
        """
        # label smoothing
        smoothed_content_bow = content_bow * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.content_bow_dim
        # binary cross entropy (nn.BCELoss) between discriminator outputs and smoothed targets
        content_disc_loss = nn.BCELoss()(content_disc_preds, smoothed_content_bow)

        return content_disc_loss
    
# Adversarial style generator/"predictor"
def get_style_disc_preds(self, content_emb):
        """
        Returns predictions about style using content embeddings
        as input
        output shape: [batch_size,num_style]
        """
        # predictions
        # Note: detach the content embedding since we don't want the gradient to flow
        #       all the way to the encoder. style_disc_loss is used only to change the
        #       parameters of the discriminator network
        preds = nn.Softmax(dim=1)(self.style_disc(
            self.dropout(content_emb.detach())))

        return preds
    
# Adversarial style discriminator loss
def get_style_disc_loss(self, style_disc_preds, style_labels):
        """
        It essentially quantifies the amount of information about style
        contained in the content space
        Returns:
        cross entropy loss of style discriminator
        """
        # label smoothing
        smoothed_style_labels = style_labels * \
            (1-mconfig.label_smoothing) + \
            mconfig.label_smoothing/mconfig.num_style
        # binary cross entropy (nn.BCELoss) between discriminator outputs and smoothed targets
        style_disc_loss = nn.BCELoss()(style_disc_preds, smoothed_style_labels)

        return style_disc_loss
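
The generator side of the adversarial setup is described in terms of the entropy of the discriminator's predictions. The sketch below is a plain-Python illustration of that quantity, not the repository's tensor code; `entropy` is a hypothetical helper. It shows that a confident prediction has low entropy while a uniform one reaches the maximum, which is the state the generator pushes toward.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


confident = [0.99, 0.01]  # discriminator easily recovers the style
uniform = [0.5, 0.5]      # discriminator cannot tell the styles apart

print(entropy(confident))
print(entropy(uniform))   # equals log(2), the maximum for two classes
```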

<a name="paper_1_KL"></a>
#### 1.1.3 KL regularization

- Regularizes the latent space by pulling the encoder's distribution over the latent variables toward the prior -- $KL$
- Measures how far the encoded Gaussian (mean `mu`, log-variance `log_var`) is from a standard normal $\mathcal{N}(0, I)$
- This model, like most VAEs, assumes the latent distribution is Gaussian

In [5]:
# KL divergence loss

def get_kl_loss(self, mu, log_var):
        """
        Args:
            mu: batch of means of the gaussian distribution followed by the latent variables
            log_var: batch of log variances(log_var) of the gaussian distribution followed by the latent variables
        Returns:
            total loss(float)
        """
        kl_loss = torch.mean((-0.5*torch.sum(1+log_var -
                                             log_var.exp()-mu.pow(2), dim=1)))
        return kl_loss
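
The expression inside `get_kl_loss` is the closed-form KL divergence between a diagonal Gaussian and a standard normal. The sketch below evaluates the same per-example formula in plain Python for a single latent vector; `gaussian_kl` is an illustrative name, and the batch mean taken in the code above is omitted.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return -0.5 * sum(1 + lv - math.exp(lv) - m ** 2
                      for m, lv in zip(mu, log_var))


# A posterior equal to the prior gives zero divergence
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))

# Shifting one mean by 1 (unit variance) costs 0.5 nats
print(gaussian_kl([1.0], [0.0]))
```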

<a name="paper_1_recon"></a>
#### 1.1.4 VAE autoencoding loss

- Loss of reconstructing the input sentence from the VAE model -- $J_{AE}$
- Added to the KL regularization above to give $J_{VAE}$

In [8]:
# Autoencoding/'reconstruction' loss

def get_recon_loss(self, output_logits, input_sentences):
        """
        Args:
            output_logits: logits of output sentences at each time step, shape = (max_seq_length,batch_size,vocab_size)
            input_sentences: batch of token indices of input sentences, shape = (batch_size,max_seq_length)
        Returns:
            reconstruction loss calculated using cross entropy loss function
        """

        loss = nn.CrossEntropyLoss(ignore_index=0)
        recon_loss = loss(
            output_logits.view(-1, mconfig.vocab_size), input_sentences.view(-1))

        return recon_loss
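
The reconstruction loss above relies on `ignore_index=0` to skip padding tokens. The sketch below mimics that behavior in plain Python so the masking is visible; `masked_cross_entropy` is a hypothetical helper, and it assumes token id 0 is padding, as the `ignore_index` argument suggests.

```python
import math


def masked_cross_entropy(logits, targets, ignore_index=0):
    """Mean negative log-likelihood over positions whose target != ignore_index.

    logits:  list of per-position score lists (one score per vocabulary entry)
    targets: list of token ids, same length as `logits`
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padding position: contributes nothing to the loss
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
        count += 1
    # This sketch returns 0.0 when every position is padding
    return total / count if count else 0.0


logits = [[0.0, 0.0, 0.0, 0.0],   # uniform over a 4-word vocabulary
          [5.0, 0.0, 0.0, 0.0],   # padding position, ignored
          [0.0, 0.0, 0.0, 0.0]]
targets = [1, 0, 2]
loss = masked_cross_entropy(logits, targets)
print(loss)  # log(4), since both counted positions are uniform
```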

***

<a name="paper_1_eval"></a>
### 1.2 Evaluation

#### 1.2.1 Style Transfer
- Style transfer is evaluated with a pretrained CNN sentiment classifier, following the word2vec-based sentence classifier of [(Kim 2014)](https://arxiv.org/pdf/1408.5882.pdf)

#### 1.2.2 Content Preservation
- _Cosine similarity_ between the source and generated sentence embeddings
- _Word overlap_ : unigram word overlap rate between the original and the style-transferred sentence
- _Language fluency_ : trigram Kneser-Ney smoothed language model; the score is a log-likelihood (hence negative), so score $\approx 0 \implies$ more fluent
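
The first two content metrics can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the overlap rate is the intersection over the union of the two unigram sets and that sentence embeddings are already available as plain vectors; `cosine_similarity` and `word_overlap` are hypothetical helpers, and any stop-word or sentiment-word filtering is not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


def word_overlap(source, transferred):
    """Unigram overlap rate: |A & B| / |A | B| over lowercased tokens."""
    a = set(source.lower().split())
    b = set(transferred.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(word_overlap("the food was great", "the food was terrible"))  # 3 shared of 5 distinct words
```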

***
***

<a name="paper_2"></a>
## 2. [Adversarially Regularized Autoencoders](https://arxiv.org/pdf/1706.04223.pdf)

- _Authors_ : Zhao et al. 2018
- _Source code_ : [ARAE](https://github.com/jakezhaojb/ARAE)

### 2.1 Evaluation

- _Source_ : [utils.py](https://github.com/jakezhaojb/ARAE/blob/master/lang/utils.py)

<a name="paper_3"></a>
## 3. [Style Transformer: Unpaired Text Style Transfer without Disentangled Latent Representation](https://arxiv.org/pdf/1905.05621.pdf)

- _Authors_ : Dai et al. 2019
- _Source code_ : [fastnlp](https://github.com/fastnlp/style-transformer)

<a name="paper_3_eval"></a>

### 3.1 Evaluation

- _Source_ : [evaluator.py](https://github.com/fastnlp/style-transformer/blob/master/evaluator/evaluator.py) -- the functions below are methods of the `Evaluator` class

In [2]:
# Requirements

from nltk.tokenize import word_tokenize
from nltk.translate.bleu_score import sentence_bleu

import fasttext
import pkg_resources
import kenlm
import math

def __init__(self):
        resource_package = __name__

        yelp_acc_path = 'acc_yelp.bin'
        yelp_ppl_path = 'ppl_yelp.binary'
        yelp_ref0_path = 'yelp.refs.0'
        yelp_ref1_path = 'yelp.refs.1'

        
        yelp_acc_file = pkg_resources.resource_stream(resource_package, yelp_acc_path)
        yelp_ppl_file = pkg_resources.resource_stream(resource_package, yelp_ppl_path)
        yelp_ref0_file = pkg_resources.resource_stream(resource_package, yelp_ref0_path)
        yelp_ref1_file = pkg_resources.resource_stream(resource_package, yelp_ref1_path)

        
        self.yelp_ref = []
        with open(yelp_ref0_file.name, 'r') as fin:
            self.yelp_ref.append(fin.readlines())
        with open(yelp_ref1_file.name, 'r') as fin:
            self.yelp_ref.append(fin.readlines())
        self.classifier_yelp = fasttext.load_model(yelp_acc_file.name)
        self.yelp_ppl_model = kenlm.Model(yelp_ppl_file.name)


<a name="paper_3_eval_style"></a>
#### 3.1.1 Style Evaluation
- Evaluate based on two sentiment classifiers trained with [_fastText_](https://github.com/facebookresearch/fastText) (Facebook's text classification library)

In [11]:
# Style check from fastText model

def yelp_style_check(self, text_transfered, style_origin):
        text_transfered = ' '.join(word_tokenize(text_transfered.lower().strip()))
        if text_transfered == '':
            return False
        label = self.classifier_yelp.predict([text_transfered])
        style_transfered = label[0][0] == '__label__positive'
        return (style_transfered != style_origin)
    
# Checking the accuracy for different styles
    
def yelp_acc_b(self, texts, styles_origin):
        assert len(texts) == len(styles_origin), 'Size of inputs does not match!'
        count = 0
        for text, style in zip(texts, styles_origin):
            if self.yelp_style_check(text, style):
                count += 1
        return count / len(texts)

def yelp_acc_0(self, texts):
        styles_origin = [0] * len(texts)
        return self.yelp_acc_b(texts, styles_origin)

def yelp_acc_1(self, texts):
        styles_origin = [1] * len(texts)
        return self.yelp_acc_b(texts, styles_origin)

<a name="paper_3_eval_content"></a>

#### 3.1.2 Content Evaluation

- Calculate BLEU score between transferred sentence and input using NLTK
- Higher BLEU $\implies$ kept more words from source -- maybe not the best eval for our model since content $\neq$ literal words?

In [12]:
# BLEU (scaled to 0-100) of one transferred sentence against its reference sentence(s), via NLTK
def nltk_bleu(self, texts_origin, text_transfered):
        texts_origin = [word_tokenize(text_origin.lower().strip()) for text_origin in texts_origin]
        text_transfered = word_tokenize(text_transfered.lower().strip())
        return sentence_bleu(texts_origin, text_transfered) * 100

# Average BLEU between each original sentence and its transferred version (self-BLEU)
def self_bleu_b(self, texts_origin, texts_transfered):
        assert len(texts_origin) == len(texts_transfered), 'Size of inputs does not match!'
        total = 0  # renamed from `sum` to avoid shadowing the builtin
        n = len(texts_origin)
        for x, y in zip(texts_origin, texts_transfered):
            total += self.nltk_bleu([x], y)
        return total / n
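
To make the concern about literal word matching concrete, the sketch below computes clipped n-gram precision, the building block of BLEU, in plain Python. It is not NLTK's `sentence_bleu` (no brevity penalty, no geometric mean over n-gram orders); `ngrams` and `modified_precision` are illustrative helpers and the sentences are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of `hypothesis` against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    return clipped / total


source = "the food was great".split()
literal = "the food was terrible".split()   # keeps most source words
paraphrase = "i hated this meal".split()    # same topic, no shared words

print(modified_precision(source, literal, 1))     # 3 of 4 unigrams match
print(modified_precision(source, paraphrase, 1))  # no unigram matches
```

Both outputs flip the sentiment, but only the one that reuses the source wording scores well, which is the limitation noted above.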

<a name="paper_3_eval_fluency"></a>

#### 3.1.3 Fluency Evaluation

- Measures perplexity of the transferred sentences under 5-gram language models trained with KenLM on the two datasets; lower perplexity $\implies$ more fluent

In [None]:
# Corpus-level perplexity of the transferred sentences under the KenLM model

def yelp_ppl(self, texts_transfered):
        texts_transfered = [' '.join(word_tokenize(itm.lower().strip())) for itm in texts_transfered]
        total_score = 0  # sum of log10 sentence probabilities (renamed from `sum`)
        length = 0       # total number of tokens across all sentences
        for line in texts_transfered:
            length += len(line.split())
            total_score += self.yelp_ppl_model.score(line)
        return math.pow(10, -total_score / length)
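
The final line of `yelp_ppl` turns summed base-10 log probabilities into a perplexity. The sketch below reproduces just that arithmetic with made-up scores, so it runs without KenLM; `perplexity` is a hypothetical helper, and the inputs stand in for the values that `score` and the token counts would supply.

```python
def perplexity(log10_scores, token_counts):
    """Corpus perplexity: 10 ** (-(sum of log10 probabilities) / (total tokens))."""
    total_log10 = sum(log10_scores)
    total_tokens = sum(token_counts)
    return 10 ** (-total_log10 / total_tokens)


# Made-up example: two sentences with log10 probabilities -4.0 and -8.0,
# containing 2 and 4 tokens respectively
ppl = perplexity([-4.0, -8.0], [2, 4])
print(ppl)  # 10 ** (12 / 6) = 100.0
```

A lower value means the language model finds the sentences more predictable, which is read here as higher fluency.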