## Introduction 

Here we are trying to adjust parameters of a paraphrase model to generate adversarial examples. 
### Policy gradients 
The key parameter update equation is $\theta_{t+1} = \theta_t + \alpha \nabla_\theta J(\theta)$, where $\alpha$ is a step size parameter, $\theta$ is the parameter vector of a model (here a paraphrase model), and $J$ is the objective function. Because the update adds the gradient, this is gradient ascent: it moves $\theta$ in the direction that increases $J$. The meaning of the time step $t$ depends on the problem specification and we will get to it later. 

In my review I defined the objective function as $J(\theta) = E_\pi[r(\tau)]$. Here: 
* $\pi$ is the policy, a probability distribution for the next action in a given state; essentially $p(a_t|s_t)$
* $\tau$ is a trajectory, a specific sequence $s_0, a_0, r_1, s_1, a_1, \ldots$ of the agent in the game. This starts at time $t=0$ and finishes at time $t=T$. 
* $r(\tau)$ is the sum of rewards for a trajectory $\tau$, or in other words, the total reward for the trajectory. 

Higher values of $J$ are better, so it behaves as a reward-style objective rather than a loss. Since most optimisers minimise, we will need to negate it at some point and minimise $-J$. 

To update the parameters we need the gradient $\nabla_\theta J(\theta)$, which measures how $J(\theta)$ changes as we adjust the parameters of the paraphrase model. Applying the log-derivative trick to the expectation gives the policy gradient theorem $$ \nabla_\theta J(\theta) =  \nabla_\theta E_\pi [r(\tau)]  = E_\pi \left[r(\tau) \sum_{t=1}^T \nabla_\theta \log \pi (a_t|s_t)  \right] $$ 
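
The step from the gradient of an expectation to an expectation of gradients can be sketched as follows. Here $p(\tau;\theta)$ denotes the probability of a trajectory under the policy; this symbol is introduced only for this derivation. It uses the identity $\nabla_\theta p = p \, \nabla_\theta \log p$.

$$
\begin{aligned}
\nabla_\theta E_\pi[r(\tau)] &= \nabla_\theta \sum_\tau p(\tau;\theta)\, r(\tau) \\
&= \sum_\tau p(\tau;\theta)\, \nabla_\theta \log p(\tau;\theta)\, r(\tau) \\
&= E_\pi\left[ r(\tau)\, \nabla_\theta \log p(\tau;\theta) \right]
\end{aligned}
$$

The log-probability of a trajectory splits into environment terms, which do not depend on $\theta$, plus $\sum_{t=1}^T \log \pi(a_t|s_t)$. Only the policy terms survive the gradient, which gives the form of the theorem stated above.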

Evaluating this exactly requires the expectation, which means summing over every possible trajectory $\tau$, weighted by its probability. This is generally intractable, so instead we turn to estimators built from sampled trajectories.  
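
To make the idea of an estimator concrete, here is a minimal sketch on a toy problem with one time step and three actions, where the exact gradient is cheap enough to compute for comparison. The softmax policy, the reward values and all function names are assumptions made for illustration; nothing here comes from the paraphrase model. The estimate is averaged over samples (a $1/S$ factor) so that it is directly comparable with the exact gradient.

```python
import math
import random

def softmax(logits):
    """Turn logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def exact_gradient(logits, rewards):
    """Exact gradient of J = sum_a pi(a) r(a) with respect to the logits."""
    pi = softmax(logits)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]

def score_function_estimate(logits, rewards, n_samples, rng):
    """Monte Carlo estimate: average of r(a) * grad log pi(a) over sampled actions."""
    pi = softmax(logits)
    actions = range(len(logits))
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        a = rng.choices(actions, weights=pi)[0]
        for k in actions:
            # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k)
            score = (1.0 if k == a else 0.0) - pi[k]
            grad[k] += rewards[a] * score / n_samples
    return grad

logits = [0.0, 0.5, -0.5]      # toy policy parameters (assumed)
rewards = [1.0, 0.0, -1.0]     # toy reward per action (assumed)
rng = random.Random(0)
exact = exact_gradient(logits, rewards)
estimate = score_function_estimate(logits, rewards, 20000, rng)
print("exact:   ", [round(g, 3) for g in exact])
print("estimate:", [round(g, 3) for g in estimate])
```

With enough samples the two vectors should agree closely; with few samples the estimate is noisy, which is the usual weakness of this family of estimators.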

One of these is REINFORCE. It gives us  $$ \nabla_\theta J(\theta) \approx \sum_{s=1}^S \sum_{t=1}^T G_t \nabla \log \pi(a_t|s_t)$$ where 
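* $G_t$ is the discounted return, given by $G_t = r_t + \beta r_{t+1} + \beta^2 r_{t+2} + \dots$ with discount factor $0 \le \beta \le 1$. It stands in for $r(\tau)$ but counts only the rewards from step $t$ onwards, since an action cannot affect rewards that were already received. Rewards further in the future are weighted less than immediate ones. Computing $G_t$ needs the rewards up to the end of the episode, so the update can only be made once the episode has finished. 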
* $S$ is some number of samples.
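
A minimal sketch of computing discounted returns, using the standard convention that $G_t$ sums the rewards from step $t$ onwards with discount factor $\beta$. The function name and the example rewards are made up for illustration.

```python
def discounted_returns(rewards, beta):
    """Return [G_1, ..., G_T] where G_t = r_t + beta * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + beta * running
        returns[t] = running
    return returns

print(discounted_returns([1.0, 0.0, 2.0], beta=0.5))
print(discounted_returns([3.0], beta=0.9))  # single-step episode: G_1 = r
```

In a single-step episode the return is just the reward, which is the case used in the document-level formulation.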

The implementation of REINFORCE and similar estimators depends on how we formulate the problem. Below we present some possible formulations.

### Interpretation One: Document-level  
This is the first implementation we will try. 

Here we generate a list of paraphrases at each time point. The idea is that there is one paraphrase amongst them that is a good adversarial example. We try to tune the model to produce the best one. 

This interpretation sees forming the complete paraphrase as one time step. So it isn't token-level but document-level. 

* Starting state: $s_0 = x$, the original example  
* Actions: each action is "choosing" a paraphrase (or of choosing $n$ paraphrases). The set of all possible paraphrases and their probabilities is the policy. So $\pi(a|s) = p(x'| x;\theta)$ where $x'$ is the paraphrase (or list of paraphrases). 
    * To approximate this probability, we can generate a large list of paraphrases and, for each one, multiply together the probabilities of generating each of its tokens in turn. This gives a rough "probability" of how likely that sequence was, which acts like a weight for how good the paraphrase is according to the model. We can then normalise the weights over the list to get a "probability" for each paraphrase. The result depends on the number of paraphrases generated, so generating a large list is likely to be better for this task. 
* Reward: The paraphrase is passed through the reward function $R(x, x')$ to get the reward $r$. 
* Time steps: We only have one time step in the game ($T=1$ and $G_t=r$)  
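
The approximation of the policy over a finite list of paraphrases can be sketched as below. The per-token probabilities are invented for illustration and the function names are hypothetical; in practice the token probabilities would come from the paraphrase model's generation scores. Working in log space avoids underflow when many small probabilities are multiplied.

```python
import math

def sequence_logprob(token_probs):
    """Log-probability of a sequence: the sum of its token log-probabilities."""
    return sum(math.log(p) for p in token_probs)

def normalise_over_candidates(seq_logprobs):
    """Softmax over sequence log-probabilities (max subtracted for stability)."""
    m = max(seq_logprobs)
    weights = [math.exp(lp - m) for lp in seq_logprobs]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up per-token probabilities for three candidate paraphrases.
candidates = {
    "I enjoyed this film": [0.9, 0.5, 0.8, 0.7],
    "This film was enjoyable": [0.4, 0.6, 0.9, 0.5],
    "I liked it": [0.9, 0.3, 0.6],
}
seq_logprobs = [sequence_logprob(p) for p in candidates.values()]
policy = normalise_over_candidates(seq_logprobs)
for text, lp, prob in zip(candidates, seq_logprobs, policy):
    print(f"{text!r}: log p = {lp:.3f}, normalised p = {prob:.3f}")
```

Note that longer sequences tend to get lower raw probabilities simply because more factors are multiplied, so some form of length normalisation may be worth considering.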


There are a few variations on this scenario. For each of these we will formulate the policy and the reward function $R$. Below, $x'$ means paraphrase, $f(x)_y$ means the victim model's confidence in the true label $y$ for input $x$, $SS(a,b)$ is the output of a semantic similarity model run over $a$ and $b$, and $\lambda$ is a hyperparameter.  


#### One-paraphrase 
Here we only generate one paraphrase. This scenario also has a few options. First we generate a list of paraphrases with the probabilities of selecting one. Then we either sample probabilistically from the list or pick the most probable option. 

In this case the policy $p(x'|x,\theta)$ is the chance of obtaining a specific paraphrase. For the sampling option this is equal to its sample probability. For the top option this is just the probability of selecting that option. 

The reward function might look like $R(x,x') = f(x)_y - f(x')_y + \lambda SS(x, x')$. We could also make the $SS$ factor a step-function above some threshold. 
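
A minimal sketch of this reward, including the step-function variant. The inputs are plain numbers standing in for the victim model confidences and the semantic similarity score; the function name, the default $\lambda$ and the threshold are assumptions for illustration only.

```python
def reward_one_pp(conf_orig, conf_pp, similarity, lam=0.5, sim_threshold=None):
    """R(x, x') = f(x)_y - f(x')_y + lam * SS(x, x').

    If sim_threshold is given, the similarity term becomes a step function:
    the paraphrase earns the full bonus lam only when it is similar enough.
    """
    confidence_drop = conf_orig - conf_pp
    if sim_threshold is None:
        return confidence_drop + lam * similarity
    bonus = lam if similarity >= sim_threshold else 0.0
    return confidence_drop + bonus

# Victim confidence in the true label falls from 0.9 to 0.4; similarity is 0.8.
print(reward_one_pp(0.9, 0.4, 0.8))
print(reward_one_pp(0.9, 0.4, 0.8, sim_threshold=0.9))
```

The step-function version stops the generator from trading ever more similarity for reward once the paraphrase is already close enough to the original.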

The REINFORCE equation $$ \nabla_\theta J(\theta) \approx \sum_{s=1}^S \sum_{t=1}^T G_t \nabla \log \pi(a_t|s_t)$$ becomes $$ \nabla_\theta J(\theta) \approx \sum_{s=1}^S  R(x,x'_s) \nabla \log p(x'_s|x,\theta)$$ We repeat the process $S$ times where $S$ is ideally as large as possible. We can start with something simple (e.g. $S=10$ or $S=100$) and go from there.  

The gradient term $\nabla \log p(x'_s|x,\theta)$ can hopefully be found with autodiff. 

#### Set of paraphrases
In this scenario the paraphrase model is evaluated on performance over a set of paraphrases, which we call $X'$ here. The policy becomes $p(X'|x, \theta)$, the probability of obtaining that list. We can get this probability by multiplying together the "probability" of each individual paraphrase, and also multiplying by nCr (for r paraphrases out of n total) to account for the lack of order in the list. 

We can make a number of sub-scenarios here. 

For the **top-paraphrase in set** condition the paraphrase generator is only measured on the best reward for a paraphrase in its set. The idea is the generator will learn to produce a diverse set of examples, any of which could plausibly be a good adversarial example. Here we only look at the best-performing paraphrase $x'_m$, where $m = \arg\max_i [f(x)_y - f(x'_i)_y]$, then return $R(x,x'_m) = [f(x)_y - f(x'_m)_y] + \lambda SS(x,x'_m)$ 

For the **average-paraphrase in set** condition the paraphrase generator is measured on the average reward of the paraphrases in its set. This encourages the generator to consider performance of all examples more-or-less equally. The reward function could be something like $\frac{1}{k} \sum_{i=1}^k \left[ f(x)_y - f(x'_i)_y + \lambda SS(x, x'_i) \right]$ 

A combination of these scenarios is the **top-k/top-p\% paraphrases in set**. Here we only use the top-$k$ paraphrases, or more generally, the top $p$ percentage of paraphrases. 
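
The three set-level rewards can be sketched as follows. Each paraphrase is represented by two made-up numbers: the drop in victim confidence $f(x)_y - f(x'_i)_y$ and the similarity $SS(x, x'_i)$. The function names are hypothetical, and for the top-$k$ case the ranking criterion (individual reward) and the aggregation (mean) are assumptions, since the text leaves them open.

```python
def individual_rewards(drops, sims, lam):
    """Per-paraphrase reward: confidence drop plus weighted similarity."""
    return [d + lam * s for d, s in zip(drops, sims)]

def top_reward(drops, sims, lam):
    """Pick the paraphrase with the largest confidence drop, then score it."""
    m = max(range(len(drops)), key=lambda i: drops[i])
    return drops[m] + lam * sims[m]

def average_reward(drops, sims, lam):
    """Mean reward over every paraphrase in the set."""
    rewards = individual_rewards(drops, sims, lam)
    return sum(rewards) / len(rewards)

def top_k_reward(drops, sims, lam, k):
    """Mean reward over the k best paraphrases (ranking by reward is assumed)."""
    best = sorted(individual_rewards(drops, sims, lam), reverse=True)[:k]
    return sum(best) / len(best)

drops = [0.1, 0.4, -0.2]   # made-up confidence drops
sims = [0.9, 0.5, 0.8]     # made-up similarity scores
print(top_reward(drops, sims, lam=0.5))
print(average_reward(drops, sims, lam=0.5))
print(top_k_reward(drops, sims, lam=0.5, k=2))
```

Setting $k=1$ ranks by full reward rather than by confidence drop alone, so it is close to, but not always identical to, the top-paraphrase condition.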


### Interpretation Two: Token-level
This interpretation is at token-level; it sees choosing the next word as the next time step. 

* Starting state: $s_0 = x$, the initial state. But you also have a "blank slate" for the paraphrase. So maybe it's a tuple $(x, pp)$ where $pp$ is a paraphrase with no words. Here $x$ is used as the reference for the paraphrase generator.  
* Actions: Choose the next word of $pp$. I guess this starts with the \<START\> token (or something similar). Then you have the policy $\pi(a|s)$ which is the same as $p(w_{next}|pp, x; \theta)$ where $\theta$ is the paraphrase model parameters, $pp$ is the sentence constructed so far, and $w_{next}$ is the next token (I say token because I don't know if this model works on a subword or word basis). 
* Time steps: every token is generated one-by-one and each of these is allocated a time step. This probably means that you also update the parameters after each generated token. 
* Reward. The reward is allocated every token. There are many reward functions (see papers on token-level loss functions). Some also incorporate document-level rewards too. 
* Next state. $s_1$ is again the tuple $(x, pp)$ but now $pp$ has the first word in it. 
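
The token-level formulation can be sketched as a rollout in which each state is the tuple $(x, pp)$ and each action appends one token. The function `next_token_probs` is a hypothetical stand-in for $p(w_{next}|pp, x; \theta)$ with hard-coded toy probabilities; a real version would query the paraphrase model.

```python
import math
import random

def next_token_probs(x, pp):
    """Hypothetical policy: distribution over the next token given (x, pp)."""
    steps = [
        {"I": 0.7, "This": 0.3},
        {"enjoyed": 0.6, "liked": 0.4},
        {"it": 0.5, "<END>": 0.5},
    ]
    if len(pp) < len(steps):
        return steps[len(pp)]
    return {"<END>": 1.0}

def rollout(x, rng, max_len=10):
    """Sample one paraphrase token by token, recording states and log-probability."""
    pp, logp, states = [], 0.0, []
    for _ in range(max_len):
        states.append((x, tuple(pp)))           # state s_t = (x, pp so far)
        probs = next_token_probs(x, pp)
        tokens = list(probs)
        token = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        logp += math.log(probs[token])          # accumulate log pi(a_t | s_t)
        if token == "<END>":
            break
        pp.append(token)
    return pp, logp, states

pp, logp, states = rollout("I like this movie", random.Random(0))
print("paraphrase:", " ".join(pp))
print("log-probability:", round(logp, 3))
print("number of time steps:", len(states))
```

The accumulated log-probability is the same quantity the document-level formulation uses for $\log p(x'|x,\theta)$; the difference lies in when rewards and updates are applied.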

On *teacher forcing*. This is when you have a ground-truth paraphrase and you can use it when generating tokens. This is useful because if the model makes a mistake it doesn't continue down that track but is adjusted back. This stops big divergences (but also might limit the diversity of generated paraphrases). This is used when training a paraphrase model. You have a set of reference paraphrases that are human provided. Here though we only have the original sentence and no references. We could generate adversarial examples and use that to do teacher forcing. Generating them using textattack recipes might work. This is only really used on the token-level rewards. 

### Updating the paraphrase model parameters

There is a choice here: we can either update the parameters of the paraphrase model directly, or freeze them and add a new dense layer to the end of the model. We could then train this dense layer to convert paraphrases to adversarial paraphrases. 

Before trying this out, I am worried that we will destroy the capabilities of the paraphrase generator a bit. We might get semantically invalid or ungrammatical or gibberish text. If so we could try and mitigate it a bit by shaping our reward function to maintain grammatical components. 

### Experiment order

Plan is to try the following order: 

1. One-paraphrase (most probable option). I'll start with this one because it is probably the simplest case. Within this category: 
    1a. tune existing parameters only (see if the text is recognisable) 
    1b. add dense layer onto end and try again 
2. One-paraphrase (sampled). This seems like a logical extension on the first one. 
3. Paraphrase-set options. (Decide after finishing 1, 2) 
4. Token-level tuning. (Decide after 1,2,3)


### Layer Freezing

I am unsure whether to do this. 

* This [paper](https://arxiv.org/abs/1911.03090) indicates that you can get pretty good results by freezing all layers except the last few 
* Conversely I saw in the transformers documentation that transformers train better if you don't do layer freezing 


## Setup, load models + datasets 

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
# Core imports 
import torch, numpy as np, pandas as pd, os, gc, logging
from torch.utils.data import DataLoader
from datasets import load_dataset, load_metric, load_from_disk
from transformers import (AutoModelForSeq2SeqLM, AutoModelForSequenceClassification, 
                          AutoTokenizer, AdamW, SchedulerType, get_scheduler)
from collections import defaultdict
from types import MethodType
import utils; from utils import *   # local script 
from tqdm.auto import tqdm

# Dev imports (not needed for final script)
import seaborn as sns
from IPython.display import Markdown
from pprint import pprint
from IPython.core.debugger import set_trace
from GPUtil import showUtilization
import torchsnooper

# Paths
path_cache = './cache/'
path_results = "./results/"

# Seeds
seed = 420
torch.manual_seed(seed)
np.random.seed(seed)
torch.cuda.manual_seed(seed)

# Devices and GPU settings
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 
devicenum = torch.cuda.current_device() if device.type == 'cuda' else -1
n_wkrs = 4 * torch.cuda.device_count()
batch_size_dl = 64

# Config
pd.set_option("display.max_colwidth", 400)

# Logging 
logging.basicConfig(format='%(message)s')
logger = logging.getLogger("main_logger")
logger.setLevel(logging.INFO)


### Parameters and training settings
# Paraphrase parameters  
num_beams = 1
num_return_sequences=1
num_beam_groups = 1
diversity_penalty = 0.  # must be a float
temperature = 1.5
length_penalty = 1
min_length = 5

# REINFORCE parameters 
S = 1

# Model training parameters
batch_size = 1
lr = 1e-3 # Initial learning rate (after the potential warmup period) to use
weight_decay = 0
n_train_epochs = 20
#lr_scheduler_type = 'none'
n_warmup_steps = 30 
plot_grads = False

### Load models

In [3]:
## Paraphrase (pp) model 
pp_name = "tuner007/pegasus_paraphrase"
pp_tokenizer = AutoTokenizer.from_pretrained(pp_name)
# takes about 3GB memory space up on the GPU
pp_model = AutoModelForSeq2SeqLM.from_pretrained(pp_name).to(device)
# generate() normally runs without gradients, so attach a version that keeps them:
pp_model.generate_with_grad = MethodType(utils.generate_with_grad, pp_model)

## Victim Model (VM)
vm_name = "textattack/distilbert-base-uncased-rotten-tomatoes"
vm_tokenizer = AutoTokenizer.from_pretrained(vm_name)
vm_model = AutoModelForSequenceClassification.from_pretrained(vm_name).to(device)
vm_idx2lbl = vm_model.config.id2label
vm_lbl2idx = vm_model.config.label2id
vm_num_labels = vm_model.num_labels

### Load raw datasets and create dataloaders

In [4]:
dataset = load_dataset("rotten_tomatoes")
train,valid,test = dataset['train'],dataset['validation'],dataset['test']
label_cname = 'label'
## For snli
# remove_minus1_labels = lambda x: x[label_cname] != -1
# train = train.filter(remove_minus1_labels)
# valid = valid.filter(remove_minus1_labels)
# test = test.filter(remove_minus1_labels)

# make sure that all datasets have the same number of labels as what the victim model predicts
assert train.features[label_cname].num_classes == vm_num_labels
assert valid.features[label_cname].num_classes == vm_num_labels
assert test.features[ label_cname].num_classes == vm_num_labels

train_dl = DataLoader(train, batch_size=batch_size_dl, shuffle=True, num_workers=n_wkrs)
valid_dl = DataLoader(valid, batch_size=batch_size_dl, shuffle=True, num_workers=n_wkrs)
test_dl = DataLoader( test,  batch_size=batch_size_dl, shuffle=True, num_workers=n_wkrs)

Using custom data configuration default
Reusing dataset rotten_tomatoes_movie_review (/data/tproth/.cache/huggingface/datasets/rotten_tomatoes_movie_review/default/1.0.0/9c411f7ecd9f3045389de0d9ce984061a1056507703d2e3183b1ac1a90816e4d)


In [5]:
# For testing, we'll just use a simple dataset
simple_dataset = load_dataset('csv',data_files="simple_dataset.csv")['train']
simple_dataset_test = load_dataset('csv',data_files="simple_dataset_test.csv")['train']
simple_dl = DataLoader(simple_dataset, batch_size=1, 
                            shuffle=False, num_workers=n_wkrs)
simple_dl_test = DataLoader(simple_dataset_test, batch_size=1, 
                            shuffle=False, num_workers=n_wkrs)
dl = simple_dl

Using custom data configuration default-a2b91d51da8a7742
Reusing dataset csv (/data/tproth/.cache/huggingface/datasets/csv/default-a2b91d51da8a7742/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)
Using custom data configuration default-73f997ff87f1fac5
Reusing dataset csv (/data/tproth/.cache/huggingface/datasets/csv/default-73f997ff87f1fac5/0.0.0/2dc6629a9ff6b5697d82c25b73731dd440507a69cbce8b425db50b751e8fcfd0)


## Training

### Description 

Training loop pseudocode

The REINFORCE estimator is $$ \nabla_\theta J(\theta) \approx \sum_{s=1}^S  R(x,x'_s) \nabla \log p(x'_s|x,\theta)$$

**Non-batched version (one example), stochastic gradient descent**  
Inputs: train, n_pp=1, vm, ppm, $\alpha = 5 \times 10^{-5}$ (saw this rate for $\alpha$ somewhere)  
Set eval_mode=true for vm, eval_mode = false for ppm  
Freeze all layers of ppm except last 6  
Shuffle training dataset  

Loop: take one row $x$ from train
* tokenize
* do greedy search to get paraphrase pp
* get reward using `reward_fn_onepp(x, pp)`. $r=R(x,x'_s) = f(x)_y - f(x'_s)_y + \lambda SS(x, x'_s)$ 
* update model parameters 


Alternative loop body that samples $S$ paraphrases:

* generate large UNIVERSE list of paraphrases `pp_l` (e.g. 128) from 'text' column using ppm
* extract sequence scores from this list to get a vector of probabilities `pp_probs`
* take `log` of `pp_probs` and store in `pp_logprobs`
* pick S paraphrases from `pp_l` to get `pp_s`
* for each `pp` (i.e. $x'_s$) in `pp_s`:
    * get its reward $R(x, x'_s)$ using `reward_fn_onepp(x, pp)`
    * take the corresponding entry of `pp_logprobs` and multiply it by the reward
* Sum these reward-weighted log probabilities over the $S$ paraphrases. The gradient of this sum with respect to $\theta$, obtained by backpropagation, is `nablaJ`
* Update parameters of paraphrase model with $\theta_{t+1} = \theta_t + \alpha \nabla_\theta J(\theta)$

$$ J(\theta) \approx \sum_{s=1}^S  R(x,x'_s) \log p(x'_s|x,\theta)$$
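
To connect this surrogate objective to an optimiser that minimises, one option is to define a loss as its negative. The symbol $L(\theta)$ is introduced here only for illustration, and the reward is treated as a constant (no gradient flows through $R$):

$$
\begin{aligned}
L(\theta) &= -\sum_{s=1}^S R(x,x'_s) \log p(x'_s|x,\theta) \\
\nabla_\theta L(\theta) &= -\sum_{s=1}^S R(x,x'_s) \nabla_\theta \log p(x'_s|x,\theta) \approx -\nabla_\theta J(\theta)
\end{aligned}
$$

A descent step $\theta_{t+1} = \theta_t - \alpha \nabla_\theta L(\theta)$ is then the same as the ascent step $\theta_{t+1} = \theta_t + \alpha \nabla_\theta J(\theta)$. With $S=1$ this reduces to a loss of the form $-R \log p$.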

### Preprocessing and setup 

#### Define functions 

In [6]:
def get_paraphrases(text, num_return_sequences, num_beams, 
                     num_beam_groups=1, diversity_penalty=0, 
                    temperature=1.5, min_length=5, length_penalty=1):
    """Wrapper for generating paraphrases (pp's). Most keywords are passed on to pp_model.generate function, 
    so see docs for that function. """
    batch = pp_tokenizer(text, truncation=True, padding='longest', return_tensors="pt").to(device)
    # Only greedy search supported at the moment
    generated = pp_model.generate_with_grad(**batch, 
                                         num_beams=num_beams,
                                         num_return_sequences=num_return_sequences, 
                                         do_sample=False, 
                                         temperature=temperature, 
                                         num_beam_groups=num_beam_groups,
                                         diversity_penalty=diversity_penalty,
                                         length_penalty=length_penalty,
                                         min_length=min_length,
                                         return_dict_in_generate=True,
                                         output_scores=True)
    tgt_text = pp_tokenizer.batch_decode(generated.sequences, skip_special_tokens=True)
    return generated, tgt_text

In [7]:
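def print_info_on_generated_text(text, translated):
    """
        Prints a bunch of statistics around the generated text. Useful for debugging purposes.
        So far only works for greedy search.
        `text` is the original input; `translated` is the generation output
        (the first value returned by `get_paraphrases`).
    """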
    logger.info("\n######################################################################\n")
    logger.info(f"Original text: {text}")
    tgt_text = pp_tokenizer.batch_decode(translated.sequences, skip_special_tokens=True)
    tgt_text_with_tokens = pp_tokenizer.batch_decode(translated.sequences, skip_special_tokens=False)
    logger.info(f"Generated text: {tgt_text}")
    logger.info(f"Generated text with special tokens: {tgt_text_with_tokens}")
    logger.info(f"Shape of translated.sequences:{translated.sequences.shape}")
    logger.info(f"translated.sequences:{translated.sequences}")
    logger.info(f"Scores is a tuple of length {len(translated.scores)} \
    and each score is a tensor of shape {translated.scores[0].shape}")
    scores_stacked = torch.stack(translated.scores, 1)
    logger.info(f"Stacking the scores into a tensor of shape {scores_stacked.shape}")
    scores_softmax = torch.softmax(scores_stacked, 2)
    logger.info(f"Now taking softmax. This shouldn't change the shape, but just to check,\
    its shape is {scores_softmax.shape}")
    probsums = scores_softmax.sum(axis=2)
    logger.info(f"These are probabilities now and so they should all sum to 1 (or close to it) in the axis \
    corresponding to each time step. We can check the sums here: {probsums}, but it's a long tensor \
    of shape {probsums.shape} and hard to see, so summing over all these values and removing 1 \
    from each gives {torch.sum(probsums - 1)} \
    which should be close to 0.")
    seq_without_first_tkn = translated.sequences[:, 1:]
    logger.info("Now calculating sequence probabilities")
    seq_token_probs = torch.gather(scores_softmax,2,seq_without_first_tkn[:,:,None]).squeeze(-1)
    seq_prob = seq_token_probs.prod(-1).item()
    logger.info(f"Sequence probability: {seq_prob}")

    # Get the 2nd and 3rd most likely tokens at each step
    topk_ids = torch.topk(scores_softmax,3,dim=2).indices[:,:,1:]
    topk_tokens_probs = torch.gather(scores_softmax,2,topk_ids).squeeze(-1)
    toks2 = pp_tokenizer.convert_ids_to_tokens(topk_ids[:,:,0].squeeze())
    toks3 = pp_tokenizer.convert_ids_to_tokens(topk_ids[:,:,1].squeeze())
    tok_probs2 = topk_tokens_probs[:,:,0].squeeze()
    tok_probs3 = topk_tokens_probs[:,:,1].squeeze()

    logger.info(f"Probabilities of getting the top 3 tokens at each step:")
    tokens = pp_tokenizer.convert_ids_to_tokens(seq_without_first_tkn.squeeze())
    for (p, t, p2,t2,p3,t3)  in zip(seq_token_probs.squeeze(), tokens, tok_probs2, toks2, tok_probs3, toks3): 
        logger.info(f"{t}: {round(p.item(),3)}  {t2}: {round(p2.item(),3)}  {t3}: {round(p3.item(),3)}") 

In [8]:
def get_pp_logp(translated): 
    """log(p(pp|orig)) basically.
    works for greedy search, will need tweaking for other types probably"""
    scores_stacked = torch.stack(translated.scores, 1)
    scores_log_softmax = torch.log_softmax(scores_stacked, 2)
    seq_without_first_tkn = translated.sequences[:, 1:]
    seq_token_log_probs = torch.gather(scores_log_softmax,2,seq_without_first_tkn[:,:,None]).squeeze(-1)
    seq_log_prob = seq_token_log_probs.sum(-1)
    return seq_log_prob

In [9]:
def get_vm_probs(text): 
    if vm_model.training: vm_model.eval()
    tkns = vm_tokenizer(text, truncation=True, padding='longest', return_tensors="pt").to(device)
    logits = vm_model(**tkns).logits
    probs = torch.softmax(logits,1)
    return probs

In [10]:
def reward_fn_onepp(orig, pp, truelabel, return_probs=False): 
    """Only works for batch size of 1 so far. """
    # Victim model probabilities 
    orig_probs,pp_probs = get_vm_probs(orig),get_vm_probs(pp)
    orig_truelabel_prob = orig_probs[0][truelabel].item()
    pp_truelabel_prob   = pp_probs[0][  truelabel].item()
    truelabel_prob_diff = orig_truelabel_prob - pp_truelabel_prob
    
    # ROUGE score 
    rouge_score = rouge_metric.compute(rouge_types=["rougeL"],
        predictions=pp, references=orig)['rougeL'].mid.fmeasure    
    
    # Reward calculation 
    if    rouge_score < 0.15: reward = -9999
    else:                     reward = truelabel_prob_diff * rouge_score
    
    print(orig)
    print(pp)
    print("VM score: ", truelabel_prob_diff)
    print("ROUGE score:", rouge_score)
    print("Reward:", reward)
    
    if return_probs: return orig_probs,pp_probs,reward
    else:            return reward
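
The ROUGE-L term in the reward above can be illustrated with a simplified pure-Python version. This is a sketch, not the library's implementation: it assumes lowercasing, alphanumeric tokenisation and no stemming, and computes the F-measure from the longest common subsequence (LCS) of tokens. All function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f(reference, prediction):
    """ROUGE-L F-measure between a reference and a prediction."""
    ref, pred = tokenize(reference), tokenize(prediction)
    if not ref or not pred:
        return 0.0
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f("I love this apple", "I am a fan of apples.")
print(round(score, 3), "passes threshold" if score >= 0.15 else "rejected")
```

For this pair only the token "i" is shared, giving precision 1/6 and recall 1/4, so the score is about 0.2 and the paraphrase stays above the 0.15 cut-off used in `reward_fn_onepp`.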

In [11]:
def training_step(data): 
    optimizer.zero_grad()
    label,text = data['label'].to(device),data["text"]
    generated, pp_text = get_paraphrases(text,
            num_return_sequences=num_return_sequences, num_beams=num_beams, 
            num_beam_groups=num_beam_groups, diversity_penalty=diversity_penalty,
            temperature=temperature, length_penalty=length_penalty, min_length=min_length)
    pp_logp = get_pp_logp(generated)
    reward = reward_fn_onepp(orig=text, pp=pp_text, truelabel=label)
    loss = -reward * pp_logp
    loss.backward()
    optimizer.step()
    #  lr_scheduler.step()
    return loss, reward, pp_logp

In [12]:
@torch.no_grad()  # evaluation only, so skip building the autograd graph
def get_vm_preds_for_dl(dl): 
    l = list()
    if pp_model.training: pp_model.eval()
    if vm_model.training: vm_model.eval()
    for i, data in enumerate(dl):
        label,text = data['label'].to(device),data["text"]
        generated, pp_text = get_paraphrases(text,
                num_return_sequences=num_return_sequences, num_beams=num_beams, 
                num_beam_groups=num_beam_groups, diversity_penalty=diversity_penalty,
                temperature=temperature, length_penalty=length_penalty, min_length=min_length)
        pp_logp = get_pp_logp(generated).item()

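        # NOTE: the third value returned here is the reward (prob diff scaled by ROUGE,
        #   or -9999 below the ROUGE threshold), not the raw prob diff
        orig_probs,pp_probs,truelabel_prob_diff = reward_fn_onepp(orig=text, 
            pp=pp_text, truelabel=label, return_probs = True)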
        orig_probs_truelabel = orig_probs[0].detach().cpu().numpy()[label]
        pp_probs_truelabel   = pp_probs[0].detach().cpu().numpy()[label]
        orig_preds,pp_preds = orig_probs.argmax(1).item(),pp_probs.argmax(1).item()
    
        d = {
            "orig": text[0],
            "pp": pp_text[0],   
            "pp_logp": pp_logp, 
            "pp_p": np.exp(pp_logp),
            "truelabel": label.item(),
            "orig_pred": orig_preds, 
            "pp_pred": pp_preds,
            "orig_probs_truelabel":orig_probs_truelabel,
            "pp_probs_truelabel": pp_probs_truelabel,
            "truelabel_prob_diff": truelabel_prob_diff
        }
        l.append(d)
        
        # writer.add_text()
        # writer.add_text()

        # writer.add_scalars("test_predictions",d)
    return l

In [13]:
def get_avg_prob_diff(test_set_preds):
    prob_diffs = [o['truelabel_prob_diff'] for o in test_set_preds]
    return np.mean(prob_diffs)

In [14]:
def plot_grad_flow(named_parameters):
    '''Plots the gradients flowing through different layers in the net during training.
    Can be used for checking for possible gradient vanishing / exploding problems.
    
    Usage: Plug this function in Trainer class after loss.backward() as 
    "plot_grad_flow(self.model.named_parameters())" to visualize the gradient flow'''
    from matplotlib.lines import Line2D
    import matplotlib.pyplot as plt 
    ave_grads = []
    max_grads= []
    layers = []
    for n, p in named_parameters:
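        # Skip parameters with no gradient yet; .item() gives plain floats for plotting
        if p.requires_grad and p.grad is not None and ("bias" not in n):
            layers.append(n)
            ave_grads.append(p.grad.abs().mean().item())
            max_grads.append(p.grad.abs().max().item())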
    plt.bar(np.arange(len(max_grads)), max_grads, alpha=0.1, lw=1, color="c")
    plt.bar(np.arange(len(max_grads)), ave_grads, alpha=0.1, lw=1, color="b")
    plt.hlines(0, 0, len(ave_grads)+1, lw=2, color="k" )
    plt.xticks(range(0,len(ave_grads), 1), layers, rotation="vertical")
    plt.xlim(left=0, right=len(ave_grads))
    plt.ylim(bottom = -0.001, top=0.02) # zoom in on the lower gradient regions
    plt.xlabel("Layers")
    plt.ylabel("Average Gradient")
    plt.title("Gradient Flow")
    plt.grid(True)
    plt.legend([Line2D([0], [0], color="c", lw=4),
                Line2D([0], [0], color="b", lw=4),
                Line2D([0], [0], color="k", lw=4)], ['max-gradient', 'mean-gradient', 'zero-gradient'])

#### Set up models and do layer freezing

In [15]:
### Setup
vm_model.eval()
pp_model.train()

## Layer freezing 
# Unfreeze last 2 layers of the base model decoder
# Not sure if decoder layer norm should be unfrozen or not, but it appears after the
#   other parameters in the module ordering, so let's include it for now
# Also unfreeze the linear head.  This isn't stored in the base model but rather tacked on top
#   and will be fine-tuned for summarisation. 
layer_list = ['decoder.layers.14', 'decoder.layers.15', 'decoder.layer_norm'] 
for i, (name,param) in enumerate(pp_model.base_model.named_parameters()): 
    if np.any([o in name for o in layer_list]):   param.requires_grad = True
    else:                                         param.requires_grad = False
for param in pp_model.lm_head.parameters():       param.requires_grad = True
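# The shared embedding is frozen explicitly here. In Pegasus the LM head weight is typically tied
#   to this embedding, so this may also re-freeze lm_head.weight; verify with the checks below.
for param in pp_model.base_model.shared.parameters(): param.requires_grad=False 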
### For checking the grad status of the layers
# for i, (name, param) in enumerate(pp_model.base_model.named_parameters()): print(i, name, param.requires_grad)
# for i, (name, param) in enumerate(pp_model.lm_head.named_parameters()):    print(i, name, param.requires_grad)

#### Create small dataset (dev step for quicker development, delete later)

In [16]:
# train_small = train.shard(10000, 4, contiguous=False)  # small training set for testing purposes
# train_small_dl = DataLoader(train_small, batch_size=batch_size, 
#                             shuffle=True, num_workers=n_wkrs)
# dl = train_small_dl

#### Set up optimiser and learning rate scheduler

In [17]:
# Code below taken from https://github.com/huggingface/transformers/blob/master/examples/pytorch/text-classification/run_glue_no_trainer.py#L363
# Split weights in two groups, one with weight decay and the other not.
# no_decay = ["bias", "LayerNorm.weight"]
# optimizer_grouped_parameters = [
#     {
#         "params": [p for n, p in pp_model.named_parameters() if not any(nd in n for nd in no_decay)],
#         "weight_decay": weight_decay,
#     },
#     {
#         "params": [p for n, p in pp_model.named_parameters() if any(nd in n for nd in no_decay)],
#         "weight_decay": 0.0,
#     },
# ]
# optimizer = AdamW(optimizer_grouped_parameters, lr=lr)

# For now we just keep this simple
optimizer = AdamW(pp_model.parameters(), lr=lr)
# lr_scheduler = get_scheduler(
#     name=lr_scheduler_type,
#     optimizer=optimizer,
#     num_warmup_steps=n_warmup_steps,
#     num_training_steps=n_train_steps,
# )

#### Set up other miscellaneous things

In [18]:
rouge_metric = load_metric("rouge")

### Training loop 

In [19]:
n_train_epochs = 200
n_train_steps = n_train_epochs * len(dl)
plot_grads = False

In [20]:
progress_bar = tqdm(range(n_train_steps))
for epoch in range(n_train_epochs): 
    logger.info(f"Now on epoch {epoch} of {n_train_epochs}")
    if not pp_model.training: pp_model.train()
    for i, data in enumerate(dl): 
        if i % 10 == 0: logger.info(f"Now processing batch {i} out of {len(dl)}")
        loss, reward, pp_logp = training_step(data) 
        if plot_grads: plot_grad_flow(pp_model.named_parameters())
 
    
#       if i == 0: 
#             print(label)
#             print(text)
#             print(pp_text)
#             print("reward: ", reward)
#             print("logp: ", pp_logp)
#             print("p", p)
#             print("loss: ", loss)

        # For debugging
        # print_info_on_generated_text()
        
        # Useful link: 
        # https://discuss.huggingface.co/t/generation-probabilities-how-to-compute-probabilities-of-output-scores-for-gpt2/3175
        # might be helpful?
        # https://discuss.huggingface.co/t/showing-individual-token-and-corresponding-score-during-beam-search/3735/5 
        
        progress_bar.update(1)            
    
    
    # Evaluation loop 
    train_set_preds = get_vm_preds_for_dl(dl = simple_dl)
    test_set_preds  = get_vm_preds_for_dl(dl = simple_dl_test)
    avg_prob_diff_train = get_avg_prob_diff(train_set_preds)
    avg_prob_diff_test  = get_avg_prob_diff(test_set_preds)
    print("Train paraphrases:", [o['pp'] for o in train_set_preds])
    print("Train avg prob diff:", avg_prob_diff_train)
    print("Test paraphrases:",  [o['pp'] for o in test_set_preds])
    print("Test avg prob diff:",  avg_prob_diff_test)


Now on epoch 0 of 200


['I like this movie']
['I like this movie.']
VM score:  -0.1223558783531189
ROUGE score: 1.0
Reward: -0.1223558783531189
['I do not like this movie']
["I don't like this movie"]
VM score:  -0.007324695587158203
ROUGE score: 0.6666666666666666
Reward: -0.004883130391438802
['I love this apple']
['I am a fan of apples.']
VM score:  0.10344105958938599
ROUGE score: 0.2
Reward: 0.0206882119178772
['I hate this apple']
["I don't like apples."]
VM score:  -0.07796370983123779
ROUGE score: 0.22222222222222224
Reward: -0.017325268851386178
['I like this movie']
['I am a fan of this movie.']
VM score:  -0.10696166753768921
ROUGE score: 0.5454545454545454
Reward: -0.058342727747830475
['I do not like this movie']
["I don't think this is a good movie."]
VM score:  -0.02734243869781494
ROUGE score: 0.4
Reward: -0.010936975479125977
['I love this apple']
['I am a fan of this apple.']
VM score:  0.055120646953582764
ROUGE score: 0.5454545454545454
Reward: 0.03006580742922696
['I hate this apple']
["

Now on epoch 1 of 200


['I hate this film']
["I don't like the film."]
VM score:  -0.06831562519073486
ROUGE score: 0.4
Reward: -0.027326250076293947
Train paraphrases: ['I am a fan of this movie.', "I don't think this is a good movie.", 'I am a fan of this apple.', "I don't like apples."]
Train avg prob diff: -0.014134791162278917
Test paraphrases: ['I am a fan of this banana.', "I don't like the banana.", 'I am a fan of the film.', "I don't like the film."]
Test avg prob diff: -0.029961455139246855
['I like this movie']
['I am a fan of this movie.']
VM score:  -0.10696166753768921
ROUGE score: 0.5454545454545454
Reward: -0.058342727747830475
['I do not like this movie']
["I don's not a 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932. 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932 888-276-5932s."]
VM score:  0.16212129592895508
ROUGE score: 0.07142857142857142
Reward: -9999
['I love this apple']
['This is an apple I am fo

Now on epoch 2 of 200


['I hate this film']
["I don't like the film."]
VM score:  -0.06831562519073486
ROUGE score: 0.4
Reward: -0.027326250076293947
Train paraphrases: ['This is a good movie.', 'This is a movie that I do not like.', 'This is an apple that I am fond of.', "I don't like the apple."]
Train avg prob diff: -0.03837504448034825
Test paraphrases: ['This is a good banana.', 'This banana is not something I would enjoy.', 'This is a great film.', "I don't like the film."]
Test avg prob diff: -0.047200328584701294
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This is a movie that I do not enjoy.']
VM score:  -0.01855146884918213
ROUGE score: 0.4
Reward: -0.007420587539672852
['I love this apple']
['I love this apple']
VM score:  0.0
ROUGE score: 1.0
Reward: 0.0
['I hate this apple']
["This is the apple I don't like."]
VM score:  -0.011995315551757812
ROUGE score: 0.333333333333

Now on epoch 3 of 200


Train paraphrases: ['This is a good movie.', 'This is a movie that I do not like.', 'This is an apple that I am fond of.', 'This is an apple that I am not fond of.']
Train avg prob diff: -0.03150416929206569
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I enjoy.', 'This is a film that I really enjoy.', 'This film is terrible.']
Test avg prob diff: -0.04018362182559389
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This is a movie is not a movie is not a movie that I do not a thing.']
VM score:  0.058525681495666504
ROUGE score: 0.25
Reward: 0.014631420373916626
['I love this apple']
['I am a fan of the apple']
VM score:  0.1364840269088745
ROUGE score: 0.36363636363636365
Reward: 0.049630555239590736
['I hate this apple']
['This is an apple that I am against.']
VM score:  0.06447285413742065
ROUGE score: 0.3333333333333

Now on epoch 4 of 200


Train paraphrases: ['This is a good movie.', 'This is a movie that I do not like.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03400466503241124
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I enjoy.', 'This is a film that I really enjoy.', 'This film is terrible.']
Test avg prob diff: -0.04018362182559389
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This movie is not a movie that I am a fan of.']
VM score:  0.003063023090362549
ROUGE score: 0.2222222222222222
Reward: 0.0006806717978583442
['I love this apple']
['This is a good apple.']
VM score:  -0.06512105464935303
ROUGE score: 0.4444444444444445
Reward: -0.028942690955268014
['I hate this apple']
['I am not a fan of the apple.']
VM score:  -0.02688354253768921
ROUGE score: 0.3333333333333333
Reward: -0.008961180845896

Now on epoch 5 of 200


['I hate this film']
['This film is terrible.']
VM score:  -0.024498701095581055
ROUGE score: 0.5
Reward: -0.012249350547790527
Train paraphrases: ['This is a good movie.', 'This is a movie that I do not like.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03400466503241124
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I like.', 'This is a film that I really enjoy.', 'This film is terrible.']
Test avg prob diff: -0.04046352065248645
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This is a movie I do not like.']
VM score:  -0.02467930316925049
ROUGE score: 0.5714285714285715
Reward: -0.014102458953857424
['I love this apple']
['This is a very popular apple.']
VM score:  0.03097224235534668
ROUGE score: 0.4
Reward: 0.012388896942138673
['I hate this apple']
['This is the worst 

Now on epoch 6 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I like.', 'This is a film that I really enjoy.', 'This film is terrible.']
Test avg prob diff: -0.04046352065248645
['I like this movie']
['This is a movie I enjoy.']
VM score:  -0.22076785564422607
ROUGE score: 0.4
Reward: -0.08830714225769043
['I do not like this movie']
['This movie is not good.']
VM score:  -0.032944321632385254
ROUGE score: 0.3636363636363636
Reward: -0.011979753320867363
['I love this apple']
['This apple is very much in my affection.']
VM score:  -0.07053142786026001
ROUGE score: 0.3333333333333333
Reward: -0.023510475953420002
['I hate this apple']
['I am against this apple.']
VM score:  0.011028468608856201
ROUGE score: 0.6666666666666665
Reward: 0.007352312405904132
['I like this movie']


Now on epoch 7 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I like.', 'This is a film that I really enjoy.', 'This film is terrible.']
Test avg prob diff: -0.04046352065248645
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This movie is not for me.']
VM score:  0.014312028884887695
ROUGE score: 0.3333333333333333
Reward: 0.0047706762949625645
['I love this apple']
['This apple is very special to me.']
VM score:  -0.0828101634979248
ROUGE score: 0.36363636363636365
Reward: -0.030112786726518112
['I hate this apple']
['This apple is terrible.']
VM score:  -0.023667335510253906
ROUGE score: 0.5
Reward: -0.011833667755126953
['I like this movie']
['This i

Now on epoch 8 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I like.', 'This film is very good.', 'This film is terrible.']
Test avg prob diff: -0.03862207819994976
['I like this movie']
['This is a movie I enjoy.']
VM score:  -0.22076785564422607
ROUGE score: 0.4
Reward: -0.08830714225769043
['I do not like this movie']
['I do not think that the movie is good.']
VM score:  -0.007422924041748047
ROUGE score: 0.5333333333333333
Reward: -0.003958892822265625
['I love this apple']
['This is an apple that I am very fond of.']
VM score:  -0.044003427028656006
ROUGE score: 0.28571428571428575
Reward: -0.012572407722473146
['I hate this apple']
['This apple is not my favorite.']
VM score:  -0.06080150604248047
ROUGE score: 0.4
Reward: -0.024320602416992188
['I like this movie']
['T

Now on epoch 9 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I like.', 'This film is very good.', 'This film is terrible.']
Test avg prob diff: -0.03862207819994976
['I like this movie']
['This is a movie that I like.']
VM score:  -0.16161924600601196
ROUGE score: 0.36363636363636365
Reward: -0.05877063491127708
['I do not like this movie']
['I do not like the movie.']
VM score:  -0.022276580333709717
ROUGE score: 0.8333333333333334
Reward: -0.0185638169447581
['I love this apple']
['This is an apple that I am fond of.']
VM score:  -0.029483497142791748
ROUGE score: 0.30769230769230765
Reward: -0.009071845274705153
['I hate this apple']
['The apple is a bad thing.']
VM score:  -0.07750153541564941
ROUGE score: 0.2
Reward: -0.015500307083129883
['I like this movie']
['This is

Now on epoch 10 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is appealing to me.', 'This banana is not something I like.', 'This film is very good.', 'This film is terrible.']
Test avg prob diff: -0.0436030573824532
['I like this movie']
['This is a movie I really enjoy.']
VM score:  -0.21944540739059448
ROUGE score: 0.36363636363636365
Reward: -0.07979832996021617
['I do not like this movie']
['This movie is not a thing that I like.']
VM score:  -0.014815151691436768
ROUGE score: 0.26666666666666666
Reward: -0.0039507071177164715
['I love this apple']
['This is an apple that I am very fond of.']
VM score:  -0.044003427028656006
ROUGE score: 0.28571428571428575
Reward: -0.012572407722473146
['I hate this apple']
['I am not a fan of this apple.']
VM score:  -0.02823275327682495
ROUGE score: 0.5
Reward: -0.014116376638412476
['I like

Now on epoch 11 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is appealing to me.', 'This banana is not something I like.', 'This film is very good.', 'This film is terrible.']
Test avg prob diff: -0.0436030573824532
['I like this movie']
['I think this is a good movie.']
VM score:  -0.16912537813186646
ROUGE score: 0.5454545454545454
Reward: -0.09225020625374533
['I do not like this movie']
['This movie is not for me.']
VM score:  0.014312028884887695
ROUGE score: 0.3333333333333333
Reward: 0.0047706762949625645
['I love this apple']
['The apple is so beautiful.']
VM score:  -0.08370804786682129
ROUGE score: 0.22222222222222224
Reward: -0.018601788414849177
['I hate this apple']
['...........................)....)........)...................']
VM score:  0.3562314808368683
ROUGE score: 0.0
Reward: -9999
['I like this movie']
['This

Now on epoch 12 of 200


Train paraphrases: ['This is a good movie.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This apple is terrible.']
Train avg prob diff: -0.03390708489295764
Test paraphrases: ['This banana is appealing to me.', 'This banana is not something I like.', 'This film is very good.', 'This film is terrible.']
Test avg prob diff: -0.0436030573824532
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This movie is not a good one.']
VM score:  -0.041653454303741455
ROUGE score: 0.30769230769230765
Reward: -0.012816447478074292
['I love this apple']
['This apple is very attractive.']
VM score:  -0.07470154762268066
ROUGE score: 0.4444444444444445
Reward: -0.03320068783230252
['I hate this apple']
['This is an apple.']
VM score:  0.17380160093307495
ROUGE score: 0.5
Reward: 0.08690080046653748
['I like this movie']
['This is a movie that I enjoy.']
V

Now on epoch 13 of 200


Train paraphrases: ['This is a movie that I enjoy.', 'This movie is not a good one.', 'This is an apple that I am fond of.', 'This is an apple that I hate.']
Train avg prob diff: -0.020871550886781068
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something I like.', 'This film is very good.', 'This film is terrible.']
Test avg prob diff: -0.03862207819994976
['I like this movie']
['This is a movie I really like.']
VM score:  -0.10713887214660645
ROUGE score: 0.36363636363636365
Reward: -0.038959589871493255
['I do not like this movie']
['I do not like the movie at all.']
VM score:  -0.018328487873077393
ROUGE score: 0.7142857142857143
Reward: -0.013091777052198137
['I love this apple']
['This is a very good apple.']
VM score:  -0.06340444087982178
ROUGE score: 0.4
Reward: -0.02536177635192871
['I hate this apple']
['This apple is terrible']
VM score:  0.05136460065841675
ROUGE score: 0.5
Reward: 0.025682300329208374
['I like this movie']
['This is a mo

Now on epoch 14 of 200


Train paraphrases: ['This is a movie that I enjoy.', 'This movie does not suit me.', 'This is an apple that I love.', 'This is an apple that I hate.']
Train avg prob diff: -0.024619737357804268
Test paraphrases: ['This banana is very attractive to me.', 'This banana is not something that I like.', 'This film is a great one.', 'This film is terrible.']
Test avg prob diff: -0.041648988677309706
['I like this movie']
['This is a good movie.']
VM score:  -0.22928935289382935
ROUGE score: 0.4444444444444445
Reward: -0.10190637906392416
['I do not like this movie']
['This movie does not suit me.']
VM score:  -0.031046271324157715
ROUGE score: 0.3333333333333333
Reward: -0.01034875710805257
['I love this apple']
['The apple is the most beautiful one of them']
VM score:  -0.05860048532485962
ROUGE score: 0.15384615384615383
Reward: -0.009015459280747633
['I hate this apple']
['The apple is terrible']
VM score:  0.0333707332611084
ROUGE score: 0.25
Reward: 0.0083426833152771
['I like this movie

Now on epoch 15 of 200


Train paraphrases: ['This is a movie that I enjoy', 'This movie does not suit me.', 'This is an apple that I love.', 'This is an apple that I hate.']
Train avg prob diff: -0.022936886910236244
Test paraphrases: ['This banana is very attractive to me.', 'This banana does not appeal to me.', 'This film is a great one.', 'This film is terrible']
Test avg prob diff: -0.03656857539932211
['I like this movie']
['I am a fan of the movie.']
VM score:  -0.08677077293395996
ROUGE score: 0.36363636363636365
Reward: -0.0315530083396218
['I do not like this movie']
['The movie is a flop.']
VM score:  -0.03399837017059326
ROUGE score: 0.1818181818181818
Reward: -0.006181521849198774
['I love this apple']
['The apple I love the most.']
VM score:  -0.05879467725753784
ROUGE score: 0.4
Reward: -0.023517870903015138
['I hate this apple']
['This is an apple that I hate.']
VM score:  0.06186741590499878
ROUGE score: 0.36363636363636365
Reward: 0.022497242147272285
['I like this movie']
['This is a good mo

Now on epoch 16 of 200


Train paraphrases: ['This is a good movie.', 'This movie does not suit me.', 'This is an apple that I love.', 'This is an apple that I hate.']
Train avg prob diff: -0.029072543888381035
Test paraphrases: ['This banana is very attractive to me.', 'This banana does not appeal to me.', 'This film is a great one.', 'This film is terrible']
Test avg prob diff: -0.03656857539932211
['I like this movie']
['This is a movie I really like.']
VM score:  -0.10713887214660645
ROUGE score: 0.36363636363636365
Reward: -0.038959589871493255
['I do not like this movie']
['the the the the the the the the the the']
VM score:  0.1765972375869751
ROUGE score: 0.0
Reward: -9999
['I love this apple']
['This is a very good apple.']
VM score:  -0.06340444087982178
ROUGE score: 0.4
Reward: -0.02536177635192871
['I hate this apple']
['This is an unpopular fruit.']
VM score:  -0.07865965366363525
ROUGE score: 0.22222222222222224
Reward: -0.01747992303636339
['I like this movie']
['This is a movie that I enjoy']
V

Now on epoch 17 of 200


Train paraphrases: ['This is a movie that I enjoy', 'This movie does not suit me.', 'This is an apple that I love.', 'This is an apple that I hate.']
Train avg prob diff: -0.022936886910236244
Test paraphrases: ['This banana is delicious.', 'This banana does not appeal to me.', 'This film is a great one.', 'This film is terrible']
Test avg prob diff: -0.04998359095591765
['I like this movie']
['Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger Krueger']
VM score:  0.28871920704841614
ROUGE score: 0.0
Reward: -9999
['I do not like this movie']
['I do not like the movie']
VM score:  0.0029

Now on epoch 18 of 200


Train paraphrases: ['This is a movie that I enjoy', 'This movie does not suit me.', 'This is an apple that I love.', 'This is an unpopular fruit for me.']
Train avg prob diff: -0.03142636020978292
Test paraphrases: ['This is a banana that I like.', 'This banana does not appeal to me.', 'This film is so good', 'This film is terrible']
Test avg prob diff: -0.022198194048853002
['I like this movie']
['This is a movie that I like.']
VM score:  -0.16161924600601196
ROUGE score: 0.36363636363636365
Reward: -0.05877063491127708
['I do not like this movie']
['This movie does not suit me.']
VM score:  -0.031046271324157715
ROUGE score: 0.3333333333333333
Reward: -0.01034875710805257
['I love this apple']
['This is anPruning']
VM score:  0.6007619053125381
ROUGE score: 0.28571428571428575
Reward: 0.17164625866072522
['I hate this apple']
['This isunsubscribeunsubscribeunsubscribes']
VM score:  0.2335471510887146
ROUGE score: 0.3333333333333333
Reward: 0.07784905036290486
['I like this movie']
['

Now on epoch 19 of 200


['I hate this film']
['This film is repulsive to me.']
VM score:  -0.04636508226394653
ROUGE score: 0.4
Reward: -0.018546032905578613
Train paraphrases: ['This is a movie I like', 'This movie does not suit me.', 'This is an amazing-looking-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-', 'This is an unpopularunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscribeunsubscri

Now on epoch 20 of 200


['I hate this film']
['This film offends me.']
VM score:  -0.058479487895965576
ROUGE score: 0.5
Reward: -0.029239743947982788
Train paraphrases: ['This is aUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquitiUbiquiti', 'This movie does not suit me.', 'This is anIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifiableIdentifia

Now on epoch 21 of 200


['I hate this film']
['This film is repulsive to me.']
VM score:  -0.04636508226394653
ROUGE score: 0.4
Reward: -0.018546032905578613
Train paraphrases: ['This movie isWARN', 'This movie does not suit me.', 'This is an strait I am fond of', 'This is the worst-performing-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-unsubscribed-']
Train avg prob diff: -2499.714618661219
Test paraphrases: ['This is a good-looking banana.', 'This is a repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsively repulsive', 'This film isAbruzzoAbruzzoAbruzzo', 'This film is repulsive to m

Now on epoch 22 of 200


Train paraphrases: ['The movie is like a favourite of mine.', 'The movie does not make me like it.', 'The design of the Anst-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je-je', 'The culprit of my dislike of the This is an Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible']
Train avg prob diff: -4999.509079879239
Test paraphrases: ["Theologis It's a Like This Banana", 'The only thing I do not like about the This is a Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible

Now on epoch 23 of 200


['I hate this film']
['The film is not film I am fond of.']
VM score:  -0.03658264875411987
ROUGE score: 0.15384615384615383
Reward: -0.005628099808326134
Train paraphrases: ['The movie is like me.', 'The movie does not make me like it.', 'The design of the An-W-D-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-W-', 'The Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible Terrible']
Train avg prob diff: -4999.499717903516
Test paraphrases: ['The A-listers like the A-listers.', "The only thing I do not like about the This Banana is it's colour."

Now on epoch 24 of 200


['I hate this film']
['The film is unpopular with me.']
VM score:  -0.07438349723815918
ROUGE score: 0.2
Reward: -0.014876699447631836
Train paraphrases: ['The movie is like "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome" and "Scatterdome"', 'The movie does not make me like it.', 'The Rockefellers are The Rockefellers I am fond of This Rockefellers', 'The Rockefellers I Hate The Worst']
Train avg prob diff: -2499.7518238184
Test paraphrases: ['The A-Rod has a Likes This Banana', 'The A-Rod does not like the A-Rod.', 'The film isAbruzzoAbruzzoAbruzzo film', 'The film is unpopular with me.']
Test avg prob diff: 0.03807154049475988
['I like this movie']
['The movie is Cheerleaders']
VM score:  0.4526684582233429
ROUGE score: 0.25
Reward: 0.11316711455583572
['I do not like this movie']
["The movie do't make me like it."]
VM score:  -0.03899586200714111
ROUGE scor

Now on epoch 25 of 200


['I hate this film']
['The film is unpopular with me.']
VM score:  -0.07438349723815918
ROUGE score: 0.2
Reward: -0.014876699447631836
Train paraphrases: ['The movie is Regency and I like Regency Regency.', '"Do Not Like" movie does not make me like it', 'The Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller Rockefeller', 'I am anti-poli sTENT']
Train avg prob diff: -2499.737

Now on epoch 26 of 200


['I hate this film']
['A film I am against.']
VM score:  -0.047292232513427734
ROUGE score: 0.22222222222222224
Reward: -0.010509385002983943
Train paraphrases: ['lique Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency Regency', '"ReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflectReflect

Now on epoch 27 of 200


['I hate this film']
['AReflect s tites to me andggan s tites to tites to tites to tites to tites to tites to tites to tites to tites to tites to tites to ']
VM score:  0.17531061172485352
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego Montego', 'Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inter Inte

Now on epoch 28 of 200


['I hate this film']
['A Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock Hawks Hemlock']
VM score:  0.33231788873672485
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol A menthol', 'AFarmer chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chariot chari

['I love this film']
['turquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoise']
VM score:  0.5432286858558655
ROUGE score: 0.0
Reward: -9999


Now on epoch 29 of 200


['I hate this film']
['turquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoiseturquoise']
VM score:  0.21070879697799683
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['chrysDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzleDrizzle

['I do not like this banana']
['rebates Hawks rfg Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer Barometer']
VM score:  0.42962485551834106
ROUGE score: 0.0
Reward: -9999
['I love this film']
['Hawks rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates reb

Now on epoch 30 of 200


['I hate this film']
['Hawks rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates']
VM score:  0.3611094057559967
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks', 'Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Hawks Ha

['I do not like this banana']
['abscond customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising customising']
VM score:  0.4427923858165741
ROUGE score: 0.0
Reward: -9999
['I love this film']
['rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebate

Now on epoch 31 of 200


['I hate this film']
['rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates rebates']
VM score:  0.36325305700302124
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['rebatesumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumab', 'abscondcherryumabumabumabumab abscondcherrycherryumabumab abscondcherrycherryumab abscondcherrycherryumab abscondcherrycherryumab abscondcherrycherryumab abscondcherrycherryumab abscondcher

['I love this film']
['umabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumab']
VM score:  0.5432286858558655
ROUGE score: 0.0
Reward: -9999


Now on epoch 32 of 200


['I hate this film']
['umabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumab']
VM score:  0.21070879697799683
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['koskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoskikoski', 'Creative backpacksumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumabumab', 'CreativeNev08BrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBrisbaneBr

['I do not like this banana']
['CreativeNev bachelorette metabolites baha DOT bachelorette metabolites baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha baha']
VM score:  0.3898860216140747
ROUGE score: 0.0
Reward: -9999
['I love this film']
['Craft President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President President Presi

Now on epoch 33 of 200


['I hate this film']
['Craft backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks backpacks']
VM score:  0.2325792908668518
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['bachelorette metabolites metabolites bachelorette metabolites bachelorette metabolites bachelorette metabolites bachelorette bachelorette metabolites bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorette bachelorett

['I love this apple']
['DOT Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi']
VM score:  0.33684664964675903
ROUGE score: 0.0
Reward: -9999
['I hate this apple']
['DOT Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi Bondi']
VM score:  0.3477071225643158
ROUGE score: 0.0
Reward: -9999
['I like this banana']
['Bondi Mann Bondi Mann Bondi Mann Bondi Mann Bondi Mann Bondi Mann Bondi Mann Bondi Mann Bondi Mann Bondi Man

Now on epoch 34 of 200


['I hate this film']
['feng Toomey bachelorette euphemism euphemism bachelorette euphemism bachelorette euphemism bachelorette DOT DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT bachelorette DOT']
VM score:  0.16889441013336182
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann Mann', 'feng Toomey bachelorette metabolites bachelorette bachelorette bachelorette bachelorette bachelo

['I do not like this banana']
['feng irradiat bachelorette metabolites patios feng indemnification bachelorette metabolites patios feng indemnification bachelorette metabolites patios feng indemnification bachelorette metabolites patios feng indemnification bachelorette metabolites patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette patios bachelorette']
VM score:  0.3158366084098816
ROUGE score: 0.0
Reward: -9999
['I love this film']
['fenghinahinahinahinahinaarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarangarang']
VM s

Now on epoch 35 of 200


['I hate this film']
['feng choker choker choker choker choker choker choker choker choker patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios']
VM score:  0.285810649394989
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['fenghina psychosis psychosis psychosis psychosis psychosis psychosis psychosis psychosis psychosis psychosis psychosis psychosis psychosis maladies psychosis psychosis psychosis psychosis psychosis psychosis psychosis maladies psychosis bachelorette patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios patios', 'feng viaduct 

['I love this film']
['PwChinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahina']
VM score:  0.5432286858558655
ROUGE score: 0.0
Reward: -9999


Now on epoch 36 of 200


['I hate this film']
['PwC metaboliteshinahinahina Renter PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC']
VM score:  0.29792845249176025
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['PwC metaboliteshinahinahinahinahinahinahina PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC', 'feng PwC PwC feng PwC feng PwC feng PwC feng PwC feng PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC feng PwC PwC PwC PwC PwC PwC PwC PwC PwC', 'PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC 

Now on epoch 37 of 200


['I hate this film']
['viaduct PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC']
VM score:  0.3399659991264343
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['hinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahinahina', 'viaduct PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC', 'Sava DJ PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC PwC Pw

Now on epoch 38 of 200


['I hate this film']
['unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt unspoilt']
VM score:  0.2590203285217285
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['dome crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass outcome outcome outcome outcome outcome outcome outcome outcome outcome outcome outcome', 

['I do not like this banana']
['Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam']
VM score:  0.2765302062034607
ROUGE score: 0.0
Reward: -9999
['I love this film']
['avu crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass']
VM score:  0.5412418842315674
ROUGE score: 0.0
Reward: -9999


Now on epoch 39 of 200


['I hate this film']
['Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam']
VM score:  0.24840664863586426
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['dome crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass crass Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam', 'Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam F

Now on epoch 40 of 200


['I hate this film']
['Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam']
VM score:  0.24840664863586426
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['Foam crass crass crass crass crass crass crass crass crass crass crass crass crass crass Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam', 'CET Foam DTS Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam Foam', 'Foam Foam

Now on epoch 41 of 200


['I hate this film']
['hura redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment']
VM score:  0.2332592010498047
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['woolBDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABDABD

['I love this apple']
['LIDAR (200Breaking superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded superseded']
VM score:  0.42569640278816223
ROUGE score: 0.0
Reward: -9999
['I hate this apple']
['SSIS climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic climactic

Now on epoch 42 of 200


['I hate this film']
['avi SHA effortlessly redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment redevelopment']
VM score:  0.24501538276672363
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['wool adapters chub adapters chub adapters chub adapters chub adapters chub adapters chub adapters reb

['I do not like this movie']
['Chariot SFX Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.42920249700546265
ROUGE score: 0.0
Reward: -9999
['I love this apple']
['SSIS Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Star

Now on epoch 43 of 200


['I hate this film']
['Chariot effortlessly Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.40202596783638
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['wool vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran vashikaran Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight S

['I hate this apple']
['SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX Shaolin Entertainment buffet buffet buffet buffet buffet buffet buffet buffet buffet buffet buffet buffet buffet buffet']
VM score:  0.29989922046661377
ROUGE score: 0.0
Reward: -9999
['I like this banana']
['sniper Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.0231437

Now on epoch 44 of 200


['I hate this film']
['sniper Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.39305272698402405
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX', 'SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX 

['I love this film']
['SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX']
VM score:  0.4719335734844208
ROUGE score: 0.0
Reward: -9999


Now on epoch 45 of 200


['I hate this film']
['sniper BMC Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.38134104013442993
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX', 'SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SF

['I love this film']
['Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.35841768980026245
ROUGE score: 0.0
Reward: -9999


Now on epoch 46 of 200


['I hate this film']
['bayou Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.398833304643631
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX', 'bayou effortlessly BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC 

['I do not like this banana']
['bayou ISPs Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.4073123633861542
ROUGE score: 0.0
Reward: -9999
['I love this film']
['Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight S

Now on epoch 47 of 200


['I hate this film']
['effortlessly Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight Starlight']
VM score:  0.4173228144645691
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX SFX Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad', 'feng Starlight BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC 

['I do not like this banana']
['Chopra ISPs Hung CMO Twins Hung CMO Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad Dad']
VM score:  0.263283908367157
ROUGE score: 0.0
Reward: -9999
['I love this film']
['AFS BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC']
VM score:  0.4506858289241791
ROUGE score: 0.0
Reward: -9999


Now on epoch 48 of 200


['I hate this film']
['effortlessly BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC']
VM score:  0.3237658143043518
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['AFS BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC', 'AFS BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC', 'effortlessly Directorate Starlight Directorate Starlight Directorate MCX effortlessly Dad effortlessly Dad effortlessly Dad effortlessly Dad effortlessly Dad effortlessly Dad effortl

Now on epoch 49 of 200


['I hate this film']
['AFS BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC']
VM score:  0.3032516837120056
ROUGE score: 0.0
Reward: -9999
Train paraphrases: ['AFS BMC Harrington UHD BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington BMC Harrington Harrington Harrington Harrington Harrington Harrington Harrington Harrington Harrington Harrington Tall EX Harrington Tall Tall Tall Tall Tall Tall Tall Tall Tall Tall Tall Tall Tall', 'AFS BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC BMC', 'MCX Dir

Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/home/tproth/Programs/miniconda/envs/nlp_env/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-db80026f50d6>", line 32, in <module>
    train_set_preds = get_vm_preds_for_dl(dl = simple_dl)
  File "<ipython-input-12-24bcb25ecc2c>", line 7, in get_vm_preds_for_dl
    generated, pp_text = get_paraphrases(text,
  File "<ipython-input-6-963a2d27d681>", line 8, in get_paraphrases
    generated = pp_model.generate_with_grad(**batch,
  File "/data/tproth/travis_attack/utils.py", line 307, in generate_with_grad
    return self.greedy_search(
  File "/home/tproth/Programs/miniconda/envs/nlp_env/lib/python3.8/site-packages/transformers/generation_utils.py", line 1273, in greedy_search
    outputs = self(
  File "/home/tproth/Programs/miniconda/envs/nlp_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
   

TypeError: object of type 'NoneType' has no len()

## Testing and debugging 

### Verifying that the weights update each training step 

In [None]:
def check_parameters_update(dl): 
    """
    Check which parameters are updated by a single training step. 
    We run one forward pass + backward pass (updating the parameters once) 
    and print how much each parameter changed. 
    """
    # Check which parameters should be updated
    params_with_grad = [o for o in pp_model.named_parameters() if o[1].requires_grad]
    print("---- Parameters with 'requires_grad' and their sizes ------")
    for (name, p) in params_with_grad:  print(name, p.size())
        
    ## Take a step and see which weights update
    # These are references to the live parameters, so they reflect the training step
    params_all = [o for o in pp_model.named_parameters()]
    # Detached copies of the initial values, taken before the step
    params_all_initial = [(name, p.detach().clone()) for (name, p) in params_all]
        
    # Take a step on the first batch of the dataloader
    # (the original referred to an undefined variable `data` here)
    data = next(iter(dl))
    loss, reward, pp_logp = training_step(data)
    
    print("\n---- Matrix norm of parameter update for one step ------\n")
    for (_, old_p), (name, new_p) in zip(params_all_initial, params_all): 
        print(name, torch.norm(new_p.detach() - old_p).item())

check_parameters_update(dl)

## Code scraps 

### Experiments around plotting average parameter updates 

In [None]:
def get_parameter_group_dict(): 
    """Function to create "groups" of parameters. This is useful to check how much a group of 
    parameters updates at an epoch. 
    Parameter groups are hardcoded into this code for now. 
    """
    # Identify which parameters should be grouped together
    isolates = ['model.shared.weight',"model.encoder.embed_positions.weight", "model.encoder.layer_norm",
                "model.decoder.embed_positions.weight", "model.decoder.layer_norm"]
    layers_base = ["model.encoder.layers", "model.decoder.layers"]
    def flatten_list(l): return list(np.concatenate(l).flat)
    layers = flatten_list([[lyr + "." + str(o) +"." for o in list(range(16))] for lyr in layers_base])
    parameter_groups = layers + isolates
    # Sort the parameter groups by the order they appear in the model 
    all_params = [name for name,_ in pp_model.named_parameters()]
    ordering = [np.min(np.where([pg in o for o in all_params])) for pg in parameter_groups]
    parameter_groups = [o for _,o in sorted(zip(ordering, parameter_groups))]
    # Assign each model parameter a parameter group 
    group_d = dict()
    for pg in parameter_groups: 
        name = pg[:-1] if pg in layers else pg  # remove the "." from the end of the name for the numeric layers
        group_d[name] = [o for o in all_params if pg in o]
    return group_d
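
To illustrate why the numbered layer groups above carry a trailing "." before the substring match, here is a small standalone sketch. It uses made-up parameter names rather than the real `pp_model`; `group_params_by_prefix` and `toy_names` are hypothetical names that only exist for this example.

```python
def group_params_by_prefix(param_names, prefixes):
    """Map each prefix to the parameter names that contain it (order preserved)."""
    return {pfx: [n for n in param_names if pfx in n] for pfx in prefixes}

# Made-up names that mimic the layout of an encoder-decoder model.
toy_names = [
    "model.shared.weight",
    "model.encoder.layers.1.fc1.weight",
    "model.encoder.layers.10.fc1.weight",
    "model.decoder.layers.1.fc1.weight",
]

# Without the trailing ".", the group for layer 1 also picks up layer 10.
loose = group_params_by_prefix(toy_names, ["model.encoder.layers.1"])
# With the trailing ".", only layer 1 matches.
strict = group_params_by_prefix(toy_names, ["model.encoder.layers.1."])

print(loose["model.encoder.layers.1"])
print(strict["model.encoder.layers.1."])
```

The same reasoning is why the group name is only stripped of its final "." after the matching has been done.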

In [None]:
def get_parameter_update_amount(params_all_initial, params_all): 
    """Summarise the absolute parameter change for each parameter group. 
    params_all_initial, params_all: lists of (name, tensor) pairs taken before and after a training step. 
    """
    group_d = get_parameter_group_dict()
    params_all_initial_d = dict(params_all_initial)
    params_all_d = dict(params_all)
    df_d = dict()
    for k,param_l in group_d.items(): 
        l = list()
        for p in param_l: 
            l.append((params_all_initial_d[p] - params_all_d[p]).abs().flatten())
        l = torch.cat(l).cpu().detach().numpy()  # list of 1-d tensors to tensor and then to numpy
        df_d[k] = pd.DataFrame(l).describe().values.flatten()
    df = pd.DataFrame(df_d)
    df.index = pd.DataFrame([1,2,3]).describe().index  # reuse the row labels that describe() produces
    return df 
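
The per-group summary can be sketched without torch or pandas, which makes the intent easier to see: take the absolute element-wise change for every group and report a few statistics. This is only an illustration on plain Python lists; `summarise_updates`, `toy_old` and `toy_new` are hypothetical and stand in for the real before/after parameter tensors.

```python
import statistics

def summarise_updates(old, new):
    """Summary statistics of the absolute element-wise change between two flat lists."""
    diffs = [abs(a - b) for a, b in zip(old, new)]
    return {
        "count": len(diffs),
        "mean": statistics.mean(diffs),
        "min": min(diffs),
        "max": max(diffs),
    }

# Toy "parameters" before and after one update step.
toy_old = {"layer0": [0.0, 1.0, 2.0], "frozen": [5.0, 5.0]}
toy_new = {"layer0": [0.5, 1.0, 1.0], "frozen": [5.0, 5.0]}

summary = {name: summarise_updates(toy_old[name], toy_new[name]) for name in toy_old}
for name, stats in summary.items():
    print(name, stats)
```

A group whose `max` is exactly zero did not move at all during the step, which is the case worth looking for when checking that weights update.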

In [None]:
## Random code snippets

# initial_params = [(name, p.detach().clone()) for (name, p) in pp_model.named_parameters()]
# loss, reward, pp_logp = training_step(data) 
# update_d =  dict()
# for (_,old_p), (name, new_p) in zip(initial_params, pp_model.named_parameters()): 
#     update_d[name] = torch.abs(old_p - new_p).detach().flatten()     
    
#             update_d =  dict()
#             for (_,old_p), (name, new_p) in zip(initial_params, pp_model.named_parameters()): 
#                 update_d[name] = torch.abs(old_p - new_p).flatten() 
#                 print (name, torch.norm(new_p - old_p).item())  
            
#             group_d = get_parameter_group_dict()
#             initial_params_d,current_params_d = dict(initial_params),dict()
#             params_all_d = dict(params_all)
#             group_d = get_parameter_group_dict()
#             df_d = dict()
#             for k,param_l in group_d.items(): 
#                 l = list()
#                 for p in param_l: 
#                     l.append((params_all_initial_d[p] - params_all_d[p]).abs().flatten())
#                 l = torch.cat(l).cpu().detach().numpy()  # list of 1-d tensors to tensor and then to numpy
#                 df_d[k] = pd.DataFrame(l).describe().values.flatten()
#             df = pd.DataFrame(df_d)
#             df.index = pd.DataFrame([1,2,3]).describe().index

### Generating a paraphrase dataset and getting VM predictions for it

In [None]:
def create_paraphrase_dataset(batch, cname_input, cname_output, num_beams=32,
                              num_return_sequences=32): 
    """Create paraphrases for each example in the batch. Then repeat the other fields 
        so that the resulting dataset has the same length as the number of paraphrases. 
        Key assumption: the same number of paraphrases is created for each example.
        batch: a dict of examples used by the `map` function from the dataset
        cname_input: which column to create paraphrases of 
        cname_output: what to call the column of paraphrases
        other parameters: passed to get_paraphrases. """
    
    # Generate paraphrases. 
    # This can be later extended to add diversity or so on. 
    #set_trace()
    pp_l,probs = get_paraphrases(batch[cname_input], num_beams=num_beams,
        num_return_sequences=num_return_sequences)
    
    # To return paraphrases as a list of lists for batch input (not done here but might need later)
    #     split_into_sublists = lambda l,n: [l[i:i + n] for i in range(0, len(l), n)]
    #     pp_l = split_into_sublists(pp_l, n_seed_seqs)
    batch[cname_output] = pp_l 
    batch["probs"] = probs.to('cpu').numpy()
    
    # Repeat each entry in all other columns `num_return_sequences` times so they are the same length
    # as the paraphrase column
    # Only works if the same number of paraphrases is generated for each phrase. 
    # Else try something like 
        # for o in zip(*batch.values()):
        #     d = dict(zip(batch.keys(), o))
        #     get_paraphrases(batch[cname_input],num_return_sequences=n_seed_seqs,num_beams=n_seed_seqs)
        #     for k,v in d.items(): 
        #       return_d[k] += v if k == 'text' else [v for o in range(n_paraphrases)]
        # return return_d
    return_d = defaultdict(list) 
    repeat_each_item_n_times = lambda l,n: [o for o in l for i in range(n)]
    for k in batch.keys(): 
        if   k == cname_output: return_d[k] = batch[cname_output]
        elif k == "probs"     : return_d[k] = batch["probs"]
        else:                   return_d[k] = repeat_each_item_n_times(batch[k], num_return_sequences)
    return return_d 
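
The column-repeating step is easier to follow on a toy batch. The sketch below skips the paraphrase model entirely and takes a hand-written flat list of paraphrases, assuming (as above) that every example gets the same number of them. `expand_batch`, `toy_batch` and `toy_pps` are hypothetical names used only here.

```python
def expand_batch(batch, paraphrases, cname_output, n):
    """Repeat every column n times per row so it lines up with the flat paraphrase list."""
    n_rows = len(next(iter(batch.values())))
    # Same assumption as above: exactly n paraphrases per original example.
    assert len(paraphrases) == n * n_rows
    out = {key: [v for v in values for _ in range(n)] for key, values in batch.items()}
    out[cname_output] = list(paraphrases)
    return out

toy_batch = {"text": ["good film", "bad film"], "label": [1, 0]}
toy_pps = ["nice movie", "great film", "poor movie", "awful film"]

expanded = expand_batch(toy_batch, toy_pps, "text_pp", n=2)
for row in zip(expanded["text"], expanded["text_pp"], expanded["label"]):
    print(row)
```

Every column ends up with the same length, which is what a batched `map` call needs when the function returns more rows than it received.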

In [None]:
def get_vm_scores(ds_pp, cname_orig, cname_pp, cname_label='label', 
                  use_metric=False, monitor=False): 
    """Get victim model preds+probs for the paraphrase dataset.
    """
    assert not vm_model.training  # checks that model is in eval mode 
    if use_metric: 
        metric_d = {}
        metric_d['orig'],metric_d['pp'] = load_metric('accuracy'),load_metric('accuracy')
    orig_probs_l,pp_probs_l = [],[]
    if monitor: monitor = Monitor(2)  # track GPU usage and memory
    
    def get_vm_preds(x): 
        """Get predictions for a vector x (here a vector of documents/text). 
        Works for a sentiment-analysis dataset (needs to be adjusted for NLI tasks)"""
        inputs = vm_tokenizer(x, padding=True, truncation=True, return_tensors="pt")
        inputs.to(device)
        outputs = vm_model(**inputs, labels=labels)
        probs = outputs.logits.softmax(1).cpu()
        preds = probs.argmax(1)
        return probs, preds
       
    print("Getting victim model predictions for both original and paraphrased text.")
    dl = DataLoader(ds_pp, batch_size=batch_size, shuffle=False, 
                    num_workers=n_wkrs, pin_memory=True)
    with torch.no_grad():
        for i, data in enumerate(dl): 
            if i % 50 == 0 : print("Now processing batch", i, "out of", len(dl))
            labels,orig,pp = data[cname_label].to(device),data[cname_orig],data[cname_pp]
            orig_probs, orig_preds = get_vm_preds(orig)            
            pp_probs,   pp_preds   = get_vm_preds(pp)    
            orig_probs_l.append(orig_probs); pp_probs_l.append(pp_probs)
            if use_metric: 
                metric_d['orig'].add_batch(predictions=orig_preds, references=labels)
                metric_d['pp'].add_batch(  predictions=pp_preds,   references=labels)
    if monitor: monitor.stop()
    def list2tensor(l): return torch.cat(l)
    orig_probs_t,pp_probs_t = list2tensor(orig_probs_l),list2tensor(pp_probs_l)
    if use_metric: return orig_probs_t, pp_probs_t, metric_d
    else:          return orig_probs_t, pp_probs_t, None

In [None]:
### Generate paraphrase dataset
num_beams = 10
num_return_sequences = 3
cname_input = 'text' # which text column to paraphrase
cname_output= cname_input + '_pp'
date = '20210825'
fname = path_cache + '_rt_train'+ date + '_' + str(num_return_sequences)
if os.path.exists(fname):  
    ds_pp = datasets.load_from_disk(fname)
else:
    ds_pp = train.shard(200, 0, contiguous=True)
    # Have to call with batched=True
    # Need to set a batch size otherwise will run out of memory on the GPU card. 
    # 64 seems to work well 
    ds_pp = ds_pp.map(
        lambda x: create_paraphrase_dataset(x, 
            num_beams=num_beams, num_return_sequences=num_return_sequences,
            cname_input=cname_input, cname_output=cname_output),
        batched=True, batch_size=4) 
    ds_pp.save_to_disk(fname)
    gc.collect(); torch.cuda.empty_cache() # free up most of the GPU memory

In [None]:
### Get predictions
cname_orig = cname_input
cname_pp = cname_output
cname_label = 'label'
print_metric = True
fname = path_cache + 'results_df_'+ date + "_" + str(num_return_sequences) + ".csv"
if os.path.exists(fname):    results_df = pd.read_csv(fname)
else: 
    #sim_score_t = generate_sim_scores()
    orig_probs_t,pp_probs_t,metric_d = get_vm_scores(ds_pp, cname_orig, 
                                                     cname_pp, cname_label,
                                                     monitor=True, use_metric=print_metric)
    if print_metric: 
        print("orig vm accuracy:",       metric_d['orig'].compute())
        print("paraphrase vm accuracy:", metric_d['pp'].compute())
    vm_orig_scores  = torch.tensor([r[idx] for idx,r in zip(ds_pp[cname_label], orig_probs_t)])
    vm_pp_scores    = torch.tensor([r[idx] for idx,r in zip(ds_pp[cname_label], pp_probs_t)])
    results_df = pd.DataFrame({
                  cname_orig: ds_pp[cname_orig],
                  cname_pp: ds_pp[cname_pp],
   #               'sim_score': sim_score_t,
                  'label_true': ds_pp[cname_label], 
                  'label_vm_orig': orig_probs_t.argmax(1),
                  'label_vm_pp': pp_probs_t.argmax(1),
                  'vm_orig_truelabel': vm_orig_scores,             
                  'vm_pp_truelabel': vm_pp_scores,
                  'vm_truelabel_change': vm_orig_scores - vm_pp_scores,
                  'vm_orig_class0': orig_probs_t[:,0], 
                  'vm_orig_class1': orig_probs_t[:,1], 
                  'vm_pp_class0': pp_probs_t[:,0], 
                  'vm_pp_class1': pp_probs_t[:,1], 
                  })
#    results_df['vm_truelabel_change_X_sim_score'] = results_df['vm_truelabel_change'] * results_df['sim_score']
    results_df.to_csv(fname, index_label = 'idx')
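
For reference, the `vm_truelabel_change` column is the drop in the victim model's probability for the true label when moving from the original text to the paraphrase. A minimal sketch on hand-written probabilities (no model involved; `truelabel_change`, `argmax` and the toy values are hypothetical):

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def truelabel_change(orig_probs, pp_probs, labels):
    """Original minus paraphrase probability of the true label, one value per example."""
    return [o[y] - p[y] for o, p, y in zip(orig_probs, pp_probs, labels)]

# Toy class probabilities for two examples of a binary task.
orig_probs = [[0.1, 0.9], [0.8, 0.2]]
pp_probs = [[0.6, 0.4], [0.7, 0.3]]
labels = [1, 0]

change = truelabel_change(orig_probs, pp_probs, labels)
flipped = [argmax(o) != argmax(p) for o, p in zip(orig_probs, pp_probs)]
print(change)
print(flipped)
```

A positive change means the paraphrase made the victim model less confident in the true label, and `flipped` marks the examples where the predicted class actually changed.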

### Testing how to keep gradients with `generate` functions

In [None]:
### Testing the `generate_with_grad` function

input_text="hello my name is Tom"
num_return_sequences=1
num_beams=2
return_probs=True
batch = pp_tokenizer(input_text, truncation=True, padding='longest', return_tensors="pt").to(device)
generated = pp_model.generate_with_grad(**batch, return_dict_in_generate=True, output_scores=True,
                              num_return_sequences=num_return_sequences,
                                num_beams=num_beams,
                                num_beam_groups=1,
                                diversity_penalty=0,
                                temperature=1.5, 
                              length_penalty=1)
print(generated)

tgt_text = pp_tokenizer.batch_decode(generated.sequences, skip_special_tokens=True)
print(pp_tokenizer.tokenize(tgt_text[0]))
print(pp_tokenizer.encode(tgt_text[0]))

# Score: score = sum_logprobs / (hyp.shape[-1] ** self.length_penalty)
# The gradient gets removed (I think) by `next_score.item()`, which returns a plain Python float, in the call 
# beam_hyp.add(
#   input_ids[batch_beam_idx].clone(),
#   next_score.item())


x=generated['scores'][5]
print(x.max(1))
x.max(1).values / (len(generated['scores']) ** 0.8)
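
The length-normalised score from the comment above, `sum_logprobs / (length ** length_penalty)`, can be checked by hand on toy numbers. This sketch only shows the arithmetic; the exact length the library uses (for example whether special tokens are counted) is not verified here, and `length_normalised_score` is a hypothetical helper.

```python
import math

def length_normalised_score(token_logprobs, length_penalty=1.0):
    """Sum of token log-probabilities divided by length ** length_penalty."""
    return sum(token_logprobs) / (len(token_logprobs) ** length_penalty)

# Two hypotheses with the same per-token probability but different lengths.
short_hyp = [math.log(0.5)] * 2
long_hyp = [math.log(0.5)] * 4

# No normalisation: the longer hypothesis is penalised for having more tokens.
print(length_normalised_score(short_hyp, 0.0), length_normalised_score(long_hyp, 0.0))
# length_penalty=1: the score is the mean token log-probability, so both tie.
print(length_normalised_score(short_hyp, 1.0), length_normalised_score(long_hyp, 1.0))
```

With an exponent between 0 and 1 (such as the 0.8 used in the cell above) the result sits between these two cases.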

In [None]:
## An example of how to use greedy_search

# from transformers import (
# AutoTokenizer,
# AutoModelForCausalLM,
# LogitsProcessorList,
# MinLengthLogitsProcessor,
# )

# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# model = AutoModelForCausalLM.from_pretrained("gpt2")

# # set pad_token_id to eos_token_id because GPT2 does not have a PAD token
# model.config.pad_token_id = model.config.eos_token_id

# input_prompt = "Today is a beautiful day, and"
# input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids

# # instantiate logits processors
# logits_processor = LogitsProcessorList([
#     MinLengthLogitsProcessor(15, eos_token_id=model.config.eos_token_id),
# ])

# outputs = model.greedy_search(input_ids, logits_processor=logits_processor)

# print("Generated:", tokenizer.batch_decode(outputs, skip_special_tokens=True))

### Tensorboard setup 

In [None]:

# from torch.utils.tensorboard import SummaryWriter
# import datetime 
# # Create writer and track to run directory 
# path_runs = './runs/'
# log_dir = path_runs + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + "/"
# writer = SummaryWriter(log_dir = log_dir)
# # stuff here logging to tensorboard
# #writer.close() # important otherwise Tensorboard eventually shuts down
