## GPT Paragraph Similarity using an LSTM Head

GPT produces good features for sentence embeddings. When compared using cosine similarity, these embeddings appear to separate in-domain and out-of-domain topics well.

Paragraph embeddings can be constructed as a linear combination of sentence embeddings. Naively summing the sentence embeddings did not produce a reliable paragraph embedding. Results improved after changing the algorithm to sum embeddings over groups of sentences, where each group's combined length stays below the model's maximum input length. However, the last group of sentences appeared to influence the paragraph embedding the most, which skewed the paragraph comparisons (measured with cosine similarity).
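The grouped aggregation described above can be sketched as follows. This is a minimal illustration, not the notebook's implementation (which cuts token sequences at full stops inside `pre_process_datasets`): it assumes per-sentence token counts are already known, and `pack_sentences`, `cosine_similarity` and the random vectors standing in for GPT features are all hypothetical.

```python
import numpy as np


def pack_sentences(sentence_lengths, max_len):
    """Greedily group consecutive sentences so each group's total token count stays <= max_len.

    A sentence longer than max_len is placed in a group of its own.
    Returns a list of groups, each a list of sentence indices.
    """
    groups, current, current_len = [], [], 0
    for idx, length in enumerate(sentence_lengths):
        if current and current_len + length > max_len:
            groups.append(current)
            current, current_len = [], 0
        current.append(idx)
        current_len += length
    if current:
        groups.append(current)
    return groups


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy sentence token counts with a budget of 10 tokens per group
groups = pack_sentences([4, 5, 3, 8, 2], max_len=10)  # [[0, 1], [2], [3, 4]]

# Hypothetical stand-ins for the GPT feature of each group
rng = np.random.default_rng(0)
group_features = rng.normal(size=(len(groups), 768))
paragraph_a = group_features.sum(axis=0)  # summed aggregation over groups
paragraph_b = rng.normal(size=768)        # some other paragraph's embedding
print(groups, cosine_similarity(paragraph_a, paragraph_b))
```

Summation treats every group equally by construction, so any bias toward the last group has to come from the features themselves rather than from the aggregation step.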

There are a few possible solutions to this problem:
1. Use a different metric.
 - Not explored much.
2. Divide the paragraph into equal chunks and feed them into the model before aggregating.
 - Improves scores, but the last-sentence bias is not completely removed.
3. Use an additional neural network to aggregate the sentence embeddings and learn paragraph embeddings in a non-linear space. These networks (possibly LSTM based) could be trained with a cosine-similarity loss to learn paragraph features from sentence features.
 - The unidirectional LSTM was prone to last-sentence bias. The bias decreased after switching to a bidirectional LSTM, which was trained by maximizing the cosine similarity between each forward cell's output and the next input, and between each backward cell's output and the previous input. This bi-sequential loss gave the best results.
4. Train GPT as a language model to remove the influence of the last sentence on the score.
 - The GPT LM model with an LSTM head is less sensitive to off-domain topics appended at the end of the paragraph, but it does not capture context as well as GPT with the Multiple Choice head, so it was dropped from consideration for the final approach.
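Option 2 can be sketched as below, assuming the paragraph is already split into sentences and the number of chunks is chosen up front (here `ceil(total_tokens / max_length)`, mirroring how the notebook sizes its batches). `split_evenly` and the token counts are hypothetical; the helper balances sentence counts and does not check each chunk against the token limit.

```python
from math import ceil


def split_evenly(items, n_chunks):
    """Split items into n_chunks contiguous groups whose sizes differ by at most one."""
    n_chunks = max(1, min(n_chunks, len(items)))
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        size = base + (1 if k < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


sentences = ["s1.", "s2.", "s3.", "s4.", "s5.", "s6.", "s7."]
total_tokens, max_length = 600, 254         # hypothetical token counts
n_chunks = ceil(total_tokens / max_length)  # 3 chunks
print(split_evenly(sentences, n_chunks))    # [['s1.', 's2.', 's3.'], ['s4.', 's5.'], ['s6.', 's7.']]
```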


In [1]:
%matplotlib inline

In [2]:
import argparse
import os
import csv
import random
import logging
from tqdm import tqdm, trange, tqdm_notebook
from math import ceil
import numpy as np
import torch
import torch.nn as nn
from itertools import combinations, product
from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,
                              TensorDataset)
from pytorch_pretrained_bert import (OpenAIGPTDoubleHeadsModel, OpenAIGPTTokenizer,
                                     OpenAIAdam, cached_path, WEIGHTS_NAME, CONFIG_NAME)
from pytorch_pretrained_bert.modeling_openai import (OpenAIGPTPreTrainedModel, OpenAIGPTConfig,
                                                     OpenAIGPTModel, OpenAIGPTLMHead)

from scipy.spatial.distance import cosine, cityblock

logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.INFO)
logger = logging.getLogger(__name__)

In [3]:
class OpenAIGPTLMHead_custom(nn.Module):
    """ Language Model Head for the transformer (decoder weights tied to the token embeddings) """

    def __init__(self, model_embeddings_weights, config):
        super(OpenAIGPTLMHead_custom, self).__init__()
        self.n_embd = config.n_embd
        self.vocab_size = config.vocab_size
        self.predict_special_tokens = config.predict_special_tokens
        embed_shape = model_embeddings_weights.shape  # (total_tokens_embeddings, n_embd)
        self.decoder = nn.Linear(embed_shape[1], embed_shape[0], bias=False)
        self.set_embeddings_weights(model_embeddings_weights)

    def set_embeddings_weights(self, model_embeddings_weights, predict_special_tokens=True):
        self.predict_special_tokens = predict_special_tokens
        self.decoder.weight = model_embeddings_weights  # Tied weights

    def forward(self, hidden_state):
        lm_logits = self.decoder(hidden_state)
        if not self.predict_special_tokens:
            # Drop the logits of the special tokens appended after the word vocabulary
            lm_logits = lm_logits[..., :self.vocab_size]
        return lm_logits

class OpenAIGPTMultipleChoiceHead_custom(nn.Module):
    """ Classifier Head for the transformer """

    def __init__(self, config):
        super(OpenAIGPTMultipleChoiceHead_custom, self).__init__()
        self.n_embd = config.n_embd
        self.dropout = nn.Dropout2d(config.resid_pdrop)  # To reproduce the noise_shape parameter of TF implementation
        self.linear = nn.Linear(config.n_embd, 1)

        nn.init.normal_(self.linear.weight, std=0.02)
        nn.init.normal_(self.linear.bias, 0)

    def forward(self, hidden_states, mc_token_ids):
        # Unlike the library head, dropout and the linear layer are not applied:
        # this returns the hidden state at the classification token of each choice.
        # hidden_states (bsz, num_choices, seq_length, hidden_size)
        # mc_token_ids (bsz, num_choices)
        mc_token_ids = mc_token_ids.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, -1, hidden_states.size(-1))
        multiple_choice_h = hidden_states.gather(2, mc_token_ids).squeeze(2)
        return multiple_choice_h  # (bsz, num_choices, hidden_size)

class OpenAIGPTDoubleHeadsModel_custom(OpenAIGPTPreTrainedModel):
    """
    OpenAI GPT model with a Language Modeling and a Multiple Choice head ("Improving Language Understanding by Generative Pre-Training").
    OpenAI GPT use a single embedding matrix to store the word and special embeddings.
    Special tokens embeddings are additional tokens that are not pre-trained: [SEP], [CLS]...
    Special tokens need to be trained during the fine-tuning if you use them.
    The number of special embeddings can be controlled using the `set_num_special_tokens(num_special_tokens)` function.
    The embeddings are ordered as follows in the token embeddings matrix:
        [0,                                                         ----------------------
         ...                                                        -> word embeddings
         config.vocab_size - 1,                                     ______________________
         config.vocab_size,
         ...                                                        -> special embeddings
         config.vocab_size + config.n_special - 1]                  ______________________
    where total_tokens_embeddings can be obtained as config.total_tokens_embeddings and is:
        total_tokens_embeddings = config.vocab_size + config.n_special
    You should use the associated indices to index the embeddings.
    Params:
        `config`: an OpenAIGPTConfig class instance with the configuration to build a new model
        `output_attentions`: If True, also output attentions weights computed by the model at each layer. Default: False
        `keep_multihead_output`: If True, saves output of the multi-head attention module with its gradient.
            This can be used to compute head importance metrics. Default: False
    Inputs:
        `input_ids`: a torch.LongTensor of shape [batch_size, num_choices, sequence_length] with the BPE token
            indices selected in the range [0, total_tokens_embeddings[
        `mc_token_ids`: a torch.LongTensor of shape [batch_size, num_choices] with the index of the token from
            which we should take the hidden state to feed the multiple choice classifier (usually last token of the sequence)
        `position_ids`: an optional torch.LongTensor with the same shape as input_ids
            with the position indices (selected in the range [0, config.n_positions - 1[.
        `token_type_ids`: an optional torch.LongTensor with the same shape as input_ids
            You can use it to add a third type of embedding to each input token in the sequence
            (the previous two being the word and position embeddings).
            The input, position and token_type embeddings are summed inside the Transformer before the first
            self-attention block.
        `lm_labels`: optional language modeling labels: torch.LongTensor of shape [batch_size, num_choices, sequence_length]
            with indices selected in [-1, 0, ..., total_tokens_embeddings]. All labels set to -1 are ignored (masked), the loss
            is only computed for the labels set in [0, ..., total_tokens_embeddings]
        `multiple_choice_labels`: optional multiple choice labels: torch.LongTensor of shape [batch_size]
            with indices selected in [0, ..., num_choices].
        `head_mask`: an optional torch.Tensor of shape [num_heads] or [num_layers, num_heads] with indices between 0 and 1.
            It's a mask to be used to nullify some heads of the transformer. 1.0 => head is fully masked, 0.0 => head is not masked.
    Outputs:
        This customized model computes no losses: `lm_labels` and `mc_labels` are accepted but ignored.
        It returns a tuple with
            `lm_logits`: the language modeling logits as a torch.FloatTensor of size [batch_size, num_choices, sequence_length, total_tokens_embeddings]
            `hidden_states`: the last-layer hidden states as a torch.FloatTensor of size [batch_size, num_choices, sequence_length, n_embd]
    Example usage:
    ```python
    # Already been converted into BPE token ids
    input_ids = torch.LongTensor([[[31, 51, 99], [15, 5, 0]]])  # (bsz, number of choice, seq length)
    mc_token_ids = torch.LongTensor([[2, 1]])  # (bsz, number of choice)
    config = modeling_openai.OpenAIGPTConfig()
    model = OpenAIGPTDoubleHeadsModel_custom(config)
    lm_logits, hidden_states = model(input_ids, mc_token_ids)
    ```
    """

    def __init__(self, config, output_attentions=False, keep_multihead_output=False):
        super(OpenAIGPTDoubleHeadsModel_custom, self).__init__(config)
        self.transformer = OpenAIGPTModel(config, output_attentions=output_attentions,
                                          keep_multihead_output=keep_multihead_output)
        self.lm_head = OpenAIGPTLMHead_custom(self.transformer.tokens_embed.weight, config)
        self.multiple_choice_head = OpenAIGPTMultipleChoiceHead_custom(config)
        self.apply(self.init_weights)

    def set_num_special_tokens(self, num_special_tokens, predict_special_tokens=True):
        """ Update input and output embeddings with new embedding matrice
            Make sure we are sharing the embeddings
        """
        #self.config.predict_special_tokens = self.transformer.config.predict_special_tokens = predict_special_tokens
        self.transformer.set_num_special_tokens(num_special_tokens)
        self.lm_head.set_embeddings_weights(self.transformer.tokens_embed.weight, predict_special_tokens=predict_special_tokens)

    def forward(self, input_ids, mc_token_ids, lm_labels=None, mc_labels=None, token_type_ids=None,
                position_ids=None, head_mask=None):
        hidden_states = self.transformer(input_ids, position_ids, token_type_ids, head_mask)
        if self.transformer.output_attentions:
            all_attentions, hidden_states = hidden_states

        hidden_states = hidden_states[-1]  # keep only the last transformer layer
        lm_logits = self.lm_head(hidden_states)
        # The Multi-Choice head is not applied: during inference it would only return
        # hidden_states[pos_of_clf_token], which callers can index directly.
        # lm_labels and mc_labels are ignored; no losses are computed here.
        return lm_logits, hidden_states
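The indexing performed by `OpenAIGPTMultipleChoiceHead_custom.forward` can be checked on toy arrays. This numpy sketch mirrors `hidden_states.gather(2, index)` with `np.take_along_axis`; the shapes and values are made up for illustration. Because inputs are zero-padded to a common length, the sentence feature has to be read at the `_classify_` position rather than at the last position of the sequence.

```python
import numpy as np

# Toy shapes: batch of 1, 2 choices (sentences), sequence length 4, hidden size 3
hidden_states = np.arange(1 * 2 * 4 * 3, dtype=np.float32).reshape(1, 2, 4, 3)
mc_token_ids = np.array([[2, 1]])  # position of the _classify_ token in each sequence

# Equivalent of mc_token_ids.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, -1, hidden_size)
index = np.broadcast_to(mc_token_ids[..., None, None], (1, 2, 1, hidden_states.shape[-1]))
# Equivalent of hidden_states.gather(2, index).squeeze(2): one vector per choice
clf_features = np.take_along_axis(hidden_states, index, axis=2).squeeze(2)  # (1, 2, 3)
print(clf_features)
```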

In [4]:
def accuracy(out, labels):
    outputs = np.argmax(out, axis=1)
    return np.sum(outputs == labels)

def listRightIndex(alist, value):
    return len(alist) - alist[-1::-1].index(value) -1

def pre_process_datasets(encoded_datasets, input_len, cap_length, start_token, delimiter_token, clf_token):
    """ Pre-process datasets containing lists of stories

        Produces Transformer inputs of shape (n_chunks, 1, input_len). A story that fits in
        cap_length tokens becomes a single row:
            input_ids[0, 0, :] = [start_token] + story[:cap_length] + [clf_token]
        A longer story is cut into chunks of roughly cap_length tokens, each ending at the last
        occurrence of the story's final token (usually a full stop), one chunk per row.
        Only the first story of each dataset determines n_chunks, so pass one story at a time.
    """

    tensor_datasets = []
    for dataset in encoded_datasets:
        n_batch = ceil(len(dataset[0][0])/cap_length)
        input_ids = np.zeros((n_batch, 1, input_len), dtype=np.int64)
        mc_token_ids = np.zeros((n_batch, 1), dtype=np.int64)
        lm_labels = np.full((n_batch, 1, input_len), fill_value=-1, dtype=np.int64)
        mc_labels = np.zeros((n_batch,), dtype=np.int64)
        i = 0
        init_pos = 0
        end_pos = cap_length
        for story, cont1, cont2, mc_label in dataset:
            if n_batch!=0:
                if n_batch==1:
                    with_cont1 = [start_token] + story[:cap_length] + [clf_token]
                    input_ids[i, 0, :len(with_cont1)] = with_cont1
                    mc_token_ids[i, 0] = len(with_cont1) - 1  # position of clf_token
                    lm_labels[i, 0, :len(with_cont1)] = with_cont1
                    mc_labels[i] = mc_label
                    i+=1
                else:
                    # Loop until the whole story is consumed (testing end_pos here skipped the final chunk)
                    while i!=n_batch and init_pos<len(story):
                        try:
                            # Cut at the last sentence end inside the current window
                            end_pos = init_pos + listRightIndex(story[init_pos:end_pos],story[-1])
                        except ValueError:
                            # No sentence end in the window: extend the chunk to the next one
                            end_pos = init_pos+story[init_pos:].index(story[-1])

                        with_cont1 = [start_token] + story[init_pos:end_pos+1] + [clf_token]
                        input_ids[i, 0, :len(with_cont1)] = with_cont1
                        mc_token_ids[i, 0] = len(with_cont1) - 1  # position of clf_token
                        lm_labels[i, 0, :len(with_cont1)] = with_cont1
                        mc_labels[i] = mc_label
                        i+=1
                        init_pos = end_pos+1
                        end_pos = min(init_pos+cap_length-1,len(story))
        # Drop rows that were never filled (e.g. when a long sentence made a chunk exceed cap_length)
        all_inputs = (input_ids[:i], mc_token_ids[:i], lm_labels[:i], mc_labels[:i])
        tensor_datasets.append(tuple(torch.tensor(t) for t in all_inputs))
    return tensor_datasets

def load_rocstories_dataset(dataset_path):
    """ Output a list of tuples(story, 1st continuation, 2nd continuation, label) """
    with open(dataset_path, encoding='utf_8') as f:
        f = csv.reader(f)
        output = []
        next(f) # skip the first line
        for line in tqdm(f):
            output.append(('.'.join(line[0 :4]), line[4], line[5], int(line[-1])))
    return output

def tokenize_and_encode(obj):
    """ Tokenize and encode a nested object """
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    elif isinstance(obj, int):
        return obj
    return list(tokenize_and_encode(o) for o in obj)

In [5]:
def pre_process_datasets_cos(encoded_datasets, input_len, cap_length, start_token, delimiter_token, clf_token):
    """ Pre-process datasets containing lists of stories (paragraphs)

        Splits each story into sentences at the full-stop token and produces Transformer inputs
        of shape (n_batch, 5, input_len), one row per sentence:
        input_ids[batch, sentence, :] = [start_token] + sentence[:cap_length] + [full_stop_id] + [clf_token]
        Stories are assumed to contain at most 5 sentences.
    """
    full_stop_id = 239  # token id of the full stop, hard-coded for the openai-gpt vocabulary
    max_sents = 5
    tensor_datasets = []
    for dataset in encoded_datasets:
        n_batch = len(dataset)
        input_ids = np.zeros((n_batch, max_sents, input_len), dtype=np.int64)
        mc_token_ids = np.zeros((n_batch, max_sents), dtype=np.int64)
        for i, stories in enumerate(dataset):
            story = stories[0]
            size = len(story)
            # Split right after every full stop; trailing tokens (if any) form a final sentence
            idx_list = [idx + 1 for idx, val in enumerate(story) if val == full_stop_id]
            bounds = idx_list + ([size] if not idx_list or idx_list[-1] != size else [])
            res = [story[s: e] for s, e in zip([0] + idx_list, bounds)]

            # Note: split sentences already end with the full stop, so this appends a second one;
            # kept as-is to match the data the LSTM head below was trained on
            sents = [[start_token] + sent[:cap_length] + [full_stop_id] + [clf_token] for sent in res]

            for j in range(len(sents)):
                input_ids[i, j, :len(sents[j])] = sents[j]
                mc_token_ids[i, j] = len(sents[j]) - 1  # position of clf_token
        all_inputs = (input_ids, mc_token_ids)
        tensor_datasets.append(tuple(torch.tensor(t) for t in all_inputs))
    return tensor_datasets
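The sentence split inside `pre_process_datasets_cos` can be isolated as a small pure-Python function for testing. `split_at_full_stops` is a hypothetical helper; it keeps each full stop with its sentence and, like the guarded split above, returns the whole story as one sentence when no full stop is present. The id 239 is the value hard-coded in the notebook.

```python
FULL_STOP_ID = 239  # value hard-coded in pre_process_datasets_cos


def split_at_full_stops(token_ids, stop_id=FULL_STOP_ID):
    """Split a token-id list into sentences, keeping each full stop with its sentence.

    Tokens after the last full stop form a final sentence.
    """
    sentences, start = [], 0
    for pos, tok in enumerate(token_ids):
        if tok == stop_id:
            sentences.append(token_ids[start:pos + 1])
            start = pos + 1
    if start < len(token_ids):
        sentences.append(token_ids[start:])
    return sentences


print(split_at_full_stops([5, 239, 6, 7, 239, 8]))  # [[5, 239], [6, 7, 239], [8]]
```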

In [49]:
## Defining constants over here
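seed = 42
# Seed the RNGs (RandomSampler shuffling, LSTM head initialization) for reproducibility
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)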
model_name = 'openai-gpt'
do_train = False
output_dir = '/home/shubham/projects/domain_minds/gpt-experiment/model/'
train_batch_size = 1
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
logger.info("device: {}, n_gpu {}".format(device, n_gpu))

special_tokens = ['_start_', '_delimiter_', '_classify_']
tokenizer = OpenAIGPTTokenizer.from_pretrained(model_name, special_tokens=special_tokens)
special_tokens_ids = list(tokenizer.convert_tokens_to_ids(token) for token in special_tokens)

model1 = OpenAIGPTDoubleHeadsModel_custom.from_pretrained(output_dir)
tokenizer = OpenAIGPTTokenizer.from_pretrained(output_dir)
model1.to(device)
model1.eval()
logger.info("Ready to encode dataset...")

def feature_extractor(model1, text):
    """ Return a (1, 1, n_embd) paragraph feature for `text`.

        Long texts are split into chunks that end at sentence boundaries (see pre_process_datasets);
        the hidden states at each chunk's _classify_ token are summed.
    """
    trn_dt = ([text, '', '', 0],)
    datasets = (trn_dt,)
    encoded_datasets = tokenize_and_encode(datasets)

    # Max number of story tokens per chunk
    max_length = model1.config.n_positions//2 - 2
    # Max input length: story tokens + start and clf tokens, capped by the model's context size
    input_length = min(len(encoded_datasets[0][0][0]) + 2, model1.config.n_positions)

    # Prepare input tensors and dataloader (one chunk per batch)
    tensor_datasets = pre_process_datasets(encoded_datasets, input_length, max_length, *special_tokens_ids)
    train_data = TensorDataset(*tensor_datasets[0])
    train_dataloader = DataLoader(train_data, batch_size=1)

    final_clf = []
    for batch in train_dataloader:
        batch = tuple(t.to(device) for t in batch)
        input_ids, mc_token_ids, lm_labels, mc_labels = batch
        with torch.no_grad():
            _, clf_text_feature = model1(input_ids, mc_token_ids, lm_labels, mc_labels)
        # Read the hidden state at the _classify_ token; position -1 is padding for shorter chunks
        final_clf.append(clf_text_feature[:, :, mc_token_ids[0, 0].item()])
    # Sum the chunk features (a single chunk is returned unchanged)
    return torch.sum(torch.stack(final_clf), 0)


08/06/2019 13:09:38 - INFO - __main__ -   device: cuda, n_gpu 2
08/06/2019 13:09:39 - INFO - pytorch_pretrained_bert.tokenization_openai -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-vocab.json from cache at /home/ether/.pytorch_pretrained_bert/4ab93d0cd78ae80e746c27c9cd34e90b470abdabe0590c9ec742df61625ba310.b9628f6fe5519626534b82ce7ec72b22ce0ae79550325f45c604a25c0ad87fd6
08/06/2019 13:09:39 - INFO - pytorch_pretrained_bert.tokenization_openai -   loading merges file https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-merges.txt from cache at /home/ether/.pytorch_pretrained_bert/0f8de0dbd6a2bb6bde7d758f4c120dd6dd20b46f2bf0a47bc899c89f46532fde.20808570f9a3169212a577f819c845330da870aeb14c40f7319819fce10c3b76
08/06/2019 13:09:39 - INFO - pytorch_pretrained_bert.tokenization_openai -   Special tokens {'_start_': 40478, '_delimiter_': 40479, '_classify_': 40480}
08/06/2019 13:09:39 - INFO - pytorch_pretrained_bert.modeling_openai -   l


In [51]:
train_dataset = '/home/ether/Desktop/gpt_experiments/data/data_para_se_5sent.csv'
import pandas as pd
train_dataset = pd.read_csv(train_dataset,index_col=0)
encoded_datasets = tokenize_and_encode((train_dataset.drop("Num_sentences",axis=1).values,))

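# Max tokens kept per sentence
max_length = model1.config.n_positions // 2 - 2
# Each row is _start_ + sentence[:max_length] + full stop + _classify_, plus 2 spare positions
input_length = max_length+5
# Prepare input tensors and dataloaders
tensor_datasets = pre_process_datasets_cos(encoded_datasets, input_length, max_length,*special_tokens_ids)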
train_tensor_dataset = tensor_datasets[0]

train_data = TensorDataset(*train_tensor_dataset)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=train_batch_size)
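For the openai-gpt checkpoint, whose context size is 512 positions (assumed here; the notebook reads it from `model1.config.n_positions`), the sizes used above work out as follows.

```python
n_positions = 512                  # openai-gpt context size (model1.config.n_positions)
max_length = n_positions // 2 - 2  # tokens kept per sentence
# _start_ + sentence[:max_length] + full stop + _classify_ needs max_length + 3 positions;
# input_length leaves 2 more as slack
input_length = max_length + 5
print(max_length, input_length)    # 254 259
```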


In [101]:
# # Uni/ Bi-LSTM unisequential code
# # for uni, make bidirectional False for self.lstm and set hidden to sizes(1,1,768)
# # for bidirectional unisequential, there is only one loss backprop. 
# #    make bidirectional True for self.lstm and set hidden to sizes(2,1,768). Uncomment code For Bidirectional.
# class LSTM_Head(nn.Module):
#     def __init__(self):
#         super(LSTM_Head, self).__init__()
#         self.lstm = nn.LSTM(768,768,batch_first=True,bidirectional=False)
#         self.linear = nn.Linear(768*2,768)
#     def forward(self,input_embeds,mc_token_ids=None,infer=False):
#         hidden  = (torch.zeros((1,1,768),device=device), \
#                    torch.zeros((1,1,768),device=device))
        
#         cosloss = nn.CosineSimilarity(dim=-1)
#         m = nn.Softmax()
#         loss = 0
#         hidden_states=[]
#         for i in range(len(input_embeds)):
#             if not infer:
                
# #                 prev_hid,prev_cst = hidden # For Bidirectional
#                 out, hidden = self.lstm(input_embeds[i][mc_token_ids[i].item()].view(1,1,-1),hidden)
# #                 hid = torch.sum(torch.stack([hidden[0],prev_hid]),0) # For Bidirectional
# #                 cst = torch.sum(torch.stack([hidden[1],prev_cst]),0) # For Bidirectional
# #                 hidden=(hid,cst) # For Bidirectional
# #                 out = self.linear(out) # For Bidirectional
#                 if i!=len(input_embeds)-1:
#                     loss += 1 - cosloss(out,input_embeds[i+1][mc_token_ids[i+1]])
 
#             else:
#                 # During inference the last output of last lstm cell is considered as paragraph embedding
#                 out, hidden = self.lstm(input_embeds[i].view(1,1,-1),hidden)
# #                 out = self.linear(out) # For Bidirectional inference


#         if infer:
#             return out
        
#         loss = loss/(len(input_embeds)-1)
#         return loss

In [10]:
#Bi-LSTM bi-sequential code for truebi files
class LSTM_Head(nn.Module):
    def __init__(self):
        super(LSTM_Head, self).__init__()
        self.lstm = nn.LSTM(768,768,bidirectional=True)
        self.linear = nn.Linear(768*2,768)  # not used by the bi-sequential variant
    def forward(self,input_embeds,mc_token_ids=None,infer=False):
        # Initial (h_0, c_0), each of shape (num_directions=2, batch=1, hidden_size=768)
        hidden  = (torch.zeros((2,1,768),device=device), \
                   torch.zeros((2,1,768),device=device))
        # For Cosine Distance
        cosloss = nn.CosineSimilarity(dim=-1)
        if not infer:
            # Sentence features read at each _classify_ token: (n_sentences, batch=1, 768)
            inputs=torch.cat([input_embeds[i][mc_token_ids[i].item()] for i in range(len(input_embeds))]).view(len(input_embeds),1,-1)
            out, hidden = self.lstm(inputs,hidden)

            lossf=0
            lossb=0
            # out: (n_sentences, 1, 2*768) -> (n_sentences, 2, 768); index 0 = forward, 1 = backward
            outs = out.view(len(inputs),2,-1)
            for i in range(len(inputs)):
                if i!=len(inputs)-1:
                    # Forward loss calculated as 1-cosloss(current_cell_output,next_cell_input)
                    lossf += 1-cosloss(outs[i,0],inputs[i+1])
#                     lossf += cosloss(outs[i,0],inputs[i+1]).acos()/np.pi # Making cosine between (0,1)
                if i!=0:
                    # Backward loss calculated as 1-cosloss(current_cell_output,previous_cell_input)
                    lossb += 1-cosloss(outs[i,1],inputs[i-1])
#                     lossb += cosloss(outs[i,1],inputs[i-1]).acos()/np.pi # Making cosine between (0,1)
            lossf = lossf/(len(inputs)-1)
            lossb = lossb/(len(inputs)-1)
            loss = (lossf+lossb)/2

            return loss,lossf,lossb
        else:
            # During inference, input_embeds holds one feature per sentence: (n_sentences, 768).
            # The final hidden states of the forward direction (after the last sentence) and of the
            # backward direction (after the first sentence) are concatenated into a (1, 1, 1536)
            # paragraph embedding
            out, hidden = self.lstm(input_embeds.view(len(input_embeds),1,-1),hidden)
            return hidden[0].view(1,1,-1)
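The bi-sequential loss in `LSTM_Head.forward` can be restated in numpy to check it by hand. This sketch takes the forward and backward outputs as plain `(n_sentences, dim)` arrays instead of slicing an `nn.LSTM` output; `bi_sequential_loss` and the random inputs are illustrative only. When every forward output equals the next input and every backward output equals the previous input, the loss is 0; flipping their signs drives it to its maximum of 2.

```python
import numpy as np


def cos_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def bi_sequential_loss(fwd_out, bwd_out, inputs):
    """fwd_out, bwd_out, inputs: arrays of shape (n_sentences, dim).

    Forward output i is compared with input i+1, backward output i with input i-1.
    Returns (mean of both losses, forward loss, backward loss).
    """
    n = len(inputs)
    loss_f = sum(1 - cos_sim(fwd_out[i], inputs[i + 1]) for i in range(n - 1)) / (n - 1)
    loss_b = sum(1 - cos_sim(bwd_out[i], inputs[i - 1]) for i in range(1, n)) / (n - 1)
    return (loss_f + loss_b) / 2, loss_f, loss_b


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 768))  # 5 sentence features, as in the 5-sentence training data
fwd = np.roll(x, -1, axis=0)   # fwd[i] == x[i + 1] for i < 4 (fwd[4] is never used)
bwd = np.roll(x, 1, axis=0)    # bwd[i] == x[i - 1] for i > 0 (bwd[0] is never used)
loss, lf, lb = bi_sequential_loss(fwd, bwd, x)
print(loss, lf, lb)
```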
        

In [14]:
model1.eval()
model = LSTM_Head()
# state_dict = torch.load("../models/lstmheadSGD_bi_mcpos_real_ep2.pt")
# model.load_state_dict(state_dict)
model.to(device)
model.train()
print(model)

LSTM_Head(
  (lstm): LSTM(768, 768, bidirectional=True)
  (linear): Linear(in_features=1536, out_features=768, bias=True)
)

In [15]:
# TRAINING
num_train_epochs = 10
optimizer = torch.optim.SGD(model.parameters(),lr = 1e-5)
for epoch in tqdm_notebook(range(num_train_epochs)):
    tr_loss = 0
    nb_tr_steps = 0
    tqdm_bar = tqdm_notebook(train_dataloader, desc="Training")
    for step, batch in enumerate(tqdm_bar):
        batch = tuple(t.to(device) for t in batch)
        input_ids, mc_token_ids = batch
        # GPT stays frozen: only the LSTM head receives gradients
        with torch.no_grad():
            _, sent_feats = model1(input_ids,mc_token_ids)
        loss, lossf, lossb = model(sent_feats[0], mc_token_ids[0])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        tr_loss += loss.item()
        nb_tr_steps += 1
        tqdm_bar.desc = "Training losses: {:.2e} {:.2e} {:.2e}".format(loss.item(),lossf.item(),lossb.item())
    logger.info("epoch {}: mean training loss {:.4e}".format(epoch, tr_loss / max(nb_tr_steps, 1)))
    torch.save(model.state_dict(), "/home/ether/Desktop/gpt_experiments/models/lstmheadSGD_truebi_mcpos_torchcos_ep"+str(epoch)+".pt")


### Testing Ground for LSTM head-based paragraph embeddings
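The paragraph collections below are compared with cosine similarity. A minimal sketch of that comparison step, assuming each paragraph has already been turned into one vector (for example the flattened `(1, 1, 1536)` output of `LSTM_Head.forward(..., infer=True)`): `split_paragraphs` and `cosine_matrix` are hypothetical helpers, and the toy vectors stand in for real embeddings.

```python
import numpy as np


def split_paragraphs(text):
    """Split a block of text into non-empty paragraphs on newlines."""
    return [p.strip() for p in text.split("\n") if p.strip()]


def cosine_matrix(embeddings):
    """Pairwise cosine similarity between the rows of an (n_paragraphs, dim) array."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


paragraphs = split_paragraphs("\nFirst paragraph.\nSecond paragraph.\n\n")
toy_embeddings = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])  # stand-ins for real embeddings
sims = cosine_matrix(toy_embeddings)
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
print(np.round(sims, 2))
```

Normalizing once and taking a matrix product scores every pair of paragraphs in one step, which makes in-domain versus out-of-domain blocks easy to spot.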

In [16]:
# Collection of paragraphs separated by "\n"
para_docker = '''
Docker is a containerization platform that packages your app and all its dependencies together in the form called a docker container to ensure that your application works seamlessly in any environment. This environment might be a production or staging server. Docker pulls the dependencies needed for your application to run from the cloud and configures them automatically. You don’t need to do any extra work. Cool Right.
Docker communicates natively with the system kernel by passing the middleman on Linux machines and even Windows 10 and Windows Server 2016 and above this means you can run any version of Linux in a container and it will run natively. Not only that Docker uses less disk space to as it is able to reuse files efficiently by using a layered file system. If you have multiple Docker images using the same base image for instance.
Imagine we already have an application running PHP 5.3 on a server and want to deploy a new application which requires PHP 7.2 on that same server. This will cause some version conflict on that server and also might cause some features in the existing application to fail.
In situations like this, we might have to use Docker to sandbox or containerise the new application to run without affecting the old application. This brings us to Docker containers.
Think of a Docker container as above image. There are multiple applications running on the same machine. These applications are put into docker containers and any changes made on these containers does not affect the other container. Each container has different Os running on the same physical machine. Docker helps you to create, deploy and run applications using containers.
A container packages up the code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
'''

para_infra = '''
Infrastructure software is a type of enterprise software or program specifically designed to help business organizations perform basic tasks such as workforce support, business transactions and internal services and processes. The most common examples of infrastructure software are database programs, email and other communication software and security applications.
Infrastructure software is used to ensure that people and systems within an organization can connect and do their jobs properly and ensure the efficient execution of business processes, share information, as well as manage touch points with suppliers and customers. This type of software is not necessarily marketing related or used for business transactions such as selling products and services, but is more operations related, ensuring that business applications and processes can keep running effectively.
Infrastructure software can be configured to automatically alert users about best practices and relevant discoveries based on their current activities and job position. Expert systems and knowledge systems fall under this category.
Management of converged infrastructure resources is typically handled by a discrete hardware component that serves a singular purpose. While hyper-converged infrastructure systems are similar in nature to converged infrastructure systems, management of the resources is largely software-defined rather than being handled by one or more hardware components.
Human computation studies need not have extensive or complex software infrastructure. Studies can easily be run through homegrown or customized web applications, together with logging software capable of tracking the details and time of any given interaction. One productive approach for such tools might be to build a database-driven web application capable of storing appropriate demographic background information associated with each participant, along with details of each action and task completed. 
You might even add an administrative component capable of managing and enrolling prospective participants. These homegrown applications are generally not terribly difficult to construct, particularly if you have a web-based implementation of the key tasks under consideration, or were planning on building one anyway. For some tasks—particularly those involving collection of fine-grained detail or requiring complex interactions—the freedom associated with constructing your own application may be necessary to get the job done.
The infrastructure provided by Mechanical Turk and similar crowdsourcing platforms provides many advantages over “roll-your-own” designs. As any experienced HCI researcher knows well, the challenges of recruiting, enrolling, and consenting participants can consume substantial amounts of time. Even if you are able to build your own web application to do the trick, you might find that leveraging these platforms—particularly with one of the add-on libraries—might simplify your life considerably. These advantages aside, commercial crowdsourcing tools have potential downsides. 
'''

para_mark = '''
I used to find a lot of my illustrators and designers by trawling websites such as Folksy and Etsy.Funnily enough, I always prefered Folksy as an option because we were a UK-based shop, and it made a lot more sense for me to buy from UK designers.I am also a little scared sometimes by the monster that is Etsy! It’s always a minefield having to find out whether someone will ship internationally, let alone navigating tens of thousands of pages of products.
I also find a lot of people on Pinterest, Twitter, Facebook, graduate fairs, local and national exhibitions and design shows so make sure you are linked in with as many of those as you can. Things to be mindful of: The most obvious is email size, but other things to look out for are not putting any images at all in an email. If you are pitching your work, always make sure that you have pictures of this work included — quite a lot of people forget this.
This PDF therefore. It's their signature. Like when did that decision happen? Someone said that was a thing. Yeah. That's a great question. I'm not sure what exactly happened. I would say. My guess is that it happened in the age of the internet, but it's interesting that you bring that up as that is it just because somebody clicked on this PDF and that means it's a signature. I remember back in law school when I studied estate planning and development of wills and trusts and things like that for individuals there.
There is a some sort of Statute the state of Texas where if you just made if that individual just create some sort of marketing. It doesn't necessarily have to be their actual signature or even their name. It could be some sort of marking and that constitutes as a signature or you know authorization if they're granting authorization or whatever. It may be agreement contract if it is a written agreement and. 
So I think it's I think the market demanded it because of you know, faxing becoming a thing of the past somewhat. I don't know to fax machine, but I know most Law Offices do have pads machines and courts still use fax machines. And when I hear when I told the court and they don't have use email and like what like what how but it is right and I think it's it's also a matter of investing in other resources in bringing things up to date which hey and in their minds it might be a matter of if it's not broken. We're not going to fix it. You know, it works just fine. Everybody knows how we operate. We'll keep doing it until we can't be morons. Yeah because nobody's objected at this point. So yeah, I don't I don't know when that came about but that's a great question. Yeah. I find it comforting. You don't have a fax machine somehow. I just imagine that.
Ever since their executive shakeup that resulted in Instagram’s original founders being softly pushed out of their positions only to be replaced with Facebook loyalists, Instagram has been toying around with features they’re claiming will make their platform a safer place for their users — from hiding Likes to their recent anti-bullying updates.
While the intention behind these features might be well and good, the changes make the product deviate from the core things that made Instagram spread like wildfire in the first place — specifically by the platform deciding which content you see versus the end user being in control of their in-app experience.Plus, one can’t help to wonder if Facebook is using Instagram company as a buffer, or band-aid, for their own recent mishaps and privacy scandals, which have caused many users to lose faith in the platform.
'''

para_db = '''
SQL is a query language for talking to structured databases. Pretty much all databases that have tables and rows will accept SQL based queries. SQL has many flavors but the fundamentals stay the same. Professionals and amateurs alike use SQL to find, create, update and delete information from their sources of record. It can be used with a ton of different databases like MySQL, Postgres, SQL Server and Oracle. It powers the logic behind popular server-side frameworks like Ruby On Rails and Laravel. If you want to find information associated with a particular account or query what buttons users click in your app there is a good chance SQL can help you out.
Before we hop on the SQL train to Database Town I’d like to acknowledge some alternatives. You can use ORMs to query databases. ORM stands for Object Relational Mapper, which is a fancy way of saying that you can write code in a programming language like PHP or Node.js that translates to SQL queries. Popular ORMs are Active Record for Ruby On Rails, Eloquent for Laravel and Sequelize for Node.js. All of these services allow you to write code that translates to SQL under the hood. SQL is important for building applications with these technologies.
There are many databases that do not use SQL, such as MongoDB and GraphQL. These are newer technologies and not as widely adopted as relational databases. Relational databases have been around a very long time and power the majority of data storage on the internet. To fully appreciate NoSQL technologies and the reasons they came about it’s helpful to know how relational databases and SQL work.
Oracle Corporation provides a range of database cloud services on its Oracle Cloud platform that are designed for different database use cases; from test/dev deployments to small and medium sized workloads to large mission-critical workloads. Oracle Database Cloud Services are available on a choice of general purpose hardware and Exadata engineered systems, in either virtual machines environments or 'bare metal' infrastructure (now known as Oracle Cloud Infrastructure).
Moving away from your database vendor would be like cutting off a foot; self-destructive and painful. More to the point, building a me-too product and entering a full-on competition with the established leaders, is a significantly retrograde step with little tradition of success. Beyond the relational database, there are many new wrinkles that offer attractive niches such as virtual machines, bare metal servers, serverless technologies and micro apps. But I am not seeing a great deal of competition heating up in that space.
Databases are a structured system to put your data in that imposes rules upon that data, and the rules are yours, because the importance of these problems changes based on your needs. Maybe your problem is the size, while someone else has a smaller amount of data where the sensitivity is a high concern.It’s the things you can’t see that are going on in the background; the security, the enforced integrity of the data, the ability to get to it fast and get to it reliably, the robustness; serving lots of people at the same time and even correctly survive crashes and hardware issues without corrupting the data.
In practice it’s very common to have multiple databases. The database that deals with your order and customer information might be completely independent from you database that deals with human resource information. And in many organizations, you don’t just have multiple databases but multiple DBMS. Sometimes it’s because one DBMS is better at something than the other.
'''

para_news = '''
Vaibhav Pichad, while leaving the NCP, which his father Madhukar Pichad has had a long association with, claimed that his shift to the BJP is “keeping the general public’s interest in mind.” This, a senior NCP leader said, was hogwash. “Just because a certain party is winning doesn’t mean they have the public interest in mind. And to claim that someone is leaving a party keeping the public’s interest in mind is a plain lie. These politicians have all proved to be opportunists and don’t care for any party ideology or its legacy,” a senior leader said, requesting anonymity. Several decisions, particularly the one to give 16% reservation to the Maratha caste in the state, has worked in the ruling party’s favour. With the Bombay high court approving the state’s decision, sources in the BJP have indicated that several senior Maratha leaders have been warming up to the party.
When asked if the party accepts its failure in handling the situation, Nirupam said, “I would rather blame the saffron force here. Does the BJP not trust its own cadres to ensure its victory in the state? Why does it need the Congress leadership then?” Bal terms this trend as a “destructive” one. “They are in a destructive mode right now. They want to ensure there is no opposition in the state. They are sure to win the state assembly elections. But before that, the BJP wants to clear off the Congress-NCP from the state. What happened in Karnataka is also playing out in Maharashtra. In fact, it is a trend across the country. Most of these new entrants might not even have any substantial roles to play in the party. But there they are,” Bal points out.
Hindutva bigots also targeted Hindustan Unilever’s Surf Excel ad campaign #RangLayeSang, which featured a young (Hindu) girl helping a young (Muslim) boy in March of this year. Earlier that month they had also targeted a tea brand (Brook Bond) for ‘projecting the Kumbh in the wrong light’ by showing a (presumably Hindu) man deliberately attempting to abandon his father there. The troll brigade aimed to boycott all HUL products, trending the hashtag #BoycottHUL on twitter with pictures of an assortment of products they had bought in the trash. This is also not an India specific trend – across America as well, conservative groups have protested brands taking up a stance against brands that support liberal causes – even when they are as vague as the Gilette ad in January of this year.
The Planetary Society, a non-profit organisation, has been working on the LightSail programme for a decade. The project kicked off in the 1990s, but its first planned prototype, Cosmos 1, was destroyed during a faulty launch on a Russian rocket taking off from a submarine in 2005. The Planetary Society got its the next prototype, LightSail 1, into space in 2015, but technical problems kept it from climbing high enough to be steered by sunlight. The LightSail 2 spacecraft was launched on June 25 and has since been in a low-Earth orbit, according to The Verge. Last week, it deployed four triangular sails – a thin, square swath of mylar about the size of a boxing ring. After launch, engineers on the ground have been remotely adjusting the orientation of the sails to optimise the LightSail 2’s ability to harness solar photons.
Solar sailing isn’t new but the Planetary Society wanted to show that the technique could be used for smaller satellites, which are harder to manoeuvre through space. A majority of the satellites, as senior science reporter Loren Grush explained on The Verge, have to rely on thrusters to be mobile. These are “tiny engines that combust chemical propellants to push a vehicle through space.” However, this increases the cost of satellites as well as their launch mass. Smaller satellites like CubeSats cannot accommodate thrusters most of the time, nor can they be closely manoeuvred once they are in space. But with this mission, the Planetary Society has demonstrated that solar sails can guide CubeSats through space. It is set to share the data it receives from this mission to allow other groups to build on this technology. The solar sail technology could reduce the need for expensive, cumbersome rocket propellants and slash the cost of navigating small satellites in space.
Last week’s launch of the Chandrayaan-2 water-finding Moon mission is a significant demonstration of India’s scientific and engineering capacity. It puts India firmly within a select group of countries prowling the solar system for commercial, strategic, and scientific reasons. Pakistanis naturally want to know where they stand in science – of which space exploration is just a small part – and why. What gave India this enormous lead over Pakistan? It is natural that India’s Hindutva government should boast Chandrayaan-2 as its own achievement and claim continuation with imagined glories from Vedic times. But rightfully the credit goes elsewhere. Just imagine if history could be wound back by 70-80 years and Prime Minister Jawaharlal Nehru was replaced by Narendra Modi.
The atheistic Nehru brought to India an acceptance of European modernity. For this Hindutva hates him even more than it hates India’s Muslims and Christians. Still, his insistence on ‘scientific temper’ – a singularly odd phrase invented while he was still in prison – made India nurture science. Earlier, vigorous reformers like Raja Ram Mohan Roy (1772-1833) had shown the path. As long as Nehru stood tall no rishi, yogi, or army general could head a science institution Will Pakistan also get a slice of the moon? That depends upon the quality of our scientists and if a culture of science develops. Of course, Pakistan never had a Nehru. A further setback happened in the Zia ul Haq days when Sir Syed Ahmad Khan’s modernism had its remaining flesh eaten off by Allama Iqbal’s shaheen. As if to compensate the loss of appetite for science, buildings for half-a-dozen science institutions were erected along Islamabad’s Constitution Avenue. They could be closed down today and no one would notice. Today’s situation for science – every kind except agriculture and biotechnology – is dire.
'''

para_kuber = '''
Real production apps span multiple containers. Those containers must be deployed across multiple server hosts. Security for containers is multilayered and can be complicated. That's where Kubernetes can help. Kubernetes gives you the orchestration and management capabilities required to deploy containers, at scale, for these workloads. Kubernetes orchestration allows you to build application services that span multiple containers, schedule those containers across a cluster, scale those containers, and manage the health of those containers over time. With Kubernetes you can take real steps towards better IT security.
Of course, this depends on how you’re using containers in your environment. A rudimentary application of Linux containers treats them as efficient, fast virtual machines. Once you scale this to a production environment and multiple applications, it's clear that you need multiple, colocated containers working together to deliver the individual services. This significantly multiplies the number of containers in your environment and as those containers accumulate, the complexity also grows.
Kubernetes fixes a lot of common problems with container proliferation—sorting containers together into a ”pod.” Pods add a layer of abstraction to grouped containers, which helps you schedule workloads and provide necessary services—like networking and storage—to those containers. Other parts of Kubernetes help you load balance across these pods and ensure you have the right number of containers running to support your workloads.
The primary advantage of using Kubernetes in your environment, especially if you are optimizing app dev for the cloud, is that it gives you the platform to schedule and run containers on clusters of physical or virtual machines. More broadly, it helps you fully implement and rely on a container-based infrastructure in production environments. And because Kubernetes is all about automation of operational tasks, you can do many of the same things that other application platforms or management systems let you do, but for your containers.
That’s where Red Hat OpenShift comes in. OpenShift is Kubernetes for the enterprise—and a lot more. OpenShift includes all of the extra pieces of technology that makes Kubernetes powerful and viable for the enterprise, including: registry, networking, telemetry, security, automation, and services. With OpenShift, your developers can make new containerized apps, host them, and deploy them in the cloud with the scalability, control, and orchestration that can turn a good idea into new business quickly and easily.
Kubernetes runs on top of an operating system (Red Hat Enterprise Linux Atomic Host, for example) and interacts with pods of containers running on the nodes. The Kubernetes master takes the commands from an administrator (or DevOps team) and relays those instructions to the subservient nodes. This handoff works with a multitude of services to automatically decide which node is best suited for the task. It then allocates resources and assigns the pods in that node to fulfill the requested work.
The docker technology still does what it's meant to do. When kubernetes schedules a pod to a node, the kubelet on that node will instruct docker to launch the specified containers. The kubelet then continuously collects the status of those containers from docker and aggregates that information in the master. Docker pulls containers onto that node and starts and stops those containers as normal. The difference is that an automated system asks docker to do those things instead of the admin doing so by hand on all nodes for all containers.
'''

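The paragraphs above are split into sentences later in the notebook with `text.split(".")[:-1]`. Some of them contain decimals, for example "PHP 5.3" and "PHP 7.2" in `para_docker`, which that split cuts into fragments, and any trailing text without a full stop is silently dropped. The sketch below contrasts that naive split with a simple regex heuristic. The helper names `naive_split` and `split_sentences` are hypothetical and the regex is an assumption, not a full sentence tokenizer: it still breaks on abbreviations such as "e.g. " and does not separate sentences glued together without a space (such as "Etsy.Funnily").

```python
import re
from typing import List

# Split after ".", "!" or "?" only when whitespace follows, so decimals such
# as "PHP 5.3" stay inside one sentence. Heuristic only (see lead-in).
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph: str) -> List[str]:
    """Return the non-empty, stripped sentences of a paragraph."""
    parts = _SENT_BOUNDARY.split(paragraph.strip())
    return [p.strip() for p in parts if p.strip()]


def naive_split(paragraph: str) -> List[str]:
    """Mirror of the notebook's `text.split(".")[:-1]`, with stripping."""
    return [p.strip() for p in paragraph.split(".")[:-1]]


example = ("Imagine we already have an application running PHP 5.3 on a server "
           "and want to deploy a new application which requires PHP 7.2 on that same server.")

print(naive_split(example))      # three fragments, cut at "5." and "7."
print(split_sentences(example))  # the whole sentence as one item
```

Swapping in a proper sentence tokenizer would be the more robust choice; the regex is shown only because it needs no extra dependency.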
In [17]:
text_docker1 = "Docker communicates natively with the system kernel by passing the middleman on Linux machines and even Windows 10 and Windows Server 2016 and above this means you can run any version of Linux in a container and it will run natively. Not only that Docker uses less disk space to as it is able to reuse files efficiently by using a layered file system. If you have multiple Docker images using the same base image for instance Docker only keep a single copy of the files needed and share them with each container. All right. So, how do we use Docker install Docker on your machine and will provide links in the description begin with a Docker file, which can be built into a Docker image which can be run as a Docker container. Okay, let's break that down. The dockerfile is a surprisingly Simple Text document that instructs how the docker image will be built like a blueprint you first select a base image to start with using the from keyword, which you can find a container to use from the docker Hub. Like we mentioned before a bun to an Alpine Linux are popular choices.From there, you can run commands such as downloading installing and running your software of course will link the docks below once our Docker file is complete. We can build it using Docker build followed by the T flag so we can name our image and pass our commands the location of the dockerfile once complete. You can verify your images existence with Docker images. Now, you're built image can run a container of that image or you can push it to the cloud to share with others speaking of sharing with others. If you don't create your own Docker image and you just want to use a premade one in Poland from the docker hub using Docker full and the image names, you may also include a tag if one is available which may specify a version or variant of the software. 
If you don't specify a tag the latest version will be what statute to run a container pulled down from the docker Hub or build the image and then enter Docker run followed by the image name. There are of course many options available when running your containers such as running it in detached mode, but XD or assigning ports for web services, you can view your running containers with Docker container LS. And as you add more Bill appear here running a single container is fun, but it's annoying to enter all of these.Commands to get a container running and we may want to control several containers as part of a single application such as running an app and a database together something you might want to."
text_docker2 = "Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package. By doing so, thanks to the container, the developer can rest assured that the application will run on any other Linux machine regardless of any customized settings that machine might have that could differ from the machine used for writing and testing the code."
# Implicit string concatenation avoids the indentation spaces that a
# backslash continuation inside the literal would embed in the text.
text_marketing = ("I used to find a lot of my illustrators and designers by trawling websites such as Folksy and Etsy. "
                  "Funnily enough, I always prefered Folksy as an option because we were a UK-based shop, and it made a lot more sense for me to buy from UK designers. "
                  "I am also a little scared sometimes by the monster that is Etsy! It’s always a minefield having to find out whether someone will ship internationally, let alone navigating tens of thousands of pages of products. "
                  "I also find a lot of people on Pinterest, Twitter, Facebook, graduate fairs, local and national exhibitions and design shows so make sure you are linked in with as many of those as you can. "
                  "Things to be mindful of: The most obvious is email size, but other things to look out for are not putting any images at all in an email. "
                  "If you are pitching your work, always make sure that you have pictures of this work included — quite a lot of people forget this.")
text_sql = "The uses of SQL include modifying database table and index structures; adding, updating and deleting rows of data; and retrieving subsets of information from within a database for transaction processing and analytics applications. Queries and other SQL operations take the form of commands written as statements -- commonly used SQL statements include select, add, insert, update, delete, create, alter and truncate.SQL became the de facto standard programming language for relational databases after they emerged in the late 1970s and early 1980s. Also known as SQL databases, relational systems comprise a set of tables containing data in rows and columns. Each column in a table corresponds to a category of data -- for example, customer name or address -- while each row contains a data value for the intersecting column."
text_se1 = "If we were to build a solution to collect and querying data related to our customers’ clinical history, probably the Software Architecture will be strongly shaped by lots of politics about how to access data, obfuscation, certificates, tracking, protocols, etc…On the other hand, if we were re-building a system because it’s unmaintainable and technologically obsolete, surely some principles about modularity, testability, new technology stack, etc… will appear.Finally, lightweight Architecture will be needed when working on a new experimental product focused on a new market niche due to the uncertainty of the product itself.Many enterprises have their own framework which implements some of the Architecture’s principles."
text_se2 = "The starting point is a bit of Software Architecture (upfront design) which is retro-feeding with the emergent design of the autonomous teams.Doing so we reach two benefits:Having a reference Architecture which helps us to build our solutionsLet the teams have a degree of innovation that, at the same time, will feed the Architecture and will allow other teams to take advantage of that.When we mean agile and autonomous teams we also refer to multi-skilled teams. Such teams are composed by dev-ops, scrum master, product owner, frontend developer, backend developer, QA, technical leader and so on."
text_nemo = "Parents need to know that even though there are no traditional bad guys in Finding Nemo, there are still some very scary moments, including large creatures with zillions of sharp teeth, the apparent death of a major character, and many tense scenes with characters in peril. And at the very beginning of the movie, Marlin's wife and all but one of their eggs are eaten by a predator -- a scene that could very well upset little kids. Expect a little potty humor amid the movie's messages of teamwork, determination, loyalty, and a father's never-ending love for his son. The issue of Nemo's stunted fin is handled exceptionally well -- matter-of-factly but frankly.Marlin's encounter with the barracuda that decimated his young family drove a permanent stake of fear through his heart. And he transfers his misgivings to his son. Instead of encouraging him to spread his wings—er, flip his fins—he shelters him to a smothering degree. This breeds anger and rebellion in Nemo and creates further unhappiness for Marlin. The film stresses the need to maintain balance in your family life and in the way you introduce your kids to the world. And an extended family of sea turtles provides insight into how steady, loving relationships can flow more smoothly."

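The evaluation cell below reports two scores per paragraph pair: cosine similarity (1 means similar) and the cityblock (L1) distance between sigmoid-squashed vectors (0 means similar), each computed for the LSTM output and for the plain sum of sentence vectors. The following is a minimal NumPy sketch of those two metrics and the sum baseline, re-implemented by hand rather than with `torch.cosine_similarity` and `scipy.spatial.distance.cityblock`; the random 8-dimensional vectors are hypothetical stand-ins for GPT sentence features from `feature_extractor`.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; 1 means the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def squashed_cityblock(a: np.ndarray, b: np.ndarray) -> float:
    """L1 distance after mapping every coordinate into (0, 1) with a sigmoid.

    0 means identical; the value is bounded above by the vector dimension.
    """
    return float(np.abs(sigmoid(a) - sigmoid(b)).sum())


def sum_baseline(sentence_feats: np.ndarray) -> np.ndarray:
    """'Without LSTM' paragraph vector: the sum of its sentence vectors."""
    return sentence_feats.sum(axis=0)


# Hypothetical stand-ins for GPT sentence features: 3 sentences, 8 dimensions
rng = np.random.default_rng(0)
sentences = rng.normal(size=(3, 8))

p1 = sum_baseline(sentences)
# The same sentences repeated twice: the summed vector is twice as long
p2 = sum_baseline(np.concatenate([sentences, sentences]))

print("cosine:", cosine(p1, p2))                          # about 1.0, length is ignored
print("squashed cityblock:", squashed_cityblock(p1, p2))  # non-zero despite equal direction
```

Because cosine ignores vector length while the squashed cityblock distance does not, the "With LSTM" and "Without LSTM" Cityblock scores may not be directly comparable in scale.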
In [47]:
# Imports this cell relies on. LSTM_Head, feature_extractor, model1 (the GPT
# feature extractor), device, para_docker and para_kuber come from earlier cells.
from itertools import product

import torch
import torch.nn as nn
from scipy.spatial.distance import cityblock

# For testing, use one of these checkpoints:
# 1) /home/ether/Desktop/gpt_experiments/models/lstmheadSGD_truebi_mcpos_torchcos_ep4.pt with Bi-LSTM non-seq
# 2) /home/ether/Desktop/gpt_experiments/models/lstmheadSGD_uni_mcpos_torchcos_ep4.pt with Unidirectional
# 3) /home/ether/Desktop/gpt_experiments/models/lstmheadSGD_bi_mcpos_real_ep2.pt with Bi-LSTM seq or non-seq
model = LSTM_Head()
state_dict = torch.load(
    "/home/ether/Desktop/gpt_experiments/models/lstmheadSGD_truebi_mcpos_torchcos_ep4.pt",
    map_location=device,  # load the weights straight onto the target device
)
model.load_state_dict(state_dict)
model.eval()
model.to(device)
# Squash every coordinate into (0, 1) before taking the cityblock (L1) distance
m = nn.Sigmoid()

# Compare every Docker paragraph with every Kubernetes paragraph.
# Alternative (needs itertools.combinations):
# for texta, textb in combinations(para_db.strip().split("\n"), 2):
for texta, textb in product(para_docker.strip().split("\n"), para_kuber.strip().split("\n")):
    with torch.no_grad():
        # One GPT feature vector per sentence. Splitting on "." drops any text
        # after the last full stop and also splits decimals such as "PHP 5.3".
        feat1 = [feature_extractor(model1, text.strip()) for text in texta.split(".")[:-1]]
        feat2 = [feature_extractor(model1, text.strip()) for text in textb.split(".")[:-1]]
        in1 = torch.stack(feat1)
        in2 = torch.stack(feat2)
        # The LSTM head aggregates the sentence features into one paragraph vector
        op1 = model(in1.to(device), infer=True)
        op2 = model(in2.to(device), infer=True)
        print("#" * 40, end="\n\n")
        # Cosine score of 1 means high similarity
        print("With LSTM Cosine score: ", torch.cosine_similarity(op1, op2, dim=-1).cpu().item())
        # Cityblock score of 0 means high similarity
        print("With LSTM Cityblock score: ", cityblock(m(op1).cpu(), m(op2).cpu()))
        # Baseline without the LSTM: paragraph vector = sum of sentence vectors
        sum1, sum2 = torch.sum(in1, 0), torch.sum(in2, 0)
        print("Without LSTM sum(sent_feat_vecs) Cosine score", torch.cosine_similarity(sum1, sum2, dim=-1).cpu().item())
        print("Without LSTM sum(sent_feat_vecs) Cityblock score", cityblock(m(sum1).cpu(), m(sum2).cpu()))
        print("*" * 40, "<Para1> ", texta, "*" * 40, "<Para2> ", textb, sep="\n", end="\n\n")
#         lena = len(texta.split("."))-1
#         lenb = len(textb.split("."))-1
#         print("Lengths",lena,lenb,end="\n\n")

########################################

With LSTM Cosine score:  0.8316231369972229
With LSTM Cityblock score:  25.091522
Without LSTM sum(sent_feat_vecs) Cosine score 0.8730065226554871
Without LSTM sum(sent_feat_vecs) Cityblock score 114.95056
****************************************
<Para1> 
Docker is a containerization platform that packages your app and all its dependencies together in the form called a docker container to ensure that your application works seamlessly in any environment. This environment might be a production or staging server. Docker pulls the dependencies needed for your application to run from the cloud and configures them automatically. You don’t need to do any extra work. Cool Right.
****************************************
<Para2> 
Real production apps span multiple containers. Those containers must be deployed across multiple server hosts. Security for containers is multilayered and can be complicated. That's where Kubernetes can help. Kubernetes gives yo

########################################

With LSTM Cosine score:  0.7895971536636353
With LSTM Cityblock score:  25.401327
Without LSTM sum(sent_feat_vecs) Cosine score 0.7900435924530029
Without LSTM sum(sent_feat_vecs) Cityblock score 108.53167
****************************************
<Para1> 
Docker communicates natively with the system kernel by passing the middleman on Linux machines and even Windows 10 and Windows Server 2016 and above this means you can run any version of Linux in a container and it will run natively. Not only that Docker uses less disk space to as it is able to reuse files efficiently by using a layered file system. If you have multiple Docker images using the same base image for instance.
****************************************
<Para2> 
Kubernetes fixes a lot of common problems with container proliferation—sorting containers together into a ”pod.” Pods add a layer of abstraction to grouped containers, which helps you schedule workloads and provide necessary 

########################################

With LSTM Cosine score:  0.7630738019943237
With LSTM Cityblock score:  27.877953
Without LSTM sum(sent_feat_vecs) Cosine score 0.7721814513206482
Without LSTM sum(sent_feat_vecs) Cityblock score 124.30997
****************************************
<Para1> 
Imagine we already have an application running PHP 5.3 on a server and want to deploy a new application which requires PHP 7.2 on that same server. This will cause some version conflict on that server and also might cause some features in the existing application to fail.
****************************************
<Para2> 
The primary advantage of using Kubernetes in your environment, especially if you are optimizing app dev for the cloud, is that it gives you the platform to schedule and run containers on clusters of physical or virtual machines. More broadly, it helps you fully implement and rely on a container-based infrastructure in production environments. And because Kubernetes is all abou

########################################

With LSTM Cosine score:  0.899846613407135
With LSTM Cityblock score:  20.066689
Without LSTM sum(sent_feat_vecs) Cosine score 0.9083788394927979
Without LSTM sum(sent_feat_vecs) Cityblock score 128.54495
****************************************
<Para1> 
In situations like this, we might have to use Docker to sandbox or containerise the new application to run without affecting the old application. This brings us to Docker containers.
****************************************
<Para2> 
The docker technology still does what it's meant to do. When kubernetes schedules a pod to a node, the kubelet on that node will instruct docker to launch the specified containers. The kubelet then continuously collects the status of those containers from docker and aggregates that information in the master. Docker pulls containers onto that node and starts and stops those containers as normal. The difference is that an automated system asks docker to do those thing

Without LSTM sum(sent_feat_vecs) Cityblock score 68.809044
****************************************
<Para1> 
Think of a Docker container as above image. There are multiple applications running on the same machine. These applications are put into docker containers and any changes made on these containers does not affect the other container. Each container has different Os running on the same physical machine. Docker helps you to create, deploy and run applications using containers.
****************************************
<Para2> 
The docker technology still does what it's meant to do. When kubernetes schedules a pod to a node, the kubelet on that node will instruct docker to launch the specified containers. The kubelet then continuously collects the status of those containers from docker and aggregates that information in the master. Docker pulls containers onto that node and starts and stops those containers as normal. The difference is that an automated system asks docker to do those

########################################

With LSTM Cosine score:  0.7581173181533813
With LSTM Cityblock score:  31.532446
Without LSTM sum(sent_feat_vecs) Cosine score 0.787143349647522
Without LSTM sum(sent_feat_vecs) Cityblock score 207.9006
****************************************
<Para1> 
A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
****************************************
<Para2> 
Real production apps span multiple containers. Those containers must be deployed across multiple server hosts. Security for containers is multilayered and can be complicated. That's where Kubernetes can help. Kubernetes gives you the orchestration and management capabilities required to deploy containers, at scale, for these workloads. Kubernetes orchestration allows you to build application services that span multiple containers, schedule those con