# This notebook is for the evaluation results of CS224N project

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%cd /content/drive/MyDrive/Clean CS224N folder

/content/drive/MyDrive/Clean CS224N folder


# Beicheng's fine-tuned T5 abstractive model
Fine-tuned on FB data set

In [3]:
!pip install sentencepiece
!pip install bert-score

Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[?25l[K     |▎                               | 10 kB 33.7 MB/s eta 0:00:01[K     |▌                               | 20 kB 23.2 MB/s eta 0:00:01[K     |▉                               | 30 kB 11.5 MB/s eta 0:00:01[K     |█                               | 40 kB 7.0 MB/s eta 0:00:01[K     |█▍                              | 51 kB 6.3 MB/s eta 0:00:01[K     |█▋                              | 61 kB 7.5 MB/s eta 0:00:01[K     |██                              | 71 kB 7.7 MB/s eta 0:00:01[K     |██▏                             | 81 kB 6.6 MB/s eta 0:00:01[K     |██▍                             | 92 kB 7.3 MB/s eta 0:00:01[K     |██▊                             | 102 kB 7.9 MB/s eta 0:00:01[K     |███                             | 112 kB 7.9 MB/s eta 0:00:01[K     |███▎                            | 122 kB 7.9 MB/s eta 0:00:01[K     |███▌       

In [4]:
!pip install transformers -q
!pip install wandb -q

[K     |████████████████████████████████| 1.7 MB 7.9 MB/s 
[K     |████████████████████████████████| 181 kB 88.3 MB/s 
[K     |████████████████████████████████| 144 kB 92.0 MB/s 
[K     |████████████████████████████████| 63 kB 1.9 MB/s 
[?25h  Building wheel for pathtools (setup.py) ... [?25l[?25hdone


In [5]:
# Importing stock libraries
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
import seaborn as sns
import matplotlib.pyplot as plt

# Importing the T5 modules from huggingface/transformers
# from transformers import T5Tokenizer, T5ForConditionalGeneration
# from transformers import T5ForConditionalGeneration
# from transformers import AutoTokenizer as T5Tokenizer
# from transformers import AutoModelWithLMHead as T5ForConditionalGeneration
from transformers import AutoModelWithLMHead, AutoTokenizer
# AutoModelWithLMHead
from bert_score import score

# tokenizer = AutoTokenizer.from_pretrained("t5-base")

# model = AutoModelWithLMHead.from_pretrained("t5-base")

# WandB – Import the wandb library
import wandb

In [6]:
# Checking out the GPU we have access to. This is output is from the google colab version. 
!nvidia-smi

Mon Mar 14 04:45:26 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [7]:
# # Setting up the device for GPU usage
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

### Preparing the Dataset for data processing: Class

We will start with creation of Dataset class - This defines how the text is pre-processed before sending it to the neural network. This dataset will be used the the Dataloader method that will feed  the data in batches to the neural network for suitable training and processing. 
The Dataloader and Dataset will be used inside the `main()`.
Dataset and Dataloader are constructs of the PyTorch library for defining and controlling the data pre-processing and its passage to neural network. For further reading into Dataset and Dataloader read the [docs at PyTorch](https://pytorch.org/docs/stable/data.html)

#### *CustomDataset* Dataset Class
- This class is defined to accept the Dataframe as input and generate tokenized output that is used by the **T5** model for training. 
- We are using the **T5** tokenizer to tokenize the data in the `text` and `ctext` column of the dataframe. 
- The tokenizer uses the ` batch_encode_plus` method to perform tokenization and generate the necessary outputs, namely: `source_id`, `source_mask` from the actual text and `target_id` and `target_mask` from the summary text.
- To read further into the tokenizer, [refer to this document](https://huggingface.co/transformers/model_doc/t5.html#t5tokenizer)
- The *CustomDataset* class is used to create 2 datasets, for training and for validation.
- *Training Dataset* is used to fine tune the model: **80% of the original data**
- *Validation Dataset* is used to evaluate the performance of the model. The model has not seen this data during training. 

#### Dataloader: Called inside the `main()`
- Dataloader is used to for creating training and validation dataloader that load data to the neural network in a defined manner. This is needed because all the data from the dataset cannot be loaded to the memory at once, hence the amount of data loaded to the memory and then passed to the neural network needs to be controlled.
- This control is achieved using the parameters such as `batch_size` and `max_len`.
- Training and Validation dataloaders are used in the training and validation part of the flow respectively

In [8]:
# Creating a custom dataset for reading the dataframe and loading it into the dataloader to pass it to the neural network at a later stage for finetuning the model and to prepare it for predictions

class CustomDataset(Dataset): ### seq2seq from ctext to text

    def __init__(self, dataframe, tokenizer, source_len, summ_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.source_len = source_len
        self.summ_len = summ_len
        self.text = self.data.text
        self.ctext = self.data.ctext

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        ctext = str(self.ctext[index])
        ctext = ' '.join(ctext.split())

        text = str(self.text[index])
        text = ' '.join(text.split())

        source = self.tokenizer.batch_encode_plus([ctext], max_length= self.source_len, pad_to_max_length=True,return_tensors='pt')
        target = self.tokenizer.batch_encode_plus([text], max_length= self.summ_len, pad_to_max_length=True,return_tensors='pt')

        source_ids = source['input_ids'].squeeze()
        source_mask = source['attention_mask'].squeeze()
        target_ids = target['input_ids'].squeeze()
        target_mask = target['attention_mask'].squeeze()

        return {
            'source_ids': source_ids.to(dtype=torch.long), 
            'source_mask': source_mask.to(dtype=torch.long), 
            'target_ids': target_ids.to(dtype=torch.long),
            'target_ids_y': target_ids.to(dtype=torch.long)
        }

<a id='section03'></a>
### Fine Tuning the Model: Function

Here we define a training function that trains the model on the training dataset created above, specified number of times (EPOCH), An epoch defines how many times the complete data will be passed through the network. 

This function is called in the `main()`

Following events happen in this function to fine tune the neural network:
- The epoch, tokenizer, model, device details, testing_ dataloader and optimizer are passed to the `train ()` when its called from the `main()`
- The dataloader passes data to the model based on the batch size.
- `language_model_labels` are calculated from the `target_ids` also, `source_id` and `attention_mask` are extracted.
- The model outputs first element gives the loss for the forward pass. 
- Loss value is used to optimize the weights of the neurons in the network.
- After every 10 steps the loss value is logged in the wandb service. This log is then used to generate graphs for analysis. Such as [these](https://app.wandb.ai/abhimishra-91/transformers_tutorials_summarization?workspace=user-abhimishra-91)
- After every 500 steps the loss value is printed in the console.

In [9]:
# Creating the training function. This will be called in the main function. It is run depending on the epoch value.
# The model is put into train mode and then we wnumerate over the training loader and passed to the defined network 
def train(epoch, tokenizer, model, device, loader, optimizer):
    model.train()
    for _,data in enumerate(loader, 0):
        y = data['target_ids'].to(device, dtype = torch.long)
        y_ids = y[:, :-1].contiguous()
        # lm_labels = y[:, 1:].clone().detach()
        # lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100
        lm_labels = y[:, :].clone().detach()
        lm_labels[lm_labels == tokenizer.pad_token_id] = -100
        ids = data['source_ids'].to(device, dtype = torch.long)
        mask = data['source_mask'].to(device, dtype = torch.long)

        # outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, lm_labels=lm_labels)
        # outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, labels=lm_labels)
        outputs = model(input_ids = ids, attention_mask = mask, labels=lm_labels)
        loss = outputs[0]
        
        if _%10 == 0:
            wandb.log({"Training Loss": loss.item()})

        if _%500==0:
            print(f'Epoch: {epoch}, Loss:  {loss.item()}')
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # xm.optimizer_step(optimizer)
        # xm.mark_step()

<a id='section04'></a>
### Validating the Model Performance: Function

During the validation stage we pass the unseen data(Testing Dataset), trained model, tokenizer and device details to the function to perform the validation run. This step generates new summary for dataset that it has not seen during the training session. 

This function is called in the `main()`

This unseen data is the 20% of `news_summary.csv` which was seperated during the Dataset creation stage. 
During the validation stage the weights of the model are not updated. We use the generate method for generating new text for the summary. 

It depends on the `Beam-Search coding` method developed for sequence generation for models with LM head. 

The generated text and originally summary are decoded from tokens to text and returned to the `main()`

In [10]:
def validate(epoch, tokenizer, model, device, loader):
    model.eval()
    predictions = []
    actuals = []
    with torch.no_grad():
        for _, data in enumerate(loader, 0):
            y = data['target_ids'].to(device, dtype = torch.long)
            ids = data['source_ids'].to(device, dtype = torch.long)
            mask = data['source_mask'].to(device, dtype = torch.long)

            generated_ids = model.generate(
                input_ids = ids,
                attention_mask = mask, 
                max_length=150, 
                num_beams=2,
                repetition_penalty=2.5, 
                length_penalty=1.0, 
                early_stopping=True
                )
            preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
            target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True)for t in y]
            if _%100==0:
                print(f'Completed {_}')

            predictions.extend(preds)
            actuals.extend(target)
    return predictions, actuals

### Load our data set

In [11]:
import pandas as pd
df = pd.read_csv('SCB_all_v2.csv')
#Check what preprocessing (Beicheng uses pruned data set)
df = df.dropna() #remove nones
df['summary'] = df['summary'].str.replace('#StopClickbait', '')
df

Unnamed: 0,summary,link,article,title
0,"Gadot is Israeli, and pronounces her last name...",http://www.slate.com/blogs/browbeat/2017/05/30...,"With the release of Wonder Woman, star Gal Gad...","How to pronounce Gal Gadot, the star of Wonder..."
1,Ghost.,http://www.mirror.co.uk/tv/tv-news/games-thron...,Season 7 of Game of Thrones is teased in new t...,Game Of Thrones star tragically dies just week...
2,"""...I always thought I was the most outsider, ...",http://www.cinemablend.com/news/1653270/why-di...,While Diane Keaton might not be the first name...,Why Diane Keaton Hadn’t Watched The Godfather ...
3,"She's allowing people to visit her guesthouse,...",http://ctrylv.co/66pB7xu,Ree Drummond superfans have no shortage of pla...,The Pioneer Woman Is Letting People Tour Her R...
4,He was traded to Nashville to be closer to his...,http://faithtap.com/7882/hockey-heartthrob-mik...,"“I think just time spending with them, being w...",FaithTap Archives
...,...,...,...,...
1307,It's Rachel. Meghan is her middle name.,https://www.yahoo.com/lifestyle/meghan-markle-...,"“Meghan Markle” is the kind of charming, allit...",Meghan Markle's real name isn't actually Megha...
1308,The McDonald's Big Mac.,https://www.businessinsider.com/mcdonalds-wend...,"Get the Insider App A personalized feed, summa...","We compared McDonald's, Wendy's, and Burger Ki..."
1309,"French, due to the large growth of French-spea...",https://www.usatoday.com/story/news/world/2014...,Is French the language of the future?\n\nCorre...,Is French the language of the future?
1310,He got a 5 on his AP Art portfolio and got sel...,https://www.gaystarnews.com/article/teacher-sn...,Jasper Behrends is a trans teenager who lives ...,Teacher snubs trans teen’s art project but wha...


In [12]:
df['ctext'] = "context: "+df.article +" <question for context: "+ df.title +" </s>"
df['text'] = df.summary

In [13]:
len(df)


1257

In [14]:
#df.to_excel("temp.xlsx")

In [15]:
# WandB – Initialize a new run
wandb.init(project="transformers_tutorials_summarization")

# WandB – Config is a variable that holds and saves hyperparameters and inputs
# Defining some key variables that will be used later on in the training  
config = wandb.config          # Initialize config
config.TRAIN_BATCH_SIZE = 2    # input batch size for training (default: 64)
config.VALID_BATCH_SIZE = 2    # input batch size for testing (default: 1000)
config.TRAIN_EPOCHS = 20        # number of epochs to train (default: 10)
config.VAL_EPOCHS = 1 
config.LEARNING_RATE = 1e-5    # learning rate (default: 0.01)
config.SEED = 42               # random seed (default: 42)
config.MAX_LEN = 1024 
config.SUMMARY_LEN = 150 

# Set random seeds and deterministic pytorch for reproducibility
torch.manual_seed(config.SEED) # pytorch random seed
np.random.seed(config.SEED) # numpy random seed
torch.backends.cudnn.deterministic = True

# tokenzier for encoding the text
# tokenizer = T5Tokenizer.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("tuner007/t5_abs_qa")



# Importing and Pre-Processing the domain data
# Selecting the needed columns only. 
# Adding the summarzie text in front of the text. This is to format the dataset similar to how T5 model was trained for summarization task. 
# df = pd.read_csv('./data/news_summary.csv',encoding='latin-1')
# df = df[['text','ctext']]
# df.ctext = 'summarize: ' + df.ctext
# print(df.head())


# Creation of Dataset and Dataloader
# Defining the train size. So 90% of the rest of data (10% already taken for test, so 0.81) will be used for training and the rest will be used for validation. 
train_size = 0.8
val_size = 0.1 ### the rest goes to test
train_dataset=df.sample(frac=train_size,random_state = config.SEED)
# val_dataset=df.drop(train_dataset.index).reset_index(drop=True)
val_test_dataset=df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)
val_dataset=val_test_dataset.sample(frac=val_size / (1-train_size),random_state = config.SEED)
test_dataset=val_test_dataset.drop(val_dataset.index).reset_index(drop=True)
val_dataset = val_dataset.reset_index(drop=True)

print("FULL Dataset: {}".format(df.shape))
print("TRAIN Dataset: {}".format(train_dataset.shape))
print("TEST Dataset: {}".format(val_dataset.shape))


# Creating the Training and Validation dataset for further creation of Dataloader
training_set = CustomDataset(train_dataset, tokenizer, config.MAX_LEN, config.SUMMARY_LEN)
val_set = CustomDataset(val_dataset, tokenizer, config.MAX_LEN, config.SUMMARY_LEN)
test_set = CustomDataset(test_dataset, tokenizer, config.MAX_LEN, config.SUMMARY_LEN)

# Defining the parameters for creation of dataloaders
train_params = {
    'batch_size': config.TRAIN_BATCH_SIZE,
    'shuffle': True,
    'num_workers': 0
    }

val_params = {
    'batch_size': config.VALID_BATCH_SIZE,
    'shuffle': False,
    'num_workers': 0
    } ### also used for test set

# Creation of Dataloaders for testing and validation. This will be used down for training and validation stage for the model.
training_loader = DataLoader(training_set, **train_params)
val_loader = DataLoader(val_set, **val_params)
test_loader = DataLoader(test_set, **val_params)

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Downloading:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.20k [00:00<?, ?B/s]

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.


Downloading:   0%|          | 0.00/773k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/65.0 [00:00<?, ?B/s]

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.


FULL Dataset: (1257, 6)
TRAIN Dataset: (1006, 6)
TEST Dataset: (126, 6)


In [16]:
len(train_dataset), len(val_dataset), len(test_dataset)

(1006, 126, 125)

In [17]:
train_dataset.to_csv('./train_dataset.csv')
val_dataset.to_csv('./val_dataset.csv')
test_dataset.to_csv('./test_dataset.csv')

### Load model parameters
go straight to this using presaved model params

In [18]:
#model = Model()
model = AutoModelWithLMHead.from_pretrained("tuner007/t5_abs_qa")
model = model.to(device)
path = 'model_20epochs_state_dict'
model.load_state_dict(torch.load(path), strict=False)

The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.


Downloading:   0%|          | 0.00/1.11G [00:00<?, ?B/s]

<All keys matched successfully>

In [19]:
# # Setting up the device for GPU usage
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [20]:
i=1

In [21]:
model.to('cpu')

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseReluDense(
              (wi): Linear(in_features=768, out_features=3072, bias=False)
              (wo): Linear(in_features=3072, out_features=768, bias=False)
              (dropout): Dr

In [23]:
def get_answer(question, context): ### check
  input_text = "context: %s <question for context: %s </s>" % (context,question)
  features = tokenizer([input_text], return_tensors='pt')
  out = model.generate(input_ids=features['input_ids'].to(device), attention_mask=features['attention_mask'].to(device))
  # print(out[0])
  # for i in out[0]:
  #   print(i,tokenizer.decode(i))
  return tokenizer.decode(out[0].detach()) ### test if we dont store the output

In [25]:
train_dataset = pd.read_csv('train_dataset.csv')
val_dataset = pd.read_csv('val_dataset.csv')
test_dataset = pd.read_csv('test_dataset.csv')

print(len(val_dataset))
print(len(test_dataset))

whole_dataset = pd.concat([train_dataset,test_dataset,val_dataset],ignore_index=True)
whole_dataset = whole_dataset[['summary','link','article','title']]
whole_dataset

126
125


Unnamed: 0,summary,link,article,title
0,September 16th. #stopclickbait,http://www.unilad.co.uk/viral/this-is-the-most...,NBC\n\nIf like myself you’re forever forgettin...,This Is The Most Common Birthday In The World
1,"June Moon, aka Enchantress...maybe.",http://www.cinemablend.com/new/Who-Main-Villai...,The next DCEU movie to hit theaters will mark ...,Who The Main Villain Of Suicide Squad May Be
2,Nathan Keyes.,http://www.instyle.com/news/britney-spears-bio...,Naked dresses are perhaps the most impressive ...,"Lifetime's ""Britney"" Casts Role of Justin Timb..."
3,His dad and grandpa--Michael and Kirk Douglas.,https://www.queerty.com/youll-never-guess-mich...,You’ll Never Guess What (Or Who) Michael Dougl...,You’ll Never Guess What (Or Who) Michael Dougl...
4,In the pool.,https://www.clickorlando.com/news/2018/08/02/v...,"MAPLE VALLEY, WASH. – A Washington dad pulled ...",Viral video: You'll never guess where this guy...
...,...,...,...,...
1252,The CDC doesn't recommend it.,https://canoe.com/news/weird/will-wearing-wate...,"As if wearing face masks, washing your hands a...",Will wearing water jugs on your head combat co...
1253,So the crew can assess the surroundings in an ...,https://theculturetrip.com/north-america/usa/a...,Book your bucket list adventure here with TRIP...,Here's Why Flight Attendants Ask You to Raise ...
1254,Industry standards are eliminating/reducing si...,https://www.usatoday.com/story/news/nation/201...,You won't be able to buy some corded blinds st...,You won't be able to buy some corded blinds st...
1255,Australians aren't actually going anywhere...t...,http://www.iflscience.com/environment/australi...,"Hold on tight, Australians – on New Year’s Day...",Australia Will Suddenly Move 1.8 Meters North ...


In [26]:
print(whole_dataset['title'][i])
print(whole_dataset['article'][i])
print(get_answer(whole_dataset['title'][i],whole_dataset['article'][i])[6:-4])

Token indices sequence length is longer than the specified maximum sequence length for this model (1026 > 512). Running this sequence through the model will result in indexing errors


Who The Main Villain Of Suicide Squad May Be
The next DCEU movie to hit theaters will mark one of the weirdest and boldest comic book adaptations ever committed to the silver screen. Suicide Squad hasn’t even premiered yet, and it’s already obvious that the film has taken every possible left turn to set itself apart from what we expect from this genre. David Ayer has a vision in mind, and from what we’ve seen, it seems delightfully fresh.

However, despite the fact that we’re beyond excited for the upcoming release of Suicide Squad, we seriously know nothing about the film’s actual villain. Sure, it will be undeniably awesome to see these characters coalesce on the silver screen for the first time, but the trailers and marketing materials have done very little to inform us of the overarching conflict. What looming threat arises that causes Amanda Waller to bring this group of psychos and killers together? We’ve gone through the details, and come up with five potential big bads that cou

RuntimeError: ignored

In [27]:
for data, target in load_data.train_loader:
    data = data.cuda()
    target = target.cuda()

NameError: ignored

In [22]:
    text = val_dataset['article'][i]
    question = val_dataset['title'][i]
    encoding = tokenizer.encode_plus(question, text, return_tensors="pt")
    input_ids = encoding["input_ids"]

    # default is local attention everywhere
    # the forward method will automatically set global attention on question tokens
    attention_mask = encoding["attention_mask"]

    start_scores, end_scores = model(input_ids, attention_mask=attention_mask)
    all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
    answer_tokens = all_tokens[torch.argmax(start_scores) :torch.argmax(end_scores)+1]
    answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
    val_dataset["Finetuned e20 Text"][i] = answer

ValueError: ignored

In [None]:
val_dataset["Finetuned e20 Text"] = ""

for i in range(len(val_dataset)):
  try:
    print(i)
    text = val_dataset['article'][i]
    question = val_dataset['title'][i]
    encoding = tokenizer.encode_plus(question, text, return_tensors="pt")
    input_ids = encoding["input_ids"]

    # default is local attention everywhere
    # the forward method will automatically set global attention on question tokens
    attention_mask = encoding["attention_mask"]

    start_scores, end_scores = model(input_ids, attention_mask=attention_mask)
    all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
    answer_tokens = all_tokens[torch.argmax(start_scores) :torch.argmax(end_scores)+1]
    answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
    val_dataset["Finetuned e20 Text"][i] = answer
  except:
    pass

In [None]:
val_dataset

# Calculate BERT scores

In [None]:
epoch

In [None]:
tmp

In [None]:
tmp = df_val
Pb, Rb, Fb = score([str(i) for i in tmp['Actual Text'].tolist()], [str(i) for i in tmp['Generated Text'].tolist()], lang='en')
fbscores = Fb.tolist()
epochs = [0]*len(Fb)
avgs_val = [np.average(fbscores)]
for epoch in range(1,21):
  Pb, Rb, Fb = score([str(i) for i in tmp['Actual Text'].tolist()], [str(i) for i in tmp[f'Generated Text {epoch}'].tolist()], lang='en')
  fbscores += Fb.tolist()
  epochs += [epoch]*len(Fb)
  avgs_val.append(np.average(Fb.tolist()))
sns.lineplot(x=epochs, y=fbscores, ci='sd')
plt.xlabel('Epochs')
plt.ylabel('Fbert score')
plt.savefig('validation_score.png')

In [None]:
tmp = df_test
Pb, Rb, Fb = score([str(i) for i in tmp['Actual Text'].tolist()], [str(i) for i in tmp['Generated Text'].tolist()], lang='en')
fbscores = Fb.tolist()
epochs = [0]*len(Fb)
avgs_val = [np.average(fbscores)]
for epoch in range(1,21):
  Pb, Rb, Fb = score([str(i) for i in tmp['Actual Text'].tolist()], [str(i) for i in tmp[f'Generated Text {epoch}'].tolist()], lang='en')
  fbscores += Fb.tolist()
  epochs += [epoch]*len(Fb)
  avgs_val.append(np.average(Fb.tolist()))
sns.lineplot(x=epochs, y=fbscores, ci='sd')
plt.xlabel('Epochs')
plt.ylabel('Fbert score')
plt.savefig('validation_score.png')

In [None]:
#Validation loop and saving the resulting file with predictions and acutals in a dataframe.
#Saving the dataframe as predictions.csv
print('Now generating summaries on our fine tuned model for the validation dataset and saving it in a dataframe')
for epoch in range(config.VAL_EPOCHS):
    predictions, actuals = validate(epoch, tokenizer, model, device, val_loader)
    final_df = pd.DataFrame({'Generated Text':predictions,'Actual Text':actuals})
    final_df.to_csv('./predictions.csv')
    print('Output Files generated for review')

In [None]:
final_df

# Print BERTscore

In [None]:
tmp = df_test
Pb, Rb, Fb = score([str(i) for i in tmp['Actual Text'].tolist()], [str(i) for i in tmp['Generated Text'].tolist()], lang='en')
epochs = [0]*len(Fb)
print("Precision: "+str(Pb))
print("Recall: "+str(Rb))
print("Fbert: "+str(Fb))
print("Mean Precision: "+str(torch.mean(Pb)))
print("Mean Recall: "+str(torch.mean(Rb[~torch.isnan(Rb)])))
print("Mean Fbert: "+str(torch.mean(Fb)))

for epoch in range(1,21):
  Pb, Rb, Fb = score([str(i) for i in tmp['Actual Text'].tolist()], [str(i) for i in tmp[f'Generated Text {epoch}'].tolist()], lang='en')
  print("Epoch" + str(epoch))
  print("Precision: "+str(Pb))
  print("Recall: "+str(Rb))
  print("Fbert: "+str(Fb))
  print("Mean Precision: "+str(torch.mean(Pb)))
  print("Mean Recall: "+str(torch.mean(Rb[~torch.isnan(Rb)])))
  print("Mean Fbert: "+str(torch.mean(Fb)))

# Calculate Rogue scores

In [None]:
!pip install rouge


In [None]:
#https://www.programcreek.com/python/example/125541/rouge.Rouge

from rouge import Rouge 


def compute_rouge(predictions, targets):
    predictions = [" ".join(prediction).lower() for prediction in predictions]
    predictions = [prediction if prediction else "EMPTY" for prediction in predictions]
    targets = [" ".join(target).lower() for target in targets]
    targets = [target if target else "EMPTY" for target in targets]
    rouge = Rouge()
    scores = rouge.get_scores(hyps=predictions, refs=targets, avg=True)
    return  scores

for epoch in range(1,21):
    targets = [str(i) for i in tmp['Actual Text'].tolist()]
    prediction = [str(i) for i in tmp[f'Generated Text {epoch}'].tolist()]
    print(compute_rouge(predictions, targets))


In [None]:
!pip install rouge/requirements.txt
!pip install rouge-score

In [None]:
fmeasure = score['rouge'+ind]
        results['precision'].append(precision)
        results['recall'].append(recall)
        results['fmeasure'].append(fmeasure)
    print("results['precision']"+ str(np.mean(results['precision'])))
    print("results['recall']"+ str(np.mean(results['recall'])))
    print("results['fmeasure']"+ str(np.mean(results['fmeasure'])))

We could try it on the reddit data set