## Generate Summaries for One Model with a Single Configuration

The *Longformer Encoder-Decoder (LED)* was recently added as an extension to [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.

In this notebook we will generate summaries with our previously trained model.

First, let's check we have a GPU with at least 15GB RAM.

In [None]:
import torch
torch.cuda.empty_cache()

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Sat Apr 13 05:31:02 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
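
The same check can be made programmatically. The sketch below is not part of the original notebook: it assumes the memory column of `nvidia-smi` has the `used MiB / total MiB` form shown above, and the helper names are hypothetical.

```python
import re

# Hypothetical helpers (not part of the notebook): extract "used / total"
# memory figures, in MiB, from the text that nvidia-smi prints.
MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")

def gpu_total_memory_mib(smi_output):
    """Return the total memory (MiB) of each GPU found in nvidia-smi output."""
    return [int(total) for _used, total in MEMORY_PATTERN.findall(smi_output)]

def has_enough_gpu_memory(smi_output, required_mib=15 * 1024):
    """True if at least one listed GPU has required_mib or more of memory."""
    return any(total >= required_mib for total in gpu_total_memory_mib(smi_output))

# Memory row copied from the output above
sample_row = "| N/A   38C    P8              11W /  70W |      0MiB / 15360MiB |      0%      Default |"
print(gpu_total_memory_mib(sample_row))
print(has_enough_gpu_memory(sample_row))
```

In the notebook, `gpu_info` from the cell above could be passed in place of `sample_row`.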
                                                                    

We need to upgrade `accelerate`. Install it in the first session only; there is no need to reinstall it after the session restarts. Use `pip show` to verify the installed version.

In [None]:
!pip install accelerate -U --quiet

Next, we install 🤗 Transformers, 🤗 Datasets, and `rouge_score`.



Install these in the first session only. Do not reinstall them after restarting the session, because that would upgrade the dill package to 0.3.8, which we do not want.

In [None]:
%%capture
!pip install datasets==1.2.1 #do not install again, if installed in previous session
!pip install transformers==4.2.0 #do not install again, if installed in previous session
!pip install rouge_score #do not install again, if installed in previous session

We need dill 0.3.4. Uninstall the default dill 0.3.8 first; otherwise both versions end up installed and the model fails. Do not restart the session right after this install: reinstall numpy (next cell) first, then restart.

In [None]:
!pip uninstall -y dill #need 0.3.4; uninstall the default dill 0.3.8 first, otherwise both versions are installed and the model fails
!pip install dill==0.3.4 #restart the session only AFTER numpy is also reinstalled

Found existing installation: dill 0.3.8
Uninstalling dill-0.3.8:
  Successfully uninstalled dill-0.3.8
Collecting dill==0.3.4
  Downloading dill-0.3.4-py2.py3-none-any.whl (86 kB)
Installing collected packages: dill
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
multiprocess 0.70.16 requires dill>=0.3.8, but you have dill 0.3.4 which is incompatible.
Successfully installed dill-0.3.4


Downgrade numpy to a version that still supports `np.object`. The default numpy installation (1.25.2) causes a failure, so we uninstall it and install 1.23.5.
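
As background, the `np.object` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, which is presumably why the older `datasets` release fails on 1.25.2. The sketch below only encodes that version rule; the helper name is hypothetical and it does not import numpy.

```python
def has_np_object_alias(numpy_version):
    """Return True if this NumPy version still ships the np.object alias.

    The alias was removed in NumPy 1.24, so any release before 1.24 has it.
    """
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 24)

print(has_np_object_alias("1.23.5"))  # the version installed below
print(has_np_object_alias("1.25.2"))  # the Colab default at the time
```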

In [None]:
!pip uninstall -y numpy #default numpy installation 1.25.2 causes failure, need to uninstall and install 1.23.5
!pip install numpy==1.23.5 #need to restart session


Found existing installation: numpy 1.25.2
Uninstalling numpy-1.25.2:
  Successfully uninstalled numpy-1.25.2
Collecting numpy==1.23.5
  Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Installing collected packages: numpy
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chex 0.1.86 requires numpy>=1.24.1, but you have numpy 1.23.5 which is incompatible.
pandas-stubs 2.0.3.230814 requires numpy>=1.25.0; python_version >= "3.9", but you have numpy 1.23.5 which is incompatible.
Successfully installed numpy-1.23.5


In [None]:
!pip install bert-score -q

###Package Version Checks

The next cells are just to check the expected versions installed.
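
As an alternative to reading each `pip show` output by eye, the pinned versions can be compared in one pass. This is a minimal sketch using the standard-library `importlib.metadata`; the `EXPECTED` dictionary and the function name are assumptions for illustration, with the pins taken from the install cells above.

```python
from importlib import metadata

# Pins taken from the install cells above (assumed to be the versions we want)
EXPECTED = {
    "datasets": "1.2.1",
    "dill": "0.3.4",
    "numpy": "1.23.5",
    "rouge_score": "0.1.2",
}

def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for every package whose version differs.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

# An empty dict means every pinned package is installed at the expected version
print(find_version_mismatches(EXPECTED))
```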

In [None]:
!pip show accelerate #should show 0.28.0 or 0.29.1


Name: accelerate
Version: 0.29.2
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: zach.mueller@huggingface.co
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: 


In [None]:
!pip show datasets #should show 1.2.1

Name: datasets
Version: 1.2.1
Summary: HuggingFace/Datasets is an open library of NLP datasets.
Home-page: https://github.com/huggingface/datasets
Author: HuggingFace Inc.
Author-email: thomas@huggingface.co
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: dill, multiprocess, numpy, pandas, pyarrow, requests, tqdm, xxhash
Required-by: 


In [None]:
!pip show transformers #should show 4.38.2

Name: transformers
Version: 4.38.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: 


In [None]:
!pip show rouge_score #should show 0.1.2

Name: rouge-score
Version: 0.1.2
Summary: Pure python implementation of ROUGE-1.5.5.
Home-page: https://github.com/google-research/google-research/tree/master/rouge
Author: Google LLC
Author-email: rouge-opensource@google.com
License: 
Location: /usr/local/lib/python3.10/dist-packages
Requires: absl-py, nltk, numpy, six
Required-by: 


In [None]:
!pip show dill #need 0.3.4 to work, otherwise we get _stack error.


Name: dill
Version: 0.3.4
Summary: serialize all of python
Home-page: https://github.com/uqfoundation/dill
Author: Mike McKerns
Author-email: 
License: 3-clause BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: 
Required-by: datasets, multiprocess


In [None]:
!pip show numpy #should show numpy 1.23.5, otherwise it will not work

Name: numpy
Version: 1.23.5
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: 
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: 
Required-by: accelerate, albumentations, altair, arviz, astropy, autograd, blis, bokeh, bqplot, chex, cmdstanpy, contourpy, cufflinks, cupy-cuda12x, cvxpy, datascience, datasets, db-dtypes, dopamine-rl, ecos, flax, folium, geemap, gensim, gym, h5py, holoviews, hyperopt, ibis-framework, imageio, imbalanced-learn, imgaug, jax, jaxlib, librosa, lightgbm, matplotlib, matplotlib-venn, missingno, mizani, ml-dtypes, mlxtend, moviepy, music21, nibabel, numba, numexpr, opencv-contrib-python, opencv-python, opencv-python-headless, opt-einsum, optax, orbax-checkpoint, osqp, pandas, pandas-gbq, pandas-stubs, patsy, plotnine, prophet, pyarrow, pycocotools, pyerfa, pymc, pytensor, python-louvain, PyWavelets, qdldl, qudida, rouge-score, scikit-im

###Loading Data and Defining Functions

In [None]:
import os
import re
import pandas as pd
from bert_score import score

In [None]:
# This cell will authenticate and mount Drive in the Colab.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Load the data

In [None]:
from tqdm import tqdm


def load_data(file_path, chunksize=500_000):
    """Read a CSV in chunks and return a single DataFrame (adjust chunksize to the file's size)."""
    # Collect the chunks first and concatenate once, instead of re-copying the frame per chunk
    chunks = list(tqdm(pd.read_csv(file_path, chunksize=chunksize)))
    return pd.concat(chunks)

# Path to the saved train data
tr_df = load_data('drive/MyDrive/W266_NLP/Project2/summarization/ms2-train-data.csv') #train data

vl_df = load_data('drive/MyDrive/W266_NLP/Project2/summarization/ms2-val-data.csv')  #validation data


1it [00:14, 14.31s/it]
1it [00:05,  5.24s/it]


Sanity check the data

In [None]:
tr_df.head(2)

Unnamed: 0.1,Unnamed: 0,review_id,pmid,title,abstract,target,background
0,0,30760312,"['22776744', '25271670', '3493740', '1863023',...",['Improved Cell Survival and Paracrine Capacit...,['Although transplantation of adult bone marro...,Conclusions SC therapy is effective for PAH in...,Background Despite significant progress in dru...
1,1,19588356,"['8532025', '10790348', '17504794', '16793845'...",['A comparison of continuous intravenous epopr...,['BACKGROUND Primary pulmonary hypertension is...,There was a trend for endothelin receptor anta...,BACKGROUND Pulmonary arterial hypertension is ...


Remove the columns we will not use. `abstract` is our input data, and `target` is our target data.

In [None]:
# List of columns to drop
columns_to_drop = ['Unnamed: 0', 'review_id', 'pmid', 'title', 'background']

# Drop the unnecessary columns
tr_df = tr_df.drop(columns=columns_to_drop)

vl_df = vl_df.drop(columns=columns_to_drop)

tr_df.head(2)


Unnamed: 0,abstract,target
0,['Although transplantation of adult bone marro...,Conclusions SC therapy is effective for PAH in...
1,['BACKGROUND Primary pulmonary hypertension is...,There was a trend for endothelin receptor anta...


In [None]:
tr_df.shape , vl_df.shape

((14188, 2), (2021, 2))

To check that we have enough GPU RAM, we can run the following command.
If the randomly allocated GPU is too small, the cells above can be rerun
to crash the notebook in the hope of getting a better GPU.

In [None]:
!nvidia-smi

Sat Apr 13 01:57:13 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

Let's start by loading and preprocessing the dataset.



In [None]:
from datasets import Dataset, load_dataset, load_metric

Next, we convert the DataFrames to Datasets.

In [None]:
# Convert the DataFrame to a Dataset
train_dataset = Dataset.from_pandas(tr_df)
val_dataset = Dataset.from_pandas(vl_df)


In [None]:
import torch

from datasets import load_dataset, load_metric
from transformers import LEDTokenizer, LEDForConditionalGeneration

`val_dataset = val_dataset.select(range(x))` keeps only the first x entries for evaluation, so `val_dataset.shape` becomes `(x, 2)`. To evaluate the entire validation set, skip this line or use `val_dataset.select(range(2021))`, since the validation set has 2021 rows.

In [None]:
#train_dataset = train_dataset.select(range(10000))
val_dataset = val_dataset.select(range(250))

Because we used `val_dataset.select(range(250))`, 250 entries are evaluated and the shape below is `(250, 2)`.

In [None]:
val_dataset.shape

(250, 2)

#Evaluation

###Supporting Functions

####Generate answer

In [None]:
def generate_answer(batch):
    inputs_dict = tokenizer(batch["abstract"], padding="max_length", max_length=max_input_token_length, return_tensors="pt", truncation=True)
    input_ids = inputs_dict.input_ids.to("cuda")
    attention_mask = inputs_dict.attention_mask.to("cuda")
    global_attention_mask = torch.zeros_like(attention_mask)
    # put global attention on <s> token
    global_attention_mask[:, 0] = 1

    predicted_abstract_ids = model.generate(
        input_ids,
        attention_mask=attention_mask,
        global_attention_mask=global_attention_mask,
        num_beams=model.config.num_beams,
        length_penalty=model.config.length_penalty,
        early_stopping=model.config.early_stopping,
        no_repeat_ngram_size=model.config.no_repeat_ngram_size
    )

    # Output length is capped by model.config.max_length during generation; decoding itself does not truncate
    batch["predicted_abstract"] = tokenizer.batch_decode(predicted_abstract_ids, skip_special_tokens=True)

    return batch
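
LED uses local (windowed) attention by default, and `global_attention_mask` marks the tokens that attend to the whole sequence: 1 means global, 0 means local. The function above puts global attention on the first token (`<s>`) only. The sketch below reproduces that mask layout with plain Python lists instead of tensors; it is an illustration with a hypothetical helper name, not the code the model runs.

```python
def build_global_attention_mask(attention_mask):
    """Return a mask with the same shape that is 1 at position 0 of each row, 0 elsewhere.

    Mirrors torch.zeros_like(attention_mask) followed by mask[:, 0] = 1.
    """
    return [
        [1 if position == 0 else 0 for position in range(len(row))]
        for row in attention_mask
    ]

# Two padded sequences of length 4 (0 marks padding in the ordinary attention mask)
attention_mask = [[1, 1, 1, 0], [1, 1, 1, 1]]
print(build_global_attention_mask(attention_mask))
```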

Here we document the different strategies we tried.

In [None]:
strategies = ['Abstraction', 'Extraction TF-IDF\\Abstraction'] #index 0 means abstraction only; the backslash is escaped to avoid an invalid escape sequence
base_models =['allenai/led-base-16384', 'allenai/led-large-16384']
token_strategies = ['None', 'Remove Stop Words', 'Add New Tokens', 'Remove Stop Words AND Add New Tokens'] # for now is None
frozen_layers =['no', 'yes'] #default is no


####Saving Results

In [None]:


def save_validation_results(csv_name, name, trained_model_name,base_model, validation_data_size, strategy, token_strategies, frozen_layers, max_input_token_length, \
                            max_output_token_length, epochs, num_beams, max_length, length_penalty, early_stopping, \
                            no_repeat_ngram_size, rouge1, rouge2, bertscore_precision, bertscore_recall, bertscore_f1, note):
    # Define the CSV file path
    csv_file_path = f"/content/drive/MyDrive/W266_NLP/Project2/summarization/generation_trials/{csv_name}.csv"

    # Define the data to be saved
    data = {
        "name": [name],
        "trained_model_name": [trained_model_name],
        "base_model":[base_model],
        "validation_data_size": [validation_data_size],
        "strategy": [strategy],
        "token_strategy": [token_strategies],
        "freeze_layers": [frozen_layers],
        "max_input_token_length": [max_input_token_length],
        "max_output_token_length": [max_output_token_length],
        "epochs": [epochs],
        "num_beams": [num_beams],
        "max_length": [max_length],
        "length_penalty": [length_penalty],
        "early_stopping": [early_stopping],
        "no_repeat_ngram_size": [no_repeat_ngram_size],
        "rouge1": [rouge1],
        "rouge2": [rouge2],
        "bertscore_precision": [bertscore_precision],
        "bertscore_recall": [bertscore_recall],
        "bertscore_f1": [bertscore_f1],
        "note":[note]
    }

    # Convert the data to a DataFrame
    df = pd.DataFrame(data)

    # If the CSV file exists, load it and append the new data
    if os.path.exists(csv_file_path):
        df_existing = pd.read_csv(csv_file_path)
        df = pd.concat([df_existing, df], ignore_index=True)

    # Save the DataFrame to the CSV file
    df.to_csv(csv_file_path, index=False)
    return csv_file_path
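
The append-or-create pattern used by `save_validation_results` can also be written with the standard-library `csv` module. The sketch below is a simplified stand-in rather than the notebook's function: the helper names and the demo file are hypothetical, and it assumes every row has the same columns.

```python
import csv
import os
import tempfile

def append_result_row(csv_path, row):
    """Append one result row, writing the header only when the file is new."""
    is_new_file = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row.keys()))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)

def read_rows(csv_path):
    """Read every row back as a dict (all values come back as strings)."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))

# Demo in a temporary directory so nothing on Drive is touched
demo_path = os.path.join(tempfile.mkdtemp(), "demo_trials.csv")
append_result_row(demo_path, {"name": "run_a", "num_beams": 2})
append_result_row(demo_path, {"name": "run_b", "num_beams": 5})
print(read_rows(demo_path))
```

Unlike the pandas version, this never loads the existing file, which helps once the trial log grows; the trade-off is that it cannot reconcile rows with differing columns.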


`max_output_token_length` remains the same for all the models.

In [None]:
max_output_token_length = 196

###Generating Summaries

####Load Model

In [None]:
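import gc
import torch

#clean memory: free a previously loaded model (skipped on a fresh session, where `del model` would raise NameError)
if "model" in globals():
    del model  #delete model
gc.collect()
!nvidia-smi
torch.cuda.empty_cache() #clear memory cache
!nvidia-smi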

"allenai/led-large-16384" Longformer LED model with abstractive approach using native data  with 8K tokens:"e_l_longformer_8k_plain1/checkpoint-750/", but trained on less than one epoch at this point.

Run metadata: all of these parameters are written to a CSV file for later use.

In [None]:
csv_name = 'e_lm_longformer_8k_abstr-2200_bert_score_final' #name of file
investigation_name = 'large led 8K configuration with 2200 steps 1.77 epochs' #name this investigation
base_model = base_models[1] #name of base model used for fine tuning: 1 is large
token_strategy = token_strategies[0] #specify if any additional token processing was done, right now is None
freeze_layers = frozen_layers[0] #default is index 0 ('no'): layers stay unfrozen during training
validation_data_size = val_dataset.shape[0] #size of data
strategy = strategies[0] #0 is 'Abstraction' only
load_dir = "/content/drive/MyDrive/W266_NLP/Project2/summarization/e_l_longformer_8k_plain3/checkpoint-2200/"
name = 'e_lm_longformer_8k_abstr' #name of trained model

In [None]:
#add a note here, for this particular run
note ='using Bert Score for final model evaluation with num_beams = 5, and length_penalty = 1.25'

####Load the Model and Tokenizer

In [None]:
max_input_token_length = 8000 # this is used in generate_answer function

# Load the fine-tuned model
model = LEDForConditionalGeneration.from_pretrained(load_dir).to("cuda").half()
# Load the tokenizer
tokenizer = LEDTokenizer.from_pretrained(load_dir)



####Define Range for Hyperparameter Sweep

Define the generation hyperparameters. The larger the range swept for each parameter, the more iterations there are and the longer the run takes; here each parameter has a single value.

In [None]:
val_dataset.shape

(250, 2)

In [None]:
num_beams = 5 #was 2
max_length = 196 #was 512, our targets are 190 max tokens at most
min_length = 26  #was 100, our targets are min 26 tokens
length_penalty = 1.25 #was 2.0
early_stopping = False
no_repeat_ngram_size = 2
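
To see why wider ranges lengthen a sweep, note that the number of runs is the product of the number of values per hyperparameter. The ranges below are hypothetical and chosen only for illustration; this notebook itself evaluates one configuration.

```python
from itertools import product

# Hypothetical sweep ranges, for illustration only
sweep = {
    "num_beams": [2, 5],
    "length_penalty": [1.0, 1.25, 2.0],
    "early_stopping": [False, True],
    "no_repeat_ngram_size": [2, 3],
}

def expand_grid(ranges):
    """Return one dict per combination of hyperparameter values."""
    names = list(ranges)
    return [dict(zip(names, values)) for values in product(*ranges.values())]

configs = expand_grid(sweep)
print(len(configs))   # 2 * 3 * 2 * 2 combinations
print(configs[0])
```

Each combination requires a full generation pass over the validation subset, so the total run time grows multiplicatively with every extra value.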

####Start the Evaluation

In [None]:
i=1 #sets which entry we want to sample for review


#start file save
summaries =[]
# Load rouge
rouge = load_metric("rouge")

 # Set the hyperparameters
model.config.num_beams = num_beams
model.config.max_length = max_length #was 512, our targets are 190 max tokens at most
model.config.min_length = min_length  #was 100, our targets are min 26 tokens
model.config.length_penalty = length_penalty #was 2.0
model.config.early_stopping = early_stopping
model.config.no_repeat_ngram_size = no_repeat_ngram_size

# Run the model and compute the rouge metrics
result = val_dataset.map(generate_answer, batched=True, batch_size=2)
# Compute both ROUGE types in a single pass over the predictions
rouge_output = rouge.compute(predictions=result["predicted_abstract"], references=result["target"], rouge_types=["rouge1", "rouge2"])
rouge1 = rouge_output["rouge1"].mid
rouge2 = rouge_output["rouge2"].mid

# Calculate BERTScore
P, R, F1 = score(result["predicted_abstract"], result["target"], lang="en", verbose=False)
bertscore_precision = P.mean().item()
bertscore_recall = R.mean().item()
bertscore_f1 = F1.mean().item()

# Save the results
file_path = save_validation_results(csv_name, name, investigation_name, base_model, validation_data_size, strategy, token_strategy, freeze_layers, \
                                    max_input_token_length, max_output_token_length, '', \
                                    model.config.num_beams, model.config.max_length, model.config.length_penalty, model.config.early_stopping, \
                                    model.config.no_repeat_ngram_size, rouge1, rouge2,\
                                    bertscore_precision, bertscore_recall, bertscore_f1, note)

print("------------------Results ------------------")
# Print the results for this combination of hyperparameters
print(f"\n num_beams: {num_beams}, length_penalty: {length_penalty}, early_stopping: {early_stopping}, no_repeat_ngram_size: {no_repeat_ngram_size}")
print(f"rouge1: {rouge1}")
print(f"rouge2: {rouge2}")
print(f"BERTScore Precision: {bertscore_precision:.4f}, Recall: {bertscore_recall:.4f}, F1 Score: {bertscore_f1:.4f}")
print(f"Results saved to: {file_path}")

print("------------------End ------------------")

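#save the generated summaries
df = pd.DataFrame(result)

# Write the predictions to their own file. The "-predictions" suffix is our choice here,
# so that the metrics CSV written by save_validation_results is not overwritten.
save_to = f"/content/drive/MyDrive/W266_NLP/Project2/summarization/generation_trials/{csv_name}-predictions.csv"
df.to_csv(save_to, index=False)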


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


------------------Results ------------------

 num_beams: 5, length_penalty: 1.25, early_stopping: False, no_repeat_ngram_size: 2
rouge1: Score(precision=0.28118242084627043, recall=0.24805944036178496, fmeasure=0.23982735870303276)
rouge2: Score(precision=0.057698844304712345, recall=0.05040613194398993, fmeasure=0.04807903161921104)
BERTScore Precision: 0.8585, Recall: 0.8427, F1 Score: 0.8503
Results saved to: /content/drive/MyDrive/W266_NLP/Project2/summarization/generation_trials/e_lm_longformer_8k_abstr-2200_bert_score_final.csv
------------------End ------------------
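
To make the precision, recall, and fmeasure fields above concrete, here is a simplified ROUGE-N sketch. It assumes lowercased whitespace tokenization, whereas the `rouge_score` package applies its own tokenizer and optional stemming, so the numbers will not match the library exactly. The function names are hypothetical.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count the n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(prediction, reference, n=1):
    """Return (precision, recall, fmeasure) from clipped n-gram overlap."""
    pred = ngram_counts(prediction, n)
    ref = ngram_counts(reference, n)
    overlap = sum((pred & ref).values())  # each n-gram counts at most min(pred, ref) times
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    fmeasure = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure

print(rouge_n("the cat sat", "the cat ran fast", n=1))
print(rouge_n("the cat sat", "the cat ran fast", n=2))
```

Precision is the share of predicted n-grams that appear in the reference, and recall is the share of reference n-grams recovered by the prediction, which is why a short, generic summary can score reasonable precision but low recall.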


In [None]:
rouge1

Score(precision=0.22904290468143468, recall=0.3232251029186865, fmeasure=0.2368903041297903)

In [None]:
rouge2

Score(precision=0.04842935222313299, recall=0.06997713370904665, fmeasure=0.049703169207369204)

In [None]:
bertscore_recall

NameError: name 'bertscore_recall' is not defined

In [None]:
df = pd.DataFrame(result)
df.head()

Unnamed: 0,abstract,predicted_abstract,target
0,"[""ABSTRACT A healthy intestinal microbiota is ...",There was no significant difference between pr...,Current evidence from systematic review and me...
1,['The effects of the soluble fiber konjac gluc...,There was no significant difference between th...,The use of glucomannan did not appear to signi...
2,['The aims of this study were 1 ) to evaluate ...,There is currently no evidence to support the ...,Ensuring that the characteristics of the histo...
3,"[""Abstract . This study documented postoperati...",There was no significant difference between th...,The QT autograft detected comparable rate of L...
4,['OBJECTIVES To investigate the effects of dar...,There was no significant association between t...,medicines with anti-cholinergic properties hav...


In [None]:
df.predicted_abstract.iloc[2]

'There is currently no evidence to support the use of dexamethasone or acetazolamide as prophylaxis for acute mountain sickness.'

In [None]:
df.target.iloc[2]

'Ensuring that the characteristics of the history and future ascents are similar may improve the clinical utility of AMS history'

In [None]:
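# Save the predictions to their own file (the "-predictions" suffix is our choice) so the metrics CSV is not overwritten
save_to = f"/content/drive/MyDrive/W266_NLP/Project2/summarization/generation_trials/{csv_name}-predictions.csv"
df.to_csv(save_to, index=False)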

In [None]:
i = 1 #whichever sample summary you want to see

In [None]:
result["predicted_abstract"][i]

'There was no significant difference between the two groups in fasting blood glucose, HDL-C, and LDL cholesterol.\nThere were no statistically significant differences in body weight and body fatness between glucomannan and placebo, but there was a significant increase in the risk of nausea, vomiting, diarrhea, dizziness, abdominal pain, nausea and vomiting with the use of the drug.'

In [None]:
result["target"][i]

'The use of glucomannan did not appear to significantly alter any other study endpoints .\nPediatric patients , patients receiving dietary modification , and patients with impaired glucose metabolism did not benefit from glucomannan to the same degree .\nGlucomannan appears to beneficially affect total cholesterol , LDL cholesterol , triglycerides , body weight , and FBG , but not HDL cholesterol or BP'

Is one rouge1 and rouge2 score pair better than any other?

###Results

####Take a Peek

Let's take a peek at how the different hyperparameter settings affect the generated summary.

Please note which of the above are good results

####Write Sample Summaries to Drive

In [None]:
# save the summaries

csv_file_path = file_path  # returned by save_validation_results in the evaluation cell


text_file_path = f"/content/drive/MyDrive/W266_NLP/Project2/summarization/generation_trials/{csv_name}-summaries.txt"

# Write the content to the text file
with open(text_file_path, 'w') as file:
    file.write("-----Below are sample summaries generated with the following settings:----\n")
    file.write(f"name: {csv_name}\n")
    file.write(f"investigation_name: {investigation_name}\n")
    file.write(f"base_model: {base_model}\n")
    file.write(f"token_strategy: {token_strategy}\n")
    file.write(f"freeze_layers: {freeze_layers}\n")
    file.write(f"validation_data_size: {validation_data_size}\n")
    file.write(f"strategy: {strategy}\n")
    file.write(f"load_dir: {load_dir}\n")
    file.write(f"csv_file_path: {csv_file_path}\n")
    file.write("----Generation hyperparameters------------\n")
    file.write(f"num_beams: {num_beams}\n")
    file.write(f"length_penalty: {length_penalty}\n")
    file.write(f"early_stopping: {early_stopping}\n")
    file.write(f"no_repeat_ngram_size: {no_repeat_ngram_size}\n")
    file.write(f"bertscore_f1: {bertscore_f1}---rouge1 fmeasure: {rouge1.fmeasure}---rouge2 fmeasure: {rouge2.fmeasure}\n\n")

    # This notebook runs a single configuration, so the samples come straight from `result`
    # (the sweep-only names num_beams_values, sample_summaries, etc. are not defined here)
    predictions = result["predicted_abstract"]
    targets = result["target"]
    for i, (prediction, target) in enumerate(zip(predictions, targets)):
        file.write(f"------------------Sample {i}-----------------------\n")
        file.write("------------------Generated Summary------------------------\n")
        file.write(f"{prediction}\n")
        file.write("------------------Original Summary------------------------\n")
        file.write(f"{target}\n\n")


#End of File