## Data Collection 🛠

The SubjQA dataset is built from publicly available review datasets. Specifically, the movies, books, electronics, and grocery categories use reviews from the Amazon Review dataset. The TripAdvisor category, as the name suggests, uses reviews from TripAdvisor, which can be found [here](link). Finally, the restaurants category uses the Yelp Dataset, which is also publicly available.

The process of constructing SubjQA is discussed in detail in our paper. In a nutshell, the dataset construction consists of the following steps:

1. First, all opinions expressed in reviews are extracted. In the pipeline, each opinion is modeled as a (modifier, aspect) pair, that is, a pair of spans where the former describes the latter *(e.g., ("good", "hotel") and ("terrible", "acting"))*.
2. Using matrix factorization techniques, implication relationships between the expressed opinions are mined. For instance, the system learns that "responsive keys" implies "good keyboard". In our pipeline, we refer to the conclusion of an implication (i.e., "good keyboard" in this example) as the query opinion, and to the premise (i.e., "responsive keys") as its neighboring opinion.
3. Annotators are then asked to write a question based on query opinions. For instance, given "good keyboard" as the query opinion, they might write "Is this keyboard any good?"
4. Each question written based on a query opinion is then paired with a review that mentions its neighboring opinion. In our example, that would be a review that mentions "responsive keys".
5. The question and review pairs are presented to annotators to select the correct answer span, and rate the subjectivity level of the question as well as the subjectivity level of the highlighted answer span.
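
The pairing logic in steps 2 to 4 can be sketched as follows. This is only an illustration: the `Opinion` type, the `implications` and `questions` dictionaries, and the sample reviews are hypothetical stand-ins, and the real pipeline mines implications with matrix factorization rather than a hand-written dictionary.

```python
from typing import NamedTuple


class Opinion(NamedTuple):
    modifier: str
    aspect: str


# Hypothetical mined implications: neighboring opinion (premise) -> query opinion (conclusion).
implications = {
    Opinion("responsive", "keys"): Opinion("good", "keyboard"),
}

# Hypothetical annotator-written questions, keyed by query opinion.
questions = {
    Opinion("good", "keyboard"): "Is this keyboard any good?",
}

# Hypothetical reviews, each tagged with the opinions extracted from it.
reviews = [
    ("The keys are responsive and quiet.", [Opinion("responsive", "keys")]),
    ("Battery life is short.", [Opinion("short", "battery life")]),
]


def pair_questions_with_reviews(implications, questions, reviews):
    """Pair each question with the reviews that mention its neighboring opinion."""
    pairs = []
    for review_text, opinions in reviews:
        for neighbor in opinions:
            query = implications.get(neighbor)
            if query is not None and query in questions:
                pairs.append((questions[query], review_text))
    return pairs


pairs = pair_questions_with_reviews(implications, questions, reviews)
print(pairs)
```

Only the first review is paired, because it is the only one that mentions a neighboring opinion with a known query opinion.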

## Data Format 📊

All files are in standard CSV format, and they consist of the following columns:

- **domain**: The category/domain of the review (e.g., hotels, books, ...).
- **question**: The question (written based on a query opinion).
- **review**: The review (that mentions the neighboring opinion).
- **human_ans_spans**: The span labeled by annotators as the answer.
- **human_ans_indices**: The (character-level) start and end indices of the answer span highlighted by annotators.
- **question_subj_level**: The subjectivity level of the question (on a 1 to 5 scale with 1 being the most subjective).
- **ques_subj_score**: The subjectivity score of the question computed using the TextBlob package.
- **is_ques_subjective**: A boolean subjectivity label derived from question_subj_level (i.e., scores below 4 are considered subjective).
- **answer_subj_level**: The subjectivity level of the answer span (on a 1 to 5 scale with 1 being the most subjective).
- **ans_subj_score**: The subjectivity score of the answer span computed using the TextBlob package.
- **is_ans_subjective**: A boolean subjectivity label derived from answer_subj_level (i.e., scores below 4 are considered subjective).
- **nn_mod**: The modifier of the neighboring opinion (which appears in the review).
- **nn_asp**: The aspect of the neighboring opinion (which appears in the review).
- **query_mod**: The modifier of the query opinion (around which a question is manually written).
- **query_asp**: The aspect of the query opinion (around which a question is manually written).
- **item_id**: The id of the item/business discussed in the review.
- **review_id**: A unique id associated with the review.
- **q_review_id**: A unique id assigned to the question-review pair.
- **q_reviews_id**: A unique id assigned to all question-review pairs with a shared question.
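
As a quick illustration of how `human_ans_indices` relates to `review` and `human_ans_spans`, the sketch below parses the index string and slices the review with it. It assumes the indices are stored as a string such as `'(251, 265)'`; the helper name `extract_answer_span` and the toy row are hypothetical.

```python
import ast


def extract_answer_span(review, human_ans_indices):
    """Parse a '(start, end)' string and slice the review with it."""
    start, end = ast.literal_eval(human_ans_indices)
    return review[start:end]


# Toy row (hypothetical values) laid out like the CSV columns described above.
row = {
    "review": "Great plot. this show is OUTSTANDING",
    "human_ans_spans": "this show is OUTSTANDING",
    "human_ans_indices": "(12, 36)",
}

span = extract_answer_span(row["review"], row["human_ans_indices"])
print(span == row["human_ans_spans"])
```

`ast.literal_eval` is used here instead of chained `split` calls because it parses the tuple in one step and rejects anything that is not a plain literal.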

### Citation
Johannes Bjerva, Nikita Bhutani, Behzad Golshan, Wang-Chiew Tan, and Isabelle Augenstein. (2020). SubjQA: A Dataset for Subjectivity and Review Comprehension. In *Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing*. Association for Computational Linguistics.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# !pip install datasets evaluate transformers[sentencepiece]
# !pip install accelerate
# !apt install git-lfs

In [None]:
from google.colab import userdata

# Retrieve the value of the Colab secret named 'HuggingFace'
secret_name = userdata.get('HuggingFace')

# Set up Git configuration
!git config --global credential.helper store
!git config --global user.email "kagantimur@icloud.com"
!git config --global user.name "kgntmr"

# Log in to the Hugging Face Hub
!huggingface-cli login



    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) y
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
from datasets import load_dataset
import datasets
from transformers import AutoTokenizer
import numpy as np

In [None]:
model = "deepset/roberta-base-squad2"
tokenizer = AutoTokenizer.from_pretrained(model)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
# A fast tokenizer is optimized for speed, which helps on large-scale NLP tasks.
# It is also needed later on, because offset mappings are only available with fast tokenizers.
tokenizer.is_fast

True

In [None]:
import pandas as pd
df_train = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/subjqa-train.csv')
df_test = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/subjqa-test.csv')

In [None]:
# Define the maximum length and stride parameters for tokenization
max_length = 384  # Maximum length of tokenized sequences, commonly used for a balance between context and memory usage
stride = 128  # Stride determines overlap between tokenized sequences, providing context while avoiding redundancy
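
To make the effect of `max_length` and `stride` concrete, here is a simplified sketch of the sliding-window arithmetic on a plain token count. It is not the tokenizer's own code: the helper `sliding_windows` is hypothetical, and the 20 tokens reserved for the question and special tokens is an assumed figure.

```python
def sliding_windows(n_tokens, window, stride):
    """Return (start, stop) token ranges; consecutive windows share `stride` tokens."""
    if stride >= window:
        raise ValueError("stride must be smaller than window")
    step = window - stride
    ranges = []
    start = 0
    while True:
        stop = min(start + window, n_tokens)
        ranges.append((start, stop))
        if stop == n_tokens:
            break
        start += step
    return ranges


# Assumed: 384 tokens in total, minus about 20 for the question and special tokens.
context_window = 384 - 20
windows = sliding_windows(n_tokens=800, window=context_window, stride=128)
print(windows)
```

In this toy case an 800-token review yields three chunks, and each chunk repeats the last 128 tokens of the previous one so that an answer near a boundary is still seen whole in at least one chunk.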

In [None]:
def preprocess_training_examples(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length,
        truncation="only_second",
        stride=stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    sample_map = inputs.pop("overflow_to_sample_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        sample_idx = sample_map[i]
        answer = answers[sample_idx]
        start_char = answer["answer_start"][0]
        end_char = answer["answer_start"][0] + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer is not fully inside the context, label is (0, 0)
        if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
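
The labeling step inside `preprocess_training_examples` can be tried in isolation on hand-written offsets. The sketch below repeats the same search as a standalone helper; the name `label_span` and the toy offsets are hypothetical and stand in for what the tokenizer would return.

```python
def label_span(offsets, context_start, context_end, start_char, end_char):
    """Return (start_token, end_token), or (0, 0) if the answer is not fully in this chunk."""
    if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
        return 0, 0

    # Walk forward to the last token that starts at or before the answer start.
    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1

    # Walk backward to the first token that ends at or after the answer end.
    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return start_token, end_token


# Toy chunk: token 0 is a special token, tokens 1-4 cover the context "the plot is weak".
offsets = [(0, 0), (0, 3), (4, 8), (9, 11), (12, 16)]
inside = label_span(offsets, 1, 4, start_char=9, end_char=16)    # answer "is weak"
outside = label_span(offsets, 1, 4, start_char=20, end_char=25)  # answer beyond this chunk
print(inside, outside)
```

The `(0, 0)` label points at the first token of the chunk, which is how chunks that do not contain the answer are marked.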

In [None]:
df_train.head()

Unnamed: 0,item_id,domain,nn_mod,nn_asp,query_mod,query_asp,q_review_id,q_reviews_id,question,question_subj_level,ques_subj_score,is_ques_subjective,review_id,review,human_ans_spans,human_ans_indices,answer_subj_level,ans_subj_score,is_ans_subjective
0,B00BVMXBDO,movies,addictive,show,full,series,d9a9615d45df2f6e6108db4ca46bfded,399f1046fe6bd97990107f9d7aa86f4a,Who is the author of this series?,1,0.0,False,090671369dddfeb02db9bf7125a47c79,Whether it be in her portrayal of a nerdy lesb...,ANSWERNOTFOUND,"(251, 265)",1,0.0,False
1,1404918051,movies,enough simple,film,charming,movie,06ffe37a8023636a3ce00b020a517e87,42d9dd5b0c67150cac1e13308811cbb5,Can we enjoy the movie along with our family ?,1,0.5,False,a29821121e74d319cb93f77101e99c88,"An outstanding romantic comedy, 13 Going on 30...",ANSWERNOTFOUND,"(1195, 1209)",1,0.0,False
2,B0000633ZP,movies,weak,plot,bad,one,3b625c68e91b9e6987a08b84a9a9d234,32d06ccf2132cda644aea791fa688c53,Does this one good?,5,0.6,True,12a1b821f761bd19a75be7b16cef4a7c,"To let the truth be known, I watched this movi...",ANSWERNOTFOUND,"(1476, 1490)",5,0.0,False
3,B0000AQS0F,movies,outstanding,show,wonderful,series,f3abfa98b011127e7cb49bcd07f8deeb,e546636f0bb9f93d5f24b4ade9ebab45,Is this series good and excelent?,1,0.6,True,cd0f92322e67cc9d70de6674caace78c,"At the time of my review, there had been 910 c...",this show is OUTSTANDING,"(296, 320)",1,0.875,True
4,B003Y5H5FG,movies,great,production design,great,costume design,1b03744e764b257592c2c768345c14bc,a0a97e460a194bcb3286fe68d20aadc2,How is the costume design?,1,0.0,False,f6b5024393ebc70287befdaf47a50b75,"""Fright Night"" is great! This is how the story...",The costume design by Susan Matheson is great,"(1254, 1299)",1,0.75,True


In [None]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2501 entries, 0 to 2500
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   item_id              2501 non-null   object 
 1   domain               2501 non-null   object 
 2   nn_mod               2501 non-null   object 
 3   nn_asp               2501 non-null   object 
 4   query_mod            2501 non-null   object 
 5   query_asp            2501 non-null   object 
 6   q_review_id          2501 non-null   object 
 7   q_reviews_id         2501 non-null   object 
 8   question             2501 non-null   object 
 9   question_subj_level  2501 non-null   int64  
 10  ques_subj_score      2501 non-null   float64
 11  is_ques_subjective   2501 non-null   bool   
 12  review_id            2501 non-null   object 
 13  review               2501 non-null   object 
 14  human_ans_spans      2501 non-null   object 
 15  human_ans_indices    2501 non-null   o

In [None]:
df_train.columns

Index(['item_id', 'domain', 'nn_mod', 'nn_asp', 'query_mod', 'query_asp',
       'q_review_id', 'q_reviews_id', 'question', 'question_subj_level',
       'ques_subj_score', 'is_ques_subjective', 'review_id', 'review',
       'human_ans_spans', 'human_ans_indices', 'answer_subj_level',
       'ans_subj_score', 'is_ans_subjective'],
      dtype='object')

## Checking the questions and answers
- Let's check the questions and answers against the `human_ans_indices` column.

In [None]:
df_train.iloc[0].question

'Who is the author of this series?'

In [None]:
df_train.iloc[0].review

"Whether it be in her portrayal of a nerdy lesbian or a punk rock rebel, Maslany's plural personalities, (though very stereotypical), are entertaining eye-candy. Combined with a complex and unpredictable plot line, this show is surprisingly addictive. ANSWERNOTFOUND"

In [None]:
df_train.iloc[0].human_ans_indices

'(251, 265)'

In [None]:
df_train.iloc[0].review[251:265]

'ANSWERNOTFOUND'

In [None]:
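# Keep only the columns needed for training and evaluation.
# .copy() avoids pandas' SettingWithCopyWarning when columns are added later.
df_train = df_train[['question', 'human_ans_indices', 'review', 'human_ans_spans']].copy()
df_test = df_test[['question', 'human_ans_indices', 'review', 'human_ans_spans']].copy()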

In [None]:
# Generate sequential IDs, one per row (np.linspace returns floats: 0.0, 1.0, ...)
df_train['id']=np.linspace(0,len(df_train)-1,len(df_train))
df_test['id']=np.linspace(0,len(df_test)-1,len(df_test))

In [None]:
df_train['id']=df_train['id'].astype(str)
df_test['id']=df_test['id'].astype(str)

In [None]:
int(df_train.iloc[0].human_ans_indices.split('(')[1].split(',')[0])

251

In [None]:
float(df_train.iloc[0].human_ans_indices.split('(')[1].split(',')[1].split(' ')[1].split(')')[0])

265.0

In [None]:
# Initialize the 'answers' column with the annotated answer text
df_train['answers']=df_train['human_ans_spans']
# For the test set, 'answers' holds the annotated answer text used as the reference
df_test['answers']=df_test['human_ans_spans']

In [None]:
# For each row, extract the answer text and its start index and store them as a dict in 'answers'
for i in range(0,len(df_train)):
  answer1={}
  si=int(df_train.iloc[i].human_ans_indices.split('(')[1].split(',')[0])
  ei=int(df_train.iloc[i].human_ans_indices.split('(')[1].split(',')[1].split(' ')[1].split(')')[0])
  answer1['text']=[df_train.iloc[i].review[si:ei]]
  answer1['answer_start']=[si]
  df_train.at[i, 'answers']=answer1

In [None]:
print(df_train.iloc[i].answers,df_train.iloc[i].human_ans_spans)

{'text': ['ANSWERNOTFOUND'], 'answer_start': [801]} ANSWERNOTFOUND


In [None]:
df_train.columns

Index(['question', 'human_ans_indices', 'review', 'human_ans_spans', 'id',
       'answers'],
      dtype='object')

In [None]:
df_train.columns=['question', 'human_ans_indices', 'context', 'human_ans_spans', 'id',
       'answers']

df_test.columns=['question', 'human_ans_indices', 'context', 'human_ans_spans','id',
       'answers']

In [None]:
val_dataset2 = datasets.Dataset.from_pandas(df_test)
train_dataset2 = datasets.Dataset.from_pandas(df_train)

In [None]:
# Apply the preprocessing function to the training dataset with .map()
train_dataset = train_dataset2.map(
    preprocess_training_examples,
    batched=True,
    remove_columns=train_dataset2.column_names,
)
# Compare the number of original examples with the number of tokenized features
len(train_dataset2), len(train_dataset)

Map:   0%|          | 0/2501 [00:00<?, ? examples/s]

(2501, 4862)

In [None]:
train_dataset2.shape

(2501, 6)

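The progress output shows that all 2501 examples were processed in about 10 seconds, at 260.48 examples per second. The resulting dataset has 4862 entries, more than the 2501 input examples, because reviews longer than `max_length` tokens are split into several overlapping chunks.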

In [None]:
def preprocess_validation_examples(examples):
    # Cleaning the questions by stripping leading and trailing whitespace for consistency
    questions = [q.strip() for q in examples["question"]]

    # Tokenization; converting questions and contexts into numerical IDs, enabling the model to understand
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length, # Total length of the input sequence
        truncation="only_second", # If the total length exceeds max_length, only the context will be truncated
        stride=stride, # Overlap between the chunks
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    # Extracting overflow_to_sample_mapping
    sample_map = inputs.pop("overflow_to_sample_mapping")
    example_ids = []

    # Looping over the tokenized inputs
    for i in range(len(inputs["input_ids"])):
        sample_idx = sample_map[i]
        example_ids.append(examples["id"][sample_idx]) # Retrieving example IDs

        # Adjusting offset mapping based on sequence IDs
        sequence_ids = inputs.sequence_ids(i)
        offset = inputs["offset_mapping"][i]
        inputs["offset_mapping"][i] = [
            o if sequence_ids[k] == 1 else None for k, o in enumerate(offset)
        ]

    # Adding example IDs to the tokenized inputs
    inputs["example_id"] = example_ids
    return inputs
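
The offset masking done in `preprocess_validation_examples` can be seen on a toy feature. The helper name `mask_non_context_offsets` and the sample values below are hypothetical; the point is only that offsets outside the context (sequence id 1) become `None`, so later post-processing can skip candidate positions in the question or padding.

```python
def mask_non_context_offsets(offsets, sequence_ids):
    """Keep offsets only for context tokens (sequence id 1); all others become None."""
    return [o if sid == 1 else None for o, sid in zip(offsets, sequence_ids)]


# Toy feature: special token, one question token, special token,
# two context tokens, special token.
offsets = [(0, 0), (0, 3), (0, 0), (0, 5), (6, 10), (0, 0)]
sequence_ids = [None, 0, None, 1, 1, None]

masked = mask_non_context_offsets(offsets, sequence_ids)
print(masked)
```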

In [None]:
# Preprocess the validation dataset by applying the preprocess_validation_examples function to each example
validation_dataset = val_dataset2.map(
    preprocess_validation_examples,  # Function to preprocess each example
    batched=True,  # Process examples in batches for efficiency
    remove_columns=val_dataset2.column_names,  # Remove unnecessary columns from the dataset
)

# Calculate the length of the preprocessed validation dataset
len(validation_dataset)

Map:   0%|          | 0/582 [00:00<?, ? examples/s]

1104

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model)

In [None]:
import torch
from transformers import AutoModelForQuestionAnswering
from tqdm.auto import tqdm
import collections
import evaluate

metric = evaluate.load("squad")

In [None]:
# # Compute metrics based on model predictions and reference answers
# def compute_metrics(start_logits, end_logits, features, examples):
#     # Dictionary to map example IDs to their corresponding features
#     example_to_features = collections.defaultdict(list)
#     for idx, feature in enumerate(features):
#         example_to_features[feature["example_id"]].append(idx)

#     # List to store predicted answers
#     predicted_answers = []
#     # Iterate over each example
#     for example in tqdm(examples):
#         example_id = example["id"]
#         context = example["context"]
#         answers = []

#         # Loop through all features associated with that example
#         for feature_index in example_to_features[example_id]:
#             start_logit = start_logits[feature_index]
#             end_logit = end_logits[feature_index]
#             offsets = features[feature_index]["offset_mapping"]

#             # Select top n_best start and end indices based on logits
#             start_indexes = np.argsort(start_logit)[-1 : -n_best - 1 : -1].tolist()
#             end_indexes = np.argsort(end_logit)[-1 : -n_best - 1 : -1].tolist()
#             # Generate candidate answers
#             for start_index in start_indexes:
#                 for end_index in end_indexes:
#                     # Skip answers that are not fully in the context
#                     if offsets[start_index] is None or offsets[end_index] is None:
#                         continue
#                     # Skip answers with a length outside the range [0, max_answer_length]
#                     if (
#                         end_index < start_index
#                         or end_index - start_index + 1 > max_answer_length
#                     ):
#                         continue

#                     # Extract answer text and its combined logit score
#                     answer = {
#                         "text": context[offsets[start_index][0] : offsets[end_index][1]],
#                         "logit_score": start_logit[start_index] + end_logit[end_index],
#                     }
#                     answers.append(answer)

#         # Select the answer with the best score
#         if len(answers) > 0:
#             best_answer = max(answers, key=lambda x: x["logit_score"])
#             predicted_answers.append(
#                 {"id": example_id, "prediction_text": best_answer["text"]}
#             )
#         else:
#             predicted_answers.append({"id": example_id, "prediction_text": ""})

#     # Create theoretical answers for each example
#     theoretical_answers = [{"id": ex["id"], "answers": ex["answers"]} for ex in examples]
#     # Compute metrics based on predicted and theoretical answers
#     return metric.compute(predictions=predicted_answers, references=theoretical_answers)

In [None]:
def compute_metrics(start_logits, end_logits, features, examples):
    # Initialize a defaultdict to map example IDs to their corresponding feature indices
    example_to_features = collections.defaultdict(list)
    for idx, feature in enumerate(features):
        example_to_features[feature["example_id"]].append(idx)

    # List to store the formatted predicted answers
    predicted_answers = []

    # Placeholder values for n_best and max_answer_length, adjust as necessary
    n_best = 20  # Example value, adjust according to your needs
    max_answer_length = 30  # Example value, adjust according to your needs

    # Process each example to generate predictions
    for example in tqdm(examples):
        example_id = example["id"]
        context = example["context"]
        answers = []

        # Iterate through all features linked to the current example
        for feature_index in example_to_features[example_id]:
            start_logit = start_logits[feature_index]
            end_logit = end_logits[feature_index]
            offsets = features[feature_index]["offset_mapping"]

            # Determine top n_best start and end positions
            start_indexes = np.argsort(start_logit)[-1: -n_best - 1: -1].tolist()
            end_indexes = np.argsort(end_logit)[-1: -n_best - 1: -1].tolist()

            # Generate candidate answers based on top start/end positions
            for start_index in start_indexes:
                for end_index in end_indexes:
                    # Validate answer positions
                    if offsets[start_index] is None or offsets[end_index] is None:
                        continue
                    if end_index < start_index or end_index - start_index + 1 > max_answer_length:
                        continue

                    # Formulate the answer and score
                    answer = {
                        "text": context[offsets[start_index][0]: offsets[end_index][1]],
                        "logit_score": start_logit[start_index] + end_logit[end_index],
                    }
                    answers.append(answer)

        # Choose the best answer for the current example
        if answers:
            best_answer = max(answers, key=lambda x: x["logit_score"])
            predicted_answers.append({"id": example_id, "prediction_text": best_answer["text"]})
        else:
            predicted_answers.append({"id": example_id, "prediction_text": ""})

    # Correctly format the references from the examples dataset
    references_corrected = []
    for ex in examples:
        # Split answers if needed and create the correct format
        individual_answers = [{'text': ans, 'answer_start': 0} for ans in ex['answers'].split('|')]  # Adjust splitting mechanism as needed
        references_corrected.append({
            'id': ex['id'],
            'answers': individual_answers
        })

    # Compute the evaluation metric using the formatted predictions and references
    return metric.compute(predictions=predicted_answers, references=references_corrected)
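
The span search at the core of `compute_metrics` can be tried on toy logits. The sketch below is a plain-Python rendering of the same idea (top `n_best` start and end positions, filtered by order and length), not the notebook's exact code; `top_indices` and `best_span` are hypothetical helper names.

```python
def top_indices(scores, n_best):
    """Indices of the n_best largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_best]


def best_span(start_logit, end_logit, n_best=20, max_answer_length=30):
    """Return (start, end, score) of the highest-scoring valid span, or None."""
    best = None
    for s in top_indices(start_logit, n_best):
        for e in top_indices(end_logit, n_best):
            # Skip spans that end before they start or that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logit[s] + end_logit[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


start_logit = [0.1, 2.0, 0.3, 0.2]
end_logit = [0.0, 0.5, 3.0, 1.0]
result = best_span(start_logit, end_logit, n_best=2, max_answer_length=3)
print(result)
```

Restricting the search to the top `n_best` positions keeps the number of candidate pairs small, instead of scoring every possible start and end combination.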

In [None]:
model1 = AutoModelForQuestionAnswering.from_pretrained(model)

In [None]:
from transformers import TrainingArguments

In [None]:
args = TrainingArguments(
    "roberta-finetuned-subjqa-movies_2",
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=5,
    weight_decay=0.01,
    push_to_hub=False,
    report_to="all",
    fp16=False  # Set fp16 to False as it requires CUDA, NPU, or certain XPU devices with IPEX
)


In [None]:
# # Define training arguments for the model
# args = TrainingArguments(
#     "roberta-finetuned-subjqa-movies_2", # Output dir for saving model and checkpoints
#     evaluation_strategy="epoch", # Evaluate at the end of each epoch
#     save_strategy="epoch", # Save model at the end of each epoch to match evaluation timing
#     per_device_train_batch_size=8, # Training batch size per device
#     per_device_eval_batch_size=8, # Evaluation batch size per device
#     num_train_epochs=4, # Total number of training epochs
#     seed=42, # Seed for reproducibility
#     load_best_model_at_end=True # Load the best model based on evaluation at the end of training
# )

In [None]:
from transformers import Trainer

# Create a Trainer instance for model training
trainer = Trainer(
    model=model1,  # The model to be trained
    args=args,  # Training arguments defined earlier
    train_dataset=train_dataset,  # Training dataset
    eval_dataset=validation_dataset,  # Evaluation dataset
    tokenizer=tokenizer,  # Tokenizer for processing inputs
)

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


In [None]:
predictions, _, _ = trainer.predict(validation_dataset)
start_logits, end_logits = predictions
compute_metrics(start_logits, end_logits, validation_dataset, val_dataset2)

  0%|          | 0/582 [00:00<?, ?it/s]

{'exact_match': 2.0618556701030926, 'f1': 9.415323986809298}
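
To help read the `exact_match` and `f1` values above, here is a minimal sketch of SQuAD-style scoring for a single prediction. It follows the usual normalization (lowercasing, dropping punctuation and the articles a/an/the) and token-overlap F1, but it is a simplified stand-in rather than the `evaluate` implementation, and it returns fractions where the metric output above appears to be in percent.

```python
import collections
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))


def f1(prediction, truth):
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "this show is OUTSTANDING"
truth = "The show is outstanding!"
em_score = exact_match(prediction, truth)
f1_score = f1(prediction, truth)
print(em_score, round(f1_score, 3))
```

A prediction that overlaps the reference without matching it exactly scores 0 on exact match but still earns partial F1 credit, which is why the two numbers can differ widely.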

In [None]:
from transformers import TrainerCallback, TrainerControl, TrainerState
import matplotlib.pyplot as plt

In [None]:
class MetricsCaptureCallback(TrainerCallback):
    """
    Capture each epoch's evaluation metrics so the training process can be displayed
    """
    def __init__(self):
        super().__init__()
        self.metrics = {}  # Stores metrics in a dictionary

    def on_evaluate(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, metrics=None, **kwargs):
        for key, value in metrics.items():
            if key not in self.metrics:
                self.metrics[key] = []
            self.metrics[key].append(value)

In [None]:
# Initialize the callback
metrics_capture = MetricsCaptureCallback()

# Register the callback with the existing trainer
trainer.add_callback(metrics_capture)

In [None]:
# Proceed with training
trainer.train()

NameError: name 'trainer' is not defined

In [None]:
trainer.push_to_hub(commit_message="Done with The Training")

In [None]:
# Function to plot the captured metrics
def plot_metrics(metrics_dict, title='Model Performance Metrics Over Epochs'):
    plt.figure(figsize=(12, 8))
    for metric_name, values in metrics_dict.items():
        plt.plot(values, label=metric_name)
    plt.title(title)
    plt.xlabel('Epochs')
    plt.ylabel('Metric Value')
    plt.legend()
    plt.grid(True)
    plt.show()

In [None]:
# Function to print the final scores
def scores(metrics_dict):
    print("Scores:")
    for metric_name, values in metrics_dict.items():
        # Print the last value recorded for each metric, which is its final score
        print(f"{metric_name}: {values[-1]:.4f}")

In [None]:
# Call the functions to plot your metrics and print final scores
plot_metrics(metrics_capture.metrics)
scores(metrics_capture.metrics)


In [None]:
predictions, _, _ = trainer.predict(validation_dataset)
start_logits, end_logits = predictions
compute_metrics(start_logits, end_logits, validation_dataset, val_dataset2)

In [None]:
from transformers import pipeline

In [None]:
model_checkpoint_1 = "kgntmr/Capstone"
question_answerer = pipeline("question-answering", model=model_checkpoint_1)

In [None]:
df_train_1 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/subjqa-train.csv')
df_test_1 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/subjqa-test.csv')

In [None]:
df_train_1.iloc[5].question

In [None]:
context = df_train_1.iloc[5].review
question = df_train_1.iloc[5].question
question_answerer(question=question, context=context)

In [None]:
# Replace this with your own checkpoint
model_checkpoint_cs = "kgntmr/Capstone"
question_answerer_old = pipeline("question-answering", model=model_checkpoint_cs)