<a href="https://colab.research.google.com/github/go-hyun77/ABSA/blob/documentation-and-refactor/ABSA_LLM_T5_SemEval2014.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **<u>Aspect-Based Sentiment Analysis (ABSA) with T5</u>**
This notebook fine-tunes a **T5 (Text-to-Text Transfer Transformer)** LLM to perform aspect-based sentiment analysis on the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4). Each code block is accompanied by instructions and explanations for running and reviewing it.

> Aspect-based Sentiment Analysis (ABSA) is a more nuanced form of sentiment analysis in which specific aspects (features or topics) within a text are identified and mapped to the sentiment expressed towards each aspect.

The expected input and output of the model are illustrated by the following example:
```
INPUT: BEST spicy tuna roll, great asian salad.
```
```
OUTPUT: aspect=tuna roll, sentiment=positive; aspect=asian salad, sentiment=positive
```






In [None]:
#install dependencies and import libraries
!pip install transformers datasets sentencepiece -q
!pip install datasets==3.6.0

import pandas as pd
import numpy as np
from datasets import load_dataset
from transformers import T5ForConditionalGeneration, T5Tokenizer, Trainer, TrainingArguments
from google.colab import drive

drive.mount('/content/drive') #mount drive for saving/loading model
model_dir = "/content/drive/MyDrive/ABSA_T5_Model" #define model directory in google drive, you may need to modify this link to point to the appropriate directory

Collecting datasets==3.6.0
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
Installing collected packages: datasets
  Attempting uninstall: datasets
    Found existing installation: datasets 4.0.0
    Uninstalling datasets-4.0.0:
      Successfully uninstalled datasets-4.0.0
Successfully installed datasets-3.6.0
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# **<u>Defining the Model (First Run)</u>**
The **T5 (Text-to-Text Transfer Transformer)** LLM is chosen as the baseline model, and we evaluate its ability to perform ABSA after fine-tuning on the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4). T5 is well suited to ABSA because its defining idea is that every task is treated as a text-to-text problem: all inputs are text, and all outputs are text.
>If you would like to fine-tune the model for ABSA starting from the pretrained checkpoint, execute the following block to select T5-small as the starting point for training.


In [None]:
#define the pretrained T5 checkpoint (t5-small) used as the baseline model

model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/2.32k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/242M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

# **<u>Defining the Model (Loading Saved Model)</u>**
If you are resuming from a previous session in which you successfully trained a model on the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4), execute this block to load the saved tokenizer and model from Drive. Otherwise, make sure the base model from the previous block is loaded before continuing.
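
One way to decide between the two options is to check whether the Drive directory already holds a saved model. The sketch below is a hypothetical helper (not part of this notebook) that assumes a directory written by `save_pretrained()` contains a `config.json` file; it is demonstrated on a temporary directory rather than on Drive.

```python
import os
import tempfile

def has_saved_model(model_dir):
    """Return True if model_dir contains a config.json (assumed marker of a saved model)."""
    return os.path.isfile(os.path.join(model_dir, "config.json"))

def choose_checkpoint(model_dir, base_name="t5-small"):
    """Pick the saved directory when it looks populated, otherwise fall back to the base checkpoint name."""
    return model_dir if has_saved_model(model_dir) else base_name

# Demonstration on a temporary directory (stands in for the Drive folder).
with tempfile.TemporaryDirectory() as tmp:
    empty_choice = choose_checkpoint(tmp)               # nothing saved yet
    open(os.path.join(tmp, "config.json"), "w").close() # simulate a saved model
    saved_choice_is_dir = choose_checkpoint(tmp) == tmp

print("empty directory ->", empty_choice)
print("populated directory selects the directory:", saved_choice_is_dir)
```

The returned value could then be passed to `from_pretrained(...)` in place of a hard-coded name or path.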

In [None]:
#load tokenizer and model from Drive
tokenizer = T5Tokenizer.from_pretrained(model_dir, local_files_only=True)
model = T5ForConditionalGeneration.from_pretrained(model_dir, local_files_only=True)


print("Model path:", model.config._name_or_path)  #sanity check, print model path
print("Number of parameters:", sum(p.numel() for p in model.parameters()) // 1e6, "M")  #sanity check, check for model params to verify successful load

Model path: /content/drive/MyDrive/ABSA_T5_Model
Number of parameters: 60.0 M


# **<u>Loading and Examining the Dataset</u>**
Below we load and examine the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4). Minimal preprocessing is done to improve the readability of the printed columns and to keep the columns consistent when no value is present for a specific column (for example, when no polarity is present because the text carries neither a positive nor a negative connotation). <br>

In this project, we will load and work with the restaurant review specific portion of the dataset with the following line:
```
dataset = load_dataset("alexcadillon/SemEval2014Task4", "restaurants")
```
The columns of the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4) are outlined in the table below:

| Field Name | Data Type | Description |
| :------- | :------: | :------- |
| sentenceId  | string  | Unique ID of the sentence mainly used for identification purposes.  |
| text  | string | The actual raw text content of the sentence.  |
| aspectTerms | list of dicts | Attributes for a given aspect: term, polarity, from (offset start), to (offset end).  |
| aspectCategories  | list of dicts | Categorical annotations for a given aspect and its respective polarity. |
| domain  | class label | Identifier for which domain this entry belongs to (restaurants in this case). |
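
To make the `aspectTerms` fields concrete, the sketch below uses two records copied from the training split (as printed when the dataset is examined) and shows that `from` and `to` are character offsets, stored as strings, that slice the aspect term out of `text`. The variable names are illustrative only.

```python
from collections import Counter

# Two records copied from the notebook's printed training examples.
records = [
    {
        "sentenceId": "3121",
        "text": "But the staff was so horrible to us.",
        "aspectTerms": [{"term": "staff", "polarity": "negative", "from": "8", "to": "13"}],
    },
    {
        "sentenceId": "2777",
        "text": "To be completely fair, the only redeeming factor was the food, which was above average, "
                "but couldn't make up for all the other deficiencies of Teodora.",
        "aspectTerms": [{"term": "food", "polarity": "positive", "from": "57", "to": "61"}],
    },
]

spans = []
polarity_counts = Counter()
for record in records:
    for aspect in record["aspectTerms"]:
        # Offsets are strings in the dataset, so convert before slicing.
        start, end = int(aspect["from"]), int(aspect["to"])
        spans.append(record["text"][start:end])
        polarity_counts[aspect["polarity"]] += 1

print(spans)
print(dict(polarity_counts))
```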


In [None]:
#load dataset

dataset = load_dataset("alexcadillon/SemEval2014Task4", "restaurants") #restaurant reviews dataset config

In [None]:
# examine dataset
train_data = dataset["train"]

# print first 10 entries of train split
for i in range(10):
    print(f"{i+1}: {train_data[i]}")


1: {'sentenceId': '3121', 'text': 'But the staff was so horrible to us.', 'aspectTerms': [{'term': 'staff', 'polarity': 'negative', 'from': '8', 'to': '13'}], 'aspectCategories': [{'category': 'service', 'polarity': 'negative'}]}
2: {'sentenceId': '2777', 'text': "To be completely fair, the only redeeming factor was the food, which was above average, but couldn't make up for all the other deficiencies of Teodora.", 'aspectTerms': [{'term': 'food', 'polarity': 'positive', 'from': '57', 'to': '61'}], 'aspectCategories': [{'category': 'food', 'polarity': 'positive'}, {'category': 'anecdotes/miscellaneous', 'polarity': 'negative'}]}
3: {'sentenceId': '1634', 'text': "The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it's on the menu or not.", 'aspectTerms': [{'term': 'food', 'polarity': 'positive', 'from': '4', 'to': '8'}, {'term': 'kitchen', 'polarity': 'positive', 'from': '55', 'to': '62'}, {'term': 'menu', 'p

In [None]:
#flatten dataset for readability
indexes = [train_data[i] for i in range(20)]  # first 20 entries

rows = []
for i in indexes:
    sentence_id = i["sentenceId"]
    text = i["text"]

    # If aspect terms exist, iterate through them
    if i["aspectTerms"]:
        for asp in i["aspectTerms"]:
            rows.append({
                "sentenceId": sentence_id,
                "text": text,
                "aspect_term": asp["term"],
                "term_polarity": asp["polarity"],
                "category": None,  # Add these to maintain consistent columns
                "category_polarity": None # Add these to maintain consistent columns
            })
    # Record category annotations as separate rows (these are added whether or not aspect terms exist)
    if i["aspectCategories"]:
        for cat in i["aspectCategories"]:
            rows.append({
                "sentenceId": sentence_id,
                "text": text,
                "aspect_term": None, # Add these to maintain consistent columns
                "term_polarity": None, # Add these to maintain consistent columns
                "category": cat["category"],
                "category_polarity": cat["polarity"]
            })


#convert to DataFrame
df = pd.DataFrame(rows)
print(df.head(10))

  sentenceId                                               text aspect_term  \
0       3121               But the staff was so horrible to us.       staff   
1       3121               But the staff was so horrible to us.        None   
2       2777  To be completely fair, the only redeeming fact...        food   
3       2777  To be completely fair, the only redeeming fact...        None   
4       2777  To be completely fair, the only redeeming fact...        None   
5       1634  The food is uniformly exceptional, with a very...        food   
6       1634  The food is uniformly exceptional, with a very...     kitchen   
7       1634  The food is uniformly exceptional, with a very...        menu   
8       1634  The food is uniformly exceptional, with a very...        None   
9       2534  Where Gabriela personaly greets you and recomm...        None   

  term_polarity                 category category_polarity  
0      negative                     None              None  
1       

# **<u>Preparing the Dataset for Training</u>**
As seen in the block above, the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4) consists of structured labels, columns, and values.
```
 sentenceId                                               text aspect_term  \
0       3121               But the staff was so horrible to us.       staff   
1       3121               But the staff was so horrible to us.        None   
2       2777  To be completely fair, the only redeeming fact...        food   
3       2777  To be completely fair, the only redeeming fact...        None   

  term_polarity                 category category_polarity  
0      negative                     None              None  
1          None                  service          negative  
2      positive                     None              None  
3          None                     food          positive  
```
On its own, this structure is not in a format that T5 can **accept** and **produce**. Thus, each example's target output is formatted as a raw-text aspect-sentiment "pair", converting the structure from:
```
"aspectTerms": [
    {"term": "staff", "polarity": "negative"}
  ]
```
to the following:

```
aspect=staff, sentiment=negative
```



In [None]:
#create aspect-sentiment pairs from dataset such as "aspect=food, sentiment=positive; aspect=service, sentiment=negative" or "none" if no aspects

def format_target(ex):

    pairs = []
    for asp in ex.get("aspectTerms", []): #for all aspects from dataset column aspectTerms, pull term label and sentiment polarity
        term = asp.get("term")
        pol = asp.get("polarity")
        if term is None or pol is None: #if none
            continue
        #strip surrounding whitespace and lowercase the polarity (the term keeps its original casing)
        pairs.append(f"aspect={term.strip()}, sentiment={pol.strip().lower()}")
    return "; ".join(pairs) if pairs else "none"



In [None]:
#function to tokenize inputs (as in the plain sentences + aspect terms/values) for model to train on

def preprocess(ex):

    #explicit task instruction to prevent input echoing and training collapse
    instruction = (
        "Extract aspect-based sentiment. "
        "Return outputs in the exact format: "
        "'aspect=<term>, sentiment=<positive|negative|neutral>' "
        "separated by '; ' for multiple aspects. If no aspects, return 'none'.\n\n"
    )

    #include instruction in the input
    input_text = instruction + "ABSA: " + ex["text"]

    target_text = format_target(ex)

    model_inputs = tokenizer(
        input_text,
        text_target=target_text,
        padding="max_length",
        truncation=True,
        max_length=128
    )

    # Add raw text and aspectTerms to the processed example for evaluation
    model_inputs["raw_text"] = ex["text"]
    model_inputs["raw_aspects"] = ex.get("aspectTerms", [])

    return model_inputs
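
One detail worth checking in `preprocess()`: with `padding="max_length"`, the `labels` are padded with T5's pad token id (0), while by Hugging Face convention the loss only skips label positions set to `-100`. The sketch below shows, on a plain Python list, how padded label positions could be masked. `mask_padding` is a hypothetical helper that is not part of this notebook, and the token ids are made up for illustration.

```python
PAD_TOKEN_ID = 0      # T5's pad token id
IGNORE_INDEX = -100   # label value ignored by PyTorch's cross-entropy loss by default

def mask_padding(label_ids, pad_token_id=PAD_TOKEN_ID, ignore_index=IGNORE_INDEX):
    """Replace padding positions in a list of label ids so they do not contribute to the loss."""
    return [ignore_index if token_id == pad_token_id else token_id for token_id in label_ids]

# Made-up label ids: three real tokens, an end-of-sequence token (1), then padding.
example_labels = [3, 9, 27, 1, 0, 0, 0]
masked_labels = mask_padding(example_labels)
print(masked_labels)
```

If such masking were applied inside `preprocess()`, the `-100` values would likely need to be mapped back to the pad id before calling `tokenizer.decode` on the labels in the sanity checks.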

In [None]:
#apply preprocess function to each entry in the train split and the test split (the test split is used for validation)
train_dataset = dataset["train"].map(preprocess, remove_columns=[])
valid_dataset = dataset["test"].map(preprocess, remove_columns=[])

In [None]:
# Set PyTorch format (so Trainer can use them directly)
train_dataset.set_format(type="torch")
valid_dataset.set_format(type="torch")

# **<u>Pre-Training Validation Checks</u>**
Below are validation and sanity checks to confirm that everything is prepared correctly before training, so that no training time is wasted (the recorded run below took approximately 7 hours for 6 epochs).

In [None]:
#quick verify format_target on one raw example to see whether aspect-sentiment pair was generated properly from the above format_target()

#expected output: aspect=staff, sentiment=negative
print("RAW example:", dataset["train"][0])
print("FORMATTED target:", format_target(dataset["train"][0]))

RAW example: {'sentenceId': '3121', 'text': 'But the staff was so horrible to us.', 'aspectTerms': [{'term': 'staff', 'polarity': 'negative', 'from': '8', 'to': '13'}], 'aspectCategories': [{'category': 'service', 'polarity': 'negative'}]}
FORMATTED target: aspect=staff, sentiment=negative


In [None]:
#quick decode check (ensure tokenizer didn't strip or change the target format)

#expected output: aspect=staff, sentiment=negative
print("Decoded input (train[0]):")
print(tokenizer.decode(train_dataset[0]["input_ids"], skip_special_tokens=True))
print("Decoded target (train[0]):")
print(tokenizer.decode(train_dataset[0]["labels"], skip_special_tokens=True))

Decoded input (train[0]):
Extract aspect-based sentiment. Return outputs in the exact format: 'aspect=term>, sentiment=positive|negative|neutral>' separated by ';'for multiple aspects. If no aspects, return 'none'. ABSA: But the staff was so horrible to us.
Decoded target (train[0]):
aspect=staff, sentiment=negative


In [None]:
#sanity check, labels of first index of training data set
print(tokenizer.decode(train_dataset[0]["labels"], skip_special_tokens=True))

aspect=staff, sentiment=negative


In [None]:
#sanity check, input of training data post-preprocessing, must include instruction and "ABSA:" prefix

#expected output: <instruction> ABSA: <input>
print(tokenizer.decode(train_dataset[0]["input_ids"], skip_special_tokens=True))

Extract aspect-based sentiment. Return outputs in the exact format: 'aspect=term>, sentiment=positive|negative|neutral>' separated by ';'for multiple aspects. If no aspects, return 'none'. ABSA: But the staff was so horrible to us.


In [None]:
#sanity check, reload the pretrained t5-small checkpoint so training starts from the base model (skip this cell if you loaded a saved model from Drive, as it overwrites `model`)
model = T5ForConditionalGeneration.from_pretrained(model_name)

In [None]:
#sanity check, verify model directory for saving model configs

model_dir = "/content/drive/MyDrive/ABSA_T5_Model"
!ls /content/drive/MyDrive

'3rd Iteration Document'    'CPSC 301'	    'CPSC 440'	'CPSC 589'
 ABSA_T5_Model		    'CPSC 311'	    'CPSC 452'	'EGCP 401'
 ABSA_T5_Model_New_Dataset  'CPSC 315'	    'CPSC 471'	'EVO Food Places.xlsx'
'AP GOV'		    'CPSC 323'	    'CPSC 481'	 MATH338
 BIO101			    'CPSC 332'	    'CPSC 485'	 Misc.
 Books			    'CPSC 335'	    'CPSC 531'	'Oct Genesis.png'
'Colab Notebooks'	    'CPSC 351'	    'CPSC 544'	'PSC Biotech'
'CPSC 121'		    'CPSC 353 458'  'CPSC 548'	'Test Folder'
'CPSC 223J'		    'CPSC 362'	    'CPSC 552'
'CPSC 240'		    'CPSC 375'	    'CPSC 566'
'CPSC 254'		    'CPSC 439'	    'CPSC 585'


# **<u>Model Training</u>**
This block contains code to initialize training parameters, execute training, and output training results. You may skip this block if you are loading a previously trained model from Drive.
>**NOTE:** Training takes a long time overall due to the size of the [SemEval2014 dataset](https://huggingface.co/datasets/alexcadillon/SemEval2014Task4)'s restaurant review split. While model accuracy has been observed to improve with a larger number of epochs in prior testing, in the interest of time the number of epochs has been limited to 6. You may increase this number at your discretion to observe how the validation results change.

In [None]:
#training setup and parameters

args = TrainingArguments(

    #directory where checkpoints, logs, and metadata are saved
    output_dir="./absa_t5",

    #when evaluation occurs, in this case once at the end of each epoch
    eval_strategy="epoch",

    #how often checkpoints are saved, in this case once at the end of each epoch
    save_strategy="epoch",

    #learning rate, recommended for t5-small is between 1e-4 → 3e-5 for AdamW
    learning_rate=3e-5,

    #increasing batch size per gpu for training/evaluation stabilizes gradients, reduces training time, but increases GPU memory requirements
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,

    #num of passes through the dataset
    num_train_epochs=6,

    #regularization to reduce overfitting
    weight_decay=0.01,

    #number of checkpoints saved (last two epochs)
    save_total_limit=2,

    #how often training logs are printed (every 50 steps)
    logging_steps=50,

    #mixed-precision (16-bit) training, disabled here; set True if your GPU supports it
    fp16=False,

    #toggle auto push to huggingface
    push_to_hub=False,
)

#constructor
trainer = Trainer(
  model=model,
  args=args,
  train_dataset=train_dataset,
  eval_dataset=valid_dataset,
)

In [None]:
#train model, no need to execute this block if loading saved model
trainer.train()



| Epoch | Training Loss | Validation Loss |
| :------: | :------: | :------: |
| 1 | 0.1632 | 0.086505 |
| 2 | 0.0765 | 0.058833 |
| 3 | 0.0693 | 0.051112 |
| 4 | 0.0713 | 0.048222 |
| 5 | 0.0546 | 0.046494 |
| 6 | 0.0567 | 0.045654 |




TrainOutput(global_step=2286, training_loss=0.3016708865044728, metrics={'train_runtime': 25072.7909, 'train_samples_per_second': 0.728, 'train_steps_per_second': 0.091, 'total_flos': 617361627414528.0, 'train_loss': 0.3016708865044728, 'epoch': 6.0})
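
The figures in the `TrainOutput` above can be turned into per-epoch numbers with some quick arithmetic. This is a small sketch that uses only the values printed above; the variable names are illustrative.

```python
# Values copied from the TrainOutput printed above.
global_step = 2286
train_runtime_s = 25072.7909
num_epochs = 6

steps_per_epoch = global_step // num_epochs            # optimizer steps in one pass over the data
minutes_per_epoch = train_runtime_s / num_epochs / 60  # wall-clock minutes per epoch
total_hours = train_runtime_s / 3600                   # total wall-clock hours
seconds_per_step = train_runtime_s / global_step       # average time per optimizer step

print(f"steps per epoch:   {steps_per_epoch}")
print(f"minutes per epoch: {minutes_per_epoch:.1f}")
print(f"total hours:       {total_hours:.2f}")
print(f"seconds per step:  {seconds_per_step:.2f}")
```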

In [None]:
#save model
model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)


('/content/drive/MyDrive/ABSA_T5_Model/tokenizer_config.json',
 '/content/drive/MyDrive/ABSA_T5_Model/special_tokens_map.json',
 '/content/drive/MyDrive/ABSA_T5_Model/spiece.model',
 '/content/drive/MyDrive/ABSA_T5_Model/added_tokens.json')

# **<u>Post-Training Validation Checks</u>**
This section contains code to validate the preprocessed training inputs and the decoded training labels after training.

In [None]:
#sanity check, confirm correct model input post pre-processing
print("Decoded training input:", tokenizer.decode(train_dataset[0]["input_ids"], skip_special_tokens=True))

Decoded training input: Extract aspect-based sentiment. Return outputs in the exact format: 'aspect=term>, sentiment=positive|negative|neutral>' separated by ';'for multiple aspects. If no aspects, return 'none'. ABSA: But the staff was so horrible to us.


In [None]:
#sanity check, confirm correct label post pre-processing
print("Decoded training label:", tokenizer.decode(train_dataset[0]["labels"], skip_special_tokens=True))

Decoded training label: aspect=staff, sentiment=negative


In [None]:
#sanity check, confirm model path
print("Model path:", model.config._name_or_path)

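#sanity check, print the raw text of the first training example (the instruction and "ABSA: " prefix are added to it in preprocess())
print("Example training input:", dataset["train"][0]["text"])

#run inference using the same instruction and "ABSA: " prefix that were used during training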
def absa_predict(text):
    instruction = (
        "Extract aspect-based sentiment. "
        "Return outputs in the exact format: "
        "'aspect=<term>, sentiment=<positive|negative|neutral>' "
        "separated by '; ' for multiple aspects. If no aspects, return 'none'.\n\n"
    )
    full_input = instruction + "ABSA: " + text

    inputs = tokenizer(full_input, return_tensors="pt", padding=True).to(model.device)  #move input tensors to the model's device
    outputs = model.generate(
        inputs["input_ids"],
        max_length=128,
        num_beams=4,
        early_stopping=True
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

Model path: /content/drive/MyDrive/ABSA_T5_Model
Example training input: But the staff was so horrible to us.


# **<u>ABSA Input Tests</u>**
This section contains a function for testing the trained model by manually entering text and checking whether its output is correct. Modify the text argument in the call below and execute the block to have the model identify and output the detected aspect-sentiment pairs.
```
print(absa_predict("your-sentence-here"))
```


In [None]:
#test model with text input (note that "parking" is labeled positive below, although the sentence describes it negatively)
print(absa_predict("The food was great, but parking was incredibly difficult."))

aspect=food, sentiment=positive; aspect=parking, sentiment=positive


In [None]:
#output first 20 absa results produced by the model on the dataset
for i in range(20):
    t = dataset["test"][i]["text"]
    print("INPUT:", t)
    print("MODEL OUTPUT:", absa_predict(t))
    print("-" * 50)

INPUT: The bread is top notch as well.
MODEL OUTPUT: aspect=bread, sentiment=positive
--------------------------------------------------
INPUT: I have to say they have one of the fastest delivery times in the city.
MODEL OUTPUT: aspect=delivery times, sentiment=positive
--------------------------------------------------
INPUT: Food is always fresh and hot- ready to eat!
MODEL OUTPUT: aspect=Food, sentiment=positive
--------------------------------------------------
INPUT: Did I mention that the coffee is OUTSTANDING?
MODEL OUTPUT: aspect=caffee, sentiment=positive
--------------------------------------------------
INPUT: Certainly not the best sushi in New York, however, it is always fresh, and the place is very clean, sterile.
MODEL OUTPUT: aspect=sushi, sentiment=positive; aspect=place, sentiment=positive
--------------------------------------------------
INPUT: I trust the people at Go Sushi, it never disappoints.
MODEL OUTPUT: none
--------------------------------------------------

# **<u>F1 Score Evaluation</u>**
We now have a model capable of identifying aspects and mapping a sentiment to each one (to varying degrees of success). But how "*good*" is this model? To answer this, we evaluate the model's F1 score. <br>

>The **F1 score** is a common metric for evaluating natural language processing (NLP) models on tasks such as classification, extraction, and ABSA. It is the harmonic mean of the model's **precision** (how many of its predictions were correct?) and **recall** (how many of the relevant items did it find?). In short, it measures the model's ability to produce correct outputs while avoiding incorrect ones.

The equation to calculate an F1 score is as follows:
> $$ F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

In the context of this model, two tasks are being evaluated: **aspect extraction** and **sentiment classification**.
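
To make the formula concrete, the sketch below scores the parking example from the input tests, where the model predicted `parking` as positive. The gold labels here are an assumption made for illustration (they are not taken from the dataset), and `prf` is a hypothetical helper. Scoring the (aspect, sentiment) pairs jointly and then the aspects alone shows how the two tasks can be separated.

```python
def prf(gold, pred):
    """Precision, recall and F1 between two sets; 0.0 where a ratio is undefined."""
    tp = len(gold & pred)   # items in both gold and prediction
    fp = len(pred - gold)   # predicted but not in gold
    fn = len(gold - pred)   # in gold but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Assumed gold labels for "The food was great, but parking was incredibly difficult."
gold_pairs = {("food", "positive"), ("parking", "negative")}
# Pairs taken from the model output shown in the input tests.
pred_pairs = {("food", "positive"), ("parking", "positive")}

joint = prf(gold_pairs, pred_pairs)
aspect_only = prf({aspect for aspect, _ in gold_pairs}, {aspect for aspect, _ in pred_pairs})

print("joint (aspect + sentiment):", joint)
print("aspect extraction only:    ", aspect_only)
```

Both aspects are found, so the aspect-only score is perfect, while the wrong sentiment on `parking` halves the joint score.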

In [None]:
import re

#generate the model's raw ABSA text output for a sentence (parsed into structured data by parse_aspect_string below)

def absa_generate(text, max_length=128):

    #instruction for output format (note: worded slightly differently from the instruction used in preprocess() during training)
    instruction = (
        "Extract aspect-based sentiment. "
        "Return outputs in format: 'aspect=<term>, sentiment=<positive|negative|neutral>' "
        "separated by '; ' or 'none' if no aspects.\n\n"
    )
    #store input text as instruction + absa prefix + raw text
    input_text = instruction + "ABSA: " + text
    #tokenize string and convert to PyTorch tensors
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True).to(model.device)
    #generate output text
    outputs = model.generate(inputs["input_ids"], max_new_tokens=max_length, num_beams=4, early_stopping=True)
    #convert token ids back into raw text
    raw = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return raw

#convert ABSA results (like "aspect=food, sentiment=positive; aspect=service, sentiment=negative") into a list of dicts:
#[{"aspect": "food", "sentiment": "positive"}, {"aspect": "service", "sentiment": "negative"}]

def parse_aspect_string(s):
    s = s.strip()
    #if no predictions found
    if not s or s.lower() in {"none", "no aspects", "[]"}:
        return []
    pairs = []
    #split on ';' if multiple aspects found, then parse each part
    for part in s.split(";"):
        part = part.strip()
        if not part:
            continue
        #match aspect=<term>, sentiment=<polarity>
        m = re.search(r"aspect\s*=\s*(.+?)\s*,\s*sentiment\s*=\s*(positive|negative|neutral)", part, flags=re.I)
        if m:
            aspect = m.group(1).strip()
            sentiment = m.group(2).strip().lower()
            pairs.append({"aspect": aspect, "sentiment": sentiment})
        else:
            #fallback: try `X was Y` style (rare) or ignore
            m2 = re.search(r"(.+?)\s+was\s+(positive|negative|neutral)", part, flags=re.I)
            if m2:
                pairs.append({"aspect": m2.group(1).strip(), "sentiment": m2.group(2).lower()})
    return pairs

In [None]:
#build the gold (ground-truth) pairs: convert a dataset example's `aspectTerms` into the same list-of-dicts form produced by parse_aspect_string

def extract_good_pairs(example):
    pairs = []
    for asp in example.get("aspectTerms", []):
        term = asp.get("term")
        pol = asp.get("polarity")
        if term and pol:
            pairs.append({"aspect": term, "sentiment": pol.lower()})
    return pairs

In [None]:
#sanity check, output test to check if aspect strings converted properly back to original dataset format

for s in [
    "The food was amazing but the service was terrible.",
    "I like the ambiance, but the drinks are overpriced."
]:
    raw = absa_generate(s)
    print("RAW:", raw)
    print("PARSED:", parse_aspect_string(raw))
    print()

RAW: aspect=food, sentiment=positive; aspect=service, sentiment=positive
PARSED: [{'aspect': 'food', 'sentiment': 'positive'}, {'aspect': 'service', 'sentiment': 'positive'}]

RAW: aspect=ambiance, sentiment=positive; aspect=drinks, sentiment=positive
PARSED: [{'aspect': 'ambiance', 'sentiment': 'positive'}, {'aspect': 'drinks', 'sentiment': 'positive'}]



In [None]:
#computing joint f1 score of aspect extraction + sentiment classification

def compute_f1(true_pairs, pred_pairs):

    #convert each list of dicts into a set of (aspect, sentiment) tuples, lowercased for case-insensitive matching

    #true_pairs: list of dicts [{"aspect":..., "sentiment":...}, ...]
    true_set = set((p["aspect"].lower(), p["sentiment"].lower()) for p in true_pairs)
    #pred_pairs: list of dicts of same form
    pred_set = set((p["aspect"].lower(), p["sentiment"].lower()) for p in pred_pairs)

    #true positives: pairs that appear in both the gold labels and the predictions
    TP = len(true_set & pred_set)

    #false positives: predicted pairs that are not in the gold labels
    FP = len(pred_set - true_set)
    #false negatives: gold pairs that the model failed to predict
    FN = len(true_set - pred_set)

    #returns precision, recall, f1 for the joint match (aspect extraction + sentiment classification, refer to formula)
    precision = TP / (TP + FP + 1e-12)  #small epsilon added to prevent /0 errors
    recall = TP / (TP + FN + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12) if (precision + recall) > 0 else 0.0

    return precision, recall, f1

In [None]:
#function to run on dataset

from tqdm import tqdm

def evaluate_on_dataset(split_dataset, limit=None):
    tot_p = tot_r = tot_f1 = 0.0
    n = 0

    for i, ex in enumerate(tqdm(split_dataset)):  #loop through dataset
        if limit and i >= limit:
            break

        #get the gold labels from the metadata stored by preprocess()
        gold = extract_good_pairs({
            "aspectTerms": ex["raw_aspects"]
        })

        #build the input text
        #NOTE: absa_generate() already prepends its instruction and "ABSA: ", so the prefix appears twice here, unlike in the training inputs
        input_text = "ABSA: " + ex["raw_text"]

        #run model inference
        raw_output = absa_generate(input_text)

        #parse prediction into the same structure as the gold pairs
        pred = parse_aspect_string(raw_output)

        #compute F1 for this sample
        p, r, f1 = compute_f1(gold, pred)

        tot_p += p
        tot_r += r
        tot_f1 += f1
        n += 1

    return {
        "precision": tot_p / n,
        "recall": tot_r / n,
        "f1": tot_f1 / n
    }

In [None]:
scores = evaluate_on_dataset(valid_dataset)
print(scores)

100%|██████████| 800/800 [11:26<00:00,  1.17it/s]

{'precision': 0.3432499999997486, 'recall': 0.3116577380950211, 'f1': 0.3207022283268132}
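
Note that `evaluate_on_dataset` averages per-sentence scores (a macro average), and `compute_f1` returns 0 for a sentence where both the gold labels and the prediction are empty. As a point of comparison, the sketch below pools TP/FP/FN over all sentences before computing the scores (a micro average). The helper names and the three toy examples are hypothetical and are not taken from the dataset.

```python
def prf_from_counts(tp, fp, fn):
    """Precision, recall and F1 from raw counts; 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def counts(gold, pred):
    """Return (TP, FP, FN) for one sentence given lists of (aspect, sentiment) tuples."""
    gold_set, pred_set = set(gold), set(pred)
    return len(gold_set & pred_set), len(pred_set - gold_set), len(gold_set - pred_set)

# Toy (gold, prediction) examples: an exact match, an empty/empty case, and one wrong sentiment.
examples = [
    ([("bread", "positive")], [("bread", "positive")]),
    ([], []),
    ([("food", "positive"), ("parking", "negative")], [("food", "positive"), ("parking", "positive")]),
]

per_sentence = [counts(gold, pred) for gold, pred in examples]

# Macro: score each sentence, then average (the empty/empty sentence scores 0).
macro_f1 = sum(prf_from_counts(*c)[2] for c in per_sentence) / len(per_sentence)

# Micro: sum the counts across sentences, then score once.
tp, fp, fn = (sum(column) for column in zip(*per_sentence))
micro_precision, micro_recall, micro_f1 = prf_from_counts(tp, fp, fn)

print(f"macro F1: {macro_f1:.3f}")
print(f"micro F1: {micro_f1:.3f}")
```

In the micro average, a sentence with no gold aspects and a correct `none` prediction adds nothing to any count, so it neither helps nor hurts the score.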



