# Aspect-Based Sentiment Classification Walkthrough

In this notebook, we will explore the process of using LangChain with a large language model (LLM) for aspect-based sentiment classification (ABSC). ABSC is a granular approach to sentiment analysis that determines the sentiment (positive, negative, or neutral) toward specific aspects or attributes within a piece of text.

For example, in the sentence:
> "The food at the restaurant was delicious, but the service was slow."

The sentiment toward the aspect *"food"* is positive, while the sentiment toward the aspect *"service"* is negative.

### Outline

In this walkthrough, we will:

1. **Load an ABSC Dataset:** Read in a dataset specifically designed for aspect-based sentiment classification. We will use the SemEval 2014 dataset, which can be downloaded as a handy CSV file from [Kaggle](https://www.kaggle.com/datasets/charitarth/semeval-2014-task-4-aspectbasedsentimentanalysis?select=Laptop_Train_v2.csv).
2. **Instantiate an LLM Using LangChain and HuggingFace:** Configure a pretrained large language model to handle the sentiment classification task.
3. **Craft a Labeling Prompt:** Create a well-structured prompt to guide the LLM in identifying sentiment for specific aspects of text.
4. **Classify Dataset Examples:** Use the LLM and the prompt to classify examples in the dataset.
5. **Evaluate Performance:** Measure classification accuracy using evaluation metrics such as precision, recall, and F1 score.

### Example

To understand ABSC better, let’s consider this example:

```python
review = "The laptop's performance is outstanding, but the battery life is disappointing."
aspects = ["performance", "battery life"]
```

The expected output is:

| Aspect         | Sentiment   |
|----------------|-------------|
| Performance    | Positive    |
| Battery Life   | Negative    |

By the end of this notebook, you'll learn how to apply ABSC with LLMs to solve similar problems effectively.


# Configure the Environment

In [1]:
! pip install pandas
! pip install langchain
! pip install transformers
! pip install langchain-huggingface

# Read in the Dataset and Investigate the Data

Make sure to download the data from [Kaggle](https://www.kaggle.com/datasets/charitarth/semeval-2014-task-4-aspectbasedsentimentanalysis?select=Laptop_Train_v2.csv). We'll be working with `Laptop_Train_v2.csv`.

__prompt__: I would like the Python code to read a pandas dataframe from a CSV named "Laptop_Train_v2.csv". I would like you to downsample to 100 entries from the dataframe. Note that for the sampling, use the column 'id' to determine samples. So, there should be 100 unique values of 'id', but there will be more than 100 rows. I would also like the code to view the column names, a sample of the entries, and a summary of the values in each column.

In [2]:
import pandas as pd

file_name = "Laptop_Train_v2.csv"
df = pd.read_csv(file_name)
print("CSV file successfully loaded!")

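# Downsample to 100 unique 'id' values; sampling ids rather than rows keeps every aspect row of a chosen sentence
# Pass random_state=<int> to sample() for a reproducible subset (the outputs below come from an unseeded run)
sampled_ids = df['id'].drop_duplicates().sample(n=100)  # Randomly select 100 unique IDs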
df = df[df['id'].isin(sampled_ids)]

CSV file successfully loaded!
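Sampling on `id` rather than on rows keeps every aspect row of a chosen sentence together, which is why the sample ends up with more than 100 rows. The toy check below illustrates this on made-up data; `random_state` is an added assumption for reproducibility, since the cell above samples without a seed.

```python
import pandas as pd

# Made-up frame: id 1 has two aspect rows, id 2 has one, id 3 has three
toy = pd.DataFrame({
    'id': [1, 1, 2, 3, 3, 3],
    'Aspect Term': ['screen', 'battery', 'price', 'keys', 'fan', 'ports'],
})

# Sample ids, not rows; random_state (an assumption) makes the draw repeatable
toy_ids = toy['id'].drop_duplicates().sample(n=2, random_state=0)
toy_sample = toy[toy['id'].isin(toy_ids)]

# Each sampled id should keep all of its original rows
rows_per_id = toy.groupby('id').size()
kept_per_id = toy_sample.groupby('id').size()
print(kept_per_id.to_dict(), {i: int(rows_per_id[i]) for i in toy_ids})
```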


In [3]:
# Display the column names
print("\nColumn Names:")
print(df.columns)


Column Names:
Index(['id', 'Sentence', 'Aspect Term', 'polarity', 'from', 'to'], dtype='object')


In [4]:
# Display a sample of the entries (first 5 rows by default)
print("\nSample Entries:")
print(df.head())



Sample Entries:
     id                                           Sentence       Aspect Term  \
45  810  YOU WILL NOT BE ABLE TO TALK TO AN AMERICAN WA...  WARRANTY SERVICE   
56  997  Drivers updated ok but the BIOS update froze t...           Drivers   
57  997  Drivers updated ok but the BIOS update froze t...       BIOS update   
58  997  Drivers updated ok but the BIOS update froze t...            system   
61  147                         The keyboard is too slick.          keyboard   

    polarity  from  to  
45  negative    44  60  
56  positive     0   7  
57  negative    27  38  
58  negative    49  55  
61  negative     4  12  
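The `from` and `to` columns appear to be character offsets of the aspect term within `Sentence`: in the sample above, "keyboard" spans characters 4 to 12 of "The keyboard is too slick.". The sketch below checks that reading on the id 147 row copied from the output and on one made-up row (id 9999); treat the interpretation as an assumption until it holds on the full file.

```python
import pandas as pd

toy_df = pd.DataFrame({
    'id': [147, 9999],  # 147 is copied from the sample above; 9999 is made up
    'Sentence': ["The keyboard is too slick.", "Great screen, weak battery life."],
    'Aspect Term': ["keyboard", "battery life"],
    'polarity': ["negative", "negative"],
    'from': [4, 19],
    'to': [12, 31],
})

def offsets_match(row) -> bool:
    # Slice the half-open span [from, to) and compare it with the aspect term
    return row['Sentence'][row['from']:row['to']] == row['Aspect Term']

toy_df['offset_ok'] = toy_df.apply(offsets_match, axis=1)
print(toy_df[['Aspect Term', 'offset_ok']])
```

On the real data, `df.apply(offsets_match, axis=1).mean()` would give the fraction of rows for which this assumption holds.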


In [5]:
# Display a summary of the values in each column
print("\nSummary of Column Values:")
df.info()  # Prints dtypes and non-null counts directly (info() returns None, so no print() is needed)


Summary of Column Values:
<class 'pandas.core.frame.DataFrame'>
Index: 143 entries, 45 to 2319
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           143 non-null    int64 
 1   Sentence     143 non-null    object
 2   Aspect Term  143 non-null    object
 3   polarity     143 non-null    object
 4   from         143 non-null    int64 
 5   to           143 non-null    int64 
dtypes: int64(3), object(3)
memory usage: 7.8+ KB


In [6]:
print("\nStatistical Summary:")
print(df.describe(include='all'))  # Provides a statistical summary for all columns


Statistical Summary:
                 id                                           Sentence  \
count    143.000000                                                143   
unique          NaN                                                100   
top             NaN  But to be honest, I don't use my computer for ...   
freq            NaN                                                  3   
mean    1742.874126                                                NaN   
std      900.760140                                                NaN   
min        3.000000                                                NaN   
25%     1056.000000                                                NaN   
50%     1890.000000                                                NaN   
75%     2506.000000                                                NaN   
max     3077.000000                                                NaN   

       Aspect Term  polarity        from          to  
count          143       143  143.

__prompt__: Okay, now please give me the python code to look at the unique values, along with their counts of the "polarity" and the "Aspect Term" columns from the dataframe 'df'. You do not need to check for the columns or dataframe; they are already loaded in.

In [7]:
# Display unique values and their counts for "polarity"
print("\nUnique values and their counts in the 'polarity' column:")
print(df['polarity'].value_counts())


Unique values and their counts in the 'polarity' column:
polarity
positive    64
negative    55
neutral     24
Name: count, dtype: int64


In [8]:
# Display unique values and their counts for "Aspect Term"
print("\nUnique values and their counts in the 'Aspect Term' column:")
print(df['Aspect Term'].value_counts())


Unique values and their counts in the 'Aspect Term' column:
Aspect Term
keyboard                    8
screen                      6
programs                    6
price                       4
size                        4
                           ..
windows disc                1
Programs                    1
value                       1
Windows operating system    1
compatibility               1
Name: count, Length: 109, dtype: int64
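The counts above list "programs" and "Programs" as separate aspect terms. Lowercasing and trimming before counting merges such variants; here is a sketch on a made-up Series.

```python
import pandas as pd

# Made-up aspect terms with the kind of case/spacing variants seen above
aspects = pd.Series(["programs", "Programs", "keyboard", "Keyboard ", "screen"])

# Normalize case and surrounding whitespace before counting
normalized_counts = aspects.str.lower().str.strip().value_counts()
print(normalized_counts)
```

`df['Aspect Term'].str.lower().str.strip().value_counts()` applies the same idea to the real column.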


# Instantiate an LLM for Classification

__prompt__: Now, I need you to create an LLM object using LangChain. In particular, I would like to use the text-generation model of "tiiuae/Falcon3-3B-Instruct" from HuggingFace and use the 0th GPU. Make sure to import the langchain HuggingFace pipeline as "from langchain_huggingface import HuggingFacePipeline". Also, make sure when creating the pipeline to specify "max_new_tokens = 500", and make sure the pipeline only outputs the generated text and not the prompt.

In [None]:
# Import the required libraries
from langchain_huggingface import HuggingFacePipeline
from transformers import pipeline

# Define the model name
model_name = "tiiuae/Falcon3-3B-Instruct"

# Create a HuggingFace pipeline with the specified settings
text_gen_pipeline = pipeline(
    "text-generation",  # Specify the task
    model=model_name,   # Specify the model
    device=0,           # Use the 0th GPU
    max_new_tokens=500, # Limit the number of generated tokens
    return_full_text=False  # Ensure the output only includes the generated text
)

# Wrap the pipeline with LangChain's HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=text_gen_pipeline)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
llm.invoke("Say Hello")

# Label the Aspects and Sentiments

Now, we need to label the aspects and sentiments in the data. First, we create an aspect-sentiment labeling prompt.

__prompt__: I would like a prompt, formatted as a langchain prompt object, that does aspect-based sentiment classification of a laptop review. The prompt takes in a review of a laptop called {sentence} and does two things. First, it needs to determine what aspects are mentioned in the review. Examples include: 'screen', 'battery life', 'packaging', 'graphics', 'warranty', 'price', 'features'. Each review should have between 1 and 4 aspects, and aspects are usually only one or two word phrases from the review text that describe the laptop. Aspects are usually nouns, and not adverbs or adjectives like "died" or "perfect".

Then for each aspect that is in the review - and only those aspects - it needs to provide the sentiment of the review towards that aspect. The possible sentiment values are "positive", "negative", "neutral", and "conflict". The prompt should also specify that the output should be of the form of a list of tuples, where the first entry in the tuple is the aspect, and second entry is the sentiment of the review towards that aspect. The prompt should also specify that the LLM should only output this list of tuples and no other words. So, as an example:

sentence: "The Macbook arrived in a nice twin packing and sealed in the box, all the functions works great."
output: [('functions', 'positive'),('packaging', 'positive')]

sentence: "The USB port never worked"
output: [('USB port', 'negative')]

sentence: "The price and features more than met my needs."
output: [('price', 'positive'), ('features', 'positive')]

sentence: "My warranty ran out right as the screen died."
output: [('warranty', 'negative'), ('screen', 'negative')]

sentence: "The battery has standard life and the shipping was fast."
output: [('battery', 'neutral'), ('shipping', 'positive')]

sentence: "Just a black screen!"
output: [('screen', 'negative')]

In [None]:
from langchain.prompts import PromptTemplate

# Define the template for aspect-based sentiment classification
template = """
You are an AI that analyzes laptop reviews and determines the sentiment towards various aspects mentioned in the review.

Given a review sentence, do the following:
1. Identify the aspects mentioned in the review. Aspects are typically nouns, usually one or two words, and represent features of the laptop (such as 'screen', 'battery life', 'packaging', 'graphics', 'warranty', 'price', 'features'). There should be between 1 and 4 aspects mentioned in the review.
2. For each identified aspect, determine the sentiment expressed in the review towards that aspect. The possible sentiments are 'positive', 'negative', 'neutral', or 'conflict'.
3. Provide the output as a list of tuples, where the first entry in the tuple is the aspect, and the second entry is the sentiment towards that aspect.

The output should only be the list of tuples with the aspects and sentiments and nothing else. Below are some examples:

Example 1:
Input: "The Macbook arrived in a nice twin packing and sealed in the box, all the functions works great."
Output: [('functions', 'positive'), ('packaging', 'positive')]

Example 2:
Input: "The USB port never worked."
Output: [('USB port', 'negative')]

Example 3:
Input: "The price and features more than met my needs."
Output: [('price', 'positive'), ('features', 'positive')]

Example 4:
Input: "My warranty ran out right as the screen died."
Output: [('warranty', 'negative'), ('screen', 'negative')]

Example 5:
Input: "The battery has standard life and the shipping was fast."
Output: [('battery', 'neutral'), ('shipping', 'positive')]

Example 6:
Input: "Just a black screen!"
Output: [('screen', 'negative')]

Input: "{sentence}"
Output:
"""

# Create the LangChain prompt object
prompt = PromptTemplate(input_variables=["sentence"], template=template)

# Example usage
sentence = "The battery life was long, but the screen quality was disappointing."
formatted_prompt = prompt.format(sentence=sentence)
print(formatted_prompt)


In [None]:
example = df.iloc[10,:]

print(example['Sentence'])

llm.invoke(prompt.format(sentence=example['Sentence']))

Now that we have our labeling prompt, let's construct our labeling chain.

__prompt__: Now, given the LangChain prompt template "prompt", please give me the code to create a LangChain chain using the pipe operator "|" with the prompt and an LLM called "llm". Please also create a function to parse the output and remove extraneous output from the LLM. Below are some examples:

'\n<|assistant|>\n[output] "[\'hardware\', \'positive\'], [\'shipping\', \'negative\']"' -> [('hardware', 'positive'), ('shipping', 'negative')]
"<|assistant|>\n['camera', 'neutral']" -> [('camera', 'neutral')]
"<|assistant|>\n[['graphics', 'positive']], " -> [('graphics', 'positive')]
"<|assistant|>\n[('touchpad', 'positive')]" -> [('touchpad', 'positive')]
"<|assistant|>\nsomething nonsensical, " -> [('','')]

The chain should resemble: label_prompt | llm | parse_output

In [None]:
import re

# Matches one aspect/sentiment pair in any bracket style, e.g. ('screen', 'negative')
# or ["screen", "negative"]. \1 and \3 refer back to the opening quote character, so a
# double-quoted aspect such as "burn cd's" may still contain an apostrophe.
PAIR_PATTERN = re.compile(
    r"""[\[(]\s*(['"])((?:(?!\1).)*)\1\s*,\s*(['"])((?:(?!\3).)*)\3\s*[\])]"""
)

# Function to parse the LLM output and remove extraneous content
def parse_output(output: str):
    # Drop chat-template markers; any other surrounding text is skipped by the regex
    text = output.replace('<|assistant|>', '')

    # findall returns (quote, aspect, quote, sentiment) for every pair it finds
    pairs = [(aspect.strip(), sentiment.strip())
             for _, aspect, _, sentiment in PAIR_PATTERN.findall(text)]

    # Return the placeholder [('', '')] if nothing parseable was produced
    return pairs if pairs else [('', '')]

# Define the LLM chain that uses the prompt and LLM
chain = prompt | llm | parse_output

In [None]:
print(example['Sentence'])

chain.invoke(example['Sentence'])

__prompt__: Now, produce code to iterate through the dataframe "df" and do the aspect-based sentiment classification of the "Sentence" column of the dataframe. Please note that you only need to do one labeling for each unique entry in the 'id' column. Store the output as a new dataframe called "results_df" with columns 'id' and "aspect_sentiment". Please also include tqdm in the labeling loop to monitor progress.

In [None]:
from tqdm import tqdm
results = []

# Iterate through each unique 'id' in the DataFrame using tqdm to monitor progress
for unique_id in tqdm(df['id'].drop_duplicates(), desc="Processing reviews", unit="id"):
    # Get the first sentence for each unique 'id'
    sentence = df[df['id'] == unique_id].iloc[0]['Sentence']

    # Get the aspect-based sentiment output
    aspect_sentiment = chain.invoke(sentence)

    # Append the result to the results list
    results.append({'id': unique_id, 'aspect_sentiment': aspect_sentiment})

# Create the results DataFrame
results_df = pd.DataFrame(results)

# Show the first few rows of the results DataFrame
print(results_df.head())

# Evaluate the Results

Now, we need to evaluate the results. One difficulty is the phrasing of the aspects: more than one term can describe the same aspect, such as "battery" and "battery life" in "the battery life is really good". In both cases we are talking about the battery, but an exact string match of the aspect words would fail. So, we will use the LLM to assist with the evaluation by comparing aspects and matching semantically close ones.
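For comparison, a cheap lexical heuristic (counting a match when two aspects share a content word; a swapped-in technique, not what this notebook uses) already handles pairs like "battery" vs "battery life", but it cannot see that "space" and "storage" are related. That gap is the reason to bring in the LLM. The stopword list and function names below are illustrative assumptions.

```python
# Hypothetical lexical baseline for aspect matching (not the notebook's method)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "for"}

def content_words(aspect: str) -> set:
    # Lowercase, split on whitespace, and drop very common words
    return {w for w in aspect.lower().split() if w not in STOPWORDS}

def count_lexical_matches(true_pairs, pred_pairs) -> int:
    # Count true (aspect, sentiment) pairs that have a prediction sharing at least
    # one content word and carrying exactly the same sentiment label
    matches = 0
    for true_aspect, true_sentiment in true_pairs:
        if any(content_words(true_aspect) & content_words(pred_aspect)
               and true_sentiment == pred_sentiment
               for pred_aspect, pred_sentiment in pred_pairs):
            matches += 1
    return matches

# prints 2: "space" shares a word with "extra space", "keyboard" matches exactly
print(count_lexical_matches([('space', 'positive'), ('keyboard', 'negative')],
                            [('extra space', 'positive'), ('keyboard', 'negative')]))
# prints 0: "space" vs "storage" share no words, so the synonym is missed
print(count_lexical_matches([('space', 'positive'), ('keyboard', 'negative')],
                            [('storage', 'positive'), ('screen', 'negative')]))
```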

First, we need to extract all of the ground-truth aspect-sentiment pairs from the original dataframe.

__prompt__: I would like the Python code to get all of the aspects and their corresponding polarities from the dataframe 'df'. To do this, for each unique id in the column 'id', take all of the entries from the 'Aspect Term' and 'polarity' columns and combine each pair into a tuple. Then combine all of the tuples for each unique id into a list. Then create a dataframe with columns 'id' and 'true_aspect_sentiment' from these values. So, for example, if the id '1111' has two entries, then there should be an entry in the new dataframe of {'id':1111, 'true_aspect_sentiment':[('aspect_1', 'sentiment_1'), ('aspect_2', 'sentiment_2')]}

In [None]:
true_aspect_sentiment = []

# Iterate through each unique 'id' in the DataFrame
for unique_id in df['id'].drop_duplicates():
    # Get the rows for the current unique 'id'
    subset = df[df['id'] == unique_id]

    # Create a list of tuples from the 'Aspect Term' and 'polarity' columns
    aspect_sentiment_tuples = list(zip(subset['Aspect Term'], subset['polarity']))

    # Append the result to the list, ensuring each entry has 'id' and 'true_aspect_sentiment'
    true_aspect_sentiment.append({'id': unique_id, 'true_aspect_sentiment': aspect_sentiment_tuples})

# Create the results DataFrame
true_aspect_sentiment_df = pd.DataFrame(true_aspect_sentiment)

# Show the first few rows of the results DataFrame
print(true_aspect_sentiment_df.head())
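The same table can be built without an explicit loop by pairing the two columns and collecting the pairs per `id` with `groupby`. A sketch on a few rows taken from the sample output earlier; `sort=False` keeps ids in order of first appearance, like the loop above.

```python
import pandas as pd

# Rows taken from the sample output earlier (id 997 and id 147)
toy = pd.DataFrame({
    'id': [997, 997, 147],
    'Aspect Term': ['Drivers', 'BIOS update', 'keyboard'],
    'polarity': ['positive', 'negative', 'negative'],
})

# Pair each aspect with its polarity, then gather the pairs into one list per id
toy_true_df = (
    toy.assign(pair=list(zip(toy['Aspect Term'], toy['polarity'])))
       .groupby('id', sort=False)['pair']
       .agg(list)
       .reset_index(name='true_aspect_sentiment')
)
print(toy_true_df)
```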

__prompt__: Now, please give me the code to join 'true_aspect_sentiment_df' with 'results_df' on the 'id' column. Call this dataframe 'eval_df'.

In [None]:
# Join 'true_aspect_sentiment_df' with 'results_df' on the 'id' column
eval_df = pd.merge(true_aspect_sentiment_df, results_df, on='id', how='inner')

# Show the first few rows of the resulting 'eval_df'
print(eval_df.head())

Now, let's match the results so that we can evaluate how well our LLM and prompt are doing.

__prompt__: Please produce a prompt to match tuples between two different lists in a LangChain prompt template. The LLM will be given two lists of tuples, one called "true aspect sentiments" and one called "predicted aspect sentiments". The task is to determine, for each tuple in "true aspect sentiments", whether there is a tuple in "predicted aspect sentiments" that closely matches it. To do the matching, there are two steps. First, you need to determine if the first entries in the tuples describe the same thing. For example, "battery" and "battery life" roughly describe the same thing, while "operating system" and "packaging" do not. In other words, you must determine if the two first entries are semantically very close. Then, if the first entries of the tuples match, compare the second entries of the tuples. The second entries are the sentiment terms and should match exactly. For example, "positive" and "positive" match, but "neutral" and "negative" do not. Once you have a match for a tuple, move to the next tuple; each tuple in "true aspect sentiments" counts at most once, and only if it has at least one match. Finally, output the number of matches as a number (e.g., 0, 1, 2). Below are some examples to help you format this prompt for the LLM:

Example 1:
true aspect sentiments: [(suite of software, positive)]
predicted aspect sentiments: [(software, positive), (suite, positive)]
output: 1

Example 2
true aspect sentiments: [('space', 'positive'), ('keyboard', 'negative')]
predicted aspect sentiments: [('extra space', 'positive'), ('keyboard', 'negative')]
output: 2

Example 3
true aspect sentiments: [('price premium', 'negative'), ('features', 'positive')]
predicted aspect sentiments: [('price', 'neutral'), ('features', 'positive')]
output: 1

Example 4
true aspect sentiments: [('web cam', 'neutral'), ("burn cd's", 'neutral')]
predicted aspect sentiments: [('web cam', 'negative'), ('cd burning', 'negative')]
output: 0

Example 5
true aspect sentiments: [('space', 'positive'), ('keyboard', 'negative')]
predicted aspect sentiments: [('storage', 'positive'), ('screen', 'negative')]
output: 1

Example 6
true aspect sentiments: [('battery life', 'positive')]
predicted aspect sentiments: [('battery life', 'positive'), ('battery', 'positive')]
output: 1

In [None]:
eval_prompt = PromptTemplate(
    input_variables=["true_aspect_sentiments", "predicted_aspect_sentiments"],
    template="""
You are given two lists of tuples. Each tuple consists of an aspect and a sentiment.
Your task is to determine how many tuples in the true aspect sentiments list have a close match in the predicted aspect sentiments list.

To match:
1. First, you need to determine if the first entry (the aspect) in the tuples is describing the same thing. For example, "battery" and "battery life" are roughly describing the same thing, but "operating system" and "packaging" are not.
2. If the first entries match, then check if the second entries (the sentiment values) are the same. For example, "positive" matches "positive", but "positive" does not match "neutral".
3. Count each tuple in the true aspect sentiments list only if it has at least one match in the predicted aspect sentiments list.

Please output the number of tuples in the true aspect sentiments list that have a match. If no matches are found, output 0. Below are some examples:

Example 1:
true aspect sentiments: [(suite of software, positive)]
predicted aspect sentiments: [(software, positive), (suite, positive)]
output: 1

Example 2:
true aspect sentiments: [('space', 'positive'), ('keyboard', 'negative')]
predicted aspect sentiments: [('extra space', 'positive'), ('keyboard', 'negative')]
output: 2

Example 3:
true aspect sentiments: [('price premium', 'negative'), ('features', 'positive')]
predicted aspect sentiments: [('price', 'neutral'), ('features', 'positive')]
output: 1

Example 4:
true aspect sentiments: [('web cam', 'neutral'), ("burn cd's", 'neutral')]
predicted aspect sentiments: [('web cam', 'negative'), ('cd burning', 'negative')]
output: 0

Example 5:
true aspect sentiments: [('space', 'positive'), ('keyboard', 'negative')]
predicted aspect sentiments: [('storage', 'positive'), ('screen', 'negative')]
output: 1

Example 6:
true aspect sentiments: [('battery life', 'positive')]
predicted aspect sentiments: [('battery life', 'positive'), ('battery', 'positive')]
output: 1

Here are the two lists:

true aspect sentiments: {true_aspect_sentiments}
predicted aspect sentiments: {predicted_aspect_sentiments}

Your output should be the number of matching tuples. Do not provide any additional words or explanations.

output:"""
)



__prompt__: Now, please include the prompt object (which should be named "eval_prompt") in a LangChain chain with the llm object "llm" using the "|" operator. Please also add a final function to the chain that parses the output into an integer. So, the chain should look like:

eval_chain = eval_prompt | llm | number_parser

In [None]:
import re

def number_parser(output: str) -> int:
    # Extract the first integer in the output, so "1", " 2\n" or "output: 1" all parse
    match = re.search(r"\d+", output)
    # If no number is found, treat it as 0 matches (or handle as needed)
    return int(match.group()) if match else 0

# Build the eval_chain with eval_prompt, llm, and number_parser
eval_chain = eval_prompt | llm | number_parser

# Example usage
output = eval_chain.invoke({
    "true_aspect_sentiments": [('price premium', 'negative'), ('features', 'positive')],
    "predicted_aspect_sentiments": [('price', 'neutral'), ('features', 'positive')]
})

print(output)

__prompt__: Now, please give me the code to use the eval_chain on each entry in the eval_df. For the "true_aspect_sentiments" take the values from the "true_aspect_sentiment" column and for the "predicted_aspect_sentiments" take the values from the "aspect_sentiment" column. Please save the outputs in a new column called "matches". Please also use tqdm when iterating over the rows.

In [None]:
tqdm.pandas()

# Create a function to apply eval_chain on each row of the dataframe
def apply_eval_chain(row):
    true_aspect_sentiments = row['true_aspect_sentiment']
    predicted_aspect_sentiments = row['aspect_sentiment']

    # Use the eval_chain to get the number of matches
    return eval_chain.invoke({
        "true_aspect_sentiments": true_aspect_sentiments,
        "predicted_aspect_sentiments": predicted_aspect_sentiments
    })

eval_df['matches'] = eval_df.progress_apply(apply_eval_chain, axis=1)

In [None]:
eval_df['matches'].value_counts()

__prompt__: Now, please give me the code to get the number of tuples in each list in the 'true_aspect_sentiment' column of "eval_df" and save that to the column "num_true_aspects". Also do the same for the 'aspect_sentiment' column and call the column "num_pred_aspects". Finally, using the counts in the 'matches' column and the "num_true_aspects" column, compute the difference between them and divide this result by the value in "num_true_aspects", then subtract this value from 1.0, and take the min of this value and 1.0 (i.e., there should never be a result larger than 1.0). Then, save this result in a new column called "accuracy".

Finally, give the code for computing the statistics from the "accuracy" column.

In [None]:
# Compute the number of tuples in each list in 'true_aspect_sentiment' and 'aspect_sentiment' columns
eval_df['num_true_aspects'] = eval_df['true_aspect_sentiment'].apply(len)
eval_df['num_pred_aspects'] = eval_df['aspect_sentiment'].apply(len)

# Compute the accuracy based on the formula
eval_df['accuracy'] = (
    1.0 - ((eval_df['num_true_aspects'] - eval_df['matches']) / eval_df['num_true_aspects'])
).clip(upper=1.0)  # Ensure accuracy does not exceed 1.0
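Writing $T_i$ for `num_true_aspects` and $M_i$ for `matches` in row $i$, the formula above reduces to a per-review match rate (recall) with an upper cap:

$$
\text{accuracy}_i = \min\left(1,\ 1 - \frac{T_i - M_i}{T_i}\right) = \min\left(1,\ \frac{M_i}{T_i}\right)
$$

The cap only matters when the judge LLM reports more matches than there are true aspects. Note that `num_pred_aspects` does not enter this score, so over-predicting aspects is not penalized here.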


In [None]:
# Compute basic statistics for the 'accuracy' column
accuracy_stats = eval_df['accuracy'].describe()

# Optionally, compute additional statistics like mean, median, etc.
accuracy_mean = eval_df['accuracy'].mean()
accuracy_median = eval_df['accuracy'].median()
accuracy_std = eval_df['accuracy'].std()

# Print the statistics
print("Accuracy Statistics:")
print(accuracy_stats)
print(f"Mean Accuracy: {accuracy_mean:.4f}")
print(f"Median Accuracy: {accuracy_median:.4f}")
print(f"Standard Deviation of Accuracy: {accuracy_std:.4f}")


__prompt__: Finally, please give the Python code to plot a histogram of the "accuracy" column of "eval_df".

In [None]:
import matplotlib.pyplot as plt

# Plot a histogram of the 'accuracy' column
plt.figure(figsize=(10, 6))
plt.hist(eval_df['accuracy'], bins=20, edgecolor='black', alpha=0.7)
plt.title('Distribution of Accuracy', fontsize=16)
plt.xlabel('Accuracy', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.grid(True)
plt.show()


__prompt__: What would be a good way to measure performance in this scenario? I have the number of correct matches in "matches", the number of aspect sentiments that should have been produced in "num_true_aspects" and the number of predicted aspects in "num_pred_aspects". Ideally, the number of correct matches should exactly match "num_true_aspects", and "num_true_aspects" should exactly match "num_pred_aspects".

In [None]:
# Calculate Precision, Recall, and F1 score
precision = eval_df['matches'].sum() / eval_df['num_pred_aspects'].sum()
recall = eval_df['matches'].sum() / eval_df['num_true_aspects'].sum()
f1 = 2 * (precision * recall) / (precision + recall)

# Calculate Match Efficiency
match_efficiency = 1.0 - abs(eval_df['num_pred_aspects'].sum() - eval_df['num_true_aspects'].sum()) / eval_df['num_true_aspects'].sum()

# Print the performance metrics
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print(f"Match Efficiency: {match_efficiency:.4f}")
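One caveat with the pooled metrics above: `matches` comes from an LLM judge and can occasionally exceed what is possible for a row, and a run with no predictions would divide by zero. A small stdlib sketch (the `micro_prf` name is hypothetical) caps each row's matches at `min(num_true_aspects, num_pred_aspects)` and guards the divisions.

```python
def micro_prf(rows):
    """Micro-averaged precision, recall and F1 from (matches, num_true, num_pred) rows."""
    total_matches = total_true = total_pred = 0
    for matches, num_true, num_pred in rows:
        # A row cannot have more correct pairs than true or predicted pairs
        total_matches += min(matches, num_true, num_pred)
        total_true += num_true
        total_pred += num_pred
    precision = total_matches / total_pred if total_pred else 0.0
    recall = total_matches / total_true if total_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up rows; the last one over-counts (3 matches for 2 true aspects) and gets capped
p, r, f = micro_prf([(1, 2, 2), (2, 2, 3), (3, 2, 2)])
print(f"Precision: {p:.4f}  Recall: {r:.4f}  F1: {f:.4f}")
```

On the notebook's data this could be called as `micro_prf(eval_df[['matches', 'num_true_aspects', 'num_pred_aspects']].itertuples(index=False))`.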