# **Project: A Case Study of InnovaTech Solutions**

**Business Overview:**

InnovaTech Solutions, a dynamic and forward-thinking technology company, has made significant strides in the computing industry with a focus on developing high-quality laptops. Established over a decade ago, InnovaTech has gained a reputation for its innovative approach and commitment to customer satisfaction, creating a significant footprint in both physical and online retail spaces.
InnovaTech has expanded its presence in the digital retail world, especially on e-commerce giants like Amazon. This strategic move has not only widened its customer base but also resulted in a large influx of customer feedback, primarily in the form of online reviews. The company's products, notably its range of laptops, have become popular choices on these platforms, leading to an abundance of valuable but underutilized customer data.

**Current Challenge:**

InnovaTech currently analyzes customer reviews using basic sentiment analysis tools, which provide only a superficial understanding of customer opinions. In the competitive laptop market, a more detailed, aspect-oriented analysis is crucial: understanding customer sentiment on specific aspects of a laptop, such as the screen or the technical specifications, is vital for targeted product improvements.

**Objective:**

The primary goal is to conduct a comprehensive aspect-based sentiment analysis of customer reviews for InnovaTech’s laptops, focusing on three critical aspects: the laptop screen, keyboard, and mousepad. These components have been identified as crucial determinants of customer satisfaction and product usability. The project aims to provide nuanced insights into specific areas of customer satisfaction, dissatisfaction, and neutral feedback. The ultimate goal is to enhance overall product quality and customer experience, solidifying InnovaTech's position as a leader in the laptop market.



**Data Description:**

The dataset titled "laptop_reviews.csv" is structured to facilitate aspect-based sentiment analysis for laptop reviews. Here's a brief description of the data columns:

1. id: This column contains a unique identifier for each review entry. It helps in distinguishing and referencing individual reviews.
2. text: This column includes the actual text of the laptop reviews. The reviews are composed of customer opinions and experiences regarding different aspects of the laptops.
3. aspects: Contains structured information about the specific aspects mentioned in each review, such as 'RAM', 'screen', 'keyboard', 'mousepad', and other laptop features.
4. category: Provides an additional layer of classification, a polarity (positive, negative, or neutral) for each of the mentioned aspects.

# 1. Setup

### 1.1 Installation

In [1]:
# !pip install openai==0.28.0 tiktoken datasets session-info --quiet

### 1.2 Imports

1. Import all Python packages required to access the Azure OpenAI API.
2. Import additional packages required to access datasets and create examples.

In [2]:
import openai
import json
import tiktoken
import session_info

import pandas as pd
import numpy as np

from datasets import load_dataset
from collections import Counter
from tqdm import tqdm
from sklearn.model_selection import train_test_split

### 1.3 Authentication

In [3]:
with open('config.json', 'r') as az_creds: # Read data from file
    data = az_creds.read()

In [4]:
creds = json.loads(data)
print(list(creds.keys()))  # Print only the key names so secrets never reach the notebook output

['AZURE_OPENAI_KEY', 'AZURE_OPENAI_ENDPOINT', 'AZURE_OPENAI_APITYPE', 'AZURE_OPENAI_APIVERSION', 'CHATGPT_MODEL']


In [5]:
openai.api_key = creds["AZURE_OPENAI_KEY"]
openai.api_base = creds["AZURE_OPENAI_ENDPOINT"]
openai.api_type = creds["AZURE_OPENAI_APITYPE"]
openai.api_version = creds["AZURE_OPENAI_APIVERSION"]

In [6]:
chat_model_id = creds["CHATGPT_MODEL"]

### 1.4 Utilities

Define a token counter to keep track of how many tokens the prompt uses, and therefore how much of the context window remains for the completion.

In [7]:
def num_tokens_from_messages(messages):
    
    """
    
    Returns the number of tokens used by a list of messages.
    
    Args:
        messages (List): list of message dicts in the OpenAI chat format
    
    Output:
        num_tokens (int): number of tokens in the messages
    
    Adapted from the OpenAI cookbook token counter.
    
    """
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    
    # Each message is sandwiched with <|start|>system or user or assistant {message} and <|end|>  
    
    tokens_per_message = 3 
    
    num_tokens = 0
    
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            
    num_tokens += 3 # Every reply is primed with <|start|>assistant<|end|>
    
    return num_tokens

# Task: Aspect-Based Sentiment Analysis (ABSA)

### Step 1: Define objectives & Metrics

To evaluate model performance, we judge the accuracy of the aspects and of the sentiment assignment per aspect. For example, if the aspects identified by the LLM do not match the ground truth for a specific input, we count this prediction as incorrect. A correct prediction is one where all the aspects are correctly identified and the sentiment assigned to each aspect is also correct.
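The exact-match criterion described above can be illustrated with a minimal sketch. It assumes predictions and annotations have already been converted to aspect-to-polarity dictionaries; the function name `exact_match_accuracy` and the three sample reviews are hypothetical and not part of the notebook.

```python
def exact_match_accuracy(predictions, ground_truths):
    """Fraction of reviews whose predicted aspect->polarity mapping equals the annotation."""
    if not ground_truths:
        raise ValueError("ground_truths must not be empty")
    # A review counts as correct only if every aspect and every polarity agrees.
    matches = sum(pred == truth for pred, truth in zip(predictions, ground_truths))
    return matches / len(ground_truths)


# Hypothetical predictions and annotations for three reviews.
predictions = [
    {"screen": "positive", "keyboard": "negative"},
    {"mousepad": "neutral"},
    {"screen": "positive"},
]
ground_truths = [
    {"screen": "positive", "keyboard": "negative"},  # exact match
    {"mousepad": "negative"},                        # wrong polarity
    {"screen": "positive", "keyboard": "neutral"},   # missed aspect
]

accuracy = exact_match_accuracy(predictions, ground_truths)
print(accuracy)
```

Only the first review counts as correct here: a wrong polarity or a missing aspect each makes the whole prediction incorrect.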

In [8]:
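def parse_text(string):
    """
    Parse a dictionary-like string into a dict of key/value strings.

    Parts that do not contain the ' ": ' separator are skipped, so an input
    without that separator yields an empty dict.
    """

    result = {}
    # Remove the outermost {} and split on a comma followed by a double quote
    parts = string.strip('{}').split(', "')

    for part in parts:
        split_index = part.find(' ": ')
        if split_index == -1:
            continue # skip if the format is not as expected
        key = part[:split_index].strip('"')
        value = part[split_index+3:].strip() # Remove leading and trailing whitespace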

        if value.startswith('array(['):
            value = value[7:] #Remove 'array(['
            value = value.split('], dtype=object')[0] 
            value = value.strip('["]')
    
        result[key] = value
        
    return result
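
The annotations in this dataset are stored as strings of the form `{'category':array([...],dtype=object),'polarity':array([...],dtype=object)}`. As an alternative sketch (not the parser used in this notebook), a regular expression can turn such a string into an aspect-to-polarity dictionary. The names `ARRAY_PATTERN`, `parse_annotation`, and `to_aspect_polarity` are hypothetical, and the sketch assumes the values never contain single quotes.

```python
import re

# Matches "'name':array([...],dtype=object)" and captures the name and the list body.
ARRAY_PATTERN = re.compile(r"'(\w+)'\s*:\s*array\(\[(.*?)\]\s*,\s*dtype=object\)")


def parse_annotation(raw):
    """Return {'category': [...], 'polarity': [...]} parsed from an annotation string."""
    fields = {}
    for name, body in ARRAY_PATTERN.findall(raw):
        # Each list item is a single-quoted token.
        fields[name] = re.findall(r"'([^']*)'", body)
    return fields


def to_aspect_polarity(raw):
    """Pair each aspect with its polarity, e.g. {'screen': 'positive'}."""
    fields = parse_annotation(raw)
    return dict(zip(fields.get("category", []), fields.get("polarity", [])))


raw = (
    "{'category':array(['battery','RAM','software','GPU'],dtype=object),"
    "'polarity':array(['neutral','neutral','neutral','neutral'],dtype=object)}"
)
parsed = to_aspect_polarity(raw)
print(parsed)
```

A string that does not follow the expected format produces an empty dictionary, so callers may want to treat an empty result as a parse failure rather than as a valid annotation.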

In [9]:
def compute_accuracy(gold_examples, model_predictions, ground_truths):

    """
    Return the accuracy score comparing the model predictions and ground truth
    for ABSA. We look for exact matches between the model predictions on all the
    aspects and sentiments for these aspects in the ground truth.

    Args:
        gold_examples (List): List of gold examples (one dict per review)
        model_predictions (List): List of ABSA prediction strings
        ground_truths (List): List of ABSA annotation strings

    Output:
        accuracy (float): Fraction of gold examples with an exact match
    """
    
    correct_predictions = 0
    total_predictions = len(gold_examples)
    
    for pred, truth in zip(model_predictions, ground_truths):
        pred_dict = parse_text(pred)
        truth_dict = parse_text(truth)
        
        if pred_dict == truth_dict:
            correct_predictions += 1
        
    accuracy = correct_predictions / total_predictions
    print("correct predictiosn : ", correct_predictions)
    print("total predictions : ", total_predictions)
    print("accuracy : ", accuracy)
    return accuracy

### Step 2: Assemble Data

1. Use the "laptop_reviews.csv" dataset.
2. Identify distribution of aspects in examples.
3. Identify distribution of aspects in gold examples.

In [10]:
laptop_reviews_df = pd.read_csv("laptop_reviews.csv")

In [40]:
laptop_reviews_df.shape

(100, 4)

In [41]:
laptop_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        100 non-null    int64 
 1   text      100 non-null    object
 2   aspects   100 non-null    object
 3   category  100 non-null    object
dtypes: int64(1), object(3)
memory usage: 3.2+ KB


In [43]:
len(laptop_reviews_df)

100

In [11]:
laptop_reviews_examples, laptop_reviews_gold_examples = train_test_split(
    laptop_reviews_df,
    test_size=0.2,
    random_state = 42
)

In [39]:
laptop_reviews_examples.shape

(80, 4)

In [12]:
laptop_reviews_examples.sample(10)

Unnamed: 0,id,text,aspects,category
89,90,The mousepad is adequate. The design is good. ...,"{'term':array(['mousepad','design','screen','k...","{'category':array(['mousepad','design','screen..."
97,98,The software is excellent. The mousepad is dis...,"{'term':array(['software','mousepad','GPU'],dt...","{'category':array(['software','mousepad','GPU'..."
6,7,The software is average. The RAM is disappoint...,"{'term':array(['software','RAM'],dtype=object)...","{'category':array(['software','RAM'],dtype=obj..."
99,100,The mousepad is good. The GPU is fair. The key...,"{'term':array(['mousepad','GPU','keyboard'],dt...","{'category':array(['mousepad','GPU','keyboard'..."
61,62,The mousepad is poor. The camera is unpleasant.,"{'term':array(['mousepad','camera'],dtype=obje...","{'category':array(['mousepad','camera'],dtype=..."
78,79,The screen is unpleasant. The camera is impres...,"{'term':array(['screen','camera','keyboard','d...","{'category':array(['screen','camera','keyboard..."
63,64,The design is adequate. The camera is terrible.,"{'term':array(['design','camera'],dtype=object...","{'category':array(['design','camera'],dtype=ob..."
69,70,The camera is standard. The GPU is great. The ...,"{'term':array(['camera','GPU','hardware','scre...","{'category':array(['camera','GPU','hardware','..."
95,96,The camera is terrible. The hardware is adequa...,"{'term':array(['camera','hardware','mousepad']...","{'category':array(['camera','hardware','mousep..."
28,29,The battery is bad. The keyboard is bad. The s...,"{'term':array(['battery','keyboard','screen'],...","{'category':array(['battery','keyboard','scree..."


In [13]:
laptop_reviews_gold_examples.sample(10)

Unnamed: 0,id,text,aspects,category
80,81,The camera is amazing. The RAM is fair. The ba...,"{'term':array(['camera','RAM','battery','mouse...","{'category':array(['camera','RAM','battery','m..."
44,45,The RAM is fair. The GPU is good. The screen i...,"{'term':array(['RAM','GPU','screen'],dtype=obj...","{'category':array(['RAM','GPU','screen'],dtype..."
33,34,The camera is adequate. The battery is standard.,"{'term':array(['camera','battery'],dtype=objec...","{'category':array(['camera','battery'],dtype=o..."
70,71,The keyboard is standard. The RAM is disappoin...,"{'term':array(['keyboard','RAM'],dtype=object)...","{'category':array(['keyboard','RAM'],dtype=obj..."
31,32,The screen is excellent. The GPU is standard. ...,"{'term':array(['screen','GPU','hardware','soft...","{'category':array(['screen','GPU','hardware','..."
77,78,The battery is average. The mousepad is disapp...,"{'term':array(['battery','mousepad'],dtype=obj...","{'category':array(['battery','mousepad'],dtype..."
53,54,The screen is great. The software is bad.,"{'term':array(['screen','software'],dtype=obje...","{'category':array(['screen','software'],dtype=..."
18,19,The battery is decent. The RAM is terrible. Th...,"{'term':array(['battery','RAM','mousepad'],dty...","{'category':array(['battery','RAM','mousepad']..."
45,46,The design is adequate. The keyboard is bad.,"{'term':array(['design','keyboard'],dtype=obje...","{'category':array(['design','keyboard'],dtype=..."
4,5,The GPU is terrible. The keyboard is poor. The...,"{'term':array(['GPU','keyboard','mousepad'],dt...","{'category':array(['GPU','keyboard','mousepad'..."


In [14]:
laptop_reviews_examples.iloc[0]['category']

"{'category':array(['battery','RAM','software','GPU'],dtype=object),'polarity':array(['neutral','neutral','neutral','neutral'],dtype=object)}"

In [15]:
examples_aspect_index = {
    'screen' : [],
    'keyboard' : [],
    'mousepad' : []
}

gold_examples_aspect_index = {
    'screen' : [],
    'keyboard' : [],
    'mousepad' : []
}

In [16]:
for id, category in zip(laptop_reviews_examples.id, laptop_reviews_examples.category):
    for key in examples_aspect_index.keys():
        if key in category:
            examples_aspect_index[key].append(id)

In [17]:
for id, category in zip(laptop_reviews_gold_examples.id, laptop_reviews_gold_examples.category):
    for key in gold_examples_aspect_index.keys():
        if key in category:
            gold_examples_aspect_index[key].append(id)

In [18]:
gold_examples_aspect_index

{'screen': [54, 45, 91, 77, 32],
 'keyboard': [71, 46, 5],
 'mousepad': [40, 81, 11, 19, 74, 91, 5, 77, 78, 13]}
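
To turn an index like the one above into a distribution, one option is to count how many reviews mention each target aspect. The sketch below uses only the standard library and matches whole quoted tokens rather than substrings; `aspect_distribution` and the three sample strings are hypothetical and merely follow the format of the `category` column.

```python
import re
from collections import Counter

TARGET_ASPECTS = ("screen", "keyboard", "mousepad")


def aspect_distribution(category_strings, aspects=TARGET_ASPECTS):
    """Count the reviews that mention each target aspect."""
    counts = Counter({aspect: 0 for aspect in aspects})
    for raw in category_strings:
        # Collect every single-quoted token so that matching is on whole words.
        mentioned = set(re.findall(r"'([^']*)'", raw))
        for aspect in aspects:
            if aspect in mentioned:
                counts[aspect] += 1
    return counts


# Hypothetical annotation strings in the same format as the 'category' column.
sample = [
    "{'category':array(['screen','software'],dtype=object),'polarity':array(['positive','negative'],dtype=object)}",
    "{'category':array(['design','keyboard'],dtype=object),'polarity':array(['neutral','negative'],dtype=object)}",
    "{'category':array(['GPU','keyboard','mousepad'],dtype=object),'polarity':array(['negative','negative','negative'],dtype=object)}",
]

distribution = aspect_distribution(sample)
print(dict(distribution))
```

Comparing these counts between the example split and the gold split shows whether each aspect is represented in both.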

In [19]:
columns_to_select = ['id', 'text', 'category']

In [20]:
gold_examples = json.loads((
    laptop_reviews_gold_examples.loc[:, columns_to_select]
                                .sample(20, random_state=42)
                                .to_json(orient='records')
))

In [21]:
gold_examples[0]

{'id': 84,
 'text': 'The GPU is excellent. The hardware is fair. The RAM is adequate. The software is bad.',
 'category': "{'category':array(['GPU','hardware','RAM','software'],dtype=object),'polarity':array(['positive','neutral','neutral','negative'],dtype=object)}"}

### Step 3: Derive Prompt

#### Create prompts

In [22]:
user_message_template = """```{laptop_review}```"""

**1. Zero-shot prompt**

In [23]:
zero_shot_system_message = """
Perform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.
In each review there might be one or more of the following aspects: screen, keyboard, mousepad.
For each review presented as input:
- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.
- Assign a sentiment polarity (positive, negative or neutral) for each aspect

Arrange your response a JSON object with the following headers:
- category:[list of aspects]
- polarity:[list of corresponding polarities for each aspect]}
"""

In [24]:
zero_shot_prompt = [{'role':'system', 'content': zero_shot_system_message}]

In [25]:
num_tokens_from_messages(zero_shot_prompt)

129
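
The prompt asks the model for only three aspects, while the annotations also cover aspects such as GPU or RAM. One way to compare like with like, sketched here under the assumption that the model replies with valid JSON in the requested shape, is to restrict both sides to the target aspects before comparing. The helper names `restrict_to_targets` and `parse_model_reply`, and the sample reply, are hypothetical.

```python
import json

TARGET_ASPECTS = ("screen", "keyboard", "mousepad")


def restrict_to_targets(categories, polarities, targets=TARGET_ASPECTS):
    """Keep only target aspects, lowercased, as an aspect->polarity dict."""
    return {
        category.lower(): polarity.lower()
        for category, polarity in zip(categories, polarities)
        if category.lower() in targets
    }


def parse_model_reply(reply):
    """Parse the model's JSON reply; return None if it is not in the expected shape."""
    try:
        payload = json.loads(reply)
        return restrict_to_targets(payload["category"], payload["polarity"])
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return None


# Hypothetical model reply and annotation for one review.
reply = '{"category": ["screen", "keyboard"], "polarity": ["positive", "negative"]}'
predicted = parse_model_reply(reply)
annotated = restrict_to_targets(
    ["GPU", "keyboard", "screen"], ["negative", "negative", "positive"]
)
is_match = predicted == annotated
print(predicted, annotated, is_match)
```

Returning `None` for an unparsable reply keeps a formatting failure distinct from a review that legitimately mentions none of the three aspects, which yields an empty dictionary.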

**2. Few-shot prompt**

In [26]:
few_shot_system_message = """
Perform aspect based sentiment analysis on laptop reviews presented in the input delimited by triple backticks, that is, ```.
In each review there might be one or more of the following aspects: screen, keyboard, mousepad.
For each review presented as input:
- Identify if there are any of the 3 aspects (screen, keyboard, mousepad) present in the review.
- Assign a sentiment polarity (positive, negative or neutral) for each aspect

Arrange your response a JSON object with the following headers:
{category:[list of aspects]
polarity:[list of corresponding polarities for each aspect]}
"""

In [27]:
def create_examples(dataset, n=4):
    
    """
    Return a JSON string of examples covering the three target aspects.
    For each aspect, collect the ids of reviews that mention it and draw n ids
    at random (with replacement, the np.random.choice default). The drawn ids
    are merged and the matching rows are returned in dataset order.
    Each run of this function creates a different random sample of examples
    chosen from the dataset passed in.

    Args:
        dataset (DataFrame): A DataFrame with examples
        n (int): number of ids to draw for each aspect

    Output:
        examples (str): A JSON string with the selected examples
    """
    
    columns_to_select = ['id', 'text', 'category']
    example_ids = []

    aspect_index = {
        'keyboard' : [], 'screen' : [], 'mousepad' : []
    }

    for id, category in zip(dataset.id, dataset.category):
        for key in aspect_index.keys():
            if key in category:
                aspect_index[key].append(id)

    for key in aspect_index:
        example_ids.extend(np.random.choice(aspect_index[key], n).tolist())

    examples = dataset.loc[dataset.id.isin(example_ids), columns_to_select]

    return examples.to_json(orient='records')
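
Because `np.random.choice` samples with replacement by default and `isin` ignores duplicate ids, the function above can return fewer than `3 * n` examples. A possible variation, shown as a standard-library sketch rather than the notebook's method, draws ids without replacement from a seeded generator so that a run can be reproduced. The name `sample_example_ids` and the index values are illustrative only.

```python
import random


def sample_example_ids(aspect_index, n=4, seed=None):
    """Draw up to n distinct ids per aspect, merge them, and shuffle the result."""
    rng = random.Random(seed)
    chosen = set()
    for ids in aspect_index.values():
        unique_ids = sorted(set(ids))
        # Never ask for more ids than the aspect actually has.
        k = min(n, len(unique_ids))
        chosen.update(rng.sample(unique_ids, k))
    merged = sorted(chosen)
    rng.shuffle(merged)
    return merged


# Illustrative index mapping each aspect to the ids of reviews mentioning it.
index = {
    "screen": [54, 45, 91, 77, 32],
    "keyboard": [71, 46, 5],
    "mousepad": [40, 81, 11, 19, 74, 91, 5, 77, 78, 13],
}

ids = sample_example_ids(index, n=4, seed=0)
print(ids)
```

An id that mentions several aspects can still be drawn more than once across aspects, so the merged list may be shorter than the sum of the per-aspect draws.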

In [28]:
def create_prompt(system_message, examples, user_message_template):
    
    """
    Return a prompt message in the format expected by the OpenAI API.
    Loop through the examples and parse them as user message and assistant
    message.

    Args:
        system_message (str): system message with instructions for sentiment analysis
        examples (str): JSON string with list of examples
        user_message_template (str): string with a placeholder for laptop reviews

    Output:
        few_shot_prompt (List): A list of dictionaries in the OpenAI prompt format
    """
    
    few_shot_prompt = [{'role' : 'system', 'content' : system_message}]

    for example in json.loads(examples):
        example_input = example['text']
        example_absa = example['category']

        few_shot_prompt.append(
            {
                'role' : 'user',
                'content' : user_message_template.format(
                    laptop_review=example_input
                )
            }
        )

        few_shot_prompt.append(
            {'role' : 'assistant', 'content' : f"{example_absa}"}
        )

    return few_shot_prompt

In [29]:
examples = create_examples(laptop_reviews_df)
few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

In [30]:
print(examples)

[{"id":13,"text":"The battery is unpleasant. The mousepad is terrible. The hardware is standard.","category":"{'category':array(['battery','mousepad','hardware'],dtype=object),'polarity':array(['negative','negative','neutral'],dtype=object)}"},{"id":18,"text":"The GPU is poor. The camera is bad. The software is average. The keyboard is bad.","category":"{'category':array(['GPU','camera','software','keyboard'],dtype=object),'polarity':array(['negative','negative','neutral','negative'],dtype=object)}"},{"id":28,"text":"The screen is adequate. The GPU is disappointing. The hardware is disappointing. The keyboard is excellent.","category":"{'category':array(['screen','GPU','hardware','keyboard'],dtype=object),'polarity':array(['neutral','negative','negative','positive'],dtype=object)}"},{"id":29,"text":"The battery is bad. The keyboard is bad. The screen is great.","category":"{'category':array(['battery','keyboard','screen'],dtype=object),'polarity':array(['negative','negative','positive'

#### Evaluate prompts

**1. Define Evaluation scorer**

In [31]:
def evaluate_prompt(prompt, gold_examples, user_message_template):
    
    """
    Return the accuracy score for predictions on gold examples.
    For each example, we make a prediction using the prompt. Gold labels and
    model predictions are aggregated into lists and compared to compute the
    accuracy.

    Args:
        prompt (List): list of messages in the OpenAI prompt format
        gold_examples (List): list of gold examples (one dict per review)
        user_message_template (str): string with a placeholder for laptop reviews

    Output:
        accuracy (float): accuracy score computed by comparing model predictions
                                with ground truth
    """
    
    model_predictions, ground_truths = [], []
    

    for example in gold_examples:
        user_input = [{
            'role': 'user',
            'content': user_message_template.format(laptop_review=example['text'])
        }]

        try:
            response = openai.ChatCompletion.create(
                deployment_id=chat_model_id,
                messages=prompt + user_input,
                temperature=0
            )
            prediction = response['choices'][0]['message']['content']
            prediction_dict_str = str(prediction).replace("'", "\"")
            model_predictions.append(prediction_dict_str.strip().lower())
            ground_truth_str = str(example['category']).replace("'", "\"")
            ground_truths.append(ground_truth_str.strip().lower())

        except Exception as e:
            print(f"Error during model prediction: {e}")

    accuracy = compute_accuracy(gold_examples, model_predictions, ground_truths)
    return accuracy

**2. Evaluate zero shot prompt**

In [32]:
evaluate_prompt(zero_shot_prompt, gold_examples, user_message_template)

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


1.0

**3. Evaluate few shot prompt**

In [33]:
evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


1.0

**4. Summarize the few-shot results: compute the mean and the variability (standard deviation) of the evaluation scores across runs.**

In [34]:
num_eval_runs = 10

In [35]:
few_shot_performance = []

In [36]:
for _ in tqdm(range(num_eval_runs)):

    # For each run create a new sample of examples
    examples = create_examples(laptop_reviews_df)

    # Assemble the few shot prompt with these examples
    few_shot_prompt = create_prompt(few_shot_system_message, examples, user_message_template)

    # Evaluate prompt accuracy on gold examples
    few_shot_accuracy = evaluate_prompt(few_shot_prompt, gold_examples, user_message_template)

    few_shot_performance.append(few_shot_accuracy)

 10%|█         | 1/10 [00:10<01:32, 10.29s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 20%|██        | 2/10 [00:20<01:20, 10.08s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 30%|███       | 3/10 [00:29<01:09,  9.92s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 40%|████      | 4/10 [00:39<00:58,  9.68s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 50%|█████     | 5/10 [00:48<00:48,  9.63s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 60%|██████    | 6/10 [00:59<00:39,  9.94s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 70%|███████   | 7/10 [01:09<00:29,  9.87s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 80%|████████  | 8/10 [01:18<00:19,  9.63s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


 90%|█████████ | 9/10 [01:28<00:09,  9.69s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0


100%|██████████| 10/10 [01:38<00:00,  9.86s/it]

correct predictiosn :  20
total predictions :  20
accuracy :  1.0





In [37]:
np.array(few_shot_performance).mean(), np.array(few_shot_performance).std()

(1.0, 0.0)
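
Note that `np.std` computes the population standard deviation by default (`ddof=0`). With only a handful of runs, the sample standard deviation may be the more cautious summary. The sketch below shows both with the standard library; the scores are hypothetical and are not results from this notebook.

```python
import statistics

# Hypothetical accuracy scores from four evaluation runs (not measured results).
scores = [1.0, 0.9, 0.95, 1.0]

mean_score = statistics.fmean(scores)
population_std = statistics.pstdev(scores)  # same convention as np.std (ddof=0)
sample_std = statistics.stdev(scores)       # divides by n - 1 instead of n

print(mean_score, population_std, sample_std)
```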

**----------------------------------------------------------------------------End-----------------------------------------------------------------------------------------**