# **Week 2 Assignment: Automating Customer Feedback Analysis**

---

### **Objective**

The goal of this assignment is to use advanced prompting techniques and your understanding of LLM fundamentals to analyze and extract structured information from raw customer feedback. This project will test your skills in tokenization and prompt engineering (few-shot, structured output, and chain-of-thought).

### **Background & Problem Statement**

You are an AI Engineer at a growing e-commerce company. The customer support team is manually reading through hundreds of product reviews every day to identify key issues and sentiment. This process is slow and inconsistent.

Your manager has asked you to build a prototype that uses a Large Language Model to automate this analysis. Specifically, you need to prove that you can:
1.  Classify the sentiment of a review.
2.  Extract specific, structured information (like product names and issues).

### **Dataset**

For this assignment, you will work with a small, curated list of customer reviews. This allows you to focus on the quality of your prompts rather than on data cleaning. Use the following Python list as your dataset:

```python
reviews = [
    # Review 1: Positive
    "I absolutely love the new QuantumX Pro camera! The picture quality is stellar and the battery life is amazing. Shipped super fast too. A++!",

    # Review 2: Negative with specific issue
    "The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the price. Very disappointed.",

    # Review 3: Mixed with a question
    "The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?",

    # Review 4: Negative with multiple issues
    "My order for the AeroDrone was a disaster. It arrived with a broken propeller and the battery was completely dead on arrival. Customer service has been unresponsive for 3 days.",

    # Review 5: Positive but mentions a minor issue
    "Overall, I'm happy with the PureGlow Air Purifier. It's quiet and effective. My only complaint is that the replacement filters are a bit expensive."
]
```

---

### **Tasks & Instructions**

Structure your code in a Jupyter Notebook or Python script. Use markdown cells or comments to explain your process and show the outputs for each task.

**Part 1: Understanding Tokenization**
*   **Objective:** To see firsthand how different models "read" the same text.
*   **Tasks:**
    1.  Import `AutoTokenizer` from the `transformers` library.
    2.  Load the tokenizer for `"gpt2"` and the tokenizer for `"bert-base-uncased"`.
    3.  Take the third review (`reviews[2]`) about the Titan smartwatch.
    4.  Tokenize this review using **both** tokenizers and print the resulting list of tokens for each.
    5.  In a markdown cell, answer the following:
        *   Are the token lists identical?
        *   Point out one or two specific differences you notice.
        *   In one sentence, explain *why* different models might have different tokenizers.

**Part 2: Advanced Prompt Engineering**
*   **Objective:** To use different prompting techniques to perform three distinct analysis tasks.
*   **Setup:** Load a basic instruction-following or text-generation LLM (e.g., `google/flan-t5-large` or `gpt2-large`) using the Hugging Face `pipeline`.
*   **Tasks (perform for each review in the `reviews` list):**
    1.  **Task A: Sentiment Classification (Few-Shot Prompting)**
        *   Design a **few-shot prompt** that provides two examples of reviews classified as "Positive", "Negative", or "Mixed".
        *   Use this prompt to classify each of the five reviews in the dataset. Print the classification for each review.
    2.  **Task B: Structured Data Extraction (Instruction & Format Prompting)**
        *   Design a prompt that instructs the model to extract the following information from each review and format the output as a JSON object: `{"product_name": "...", "issue_summary": "...", "sentiment": "..."}`.
        *   If a piece of information isn't present, the model should output "N/A".
        *   Run this prompt on all five reviews and print the resulting JSON for each.
    3.  **Task C: Root Cause Analysis (Chain-of-Thought Prompting)**
        *   For the **negative and mixed reviews only** (reviews 2, 3, 4, and 5; review 5 is mostly positive but still reports a complaint), design a **Chain-of-Thought prompt**.
        *   The prompt should ask the model to first identify the customer's core problem and then explain its reasoning step-by-step.
        *   Example Prompt Structure: `Analyze the following customer review to identify the root cause of their issue. First, state the main problem. Second, explain your reasoning in a single sentence. Let's think step by step.`
        *   Print the model's full step-by-step analysis for these reviews.

---

### **Submission Instructions**

1.  **Deadline:** You have **one week** from the assignment release date to submit your work.
2.  **Platform:** All submissions must be made to your allocated private GitLab repository. You **must** submit your work in a branch named `week_2`.
3.  **Format:** You can submit your work as either a Jupyter Notebook (`.ipynb`) or a Python script (`.py`).
4.  **Verification:** After pushing, verify that your branch and files are visible on the GitLab web interface. No further action is needed. The trainers will review all submissions on the `week_2` branch after the deadline. Assignments submitted after the deadline will not be reviewed, and this will be reflected in your course score.
5.  **Use of LLMs:** The use of LLMs is encouraged, but ensure that you’re not copying solutions blindly. Always review, test, and understand any code generated, adapting it to the specific requirements of your assignment. Your submission should demonstrate your own comprehension, problem-solving process, and coding style, not just an unedited output from an AI tool.

In [16]:
# Dataset for the assignment
reviews = [
    # Review 1: Positive
    "I absolutely love the new QuantumX Pro camera! The picture quality is stellar and the battery life is amazing. Shipped super fast too. A++!",

    # Review 2: Negative with specific issue
    "The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the price. Very disappointed.",

    # Review 3: Mixed with a question
    "The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?",

    # Review 4: Negative with multiple issues
    "My order for the AeroDrone was a disaster. It arrived with a broken propeller and the battery was completely dead on arrival. Customer service has been unresponsive for 3 days.",

    # Review 5: Positive but mentions a minor issue
    "Overall, I'm happy with the PureGlow Air Purifier. It's quiet and effective. My only complaint is that the replacement filters are a bit expensive."
]

print(f"Dataset loaded with {len(reviews)} reviews")

Dataset loaded with 5 reviews


## **Part 1: Understanding Tokenization**

In this section, we'll explore how different models tokenize text differently using GPT-2 and BERT tokenizers.

In [17]:
# Import AutoTokenizer from transformers library
from transformers import AutoTokenizer

# Load tokenizers for GPT-2 and BERT
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Get the third review (index 2) about the Titan smartwatch
titan_review = reviews[2]
print(f"Review: \"{titan_review}\"\n")

# Tokenize using GPT-2 tokenizer
gpt2_tokens = gpt2_tokenizer.tokenize(titan_review)
print("GPT-2 Tokenization:")
print(f"Number of tokens: {len(gpt2_tokens)}")
print(f"Tokens: {gpt2_tokens}\n")

# Tokenize using BERT tokenizer
bert_tokens = bert_tokenizer.tokenize(titan_review)
print("BERT Tokenization:")
print(f"Number of tokens: {len(bert_tokens)}")
print(f"Tokens: {bert_tokens}")

Review: "The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?"

GPT-2 Tokenization:
Number of tokens: 41
Tokens: ['The', 'ĠTitan', 'Ġsmart', 'watch', 'Ġis', 'Ġdecent', '.', 'ĠThe', 'Ġscreen', 'Ġis', 'Ġbright', 'Ġand', 'Ġthe', 'Ġfeatures', 'Ġare', 'Ġgood', ',', 'Ġbut', 'Ġthe', 'Ġstep', 'Ġcounter', 'Ġseems', 'Ġinaccurate', '.', 'ĠIt', "'s", 'Ġoff', 'Ġby', 'Ġat', 'Ġleast', 'Ġ20', '%.', 'ĠIs', 'Ġthere', 'Ġa', 'Ġway', 'Ġto', 'Ġcalibr', 'ate', 'Ġit', '?']

BERT Tokenization:
Number of tokens: 44
Tokens: ['the', 'titan', 'smart', '##watch', 'is', 'decent', '.', 'the', 'screen', 'is', 'bright', 'and', 'the', 'features', 'are', 'good', ',', 'but', 'the', 'step', 'counter', 'seems', 'inaccurate', '.', 'it', "'", 's', 'off', 'by', 'at', 'least', '20', '%', '.', 'is', 'there', 'a', 'way', 'to', 'cal', '##ib', '##rate', 'it', '?']
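
The two printed lists can also be compared programmatically. The sketch below is one possible approach and not part of the assignment: it copies the token lists from the output above, strips the `Ġ` and `##` markers, lowercases everything, and uses `difflib.SequenceMatcher` from the standard library to collect the spans where the two segmentations disagree. The helper name `strip_markers` is illustrative.

```python
from difflib import SequenceMatcher

# Token lists copied from the printed output above. Tokens contain no
# whitespace, so a space-separated string is a compact way to write them.
gpt2_tokens = (
    "The ĠTitan Ġsmart watch Ġis Ġdecent . ĠThe Ġscreen Ġis Ġbright Ġand Ġthe "
    "Ġfeatures Ġare Ġgood , Ġbut Ġthe Ġstep Ġcounter Ġseems Ġinaccurate . ĠIt 's "
    "Ġoff Ġby Ġat Ġleast Ġ20 %. ĠIs Ġthere Ġa Ġway Ġto Ġcalibr ate Ġit ?"
).split()
bert_tokens = (
    "the titan smart ##watch is decent . the screen is bright and the "
    "features are good , but the step counter seems inaccurate . it ' s "
    "off by at least 20 % . is there a way to cal ##ib ##rate it ?"
).split()


def strip_markers(token):
    """Drop the GPT-2 space marker (U+0120) and the BERT '##' prefix, then lowercase."""
    return token.replace("\u0120", "").replace("##", "").lower()


normalized_gpt2 = [strip_markers(t) for t in gpt2_tokens]
normalized_bert = [strip_markers(t) for t in bert_tokens]

# Collect every span where the two normalized sequences are not equal,
# but report the original (un-normalized) tokens for readability.
differences = []
matcher = SequenceMatcher(a=normalized_gpt2, b=normalized_bert, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        differences.append((gpt2_tokens[i1:i2], bert_tokens[j1:j2]))

for gpt2_span, bert_span in differences:
    print(f"GPT-2 {gpt2_span}  vs  BERT {bert_span}")
```

Once markers and casing are normalized, only genuine segmentation differences remain, which makes them easier to cite in the written analysis.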


### **Analysis of Tokenization Differences**

**Are the token lists identical?**  
No, the token lists are not identical.

**Differences observed:**
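- GPT-2 marks a leading space with the special character `Ġ` (e.g., `ĠTitan`, `Ġsmart`) and preserves case, while BERT (`bert-base-uncased`) lowercases the text and uses no space marker.
- BERT marks word-internal continuation pieces with `##` (e.g., `smart`, `##watch`; `cal`, `##ib`, `##rate`), whereas GPT-2 signals a continuation only by the absence of `Ġ` (e.g., `Ġsmart`, `watch`; `Ġcalibr`, `ate`). The two also split some text differently: GPT-2 keeps `'s` and `%.` as single tokens, while BERT splits them into `'`, `s` and `%`, `.`. Together with the different split of "calibrate", this accounts for the 41 versus 44 token counts.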

**Why different models have different tokenizers:**  
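Different models use different tokenizers because each tokenizer's vocabulary is learned from that model's own training corpus with its own subword algorithm (byte-level BPE for GPT-2, WordPiece for BERT) and its own preprocessing choices, such as lowercasing.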
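
To make the `##` convention concrete, here is a toy version of the greedy longest-match-first lookup that WordPiece tokenizers apply to each word. It is a minimal sketch under stated assumptions: `toy_vocab` is a tiny hypothetical vocabulary chosen to reproduce two of the splits in the BERT output, and the function name is illustrative. The real `bert-base-uncased` tokenizer has a vocabulary of roughly 30,000 entries and also handles normalization and punctuation.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first segmentation of a single word.

    Pieces that do not start the word are looked up with a '##' prefix.
    If some position cannot be matched, the whole word becomes unk_token.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary (an assumption for illustration only).
toy_vocab = {"smart", "##watch", "cal", "##ib", "##rate", "it"}

print(wordpiece_tokenize("smartwatch", toy_vocab))
print(wordpiece_tokenize("calibrate", toy_vocab))
print(wordpiece_tokenize("drone", toy_vocab))
```

The result depends entirely on which pieces are in the vocabulary, which is why two models trained on different corpora segment the same word differently.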

## **Part 2: Advanced Prompt Engineering**

In this section, we'll use different prompting techniques to analyze customer reviews using a language model.

In [18]:
from transformers import pipeline
import json

# generator = pipeline("text-generation", model="gpt2",
#                     pad_token_id=50256,
#                     return_full_text=True)

generator = pipeline("text2text-generation", model="google/flan-t5-large")
print("Language model pipeline loaded successfully!")

Device set to use cpu


Language model pipeline loaded successfully!


### **Task A: Sentiment Classification (Few-Shot Prompting)**

Using few-shot prompting to classify review sentiment with examples.
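
A few-shot prompt is the instruction, a handful of labeled examples, and the unlabeled query in the same layout. The sketch below shows one way to assemble it from a list of `(review, label)` pairs so that examples can be swapped without editing a long string. The names `FEW_SHOT_EXAMPLES` and `build_few_shot_prompt` are illustrative, and the example reviews are the same ones used in the cell that follows.

```python
# Labeled examples, one per sentiment class.
FEW_SHOT_EXAMPLES = [
    ("This product is amazing! Great quality and fast shipping.", "Positive"),
    ("Terrible product. Broke after one use and customer service was unhelpful.", "Negative"),
    ("The product works okay but the price is too high. Mixed feelings about this purchase.", "Mixed"),
]


def build_few_shot_prompt(review, examples=FEW_SHOT_EXAMPLES):
    """Return instruction + labeled examples + the query review, ending at 'Sentiment:'."""
    lines = [
        'Classify the sentiment of customer reviews as "Positive", "Negative", or "Mixed".',
        "",
        "Examples:",
    ]
    for text, label in examples:
        lines += [f'Review: "{text}"', f"Sentiment: {label}", ""]
    # The query uses the same layout as the examples, with the label left blank.
    lines += [f'Review: "{review}"', "Sentiment:"]
    return "\n".join(lines)


print(build_few_shot_prompt("The Titan smartwatch is decent."))
```

Ending the prompt on `Sentiment:` nudges the model to continue with just the label, mirroring the pattern set by the examples.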

In [19]:
# Task A: Few-Shot Sentiment Classification
def classify_sentiment(review):
    prompt = f"""Classify the sentiment of customer reviews as "Positive", "Negative", or "Mixed".

Examples:
Review: "This product is amazing! Great quality and fast shipping."
Sentiment: Positive

Review: "Terrible product. Broke after one use and customer service was unhelpful."
Sentiment: Negative

Review: "The product works okay but the price is too high. Mixed feelings about this purchase."
Sentiment: Mixed

Review: "{review}"
Sentiment:"""

    # Generate response with greedy decoding (temperature only applies when do_sample=True)
    response = generator(prompt, max_new_tokens=20, num_return_sequences=1, do_sample=False)

    # Extract just the sentiment classification
    generated_text = response[0]['generated_text']
    sentiment = generated_text.split("Sentiment:")[-1].strip().split('\n')[0].split('.')[0].strip()

    return sentiment

# Classify sentiment for all reviews
print("=== TASK A: SENTIMENT CLASSIFICATION (FEW-SHOT) ===\n")

for i, review in enumerate(reviews, 1):
    sentiment = classify_sentiment(review)
    print(f"Review {i}: {sentiment}")
    print(f"Text: \"{review}\"\n")
    print("-" * 80)

=== TASK A: SENTIMENT CLASSIFICATION (FEW-SHOT) ===

Review 1: Positive
Text: "I absolutely love the new QuantumX Pro camera! The picture quality is stellar and the battery life is amazing. Shipped super fast too. A++!"

--------------------------------------------------------------------------------
Review 2: Negative
Text: "The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the price. Very disappointed."

--------------------------------------------------------------------------------
Review 3: Mixed
Text: "The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?"

--------------------------------------------------------------------------------
Review 4: Negative
Text: "My order for the AeroDrone was a disaster. It arrived with a broken propeller and the battery was completely dead on arrival

### **Task B: Structured Data Extraction (Instruction & Format Prompting)**

Extracting structured information and formatting as JSON output.

In [20]:
# Task B: Structured Data Extraction (Pure Prompt Engineering)
def extract_structured_data(review):
    prompt = f"""
You are a strict information extraction model.

Your task: read the customer review below and extract three key fields:
1. "product_name" — the name of the product mentioned in the review.
2. "issue_summary" — a short summary of any issues, complaints, or flaws. If no issue is mentioned, use "N/A".
3. "sentiment" — classify as exactly one of: "Positive", "Negative", or "Mixed".

Output rules:
- Output must be ONLY a valid JSON object.
- It must be enclosed in braces({{}})
- Do NOT include any explanation, extra text, or formatting outside JSON.
- Field names and structure must match exactly this schema:
  {{
    "product_name": "...",
    "issue_summary": "...",
    "sentiment": "..."
  }}
- If a piece of information is missing, fill it with "N/A".

Example 1:
Review: "I love the SuperPhone! Great camera quality but battery drains fast."
Output:
{{
  "product_name": "SuperPhone",
  "issue_summary": "battery drains fast",
  "sentiment": "Positive"
}}

Example 2:
Review: "The GadgetX is terrible. Poor build quality and overpriced."
Output:
{{
  "product_name": "GadgetX",
  "issue_summary": "poor build quality, overpriced",
  "sentiment": "Negative"
}}

Now extract the information for this review:
Review: "{review}"
Output:
"""


    try:
        # Generate response from the model
        response = generator(
            prompt,
            max_new_tokens=80,
            num_return_sequences=1,
            do_sample=False  # greedy decoding; temperature only applies when sampling
        )
        print(response)  # raw pipeline output, for debugging
        # In this run the model returns the fields without the surrounding braces,
        # so add them back before parsing
        generated_text = '{' + response[0]['generated_text'].strip() + '}'
        print(f"Generated: {generated_text}")  # full generated text, for debugging

        # Find the last JSON-like pattern
        json_start = generated_text.rfind('{"product_name":')
        if json_start != -1:
            # Get everything from the JSON start
            json_part = generated_text[json_start:]

            # Find the closing brace
            json_end = json_part.find('}')
            if json_end != -1:
                json_candidate = json_part[:json_end + 1]

                # Try to parse the JSON
                parsed = json.loads(json_candidate)
                return json.dumps(parsed, indent=2)

    except Exception as e:
        print(f"Error occurred: {e}")

    # Return N/A structure if anything fails
    return json.dumps({
        "product_name": "N/A",
        "issue_summary": "N/A",
        "sentiment": "N/A"
    }, indent=2)

# Extract structured data for all reviews
print("=== TASK B: STRUCTURED DATA EXTRACTION ===\n")

for i, review in enumerate(reviews, 1):
    structured_data = extract_structured_data(review)
    print(f"Review {i}:")
    print(f"Input: \"{review}\"")
    print(f"Extracted JSON:")
    print(structured_data)
    print("\n" + "-" * 80 + "\n")

=== TASK B: STRUCTURED DATA EXTRACTION ===

[{'generated_text': ' "product_name": "QuantumX Pro", "issue_summary": "N/A", "sentiment": "Positive"'}]
Generated: {"product_name": "QuantumX Pro", "issue_summary": "N/A", "sentiment": "Positive"}
Review 1:
Input: "I absolutely love the new QuantumX Pro camera! The picture quality is stellar and the battery life is amazing. Shipped super fast too. A++!"
Extracted JSON:
{
  "product_name": "QuantumX Pro",
  "issue_summary": "N/A",
  "sentiment": "Positive"
}

--------------------------------------------------------------------------------

[{'generated_text': ' "product_name": "SonicWave earbuds", "issue_summary": "Serious design flaw", "sentiment": "Very disappointed"'}]
Generated: {"product_name": "SonicWave earbuds", "issue_summary": "Serious design flaw", "sentiment": "Very disappointed"}
Review 2:
Input: "The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the pric
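
The output for Review 2 shows that the prompt alone does not enforce the schema: the model returned `"Very disappointed"` for `sentiment`, which is not one of the three allowed labels. One option, sketched below, is to validate the parsed object after generation. This is an assumption about how you might want to handle it, not a requirement of the assignment; the names `parse_extraction`, `ALLOWED_SENTIMENTS`, and `REQUIRED_KEYS` are illustrative, and mapping an invalid label to `"N/A"` is a design choice (re-prompting the model would be another).

```python
import json

ALLOWED_SENTIMENTS = {"Positive", "Negative", "Mixed"}
REQUIRED_KEYS = ("product_name", "issue_summary", "sentiment")


def parse_extraction(raw_text):
    """Parse model output into the expected schema, falling back to "N/A" per field."""
    text = raw_text.strip()
    # Restore the surrounding braces if the model left them out.
    if not text.startswith("{"):
        text = "{" + text
    if not text.endswith("}"):
        text = text + "}"

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    result = {}
    for key in REQUIRED_KEYS:
        value = data.get(key)
        is_usable = isinstance(value, str) and value.strip()
        result[key] = value.strip() if is_usable else "N/A"

    # Reject sentiment values outside the allowed label set.
    if result["sentiment"] not in ALLOWED_SENTIMENTS:
        result["sentiment"] = "N/A"
    return result


# Raw text copied from the Review 2 output above.
raw_review_2 = (
    ' "product_name": "SonicWave earbuds", '
    '"issue_summary": "Serious design flaw", '
    '"sentiment": "Very disappointed"'
)
print(json.dumps(parse_extraction(raw_review_2), indent=2))
```

With a check like this, a schema violation surfaces as an explicit `"N/A"` instead of passing silently into downstream analysis.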

### **Task C: Root Cause Analysis (Chain-of-Thought Prompting)**

Using chain-of-thought prompting to analyze negative and mixed reviews step by step.
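
A chain-of-thought prompt is easier to check when it names the exact output layout it expects. The sketch below is one hypothetical way to do that: it builds a prompt with labeled `Problem:`, `Reasoning:`, and `Category:` lines and parses a response back into those fields, so a reply that skips the reasoning can be detected. All function names are illustrative, and `hypothetical_response` is a made-up string for demonstration, not real model output.

```python
CATEGORIES = (
    "Product Quality",
    "Shipping/Logistics",
    "Customer Service",
    "Design Flaw",
    "Other",
)
FIELDS = ("problem", "reasoning", "category")


def build_cot_prompt(review):
    """Build a chain-of-thought prompt that spells out the expected output layout."""
    return (
        "Analyze the following customer review to identify the root cause of their issue.\n"
        "Let's think step by step, then answer using exactly these three lines:\n"
        "Problem: <the main problem, in one sentence>\n"
        "Reasoning: <why you identified this problem, in one sentence>\n"
        f"Category: <one of: {' / '.join(CATEGORIES)}>\n\n"
        f'Review: "{review}"\n'
    )


def parse_analysis(text):
    """Extract the labeled lines from a response; missing fields stay None."""
    parsed = {field: None for field in FIELDS}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        key = label.strip().lower()
        if sep and key in parsed and value.strip():
            parsed[key] = value.strip()
    return parsed


def is_complete(parsed):
    """True when every field is present and the category is one of the allowed values."""
    return all(parsed[f] is not None for f in FIELDS) and parsed["category"] in CATEGORIES


# Hypothetical response, written by hand to illustrate the expected layout.
hypothetical_response = (
    "Problem: The left earbud stopped charging after one week.\n"
    "Reasoning: The customer attributes the early failure to a design flaw.\n"
    "Category: Design Flaw"
)
print(is_complete(parse_analysis(hypothetical_response)))
print(is_complete(parse_analysis("Design Flaw")))
```

A bare category with no labeled lines fails the completeness check, which flags responses where the model skipped the step-by-step part.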

In [21]:
# Task C: Root Cause Analysis using Chain-of-Thought Prompting
def analyze_root_cause(review):
    prompt = f"""
You are an expert at analyzing customer reviews. Follow these instructions exactly.

1. Read the review carefully.
2. Identify the main problem in one sentence.
3. Explain why this problem occurred in one sentence.
4. Categorize the root cause into one of:
Product Quality / Shipping/Logistics / Customer Service / Design Flaw / Other

Review: "{review}"

Now provide your analysis following the format exactly:
"""

    # Generate response with more tokens for detailed analysis
    response = generator(
        prompt,
        max_new_tokens=180,
        do_sample=False  # greedy decoding; temperature only applies when sampling
    )

    # For text2text-generation, generated_text holds only the model's answer
    analysis = response[0]['generated_text']

    return analysis

# Identify negative and mixed reviews (reviews 2, 3, 4, 5 based on assignment)
negative_mixed_indices = [1, 2, 3, 4]  # 0-indexed (reviews 2, 3, 4, 5)

print("=== TASK C: ROOT CAUSE ANALYSIS (CHAIN-OF-THOUGHT) ===\n")
print("Analyzing negative and mixed reviews only (Reviews 2, 3, 4, 5):\n")

for idx in negative_mixed_indices:
    review_num = idx + 1
    review = reviews[idx]

    print(f"Review {review_num} Analysis:")
    print(f"Input: \"{review}\"")
    print("\nChain-of-Thought Analysis:")

    analysis = analyze_root_cause(review)
    print(analysis)
    print("\n" + "=" * 80 + "\n")

=== TASK C: ROOT CAUSE ANALYSIS (CHAIN-OF-THOUGHT) ===

Analyzing negative and mixed reviews only (Reviews 2, 3, 4, 5):

Review 2 Analysis:
Input: "The SonicWave earbuds have a serious design flaw. The left earbud stopped charging after just one week. I expected better for the price. Very disappointed."

Chain-of-Thought Analysis:
Design Flaw


Review 3 Analysis:
Input: "The Titan smartwatch is decent. The screen is bright and the features are good, but the step counter seems inaccurate. It's off by at least 20%. Is there a way to calibrate it?"

Chain-of-Thought Analysis:
Product Quality


Review 4 Analysis:
Input: "My order for the AeroDrone was a disaster. It arrived with a broken propeller and the battery was completely dead on arrival. Customer service has been unresponsive for 3 days."

Chain-of-Thought Analysis:
Product Quality


Review 5 Analysis:
Input: "Overall, I'm happy with the PureGlow Air Purifier. It's quiet and effective. My only complaint is that the replacement filte