# ReviewLens AI: Model Development and Validation

**Author:** Kevin Della Piazza
**Date:** October 2025

## 1. Introduction
This notebook documents the development and validation process for the core AI models of the ReviewLens platform. The objective is to test a suite of four distinct NLP models on a sample of the dataset to validate their effectiveness and to prototype the foundational Python code for the production AWS Lambda functions. This notebook serves as a professional artifact detailing the data science workflow, from data preparation to model evaluation.

### AI Models Tested:
1.  **Sentiment Analysis:** For overall positive/negative classification.
2.  **Zero-Shot Classification:** For dynamic topic tagging.
3.  **Aspect-Based Sentiment Analysis (ABSA):** For fine-grained sentiment on specific features.
4.  **Topic Modeling:** For discovering latent themes in the text corpus.

In [11]:
# --- Core Libraries ---
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
import warnings
from IPython.display import display
import re
import stopit

# --- AI Libraries ---
from transformers import pipeline
from pyabsa import ATEPCCheckpointManager
from bertopic import BERTopic

# --- Validation ---
from sklearn.metrics import accuracy_score, classification_report

# --- Configuration ---
# Suppress ignorable warnings for a cleaner output
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
tqdm.pandas()

print("Libraries imported successfully.")

Libraries imported successfully.


## 2. Data Loading, Cleaning & Sanitization
We load the raw dataset and apply the cleaning steps that will be replicated in our AWS pipeline. A text sanitization step is included to prevent errors from special characters, making the code more robust for production.

In [3]:
# Load the raw dataset
try:
    # Use a relative path to ensure portability
    file_path = '../data/reviews.csv'
    df = pd.read_csv(file_path)
    print(f"Dataset loaded successfully with {len(df)} rows.")

    # --- Data Cleaning ---
    print("Starting data cleaning...")
    df_cleaned = df.drop('Unnamed: 0', axis=1, errors='ignore')
    df_cleaned.dropna(subset=['Review Text'], inplace=True)
    df_cleaned['Title'] = df_cleaned['Title'].fillna('')
    df_cleaned['full_review_text'] = df_cleaned['Title'] + ' ' + df_cleaned['Review Text']
    df_cleaned.dropna(subset=['Division Name', 'Department Name', 'Class Name'], inplace=True)
    
    # --- Text Sanitization Function ---
    def sanitize_text(text):
        if not isinstance(text, str): return ""
        # Replace ampersand, which was identified as a "poison pill" for ABSA
        text = text.replace('&', 'and')
        # Remove non-printable control characters
        text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]', '', text)
        return text

    # Apply sanitization to the main text column
    df_cleaned['full_review_text'] = df_cleaned['full_review_text'].apply(sanitize_text)
    print(f"Data cleaned and sanitized. {len(df_cleaned)} rows remaining.")

except FileNotFoundError:
    print("Error: reviews.csv not found. Please ensure the file is in the 'data/' directory.")

Dataset loaded successfully with 23486 rows.
Starting data cleaning...
Data cleaned and sanitized. 22628 rows remaining.


## 3. Sample Creation
To ensure rapid development and testing, we create two distinct samples:
* A **small sample (100 reviews)** for fast, iterative tests on row-level models.
* A **larger sample (2000 reviews)** for Topic Modeling, which requires a larger corpus to generate meaningful results.

In [4]:
if 'df_cleaned' in locals() and not df_cleaned.empty:
    # Small sample for fast, iterative tests
    df_sample_fast = df_cleaned.sample(100, random_state=42).copy()
    print(f"Created 'df_sample_fast' with {len(df_sample_fast)} reviews for rapid testing.")

    # Larger sample required for meaningful Topic Modeling
    df_sample_topic = df_cleaned.sample(2000, random_state=42).copy()
    print(f"Created 'df_sample_topic' with {len(df_sample_topic)} reviews for Topic Modeling.")

Created 'df_sample_fast' with 100 reviews for rapid testing.
Created 'df_sample_topic' with 2000 reviews for Topic Modeling.


## 4. AI Model Development & Validation

### A Note on Model Constraints (Token Limits)

All models based on the BERT or Transformer architecture (including DistilBERT and `pyabsa`'s underlying models) have a **maximum input size limit**, typically 512 tokens (approx. 350-400 words).

In a production environment, passing text longer than this limit will cause the model to fail. To build a robust pipeline, we must **truncate** the input text to this 512-token limit.

**Is this a problem?** No. For review analysis, this is an acceptable trade-off. The core sentiment and topics of a review are almost always contained in the first few paragraphs. By truncating the text, we gain massive performance and stability at a negligible cost to accuracy. The `bertopic` model handles this truncation automatically.



### 4.1. Sentiment Analysis
* **Business Question:** Are our customers generally happy or unhappy?
* **Model Chosen:** `distilbert-base-uncased-finetuned-sst-2-english`
* **Justification:** Chosen for its excellent balance of speed and accuracy, making it ideal for scalable, serverless environments.

In [5]:
print("Loading Sentiment Analysis model...")
sentiment_pipeline = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

def safe_get_sentiment(text):
    """Applies sentiment analysis with error handling."""
    try:
        return sentiment_pipeline(text[:512])[0]['label']
    except Exception as e:
        print(f"--> SENTIMENT FAILED on review: '{text[:50]}...' | Error: {e}")
        return "ERROR"

print(f"Applying Sentiment Analysis to the fast sample ({len(df_sample_fast)} reviews)...")
df_sample_fast['sentiment_prediction'] = df_sample_fast['full_review_text'].progress_apply(safe_get_sentiment)
print("Sentiment Analysis complete.")

# --- Quantitative Validation ---
print("\n--- Model Validation ---")
# Define "ground truth" based on user ratings (e.g., > 3 stars is positive)
df_sample_fast['true_sentiment'] = np.where(df_sample_fast['Rating'] > 3, 'POSITIVE', 'NEGATIVE')

# Calculate accuracy
accuracy = accuracy_score(df_sample_fast['true_sentiment'], df_sample_fast['sentiment_prediction'])
print(f"Sentiment Model Accuracy on Sample: {accuracy:.2%}")

# Display a detailed classification report
print("\nClassification Report:")
print(classification_report(df_sample_fast['true_sentiment'], df_sample_fast['sentiment_prediction']))

Loading Sentiment Analysis model...


Xformers is not installed correctly. If you want to use memorry_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


Applying Sentiment Analysis to the fast sample (100 reviews)...


100%|██████████| 100/100 [00:02<00:00, 45.52it/s]


Sentiment Analysis complete.

--- Model Validation ---
Sentiment Model Accuracy on Sample: 87.00%

Classification Report:
              precision    recall  f1-score   support

    NEGATIVE       0.68      0.88      0.76        24
    POSITIVE       0.96      0.87      0.91        76

    accuracy                           0.87       100
   macro avg       0.82      0.87      0.84       100
weighted avg       0.89      0.87      0.88       100



### 4.2. Zero-Shot Classification
* **Business Question:** What specific topics (e.g., 'price', 'shipping') are customers talking about?
* **Model Chosen:** `typeform/distilbert-base-uncased-mnli`
* **Justification:** This model provides a fast and highly accurate zero-shot classification capability. Its efficient "distilled" architecture is ideal for a serverless environment, ensuring high performance and scalability. This allows us to dynamically categorize reviews against key business topics (like 'price' or 'shipping') without needing to retrain the model for new categories.

In [6]:
print("Loading Zero-Shot Classification model (typeform/distilbert-base-uncased-mnli)...")
zero_shot_classifier = pipeline("zero-shot-classification", model="typeform/distilbert-base-uncased-mnli")
print("Zero-Shot model loaded successfully.")

candidate_labels = ['price', 'quality', 'shipping', 'customer service', 'fit', 'fabric']

def safe_get_top_topic(review_text):
    """Applies zero-shot classification with error handling."""
    try:
        return zero_shot_classifier(review_text[:512], candidate_labels)['labels'][0]
    except Exception as e:
        print(f"--> ZERO-SHOT FAILED on review: '{review_text[:50]}...' | Error: {e}")
        return "ERROR"

print(f"Applying Zero-Shot Classification to the fast sample ({len(df_sample_fast)} reviews)...")
df_sample_fast['zero_shot_topic'] = df_sample_fast['full_review_text'].progress_apply(safe_get_top_topic)
print("Zero-Shot Classification complete.")

# --- Qualitative Validation ---
print("\n--- Qualitative Validation Examples ---")
display(df_sample_fast[['full_review_text', 'zero_shot_topic']].head())

Loading Zero-Shot Classification model (typeform/distilbert-base-uncased-mnli)...


config.json: 100%|██████████| 776/776 [00:00<?, ?B/s] 
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
pytorch_model.bin: 100%|██████████| 268M/268M [00:04<00:00, 55.8MB/s] 
tokenizer_config.json: 100%|██████████| 258/258 [00:00<?, ?B/s] 
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
vocab.txt: 232kB [00:00, 6.53MB/s]
special_tokens_map.json: 100%|██████████| 112/112 [00:00<?, ?B/s] 
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from your `config.json` file.
The `xla_device` argument has been deprecated in v4.4.0 of Transformers. It is ignored and you can safely remove it from yo

Zero-Shot model loaded successfully.
Applying Zero-Shot Classification to the fast sample (100 reviews)...


100%|██████████| 100/100 [00:12<00:00,  7.79it/s]


Zero-Shot Classification complete.

--- Qualitative Validation Examples ---


Unnamed: 0,full_review_text,zero_shot_topic
8329,"Change armpits I love, love this dress except for the armpits. if they had just made the armpits...",quality
17943,"Awkward sweater I wanted this sweater to work but sadly it failed. first, the pink was way to sh...",quality
2157,Best. tee. ever. Oh my! i love this tee. it is super soft. i love how it doesn't look like a sac...,fit
11456,Well-made but lacks structure I love the style of this swimsuit on the model. when i purchased i...,quality
14386,"Strangely cut Was super excited to try this on, but had to go up 3 sizes from my normal 6 and th...",fit


### 4.3. Aspect-Based Sentiment Analysis (ABSA)
* **Business Question:** When customers discuss a topic, what specific *features* do they like or dislike?
* **Model Chosen:** `pyabsa` library (pre-trained 'english' checkpoint)
* **Justification:** We use this specialized library to extract fine-grained sentiment towards specific product features (aspects), offering actionable insights for product development.

In [7]:
print("Loading Aspect-Based Sentiment Analysis model...")
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(checkpoint='english')

@stopit.threading_timeoutable(default="TIMEOUT_ERROR")
def analyze_review_with_timeout(review_text):
    """Core ABSA logic with a timeout decorator."""
    result = aspect_extractor.extract_aspect(inference_source=[review_text], print_result=False)
    if not result[0]['aspect']:
        return "N/A"
    return ", ".join([f"{aspect[0]} ({aspect[1]})" for aspect in result[0]['aspect']])

def safe_get_aspects(review_text):
    """Wrapper function that calls the core logic with a timeout and general error handling."""
    try:
        # Give each review a 20-second timeout to prevent stalling the entire batch.
        return analyze_review_with_timeout(review_text, timeout=20)
    except Exception as e:
        print(f"--> ABSA FAILED on review: '{review_text[:50]}...' | Error: {e}")
        return "GENERAL_ERROR"

print(f"Applying ABSA to the fast sample ({len(df_sample_fast)} reviews)...")
df_sample_fast['aspects'] = df_sample_fast['full_review_text'].progress_apply(safe_get_aspects)
print("ABSA complete.")

# --- Qualitative Validation & Error Checking ---
print("\n--- Qualitative Validation Examples ---")
display(df_sample_fast[df_sample_fast['aspects'] != 'N/A'][['full_review_text', 'aspects']].head())

failed_reviews = df_sample_fast[df_sample_fast['aspects'].str.contains("ERROR")]
if not failed_reviews.empty:
    print(f"\nWARNING: {len(failed_reviews)} reviews failed or timed out during ABSA.")

Loading Aspect-Based Sentiment Analysis model...
[2025-10-20 10:45:24] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-10-20 10:45:24] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-10-20 10:45:24] (2.4.2) Downloading checkpoint:english 
[2025-10-20 10:45:24] (2.4.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets
[2025-10-20 10:45:24] (2.4.2) Checkpoint already downloaded, skip
[2025-10-20 10:45:25] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-10-20 10:45:25] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-10-20 10:45:25] (2.4.2) Checkpoint: is not found, you can raise an issue for requesting shares of checkpoints
[2025-10-20 10:45:25] (2.4.2) No checkpoint found in Model Hub for task: 
[2025-10-20

Some weights of the model checkpoint at microsoft/deberta-v3-base were not used when initializing DebertaV2Model: ['mask_predictions.classifier.weight', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.weight', 'mask_predictions.classifier.bias', 'lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.LayerNorm.weight', 'lm_predictions.lm_head.bias', 'mask_predictions.dense.bias', 'mask_predictions.dense.weight', 'mask_predictions.LayerNorm.bias', 'lm_predictions.lm_head.dense.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Special tokens have b

Applying ABSA to the fast sample (100 reviews)...


  0%|          | 0/100 [00:00<?, ?it/s]

[2025-10-20 10:45:36] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  2%|▏         | 2/100 [00:01<01:10,  1.39it/s]

[2025-10-20 10:45:37] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  3%|▎         | 3/100 [00:02<01:37,  1.01s/it]

[2025-10-20 10:45:38] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  4%|▍         | 4/100 [00:03<01:30,  1.06it/s]

[2025-10-20 10:45:40] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  5%|▌         | 5/100 [00:05<01:45,  1.11s/it]

[2025-10-20 10:45:41] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  6%|▌         | 6/100 [00:06<01:53,  1.21s/it]

[2025-10-20 10:45:43] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  7%|▋         | 7/100 [00:08<02:00,  1.30s/it]

[2025-10-20 10:45:44] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  8%|▊         | 8/100 [00:09<01:51,  1.21s/it]

[2025-10-20 10:45:45] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


  9%|▉         | 9/100 [00:10<01:52,  1.24s/it]

[2025-10-20 10:45:47] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 10%|█         | 10/100 [00:12<02:06,  1.41s/it]

[2025-10-20 10:45:48] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 11%|█         | 11/100 [00:13<02:08,  1.45s/it]

[2025-10-20 10:45:50] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 12%|█▏        | 12/100 [00:15<02:07,  1.45s/it]

[2025-10-20 10:45:51] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 13%|█▎        | 13/100 [00:16<01:54,  1.32s/it]

[2025-10-20 10:45:52] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 14%|█▍        | 14/100 [00:17<01:48,  1.27s/it]

[2025-10-20 10:45:53] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 15%|█▌        | 15/100 [00:18<01:51,  1.31s/it]

[2025-10-20 10:45:55] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 16%|█▌        | 16/100 [00:20<01:57,  1.40s/it]

[2025-10-20 10:45:56] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 17%|█▋        | 17/100 [00:21<01:56,  1.40s/it]

[2025-10-20 10:45:57] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 18%|█▊        | 18/100 [00:22<01:44,  1.27s/it]

[2025-10-20 10:45:58] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 19%|█▉        | 19/100 [00:23<01:40,  1.24s/it]

[2025-10-20 10:46:00] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 20%|██        | 20/100 [00:24<01:34,  1.19s/it]

[2025-10-20 10:46:01] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 21%|██        | 21/100 [00:26<01:38,  1.24s/it]

[2025-10-20 10:46:02] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 22%|██▏       | 22/100 [00:27<01:31,  1.17s/it]

[2025-10-20 10:46:03] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 23%|██▎       | 23/100 [00:28<01:36,  1.25s/it]

[2025-10-20 10:46:04] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 24%|██▍       | 24/100 [00:29<01:25,  1.12s/it]

[2025-10-20 10:46:06] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 25%|██▌       | 25/100 [00:30<01:29,  1.20s/it]

[2025-10-20 10:46:07] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 26%|██▌       | 26/100 [00:32<01:29,  1.21s/it]

[2025-10-20 10:46:08] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 27%|██▋       | 27/100 [00:33<01:38,  1.34s/it]

[2025-10-20 10:46:10] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 28%|██▊       | 28/100 [00:35<01:39,  1.38s/it]

[2025-10-20 10:46:11] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 29%|██▉       | 29/100 [00:36<01:30,  1.27s/it]

[2025-10-20 10:46:12] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 30%|███       | 30/100 [00:37<01:33,  1.34s/it]

[2025-10-20 10:46:14] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 31%|███       | 31/100 [00:39<01:30,  1.32s/it]

[2025-10-20 10:46:15] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 32%|███▏      | 32/100 [00:40<01:36,  1.42s/it]

[2025-10-20 10:46:17] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 33%|███▎      | 33/100 [00:41<01:30,  1.35s/it]

[2025-10-20 10:46:18] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 34%|███▍      | 34/100 [00:43<01:25,  1.29s/it]

[2025-10-20 10:46:19] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 35%|███▌      | 35/100 [00:44<01:21,  1.25s/it]

[2025-10-20 10:46:20] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 36%|███▌      | 36/100 [00:45<01:20,  1.26s/it]

[2025-10-20 10:46:21] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 37%|███▋      | 37/100 [00:46<01:14,  1.18s/it]

[2025-10-20 10:46:23] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 38%|███▊      | 38/100 [00:48<01:20,  1.30s/it]

[2025-10-20 10:46:24] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 39%|███▉      | 39/100 [00:49<01:24,  1.39s/it]

[2025-10-20 10:46:25] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 40%|████      | 40/100 [00:50<01:16,  1.28s/it]

[2025-10-20 10:46:26] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 41%|████      | 41/100 [00:51<01:10,  1.20s/it]

[2025-10-20 10:46:27] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 42%|████▏     | 42/100 [00:52<01:08,  1.19s/it]

[2025-10-20 10:46:29] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 43%|████▎     | 43/100 [00:54<01:08,  1.19s/it]

[2025-10-20 10:46:30] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 44%|████▍     | 44/100 [00:55<01:13,  1.32s/it]

[2025-10-20 10:46:32] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 45%|████▌     | 45/100 [00:57<01:13,  1.34s/it]

[2025-10-20 10:46:33] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 46%|████▌     | 46/100 [00:58<01:09,  1.28s/it]

[2025-10-20 10:46:34] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 47%|████▋     | 47/100 [00:59<01:00,  1.14s/it]

[2025-10-20 10:46:35] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 48%|████▊     | 48/100 [01:00<00:56,  1.10s/it]

[2025-10-20 10:46:36] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 49%|████▉     | 49/100 [01:01<00:58,  1.15s/it]

[2025-10-20 10:46:37] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 50%|█████     | 50/100 [01:02<00:54,  1.10s/it]

[2025-10-20 10:46:38] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 51%|█████     | 51/100 [01:03<01:01,  1.25s/it]

[2025-10-20 10:46:40] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 52%|█████▏    | 52/100 [01:05<00:58,  1.22s/it]

[2025-10-20 10:46:41] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 53%|█████▎    | 53/100 [01:06<00:59,  1.26s/it]

[2025-10-20 10:46:43] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 54%|█████▍    | 54/100 [01:08<01:05,  1.42s/it]

[2025-10-20 10:46:44] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 55%|█████▌    | 55/100 [01:09<00:57,  1.28s/it]

[2025-10-20 10:46:45] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 56%|█████▌    | 56/100 [01:10<01:02,  1.42s/it]

[2025-10-20 10:46:46] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 57%|█████▋    | 57/100 [01:11<00:53,  1.25s/it]

[2025-10-20 10:46:48] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 58%|█████▊    | 58/100 [01:13<00:53,  1.28s/it]

[2025-10-20 10:46:50] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 59%|█████▉    | 59/100 [01:14<00:59,  1.45s/it]

[2025-10-20 10:46:51] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 60%|██████    | 60/100 [01:15<00:52,  1.32s/it]

[2025-10-20 10:46:52] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 61%|██████    | 61/100 [01:16<00:48,  1.23s/it]

[2025-10-20 10:46:53] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 62%|██████▏   | 62/100 [01:18<00:47,  1.24s/it]

[2025-10-20 10:46:54] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 63%|██████▎   | 63/100 [01:19<00:47,  1.28s/it]

[2025-10-20 10:46:56] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 64%|██████▍   | 64/100 [01:20<00:47,  1.31s/it]

[2025-10-20 10:46:57] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 65%|██████▌   | 65/100 [01:21<00:42,  1.22s/it]

[2025-10-20 10:46:58] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 66%|██████▌   | 66/100 [01:22<00:39,  1.15s/it]

[2025-10-20 10:46:59] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 67%|██████▋   | 67/100 [01:23<00:36,  1.11s/it]

[2025-10-20 10:47:00] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 68%|██████▊   | 68/100 [01:25<00:36,  1.14s/it]

[2025-10-20 10:47:01] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 69%|██████▉   | 69/100 [01:26<00:35,  1.15s/it]

[2025-10-20 10:47:02] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 70%|███████   | 70/100 [01:27<00:32,  1.10s/it]

[2025-10-20 10:47:04] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 71%|███████   | 71/100 [01:28<00:35,  1.23s/it]

[2025-10-20 10:47:05] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 72%|███████▏  | 72/100 [01:30<00:34,  1.22s/it]

[2025-10-20 10:47:06] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 73%|███████▎  | 73/100 [01:31<00:34,  1.27s/it]

[2025-10-20 10:47:07] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 74%|███████▍  | 74/100 [01:32<00:32,  1.23s/it]

[2025-10-20 10:47:09] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 75%|███████▌  | 75/100 [01:33<00:32,  1.28s/it]

[2025-10-20 10:47:10] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 76%|███████▌  | 76/100 [01:35<00:29,  1.23s/it]

[2025-10-20 10:47:11] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 77%|███████▋  | 77/100 [01:36<00:27,  1.20s/it]

[2025-10-20 10:47:12] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 78%|███████▊  | 78/100 [01:37<00:24,  1.12s/it]

[2025-10-20 10:47:13] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 79%|███████▉  | 79/100 [01:38<00:23,  1.11s/it]

[2025-10-20 10:47:14] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 80%|████████  | 80/100 [01:39<00:25,  1.25s/it]

[2025-10-20 10:47:16] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 81%|████████  | 81/100 [01:41<00:24,  1.27s/it]

[2025-10-20 10:47:17] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 82%|████████▏ | 82/100 [01:41<00:19,  1.11s/it]

[2025-10-20 10:47:18] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 83%|████████▎ | 83/100 [01:43<00:19,  1.15s/it]

[2025-10-20 10:47:19] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 84%|████████▍ | 84/100 [01:44<00:20,  1.26s/it]

[2025-10-20 10:47:21] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 85%|████████▌ | 85/100 [01:45<00:19,  1.27s/it]

[2025-10-20 10:47:22] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 86%|████████▌ | 86/100 [01:47<00:17,  1.28s/it]

[2025-10-20 10:47:23] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 87%|████████▋ | 87/100 [01:48<00:16,  1.29s/it]

[2025-10-20 10:47:24] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 88%|████████▊ | 88/100 [01:49<00:14,  1.19s/it]

[2025-10-20 10:47:25] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 89%|████████▉ | 89/100 [01:50<00:12,  1.17s/it]

[2025-10-20 10:47:26] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 90%|█████████ | 90/100 [01:51<00:11,  1.13s/it]

[2025-10-20 10:47:28] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 91%|█████████ | 91/100 [01:52<00:10,  1.17s/it]

[2025-10-20 10:47:28] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 92%|█████████▏| 92/100 [01:53<00:08,  1.06s/it]

[2025-10-20 10:47:29] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 93%|█████████▎| 93/100 [01:54<00:07,  1.07s/it]

[2025-10-20 10:47:31] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 94%|█████████▍| 94/100 [01:56<00:06,  1.16s/it]

[2025-10-20 10:47:32] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 95%|█████████▌| 95/100 [01:57<00:06,  1.21s/it]

[2025-10-20 10:47:33] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 96%|█████████▌| 96/100 [01:58<00:04,  1.20s/it]

[2025-10-20 10:47:34] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 97%|█████████▋| 97/100 [01:59<00:03,  1.18s/it]

[2025-10-20 10:47:36] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 98%|█████████▊| 98/100 [02:00<00:02,  1.18s/it]

[2025-10-20 10:47:37] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


 99%|█████████▉| 99/100 [02:02<00:01,  1.13s/it]

[2025-10-20 10:47:38] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


100%|██████████| 100/100 [02:03<00:00,  1.11s/it]

[2025-10-20 10:47:39] (2.4.2) The results of aspect term extraction have been saved in d:\PROJECTS\reviewlens-ai\notebooks\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


100%|██████████| 100/100 [02:04<00:00,  1.24s/it]


ABSA complete.

--- Qualitative Validation Examples ---


Unnamed: 0,full_review_text,aspects
8329,"Change armpits I love, love this dress except for the armpits. if they had just made the armpits...",h (e)
17943,"Awkward sweater I wanted this sweater to work but sadly it failed. first, the pink was way to sh...","p (i), b (a), k (n)"
11456,Well-made but lacks structure I love the style of this swimsuit on the model. when i purchased i...,"s (t), w (i), l (i)"
14386,"Strangely cut Was super excited to try this on, but had to go up 3 sizes from my normal 6 and th...","c (u), w (a), c (u)"
18681,"Feminine and clean Size 8 always, 36c and i have broad shoulders...i found this true to size. th...","S (i), s (i), c (o)"


### 4.4. Topic Modeling
* **Business Question:** What are the hidden, high-level themes of conversation across all reviews?
* **Model Chosen:** `bertopic`
* **Justification:** BERTopic leverages transformer embeddings to find semantically coherent topics, which are more interpretable than traditional methods.

In [19]:
print(f"Starting Topic Modeling on the larger sample ({len(df_sample_topic)} reviews)...")
docs = df_sample_topic['full_review_text'].tolist()

from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# 1. Create the list of stop words to ignore
stop_words = list(ENGLISH_STOP_WORDS)

# 2. Create the "employee" (CountVectorizer) and give it the stop word list
vectorizer = CountVectorizer(stop_words=stop_words)

# 3. Initialize BERTopic and tell it to use our specific "employee"
topic_model = BERTopic(language="english", vectorizer_model=vectorizer, verbose=False)

topics, probs = topic_model.fit_transform(docs)

print("Topic Modeling complete!")

# --- Qualitative Validation ---
print("\n--- Discovered Topics Summary ---")

# Display a summary of the most prominent topics. Topic -1 contains outliers and can be ignored.
# We also show only the top 10 most frequent topics (plus Topic -1)
display(topic_model.get_topic_info().head(5))

Starting Topic Modeling on the larger sample (2000 reviews)...
Topic Modeling complete!

--- Discovered Topics Summary ---


Unnamed: 0,Topic,Count,Name,Representation,Representative_Docs
0,-1,543,-1_dress_love_size_like,"[dress, love, size, like, fabric, wear, just, great, im, fit]","[Amazing! This dress is much cuter in person than it looks on the model online. it's so flattering, it fits like a dream, and i love the fabric. i love that the skirt is a true a-line, not poofy, so it's sleek, slimming and flattering. the neckline is feminine and gorgeous and is a bit sexy, while still being classy. oh, and the length, perfection! not too long, not too short. i never ever pay full price and for this dress i had to. i am a curvy size 4 and it fits perfectly true to size. i couldn't be hap, Love it but runs small I love the print and design, the photo does not do it justice. i fell in love with it in the store but could not find my size. i ordered my size online but it was still small and short. it came up to my waist, and was tight in the shoulders and waist. i really wish it wasn't so small. i love love this top, the colors are vibrant and shiny, totally my style. i highly recommend it if it fits., Great dress! I'm 6 ft tall and usually wear a size 10/12, but had to size up to a 14 because this dress runs a bit small around the waist. it's a beautiful dress. it hit me right below the knee, so might be long on shorter gals. i received many compliments on this dress. the colors are beautiful. the main color of the dress is a dark navy. it looked black online. definitely buy this dress! it's a great work dress and you won't be sorry that you did.]"
1,0,256,0_cute_love_flattering_like,"[cute, love, flattering, like, looks, just, im, pretty, great, small]","[Love this top This top is so much prettier on. you can dress it up or down. it does run a tad large. i'm 5'7"" and 145 lbs. i am usually in between a small and medium but usually a medium. i went for the small on this and i loved the fit of it. still pretty flowy and the medium would have been just too big. it is a tunic so it runs kind of long. a tiny bit too long for me but i will still wear it a lot., Really thick, quality top. very swingy bottom I bought the red and white. both tank tops were a nice, thick material- definitely high quality. i was hoping there would just be a little bit of swing at the bottom, just enough to be loose and flattering. but there is a ton of extra fabric at the end, which with a larger bust wound up edging into looking a little maternity. but i think it'd be great on you if you tend to do well with very drapey or swingy tops. it runs a touch big- im normally between a small and an xs, and the xs was not even, Adorable top I loved this top from the first time i saw it online. i ordered a size 10 ~i'm slim with broad shoulder and 36c~ the top fit perfect. when i opened the pkg i thought it would be too big since it looked wide but that was not the case at all. its a little longer in the back and a little shorter in the front. on me~it looks exactly like it does in the picture.its really a cute top and it looks lovely over shorts. makes a cute, casual summer outfit. i cant wait to wear it.]"
2,1,219,1_pants_jeans_fit_size,"[pants, jeans, fit, size, pair, great, love, legs, stretch, like]","[Amazing comfortable pants! I really didn't expect to like these pants as much as i do. i ordered a pair of the pilcro linen wide legs and really loved the cute added lace details on them. i thought i would exchange these for another pair of those, but after trying both pairs on, i realized how much softer and heavier these are. i will keep the one pair of the pilcro, but they did feel a little scratchy. these have a very nice drape to them and they do make your bum look nice. i ordered a size 8, which is what i seem to we, Great pants! These pants are beautiful. great fit and great fabric. really comfortable and elegant. i am tall 5'8"" and have a long torso so it's nice to find pants that are high waisted., I love these jeans! These jeans are so comfortable. they are well-made. they are the perfect pair of jeans. the only thing i do not love is they stretch out a little bit after wearing. i am going to order these again in the next smaller size.]"
3,2,137,2_sweater_soft_love_cozy,"[sweater, soft, love, cozy, great, sleeves, warm, color, wear, like]","[Another fabulous cashmere sweater! The sweater is very soft. it has an interesting pattern. i purchased the neutral color in size xs which runs tts. it's definitely a cozy sweater to wear this winter., Comfortable sweater I love this sweater. it's warm, comfortable, and soft. nice quality too., Love it! This is a great sweater. soft and beautiful. it's my new go to sweater??]"
4,3,122,3_shirt_tee_white_tshirt,"[shirt, tee, white, tshirt, cute, great, really, bought, love, wear]","[Love it!! Love the shirt! fits great and is so cute! a must have for fall!, Cute variation on a t shirt I just got this shirt the other day, and already love it. it fits like a basic tee, but the details on the sleeves and back are really cute. the shape is loose at the bottom, which makes this really comfortable.\r\nmy only complaint is it's a little tight across the back, but this does not hurt the overall fit and comfort of the top. this is a little thicker than a tee shirt too, so the quality feels nice., Love This shirt is so comfortable and looks very nice with jeans or work pants. it's a great shirt!]"


#### **How to Read This Table:**

* **`Topic`**: The ID number for the theme.
    * **Topic `-1` (The Outliers):** This is the **outlier group**. It contains all unique, one-off reviews that don't fit into a larger theme. **You should ignore Topic -1 when analyzing trends.**
* **`Count`**: The number of reviews in that theme. This shows you how popular a topic is.
* **`Representation`**: The **new, clean keywords** that best describe the theme. After remove the stop-words, we can see the true essence of the conversation (e.g., `[jeans, fit, pants, love]`).
* **`Representative_Docs`**: A full, real review from that group, which provides the ultimate context. **Read this column to give the topic a human-readable name.**

**Example Interpretation:**
* **Topic 0:** Keywords might be `[dress, love, fit, size]`. This is the **"Dress Reviews"** cluster.
* **Topic 1:** Keywords might be `[jeans, fit, pants, comfortable]`. This is the **"Pants & Jeans"** cluster.

## 5. Final Result & Conclusion
The sample DataFrame is now fully enriched with insights from all AI models. This validated logic is ready to be integrated into the production Lambda functions. The final data structure provides a multi-dimensional view of each customer review, enabling a rich, interactive analysis on the final dashboard.

In [9]:
# Select and reorder columns for a clean final view
# Note: 'bertopic_id' is not in this fast sample; it's a corpus-level insight.
final_columns = [
    'full_review_text', 
    'sentiment_prediction', 
    'zero_shot_topic',
    'aspects',
    'Rating',
    'true_sentiment'
]

print("--- Final Enriched Sample DataFrame (from fast sample) ---")
display(df_sample_fast[final_columns].head(10))

--- Final Enriched Sample DataFrame (from fast sample) ---


Unnamed: 0,full_review_text,sentiment_prediction,zero_shot_topic,aspects,Rating,true_sentiment
8329,"Change armpits I love, love this dress except for the armpits. if they had just made the armpits...",NEGATIVE,quality,h (e),4,POSITIVE
17943,"Awkward sweater I wanted this sweater to work but sadly it failed. first, the pink was way to sh...",NEGATIVE,quality,"p (i), b (a), k (n)",2,NEGATIVE
2157,Best. tee. ever. Oh my! i love this tee. it is super soft. i love how it doesn't look like a sac...,POSITIVE,fit,,5,POSITIVE
11456,Well-made but lacks structure I love the style of this swimsuit on the model. when i purchased i...,NEGATIVE,quality,"s (t), w (i), l (i)",3,NEGATIVE
14386,"Strangely cut Was super excited to try this on, but had to go up 3 sizes from my normal 6 and th...",NEGATIVE,fit,"c (u), w (a), c (u)",2,NEGATIVE
18681,"Feminine and clean Size 8 always, 36c and i have broad shoulders...i found this true to size. th...",POSITIVE,fit,"S (i), s (i), c (o)",5,POSITIVE
4124,"So comfortable I love the style of this dress, comfortable but chic, and it feels great!",POSITIVE,quality,s (t),5,POSITIVE
7991,Perfect lwd! This is perfect! it fits tts. i am usually an 8 and this was spot on. it isn't too ...,POSITIVE,quality,"l (w), f (i), h (u)",5,POSITIVE
8409,Yessssss!!!!! A culotte and basketball short hybrid. these culottes are heaven....very comfortab...,POSITIVE,fit,"c (u), w (a), b (a)",5,POSITIVE
21423,Love the color Love the color and the design. it is slightly see-through which i don't like but ...,POSITIVE,quality,"c (o), c (o), d (e)",5,POSITIVE


The DataFrame above represents the **final enriched product** for a *single sample* of reviews. 
Each new column provides a different layer of AI-driven insight:

* **`full_review_text` (Input):** The raw, sanitized text (Title + Review Text) that was fed into the pipeline.
* **`sentiment_prediction` (AI Layer 1):** The **overall sentiment** of the entire review (e.g., `POSITIVE`/`NEGATIVE`), as determined by our *Sentiment Lambda*.
* **`zero_shot_topic` (AI Layer 2):** The **primary topic** of the review, dynamically classified into one of our predefined business categories (e.g., `price`, `quality`), as determined by our *Zero-Shot Lambda*.
* **`aspects` (AI Layer 3):** The **most granular and actionable analysis**. This extracts the specific *features* (aspects) mentioned and the sentiment attached *to each one* (e.g., `fabric (NEGATIVE)`), as determined by our *ABSA Lambda*.
* **`Rating` (Input):** The original 1-5 star rating provided by the user.
* **`true_sentiment` (Validation Only):** A "ground truth" column created *only* in this notebook to validate our model. This column is not part of the final production pipeline.

## 6. Conclusive Analysis: How to Interpret and Use These Results

The final, enriched table is a powerful business tool. Its value is unlocked by asking specific questions and combining the AI-generated columns to get answers.

Here is a practical guide on how a business stakeholder would use this data.

### Business Question 1: "How are we doing?"

**Analysis:** Use the `sentiment_prediction` column.

By aggregating this column, a manager can get an instant, high-level KPI of overall customer happiness. This dashboard metric can be tracked weekly to spot immediate changes in brand perception.

**Example Insight:**
* "This week, our Positive Sentiment Score dropped by 8%."

---
### Business Question 2: "Our sentiment dropped... *Why*?"

**Analysis:** Combine `sentiment_prediction` and `zero_shot_topic`.

This is the first level of diagnosis. A manager can filter for all reviews where `sentiment_prediction == 'NEGATIVE'` and then create a bar chart of the `zero_shot_topic` column.

**Example Insight:**
* "Our sentiment dropped because of a 60% spike in complaints. By filtering for those complaints, we see that **72% of them are about 'shipping'**."
* **Action:** The business now knows exactly where the problem is. They don't need to waste time investigating 'price' or 'quality'; they have a clear priority.

---
### Business Question 3: "Okay, shipping is the problem. But *what about* shipping?"

**Analysis:** Combine `zero_shot_topic` and `aspects`.

This is the most powerful, granular insight. The manager can now filter for all reviews where `zero_shot_topic == 'shipping'` and analyze the `aspects` column.

**Example Insight:**
* By creating a word cloud from the `aspects` in this segment, they see that the most common phrases are **`delivery time (NEGATIVE)`** and **`box (NEGATIVE)`**.
* **Action:** The business has its final answer. The problem isn't the cost of shipping; it's that the courier is slow and the packaging is getting damaged. They can now take surgical action, like renegotiating with their courier or improving their packaging materials.

---
### Business Question 4: "What problems are we not even aware of?"

**Analysis:** Use the `bertopic_id` from the full, corpus-level analysis (as seen in section 4.4).

This analysis is performed by the `stitcher-lambda` on the entire dataset. It automatically clusters reviews by "hidden themes" that we didn't define in advance.

**Example Insight:**
* By examining the `topic_model.get_topic_info()` output, the manager spots a new cluster (e.g., `[jeans, fit, pants, comfortable]`).
* **Action:** The manager realizes that a huge number of customers (Count: 219) are discussing jeans not just in terms of price or quality (our predefined topics), but specifically in terms of fit and comfort. This hidden theme was previously invisible.

---
### Final Conclusion

This notebook has validated the logic for each of these four analytical layers. The deployed AWS pipeline is built to perform this exact multi-layer analysis at scale, transforming raw text feedback from a "cost center" (something to be stored) into a **strategic asset** (something to be queried).