Final project for 596D Deep Learning: Amazon Sentiment Analysis with LSTM and Transformer


Amazon Product Review Sentiment Analysis

Project Overview

This project implements a deep learning-based sentiment classification system for Amazon product reviews. The goal is to classify reviews as either POSITIVE or NEGATIVE.

To analyze performance and interpretability, this project implements and compares two distinct model architectures:

  1. LSTM (Long Short-Term Memory): An LSTM network trained from scratch on word embeddings to capture sequential dependencies in the review text.
  2. Transformer (DistilBERT): A pre-trained DistilBERT model fine-tuned on the dataset, using self-attention to capture richer context.
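As a rough sketch, the LSTM side of the comparison could look like the following. The layer sizes and bidirectionality here are illustrative assumptions, not the project's actual hyperparameters:

```python
import torch
import torch.nn as nn

class LSTMSentimentClassifier(nn.Module):
    # Illustrative architecture: embedding -> bidirectional LSTM -> linear head.
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)              # (batch, seq, embed_dim)
        _, (hidden, _) = self.lstm(embedded)              # hidden: (2, batch, hidden_dim)
        final = torch.cat([hidden[0], hidden[1]], dim=1)  # concat both directions
        return self.fc(final)                             # (batch, num_classes)
```

The final hidden states of the forward and backward passes are concatenated before the classification layer, a common choice for sequence-level classification.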

Key features include:

  • Data Preprocessing: Cleaning, tokenization, and handling class imbalance using weighted loss functions.
  • Model Comparison: Side-by-side evaluation of LSTM and Transformer performance.
  • Interpretability: Visualization of Attention Heatmaps to understand which words influence the Transformer's predictions.
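The weighted-loss approach to class imbalance mentioned above can be sketched as follows. Inverse-frequency weighting is one common scheme; the project's exact weighting may differ:

```python
import torch
import torch.nn as nn
from collections import Counter

def make_weighted_loss(labels):
    """Build a CrossEntropyLoss weighted by inverse class frequency.

    `labels` is a list of integer class ids (e.g. 0 = NEGATIVE, 1 = POSITIVE).
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    # Inverse-frequency weighting: rarer classes receive larger weights,
    # so errors on the minority class are penalized more heavily.
    weights = torch.tensor(
        [total / (num_classes * counts[c]) for c in range(num_classes)],
        dtype=torch.float,
    )
    return nn.CrossEntropyLoss(weight=weights)
```

With a review corpus that skews heavily positive, this keeps the model from simply predicting POSITIVE for everything.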

Setup Instructions

1. Prerequisites

  • Python 3.8 or higher
  • Git

2. Installation

Clone the repository and install the required dependencies:

# Clone the repository
git clone <your-repo-url>
cd amazon-sentiment-analysis

# Create a virtual environment (Recommended)
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

3. Data Preparation

The datasets used in this project are publicly available on Kaggle. Please download them from the link below:

Dataset Link: https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products

The dataset includes the following CSV files:

  • 1429_1.csv
  • Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv
  • Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv

Create a folder named data/ in the project root, then extract and place the three CSV files listed above into it.
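Assuming the Datafiniti schema (`reviews.text` / `reviews.rating` columns) and a 4-stars-and-up threshold for POSITIVE — both assumptions, not confirmed by this repository — loading and labeling the data might look like:

```python
import pandas as pd
from pathlib import Path

DATA_DIR = Path("data")
CSV_FILES = [
    "1429_1.csv",
    "Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv",
    "Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv",
]

def label_reviews(df, rating_col="reviews.rating"):
    """Map star ratings to binary labels (the 4-star threshold is an assumption)."""
    df = df.dropna(subset=[rating_col]).copy()
    df["label"] = (df[rating_col] >= 4).map({True: "POSITIVE", False: "NEGATIVE"})
    return df

def load_reviews():
    # Read and concatenate the three Kaggle CSVs from data/.
    frames = [pd.read_csv(DATA_DIR / name, low_memory=False) for name in CSV_FILES]
    return label_reviews(pd.concat(frames, ignore_index=True))
```

`low_memory=False` avoids mixed-dtype warnings that these CSVs tend to trigger.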

How to Run

Option 1: Run the Demo (Inference & Visualization)

To quickly test the models and see attention maps without training, run the Jupyter Notebook demo:

  1. Navigate to the demo folder:
cd demo
  2. Start Jupyter Notebook:
jupyter notebook demo.ipynb
  3. Run all cells. The notebook will:
  • Load the pre-trained models.
  • Predict sentiment for sample reviews.
  • Generate and save an attention heatmap.
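The attention-heatmap step can be sketched as follows. `get_attention_matrix` is a hypothetical helper, not the demo's actual API; it assumes a Hugging Face encoder (such as DistilBertForSequenceClassification) that supports `output_attentions`:

```python
import torch

def get_attention_matrix(model, tokenizer, text, layer=-1):
    """Return (tokens, attention) for one review, averaged over heads.

    `model`/`tokenizer` are a Hugging Face model and its tokenizer;
    `layer=-1` selects the last encoder layer.
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    # outputs.attentions is a tuple of (batch, heads, seq, seq), one per layer.
    attn = outputs.attentions[layer][0].mean(dim=0)  # average over heads
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, attn
```

The returned matrix can be rendered with matplotlib's `imshow`, with `tokens` as both axis labels, to produce a heatmap like the one saved by the demo.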

Option 2: Train the Models

To retrain both models from scratch:

python src/main.py

This script will:

  • Load and preprocess the data.
  • Train the LSTM model and save the best version.
  • Fine-tune the DistilBERT Transformer model.
  • Evaluate both models on the test set.
  • Save the trained models to models/ and evaluation metrics to results/.
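The two reported metrics, accuracy and macro-averaged F1, can be computed with scikit-learn; this is a generic sketch, not the project's evaluation code:

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """Compute the two metrics reported in results/model_comparison.csv."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro-F1 averages per-class F1 equally, so minority-class
        # performance is not hidden by the dominant class.
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }
```

The gap between accuracy and macro-F1 in the results table below reflects exactly this: with an imbalanced test set, accuracy alone overstates performance on the minority class.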

Expected Output

Demo Output

After running demo.ipynb, you will see:

  1. Prediction Table: A comparison of predictions from both models on sample text.

Review | LSTM | Transformer
This product is absolutely amazing! I use it... | POSITIVE | POSITIVE
Terrible quality. It broke after one use... | NEGATIVE | NEGATIVE

  2. Attention Heatmap: A visualization (results/attention_heatmap.png) showing how the Transformer focuses on key sentiment words (e.g., "amazing", "terrible").
  3. Metrics: A summary of the models' performance on the validation set.

Training Output

Upon completion of src/main.py, a comparison table will be generated in results/model_comparison.csv:

Model | Accuracy | F1-Macro
LSTM | 0.9388 | 0.7851
Transformer | 0.9816 | 0.9129

Pre-trained Model Link

To run the demo without retraining the models, please download the pre-trained model artifacts (both LSTM and Transformer) from the link below:

https://drive.google.com/drive/folders/1aq08vMzVbhGcxJe1DG1ndvmdZvXLknqc?usp=drive_link

Setup Instructions:

  1. Download the ZIP file from the link above.
  2. Extract the contents into the models/ directory in the root of this repository.
  3. Ensure your directory structure looks like this:
    models/
    ├── final_lstm_model/
    │   ├── model.pt
    │   ├── config.json
    │   └── vocab.json
    └── final_transformer_model/
        ├── config.json
        ├── pytorch_model.bin (or model.safetensors)
        ├── vocab.txt
        └── ...
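A quick sanity check that the extracted artifacts are in place before launching the demo. File names are taken from the tree above; the Transformer weight file is deliberately skipped, since it may be either pytorch_model.bin or model.safetensors:

```python
from pathlib import Path

# Expected artifact files per model directory (from the tree above).
REQUIRED = {
    "final_lstm_model": ["model.pt", "config.json", "vocab.json"],
    "final_transformer_model": ["config.json", "vocab.txt"],
}

def check_models(root="models"):
    """Return a list of missing artifact paths (empty list means all present)."""
    missing = []
    for subdir, files in REQUIRED.items():
        for name in files:
            path = Path(root) / subdir / name
            if not path.exists():
                missing.append(str(path))
    return missing
```

Running `check_models()` after extraction and confirming it returns an empty list catches the common mistake of unzipping into a nested subfolder.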
    

Acknowledgments

This project references the following resources:
