saiteki-kai/amazon-reviews

Amazon Reviews

Aspect-based sentiment analysis of Amazon reviews using the ASUM and JST models.

Dataset

The analysis focuses on a subset of Amazon product reviews, specifically in the "Computer Internal Components" subcategory. You can access the data at https://nijianmo.github.io/amazon/index.html.

Jianmo Ni, Jiacheng Li, Julian McAuley, Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects, Empirical Methods in Natural Language Processing (EMNLP), 2019

Models

The model executables are generated from the following projects:

Yohan Jo and Alice Oh, Aspect and Sentiment Unification Model for Online Review Analysis, In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (WSDM), 2011

Lin, C., He, Y., Everson, R. and Rüger, S. Weakly-supervised Joint Sentiment-Topic Detection from Text, IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011

The processing only includes English reviews, which are identified using the fastText model.

A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification, 2016
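
As an illustration of the language filter described above, here is a minimal sketch, not the repository's actual code. It assumes fastText's pretrained language identification model has been downloaded as `lid.176.bin`; the helper names (`load_language_model`, `is_english`, `keep_english`) and the `min_confidence` threshold are hypothetical.

```python
from typing import Iterable, Iterator

# Assumption: the pretrained fastText language ID model, downloaded separately.
MODEL_PATH = "lid.176.bin"


def load_language_model(path: str = MODEL_PATH):
    """Load a fastText model (imported lazily so the helpers below work without it)."""
    import fasttext  # pip install fasttext

    return fasttext.load_model(path)


def is_english(text: str, model, min_confidence: float = 0.5) -> bool:
    """Return True if the top predicted language is English with enough confidence."""
    # fastText predicts one line at a time, so newlines are replaced first.
    line = text.replace("\n", " ").strip()
    if not line:
        return False
    labels, probabilities = model.predict(line, k=1)
    return labels[0] == "__label__en" and float(probabilities[0]) >= min_confidence


def keep_english(reviews: Iterable[str], model, min_confidence: float = 0.5) -> Iterator[str]:
    """Yield only the reviews identified as English."""
    for review in reviews:
        if is_english(review, model, min_confidence):
            yield review
```

Typical usage would be `model = load_language_model()` followed by `english = list(keep_english(texts, model))`. The threshold is a judgment call: a higher value drops more short or mixed-language reviews.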

Setup

Create a virtual environment

pip install virtualenv
virtualenv venv
source venv/bin/activate

Install the dependencies

pip install -r requirements.txt

Install the "reviews" package

pip install -e .

Download the reviews and metadata for the Electronics category into the ./data/raw/ folder.

Run the following scripts to filter the product metadata by the category "Computer Internal Components" and then extract the corresponding subset of reviews

python scripts/filter-metadata.py
python scripts/filter-reviews.py 
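
The two-step filtering can be sketched as follows. This is an illustration under assumptions, not the content of the scripts: it assumes each metadata record has an `asin` product ID and a `category` list, and each review has an `asin` field; the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set

TARGET_CATEGORY = "Computer Internal Components"


def select_product_ids(metadata: Iterable[dict], category: str = TARGET_CATEGORY) -> Set[str]:
    """Collect the IDs of products whose category path contains `category`."""
    return {
        product["asin"]
        for product in metadata
        # Assumed layout: "category" is a list of category names.
        if category in (product.get("category") or [])
    }


def select_reviews(reviews: Iterable[dict], product_ids: Set[str]) -> Iterator[dict]:
    """Yield only the reviews that refer to one of the selected products."""
    for review in reviews:
        if review.get("asin") in product_ids:
            yield review
```

Collecting the product IDs into a set first keeps the second pass over the much larger review file to a single membership test per review.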

Download the JST project into the root folder and build its executable

mkdir "bin"
cd JST/Debug
make
mv jst ../../bin/

Download the ASUM project into the root folder and build its executable JAR

cd ASUM/ASUM/bin
echo -en "Main-Class: sto2.STO2Core\n" > manifest.mf
jar -cvfm ASUM.jar manifest.mf **/*.class
mv ASUM.jar ../../../bin/

Execute the notebooks to perform the processing and the analysis.

./notebooks
    01_clean.ipynb       # Data cleaning
    02_analysis.ipynb    # Exploratory data analysis
    03_processing.ipynb  # Text processing
    04_jst.ipynb         # JST training and performance
    05_asum.ipynb        # ASUM training and performance
    06_results.ipynb     # Results and comparison of the models
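
To run the notebooks non-interactively in the order listed above, something like the following sketch could be used. It assumes `jupyter` and `nbconvert` are available in the virtual environment, which the README does not state; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def ordered_notebooks(folder: str = "notebooks") -> List[Path]:
    """Return the notebooks sorted by file name, i.e. by their numeric prefix."""
    return sorted(Path(folder).glob("*.ipynb"))


def build_command(notebook: Path) -> List[str]:
    """Build the nbconvert command that executes a notebook and saves it in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(notebook)]


def run_all(folder: str = "notebooks") -> None:
    """Execute every notebook in order, stopping at the first failure."""
    for notebook in ordered_notebooks(folder):
        subprocess.run(build_command(notebook), check=True)
```

Calling `run_all()` from the repository root executes 01 through 06 in sequence. The JST and ASUM executables need to be in ./bin before notebooks 04 and 05 run.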

Launch the dashboard

python dashboard/run.py

# or 

gunicorn dashboard.run:server