Fake Amazon Product Review Detection

Overview

This project tackles the challenge of detecting fake reviews on Amazon, distinguishing genuine reviews written by humans from those generated by a GPT-2 model. The task addresses a significant issue for consumer trust and business ethics on online platforms, and it also probes the capabilities of generative models in both crafting and identifying synthetic text. Inspired by the GROVER model's self-detection premise, this work builds on prior research by Salminen et al., 2021, who used GPT-2 to generate synthetic reviews and RoBERTa to detect them.

Dataset

The dataset comprises 40,000 reviews, balanced between original human-written and computer-generated entries, and comes from the work of Salminen et al., 2021. The reviews span a variety of product categories, providing a rich basis for training and evaluating the model. Each record includes the following fields (a loading sketch follows the list):

  • Text: The review text.
  • Label: Binary labels indicating whether a review is original ('OR') or computer-generated ('CG').
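
A minimal loading sketch in Python, assuming the dataset ships as a single CSV; the filename, column names, and label mapping below are illustrative placeholders chosen to match the field descriptions above, not the repository's actual file layout:

```python
import pandas as pd

# Hypothetical filename and column names -- adjust to match the actual
# release from Salminen et al., 2021.
df = pd.read_csv("fake_reviews_dataset.csv")

# Map the string labels described above to integers for classification:
# 'OR' (original, human-written) -> 0, 'CG' (computer-generated) -> 1.
df["target"] = df["label"].map({"OR": 0, "CG": 1})

# With a balanced 40,000-review dataset, expect roughly 20,000 per class.
print(df["target"].value_counts())
```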

Model

The model architecture is a fine-tuned GPT-2, originally used to generate the fake reviews, repurposed here for classification. The project specifically investigates whether a generative model can serve as its own discriminator, hypothesizing that it can recognize subtle regularities in the kind of text it was trained to generate.
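
As a sketch of this repurposing, Hugging Face's GPT2ForSequenceClassification attaches a linear classification head on top of GPT-2; the "gpt2" checkpoint name here is a stand-in, since the project starts from its own review-generation fine-tune:

```python
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

# Placeholder checkpoint: the project would load its generation-fine-tuned
# GPT-2 here. The classification head is newly initialized and learned
# during classifier fine-tuning.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# GPT-2 defines no pad token; reuse EOS so batches can be padded.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id
```

GPT2ForSequenceClassification classifies from the hidden state of the last non-padding token, which is why the pad token id must be set explicitly.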

Methodology

  1. Data Preprocessing: Standard text cleaning, plus GPT-2's byte-pair tokenization.
  2. Model Adaptation: The GPT-2 model, fine-tuned to generate reviews for a particular product category (e.g., Books), is adapted to perform binary classification.
  3. Training and Evaluation: The model is trained on the labeled dataset and evaluated with accuracy, precision, recall, and F1-score on a held-out test set and on unseen product categories to assess generalizability (see the evaluation sketch after this list).
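
The following is an illustrative evaluation loop under the assumptions above; `model` and `tokenizer` come from the earlier sketch, `texts` and `labels` stand for a held-out split with integer labels (0 = 'OR', 1 = 'CG'), and the batch size and sequence length are arbitrary choices:

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

model.eval()
preds = []
with torch.no_grad():
    for i in range(0, len(texts), 32):  # batch size 32 is illustrative
        batch = tokenizer(texts[i:i + 32], padding=True, truncation=True,
                          max_length=512, return_tensors="pt")
        logits = model(**batch).logits
        preds.extend(logits.argmax(dim=-1).tolist())

# The four metrics named in step 3 above.
acc = accuracy_score(labels, preds)
prec, rec, f1, _ = precision_recall_fscore_support(labels, preds,
                                                   average="binary")
print(f"accuracy={acc:.4f} precision={prec:.4f} "
      f"recall={rec:.4f} f1={f1:.4f}")
```

Running the same loop over reviews from a product category excluded from training gives the generalizability check described in step 3.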

Results

The model proved highly effective at distinguishing genuine from generated reviews, achieving 97.26% accuracy on the test set. It also showed promising generalization to product categories not included in the training data.

Usage

The repository includes Jupyter notebooks that walk through the entire process, from data loading through model training and evaluation. Users can replicate the study or adopt the methodology as a framework for related fake-text-detection tasks.

Credits

  • Original Paper on Creating and Detecting Fake Reviews: Joni Salminen et al., 2021
  • Original Datasets and Pre-trained Models: Joni Salminen et al., 2021
  • Inspiration and Conceptual Framework: Zellers et al., 2020, on defending against neural fake news using the generator model itself as a discriminator.
  • Development: This project was developed by Noah Meurer, building on the foundational models and datasets provided by prior researchers.