
Using Amazon Product Reviews to determine bias of favorable vine reviews by performing ETL and data Analysis


moesteelo/Amazon_Vine_Analysis


Amazon_Vine_Analysis

Performing ETL and data analysis on Amazon sports product reviews to determine whether Vine reviews are biased toward favorable ratings

Overview

The purpose of this project is to determine whether reviews from the Amazon Vine program are biased toward favorable ratings by analyzing Amazon sports product reviews. Uncovering any bias requires processing a large volume of review data with PySpark, AWS, and Postgres, and then analyzing the data with Pandas.

Database Schema Postgres

To process and analyze the review data, I created a Postgres database schema within AWS, into which the data is loaded once it has been transformed.

Screen Shot 2022-02-17 at 8 11 26 PM

ETL PySpark Process

Step 1 process:

PySpark is first used to extract and read in the review data from an Amazon S3 data storage server.

Screen Shot 2022-02-17 at 8 35 31 PM
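As a rough illustration of the extract step, here is a minimal sketch. It assumes the review file is tab-separated with a header row, and that a SparkSession has already been created elsewhere. The helper name `read_reviews` and the path in the usage comment are hypothetical and are not taken from the project's notebook.

```python
def read_reviews(spark, path):
    """Read a tab-separated review file into a Spark DataFrame.

    `spark` is an existing pyspark.sql.SparkSession.
    `path` is a placeholder for the location of the review file.
    Assumption: the file is tab-separated and has a header row.
    """
    return spark.read.csv(path, sep="\t", header=True, inferSchema=True)


# Usage (placeholder path, not the project's actual bucket):
# reviews_df = read_reviews(spark, "<path-to-sports-reviews-file>")
# reviews_df.show(5)
```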

Step 2 process:

PySpark is then used to transform the data by creating DataFrames from subsets of the columns, which match the tables and schema constraints of the Postgres database. Once the DataFrames are created, they are loaded from the AWS storage server into the Postgres database.

Screen Shot 2022-02-17 at 8 31 48 PM
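The transform and load steps could look roughly like the sketch below. The table names, the column lists (apart from `helpful_votes`), and the helper names are assumptions made for illustration; the real schema is the one shown in the screenshot above. The functions take Spark DataFrames that already exist, so the sketch itself needs no Spark import.

```python
# Hypothetical layout: table name -> columns selected from the raw reviews.
TABLE_COLUMNS = {
    "review_ids": ["review_id", "customer_id", "product_id"],
    "vine_reviews": ["review_id", "star_rating", "helpful_votes", "total_votes", "vine"],
}


def split_into_tables(reviews_df, table_columns=TABLE_COLUMNS):
    """Return {table_name: DataFrame}, one column subset per target table."""
    return {name: reviews_df.select(*cols) for name, cols in table_columns.items()}


def load_tables(tables, jdbc_url, user, password, mode="append"):
    """Write each DataFrame to the Postgres table of the same name over JDBC."""
    props = {"user": user, "password": password, "driver": "org.postgresql.Driver"}
    for name, df in tables.items():
        df.write.jdbc(url=jdbc_url, table=name, mode=mode, properties=props)


# Usage (placeholder connection details):
# tables = split_into_tables(reviews_df)
# load_tables(tables, "jdbc:postgresql://<host>:5432/<database>", "<user>", "<password>")
```

Keeping the table-to-columns mapping in one dictionary makes it easier to check the selected columns against the schema constraints before anything is written.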

Analyzing with Pandas

Once the data is loaded into Pandas from a CSV file, additional subsets and DataFrames are created to filter the results needed for the analysis.

(1)
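One way to create the Vine and non-Vine subsets is sketched below. It assumes the exported CSV has a `vine` column holding "Y" or "N"; that encoding, the helper name `split_by_vine`, and the file name in the usage comment are assumptions rather than details confirmed by this README.

```python
import pandas as pd


def split_by_vine(reviews: pd.DataFrame):
    """Return (vine_reviews, non_vine_reviews) based on the 'vine' flag column.

    Assumption: the flag is stored as the string "Y" for Vine reviews.
    """
    is_vine = reviews["vine"] == "Y"
    return reviews[is_vine], reviews[~is_vine]


# Usage (hypothetical file name):
# reviews = pd.read_csv("vine_table.csv")
# vine_df, non_vine_df = split_by_vine(reviews)
```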

Results

The summary below compares Vine and non-Vine reviews of Amazon sports equipment.

Screen Shot 2022-02-18 at 4 32 04 PM

  • In total there were 61,948 reviews of sports equipment, of which approximately 0.5% were Vine reviews and 99.5% were non-Vine reviews (the shares implied by the counts and percentages below).
  • 139 Vine reviews are 5-star, while 32,665 non-Vine reviews are 5-star.
  • 42% of Vine reviews are 5-star, compared with 53% of non-Vine reviews.
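A summary of this shape (total reviews, 5-star reviews, and 5-star percentage per group) can be computed with a Pandas group-by. The sketch below is one possible way to do it, not necessarily how the figures above were produced; the column names `vine` and `star_rating` and the helper name `five_star_summary` are assumptions.

```python
import pandas as pd


def five_star_summary(reviews: pd.DataFrame) -> pd.DataFrame:
    """Per-group totals, 5-star counts, and 5-star percentage."""
    summary = (
        reviews.assign(five_star=reviews["star_rating"] == 5)
        .groupby("vine")
        .agg(
            total_reviews=("star_rating", "size"),
            five_star_reviews=("five_star", "sum"),
        )
    )
    summary["five_star_pct"] = (
        100 * summary["five_star_reviews"] / summary["total_reviews"]
    )
    return summary


# Usage, with `reviews` holding one row per review:
# print(five_star_summary(reviews))
```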

Summary

Based on the data summary, there is not sufficient evidence of positivity bias for reviews in the Vine program. The share of 5-star ratings is about 11 percentage points higher for non-Vine reviews (53%) than for Vine reviews (42%).
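The gap between the two 5-star shares can be stated either in percentage points or as a relative difference, and the two figures are easy to confuse. The short sketch below works through both using the rounded percentages from the Results section; `rate_gap` is a hypothetical helper added only for illustration.

```python
def rate_gap(vine_pct: float, non_vine_pct: float):
    """Return (percentage-point gap, relative gap in percent), non-Vine over Vine."""
    point_gap = non_vine_pct - vine_pct
    relative_gap = 100 * point_gap / vine_pct
    return point_gap, relative_gap


points, relative = rate_gap(42, 53)
print(f"{points:.0f} percentage points, {relative:.0f}% relative")
# prints "11 percentage points, 26% relative"
```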

Further Analysis

A more in-depth analysis could take customer votes into account. Including "helpful_votes" as a factor could help determine whether 5-star ratings influence purchases of sports equipment.

Resources

- Amazon Review Datasets

- Amazon Sports Dataset
