# AMAZON CUSTOMER REVIEW

## 1.0 BUSINESS UNDERSTANDING

In today's online marketplace, customer reviews play an essential part in purchasing decisions. Amazon, one of the largest online retailers, collects millions of product reviews that reflect customer satisfaction, product quality, and overall user experience. Processing this volume of data manually is slow and does not scale.

Sentiment analysis enables companies to analyze customer feedback automatically, extract meaningful information, and make informed decisions to improve products, enhance customer experience, and refine marketing strategies.

## 1.1 PROBLEM STATEMENT

Amazon receives millions of reviews, far too many to read and analyze manually. An automated sentiment analysis system is needed to categorize reviews as positive, negative, or neutral and to extract actionable insights from them.

## 1.2 OBJECTIVES

### 1.2.1 Main Objective

To accurately determine the overall emotional tone (positive, negative, or neutral) of customer reviews by leveraging Natural Language Processing (NLP) and Machine Learning techniques.
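
As a rough illustration of how NLP and machine learning can be combined for this objective, the sketch below trains a TF-IDF plus logistic regression classifier on a handful of made-up reviews. The example texts and labels are invented for illustration and are not taken from the dataset. The choice of `TfidfVectorizer` and `LogisticRegression` mirrors the libraries imported later in the notebook, and is one plausible setup rather than a definitive model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews (not from the dataset), labeled by hand
toy_texts = [
    "I love this tablet, the screen is great",
    "Excellent speaker, works perfectly",
    "Terrible remote, the battery drains in days",
    "Waste of money, it stopped working",
    "It is okay, does the job",
    "Average product, nothing special",
]
toy_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# TF-IDF turns each review into a weighted bag-of-words vector;
# logistic regression then learns one set of weights per sentiment class
sentiment_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
sentiment_model.fit(toy_texts, toy_labels)

new_reviews = ["great screen, love it", "battery stopped working"]
predictions = sentiment_model.predict(new_reviews)
print(list(zip(new_reviews, predictions)))
```

With only six training examples the predictions carry no real weight; the point is the shape of the pipeline, which stays the same when the toy lists are swapped for the cleaned review text and its labels.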

### 1.2.2 Specific Objectives

* Identify trends in customer satisfaction.

* Improve customer experience by addressing negative feedback.

* Help businesses optimize their product offerings based on user sentiment.


## 1.3 Business Questions

* What percentage of customer reviews are positive, negative, or neutral?
* Are there specific features or keywords associated with positive or negative reviews?
* Can sentiment analysis help predict potential customer dissatisfaction?
* Can the business use sentiment insights to improve product quality and customer support?
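
The first question can be approached with a simple rule that converts star ratings into sentiment labels and then computes each label's share. The thresholds below (4 to 5 stars positive, 3 neutral, 1 to 2 negative), the helper names, and the sample ratings are all assumptions made for illustration, not the notebook's confirmed labeling scheme.

```python
from collections import Counter
import math

def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a label; the thresholds are an assumption."""
    if rating is None or (isinstance(rating, float) and math.isnan(rating)):
        return None  # missing rating: no label can be assigned
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

def sentiment_shares(ratings):
    """Return the percentage of each label among the rated reviews."""
    labels = [rating_to_sentiment(r) for r in ratings]
    labels = [label for label in labels if label is not None]
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Made-up ratings, including one missing value
sample_ratings = [5.0, 5.0, 4.0, 3.0, 1.0, float("nan"), 2.0, 5.0]
shares = sentiment_shares(sample_ratings)
print(shares)  # {'positive': 57.1, 'neutral': 14.3, 'negative': 28.6}
```

Unrated reviews are excluded from the denominator here; whether to drop them or label them from the review text instead is a design decision for the modeling stage.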


## 1.4 Metric of Success

## 2.0 DATA UNDERSTANDING

The dataset used for this sentiment analysis project consists of Amazon product reviews, which provide insights into customer opinions about various products. It contains 1,597 records with 27 columns, capturing details about the product, review content and user feedback.


The dataset includes the following columns:

id → Unique identifier for each review.

asins → Amazon Standard Identification Number (ASIN) of the product.

brand → Brand of the product.

categories → Product categories (e.g., "Amazon Devices").

colors → Available colors of the product (often missing).

dateAdded → Date the review was added to the dataset.

dateUpdated → Date the review was last updated.

dimension → Physical dimensions of the product.

manufacturer → Manufacturer of the product.

manufacturerNumber → Manufacturer’s product number.

name → Product name.

prices → Pricing details of the product.

reviews.date → Date when the review was posted.

reviews.doRecommend → Whether the reviewer recommends the product (Yes/No).

reviews.numHelpful → Number of users who found the review helpful.

reviews.rating → Star rating given by the reviewer (1 to 5).

reviews.sourceURLs → URL of the original review page.

reviews.text → Full text of the review (Main feature for sentiment analysis).

reviews.title → Title of the review (Summary of the review).

reviews.username → Username of the reviewer.

reviews.userCity → City of the reviewer (Mostly missing).

reviews.userProvince → Province of the reviewer (Mostly missing).

sizes → Available sizes of the product (Mostly empty).

upc → Universal Product Code (UPC).
                                        
weight → Weight of the product.

### 2.1 Exploring The Dataset
Here we will explore the dataset by:

- Checking the shape of the dataset
- Checking the statistical distribution for numeric columns
- Exploring the data type for each column


In [58]:
# Import the relevant libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import re

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve, auc


import warnings
warnings.filterwarnings("ignore")


In [59]:
#Loading the dataset
df = pd.read_csv('Amazon Reviews.csv')

In [60]:
df.head()

Unnamed: 0,id,asins,brand,categories,colors,dateAdded,dateUpdated,dimension,ean,keys,...,reviews.rating,reviews.sourceURLs,reviews.text,reviews.title,reviews.userCity,reviews.userProvince,reviews.username,sizes,upc,weight
0,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-...,I initially had trouble deciding between the p...,"Paperwhite voyage, no regrets!",,,Cristina M,,,205 grams
1,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-...,Allow me to preface this with a little history...,One Simply Could Not Ask For More,,,Ricky,,,205 grams
2,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,4.0,https://www.amazon.com/Kindle-Paperwhite-High-...,I am enjoying it so far. Great for reading. Ha...,Great for those that just want an e-reader,,,Tedd Gardiner,,,205 grams
3,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-...,I bought one of the first Paperwhites and have...,Love / Hate relationship,,,Dougal,,,205 grams
4,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-...,I have to say upfront - I don't like coroporat...,I LOVE IT,,,Miljan David Tanic,,,205 grams


In [61]:
df.tail()

Unnamed: 0,id,asins,brand,categories,colors,dateAdded,dateUpdated,dimension,ean,keys,...,reviews.rating,reviews.sourceURLs,reviews.text,reviews.title,reviews.userCity,reviews.userProvince,reviews.username,sizes,upc,weight
1592,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Acc...",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,alexavoiceremoteforamazonfiretvfiretvstick/b00...,...,3.0,https://www.amazon.com/Alexa-Voice-Remote-Amaz...,This is not the same remote that I got for my ...,I would be disappointed with myself if i produ...,,,GregAmandawith4,,,4 ounces
1593,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Acc...",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,alexavoiceremoteforamazonfiretvfiretvstick/b00...,...,1.0,https://www.amazon.com/Alexa-Voice-Remote-Amaz...,I have had to change the batteries in this rem...,Battery draining remote!!!!,,,Amazon Customer,,,4 ounces
1594,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Acc...",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,alexavoiceremoteforamazonfiretvfiretvstick/b00...,...,1.0,https://www.amazon.com/Alexa-Voice-Remote-Amaz...,"Remote did not activate, nor did it connect to...",replacing an even worse remote. Waste of time,,,Amazon Customer,,,4 ounces
1595,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Acc...",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,alexavoiceremoteforamazonfiretvfiretvstick/b00...,...,3.0,https://www.amazon.com/Alexa-Voice-Remote-Amaz...,It does the job but is super over priced. I fe...,Overpriced,,,Meg Ashley,,,4 ounces
1596,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Acc...",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,alexavoiceremoteforamazonfiretvfiretvstick/b00...,...,1.0,https://www.amazon.com/Alexa-Voice-Remote-Amaz...,I ordered this item to replace the one that no...,I am sending all of this crap back to amazon a...,,,DIANE K,,,4 ounces


In [62]:
df.shape

(1597, 27)

In [63]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1597 entries, 0 to 1596
Data columns (total 27 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    1597 non-null   object 
 1   asins                 1597 non-null   object 
 2   brand                 1597 non-null   object 
 3   categories            1597 non-null   object 
 4   colors                774 non-null    object 
 5   dateAdded             1597 non-null   object 
 6   dateUpdated           1597 non-null   object 
 7   dimension             565 non-null    object 
 8   ean                   898 non-null    float64
 9   keys                  1597 non-null   object 
 10  manufacturer          965 non-null    object 
 11  manufacturerNumber    902 non-null    object 
 12  name                  1597 non-null   object 
 13  prices                1597 non-null   object 
 14  reviews.date          1217 non-null   object 
 15  reviews.doRecommend  

In [64]:
df.describe()

Unnamed: 0,ean,reviews.numHelpful,reviews.rating,reviews.userCity,reviews.userProvince,sizes,upc
count,898.0,900.0,1177.0,0.0,0.0,0.0,898.0
mean,844313500000.0,83.584444,4.359388,,,,844313500000.0
std,3416444000.0,197.150238,1.021445,,,,3416444000.0
min,841667000000.0,0.0,1.0,,,,841667000000.0
25%,841667000000.0,0.0,4.0,,,,841667000000.0
50%,841667000000.0,0.0,5.0,,,,841667000000.0
75%,848719000000.0,34.0,5.0,,,,848719000000.0
max,848719000000.0,997.0,5.0,,,,848719000000.0
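
The `count` row above already hints that several columns are sparsely populated. A short sketch for quantifying this as a percentage of missing values per column follows. The small frame `demo` is made up so the snippet runs on its own; on the real data the same expression would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df, with missing values in two columns
demo = pd.DataFrame({
    "reviews.rating": [5.0, 4.0, np.nan, 1.0],
    "reviews.userCity": [np.nan, np.nan, np.nan, np.nan],
    "name": ["Kindle", "Echo", "Fire TV", "Tap"],
})

# isna() marks missing cells; the column mean of those booleans
# is the fraction of missing values in each column
missing_pct = demo.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_pct)
```

Sorting in descending order puts entirely empty columns at the top, which makes candidates for dropping easy to spot.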


## Data Cleaning

In [65]:
# Saving a copy
df1 = df.copy(deep = True)


In [66]:
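# Standardize the column names
# Replace dots with underscores; regex=False treats '.' as a literal
# character instead of the regex wildcard that matches any character
df1.columns = df1.columns.str.replace('.', '_', regex=False)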

# Function to add underscores in compound words
def add_underscores(col_name):
    return re.sub(r'(?<!^)(?=[A-Z])', '_', col_name)

# Apply the function to all column names
df1.columns = [add_underscores(col) for col in df1.columns]


#Confirm changes
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1597 entries, 0 to 1596
Data columns (total 27 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     1597 non-null   object 
 1   asins                  1597 non-null   object 
 2   brand                  1597 non-null   object 
 3   categories             1597 non-null   object 
 4   colors                 774 non-null    object 
 5   date_Added             1597 non-null   object 
 6   date_Updated           1597 non-null   object 
 7   dimension              565 non-null    object 
 8   ean                    898 non-null    float64
 9   keys                   1597 non-null   object 
 10  manufacturer           965 non-null    object 
 11  manufacturer_Number    902 non-null    object 
 12  name                   1597 non-null   object 
 13  prices                 1597 non-null   object 
 14  reviews_date           1217 non-null   object 
 15  revi
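
Because the pattern above inserts an underscore before every capital letter, names such as `dateAdded` become `date_Added`, and the acronym in `reviews.sourceURLs` is split letter by letter. One possible alternative, shown here as a sketch rather than the notebook's method, only breaks where a lowercase letter or digit is followed by a capital and then lowercases the result. The helper name `to_snake_case` is hypothetical.

```python
import re

def to_snake_case(col_name):
    """Convert a dotted camelCase column name to lowercase snake_case."""
    name = col_name.replace('.', '_')
    # Insert '_' only at a lowercase/digit -> uppercase boundary,
    # so runs of capitals such as 'URL' stay together
    name = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name)
    return name.lower()

for original in ['dateAdded', 'manufacturerNumber', 'reviews.sourceURLs', 'id']:
    print(original, '->', to_snake_case(original))
```

Adopting this would change the column names used in later cells (for example `date_added` instead of `date_Added`), so those references would need to be updated together.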

In [67]:
df1['reviews_date'].value_counts()

reviews_date
2014-07-28T00:00:00Z        42
2014-07-24T00:00:00Z        31
2014-05-02T05:00:00Z        24
2014-04-07T00:00:00Z        23
2014-04-03T05:00:00Z        22
                            ..
2017-05-26T00:00:00.000Z     1
2017-05-22T00:00:00.000Z     1
2017-05-20T00:00:00.000Z     1
2017-05-19T00:00:00.000Z     1
2016-07-31T00:00:00Z         1
Name: count, Length: 382, dtype: int64
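
The counts above show two timestamp layouts in `reviews_date`: one with fractional seconds (`.000Z`) and one without. A minimal sketch for parsing both into timezone-aware datetimes with the standard library is given below. The function name `parse_review_date` is hypothetical, and missing or unparseable values are returned as `None`.

```python
from datetime import datetime, timezone

# The two layouts visible in the value counts
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ")

def parse_review_date(value):
    """Parse a review timestamp string into a UTC datetime, or return None."""
    if not isinstance(value, str):
        return None  # NaN and other non-string values
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next layout
    return None

print(parse_review_date("2014-07-28T00:00:00Z"))
print(parse_review_date("2017-05-26T00:00:00.000Z"))
print(parse_review_date(float("nan")))
```

Applied to the dataframe, this could be used as `df1['reviews_date'].apply(parse_review_date)`; a vectorized `pd.to_datetime` call is another option, though its handling of mixed formats depends on the pandas version.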

In [68]:
df1['categories'].value_counts()

categories
Amazon Devices,Home,Smart Home & Connected Living,Smart Hubs & Wireless Routers,Smart Hubs,Home Improvement,Home Safety & Security,Alarms & Sensors,Home Security,Amazon Echo,Home, Garage & Office,Smart Home,Voice Assistants,Amazon Tap,Electronics Features,TVs & Electronics,Portable Audio & Electronics,MP3 Player Accessories,Home Theater & Audio,Speakers,Featured Brands,Electronics,Kindle Store,Frys,Electronic Components,Home Automation,Electronics, Tech Toys, Movies, Music,Audio,Bluetooth Speakers    542
Amazon Devices                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 

In [69]:
df1["manufacturer"].value_counts()

manufacturer
Amazon    832
AMDSI     133
Name: count, dtype: int64

In [70]:
df1['keys'].value_counts()

keys
amazontapalexaenabledportablebluetoothspeaker/b01bh83oom,amazonamazontapportablebluetoothwifispeakerblack/5097300,841667107929,amazonecho/52353110,0841667107929,tapalexaenabledportablebluetoothspeaker/05743627000p,amazon/53004496,amazon/b01bh83oom,amazontapportablebluetoothwifispeakerblack/1001803403    542
848719022827,0848719022827,amazonfiretv/b00cx5p8fc,amazon/848719022827,2015amazonfiretv4kultrahddigitalmediastreamersealedbox/151842588028,brandnewamazonfiretvboxsealedinretailbox2015/272013292018                                                                                                             166
amazonpremiumheadphones/b00hx0srxw,0848719039504,848719039504,amazon/ka416y,amazon/55000239z                                                                                                                                                                                                                     133
firehd6tablet/b00lwhu9d8                                            

In [73]:
df1['reviews_rating'].unique()

array([ 5.,  4., nan,  3.,  1.,  2.])

In [71]:
# Define category keywords
category_mapping = {
    'Electronics': ['Electronics', 'Tech', 'TVs', 'MP3', 'Portable Audio'],
    'Home & Living': ['Home', 'Garage', 'Office', 'Home Improvement'],
    'Audio': ['Audio', 'Speakers', 'Headphones', 'Bluetooth Speakers'],
    'Smart Home': ['Smart Home', 'Home Automation', 'Amazon Echo', 'Voice Assistants'],
    'Accessories': ['Accessories', 'Controllers', 'Cables', 'Adapters']
}

# Function to assign a category
def assign_category(category_text):
    for category, keywords in category_mapping.items():
        if any(keyword in category_text for keyword in keywords):
            return category
    return 'Other'  # Temporarily assign 'Other'

# Apply function to categorize products
df1['general_category'] = df1['categories'].apply(assign_category)

# Replace 'Other' with the most frequent category (mode)
most_frequent_category = df1['general_category'].mode()[0]
df1['general_category'] = df1['general_category'].replace('Other', most_frequent_category)

# Display value counts for new categories
print(df1['general_category'].value_counts())


general_category
Electronics    1498
Accessories      92
Smart Home        7
Name: count, dtype: int64
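
Two properties of `assign_category` help explain the skew toward `Electronics`: keywords are matched as substrings, and the first group in `category_mapping` that matches wins. On top of that, every `Other` row is folded into the most frequent category. The snippet below reproduces the function on three short strings to make the matching behavior visible; apart from `Amazon Devices`, the strings are invented for illustration.

```python
category_mapping = {
    'Electronics': ['Electronics', 'Tech', 'TVs', 'MP3', 'Portable Audio'],
    'Home & Living': ['Home', 'Garage', 'Office', 'Home Improvement'],
    'Audio': ['Audio', 'Speakers', 'Headphones', 'Bluetooth Speakers'],
    'Smart Home': ['Smart Home', 'Home Automation', 'Amazon Echo', 'Voice Assistants'],
    'Accessories': ['Accessories', 'Controllers', 'Cables', 'Adapters']
}

def assign_category(category_text):
    # Groups are checked in insertion order; the first substring hit wins
    for category, keywords in category_mapping.items():
        if any(keyword in category_text for keyword in keywords):
            return category
    return 'Other'

# 'Electronics' is checked before 'Smart Home', so it takes priority
print(assign_category('Amazon Echo,Electronics'))      # Electronics
# 'Home' is a substring of 'Smart Home', so 'Home & Living' matches first
print(assign_category('Smart Home,Voice Assistants'))  # Home & Living
# No keyword matches, so the fallback label is returned
print(assign_category('Amazon Devices'))               # Other
```

If finer-grained groups are wanted, reordering the mapping so that more specific groups come first, or matching whole comma-separated entries rather than substrings, are two options worth testing.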


In [72]:
df1.head()

Unnamed: 0,id,asins,brand,categories,colors,date_Added,date_Updated,dimension,ean,keys,...,reviews_source_U_R_Ls,reviews_text,reviews_title,reviews_user_City,reviews_user_Province,reviews_username,sizes,upc,weight,general_category
0,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,https://www.amazon.com/Kindle-Paperwhite-High-...,I initially had trouble deciding between the p...,"Paperwhite voyage, no regrets!",,,Cristina M,,,205 grams,Electronics
1,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,https://www.amazon.com/Kindle-Paperwhite-High-...,Allow me to preface this with a little history...,One Simply Could Not Ask For More,,,Ricky,,,205 grams,Electronics
2,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,https://www.amazon.com/Kindle-Paperwhite-High-...,I am enjoying it so far. Great for reading. Ha...,Great for those that just want an e-reader,,,Tedd Gardiner,,,205 grams,Electronics
3,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,https://www.amazon.com/Kindle-Paperwhite-High-...,I bought one of the first Paperwhites and have...,Love / Hate relationship,,,Dougal,,,205 grams,Electronics
4,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,https://www.amazon.com/Kindle-Paperwhite-High-...,I have to say upfront - I don't like coroporat...,I LOVE IT,,,Miljan David Tanic,,,205 grams,Electronics
