# AMAZON CUSTOMER REVIEW

## 1.0 BUSINESS UNDERSTANDING

In today's online marketplace, customer reviews are an essential part of purchasing decisions. Amazon, one of the largest online retailers, collects millions of product reviews that reflect customer satisfaction, product quality, and overall user experience. Processing this volume of text manually is too slow to be practical.

Sentiment analysis enables companies to analyze customer feedback automatically, extract meaningful information, and make informed decisions to improve products, enhance customer experience, and refine marketing strategies.


## 1.1 PROBLEM STATEMENT

Amazon receives millions of reviews, far too many to read and analyze manually. An automated sentiment analysis system is needed to categorize reviews as positive, negative, or neutral and to extract actionable insights from them.

## 1.2 OBJECTIVES

### 1.2.1 Main Objective

To accurately determine the overall emotional tone (positive, negative, or neutral) of customer reviews by leveraging Natural Language Processing (NLP) and Machine Learning techniques.

### 1.2.2 Specific Objectives

* Identify trends in customer satisfaction.

* Improve customer experience by addressing negative feedback.

* Help businesses optimize their product offerings based on user sentiment.


## 1.3 Business Questions

* What percentage of customer reviews are positive, negative, or neutral?
* Are there specific features or keywords associated with positive or negative reviews?
* Can sentiment analysis help predict potential customer dissatisfaction?
* Can the business use sentiment insights to improve product quality and customer support?


## 1.4 Metric of Success

## 2.0 DATA UNDERSTANDING

The dataset used for this sentiment analysis project consists of Amazon product reviews, which provide insights into customer opinions about various products. It contains 1,597 records with 27 columns, capturing details about the product, review content and user feedback.


The dataset includes the following columns:

id → Unique identifier for each review.

asins → Amazon Standard Identification Number (ASIN) of the product.

brand → Brand of the product.

categories → Product categories (e.g., "Amazon Devices").

colors → Available colors of the product (often missing).

dateAdded → Date the review was added to the dataset.

dateUpdated → Date the review was last updated.

dimension → Physical dimensions of the product.

manufacturer → Manufacturer of the product.

manufacturerNumber → Manufacturer’s product number.

name → Product name.

prices → Pricing details of the product.

reviews.date → Date when the review was posted.

reviews.doRecommend → Whether the reviewer recommends the product (Yes/No).

reviews.numHelpful → Number of users who found the review helpful.

reviews.rating → Star rating given by the reviewer (1 to 5).

reviews.sourceURLs → URL of the original review page.

reviews.text → Full text of the review (Main feature for sentiment analysis).

reviews.title → Title of the review (Summary of the review).

reviews.username → Username of the reviewer.

reviews.userCity → City of the reviewer (Mostly missing).

reviews.userProvince → Province of the reviewer (Mostly missing).

sizes → Available sizes of the product (Mostly empty).

upc → Universal Product Code (UPC).
                                        
weight → Weight of the product.

### 2.1 Exploring The Dataset
Here we will explore the dataset by:

- Checking the shape of the dataset (number of rows and columns)
- Checking the statistical distribution for numeric columns
- Exploring the data type for each column


In [68]:
# Import the relevant libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import re

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve, auc
pd.set_option('display.max_colwidth', None)

import warnings
warnings.filterwarnings("ignore")
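
The imports above include `TfidfVectorizer` and `LogisticRegression`, which suggests a TF-IDF plus logistic regression classifier. The following is only a minimal sketch of how those two pieces fit together, not necessarily the pipeline this notebook builds. The toy reviews, the labels, and the use of `make_pipeline` are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews and labels, invented for illustration only
texts = [
    "love this kindle, great screen",
    "great battery and easy to use",
    "works great, love it",
    "remote stopped working, waste of money",
    "poor quality, batteries drain fast",
    "terrible remote, does not work",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

# TF-IDF turns each review into a weighted bag-of-words vector;
# logistic regression then learns a linear decision rule on those vectors
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Predict the sentiment of two unseen (also hypothetical) reviews
preds = model.predict(["great screen, love it", "waste of money, poor remote"])
print(preds)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary fitted on training text only, which avoids leaking test data into the features once a train/test split is introduced.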


In [69]:
#Loading the dataset
df = pd.read_csv('Amazon Reviews.csv')

In [70]:
df.head()

Unnamed: 0,id,asins,brand,categories,colors,dateAdded,dateUpdated,dimension,ean,keys,...,reviews.rating,reviews.sourceURLs,reviews.text,reviews.title,reviews.userCity,reviews.userProvince,reviews.username,sizes,upc,weight
0,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-Resolution-Display-Built/dp/B00QJDU3KY/ref=lp_6669702011_1_7/132-1677641-8459202?s=amazon-devices&ie=UTF8&qid=1498832761&sr=1-7,"I initially had trouble deciding between the paperwhite and the voyage because reviews more or less said the same thing: the paperwhite is great, but if you have spending money, go for the voyage.Fortunately, I had friends who owned each, so I ended up buying the paperwhite on this basis: both models now have 300 ppi, so the 80 dollar jump turns out pricey the voyage's page press isn't always sensitive, and if you are fine with a specific setting, you don't need auto light adjustment).It's been a week and I am loving my paperwhite, no regrets! The touch screen is receptive and easy to use, and I keep the light at a specific setting regardless of the time of day. (In any case, it's not hard to change the setting either, as you'll only be changing the light level at a certain time of day, not every now and then while reading).Also glad that I went for the international shipping option with Amazon. Extra expense, but delivery was on time, with tracking, and I didnt need to worry about customs, which I may have if I used a third party shipping service.","Paperwhite voyage, no regrets!",,,Cristina M,,,205 grams
1,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-Resolution-Display-Built/dp/B00QJDU3KY/ref=lp_6669702011_1_7/132-1677641-8459202?s=amazon-devices&ie=UTF8&qid=1498832761&sr=1-7,"Allow me to preface this with a little history. I am (was) a casual reader who owned a Nook Simple Touch from 2011. I've read the Harry Potter series, Girl with the Dragon Tattoo series, 1984, Brave New World, and a few other key titles. Fair to say my Nook did not get as much use as many others may have gotten from theirs.Fast forward to today. I have had a full week with my new Kindle Paperwhite and I have to admit, I'm in love. Not just with the Kindle, but with reading all over again! Now let me relate this review, love, and reading all back to the Kindle. The investment of 139.00 is in the experience you will receive when you buy a Kindle. You are not simply paying for a screen there is an entire experience included in buying from Amazon.I have been reading The Hunger Games trilogy and shall be moving onto the Divergent series soon after. Here is the thing with the Nook that hindered me for the past 4 years: I was never inspired to pick it up, get it into my hands, and just dive in. There was never that feeling of oh man, reading on this thing is so awesome. However, with my Paperwhite, I now have that feeling! That desire is back and I simply adore my Kindle. If you are considering purchasing one, stop thinking about it simply go for it. After a full week, 3 downloaded books, and a ton of reading, I still have half of my battery left as well.Make yourself happy. Inspire the reader inside of you.",One Simply Could Not Ask For More,,,Ricky,,,205 grams
2,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,4.0,https://www.amazon.com/Kindle-Paperwhite-High-Resolution-Display-Built/dp/B00QJDU3KY/ref=lp_6669702011_1_7/132-1677641-8459202?s=amazon-devices&ie=UTF8&qid=1498832761&sr=1-7,I am enjoying it so far. Great for reading. Had the original Fire since 2012. The Fire used to make my eyes hurt if I read too long. Haven't experienced that with the Paperwhite yet.,Great for those that just want an e-reader,,,Tedd Gardiner,,,205 grams
3,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-Resolution-Display-Built/dp/B00QJDU3KY/ref=lp_6669702011_1_7/132-1677641-8459202?s=amazon-devices&ie=UTF8&qid=1498832761&sr=1-7,"I bought one of the first Paperwhites and have been very pleased with it its been a constant companion and I suppose Ive read, on average, a book every three days for the past however many years on it. I wouldnt give it up youd have to pry it from my cold dead fingers.For sundry logistical reasons, Ive also made good use of Amazons Kindle app on my iPhone. No Paperwhite screen, naturally, and all the cool usability that delivers, but it works well and has its own attractions as a companion to the Kindle.Of course, there are aspects of the Paperwhite which I would like to critique. Ah you knew that was coming somewhere, didnt you.As a member of BookBub, I get a daily list of alerts and book deals in my chosen genres. I take on many of them, however, Ive found that, even with the best will in the world, I cant keep up. Some days it seems that for every book I read, Ive bought two. Theres just so much good stuff out there! The accumulative effect of this is that the number of books actually on my Paperwhite has been creeping ever upward for some time. Its now at about 400.With this in mind, Ive noticed that while page-turning has remained exactly the same, just about every other action on the Kindle has become positively glacial. Not just very slow, but so slow you think its malfunctioning. The general consensus appears to be that its to be expected once one has that many books downloaded onto a Kindle, it will begin to behave in a flakey manner. This drives me mad. Amazon states it can hold thousands of books. I believe them. 
But I figure I would need a second Paperwhite to read while Im waiting for actions to complete on the first one.Read more",Love / Hate relationship,,,Dougal,,,205 grams
4,AVpe7AsMilAPnD_xQ78G,B00QJDU3KY,Amazon,"Amazon Devices,mazon.co.uk",,2016-03-08T20:21:53Z,2017-07-18T23:52:58Z,169 mm x 117 mm x 9.1 mm,,kindlepaperwhite/b00qjdu3ky,...,5.0,https://www.amazon.com/Kindle-Paperwhite-High-Resolution-Display-Built/dp/B00QJDU3KY/ref=lp_6669702011_1_7/132-1677641-8459202?s=amazon-devices&ie=UTF8&qid=1498832761&sr=1-7,"I have to say upfront - I don't like coroporate, hermetically closed stuff like anything by Apple or in this case, Amazon. I like having devices on which I can put anything I want and use it. But...I was a fairly happy user of a Nook Touch for several years, but couldn't use all its functionalities since I live in Serbia. Then I lost the Nook and since no other devices can actually be fully used in Serbia (buying books with them, using their online capabilities) except the Kindle, and since no one except Amazon ships to Serbia, and since I've actually been a happy Amazon customer since 2005 over friends' accounts and since 2007 through my own, and since the Kindle definitely has the best technology - why not buy itSo I did. What I read in many reviews about the screen/light of the Paperwhite and similar devices was no problem with mine. The light disperses just fine, except a few black blotches (maybe you can see it in the picture) at the bottom of the screen, which are actually shadows of the black plastic casing and thus can't really be avoided. As you can see in the picture without the light - there are no blotches with light out.The Paperwhite's screen is just marvelous at 300 ppi, the touchscreen works just fine, the store works here in Serbia, and in these two days I've been using it, I'm a happy guy.I had to get the hang on how to make sideloaded books behave at least almost like Amazon books, but that's fine. 
That's the one thing I'd like to see Amazon do in some future upgrades: make the Kindle treat sideloaded books just like the ones bought from them directly, with sharing funcion (quotes and Goodreads) enabled and so on.The size is perfect, it sits very well in the hand, the light doesn't hurt the eyes in the dark (like the light on a tab does)... the packaging was fine, no problems there and what remains to be seen now is the battery life.So far, I can only recommend it.",I LOVE IT,,,Miljan David Tanic,,,205 grams


In [71]:
df.tail()

Unnamed: 0,id,asins,brand,categories,colors,dateAdded,dateUpdated,dimension,ean,keys,...,reviews.rating,reviews.sourceURLs,reviews.text,reviews.title,reviews.userCity,reviews.userProvince,reviews.username,sizes,upc,weight
1592,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Accessories,Controllers & Remote Controls,Kindle Store,Fire TV Accessories,Controllers & Remotes,Controllers",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,"alexavoiceremoteforamazonfiretvfiretvstick/b00no8jjzw,amazon/dr49wk,voiceremoteforamazonfiretvfiretvstick/b00no8jjzw",...,3.0,"https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_27/135-1292226-5659745?s=fiona-hardware&ie=UTF8&qid=1500945241&sr=1-27,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_27/132-5615145-9127409?s=fiona-hardware&ie=UTF8&qid=1500945236&sr=1-27,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_27/145-4639212-6106769?s=fiona-hardware&ie=UTF8&qid=1500944910&sr=1-27,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_27/141-0458241-2772141?s=fiona-hardware&ie=UTF8&qid=1500944904&sr=1-27",This is not the same remote that I got for my Alexa-Echo it doesn't control volume.... I think remotes most used feature is volume and pause. I would be disappointed with myself if i produced a remote that couldn't control volume. That would make me an incompetent engineer or lazy CEO!!! Amazon I expect better from you!!!,I would be disappointed with myself if i produced a remote that couldn't ...,,,GregAmandawith4,,,4 ounces
1593,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Accessories,Controllers & Remote Controls,Kindle Store,Fire TV Accessories,Controllers & Remotes,Controllers",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,"alexavoiceremoteforamazonfiretvfiretvstick/b00no8jjzw,amazon/dr49wk,voiceremoteforamazonfiretvfiretvstick/b00no8jjzw",...,1.0,"https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_20/130-3302769-4285400?s=fiona-hardware&ie=UTF8&qid=1500428950&sr=1-20,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_6/146-6711310-1289856?s=fiona-hardware&ie=UTF8&qid=1498832641&sr=1-6,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/132-1677641-8459202?s=fiona-hardware&ie=UTF8&qid=1498832572&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/144-2862518-0215137?s=fiona-hardware&ie=UTF8&qid=1498832572&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_4/132-1677641-8459202?s=fiona-hardware&ie=UTF8&qid=1498827093&sr=1-4,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/147-4487395-0624402?s=fiona-hardware&ie=UTF8&qid=1498827226&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_4/132-6827770-3059607?s=fiona-hardware&ie=UTF8&qid=1498827212&sr=1-4,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/147-6658258-8116630?s=fiona-hardware&ie=UTF8&qid=1498827225&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/141-8903483-8622516?s=fiona-hardware&ie=UTF8&qid=1498827213&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_4/132-6827770-3059607?s=fiona-hardware&ie=UTF8&qid=1498826883&sr=1-4,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZ
W/ref=lp_370783011_1_5/147-6658258-8116630?s=fiona-hardware&ie=UTF8&qid=1498826883&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_13/134-4316275-0880135?s=fiona-hardware&ie=UTF8&qid=1497291931&sr=1-13,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_18/136-1579861-6822330?s=fiona-hardware&ie=UTF8&qid=1497291911&sr=1-18","I have had to change the batteries in this remote once or twice per month since I purchased it in March. After the battery's short life span after the first battery replacement, we stopped doing any gaming with it and just used the gaming controller. It did not make a difference. It still drained the batteries just as quickly. I do have the option of using the phone app, and I do, but what if I'm not home and someone else wants to use the fire tv This is very, very poor quality product. Of course. It went out of warranty 12 days ago. I guess I should have been more mindful of that date. I do not recommend this product and I think Amazon should extend the warranty coverage. It is ridiculous that this remote basically doesn't work.",Battery draining remote!!!!,,,Amazon Customer,,,4 ounces
1594,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Accessories,Controllers & Remote Controls,Kindle Store,Fire TV Accessories,Controllers & Remotes,Controllers",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,"alexavoiceremoteforamazonfiretvfiretvstick/b00no8jjzw,amazon/dr49wk,voiceremoteforamazonfiretvfiretvstick/b00no8jjzw",...,1.0,"https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_6/146-6711310-1289856?s=fiona-hardware&ie=UTF8&qid=1498832641&sr=1-6,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/132-1677641-8459202?s=fiona-hardware&ie=UTF8&qid=1498832572&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/144-2862518-0215137?s=fiona-hardware&ie=UTF8&qid=1498832572&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_4/132-1677641-8459202?s=fiona-hardware&ie=UTF8&qid=1498827093&sr=1-4,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/147-4487395-0624402?s=fiona-hardware&ie=UTF8&qid=1498827226&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_4/132-6827770-3059607?s=fiona-hardware&ie=UTF8&qid=1498827212&sr=1-4,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/147-6658258-8116630?s=fiona-hardware&ie=UTF8&qid=1498827225&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/141-8903483-8622516?s=fiona-hardware&ie=UTF8&qid=1498827213&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_4/132-6827770-3059607?s=fiona-hardware&ie=UTF8&qid=1498826883&sr=1-4,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=lp_370783011_1_5/147-6658258-8116630?s=fiona-hardware&ie=UTF8&qid=1498826883&sr=1-5,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B
00NO8JJZW/ref=sr_1_13/134-4316275-0880135?s=fiona-hardware&ie=UTF8&qid=1497291931&sr=1-13,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_18/136-1579861-6822330?s=fiona-hardware&ie=UTF8&qid=1497291911&sr=1-18","Remote did not activate, nor did it connect to box.A poorly designed remote, replacing an even worse remote. Waste of time.Ordered two items, had to pay shipping on both. They were both shipped in the same padded envelope, why double shipping charges",replacing an even worse remote. Waste of time,,,Amazon Customer,,,4 ounces
1595,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Accessories,Controllers & Remote Controls,Kindle Store,Fire TV Accessories,Controllers & Remotes,Controllers",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,"alexavoiceremoteforamazonfiretvfiretvstick/b00no8jjzw,amazon/dr49wk,voiceremoteforamazonfiretvfiretvstick/b00no8jjzw",...,3.0,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_40/163-6209285-8478132?s=fiona-hardware&ie=UTF8&qid=1485308590&sr=1-40,It does the job but is super over priced. I feel like they should offer a replacement remote at a better price. I could have just spent 10 more and gotten the stick. I just think it's ridiculous to spend 32 on a remote. The product is fine. I'm just unhappy with the price.,Overpriced,,,Meg Ashley,,,4 ounces
1596,AVpfo9ukilAPnD_xfhuj,B00NO8JJZW,Amazon,"Amazon Devices & Accessories,Amazon Device Accessories,Controllers & Remote Controls,Kindle Store,Fire TV Accessories,Controllers & Remotes,Controllers",,2016-04-02T14:40:43Z,2017-08-13T08:28:46Z,,,"alexavoiceremoteforamazonfiretvfiretvstick/b00no8jjzw,amazon/dr49wk,voiceremoteforamazonfiretvfiretvstick/b00no8jjzw",...,1.0,https://www.amazon.com/Alexa-Voice-Remote-Amazon-Stick/dp/B00NO8JJZW/ref=sr_1_40/163-6209285-8478132?s=fiona-hardware&ie=UTF8&qid=1485308590&sr=1-40,I ordered this item to replace the one that no longer works. The directions for the new remove state to press the home button to go to the home screen (using the existing remote which does not work) You must use your existing remote for all the following steps. The existing remote DOES NOT WORK. This is why I bought a new one. I am sending all of this crap back to amazon and canceling this Fire subscription. This has been a problem from day one and we have only had this for a few months. Not worth the money.,I am sending all of this crap back to amazon and canceling this Fire subscription,,,DIANE K,,,4 ounces


In [72]:
df.shape

(1597, 27)

In [73]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1597 entries, 0 to 1596
Data columns (total 27 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    1597 non-null   object 
 1   asins                 1597 non-null   object 
 2   brand                 1597 non-null   object 
 3   categories            1597 non-null   object 
 4   colors                774 non-null    object 
 5   dateAdded             1597 non-null   object 
 6   dateUpdated           1597 non-null   object 
 7   dimension             565 non-null    object 
 8   ean                   898 non-null    float64
 9   keys                  1597 non-null   object 
 10  manufacturer          965 non-null    object 
 11  manufacturerNumber    902 non-null    object 
 12  name                  1597 non-null   object 
 13  prices                1597 non-null   object 
 14  reviews.date          1217 non-null   object 
 15  reviews.doRecommend  
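
The `info()` output shows several columns with far fewer than 1,597 non-null values (for example, `colors` has 774). A compact way to rank columns by their share of missing values is sketched below. The toy frame and its values are hypothetical stand-ins for `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are invented for illustration only
toy = pd.DataFrame({
    "reviews.rating": [5.0, 4.0, np.nan, 1.0],
    "colors": ["Black", None, None, None],
    "sizes": [np.nan, np.nan, np.nan, np.nan],
})

# isna() marks missing cells; the column mean is the missing fraction
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns that contain no usable values at all
empty_cols = missing_share[missing_share == 1.0].index.tolist()
print(empty_cols)
```

Applied to the real data, the same two lines (`df.isna().mean()` and the filter on `1.0`) would list the columns that carry no information and are safe candidates for dropping.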

In [74]:
df.describe()

Unnamed: 0,ean,reviews.numHelpful,reviews.rating,reviews.userCity,reviews.userProvince,sizes,upc
count,898.0,900.0,1177.0,0.0,0.0,0.0,898.0
mean,844313500000.0,83.584444,4.359388,,,,844313500000.0
std,3416444000.0,197.150238,1.021445,,,,3416444000.0
min,841667000000.0,0.0,1.0,,,,841667000000.0
25%,841667000000.0,0.0,4.0,,,,841667000000.0
50%,841667000000.0,0.0,5.0,,,,841667000000.0
75%,848719000000.0,34.0,5.0,,,,848719000000.0
max,848719000000.0,997.0,5.0,,,,848719000000.0
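
In the summary above, `ean` and `upc` are treated as numbers, so statistics such as the mean are computed for what are really product identifiers. If those columns were needed, one option is to read them as text. The sketch below assumes a hypothetical two-row CSV; the UPC-like value is one that appears among the `keys` values of this dataset.

```python
import io
import pandas as pd

# Hypothetical two-row CSV used only to illustrate the dtype option
csv_text = "upc,reviews.rating\n0848719022827,5\n0848719022827,4\n"

# Default parsing infers a numeric type and silently drops the leading zero
as_number = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps the identifier exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"upc": str})

print(as_number["upc"].iloc[0])
print(as_text["upc"].iloc[0])
```

This notebook later drops both columns, so the point is mainly relevant if identifiers are kept for joins or grouping.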


## Data Cleaning

In [75]:
# Saving a copy
df1 = df.copy(deep = True)


In [76]:
# Standardize the column names
# Replace dots with underscores (regex=False treats '.' as a literal dot)
df1.columns = df1.columns.str.replace('.', '_', regex=False)

# Insert an underscore before every capital letter except at the start of the name
def add_underscores(col_name):
    return re.sub(r'(?<!^)(?=[A-Z])', '_', col_name)

# Apply the function to all column names
df1.columns = [add_underscores(col) for col in df1.columns]


#Confirm changes
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1597 entries, 0 to 1596
Data columns (total 27 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     1597 non-null   object 
 1   asins                  1597 non-null   object 
 2   brand                  1597 non-null   object 
 3   categories             1597 non-null   object 
 4   colors                 774 non-null    object 
 5   date_Added             1597 non-null   object 
 6   date_Updated           1597 non-null   object 
 7   dimension              565 non-null    object 
 8   ean                    898 non-null    float64
 9   keys                   1597 non-null   object 
 10  manufacturer           965 non-null    object 
 11  manufacturer_Number    902 non-null    object 
 12  name                   1597 non-null   object 
 13  prices                 1597 non-null   object 
 14  reviews_date           1217 non-null   object 
 15  revi
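
The renaming function inserts an underscore before every capital letter, so an acronym such as `URLs` is split letter by letter and the original capitalization is kept (`date_Added`). The sketch below contrasts that behaviour with a hypothetical alternative, `to_snake_case`, which is not part of this notebook and is shown only as one possible refinement.

```python
import re

def add_underscores(col_name):
    # Same rule as in the notebook: underscore before each capital letter
    return re.sub(r'(?<!^)(?=[A-Z])', '_', col_name)

def to_snake_case(col_name):
    # Hypothetical alternative: split only at lower-to-upper boundaries,
    # then lowercase everything
    name = col_name.replace('.', '_')
    name = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name)
    return name.lower()

for original in ["dateAdded", "reviews.sourceURLs", "reviews.doRecommend"]:
    notebook_style = add_underscores(original.replace('.', '_'))
    alternative = to_snake_case(original)
    print(original, notebook_style, alternative)
```

Switching to the alternative would also require updating every later reference to the renamed columns, for example `reviews_source_U_R_Ls` in the list of columns to drop.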

In [77]:
df1['reviews_date'].value_counts()

reviews_date
2014-07-28T00:00:00Z        42
2014-07-24T00:00:00Z        31
2014-05-02T05:00:00Z        24
2014-04-07T00:00:00Z        23
2014-04-03T05:00:00Z        22
                            ..
2017-05-26T00:00:00.000Z     1
2017-05-22T00:00:00.000Z     1
2017-05-20T00:00:00.000Z     1
2017-05-19T00:00:00.000Z     1
2016-07-31T00:00:00Z         1
Name: count, Length: 382, dtype: int64
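
The counts above show that `reviews_date` is stored as text in two slightly different shapes, with and without fractional seconds. One way to turn them into proper timestamps is sketched below. The sample strings are copied from the output above; the normalization step and the variable names are assumptions, not code from this notebook.

```python
import pandas as pd

# Sample strings copied from the value_counts() output, plus one missing value
raw = pd.Series([
    "2014-07-28T00:00:00Z",
    "2017-05-26T00:00:00.000Z",
    "2016-07-31T00:00:00Z",
    None,
])

# Drop the optional fractional seconds so every string shares one format
normalized = raw.str.replace(r"\.\d+Z$", "Z", regex=True)

# Parse with an explicit format; missing values become NaT
parsed = pd.to_datetime(normalized, format="%Y-%m-%dT%H:%M:%SZ", utc=True)
print(parsed)
print(parsed.dt.year.value_counts())
```

With a datetime column in place, reviews can be grouped by year or month to look for trends in customer satisfaction over time.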

In [78]:
df1['categories'].value_counts()

categories
Amazon Devices,Home,Smart Home & Connected Living,Smart Hubs & Wireless Routers,Smart Hubs,Home Improvement,Home Safety & Security,Alarms & Sensors,Home Security,Amazon Echo,Home, Garage & Office,Smart Home,Voice Assistants,Amazon Tap,Electronics Features,TVs & Electronics,Portable Audio & Electronics,MP3 Player Accessories,Home Theater & Audio,Speakers,Featured Brands,Electronics,Kindle Store,Frys,Electronic Components,Home Automation,Electronics, Tech Toys, Movies, Music,Audio,Bluetooth Speakers    542
Amazon Devices                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 

In [79]:
df1["manufacturer"].value_counts()

manufacturer
Amazon    832
AMDSI     133
Name: count, dtype: int64

In [80]:
df1['keys'].value_counts()

keys
amazontapalexaenabledportablebluetoothspeaker/b01bh83oom,amazonamazontapportablebluetoothwifispeakerblack/5097300,841667107929,amazonecho/52353110,0841667107929,tapalexaenabledportablebluetoothspeaker/05743627000p,amazon/53004496,amazon/b01bh83oom,amazontapportablebluetoothwifispeakerblack/1001803403    542
848719022827,0848719022827,amazonfiretv/b00cx5p8fc,amazon/848719022827,2015amazonfiretv4kultrahddigitalmediastreamersealedbox/151842588028,brandnewamazonfiretvboxsealedinretailbox2015/272013292018                                                                                                             166
amazonpremiumheadphones/b00hx0srxw,0848719039504,848719039504,amazon/ka416y,amazon/55000239z                                                                                                                                                                                                                     133
firehd6tablet/b00lwhu9d8                                            

In [83]:
# List of unnecessary columns to drop
columns_to_drop = [
    "id", "asins", "brand", "ean", "keys", "categories", "colors", "date_Added", "date_Updated",
    "dimension", "manufacturer", "manufacturer_Number", "name", "prices", 
    "reviews_source_U_R_Ls", "reviews_user_City", "reviews_user_Province", 
    "sizes", "upc", "weight"
]

# Drop unnecessary columns
df1 = df1.drop(columns=columns_to_drop)

# Display remaining columns
print(df1.columns)

Index(['reviews_date', 'reviews_do_Recommend', 'reviews_num_Helpful',
       'reviews_rating', 'reviews_text', 'reviews_title', 'reviews_username'],
      dtype='object')


In [84]:
df1['reviews_num_Helpful'].value_counts()

reviews_num_Helpful
0.0      504
2.0       23
3.0       18
1.0       16
5.0       15
        ... 
834.0      1
323.0      1
102.0      1
790.0      1
136.0      1
Name: count, Length: 182, dtype: int64

In [86]:
df1['reviews_num_Helpful'].unique()

array([139., 126.,  69.,   2.,  17.,  nan, 303., 138., 207., 245.,  43.,
        14.,   5.,  30.,  68., 123., 122.,  36., 402., 330.,  85.,  59.,
       694.,  38.,  15.,  22.,   4., 719., 641., 498., 716., 739.,  48.,
         8.,  24.,   3.,  19.,  13., 671., 936., 206., 586., 120., 160.,
        25., 769., 357., 656.,  50., 669., 194., 350., 848., 294., 729.,
        56.,   6.,  64.,  34.,  83.,  29.,  27., 143.,  16.,  18.,  31.,
        61.,  82., 581., 870., 177., 215.,  88., 599., 221., 750., 790.,
       102., 323., 834.,  74.,  11., 346.,  89., 111., 190., 172.,   7.,
         9.,  35.,  63., 449., 620., 142.,  87.,  12.,  40., 281., 216.,
       822., 185., 944., 551., 182., 685., 432., 975.,  20., 745., 236.,
       407., 287., 249., 170., 112., 201.,  37.,  41.,  21.,  66., 147.,
       125., 403., 517.,   0., 180., 228., 997., 505., 965., 443., 358.,
       510., 326., 151., 890., 754., 902., 512.,  75., 205., 456., 932.,
       771., 459.,  52., 781., 966., 539., 704.,  7

In [85]:
df1['reviews_rating'].value_counts()

reviews_rating
5.0    741
4.0    236
3.0    124
1.0     42
2.0     34
Name: count, dtype: int64
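
Since the problem statement calls for positive, negative, and neutral categories, the star ratings can serve as a rough source of labels. The sketch below assumes a common but unconfirmed rule (4 to 5 stars positive, 3 neutral, 1 to 2 negative) and applies it to the rating counts shown above. The function name `rating_to_sentiment` is hypothetical.

```python
import pandas as pd

# Rating counts copied from the value_counts() output above
rating_counts = pd.Series({5.0: 741, 4.0: 236, 3.0: 124, 1.0: 42, 2.0: 34})

def rating_to_sentiment(rating):
    """Assumed rule: 4-5 stars positive, 3 neutral, 1-2 negative."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

# Sum the counts of all ratings that map to the same sentiment label
labels = rating_counts.index.map(rating_to_sentiment)
sentiment_counts = rating_counts.groupby(labels).sum()

# Express each class as a percentage of all rated reviews
sentiment_share = (sentiment_counts / sentiment_counts.sum() * 100).round(1)
print(sentiment_counts)
print(sentiment_share)
```

Under this assumed rule, 977 of the 1,177 rated reviews (about 83%) are positive, so the classes are strongly imbalanced and accuracy alone would be a weak measure of model quality.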