### Introduction

This project performs an analysis on data obtained from Amazon. Amazon, one of the largest e-commerce websites in the world, also runs fulfillment centres where they store and ship the inventory from. A better understanding of their customer base, such as the effects of discounts on sales or the effects of good rating scores on sales, can lead to increased efficiencies and reduced costs.

The dataset consists of reviews of products sold on Amazon. Examples of the data within the dataset are the ratings of the products, the names of the people who bought those products, and the discounts applied to the products.

The dataset was obtained from Kaggle:

https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset

In [1]:
import pandas as pd 
import numpy as np 
import seaborn as sns
import plotly.express as px
import plotly.subplots as sp
import plotly.graph_objects as go
import warnings
import pygwalker as pyg
import sqlite3
#import sqlalchemy
#from sqlalchemy import create_engine
from sqlalchemy.dialects import sqlite
from pandas.io import sql
import subprocess


warnings.filterwarnings('ignore')

The dataset is in the **CSV** (Comma Seperated Value) format, and will be imported into an **SQLite** database, where it will be preprocessed.

In [2]:
df=pd.read_csv('amazon.csv')
conn=sqlite3.connect('amazon.db')
df.to_sql(name='amazon', con=conn, if_exists='replace', index=False)
conn.commit()

In [3]:
%config SqlMagic.displaylimit = 15

In [4]:
%load_ext sql
%sql sqlite:///amazon.db

We shall check to make sure that the CSV file has been imported correctly into the SQLite database and that we have managed to successfully connecct to it.

In [5]:
%sql select * from amazon limit 1

product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
B07JW9H4J1,"Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)",Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables,₹399,"₹1,099",64%,4.2,24269,"High Compatibility : Compatible With iPhone 12, 11, X/XsMax/Xr ,iPhone 8/8 Plus,iPhone 7/7 Plus,iPhone 6s/6s Plus,iPhone 6/6 Plus,iPhone 5/5s/5c/se,iPad Pro,iPad Air 1/2,iPad mini 1/2/3,iPod nano7,iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed, Compatible with any charging adaptor, multi-port charging station or power bank.|Durability : Durable nylon braided design with premium aluminum housing and toughened nylon fiber wound tightly around the cord lending it superior durability and adding a bit to its flexibility.|High Security Level : It is designed to fully protect your device from damaging excessive current.Copper core thick+Multilayer shielding, Anti-interference, Protective circuit equipment.|WARRANTY: 12 months warranty and friendly customer services, ensures the long-time enjoyment of your purchase. If you meet any question or problem, please don't hesitate to contact us.","AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBBSNLYT3ONILA,AHCTC6ULH4XB6YHDY6PCH2R772LQ,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,AENGU523SXMOS7JPDTW52PNNVWGQ,AEQJHCVTNINBS4FKTBGQRQTGTE5Q,AFC3FFC5PKFF5PMA52S3VCHOZ5FQ","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jaspreet singh,Khaja moin,Anand,S.ARUMUGAM","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1KD19VHEDV0OR,R3C02RMYQMK6FC,R39GQRVBUZBWGY,R2K9EDOE15QIRJ,R3OI7YT648TL8I","Satisfied,Charging is really fast,Value for money,Product review,Good quality,Good product,Good Product,As of now seems good","Looks durable Charging is fine tooNo complains,Charging is really fast, good product.,Till now satisfied with the quality.,This is a good product . The charging speed is slower than the original iPhone cable,Good quality, would recommend,https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/81---F1ZgHL._SY88.jpg,Product had worked well till date and was having no issue.Cable is also sturdy enough...Have asked for replacement and company is doing the same...,Value for money",https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/51UsScvHQNL._SX300_SY300_QL70_FMwebp_.jpg,https://www.amazon.in/Wayona-Braided-WN3LG1-Syncing-Charging/dp/B07JW9H4J1/ref=sr_1_1?qid=1672909124&s=electronics&sr=1-1


Next, we shall check to see if there are any duplicate reviews. We will not be using the user_name or user_id for this as the users could have bought multiple products. Luckily, each review will have its own unique review_id.

In [6]:
%%sql
SELECT review_id, COUNT(review_id) AS count, SUM(COUNT(review_id)) OVER() AS total_count
FROM amazon GROUP BY review_id having count(review_id)>1

review_id,count,total_count
"R10365HEDURWI9,R5RP542IMC4OI,RX2HFWXTTQDTS,R2636VYPMOZV9,RW2Z2YM3K8UV5,RVNGA0FEAXYHI,R2K7MABWMAQE26,R33YS4PO3JWU23",3,415
"R10FUJSCR3VYHY,R2Y8B5LQ5HLACQ,R3BC8GS9GGMBTI,R2BO0XUUDY4ZA3,RN23FCU4EP3F3,RDGNXFM923PG4,R26PGAI8JKY8XB,R381CGOL80J2QM",2,415
"R10I6UIAQIP9TN,R2XEWWLV1LH7KX,R3J0MEY15WI71Z,R3HJ0GBBBUGEJZ,R3TGTIJ54KHOL0,R21TUQZLYNGC0M,R1JSFOA0TD4S1A,R1KOD8YMT3FJ7I",2,415
"R10KEMT1N336ZD,RL01KZO95GX4F,R1Q721FI3A7XLK,R34MTIAB8IHAI,R1LG1DNA516T7L,RFH8DR3A2O8BG,RFA922H587JFN,R10BFD806POSOX",2,415
"R11MQS7WD9C3I0,R2AKH69XQY8BY4,R8GBOLYUN5UP6,R1AYVO4R25KJTA,R1HT6XM787V7FV,R339XJL1GMKHA3,R175VFSB2A32HG,R35T9LXYBSP09G",3,415
"R128LZ0DN2NZBZ,R3LFQ7EDHZ6DKM,RUSJFUV64DPWM,RHNVN7WEES6ZV,R3LHNY1FJU5Z62,RYD25TMDIWVXF,R22G4CIX0JF8CT,R3KZ4E667WBY58",3,415
"R12D1BZF9MU8TN,R32MNCWO5LGFCG,RZU3UK8OZKD6X,R3BSTKR3JUW6GY,R1ARVYPXS4XPB7,R1V6GDYE2IBX8O,R28EG2PXZTJL90,R2SQNU7OIOOLHT",3,415
"R13UTIA6KOF6QV,R2UGDZSGFF01K7,RHHIZ45VYU5X6,R14N9HBE5EIUY0,R2WMW096T9Y0OU,R1SHIIE6M72825,R22P6BE9DBME4F,R2TEINENXTIHT2",5,415
"R14ZOPYFHOYYRQ,R1GQH74NUCJZZ7,R1BNWIYBRSI1Z6,R347KU67LE6JEH,RMGA8IGV2WQDX,R2782FIPC5T4KM,R220M468LVHIE1,RA1PNAU355MLG",2,415
"R18D9LZAYX9JSY,R2TD56H4WD69RD,R3022ERQVPT7PV,R3T0CWF358RZNJ",2,415


There are a total of 415 duplicated rows, and we shall remove the duplicated entries from the table.

In [7]:
%%sql

DELETE FROM amazon 
WHERE rowid > (
  SELECT MIN(rowid) FROM amazon a  
  WHERE a.review_id = amazon.review_id
);


All duplicated entries were removed, leaving only one of the entries behind for each duplicated entries. A total of 271 rows were removed. To confirm that the SQL query has worked as intended, we shall check for any duplicated entries remaining.

In [8]:
%%sql
select review_id, count(review_id) from amazon group by (review_id) having count(review_id)>1

review_id,count(review_id)


We will also check for null values in the dataset. 

In [9]:
%%sql
SELECT *
FROM amazon
WHERE (product_id or product_name or category or discounted_price or actual_price or discount_percentage or rating or rating_count or user_id or user_name or review_id) IS NULL

product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link


After the preprocessing is done, we shall convert the SQL table to a Pandas dataframe, and pygwalker will be used to visualise the data and perform Exploratory Data Analysis. The SQLite database connection will be closed to free up resources.

In [38]:
df = pd.read_sql_query("SELECT * FROM amazon", conn)
#conn.commit()
#conn.close()

Let's do a quick check of the dataframe.

In [30]:
df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


In [12]:
walker = pyg.walk(df)

Box(children=(HTML(value='<div id="ifr-pyg-0" style="height: auto">\n    <head>\n        <meta http-equiv="Con…

It will be very helpful to further breakdown the item categories so that we can group them within broader categories.

In [39]:
df['cat']=df['category'].str.split("|", n = -1, expand = False).str[0]
df['cat1']=df['category'].str.split("|", n = -1, expand = False).str[1]
df['cat2']=df['category'].str.split("|", n = -1, expand = False).str[2]
df['cat3']=df['category'].str.split("|", n = -1, expand = False).str[3]
df['cat4']=df['category'].str.split("|", n = -1, expand = False).str[4]
df['cat5']=df['category'].str.split("|", n = -1, expand = False).str[5]
df['cat6']=df['category'].str.split("|", n = -1, expand = False).str[6]
df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,...,review_content,img_link,product_link,cat,cat1,cat2,cat3,cat4,cat5,cat6
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...",...,Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...",...,I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...",...,"Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...",...,"Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...",...,"Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
