# Amazon Sales Dataset

## Problem Description

- Product Analysis: Explore popular products, their ratings, and price ranges.
- Customer Sentiment: Analyze review titles and content to understand customer satisfaction.
- Pricing Strategies: Investigate discounts and their impact on sales and ratings.
- Category Insights: Compare different product categories and their performance.
- User Behavior: Examine patterns in user reviews and ratings across products.


- Price-Quality Relationship: Investigate if there's a correlation between product prices and their ratings.
- Discount Impact: Analyze how discounts affect product popularity and ratings.
- Category Analysis: Identify which product categories tend to have higher ratings or more reviews.
- Brand Performance: Compare different brands within the same category to see which ones consistently receive high ratings.
- Review Content Analysis: Perform text analysis on review content to identify common themes in positive and negative reviews.
- Seasonal Trends: If we have timestamp data, we could look at how product ratings and popularity change over time or during specific seasons.
- Price Point Analysis: Determine the most common price points for highly-rated products across different categories.

Data source: https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset

Amazon accounts for a large volume of online sales. This dataset contains information about products listed on Amazon in different categories, including price, category, and customer ratings.

During data processing, the category column is split into three columns:
- Category
- Subcategory
- Specific Category

The goal of the initial analysis is to identify the products and categories with the most ratings.

The key questions to answer are:

---
- Which categories have the most ratings?
- Which categories are the best rated (mean and median)?
- What is the mean and median discount per category?
- What is the mean price per category?
---
- Which subcategories have the most ratings?
- Which subcategories are the best rated (mean and median)?
- What is the mean and median discount per subcategory?
- What is the mean price per subcategory?
---
- Which specific categories have the most ratings?
- Which specific categories are the best rated (mean and median)?
- What is the mean and median discount per specific category?
- What is the mean price per specific category?
---
- Are the most expensive products well rated?
- Is there a relationship between product price and rating?
---

These questions help clarify how sales performance relates to discounts.
It is important to understand whether larger discounts are associated with
higher sales, or whether more expensive products are better rated.

With this information extracted from the data, Amazon can make better-informed
decisions about its sales strategy, for example increasing the discount on
poorly rated products or raising the price of well-rated products.
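
The price-versus-rating question above can be approached with a rank correlation, which is less sensitive to a few very expensive products than a plain Pearson correlation. The following is only a sketch under assumptions: the helper `spearman` and the toy values are hypothetical, and the column names `actual_price` and `rating` are the ones listed in the data description.

```python
import pandas as pd

# Toy data for illustration only; these values are not taken from the dataset.
toy = pd.DataFrame({
    "actual_price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rating": [4.0, 4.1, 4.3, 4.2, 4.5],
})


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return x.rank().corr(y.rank())


rho = spearman(toy["actual_price"], toy["rating"])
print(round(rho, 3))
```

A value close to 1 or -1 would suggest a monotonic relationship between price and rating; a value near 0 would suggest none. Correlation alone does not show that price causes the rating.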

## Data Description

### Columns

- product_id - Product ID
- product_name - Product name
- category - Product category
- discounted_price - Discounted price of the product
- actual_price - Actual (list) price of the product
- discount_percentage - Discount percentage of the product
- rating - Product rating
- rating_count - Number of ratings for the product
- about_product - Product description
- user_id - ID of the user who wrote the review
- user_name - Name of the user who wrote the review
- review_id - Review ID
- review_title - Review title
- review_content - Review content
- img_link - Product image link
- product_link - Product link

## Data Overview

In [621]:
import pandas as pd
import plotly.graph_objects as go
import numpy as np

In [622]:
df = pd.read_csv("amazon.csv")

In [623]:
df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


In [624]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1465 non-null   object
 1   product_name         1465 non-null   object
 2   category             1465 non-null   object
 3   discounted_price     1465 non-null   object
 4   actual_price         1465 non-null   object
 5   discount_percentage  1465 non-null   object
 6   rating               1465 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1465 non-null   object
 9   user_id              1465 non-null   object
 10  user_name            1465 non-null   object
 11  review_id            1465 non-null   object
 12  review_title         1465 non-null   object
 13  review_content       1465 non-null   object
 14  img_link             1465 non-null   object
 15  product_link         1465 non-null   object
dtypes: obj

## Data Processing

### Duplicate Data

In [625]:
df["product_id"].duplicated().sum()

114

There are 114 duplicated rows in the dataset according to the product_id
column, which was expected to be unique. These duplicates are removed and the
first occurrence of each product is kept.

In [626]:
df = df.drop_duplicates("product_id")
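
As a small illustration of the deduplication step, the sketch below counts repeated IDs and keeps the first occurrence of each one. The toy frame is hypothetical; `keep="first"` is the pandas default and is spelled out here only to make the choice visible.

```python
import pandas as pd

# Hypothetical toy data: two product IDs appear twice.
toy = pd.DataFrame({
    "product_id": ["A1", "B2", "A1", "C3", "B2"],
    "rating": [4.2, 4.0, 4.1, 3.9, 4.0],
})

# duplicated() flags every row whose ID was already seen earlier in the frame.
n_duplicates = toy["product_id"].duplicated().sum()

# Keep only the first row for each product_id.
deduped = toy.drop_duplicates(subset="product_id", keep="first")

print(n_duplicates)
print(len(deduped))
```

Note that keeping the first occurrence discards the other rows for the same product, so any information that differs between duplicates (for example, reviews) is lost.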

### Missing Data

Two products have no rating count, so they are removed from the analysis. This leaves 1,349 of the 1,351 products that remain after removing duplicates.

In [627]:
df.isna().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64

In [628]:
df = df.dropna(subset=["rating_count"])

### Currency

Prices are in Indian rupees (₹), so they are converted to US dollars (USD) to make the values easier to interpret.
On 1 October 2024, one Indian rupee was quoted at 0.012 USD, so that rate is used for the conversion.

In [629]:
df["discounted_price"] = df["discounted_price"].str.replace("₹", "").str.replace(",", "").astype(float)
df["actual_price"] = df["actual_price"].str.replace("₹", "").str.replace(",", "").astype(float)

In [630]:
df["discounted_price"] = df["discounted_price"]*0.012
df["actual_price"] = df["actual_price"]*0.012
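
The two steps above (cleaning the text and applying the exchange rate) could also be wrapped in one function so both price columns go through the same logic. This is a sketch only: `rupees_to_usd` and `INR_TO_USD` are hypothetical names, and the sample strings follow the format shown in `df.head()`.

```python
import pandas as pd

INR_TO_USD = 0.012  # exchange rate quoted in the text


def rupees_to_usd(prices: pd.Series, rate: float = INR_TO_USD) -> pd.Series:
    """Strip the rupee sign and thousands separators, then convert to USD."""
    cleaned = (
        prices.str.replace("₹", "", regex=False)
        .str.replace(",", "", regex=False)
    )
    return cleaned.astype(float) * rate


sample = pd.Series(["₹399", "₹1,099"])
usd = rupees_to_usd(sample)
print(usd.round(3).tolist())
```

Passing `regex=False` makes it explicit that the symbols are replaced literally, which avoids surprises across pandas versions.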

### Discount Percentage

Discount values are stored as percentage strings, so they are converted to numeric values.

In [631]:
df["discount_percentage"] = df["discount_percentage"].str.replace("%", "").astype(float)
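
A quick sanity check, offered as a sketch, is to recompute the discount from the two price columns and compare it with the stated percentage. The rows below reuse the rupee prices of the first three products shown in `df.head()`; the 1-point tolerance is an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    "discounted_price": [399.0, 199.0, 199.0],
    "actual_price": [1099.0, 349.0, 1899.0],
    "discount_percentage": [64.0, 43.0, 90.0],
})

# discount (%) = (1 - discounted / actual) * 100
recomputed = (1 - toy["discounted_price"] / toy["actual_price"]) * 100

# Flag rows where the stated discount differs from the recomputed one
# by more than one percentage point.
mismatch = (recomputed.round() - toy["discount_percentage"]).abs() > 1

print(recomputed.round().tolist())
print(mismatch.any())
```

Because the ratio is unaffected by the currency conversion, the same check works on the USD columns.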

In [632]:
df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,4.788,13.188,64.0,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,2.388,4.188,43.0,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,2.388,22.788,90.0,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,3.948,8.388,53.0,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,1.848,4.788,61.0,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


In [633]:
df.dtypes

product_id              object
product_name            object
category                object
discounted_price       float64
actual_price           float64
discount_percentage    float64
rating                  object
rating_count            object
about_product           object
user_id                 object
user_name               object
review_id               object
review_title            object
review_content          object
img_link                object
product_link            object
dtype: object

The rating and rating_count columns are still stored as text, so they are converted to numeric values.

**rating**

In [634]:
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")

In [635]:
df["rating"].isna().sum()

1

After the conversion, one NA value was generated, and that row is removed.

In [636]:
df = df.dropna(subset=["rating"])
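
Before dropping rows that fail conversion, it can help to look at the raw strings that could not be parsed. A minimal sketch with hypothetical values (the actual offending value in the dataset is not shown in this notebook):

```python
import pandas as pd

# Hypothetical raw ratings; "not rated" stands in for any unparseable entry.
raw = pd.Series(["4.2", "4.0", "not rated", "3.9"])

converted = pd.to_numeric(raw, errors="coerce")

# Values that were present in the raw data but became NaN after coercion.
failed = raw[converted.isna() & raw.notna()]

print(failed.tolist())
```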

**rating_count**

In [637]:
df["rating_count"].value_counts()

rating_count
9,378     9
18,998    8
24,269    6
32,840    5
19,252    5
         ..
5,176     1
8,614     1
60,026    1
3,066     1
6,987     1
Name: count, Length: 1116, dtype: int64

In [638]:
df["rating_count"]

0       24,269
1       43,994
2        7,928
3       94,363
4       16,905
         ...  
1460     1,090
1461     4,118
1462       468
1463     8,031
1464     6,987
Name: rating_count, Length: 1348, dtype: object

In [639]:
df["rating_count"] = df["rating_count"].str.replace(",", ".")
df["rating_count"]

0       24.269
1       43.994
2        7.928
3       94.363
4       16.905
         ...  
1460     1.090
1461     4.118
1462       468
1463     8.031
1464     6.987
Name: rating_count, Length: 1348, dtype: object

In [640]:
df["rating_count"] = pd.to_numeric(df["rating_count"], errors="coerce")

In [641]:
df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,4.788,13.188,64.0,4.2,24.269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,2.388,4.188,43.0,4.0,43.994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,2.388,22.788,90.0,3.9,7.928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,3.948,8.388,53.0,4.2,94.363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,1.848,4.788,61.0,4.2,16.905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


In [642]:
df["rating_count"].isna().sum()

43

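43 NA values were generated, and these rows are removed. Most likely these come from counts that contain more than one comma: after every comma is replaced with a dot, such a value has two dots and cannot be parsed as a number. Note also that the replacement treats the thousands separator as a decimal point, so 24,269 is read as 24.269 while a count without a comma, such as 468, keeps its original scale; `rating_count` therefore mixes two scales from this point on.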

In [643]:
df = df.dropna(subset=["rating_count"])
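
For comparison, the sketch below parses the same kind of strings by removing the commas instead of replacing them with dots, under the assumption that commas in `rating_count` are only digit-group separators. The third value is a hypothetical example of Indian-style grouping and is not taken from the dataset.

```python
import pandas as pd

raw = pd.Series(["24,269", "468", "4,26,973"])  # last value is hypothetical

# Approach used in the cell above: comma becomes a decimal point.
as_decimal = pd.to_numeric(raw.str.replace(",", ".", regex=False), errors="coerce")

# Alternative: drop the separators so the value stays an integer count.
as_count = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

print(as_decimal.tolist())
print(as_count.tolist())
```

With the alternative, no rows would be lost to the conversion and all counts would share one scale, but the totals and row counts reported later in this notebook would change accordingly.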

### Checking the Data Types

In [644]:
df.dtypes

product_id              object
product_name            object
category                object
discounted_price       float64
actual_price           float64
discount_percentage    float64
rating                 float64
rating_count           float64
about_product           object
user_id                 object
user_name               object
review_id               object
review_title            object
review_content          object
img_link                object
product_link            object
dtype: object

Checking the data types, all columns now have the intended types.

discounted_price, actual_price, discount_percentage, rating, and rating_count are numeric, while
the remaining columns are text.

In [645]:
df.isna().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           0
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64

To conclude the data checks, no column contains NA values.

In [646]:
df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,4.788,13.188,64.0,4.2,24.269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,2.388,4.188,43.0,4.0,43.994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,2.388,22.788,90.0,3.9,7.928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,3.948,8.388,53.0,4.2,94.363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,1.848,4.788,61.0,4.2,16.905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


### Categories and Subcategories

In [647]:
df["category"].value_counts()

category
Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables                                          157
Electronics|Mobiles&Accessories|Smartphones&BasicMobiles|Smartphones                                                        64
Electronics|WearableTechnology|SmartWatches                                                                                 62
Electronics|HomeTheater,TV&Video|Televisions|SmartTelevisions                                                               60
Electronics|HomeTheater,TV&Video|Accessories|RemoteControls                                                                 49
                                                                                                                          ... 
Computers&Accessories|NetworkingDevices|DataCards&Dongles                                                                    1
Electronics|HomeAudio|Speakers|MultimediaSpeakerSystems                                               

The `category` column contains the category and its subcategories separated by "|", so it is split into three columns. Because a value can contain more than one "|", and therefore more than one subcategory level, the first element is taken as the category, the second as the subcategory, and the last as the specific category. In some cases the subcategory is the same as the specific category, which happens when a value has only two levels.

In [648]:
df["main_category"] = df["category"].str.split("|").str[0]
df["sub_category"] = df["category"].str.split("|").str[1]
df["specific_category"] = df["category"].str.split("|").str[-1]

In [649]:
df[['category', 'main_category', 'sub_category', "specific_category"]].sample(5)

Unnamed: 0,category,main_category,sub_category,specific_category
708,"Electronics|Headphones,Earbuds&Accessories|Hea...",Electronics,"Headphones,Earbuds&Accessories",In-Ear
201,Computers&Accessories|Accessories&Peripherals|...,Computers&Accessories,Accessories&Peripherals,USBCables
1212,"Home&Kitchen|Kitchen&HomeAppliances|Vacuum,Cle...",Home&Kitchen,Kitchen&HomeAppliances,HandheldVacuums
338,Electronics|Mobiles&Accessories|Smartphones&Ba...,Electronics,Mobiles&Accessories,Smartphones
1445,"Home&Kitchen|Heating,Cooling&AirQuality|Humidi...",Home&Kitchen,"Heating,Cooling&AirQuality",Humidifiers


In [650]:
df = df.drop(columns=["category"])

To finish the data processing, the `category` column was split into three columns, `main_category`, `sub_category`, and `specific_category`, and the original `category` column was then removed.
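
The split described above can be sketched on two category strings taken from the `value_counts()` output. Splitting once and reusing the result avoids repeating the `str.split` call; the extra `depth` column is a hypothetical addition that shows how many levels each value has.

```python
import pandas as pd

raw = pd.Series([
    "Electronics|WearableTechnology|SmartWatches",
    "Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables",
])

# Split once into lists of levels, then pick the first, second, and last level.
parts = raw.str.split("|")
levels = pd.DataFrame({
    "main_category": parts.str[0],
    "sub_category": parts.str[1],
    "specific_category": parts.str[-1],
    "depth": parts.str.len(),  # hypothetical helper column: number of levels
})

print(levels)
```

Intermediate levels (such as `Cables&Accessories` and `Cables` in the second string) are not kept by this scheme.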

### Column Cleanup

Because the review title and content are not considered in this analysis, the columns related to users and to the review title and content are removed.

In [651]:
df = df.drop(columns=["user_id", "user_name", "review_id", "review_title", "review_content"])

In [652]:
df.head()

Unnamed: 0,product_id,product_name,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,img_link,product_link,main_category,sub_category,specific_category
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,4.788,13.188,64.0,4.2,24.269,High Compatibility : Compatible With iPhone 12...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...,Computers&Accessories,Accessories&Peripherals,USBCables
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,2.388,4.188,43.0,4.0,43.994,"Compatible with all Type C enabled devices, be...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...,Computers&Accessories,Accessories&Peripherals,USBCables
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,2.388,22.788,90.0,3.9,7.928,【 Fast Charger& Data Sync】-With built-in safet...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...,Computers&Accessories,Accessories&Peripherals,USBCables
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,3.948,8.388,53.0,4.2,94.363,The boAt Deuce USB 300 2 in 1 cable is compati...,https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...,Computers&Accessories,Accessories&Peripherals,USBCables
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,1.848,4.788,61.0,4.2,16.905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...,Computers&Accessories,Accessories&Peripherals,USBCables


In [653]:
df.shape

(1305, 13)

After cleaning, 1,305 reviewed products remain and the analysis can proceed.

## Saving the Data

After processing, the dataset is saved to a CSV file to make the analysis easier.

In [585]:
df.to_csv("amazon_cleaned.csv", index=False)

## Initial Analyses

In [586]:
df

Unnamed: 0,product_id,product_name,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,img_link,product_link,main_category,sub_category,specific_category
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,4.788,13.188,64.0,4.2,24.269,High Compatibility : Compatible With iPhone 12...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...,Computers&Accessories,Accessories&Peripherals,USBCables
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,2.388,4.188,43.0,4.0,43.994,"Compatible with all Type C enabled devices, be...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...,Computers&Accessories,Accessories&Peripherals,USBCables
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,2.388,22.788,90.0,3.9,7.928,【 Fast Charger& Data Sync】-With built-in safet...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...,Computers&Accessories,Accessories&Peripherals,USBCables
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,3.948,8.388,53.0,4.2,94.363,The boAt Deuce USB 300 2 in 1 cable is compati...,https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...,Computers&Accessories,Accessories&Peripherals,USBCables
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,1.848,4.788,61.0,4.2,16.905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...,Computers&Accessories,Accessories&Peripherals,USBCables
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1460,B08L7J3T31,Noir Aqua - 5pcs PP Spun Filter + 1 Spanner | ...,4.548,11.028,59.0,4.0,1.090,SUPREME QUALITY 90 GRAM 3 LAYER THIK PP SPUN F...,https://m.media-amazon.com/images/I/41fDdRtjfx...,https://www.amazon.in/Noir-Aqua-Spanner-Purifi...,Home&Kitchen,Kitchen&HomeAppliances,WaterPurifierAccessories
1461,B01M6453MB,Prestige Delight PRWO Electric Rice Cooker (1 ...,27.360,36.540,25.0,4.1,4.118,"230 Volts, 400 watts, 1 Year",https://m.media-amazon.com/images/I/41gzDxk4+k...,https://www.amazon.in/Prestige-Delight-PRWO-1-...,Home&Kitchen,Kitchen&HomeAppliances,Rice&PastaCookers
1462,B009P2LIL4,Bajaj Majesty RX10 2000 Watts Heat Convector R...,26.628,36.960,28.0,3.6,468.000,International design and styling|Two heat sett...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Bajaj-RX-10-2000-Watt-Co...,Home&Kitchen,"Heating,Cooling&AirQuality",HeatConvectors
1463,B00J5DYCCA,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,16.788,22.680,26.0,4.0,8.031,Fan sweep area: 230 MM ; Noise level: (40 - 45...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Havells-Ventilair-230mm-...,Home&Kitchen,"Heating,Cooling&AirQuality",ExhaustFans


---

### Categories

In [657]:
df["main_category"].value_counts()

main_category
Electronics              458
Home&Kitchen             445
Computers&Accessories    364
OfficeProducts            31
MusicalInstruments         2
HomeImprovement            2
Toys&Games                 1
Car&Motorbike              1
Health&PersonalCare        1
Name: count, dtype: int64

Which categories have the most ratings?

In [587]:
# Plot the total rating_count for each main_category, highest first
df_main_category = df.groupby(
    "main_category")["rating_count"].sum().reset_index()
df_main_category = df_main_category.sort_values(
    "rating_count", ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_main_category["main_category"],
           y=df_main_category["rating_count"]))
fig.update_layout(title="Total Ratings by Category",
                  xaxis_title="Category",
                  yaxis_title="Rating Count")
fig.show()

Which categories are the best rated (mean and median)?

In [588]:
# Plot the mean rating for each main_category
df_main_category = df.groupby(
    "main_category")["rating"].mean().reset_index()
df_main_category = df_main_category.sort_values(
    "rating", ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_main_category["main_category"],
           y=df_main_category["rating"]))
fig.update_layout(title="Mean Rating by Category",
                  xaxis_title="Category",
                  yaxis_title="Mean Rating")
# Add more ticks to the y-axis
fig.update_yaxes(tickvals=np.arange(0, 5.5, 0.5))
# Include the values in the bars
fig.update_traces(texttemplate='%{y:.2f}', textposition='inside')
fig.show()

In [589]:
# Plot the median rating for each main_category
df_main_category = df.groupby(
    "main_category")["rating"].median().reset_index()
df_main_category = df_main_category.sort_values(
    "rating", ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_main_category["main_category"],
           y=df_main_category["rating"]))
fig.update_layout(title="Median Rating by Category",
                  xaxis_title="Category",
                  yaxis_title="Median Rating")
# Add more ticks to the y-axis
fig.update_yaxes(tickvals=np.arange(0, 5.5, 0.5))
# Include the values in the bars
fig.update_traces(texttemplate='%{y:.2f}', textposition='inside')
fig.show()
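
Several categories in the counts above contain only one or two products, so their mean or median rating rests on very little data. As a minimal sketch, the two aggregations can be computed in one pass together with the number of products behind each value. The toy frame below only stands in for the notebook's `df` (its values are made up), and `MIN_PRODUCTS` is a hypothetical threshold, not something the original analysis uses.

```python
import pandas as pd

# Toy data standing in for the notebook's `df`; the values are invented.
toy = pd.DataFrame({
    "main_category": ["Electronics", "Electronics", "Electronics",
                      "Home&Kitchen", "Home&Kitchen", "Toys&Games"],
    "rating": [4.0, 4.2, 4.4, 3.8, 4.0, 4.8],
})

# Mean, median, and number of products per category in a single groupby.
summary = (
    toy.groupby("main_category")["rating"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)

# Hypothetical threshold: keep only categories backed by at least 2 products.
MIN_PRODUCTS = 2
reliable = summary[summary["count"] >= MIN_PRODUCTS]
print(reliable)
```

Swapping `"rating"` for `"discount_percentage"` would give the same summary for discounts.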

What is the mean and median discount per category?

In [590]:
# Plot the main_category with the mean discount_percentage
df_main_category = df.groupby(
    "main_category")["discount_percentage"].mean().reset_index()
df_main_category = df_main_category.sort_values(
    "discount_percentage", ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_main_category["main_category"],
           y=df_main_category["discount_percentage"]))
fig.update_layout(title="Mean discount by category",
                  xaxis_title="Category",
                  yaxis_title="Mean discount")
# Add more ticks to the y-axis
fig.update_yaxes(tickvals=np.arange(0, 101, 5))
# Include the values in the bars
fig.update_traces(texttemplate='%{y:.2f}', textposition='inside')
fig.show()


In [591]:
# Plot the main_category with the median discount_percentage
df_main_category = df.groupby(
    "main_category")["discount_percentage"].median().reset_index()
df_main_category = df_main_category.sort_values(
    "discount_percentage", ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_main_category["main_category"],
           y=df_main_category["discount_percentage"]))
fig.update_layout(title="Median discount by category",
                  xaxis_title="Category",
                  yaxis_title="Median discount")
# Add more ticks to the y-axis
fig.update_yaxes(tickvals=np.arange(0, 101, 5))
# Include the values in the bars
fig.update_traces(texttemplate='%{y:.2f}', textposition='inside')
fig.show()

---

### Subcategories

In [658]:
df["sub_category"].value_counts()

sub_category
Kitchen&HomeAppliances                     306
Accessories&Peripherals                    303
HomeTheater,TV&Video                       153
Mobiles&Accessories                        140
Heating,Cooling&AirQuality                 116
WearableTechnology                          62
Headphones,Earbuds&Accessories              48
OfficePaperProducts                         27
NetworkingDevices                           25
Cameras&Photography                         16
HomeStorage&Organization                    16
ExternalDevices&DataStorage                 16
HomeAudio                                   16
GeneralPurposeBatteries&BatteryChargers     14
Printers,Inks&Accessories                   11
Accessories                                  8
CraftMaterials                               7
Components                                   5
OfficeElectronics                            4
Electrical                                   2
Monitors                                     2


---

### Specific Category

In [659]:
df["specific_category"].value_counts()

specific_category
USBCables                   157
Smartphones                  64
SmartWatches                 62
SmartTelevisions             60
RemoteControls               49
                           ... 
WoodenPencils                 1
BatteryChargers               1
DataCards&Dongles             1
MultimediaSpeakerSystems      1
HandheldBags                  1
Name: count, Length: 206, dtype: int64

---

### Correlation between rating, discount, and price

In [595]:
# Scatter plot of the rating and discount_percentage
fig = go.Figure()
fig.add_trace(
    go.Scatter(x=df["rating"], y=df["discount_percentage"], mode="markers"))
fig.update_layout(title="Rating x Discount Percentage",
                  xaxis_title="Rating",
                  yaxis_title="Discount Percentage")
fig.show()

In [598]:
# Scatter plot of the rating and actual_price
fig = go.Figure()
fig.add_trace(
    go.Scatter(x=df["rating"], y=df["actual_price"], mode="markers"))
fig.update_layout(title="Rating x Original Price",
                  xaxis_title="Rating",
                  yaxis_title="Original Price")
fig.show()

### Correlation between price and discount

In [600]:
# Scatter plot of the actual_price and discount_percentage
fig = go.Figure()
fig.add_trace(
    go.Scatter(x=df["actual_price"], y=df["discount_percentage"],
               mode="markers"))
fig.update_layout(title="Original Price x Discount Percentage",
                  xaxis_title="Original Price",
                  yaxis_title="Discount Percentage")
fig.show()
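
The scatter plots show these relationships only visually. A correlation coefficient puts a number on them. The sketch below is one possible way to do that, assuming the three columns are already numeric. It uses a small made-up frame in place of the notebook's `df`, and it computes Spearman's coefficient as Pearson's on ranks, which is a choice made here because prices are usually heavily skewed, not something the original analysis does.

```python
import pandas as pd

# Toy data standing in for the notebook's `df`; the values are invented.
toy = pd.DataFrame({
    "rating": [3.5, 4.0, 4.1, 4.3, 4.5],
    "actual_price": [200.0, 500.0, 450.0, 1200.0, 3000.0],
    "discount_percentage": [70.0, 55.0, 60.0, 40.0, 20.0],
})

cols = ["rating", "actual_price", "discount_percentage"]

# Pearson correlation (the default of DataFrame.corr): linear association.
pearson = toy[cols].corr()

# Spearman correlation: Pearson applied to the ranks of each column,
# which is less sensitive to a few very expensive products.
spearman = toy[cols].rank().corr()

print(pearson.round(2))
print(spearman.round(2))
```

Values close to 0 in either matrix would suggest that the variables move largely independently, whatever the scatter plot seems to show at first glance.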

---

### Products

Which products have the most ratings?

In [664]:
# Plot the 10 products with the highest rating_count
df_product = df.sort_values("rating_count", ascending=False).head(10)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_product["product_id"],
           y=df_product["rating_count"]))
fig.update_layout(title="Top 10 products by number of ratings",
                  xaxis_title="Product",
                  yaxis_title="Rating count",
                  xaxis_tickangle=-45)
# Include the values in the bars
fig.update_traces(texttemplate='%{y}', textposition='inside')
fig.show()

# Print the ID and name of each product shown in the chart
for product_id, product_name in zip(df_product["product_id"],
                                    df_product["product_name"]):
    print(product_id, product_name)

B00ZRBWPA0 Eveready Red 1012 AAA Batteries - Pack of 10
B09Y5FZK9N Pigeon 1.5 litre Hot Kettle and Stainless Steel Water Bottle Combo used for boiling Water, Making Tea and Coffee, Instant Noodles, Soup, 1500 Watt with Auto Shut- off Feature - (Silver)
B08QSC1XY8 Zoul USB C 60W Fast Charging 3A 6ft/2M Long Type C Nylon Braided Data Cable Quick Charger Cable QC 3.0 for Samsung Galaxy M31S M30 S10 S9 S20 Plus, Note 10 9 8, A20e A40 A50 A70 (2M, Grey)
B08QSDKFGQ Zoul USB Type C Fast Charging 3A Nylon Braided Data Cable Quick Charger Cable QC 3.0 for Samsung Galaxy M31s M30 S10 S9 S20 Plus, Note 10 9 8, A20e A40 A50 A70 (1M, Grey)
B09N3BFP4M Bajaj New Shakti Neo Plus 15 Litre 4 Star Rated Storage Water Heater (Geyser) with Multiple Safety System, White
B0BCKJJN8R Hindware Atlantic Xceed 5L 3kW Instant Water Heater with Copper Heating Element and High Grade Stainless Steel Tank
B0841KQR1Z Crypo™ Universal Remote Compatible with Tata Sky Universal HD & SD Set top Box (Also Works with All TV)

Which products are the most expensive?

In [666]:
# Plot the 10 products with the highest actual_price
df_product = df.sort_values("actual_price", ascending=False).head(10)

fig = go.Figure()
fig.add_trace(
    go.Bar(x=df_product["product_id"],
           y=df_product["actual_price"]))
fig.update_layout(title="Top 10 products by price",
                  xaxis_title="Product",
                  yaxis_title="Price",
                  xaxis_tickangle=-45)
# Include the values in the bars
fig.update_traces(texttemplate='$%{y:.2f}', textposition='inside')
fig.show()

# Print the ID and name of each product shown in the chart
for product_id, product_name in zip(df_product["product_id"],
                                    df_product["product_name"]):
    print(product_id, product_name)

B09WN3SRC7 Sony Bravia 164 cm (65 inches) 4K Ultra HD Smart LED Google TV KD-65X74K (Black)
B0BC8BQ432 VU 164 cm (65 inches) The GloLED Series 4K Smart LED Google TV 65GloLED (Grey)
B0B3XXSB1K LG 139 cm (55 inches) 4K Ultra HD Smart LED TV 55UQ7500PSF (Ceramic Black)
B09NS5TKPN LG 1.5 Ton 5 Star AI DUAL Inverter Split AC (Copper, Super Convertible 6-in-1 Cooling, HD Filter with Anti-Virus Protection, 2022 Model, PS-Q19YNZE, White)
B08VB57558 Samsung Galaxy S20 FE 5G (Cloud Navy, 8GB RAM, 128GB Storage) with No Cost EMI & Additional Exchange Offers
B0B15GSPQW Samsung 138 cm (55 inches) Crystal 4K Neo Series Ultra HD Smart LED TV UA55AUE65AKXXL (Black)
B09RWQ7YR6 MI 138.8 cm (55 inches) 5X Series 4K Ultra HD LED Smart Android TV L55M6-ES (Grey)
B095JPKPH3 OnePlus 163.8 cm (65 inches) U Series 4K LED Smart Android TV 65U1S (Black)
B092BL5DCX Samsung 138 cm (55 inches) Crystal 4K Series Ultra HD Smart LED TV UA55AUE60AKLXL (Black)
B0BB3CBFBM VU 138 cm (55 inches) Premium Series 4K Ultra HD

Which products are the best rated (mean and median)?