# Amazon Sales Dataset

**About the data**

The Amazon sales dataset consists of several columns that provide detailed information about products and user reviews. The data includes the product identifier, the product name, the category it belongs to, the discounted price and the actual price, the discount percentage, the product rating, and the number of ratings received.

It also includes details about the user reviews, such as the identifier of the user who wrote the review, the user's name, the review identifier, the review title, and the review content.

This dataset was obtained from [Kaggle](https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset/data) and contains the following fields:

- **product_id**: Product ID.
- **product_name**: Product name.
- **category**: Product category.
- **discounted_price**: Discounted price of the product.
- **actual_price**: Actual price of the product.
- **discount_percentage**: Discount percentage for the product.
- **rating**: Product rating.
- **rating_count**: Number of people who rated the product on Amazon.
- **about_product**: Product description.
- **user_id**: ID of the user who wrote the review for the product.
- **user_name**: Name of the user who wrote the review for the product.
- **review_id**: ID of the user's review.
- **review_title**: Short review.
- **review_content**: Long review.
- **img_link**: Link to the product image.
- **product_link**: Link to the product's official web page.

# Importing the Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Loading the data

In [None]:
amazon_df = pd.read_csv('/content/sample_data/amazon.csv')
amazon_df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,₹399,"₹1,099",64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,₹199,₹349,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,₹199,"₹1,899",90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,₹329,₹699,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,₹154,₹399,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


# Exploring the data

In [None]:
amazon_df.describe()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
count,1465,1465,1465,1465,1465,1465,1465.0,1463,1465,1465,1465,1465,1465,1465,1465,1465
unique,1351,1337,211,550,449,92,28.0,1143,1293,1194,1194,1194,1194,1212,1412,1465
top,B07JW9H4J1,"Fire-Boltt Ninja Call Pro Plus 1.83"" Smart Wat...",Computers&Accessories|Accessories&Peripherals|...,₹199,₹999,50%,4.1,9378,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AHIKJUDTVJ4T6DV6IUGFYZ5LXMPA,AE55KTFVNXYFD5FPY...","$@|\|TO$|-|,Sethu madhav,Akash Thakur,Burger P...","R3F4T5TRYPTMIG,R3DQIEC603E7AY,R1O4Z15FD40PV5,R...","Worked on iPhone 7 and didn’t work on XR,Good ...","I am not big on camera usage, personally. I wa...",https://m.media-amazon.com/images/I/413sCRKobN...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
freq,3,5,233,53,120,56,244.0,9,6,10,10,10,10,8,3,1


In [None]:
amazon_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1465 non-null   object
 1   product_name         1465 non-null   object
 2   category             1465 non-null   object
 3   discounted_price     1465 non-null   object
 4   actual_price         1465 non-null   object
 5   discount_percentage  1465 non-null   object
 6   rating               1465 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1465 non-null   object
 9   user_id              1465 non-null   object
 10  user_name            1465 non-null   object
 11  review_id            1465 non-null   object
 12  review_title         1465 non-null   object
 13  review_content       1465 non-null   object
 14  img_link             1465 non-null   object
 15  product_link         1465 non-null   object
dtypes: obj

<font color="#e47911">**Observations**</font>📋 <br>
Every column has the 'object' data type.<br>
The object data type is not appropriate for the columns **discounted_price, actual_price, discount_percentage, rating, and rating_count.** These columns should have the float64 data type.




In [None]:
amazon_df.columns

Index(['product_id', 'product_name', 'category', 'discounted_price',
       'actual_price', 'discount_percentage', 'rating', 'rating_count',
       'about_product', 'user_id', 'user_name', 'review_id', 'review_title',
       'review_content', 'img_link', 'product_link'],
      dtype='object')

# Data cleaning

## Converting the discounted_price and actual_price columns

In [None]:
# astype(str) keeps this cell safe to re-run once the columns are already numeric
amazon_df['discounted_price'] = amazon_df['discounted_price'].astype(str).str.replace("₹",'').str.replace(",",'').astype('float64')
amazon_df['actual_price'] = amazon_df['actual_price'].astype(str).str.replace("₹",'').str.replace(",",'').astype('float64')

In [None]:
amazon_df.head()

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers&Accessories|Accessories&Peripherals|...,399.0,1099.0,64%,4.2,24269,High Compatibility : Compatible With iPhone 12...,"AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBB...","Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jasp...","R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1K...","Satisfied,Charging is really fast,Value for mo...",Looks durable Charging is fine tooNo complains...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Wayona-Braided-WN3LG1-Sy...
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,Computers&Accessories|Accessories&Peripherals|...,199.0,349.0,43%,4.0,43994,"Compatible with all Type C enabled devices, be...","AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBX...","ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Plac...","RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RY...","A Good Braided Cable for Your Type C Device,Go...",I ordered this cable to connect my phone to An...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Ambrane-Unbreakable-Char...
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,Computers&Accessories|Accessories&Peripherals|...,199.0,1899.0,90%,3.9,7928,【 Fast Charger& Data Sync】-With built-in safet...,"AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQ...","Kunal,Himanshu,viswanath,sai niharka,saqib mal...","R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R2...","Good speed for earlier versions,Good Product,W...","Not quite durable and sturdy,https://m.media-a...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Sounce-iPhone-Charging-C...
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,Computers&Accessories|Accessories&Peripherals|...,329.0,699.0,53%,4.2,94363,The boAt Deuce USB 300 2 in 1 cable is compati...,"AEWAZDZZJLQUYVOVGBEUKSLXHQ5A,AG5HTSFRRE6NL3M5S...","Omkar dhale,JD,HEMALATHA,Ajwadh a.,amar singh ...","R3EEUZKKK9J36I,R3HJVYCLYOY554,REDECAZ7AMPQC,R1...","Good product,Good one,Nice,Really nice product...","Good product,long wire,Charges good,Nice,I bou...",https://m.media-amazon.com/images/I/41V5FtEWPk...,https://www.amazon.in/Deuce-300-Resistant-Tang...
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,Computers&Accessories|Accessories&Peripherals|...,154.0,399.0,61%,4.2,16905,[CHARGE & SYNC FUNCTION]- This cable comes wit...,"AE3Q6KSUK5P75D5HFYHCRAOLODSA,AFUGIFH5ZAFXRDSZH...","rahuls6099,Swasat Borah,Ajay Wadke,Pranali,RVK...","R1BP4L2HH9TFUP,R16PVJEXKV6QZS,R2UPDB81N66T4P,R...","As good as original,Decent,Good one for second...","Bought this instead of original apple, does th...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Portronics-Konnect-POR-1...


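<font color="#e47911">**Observations**</font>📋 <br>
Both the discounted and the actual prices are in Indian rupees. The next cell converts them to US dollars by multiplying by a fixed factor of 0.932. Note that this factor is much larger than real INR-to-USD exchange rates, which are on the order of 0.012, so the converted amounts should not be read as true dollar prices.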

In [None]:
exchange_rate = 0.932
# Vectorized conversion: multiply the whole column, then round to 2 decimals
amazon_df['discounted_price'] = (amazon_df['discounted_price'] * exchange_rate).round(2)
amazon_df['actual_price'] = (amazon_df['actual_price'] * exchange_rate).round(2)

## Creating a discount_amount column

In [None]:
amazon_df['discount_amount'] = amazon_df['actual_price'] - amazon_df['discounted_price']

## Converting the discount_percentage column

In [None]:
amazon_df['discount_percentage'] = amazon_df['discount_percentage'].str.replace('%','').astype('float64')
amazon_df['discount_percentage'] = amazon_df['discount_percentage'] / 100
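
Once the percentage is stored as a fraction, it can be cross-checked against the two price columns using discount = 1 - discounted_price / actual_price. The following is a minimal sketch on a toy frame; the prices are the rupee values from the first two preview rows, while the `implied` and `consistent` names and the one-point tolerance are assumptions made for illustration.

```python
import pandas as pd

# Toy rows: rupee prices and listed discounts from the first two preview rows.
toy = pd.DataFrame({
    'actual_price': [1099.0, 349.0],
    'discounted_price': [399.0, 199.0],
    'discount_percentage': [0.64, 0.43],
})

# Discount implied by the two price columns.
implied = 1 - toy['discounted_price'] / toy['actual_price']

# Listed percentages are whole numbers, so allow one percentage point of slack
# (an assumed tolerance, not something the dataset specifies).
toy['consistent'] = (implied - toy['discount_percentage']).abs() <= 0.01
print(toy)
```

Rows flagged as inconsistent would be candidates for a closer look before any discount analysis.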

## Converting the rating column

In [None]:
amazon_df['rating'].value_counts()

rating
4.1    244
4.3    230
4.2    228
4.0    129
3.9    123
4.4    123
3.8     86
4.5     75
4       52
3.7     42
3.6     35
3.5     26
4.6     17
3.3     16
3.4     10
4.7      6
3.1      4
5.0      3
3.0      3
4.8      3
3.2      2
2.8      2
2.3      1
|        1
2        1
3        1
2.6      1
2.9      1
Name: count, dtype: int64

<font color="#e47911">**Observations**</font>📋 <br>

Near the end of the list there is a | value. This value prevents the column from being converted to the float64 data type. Locate the row with | in the rating column, look up the rating on the product's web page, and replace | with the actual rating.
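
Scanning `value_counts()` by eye works for a short list, but non-numeric entries can also be found programmatically. A minimal sketch, assuming a toy ratings column with hypothetical values: `pd.to_numeric` with `errors='coerce'` turns unparseable strings into NaN, which can then be used as a mask.

```python
import pandas as pd

# Toy ratings column (hypothetical values) with one non-numeric entry.
ratings = pd.Series(['4.1', '3.9', '|', '4'], name='rating')

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
parsed = pd.to_numeric(ratings, errors='coerce')

# Entries where parsing failed are the ones that need manual attention.
bad_rows = ratings[parsed.isna()]
print(bad_rows)
```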

In [None]:
amazon_df.loc[amazon_df['rating'] == '|']

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link,discount_amount
1279,B08L12N5H1,Eureka Forbes car Vac 100 Watts Powerful Sucti...,"Home&Kitchen|Kitchen&HomeAppliances|Vacuum,Cle...",1956.27,2329.07,0.16,|,992,No Installation is provided for this product|1...,"AGTDSNT2FKVYEPDPXAA673AIS44A,AER2XFSWNN4LAUCJ5...","Divya,Dr Nefario,Deekshith,Preeti,Prasanth R,P...","R2KKTKM4M9RDVJ,R1O692MZOBTE79,R2WRSEWL56SOS4,R...","Decent product,doesn't pick up sand,Ok ok,Must...","Does the job well,doesn't work on sand. though...",https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Eureka-Forbes-Vacuum-Cle...,372.8


In [None]:
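# Replace the stray '|' with the rating looked up on the product page; regex=False treats '|' literally
amazon_df['rating'] = amazon_df['rating'].str.replace('|', '3.9', regex=False).astype('float64')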


## Converting the rating_count column

In [None]:
amazon_df['rating_count'] = amazon_df['rating_count'].str.replace(',', '').astype('float64')

## Removing null values

In [None]:
amazon_df.isna().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
discount_amount        0
dtype: int64

<font color="#e47911">**Observations**</font>📋 <br>

The rating_count column contains 2 null values.

In [None]:
amazon_df[amazon_df.isna().any(axis = 1)]

Unnamed: 0,product_id,product_name,category,discounted_price,actual_price,discount_percentage,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link,discount_amount
282,B0B94JPY2N,Amazon Brand - Solimo 65W Fast Charging Braide...,Computers&Accessories|Accessories&Peripherals|...,185.47,931.07,0.8,3.0,,USB C to C Cable: This cable has type C connec...,AE7CFHY23VAJT2FI4NZKKP6GS2UQ,Pranav,RUB7U91HVZ30,The cable works but is not 65W as advertised,I have a pd supported car charger and I bought...,https://m.media-amazon.com/images/W/WEBP_40237...,https://www.amazon.in/Amazon-Brand-Charging-Su...,745.6
324,B0BQRJ3C47,"REDTECH USB-C to Lightning Cable 3.3FT, [Apple...",Computers&Accessories|Accessories&Peripherals|...,232.07,931.07,0.75,5.0,,💎[The Fastest Charge] - This iPhone USB C cabl...,AGJC5O5H5BBXWUV7WRIEIOOR3TVQ,Abdul Gafur,RQXD5SAMMPC6L,Awesome Product,Quick delivery.Awesome ProductPacking was good...,https://m.media-amazon.com/images/I/31-q0xhaTA...,https://www.amazon.in/REDTECH-Lightning-Certif...,699.0


## Replacing the null values with the mean rating_count

In [None]:
amazon_df['rating_count'] = amazon_df['rating_count'].fillna(round(amazon_df["rating_count"].mean()))

In [None]:
amazon_df.isnull().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           0
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
discount_amount        0
dtype: int64

## Removing duplicate values

In [None]:
amazon_df = amazon_df.drop_duplicates()
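
Called without arguments, `drop_duplicates()` only removes rows that are identical in every column. The earlier `describe()` output reports 1351 unique product_id values across 1465 rows, so repeated product IDs can survive this step. A minimal sketch of the difference, using a toy frame with hypothetical IDs:

```python
import pandas as pd

# Toy frame (hypothetical IDs): the same product appears with two different reviews.
toy = pd.DataFrame({
    'product_id': ['B001', 'B001', 'B002'],
    'review_id': ['R1', 'R2', 'R3'],
})

# Rows identical in every column: none here.
full_row_dupes = toy.duplicated().sum()

# Rows that repeat an earlier product_id: one here.
id_dupes = toy.duplicated(subset='product_id').sum()

print(full_row_dupes, id_dupes)
```

Whether repeated product IDs should be dropped depends on the analysis; the sketch only shows how to count them.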


## Removing whitespace from product_id

Whitespace characters are the spaces that separate characters. Product IDs should not contain whitespace, but there may be stray whitespace at the beginning or end of an ID.
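
Before stripping, it can be useful to count how many IDs actually carry leading or trailing whitespace. A minimal sketch, assuming a toy Series in which the padding was added by hand for illustration:

```python
import pandas as pd

# Toy IDs: the second and third have whitespace added for illustration.
ids = pd.Series(['B07JW9H4J1', ' B098NS6PVG', 'B096MSW6CT '], name='product_id')

# True wherever stripping would change the value.
needs_strip = ids != ids.str.strip()
print(needs_strip.sum())

# .str.strip() returns a new Series, so the result must be assigned to take effect.
cleaned = ids.str.strip()
```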

In [None]:
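amazon_df['product_id'] = amazon_df['product_id'].str.strip()  # assign the result; .str.strip() does not modify in place
amazon_df['product_id']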

0       B07JW9H4J1
1       B098NS6PVG
2       B096MSW6CT
3       B08HDJ86NZ
4       B08CF3B7N1
           ...    
1460    B08L7J3T31
1461    B01M6453MB
1462    B009P2LIL4
1463    B00J5DYCCA
1464    B01486F4G6
Name: product_id, Length: 1465, dtype: object

## Splitting multiple categories into multiple columns

The values in the category column contain several categories separated by | (for example, Computers&Accessories|Accessories&Peripherals). To make the category data more useful for analysis, each category level should have its own column, and the category names should be formatted to be more readable.

In [None]:
splitcategory = amazon_df['category'].str.split('|', expand = True)
splitcategory

Unnamed: 0,0,1,2,3,4,5,6
0,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
1,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
2,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
3,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
4,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
...,...,...,...,...,...,...,...
1460,Home&Kitchen,Kitchen&HomeAppliances,WaterPurifiers&Accessories,WaterPurifierAccessories,,,
1461,Home&Kitchen,Kitchen&HomeAppliances,SmallKitchenAppliances,Rice&PastaCookers,,,
1462,Home&Kitchen,"Heating,Cooling&AirQuality",RoomHeaters,HeatConvectors,,,
1463,Home&Kitchen,"Heating,Cooling&AirQuality",Fans,ExhaustFans,,,


In [None]:
splitcategory.isna().sum()

0       0
1       0
2       8
3     165
4     943
5    1380
6    1452
dtype: int64

<font color="#e47911">**Observations**</font>📋 <br>

Only columns 0 and 1 have no null values. To guarantee that no row has a null category without dropping any rows, only columns 0 and 1 should be used.
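
Keeping only the first two levels can be sketched on a toy Series. The two category paths below are taken from the split output shown above; the `top_two` name is an assumption for illustration.

```python
import pandas as pd

# Two category paths as they appear in the split output above.
category = pd.Series([
    'Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables',
    'Home&Kitchen|Heating,Cooling&AirQuality|Fans|ExhaustFans',
])

# Split on the literal '|' and expand each level into its own column.
levels = category.str.split('|', expand=True)

# Keep only the two levels that are populated for every row.
top_two = levels[[0, 1]].rename(columns={0: 'category', 1: 'subcategory'})
print(top_two)
```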

In [None]:
splitcategory = splitcategory.rename(columns = {0:'category',
                                                1:'subcategory'})
splitcategory

Unnamed: 0,category,subcategory,2,3,4,5,6
0,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
1,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
2,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
3,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
4,Computers&Accessories,Accessories&Peripherals,Cables&Accessories,Cables,USBCables,,
...,...,...,...,...,...,...,...
1460,Home&Kitchen,Kitchen&HomeAppliances,WaterPurifiers&Accessories,WaterPurifierAccessories,,,
1461,Home&Kitchen,Kitchen&HomeAppliances,SmallKitchenAppliances,Rice&PastaCookers,,,
1462,Home&Kitchen,"Heating,Cooling&AirQuality",RoomHeaters,HeatConvectors,,,
1463,Home&Kitchen,"Heating,Cooling&AirQuality",Fans,ExhaustFans,,,


In [None]:
splitcategory['category'].unique()

array(['Computers&Accessories', 'Electronics', 'MusicalInstruments',
       'OfficeProducts', 'Home&Kitchen', 'HomeImprovement', 'Toys&Games',
       'Car&Motorbike', 'Health&PersonalCare'], dtype=object)

## Formatting the category values so they are easier to read

In [None]:
splitcategory['category'] = splitcategory['category'].str.replace('&',
                                                                  ' & ')

splitcategory['category'] = splitcategory['category'].str.replace('MusicalInstruments',
                                                                  'Musical Instruments')

splitcategory['category'] = splitcategory['category'].str.replace('OfficeProducts',
                                                                  'Office Products')

splitcategory['category'] = splitcategory['category'].str.replace('HomeImprovement',
                                                                  'Home Improvement')

In [None]:
splitcategory['subcategory'].unique()

array(['Accessories&Peripherals', 'NetworkingDevices',
       'HomeTheater,TV&Video', 'HomeAudio', 'WearableTechnology',
       'Mobiles&Accessories', 'Accessories',
       'Headphones,Earbuds&Accessories', 'ExternalDevices&DataStorage',
       'Microphones', 'GeneralPurposeBatteries&BatteryChargers',
       'OfficePaperProducts', 'CraftMaterials', 'Cameras&Photography',
       'OfficeElectronics', 'Printers,Inks&Accessories', 'Monitors',
       'Components', 'Electrical', 'Arts&Crafts', 'PowerAccessories',
       'Tablets', 'Laptops', 'Kitchen&HomeAppliances',
       'Heating,Cooling&AirQuality', 'Kitchen&Dining',
       'HomeStorage&Organization', 'CarAccessories',
       'HomeMedicalSupplies&Equipment'], dtype=object)

In [None]:
splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('&',
                                                                        ' & ')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace(',',
                                                                        ', ')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('NetworkingDevices',
                                                                        'Networking Devices')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('HomeTheater',
                                                                        'Home Theater')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('HomeAudio',
                                                                        'Home Audio')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('WearableTechnology',
                                                                        'Wearable Technology')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('ExternalDevices',
                                                                        'External Devices')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('DataStorage',
                                                                        'Data Storage')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('GeneralPurposeBatteries',
                                                                        'General Purpose Batteries')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('BatteryChargers',
                                                                        'Battery Chargers')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('OfficePaperProducts',
                                                                        'Office Paper Products')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('CraftMaterials',
                                                                        'Craft Materials')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('OfficeElectronics',
                                                                        'Office Electronics')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('PowerAccessories',
                                                                        'Power Accessories')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('HomeAppliances',
                                                                        'Home Appliances')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('AirQuality',
                                                                        'Air Quality')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('HomeStorage',
                                                                        'Home Storage')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('CarAccessories',
                                                                        'Car Accessories')

splitcategory['subcategory'] = splitcategory['subcategory'].str.replace('HomeMedicalSupplies',
                                                                        'Home Medical Supplies')
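
The one-replacement-per-label approach above is explicit but grows with every new label. As an alternative, the spacing could be handled by a single regular expression that inserts a space wherever a lowercase letter is followed by an uppercase one. This is a minimal sketch rather than the notebook's method; `humanize` is a hypothetical helper name.

```python
import re
import pandas as pd

def humanize(label: str) -> str:
    """Insert spaces into labels such as 'HomeTheater,TV&Video'."""
    label = label.replace('&', ' & ').replace(',', ', ')
    # Add a space wherever a lowercase letter is directly followed by an uppercase one.
    return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', label)

subcategory = pd.Series([
    'HomeTheater,TV&Video',
    'GeneralPurposeBatteries&BatteryChargers',
    'Laptops',
])
print(subcategory.map(humanize).tolist())
```

Runs of capitals such as "TV" are left intact because the pattern only splits after a lowercase letter.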

In [None]:
# Use .drop() to remove the original category column from the Amazon DataFrame.

amazon_df = amazon_df.drop(columns = 'category')

# Add the new category and subcategory columns to the Amazon DataFrame.
amazon_df['category'] = splitcategory['category']
amazon_df['subcategory'] = splitcategory['subcategory']

In [None]:
amazon_df.columns

Index(['product_id', 'product_name', 'discounted_price', 'actual_price',
       'discount_percentage', 'rating', 'rating_count', 'about_product',
       'user_id', 'user_name', 'review_id', 'review_title', 'review_content',
       'img_link', 'product_link', 'discount_amount', 'category',
       'subcategory'],
      dtype='object')

## Creating a column that classifies products by rating

The ranking column will hold a label based on the product's rating. Each label represents a range of ratings.

In [None]:
# Use a for loop with if/elif branches to map each rating onto a ranking label.
ranking = []

for score in amazon_df['rating']:
    if score <= .9 : ranking.append('Very Poor')
    elif score <= 1.9 : ranking.append('Poor')
    elif score <= 2.9 : ranking.append('Average')
    elif score <= 3.9 : ranking.append('Good')
    elif score <= 4.9 : ranking.append('Very Good')
    else : ranking.append('Excellent')  # anything above 4.9, so every row gets exactly one label

amazon_df['ranking'] = ranking

amazon_df['ranking'] = amazon_df['ranking'].astype('category')
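
The same rating ranges can also be expressed without a loop by using `pd.cut`, which returns a categorical column directly. A minimal sketch on toy ratings, assuming that every rating above 4.9 should count as 'Excellent':

```python
import pandas as pd

# Toy ratings (hypothetical values) covering several ranges.
ratings = pd.Series([2.0, 3.9, 4.0, 4.9, 5.0], name='rating')

# Right-inclusive bins: (-inf, 0.9], (0.9, 1.9], ..., (3.9, 4.9], (4.9, inf).
bins = [-float('inf'), 0.9, 1.9, 2.9, 3.9, 4.9, float('inf')]
labels = ['Very Poor', 'Poor', 'Average', 'Good', 'Very Good', 'Excellent']

ranking = pd.cut(ratings, bins=bins, labels=labels)
print(ranking.tolist())
```

Because the bins cover the whole number line, every rating receives a label and the result always has the same length as the input.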

In [None]:
amazon_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype   
---  ------               --------------  -----   
 0   product_id           1465 non-null   object  
 1   product_name         1465 non-null   object  
 2   discounted_price     1465 non-null   float64 
 3   actual_price         1465 non-null   float64 
 4   discount_percentage  1465 non-null   float64 
 5   rating               1465 non-null   float64 
 6   rating_count         1465 non-null   float64 
 7   about_product        1465 non-null   object  
 8   user_id              1465 non-null   object  
 9   user_name            1465 non-null   object  
 10  review_id            1465 non-null   object  
 11  review_title         1465 non-null   object  
 12  review_content       1465 non-null   object  
 13  img_link             1465 non-null   object  
 14  product_link         1465 non-null   object  
 15  discount_amount      

## Creating a DataFrame with the columns used in the analysis

In [None]:
amazon_clean = amazon_df[['product_id',
                       'product_name',
                       'actual_price',
                       'discounted_price',
                       'discount_amount',
                       'discount_percentage',
                       'category',
                       'subcategory',
                       'rating',
                       'rating_count',
                       'ranking']]
amazon_clean

Unnamed: 0,product_id,product_name,actual_price,discounted_price,discount_amount,discount_percentage,category,subcategory,rating,rating_count,ranking
0,B07JW9H4J1,Wayona Nylon Braided USB to Lightning Fast Cha...,1024.27,371.87,652.40,0.64,Computers & Accessories,Accessories & Peripherals,4.2,24269.0,Very Good
1,B098NS6PVG,Ambrane Unbreakable 60W / 3A Fast Charging 1.5...,325.27,185.47,139.80,0.43,Computers & Accessories,Accessories & Peripherals,4.0,43994.0,Very Good
2,B096MSW6CT,Sounce Fast Phone Charging Cable & Data Sync U...,1769.87,185.47,1584.40,0.90,Computers & Accessories,Accessories & Peripherals,3.9,7928.0,Good
3,B08HDJ86NZ,boAt Deuce USB 300 2 in 1 Type-C & Micro USB S...,651.47,306.63,344.84,0.53,Computers & Accessories,Accessories & Peripherals,4.2,94363.0,Very Good
4,B08CF3B7N1,Portronics Konnect L 1.2M Fast Charging 3A 8 P...,371.87,143.53,228.34,0.61,Computers & Accessories,Accessories & Peripherals,4.2,16905.0,Very Good
...,...,...,...,...,...,...,...,...,...,...,...
1460,B08L7J3T31,Noir Aqua - 5pcs PP Spun Filter + 1 Spanner | ...,856.51,353.23,503.28,0.59,Home & Kitchen,Kitchen & Home Appliances,4.0,1090.0,Very Good
1461,B01M6453MB,Prestige Delight PRWO Electric Rice Cooker (1 ...,2837.94,2124.96,712.98,0.25,Home & Kitchen,Kitchen & Home Appliances,4.1,4118.0,Very Good
1462,B009P2LIL4,Bajaj Majesty RX10 2000 Watts Heat Convector R...,2870.56,2068.11,802.45,0.28,Home & Kitchen,"Heating, Cooling & Air Quality",3.6,468.0,Good
1463,B00J5DYCCA,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,1761.48,1303.87,457.61,0.26,Home & Kitchen,"Heating, Cooling & Air Quality",4.0,8031.0,Very Good


The clean DataFrame has 1,465 rows, the same number as the original, uncleaned DataFrame. The data in amazon_clean has been formatted so that each column has an appropriate data type. Columns that held more than one value have been split so that each value has its own column. New columns have been added, and columns that will not be used in the analysis have been dropped.

## Creating a reviewers DataFrame

The values in the user_name and user_id columns contain several user names and IDs separated by commas. Each user name and ID should be split into its own row rather than its own column.

In [None]:
# Use .str.split() to split the strings in the user_id and user_name columns, using ',' as the delimiter.

split_user_id = amazon_df['user_id'].str.split(',', expand = False)
split_user_name = amazon_df['user_name'].str.split(',', expand = False)

In [None]:
# Use .explode() to turn each list element into its own row.

id_rows = split_user_id.explode()
name_rows = split_user_name.explode()

In [None]:
# Use DataFrame() to build a DataFrame from each exploded Series.

df_id_rows = pd.DataFrame(id_rows)
df_name_rows = pd.DataFrame(name_rows)

In [None]:
# Add the product_name, category, and subcategory columns from amazon_clean to df_name_rows.

df_name_rows['product_name'] = amazon_clean['product_name']
df_name_rows['category'] = amazon_clean['category']
df_name_rows['subcategory'] = amazon_clean['subcategory']

In [None]:
# Use .reset_index() so that each row gets its own index number.

df_id_rows = df_id_rows.reset_index(drop = True)
df_name_rows = df_name_rows.reset_index(drop = True)

In [None]:
# Use .merge() to join the two DataFrames on their index.

reviewers = pd.merge(df_id_rows, df_name_rows, left_index = True, right_index = True)
reviewers

Unnamed: 0,user_id,user_name,product_name,category,subcategory
0,AG3D6O4STAQKAY2UVGEUV46KN35Q,Manav,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers & Accessories,Accessories & Peripherals
1,AHMY5CWJMMK5BJRBBSNLYT3ONILA,Adarsh gupta,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers & Accessories,Accessories & Peripherals
2,AHCTC6ULH4XB6YHDY6PCH2R772LQ,Sundeep,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers & Accessories,Accessories & Peripherals
3,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,S.Sayeed Ahmed,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers & Accessories,Accessories & Peripherals
4,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,jaspreet singh,Wayona Nylon Braided USB to Lightning Fast Cha...,Computers & Accessories,Accessories & Peripherals
...,...,...,...,...,...
11498,AHXCDNSXAESERITAFELQABFVNLCA,BATTU SURESHKUMAR,Bajaj Majesty RX10 2000 Watts Heat Convector R...,Home & Kitchen,"Heating, Cooling & Air Quality"
11499,AGRZD6CHLCUNOLMMIMIHUCG7PIFA,Shubham Dubey,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,Home & Kitchen,"Heating, Cooling & Air Quality"
11500,AFQZVGSOSOJHKFQQMCEI4725QEKQ,E.GURUBARAN,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,Home & Kitchen,"Heating, Cooling & Air Quality"
11501,AEALVGXXIP46OZVXKRUXSDWZJMEA,Mayank S.,Havells Ventil Air DSP 230mm Exhaust Fan (Pist...,Home & Kitchen,"Heating, Cooling & Air Quality"


In [None]:
reviewers.isna().sum()

user_id         0
user_name       0
product_name    0
category        0
subcategory     0
dtype: int64
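
Merging the exploded IDs and names by position relies on every row having the same number of IDs as names, which can fail if a user name itself contains a comma. A minimal sketch of a more defensive variant, using a toy frame with hypothetical values and the multi-column form of `DataFrame.explode` (available in pandas 1.3 and later):

```python
import pandas as pd

# Toy frame (hypothetical values); the last row has a comma inside a user name.
toy = pd.DataFrame({
    'product_name': ['Cable A', 'Fan B', 'Lamp C'],
    'user_id': ['U1,U2', 'U3', 'U4,U5'],
    'user_name': ['Manav,Sundeep', 'Mayank S.', 'A, Jr.,B'],
})

toy['user_id'] = toy['user_id'].str.split(',')
toy['user_name'] = toy['user_name'].str.split(',')

# Rows where the two lists differ in length cannot be paired safely.
mismatched = toy['user_id'].str.len() != toy['user_name'].str.len()

# Explode both list columns together so each ID stays next to its name.
reviewers = (
    toy[~mismatched]
    .explode(['user_id', 'user_name'])
    .reset_index(drop=True)
)
print(reviewers)
```

Rows flagged in `mismatched` are set aside here; how to repair them is left open, since it depends on how the names were originally joined.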

# Analyzing the Data

With the data cleaned, the next step is to analyze it. The exploratory data analysis will focus on questions such as which categories have the highest average revenue, which categories have the most customer engagement (more reviews, higher ratings, more people rating the product), and which products have the largest discounts.
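
Questions of this kind usually reduce to a `groupby` on category. The following is a minimal sketch on a toy frame with hypothetical values. The dataset has no units-sold column, so the mean discounted price is used here as a stand-in for revenue; that substitution is an assumption, as are the aggregate column names.

```python
import pandas as pd

# Toy frame (hypothetical values) with the columns kept in amazon_clean.
toy = pd.DataFrame({
    'category': ['Electronics', 'Electronics', 'Home & Kitchen'],
    'discounted_price': [100.0, 300.0, 50.0],
    'rating': [4.0, 4.4, 3.8],
    'rating_count': [1000.0, 3000.0, 200.0],
    'discount_percentage': [0.5, 0.1, 0.3],
})

# One row per category, with one aggregate per question.
summary = (
    toy.groupby('category')
    .agg(
        mean_price=('discounted_price', 'mean'),
        mean_rating=('rating', 'mean'),
        total_rating_count=('rating_count', 'sum'),
        max_discount=('discount_percentage', 'max'),
    )
    .sort_values('mean_price', ascending=False)
)
print(summary)
```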