# Ecommerce Product Performance

**Description:**

This synthetic but realistic dataset contains 2000 records representing product performance metrics in an e-commerce setting. Designed for intermediate-level data science and machine learning tasks, it includes natural randomness, missing values (~5% per column), and varied distributions that simulate real-world conditions.
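
The stated missing-value rate of roughly 5% per column can be checked before any analysis. The following is a minimal sketch, assuming the data is loaded into a pandas DataFrame; the helper name `missing_share` and the toy frame are hypothetical and only stand in for the real CSV.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values in each column."""
    return df.isna().mean()


# Toy frame standing in for the real dataset (values are illustrative only)
toy = pd.DataFrame({
    "Product_Price": [10.0, np.nan, 30.0, 40.0],
    "Discount_Rate": [0.1, 0.2, 0.3, 0.4],
})

shares = missing_share(toy)
print(shares)
```

On the real data, each value would be expected to sit near 0.05 if the description holds.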

**Columns:**

Product_Price: The listed price of the product in USD (range: 5 to 1000).

Discount_Rate: Discount rate applied to the product (0.0 to 0.8).

Product_Rating: Customer rating on a scale from 1 to 5.

Number_of_Reviews: Total number of user reviews (0 to 5000, highly skewed).

Stock_Availability: Product availability in stock (1 = available, 0 = out of stock).

Days_to_Deliver: Number of days it takes to deliver the product (1 to 30).

Return_Rate: Proportion of items returned after purchase (0.0 to 0.9).

Category_ID: ID of the product category (integer from 1 to 10).
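
The documented ranges above lend themselves to a quick sanity check. This sketch counts, per column, how many non-missing values fall outside the stated bounds. The helper name `out_of_range_counts` and the toy rows are assumptions made for illustration; only the bounds come from the column list.

```python
import numpy as np
import pandas as pd

# Bounds copied from the column descriptions above
DOCUMENTED_RANGES = {
    "Product_Price": (5, 1000),
    "Discount_Rate": (0.0, 0.8),
    "Product_Rating": (1, 5),
    "Number_of_Reviews": (0, 5000),
    "Stock_Availability": (0, 1),
    "Days_to_Deliver": (1, 30),
    "Return_Rate": (0.0, 0.9),
    "Category_ID": (1, 10),
}


def out_of_range_counts(df: pd.DataFrame) -> pd.Series:
    """Count non-missing values outside the documented range, per column."""
    counts = {}
    for col, (low, high) in DOCUMENTED_RANGES.items():
        if col not in df.columns:
            continue  # skip columns absent from this frame
        values = df[col].dropna()
        counts[col] = int(((values < low) | (values > high)).sum())
    return pd.Series(counts, dtype="int64")


# Toy rows (illustrative only): one price and one discount are out of range
toy = pd.DataFrame({
    "Product_Price": [5.0, 1200.0, np.nan],
    "Discount_Rate": [0.0, 0.9, 0.5],
})
violations = out_of_range_counts(toy)
print(violations)
```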



# Analysis Questions

What is the average product price per category?

Which category has the highest average return rate?

Which products have the highest discount applied?

What is the relationship between average price and average rating per category?

How many products are out of stock per category?

What is the average delivery time in days per category?

Is there a correlation between discount and return rate?

Which products are in the top 5% by number of reviews?

What is the average rating of products with discounts greater than 50%?

Which categories have products with a return rate above 50% and stock available?

In [1]:
import pandas as pd
import sqlite3

# Load the CSV
df = pd.read_csv('ecommerce_product_performance.csv')

# Create an in-memory SQLite connection and load the data into a table
conn = sqlite3.connect(':memory:')
df.to_sql('ecommerce_data', conn, index=False, if_exists='replace')

2000

In [2]:
# Average price per category
query = """
SELECT Category_ID,
       ROUND(AVG(Product_Price), 2) AS Avg_Price
FROM ecommerce_data
GROUP BY Category_ID
ORDER BY Avg_Price DESC;
"""

result = pd.read_sql_query(query, conn)
result


Unnamed: 0,Category_ID,Avg_Price
0,10.0,171.09
1,3.0,163.46
2,9.0,162.4
3,8.0,161.62
4,2.0,157.46
5,5.0,156.3
6,4.0,156.27
7,6.0,151.15
8,1.0,149.51
9,,148.06


In [3]:
# Category with the highest average return rate
query = """
SELECT Category_ID,
       ROUND(AVG(Return_Rate), 2) AS Avg_Return_Rate
FROM ecommerce_data
GROUP BY Category_ID
ORDER BY Avg_Return_Rate DESC
LIMIT 1;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Category_ID,Avg_Return_Rate
0,,0.36
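
The result above has an empty Category_ID, which suggests that rows with a missing category form their own NULL group and that this group happens to rank first. If the question is meant to cover only known categories, one option is to filter those rows out. The sketch below uses a small toy table and a separate connection (`conn_toy`, a hypothetical name) so it does not touch the notebook's data.

```python
import sqlite3

import pandas as pd

# Toy data (illustrative only): two rows have no category
toy = pd.DataFrame({
    "Category_ID": [1, 1, 2, None, None],
    "Return_Rate": [0.10, 0.30, 0.40, 0.90, 0.80],
})
conn_toy = sqlite3.connect(":memory:")
toy.to_sql("ecommerce_data", conn_toy, index=False)

# Same aggregation as above, but rows with a NULL category are excluded
query = """
SELECT Category_ID,
       ROUND(AVG(Return_Rate), 2) AS Avg_Return_Rate
FROM ecommerce_data
WHERE Category_ID IS NOT NULL
GROUP BY Category_ID
ORDER BY Avg_Return_Rate DESC
LIMIT 1;
"""
top_category = pd.read_sql_query(query, conn_toy)
print(top_category)
```

Without the `WHERE` clause, the NULL group would win in this toy example as well, since its average return rate is the highest.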


In [4]:
# Products with the highest discount applied
query = """
SELECT *
FROM ecommerce_data
WHERE Discount_Rate = (SELECT MAX(Discount_Rate) FROM ecommerce_data);
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Product_Price,Discount_Rate,Product_Rating,Number_of_Reviews,Stock_Availability,Days_to_Deliver,Return_Rate,Category_ID
0,214.537595,0.8,1.917017,1561.0,1.0,4.0,0.480871,9.0
1,116.522467,0.8,2.289454,810.0,1.0,18.0,0.235586,8.0


In [5]:
# Relationship between average price and average rating per category
query = """
SELECT Category_ID,
       ROUND(AVG(Product_Price), 2) AS Avg_Price,
       ROUND(AVG(Product_Rating), 2) AS Avg_Rating
FROM ecommerce_data
GROUP BY Category_ID
ORDER BY Avg_Price DESC;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Category_ID,Avg_Price,Avg_Rating
0,10.0,171.09,3.74
1,3.0,163.46,3.74
2,9.0,162.4,3.73
3,8.0,161.62,3.69
4,2.0,157.46,3.69
5,5.0,156.3,3.75
6,4.0,156.27,3.68
7,6.0,151.15,3.75
8,1.0,149.51,3.71
9,,148.06,3.79


In [6]:
# Out-of-stock products per category
query = """
SELECT Category_ID, COUNT(*) AS Out_of_Stock_Products
FROM ecommerce_data
WHERE Stock_Availability = 0
GROUP BY Category_ID
ORDER BY Out_of_Stock_Products DESC;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Category_ID,Out_of_Stock_Products
0,9.0,21
1,6.0,21
2,8.0,20
3,10.0,19
4,4.0,18
5,5.0,17
6,1.0,17
7,7.0,16
8,3.0,16
9,2.0,12
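
Filtering with `WHERE Stock_Availability = 0` drops any category that has no out-of-stock products at all. A possible alternative is conditional aggregation, which keeps every category and also reports its total. The following is a sketch on toy data, with `conn_toy` as a hypothetical separate connection.

```python
import sqlite3

import pandas as pd

# Toy data (illustrative only): category 2 has no out-of-stock products
toy = pd.DataFrame({
    "Category_ID": [1, 1, 2, 2, 3],
    "Stock_Availability": [0, 1, 1, 1, 0],
})
conn_toy = sqlite3.connect(":memory:")
toy.to_sql("ecommerce_data", conn_toy, index=False)

# Count out-of-stock rows inside the aggregate instead of filtering them first.
# Rows with a missing Stock_Availability fall into the ELSE branch (counted as 0).
query = """
SELECT Category_ID,
       SUM(CASE WHEN Stock_Availability = 0 THEN 1 ELSE 0 END) AS Out_of_Stock_Products,
       COUNT(*) AS Total_Products
FROM ecommerce_data
GROUP BY Category_ID
ORDER BY Category_ID;
"""
stock_summary = pd.read_sql_query(query, conn_toy)
print(stock_summary)
```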


In [7]:
# Average delivery time in days per category

query = """
SELECT Category_ID, ROUND(AVG(Days_to_Deliver), 2) AS Avg_Delivery_Time
FROM ecommerce_data
GROUP BY Category_ID
ORDER BY Avg_Delivery_Time ASC;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Category_ID,Avg_Delivery_Time
0,7.0,14.41
1,8.0,14.44
2,,14.48
3,4.0,14.95
4,10.0,15.48
5,9.0,15.67
6,1.0,15.68
7,2.0,15.75
8,6.0,16.09
9,5.0,16.17


In [8]:
# Average return rate per discount bin (discount rounded to one decimal), as a rough check for correlation

query = """
SELECT ROUND(Discount_Rate, 1) AS Discount_Bin,
       ROUND(AVG(Return_Rate), 2) AS Avg_Return
FROM ecommerce_data
GROUP BY Discount_Bin
ORDER BY Discount_Bin;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Discount_Bin,Avg_Return
0,,0.35
1,0.0,0.32
2,0.1,0.32
3,0.2,0.32
4,0.3,0.33
5,0.4,0.33
6,0.5,0.35
7,0.6,0.34
8,0.7,0.33
9,0.8,0.34
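
The binned averages above give only a visual impression of the relationship. A single correlation coefficient can complement them. The sketch below computes the Pearson coefficient with pandas on toy data. The values are illustrative and not taken from the dataset, and `Series.corr` ignores pairs in which either value is missing.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative only); the last two rows each have a missing value
toy = pd.DataFrame({
    "Discount_Rate": [0.0, 0.2, 0.4, np.nan, 0.8],
    "Return_Rate": [0.1, 0.2, 0.3, 0.5, np.nan],
})

# Pearson correlation over the rows where both columns are present
pearson_r = toy["Discount_Rate"].corr(toy["Return_Rate"])
print(round(pearson_r, 4))
```

On the notebook's data, the same call on `df["Discount_Rate"]` and `df["Return_Rate"]` would give the coefficient for the full table.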


In [11]:
# Top 5% of products by number of reviews

reviews_query = """
SELECT Number_of_Reviews
FROM ecommerce_data;
"""

reviews_df = pd.read_sql_query(reviews_query, conn)
# 95th percentile of review counts (quantile() skips missing values)
percentile_95 = reviews_df['Number_of_Reviews'].quantile(0.95)

# Pass the threshold as a bound parameter instead of formatting it into the SQL
query = """
SELECT *
FROM ecommerce_data
WHERE Number_of_Reviews >= ?;
"""

result = pd.read_sql_query(query, conn, params=(float(percentile_95),))
result

Unnamed: 0,Product_Price,Discount_Rate,Product_Rating,Number_of_Reviews,Stock_Availability,Days_to_Deliver,Return_Rate,Category_ID
0,,,5.000000,1300.0,1.0,22.0,0.321321,9.0
1,247.554513,0.176769,,977.0,1.0,26.0,0.088149,9.0
2,131.434102,0.397173,2.185250,1316.0,1.0,22.0,0.225413,1.0
3,250.353290,0.738087,4.406762,870.0,1.0,17.0,0.411946,9.0
4,186.163603,0.243699,5.000000,1298.0,1.0,11.0,0.159424,5.0
...,...,...,...,...,...,...,...,...
90,414.434335,0.214387,1.887149,1030.0,1.0,7.0,0.133841,3.0
91,176.912694,0.076729,5.000000,1182.0,0.0,7.0,0.232259,4.0
92,144.860628,0.251148,3.718544,872.0,1.0,8.0,0.398881,4.0
93,190.705229,0.332031,4.937858,1675.0,0.0,13.0,0.575769,10.0


In [12]:
# Average rating of products with discounts > 50%

query = """
SELECT ROUND(AVG(Product_Rating), 2) AS Avg_Rating_High_Discount
FROM ecommerce_data
WHERE Discount_Rate > 0.5;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Avg_Rating_High_Discount
0,3.75


In [13]:
# Categories with products that have a return rate above 50% and stock available

query = """
SELECT DISTINCT Category_ID
FROM ecommerce_data
WHERE Return_Rate > 0.5 AND Stock_Availability = 1;
"""

result = pd.read_sql_query(query, conn)
result

Unnamed: 0,Category_ID
0,6.0
1,7.0
2,4.0
3,3.0
4,9.0
5,1.0
6,2.0
7,5.0
8,
9,8.0
