# Brazilian E-Commerce BI Mapping Matrix

---

## 1. Product & Seller Performance

### **Internal Data Analysis Questions:**

- What is the average review score by product category, and which 10 categories score highest?
```python
category_reviews = sales.groupby('product_category_name')['review_score'].mean().sort_values(ascending=False)
category_reviews.head(10)
```

- What is the review score distribution by delivery delay?
```python
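import seaborn as sns
# Assumes a numeric `delay_days` column exists. Review score goes on the categorical axis:
# one box per score stays readable, whereas one box per distinct delay value does not.
sns.boxplot(x=sales['review_score'], y=sales['delay_days'])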
```
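The `delay_days` column is assumed above rather than derived. A minimal sketch of one way to build it, assuming the orders table carries Olist-style timestamp columns named `order_delivered_customer_date` and `order_estimated_delivery_date` (an assumption about the schema; the toy rows are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the orders table; column names are assumed, values are made up.
orders = pd.DataFrame({
    'order_id': ['a', 'b', 'c'],
    'order_delivered_customer_date': ['2018-01-10', '2018-01-05', None],
    'order_estimated_delivery_date': ['2018-01-07', '2018-01-08', '2018-01-09'],
})

def add_delay_days(df):
    """Return a copy with delay_days = delivered date minus estimated date, in days."""
    out = df.copy()
    delivered = pd.to_datetime(out['order_delivered_customer_date'])
    estimated = pd.to_datetime(out['order_estimated_delivery_date'])
    # Positive = late, negative = early, NaN = no delivery date recorded.
    out['delay_days'] = (delivered - estimated).dt.days
    return out

orders = add_delay_days(orders)
```

Undelivered orders end up with a missing delay, so they drop out of plots and means instead of being counted as on time.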

- Which sellers have the highest proportion of low reviews (1–2 stars)?
```python
seller_bad_reviews = sales.groupby('seller_id')['review_score'].apply(lambda x: (x <= 2).mean())
seller_bad_reviews.sort_values(ascending=False).head(10)
```

### **Business Risk & Opportunity Metrics (Highlighted):**

- **Which sellers generate the most complaints (1-star reviews)?**
```python
seller_1star_reviews = sales.groupby('seller_id')['review_score'].apply(lambda x: (x == 1).sum())
seller_1star_reviews.sort_values(ascending=False).head(10)
```

- **What % of total revenue is managed by underperforming sellers?**
```python
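# Underperforming here means more than 20% of a seller's reviews are 1–2 stars; reuses seller_bad_reviews from above.
underperforming_sellers = seller_bad_reviews[seller_bad_reviews > 0.2].index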
revenue_underperformers = sales[sales['seller_id'].isin(underperforming_sellers)]['total_revenue'].sum()
total_revenue = sales['total_revenue'].sum()
percent_revenue_underperformers = revenue_underperformers / total_revenue * 100
```

- **What is the opportunity cost of poor-performing sellers in terms of revenue loss?**
```python
# Not computed separately: revenue_underperformers (above) gives a first-order estimate; a fuller answer needs scenario simulation.
```
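One hedged way to put a first number on this is to count only the revenue tied to low-rated (1-2 star) orders placed with underperforming sellers. The definition, the 20% threshold, and the helper name `revenue_at_risk` are illustrative assumptions, and the toy rows are made up:

```python
import pandas as pd

# Toy rows standing in for the merged `sales` table (values are made up).
sales = pd.DataFrame({
    'seller_id':     ['s1', 's1', 's1', 's2', 's2', 's2', 's2', 's2'],
    'review_score':  [1, 2, 5, 5, 4, 5, 1, 5],
    'total_revenue': [100.0, 50.0, 80.0, 200.0, 120.0, 90.0, 40.0, 60.0],
})

def revenue_at_risk(df, low_share_threshold=0.2):
    """Revenue on 1-2 star orders from sellers whose low-review share exceeds the threshold."""
    is_low = df['review_score'] <= 2
    # Share of low reviews per seller.
    low_share = is_low.groupby(df['seller_id']).mean()
    flagged = low_share[low_share > low_share_threshold].index
    mask = df['seller_id'].isin(flagged) & is_low
    return df.loc[mask, 'total_revenue'].sum()

at_risk = revenue_at_risk(sales)
at_risk_pct = at_risk / sales['total_revenue'].sum() * 100
```

This is a lower-bound style estimate: it ignores any follow-on purchases lost after a bad experience, which would need the simulation mentioned above.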

---

## 2. Returns & Complaints (proxied via low ratings)

### **Internal Data Analysis Questions:**

- What is the return rate (1–2 star reviews) by product category?
```python
low_reviews = sales[sales['review_score'] <= 2]
# Share of scored reviews at 1–2 stars per category; categories with no low reviews get 0 instead of NaN.
return_rate = sales.dropna(subset=['review_score']).groupby('product_category_name')['review_score'].apply(lambda s: (s <= 2).mean())
return_rate.sort_values(ascending=False)
```

- Do high return categories correlate with certain sellers?
```python
low_reviews.groupby(['product_category_name','seller_id']).size().sort_values(ascending=False).head(20)
```

### **Business Risk & Opportunity Metrics (Highlighted):**

- **What are the most frequently returned product categories (proxied by low ratings)?**
```python
return_rate.sort_values(ascending=False).head(10)
```

- **Which products drive the most positive customer experiences?**
```python
category_reviews.head(10)
```

- **Which SKUs have high sales volume but low satisfaction?**
```python
sku_sales = sales.groupby('product_id')['total_revenue'].sum()
sku_reviews = sales.groupby('product_id')['review_score'].mean()
sku_summary = pd.concat([sku_sales, sku_reviews], axis=1)
sku_summary.columns = ['total_revenue', 'avg_review']
sku_summary[sku_summary['avg_review'] < 3].sort_values(by='total_revenue', ascending=False).head(10)
```

---

## 3. Customer Behavior & Engagement

### **Internal Data Analysis Questions:**

- What % of customers leave reviews? How many are repeat reviewers?
```python
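# Use the same key on both sides and count only rows with a review; assumes `sales` also carries `customer_unique_id`.
percent_reviewers = sales.loc[sales['review_score'].notna(), 'customer_unique_id'].nunique() / customers['customer_unique_id'].nunique() * 100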
```
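The repeat-reviewer half of the question can be sketched as a count of customers with scored reviews on two or more distinct orders. The column names follow the Olist-style schema and are assumptions here; the helper `count_repeat_reviewers` and the toy rows are hypothetical:

```python
import pandas as pd

# Toy review rows (made up); a missing score means no review was left.
reviews = pd.DataFrame({
    'customer_unique_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'order_id':           ['o1', 'o2', 'o3', 'o4', 'o4', 'o5'],
    'review_score':       [5, 3, None, 4, 4, 1],
})

def count_repeat_reviewers(df):
    """Number of customers who left a scored review on at least two distinct orders."""
    scored = df.dropna(subset=['review_score'])
    reviewed_orders = scored.groupby('customer_unique_id')['order_id'].nunique()
    return int((reviewed_orders >= 2).sum())

repeat_reviewers = count_repeat_reviewers(reviews)
```

Counting distinct orders rather than rows avoids inflating the figure when one order spans several item rows.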

- What is the % of customers who made ≥2 purchases in 12 months?
```python
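# Caveat: if `customer_id` is issued per order (as in the public Olist schema), group by `customer_unique_id` instead.
# No 12-month filter is applied; the count covers the full date range in `sales`.
customer_order_count = sales.groupby('customer_id')['order_id'].nunique()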
repeat_customers = (customer_order_count >= 2).sum()
percent_repeat_customers = repeat_customers / customer_order_count.count() * 100
```
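A sketch of the 12-month restriction, under one possible reading: the window is the trailing 12 months ending at the latest purchase in the data. The columns `customer_unique_id` and `order_purchase_timestamp` are assumed to be available, and the toy rows are made up:

```python
import pandas as pd

# Toy orders (made up); column names are assumptions about the schema.
orders = pd.DataFrame({
    'customer_unique_id': ['u1', 'u1', 'u2', 'u3', 'u3'],
    'order_id': ['o1', 'o2', 'o3', 'o4', 'o5'],
    'order_purchase_timestamp': pd.to_datetime(
        ['2018-01-15', '2018-06-01', '2018-03-10', '2016-11-01', '2018-07-20']),
})

def repeat_rate_last_12_months(df):
    """% of customers active in the trailing 12 months who placed 2+ orders in that window."""
    end = df['order_purchase_timestamp'].max()
    start = end - pd.DateOffset(months=12)
    window = df[df['order_purchase_timestamp'] > start]
    orders_per_customer = window.groupby('customer_unique_id')['order_id'].nunique()
    return (orders_per_customer >= 2).mean() * 100

repeat_rate = repeat_rate_last_12_months(orders)
```

Here `u3` has two orders overall but only one inside the window, so it does not count as a repeat customer under this definition.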

- How does review sentiment affect repeat behavior?
```python
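# Flag positive reviews (4–5 stars), then total revenue per customer within each sentiment group.
# This compares spend by sentiment; on its own it does not measure repeat purchasing.
sales['positive_review'] = sales['review_score'] >= 4
customer_ltv = sales.groupby(['customer_id', 'positive_review'])['total_revenue'].sum().reset_index()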
```
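To tie sentiment to repeat behavior more directly, one option is to compare the share of customers who order again, split by the score on their first order. This is a sketch under assumed column names (`customer_unique_id`, `order_purchase_timestamp`); the helper name and toy rows are hypothetical:

```python
import pandas as pd

# Toy rows (made up), one per order.
sales = pd.DataFrame({
    'customer_unique_id': ['u1', 'u1', 'u2', 'u3', 'u4'],
    'order_id': ['o1', 'o2', 'o3', 'o4', 'o5'],
    'order_purchase_timestamp': pd.to_datetime(
        ['2018-01-01', '2018-03-01', '2018-01-05', '2018-02-01', '2018-02-10']),
    'review_score': [5, 4, 1, 2, 5],
})

def repeat_rate_by_first_review(df):
    """% of customers with 2+ orders, grouped by whether their first scored review was 4-5 stars."""
    orders = df.drop_duplicates('order_id').sort_values('order_purchase_timestamp')
    per_customer = orders.groupby('customer_unique_id').agg(
        first_score=('review_score', 'first'),   # earliest non-missing score
        n_orders=('order_id', 'nunique'),
    )
    # Customers with no score at all fall into 'not positive'.
    per_customer['first_review'] = per_customer['first_score'].ge(4).map(
        {True: 'positive', False: 'not positive'})
    per_customer['returned'] = per_customer['n_orders'] >= 2
    return per_customer.groupby('first_review')['returned'].mean() * 100

repeat_by_sentiment = repeat_rate_by_first_review(sales)
```

A gap between the two groups is descriptive only; customers who plan to return may also rate differently, so it should not be read as a causal effect.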

### **Business Risk & Opportunity Metrics (Highlighted):**

- **What % of revenue comes from repeat customers?**
```python
repeat_revenue = sales[sales['customer_id'].isin(customer_order_count[customer_order_count >= 2].index)]['total_revenue'].sum()
total_revenue = sales['total_revenue'].sum()
percent_repeat_revenue = repeat_revenue / total_revenue * 100
```

- **Who are the top 10% of customers by revenue and frequency?**
```python
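customer_revenue = sales.groupby('customer_id')['total_revenue'].sum()
# top_10pct holds the 90th-percentile revenue threshold; order frequency is not considered in this snippet.
top_10pct = customer_revenue.quantile(0.9)
high_value_customers = customer_revenue[customer_revenue >= top_10pct]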
```
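The frequency half of the question could be covered by ranking customers on both revenue and order count. A minimal sketch, where requiring the 90th percentile on both measures is an assumed definition and the toy rows are made up:

```python
import pandas as pd

# Toy rows (made up); c5 is the only customer with two orders.
sales = pd.DataFrame({
    'customer_id':   ['c1', 'c2', 'c3', 'c4', 'c5', 'c5'],
    'order_id':      ['o1', 'o2', 'o3', 'o4', 'o5', 'o6'],
    'total_revenue': [10.0, 20.0, 30.0, 40.0, 300.0, 200.0],
})

def top_customers(df, q=0.9):
    """Customers at or above the q-th quantile on both revenue and order count."""
    summary = df.groupby('customer_id').agg(
        revenue=('total_revenue', 'sum'),
        orders=('order_id', 'nunique'),
    )
    top_rev = summary['revenue'] >= summary['revenue'].quantile(q)
    top_freq = summary['orders'] >= summary['orders'].quantile(q)
    return summary[top_rev & top_freq].sort_values('revenue', ascending=False)

top = top_customers(sales)
```

Using `|` instead of `&` would return customers who are in the top decile on either measure, which may suit segmentation better than a strict intersection.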

- **What happens to customer LTV when reviews are poor?**
```python
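# Mean per-customer revenue on low-rated (1–2 star) vs high-rated (4–5 star) orders; a proxy for LTV, not a lifetime projection.
low_review_ltv = sales[sales['review_score'] <= 2].groupby('customer_id')['total_revenue'].sum().mean()
high_review_ltv = sales[sales['review_score'] >= 4].groupby('customer_id')['total_revenue'].sum().mean()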
```

---

## 4. Payments & Spend Behavior

### **Internal Data Analysis Questions:**

- Distribution of payment methods by order value bucket?
```python
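# Left-closed buckets [0, 50), [50, 100), ...; the open-ended last bucket keeps orders of 5000 and above.
sales['order_value_bucket'] = pd.cut(sales['total_revenue'], bins=[0,50,100,200,500,1000,5000,float('inf')], right=False)
payment_dist = sales.groupby(['order_value_bucket','payment_type'], observed=False).size().unstack(fill_value=0)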
```

- How many installments are common by payment type?
```python
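# The median shows the typical installment count; the mean is kept for comparison.
payments.groupby('payment_type')['payment_installments'].agg(['mean', 'median'])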
```

- What % of total amount is typically paid in the first installment?
```python
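# Assumes `payment_value` is the full amount, split into equal installments, so the
# first-installment share is 100 / installments (rows with 0 installments are treated as 1).
payments['first_payment_pct'] = 100 / payments['payment_installments'].clip(lower=1)
payments.groupby('payment_type')['first_payment_pct'].mean()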
```

### **Business Risk & Opportunity Metrics (Highlighted):**

- **What % of total revenue comes from installment payments?**
```python
installment_revenue = payments[payments['payment_installments'] > 1]['payment_value'].sum()
total_payment_revenue = payments['payment_value'].sum()
percent_installment_revenue = installment_revenue / total_payment_revenue * 100
```

- **Do installment customers spend more on average?**
```python
installment_avg = payments[payments['payment_installments'] > 1]['payment_value'].mean()
single_payment_avg = payments[payments['payment_installments'] == 1]['payment_value'].mean()
```

- **Projected uplift in revenue from flexible payment options (e.g. vouchers):**
```text
Not computed here: requires scenario simulation and more advanced modeling.
```

---

## 5. Product Demand

### **Internal Data Analysis Questions:**

- Top 10 SKUs by volume and by revenue
```python
sku_volume = sales.groupby('product_id')['order_id'].count().sort_values(ascending=False).head(10)
sku_revenue = sales.groupby('product_id')['total_revenue'].sum().sort_values(ascending=False).head(10)
```

- Price and satisfaction: do higher-priced items receive lower review scores?
```python
price_reviews = sales.groupby('product_id').agg({'price':'mean','review_score':'mean'})
sns.scatterplot(data=price_reviews, x='price', y='review_score')
```
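The scatter plot shows the shape of the relationship; a rank correlation gives it a single number. A minimal sketch that computes Spearman's rho as the Pearson correlation of ranks, using only pandas (the helper name and toy values are made up):

```python
import pandas as pd

# Toy per-product averages (made up).
price_reviews = pd.DataFrame({
    'price':        [10.0, 25.0, 40.0, 80.0, 150.0],
    'review_score': [4.8, 4.5, 4.6, 3.9, 3.2],
})

def spearman_corr(df, x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks of x and y."""
    pair = df[[x, y]].dropna()
    return pair[x].rank().corr(pair[y].rank())

rho = spearman_corr(price_reviews, 'price', 'review_score')
```

A rank-based measure is less sensitive to a few very expensive products than a plain Pearson correlation on raw prices.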

---

## 6. Geographic Analysis

### **Internal Data Analysis Questions:**

- Median delivery distance between seller and customer per state
```python
state_distances = sales.groupby('customer_state')['distance'].median()
```
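The `distance` column is taken as given above. If it has to be derived, one common approach is the haversine great-circle distance between seller and customer coordinates. The sketch below assumes hypothetical column names (`seller_lat`, `seller_lng`, `customer_lat`, `customer_lng`) and uses approximate city-center coordinates as toy data:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Toy rows; column names are hypothetical, coordinates are approximate.
pairs = pd.DataFrame({
    'seller_lat':   [-23.55, -23.55],
    'seller_lng':   [-46.63, -46.63],
    'customer_lat': [-23.55, -22.91],
    'customer_lng': [-46.63, -43.17],
})
pairs['distance'] = haversine_km(pairs['seller_lat'], pairs['seller_lng'],
                                 pairs['customer_lat'], pairs['customer_lng'])
```

This is straight-line distance, not road distance, so it understates actual shipping routes, especially across large states.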

- Correlation between distance and delay (in days)
```python
sales[['distance','delay_days']].corr()
```

- What % of orders are local (same-state customer and seller)?
```python
sales['local_order'] = sales['customer_state'] == sales['seller_state']
percent_local = sales['local_order'].mean() * 100
```

### **Business Risk & Opportunity Metrics (Highlighted):**

- **Which states generate the most revenue?**
```python
state_revenue = sales.groupby('customer_state')['total_revenue'].sum().sort_values(ascending=False)
```

- **Which regions (states) have the longest delivery delays?**
```python
state_delays = sales.groupby('customer_state')['delay_days'].mean().sort_values(ascending=False)
```

- **Can reducing delivery distances in high-delay states improve satisfaction?**
```python
# Requires a model that joins state_delays, state_revenue, and review scores at the state level.
```
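A first descriptive step before any modeling is a state-level table that puts distance, delay, review score, and revenue side by side. This sketch assumes the `distance` and `delay_days` columns used above already exist; the helper name and toy rows are made up:

```python
import pandas as pd

# Toy rows (made up) standing in for the merged `sales` table.
sales = pd.DataFrame({
    'customer_state': ['SP', 'SP', 'RJ', 'RJ', 'AM', 'AM'],
    'distance':       [50, 120, 400, 450, 2700, 2900],
    'delay_days':     [-5, -3, 0, 2, 8, 12],
    'review_score':   [5, 5, 4, 3, 2, 1],
    'total_revenue':  [100.0, 80.0, 150.0, 60.0, 90.0, 40.0],
})

def state_summary(df):
    """One row per state: median distance, mean delay, mean review score, total revenue."""
    return df.groupby('customer_state').agg(
        median_distance=('distance', 'median'),
        mean_delay=('delay_days', 'mean'),
        mean_review=('review_score', 'mean'),
        revenue=('total_revenue', 'sum'),
    ).sort_values('mean_delay', ascending=False)

summary = state_summary(sales)
# State-level association between delay and satisfaction (descriptive, not causal).
delay_vs_review = summary['mean_delay'].corr(summary['mean_review'])
```

A negative correlation here would support the hypothesis but not prove it; distance, delay, and product mix are likely confounded across states.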

---

## 7. BONUS – Delivery Delay Revenue Leakage (H1 Direct)

### **Business Risk & Opportunity Metrics (Highlighted):**

- **What’s the total order value at risk from long delivery delays (>20 days)?**
```python
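# Order value tied to deliveries more than 20 days late; this is value at risk, not confirmed lost revenue.
extreme_delays = sales[sales['delay_days'] > 20]
revenue_at_risk = extreme_delays['total_revenue'].sum()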
```
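Because the 20-day cutoff is a judgment call, it may help to see how the at-risk value moves with the threshold and what share of total revenue it represents. A minimal sketch with made-up toy rows; the helper name and the alternative thresholds are illustrative assumptions:

```python
import pandas as pd

# Toy rows (made up).
sales = pd.DataFrame({
    'delay_days':    [-4, 0, 6, 12, 21, 35],
    'total_revenue': [100.0, 200.0, 50.0, 150.0, 300.0, 200.0],
})

def value_at_risk_by_threshold(df, thresholds=(5, 10, 20)):
    """Order value and revenue share for orders delayed by more than each threshold (in days)."""
    total = df['total_revenue'].sum()
    rows = []
    for t in thresholds:
        at_risk = df.loc[df['delay_days'] > t, 'total_revenue'].sum()
        rows.append({'threshold_days': t,
                     'value_at_risk': at_risk,
                     'pct_of_revenue': at_risk / total * 100})
    return pd.DataFrame(rows).set_index('threshold_days')

risk_table = value_at_risk_by_threshold(sales)
```

If the at-risk share changes sharply between thresholds, the headline figure is sensitive to the cutoff and should be reported alongside it.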
