# CEO-REQUEST CHALLENGE

> Should Olist remove underperforming sellers from its marketplace?

In [1]:
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import numpy as np

In [2]:
# Import data

from olist.data import Olist
olist=Olist()
data=olist.get_data()  ## is dict
matching_table = olist.get_matching_table()

#print(" ** matching_table.shape  ** ", matching_table.shape)
#print(list(data.keys()))

# Import olist seller
from olist.seller import Seller
seller=Seller()
sellers=seller.get_training_data() 

#print(" ** sellers.shape  ** ", sellers.shape)


# Import orders training_set 
from olist.order import Order
order=Order()
orders=order.get_training_data() ## is a DataFrame
#print(" ** orders.shape  ** ", orders.shape)
sales=seller.get_sales() ## is a DataFrame
#print(" ** sales.shape  ** ", sales.shape)

## Problem statement

To analyse the impact of removing the worst sellers from Olist's marketplace, we can start with a what-if analysis: what would have happened if Olist had never accepted these sellers in the first place? For that:

Step ① Compute, for each `seller_id`, cumulated since the beginning:
- The `revenues` it brings
- The `costs` associated with all its bad reviews
- The resulting `profits` (revenues - costs)
- The number of `orders` (it will impact overall IT costs)


Step ② We can then sort sellers by increasing profits for Olist and, for each number of sellers to remove, compute the financial impact on Olist had those sellers never been accepted on the platform. We may find an optimal number of sellers to remove that maximizes Olist's profit margin.
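Step ② can be sketched on toy data as follows. The five sellers and their `profits` values are made up for illustration, and IT costs are left out of this sketch.

```python
import pandas as pd

# Toy per-seller profits (made-up numbers, not Olist data).
toy = pd.DataFrame({
    "seller_id": ["a", "b", "c", "d", "e"],
    "profits": [-300.0, 120.0, -50.0, 800.0, 10.0],
})

# Sort from least to most profitable seller.
ranked = toy.sort_values("profits").reset_index(drop=True)

# Total profit if the n worst sellers had never joined, for n = 0..len(ranked).
total = ranked["profits"].sum()
removed_cumsum = ranked["profits"].cumsum()
profit_after_removal = pd.Series(
    [total] + list(total - removed_cumsum),
    index=range(len(ranked) + 1),
    name="profit_after_removal",
)

# Number of sellers to remove that maximizes the remaining profit.
best_n = int(profit_after_removal.idxmax())
print(best_n, profit_after_removal[best_n])
```

In this toy case the maximum is reached by removing exactly the sellers with negative profits; once IT costs depend on the number of orders, the optimum may shift.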

In [14]:
# Write down a detailed strategy (step by step) to create the DataFrame you need for step 1
# Think about how to re-use or update the logic you have already coded in your `olist` package

### Step 1

#### The revenues per seller

- Import orders training_set 
- `from olist.order import Order`
- `order=Order()`
- `orders=order.get_training_data()`

- `from olist.seller import Seller`
- `seller=Seller()`
- `sellers=seller.get_training_data()`

In [3]:
sellers.head(3)

Unnamed: 0,seller_id,seller_city,seller_state,delay_to_carrier,wait_time,date_first_sale,date_last_sale,share_of_one_stars,share_of_five_stars,review_score,n_orders,quantity,quantity_per_order,sales
0,3442f8959a84dea7ee197c632cb2df15,campinas,SP,1.514329,13.018588,2017-05-05 16:25:11,2017-08-30 12:50:19,0.333333,0.333333,3.0,3,3,1.0,218.7
1,d1b65fc7debc3361ea86b5f14c68d2e2,mogi guacu,SP,0.15519,9.065716,2017-03-29 02:10:34,2018-06-06 20:15:21,0.05,0.725,4.55,40,41,1.025,11703.07
2,ce3ad9de960102d0677a81f5d0bb7b2d,rio de janeiro,RJ,0.0,4.042292,2018-07-30 12:44:49,2018-07-30 12:44:49,0.0,1.0,5.0,1,1,1.0,158.0


- Useful columns

In [4]:
mask_columns = [ 'seller_id', 'date_first_sale', 'date_last_sale', 'sales' ] 
sellers_1 = sellers[mask_columns].copy()
sellers_1.head(1)

Unnamed: 0,seller_id,date_first_sale,date_last_sale,sales
0,3442f8959a84dea7ee197c632cb2df15,2017-05-05 16:25:11,2017-08-30 12:50:19,218.7


Revenues:

- Olist takes a 10% cut on the product price (excl. freight) of each order delivered.
- Olist charges 80 BRL per month per seller.
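As a quick sanity check of these two rules, the revenue for the first seller displayed above can be recomputed by hand. This is a sketch under two assumptions that the fee rules do not state: the subscription is billed on fractional months between first and last sale, and one month lasts 30.436875 days (365.2425 / 12).

```python
from datetime import datetime, timedelta

COMMISSION_RATE = 0.10   # Olist's cut on product price (excl. freight)
MONTHLY_FEE_BRL = 80     # subscription per seller per month

# Values copied from the first seller row displayed above.
sales = 218.7
date_first_sale = datetime(2017, 5, 5, 16, 25, 11)
date_last_sale = datetime(2017, 8, 30, 12, 50, 19)

# Assumption: fractional months, with an average month of 30.436875 days.
active_months = (date_last_sale - date_first_sale) / timedelta(days=30.436875)

revenues = COMMISSION_RATE * sales + MONTHLY_FEE_BRL * active_months
print(f"active_months={active_months:.6f}, revenues={revenues:.2f}")
```

Note that under this convention a seller with a single sale has zero active months and therefore pays no subscription fee in the model.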

In [6]:
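# Active period in months; one month = 30.436875 days (365.2425 / 12),
# an explicit stand-in for np.timedelta64(1, 'M')
sellers_1['active_months'] = \
          (sellers_1['date_last_sale'] - sellers_1['date_first_sale']) / pd.Timedelta(days=30.436875)


# Olist revenue per seller: 10% commission on sales + 80 BRL per active month
sellers_1['revenues'] = 0.1 * sellers_1['sales'] \
                       + sellers_1['active_months'] * 80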

sellers_1.head(2)

Unnamed: 0,seller_id,date_first_sale,date_last_sale,sales,active_months,revenues
0,3442f8959a84dea7ee197c632cb2df15,2017-05-05 16:25:11,2017-08-30 12:50:19,218.7,3.839119,328.999525
1,d1b65fc7debc3361ea86b5f14c68d2e2,2017-03-29 02:10:34,2018-06-06 20:15:21,11703.07,14.28377,2313.008599


In [7]:
total_revenues = sellers_1['revenues'].sum()
print("Revenues of all sellers :", total_revenues)

Revenues of all sellers : 2787744.214282712


#### The costs associated with all its bad reviews

- reviews carry a cost when review_score = 1, 2 or 3 (costs of 100, 50 and 40 respectively)
- matching_table : there are 3 interesting columns = ['order_id', 'review_id', 'seller_id']
- orders : columns = [ 'order_id', 'dim_is_five_star', 'dim_is_one_star', 'review_score',  'price' ]

In [83]:
matching_table[['order_id', 'review_id', 'seller_id']].head(2)

Unnamed: 0,order_id,review_id,seller_id
0,e481f51cbdc54678b7cc49136f2d6af7,a54f0611adc9ed256b57ede6b6eb5114,3504c0cb71d7fa48d967e0e4c94d59d9
1,53cdb2fc8bc7dce0b6741e2150273451,8d5266042046a06655c8db133d120ba5,289cdb325fb7e7f891c38608bf9e0962


In [115]:
#from olist.order import Order
#order=Order()
orders=order.get_training_data() ## is a DataFrame

In [116]:
mask_columns = [ 'order_id', 'dim_is_five_star',
                'dim_is_one_star', 'review_score',
                'price' ]
orders[mask_columns].head(2)

Unnamed: 0,order_id,dim_is_five_star,dim_is_one_star,review_score,price
0,e481f51cbdc54678b7cc49136f2d6af7,0,0,4,29.99
1,53cdb2fc8bc7dce0b6741e2150273451,0,0,4,118.7


In [117]:
mask_columns = [ 'order_id', 'dim_is_one_star',
                'review_score',
                'price' ]

merging_for_costs = matching_table[['order_id','seller_id']] \
                      .merge( orders[mask_columns] , on = 'order_id')

print(merging_for_costs.shape)
merging_for_costs.head(2)

(112154, 5)


Unnamed: 0,order_id,seller_id,dim_is_one_star,review_score,price
0,e481f51cbdc54678b7cc49136f2d6af7,3504c0cb71d7fa48d967e0e4c94d59d9,0,4,29.99
1,53cdb2fc8bc7dce0b6741e2150273451,289cdb325fb7e7f891c38608bf9e0962,0,4,118.7
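One thing worth checking after a merge like this: if the matching table holds several rows for the same (order, seller) pair, for instance one row per product, the order's review cost would be counted several times for that seller. A minimal sketch on made-up data, assuming such duplication can occur (this has not been verified on the Olist tables here):

```python
import pandas as pd

# Made-up matching table: order "o1" has two items from the same seller.
toy_matching = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "seller_id": ["s1", "s1", "s2"],
})
toy_orders = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "review_score": [1, 5],
})

# Naive merge: the 1-star review of "o1" appears twice for seller "s1".
naive = toy_matching.merge(toy_orders, on="order_id")

# Keep one row per (order, seller) pair before merging;
# validate= raises if order_id is not unique in the right-hand table.
deduped = (
    toy_matching.drop_duplicates(subset=["order_id", "seller_id"])
    .merge(toy_orders, on="order_id", validate="many_to_one")
)

print(len(naive), len(deduped))
```

Comparing the row counts before and after `drop_duplicates` on the real tables would show whether this matters for the totals below.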


In [118]:
def review_to_price(review_score):
    """Return the cost to Olist associated with a given review score."""
    d = {1 : 100, 2 : 50, 3 : 40, 4 : 0, 5: 0}
    return d[review_score]

In [119]:
#merging_for_costs['review_score'].map(review_to_price)

In [120]:
merging_for_costs['costs'] = merging_for_costs['review_score'].map(review_to_price)
costs = merging_for_costs[['seller_id', 'costs']].copy()
costs.head(7)

Unnamed: 0,seller_id,costs
0,3504c0cb71d7fa48d967e0e4c94d59d9,0
1,289cdb325fb7e7f891c38608bf9e0962,0
2,4869f7a5dfa277a7dca6462dcf3b52b2,0
3,66922902710d126a0e7d26b0e3805106,0
4,2c9e548be18521d1c43cde1c582c6de8,0
5,8581055ce74af1daba164fdbd55a40de,0
6,16090f2ca825584b5a147ab24aa30c86,0


In [121]:
olist_costs = costs.groupby('seller_id').sum().reset_index()
olist_costs

Unnamed: 0,seller_id,costs
0,0015a82c2db000af6aaaf3ae2ecb0532,100
1,001cca7ae9ae17fb1caed9dfb1094831,4490
2,002100f778ceb8431b7a1020ff7ab48f,1010
3,003554e2dce176b5555353e4f3555ac8,0
4,004c9cd9d87a3c30c522c48c4fc07416,3130
...,...,...
2965,ffc470761de7d0232558ba5e786e57b7,400
2966,ffdd9f82b9a447f6f8d4b91554cc7dd3,240
2967,ffeee66ac5d5a62fe688b9d26f83f534,200
2968,fffd5413c0700ac820c7069d66d98c89,1220


In [123]:
# The cost associated with all this
total_costs = olist_costs['costs'].sum()
print(" ")
print("The cost associated with all this : ", total_costs )

 
The cost associated with all this :  1894610


In [78]:
# N.B.:
# merging_for_costs['review_score'].map( {1 : 100, 2 : 50, 3 : 40, 4 : 0, 5: 0}) gives the same result
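The equivalence noted above can be checked on a small made-up Series. One difference worth knowing: a dict passed to `Series.map` yields `NaN` for values missing from the dict, whereas the function version raises a `KeyError`. The name `COST_BY_SCORE` is introduced here for illustration only.

```python
import pandas as pd

# Same score-to-cost mapping as in the notebook.
COST_BY_SCORE = {1: 100, 2: 50, 3: 40, 4: 0, 5: 0}

def review_to_price(review_score):
    return COST_BY_SCORE[review_score]

# Made-up review scores.
scores = pd.Series([1, 2, 3, 4, 5, 1])

via_function = scores.map(review_to_price)
via_dict = scores.map(COST_BY_SCORE)

# Both approaches give the same costs on valid scores.
print((via_function == via_dict).all())

# A score absent from the dict (here 6) maps to NaN instead of raising.
with_unknown = pd.Series([1, 6]).map(COST_BY_SCORE)
print(with_unknown.isna().tolist())
```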

In [79]:
#merging_for_costs.loc[merging_for_costs.review_score == 0, 'price'] = 0        Jean
#merging_for_costs.head(10)

#### The resulting `profits` (revenues - costs)



In [126]:
print( " revenues : ", total_revenues)
print( " costs : ", total_costs)
profits =  total_revenues - total_costs
print( " profits : ", profits)

 revenues :  2787744.214282712
 costs :  1894610
 profits :  893134.214282712


In [124]:
sellers_1.head(2) # revenues

Unnamed: 0,seller_id,date_first_sale,date_last_sale,sales,active_months,revenues
0,3442f8959a84dea7ee197c632cb2df15,2017-05-05 16:25:11,2017-08-30 12:50:19,218.7,3.839119,328.999525
1,d1b65fc7debc3361ea86b5f14c68d2e2,2017-03-29 02:10:34,2018-06-06 20:15:21,11703.07,14.28377,2313.008599


In [94]:
olist_costs.head(2)  # olist costs

Unnamed: 0,seller_id,costs
0,0015a82c2db000af6aaaf3ae2ecb0532,100
1,001cca7ae9ae17fb1caed9dfb1094831,4490


In [95]:
merged_profits = sellers_1[['seller_id', 'active_months', 'revenues']].merge(olist_costs, on = 'seller_id')
print(merged_profits.shape)
merged_profits.head(3)

(2970, 4)


Unnamed: 0,seller_id,active_months,revenues,costs
0,3442f8959a84dea7ee197c632cb2df15,3.839119,328.999525,140
1,d1b65fc7debc3361ea86b5f14c68d2e2,14.28377,2313.008599,140
2,ce3ad9de960102d0677a81f5d0bb7b2d,0.0,15.8,0


In [100]:
merged_profits['profits'] = merged_profits['revenues'] - merged_profits['costs']
merged_profits.head(3)

Unnamed: 0,seller_id,active_months,revenues,costs,profits
0,3442f8959a84dea7ee197c632cb2df15,3.839119,328.999525,140,188.999525
1,d1b65fc7debc3361ea86b5f14c68d2e2,14.28377,2313.008599,140,2173.008599
2,ce3ad9de960102d0677a81f5d0bb7b2d,0.0,15.8,0,15.8


In [104]:
profits = merged_profits['profits'].sum()
profits

893134.2142827117

#### The number of `orders` (it will impact overall IT costs)



In [127]:
# The IT department also told you that since the birth of the marketplace,
# cumulated IT costs have amounted to 500,000 BRL.

# Profits remaining once the cumulated IT costs are deducted
profits_after_it_costs = profits - 500_000
profits_after_it_costs

393134.21428271197

In [130]:
# Import olist seller
from olist.seller import Seller
seller=Seller()
sellers=seller.get_training_data() ## is a DataFrame
sellers.head(2)

Unnamed: 0,seller_id,seller_city,seller_state,delay_to_carrier,wait_time,date_first_sale,date_last_sale,share_of_one_stars,share_of_five_stars,review_score,n_orders,quantity,quantity_per_order,sales
0,3442f8959a84dea7ee197c632cb2df15,campinas,SP,1.514329,13.018588,2017-05-05 16:25:11,2017-08-30 12:50:19,0.333333,0.333333,3.0,3,3,1.0,218.7
1,d1b65fc7debc3361ea86b5f14c68d2e2,mogi guacu,SP,0.15519,9.065716,2017-03-29 02:10:34,2018-06-06 20:15:21,0.05,0.725,4.55,40,41,1.025,11703.07


<details>
    <summary>Hints</summary>


Starting from your current `seller().get_training_data()` DataFrame:
- Can you easily transform it to compute Olist's positive `revenue_per_seller`? 
- Can you easily transform it to compute Olist's `cost_of_bad_reviews`?

❓Instead of starting again from scratch, investigate your source code in `seller.py` - how was the mean `review_score` per seller computed? Can you imagine a way to amend your code  to compute `cost_of_reviews` in the same process? 
</details>


## Your turn!

In [0]:
# Keep this notebook tidy, you will present it orally to Olist's CEO at the end of the Communicate topic

In [None]:
#def get_wait_time(self, is_delivered=True):
#def get_review_score(self):
#def get_number_products(self):
#def get_number_sellers(self):
#def get_price_and_freight(self):
#def get_distance_seller_customer(self):



