# CEO RECOMMENDATIONS

***CEO Question***

*How do we increase customer satisfaction (so as to increase profit margins) while maintaining a healthy order volume?*

**Findings from our Preliminary Analysis of the Orders and Sellers Datasets**

- The *more* sellers (`number_of_sellers`) and products (`number_of_products`) that are combined into a single order, the *lower* the review score.  
- The *longer* the `wait_time` for a delivery, the *lower* the review score.  

Our earlier analysis also found that *orders containing multiple sellers or products do not make up a large portion of orders on the platform* (~10%), so any policy changes (e.g. restricting orders to a single seller) would have a limited impact. 

Additionally, `wait_time` is made up of two components (seller's `delay_to_carrier` + `carrier_delivery_time`). Our analysis also found that only about 6% of sellers have delayed shipment to a carrier. Moreover, `carrier_delivery_time` is less directly under Olist's control, and neither a deeper analysis of carrier operations nor a search for new carrier partners is an easy fix that we'd want to propose to the CEO.

It appears that there are other factors contributing to a low review score that are outside of the Orders and Sellers datasets. Rather than searching for and trying to isolate all the factors contributing to low review scores, we'll try a different approach in this notebook.

Since our preliminary analysis suggests there are multiple factors driving low review scores, and focusing efforts on improving a single factor may not be sufficient, let's instead use low review scores as a way to filter better performing sellers. By ascribing costs to reviews with low ratings, we'll also be able to calculate how much impact keeping these sellers on the platform will have on the bottom line.    
 

**Notebook Objective**

In this notebook, we'll **identify poor-performing sellers, quantify their impact on Olist's profit margins, and make specific recommendations to the CEO on what short-term changes can be made to increase customer satisfaction and Olist's bottom line.**


**Next Steps in our Analysis**
1. Calculate how much revenue each seller brings in to Olist
2. Calculate the cost to Olist from bad reviews for each seller
3. Calculate the profit to Olist for each seller
4. Understand the impact on Olist's profits from poor-performing sellers
5. Find out how much removing these sellers would impact Olist's IT costs

In [2]:
# Import relevant libraries and modules
%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math
import paths

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [8]:
from olist.seller import Seller
from olist.data import Olist


In [9]:
data = Olist().get_data()

In [10]:
# Load Sellers data
sellers = Seller().get_training_data()
sellers.head()

Unnamed: 0,seller_id,seller_city,seller_state,delay_to_carrier,seller_wait_time,date_first_sale,date_last_sale,months_on_olist,share_of_one_stars,share_of_five_stars,seller_review_score,review_cost_per_seller,n_orders,quantity,quantity_per_order,sales
0,3442f8959a84dea7ee197c632cb2df15,campinas,SP,0.0,13.018588,2017-05-05 16:25:11,2017-08-30 12:50:19,4.0,0.333333,0.333333,3.0,140,3,3,1.0,218.7
1,d1b65fc7debc3361ea86b5f14c68d2e2,mogi guacu,SP,0.0,9.065716,2017-03-29 02:10:34,2018-06-06 20:15:21,14.0,0.05,0.725,4.55,240,40,41,1.025,11703.07
2,ce3ad9de960102d0677a81f5d0bb7b2d,rio de janeiro,RJ,0.0,4.042292,2018-07-30 12:44:49,2018-07-30 12:44:49,0.0,0.0,1.0,5.0,0,1,1,1.0,158.0
3,c0f3eea2e14555b6faeea3dd58c1b1c3,sao paulo,SP,0.0,5.667187,2018-08-03 00:44:08,2018-08-03 00:44:08,0.0,0.0,1.0,5.0,0,1,1,1.0,79.99
4,51a04a8a6bdcb23deccc82b0b80742cf,braganca paulista,SP,3.353727,35.314861,2017-11-14 12:15:25,2017-11-14 12:15:25,0.0,1.0,0.0,1.0,100,1,1,1.0,167.99


## Revenue per seller

Olist generates revenue from its sellers through two components:
1. **Monthly Membership Fee**: sellers pay Olist an **80 Brazilian Real (BRL)** (~15 USD) monthly fee to use the platform
2. **Revenue Share**: for every order on the platform, Olist takes a **10% cut** based on the product price  of each item (excl. shipping)
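
As a quick worked example of the two revenue components, here is a minimal sketch. The function name and the example seller (14 months on the platform, 11,703.07 BRL in sales) are illustrative assumptions; only the 80 BRL fee and the 10% cut come from the description above.

```python
# Fee and commission values come from the two revenue components described above.
MONTHLY_FEE_BRL = 80
REVENUE_SHARE = 0.10


def revenue_for_olist(months_on_olist, sales_brl):
    """Revenue Olist earns from one seller: membership fees plus its cut of sales."""
    return months_on_olist * MONTHLY_FEE_BRL + sales_brl * REVENUE_SHARE


# Hypothetical seller: 14 months on the platform, 11,703.07 BRL in product sales
example_revenue = revenue_for_olist(14, 11703.07)
print(round(example_revenue, 2))
```

For this seller, membership fees (1,120 BRL) and the revenue share (about 1,170 BRL) contribute almost equally.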


In [None]:
# Revenue Calculation: Monthly Membership Fee Calculation + Revenue Share

# Calculate total time on platform (in months)
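# Divide by the average month length in days (30.436875, the length NumPy's 'M' unit stands for),
# since recent pandas versions reject the ambiguous 'M' unit in timedelta division
number_of_months_on_olist = (sellers.date_last_sale - sellers.date_first_sale) / np.timedelta64(1, 'D') / 30.436875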
sellers['months_on_olist'] = number_of_months_on_olist.map(lambda x: 1 if x < 1 else np.ceil(x))

# Calculate total revenue each seller has generated for Olist
sellers['revenue_per_seller'] = sellers.months_on_olist * 80 + sellers.sales * 0.1
sellers.head()

In [None]:
# Plot histogram of seller revenue
plt.figure(figsize=(12,8))
sns.set_style('darkgrid')

ax = sns.histplot(sellers.revenue_per_seller)
ax.set_xlabel('Revenue per Seller (BRL)')
ax.set_ylabel('Number of Sellers')
ax.set_title('Distribution of Seller Contribution to Olist Revenue');

In [None]:
sellers[['revenue_per_seller']].describe()

Half of all sellers on the platform have brought in less than **504 BRL** (95 USD). 

## Cost per seller

Because poor reviews from customers can have several direct and indirect costs to a business (e.g. customer support utilization, bad word-of-mouth, low repeat rate, etc.), we'll use the following table to estimate monetary costs associated with bad reviews:

**Estimated cost to Olist per bad review (in BRL)** (100 BRL $\approx$ 20 USD)
- **1 star:** 100
- **2 stars:** 50
- **3 stars:** 40
- **4 stars:** 0
- **5 stars:** 0
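
To make the cost table concrete, the sketch below totals the review cost for a list of star ratings. The dictionary and function names are illustrative assumptions; the per-star costs are the ones listed above.

```python
# Cost table from above (BRL per review); 4- and 5-star reviews cost nothing.
REVIEW_COST_BRL = {1: 100, 2: 50, 3: 40, 4: 0, 5: 0}


def review_cost(review_scores):
    """Total estimated cost to Olist for a seller's list of star ratings."""
    return sum(REVIEW_COST_BRL[score] for score in review_scores)


# Hypothetical seller with three reviews: one 1-star, one 3-star, one 5-star
example_cost = review_cost([1, 3, 5])
print(example_cost)
```

A seller with one 1-star, one 3-star, and one 5-star review would be assigned 100 + 40 + 0 = 140 BRL in review costs.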

In [None]:
sellers[['seller_id','seller_review_score','review_cost_per_seller']].head()

- `seller_review_score`: the average review score earned by a seller.

- `review_cost_per_seller`: the total cost from all bad reviews for each seller. It's calculated by first assigning a monetary cost for any review *receiving 3 stars or less*, and then these costs are totaled up per seller. 

In [None]:
# Plot histogram of review costs per seller
plt.figure(figsize=(12,8))
sns.set_style('darkgrid')

ax = sns.histplot(sellers.review_cost_per_seller)
ax.set_xlabel('Review Cost per Seller (BRL)')
ax.set_ylabel('Number of Sellers')
ax.set_title('Distribution of Seller Review Costs to Olist');

In [None]:
sellers[['review_cost_per_seller']].describe()

In [None]:
# Percentage of sellers who have some bad reviews (3 stars or less)
(sellers[sellers.review_cost_per_seller > 0]['review_cost_per_seller'].count() / len(sellers) ) * 100

While **70%** of sellers have had **at least 1 bad review** (3 stars or less), we can see that **most sellers do not have enough bad reviews to cost Olist substantially**, since even at the **75th percentile**, *total review costs* per seller are below **380 BRL** (~70 USD). This, along with our distribution plot, suggests that a small number of sellers account for a disproportionate share of the bad reviews that ultimately cost Olist.

However, to get a clearer picture of the impact, we should include revenues and actually calculate the *profit per seller.* 

## Profit per seller 

Now that we know how much revenue each seller generates for Olist and also the cost of bad reviews, let's calculate the profit to Olist from each seller. This will allow us to see more clearly *which sellers* are a drag on Olist's bottom line and *by how much.* 

We've also been asked by the CEO to consider the IT costs associated with running the platform, but for now we'll set that aside in our analysis and examine it later.   

In [None]:
# Calculate the profit as 'revenue_per_seller' - 'review_cost_per_seller'
sellers['profit_per_seller'] = sellers.revenue_per_seller - sellers.review_cost_per_seller

# Preview the per-seller revenue, review cost, and resulting profit
sellers[['seller_id', 'revenue_per_seller', 'review_cost_per_seller', 'profit_per_seller']].head()

Let's take a quick look at the distribution of `profit_per_seller`.

In [None]:
# Plot histogram of 'profit_per_seller'
plt.figure(figsize=(12,8))
sns.set_style('darkgrid')

ax = sns.histplot(sellers.profit_per_seller)
ax.set_xlim(xmin=-5000, xmax=5000)
ax.set_xlabel('Profits per Seller (BRL)')
ax.set_ylabel('Number of Sellers')
ax.set_title('Distribution of Seller Contribution to Olist Profits');

In [None]:
# Summary stats
sellers.profit_per_seller.describe()

From our histogram and summary stats, it appears the **middle 50% of sellers** on the platform generate anywhere from **93 to 721 BRL** (~17 to 134 USD) for Olist. We can also see from the distribution plot a number of sellers with negative impact on profits. 
 
Let's see if we can find out more about these poor-performing sellers who are pulling down Olist profits. 

## Understanding the Impact of Poor-performing Sellers

To get a clearer picture of how much impact these poor-performing sellers have on Olist's profits, we'll break this question down into smaller steps:

1. **How many** poor-performing sellers (those who have a negative impact on profits) are on the platform?
2. **How much in total** do these poor-performing sellers affect Olist profits? 

### How many sellers are negatively impacting Olist profits?

Let's find the total number of poor-performing sellers and also see what proportion on the platform they make up. 

In [None]:
# Simple function to create column indicating whether sellers have positive or negative impact on profits
def olist_impact(x):
    if x < 0:
        return 'negative'
    else:
        return 'positive'

sellers['impact'] = sellers['profit_per_seller'].apply(olist_impact)
sellers[['impact','seller_id']].groupby('impact').count()

In [None]:
# Bar plot to visualize the proportion of poor-performing sellers
sellers[['impact','seller_id']].groupby('impact').count().transpose().plot.barh(stacked=True, cmap='Set1')
plt.xlabel('Number of Sellers');

In [None]:
# Proportion of poor-performing sellers
sellers[['impact']].value_counts(normalize=True)

We can now see that there are **276 sellers** (or **roughly 9%** of all sellers) who **negatively impact** Olist profits, and whom we'll classify as **poor-performing**.

### How much in total do poor-performing sellers impact Olist profits?

To get a better picture of how much impact these sellers have on Olist's total profits, let's create a **whale curve** which will show us *what percentage* of sellers represent *what percentage* of cumulative profits. 



### Whale Curve

To build our whale curve, we'll first sort our sellers from *most to least profitable*. Then we'll plot the *cumulative profits* to Olist from each additional seller.   
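
Before applying this to the full dataset, here is a minimal sketch of the same two steps on five made-up sellers. The profit values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from itertools import accumulate

# Hypothetical profits (BRL) for five made-up sellers
toy_profits = [300, -400, 500, 200, -100]

# Step 1: rank sellers from most to least profitable
ranked = sorted(toy_profits, reverse=True)

# Step 2: running total of profits as each seller is added
cumulative = list(accumulate(ranked))
total = cumulative[-1]

# Express both axes of the whale curve in percent
percent_profits = [c / total * 100 for c in cumulative]
seller_proportion = [(i + 1) / len(ranked) * 100 for i in range(len(ranked))]

for share, pct in zip(seller_proportion, percent_profits):
    print(f"{share:5.1f}% of sellers -> {pct:5.1f}% of profits")
```

In this toy case the curve peaks at 200% of final profits after the three profitable sellers, then falls back to 100% once the two loss-making sellers are included. That rise above 100% followed by a decline is the characteristic "whale" shape.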

In [None]:
# Sort sellers by how much profit each contributes to Olist (from most to least)
sellers.sort_values('profit_per_seller', ignore_index=True, inplace=True, ascending=False)

# Calculate cumulative profits to Olist from each additional seller
sellers['cumulative_profits'] = sellers['profit_per_seller'].cumsum()
sellers[['seller_id', 'profit_per_seller','cumulative_profits']].tail()

In [None]:
# Total cumulative profits for Olist
total_profits = sellers['cumulative_profits'].iloc[-1]
total_profits

Olist's cumulative profits are **1,259,445 BRL** (237,180 USD). Note that when sorting sellers from *most to least profitable* to Olist, we can see how much some sellers are actually costing the platform. The seller with the *highest negative profits* has already cost Olist **22,419 BRL** (4,220 USD).

Before we build our whale curve, we'll also need to calculate the cumulative percentages of sellers and contributions to Olist profits. 

In [None]:
# Calculate the cumulative percentage of profits from each additional seller
sellers['percent_profits'] = sellers['cumulative_profits'].div(total_profits).mul(100)

# Calculate the proportion of sellers represented (in percent)
sellers['seller_proportion'] = (sellers.index+1) / sellers.seller_id.count() * 100

whale_df = sellers[['seller_id', 'percent_profits', 'seller_proportion']]
whale_df.tail()

In [None]:
# Build our whale curve
plt.figure(figsize=(15,6))
plt.suptitle('Whale Curve: What Proportion of Sellers Represents What Proportion of Profits')
sns.lineplot(data=sellers, x='seller_proportion', y='percent_profits')
plt.xlabel('Percentage of Sellers (Ranked by Most Profitable to Least)')
plt.ylabel('Cumulative Profits (%)');

Our whale curve helps to underscore that **roughly 80%** of our sellers **contribute 120% of Olist profits**. Another **10%** of sellers have a **negligible impact**, and the **remaining 10%** of sellers (when sorted from most to least profitable) drag profits down by **20%**.

### Profits without the Poor-performing Sellers

Let's visualize this another way. What would Olist profits be if poor-performing sellers were no longer on the platform?

In [None]:
# Amount of reduction in profits due to bad reviews
neg_profits = sellers[sellers['impact']=='negative'].profit_per_seller.cumsum()
neg_profits.iloc[-1]

In [None]:
# Bar chart showing profit comparison after removal of poor-performing sellers
fig = plt.figure(figsize=(12,6))
x_values = ['Before', 'After']
y_values = [sellers.cumulative_profits.iloc[-1], sellers.cumulative_profits.iloc[-1] - neg_profits.iloc[-1]]
ax = sns.barplot(data=sellers, x=x_values, y=y_values, palette='tab20')

plt.ylabel('Profit (in BRL)')
plt.ylim(0, 2500000)
plt.suptitle('Profit Comparison from Removing Poor-performing Sellers', fontsize=12)
ax.set_yticks(ax.get_yticks())
yticklabels=['{:3.1f}'.format(y) + 'M' for y in ax.get_yticks()/1000000]
ax.set_yticklabels(yticklabels);

If Olist were to **drop poor-performing sellers, who have a negative impact on profits**, the platform would see an **immediate boost in profits** of **313,277 BRL** (58,246 USD)! 

## What's the impact if we factor in IT costs?

While it does appear that Olist's profit margins would benefit from removing poor-performing sellers, we've been asked to examine how removal of these sellers might also impact Olist's IT costs (servers, etc.). 

Olist's IT costs increase with the number of orders processed, but do so less and less rapidly due to scale effects. More precisely, we've been told that **IT costs are proportional to the square root of the cumulative number of orders**. Additionally, the IT department has reported accumulated IT costs of **500,000 BRL** (~92,930 USD) since the founding of the company.

Let's approach this question by breaking it down into the following steps: 

1. Construct the IT cost curve to figure out **IT costs** for any given order volume. 
2. Compare the **order volume** by seller quality.
3. Find the **average order volume** by seller quality.
4. Calculate **cumulative IT costs** if poor-performing sellers were removed. 
5. Compare the **difference in profits (incl. IT costs)** after removing poor-performing sellers.

### What are the IT costs for a particular order volume?

We've been provided information that IT costs are *proportional to the square-root of the number of approved orders*:

**Olist IT costs = $k\sqrt{n}$**

Here ***n*** is the ***total number of approved orders*** and ***k*** is a ***proportionality constant***.

Given this IT cost curve, we can see that it should take on the shape of a concave ascending curve like $ y = \sqrt{x} $ but also transformed by a proportionality constant *k*. Let's plot a generic $ y = \sqrt{x} $ to help visualize the overall shape of the curve.

In [None]:
# Plot a generic y = x**(1/2) curve

plt.figure(figsize=(15,4))
x=np.linspace(0,1000, 50)
y=x**0.5
sns.lineplot(x=x, y=y)
plt.xlabel('Number of Orders')
plt.ylabel('IT Cost (in BRL)');

Given the shape of this particular cost curve, we can already see that IT costs will steadily increase with more orders. However, the difference in IT cost between each additional order (marginal IT cost) is the greatest early on (when order numbers are low). At higher order volume, the IT cost appears to change less and less rapidly. 

Let's now try to get a more precise curve by doing a little math to solve for the proportionality constant *k*. 

Since we've been told that the cumulative cost of IT to date is **500,000 BRL**, we can set this equal to the area under the curve bounded by 0 to the *current number of orders* ***n***.

Then to calculate the proportionality constant *k*, we just need to take the definite integral of this expression.



$\int_{0}^{n} k\sqrt{x}\,dx = 500{,}000$

where *n* is the total number of orders to date, *x* is the variable of integration (the order count), and *k* is a proportionality constant

In [None]:
# Total of n_orders
total_orders = sellers.n_orders.sum()
total_orders

Let's evaluate the integral from **0** orders to the total cumulative number of orders **99,844**. 

**$\int_{0}^{99{,}844} k\sqrt{x}\,dx = \left[\tfrac{2}{3}k\,x^{\frac{3}{2}}\right]_{0}^{99{,}844} = \tfrac{2}{3}k\,(99{,}844)^{\frac{3}{2}} - \tfrac{2}{3}k\,(0)^{\frac{3}{2}}$**

Setting this equal to 500,000, we can solve for the proportionality constant ***k***.

**$ k(\frac{2}{3}) (99,844)^\frac{3}{2} = 500,000 $**
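
As a sanity check on this closed form, the sketch below solves for *k* and then confirms with a simple midpoint Riemann sum that the area under $k\sqrt{x}$ really does come to 500,000 BRL. The helper names (`solve_k`, `cumulative_it_cost`, `midpoint_integral`) are illustrative assumptions and not part of the `olist` package.

```python
import math


def solve_k(cumulative_cost, n_orders):
    """Solve k * (2/3) * n**1.5 = cumulative_cost for k."""
    return cumulative_cost / ((2 / 3) * n_orders ** 1.5)


def cumulative_it_cost(k, n_orders):
    """Closed-form area under k * sqrt(x) from 0 to n_orders."""
    return k * (2 / 3) * n_orders ** 1.5


def midpoint_integral(k, n_orders, steps=100_000):
    """Numerical area under k * sqrt(x) from 0 to n_orders (midpoint rule)."""
    width = n_orders / steps
    return sum(k * math.sqrt((i + 0.5) * width) for i in range(steps)) * width


k_est = solve_k(500_000, 99_844)
print(round(k_est, 4))
print(round(cumulative_it_cost(k_est, 99_844)))
print(round(midpoint_integral(k_est, 99_844)))
```

The closed form and the numerical sum should agree to well within 1 BRL, which suggests the antiderivative and the resulting value of *k* are consistent with each other.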

In [None]:
# Evaluate integral and solve for proportionality constant k
k = 500_000 / ((2/3) * (total_orders**1.5))  # total_orders = 99,844, computed above
k

This gives us the proportionality constant **k** $\approx$ **0.024**. Now let's re-plot our IT cost curve more precisely.

$ y = \hspace{0.25 cm} k \sqrt{n}  \hspace{0.25 cm} = 0.024\sqrt{n} $

In [None]:
# Re-plot our IT cost curve with our calculated proportionality constant k

plt.figure(figsize=(15,4))
x=np.linspace(0,99844, 50)
y=(k)*(x**0.5)
sns.lineplot(x=x, y=y)
plt.xlabel('Number of Orders')
plt.ylabel('IT Cost (in BRL)');

We can now see what the IT cost will be at a particular order volume. The slope of the curve flattens noticeably by around 20,000 orders, which tells us that most of the benefit from scale effects is realized early. Beyond that point the curve looks closer to linear, so each additional order adds a roughly constant amount to IT costs.

Let's now examine the IT costs associated with specific sellers. Since IT costs are a function of order volume, we'll also see if there are differences between our good and poor-performing sellers.  

### Order Volume by Seller Quality

Since IT costs are driven by order volume, let's see if there is a difference for these two groups of sellers.

In [None]:
# Calculate the order volume for each group of sellers
order_vol_df = sellers[['n_orders','impact']].groupby('impact').sum().transpose()
order_vol_df

In [None]:
# Bar plot to see composition of order volume by quality of seller
order_vol_df.plot.barh(stacked=True, cmap='Set1')
plt.legend(title='Seller Quality', labels=['Poor', 'Good'])
plt.xlabel('Number of Orders');

We now see that **poor-performing sellers** have about **40%** of the *order volume* on Olist but make up only **9%** of all sellers on the platform. This tells us that they are contributing to a **larger share of the IT cost burden** for Olist.

### Average Order Volume by Seller Quality

For another perspective on the difference in order volume with these two groups, let's calculate the average order volume per seller for each group and compare. 

In [None]:
# Calculate the average order volume per seller for each group of sellers
avg_order_vol = sellers[['n_orders','impact']].groupby('impact').mean().transpose()
avg_order_vol

In [None]:
avg_order_vol.plot(kind='bar', cmap='Set1', ylabel='Average Number of Orders per Seller', rot=0, legend=True)
plt.legend(title='Seller Quality', labels=['Poor', 'Good']);

We can see that **poor-performing sellers** have an *average order volume* of roughly **7x** more than other sellers. This underscores their **heavy usage of the IT infrastructure and extra IT cost burden.**

### Cumulative IT Costs without Poor-performing Sellers

Since we know IT cost is largely a function of order volume, let's find out how much our cumulative IT costs would be reduced if these poor-performing sellers were removed from the platform. 

Since we've already seen that poor performers have on average roughly 7x more order volume, we expect that their removal will bring down Olist IT costs significantly.  

To find out how much, let's break this down into two steps:
1. Calculate **cumulative IT costs** to Olist for each additional seller (sorted from most to least profitable)
2. Remove poor-performing sellers and then **re-evaluate cumulative IT costs at this new order volume**

Since order volume drives IT costs, we'll first find the *cumulative number of orders* for each additional seller, and then we can calculate the new *cumulative IT costs* after including each additional seller. 

In [None]:
# Calculate cumulative IT costs per additional seller

# The cumulative number of orders per additional seller
sellers['cumulative_orders'] = sellers['n_orders'].cumsum()

# The cumulative IT cost for Olist per additional seller
sellers['cumulative_it_cost'] = k*(2/3)*(sellers['cumulative_orders']**1.5)

sellers[['seller_id','cumulative_orders','cumulative_it_cost']].tail()

Next, we can easily figure out what the cumulative IT costs would be *without poor performers* by removing them from our data frame.

In [None]:
# Data frame of cumulative IT costs and order volume after removing poor performers
it_costs_sans_bad_sellers = sellers[sellers.impact == 'positive'][['seller_id','cumulative_orders','cumulative_it_cost']]
it_costs_sans_bad_sellers.tail()

If poor-performing sellers were removed, Olist would have a *cumulative order volume* of **60,027** orders and **233,081 BRL** (43,948 USD) in *cumulative IT costs*. 

Let's now compare cumulative IT costs before and after removing poor-performing sellers from the platform.

In [None]:
# Plot total cumulative IT costs before and after removing poor performers
total_it_cost = sellers[['impact', 'cumulative_it_cost']].groupby('impact').max().transpose()

fig = plt.figure(figsize=(12,6))
x_values = ['Before', 'After']
y_values = [total_it_cost.iloc[0,0], total_it_cost.iloc[0,1]]
ax = sns.barplot(data=sellers, x=x_values, y=y_values, palette='Set1')

plt.ylabel('Cumulative IT Costs (in BRL)')
plt.ylim(0, 600000)
plt.suptitle('Cumulative IT Costs Before/After Removing Poor-performing Sellers', fontsize=12)
ax.set_yticks(ax.get_yticks())
yticklabels=['{:3.0f}'.format(y) + 'K' for y in ax.get_yticks()/1000]
ax.set_yticklabels(yticklabels);

In [None]:
# Calculation of cumulative IT cost reduction after removing poor-performing sellers
it_cost_savings = total_it_cost.iloc[0,0] - total_it_cost.iloc[0,1]
it_cost_savings

By removing poor-performing sellers from the platform, *order volume* **decreases** by **40%** and this translates into a **reduction** of **266,919 BRL** (~50,868 USD) in *cumulative IT costs.*
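
A short arithmetic check, using only the order counts and the 500,000 BRL total reported in this notebook, shows how the drop in order volume translates into a drop in cumulative IT costs under the integrated cost model. Because cumulative cost scales with orders to the power 3/2, the proportional fall in cost is larger than the proportional fall in volume. Variable names here are illustrative.

```python
orders_before = 99_844   # total orders with all sellers
orders_after = 60_027    # total orders without poor-performing sellers
cost_before = 500_000    # cumulative IT costs to date (BRL)

# Cumulative cost scales with orders ** 1.5, so k cancels out of the ratio
cost_after = cost_before * (orders_after / orders_before) ** 1.5

volume_drop = 1 - orders_after / orders_before
cost_drop = 1 - cost_after / cost_before

print(f"Cumulative IT cost after removal: {cost_after:,.0f} BRL")
print(f"Order volume drop: {volume_drop:.1%}")
print(f"Cumulative IT cost drop: {cost_drop:.1%}")
```

Under these assumptions, a drop of about 40% in order volume corresponds to a drop of roughly 53% in cumulative IT costs.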

### Profits without the Poor-performing Sellers (incl. IT Costs) 

Let's put this all together and calculate the impact on profits with factoring in the change in IT costs. 

From the analysis in the earlier section (**Profits without the Poor-performing Sellers**), we saw that removing poor-performing sellers would boost Olist's margins by **313,277 BRL** (58,246 USD) solely by alleviating the downward pressure from negative profits.

In this section, we found that poor performers had a disproportionately large order volume which resulted in a high IT cost burden of **266,919 BRL** (50,868 USD).

If we include IT costs in our profit analysis for Olist, removal of poor-performing sellers would result in an increase in profits of **580,196 BRL** (110,598 USD). This is an increase of roughly **46%** over Olist's current cumulative profits of 1,259,445 BRL.
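
The combined figure can be reproduced from the numbers reported in this notebook. The sketch below is plain arithmetic on those reported values; the variable names are illustrative.

```python
gain_from_removing_negative_profits = 313_277  # BRL, from the profit comparison
it_cost_savings_brl = 266_919                  # BRL, from the IT cost comparison
current_cumulative_profits = 1_259_445         # BRL, Olist's cumulative profits

total_gain = gain_from_removing_negative_profits + it_cost_savings_brl
relative_gain = total_gain / current_cumulative_profits

print(f"Total gain: {total_gain:,} BRL")
print(f"Relative to current profits: {relative_gain:.1%}")
```

Of the roughly 46% total, about 25 percentage points come from removing negative seller profits and about 21 from IT cost savings.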

In [None]:
# Bar chart showing profit comparison (incl. IT costs) after removal of poor-performing sellers
fig = plt.figure(figsize=(12,6))
x_values = ['Before', 'After']
y_values = [sellers.cumulative_profits.iloc[-1], sellers.cumulative_profits.iloc[-1] - neg_profits.iloc[-1] + it_cost_savings]
ax = sns.barplot(data=sellers, x=x_values, y=y_values, palette='tab20')

plt.ylabel('Profit (in BRL)')
plt.ylim(0, 2500000)
plt.suptitle('Profit Comparison (incl. IT costs) from Removing Poor-performing Sellers', fontsize=12)
ax.set_yticks(ax.get_yticks())
yticklabels=['{:3.1f}'.format(y) + 'M' for y in ax.get_yticks()/1000000]
ax.set_yticklabels(yticklabels);

In [None]:
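# Relative profit increase (incl. IT cost savings) from removing poor-performing sellers
profit_gain = it_cost_savings - neg_profits.iloc[-1]
answer = profit_gain / sellers.cumulative_profits.iloc[-1]
answer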

In [None]:
sellers.cumulative_profits.tail()

In [None]:
cum_profits = sellers.cumulative_profits[-1:]
cum_profits

## Executive Summary & Recommendations to CEO

**Summary of Findings**

To respond to how Olist can improve customer satisfaction and profit margins while still keeping a healthy order volume, we conducted a review of Olist's Orders and Sellers datasets. From this data, our analysis found that ***long wait times* and single orders with *multiple products or sellers* were associated with lower review scores**. However, our findings also suggest that there are other unaccounted for factors (likely outside of these datasets) that have an impact on low review scores. 

Given our limited view of all the factors influencing low review scores, we decided to respond to the issues of  customer satisfaction and profit margins from a different approach. Instead of trying to figure out which individual factors are connected to low review scores, we shifted attention to quantifying the impact on Olist if poor-performing sellers (defined as sellers with enough bad reviews to have a negative profit for Olist) were removed from the platform.    

In this alternative approach, we discovered that roughly **9%** of sellers can be classified as **'poor-performing'** because of negative profits to Olist. Additionally, this small group of sellers averaged roughly **7x** the order volume of other sellers and accounted for about **40%** of total order volume, which is what drives IT costs.

We conclude that removing 'poor-performing' sellers can improve overall customer satisfaction, keep a healthy order volume, and boost profits in the short term by **46%** or **580,196 BRL** (110,598 USD).


**Recommendation to CEO:**

In the short term, we believe that Olist can improve profits and overall customer satisfaction by removing poor-performing sellers.

Our preliminary analysis shows that removing these sellers from the platform will immediately increase Olist profits by reducing costs on two fronts:

1. **Costs Associated with Bad Reviews** - removing sellers who have received enough bad reviews to drag on Olist margins will **boost profits by roughly 25%**, an increase of **313,277 BRL** (58,246 USD).

2. **IT Costs** - IT costs are a function of overall order volume, and since poor-performing sellers account for a disproportionately large share of orders (about 40%), their removal translates into a **roughly 53% reduction in cumulative IT costs**, a savings of **266,919 BRL** (~50,868 USD).

In conclusion, by removing poor-performing sellers from the platform, Olist stands to see an immediate gain of **580,196 BRL** (~110,598 USD), an increase of roughly **46%** in profits. Additionally, we would expect to see an overall improvement in brand reputation as average customer satisfaction increases.

**Caveats**

These findings have been made based on assumptions for the monetary costs associated with bad reviews as well as the model for IT infrastructure costs.

**Other Areas For Further Analysis**

A finer-grained analysis for identifying poor-performing sellers might also distinguish sellers who have only recently joined the platform and have not yet gained enough experience to use it effectively or to engage with customers. With a good on-ramping process and support system for new sellers, they may become high-performing sellers and advocates for the platform.

Other datasets to explore include Products data which could bring to light if product quality is a source of low review scores. Additionally, a text analysis of the reviews could also uncover more explicitly the specific reasons for customer dissatisfaction. 


In [None]:
#CAN BE DELETED LATER
#export dataset for Tableau
path = '../data/csv/seller_dataset_modified.csv'
sellers.to_csv(path_or_buf=path, index=False)