#### INTRODUCTION
This dataset is provided by Olist, the largest department store in Brazilian marketplaces, which provides a platform for small businesses from all over Brazil. Merchants registered with Olist can sell their products through the Olist Store and ship them using Olist's logistics partners.

The dataset covers about 100k orders placed from 2016 to 2018. It contains various dimensions such as order status, price, payment type, delivery performance, product attributes and geolocation, as well as customer reviews. The dataset is real commercial data, but it has been anonymised. For more information about the dataset, please visit:

https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce 

The analysis of this dataset is grouped into several sections:

A. The Customer Insight section covers analysis of the total number of customers shopping at Olist from 2016-2018, total high-value customers per state, transaction value, the months with the highest transaction value, and average purchase value per state.

B. The Product Insight section uncovers the most popular products in the top 10 states in Brazil.

C. The Payment Insight section presents analysis of the most preferred payment method, the number of installments chosen and the average transaction value.

D. The Customer Reviews Insight section presents analysis of the customer satisfaction score, total reviews and the length of time between a survey being sent and completed.

E. The Seller Insight section explores the seller with the highest transaction value in each state and the average package delivery time.

All questions related to general insight at the national level are labelled G (General Question), while questions specific to the top 10 states in Brazil are labelled S (Specific Question).

NB: The top 10 states are São Paulo "SP", Rio de Janeiro "RJ", Minas Gerais "MG", Rio Grande do Sul "RS", Paraná "PR", Santa Catarina "SC", Bahia "BA", Distrito Federal "DF", Goiás "GO" and Espírito Santo "ES".

#### Data Preparation

The dataset provided at https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce is a set of CSV files. To make analysis easier, the files are converted into a single SQLite database file (.db).

To do this conversion, please refer to the other notebook in this repo: "Convert multiple CSV files into an sqlite DB"
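
For readers without access to that notebook, the conversion can be sketched roughly as follows. This is a minimal sketch using only the Python standard library, not the code from the other notebook; the function name `load_csv_folder` and the folder name `data` are hypothetical. Each CSV file becomes a table named after the file, and columns are created without declared types, so values are stored as text.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_folder(csv_dir, conn):
    """Load every *.csv file in csv_dir into a table named after the file."""
    loaded = []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        table = csv_path.stem  # e.g. olist_orders_dataset
        # utf-8-sig also strips a byte-order mark if the file has one
        with open(csv_path, newline="", encoding="utf-8-sig") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            columns = ", ".join(f'"{name}"' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', reader
            )
        loaded.append(table)
    conn.commit()
    return loaded
```

A call such as `load_csv_folder("data", sqlite3.connect("olist.db"))` would then produce the `olist.db` file used below, assuming the Kaggle CSV files were unpacked into `data`. Because the columns are untyped, numeric columns may need an explicit `CAST` when they are sorted or compared.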

#### Create Database Connection and Set Configurations

In [None]:
%load_ext sql
# Return Pandas DataFrames instead of regular result sets
# %config SqlMagic.autopandas = True
# Do not print the number of rows affected by DML
%config SqlMagic.feedback = False
# Do not show the connection string after execution
%config SqlMagic.displaycon = False
# Limit the number of rows displayed to 50 (the full result set is still stored)
%config SqlMagic.displaylimit = 50
# Automatically limit the size of the returned result sets
# %config SqlMagic.autolimit = None

In [3]:
%sql sqlite:///olist.db

#### A. CUSTOMER INSIGHT

G1A. Total number of customers shopping at Olist from 2016-2018

In [22]:
%%sql

SELECT COUNT(DISTINCT customer_id) AS olist_total_customers
FROM olist_customers_dataset;

olist_total_customers
99441



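G2A. Total customers, orders and payment value in multiple marketplaces in Brazil from 2016-2018 (`total_order` counts payment rows, so an order paid in several parts is counted once per payment)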

In [7]:
%%sql

SELECT COUNT(DISTINCT cd.customer_id) AS total_customer, 
       COUNT(od.order_id) AS total_order, 
       ROUND(SUM (payment_value),2) AS total_payment_value
  FROM olist_customers_dataset AS cd
  JOIN olist_orders_dataset AS od
    ON cd.customer_id=od.customer_id
  JOIN olist_order_payments_dataset AS op
    ON od.order_id=op.order_id;

total_customer,total_order,total_payment_value
99440,103886,16008872.12
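
The gap between 99440 customers and 103886 "orders" comes from joining to the payments table, where an order can have more than one row. The sketch below illustrates the effect on toy data; the tables `orders` and `payments` are simplified stand-ins for the Olist tables, and the values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, customer_id TEXT);
CREATE TABLE payments (order_id TEXT, payment_sequential INTEGER, payment_value REAL);
INSERT INTO orders VALUES ('o1', 'c1'), ('o2', 'c2');
-- o1 is paid in two parts, so it has two payment rows
INSERT INTO payments VALUES ('o1', 1, 50.0), ('o1', 2, 25.0), ('o2', 1, 100.0);
""")

payment_rows, distinct_orders, total_value = conn.execute("""
SELECT COUNT(o.order_id),           -- one per joined payment row
       COUNT(DISTINCT o.order_id),  -- one per order
       SUM(p.payment_value)
  FROM orders AS o
  JOIN payments AS p
    ON o.order_id = p.order_id
""").fetchone()

print(payment_rows, distinct_orders, total_value)
```

If the intent is to count orders rather than payment records, `COUNT(DISTINCT od.order_id)` is the safer form once the payments table is part of the join.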


G3A. Total customers, orders and payment value per state in the 10 states with the highest transaction value, 2016-2018

In [9]:
%%sql 


WITH base AS (

SELECT cd.customer_state, 
       COUNT(DISTINCT cd.customer_id) AS total_customer,
       COUNT(od.order_id) AS total_order, 
       ROUND(SUM (op.payment_value),2) AS total_transaction_value
  FROM olist_customers_dataset AS cd
  JOIN olist_orders_dataset AS od
    ON cd.customer_id=od.customer_id
  JOIN olist_order_payments_dataset AS op
    ON od.order_id=op.order_id
  GROUP BY cd.customer_state
  ORDER BY total_transaction_value DESC
  LIMIT 10)

SELECT *, total_transaction_value/total_order AS average_transaction_value_per_state
  FROM base
 ORDER BY customer_state;

customer_state,total_customer,total_order,total_transaction_value,average_transaction_value_per_state
BA,3380,3610,616645.82,170.8160166204986
DF,2140,2204,355141.08,161.13479128856625
ES,2033,2107,325967.55,154.70695301376364
GO,2020,2112,350092.31,165.76340435606062
MG,11635,12102,1872257.26,154.706433647331
PR,5045,5262,811156.38,154.1536259977195
RJ,12852,13527,2144379.69,158.5258882235529
RS,5466,5668,890898.54,157.18040578687368
SC,3637,3754,623086.43,165.979336707512
SP,41745,43622,5998226.96,137.50462977396725


#### Visualization on customers' total transaction value and average transaction value: 

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-A2_CustomerInsight/A2_CustomerInsight

S1A. Total transaction value and percentage change per year in top 10 states in Brazil (each row's percentage is the change from that year to the next)

In [12]:
%%sql 

WITH year_order AS (
    
SELECT *,
       (strftime ('%Y',order_purchase_timestamp)) AS year
  FROM olist_orders_dataset
),

yearly_transaction_value AS (
SELECT cd.customer_state, 
       yo.year, 
       ROUND(SUM(op.payment_value),2) AS total_value_per_year 
  FROM year_order AS yo
  JOIN olist_order_payments_dataset AS op
    ON yo.order_id=op.order_id
  JOIN olist_customers_dataset AS cd
    ON yo.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state, yo.year
  ORDER BY cd.customer_state, yo.year),

final_table AS (SELECT *, 
       LEAD (total_value_per_year)OVER(PARTITION BY customer_state ORDER BY year) AS next_year_value
  FROM yearly_transaction_value)

SELECT customer_state, 
       year,
       total_value_per_year,
       ROUND((next_year_value-total_value_per_year)/total_value_per_year*100,2) AS percentage_change_per_year
  FROM final_table;

customer_state,year,total_value_per_year,percentage_change_per_year
BA,2016,995.34,28465.55
BA,2017,284324.38,16.53
BA,2018,331326.1,
DF,2016,1200.11,12989.62
DF,2017,157089.83,25.31
DF,2018,196851.14,
ES,2016,1067.14,13311.26
ES,2017,143116.93,27.02
ES,2018,181783.48,
GO,2016,1223.06,13309.43
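
Because the query uses `LEAD`, the percentage on each row describes the change from that year to the following one, and the last year of each state is empty. If the change should instead sit on the year in which it was realised, `LAG` can be used. The following is a minimal sketch on invented toy data (the table `yearly` and the state code `XX` are hypothetical), assuming a SQLite version with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yearly (customer_state TEXT, year TEXT, total_value REAL);
INSERT INTO yearly VALUES
  ('XX', '2016', 100.0),
  ('XX', '2017', 150.0),
  ('XX', '2018', 120.0);
""")

rows = conn.execute("""
WITH with_prev AS (
SELECT *,
       LAG(total_value) OVER (PARTITION BY customer_state ORDER BY year) AS prev_value
  FROM yearly)
SELECT customer_state,
       year,
       total_value,
       ROUND((total_value - prev_value) / prev_value * 100, 2) AS pct_change_vs_prev_year
  FROM with_prev
 ORDER BY customer_state, year
""").fetchall()

for row in rows:
    print(row)
```

With this layout the first year of each state has no percentage, which makes it explicit that there is no earlier year to compare against.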


S2A. Month with the highest transaction value per year in the top 10 states in Brazil

In [3]:
%%sql 

WITH year_month AS (
SELECT *,
       (strftime ('%Y',order_purchase_timestamp)) AS year, 
       (strftime ('%m',order_purchase_timestamp)) AS month 
  FROM olist_orders_dataset),

transaction_value AS (
SELECT cd.customer_state, 
       ym.year, 
       ym.month,
       ROUND(SUM(op.payment_value),2) AS monthly_transaction_value
  FROM year_month AS ym
  JOIN olist_order_payments_dataset AS op
    ON ym.order_id=op.order_id
  JOIN olist_customers_dataset AS cd
    ON ym.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state,ym.month
  ORDER BY cd.customer_state,ym.year,ym.month),

temp1_table AS(
SELECT *, 
        LEAD (monthly_transaction_value) OVER (PARTITION BY customer_state ORDER BY customer_state,year) AS next_value
  FROM transaction_value),

temp2_table AS (
SELECT customer_state, year, month, monthly_transaction_value, 
       ROUND((next_value-monthly_transaction_value)/monthly_transaction_value*100,2) AS percentage_change
  FROM temp1_table)

SELECT customer_state, year, month, MAX(monthly_transaction_value) AS highest_sale
  FROM temp2_table
 GROUP BY customer_state, year;

customer_state,year,month,highest_sale
BA,2017,7,74638.4
BA,2018,6,66337.12
DF,2017,7,40398.0
DF,2018,5,38448.14
ES,2017,7,42802.66
ES,2018,6,33203.16
GO,2017,8,39352.52
GO,2018,5,47088.33
MG,2017,8,186374.48
MG,2018,3,194495.23
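
Note that the `transaction_value` step above groups by state and month only, so the same calendar month from different years is summed together and the `year` column is taken from an arbitrary row of each group. One way to get a true per-year result is to group by year and month and then rank the months within each state and year. The sketch below shows this on invented toy data; the table `monthly_orders` and its columns are hypothetical simplifications of the joined Olist tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_orders (customer_state TEXT, purchase_ts TEXT, payment_value REAL);
INSERT INTO monthly_orders VALUES
  ('XX', '2017-07-03 10:00:00', 40.0),
  ('XX', '2017-08-11 09:30:00', 70.0),
  ('XX', '2018-07-20 14:00:00', 90.0),
  ('XX', '2018-08-02 08:15:00', 10.0);
""")

rows = conn.execute("""
WITH monthly AS (
SELECT customer_state,
       strftime('%Y', purchase_ts) AS year,
       strftime('%m', purchase_ts) AS month,
       SUM(payment_value) AS monthly_value
  FROM monthly_orders
 GROUP BY customer_state, year, month),   -- year is part of the grouping key

ranked AS (
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY customer_state, year
                          ORDER BY monthly_value DESC) AS month_rank
  FROM monthly)

SELECT customer_state, year, month, monthly_value AS highest_sale
  FROM ranked
 WHERE month_rank = 1
 ORDER BY customer_state, year
""").fetchall()

for row in rows:
    print(row)
```

In this toy data, grouping by month alone would merge July 2017 with July 2018 (130.0 in total) and report July as the best month, although August was the best month of 2017.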


#### Visualization of the month with the highest transaction value in top 10 states in Brazil per year:

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-A2_CustomerInsight/A2_CustomerInsight

S3A. Number of customers in top 10 states in Brazil

In [20]:
%%sql

-- DISTINCT avoids counting a customer once per payment row
SELECT cd.customer_state, 
       COUNT (DISTINCT cd.customer_id) AS number_of_cust
  FROM olist_customers_dataset AS cd
  JOIN olist_orders_dataset AS od 
    ON cd.customer_id=od.customer_id
  JOIN olist_order_payments_dataset AS op 
    ON od.order_id=op.order_id
  WHERE cd.customer_state  IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
 GROUP BY cd.customer_state
 ORDER BY cd.customer_state;

customer_state,number_of_cust
BA,3380
DF,2140
ES,2033
GO,2020
MG,11635
PR,5045
RJ,12852
RS,5466
SC,3637
SP,41745


#### Visualization on number of customers in top 10 states in Brazil : 

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-A1_CustomerInsight/A1_CustomerInsight

S4A. Average transaction value in top 10 states in Brazil

In [7]:
%%sql 
 
SELECT cd.customer_state,  
       AVG(op.payment_value) average_transaction_value
  FROM olist_customers_dataset AS cd
  JOIN olist_orders_dataset AS od 
    ON cd.customer_id=od.customer_id
  JOIN olist_order_payments_dataset AS op 
    ON od.order_id=op.order_id
  WHERE cd.customer_state  IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
 GROUP BY cd.customer_state
 ORDER BY average_transaction_value DESC
 LIMIT 10;

customer_state,average_transaction_value
BA,170.81601662049923
SC,165.97933670751183
GO,165.76340435606107
DF,161.1347912885661
RJ,158.5258882235526
RS,157.18040578687263
ES,154.70695301376355
MG,154.7064336473321
PR,154.15362599771908
SP,137.50462977396495


S5A. Total high-value customers in top 10 states in Brazil based on average purchase value

In [4]:
%%sql

WITH nat_purchase_value AS (
SELECT cd.customer_id,
       cd.customer_state,
       SUM(op.payment_value)/COUNT(op.order_id) AS avg_purchase_value
 FROM olist_customers_dataset AS cd
  JOIN olist_orders_dataset AS od 
    ON cd.customer_id=od.customer_id
  JOIN olist_order_payments_dataset AS op 
    ON od.order_id=op.order_id
  WHERE cd.customer_state  IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
 GROUP BY cd.customer_id
), 

top_rank AS(
SELECT *, 
       PERCENT_RANK () OVER (ORDER BY avg_purchase_value) AS rank
  FROM nat_purchase_value
),

final_rank AS (
SELECT *
  FROM top_rank
 WHERE rank>0.9
),

high_value_cust AS(
SELECT customer_state, 
       COUNT(customer_id) AS total_high_value_cust
  FROM final_rank
  GROUP BY customer_state), 

total_cust_per_state AS (
SELECT customer_state, 
       COUNT(customer_id) AS total_cust
  FROM olist_customers_dataset
  WHERE customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
GROUP BY customer_state),

temp_table AS (
SELECT hvc.*, tcps.total_cust
  FROM high_value_cust AS hvc
  JOIN total_cust_per_state AS tcps
    ON hvc.customer_state=tcps.customer_state) 

SELECT *, 
       CAST(total_high_value_cust AS REAL)/CAST(total_cust AS REAL)*100 AS proportion
  FROM temp_table
  ORDER BY proportion DESC;

customer_state,total_high_value_cust,total_cust,proportion
BA,432,3380,12.781065088757396
GO,247,2020,12.227722772277229
SC,415,3637,11.41050316194666
RS,618,5466,11.306256860592754
PR,550,5045,10.901883052527255
RJ,1401,12852,10.901027077497666
MG,1206,11635,10.36527718091964
DF,220,2140,10.2803738317757
ES,205,2033,10.083620265617316
SP,3702,41746,8.867915488909118


#### Visualization on total high-value customers in top 10 states in Brazil based on average purchase value : 

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-A1_CustomerInsight/A1_CustomerInsight

S6A. High-volume buyers in each state

In [5]:
%%sql

SELECT cd.customer_state, 
        oid.order_id,
        SUM(oid.order_item_id) AS total_product_bought
  FROM olist_order_items_dataset AS oid
  JOIN olist_orders_dataset AS od
    ON oid.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
  GROUP BY oid.order_id
  ORDER BY total_product_bought DESC
LIMIT 10;

customer_state,order_id,total_product_bought
SP,8272b63d03f5f79c56e9e4120aec44ef,231
GO,ab14fdcfbe524636d65ee38360e22ce8,210
SP,1b15974a0141d54e36626dca3fdc731a,210
GO,9ef13efd6949e4573a18964dd1bbe7f5,120
PR,428a2f660dc84138d969ccd69a0ab6d5,120
SP,9bdc4d4c71aa1de4606060929dee888c,105
SP,73c8ab38f07dc94389065f7eba4f297a,105
SP,37ee401157a3a0b28c9c6d0ed8c3b24b,91
SP,c05d6a79e55da72ca780ce90364abed9,78
MG,af822dacd6f5cff7376413c03a388bb7,78
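
One caveat on reading these totals: in the Olist schema `order_item_id` is a sequence number within an order (1, 2, ..., n), so `SUM(order_item_id)` yields n(n+1)/2 rather than the number of items. The totals above (231, 210, 120, 105, 91, 78) are exactly the triangular numbers for n = 21, 20, 15, 14, 13 and 12, which suggests the largest order holds 21 items. A minimal sketch of the difference on toy data (the table `order_items` here is a simplified stand-in with invented rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id TEXT, order_item_id INTEGER)")
# One order with 21 items (sequence numbers 1..21) and one single-item order
toy_rows = [("big", i) for i in range(1, 22)] + [("small", 1)]
conn.executemany("INSERT INTO order_items VALUES (?, ?)", toy_rows)

rows = conn.execute("""
SELECT order_id,
       COUNT(*)           AS items_in_order,          -- actual number of items
       SUM(order_item_id) AS sum_of_sequence_numbers  -- 1 + 2 + ... + n
  FROM order_items
 GROUP BY order_id
 ORDER BY items_in_order DESC
""").fetchall()

for row in rows:
    print(row)
```

The ranking of orders is the same either way, since a larger n always gives a larger sum, but `COUNT(*)` gives the figure that can be read directly as items bought.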


#### Visualization on high-volume buyers in each state:

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-A1_CustomerInsight/A1_CustomerInsight

S7A. High-volume buyers per year per month

In [6]:
%%sql 

WITH year_month AS (
SELECT *,
       (strftime ('%Y',order_purchase_timestamp)) AS year, 
       (strftime ('%m',order_purchase_timestamp)) AS month 
  FROM olist_orders_dataset)

SELECT cd.customer_state,
       cd.customer_id,
       ym.year,
       ym.month,
       SUM(oid.order_item_id) AS total_items_bought
  FROM year_month AS ym
  JOIN olist_order_items_dataset AS oid 
    ON ym.order_id=oid.order_id
  JOIN olist_customers_dataset AS cd
    ON ym.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_id, ym.year, ym.month
  ORDER BY total_items_bought DESC
  LIMIT 30;

customer_state,customer_id,year,month,total_items_bought
SP,fc3d1daec319d62d49bfb5e1f83123e9,2017,7,231
GO,bd5d39761aa56689a265d95d8d32b8be,2017,8,210
SP,be1b70680b9f9694d8c70f41fa3dc92b,2018,2,210
PR,10de381f8a8d23fff822753305f71cae,2017,11,120
GO,adb32467ecc74b53576d9d13a5a55891,2017,1,120
SP,a7693fba2ff9583c78751f2b66ecab9d,2018,2,105
SP,d5f2b3f597c7ccafbb5cac0bcc3d6024,2017,12,105
SP,7d321bd4e8ba1caf74c4c1aabd9ae524,2018,4,91
MG,0d93f21f3e8543a9d0d8ece01561f5b2,2017,10,78
SP,3b54b5978e9ace64a63f90d176ffb158,2018,5,78


#### INSIGHT 

There are some highlights from data presented above: 

1. To narrow the analysis down to a more specific level, 10 states (SP, RJ, MG, RS, PR, SC, BA, DF, GO, ES) were chosen for further analysis based on the size of their transaction value on the Olist platform over the 3 years. These states are labelled the top 10 states.

2. The data show that the number of high-value customers (customers in the top 10 percent by average purchase value) corresponds with the total number of customers in each state: states with more customers also have more high-value customers, and states with fewer customers have fewer. As a share of each state's customers, however, high-value customers range from about 8.9% in SP to about 12.8% in BA.

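3. Looking at the volume of order items, some customers buy in bulk. Because `order_item_id` is a sequence number within an order (1, 2, ..., n), the `SUM(order_item_id)` totals above are larger than the actual item counts: a total of 231 corresponds to an order of 21 items and 210 to an order of 20 items.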

4. Looking at transaction value and percentage change in the top 10 states, each state experiences monthly fluctuations in its transaction value over the 3 years. However, yearly transaction value increases in all of the top 10 states, which signals positive growth.

5. In the years recorded, most of the highest monthly sales in each state occur mid-year, between May and August. It would be interesting to investigate which key events drive these sales.

6. Average purchase value per state does not always correspond with total transaction value per state. For example, SP has the highest total transaction value but the lowest average transaction value (about 137.5), while BA has a much lower total but the highest average (about 170.8).

#### RECOMMENDATIONS 
Based on the insights presented above, five recommendations can be derived.
- First, it is important to engage high-value customers across states more closely by offering them curated loyalty programs.
- Second, states with a high average purchase value but low transaction value may indicate that customers there purchase expensive goods; this is worth investigating further.
- Third, it would be interesting to examine whether customers can be categorized by the volume of items ordered. Some customers might be businesses buying in bulk, or individuals purchasing expensive products. Understanding customer segments can help the business design suitable marketing efforts.
- Fourth, the company should look further at how these insights can be used for product strategy and revenue forecasting.
- Lastly, the pattern of best-selling months can be used to plan the timeline of marketing activities.

#### B. PRODUCT INSIGHT

G1B. The most popular product category in each of the top 10 states in Brazil

In [31]:
%%sql 

WITH all_popular_products AS (
SELECT cd.customer_state,
       pd.product_category_name,
       SUM(oid.order_item_id) AS total_product_sold
  FROM olist_products_dataset AS pd
  JOIN olist_order_items_dataset AS oid
    ON pd.product_id=oid.product_id
  JOIN olist_orders_dataset AS od
    ON oid.order_id=od.order_id
  JOIN olist_customers_dataset AS cd 
    ON od.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state,pd.product_category_name
  ORDER BY total_product_sold DESC),

temp_table AS (SELECT *, 
       ROW_NUMBER () OVER (PARTITION BY customer_state ORDER BY total_product_sold DESC) AS product_ranking
  FROM all_popular_products)

SELECT customer_state, product_category_name,total_product_sold,product_ranking
   FROM temp_table
   WHERE product_ranking=1
   ORDER BY total_product_sold DESC;

customer_state,product_category_name,total_product_sold,product_ranking
SP,cama_mesa_banho,6499,1
RJ,cama_mesa_banho,2019,1
MG,cama_mesa_banho,1621,1
PR,moveis_decoracao,806,1
RS,moveis_decoracao,748,1
SC,moveis_decoracao,481,1
BA,beleza_saude,374,1
GO,cama_mesa_banho,303,1
DF,beleza_saude,273,1
ES,cama_mesa_banho,256,1


#### Visualization on the most popular product category in top 10 states in Brazil:

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-B_ProductInsight/B_ProductInsight

S1B. Number of products bought per month in top 10 states

In [40]:
%%sql

WITH year_month AS (
SELECT *,
       (strftime ('%Y',order_purchase_timestamp)) AS year, 
       (strftime ('%m',order_purchase_timestamp)) AS month 
  FROM olist_orders_dataset)

SELECT cd.customer_state,
       ym.year,
       ym.month,
       SUM(oid.order_item_id) AS total_items_bought
  FROM year_month AS ym
  JOIN olist_order_items_dataset AS oid 
    ON ym.order_id=oid.order_id
  JOIN olist_customers_dataset AS cd
    ON ym.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state, ym.year, ym.month
  LIMIT 30;

customer_state,year,month,total_items_bought
BA,2016,10,4
BA,2017,1,38
BA,2017,2,67
BA,2017,3,136
BA,2017,4,134
BA,2017,5,186
BA,2017,6,156
BA,2017,7,174
BA,2017,8,201
BA,2017,9,215


#### INSIGHT 
Based on the data above, across the top 10 states, three product categories are the most popular (measured by total products sold): cama_mesa_banho (bed, table and bath), moveis_decoracao (furniture and decor) and beleza_saude (health and beauty).

#### RECOMMENDATION 
Based on the information above, if data is available, it would be interesting to investigate whether the most popular product categories also yield the biggest profit for the company.

#### C. PAYMENT INSIGHT

G1C. Overall most preferred method of payment

In [43]:
%%sql 

SELECT payment_type, COUNT(*) AS total_use_of_payment_type
  FROM olist_order_payments_dataset
  GROUP BY payment_type
  ORDER BY total_use_of_payment_type DESC
  LIMIT 10;

payment_type,total_use_of_payment_type
credit_card,76795
boleto,19784
voucher,5775
debit_card,1529
not_defined,3


#### Visualization on the most preferred method of payment : 

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-C_PaymentInsight/PaymentInsight

G2C. How many customers use one payment method and how many use multiple?

In [11]:
%%sql 

WITH temp_table AS (
SELECT cd.customer_id, 
       opd.order_id,
       opd.payment_sequential,
       opd.payment_type,
       (CASE 
       WHEN opd.payment_sequential=1 THEN 1
        ELSE 0
        END) AS one_payment_method, 
       (CASE 
       WHEN opd.payment_sequential>1 THEN 1
        ELSE 0
        END) AS multi_payment_method
  FROM olist_order_payments_dataset AS opd
  JOIN olist_orders_dataset AS od
    ON opd.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
  GROUP BY cd.customer_id)

SELECT SUM(one_payment_method) AS cust_with_one_payment_method, 
       SUM(multi_payment_method) AS cust_with_multi_payment_method
  FROM temp_table;

cust_with_one_payment_method,cust_with_multi_payment_method
97809,1631
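
A caveat on this approach: the CTE groups by `customer_id` but reads `payment_sequential` as a bare column, so SQLite takes it from an arbitrary row of each group, and the split between the two counts depends on which row that happens to be. A sequence number above 1 also does not by itself show that a different payment type was used. An alternative, sketched below on invented toy data (the table `payments` is a simplified stand-in), is to count the distinct payment types per order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (order_id TEXT, payment_sequential INTEGER, payment_type TEXT);
INSERT INTO payments VALUES
  ('o1', 1, 'credit_card'),
  ('o2', 1, 'credit_card'), ('o2', 2, 'voucher'),  -- two different methods
  ('o3', 1, 'voucher'),     ('o3', 2, 'voucher');  -- two payments, one method
""")

single_method, multi_method = conn.execute("""
WITH per_order AS (
SELECT order_id,
       COUNT(DISTINCT payment_type) AS n_methods
  FROM payments
 GROUP BY order_id)

SELECT SUM(n_methods = 1) AS orders_with_one_method,
       SUM(n_methods > 1) AS orders_with_multiple_methods
  FROM per_order
""").fetchone()

print(single_method, multi_method)
```

Here order `o3` counts as a single-method order even though it has two payment rows, because both rows use the same payment type.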


G3C. Number of installments chosen and value of transaction

In [10]:
%%sql

SELECT order_id, 
       COUNT(payment_installments) AS number_of_installments,
       SUM(payment_value) AS value_of_transaction 
  FROM olist_order_payments_dataset
 GROUP BY order_id
 ORDER BY number_of_installments DESC
 LIMIT 20;

order_id,number_of_installments,value_of_transaction
fa65dad1b0e818e3ccc5cb0e39231352,29,457.99
ccf804e764ed5650cd8759557269dc13,26,62.680000000000014
285c2e15bebd4ac83635ccc563dc71f4,22,40.85
895ab968e7bb0d5659d16cd74cd1650c,21,161.32000000000002
fedcd9f7ccdc8cba3a18defedd1a5547,19,205.74
ee9ca989fc93ba09a6eddc250ce01742,19,82.73000000000002
4bfcba9e084f46c8e3cb49b0fa6e6159,15,740.76
21577126c19bf11a0b91592e5844ba78,15,86.99000000000001
4689b1816de42507a7d63a4617383c59,14,529.5500000000001
3c58bffb70dcf45f12bdf66a3c215905,14,100.57
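
Note that `COUNT(payment_installments)` counts the payment rows of an order, not the installments chosen, so an order paid with many small vouchers ranks high here even if each payment was made in one installment. A minimal sketch of the distinction on invented toy data (the table `payments` is a simplified stand-in), using `MAX(payment_installments)` as one possible measure of the installments chosen:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (
  order_id TEXT, payment_sequential INTEGER,
  payment_installments INTEGER, payment_value REAL);
INSERT INTO payments VALUES
  ('o1', 1, 8, 400.0),                    -- one card payment in 8 installments
  ('o2', 1, 1, 20.0), ('o2', 2, 1, 30.0), -- three separate one-off payments
  ('o2', 3, 1, 10.0);
""")

rows = conn.execute("""
SELECT order_id,
       COUNT(payment_installments) AS payment_rows,
       MAX(payment_installments)   AS max_installments,
       SUM(payment_value)          AS value_of_transaction
  FROM payments
 GROUP BY order_id
 ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

Whether `MAX`, `SUM` or the value on the first payment row is the right measure depends on how split payments should be treated; that choice is an assumption of this sketch, not something the dataset prescribes.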


S1C. How many customers use one payment method and how many use multiple in top 10 states?

In [23]:
%%sql

WITH temp_table AS (
SELECT cd.customer_state, 
       cd.customer_id, 
       opd.order_id,
       opd.payment_sequential,
       opd.payment_type,
       (CASE 
       WHEN opd.payment_sequential=1 THEN 1
        ELSE 0
        END) AS one_payment_method, 
       (CASE 
       WHEN opd.payment_sequential>1 THEN 1
        ELSE 0
        END) AS multi_payment_method
  FROM olist_order_payments_dataset AS opd
  JOIN olist_orders_dataset AS od
    ON opd.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
 WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state, cd.customer_id)

SELECT customer_state, 
       SUM(one_payment_method) AS cust_with_one_payment_method, 
       SUM(multi_payment_method) AS cust_with_multi_payment_method
  FROM temp_table
  GROUP BY customer_state;

customer_state,cust_with_one_payment_method,cust_with_multi_payment_method
BA,3378,2
DF,2140,0
ES,2032,1
GO,2019,1
MG,11628,7
PR,5041,4
RJ,12842,10
RS,5462,4
SC,3634,3
SP,41718,27


S2C. Most preferred method of payment in top 10 states 

In [27]:
%%sql 

WITH payment_method AS (
SELECT cd.customer_state, 
       payment_type, 
       COUNT(*) AS total_use_of_payment_type
  FROM olist_order_payments_dataset AS opd
  JOIN olist_orders_dataset AS od
    ON opd.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
   WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state, opd.payment_type
  ORDER BY cd.customer_state,total_use_of_payment_type DESC),

temp_table AS (
SELECT *, 
       ROW_NUMBER()OVER (PARTITION BY customer_state ORDER BY total_use_of_payment_type DESC) AS method_ranking
  FROM payment_method)

SELECT customer_state,total_use_of_payment_type,payment_type
  FROM temp_table
  WHERE method_ranking=1;

customer_state,total_use_of_payment_type,payment_type
BA,2662,credit_card
DF,1700,credit_card
ES,1573,credit_card
GO,1520,credit_card
MG,9070,credit_card
PR,3786,credit_card
RJ,10288,credit_card
RS,3985,credit_card
SC,2713,credit_card
SP,32168,credit_card


S3C. Average number of installments chosen and average value of transaction in top 10 states

In [3]:
%%sql 

SELECT cd.customer_state,
       AVG(payment_installments) AS avg_number_of_installments_choosen,
       AVG(payment_value) AS average_value_of_transaction
  FROM olist_order_payments_dataset AS opd
  JOIN olist_orders_dataset AS od 
    ON opd.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state;

customer_state,avg_number_of_installments_choosen,average_value_of_transaction
BA,3.18808864265928,170.81601662049923
DF,2.7232304900181488,161.1347912885661
ES,2.9905078310393924,154.70695301376355
GO,2.952651515151515,165.76340435606107
MG,2.9822343414311683,154.7064336473321
PR,2.853667806917522,154.15362599771908
RJ,2.9651807496118874,158.5258882235526
RS,2.972300635144672,157.18040578687263
SC,2.866275972296217,165.97933670751183
SP,2.6220026592086563,137.50462977396495


#### Visualization on average number of installments chosen and average value of transaction:

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-C_PaymentInsight/PaymentInsight 

#### INSIGHT  
1. Credit card is the most preferred payment method, both nationally and in each of the top 10 states.
2. Customers also prefer to use a single payment method per transaction rather than multiple payment methods.
3. In each state, the average transaction value is above 100 Brazilian reais, and customers pay in about 3 installments on average (between 2.6 and 3.2).

#### RECOMMENDATION 
These insights can be used to design the company's marketing strategy, especially rewards programs (discounts, coupons, points, etc.).

#### D. CUSTOMER REVIEW

G1D. Overall customer satisfaction score

In [32]:
%%sql 

SELECT COUNT(review_id) AS total_reviews,
       AVG(review_score) AS overall_cust_statisfaction
  FROM olist_order_reviews_dataset
  LIMIT 10;

S1D. Customer satisfaction score in top 10 states

In [7]:
%%sql

SELECT cd.customer_state, 
       COUNT(review_id) AS total_review,
      COUNT(review_id)*1.0/(SELECT COUNT(DISTINCT opd.order_id)
                           FROM olist_order_payments_dataset AS opd
                           JOIN olist_orders_dataset AS od
                             ON opd.order_id=od.order_id
                           JOIN olist_customers_dataset AS cd
                             ON od.customer_id=cd.customer_id
                           WHERE customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
                           GROUP BY customer_state)*1.0 AS percentage_review, 
       AVG(ord.review_score) AS overall_cust_statisfaction
  FROM olist_order_reviews_dataset AS ord
  JOIN olist_orders_dataset AS od
    ON ord.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
 WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
 GROUP BY cd.customer_state;

customer_state,total_review,percentage_review,overall_cust_statisfaction
BA,3357,0.9931952662721892,3.860887697348824
DF,2148,0.6355029585798817,4.064711359404097
ES,2016,0.5964497041420118,4.041666666666667
GO,2024,0.5988165680473373,4.042490118577075
MG,11625,3.439349112426036,4.1361720430107525
PR,5038,1.4905325443786983,4.180031758634379
RJ,12765,3.776627218934911,3.8749706227967096
RS,5483,1.622189349112426,4.1333211745394856
SC,3623,1.0718934911242604,4.071763731714049
SP,41690,12.33431952662722,4.173950587670904
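
The `percentage_review` column needs care. The subquery in the denominator returns one row per state, but used as a scalar it contributes only its first row, so every state is divided by the same number. In the output above that number appears to be BA's count: 3357 / 3380 ≈ 0.993 and 41690 / 3380 ≈ 12.33. A per-state denominator can be obtained by aggregating orders and reviews separately and joining on the state, as in this sketch on invented toy data (the tables `orders` and `reviews` are simplified stand-ins):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT, customer_state TEXT);
CREATE TABLE reviews (review_id TEXT, order_id TEXT);
INSERT INTO orders VALUES
  ('o1', 'AA'), ('o2', 'AA'),
  ('o3', 'BB'), ('o4', 'BB'), ('o5', 'BB'), ('o6', 'BB');
INSERT INTO reviews VALUES ('r1', 'o1'), ('r2', 'o3'), ('r3', 'o4'), ('r4', 'o5');
""")

rows = conn.execute("""
WITH orders_per_state AS (
SELECT customer_state, COUNT(*) AS total_orders
  FROM orders
 GROUP BY customer_state),

reviews_per_state AS (
SELECT o.customer_state, COUNT(r.review_id) AS total_reviews
  FROM reviews AS r
  JOIN orders AS o
    ON r.order_id = o.order_id
 GROUP BY o.customer_state)

SELECT ops.customer_state,
       rps.total_reviews,
       ops.total_orders,
       rps.total_reviews * 1.0 / ops.total_orders AS review_share
  FROM orders_per_state AS ops
  JOIN reviews_per_state AS rps
    ON ops.customer_state = rps.customer_state
 ORDER BY ops.customer_state
""").fetchall()

for row in rows:
    print(row)
```

With real data the share can still exceed 1 for a state if some orders have more than one review.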


#### Visualization on customer satisfaction and total reviews:

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-D_CustReviewInsight/CustReviewInsight

S2D. Length of time between survey sent and completion in top 10 states

In [41]:
%%sql 

SELECT cd.customer_state, 
       AVG(JULIANDAY(ord.review_answer_timestamp) - JULIANDAY(ord.review_creation_date))*24 AS length_time_in_hours
  FROM olist_order_reviews_dataset AS ord
  JOIN olist_orders_dataset AS od
    ON ord.order_id=od.order_id
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
 WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
 GROUP BY cd.customer_state
 ORDER BY length_time_in_hours;

customer_state,length_time_in_hours
ES,70.76546530534202
DF,72.78164623419666
SP,73.2065032981464
RS,73.81795524560987
PR,75.70856325266533
GO,76.23113540283555
SC,77.05322331085846
MG,78.18793954605441
RJ,78.34860965308376
BA,81.12596374071995


#### Visualization on length of time between survey sent and completion in top 10 states:

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-D_CustReviewInsight/CustReviewInsight

#### INSIGHT 
In general, customers report a positive experience shopping at Olist. This can be inferred from the high customer satisfaction scores, which average between about 3.9 and 4.2 out of 5 in the top 10 states.

#### E. SELLER INSIGHT

S1E. Popular seller in top 10 states based on value of transaction

In [45]:
%%sql

WITH temp_table AS (
SELECT sd.seller_state,sd.seller_id, SUM(opd.payment_value) AS total_sales
  FROM olist_sellers_dataset AS sd
  JOIN olist_order_items_dataset AS oid
    ON sd.seller_id=oid.seller_id
  JOIN olist_order_payments_dataset AS opd
    ON oid.order_id=opd.order_id
  WHERE sd.seller_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY sd.seller_state,sd.seller_id
  ORDER BY sd.seller_state, total_sales DESC)

SELECT seller_state,seller_id, MAX(total_sales) AS popular_seller
  FROM temp_table
  GROUP BY seller_state
  ORDER BY popular_seller DESC;

seller_state,seller_id,popular_seller
SP,7c67e1448b00f6e969d365cea6b010ab,507166.91
BA,53243585a1d6dc2643021fd1853d8905,284903.08000000066
MG,25c5c91f63607446a97b143d2d535d31,160534.7400000001
RJ,46dc3b2cc0980fb8ec44634e21d2718e,148864.33999999976
PR,ccc4bbb5f32a6ab2b7066a4130f114e3,84993.27999999994
SC,04308b1ee57b6625f47df1d56f00eedf,63184.98999999997
RS,87142160b41353c4e5fca2360caf6f92,49291.87999999997
ES,001cca7ae9ae17fb1caed9dfb1094831,48349.22
DF,44073f8b7e41514de3b7815dd0237f4f,24605.44000000001
GO,9803a40e82e45418ab7fb84091af5231,21265.199999999997
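
These totals may be overstated. Joining order items to payments repeats each payment once per item in the order, and when an order contains items from several sellers, each seller is credited with the full payment. An alternative is to sum `price` and `freight_value` from the order items table, which are recorded per item. The sketch below contrasts the two on invented toy data; the tables are simplified stand-ins, and treating `price + freight_value` as the seller's sales figure is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_id TEXT, seller_id TEXT, price REAL, freight_value REAL);
CREATE TABLE payments (order_id TEXT, payment_value REAL);
-- One order with items from two sellers, settled with a single payment
INSERT INTO order_items VALUES ('o1', 's1', 100.0, 10.0), ('o1', 's2', 50.0, 5.0);
INSERT INTO payments VALUES ('o1', 165.0);
""")

# Each seller is credited with the whole payment of 165.0
via_payments = conn.execute("""
SELECT oi.seller_id, SUM(p.payment_value) AS total_sales
  FROM order_items AS oi
  JOIN payments AS p
    ON oi.order_id = p.order_id
 GROUP BY oi.seller_id
 ORDER BY oi.seller_id
""").fetchall()

# Each seller is credited only with its own items
via_items = conn.execute("""
SELECT seller_id, SUM(price + freight_value) AS total_sales
  FROM order_items
 GROUP BY seller_id
 ORDER BY seller_id
""").fetchall()

print(via_payments)
print(via_items)
```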


#### Visualization on seller with the highest transaction value : 

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-E_SellerInsight/SellerInsightofOlistMarketplace

S2E. Average package delivery time in top 10 states

In [6]:
%%sql 
SELECT cd.customer_state, 
       AVG(JULIANDAY(od.order_delivered_customer_date) - JULIANDAY(od.order_approved_at)) AS actual_delivery_duration_in_days
  FROM olist_orders_dataset AS od
  JOIN olist_customers_dataset AS cd
    ON od.customer_id=cd.customer_id
  WHERE cd.customer_state IN ('SP','RJ','MG','RS','PR','SC','BA','DF','GO','ES')
  GROUP BY cd.customer_state;

customer_state,actual_delivery_duration_in_days
BA,18.84908135763212
DF,12.541091691149184
ES,15.34773699875132
GO,15.080868481377625
MG,11.594499337693213
PR,11.53812961975338
RJ,14.924853095189464
RS,14.814043407762007
SC,14.491752592824431
SP,8.355001819722178


#### Visualization on average package delivery time (in days) in each state : 

https://public.tableau.com/app/profile/novita.eliana/viz/OlistMarketplace-E_SellerInsight/SellerInsightofOlistMarketplace

#### INSIGHT 
The average package delivery time in the top 10 states is about 12 days; SP has the shortest delivery time (about 8 days) and BA the longest (about 19 days).

#### RECOMMENDATION
The key takeaway is that the company should manage customers' expectations of delivery time and improve the efficiency of its delivery system.