This notebook summarizes the public datasets featured in our investigation. For each dataset, it reports:

* Number of distinct customers
* Number of distinct orders/baskets
* Number of distinct products/SKUs
* Number of items/units involved
* Total amount of sales
* Time of the first transaction featured in the dataset
* Time of the last transaction featured in the dataset
* Duration the dataset covers (time between the first and last transactions)
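
The quantities listed above could be derived from line-item transaction data roughly as follows. This is a sketch only: the implementation of the `summary()` method called in the cells of this notebook is not shown, and the function name `summarize_transactions` and the input column names (`customer_id`, `order_id`, `product_id`, `quantity`, `unit_price`, `timestamp`) are assumptions made for illustration. Only the output column names mirror the summary tables.

```python
import pandas as pd


def summarize_transactions(items: pd.DataFrame) -> pd.DataFrame:
    """Build a one-row summary from line-item data (hypothetical column names)."""
    start, end = items["timestamp"].min(), items["timestamp"].max()
    return pd.DataFrame([{
        "num_customers": items["customer_id"].nunique(),
        "num_orders": items["order_id"].nunique(),
        "num_products": items["product_id"].nunique(),
        "num_units": items["quantity"].sum(),
        "total_sales": (items["quantity"] * items["unit_price"]).sum(),
        "start_date": start,
        "end_date": end,
        "duration_days": end - start,  # a Timedelta, as in the summary tables
    }])


# Made-up line items: two customers, two orders, two products
toy = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "order_id": [1, 1, 2],
    "product_id": ["x", "y", "x"],
    "quantity": [2, 1, 3],
    "unit_price": [5.0, 10.0, 5.0],
    "timestamp": pd.to_datetime(
        ["2021-01-01 09:00", "2021-01-01 09:00", "2021-01-03 12:00"]
    ),
})
toy_summary = summarize_transactions(toy)
```

Counting distinct identifiers with `nunique()` rather than counting rows matters here because one order usually spans several line items.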

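These figures are enough to calculate overall metrics such as average basket value (ABV = total sales / number of orders), average basket size (ABS = number of units / number of orders), and average selling price (ASP = total sales / number of units).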
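
Each of the three metrics is a simple ratio of two summary totals. A minimal sketch with made-up numbers follows; `basket_metrics` is a hypothetical helper written for illustration, not part of the `oce_ecomm_abv_calculation` package.

```python
def basket_metrics(num_orders: int, num_units: int, total_sales: float) -> dict:
    """Return ABV, ABS and ASP from three summary totals (hypothetical helper)."""
    return {
        "ABV": total_sales / num_orders,  # average basket value: sales per order
        "ABS": num_units / num_orders,    # average basket size: units per order
        "ASP": total_sales / num_units,   # average selling price: sales per unit
    }


# Made-up totals chosen for easy arithmetic (not taken from either dataset)
metrics = basket_metrics(num_orders=4, num_units=10, total_sales=50.0)
print(metrics)  # {'ABV': 12.5, 'ABS': 2.5, 'ASP': 5.0}
```

Note that ABV = ABS × ASP, so any two of the metrics determine the third.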

In [1]:
import os
import sys

# Add the local src/ directory to the module search path
sys.path.append(os.path.join(os.getcwd(), 'src/'))

import pandas as pd

from oce_ecomm_abv_calculation.datasets.olist_brazilian_ecommerce import OlistBrazilianEcommerceDataset
from oce_ecomm_abv_calculation.datasets.uci_online_retail_ii import UCIOnlineRetailIIDataset

# UCI Online Retail II dataset

In [2]:
uci_online_retail_ii_summary = UCIOnlineRetailIIDataset().summary()

uci_online_retail_ii_summary

Unnamed: 0,num_customers,num_orders,num_products,num_units,total_sales,start_date,end_date,duration_days
0,5852,36594,4621,10690447,17434460.0,2009-12-01 07:45:00,2011-12-09 12:50:00,738 days 05:05:00


In [3]:
# Headline metrics from the single summary row (index 0)
uci_row = uci_online_retail_ii_summary.loc[0]
print(f"ABV: {uci_row['total_sales'] / uci_row['num_orders']}")
print(f"ABS: {uci_row['num_units'] / uci_row['num_orders']}")
print(f"ASP: {uci_row['total_sales'] / uci_row['num_units']}")

ABV: 476.42959903262835
ABS: 292.1366070940591
ASP: 1.6308452534304694


# Olist Brazilian e-Commerce dataset

In [4]:
olist_summary = OlistBrazilianEcommerceDataset().summary()

olist_summary

Unnamed: 0,num_customers,num_orders,num_products,num_units,total_sales,start_date,end_date,duration_days
0,94983,98199,32729,112101,13494400.74,2016-09-04 21:15:19,2018-09-03 09:06:57,728 days 11:51:38


In [5]:
# Headline metrics from the single summary row (index 0)
olist_row = olist_summary.loc[0]
print(f"ABV: {olist_row['total_sales'] / olist_row['num_orders']}")
print(f"ABS: {olist_row['num_units'] / olist_row['num_orders']}")
print(f"ASP: {olist_row['total_sales'] / olist_row['num_units']}")

ABV: 137.418922188617
ABS: 1.1415696697522377
ASP: 120.37716648379586
