<img src="dataset_5-cover.jpg" alt="Circular Image" style="border-radius: 50%; display: block; margin: 0 auto; width: 200px; height: 200px;">
<h1>Superstore Marketing Campaign Dataset</h1>
<p><i>Sample customer data for analysis of a targeted Membership Offer</i></p>
<h2>About Dataset</h2>
<p><b>Context</b>: A superstore is planning its year-end sale. It wants to launch a new offer, a gold membership that gives a 20% discount on all purchases, for only $499 (the membership costs $999 on other days). The offer will be valid only for existing customers, and a phone-call campaign targeting them is currently being planned. Management feels that the best way to reduce the cost of the campaign is to build a predictive model that identifies the customers who are likely to purchase the offer.</p>
<h2>Data Source</h2>
<p><b>Kaggle: </b> <a href="https://www.kaggle.com/datasets/ahsan81/superstore-marketing-campaign-dataset">Superstore Marketing Campaign Dataset</a></p>

In [1]:
import duckdb
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

con = duckdb.connect("data/db/superstore.duckdb")

In [2]:
query = """
SHOW TABLES;
"""
con.sql(query).df()

| | name |
|---|---|
| 0 | campaign_response |
| 1 | customer_behavior |
| 2 | customer_purchases |
| 3 | customers |


# 1. Data Profiling Queries

<p>Data profiling is the process of detecting issues (missing data, outliers, inconsistencies) before the data is used.</p>
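
As one illustration of the "missing data" part of profiling, the sketch below counts NULL values per column. It is a minimal example under stated assumptions, not the notebook's own method: `null_counts` is a hypothetical helper, the rows are toy values rather than real dataset records, and an in-memory `sqlite3` database stands in for the DuckDB file so the snippet runs on its own. The helper only relies on `con.execute(...).fetchone()`, which the notebook's DuckDB connection `con` also provides.

```python
import sqlite3

def null_counts(con, table, columns):
    """Return {column: number of NULL values} for the given table (hypothetical helper)."""
    # COUNT(*) counts every row, COUNT(col) skips NULLs,
    # so the difference is the number of NULLs in that column.
    select_list = ", ".join(
        f'COUNT(*) - COUNT("{col}") AS "{col}"' for col in columns
    )
    row = con.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(columns, row))

# Toy stand-in data (hypothetical values, not taken from the dataset).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE customers (Id INTEGER, Income INTEGER, Education TEXT)")
demo.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 50000, "Graduation"), (2, None, "PhD"), (3, None, None)],
)

demo_nulls = null_counts(demo, "customers", ["Id", "Income", "Education"])
print(demo_nulls)
```

In the notebook, a call along the lines of `null_counts(con, "customers", ["Income", "Education"])` should work the same way, since the SQL uses only standard aggregate functions.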


## 1. Size and Duplicates

### 1. View Table Contents

In [7]:
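# View table contents: first rows plus schema
from IPython.display import display  # explicit import; Jupyter also provides display() implicitly

def view_table(table_name, limit=5):
    """Display the first `limit` rows of `table_name`, followed by its schema."""
    # table_name is interpolated directly into the SQL, so pass trusted names only.
    print(f"\n=== HEAD of {table_name} ===")
    head_df = con.sql(f"SELECT * FROM {table_name} LIMIT {limit};").df()
    display(head_df)

    # DESCRIBE returns one row per column: name, type, nullability, key, default, extra.
    print(f"\n=== SCHEMA of {table_name} ===")
    schema_df = con.sql(f"DESCRIBE {table_name};").df()
    display(schema_df)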

In [8]:
# View All tables
tables = ['customers', 'customer_purchases', 'customer_behavior', 'campaign_response']
for table in tables:
    view_table(table)


=== HEAD of customers ===


| | Id | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | enrollment_date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1826 | 1970 | Graduation | Divorced | 84835 | 0 | 0 | 2014-06-16 |
| 1 | 1 | 1961 | Graduation | Single | 57091 | 0 | 0 | 2014-06-15 |
| 2 | 10476 | 1958 | Graduation | Married | 67267 | 0 | 1 | 2014-05-13 |
| 3 | 1386 | 1967 | Graduation | Together | 32474 | 1 | 1 | 2014-11-05 |
| 4 | 5371 | 1989 | Graduation | Single | 21474 | 1 | 0 | 2014-08-04 |



=== SCHEMA of customers ===


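| | column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|---|
| 0 | Id | BIGINT | YES | | | |
| 1 | Year_Birth | BIGINT | YES | | | |
| 2 | Education | VARCHAR | YES | | | |
| 3 | Marital_Status | VARCHAR | YES | | | |
| 4 | Income | BIGINT | YES | | | |
| 5 | Kidhome | BIGINT | YES | | | |
| 6 | Teenhome | BIGINT | YES | | | |
| 7 | enrollment_date | DATE | YES | | | |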



=== HEAD of customer_purchases ===


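| | Id | MntWines | MntFruits | MntMeatProducts | MntFishProducts | MntSweetProducts | MntGoldProds |
|---|---|---|---|---|---|---|---|
| 0 | 1826 | 189 | 104 | 379 | 111 | 189 | 218 |
| 1 | 1 | 464 | 5 | 64 | 7 | 0 | 37 |
| 2 | 10476 | 134 | 11 | 59 | 15 | 2 | 30 |
| 3 | 1386 | 10 | 0 | 1 | 0 | 0 | 0 |
| 4 | 5371 | 6 | 16 | 24 | 11 | 0 | 34 |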



=== SCHEMA of customer_purchases ===


| | column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|---|
| 0 | Id | BIGINT | YES | | | |
| 1 | MntWines | BIGINT | YES | | | |
| 2 | MntFruits | BIGINT | YES | | | |
| 3 | MntMeatProducts | BIGINT | YES | | | |
| 4 | MntFishProducts | BIGINT | YES | | | |
| 5 | MntSweetProducts | BIGINT | YES | | | |
| 6 | MntGoldProds | BIGINT | YES | | | |



=== HEAD of customer_behavior ===


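| | Id | Recency | NumDealsPurchases | NumWebPurchases | NumCatalogPurchases | NumStorePurchases | NumWebVisitsMonth |
|---|---|---|---|---|---|---|---|
| 0 | 1826 | 0 | 1 | 4 | 4 | 6 | 1 |
| 1 | 1 | 0 | 1 | 7 | 3 | 7 | 5 |
| 2 | 10476 | 0 | 1 | 3 | 2 | 5 | 2 |
| 3 | 1386 | 0 | 1 | 1 | 0 | 2 | 7 |
| 4 | 5371 | 0 | 2 | 3 | 1 | 2 | 7 |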



=== SCHEMA of customer_behavior ===


| | column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|---|
| 0 | Id | BIGINT | YES | | | |
| 1 | Recency | BIGINT | YES | | | |
| 2 | NumDealsPurchases | BIGINT | YES | | | |
| 3 | NumWebPurchases | BIGINT | YES | | | |
| 4 | NumCatalogPurchases | BIGINT | YES | | | |
| 5 | NumStorePurchases | BIGINT | YES | | | |
| 6 | NumWebVisitsMonth | BIGINT | YES | | | |



=== HEAD of campaign_response ===


| | Id | Response | Complain |
|---|---|---|---|
| 0 | 1826 | 1 | 0 |
| 1 | 1 | 1 | 0 |
| 2 | 10476 | 0 | 0 |
| 3 | 1386 | 0 | 0 |
| 4 | 5371 | 1 | 0 |



=== SCHEMA of campaign_response ===


| | column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|---|
| 0 | Id | BIGINT | YES | | | |
| 1 | Response | BIGINT | YES | | | |
| 2 | Complain | BIGINT | YES | | | |
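
The heads and schemas above show that every table carries an `Id` column, but not how many rows each table holds. A minimal sketch of a size check follows. `row_counts` is a hypothetical helper, the rows are toy values, and an in-memory `sqlite3` database stands in for the DuckDB file so the snippet is self-contained; the helper uses only `con.execute(...).fetchone()`, which the notebook's DuckDB connection also supports. Whether the four tables are meant to hold exactly one row per customer is an assumption, so a mismatch in counts is a prompt to investigate rather than proof of an error.

```python
import sqlite3

def row_counts(con, tables):
    """Return {table: number of rows} for each table name (hypothetical helper)."""
    counts = {}
    for table in tables:
        # COUNT(*) includes rows that contain NULLs, so this is the raw table size.
        counts[table] = con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return counts

# Toy stand-in tables (hypothetical values, not taken from the dataset).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE customers (Id INTEGER)")
demo.execute("CREATE TABLE campaign_response (Id INTEGER, Response INTEGER)")
demo.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
demo.executemany("INSERT INTO campaign_response VALUES (?, ?)", [(1, 1), (2, 0)])

demo_counts = row_counts(demo, ["customers", "campaign_response"])
sizes_match = len(set(demo_counts.values())) == 1  # True only if all tables have equal size
print(demo_counts, sizes_match)
```

In the notebook, passing `con` and the existing `tables` list to `row_counts` would give the size of each of the four tables in one call.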


### 2. Duplicates