# SQL Analysis on Retail Sales Data

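This notebook shows how SQL can answer common business questions about a cleaned retail sales dataset (10,000 rows, 45 columns).
The data is loaded with pandas into an in-memory SQLite database, so each analysis step is written as an ordinary SQL query and run from Python.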

Skills Demonstrated:
- SQL aggregations
- GROUP BY analysis
- Date-based analysis
- Customer & product insights

In [24]:
print("="*60)
print("STEP 1: IMPORTING REQUIRED LIBRARIES")
print("="*60)

import pandas as pd
import sqlite3

print("Libraries imported successfully")



STEP 1: IMPORTING REQUIRED LIBRARIES
Libraries imported successfully


In [25]:
print("\n" + "="*60)
print("STEP 2: LOADING CLEANED RETAIL DATA")
print("="*60)

df = pd.read_csv('data/processed/cleaned_retail_sales.csv')

print("Dataset loaded successfully")
print("Shape of dataset:", df.shape)
print("\nColumns:")
print(df.columns.tolist())




STEP 2: LOADING CLEANED RETAIL DATA
Dataset loaded successfully
Shape of dataset: (10000, 45)

Columns:
['Order_ID', 'Order_Date', 'Ship_Date', 'Customer_ID', 'Customer_Name', 'Segment', 'Region', 'Product_ID', 'Product_Category', 'Product_Sub_Category', 'Product_Name', 'Sales', 'Quantity', 'Discount', 'Profit', 'Shipping_Cost', 'Order_Priority', 'Unit_Price', 'Revenue', 'Sales_Original', 'Profit_Original', 'Year', 'Month', 'Month_Name', 'Quarter', 'Day', 'Day_of_Week', 'Day_Name', 'Week_of_Year', 'Is_Weekend', 'Is_Month_Start', 'Is_Month_End', 'Discount_Amount', 'Net_Revenue', 'Profit_Margin', 'Profit_Ratio', 'Delivery_Days', 'Delivery_Category', 'Customer_Order_Count', 'Is_Repeat_Customer', 'Product_Total_Sales', 'Product_Avg_Sales', 'Product_Order_Count', 'Sales_Category', 'Season']


In [26]:
print("\n" + "="*60)
print("STEP 3: PREVIEWING DATA")
print("="*60)

display(df.head())




STEP 3: PREVIEWING DATA


|   | Order_ID | Order_Date | Ship_Date | Customer_ID | Customer_Name | Segment | Region | Product_ID | Product_Category | Product_Sub_Category | ... | Profit_Ratio | Delivery_Days | Delivery_Category | Customer_Order_Count | Is_Repeat_Customer | Product_Total_Sales | Product_Avg_Sales | Product_Order_Count | Sales_Category | Season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ORD000001 | 2022-01-01 00:00:00 | 2022-01-02 00:00:00 | CUST1127 | Customer_1275 | Consumer | Central | PROD0215 | Office Supplies | Appliances | ... | 0.168592 | 1 | Same/Next Day | 5 | 1 | 3237.383976 | 119.90311 | 27 | Very High | Winter |
| 1 | ORD000002 | 2022-01-01 01:00:00 | 2022-01-02 01:00:00 | CUST1460 | Customer_1334 | Corporate | East | PROD0002 | Furniture | Shirts | ... | 0.071109 | 1 | Same/Next Day | 8 | 1 | 2176.965626 | 114.577138 | 19 | Very High | Winter |
| 2 | ORD000003 | 2022-01-01 02:00:00 | 2022-01-02 02:00:00 | CUST0861 | Customer_1744 | Corporate | East | PROD0121 | Electronics | Paper | ... | 1.127551 | 1 | Same/Next Day | 3 | 1 | 2015.052866 | 87.610994 | 23 | Low | Winter |
| 3 | ORD000004 | 2022-01-01 03:00:00 | 2022-01-02 03:00:00 | CUST1295 | Customer_833 | Corporate | East | PROD0103 | Clothing | Appliances | ... | 0.051831 | 1 | Same/Next Day | 8 | 1 | 2854.59458 | 95.153153 | 30 | Very High | Winter |
| 4 | ORD000005 | 2022-01-01 04:00:00 | 2022-01-02 04:00:00 | CUST1131 | Customer_140 | Consumer | South | PROD0149 | Office Supplies | Accessories | ... | -0.407177 | 1 | Same/Next Day | 5 | 1 | 2808.96634 | 112.358654 | 25 | Very High | Winter |


In [27]:
print("\n" + "="*60)
print("STEP 4: CREATING SQLITE DATABASE")
print("="*60)

conn = sqlite3.connect(':memory:')
print("SQLite in-memory database created")




STEP 4: CREATING SQLITE DATABASE
SQLite in-memory database created
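
A `:memory:` database lives only as long as its connection, which suits a notebook that rebuilds the table on every run. If the table should survive a kernel restart, a file-backed database is one alternative. The sketch below is illustrative only: the file name `retail_demo.db` and the two-column table are hypothetical stand-ins, not part of this notebook's data.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name, placed in a temporary directory for this demo.
db_path = os.path.join(tempfile.mkdtemp(), "retail_demo.db")

# Write one row to a file-backed database, then close the connection.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE sales_data (Order_ID TEXT, Sales REAL)")
conn.execute("INSERT INTO sales_data VALUES ('ORD000001', 120.5)")
conn.commit()
conn.close()

# A new connection to the same file still sees the table and its row.
conn = sqlite3.connect(db_path)
persisted_rows = conn.execute("SELECT COUNT(*) FROM sales_data").fetchone()[0]
conn.close()

# A fresh in-memory database, by contrast, always starts with no tables.
mem = sqlite3.connect(":memory:")
mem_tables = mem.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
mem.close()

print(persisted_rows, mem_tables)
```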


In [28]:
print("\n" + "="*60)
print("STEP 5: LOADING DATA INTO SQL TABLE")
print("="*60)

df.to_sql('sales_data', conn, index=False, if_exists='replace')

print("Table 'sales_data' created in SQL")




STEP 5: LOADING DATA INTO SQL TABLE
Table 'sales_data' created in SQL
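
`to_sql` chooses a SQLite column type for each pandas dtype, and because the CSV was read without date parsing, `Order_Date` is stored as text. It can be worth confirming the resulting schema before querying. The sketch below uses only the standard-library `sqlite3` module and a hypothetical three-column miniature of `sales_data`; the rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of sales_data; the real table has 45 columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (Order_ID TEXT, Order_Date TEXT, Sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        ("ORD000001", "2022-01-01 00:00:00", 120.5),
        ("ORD000002", "2022-01-01 01:00:00", 80.0),
    ],
)

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(sales_data)")}
row_count = conn.execute("SELECT COUNT(*) FROM sales_data").fetchone()[0]

print(schema)
print(row_count)
```

Running the same `PRAGMA table_info(sales_data)` against the notebook's own connection would list all 45 columns with the types pandas assigned.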


In [29]:
print("\n" + "="*60)
print("STEP 6: VERIFYING SQL TABLE")
print("="*60)

query = "SELECT COUNT(*) AS total_rows FROM sales_data"
result = pd.read_sql(query, conn)
display(result)




STEP 6: VERIFYING SQL TABLE


|   | total_rows |
|---|---|
| 0 | 10000 |


In [30]:
print("\n" + "="*60)
print("STEP 7: TOTAL REVENUE & ORDER COUNT")
print("="*60)

query = """
SELECT 
    COUNT(DISTINCT Order_ID) AS total_orders,
    SUM(Sales) AS total_revenue,
    AVG(Sales) AS avg_order_value
FROM sales_data
"""

result = pd.read_sql(query, conn)
display(result)




STEP 7: TOTAL REVENUE & ORDER COUNT


|   | total_orders | total_revenue | avg_order_value |
|---|---|---|---|
| 0 | 10000 | 1078671.0 | 107.867098 |
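
In this dataset every row is its own order (10,000 rows and 10,000 distinct `Order_ID` values), so averaging `Sales` over rows and averaging over orders give the same figure. If an order could span several rows, the two would diverge. The sketch below illustrates the difference on three hypothetical rows; the order IDs and amounts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (Order_ID TEXT, Sales REAL)")

# Hypothetical rows: ORD1 has two line items, ORD2 has one.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("ORD1", 100.0), ("ORD1", 50.0), ("ORD2", 30.0)],
)

avg_per_row, avg_per_order = conn.execute(
    """
    SELECT
        AVG(Sales)                            AS avg_per_row,
        SUM(Sales) / COUNT(DISTINCT Order_ID) AS avg_per_order
    FROM sales_data
    """
).fetchone()

print(avg_per_row)    # 180 / 3 rows
print(avg_per_order)  # 180 / 2 orders
```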


In [31]:
print("\n" + "="*60)
print("STEP 8: REVENUE BY PRODUCT CATEGORY")
print("="*60)

query = """
SELECT 
    Product_Category,
    SUM(Sales) AS total_revenue,
    COUNT(Order_ID) AS total_orders
FROM sales_data
GROUP BY Product_Category
ORDER BY total_revenue DESC
"""

result = pd.read_sql(query, conn)
display(result)



STEP 8: REVENUE BY PRODUCT CATEGORY


|   | Product_Category | total_revenue | total_orders |
|---|---|---|---|
| 0 | Electronics | 328422.713112 | 3029 |
| 1 | Office Supplies | 321280.864068 | 2949 |
| 2 | Furniture | 214958.032671 | 2011 |
| 3 | Clothing | 214009.369624 | 2011 |
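
Absolute revenue is easier to compare once it is expressed as a share of the total; Electronics, for instance, is roughly 30% of the 1,078,671 overall. One way to compute that in the same query is a scalar subquery for the grand total. The sketch below assumes a simplified two-column table with hypothetical amounts chosen so the shares can be checked by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (Product_Category TEXT, Sales REAL)")

# Hypothetical rows; total revenue is 1000.0.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [
        ("Electronics", 300.0),
        ("Electronics", 200.0),
        ("Furniture", 300.0),
        ("Clothing", 200.0),
    ],
)

shares = conn.execute(
    """
    SELECT
        Product_Category,
        SUM(Sales) AS total_revenue,
        -- Scalar subquery supplies the grand total for every group.
        ROUND(100.0 * SUM(Sales) / (SELECT SUM(Sales) FROM sales_data), 1)
            AS revenue_share_pct
    FROM sales_data
    GROUP BY Product_Category
    ORDER BY total_revenue DESC
    """
).fetchall()

for row in shares:
    print(row)
```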


In [32]:
print("\n" + "="*60)
print("STEP 9: REVENUE BY REGION")
print("="*60)

query = """
SELECT 
    Region,
    SUM(Sales) AS total_revenue,
    COUNT(DISTINCT Customer_ID) AS unique_customers
FROM sales_data
GROUP BY Region
ORDER BY total_revenue DESC
"""

result = pd.read_sql(query, conn)
display(result)



STEP 9: REVENUE BY REGION


|   | Region | total_revenue | unique_customers |
|---|---|---|---|
| 0 | East | 318581.371468 | 1553 |
| 1 | Central | 275688.179783 | 1435 |
| 2 | West | 272330.955705 | 1436 |
| 3 | South | 212070.472519 | 1257 |
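
Dividing regional revenue by the number of distinct customers gives a rough revenue-per-customer figure. One caveat: `COUNT(DISTINCT Customer_ID)` is evaluated per group, so a customer who orders in two regions is counted once in each, and the regional counts need not sum to the overall customer count. The sketch below shows both points on hypothetical rows, where customer `C1` buys in two regions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (Region TEXT, Customer_ID TEXT, Sales REAL)"
)

# Hypothetical rows: C1 appears in both East and South.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        ("East", "C1", 100.0),
        ("East", "C2", 200.0),
        ("South", "C1", 60.0),
    ],
)

by_region = conn.execute(
    """
    SELECT
        Region,
        SUM(Sales)                               AS total_revenue,
        COUNT(DISTINCT Customer_ID)              AS unique_customers,
        SUM(Sales) / COUNT(DISTINCT Customer_ID) AS revenue_per_customer
    FROM sales_data
    GROUP BY Region
    ORDER BY total_revenue DESC
    """
).fetchall()

# Distinct customers across the whole table, ignoring region.
overall_customers = conn.execute(
    "SELECT COUNT(DISTINCT Customer_ID) FROM sales_data"
).fetchone()[0]

print(by_region)
print(overall_customers)
```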


In [33]:
print("\n" + "="*60)
print("STEP 10: MONTHLY SALES TREND")
print("="*60)

query = """
SELECT 
    strftime('%Y-%m', Order_Date) AS month,
    SUM(Sales) AS total_revenue
FROM sales_data
GROUP BY month
ORDER BY month
"""

result = pd.read_sql(query, conn)
display(result)



STEP 10: MONTHLY SALES TREND


|   | month | total_revenue |
|---|---|---|
| 0 | 2022-01 | 161079.387654 |
| 1 | 2022-02 | 126799.329239 |
| 2 | 2022-03 | 81062.83963 |
| 3 | 2022-04 | 75840.668735 |
| 4 | 2022-05 | 80056.231269 |
| 5 | 2022-06 | 76430.127987 |
| 6 | 2022-07 | 80699.415429 |
| 7 | 2022-08 | 82192.58599 |
| 8 | 2022-09 | 76398.577746 |
| 9 | 2022-10 | 77173.028698 |
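
A trend is often easier to read as a month-over-month change than as raw totals. One possible approach is to wrap the monthly aggregation in a common table expression and apply the `LAG` window function. This is a sketch under two assumptions: the bundled SQLite is version 3.25 or newer (required for window functions), and dates are stored as `YYYY-MM-DD HH:MM:SS` text as in the preview above. The rows themselves are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (Order_Date TEXT, Sales REAL)")

# Hypothetical rows spread across three months.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [
        ("2022-01-05 10:00:00", 100.0),
        ("2022-01-20 09:00:00", 100.0),
        ("2022-02-03 14:00:00", 150.0),
        ("2022-03-10 08:00:00", 300.0),
    ],
)

trend = conn.execute(
    """
    WITH monthly AS (
        SELECT
            strftime('%Y-%m', Order_Date) AS month,
            SUM(Sales) AS total_revenue
        FROM sales_data
        GROUP BY month
    )
    SELECT
        month,
        total_revenue,
        -- LAG looks one row back in month order; NULL for the first month.
        total_revenue - LAG(total_revenue) OVER (ORDER BY month) AS change_vs_prev
    FROM monthly
    ORDER BY month
    """
).fetchall()

for row in trend:
    print(row)
```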


In [34]:
print("\n" + "="*60)
print("STEP 11: TOP 10 CUSTOMERS BY REVENUE")
print("="*60)

query = """
SELECT 
    Customer_ID,
    SUM(Sales) AS total_revenue
FROM sales_data
GROUP BY Customer_ID
ORDER BY total_revenue DESC
LIMIT 10
"""

result = pd.read_sql(query, conn)
display(result)



STEP 11: TOP 10 CUSTOMERS BY REVENUE


|   | Customer_ID | total_revenue |
|---|---|---|
| 0 | CUST1097 | 1706.776827 |
| 1 | CUST1800 | 1636.436219 |
| 2 | CUST1939 | 1618.691144 |
| 3 | CUST1463 | 1581.229018 |
| 4 | CUST1295 | 1572.574497 |
| 5 | CUST0685 | 1552.58554 |
| 6 | CUST0947 | 1509.431327 |
| 7 | CUST1717 | 1491.50545 |
| 8 | CUST1261 | 1488.475286 |
| 9 | CUST1846 | 1478.577891 |
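
The top-N cutoff can be passed as a bound parameter instead of being hard-coded, and a secondary sort key keeps the ranking stable when two customers tie on revenue. The sketch below is a minimal illustration: `top_customers` is a hypothetical helper, not part of this notebook, and the rows are invented.

```python
import sqlite3


def top_customers(conn, n):
    """Return the n highest-revenue customers as (id, revenue, row count)."""
    return conn.execute(
        """
        SELECT
            Customer_ID,
            SUM(Sales) AS total_revenue,
            COUNT(*)   AS order_rows
        FROM sales_data
        GROUP BY Customer_ID
        -- Customer_ID breaks ties so the result order is deterministic.
        ORDER BY total_revenue DESC, Customer_ID
        LIMIT ?
        """,
        (n,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (Customer_ID TEXT, Sales REAL)")

# Hypothetical rows: C1 and C4 tie at 150.0 in total revenue.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("C1", 100.0), ("C1", 50.0), ("C2", 120.0), ("C3", 30.0), ("C4", 150.0)],
)

top_two = top_customers(conn, 2)
print(top_two)
```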


## Conclusion

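This notebook demonstrates:
- Practical SQL on a 10,000-row retail dataset, run through SQLite from Python
- Business-driven queries: revenue by product category, region, month, and customer
- Clean, readable SQL built on aggregations, GROUP BY, ORDER BY, and LIMIT
- Integration of SQL with Python via `sqlite3` and pandas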

Next Steps:
- Use these outputs for Power BI dashboards
- Add SQL bullets to resume
