# 🧠 Run SQL Challenges with DuckDB

This notebook loads the CSV datasets into DuckDB tables and runs each SQL challenge against them, all in one place.


In [9]:
# pip install duckdb

import duckdb
con = duckdb.connect()

In [10]:
# Load all CSVs as tables using read_csv_auto()
con.execute("CREATE TABLE customers AS SELECT * FROM read_csv_auto('../datasets/customers.csv');")
con.execute("CREATE TABLE orders AS SELECT * FROM read_csv_auto('../datasets/orders.csv');")
con.execute("CREATE TABLE products AS SELECT * FROM read_csv_auto('../datasets/products.csv');")

<duckdb.duckdb.DuckDBPyConnection at 0x7c423cf1ed30>

## 🧩 Challenge 01: Top Customers by Revenue

This query identifies the top 5 customers by total revenue and reports each customer's order count alongside it.

In [11]:
con.sql("""
    SELECT
        customer_id,
        COUNT(order_id) AS total_orders,
        SUM(order_amount) AS total_revenue
    FROM orders
    GROUP BY customer_id
    ORDER BY total_revenue DESC
    LIMIT 5
""").df()

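|   | customer_id | total_orders | total_revenue |
|---|-------------|--------------|---------------|
| 0 | 1 | 3 | 3200.0 |
| 1 | 3 | 3 | 2150.0 |
| 2 | 5 | 1 | 800.0 |
| 3 | 2 | 2 | 500.0 |
| 4 | 4 | 1 | 200.0 |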
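
To make the aggregation concrete, here is a minimal plain-Python sketch of the same group, sum, sort, and limit steps. The rows in `sample_orders` are hypothetical values invented for illustration (they are not the contents of `orders.csv`), and `top_customers` is a helper name chosen here, not part of DuckDB.

```python
from collections import defaultdict

# Hypothetical (order_id, customer_id, order_amount) rows, for illustration only.
sample_orders = [
    (101, 1, 50.0),
    (102, 2, 20.0),
    (103, 1, 30.0),
    (104, 3, 70.0),
    (105, 2, 15.0),
]


def top_customers(orders, limit=5):
    """Mirror GROUP BY customer_id, ORDER BY total_revenue DESC, LIMIT."""
    # customer_id -> [total_orders, total_revenue]
    totals = defaultdict(lambda: [0, 0.0])
    for _order_id, customer_id, amount in orders:
        totals[customer_id][0] += 1
        totals[customer_id][1] += amount

    ranked = sorted(
        ((cid, count, revenue) for cid, (count, revenue) in totals.items()),
        # Revenue descending; customer_id ascending is an added tie-breaker
        # (an assumption of this sketch, not something the SQL query does).
        key=lambda row: (-row[2], row[0]),
    )
    return ranked[:limit]


print(top_customers(sample_orders))
```

The sketch breaks ties on `customer_id`. The SQL query has no tie-breaker, so when two customers have the same total, which of them lands inside the `LIMIT 5` cutoff is not guaranteed.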


## 📈 Challenge 02: Monthly Revenue and MoM Growth

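This query calculates total revenue per month and computes the month-over-month (MoM) percentage change, defined as (current month - previous month) / previous month × 100. The first month has no previous month, so its growth is NULL; `NULLIF` also returns NULL rather than dividing by zero when the previous month's revenue is 0.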

In [12]:
con.sql("""
WITH monthly_revenue AS (
    SELECT 
        DATE_TRUNC('month', order_date) AS month,
        SUM(order_amount) AS total_revenue
    FROM orders
    GROUP BY month
),
revenue_with_growth AS (
    SELECT 
        month,
        total_revenue,
        ROUND(
            (total_revenue - LAG(total_revenue) OVER (ORDER BY month)) 
            / NULLIF(LAG(total_revenue) OVER (ORDER BY month), 0) * 100,
        2) AS mom_growth
    FROM monthly_revenue
)
SELECT * FROM revenue_with_growth
ORDER BY month
""").df()

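|   | month | total_revenue | mom_growth |
|---|-------|---------------|------------|
| 0 | 2024-01-01 | 1200.0 |  |
| 1 | 2024-02-01 | 1100.0 | -8.33 |
| 2 | 2024-03-01 | 1350.0 | 22.73 |
| 3 | 2024-04-01 | 1000.0 | -25.93 |
| 4 | 2024-05-01 | 200.0 | -80.0 |
| 5 | 2024-06-01 | 1200.0 | 500.0 |
| 6 | 2024-07-01 | 800.0 | -33.33 |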
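
The growth figures above can be checked by hand from the monthly totals. The following is a minimal plain-Python sketch of the same arithmetic, using the `total_revenue` values copied from the result; `mom_growth` is a hypothetical helper name, and returning `None` stands in for SQL NULL.

```python
# Monthly totals copied from the query result above.
monthly_revenue = [
    ("2024-01", 1200.0),
    ("2024-02", 1100.0),
    ("2024-03", 1350.0),
    ("2024-04", 1000.0),
    ("2024-05", 200.0),
    ("2024-06", 1200.0),
    ("2024-07", 800.0),
]


def mom_growth(previous, current):
    """Percentage change from previous to current, or None when undefined."""
    # No previous month (what LAG yields for the first row) or a zero
    # denominator (what NULLIF guards against) both give an undefined result.
    if previous is None or previous == 0:
        return None
    return round((current - previous) / previous * 100, 2)


growths = []
previous = None
for month, revenue in monthly_revenue:
    growth = mom_growth(previous, revenue)
    growths.append(growth)
    print(month, revenue, growth)
    previous = revenue
```

For example, February works out to (1100 - 1200) / 1200 × 100 ≈ -8.33, matching the second row of the result.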
