```python
import duckdb
import pandas as pd
```

```python
# Init DB: in-memory DuckDB connection
con = duckdb.connect()

# Init data: create and populate the sales, menu, and members tables
con.execute("""
CREATE TABLE sales (
  customer_id VARCHAR,
  order_date DATE,
  product_id INTEGER
);

INSERT INTO sales VALUES
  ('A', '2021-01-01', 1),
  ('A', '2021-01-01', 2),
  ('A', '2021-01-07', 2),
  ('A', '2021-01-10', 3),
  ('A', '2021-01-11', 3),
  ('A', '2021-01-11', 3),
  ('B', '2021-01-01', 2),
  ('B', '2021-01-02', 2),
  ('B', '2021-01-04', 1),
  ('B', '2021-01-11', 1),
  ('B', '2021-01-16', 3),
  ('B', '2021-02-01', 3),
  ('C', '2021-01-01', 3),
  ('C', '2021-01-01', 3),
  ('C', '2021-01-07', 3);

CREATE TABLE menu (
  product_id INTEGER,
  product_name VARCHAR,
  price INTEGER
);

INSERT INTO menu VALUES
  (1, 'sushi', 10),
  (2, 'curry', 15),
  (3, 'ramen', 12);

CREATE TABLE members (
  customer_id VARCHAR,
  join_date DATE
);

INSERT INTO members VALUES
  ('A', '2021-01-07'),
  ('B', '2021-01-09');
""")
```



# Question 1 - What is the total amount each customer spent at the restaurant?

## 🧠 Thought Process

### 🎯 Goal
Calculate the total amount each customer spent at the restaurant.

---

### 💼 Business Context
Understanding how much each customer spends is important for identifying high-value customers. These insights can further be used in:

- **Loyalty programs**
- **Targeted promotions**
- **Upselling strategies**

---

### 🔍 Problem Breakdown
- **Data Source**: `sales` tells us what each customer ordered; `menu` has the price.
- **Join Needed**: Yes — match `sales.product_id` with `menu.product_id`.
- **Grouping**: Group by `customer_id` to calculate total per customer.
- **Aggregation**: Sum all the product prices each customer ordered.
- **Assumption**: Each row in `sales` represents **one unit** of the product, i.e., no quantity column.
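As a cross-check on the breakdown above, the same join → group → sum logic can be sketched in plain Python without a database. This is a minimal sketch, assuming the sample rows are copied by hand into a list and that each `sales` row is one unit; `menu_price` and `total_spent` are illustrative names, not part of the notebook.

```python
# Plain-Python sketch of the Q1 logic (illustrative only), assuming the same
# sample rows as the DuckDB tables: (customer_id, order_date, product_id).
from collections import defaultdict

sales = [
    ("A", "2021-01-01", 1), ("A", "2021-01-01", 2), ("A", "2021-01-07", 2),
    ("A", "2021-01-10", 3), ("A", "2021-01-11", 3), ("A", "2021-01-11", 3),
    ("B", "2021-01-01", 2), ("B", "2021-01-02", 2), ("B", "2021-01-04", 1),
    ("B", "2021-01-11", 1), ("B", "2021-01-16", 3), ("B", "2021-02-01", 3),
    ("C", "2021-01-01", 3), ("C", "2021-01-01", 3), ("C", "2021-01-07", 3),
]

# Hypothetical lookup built from the menu table: product_id -> price.
# The join on product_id becomes a dictionary lookup.
menu_price = {1: 10, 2: 15, 3: 12}


def total_spent(sales, menu_price):
    """Sum the price of every sales row per customer (one row = one unit)."""
    totals = defaultdict(int)
    for customer_id, _order_date, product_id in sales:
        # Mirror the INNER JOIN: skip rows whose product has no menu entry
        if product_id in menu_price:
            totals[customer_id] += menu_price[product_id]
    return dict(sorted(totals.items()))


print(total_spent(sales, menu_price))
```

Running it should print `{'A': 76, 'B': 74, 'C': 36}`, the same totals as the DuckDB query.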

---

### 🛠 Approach & SQL Explanation
To solve this, I joined the `sales` table with the `menu` table using `product_id`, then grouped the results by customer and summed the prices.

---

### ✅ Result Validation
Manually verified for customer C:

- Three ramen ($12) → `3 × 12 = $36` ✅
- The result contains exactly one row per distinct `customer_id` (A, B, C), so the join neither dropped nor duplicated a customer
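The same hand check extends to customers A and B: count each product per customer in `sales`, then multiply by its `menu` price (sushi 10, curry 15, ramen 12).

$$
\begin{aligned}
\text{A} &: 1 \times 10 + 2 \times 15 + 3 \times 12 = 10 + 30 + 36 = 76 \\
\text{B} &: 2 \times 10 + 2 \times 15 + 2 \times 12 = 20 + 30 + 24 = 74 \\
\text{C} &: 3 \times 12 = 36
\end{aligned}
$$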


```python
query = """
SELECT
    s.customer_id,
    SUM(m.price) AS total_spent
FROM sales AS s
INNER JOIN menu AS m
    ON s.product_id = m.product_id
GROUP BY s.customer_id
-- GROUP BY alone does not guarantee row order, so sort explicitly
ORDER BY s.customer_id;
"""

result = con.execute(query).fetchdf()
result
```

|   | customer_id | total_spent |
|---|-------------|-------------|
| 0 | A           | 76.0        |
| 1 | B           | 74.0        |
| 2 | C           | 36.0        |


# Question 2 - How many days has each customer visited the restaurant?

## 🧠 Thought Process

### 🎯 Goal
Calculate the total number of **distinct days** each customer visited the restaurant.

---

### 💼 Business Context
Visit frequency is a key indicator of customer engagement and retention. Knowing how often each customer comes can help answer questions such as:

- Who are the frequent visitors that return regularly?
- Who might be loyal but not high-spending, and therefore worth nurturing?
- Is customer A visiting once a month, or every other day?

The insights gained from this analysis can inform:

- Rewarding frequent visitors
- Timing promotions (e.g., targeting customers after X days of inactivity)
- Segmenting customers into high-frequency vs. low-frequency groups for tailored strategies

---

### 🔍 Problem Breakdown
- **Data Source**: The `sales` table contains both `customer_id` and `order_date`, which are sufficient to answer this question.
- **Join Needed**: No
- **Grouping**: Group by `customer_id`
- **Aggregation**: Count the number of **unique** `order_date` values per customer
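The breakdown above can be sketched in plain Python by collecting each customer's order dates into a set, so that several items bought on the same day collapse into a single visit. This is a minimal sketch, assuming the `(customer_id, order_date)` pairs are copied by hand from the sample `sales` rows; `visit_day_count` is an illustrative name.

```python
# Plain-Python sketch of the Q2 logic (illustrative only): a set per customer
# keeps each order date once, like COUNT(DISTINCT order_date).
from collections import defaultdict

# (customer_id, order_date) pairs copied from the sample sales rows
sales = [
    ("A", "2021-01-01"), ("A", "2021-01-01"), ("A", "2021-01-07"),
    ("A", "2021-01-10"), ("A", "2021-01-11"), ("A", "2021-01-11"),
    ("B", "2021-01-01"), ("B", "2021-01-02"), ("B", "2021-01-04"),
    ("B", "2021-01-11"), ("B", "2021-01-16"), ("B", "2021-02-01"),
    ("C", "2021-01-01"), ("C", "2021-01-01"), ("C", "2021-01-07"),
]


def visit_day_count(sales):
    """Count distinct order dates per customer (GROUP BY + COUNT(DISTINCT))."""
    days = defaultdict(set)
    for customer_id, order_date in sales:
        days[customer_id].add(order_date)  # duplicates are ignored by the set
    return {customer_id: len(d) for customer_id, d in sorted(days.items())}


print(visit_day_count(sales))  # {'A': 4, 'B': 6, 'C': 2}
```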

---

### 🧭 Assumptions
- Each row in `sales` represents a separate order (i.e., a unique transaction)
- A customer may place multiple orders on the same day, but that still counts as **one visit**
- All `order_date` values are valid and correctly recorded

---

### 🛠 Approach & SQL Explanation
To solve this, I grouped the `sales` rows by `customer_id` and applied `COUNT(DISTINCT ...)` to the order dates, so each calendar date is counted only once per customer no matter how many items were ordered that day.
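To see why `DISTINCT` matters, this sketch counts both raw rows and distinct dates per customer. It swaps in Python's built-in `sqlite3` module for DuckDB so it runs without extra dependencies, and stores dates as ISO-8601 text (an assumption, since SQLite has no native DATE type). The connection is named `lite_con`, a hypothetical name chosen so it does not overwrite the notebook's DuckDB `con`.

```python
# Contrast COUNT(*) (rows per customer) with COUNT(DISTINCT order_date)
# (visit days). Uses the standard-library sqlite3 module instead of DuckDB.
import sqlite3

lite_con = sqlite3.connect(":memory:")
lite_con.execute(
    "CREATE TABLE sales (customer_id TEXT, order_date TEXT, product_id INTEGER)"
)
lite_con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("A", "2021-01-01", 1), ("A", "2021-01-01", 2), ("A", "2021-01-07", 2),
        ("A", "2021-01-10", 3), ("A", "2021-01-11", 3), ("A", "2021-01-11", 3),
        ("B", "2021-01-01", 2), ("B", "2021-01-02", 2), ("B", "2021-01-04", 1),
        ("B", "2021-01-11", 1), ("B", "2021-01-16", 3), ("B", "2021-02-01", 3),
        ("C", "2021-01-01", 3), ("C", "2021-01-01", 3), ("C", "2021-01-07", 3),
    ],
)

rows = lite_con.execute(
    """
    SELECT customer_id,
           COUNT(*)                   AS order_rows,
           COUNT(DISTINCT order_date) AS visit_day_count
    FROM sales
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()

for customer_id, order_rows, visit_days in rows:
    print(customer_id, order_rows, visit_days)
```

Customer A has 6 rows but only 4 visit days, while B's 6 rows fall on 6 different dates, so `COUNT(*)` alone would overstate A's visits.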

---

### ✅ Result Validation
Manually verified for customer C:

- Customer C has 3 rows in total
- Two of those rows share the same `order_date` (2021-01-01), so the number of **distinct visit days** is 2, which matches the SQL result
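Extending the check to every customer: collecting each customer's order dates (all in 2021, written as MM-DD) into a set and taking its size gives the visit-day count, while the row count in parentheses shows how many same-day rows were collapsed.

$$
\begin{aligned}
\text{A} &: \left|\{\text{01-01},\ \text{01-07},\ \text{01-10},\ \text{01-11}\}\right| = 4 \quad (6\ \text{rows}) \\
\text{B} &: \left|\{\text{01-01},\ \text{01-02},\ \text{01-04},\ \text{01-11},\ \text{01-16},\ \text{02-01}\}\right| = 6 \quad (6\ \text{rows}) \\
\text{C} &: \left|\{\text{01-01},\ \text{01-07}\}\right| = 2 \quad (3\ \text{rows})
\end{aligned}
$$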


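```python
query = """
SELECT
    customer_id,
    -- order_date is already a DATE column, so no CAST is needed
    COUNT(DISTINCT order_date) AS visit_day_count
FROM sales
GROUP BY customer_id
ORDER BY customer_id;
"""

result = con.execute(query).fetchdf()
result
```

|   | customer_id | visit_day_count |
|---|-------------|-----------------|
| 0 | A           | 4               |
| 1 | B           | 6               |
| 2 | C           | 2               |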
