In [1]:
import duckdb
import pandas as pd

In [2]:
# Init DB
con = duckdb.connect()

# Init Data
con.execute("""
CREATE TABLE sales (
  customer_id VARCHAR,
  order_date DATE,
  product_id INTEGER
);

INSERT INTO sales VALUES
  ('A', '2021-01-01', 1),
  ('A', '2021-01-01', 2),
  ('A', '2021-01-07', 2),
  ('A', '2021-01-10', 3),
  ('A', '2021-01-11', 3),
  ('A', '2021-01-11', 3),
  ('B', '2021-01-01', 2),
  ('B', '2021-01-02', 2),
  ('B', '2021-01-04', 1),
  ('B', '2021-01-11', 1),
  ('B', '2021-01-16', 3),
  ('B', '2021-02-01', 3),
  ('C', '2021-01-01', 3),
  ('C', '2021-01-01', 3),
  ('C', '2021-01-07', 3);

CREATE TABLE menu (
  product_id INTEGER,
  product_name VARCHAR,
  price INTEGER
);

INSERT INTO menu VALUES
  (1, 'sushi', 10),
  (2, 'curry', 15),
  (3, 'ramen', 12);

CREATE TABLE members (
  customer_id VARCHAR,
  join_date DATE
);

INSERT INTO members VALUES
  ('A', '2021-01-07'),
  ('B', '2021-01-09');
""")


<duckdb.duckdb.DuckDBPyConnection at 0x10c597e70>

# Question 1 - What is the total amount each customer spent at the restaurant?

## 🧠 Thought Process

### 🎯 Goal
Calculate the total amount each customer spent at the restaurant.

---

### 💼 Business Context
Understanding how much each customer spends is important for identifying high-value customers. These insights can further be used in:

- **Loyalty programs**
- **Targeted promotions**
- **Upselling strategies**

---

### 🔍 Problem Breakdown
- **Data Source**: `sales` tells us what each customer ordered; `menu` has the price.
- **Join Needed**: Yes. Match `sales.product_id` with `menu.product_id`.
- **Grouping**: Group by `customer_id` to calculate total per customer.
- **Aggregation**: Sum all the product prices each customer ordered.
- **Assumption**: Each row in `sales` represents **one unit** of the product, i.e., no quantity column.

---

### 🛠 Approach & SQL Explanation
To solve this, I joined the `sales` table with the `menu` table using `product_id`, then grouped the results by customer and summed the prices.

---

### ✅ Result Validation
Manually verified for customer C:

- Three ramen at $12 each → `3 × 12 = $36` ✅
- The number of result rows (3) also matches the number of distinct customer IDs
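
The manual check can be extended to every customer with a small plain-Python recomputation. This is only a sketch for cross-checking: the `sales` list and `prices` dictionary are hand-copied from the `INSERT` statements in the setup cell, and the variable names are illustrative rather than part of the notebook.

```python
from collections import defaultdict

# (customer_id, order_date, product_id), copied from the INSERT INTO sales statement
sales = [
    ("A", "2021-01-01", 1), ("A", "2021-01-01", 2), ("A", "2021-01-07", 2),
    ("A", "2021-01-10", 3), ("A", "2021-01-11", 3), ("A", "2021-01-11", 3),
    ("B", "2021-01-01", 2), ("B", "2021-01-02", 2), ("B", "2021-01-04", 1),
    ("B", "2021-01-11", 1), ("B", "2021-01-16", 3), ("B", "2021-02-01", 3),
    ("C", "2021-01-01", 3), ("C", "2021-01-01", 3), ("C", "2021-01-07", 3),
]

# product_id -> price, copied from the INSERT INTO menu statement
prices = {1: 10, 2: 15, 3: 12}

# Each sales row counts as one unit, so add the unit price once per row
total_spent = defaultdict(int)
for customer_id, _order_date, product_id in sales:
    total_spent[customer_id] += prices[product_id]

print(dict(total_spent))  # {'A': 76, 'B': 74, 'C': 36}
```

These totals agree with the `total_spent` column returned by the SQL query.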


In [3]:
query = """
SELECT 
    s.customer_id, SUM(m.price) AS total_spent
FROM
    sales AS s
        INNER JOIN
    menu AS m ON s.product_id = m.product_id
GROUP BY s.customer_id;
"""

result = con.execute(query).fetchdf()
result

Unnamed: 0,customer_id,total_spent
0,A,76.0
1,B,74.0
2,C,36.0


# Question 2 - How many days has each customer visited the restaurant?

## 🧠 Thought Process

### 🎯 Goal
Calculate the total number of **distinct days** each customer visited the restaurant.

---

### 💼 Business Context
Visit frequency is a key indicator of customer engagement and retention. Knowing how often each customer comes can help answer questions such as:

- Who are the frequent visitors that return regularly?
- Who might be loyal but not high-spending, and therefore worth nurturing?
- Is customer A visiting once a month, or every other day?

The insights gained from this analysis can inform:

- Rewarding frequent visitors
- Timing promotions (e.g., targeting customers after X days of inactivity)
- Segmenting customers into high-frequency vs. low-frequency groups for tailored strategies

---

### 🔍 Problem Breakdown
- **Data Source**: The `sales` table contains both `customer_id` and `order_date`, which are sufficient to answer this question.
- **Join Needed**: No
- **Grouping**: Group by `customer_id`
- **Aggregation**: Count the number of **unique** `order_date` values per customer

---

### 🧭 Assumptions
- Each row in `sales` represents a separate order (i.e., a unique transaction)
- A customer may place multiple orders on the same day, but that still counts as **one visit**
- All `order_date` values are valid and correctly recorded

---

### 🛠 Approach & SQL Explanation
To solve this, I used `COUNT(DISTINCT ...)` on `order_date` so that each visit date is counted only once per customer, grouping the results by `customer_id`.

---

### ✅ Result Validation
Manually verified for customer A:

- Customer A has 6 rows in total
- Two pairs of rows share an `order_date` (`2021-01-01` and `2021-01-11`), so the number of **distinct visit days** is 4, and the SQL result matched this expected value
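
The same check can be run for all customers at once by collecting each customer's dates into a set. This is a plain-Python sketch for cross-checking only; the `visits` list is hand-copied from the `sales` rows in the setup cell, and the names are illustrative.

```python
from collections import defaultdict

# (customer_id, order_date), copied from the INSERT INTO sales statement
visits = [
    ("A", "2021-01-01"), ("A", "2021-01-01"), ("A", "2021-01-07"),
    ("A", "2021-01-10"), ("A", "2021-01-11"), ("A", "2021-01-11"),
    ("B", "2021-01-01"), ("B", "2021-01-02"), ("B", "2021-01-04"),
    ("B", "2021-01-11"), ("B", "2021-01-16"), ("B", "2021-02-01"),
    ("C", "2021-01-01"), ("C", "2021-01-01"), ("C", "2021-01-07"),
]

# A set keeps each date once, which mirrors COUNT(DISTINCT order_date)
visit_days = defaultdict(set)
for customer_id, order_date in visits:
    visit_days[customer_id].add(order_date)

visit_day_count = {customer: len(days) for customer, days in visit_days.items()}
print(visit_day_count)  # {'A': 4, 'B': 6, 'C': 2}
```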


In [4]:
query = """
SELECT 
    customer_id, COUNT(DISTINCT CAST(order_date AS DATE)) AS visit_day_count
FROM
    sales
GROUP BY customer_id;
"""

result = con.execute(query).fetchdf()
result

Unnamed: 0,customer_id,visit_day_count
0,B,6
1,A,4
2,C,2


# Question 3 - What was the first item from the menu purchased by each customer?

## 🧠 Thought Process

### 🎯 Goal
Find the first item (or items) each customer purchased from the menu.

---

### 💼 Business Context
Understanding what customers choose during their first visit can help identify:
- The most attractive or appealing items on the menu
- Entry-level dishes that encourage repeat visits
- Product bundling opportunities for first-time visitors

This insight can be applied in new customer onboarding strategies or first-visit discount offers.

---

### 🔍 Problem Breakdown
- **Data Source**: The `sales` table provides `customer_id`, `order_date`, and `product_id`. The `menu` table provides `product_name`.
- **Join Needed**: Yes. To retrieve the product name, join `sales.product_id` with `menu.product_id`.
- **Grouping**: Not required for the final result, but `customer_id` is used in the subquery to compute the earliest order date.
- **Aggregation**: Use `MIN(order_date)` in a subquery to identify the first purchase date per customer.

---

### 🧭 Assumptions
- A customer may purchase multiple products on their first visit.
- All such products from that day should be returned.
- Repeated rows for the same customer, date, and product are treated as separate purchases, so a product bought twice on the first day appears twice in the result.

---

### 🛠 Approach & SQL Explanation
To solve this, I first used `MIN(order_date)` to identify the earliest purchase date for each customer. This is done in a subquery (aliased as `first_day`), which is then joined back to the `sales` table using both `customer_id` and `order_date`. Finally, the `sales` table is joined with the `menu` table to retrieve the product names.
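
The two steps described above (find each customer's earliest date, then keep the sales from that date) can be mirrored in plain Python as a cross-check. This is a sketch under the assumption that the hand-copied `sales` list and `product_names` dictionary match the tables in the setup cell; the names are illustrative.

```python
# (customer_id, order_date, product_id), copied from the INSERT INTO sales statement
sales = [
    ("A", "2021-01-01", 1), ("A", "2021-01-01", 2), ("A", "2021-01-07", 2),
    ("A", "2021-01-10", 3), ("A", "2021-01-11", 3), ("A", "2021-01-11", 3),
    ("B", "2021-01-01", 2), ("B", "2021-01-02", 2), ("B", "2021-01-04", 1),
    ("B", "2021-01-11", 1), ("B", "2021-01-16", 3), ("B", "2021-02-01", 3),
    ("C", "2021-01-01", 3), ("C", "2021-01-01", 3), ("C", "2021-01-07", 3),
]

# product_id -> product_name, copied from the INSERT INTO menu statement
product_names = {1: "sushi", 2: "curry", 3: "ramen"}

# Step 1: earliest order date per customer, like MIN(order_date) ... GROUP BY customer_id.
# ISO-formatted date strings sort in chronological order, so "<" works here.
first_purchase_date = {}
for customer_id, order_date, _product_id in sales:
    current = first_purchase_date.get(customer_id)
    if current is None or order_date < current:
        first_purchase_date[customer_id] = order_date

# Step 2: keep every sale made on that first date and look up the product name,
# like joining back to sales on customer_id and order_date, then to menu.
first_items = [
    (customer_id, order_date, product_names[product_id])
    for customer_id, order_date, product_id in sales
    if order_date == first_purchase_date[customer_id]
]

for row in first_items:
    print(row)
```

Customer C's two ramen rows are both kept, which matches the duplicate row in the SQL output.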

---

### ✅ Result Validation
Manually verified for customer A:
- On their first visit (`2021-01-01`), customer A purchased items with `product_id` = 1 and 2.
- These map to 'sushi' and 'curry' respectively, and the result matched this expectation.


In [5]:
query = """
SELECT
    s.customer_id,
    s.order_date AS first_purchase_date,
    m.product_name
FROM(
    SELECT 
        customer_id,
        MIN(order_date) AS first_purchase_date
    FROM
        sales
    GROUP BY customer_id
) AS first_day
INNER JOIN sales s 
    ON first_day.customer_id = s.customer_id
    AND first_day.first_purchase_date = s.order_date
INNER JOIN menu m ON s.product_id = m.product_id
"""

result = con.execute(query).fetchdf()
result

Unnamed: 0,customer_id,first_purchase_date,product_name
0,A,2021-01-01,sushi
1,B,2021-01-01,curry
2,C,2021-01-01,ramen
3,A,2021-01-01,curry
4,C,2021-01-01,ramen


# Question 4 - What is the most purchased item on the menu and how many times was it purchased by all customers?

## 🧠 Thought Process

### 🎯 Goal
Determine the most popular item on the menu by counting how many times each item was purchased across all customers.

---

### 💼 Business Context
Understanding item popularity can help restaurant owners:

- Identify customer favorites
- Optimize inventory and kitchen preparation
- Adjust pricing based on demand
- Feature top items in promotions or bundles

This insight is essential for **menu engineering and operational planning**.

---

### 🔍 Problem Breakdown
- **Data Source**: The `sales` table records all transactions and contains `product_id`. The `menu` table provides the corresponding `product_name`.
- **Join Needed**: Yes. To map `product_id` to `product_name`, join `sales` with `menu`.
- **Grouping**: Group by `product_id` or `product_name` to aggregate purchases per item.
- **Aggregation**: Use `COUNT(product_id)` to calculate total purchases for each item.

---

### 🧭 Assumptions
- Each row in `sales` represents a single unit of purchase (no quantity column exists).
- All product IDs in `sales` exist in the `menu` table.

---

### 🛠 Approach & SQL Explanation
To solve this, I joined `sales` with `menu` to get each product's name, grouped the rows by `product_name`, and counted the purchases per item with `COUNT()`. The query returns the count for every item; the most purchased item is the one with the highest count.

---

### ✅ Result Validation
Manually checked:
- `ramen` (`product_id` 3) appeared 8 times in the `sales` table, which matched the result in the query output.
- Ramen has the highest count in the output (8, versus 4 for curry and 3 for sushi), so it is the most purchased item.
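
Because the question asks for a single item, the grouped query can also be sorted and cut down to its top row. The sketch below shows that variant. It uses Python's built-in `sqlite3` with a hand-copied version of the sample rows purely so that it runs on its own (an assumption for illustration, not part of the notebook); the query text is standard SQL and is expected to behave the same when passed to `con.execute` on the DuckDB connection.

```python
import sqlite3

# Stand-in database so the sketch is self-contained; the notebook itself uses DuckDB
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (customer_id TEXT, order_date TEXT, product_id INTEGER);
CREATE TABLE menu (product_id INTEGER, product_name TEXT, price INTEGER);

INSERT INTO sales VALUES
  ('A', '2021-01-01', 1), ('A', '2021-01-01', 2), ('A', '2021-01-07', 2),
  ('A', '2021-01-10', 3), ('A', '2021-01-11', 3), ('A', '2021-01-11', 3),
  ('B', '2021-01-01', 2), ('B', '2021-01-02', 2), ('B', '2021-01-04', 1),
  ('B', '2021-01-11', 1), ('B', '2021-01-16', 3), ('B', '2021-02-01', 3),
  ('C', '2021-01-01', 3), ('C', '2021-01-01', 3), ('C', '2021-01-07', 3);

INSERT INTO menu VALUES (1, 'sushi', 10), (2, 'curry', 15), (3, 'ramen', 12);
""")

# Same join and grouping as the notebook query, plus ORDER BY and LIMIT
top_item_query = """
SELECT
    m.product_name,
    COUNT(s.product_id) AS amount_of_purchase
FROM sales s
INNER JOIN menu m ON s.product_id = m.product_id
GROUP BY m.product_name
ORDER BY amount_of_purchase DESC
LIMIT 1;
"""

top_item = conn.execute(top_item_query).fetchone()
print(top_item)  # ('ramen', 8)
```

Note that `LIMIT 1` keeps a single row, so if two items tied for first place only one of them would be shown.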

---

In [6]:
query = """
SELECT
    m.product_name,
    COUNT(s.product_id) AS amount_of_purchase
FROM sales s
INNER JOIN menu m ON s.product_id = m.product_id
GROUP BY m.product_name;
"""

result = con.execute(query).fetchdf()
result

Unnamed: 0,product_name,amount_of_purchase
0,curry,4
1,sushi,3
2,ramen,8
