# Purpose

This notebook works through the questions in Case Study #1 of the 8 Week SQL Challenge (https://8weeksqlchallenge.com/case-study-1/) as a set of example SQL queries.



## Introduction
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.

Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from its first few months of operation but has no idea how to use that data to help run the business.

## Problem Statement
Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.

He plans on using these insights to help him decide whether he should expand the existing customer loyalty program - additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.

Danny has provided you with a sample of his overall customer data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!

Danny has shared with you 3 key datasets for this case study:

- sales
- menu
- members


# Load Libraries

In [None]:
import os
import psycopg2
from psycopg2 import sql
import pandas as pd

# Connect to local PostgreSQL Database

In [None]:
# Retrieve environment variables
dbname = os.getenv("DB_NAME")
user = os.getenv("DB_USER")
password = os.getenv("DB_PASSWORD")
host = os.getenv("DB_HOST", "localhost")
port = os.getenv("DB_PORT", "5432")
schema = os.getenv("DB_SCHEMA", "dannys_diner")

# Establish a connection
connection = psycopg2.connect(
    dbname=dbname,
    user=user,
    password=password,
    host=host,
    port=port
)
cursor = connection.cursor()
cursor.execute("SELECT version();")
print(cursor.fetchone())

# Set the schema
cursor.execute(sql.SQL("SET search_path TO {}").format(sql.Identifier(schema)))


('PostgreSQL 16.3 (Homebrew) on aarch64-apple-darwin23.4.0, compiled by Apple clang version 15.0.0 (clang-1500.3.9.4), 64-bit',)


In [12]:
# Test connection with a simple query
cursor.execute(sql.SQL("SELECT * FROM members"))
cursor.fetchall()

[('A', datetime.date(2021, 1, 7)), ('B', datetime.date(2021, 1, 9))]
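
The cells that follow repeat the same execute / fetchall / DataFrame steps and hard-code the column names each time. As a minimal sketch (not part of the original notebook), a small helper could read the column names from `cursor.description`, which DB-API 2.0 cursors such as psycopg2's provide. The helper name `fetch_records` is hypothetical, and the standard-library `sqlite3` module with made-up rows stands in for the PostgreSQL connection so the example runs on its own.

```python
import sqlite3


def fetch_records(cursor, query, params=()):
    """Run a query and return (column_names, rows) using cursor.description."""
    cursor.execute(query, params)
    # Each entry of cursor.description is a sequence whose first item is the column name.
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return columns, rows


# Stand-in database: an in-memory SQLite table with made-up rows (illustrative only).
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE members (customer_id TEXT, join_date TEXT)")
demo_cur.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("X", "2021-03-01"), ("Y", "2021-03-05")],
)

columns, rows = fetch_records(
    demo_cur, "SELECT customer_id, join_date FROM members ORDER BY customer_id"
)
print(columns)
print(rows)
```

With the notebook's own cursor, the result could be passed on as `pd.DataFrame(rows, columns=columns)`, so the DataFrame headers always match the query.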

# Answer Questions from Client

What is the total amount each customer spent at the restaurant?

In [46]:
query = """
    SELECT sales.customer_id, SUM(menu.price) FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    GROUP BY sales.customer_id;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
tot_spent_df = pd.DataFrame(sql_table, columns=['customer_id', 'total_spent'])
tot_spent_df.sort_values(by='total_spent', ascending=False).reset_index(drop=True)

| | customer_id | total_spent |
|---|---|---|
| 0 | A | 76 |
| 1 | B | 74 |
| 2 | C | 36 |


---
How many days has each customer visited the restaurant?

In [45]:
query = """
    SELECT sales.customer_id, COUNT(DISTINCT sales.order_date) FROM sales
    GROUP BY sales.customer_id;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
days_visited_df = pd.DataFrame(sql_table, columns=['customer_id', 'Num_days_visited'])
days_visited_df.sort_values(by='Num_days_visited', ascending=False).reset_index(drop=True)

| | customer_id | Num_days_visited |
|---|---|---|
| 0 | B | 6 |
| 1 | A | 4 |
| 2 | C | 2 |


---
What was the first item from the menu purchased by each customer?

In [54]:
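# ROW_NUMBER() keeps exactly one row per customer. If a customer ordered several
# items on their first day, which of them is kept is arbitrary; use RANK() or
# DENSE_RANK() instead to return every item tied for the earliest date.
query = """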
    SELECT customer_id, product_name, order_date
    FROM (
        SELECT sales.customer_id, sales.product_id, sales.order_date, menu.product_name,
            ROW_NUMBER() OVER (PARTITION BY sales.customer_id ORDER BY sales.order_date) AS rownum
        FROM sales
        JOIN menu ON sales.product_id = menu.product_id
    ) subquery
    WHERE rownum = 1;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
first_order_df = pd.DataFrame(sql_table, columns=['customer_id', 'product_name', 'order_date'])
first_order_df

| | customer_id | product_name | order_date |
|---|---|---|---|
| 0 | A | curry | 2021-01-01 |
| 1 | B | curry | 2021-01-01 |
| 2 | C | ramen | 2021-01-01 |
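
When a customer places several orders on their first day, `ROW_NUMBER()` and `RANK()` answer this question differently. The sketch below illustrates the difference under stated assumptions: it uses an in-memory SQLite database (window functions need SQLite 3.25 or later) rather than the PostgreSQL connection, and the table `demo_sales` and its rows are made up for illustration, not taken from Danny's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE demo_sales (customer_id TEXT, order_date TEXT, product_name TEXT)"
)
# Hypothetical rows: customer X orders two different items on the same first day.
cur.executemany(
    "INSERT INTO demo_sales VALUES (?, ?, ?)",
    [
        ("X", "2021-03-01", "tea"),
        ("X", "2021-03-01", "cake"),
        ("X", "2021-03-02", "tea"),
        ("Y", "2021-03-03", "cake"),
    ],
)


def first_orders(window_function):
    """Return first-day purchases per customer using the given window function."""
    # Only fixed, trusted function names are interpolated here (never user input).
    query = f"""
        SELECT customer_id, product_name
        FROM (
            SELECT customer_id, product_name,
                   {window_function}() OVER (
                       PARTITION BY customer_id ORDER BY order_date
                   ) AS rn
            FROM demo_sales
        ) AS ranked
        WHERE rn = 1
        ORDER BY customer_id, product_name;
    """
    return cur.execute(query).fetchall()


row_number_result = first_orders("ROW_NUMBER")  # one row per customer, tie broken arbitrarily
rank_result = first_orders("RANK")              # every item tied for the first date

print(row_number_result)
print(rank_result)
```

`ROW_NUMBER()` returns a single row for customer X, while `RANK()` returns both items ordered on X's first day. Which behaviour is wanted depends on how "first item" is interpreted.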


---
What is the most purchased item on the menu and how many times was it purchased by all customers?

In [63]:
# Find the most frequently purchased product on the menu
query = """
    SELECT menu.product_name, COUNT(*) AS frequency FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    GROUP BY menu.product_name
    ORDER BY frequency DESC
    LIMIT 1;
    """

cursor.execute(sql.SQL(query))
most_frequent_product_tp = cursor.fetchone()
most_frq_prd, frequency = most_frequent_product_tp
print(f"The most frequently purchased product is '{most_frq_prd}' and it was purchased a total of {frequency} times within the time period.")



# Find how often each customer purchased that item.
query_customer_frequency = """
    SELECT sales.customer_id, COUNT(*) AS frequency FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    WHERE menu.product_name = %s
    GROUP BY sales.customer_id
    ORDER BY frequency DESC;
"""

cursor.execute(sql.SQL(query_customer_frequency), [most_frq_prd])
sql_table = cursor.fetchall()
frq_col_name = most_frq_prd + "_frq"
purch_frq_df = pd.DataFrame(sql_table, columns=['customer_id', frq_col_name])
purch_frq_df


The most frequently purchased product is 'ramen' and it was purchased a total of 8 times within the time period.


| | customer_id | ramen_frq |
|---|---|---|
| 0 | C | 3 |
| 1 | A | 3 |
| 2 | B | 2 |


---
Which item was the most popular for each customer?

In [65]:
query = """
    WITH CustomerProductFrequency AS (
        SELECT sales.customer_id, menu.product_name, COUNT(*) AS frequency FROM sales
        JOIN menu ON sales.product_id = menu.product_id
        GROUP BY sales.customer_id, menu.product_name
    ),
    RankedProducts AS (
        SELECT customer_id, product_name, frequency, RANK() OVER (PARTITION BY customer_id ORDER BY frequency DESC) AS rank
        FROM CustomerProductFrequency
    )
    SELECT customer_id, product_name, frequency FROM RankedProducts
    WHERE rank = 1;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
most_frq_item_df = pd.DataFrame(sql_table, columns=['customer_id', 'product_name', 'order_freq'])
most_frq_item_df


| | customer_id | product_name | order_freq |
|---|---|---|---|
| 0 | A | ramen | 3 |
| 1 | B | sushi | 2 |
| 2 | B | curry | 2 |
| 3 | B | ramen | 2 |
| 4 | C | ramen | 3 |


---
Which item was purchased first by the customer after they became a member?

In [75]:
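# The strict ">" below excludes purchases made on the join date itself;
# use ">=" if same-day purchases should count as member purchases.
query = """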
    WITH FilteredPurchases AS (
        SELECT sales.customer_id, menu.product_name, sales.order_date, members.join_date FROM sales
        JOIN menu ON sales.product_id = menu.product_id
        JOIN members ON sales.customer_id = members.customer_id
        WHERE sales.order_date > members.join_date
    ),
    RankedPurchases AS (
        SELECT customer_id, product_name, order_date, RANK() OVER (PARTITION BY customer_id ORDER BY order_date ASC) AS rank
        FROM FilteredPurchases
    )
    SELECT customer_id, product_name, order_date FROM RankedPurchases
    WHERE rank = 1;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
frst_mem_purc_df = pd.DataFrame(sql_table, columns=['customer_id', 'product_name', 'purchase_date'])
frst_mem_purc_df


| | customer_id | product_name | purchase_date |
|---|---|---|---|
| 0 | A | ramen | 2021-01-10 |
| 1 | B | sushi | 2021-01-11 |


---
What is the total items and amount spent for each member before they became a member?

In [86]:
query = """
    SELECT sales.customer_id, COUNT(menu.product_name), SUM(menu.price) FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    JOIN members ON sales.customer_id = members.customer_id
    WHERE sales.order_date < members.join_date
    GROUP BY sales.customer_id;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
premem_purc_tot_df = pd.DataFrame(sql_table, columns=['customer_id', 'tot_premem_items', 'tot_premem_spent'])
premem_purc_tot_df 

| | customer_id | tot_premem_items | tot_premem_spent |
|---|---|---|---|
| 0 | B | 3 | 40 |
| 1 | A | 2 | 25 |


---
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
- Assume that this is only after membership was started

In [89]:
# Assume that this is only after membership was started
query = """
    SELECT sales.customer_id, menu.product_name, menu.price FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    JOIN members ON sales.customer_id = members.customer_id
    WHERE sales.order_date > members.join_date;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
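# Name the columns to match the SELECT list; these are post-membership purchases.
postmem_purc_df = pd.DataFrame(sql_table, columns=['customer_id', 'product_name', 'price'])
postmem_purc_df

| | customer_id | product_name | price |
|---|---|---|---|
| 0 | B | sushi | 10 |
| 1 | A | ramen | 12 |
| 2 | A | ramen | 12 |
| 3 | A | ramen | 12 |
| 4 | B | ramen | 12 |
| 5 | B | ramen | 12 |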


In [90]:
query = """
    SELECT 
        sales.customer_id, 
        SUM(
            CASE 
                WHEN menu.product_name = 'sushi' THEN 20 * menu.price
                ELSE 10 * menu.price 
            END
        ) AS points
    FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    JOIN members ON sales.customer_id = members.customer_id
    WHERE sales.order_date > members.join_date
    GROUP BY sales.customer_id;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
points_df = pd.DataFrame(sql_table, columns=['customer_id', 'points'])
points_df

| | customer_id | points |
|---|---|---|
| 0 | B | 440 |
| 1 | A | 360 |
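
As a cross-check, the totals can be recomputed by hand from the six post-membership purchases listed in the earlier cell's output. The sketch below applies the stated rule (10 points per $1, doubled for sushi) in plain Python; the function name `base_points` is an illustrative assumption, not part of the notebook.

```python
from collections import defaultdict

# Post-membership purchases as (customer_id, product_name, price),
# copied from the earlier query output.
purchases = [
    ("B", "sushi", 10),
    ("A", "ramen", 12),
    ("A", "ramen", 12),
    ("A", "ramen", 12),
    ("B", "ramen", 12),
    ("B", "ramen", 12),
]


def base_points(product_name, price):
    """10 points per $1 spent, doubled for sushi."""
    multiplier = 2 if product_name == "sushi" else 1
    return 10 * multiplier * price


points = defaultdict(int)
for customer_id, product_name, price in purchases:
    points[customer_id] += base_points(product_name, price)

print(dict(points))  # {'B': 440, 'A': 360}
```

Customer B's total is 20 × 10 + 10 × 12 + 10 × 12 = 440 and customer A's is 3 × 10 × 12 = 360, matching the SQL result.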


---
In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

In [98]:
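# Caveats: the WHERE clause uses a strict ">", so purchases made on the join date
# itself are not counted even though the question includes that day, and there is
# no upper bound on order_date, so orders placed after 31 January also contribute.
# The 2x window covers join_date through join_date + 6 days.
query = """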
    SELECT 
        sales.customer_id, 
        SUM(
            CASE 
                WHEN menu.product_name = 'sushi' THEN 20 * menu.price
                WHEN sales.order_date < members.join_date + INTERVAL '7 days' THEN 20 * menu.price
                ELSE 10 * menu.price 
            END
        ) AS points
    FROM sales
    JOIN menu ON sales.product_id = menu.product_id
    JOIN members ON sales.customer_id = members.customer_id
    WHERE sales.order_date > members.join_date
    GROUP BY sales.customer_id;
    """

cursor.execute(sql.SQL(query))
sql_table = cursor.fetchall()
jan_points_df = pd.DataFrame(sql_table, columns=['customer_id', 'points'])
jan_points_df.sort_values(by='points', ascending=False).reset_index(drop=True)

| | customer_id | points |
|---|---|---|
| 0 | A | 720 |
| 1 | B | 440 |
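
The promotion rule can also be written out in plain Python to make the date window explicit. This is a sketch under stated assumptions rather than a translation of the query above: it follows the wording of the question, so purchases on the join date are counted and anything after 31 January 2021 is ignored. The function name `purchase_points` and the sample purchases are hypothetical.

```python
from datetime import date, timedelta

JANUARY_END = date(2021, 1, 31)  # cut-off taken from the question ("end of January")


def purchase_points(product_name, price, order_date, join_date):
    """Points for one purchase under the first-week promotion."""
    if order_date < join_date or order_date > JANUARY_END:
        return 0  # before membership or after the cut-off: not counted in this sketch
    first_week_end = join_date + timedelta(days=6)  # join date plus six more days
    if product_name == "sushi" or order_date <= first_week_end:
        return 20 * price  # 2x multiplier
    return 10 * price


# Made-up purchases for a hypothetical member who joined on 2021-01-05.
join_date = date(2021, 1, 5)
demo_purchases = [
    ("ramen", 12, date(2021, 1, 4)),   # before joining       -> 0
    ("ramen", 12, date(2021, 1, 5)),   # join date, 2x        -> 240
    ("curry", 15, date(2021, 1, 11)),  # last day of week, 2x -> 300
    ("curry", 15, date(2021, 1, 12)),  # after first week     -> 150
    ("sushi", 10, date(2021, 1, 20)),  # sushi is always 2x   -> 200
    ("ramen", 12, date(2021, 2, 1)),   # after January        -> 0
]

total = sum(
    purchase_points(name, price, day, join_date) for name, price, day in demo_purchases
)
print(total)  # 890
```

The equivalent change in SQL would be to use `>=` for the join-date comparison and add an upper bound on `order_date`; the resulting totals would then differ from the output shown above.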
