# Gourmet Meals Business -- SQL Project (Part 1.4 - Holiday Query)

Author: **Ethan Moody**

Date: **September 2022**

### Business Case

A few years ago, a new startup was born: **Agile Gourmet Meals (AGM)**.

The founder of AGM, Joy, was a sous chef in a 5-star restaurant who had worked her way up from dishwasher to cook to sous chef. As part of her job, Joy frequently shopped at high-end grocery stores that featured organic and healthier selections at premium prices, because the restaurant wanted only the highest-quality ingredients for its food. She was also paid to eat at other restaurants on her time off, from fast food to other 5-star establishments, to see what types and quality of food were being served.

Joy noticed that most young, single professionals tended to:
* Eat out frequently, mostly at casual dining restaurants, with some fast food and the occasional 5-star restaurant
* Order delivery at home or work
* Order takeout for home or work
* Buy frozen pre-made meals and microwave them at home

Joy also noticed that all of these options were typically not very healthy.

Joy had an idea to create a new business. She would cook healthy, gourmet-quality meals and package them in containers similar to the frozen pre-made meals sold in grocery stores, except they would be fresh (not frozen) to improve the taste. She would seek to market them at a local high-end grocery store.

Joy struck a deal with the high-end grocery store to set up a small counter near the entryway. At the counter she would educate customers about her meals, take orders, and deliver them. As the business grew, Joy rented space near the store and set up her kitchen there, hiring someone else to staff the counter at the grocery store. Joy also hired a web developer to build a website to take orders, handle payments, etc.

After a couple of years, the grocery store's corporate office was so pleased with the arrangement that they asked AGM to expand to several other cities. They selected stores in areas of town with more young professionals and/or areas known for more affluence. They provided funding for a joint venture to allow AGM to set up kitchens near the stores and enhance the web and phone app ordering system. In exchange for their investment, they received a controlling interest in the business. Joy stayed on and continued to act as an expert on the food side of the business.

AGM has just finished a very successful year on the enhanced computer systems, and now has a database of sales data for one year.

AGM charges a flat rate of $12 per meal with no minimum. Since the food has to be heated before eating, it is not subject to sales tax. Customers must order by 10am one day in order to pick up the meals the next day. The thinking is that AGM will waste much less food that way. Customers will have a maximum of one order per day.

AGM is in the process of creating a data science team and a data engineering team. You have just been hired as the first data engineer for the data engineering team. You met with the data science team and they explained to you the story above, and more importantly, that they now have a database of sales data for one year (2020).  

Together with the data science team, you worked out a list of high priority data engineering tasks that need to be done. The data science team has been working with the business side to come up with business questions, and answering them will require queries against the sales database in these areas:
* Sales Related Queries
* Customer Related Queries
* Meal Related Queries
* A Holiday Related Query

The data science team would like to see an example of a data visualization using Python from data in a Pandas dataframe containing data from an SQL query. They are familiar with other data visualization tools, but not with using Python, and they want to see a good example.

The data science team is building a model to help identify the company's best customers. They are starting with the very common RFM model. Since you will be the one looking at the database in the most detail, they would like for you to write up your best ideas on how the sales data can be used for this model.

# Included Modules and Packages

In [1]:
import math
import numpy as np
import pandas as pd
import psycopg2

# Additional Setup Code

In [2]:
# Function to run a select query and return rows in a pandas dataframe
# Note: pandas returns numeric values from postgres as float

def my_select_query_pandas(query, rollback_before_flag, rollback_after_flag):
    """Run a select query and return the rows in a pandas dataframe.

    Uses the module-level `connection` that is created in the next cell.
    """

    if rollback_before_flag:
        connection.rollback()

    df = pd.read_sql_query(query, connection)

    if rollback_after_flag:
        connection.rollback()

    # Fix any float columns that really should be integers

    for column in df:

        if df[column].dtype == "float64":

            fraction_flag = False

            for value in df[column].values:

                # Ignore missing values; stop at the first value with a fractional part
                if not np.isnan(value):
                    if value - math.floor(value) != 0:
                        fraction_flag = True
                        break

            if not fraction_flag:
                # Int64 is the nullable integer dtype, so missing values survive the cast
                df[column] = df[column].astype('Int64')

    return df
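
The integer-fixing step inside `my_select_query_pandas` can be tried on its own, without a database connection. The following is a minimal sketch rather than the notebook's code: the helper name `floats_to_nullable_ints` and the sample dataframe are hypothetical, and the check is written in vectorized form instead of a value-by-value loop.

```python
import numpy as np
import pandas as pd


def floats_to_nullable_ints(df):
    """Return a copy of df where whole-number float64 columns become Int64."""
    out = df.copy()
    for column in out.columns:
        if out[column].dtype == "float64":
            values = out[column].dropna()
            # Convert only when every non-missing value has no fractional part
            if (values == np.floor(values)).all():
                out[column] = out[column].astype("Int64")
    return out


# Hypothetical sample: a whole-number column with a gap, and a true float column
sample = pd.DataFrame({
    "actual_sales_dollars": [133776.0, np.nan, 135204.0],
    "ratio_actual_expected": [0.51, 0.50, 0.36],
})
fixed = floats_to_nullable_ints(sample)
print(fixed.dtypes)
```

The nullable `Int64` dtype is what allows a column with missing values to be shown as integers; a plain `int64` cast would fail on the missing value.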

In [3]:
# Set up connection to postgres
# Note: All connection inputs below have been removed for protection
connection = psycopg2.connect(
    user = "",
    password = "",
    host = "",
    port = "",
    database = ""
)
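
Since the connection inputs are blanked out above, one possible way to supply them without writing them into the notebook is to read them from environment variables. This is only a sketch under assumptions: the helper name `connection_kwargs_from_env` is hypothetical, the variable names follow the standard libpq environment variables (`PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`, `PGDATABASE`), and the fallback host and port are assumed defaults, not values from this project.

```python
import os


def connection_kwargs_from_env(env=os.environ):
    """Build keyword arguments for psycopg2.connect from environment variables."""
    return {
        "user": env.get("PGUSER", ""),
        "password": env.get("PGPASSWORD", ""),
        "host": env.get("PGHOST", "localhost"),  # assumed default
        "port": env.get("PGPORT", "5432"),       # assumed default
        "database": env.get("PGDATABASE", ""),
    }


# Example with hypothetical values passed in explicitly instead of real secrets
example = connection_kwargs_from_env({"PGUSER": "agm_reader", "PGDATABASE": "agm"})
print(example["host"], example["port"])
```

The resulting dictionary could then be passed along as `psycopg2.connect(**example)`.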

# 1.4 Find holiday days, and days within one week of a holiday, where the actual sales differ by more than 15% from expected sales
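
The query in the next cell hard-codes the date range around each holiday in its `case` expression. As a rough illustration of where those ranges come from, the sketch below derives a 7-day window on either side of a holiday date. The function names are hypothetical, and the holiday dates are inferred from the midpoints of the hard-coded ranges rather than read from the `holidays` table.

```python
from datetime import date, timedelta


def holiday_window(holiday_date, days=7):
    """Return the first and last date within `days` of a holiday, inclusive."""
    return holiday_date - timedelta(days=days), holiday_date + timedelta(days=days)


def holidays_near(day, holidays, days=7):
    """Return the names of holidays whose window contains `day`."""
    names = []
    for name, holiday_date in holidays.items():
        start, end = holiday_window(holiday_date, days)
        if start <= day <= end:
            names.append(name)
    return names


# Two 2020 holidays whose windows overlap (dates inferred from the query's ranges)
holidays_2020 = {
    "Father's Day": date(2020, 6, 21),
    "Independence Day": date(2020, 7, 4),
}

print(holiday_window(date(2020, 1, 20)))  # window around MLK Day
print(holidays_near(date(2020, 6, 27), holidays_2020))
```

A day such as 2020-06-27 falls in both windows, which is why the query labels 2020-06-27 and 2020-06-28 as 'Father''s Day or Independence Day'.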

In [4]:
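# Query returns holiday days and days within one week of a holiday where actual sales differ by >15% from expected
# Note: the where clause keeps only days below expected (ratio < 0.85); days above expected (ratio > 1.15) are not returned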

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  tsub_salesbydate.holiday_name
, tsub_salesbydate.date_analyzed
, tsub_salesbydate.day_of_week
, tsub_salesbydate.actual_sales_dollars
, tsub_expectedsalesbydow.expected_sales_dollars
, round(tsub_salesbydate.actual_sales_dollars/tsub_expectedsalesbydow.expected_sales_dollars,2) as ratio_actual_expected

from
  (
  select
    t1_mydate.date as date_analyzed
  , t2_holidays.holiday_date as holiday_date
  , extract(dow from t1_mydate.date) as dow
  , to_char(t1_mydate.date, 'Day') as day_of_week
  , (t2_holidays.holiday_date - 7) as holiday_date_m7
  , (t2_holidays.holiday_date + 7) as holiday_date_p7
  , case
      when t1_mydate.date between '2019-12-25' and '2020-01-08' then 'New Year''s Day'
      when t1_mydate.date between '2020-01-13' and '2020-01-27' then 'MLK Day'
      when t1_mydate.date between '2020-02-10' and '2020-02-24' then 'President''s Day'
      when t1_mydate.date between '2020-04-05' and '2020-04-19' then 'Easter'
      when t1_mydate.date between '2020-05-03' and '2020-05-17' then 'Mother''s Day'
      when t1_mydate.date between '2020-05-18' and '2020-06-01' then 'Memorial Day'
      when t1_mydate.date between '2020-06-14' and '2020-06-26' then 'Father''s Day'
      when t1_mydate.date between '2020-06-27' and '2020-06-28' then 'Father''s Day or Independence Day'
      when t1_mydate.date between '2020-06-29' and '2020-07-11' then 'Independence Day'
      when t1_mydate.date between '2020-08-31' and '2020-09-14' then 'Labor Day'
      when t1_mydate.date between '2020-11-04' and '2020-11-18' then 'Veterans Day'
      when t1_mydate.date between '2020-11-19' and '2020-12-03' then 'Thanksgiving'
      when t1_mydate.date between '2020-12-18' and '2021-01-01' then 'Christmas'
      else 'None'
    end as holiday_name
  , coalesce(sum(t2_sales.total_amount), 0) as actual_sales_dollars

  from generate_series('2020-01-01', '2020-12-31', '1 day'::interval) as t1_mydate

  left outer join holidays as t2_holidays
  on t1_mydate.date = t2_holidays.holiday_date

  left outer join sales as t2_sales
  on t1_mydate.date = t2_sales.sale_date

  group by
    date_analyzed
  , holiday_date
  , dow
  , day_of_week
  , holiday_date_m7
  , holiday_date_p7
  , holiday_name
  ) as tsub_salesbydate

left outer join
(
  select
    tsub_salesdate.dow as dow
  , tsub_salesdate.day_of_week as day_of_week
  , count(tsub_salesdate.dow) as dow_count
  , sum(tsub_salesdate.total_sales_dollars) as total_sales_dollars
  , round(sum(tsub_salesdate.total_sales_dollars)/count(tsub_salesdate.dow),0) as expected_sales_dollars

  from
    (
    select
      t2_sales.sale_date as sale_date
    , extract(dow from t2_sales.sale_date) as dow
    , to_char(t2_sales.sale_date, 'Day') as day_of_week
    , sum(t2_sales.total_amount) as total_sales_dollars
  
    from stores as t1_stores
  
    join sales as t2_sales
    on t1_stores.store_id = t2_sales.store_id
  
    group by
      sale_date
    , dow
    , day_of_week
    ) as tsub_salesdate

  group by
    dow
  , day_of_week

  order by
    dow
) as tsub_expectedsalesbydow
on tsub_salesbydate.day_of_week = tsub_expectedsalesbydow.day_of_week

where tsub_salesbydate.holiday_name not in ('None')
and round(tsub_salesbydate.actual_sales_dollars/tsub_expectedsalesbydow.expected_sales_dollars,2) < 0.85

order by
  tsub_salesbydate.date_analyzed

;

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

| | holiday_name | date_analyzed | day_of_week | actual_sales_dollars | expected_sales_dollars | ratio_actual_expected |
|---|---|---|---|---|---|---|
| 0 | New Year's Day | 2020-01-01 | Wednesday | 133776 | 263256 | 0.51 |
| 1 | MLK Day | 2020-01-17 | Friday | 127092 | 252522 | 0.5 |
| 2 | MLK Day | 2020-01-18 | Saturday | 135204 | 373490 | 0.36 |
| 3 | MLK Day | 2020-01-19 | Sunday | 130368 | 357482 | 0.36 |
| 4 | MLK Day | 2020-01-20 | Monday | 130740 | 253225 | 0.52 |
| 5 | President's Day | 2020-02-14 | Friday | 133452 | 252522 | 0.53 |
| 6 | President's Day | 2020-02-15 | Saturday | 132096 | 373490 | 0.35 |
| 7 | President's Day | 2020-02-16 | Sunday | 132180 | 357482 | 0.37 |
| 8 | President's Day | 2020-02-17 | Monday | 135228 | 253225 | 0.53 |
| 9 | Easter | 2020-04-12 | Sunday | 136164 | 357482 | 0.38 |
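
The `ratio_actual_expected` column can be checked by hand from the two dollar columns. The sketch below recomputes it for the first few rows of the result above and applies a two-sided 15% test. The function names are hypothetical, and treating "differ by more than 15%" as either below 0.85 or above 1.15 is an assumption; the query's `where` clause only applies the lower bound.

```python
def sales_ratio(actual, expected):
    """Ratio of actual to expected sales, rounded to two decimals."""
    return round(actual / expected, 2)


def differs_by_more_than(ratio, threshold=0.15):
    """True when the ratio is more than `threshold` below or above 1."""
    return ratio < 1 - threshold or ratio > 1 + threshold


# (date, actual, expected) taken from the first rows of the result above
rows = [
    ("2020-01-01", 133776, 263256),
    ("2020-01-17", 127092, 252522),
    ("2020-01-18", 135204, 373490),
]

for day, actual, expected in rows:
    ratio = sales_ratio(actual, expected)
    print(day, ratio, differs_by_more_than(ratio))
```

Each of these rows comes out well under 0.85, consistent with the rows the query returned.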
