# Gourmet Meals Business -- SQL Project (Part 2.6 - Preliminary Analytics)

Author: **Ethan Moody**

Date: **October 2022**

### Business Case

Assume you are a data engineer working closely with the data science team at Agile Gourmet Meals (AGM).

AGM executives are considering adding a delivery option in the hope of increasing sales, growing the customer base, and increasing profitability.

Management decided to do a proof of concept (POC) in the form of a three-month trial run using one delivery service at the Berkeley store. They have called upon the data science team to help with this effort. In turn, the data science team has asked for your help with the data engineering aspects of the POC.

Management chose Peak Deliveries primarily because it's a newer operation with a model that takes a percentage cut of the product pricing instead of charging customers a delivery fee. Peak's cut is 18%, so for each $12 meal, Peak receives $2.16. Customers may tip the delivery driver if they wish. AGM is not given any visibility into customer tips. (Peak is protecting its data on good tippers.) Peak has an outstanding reputation for great, fast, and efficient deliveries, with excellent customer service. Peak will only deliver to zip codes within a 5-mile radius of the store.
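The 18% arithmetic above can be sketched in a few lines of Python. This is only an illustration: the names `PEAK_CUT_RATE`, `MEAL_PRICE`, and `split_sale` are hypothetical and do not come from AGM's or Peak's systems. `Decimal` is used so that currency values are not subject to binary floating-point rounding.

```python
from decimal import Decimal

# Figures taken from the business case; the names are illustrative assumptions.
PEAK_CUT_RATE = Decimal("0.18")
MEAL_PRICE = Decimal("12.00")


def split_sale(amount, cut_rate=PEAK_CUT_RATE):
    """Return (cut paid to Peak, net to AGM) for a sale amount, rounded to cents."""
    cut = (amount * cut_rate).quantize(Decimal("0.01"))
    return cut, amount - cut


cut, net = split_sale(MEAL_PRICE)
print(cut, net)  # 2.16 9.84
```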

Integration with any third-party sales channel always comes with its challenges. For large companies, like McDonald's, the delivery companies are willing to integrate and modify their computer systems as needed to get the contract. For small companies, like AGM, one of the few options is to use Peak's API to send and receive data. However, that would require you to write a lot of code, which management does not want to spend money on until the POC has proven successful. As an alternative, Peak can provide you with a JSON file at the end of each day with detailed sales information for that day. Management has decided to go with the daily JSON option for the POC.

AGM will enter its products into Peak's system, and Peak will assign its own ID to each product. You will need to create a mapping table to map Peak's IDs to AGM's IDs. In AGM's case, all products cost $12 and are tax exempt, so AGM will mark them as exempt from sales tax.
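A minimal sketch of such a mapping table follows. It uses an in-memory SQLite database from the Python standard library instead of the project's PostgreSQL connection, purely so the example runs on its own. The table and column names (`peak_product_mapping`, `peak_product_id`, `product_id`, `products.description`) match the queries later in this notebook, but the ID values are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table products (
        product_id integer primary key,
        description text not null
    );

    create table peak_product_mapping (
        peak_product_id integer primary key,
        product_id integer not null references products (product_id)
    );
    """
)

# Hypothetical IDs; only the meal names come from the notebook output.
conn.executemany(
    "insert into products values (?, ?)",
    [(1, "Pistachio Salmon"), (2, "Eggplant Lasagna")],
)
conn.executemany(
    "insert into peak_product_mapping values (?, ?)",
    [(42, 1), (13, 2)],
)

# Translate Peak's product IDs into AGM product descriptions.
rows = conn.execute(
    """
    select m.peak_product_id, p.description
    from peak_product_mapping as m
    join products as p on m.product_id = p.product_id
    order by m.peak_product_id
    """
).fetchall()
print(rows)
```

Making `peak_product_id` the primary key is one possible design choice: it guarantees that each Peak ID maps to exactly one AGM product.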

Regarding the customer list, AGM does not want to give out its full customer list to third parties. Customers will have to sign up with Peak, using the website, the app, or the telephone. AGM executives anticipate and understand that the trade-off for not giving Peak the customer list is that you will probably have to validate and/or cleanse the customer data. Peak will assign its own customer ID to each customer.

In this POC, you will focus on only one store: the Berkeley store. Peak will create a pickup location for the store and assign its own location ID to it. Even though all data will have the same store for now, you still want to receive and process the store data so you can help leadership plan for possible future expansion to other stores and/or pickup locations.

Assume today is October 4, 2020. The first day of sales was October 3, 2020. The JSON file came in very early this morning. As a data engineer, you need to get started with parsing, staging, and validating the file as soon as possible.
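As a rough illustration of the parsing step, the sketch below flattens a nested JSON document into one list of sale rows and one list of line-item rows, ready for staging. The layout of Peak's file is not shown in this part of the project, so the structure here is an assumption: the nested key `line_items` and the sample values are hypothetical, while the field names `sale_id`, `total_amount`, `product_id`, and `quantity` are borrowed from the staging-table columns queried below.

```python
import json


def flatten_sales(json_text):
    """Split a list of nested sale records into sale rows and line-item rows."""
    sales_rows = []
    line_item_rows = []
    for sale in json.loads(json_text):
        sales_rows.append((sale["sale_id"], sale["total_amount"]))
        # Each line item carries its parent sale_id so the two tables can be joined.
        for item in sale.get("line_items", []):
            line_item_rows.append(
                (sale["sale_id"], item["product_id"], item["quantity"])
            )
    return sales_rows, line_item_rows


# Hypothetical sample; the real file layout may differ.
sample_json = """
[
  {"sale_id": 1, "total_amount": "36",
   "line_items": [{"product_id": 42, "quantity": 2},
                  {"product_id": 13, "quantity": 1}]},
  {"sale_id": 2, "total_amount": "12",
   "line_items": [{"product_id": 13, "quantity": 1}]}
]
"""

sales_rows, line_item_rows = flatten_sales(sample_json)
print(sales_rows)
print(line_item_rows)
```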

The executives are anxious to understand how good the data is, to learn whether you will be able to continue withholding the customer data from Peak, and to get some preliminary analytics. Even though it's just one day's worth of data, the executives want as much information as soon as they can get it (which is very typical).

The data science team has met with you, and together you came up with a plan to get the data loaded and validated, explore the customer data, and perform some preliminary analytics. The data science team has been asked to give the executives an assessment of the customer data and a recommendation on whether to continue withholding customer data from Peak. Since you will be the first one to take an extensive look at the data, the data science team wants and values your opinion on the customer data.

# Included Modules and Packages

In [1]:
import csv
import json
import math
import numpy as np
import pandas as pd
import psycopg2

# Additional Setup Code

In [2]:
# Function to run a select query and return rows in a pandas dataframe
# Note: pandas formats all numeric values from postgres as float

def my_select_query_pandas(query, rollback_before_flag, rollback_after_flag):
    "Function to run a select query and return rows in a pandas dataframe"
    
    if rollback_before_flag:
        connection.rollback()
    
    df = pd.read_sql_query(query, connection)
    
    if rollback_after_flag:
        connection.rollback()
    
    # Fix any float columns that really should be integers
    
    for column in df:
    
        if df[column].dtype == "float64":

            fraction_flag = False

            for value in df[column].values:
                
                if not np.isnan(value):
                    if value - math.floor(value) != 0:
                        fraction_flag = True

            if not fraction_flag:
                df[column] = df[column].astype('Int64')
    
    return df
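The float-to-integer cleanup in the function above can also be written with vectorized operations instead of a Python loop over every value. The sketch below is an alternative under the same rule (convert a `float64` column to the nullable `Int64` type only when every non-null value is a whole number). The helper name `floats_to_nullable_ints` and the sample frame are hypothetical, and the sketch is not the function used by the queries in this notebook.

```python
import numpy as np
import pandas as pd


def floats_to_nullable_ints(df):
    """Return a copy where whole-number float64 columns become nullable Int64."""
    out = df.copy()
    for column in out.columns:
        if out[column].dtype == "float64":
            non_null = out[column].dropna()
            # Convert only if no non-null value has a fractional part.
            if (non_null == np.floor(non_null)).all():
                out[column] = out[column].astype("Int64")
    return out


# Hypothetical sample: "whole" has only whole numbers (plus a null), "mixed" does not.
sample = pd.DataFrame({"whole": [1.0, 2.0, np.nan], "mixed": [1.5, 2.0, 3.0]})
converted = floats_to_nullable_ints(sample)
print(converted.dtypes)
```

Using `Int64` (capital I) rather than `int64` matters here because only the nullable type can hold missing values.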

In [3]:
# Set up connection to postgres
# Note: All connection inputs below have been removed for protection
connection = psycopg2.connect(
    user = "",
    password = "",
    host = "",
    port = "",
    database = ""
)

In [4]:
cursor = connection.cursor()

# 2.6.1 Total dollar amount of sales

In [5]:
# Query shows the total dollar amount of sales from the staging table stage_1_peak_sales

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  sum(total_amount::numeric) as total_sales

from stage_1_peak_sales

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

| | total_sales |
|---|---|
| 0 | 6480 |


# 2.6.2 Total number of sales

In [6]:
# Query shows the total number of sales from the staging table stage_1_peak_sales

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  count(sale_id::numeric) as total_number_of_sales

from stage_1_peak_sales

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

| | total_number_of_sales |
|---|---|
| 0 | 97 |


# 2.6.3 Total dollar amount of sales, total cut paid to Peak, net to AGM

In [7]:
# Query shows the total dollar amount of sales, total cut paid to Peak, and the net dollar amount to AGM from the staging table stage_1_peak_sales

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  sum(total_amount::numeric) as total_sales
, sum(total_amount::numeric) * 0.18 as cut_paid_to_peak
, (sum(total_amount::numeric) - sum(total_amount::numeric) * 0.18) as net_to_agm

from stage_1_peak_sales

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

| | total_sales | cut_paid_to_peak | net_to_agm |
|---|---|---|---|
| 0 | 6480 | 1166.4 | 5313.6 |


# 2.6.4 Total number of meals sold

In [8]:
# Query shows the total number of meals sold from the staging table stage_1_peak_line_items

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  sum(quantity::numeric) as total_number_of_meals_sold

from stage_1_peak_line_items

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

| | total_number_of_meals_sold |
|---|---|
| 0 | 540 |


# 2.6.5 Total number of meals sold by meal

In [9]:
# Query shows the total number of meals sold by meal (ordered highest to lowest) from the staging table stage_1_peak_line_items, products table, and product mapping table

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  t3_products.description as meal
, sum(t1_s_plineitems.quantity::numeric) as total_number_of_meals_sold

from stage_1_peak_line_items as t1_s_plineitems

join peak_product_mapping as t2_pproductmapping
on t1_s_plineitems.product_id::numeric = t2_pproductmapping.peak_product_id

join products as t3_products
on t2_pproductmapping.product_id = t3_products.product_id

group by
  t3_products.description
  
order by
  sum(t1_s_plineitems.quantity::numeric) desc

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

| | meal | total_number_of_meals_sold |
|---|---|---|
| 0 | Pistachio Salmon | 113 |
| 1 | Eggplant Lasagna | 107 |
| 2 | Curry Chicken | 101 |
| 3 | Teriyaki Chicken | 80 |
| 4 | Brocolli Stir Fry | 60 |
| 5 | Tilapia Piccata | 44 |
| 6 | Spinach Orzo | 27 |
| 7 | Chicken Salad | 8 |
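A quick consistency check ties these results together. The sketch below uses only figures reported in this notebook (the per-meal counts above, the 540 meals from 2.6.4, the $6,480 total from 2.6.1, and the $12 price from the business case). The variable names are illustrative, and the meal names are copied exactly as they appear in the data.

```python
# Per-meal counts copied from the query output above.
meals_by_product = {
    "Pistachio Salmon": 113,
    "Eggplant Lasagna": 107,
    "Curry Chicken": 101,
    "Teriyaki Chicken": 80,
    "Brocolli Stir Fry": 60,
    "Tilapia Piccata": 44,
    "Spinach Orzo": 27,
    "Chicken Salad": 8,
}

MEAL_PRICE = 12  # dollars per meal, from the business case

total_meals = sum(meals_by_product.values())
expected_sales = total_meals * MEAL_PRICE

print(total_meals)     # 540, matches 2.6.4
print(expected_sales)  # 6480, matches 2.6.1
```

Because the per-meal total matches the overall meal count, the joins to the mapping and products tables did not drop or duplicate any line items.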


# 2.6.6 Average number of meals per sale

In [10]:
# Query shows the average number of meals per sale from the staging table stage_1_peak_line_items

rollback_before_flag = True
rollback_after_flag = True

query = """

with

  t1v_total_meal_count as
  (
  select
    sum(quantity::numeric) as total_number_of_meals_sold
    
  from stage_1_peak_line_items
  ),

  t2v_total_sale_count as
  (
  select
    count(sale_id::numeric) as total_number_of_sales

  from stage_1_peak_sales
  )

select
  round(t1v_total_meal_count.total_number_of_meals_sold / t2v_total_sale_count.total_number_of_sales, 1) as average_meals_per_sale

from
  t1v_total_meal_count
, t2v_total_sale_count

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

| | average_meals_per_sale |
|---|---|
| 0 | 5.6 |
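The same average can be reproduced outside the database from the two totals reported earlier (540 meals and 97 sales). This sketch uses `Decimal` with `ROUND_HALF_UP`, which for positive values matches how PostgreSQL's `round` breaks ties on `numeric` (away from zero). The variable names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

total_meals = Decimal(540)       # from 2.6.4
total_sales_count = Decimal(97)  # from 2.6.2

# Divide, then round to one decimal place as the SQL query does.
average_meals_per_sale = (total_meals / total_sales_count).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP
)
print(average_meals_per_sale)  # 5.6
```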
