# Gourmet Meals Business -- SQL Project (Part 2.1 - Product Mapping)

Author: **Ethan Moody**

Date: **October 2022**

### Business Case

Assume you are a data engineer working closely with the data science team at Agile Gourmet Meals (AGM).

AGM executives are considering adding a delivery option in the hope of increasing sales, growing the customer base, and improving profitability.

Management decided to do a proof of concept (POC) in the form of a three-month trial run using one delivery service at the Berkeley store. They have called upon the data science team to help with this effort. In turn, the data science team has asked for your help with the data engineering aspects of the POC.

Management chose Peak Deliveries primarily because it's a newer operation with a model that takes a percentage cut of the product pricing instead of charging customers a delivery fee. Peak's cut is 18%. So, for each $12 meal, that equates to $2.16. Customers may tip the delivery driver if they wish. AGM is not given any visibility into customer tips. (Peak is protecting its data on good tippers.) Peak has an outstanding reputation for fast, efficient deliveries and excellent customer service. Peak will only deliver to zip codes within a 5-mile radius of the store.
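The 18% cut described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are made up for this example, and `Decimal` is used so the currency math does not pick up floating-point noise.

```python
from decimal import Decimal

MEAL_PRICE = Decimal("12.00")    # every AGM product costs $12
PEAK_CUT_RATE = Decimal("0.18")  # Peak's cut is 18% of the product price

# Round to whole cents, since the result is a currency amount
peak_cut = (MEAL_PRICE * PEAK_CUT_RATE).quantize(Decimal("0.01"))
agm_revenue_after_cut = MEAL_PRICE - peak_cut

print(f"Peak's cut per meal:       ${peak_cut}")               # $2.16
print(f"AGM revenue after the cut: ${agm_revenue_after_cut}")  # $9.84
```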

Integration with any third-party sales channel always comes with its challenges. For large companies, like McDonald's, the delivery companies are willing to integrate and modify their computer systems as needed to get the contract. For small companies, like AGM, one of the few options is to use Peak's API to send and receive data. However, that would require you to write a lot of code, which management does not want to spend money on until the POC has proven successful. As an alternative, Peak can provide you with a JSON file at the end of each day with detailed sales information for that day. Management has decided to go with the daily JSON option for the POC.

For products, AGM will enter products into Peak's system. Peak will assign an ID in their system to the product. You will need to create a mapping table to map Peak's IDs to AGM's IDs. In AGM's case, all products cost $12 and are tax exempt. AGM will mark them as exempt from sales tax.

Regarding the customer list, AGM does not want to give out their full customer list to third parties. Customers will have to sign up with Peak, either using the website, the app, or by telephone. AGM executives anticipate and understand that the trade-off for not giving Peak the customer list is that you will probably have to validate and/or cleanse the customer data. Peak will assign their customer ID to each customer.

In this POC, you will focus on only 1 store: the Berkeley store. Peak will create a pickup location for the store and assign their own location ID to it. Even though all data will have the same store for now, you still want to receive it and process it so you can help leadership plan for possible future expansion to other stores and/or pickup locations.

Assume today is October 4, 2020. The first day of sales was October 3, 2020. The JSON file came in very early this morning. As a data engineer, you need to get started with parsing, staging, and validating the file as soon as possible.

The executives are anxious to understand how good the data is, whether you will be able to continue withholding the customer data from Peak, and to get some preliminary analytics. Even though it's just one day's worth of data, the executives want as much information as soon as they can get it (which is very typical).

The data science team has met with you, and together you came up with a plan to get the data loaded and validated, explore the customer data, and perform some preliminary analytics. The data science team has been requested to give the executives an assessment of the customer data and whether or not they should continue to withhold customer data from Peak. Since you are going to be the first one to have an extensive look at the data, the data science team wants and values your opinion on the customer data.

# Included Modules and Packages

In [1]:
import csv
import math
import numpy as np
import pandas as pd
import psycopg2

# Additional Setup Code

In [2]:
# Function to run a select query and return rows in a pandas dataframe
# Note: pandas formats all numeric values from postgres as float

def my_select_query_pandas(query, rollback_before_flag, rollback_after_flag):
    "Function to run a select query and return rows in a pandas dataframe"
    
    if rollback_before_flag:
        connection.rollback()
    
    df = pd.read_sql_query(query, connection)
    
    if rollback_after_flag:
        connection.rollback()
    
    # Fix any float columns that really should be integers
    
    for column in df:
    
        if df[column].dtype == "float64":

            fraction_flag = False

            for value in df[column].values:
                
                if not np.isnan(value):
                    if value - math.floor(value) != 0:
                        fraction_flag = True

            if not fraction_flag:
                df[column] = df[column].astype('Int64')
    
    return df
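The float-to-integer cleanup in the function above loops over every value. As a sketch of an alternative, the same check can be done with vectorized pandas/NumPy operations. The helper name `floats_to_nullable_ints` and the sample dataframe are hypothetical and exist only for this illustration; the extra `np.isfinite` guard is an assumption added so that infinite values leave a column as float.

```python
import numpy as np
import pandas as pd

def floats_to_nullable_ints(df):
    """Return a copy of df where whole-number float64 columns become Int64."""
    result = df.copy()
    for column in result.columns:
        if result[column].dtype == "float64":
            values = result[column].dropna()
            # Convert only when every non-null value is a finite whole number
            if np.isfinite(values).all() and (values == np.floor(values)).all():
                result[column] = result[column].astype("Int64")
    return result

# Hypothetical sample: one column of whole numbers with a null, one with a fraction
sample = pd.DataFrame({
    "product_id": [1.0, 2.0, np.nan],
    "price": [12.0, 11.5, 12.0],
})

converted = floats_to_nullable_ints(sample)
print(converted.dtypes)
```

The nullable `Int64` dtype keeps missing values as `<NA>` instead of forcing the column back to float.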

In [3]:
# Set up connection to postgres
# Note: All connection inputs below have been removed for protection
connection = psycopg2.connect(
    user = "",
    password = "",
    host = "",
    port = "",
    database = ""
)

In [4]:
cursor = connection.cursor()

In [5]:
# Function to read a csv file and print a set number of rows

def my_read_csv_file(file_name, limit):
    "Read the csv file and print only the first 'limit' rows"

    i = 0

    # Open with a context manager so the file is always closed
    with open(file_name, "r", newline = "") as csv_file:

        csv_data = csv.reader(csv_file)

        for row in csv_data:
            i += 1
            if i <= limit:
                print(row)

    print("\nPrinted ", min(limit, i), "lines of ", i, "total lines.")

# 2.1.1 Drop the product mapping table if it exists

In [6]:
# Query drops the product mapping table if it already exists

connection.rollback()

query = """

drop table if exists peak_product_mapping;

"""

cursor.execute(query)

connection.commit()

# 2.1.2 Create the product mapping table

In [7]:
# Query creates the structure of the product mapping table

connection.rollback()

query = """

create table peak_product_mapping (
  product_id numeric(3),
  peak_product_id numeric(12),
  primary key (product_id)
);

"""

cursor.execute(query)

connection.commit()

# 2.1.3 Create a CSV file of product mapping data and display it

In [8]:
# Creates dataframe for .csv file
mapping_data = {
    "product_id":      [1, 2, 3, 4, 5, 6, 7, 8],
    "peak_product_id": [42314677, 42314678, 42314679, 42314780, 42314781, 42314782, 42314783, 42314784]
}

df = pd.DataFrame(mapping_data)

# Converts dataframe to .csv file
df.to_csv("peak_product_mapping.csv", index = False)

# Reads/displays .csv file
my_read_csv_file("peak_product_mapping.csv", limit = 9)

['product_id', 'peak_product_id']
['1', '42314677']
['2', '42314678']
['3', '42314679']
['4', '42314780']
['5', '42314781']
['6', '42314782']
['7', '42314783']
['8', '42314784']

Printed  9 lines of  9 total lines.
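Before loading the file, it may be worth a quick sanity check on its contents. The table's primary key would reject a repeated `product_id` at load time, but nothing in the table definition stops two AGM products from pointing at the same Peak ID. The sketch below is one possible check, not part of the original workflow; the function name `validate_mapping_rows` is hypothetical, and the sample text simply repeats the rows displayed above.

```python
import csv
import io

def validate_mapping_rows(csv_text):
    """Return a list of problems found in product mapping CSV text."""
    problems = []
    seen_product_ids = set()
    seen_peak_ids = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        product_id = (row.get("product_id") or "").strip()
        peak_id = (row.get("peak_product_id") or "").strip()
        if not product_id.isdigit() or not peak_id.isdigit():
            problems.append(f"line {line_number}: missing or non-numeric id")
            continue
        if product_id in seen_product_ids:
            problems.append(f"line {line_number}: duplicate product_id {product_id}")
        if peak_id in seen_peak_ids:
            problems.append(f"line {line_number}: duplicate peak_product_id {peak_id}")
        seen_product_ids.add(product_id)
        seen_peak_ids.add(peak_id)
    return problems

# Same rows as the file displayed above
mapping_csv = (
    "product_id,peak_product_id\n"
    "1,42314677\n2,42314678\n3,42314679\n4,42314780\n"
    "5,42314781\n6,42314782\n7,42314783\n8,42314784\n"
)

print(validate_mapping_rows(mapping_csv))
```

To run it against the file on disk, the text could be read first with something like `open("peak_product_mapping.csv").read()`.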


# 2.1.4 Load product mapping data into database table

In [9]:
# Query loads .csv file with product mapping data into product mapping table

connection.rollback()

query = """

copy peak_product_mapping
from '/user/projects/project-2-ethanjmoody/peak_product_mapping.csv' delimiter ',' NULL '' csv header;

"""

cursor.execute(query)

connection.commit()
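The `copy ... from '<path>'` form above reads the file on the database server, so it only works when the server process can see that path. If the notebook and the database ran on different machines, one alternative would be to stream the file from the client with psycopg2's `cursor.copy_expert` and `copy ... from stdin`. The sketch below assumes that setup; `copy_csv_from_client` and `RecordingCursor` are hypothetical names, and the stand-in cursor exists only so the example runs without a database.

```python
import io

COPY_SQL = """
copy peak_product_mapping (product_id, peak_product_id)
from stdin with (format csv, header true, null '')
"""

def copy_csv_from_client(cursor, csv_file):
    """Stream a CSV file object to the server through COPY ... FROM STDIN."""
    cursor.copy_expert(COPY_SQL, csv_file)

class RecordingCursor:
    """Stand-in for a psycopg2 cursor; it only records what it was given."""

    def __init__(self):
        self.calls = []

    def copy_expert(self, sql, file):
        self.calls.append((sql, file.read()))

# Exercise the function with the stand-in cursor and the first mapping row
demo_cursor = RecordingCursor()
demo_file = io.StringIO("product_id,peak_product_id\n1,42314677\n")
copy_csv_from_client(demo_cursor, demo_file)
print(len(demo_cursor.calls), "copy call recorded")
```

With the notebook's real `cursor`, the call would look like `with open("peak_product_mapping.csv") as f: copy_csv_from_client(cursor, f)`, followed by `connection.commit()`.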

# 2.1.5 Verify the product mapping loaded correctly

In [10]:
# Query returns products by AGM's product id and Peak's product id based on product mapping table

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  t1_map.*
, t2_products.description as product_name
  
from peak_product_mapping as t1_map

join products as t2_products
on t1_map.product_id = t2_products.product_id

order by
  t1_map.product_id

;

"""

my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)

|   | product_id | peak_product_id | product_name |
|---|---|---|---|
| 0 | 1 | 42314677 | Pistachio Salmon |
| 1 | 2 | 42314678 | Teriyaki Chicken |
| 2 | 3 | 42314679 | Spinach Orzo |
| 3 | 4 | 42314780 | Eggplant Lasagna |
| 4 | 5 | 42314781 | Chicken Salad |
| 5 | 6 | 42314782 | Curry Chicken |
| 6 | 7 | 42314783 | Tilapia Piccata |
| 7 | 8 | 42314784 | Brocolli Stir Fry |
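The inner join above shows only the products that have a mapping, so a product missing from the mapping table would silently drop out of the result. A complementary check is a left join that lists products with no Peak ID. The sketch below uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere; the table contents repeat the output above, and product 9 is a hypothetical row added only to show what an unmapped product looks like.

```python
import sqlite3

UNMAPPED_QUERY = """
select p.product_id, p.description
from products as p
left join peak_product_mapping as m
on p.product_id = m.product_id
where m.product_id is null
order by p.product_id
"""

products = [
    (1, "Pistachio Salmon"), (2, "Teriyaki Chicken"), (3, "Spinach Orzo"),
    (4, "Eggplant Lasagna"), (5, "Chicken Salad"), (6, "Curry Chicken"),
    (7, "Tilapia Piccata"), (8, "Brocolli Stir Fry"),
]
mapping = [
    (1, 42314677), (2, 42314678), (3, 42314679), (4, 42314780),
    (5, 42314781), (6, 42314782), (7, 42314783), (8, 42314784),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table products (product_id integer primary key, description text)")
conn.execute("create table peak_product_mapping (product_id integer primary key, peak_product_id integer)")
conn.executemany("insert into products values (?, ?)", products)
conn.executemany("insert into peak_product_mapping values (?, ?)", mapping)

# All eight products are mapped, so nothing is returned
unmapped_before = conn.execute(UNMAPPED_QUERY).fetchall()
print(unmapped_before)

# Hypothetical new product with no Peak ID yet
conn.execute("insert into products values (9, 'Hypothetical New Meal')")
unmapped_after = conn.execute(UNMAPPED_QUERY).fetchall()
print(unmapped_after)
```

The query text itself uses only standard SQL, so it should also run unchanged against the Postgres tables through `my_select_query_pandas`.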
