# Gourmet Meals Business -- SQL Project (Part 1.2 - Customer Queries)

Author: **Ethan Moody**

Date: **September 2022**

### Business Case

A few years ago, a new startup was born: **Agile Gourmet Meals (AGM)**.

The founder of AGM, Joy, was a sous chef at a 5-star restaurant who had worked her way up from dishwasher to cook to sous chef. As part of her job, Joy frequently shopped at high-end grocery stores that featured organic and healthier selections at premium prices, because the restaurant wanted only the highest-quality ingredients. She was also paid to eat at other restaurants on her time off, from fast food to other 5-star establishments, to see what types and quality of food were being served.

Joy noticed that most young, single professionals tended to:
* Eat out frequently, mostly at casual dining restaurants, with some fast food and the occasional 5-star restaurant
* Order delivery at home or work
* Take out for home or work
* Buy frozen pre-made meals and microwave them at home

Joy also noticed that all of these options were typically not very healthy.

Joy had an idea to create a new business. She would cook healthy, gourmet-quality meals and package them in containers similar to the frozen pre-made meals sold in grocery stores, except they would be fresh (not frozen) to improve the taste. She would seek to market them at a local high-end grocery store.

Joy struck a deal with the high-end grocery store to set up a small counter near the entryway. At the counter she would educate customers about her meals, take orders, and deliver them. As the business grew, Joy rented space near the store and set up her kitchen there, hiring someone else to staff the counter at the grocery store. Joy also hired a web developer to build a website to take orders, handle payments, etc.

After a couple of years, the grocery store's corporate office was so pleased with the arrangement that they asked AGM to expand to several other cities. They selected stores in areas of town with more young professionals and/or areas known for more affluence. They provided funding for a joint venture to allow AGM to set up kitchens near the stores and enhance the web and phone app ordering system. In exchange for their investment, they received a controlling interest in the business. Joy stayed on to continue acting as an expert on the food side of the business.

AGM has just finished a very successful year on the enhanced computer systems, and now has a database of sales data for one year.

AGM charges a flat rate of $12 per meal with no minimum. Since the food has to be heated before eating, it is not subject to sales tax. Customers must order by 10am one day in order to pick up the meals the next day. The thinking is that AGM will waste much less food that way. Customers will have a maximum of one order per day.

AGM is in the process of creating a data science team and a data engineering team. You have just been hired as the first data engineer for the data engineering team. You met with the data science team, who explained the story above and, more importantly, that they now have a database of sales data for one year (2020).

Together with the data science team, you worked out a list of high priority data engineering tasks that need to be done. The data science team has been working with the business side to come up with business questions, and they need queries written against the sales database to help answer them:
* Sales Related Queries
* Customer Related Queries
* Meal Related Queries
* A Holiday Related Query

The data science team would like to see an example of a data visualization in Python, built from a pandas dataframe populated by a SQL query. They are familiar with other data visualization tools, but not with using Python, and they want to see a good example.

The data science team is building a model to help identify the company's best customers. They are starting with the very common RFM model. Since you will be the one looking at the database in the most detail, they would like for you to write up your best ideas on how the sales data can be used for this model.
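
One way the RFM idea could map onto a sales table is sketched below: recency as the number of days since a customer's most recent order, frequency as the number of orders, and monetary as total spend. This is only a minimal sketch on an in-memory SQLite database so that it runs without the AGM postgres instance. The column names `sale_date` and `total_amount`, the sample rows, and the reference date are all hypothetical and are not taken from the AGM schema.

```python
import sqlite3

# Hypothetical stand-in for the sales table; column names are assumptions
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales (customer_id integer, sale_date text, total_amount real)"
)
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        (1, "2020-12-30", 24.0),
        (1, "2020-12-01", 12.0),
        (2, "2020-06-15", 36.0),
    ],
)

# Recency is measured against an assumed reference date of 2020-12-31
rfm_query = """

select
  customer_id
, cast(julianday('2020-12-31') - julianday(max(sale_date)) as integer) as recency_days
, count(*) as frequency
, sum(total_amount) as monetary

from sales

group by
  customer_id

order by
  customer_id

;

"""

rfm_rows = conn.execute(rfm_query).fetchall()
for row in rfm_rows:
    print(row)

conn.close()
```

In this sketch a lower `recency_days` and higher `frequency` and `monetary` values would point to a stronger customer; how the three measures are binned or weighted is left to the data science team's model.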

# Included Modules and Packages

In [1]:
import math
import numpy as np
import pandas as pd
import psycopg2

# Additional Setup Code

In [2]:
# Function to run a select query and return rows in a pandas dataframe
# Note: pandas returns postgres numeric values as float by default

def my_select_query_pandas(query, rollback_before_flag, rollback_after_flag):
    """Run a select query and return the rows in a pandas dataframe."""

    if rollback_before_flag:
        connection.rollback()

    df = pd.read_sql_query(query, connection)

    if rollback_after_flag:
        connection.rollback()

    # Fix any float columns that really should be integers

    for column in df:

        if df[column].dtype == "float64":

            # Check only the non-null values; nulls become <NA> after the cast
            values = df[column].dropna()

            # Convert only when every non-null value is a whole number
            if (values == np.floor(values)).all():
                df[column] = df[column].astype('Int64')

    return df
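
The integer fix in the function above can be tried on a small dataframe without a database connection. The sketch below applies a vectorized whole-number check to made-up data; the column names and values are hypothetical and only illustrate which columns would be converted to the nullable `Int64` type.

```python
import numpy as np
import pandas as pd

# Hypothetical query result: one integer-like column with a null,
# and one column that holds a true fraction
sample = pd.DataFrame({
    "total_customers": [3.0, np.nan, 7.0],
    "avg_meals": [1.5, 2.0, np.nan],
})

for column in sample:
    if sample[column].dtype == "float64":
        non_null = sample[column].dropna()
        # Convert only when every non-null value is a whole number
        if (non_null == np.floor(non_null)).all():
            sample[column] = sample[column].astype("Int64")

print(sample.dtypes)
```

Here `total_customers` is converted and keeps its missing value as `<NA>`, while `avg_meals` stays `float64` because it contains 1.5.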

In [3]:
# Set up connection to postgres
# Note: All connection inputs below have been removed for protection
connection = psycopg2.connect(
    user = "",
    password = "",
    host = "",
    port = "",
    database = ""
)

# 1.2.1 Total Number of Customers for all of AGM

In [4]:
# Query returns total number of customers

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  count(t1_customers.customer_id) as total_number_of_customers

from customers as t1_customers

;

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

Unnamed: 0,total_number_of_customers
0,31082


# 1.2.2 Total Number of Customers by Store

In [5]:
# Query returns total number of customers by store

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  t1_stores.city as store_name 
, count(t2_customers.customer_id) as total_number_of_customers

from stores as t1_stores

join customers as t2_customers
on t1_stores.store_id = t2_customers.closest_store_id

group by
  store_name
  
order by
  store_name

;

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

Unnamed: 0,store_name,total_number_of_customers
0,Berkeley,8138
1,Dallas,6359
2,Miami,5725
3,Nashville,3646
4,Seattle,7214


# 1.2.3 List of Customers who have signed up but not bought anything

In [6]:
# Query returns all customers who have signed up but not bought anything

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  t1_customers.last_name as last_name
, t1_customers.first_name as first_name

from customers as t1_customers

where
  t1_customers.customer_id not in
    (
    select distinct
      customer_id
    
    from sales as t2_sales
    )

order by
  last_name
, first_name

;

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

Unnamed: 0,last_name,first_name
0,Agott,Tracy
1,Arnke,Daniella
2,Assandri,Hyacintha
3,Borman,Felice
4,Breit,Domini
5,Butterick,Jacenta
6,Camillo,Marysa
7,Dukelow,Lilas
8,Dukesbury,Corinna
9,Ellaway,Lorianna
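
A caveat with `not in`: if the subquery ever returns a null `customer_id`, the comparison evaluates to unknown for every customer and the query returns no rows. Whether `sales.customer_id` can be null in the AGM database is not stated here, so this is a precaution rather than a known problem. The sketch below shows the behavior and a `not exists` alternative on an in-memory SQLite database with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (customer_id integer primary key, last_name text, first_name text);
create table sales (sale_id integer primary key, customer_id integer);
-- Hypothetical rows; one sale has no customer_id recorded
insert into customers values (1, 'Doe', 'Jane'), (2, 'Roe', 'Richard');
insert into sales values (100, 2), (101, null);
""")

not_in_query = """

select
  t1_customers.last_name as last_name

from customers as t1_customers

where
  t1_customers.customer_id not in
    (
    select
      customer_id

    from sales as t2_sales
    )

;

"""

not_exists_query = """

select
  t1_customers.last_name as last_name

from customers as t1_customers

where
  not exists
    (
    select 1

    from sales as t2_sales

    where
      t2_sales.customer_id = t1_customers.customer_id
    )

order by
  last_name

;

"""

not_in_rows = conn.execute(not_in_query).fetchall()
not_exists_rows = conn.execute(not_exists_query).fetchall()
print(not_in_rows)
print(not_exists_rows)

conn.close()
```

With the null present, the `not in` version returns nothing, while the `not exists` version still finds the customer who has no sales.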


# 1.2.4 What is the percentage of customers per population at the zip code level?

In [7]:
# Query returns the percentage of customers per population by zip code

rollback_before_flag = True
rollback_after_flag = True

query = """

select
  t1_customers.zip as zip
, round((count(t1_customers.customer_id)/t2_zipcodes.population)*100,3) as percentage_customers_per_population

from customers as t1_customers

join zip_codes as t2_zipcodes
on t1_customers.zip = t2_zipcodes.zip

group by
  t1_customers.zip
, t2_zipcodes.zip
  
order by
  (count(t1_customers.customer_id)/t2_zipcodes.population) desc

;

"""

df = my_select_query_pandas(query, rollback_before_flag, rollback_after_flag)
df

Unnamed: 0,zip,percentage_customers_per_population
0,98164,1.290
1,98050,1.087
2,33109,1.053
3,94613,1.045
4,37240,1.028
...,...,...
545,33033,0.002
546,75067,0.001
547,75035,0.001
548,94565,0.001
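
The percentage above is the customer count divided by the population, times 100. The fractional output suggests that `population` is stored as a non-integer numeric type; if it were an integer column, dividing one integer by another would truncate to zero before the multiplication. The sketch below shows that pitfall and one way to avoid it (multiplying by `100.0` first) on an in-memory SQLite database. The integer `population` column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table zip_codes (zip text primary key, population integer);
create table customers (customer_id integer primary key, zip text);
-- Hypothetical rows: 3 customers in a zip code with 200 residents
insert into zip_codes values ('00001', 200);
insert into customers values (1, '00001'), (2, '00001'), (3, '00001');
""")

pct_query = """

select
  t2_zipcodes.zip as zip
, (count(t1_customers.customer_id) / t2_zipcodes.population) * 100 as truncated_percentage
, round(100.0 * count(t1_customers.customer_id) / t2_zipcodes.population, 3) as percentage_customers_per_population

from customers as t1_customers

join zip_codes as t2_zipcodes
on t1_customers.zip = t2_zipcodes.zip

group by
  t2_zipcodes.zip
, t2_zipcodes.population

;

"""

pct_rows = conn.execute(pct_query).fetchall()
print(pct_rows)

conn.close()
```

Grouping by `population` as well as `zip` keeps the query valid even where the database cannot infer that each zip code has a single population value.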
