# Gourmet Meals Business -- SQL Project (Part 1.6 - Best Customers)

Author: **Ethan Moody**

Date: **September 2022**

### Business Case

A few years ago, a new startup was born: **Agile Gourmet Meals (AGM)**.

The founder of AGM, Joy, was a sous chef at a 5-star restaurant who had worked her way up from dishwasher to cook to sous chef. As part of her job, Joy frequently shopped at high-end grocery stores that featured organic and healthier selections at premium prices, because the restaurant wanted only the highest-quality ingredients for its food. She was also paid to eat at other restaurants on her time off, from fast food to other 5-star establishments, to see what types and quality of food were being served.

Joy noticed that most young, single professionals tended to:
* Eat out frequently, mostly at casual dining restaurants, with some fast food and the occasional 5-star restaurant
* Order delivery at home or work
* Take out for home or work
* Buy frozen pre-made meals and microwave them at home

Joy also noticed that all of these options were typically not very healthy.

Joy had an idea to create a new business. She would cook healthy, gourmet-quality meals and package them in containers similar to the frozen pre-made meals sold in grocery stores, except they would be fresh (not frozen) to improve the taste. She would seek to market them at a local high-end grocery store.

Joy struck a deal with the high-end grocery store to set up a small counter near the entryway. At the counter she would educate customers about her meals, take orders, and deliver them. As the business grew, Joy rented space near the store and set up her kitchen there, hiring someone else to staff the counter at the grocery store. Joy also hired a web developer to build a website to take orders, handle payments, etc.

After a couple of years, the grocery store's corporate office was so pleased with the arrangement that they asked AGM to expand to several other cities. They selected stores in areas of town with more young professionals and/or areas known for more affluence. They provided funding for a joint venture to allow AGM to set up kitchens near the stores and enhance the web and phone app ordering system. In exchange for their investment, they received a controlling interest in the business. Joy stayed on and continued to act as an expert on the food side of the business.

AGM has just finished a very successful year on the enhanced computer systems, and now has a database of sales data for one year.

AGM charges a flat rate of $12 per meal with no minimum. Since the food has to be heated before eating, it is not subject to sales tax. Customers must order by 10am one day in order to pick up the meals the next day. The thinking is that AGM will waste much less food that way. Customers will have a maximum of one order per day.

AGM is in the process of creating a data science team and a data engineering team. You have just been hired as the first data engineer for the data engineering team. You met with the data science team and they explained to you the story above, and more importantly, that they now have a database of sales data for one year (2020).  

Together with the data science team, you worked out a list of high priority data engineering tasks that need to be done. The data science team has been working with the business side to come up with some business questions that will need some queries written against the sales database to help them answer:
* Sales Related Queries
* Customer Related Queries
* Meal Related Queries
* A Holiday Related Query

The data science team would like to see an example of a data visualization using Python from data in a Pandas dataframe containing data from an SQL query. They are familiar with other data visualization tools, but not with using Python, and they want to see a good example.

The data science team is building a model to help identify the company's best customers. They are starting with the very common RFM model. Since you will be the one looking at the database in the most detail, they would like for you to write up your best ideas on how the sales data can be used for this model.

# 1.6 Ideas on how the sales data can be used to help identify best customers

The data science team would like to know your best ideas on how the sales data can be used to help identify the company's best customers.



They are going to start with the most common and most basic model, known as RFM, which consists of three dimensions:

* R - Recency - How recently did the customer purchase?

* F - Frequency - How often do they purchase?

* M - Monetary Value - How much do they spend?


The data science team also has to come up with a way to synthesize the 3 dimensions into a single customer value for each customer.



The data science team would like for you to present your ideas in the form of 4 paragraphs as follows:

* Recency - A paragraph explaining your ideas on how the data can be used to determine recency.  

* Frequency - A paragraph explaining your ideas on how the data can be used to determine frequency.

* Monetary Value - A paragraph explaining your ideas on how the data can be used to determine monetary value.

* Synthesis - A paragraph explaining your ideas on how to synthesize the 3 dimensions of recency, frequency, and monetary value into a customer value for each customer and how to determine who the best customers are.

# Recommendations for RFM Analysis and Customer Value Identification

To determine **recency**, the data science team could start with a query that joins customer data (from the *customers* table) to sales data (from the *sales* table) and returns every customer alongside the date of each of their sales. Because the focus is on customers who have actually made a purchase, the query should exclude customers with no sales (an inner join does this automatically). Reducing each customer’s sales to their single most recent sale date and then sorting the result from most recent to least recent would show which customers purchased most recently (i.e., the customer or customers at the top of the sorted table would have the highest level of **recency**). As an additional consideration, the team might want to include store data in the query (from the *stores* table, joined to the *customers* table) so that they could group customers by store and find the customers with the most recent purchases at each store. This additional grouping would give the company a better picture of **recency** at each store, which could help if they wanted to identify the “best customers” within each store’s unique customer base as well as (or even instead of) the “best customers” overall, and then target their marketing in the way most relevant to each geographic area.
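The recency idea above can be sketched as a small query run through Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the column names (`customer_id`, `store_id`, `sale_date`, `total_amount`) and the sample rows are hypothetical, since the actual AGM schema is not shown here.

```python
import sqlite3

# Hypothetical schema and sample rows; the real AGM column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        sale_date TEXT, total_amount REAL);
    INSERT INTO customers VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO sales VALUES
        (1, 1, '2020-03-01', 24.0),
        (2, 1, '2020-11-15', 12.0),
        (3, 2, '2020-12-20', 36.0);
""")

# The inner join drops customers with no sales (customer 3 here);
# MAX(sale_date) keeps each customer's most recent purchase date.
recency_sql = """
    SELECT c.customer_id, c.store_id, MAX(s.sale_date) AS last_sale_date
    FROM customers AS c
    INNER JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id, c.store_id
    ORDER BY last_sale_date DESC;
"""
recency_rows = conn.execute(recency_sql).fetchall()
for row in recency_rows:
    print(row)
```

Storing dates as ISO-8601 text (`YYYY-MM-DD`) is an assumption that lets `MAX` and `ORDER BY` sort them chronologically; a database with a native date type would not need it.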

To determine **frequency**, the data science team could build on the query suggested for **recency**, or create a similar one that joins customer data (from the *customers* table) to sales data (from the *sales* table) and returns every customer alongside the date of each of their sales. They could then aggregate the data into a table that counts sale dates for each unique customer ID, again excluding any customers who haven’t made a purchase. Because customers are limited to one order per day, each sale date corresponds to exactly one order, so the resulting table would show how many times each customer made a purchase. The team could then sort the table from most to fewest purchases to see which customers purchased most often over the timeframe covered by the data (i.e., the customer or customers at the top of the sorted table would have the highest level of **frequency**). As an additional consideration, the team could also include store data in the query (just as in the preceding discussion of **recency**) so that they could group customers by store and find the customers who purchased most often at each store. This step would help the company identify the “best customers” within each store’s unique customer base as well as (or even instead of) the “best customers” overall, and then target their marketing in the way most relevant to each geographic area.
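A minimal sketch of the frequency count described above, again using Python's built-in `sqlite3` module. The column names and sample rows are hypothetical stand-ins for the real AGM schema.

```python
import sqlite3

# Hypothetical schema and sample rows; the real AGM column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        sale_date TEXT);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO sales VALUES
        (1, 1, '2020-01-05'),
        (2, 1, '2020-02-09'),
        (3, 1, '2020-07-30'),
        (4, 2, '2020-04-18');
""")

# One row per customer with at least one sale; with at most one order per
# customer per day, counting sale dates counts orders.
frequency_sql = """
    SELECT c.customer_id, COUNT(s.sale_date) AS purchase_count
    FROM customers AS c
    INNER JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY purchase_count DESC;
"""
frequency_rows = conn.execute(frequency_sql).fetchall()
for row in frequency_rows:
    print(row)
```

Customer 3 has no sales in the sample data and therefore does not appear in the result.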

To determine **monetary value**, the data science team could once again create a query that joins customer data (from the *customers* table) to sales data (from the *sales* table) and returns every customer alongside each of their sales. They could then determine the **monetary value** of each customer by summing the total sales amount for each unique customer ID, and sort the resulting table from highest to lowest total sales to see which customers spent the most (i.e., the customer or customers at the top of the sorted table would have the highest level of **monetary value**). Since the focus for this dimension appears to be on how much each customer spends rather than how many meals each customer purchases, this table should provide a clean and simple ranking of customers by their **monetary value** to the company (and at AGM’s flat rate of $12 per meal, ranking by dollars spent and ranking by meals purchased give the same order). As before, the team could also include store data in the query so that they could group customers by store and find the customers who generated the most dollars in sales at each store. This step would help the company identify the “best customers” within each store’s unique customer base as well as (or even instead of) the “best customers” overall, and then target their marketing in the way most relevant to each geographic area.
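The monetary value ranking, including the optional per-store grouping, might look like the sketch below. All column names, the `store_name` values, and the sample rows are hypothetical; only the table names (*customers*, *sales*, *stores*) come from the discussion above.

```python
import sqlite3

# Hypothetical schema and sample rows; the real AGM column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        sale_date TEXT, total_amount REAL);
    INSERT INTO stores VALUES (10, 'Store A'), (20, 'Store B');
    INSERT INTO customers VALUES (1, 10), (2, 10), (3, 20), (4, 20);
    INSERT INTO sales VALUES
        (1, 1, '2020-02-01', 24.0),
        (2, 1, '2020-02-08', 12.0),
        (3, 2, '2020-05-12', 48.0),
        (4, 3, '2020-06-03', 12.0),
        (5, 3, '2020-09-21', 60.0);
""")

# Total spend per customer, listed store by store from highest to lowest.
# Customer 4 has no sales and is dropped by the inner join.
monetary_sql = """
    SELECT st.store_name, c.customer_id, SUM(s.total_amount) AS total_spent
    FROM customers AS c
    INNER JOIN sales AS s ON s.customer_id = c.customer_id
    INNER JOIN stores AS st ON st.store_id = c.store_id
    GROUP BY st.store_name, c.customer_id
    ORDER BY st.store_name, total_spent DESC;
"""
monetary_rows = conn.execute(monetary_sql).fetchall()
for row in monetary_rows:
    print(row)
```

Dropping `st.store_name` from the `ORDER BY` clause would give the overall ranking instead of the per-store one.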

To synthesize all three RFM dimensions into a single **customer value** for each customer, the data science team could start by dividing the customers in each of the three **recency**, **frequency**, and **monetary value** tables into deciles (i.e., 10 groups, each holding roughly 10% of all customers) and assigning each decile a distinct value from 10 (best) to 1 (worst). For example, they could assign the top-ranked 10% of customers in each table a decile value of 10 and the bottom-ranked 10% of customers in each table a decile value of 1. Next, to keep things simple and treat all three dimensions as equally important to the company, the team could multiply the three decile values assigned to each unique customer ID to get a “composite RFM score”; this score would take into account each customer’s decile position in each of the three RFM dimensions and serve as a proxy for overall **customer value** (e.g., a customer who’s in decile 8 for recency, decile 10 for frequency, and decile 9 for monetary value would have a “composite RFM score” of 8 × 10 × 9 = 720, out of a maximum possible score of 1,000). Finally, they could sort all customers from highest to lowest composite RFM score to see who the company’s “best customers” are, and the company could choose to focus on just the top 10% of customers by score as a small yet meaningful slice of the customer base for targeted marketing, outreach, or perks. This same analysis could be performed at the store level if the company wanted to tailor marketing efforts more intentionally (and perhaps more effectively) to customers within each geographic area, provided the data science team grouped customers by store in the preceding steps for determining **recency**, **frequency**, and **monetary value**.
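The decile scoring described above could be sketched with SQL's `NTILE` window function, run here through Python's built-in `sqlite3` module. This is a sketch under assumptions, not a definitive implementation: the column names and the generated sample data are hypothetical, window functions require SQLite 3.25 or newer, and only a sales table is used because customers without purchases have no rows in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, sale_date TEXT, total_amount REAL)"
)

# Hypothetical data: customer k places k orders (months 1..k, on day k).
# Every order is one $12 meal, except customer 1's single 11-meal order.
sample_sales = []
for k in range(1, 11):
    meals_per_order = 11 if k == 1 else 1
    for month in range(1, k + 1):
        sample_sales.append((k, f"2020-{month:02d}-{k:02d}", 12.0 * meals_per_order))
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sample_sales)

# NTILE(10) numbers buckets 1..10 in sort order, so sorting ascending gives
# the most recent / most frequent / highest-spending customers a 10.
rfm_sql = """
    WITH per_customer AS (
        SELECT customer_id,
               MAX(sale_date)    AS last_sale_date,
               COUNT(*)          AS n_orders,
               SUM(total_amount) AS total_spent
        FROM sales
        GROUP BY customer_id
    ),
    scored AS (
        SELECT customer_id,
               NTILE(10) OVER (ORDER BY last_sale_date) AS r_decile,
               NTILE(10) OVER (ORDER BY n_orders)       AS f_decile,
               NTILE(10) OVER (ORDER BY total_spent)    AS m_decile
        FROM per_customer
    )
    SELECT customer_id, r_decile, f_decile, m_decile,
           r_decile * f_decile * m_decile AS rfm_score
    FROM scored
    ORDER BY rfm_score DESC, customer_id;
"""
scored_rows = conn.execute(rfm_sql).fetchall()
for row in scored_rows:
    print(row)
```

In this sample, customer 1 lands in the top decile for monetary value but the bottom decile for recency and frequency, so the multiplied score stays low (10). That illustrates a property of multiplying deciles: one weak dimension pulls the composite score down sharply. Note also that when customers tie on a dimension, `NTILE` splits them across buckets arbitrarily unless a tie-breaking column is added to the `ORDER BY`.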