# Table of Contents

- Creation of loyalty flag and order frequency flag


### Step 1

1. Create a new notebook for this task. Be sure to import the relevant libraries, along with your ords_prods_merge dataframe, which should include your newly derived columns from the previous Exercise.


In [1]:
# importing libraries
import pandas as pd
import numpy as np
import os

# setting master path
path = r'/Users/Norberto/Desktop/2023-10 Instacart Basket Analysis'

In [2]:
# read pkl files
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'orders_products_merged.pkl'))

### Step 2-3
2. In this Exercise, you learned how to find the aggregated mean of the “order_number” column grouped by “department_id” for a subset of your dataframe. Now, repeat this process for the entire dataframe.
3. Analyze the result. How do the results for the entire dataframe differ from those of the subset? Include your comments in a markdown cell below the executed code.


In [3]:
# create table with department id, mean, min, max to compare results against subset of data.
ords_prods_merge.groupby('department_id').agg({'order_number': ['mean', 'min', 'max']})

Unnamed: 0_level_0,order_number,order_number,order_number
Unnamed: 0_level_1,mean,min,max
department_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
1.0,15.457838,1,99
2.0,17.27792,1,99
3.0,17.170395,1,99
4.0,17.811403,1,99
5.0,15.215751,1,99
6.0,16.439806,1,99
7.0,17.225802,1,99
8.0,15.34065,1,99
9.0,15.895474,1,99
10.0,20.197148,1,99


The results for the entire dataframe are similar to that of the subset. Some means are higher in the whole dataframe and all the maxes are 99, whereas in the subset, 3 departments had slightly lower max values.

### Step 4-5
4. Follow the instructions in the Exercise for creating a loyalty flag for existing customers using the transform() and loc() functions.
5. The marketing team at Instacart wants to know whether there’s a difference between the spending habits of the three types of customers you identified. Use the loyalty flag you created and check the basic statistics of the product prices for each loyalty category (Loyal Customer, Regular Customer, and New Customer). What you’re trying to determine is whether the prices of products purchased by loyal customers differ from those purchased by regular or new customers.

In [4]:
# apply loyalty flag label based off the max_order column created during exercise 
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [5]:
# apply loyalty flag label based off the max_order column created during exercise 
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [6]:
# apply loyalty flag label based off the max_order column created during exercise 
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [7]:
# check column counts to ensure all rows were assigned a label 
ords_prods_merge['loyalty_flag'].value_counts(dropna=False)

loyalty_flag
Regular customer    15891507
Loyal customer      10294027
New customer         6249525
Name: count, dtype: int64

In [8]:
# visual inspection of new column and results
ords_prods_merge.head(100)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_day_new,busiest_period_of_day,max_order,loyalty_flag
0,2539329,1,1,2,8,,196,1,0,Soda,77.0,7.0,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
1,2539329,1,1,2,8,,14084,2,0,Organic Unsweetened Vanilla Almond Milk,91.0,16.0,12.5,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
2,2539329,1,1,2,8,,12427,3,0,Original Beef Jerky,23.0,19.0,4.4,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer
3,2539329,1,1,2,8,,26088,4,0,Aged White Cheddar Popcorn,23.0,19.0,4.7,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer
4,2539329,1,1,2,8,,26405,5,0,XL Pick-A-Size Paper Towel Rolls,54.0,17.0,1.0,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,738281,2,4,2,10,8.0,21150,13,0,Fire Grilled Steak Bowl,38.0,1.0,5.9,both,Mid-range product,Regularly busy,Regularly busy,Most orders,14,Regular customer
96,1673511,2,5,3,11,8.0,47144,1,0,Unsweetened Original Almond Breeze Almond Milk,91.0,16.0,14.0,both,Mid-range product,Regularly busy,Least busy,Most orders,14,Regular customer
97,1673511,2,5,3,11,8.0,5322,2,0,Gluten Free Dark Chocolate Chunk Chewy with a ...,3.0,19.0,2.9,both,Low-range product,Regularly busy,Least busy,Most orders,14,Regular customer
98,1673511,2,5,3,11,8.0,17224,3,0,Oats & Honey Gluten Free Granola,3.0,19.0,1.6,both,Low-range product,Regularly busy,Least busy,Most orders,14,Regular customer


In [9]:
# group and display data by product price range and customer type, and display total amount spent for each category.
ords_prods_merge.groupby(['price_range_loc','loyalty_flag'])['prices'].sum()

price_range_loc     loyalty_flag    
High-range product  Loyal customer       29326138.1
                    New customer         36046966.4
                    Regular customer     78920868.9
Low-range product   Loyal customer        9760615.3
                    New customer          5880713.7
                    Regular customer     14973270.3
Mid-range product   Loyal customer       67727288.8
                    New customer         41084107.1
                    Regular customer    104497554.0
Name: prices, dtype: float64

According to this breakdown, we compared the total revenue from each product across different customer types to see who contributed most to each category.

- In High-range products, regular customers were responsible for more than half of revenue, followed by new customers and loyal customers bringing in the least
- In Low-range products, regular customers were responsible for half of revenue, followed by loyal customers and new customers bringing in the least
- In Mid-range products, regular customers were responsible for almost half of revenure, followed by loyal cuustomers and new customers bring in the least.

Based on these results, new customers buy more high-range products than loyal customers.

### Step 6
6. The team now wants to target different types of spenders in their marketing campaigns. This can be achieved by looking at the prices of the items people are buying. Create a spending flag for each user based on the average price across all their orders using the following criteria:
- If the mean of the prices of products purchased by a user is lower than 10, then flag them as a “Low spender.”
- If the mean of the prices of products purchased by a user is higher than or equal to 10, then flag them as a “High spender.”

In [10]:
# create new column with average price paid by each customer
ords_prods_merge['avg_spending'] = ords_prods_merge.groupby(['user_id'])['prices'].transform(np.mean)

In [11]:
# remove limit on display for jupyet noteboooks
pd.options.display.max_rows = None

# display 100 rows for visual inspection
ords_prods_merge.head(150)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,product_name,...,department_id,prices,_merge,price_range_loc,busiest_day,busiest_day_new,busiest_period_of_day,max_order,loyalty_flag,avg_spending
0,2539329,1,1,2,8,,196,1,0,Soda,...,7.0,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
1,2539329,1,1,2,8,,14084,2,0,Organic Unsweetened Vanilla Almond Milk,...,16.0,12.5,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
2,2539329,1,1,2,8,,12427,3,0,Original Beef Jerky,...,19.0,4.4,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
3,2539329,1,1,2,8,,26088,4,0,Aged White Cheddar Popcorn,...,19.0,4.7,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
4,2539329,1,1,2,8,,26405,5,0,XL Pick-A-Size Paper Towel Rolls,...,17.0,1.0,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
5,2398795,1,2,3,7,15.0,196,1,1,Soda,...,7.0,9.0,both,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797
6,2398795,1,2,3,7,15.0,10258,2,0,Pistachios,...,19.0,3.0,both,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797
7,2398795,1,2,3,7,15.0,12427,3,1,Original Beef Jerky,...,19.0,4.4,both,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797
8,2398795,1,2,3,7,15.0,13176,4,0,Bag of Organic Bananas,...,4.0,10.3,both,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797
9,2398795,1,2,3,7,15.0,26088,5,1,Aged White Cheddar Popcorn,...,19.0,4.7,both,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797


In [12]:
# create column and assign flag for average spending less than 10
ords_prods_merge.loc[ords_prods_merge['avg_spending'] > 10, 'spending_flag'] = 'Low Spender'

In [13]:
# assign flag for average spending more than or equal to 10
ords_prods_merge.loc[ords_prods_merge['avg_spending'] <= 10, 'spending_flag'] = 'High Spender'

In [14]:
# display counts for inspection
ords_prods_merge['spending_flag'].value_counts(dropna=False)

spending_flag
High Spender    31800764
Low Spender       634295
Name: count, dtype: int64

### Step 7

7. In order to send relevant notifications to users within the app (for instance, asking users if they want to buy the same item again), the Instacart team wants you to determine frequent versus non-frequent customers. Create an order frequency flag that marks the regularity of a user’s ordering behavior according to the median in the “days_since_prior_order” column. The criteria for the flag should be as follows:
- If the median of “days_since_prior_order” is higher than 20, then the customer should be labeled a “Non-frequent customer.”
- If the median is higher than 10 and lower than or equal to 20, then the customer should be labeled a “Regular customer.”
- If the median is lower than or equal to 10, then the customer should be labeled a “Frequent customer.”

In [15]:
# create column displaying a user's median days since priod order
ords_prods_merge['dspo_median'] = ords_prods_merge.groupby(['user_id'])['days_since_prior_order'].transform(np.median)

In [16]:
ords_prods_merge.head(150)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,product_name,...,_merge,price_range_loc,busiest_day,busiest_day_new,busiest_period_of_day,max_order,loyalty_flag,avg_spending,spending_flag,dspo_median
0,2539329,1,1,2,8,,196,1,0,Soda,...,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5
1,2539329,1,1,2,8,,14084,2,0,Organic Unsweetened Vanilla Almond Milk,...,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5
2,2539329,1,1,2,8,,12427,3,0,Original Beef Jerky,...,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5
3,2539329,1,1,2,8,,26088,4,0,Aged White Cheddar Popcorn,...,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5
4,2539329,1,1,2,8,,26405,5,0,XL Pick-A-Size Paper Towel Rolls,...,both,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5
5,2398795,1,2,3,7,15.0,196,1,1,Soda,...,both,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5
6,2398795,1,2,3,7,15.0,10258,2,0,Pistachios,...,both,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5
7,2398795,1,2,3,7,15.0,12427,3,1,Original Beef Jerky,...,both,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5
8,2398795,1,2,3,7,15.0,13176,4,0,Bag of Organic Bananas,...,both,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5
9,2398795,1,2,3,7,15.0,26088,5,1,Aged White Cheddar Popcorn,...,both,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5


In [17]:
# assign flag based off value in DSPO_MEDIAN column
ords_prods_merge.loc[ords_prods_merge['dspo_median'] > 20, 'order_frequency_flag'] = 'Non-frequent customer'
ords_prods_merge.loc[ords_prods_merge['dspo_median'] <= 10, 'order_frequency_flag'] = 'Frequent customer'
ords_prods_merge.loc[(ords_prods_merge['dspo_median'] <= 20) & (ords_prods_merge['dspo_median'] >= 10), 'order_frequency_flag'] = 'Regular customer'

In [18]:
# check value counts
ords_prods_merge['order_frequency_flag'].value_counts(dropna=False)

order_frequency_flag
Frequent customer        20305301
Regular customer          8490038
Non-frequent customer     3639720
Name: count, dtype: int64

In [19]:
# visual inspection of  column
ords_prods_merge.head(150)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,product_name,...,price_range_loc,busiest_day,busiest_day_new,busiest_period_of_day,max_order,loyalty_flag,avg_spending,spending_flag,dspo_median,order_frequency_flag
0,2539329,1,1,2,8,,196,1,0,Soda,...,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
1,2539329,1,1,2,8,,14084,2,0,Organic Unsweetened Vanilla Almond Milk,...,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
2,2539329,1,1,2,8,,12427,3,0,Original Beef Jerky,...,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
3,2539329,1,1,2,8,,26088,4,0,Aged White Cheddar Popcorn,...,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
4,2539329,1,1,2,8,,26405,5,0,XL Pick-A-Size Paper Towel Rolls,...,Low-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
5,2398795,1,2,3,7,15.0,196,1,1,Soda,...,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
6,2398795,1,2,3,7,15.0,10258,2,0,Pistachios,...,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
7,2398795,1,2,3,7,15.0,12427,3,1,Original Beef Jerky,...,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
8,2398795,1,2,3,7,15.0,13176,4,0,Bag of Organic Bananas,...,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer
9,2398795,1,2,3,7,15.0,26088,5,1,Aged White Cheddar Popcorn,...,Low-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,High Spender,20.5,Non-frequent customer


In [20]:
# Exporting pkl 
ords_prods_merge.to_pickle(os.path.join(path, '02 Data','Prepared Data', 'orders_products_merged.pkl'))