# 4.8 Grouping & Aggregating Part 2 - Task Exercises

### Contents
01 Import Libraries and Data

02 Group by department_id and Find Mean

03 Loyalty Flag

04 Spending Flag

05 Order Frequency Flag

06 Export Data


### 01 Import Libraries and Data

1. Create a new notebook for this task. Be sure to import the relevant libraries, along with your ords_prods_merge dataframe, which should include your newly derived columns from the previous Exercise.

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os

In [2]:
# Create path variable
path = r'D:\JupyterProjects\06-2022 Instacart Basket Analysis'

In [3]:
# Import pickle file
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'Four8NanData.pkl'))

In [4]:
# Check shape of dataframe
ords_prods_merge.shape

(32404859, 21)

In [5]:
# Check head of dataframe
ords_prods_merge.head(10)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,new_customer,product_id,add_to_cart_order,reordered,...,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag
0,2539329,1,1,2,8,,True,196,1,0,...,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
1,2398795,1,2,3,7,15.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Slowest days,Average orders,10,New customer
2,473747,1,3,3,12,21.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Slowest days,Average orders,10,New customer
3,2254736,1,4,4,7,29.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Least busy,Slowest days,Average orders,10,New customer
4,431534,1,5,4,15,28.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Least busy,Slowest days,Average orders,10,New customer
5,3367565,1,6,2,7,19.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
6,550135,1,7,1,9,20.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Busiest days,Average orders,10,New customer
7,3108588,1,8,1,14,14.0,False,196,2,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Busiest days,Average orders,10,New customer
8,2295261,1,9,1,16,0.0,False,196,4,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Busiest days,Average orders,10,New customer
9,2550362,1,10,4,8,30.0,False,196,1,1,...,77,7,9.0,both,Mid-range product,Least busy,Slowest days,Average orders,10,New customer


### 02 Group by department_id and Find Mean

2. In this Exercise, you learned how to find the aggregated mean of the “order_number” column grouped by “department_id” for a subset of your dataframe. Now, repeat this process for the entire dataframe.

In [6]:
# Find the mean order_number by department for the entire dataframe
ords_prods_merge.groupby('department_id').agg({'order_number': ['mean']})

department_id,order_number (mean)
1,15.457838
2,17.27792
3,17.170395
4,17.811403
5,15.215751
6,16.439806
7,17.225802
8,15.34065
9,15.895474
10,20.197148


3. Analyze the result. How do the results for the entire dataframe differ from those of the subset? Include your comments in a markdown cell below the executed code.

The means for the entire dataframe range from about 15 to 23, whereas the means for the data subset ranged from about 11 to 19. In addition, the entire dataframe includes all 21 departments, while the data subset contained only 8 of them.
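The comparison above can be reproduced on a small scale. The following is a minimal sketch using hypothetical toy data (the column values and the `head(3)` subset are assumptions chosen for illustration, not the Instacart data). It shows how a subset can both shift a group mean and drop a group entirely.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only)
toy = pd.DataFrame({
    'department_id': [1, 1, 2, 2, 3, 3],
    'order_number': [2, 4, 10, 20, 5, 7],
})

# Mean order_number per department for the full toy dataframe
full_means = toy.groupby('department_id')['order_number'].mean()

# Same aggregation on a subset (first three rows only)
subset_means = toy.head(3).groupby('department_id')['order_number'].mean()

# Departments present in the full result but absent from the subset
missing_departments = full_means.index.difference(subset_means.index).tolist()

print(full_means)
print(subset_means)
print(missing_departments)
```

In this toy case the subset never sees department 3, and its mean for department 2 is based on a single row, which is the same kind of effect described in the comment above.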

### 03 Loyalty Flag

In [7]:
# View frequencies for the loyalty_flag field created in the first part of the Exercise
ords_prods_merge['loyalty_flag'].value_counts(dropna = False)

Regular customer    15876776
Loyal customer      10284093
New customer         6243990
Name: loyalty_flag, dtype: int64

5. The marketing team at Instacart wants to know whether there’s a difference between the spending habits of the three types of customers you identified. Use the loyalty flag you created and check the basic statistics of the product prices for each loyalty category (Loyal Customer, Regular Customer, and New Customer). What you’re trying to determine is whether the prices of products purchased by loyal customers differ from those purchased by regular or new customers.

In [8]:
# Group by loyalty_flag and find mean, min, and max using the aggregate() function
ords_prods_merge.groupby('loyalty_flag').agg({'prices': ['mean', 'min', 'max']})

loyalty_flag,prices (mean),prices (min),prices (max)
Loyal customer,7.773438,1.0,25.0
New customer,7.800781,1.0,25.0
Regular customer,7.796875,1.0,25.0


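On average, loyal customers purchase slightly cheaper products (mean price of about 7.77) than new customers (7.80) and regular customers (7.80). The minimum (1.0) and maximum (25.0) prices are identical across all three groups, so the difference in spending habits is very small.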
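One caveat when reading these means: the dtypes listing later in this notebook shows that the prices column is stored as float16, and the three reported means differ by only a few float16 steps. The sketch below, which uses hypothetical toy data, shows the size of one float16 step near 7.8 and one possible way to aggregate in float64 instead. Whether this changes the conclusion for the real data has not been checked here.

```python
import numpy as np
import pandas as pd

# Gap between neighbouring float16 values near the reported means (about 7.8)
step = float(np.spacing(np.float16(7.8)))
print(step)

# Toy data (hypothetical values, not the Instacart data)
toy = pd.DataFrame({
    'loyalty_flag': ['Loyal customer', 'Loyal customer', 'New customer', 'New customer'],
    'prices': np.array([7.1, 8.4, 7.3, 8.3], dtype='float16'),
})

# Cast prices to float64 before aggregating so the results are not stored as float16
stats = (
    toy.assign(prices=toy['prices'].astype('float64'))
       .groupby('loyalty_flag')
       .agg({'prices': ['mean', 'min', 'max']})
)
print(stats)
```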

### 04 Spending Flag

6. The team now wants to target different types of spenders in their marketing campaigns. This can be achieved by looking at the prices of the items people are buying. Create a spending flag for each user based on the average price across all their orders using the following criteria:
If the mean of the prices of products purchased by a user is lower than 10, then flag them as a “Low spender.”
If the mean of the prices of products purchased by a user is higher than or equal to 10, then flag them as a “High spender.”

In [9]:
# Create mean_price column by applying the transform() function to prices grouped by user_id
# (the string 'mean' avoids the FutureWarning newer pandas versions raise for np.mean)
ords_prods_merge['mean_price'] = ords_prods_merge.groupby(['user_id'])['prices'].transform('mean')

In [10]:
# Create spending flag for <10
ords_prods_merge.loc[ords_prods_merge['mean_price'] < 10, 'spending_flag'] = 'Low spender'

In [11]:
# Create spending flag for >= 10
ords_prods_merge.loc[ords_prods_merge['mean_price'] >= 10, 'spending_flag'] = 'High spender'

In [12]:
# Check spending_flag frequencies
ords_prods_merge['spending_flag'].value_counts(dropna = False)

Low spender     32284736
High spender      120123
Name: spending_flag, dtype: int64
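Because the spending flag has only two outcomes, the two `.loc` assignments above could also be written as a single `np.where` call. This is a minimal sketch on hypothetical toy data, not a replacement for the cells above. Note one difference: `np.where` would label a missing mean_price as "High spender", whereas the `.loc` approach leaves it as NaN.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, for illustration only)
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'prices': [4.0, 8.0, 9.0, 13.0],
})

# transform() returns one value per row, so every row of a user carries that user's mean
toy['mean_price'] = toy.groupby('user_id')['prices'].transform('mean')

# Single vectorized assignment: below 10 is "Low spender", otherwise "High spender"
toy['spending_flag'] = np.where(toy['mean_price'] < 10, 'Low spender', 'High spender')

print(toy)
```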

In [13]:
# View top ten rows
ords_prods_merge.head(10)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,new_customer,product_id,add_to_cart_order,reordered,...,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,mean_price,spending_flag
0,2539329,1,1,2,8,,True,196,1,0,...,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367188,Low spender
1,2398795,1,2,3,7,15.0,False,196,1,1,...,9.0,both,Mid-range product,Regularly busy,Slowest days,Average orders,10,New customer,6.367188,Low spender
2,473747,1,3,3,12,21.0,False,196,1,1,...,9.0,both,Mid-range product,Regularly busy,Slowest days,Average orders,10,New customer,6.367188,Low spender
3,2254736,1,4,4,7,29.0,False,196,1,1,...,9.0,both,Mid-range product,Least busy,Slowest days,Average orders,10,New customer,6.367188,Low spender
4,431534,1,5,4,15,28.0,False,196,1,1,...,9.0,both,Mid-range product,Least busy,Slowest days,Average orders,10,New customer,6.367188,Low spender
5,3367565,1,6,2,7,19.0,False,196,1,1,...,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367188,Low spender
6,550135,1,7,1,9,20.0,False,196,1,1,...,9.0,both,Mid-range product,Regularly busy,Busiest days,Average orders,10,New customer,6.367188,Low spender
7,3108588,1,8,1,14,14.0,False,196,2,1,...,9.0,both,Mid-range product,Regularly busy,Busiest days,Average orders,10,New customer,6.367188,Low spender
8,2295261,1,9,1,16,0.0,False,196,4,1,...,9.0,both,Mid-range product,Regularly busy,Busiest days,Average orders,10,New customer,6.367188,Low spender
9,2550362,1,10,4,8,30.0,False,196,1,1,...,9.0,both,Mid-range product,Least busy,Slowest days,Average orders,10,New customer,6.367188,Low spender


### 05 Order Frequency Flag

7. In order to send relevant notifications to users within the app (for instance, asking users if they want to buy the same item again), the Instacart team wants you to determine frequent versus non-frequent customers. Create an order frequency flag that marks the regularity of a user’s ordering behavior according to the median in the “days_since_prior_order” column. The criteria for the flag should be as follows:
If the median of “days_since_prior_order” is higher than 20, then the customer should be labeled a “Non-frequent customer.”
If the median is higher than 10 and lower than or equal to 20, then the customer should be labeled a “Regular customer.”
If the median is lower than or equal to 10, then the customer should be labeled a “Frequent customer.”

In [14]:
# Create median_days_prior column by applying the transform() function to days_since_prior_order grouped by user_id
# (the string 'median' avoids the FutureWarning newer pandas versions raise for np.median)
ords_prods_merge['median_days_prior'] = ords_prods_merge.groupby(['user_id'])['days_since_prior_order'].transform('median')

In [15]:
# Create frequency flag - Non-frequent customer
ords_prods_merge.loc[ords_prods_merge['median_days_prior'] > 20, 'frequency_flag'] = 'Non-frequent customer'

In [16]:
# Create frequency flag - Regular customer
ords_prods_merge.loc[(ords_prods_merge['median_days_prior'] > 10) & (ords_prods_merge['median_days_prior'] <= 20), 'frequency_flag'] = 'Regular customer'

In [17]:
# Create frequency flag - Frequent customer
ords_prods_merge.loc[ords_prods_merge['median_days_prior'] <= 10, 'frequency_flag'] = 'Frequent customer'

In [18]:
# Check frequency_flag frequencies
ords_prods_merge['frequency_flag'].value_counts(dropna = False)

Frequent customer        21559853
Regular customer          7208564
Non-frequent customer     3636437
NaN                             5
Name: frequency_flag, dtype: int64
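The output above includes five rows with no flag. A likely explanation, not verified here, is a user whose days_since_prior_order values are all missing, so the median is NaN and none of the three conditions matches. The sketch below uses hypothetical toy data to reproduce that situation and shows `pd.cut` as one alternative way to apply the same three thresholds in a single step.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); user 4 has only a first order, so no prior-order gap
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3, 4],
    'days_since_prior_order': [np.nan, 8.0, 10.0, np.nan, 15.0, np.nan, 30.0, np.nan],
})

# Median per user, broadcast back to every row of that user (NaN values are skipped)
toy['median_days_prior'] = (
    toy.groupby('user_id')['days_since_prior_order'].transform('median')
)

# Bins are right-inclusive: (-inf, 10], (10, 20], (20, inf)
toy['frequency_flag'] = pd.cut(
    toy['median_days_prior'],
    bins=[-np.inf, 10, 20, np.inf],
    labels=['Frequent customer', 'Regular customer', 'Non-frequent customer'],
)

# One flag per user
flags = toy.drop_duplicates('user_id').set_index('user_id')['frequency_flag']
print(flags)
```

A user with an all-NaN median keeps a NaN flag in both approaches, so such rows still need an explicit decision (for example, dropping them or assigning a separate label).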

In [19]:
# Check the new median_days_prior and frequency_flag columns against days_since_prior_order
ords_prods_merge[['days_since_prior_order', 'median_days_prior', 'frequency_flag']].head(30)

Unnamed: 0,days_since_prior_order,median_days_prior,frequency_flag
0,,20.5,Non-frequent customer
1,15.0,20.5,Non-frequent customer
2,21.0,20.5,Non-frequent customer
3,29.0,20.5,Non-frequent customer
4,28.0,20.5,Non-frequent customer
5,19.0,20.5,Non-frequent customer
6,20.0,20.5,Non-frequent customer
7,14.0,20.5,Non-frequent customer
8,0.0,20.5,Non-frequent customer
9,30.0,20.5,Non-frequent customer


In [20]:
# Check data types of all columns
ords_prods_merge.dtypes

order_id                     int32
user_id                      int32
order_number                  int8
orders_day_of_week            int8
order_hour_of_day             int8
days_since_prior_order     float16
new_customer                  bool
product_id                   int32
add_to_cart_order            int16
reordered                     int8
product_name                object
aisle_id                     int16
department_id                 int8
prices                     float16
_merge                    category
price_range_loc             object
busiest_day                 object
busiest_days                object
busiest_period_of_day       object
max_order                    int16
loyalty_flag                object
mean_price                 float16
spending_flag               object
median_days_prior          float16
frequency_flag              object
dtype: object

### 06 Export Data

In [21]:
# Export ords_prods_merge as Four8Final.pkl in pickle format
ords_prods_merge.to_pickle(os.path.join(path, '02 Data','Prepared Data', 'Four8Final.pkl'))
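After exporting, it can be worth confirming that the pickle file reads back unchanged. The following is a minimal sketch of such a round-trip check on a hypothetical toy dataframe written to a temporary directory. The file name `toy_export.pkl` is an assumption for illustration and is not part of the project.

```python
import os
import tempfile

import pandas as pd

# Toy dataframe (hypothetical values, for illustration only)
toy = pd.DataFrame({
    'user_id': [1, 2],
    'spending_flag': ['Low spender', 'High spender'],
})

# Write to a temporary folder and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'toy_export.pkl')
    toy.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

# Compare shape and contents of the reloaded dataframe with the original
same_shape = reloaded.shape == toy.shape
pd.testing.assert_frame_equal(reloaded, toy)
print(same_shape)
```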