### **INDEX**
1. Setup Notebook
2. Exercise Directions

 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

### **1. SETUP NOTEBOOK**

In [5]:
# import libraries
import pandas as pd
import numpy as np
import os
from pathlib import Path

In [6]:
# get the current working directory (os.getcwd() returns a string)
path = os.getcwd()

# update the path to parent folder
path = os.path.abspath(os.path.join(path, os.pardir))

In [7]:
# import orders products merged
df_ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'ords_prods_merge.pkl'))

 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

### **2. EXERCISE DIRECTIONS**

**Dir 1: In this Exercise, you learned how to find the aggregated mean of the “order_number” column grouped by “department_id” for a subset of your dataframe. Now, repeat this process for the entire dataframe.**

In [11]:
df_ords_prods_merge.groupby('department_id')['order_number'].mean()

department_id
1     15.457838
2     17.277920
3     17.170395
4     17.811403
5     15.215751
6     16.439806
7     17.225802
8     15.340650
9     15.895474
10    20.197148
11    16.170638
12    15.887671
13    16.583536
14    16.773669
15    16.165037
16    17.665606
17    15.694469
18    19.310397
19    17.177343
20    16.473447
21    22.902379
Name: order_number, dtype: float64

**Dir 3: Analyze the result. How do the results for the entire dataframe differ from those of the subset? Include your comments in a markdown cell below the executed code.**

Comparing the mean of order_number per department between the subset and the whole dataframe shows slightly higher means in the whole dataframe. A difference is expected because the whole dataframe contains far more records, but its direction could not have been predicted in advance: the additional records could just as well have lowered the mean.
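To illustrate why the direction of the change cannot be predicted, here is a minimal sketch on made-up data. The toy values and the names `df_full` and `df_subset` are assumptions for illustration, not the Instacart data. Depending on which rows the subset leaves out, the full-dataframe mean for a department can end up above or below the subset mean.

```python
import pandas as pd

# hypothetical toy data standing in for the merged orders/products dataframe
df_full = pd.DataFrame({
    'department_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'order_number':  [2, 4, 30, 40, 20, 30, 1, 3],
})

# use the first two rows of each department as the "subset"
df_subset = df_full.groupby('department_id').head(2)

subset_means = df_subset.groupby('department_id')['order_number'].mean()
full_means = df_full.groupby('department_id')['order_number'].mean()

# positive: the full dataframe has the higher mean; negative: the lower mean
difference = full_means - subset_means
print(difference)
```

In this toy case the mean rises for department 1 and falls for department 2, even though both departments gained the same number of rows.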

**Dir 4: Follow the instructions in the Exercise for creating a loyalty flag for existing customers using the transform() and loc() functions.**

In [15]:
df_ords_prods_merge['max_order'] = df_ords_prods_merge.groupby(['user_id'])['order_number'].transform('max')

In [16]:
df_ords_prods_merge.loc[df_ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [17]:
df_ords_prods_merge.loc[(df_ords_prods_merge['max_order'] <= 40) & (df_ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [18]:
df_ords_prods_merge.loc[df_ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [19]:
# count rows per loyalty flag (dropna=False also shows any unflagged rows)
df_ords_prods_merge['loyalty_flag'].value_counts(dropna=False)
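As a cross-check on the three `loc` assignments, the same thresholds can be written as bins. The sketch below uses made-up data and `pd.cut`, which is an alternative to the Exercise's `loc` approach rather than the method it prescribes. The toy frame `df` and the name `flags_per_user` are assumptions. It also counts each customer once, since counting on the merged dataframe counts product rows rather than customers.

```python
import numpy as np
import pandas as pd

# hypothetical toy orders: several rows per user, as in the merged dataframe
df = pd.DataFrame({
    'user_id':      [1, 1, 2, 2, 3, 3],
    'order_number': [1, 10, 5, 40, 41, 12],
})

# highest order number per user, repeated on every row of that user
df['max_order'] = df.groupby('user_id')['order_number'].transform('max')

# right-closed bins: (0, 10] new, (10, 40] regular, (40, inf) loyal
df['loyalty_flag'] = pd.cut(
    df['max_order'],
    bins=[0, 10, 40, np.inf],
    labels=['New customer', 'Regular customer', 'Loyal customer'],
)

# keep one row per user so that each customer is counted once
flags_per_user = df.drop_duplicates('user_id').set_index('user_id')['loyalty_flag']
print(flags_per_user.value_counts())
```

The boundary users (maximum order of exactly 10 and exactly 40) fall into the lower category, matching the `<= 10` and `<= 40` conditions used above.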

**Dir 5: The marketing team at Instacart wants to know whether there’s a difference between the spending habits of the three types of customers you identified. Use the loyalty flag you created and check the basic statistics of the product prices for each loyalty category (Loyal Customer, Regular Customer, and New Customer). What you’re trying to determine is whether the prices of products purchased by loyal customers differ from those purchased by regular or new customers.**


In [21]:
df_ords_prods_merge.groupby('loyalty_flag').agg({'prices': ['mean', 'min', 'max']})

loyalty_flag,prices (mean),prices (min),prices (max)
Loyal customer,10.386336,1.0,99999.0
New customer,13.29467,1.0,99999.0
Regular customer,12.495717,1.0,99999.0


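Loyal customers buy lower-priced products on average (mean price 10.39, versus 12.50 for regular customers and 13.29 for new customers), even though by definition they place more orders. The maximum of 99999.0 in every category points to outlier prices that inflate all three means, so these averages should be read with caution.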
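The effect of such an outlier on a group mean can be sketched with made-up prices. The toy frame `df` below is an assumption, and adding `'median'` to the aggregation is a suggestion that goes beyond what the Exercise asks for. A single extreme value moves the mean a long way while leaving the median untouched.

```python
import pandas as pd

# hypothetical toy prices; 99999.0 mimics the outlier seen in the max column
df = pd.DataFrame({
    'loyalty_flag': ['Loyal customer'] * 3 + ['New customer'] * 3,
    'prices':       [5.0, 7.0, 9.0, 5.0, 7.0, 99999.0],
})

# same groupby as above, with the median added next to the mean
stats = df.groupby('loyalty_flag')['prices'].agg(['mean', 'median', 'max'])
print(stats)
```

Both toy groups have a median of 7.0, but the group containing the outlier has a mean in the tens of thousands.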

**Dir 6: The team now wants to target different types of spenders in their marketing campaigns. This can be achieved by looking at the prices of the items people are buying. Create a spending flag for each user based on the average price across all their orders using the following criteria:
If the mean of the prices of products purchased by a user is lower than 10, then flag them as a “Low spender.”
If the mean of the prices of products purchased by a user is higher than or equal to 10, then flag them as a “High spender.”**

In [24]:
# create new column and aggregate data (the string 'mean' avoids the FutureWarning raised by passing np.mean)
df_ords_prods_merge['average_spending'] = df_ords_prods_merge.groupby(['user_id'])['prices'].transform('mean')


In [25]:
# create and set flag
df_ords_prods_merge.loc[df_ords_prods_merge['average_spending'] < 10, 'spending_type'] = 'Low spender'

In [26]:
df_ords_prods_merge.loc[df_ords_prods_merge['average_spending'] >= 10, 'spending_type'] = 'High spender'

In [27]:
# confirm new column
df_ords_prods_merge.head()

Unnamed: 0,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_last_order,First_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,busy_days,busiest_period_of_day,max_order,loyalty_flag,average_spending,spending_type
0,1,1,2,8,,True,196,1,0,Soda,77,7,9.0,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender
1,1,1,2,8,,True,14084,2,0,Organic Unsweetened Vanilla Almond Milk,91,16,12.5,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender
2,1,1,2,8,,True,12427,3,0,Original Beef Jerky,23,19,4.4,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender
3,1,1,2,8,,True,26088,4,0,Aged White Cheddar Popcorn,23,19,4.7,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender
4,1,1,2,8,,True,26405,5,0,XL Pick-A-Size Paper Towel Rolls,54,17,1.0,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender
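Because the spending flag has only two outcomes, it can also be set in a single step. The following is a minimal sketch on made-up data using `np.where`, as an alternative to the two `loc` assignments; the toy frame `df` is an assumption. One difference to keep in mind: a user whose average is NaN would land in "High spender" here, whereas the `loc` version leaves that user unflagged.

```python
import numpy as np
import pandas as pd

# hypothetical toy purchases for two users
df = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'prices':  [4.0, 8.0, 10.0, 10.0],
})

# mean price per user, repeated on every row of that user
df['average_spending'] = df.groupby('user_id')['prices'].transform('mean')

# below 10 is a low spender; 10 or above is a high spender
df['spending_type'] = np.where(
    df['average_spending'] < 10, 'Low spender', 'High spender'
)
print(df)
```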


**Dir 7: In order to send relevant notifications to users within the app, the Instacart team wants you to determine frequent versus non-frequent customers. Create an order frequency flag that marks the regularity of a user’s ordering behavior according to the median in the “days_since_prior_order” column. The criteria for the flag should be as follows:
If the median of “days_since_prior_order” is higher than 20, then the customer should be labeled a “Non-frequent customer.”
If the median is higher than 10 and lower than or equal to 20, then the customer should be labeled a “Regular customer.”
If the median is lower than or equal to 10, then the customer should be labeled a “Frequent customer.”**

In [29]:
# create new column and aggregate data (the string 'median' avoids the FutureWarning raised by passing np.median)
df_ords_prods_merge['order_frequency'] = df_ords_prods_merge.groupby(['user_id'])['days_since_last_order'].transform('median')


In [30]:
# create and set flag
df_ords_prods_merge.loc[df_ords_prods_merge['order_frequency'] > 20, 'customer type'] = 'Non-frequent customer'

In [31]:
df_ords_prods_merge.loc[(df_ords_prods_merge['order_frequency'] > 10) & (df_ords_prods_merge['order_frequency'] <= 20), 'customer type'] = 'Regular customer'

In [32]:
df_ords_prods_merge.loc[df_ords_prods_merge['order_frequency'] <= 10, 'customer type'] = 'Frequent customer'

In [33]:
df_ords_prods_merge.head()

Unnamed: 0,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_last_order,First_order,product_id,add_to_cart_order,reordered,product_name,...,prices,_merge,busy_days,busiest_period_of_day,max_order,loyalty_flag,average_spending,spending_type,order_frequency,customer type
0,1,1,2,8,,True,196,1,0,Soda,...,9.0,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer
1,1,1,2,8,,True,14084,2,0,Organic Unsweetened Vanilla Almond Milk,...,12.5,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer
2,1,1,2,8,,True,12427,3,0,Original Beef Jerky,...,4.4,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer
3,1,1,2,8,,True,26088,4,0,Aged White Cheddar Popcorn,...,4.7,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer
4,1,1,2,8,,True,26405,5,0,XL Pick-A-Size Paper Towel Rolls,...,1.0,both,Regularly busy day,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer
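The boundary values in the directions (a median of exactly 10 or exactly 20) are easy to get wrong, so it may help to test them on made-up data. The sketch below uses `np.select` as an alternative to the `loc` assignments. The toy frame `df`, the column name `frequency_flag`, and the `'Unknown'` default for users without a median are all assumptions.

```python
import numpy as np
import pandas as pd

# hypothetical toy data: user medians of 10, 20 and 20.5 days
df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3],
    'days_since_last_order': [np.nan, 10.0, 20.0, 20.0, 20.0, 21.0],
})

# median per user (NaN values are skipped), repeated on every row
df['order_frequency'] = (
    df.groupby('user_id')['days_since_last_order'].transform('median')
)

# conditions are checked in order; the first one that is True wins
conditions = [
    df['order_frequency'] > 20,
    df['order_frequency'] > 10,
    df['order_frequency'] <= 10,
]
choices = ['Non-frequent customer', 'Regular customer', 'Frequent customer']

# 'Unknown' is a hypothetical label for users whose median is NaN
df['frequency_flag'] = np.select(conditions, choices, default='Unknown')
print(df.drop_duplicates('user_id'))
```

A median of exactly 10 is labeled "Frequent customer" and a median of exactly 20 is labeled "Regular customer", as the criteria require.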


In [34]:
# treat prices above 100 as outliers and mark them as missing
df_ords_prods_merge.loc[df_ords_prods_merge['prices'] > 100, 'prices'] = np.nan
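Note that this cell runs after `average_spending` was computed, so that column still reflects the outlier prices. The sketch below, on made-up data, shows how much a per-user mean can change once the outliers are set to NaN. The toy frame `df` and the names `mean_before` and `mean_after` are assumptions, and whether the flags should be recomputed is a judgment call left to the analyst.

```python
import numpy as np
import pandas as pd

# hypothetical toy purchases for one user, including one outlier price
df = pd.DataFrame({
    'user_id': [1, 1, 1],
    'prices':  [5.0, 7.0, 99999.0],
})

# per-user mean computed before the outlier is removed
mean_before = df.groupby('user_id')['prices'].transform('mean')

# mark prices above 100 as missing, as in the cell above
df.loc[df['prices'] > 100, 'prices'] = np.nan

# per-user mean computed afterwards; NaN values are skipped by default
mean_after = df.groupby('user_id')['prices'].transform('mean')

print(mean_before.iloc[0], mean_after.iloc[0])
```

In this toy case the user would be flagged "High spender" on the first mean and "Low spender" on the second.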

In [35]:
# export dataframe as pickle
df_ords_prods_merge.to_pickle(os.path.join(path, '02 Data','Prepared Data', 'ords_prods_merge.pkl'))