# 4.8: Grouping Data & Aggregating Variables

### This script contains the following points:
#### 1. Check the dimensions of the imported dataframe
#### 2. Exercise Q2. The aggregated mean of the "order_number" column grouped by "department_id" for the entire dataframe
#### 3. Exercise Q3: How the entire dataframe results differ from the subset results
#### 4. Exercise Q4a: Loyalty flag for existing customers using the transform() function
#### 5. Exercise Q4b: Loyalty flag for existing customers using the loc() function
#### 6. Exercise Q5: User's purchased product price basic statistics
#### 7. Exercise Q6a: Spending flag for existing customers using the transform() function
#### 8. Exercise Q6b: Spending flag for existing customers using the loc() function
#### 9. Exercise Q7a: Frequency flag for existing customers using the transform() function
#### 10. Exercise Q7b: Frequency flag for existing customers using the loc() function
#### 11. Exercise Q8-9: Export dataframe as pkl file

In [54]:
# Import libraries

import pandas as pd
import numpy as np
import os

In [55]:
# Path to main project folder

path = r'C:\Users\Mark\_Instacart Basket Analysis'

In [56]:
# Retrieve the orders_products_merged.pkl file

orders_products_merged = pd.read_pickle(os.path.join(path, '03 Scripts', 'Prepared Data', 'orders_products_merged_v2.pkl'))

#### 1. Check the dimensions of the imported dataframe

In [57]:
# Ensure nothing looks out of place with the imported dataframes

orders_products_merged.head()

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,_merge,Unnamed: 0_y,product_name,aisle_id,department_id,prices,match,price_range_loc,busiest day,busiest_period_of_day
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,both,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders
1,2,473747,1,prior,3,3,12,21.0,196,1,...,both,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,both,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders
3,4,431534,1,prior,5,4,15,28.0,196,1,...,both,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,both,195,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Average orders


In [58]:
# See if the data set is large, small, wide, or long

orders_products_merged.shape

(30328763, 21)

#### 2. Exercise Q2. The aggregated mean of the "order_number" column grouped by "department_id" for the entire dataframe

In [59]:
# Finding the mean with agg()
# Find the aggregated mean of the "order_number" column grouped by "department_id"

orders_products_merged.groupby('department_id').agg({'order_number': ['mean']})

Unnamed: 0_level_0,order_number
Unnamed: 0_level_1,mean
department_id,Unnamed: 1_level_2
1,16.559358
2,18.413176
3,18.2796
4,18.91589
5,16.497751
6,17.60939
7,18.303975
8,16.383301
9,17.022963
10,21.227447


In [60]:
# Using the mean() function with Square Brackets

orders_products_merged.groupby('department_id')['order_number'].mean()

department_id
1     16.559358
2     18.413176
3     18.279600
4     18.915890
5     16.497751
6     17.609390
7     18.303975
8     16.383301
9     17.022963
10    21.227447
11    17.311768
12    16.953613
13    17.727470
14    17.898097
15    17.290123
16    18.757796
17    16.808683
18    20.336389
19    18.258620
20    17.578959
21    23.644477
Name: order_number, dtype: float64

In [61]:
# Using the mean() function with Dot Notation

orders_products_merged.groupby('department_id').order_number.mean()

department_id
1     16.559358
2     18.413176
3     18.279600
4     18.915890
5     16.497751
6     17.609390
7     18.303975
8     16.383301
9     17.022963
10    21.227447
11    17.311768
12    16.953613
13    17.727470
14    17.898097
15    17.290123
16    18.757796
17    16.808683
18    20.336389
19    18.258620
20    17.578959
21    23.644477
Name: order_number, dtype: float64

In [62]:
# Multiple statistics at the same time

orders_products_merged.groupby('department_id').agg({'order_number': ['mean', 'min', 'max']})

Unnamed: 0_level_0,order_number,order_number,order_number
Unnamed: 0_level_1,mean,min,max
department_id,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
1,16.559358,2,99
2,18.413176,2,99
3,18.2796,2,99
4,18.91589,2,99
5,16.497751,2,99
6,17.60939,2,99
7,18.303975,2,99
8,16.383301,2,99
9,17.022963,2,99
10,21.227447,2,99


#### Exercise Q3: How the entire dataframe results differ from the subset results: The first thing I noticed was that the number of department_ids in the subset were fewer than that of the main dataframe. This of course means that the statistics like average would differ between the two sets. For example, in the subset, the department_id 4 has a mean of 19.77, while for the same department_id 4 the mean is 18.92 in the main dataframe. The minimum and maximum in both are mostly the same except for department_id 17 of the subset, which has a maximum of 98. It is clear that a subset's statistics can be different from the main dataframe.


#### 4. Exercise Q4a: Loyalty flag for existing customers using the transform() function

In [63]:
# Step 1 (Group): the orders_products_merged dataframe is grouped by the "user_id" column
# Step 2 (Transform): the transform() function is applied to the "order_number" column with the np.max argument
# Step 3 (Store in new column): the maximum order number for each user is stored in a new column called "max_order"

orders_products_merged['max_order'] = orders_products_merged.groupby(['user_id'])['order_number'].transform(np.max)

In [64]:
# Review dataframe

orders_products_merged.head(100)

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,Unnamed: 0_y,product_name,aisle_id,department_id,prices,match,price_range_loc,busiest day,busiest_period_of_day,max_order
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
1,2,473747,1,prior,3,3,12,21.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
3,4,431534,1,prior,5,4,15,28.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Average orders,10
5,6,550135,1,prior,7,1,9,20.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10
6,7,3108588,1,prior,8,1,14,14.0,196,2,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10
7,8,2295261,1,prior,9,1,16,0.0,196,4,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10
8,9,2550362,1,prior,10,4,8,30.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
9,144,2968173,15,prior,15,1,9,7.0,196,2,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,22


In [65]:
# To see more rows set the pandas display option to "None"

pd.options.display.max_rows = None

In [66]:
# Review dataframe again

orders_products_merged.head(100)

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,Unnamed: 0_y,product_name,aisle_id,department_id,prices,match,price_range_loc,busiest day,busiest_period_of_day,max_order
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
1,2,473747,1,prior,3,3,12,21.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
3,4,431534,1,prior,5,4,15,28.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Average orders,10
5,6,550135,1,prior,7,1,9,20.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10
6,7,3108588,1,prior,8,1,14,14.0,196,2,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10
7,8,2295261,1,prior,9,1,16,0.0,196,4,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10
8,9,2550362,1,prior,10,4,8,30.0,196,1,...,195,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10
9,144,2968173,15,prior,15,1,9,7.0,196,2,...,195,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,22


#### 5. Exercise Q4b: Loyalty flag for existing customers using the loc() function

In [67]:
# Create a flag that assigns a "loyalty" label to a user ID based on its corresponding max order value
# Max orders higher than 40 - flag as a "Loyal customer"

orders_products_merged.loc[orders_products_merged['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [68]:
# Max orders, lower than or equal to 40, and higher than 10 - flag as a "Regular customer"

orders_products_merged.loc[(orders_products_merged['max_order'] <= 40) & (orders_products_merged['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [69]:
# Max orders lower than or equal to 10 - flag as a "New customer"

orders_products_merged.loc[orders_products_merged['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [70]:
# Review the frequency of the "loyalty_flag" column

orders_products_merged['loyalty_flag'].value_counts(dropna = False)

Regular customer    15081691
Loyal customer      10095381
New customer         5151691
Name: loyalty_flag, dtype: int64

In [71]:
# Review dataframe

orders_products_merged.head(100)

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,product_name,aisle_id,department_id,prices,match,price_range_loc,busiest day,busiest_period_of_day,max_order,loyalty_flag
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer
1,2,473747,1,prior,3,3,12,21.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer
3,4,431534,1,prior,5,4,15,28.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Average orders,10,New customer
5,6,550135,1,prior,7,1,9,20.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer
6,7,3108588,1,prior,8,1,14,14.0,196,2,...,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer
7,8,2295261,1,prior,9,1,16,0.0,196,4,...,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer
8,9,2550362,1,prior,10,4,8,30.0,196,1,...,Soda,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer
9,144,2968173,15,prior,15,1,9,7.0,196,2,...,Soda,77,7,9.0,both,Mid-range product,Busiest days,Average orders,22,Regular customer


In [72]:
# Review multiple columns of interest

orders_products_merged[['user_id', 'loyalty_flag', 'order_number']].head(100)

Unnamed: 0,user_id,loyalty_flag,order_number
0,1,New customer,2
1,1,New customer,3
2,1,New customer,4
3,1,New customer,5
4,1,New customer,6
5,1,New customer,7
6,1,New customer,8
7,1,New customer,9
8,1,New customer,10
9,15,Regular customer,15


#### 6. Exercise Q5: User's purchased product price basic statistics

In [73]:
# Multiple statistics at the same time

orders_products_merged.groupby('loyalty_flag').agg({'prices': ['mean', 'min', 'max']})

Unnamed: 0_level_0,prices,prices,prices
Unnamed: 0_level_1,mean,min,max
loyalty_flag,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
Loyal customer,10.402162,1.0,99999.0
New customer,13.396333,1.0,99999.0
Regular customer,12.546842,1.0,99999.0


#### 7. Exercise Q6a: Spending flag for existing customers using the transform() function

In [74]:
# Step 1 (Group): the orders_products_merged dataframe is grouped by the "user_id" column
# Step 2 (Transform): the transform() function is applied to the "prices" column with the np.mean argument
# Step 3 (Store in new column): the mean prices for each user is stored in a new column called "mean_prices"

orders_products_merged['mean_prices'] = orders_products_merged.groupby(['user_id'])['prices'].transform(np.mean)

In [75]:
# Review dataframe

orders_products_merged.head(100)

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,aisle_id,department_id,prices,match,price_range_loc,busiest day,busiest_period_of_day,max_order,loyalty_flag,mean_prices
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222
1,2,473747,1,prior,3,3,12,21.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222
3,4,431534,1,prior,5,4,15,28.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Average orders,10,New customer,6.372222
5,6,550135,1,prior,7,1,9,20.0,196,1,...,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer,6.372222
6,7,3108588,1,prior,8,1,14,14.0,196,2,...,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer,6.372222
7,8,2295261,1,prior,9,1,16,0.0,196,4,...,77,7,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer,6.372222
8,9,2550362,1,prior,10,4,8,30.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222
9,144,2968173,15,prior,15,1,9,7.0,196,2,...,77,7,9.0,both,Mid-range product,Busiest days,Average orders,22,Regular customer,3.994118


#### 8. Exercise Q6b: Spending flag for existing customers using the loc() function

In [76]:
# Create a spending flag for each user based on the mean (average) price across all their orders
# Mean price higher than or equal to 10 - flag as a "High spender"

orders_products_merged.loc[orders_products_merged['mean_prices'] >= 10, 'spend_flag'] = 'High spender'

In [77]:
# Mean price lower than 10 - flag as a "Low spender"

orders_products_merged.loc[orders_products_merged['mean_prices'] < 10, 'spend_flag'] = 'Low spender'

In [78]:
# Review the frequency of the "spend_flag" column

orders_products_merged['spend_flag'].value_counts(dropna = False)

Low spender     29730531
High spender      598232
Name: spend_flag, dtype: int64

In [79]:
# Review dataframe

orders_products_merged.head()

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,department_id,prices,match,price_range_loc,busiest day,busiest_period_of_day,max_order,loyalty_flag,mean_prices,spend_flag
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender
1,2,473747,1,prior,3,3,12,21.0,196,1,...,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender
3,4,431534,1,prior,5,4,15,28.0,196,1,...,7,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,7,9.0,both,Mid-range product,Regularly busy,Average orders,10,New customer,6.372222,Low spender


In [80]:
# # Review multiple columns of interest

orders_products_merged[['user_id', 'spend_flag', 'mean_prices']].head(1000)

Unnamed: 0,user_id,spend_flag,mean_prices
0,1,Low spender,6.372222
1,1,Low spender,6.372222
2,1,Low spender,6.372222
3,1,Low spender,6.372222
4,1,Low spender,6.372222
5,1,Low spender,6.372222
6,1,Low spender,6.372222
7,1,Low spender,6.372222
8,1,Low spender,6.372222
9,15,Low spender,3.994118


#### 9. Exercise Q7a: Frequency flag for existing customers using the transform() function

In [81]:
# Step 1 (Group): the orders_products_merged dataframe is grouped by the "user_id" column
# Step 2 (Transform): the transform() function is applied to the "days_since_prior_order" column with the np.mean argument
# Step 3 (Store in new column): the mean days_since_prior_order for each user is stored in a new column called "mean_days_since_prior"

orders_products_merged['mean_days_since_prior'] = orders_products_merged.groupby(['user_id'])['days_since_prior_order'].transform(np.mean)

In [82]:
# Review dataframe

orders_products_merged.head(100)

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,prices,match,price_range_loc,busiest day,busiest_period_of_day,max_order,loyalty_flag,mean_prices,spend_flag,mean_days_since_prior
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259
1,2,473747,1,prior,3,3,12,21.0,196,1,...,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259
3,4,431534,1,prior,5,4,15,28.0,196,1,...,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,9.0,both,Mid-range product,Regularly busy,Average orders,10,New customer,6.372222,Low spender,20.259259
5,6,550135,1,prior,7,1,9,20.0,196,1,...,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer,6.372222,Low spender,20.259259
6,7,3108588,1,prior,8,1,14,14.0,196,2,...,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer,6.372222,Low spender,20.259259
7,8,2295261,1,prior,9,1,16,0.0,196,4,...,9.0,both,Mid-range product,Busiest days,Average orders,10,New customer,6.372222,Low spender,20.259259
8,9,2550362,1,prior,10,4,8,30.0,196,1,...,9.0,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259
9,144,2968173,15,prior,15,1,9,7.0,196,2,...,9.0,both,Mid-range product,Busiest days,Average orders,22,Regular customer,3.994118,Low spender,11.029412


#### 10. Exercise Q7b: Frequency flag for existing customers using the loc() function

In [83]:
# Create a frequency flag for each user based on the mean (average) days_since_prior_order across all their orders
# Mean days_since_prior_order higher than 20 - flag as a "Non-frequent customer"

orders_products_merged.loc[orders_products_merged['days_since_prior_order'] > 20, 'frequency_flag'] = 'Non-frequent customer'

In [84]:
# Mean price lower than 10 - flag as a "Frequent customer"

orders_products_merged.loc[orders_products_merged['days_since_prior_order'] <= 10, 'frequency_flag'] = 'Frequent customer'

In [85]:
# Mean days_since_prior_order higher than 10, and lower than or equal to 20 - flag as a "Regular customer"

orders_products_merged.loc[(orders_products_merged['days_since_prior_order'] > 10) & (orders_products_merged['days_since_prior_order'] <= 20), 'frequency_flag'] = 'Regular customer'

In [86]:
# Review the frequency of the "frequency_flag" column

orders_products_merged['frequency_flag'].value_counts(dropna = False)

Frequent customer        19042466
Regular customer          6055803
Non-frequent customer     5230494
Name: frequency_flag, dtype: int64

In [87]:
# Review dataframe

orders_products_merged.head()

Unnamed: 0,Unnamed: 0_x,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,...,match,price_range_loc,busiest day,busiest_period_of_day,max_order,loyalty_flag,mean_prices,spend_flag,mean_days_since_prior,frequency_flag
0,1,2398795,1,prior,2,3,7,15.0,196,1,...,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259,Regular customer
1,2,473747,1,prior,3,3,12,21.0,196,1,...,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259,Non-frequent customer
2,3,2254736,1,prior,4,4,7,29.0,196,1,...,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259,Non-frequent customer
3,4,431534,1,prior,5,4,15,28.0,196,1,...,both,Mid-range product,Least busy,Average orders,10,New customer,6.372222,Low spender,20.259259,Non-frequent customer
4,5,3367565,1,prior,6,2,7,19.0,196,1,...,both,Mid-range product,Regularly busy,Average orders,10,New customer,6.372222,Low spender,20.259259,Regular customer


In [88]:
# Review multiple columns of interest

orders_products_merged[['user_id', 'frequency_flag', 'mean_days_since_prior']].head(1000)

Unnamed: 0,user_id,frequency_flag,mean_days_since_prior
0,1,Regular customer,20.259259
1,1,Non-frequent customer,20.259259
2,1,Non-frequent customer,20.259259
3,1,Non-frequent customer,20.259259
4,1,Regular customer,20.259259
5,1,Regular customer,20.259259
6,1,Regular customer,20.259259
7,1,Frequent customer,20.259259
8,1,Non-frequent customer,20.259259
9,15,Frequent customer,11.029412


#### 11. Exercise Q8-9: Export dataframe as pkl file

In [89]:
# Export data to pkl

orders_products_merged.to_pickle(os.path.join(path, '03 Scripts','Prepared Data', 'orders_products_merged_v3.pkl'))

In [90]:
# Export merged data to csv

orders_products_merged.to_csv(os.path.join(path, '03 Scripts','Prepared Data', 'orders_products_merged_v3.csv'))