# 4.8 Grouping Data & Aggregating Variables

### This script contains the following points:

## Part One: Tasks from the Exercise
- **01. Importing Libraries**
- **02. Importing Data**
- **03. Create Data Subset - df**
- **04. Grouping Data**
- **05. Performing Aggregations**
- **06. Create a Loyalty Flag**
- **07. Deriving Columns with loc()**

## Part Two: Exercise Task
- **08. Calculate Mean for Entire Dataframe**
- **09. Create Loyalty Flag for Existing Customers**
- **10. Analyzing Customer Spending by Loyalty Level**
- **11. Customer Spending Classification for Targeted Marketing Campaigns**
- **12. Creating Order Frequency Flags for Targeted User Engagement**
- **13. Export Dataframe as a Pickle File**

# Part One: Tasks from the Exercise

## 01. Importing Libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os

## 02. Importing Data

In [2]:
# Turn Path into a String & Assign Python Shortcut
path = '/Users/mgottlieb/Documents/Achievement 4'

In [3]:
# Import the orders_products_merg_busiest_times_and_days.pkl dataset (the export from last task)
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'orders_products_merg_busiest_times_and_days.pkl'))

## 03. Create Data Subset - df

In [4]:
# Create a subset of the dataframe
df = ords_prods_merge[:1000000]

In [5]:
# Check the shape of the subset
df.shape

(1000000, 19)

In [6]:
# Check the first rows of the subset
df.head()

Unnamed: 0,order_id,user_id,evaluation_status,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,days_since_prior_order_missing,product_id,add_to_cart_order,reordered,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_days,busiest_period_of_day
0,2539329,1,prior,1,2,8,7.0,1,196,1,0,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders
1,2398795,1,prior,2,3,7,15.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
2,473747,1,prior,3,3,12,21.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
3,2254736,1,prior,4,4,7,29.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
4,431534,1,prior,5,4,15,28.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders


## 04. Grouping Data

In [7]:
# Group the subset by product name (returns a GroupBy object; nothing is aggregated yet)
df.groupby('product_name')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x130df7ad0>

## 05. Performing Aggregations

In [8]:
# Split subset of data into groups based on "department_id" and apply agg function
df.groupby('department_id').agg({'order_number': ['mean']})

,order_number
department_id,mean
4,18.82578
7,17.472355
13,17.993423
14,19.246334
16,19.463012
17,11.294069
19,19.305237
20,17.599636


In [9]:
# Calculate the mean
df.groupby('department_id')['order_number'].mean()

department_id
4     18.825780
7     17.472355
13    17.993423
14    19.246334
16    19.463012
17    11.294069
19    19.305237
20    17.599636
Name: order_number, dtype: float64
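
The two cells above return the same means in different containers. As a minimal sketch on made-up toy data (the column names mirror the notebook, the values are invented for illustration), the snippet below shows that the GroupBy object itself holds no results, that the dictionary form of `agg()` gives a DataFrame with two-level column labels, and that selecting the column before calling `mean()` gives a plain Series.

```python
import pandas as pd

# Toy data: column names mirror the notebook, values are made up for illustration
toy = pd.DataFrame({
    'department_id': [4, 4, 7, 7, 7],
    'order_number': [1, 3, 2, 4, 6],
})

# Lazy object: it records the grouping but has not computed any statistics yet
grouped = toy.groupby('department_id')

# Dict-style agg returns a DataFrame whose columns have two levels
as_frame = grouped.agg({'order_number': ['mean']})

# Selecting the column first and calling mean() returns a Series
as_series = grouped['order_number'].mean()

print(type(grouped).__name__)
print(as_frame.columns.tolist())
print(as_series)
```

Which form is more convenient probably depends on the next step: the Series is easier to index directly, while the DataFrame form extends naturally to several statistics.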

In [10]:
df.groupby('department_id').agg({'order_number': ['mean', 'min', 'max']})

,order_number,order_number,order_number
department_id,mean,min,max
4,18.82578,1,99
7,17.472355,1,99
13,17.993423,1,99
14,19.246334,1,99
16,19.463012,1,99
17,11.294069,1,98
19,19.305237,1,99
20,17.599636,1,99
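
If the two-level column header in the output above is inconvenient, one possible alternative is pandas' named aggregation, which produces flat column names. This is only a sketch on toy data; the output names `order_mean`, `order_min`, and `order_max` are hypothetical choices and not part of the exercise.

```python
import pandas as pd

# Toy data: values are made up for illustration
toy = pd.DataFrame({
    'department_id': [4, 4, 7, 7, 7],
    'order_number': [1, 99, 2, 4, 6],
})

# Named aggregation: new_column_name=(source_column, function)
summary = toy.groupby('department_id').agg(
    order_mean=('order_number', 'mean'),
    order_min=('order_number', 'min'),
    order_max=('order_number', 'max'),
)

print(summary)
```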


## 06. Create a Loyalty Flag

In [11]:
# Split the data (user_id), transform function (generate max orders per user), create a new column (max_order) for results
ords_prods_merge['max_order'] = ords_prods_merge.groupby(['user_id'])['order_number'].transform(np.max)



In [12]:
# Use the string 'max' instead of np.max: recent pandas versions raise a FutureWarning when a NumPy callable is passed to transform()
ords_prods_merge['max_order'] = ords_prods_merge.groupby('user_id')['order_number'].transform('max')
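
To make the role of `transform()` concrete, here is a small sketch on invented toy data. It contrasts a regular aggregation, which returns one value per user, with `transform('max')`, which broadcasts that value back to every row so it can be stored as a new column.

```python
import pandas as pd

# Toy data: two users with three and two orders (values are made up)
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 15, 15],
    'order_number': [1, 2, 3, 1, 2],
})

# Aggregation: one row per user
per_user = toy.groupby('user_id')['order_number'].max()

# Transform: same length as the original dataframe, so it fits as a column
toy['max_order'] = toy.groupby('user_id')['order_number'].transform('max')

print(per_user)
print(toy)
```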

In [13]:
# Checking the output
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,evaluation_status,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,days_since_prior_order_missing,product_id,add_to_cart_order,reordered,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_days,busiest_period_of_day,max_order
0,2539329,1,prior,1,2,8,7.0,1,196,1,0,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10
1,2398795,1,prior,2,3,7,15.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
2,473747,1,prior,3,3,12,21.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
3,2254736,1,prior,4,4,7,29.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
4,431534,1,prior,5,4,15,28.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
5,3367565,1,prior,6,2,7,19.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10
6,550135,1,prior,7,1,9,20.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10
7,3108588,1,prior,8,1,14,14.0,0,196,2,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10
8,2295261,1,prior,9,1,16,0.0,0,196,4,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10
9,2550362,1,prior,10,4,8,30.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10


In [14]:
# Display the first 15 rows with only 'user_id' and 'max_order' columns
print(ords_prods_merge.loc[:, ['user_id', 'max_order']].head(15))

    user_id  max_order
0         1         10
1         1         10
2         1         10
3         1         10
4         1         10
5         1         10
6         1         10
7         1         10
8         1         10
9         1         10
10       15         22
11       15         22
12       15         22
13       15         22
14       15         22


## 07. Deriving Columns with loc()

In [15]:
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [16]:
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [17]:
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [18]:
# Print the frequency of the new column
ords_prods_merge['loyalty_flag'].value_counts(dropna = False)

loyalty_flag
Regular customer    15876776
Loyal customer      10284093
New customer         6243990
Name: count, dtype: int64
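
The three `loc` statements above could, as one possible alternative, be collapsed into a single `pd.cut()` call. The sketch below uses toy values chosen to sit on the boundaries; with the default `right=True`, the bins are (-inf, 10], (10, 40], and (40, inf), which matches the thresholds used above. Note that the result is a categorical column rather than plain strings.

```python
import numpy as np
import pandas as pd

# Toy max_order values, including the boundary cases 10 and 40
toy = pd.DataFrame({'max_order': [3, 10, 11, 40, 41, 99]})

# Right-closed bins: <= 10 is new, 11 to 40 is regular, > 40 is loyal
toy['loyalty_flag'] = pd.cut(
    toy['max_order'],
    bins=[-np.inf, 10, 40, np.inf],
    labels=['New customer', 'Regular customer', 'Loyal customer'],
)

print(toy)
```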

In [19]:
ords_prods_merge[['user_id', 'loyalty_flag', 'order_number']].head(60)

Unnamed: 0,user_id,loyalty_flag,order_number
0,1,New customer,1
1,1,New customer,2
2,1,New customer,3
3,1,New customer,4
4,1,New customer,5
5,1,New customer,6
6,1,New customer,7
7,1,New customer,8
8,1,New customer,9
9,1,New customer,10


# Part Two: Exercise Task

## 08. Calculate Mean for Entire Dataframe

In [20]:
# Split the ENTIRE dataframe into groups based on "department_id" and apply agg function
ords_prods_merge.groupby('department_id').agg({'order_number': ['mean']})

,order_number
department_id,mean
1,15.457838
2,17.27792
3,17.170395
4,17.811403
5,15.215751
6,16.439806
7,17.225802
8,15.34065
9,15.895474
10,20.197148


For the means of the whole dataframe, the first thing to note is that the output includes **all** department IDs, not just 4, 7, 13, 14, 16, 17, 19, and 20.\
\
Comparing the department IDs that appear in both the subset and the entire dataset, most of the means are slightly higher in the subset.
#### Departments whose mean is slightly higher in the subset than in the entire dataframe:
* Department_id: 4
* Department_id: 7
* Department_id: 13
* Department_id: 14
* Department_id: 16
* Department_id: 19
* Department_id: 20

***Note: Department_id: 17 is the only department whose mean is slightly lower in the subset.***\
Since most departments in the subset show a higher mean order number than in the full data, the subset does not accurately reflect the entire dataset.\
\
For the **entire dataset**, the departments fall roughly into three groups by mean order number (used here as a proxy for order frequency):
* *Highest Frequency: 10, 18, 21*
* *Moderate Frequency: 2, 3, 4, 7, 16, 19*
* *Lower Frequency: 1, 5, 6, 8, 9, 12, 13, 14, 15, 17, 20*

With these results, we can look deeper into the departments at the highest and lowest ends to identify data points of importance.
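
The comparison between the subset and the full data can also be laid out side by side in code. The sketch below is an illustration on invented toy data, not the notebook's dataset: the rows are deliberately ordered so that the first rows hold the largest values, which shows how a slice such as `[:100]` can drift away from the full means. It also adds a random sample via `sample()` as one possible alternative way to build a subset; the names `head_subset` and `random_subset` are hypothetical.

```python
import pandas as pd

# Toy data: 1,000 rows, two departments, largest order numbers come first
full = pd.DataFrame({
    'department_id': [4, 7] * 500,
    'order_number': list(range(1000, 0, -1)),
})

head_subset = full[:100]                               # first rows only
random_subset = full.sample(n=100, random_state=42)    # random rows

# Put the per-department means next to each other
comparison = pd.concat(
    {
        'full': full.groupby('department_id')['order_number'].mean(),
        'head_subset': head_subset.groupby('department_id')['order_number'].mean(),
        'random_subset': random_subset.groupby('department_id')['order_number'].mean(),
    },
    axis=1,
)
comparison['head_minus_full'] = comparison['head_subset'] - comparison['full']

print(comparison)
```

On data that is sorted in some meaningful way (for example by user), a random sample will usually track the full means more closely than the first rows do, although that depends on the data.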

## 09. Create Loyalty Flag for Existing Customers

In [21]:
# Checking the dataframe to see the previously created max_order and loyalty_flag
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,evaluation_status,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,days_since_prior_order_missing,product_id,add_to_cart_order,...,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_days,busiest_period_of_day,max_order,loyalty_flag
0,2539329,1,prior,1,2,8,7.0,1,196,1,...,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10,New customer
1,2398795,1,prior,2,3,7,15.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer
2,473747,1,prior,3,3,12,21.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer
3,2254736,1,prior,4,4,7,29.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer
4,431534,1,prior,5,4,15,28.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer
5,3367565,1,prior,6,2,7,19.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10,New customer
6,550135,1,prior,7,1,9,20.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10,New customer
7,3108588,1,prior,8,1,14,14.0,0,196,2,...,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10,New customer
8,2295261,1,prior,9,1,16,0.0,0,196,4,...,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10,New customer
9,2550362,1,prior,10,4,8,30.0,0,196,1,...,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer


In [22]:
# Drop the previously created 'loyalty_flag' and 'max_order' columns directly in the original DataFrame
ords_prods_merge.drop(columns=['loyalty_flag', 'max_order'], inplace=True)

In [23]:
# Checking to make sure columns were dropped
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,evaluation_status,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,days_since_prior_order_missing,product_id,add_to_cart_order,reordered,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_days,busiest_period_of_day
0,2539329,1,prior,1,2,8,7.0,1,196,1,0,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders
1,2398795,1,prior,2,3,7,15.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
2,473747,1,prior,3,3,12,21.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
3,2254736,1,prior,4,4,7,29.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
4,431534,1,prior,5,4,15,28.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders
5,3367565,1,prior,6,2,7,19.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders
6,550135,1,prior,7,1,9,20.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders
7,3108588,1,prior,8,1,14,14.0,0,196,2,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders
8,2295261,1,prior,9,1,16,0.0,0,196,4,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders
9,2550362,1,prior,10,4,8,30.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders


In [24]:
# Split the data (user_id), transform function (generate max orders per user), create a new column (max_order) for results
ords_prods_merge['max_order'] = ords_prods_merge.groupby('user_id')['order_number'].transform('max')


In [25]:
# Checking to make sure column was added
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,evaluation_status,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,days_since_prior_order_missing,product_id,add_to_cart_order,reordered,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_days,busiest_period_of_day,max_order
0,2539329,1,prior,1,2,8,7.0,1,196,1,0,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10
1,2398795,1,prior,2,3,7,15.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
2,473747,1,prior,3,3,12,21.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
3,2254736,1,prior,4,4,7,29.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
4,431534,1,prior,5,4,15,28.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10
5,3367565,1,prior,6,2,7,19.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10
6,550135,1,prior,7,1,9,20.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10
7,3108588,1,prior,8,1,14,14.0,0,196,2,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10
8,2295261,1,prior,9,1,16,0.0,0,196,4,1,both,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10
9,2550362,1,prior,10,4,8,30.0,0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10


I can see that the new max_order column has been created.

In [26]:
# Use loc function to make loyalty flag column
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [27]:
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [28]:
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [29]:
# Check frequency of values in new loyalty flag column
ords_prods_merge['loyalty_flag'].value_counts(dropna = False)

loyalty_flag
Regular customer    15876776
Loyal customer      10284093
New customer         6243990
Name: count, dtype: int64

In [30]:
# Checking the relevant columns
ords_prods_merge[['user_id', 'loyalty_flag', 'order_number']].head(60)

Unnamed: 0,user_id,loyalty_flag,order_number
0,1,New customer,1
1,1,New customer,2
2,1,New customer,3
3,1,New customer,4
4,1,New customer,5
5,1,New customer,6
6,1,New customer,7
7,1,New customer,8
8,1,New customer,9
9,1,New customer,10


This looks correct, so I can move on to analyzing customer spending.

## 10. Analyzing Customer Spending by Loyalty Level

The marketing team at Instacart is interested in understanding whether there are differences in the spending habits of customers based on their loyalty levels. Specifically, they want to investigate if the prices of products purchased by loyal customers differ from those bought by regular or new customers.

In [31]:
# Calculate summary statistics for product prices within each loyalty category
ords_prods_merge.groupby('loyalty_flag')['prices'].agg(['mean', 'median', 'std', 'min', 'max'])

loyalty_flag,mean,median,std,min,max
Loyal customer,10.386336,7.4,328.017787,1.0,99999.0
New customer,13.29467,7.4,597.560299,1.0,99999.0
Regular customer,12.495717,7.4,539.720919,1.0,99999.0


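Looking at these results:
- **Loyal customers have the lowest average price per product (10.39), compared with new customers (13.29) and regular customers (12.50).**
- It is surprising that loyal customers have the lowest average in terms of spending.
- The median is identical (7.40) in all three groups, and every group has a maximum of 99999.0 with a standard deviation in the hundreds, so the differences in the means are likely driven by a few extreme price values rather than by typical purchases.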
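
The summary table above shows a maximum price of 99999.0 in every loyalty group, which can pull the mean far away from typical values. As a rough illustration on invented numbers (not the notebook's data), the sketch below shows how one such value changes the mean dramatically while the median barely moves.

```python
import pandas as pd

# Toy prices: made-up values in a plausible grocery range
prices = pd.Series([1.0, 5.0, 7.4, 9.0, 12.0])

# The same prices plus one extreme value like the maximum in the table
with_outlier = pd.concat([prices, pd.Series([99999.0])], ignore_index=True)

print('without outlier:', prices.mean(), prices.median())
print('with outlier:   ', with_outlier.mean(), with_outlier.median())
```

This is why the median column may be the safer basis for comparing the groups until the extreme prices have been checked.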

## 11. Customer Spending Classification for Targeted Marketing Campaigns

To target different types of spenders in their marketing campaigns, Instacart wants to classify each user as a "Low spender" or a "High spender" based on the average price of the products they purchase.

- **Condition 1:** If the mean of the prices of products purchased by a user is lower than 10, then the user should be flagged as a “Low spender.”

- **Condition 2:** If the mean of the prices of products purchased by a user is higher than or equal to 10, then the user should be flagged as a “High spender.”

In [32]:
# Use transform() to determine the mean price of products purchased by each user
ords_prods_merge['user_avg_price'] = ords_prods_merge.groupby('user_id')['prices'].transform('mean')


In [33]:
# Checking the output
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,evaluation_status,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,days_since_prior_order_missing,product_id,add_to_cart_order,...,product_name,aisle_id,department_id,prices,price_range_loc,busiest_days,busiest_period_of_day,max_order,loyalty_flag,user_avg_price
0,2539329,1,prior,1,2,8,7.0,1,196,1,...,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10,New customer,6.367797
1,2398795,1,prior,2,3,7,15.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer,6.367797
2,473747,1,prior,3,3,12,21.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer,6.367797
3,2254736,1,prior,4,4,7,29.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer,6.367797
4,431534,1,prior,5,4,15,28.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer,6.367797
5,3367565,1,prior,6,2,7,19.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Regularly busy,Average orders,10,New customer,6.367797
6,550135,1,prior,7,1,9,20.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10,New customer,6.367797
7,3108588,1,prior,8,1,14,14.0,0,196,2,...,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10,New customer,6.367797
8,2295261,1,prior,9,1,16,0.0,0,196,4,...,Soda,77,7,9.0,Mid-range product,Busiest days,Average orders,10,New customer,6.367797
9,2550362,1,prior,10,4,8,30.0,0,196,1,...,Soda,77,7,9.0,Mid-range product,Slowest days,Average orders,10,New customer,6.367797


In [34]:
# Create spending flag column based off of the given criteria
ords_prods_merge.loc[ords_prods_merge['user_avg_price'] < 10, 'spending_flag'] = 'Low spender'

In [35]:
ords_prods_merge.loc[ords_prods_merge['user_avg_price'] >= 10, 'spending_flag'] = 'High spender'


In [36]:
# Check the frequency of the new column
ords_prods_merge['spending_flag'].value_counts(dropna = False)

spending_flag
Low spender     31770614
High spender      634245
Name: count, dtype: int64

In [37]:
# Checking relevant columns
ords_prods_merge[['user_id', 'user_avg_price', 'spending_flag']].head(15)

Unnamed: 0,user_id,user_avg_price,spending_flag
0,1,6.367797,Low spender
1,1,6.367797,Low spender
2,1,6.367797,Low spender
3,1,6.367797,Low spender
4,1,6.367797,Low spender
5,1,6.367797,Low spender
6,1,6.367797,Low spender
7,1,6.367797,Low spender
8,1,6.367797,Low spender
9,1,6.367797,Low spender
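
Because the spending flag has only two outcomes, the two `loc` statements above could also be written as a single `np.where()` call. This is a sketch on toy values, not a replacement for the approach used in the exercise. One difference to keep in mind: a missing `user_avg_price` would fall into the "else" branch here, whereas the `loc` version would leave it unflagged.

```python
import numpy as np
import pandas as pd

# Toy averages, including the boundary value 10.0
toy = pd.DataFrame({'user_avg_price': [6.37, 9.99, 10.0, 25.5]})

# Below 10 is a low spender; everything else (including NaN) gets the second label
toy['spending_flag'] = np.where(
    toy['user_avg_price'] < 10, 'Low spender', 'High spender'
)

print(toy)
```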


## 12. Creating Order Frequency Flags for Targeted User Engagement

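Instacart seeks to boost user engagement by sending personalized notifications tailored to customers' ordering frequencies. To achieve this, I need to categorize customers by the median number of days between their orders (the median of “days_since_prior_order” per user), using the following criteria:

- **Condition 1:** If the median is lower than or equal to 10, then the customer should be labeled a “Frequent customer.”

- **Condition 2:** If the median is higher than 10 and lower than or equal to 20, then the customer should be labeled a “Regular customer.”

- **Condition 3:** If the median is higher than 20, then the customer should be labeled a “Non-frequent customer.”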

In [38]:
# Calculate the median days since prior order for each user
ords_prods_merge['user_median_days'] = ords_prods_merge.groupby('user_id')['days_since_prior_order'].transform('median')


In [39]:
# Checking the output of relevant columns
ords_prods_merge.loc[:, ['user_id', 'days_since_prior_order', 'user_median_days']].head(15)

Unnamed: 0,user_id,days_since_prior_order,user_median_days
0,1,7.0,20.0
1,1,15.0,20.0
2,1,21.0,20.0
3,1,29.0,20.0
4,1,28.0,20.0
5,1,19.0,20.0
6,1,20.0,20.0
7,1,14.0,20.0
8,1,0.0,20.0
9,1,30.0,20.0


In [40]:
# Creating frequency flags
ords_prods_merge.loc[ords_prods_merge['user_median_days'] <= 10, 'order_frequency_flag'] = 'Frequent customer'

In [41]:
ords_prods_merge.loc[(ords_prods_merge['user_median_days'] > 10) & (ords_prods_merge['user_median_days'] <= 20), 'order_frequency_flag'] = 'Regular customer'

In [42]:
ords_prods_merge.loc[ords_prods_merge['user_median_days'] > 20, 'order_frequency_flag'] = 'Non-frequent customer'

In [43]:
# Print the frequency of the new column
ords_prods_merge['order_frequency_flag'].value_counts(dropna = False)

order_frequency_flag
Frequent customer        22790523
Regular customer          6927608
Non-frequent customer     2686728
Name: count, dtype: int64

In [46]:
# Check the relevant and new columns
ords_prods_merge[['user_id', 'days_since_prior_order', 'user_median_days', 'order_frequency_flag']].head(20)

Unnamed: 0,user_id,days_since_prior_order,user_median_days,order_frequency_flag
0,1,7.0,20.0,Regular customer
1,1,15.0,20.0,Regular customer
2,1,21.0,20.0,Regular customer
3,1,29.0,20.0,Regular customer
4,1,28.0,20.0,Regular customer
5,1,19.0,20.0,Regular customer
6,1,20.0,20.0,Regular customer
7,1,14.0,20.0,Regular customer
8,1,0.0,20.0,Regular customer
9,1,30.0,20.0,Regular customer
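
For three or more categories, `np.select()` is one way to express all thresholds in a single statement. The sketch below applies the same cut-offs as the `loc` statements above to toy values; the `'Unknown'` fallback label for missing medians is a hypothetical addition and does not appear in the exercise.

```python
import numpy as np
import pandas as pd

# Toy medians, including boundary values and one missing value
toy = pd.DataFrame({'user_median_days': [5.0, 10.0, 10.5, 20.0, 20.5, np.nan]})

conditions = [
    toy['user_median_days'] <= 10,
    (toy['user_median_days'] > 10) & (toy['user_median_days'] <= 20),
    toy['user_median_days'] > 20,
]
labels = ['Frequent customer', 'Regular customer', 'Non-frequent customer']

# Rows matching no condition (here: NaN) receive the hypothetical default label
toy['order_frequency_flag'] = np.select(conditions, labels, default='Unknown')

print(toy)
```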


## 13. Export Dataframe as a Pickle File

In [51]:
# Export dataframe to pkl file
ords_prods_merge.to_pickle(os.path.join(path, '02 Data','Prepared Data', 'orders_products_flags_grouped.pkl'))
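
A quick way to gain confidence in an export is to read the file back and compare it with the original. The sketch below does this on a tiny toy dataframe in a temporary directory; the file name `toy_flags.pkl` is hypothetical and unrelated to the notebook's folder structure.

```python
import os
import tempfile

import pandas as pd

# Toy dataframe standing in for the flagged dataset
toy = pd.DataFrame({
    'user_id': [1, 15],
    'loyalty_flag': ['New customer', 'Regular customer'],
})

# Write to a temporary folder, then read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'toy_flags.pkl')
    toy.to_pickle(out_path)
    restored = pd.read_pickle(out_path)

# True when values, dtypes, and shape all survived the round trip
round_trip_ok = restored.equals(toy)
print(round_trip_ok)
```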