# 4.8 Grouping Data and Aggregating Variables (1)

##### This script contains the following points:

#### Step 1 - Create notebook, import libraries and data

#### Step 2 - Find the aggregated mean of the 'order_number' column grouped by 'department_id' for the entire dataframe

#### Step 3 - Analyze the result in a markdown cell below the executed code

#### Step 4 - Create loyalty flag for existing customers using transform() and loc[]

#### (Dataframe exported as .pkl ... continued in next notebook)

### Step 1 - Create notebook, import libraries and data

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os

In [2]:
# Set path
path = r'/Users/mainframe/Documents/Instacart Basket Analysis'

In [3]:
# Import Data
ords_prods_merge = pd.read_pickle(os.path.join(path,'02 Data', 'Prepared Data', 'orders_products_merged_derived.pkl'))

### Step 2 - Find the aggregated mean of the 'order_number' column grouped by 'department_id' for the entire dataframe

In [4]:
# Perform mean aggregation of 'order_number' column, grouped by 'department_id' for ords_prods_merge
ords_prods_merge.groupby('department_id').agg({'order_number': ['mean']})

department_id,order_number (mean)
1,15.457838
2,17.27792
3,17.170395
4,17.811403
5,15.215751
6,16.439806
7,17.225802
8,15.34065
9,15.895474
10,20.197148
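
To make the shape of this result easier to read, here is a minimal sketch on a made-up toy dataframe (the values below are illustrative assumptions, not Instacart data). It uses the same `groupby().agg()` call as the cell above and then shows pandas named aggregation, which returns a flat column name instead of the two-level `('order_number', 'mean')` header.

```python
import pandas as pd

# Toy stand-in for ords_prods_merge (values are invented for illustration)
toy = pd.DataFrame({
    'department_id': [1, 1, 2, 2, 2],
    'order_number': [2, 4, 1, 3, 8],
})

# Same call shape as in the notebook: one column, one aggregation.
# The result has two-level columns: ('order_number', 'mean').
dept_means = toy.groupby('department_id').agg({'order_number': ['mean']})
print(dept_means)

# Named aggregation produces a single, flat column name instead.
# 'mean_order_number' is a hypothetical column name chosen for this sketch.
flat_means = toy.groupby('department_id').agg(
    mean_order_number=('order_number', 'mean')
)
print(flat_means)
```

The flat version can be more convenient when the result is merged back into another dataframe or exported.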


### Step 3 - Analyze the result in a markdown cell below the executed code

#### The mean of 'order_number' grouped by 'department_id' is higher for the entire dataframe than for the subset of the first 1,000,000 rows in every department except 17 and 20. Also, 13 more 'department_id' values appear in the entire dataframe than in the subset.
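
A comparison like this can also be made in code rather than by eye. The sketch below assumes a toy dataframe and a hypothetical three-row subset standing in for the first 1,000,000 rows. It places the two grouped means side by side and lists the departments that are missing from the subset.

```python
import pandas as pd

# Toy data (invented values); department 3 only appears outside the subset
toy = pd.DataFrame({
    'department_id': [1, 1, 2, 1, 2, 3],
    'order_number': [2, 4, 1, 6, 9, 5],
})

# Hypothetical subset: the first 3 rows stand in for the first 1,000,000 rows
subset = toy[:3]

full_mean = toy.groupby('department_id')['order_number'].mean()
subset_mean = subset.groupby('department_id')['order_number'].mean()

# Align both results on department_id; departments absent from the subset get NaN
comparison = pd.concat([full_mean, subset_mean], axis=1, keys=['full', 'subset'])
comparison['full_higher'] = comparison['full'] > comparison['subset']

# Departments that only show up in the entire dataframe
missing_in_subset = comparison.index[comparison['subset'].isna()].tolist()
print(comparison)
print(missing_in_subset)
```

Note that a comparison against NaN evaluates to False, so departments missing from the subset should be read from `missing_in_subset` rather than from the `full_higher` column.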

In [5]:
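# Create 'max_order' column: the highest 'order_number' for each 'user_id'
# (passing the string 'max' instead of np.max avoids a pandas FutureWarning)
ords_prods_merge['max_order'] = ords_prods_merge.groupby(['user_id'])['order_number'].transform('max')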


In [6]:
ords_prods_merge.head(15)

,order_id,user_id,order_number,orders_day_of_week,hour_ordered,days_since_prior_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order
0,2539329,1,1,2,8,,196,1,0,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy days,Average orders,10
1,2539329,1,1,2,8,,14084,2,0,Organic Unsweetened Vanilla Almond Milk,91,16,12.5,both,Mid-range product,Regularly busy,Regularly busy days,Average orders,10
2,2539329,1,1,2,8,,12427,3,0,Original Beef Jerky,23,19,4.4,both,Low-range product,Regularly busy,Regularly busy days,Average orders,10
3,2539329,1,1,2,8,,26088,4,0,Aged White Cheddar Popcorn,23,19,4.7,both,Low-range product,Regularly busy,Regularly busy days,Average orders,10
4,2539329,1,1,2,8,,26405,5,0,XL Pick-A-Size Paper Towel Rolls,54,17,1.0,both,Low-range product,Regularly busy,Regularly busy days,Average orders,10
5,2398795,1,2,3,7,15.0,196,1,1,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Least busy days,Average orders,10
6,2398795,1,2,3,7,15.0,10258,2,0,Pistachios,117,19,3.0,both,Low-range product,Regularly busy,Least busy days,Average orders,10
7,2398795,1,2,3,7,15.0,12427,3,1,Original Beef Jerky,23,19,4.4,both,Low-range product,Regularly busy,Least busy days,Average orders,10
8,2398795,1,2,3,7,15.0,13176,4,0,Bag of Organic Bananas,24,4,10.3,both,Mid-range product,Regularly busy,Least busy days,Average orders,10
9,2398795,1,2,3,7,15.0,26088,5,1,Aged White Cheddar Popcorn,23,19,4.7,both,Low-range product,Regularly busy,Least busy days,Average orders,10
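
The 'max_order' value repeats on every row of a user because `transform()` returns a result with the same length as the original dataframe, whereas a plain aggregation returns one row per group. A minimal sketch on invented toy data:

```python
import pandas as pd

# Toy data (invented): user 1 has 3 orders, user 2 has 2 orders
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'order_number': [1, 2, 3, 1, 2],
})

# Aggregation: one value per user, indexed by user_id
per_user = toy.groupby('user_id')['order_number'].max()

# Transform: the group maximum broadcast back to every row of that user,
# so it can be assigned directly as a new column
toy['max_order'] = toy.groupby('user_id')['order_number'].transform('max')

print(per_user)
print(toy)
```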


### Step 4 - Create loyalty flag for existing customers using transform() and loc[]

In [7]:
# Create loyalty flag: more than 40 orders = 'Loyal customer'
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [8]:
# More than 10 and up to 40 orders = 'Regular customer'
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [9]:
# 10 orders or fewer = 'New customer'
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [10]:
# Check the flag distribution; dropna=False would also show any unflagged (NaN) rows
ords_prods_merge['loyalty_flag'].value_counts(dropna=False)

loyalty_flag
Regular customer    15876776
Loyal customer      10284093
New customer         6243990
Name: count, dtype: int64
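
As an alternative sketch (not the method used in this notebook), the same three thresholds can be expressed in a single `pd.cut` call. The toy values below are invented to sit on and around the boundaries at 10 and 40. One difference to keep in mind is that `pd.cut` returns a categorical column rather than plain strings.

```python
import numpy as np
import pandas as pd

# Toy 'max_order' values chosen to test the boundaries at 10 and 40
toy = pd.DataFrame({'max_order': [3, 10, 11, 40, 41, 99]})

# Bins are right-inclusive by default: (-inf, 10], (10, 40], (40, inf)
toy['loyalty_flag'] = pd.cut(
    toy['max_order'],
    bins=[-np.inf, 10, 40, np.inf],
    labels=['New customer', 'Regular customer', 'Loyal customer'],
)

print(toy)
print(toy['loyalty_flag'].value_counts(dropna=False))
```

Because every value falls into exactly one bin, this form cannot leave a row unflagged unless 'max_order' itself is missing.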

In [11]:
ords_prods_merge[['user_id', 'loyalty_flag', 'order_number']].head(60)

,user_id,loyalty_flag,order_number
0,1,New customer,1
1,1,New customer,1
2,1,New customer,1
3,1,New customer,1
4,1,New customer,1
5,1,New customer,2
6,1,New customer,2
7,1,New customer,2
8,1,New customer,2
9,1,New customer,2
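
Since the flag is repeated on every row of a user, `value_counts()` on 'loyalty_flag' counts rows (ordered products), not customers. A hedged sketch of how the number of distinct customers per flag could be counted, using invented toy data:

```python
import pandas as pd

# Toy data (invented): user 1 has 3 rows, user 2 has 1 row, user 3 has 2 rows
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 3, 3],
    'loyalty_flag': ['New customer'] * 3
                    + ['Loyal customer']
                    + ['Regular customer'] * 2,
})

# Row counts per flag: influenced by how many products each user ordered
rows_per_flag = toy['loyalty_flag'].value_counts()

# Distinct customers per flag: each user counted once
users_per_flag = toy.groupby('loyalty_flag')['user_id'].nunique()

print(rows_per_flag)
print(users_per_flag)
```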


### (Dataframe exported as .pkl ... continued in next notebook)

In [12]:
# Export as .pkl
ords_prods_merge.to_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'ords_prods_merged_derived.pkl'))
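
One way to sanity-check an export is to read the file back and compare it with the dataframe in memory. The sketch below does this on a small invented dataframe in a temporary directory; the file name `toy_export.pkl` is hypothetical and is not part of the project folder structure.

```python
import os
import tempfile

import pandas as pd

# Small toy dataframe (invented values)
toy = pd.DataFrame({
    'user_id': [1, 2],
    'loyalty_flag': ['New customer', 'Loyal customer'],
})

# Write to a temporary directory so nothing in the project folder is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'toy_export.pkl')  # hypothetical file name
    toy.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

# True when values, dtypes, and index all match after the round trip
round_trip_ok = toy.equals(reloaded)
print(round_trip_ok)
```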