# Table of Contents

Q1. Import the data set you exported in Exercise 4.9 into a new Jupyter notebook. This should be the merged version of your 
data that contains the new customers column.

Q2. Consider any security implications that might exist for this new data. You’ll need to address any PII data in the data 
before continuing your analysis.

Q3. The Instacart officers are interested in comparing customer behavior in different geographic areas. 
Create a regional segmentation of the data. You’ll need to create a “Region” column based on the “State” column from your 
customers data set:

a. Use the region information in this Wikipedia article to create your column (you only need to create regions, not divisions)

b. Determine whether there’s a difference in spending habits between the different U.S. regions. (Hint: You can do this by crossing the variable you just created with the spending flag.)

    
Q4. The Instacart CFO isn’t interested in customers who don’t generate much revenue for the app. Create an exclusion flag 
for low-activity customers (customers with less than 5 orders) and exclude them from the data. Make sure you export 
this sample.

Q5. The marketing and business strategy units at Instacart want to create more-relevant marketing strategies for different 
products and are, thus, curious about customer profiling in their database. Create a profiling variable based on age, income, 
certain goods in the “department_id” column, and number of dependents. You might also use the “orders_day_of_the_week” and 
“order_hour_of_day” columns if you can think of a way they would impact customer profiles. 
(Hint: As an example, try thinking of what characteristics would lead you to the profile “Single adult” or “Young parent.”)

# 01. Import Libraries

In [1]:
# Import libraries

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
import scipy

### Import the data set you exported in Exercise 4.9 into a new Jupyter notebook. This should be the merged version of your data that contains the new customers column.

# 02. Importing Data

In [2]:
# Project folder pathway

path = r'C:\Users\SD.DESKTOP-45BHU58\OneDrive\Desktop\CF\Section 4 - Python\Instacart Basket Analysis'

In [3]:
path

'C:\\Users\\SD.DESKTOP-45BHU58\\OneDrive\\Desktop\\CF\\Section 4 - Python\\Instacart Basket Analysis'

In [4]:
# Importing most updated merged data set

ords_prods_custs_merged = pd.read_pickle(os.path.join(path,'02 Data','Prepared Data','orders_products_customers_merged.pkl'))

In [5]:
# Checking shape

ords_prods_custs_merged.shape

(32404859, 34)

In [6]:
# Checking headers

ords_prods_custs_merged.head()

Unnamed: 0,order_id,user_id,order_number,orders_day_of_the_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,...,frequency_flag,first_name,last_name,gender,state,age,date_joined,number_of_dependants,fam_status,income
0,2539329,1,1,2,8,,True,196,1,0,...,Non-frequent customer,Linda,Nguyen,Female,Alabama,31,2/17/2019,3,married,40423
1,2398795,1,2,3,7,15.0,False,196,1,1,...,Non-frequent customer,Linda,Nguyen,Female,Alabama,31,2/17/2019,3,married,40423
2,473747,1,3,3,12,21.0,False,196,1,1,...,Non-frequent customer,Linda,Nguyen,Female,Alabama,31,2/17/2019,3,married,40423
3,2254736,1,4,4,7,29.0,False,196,1,1,...,Non-frequent customer,Linda,Nguyen,Female,Alabama,31,2/17/2019,3,married,40423
4,431534,1,5,4,15,28.0,False,196,1,1,...,Non-frequent customer,Linda,Nguyen,Female,Alabama,31,2/17/2019,3,married,40423


### Consider any security implications that might exist for this new data. You’ll need to address any PII data in the data before continuing your analysis

# 03. Handling PII Data

The 'first_name' and 'last_name' columns contain personally identifiable information (PII), since they can be traced back to specific people.

These columns are not needed for the analysis, so a copy of the original dataframe is created without them.

In [7]:
# Copy of 'ords_prods_custs_merged' without 'first_name' and 'last_name' columns

ords_prods_custs_merged_2 = ords_prods_custs_merged.drop(columns = ['first_name', 'last_name'])

In [8]:
# Checking headers of ords_prods_custs_merged_2 - ensuring there is no PII data

ords_prods_custs_merged_2.head()

Unnamed: 0,order_id,user_id,order_number,orders_day_of_the_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,...,spender_flag,order_frequency_by_cust,frequency_flag,gender,state,age,date_joined,number_of_dependants,fam_status,income
0,2539329,1,1,2,8,,True,196,1,0,...,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423
1,2398795,1,2,3,7,15.0,False,196,1,1,...,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423
2,473747,1,3,3,12,21.0,False,196,1,1,...,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423
3,2254736,1,4,4,7,29.0,False,196,1,1,...,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423
4,431534,1,5,4,15,28.0,False,196,1,1,...,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423


### The Instacart officers are interested in comparing customer behavior in different geographic areas. Create a regional segmentation of the data. You’ll need to create a “Region” column based on the “State” column from your customers data set:  
### a. Use the region information in this Wikipedia article to create your column (you only need to create regions, not divisions) 



# 04. Creating Region Column

In [9]:
# Checking value counts of 'state' column

ords_prods_custs_merged_2['state'].value_counts(dropna = False)

Pennsylvania            667082
California              659783
Rhode Island            656913
Georgia                 656389
New Mexico              654494
Arizona                 653964
North Carolina          651900
Oklahoma                651739
Alaska                  648495
Minnesota               647825
Massachusetts           646358
Wyoming                 644255
Virginia                641421
Missouri                640732
Texas                   640394
Colorado                639280
Maine                   638583
North Dakota            638491
Alabama                 638003
Kansas                  637538
Louisiana               637482
Delaware                637024
South Carolina          636754
Oregon                  636425
Arkansas                636144
Nevada                  636139
New York                635983
Montana                 635265
South Dakota            633772
Illinois                633024
Hawaii                  632901
Washington              632852
Mississi

In [10]:
# Creating a list of region values with a for-loop and if-statements

region = []

for value in ords_prods_custs_merged_2['state']:
    if (value in ['Maine', 'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island', 'Connecticut', 'New York', 'Pennsylvania', 'New Jersey']):
        region.append('Northeast')
    elif (value in ['Wisconsin', 'Michigan', 'Illinois', 'Ohio', 'North Dakota', 'South Dakota', 'Nebraska', 'Minnesota', 'Iowa', 'Missouri']):
        region.append('Midwest')
    elif (value in ['Delaware', 'Maryland', 'District of Columbia', 'Virginia', 'West Virginia', 'North Carolina', 'South Carolina', 'Georgia', 'Florida', 'Kentucky', 'Tennessee', 'Mississippi', 'Alabama', 'Oklahoma', 'Texas', 'Arkansas', 'Louisiana']):
        region.append('South')
    else:
        region.append('West')
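
The same assignment can be sketched without a Python-level loop by inverting a region-to-states dictionary and passing it to `Series.map`. The lists below follow the four U.S. Census Bureau regions. Note that the Census Midwest also includes Indiana and Kansas, which the loop above routes to 'West' through its `else` branch, so rows from those two states appear to be counted under 'West' in the results that follow. The names `census_regions`, `state_to_region`, and the small `demo` frame are illustrative and not part of the original notebook.

```python
import pandas as pd

# U.S. Census Bureau regions (regions only, no divisions)
census_regions = {
    'Northeast': ['Maine', 'New Hampshire', 'Vermont', 'Massachusetts', 'Rhode Island',
                  'Connecticut', 'New York', 'Pennsylvania', 'New Jersey'],
    'Midwest': ['Wisconsin', 'Michigan', 'Illinois', 'Indiana', 'Ohio', 'North Dakota',
                'South Dakota', 'Nebraska', 'Kansas', 'Minnesota', 'Iowa', 'Missouri'],
    'South': ['Delaware', 'Maryland', 'District of Columbia', 'Virginia', 'West Virginia',
              'North Carolina', 'South Carolina', 'Georgia', 'Florida', 'Kentucky',
              'Tennessee', 'Mississippi', 'Alabama', 'Oklahoma', 'Texas', 'Arkansas',
              'Louisiana'],
    'West': ['Idaho', 'Montana', 'Wyoming', 'Nevada', 'Utah', 'Colorado', 'Arizona',
             'New Mexico', 'Alaska', 'Washington', 'Oregon', 'California', 'Hawaii'],
}

# Invert to a flat state -> region lookup
state_to_region = {
    state: region
    for region, states in census_regions.items()
    for state in states
}

# Hypothetical sample standing in for the 'state' column
demo = pd.DataFrame({'state': ['Alabama', 'Kansas', 'Indiana', 'Oregon', 'Maine']})
demo['region'] = demo['state'].map(state_to_region)

# States missing from the lookup become NaN instead of silently landing in 'West'
unmapped = demo['region'].isna().sum()
```

Because unmapped states surface as NaN, a misspelled or missing state name shows up in a `value_counts(dropna = False)` check rather than being absorbed by a catch-all branch.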

In [11]:
# Printing region values

region

['South',
 'South',
 'South',
 ...
 'West',
 'West',
 'West',
 ...

In [12]:
# Creating new 'region' column within df, using 'region' values

ords_prods_custs_merged_2['region'] = region

In [13]:
# Checking 'region' value counts

ords_prods_custs_merged_2['region'].value_counts(dropna = False)

South        10791885
West          9557733
Midwest       6332505
Northeast     5722736
Name: region, dtype: int64

The four 'region' counts sum to 32,404,859, which equals the total row count, so every row was assigned a region.

In [14]:
# Checking shape

ords_prods_custs_merged_2.shape

(32404859, 33)

In [15]:
# Checking headers

pd.set_option('display.max_columns', None)
ords_prods_custs_merged_2.head(1)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_the_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,avg_order_cost_by_cust,spender_flag,order_frequency_by_cust,frequency_flag,gender,state,age,date_joined,number_of_dependants,fam_status,income,region
0,2539329,1,1,2,8,,True,196,1,0,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy days,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423,South


### b. Determine whether there’s a difference in spending habits between the different U.S. regions

In [16]:
# Creating crosstab

crosstab = pd.crosstab(ords_prods_custs_merged_2['region'], ords_prods_custs_merged_2['spender_flag'], dropna = False)

In [17]:
crosstab

spender_flag,High spender,Low spender
region,Unnamed: 1_level_1,Unnamed: 2_level_1
Midwest,127656,6204849
Northeast,108225,5614511
South,209691,10582194
West,188673,9369060


In [18]:
# High spender percentages by region

Midwest = (127656/(127656+6204849))*100
Northeast = (108225/(108225+5614511))*100
South = (209691/(209691+10582194))*100
West = (188673/(188673+9369060))*100

In [19]:
# Percentage results

Midwest, Northeast, South, West

(2.0158847091316945,
 1.8911408808653762,
 1.9430433144904713,
 1.9740350562209679)

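There is little difference in spending habits between the regions: the share of high-spender records ranges from about 1.89% (Northeast) to about 2.02% (Midwest). The South has the largest number of high-spender records (209,691), while the Midwest has the highest percentage. Note that these counts are rows of the merged data (one per ordered product), not distinct customers.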
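
The percentages above were typed in by hand from the crosstab. As a minimal sketch, the same row-wise shares can be computed directly, which avoids copy errors. The `counts` frame below reuses the numbers from the crosstab output, and `demo` is a made-up four-row sample showing the `normalize='index'` option of `pd.crosstab`.

```python
import pandas as pd

# Counts copied from the crosstab output above
counts = pd.DataFrame(
    {'High spender': [127656, 108225, 209691, 188673],
     'Low spender': [6204849, 5614511, 10582194, 9369060]},
    index=pd.Index(['Midwest', 'Northeast', 'South', 'West'], name='region'),
)

# Row-wise percentages: each region's row sums to 100
pct = counts.div(counts.sum(axis=1), axis=0) * 100
high_pct = pct['High spender'].round(2)

# On raw data, crosstab can return row proportions in one step
demo = pd.DataFrame({
    'region': ['South', 'South', 'West', 'West'],
    'spender_flag': ['High spender', 'Low spender', 'Low spender', 'Low spender'],
})
shares = pd.crosstab(demo['region'], demo['spender_flag'], normalize='index')
```

Here `high_pct` holds one rounded percentage per region, and `shares` holds proportions between 0 and 1 rather than percentages.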

### The Instacart CFO isn’t interested in customers who don’t generate much revenue for the app. Create an exclusion flag for low-activity customers (customers with less than 5 orders) and exclude them from the data. Make sure you export this sample

# 05. Excluding Low-Activity Customers

In [20]:
# Creating low-activity flag - customers w/ fewer than 5 orders

ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['max_order'] < 5, 'activity_flag'] = 'Low activity'

In [21]:
# Creating normal-activity flag - customers w/ 5 or more orders

ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['max_order'] >= 5, 'activity_flag'] = 'Normal activity'
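
The two `.loc` assignments can also be written as a single `np.where` call. This is a sketch on a hypothetical four-row `demo` frame, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 'max_order' is the customer's highest order number
demo = pd.DataFrame({'user_id': [1, 1, 2, 3], 'max_order': [10, 10, 4, 5]})

# One pass: below 5 orders is low activity, everything else is normal
demo['activity_flag'] = np.where(demo['max_order'] < 5, 'Low activity', 'Normal activity')

# Keep only normal-activity rows; .copy() makes the subset an independent frame
normal = demo[demo['activity_flag'] == 'Normal activity'].copy()
```

One difference to keep in mind: with `np.where`, a missing `max_order` would fall into 'Normal activity' because the comparison evaluates to False, whereas the two `.loc` assignments would leave the flag empty for such rows.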

In [22]:
# Checking values under 'activity_flag' column

ords_prods_custs_merged_2['activity_flag'].value_counts(dropna = False)

Normal activity    30964564
Low activity        1440295
Name: activity_flag, dtype: int64

In [23]:
# Creating df with only 'Low activity' customers

ords_prods_low_activity_custs = ords_prods_custs_merged_2[ords_prods_custs_merged_2['activity_flag']=='Low activity']

In [24]:
# Checking 'max_order' value count of low activity df

ords_prods_low_activity_custs['max_order'].value_counts(dropna = False)

4    753543
3    686741
2         6
1         5
Name: max_order, dtype: int64

The low-activity dataframe holds 1,440,295 records, and none has a 'max_order' value of 5 or above.

In [25]:
# Creating df with 'Normal activity' customers

ords_prods_norm_activity_custs = ords_prods_custs_merged_2[ords_prods_custs_merged_2['activity_flag']=='Normal activity']

In [26]:
# Checking 'max_order' value count of normal activity df

ords_prods_norm_activity_custs['max_order'].value_counts(dropna = False)

99    1171333
8      811843
6      811396
9      810213
7      803979
       ...   
93      50333
90      48328
97      44949
98      44587
96      40453
Name: max_order, Length: 95, dtype: int64

In [53]:
# Exporting dataframe containing normal activity customers

ords_prods_norm_activity_custs.to_csv(os.path.join(path,'02 Data','Prepared Data','ords_prods_normal_activity_custs.csv'))

### The marketing and business strategy units at Instacart want to create more-relevant marketing strategies for different products and are, thus, curious about customer profiling in their database. Create a profiling variable based on age, income, certain goods in the “department_id” column, and number of dependents. You might also use the “orders_day_of_the_week” and “order_hour_of_day” columns if you can think of a way they would impact customer profiles. (Hint: As an example, try thinking of what characteristics would lead you to the profile “Single adult” or “Young parent.”)

# 06. Customer Profiling

In [27]:
# Checking headers

ords_prods_custs_merged_2.head(1)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_the_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,avg_order_cost_by_cust,spender_flag,order_frequency_by_cust,frequency_flag,gender,state,age,date_joined,number_of_dependants,fam_status,income,region,activity_flag
0,2539329,1,1,2,8,,True,196,1,0,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy days,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423,South,Normal activity


### Classifying Age Groups

In [28]:
# Checking age range of customers

ords_prods_custs_merged_2['age'].value_counts().sort_index()

18    500474
19    519606
20    502848
21    512527
22    513853
       ...  
77    502120
78    495311
79    529742
80    515246
81    509284
Name: age, Length: 64, dtype: int64

In [29]:
# Classifying age groups with a for-loop and if-statements

generation = []

for value in ords_prods_custs_merged_2['age']:
    if (value >=18) and (value <=25):
        generation.append('Gen Z 18-25')
    elif (value >=26) and (value <=41):
        generation.append('Millennials 26-41')
    elif (value >=42) and (value <=57):
        generation.append('Gen X 42-57')
    elif (value >=58) and (value<=76):
        generation.append('Baby Boomers 58-76')
    else:
        generation.append('Silent Generation >76')
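
Looping over 32 million rows in Python is slow. As a hedged alternative, `pd.cut` can bin the ages in one vectorized call. The sketch below uses the same age bands and labels as the loop, applied to a hypothetical `ages` series.

```python
import numpy as np
import pandas as pd

# Hypothetical ages covering every band boundary
ages = pd.Series([18, 25, 26, 41, 42, 57, 58, 76, 77, 81], name='age')

# Right-inclusive edges: (17, 25], (25, 41], (41, 57], (57, 76], (76, inf)
bins = [17, 25, 41, 57, 76, np.inf]
labels = ['Gen Z 18-25', 'Millennials 26-41', 'Gen X 42-57',
          'Baby Boomers 58-76', 'Silent Generation >76']

generation = pd.cut(ages, bins=bins, labels=labels)
```

Unlike the loop's `else` branch, `pd.cut` returns NaN for any age below 18 instead of labelling it 'Silent Generation >76'. The result is a categorical column, which also uses less memory than plain strings.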
        

In [30]:
# Adding 'generation' column with results from generation list

ords_prods_custs_merged_2['generation'] = generation

In [31]:
# Check value counts of 'generation' column

ords_prods_custs_merged_2['generation'].value_counts(dropna = False)

Baby Boomers 58-76       9520639
Gen X 42-57              8187246
Millennials 26-41        8100082
Gen Z 18-25              4045189
Silent Generation >76    2551703
Name: generation, dtype: int64

### Classifying Income

In [32]:
# Checking income range of customers

ords_prods_custs_merged_2['income'].value_counts().sort_index()

25903      20
25911      28
25937      39
25941      31
25955     392
         ... 
584097    826
590790     54
591089    176
592409    318
593901    640
Name: income, Length: 108012, dtype: int64

In [33]:
# Classifying income groups with a for-loop and if-statements

income_group = []

for value in ords_prods_custs_merged_2['income']:
    if (value < 32000):
        income_group.append('Lower class <32K')
    elif (value >= 32000) and (value <= 53000):
        income_group.append('Lower-middle class 32K-53K')
    elif (value > 53000) and (value <= 107000):
        income_group.append('Middle class 53K-107K')
    elif (value > 107000) and (value <=374000):
        income_group.append('Upper middle class 107K-374K')
    else:
        income_group.append('Upper class >374K')
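
The income bands mix strict and inclusive comparisons, so `np.select` is one way to express them in vectorized form while keeping the exact boundaries. It checks the conditions in order and takes the first match, just like the `if`/`elif` chain. The `income` series below is hypothetical and chosen to sit on each boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes on and around each band boundary
income = pd.Series([25903, 32000, 53000, 53001, 107000, 107001, 374000, 374001])

# Conditions are evaluated in order; the first True one wins
conditions = [
    income < 32000,
    income <= 53000,
    income <= 107000,
    income <= 374000,
]
choices = [
    'Lower class <32K',
    'Lower-middle class 32K-53K',
    'Middle class 53K-107K',
    'Upper middle class 107K-374K',
]

income_group = np.select(conditions, choices, default='Upper class >374K')
```

The result is a NumPy array of labels that can be assigned straight to a new dataframe column.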
    

In [34]:
# Adding 'income_group' column with results from income_group list

ords_prods_custs_merged_2['income_group'] = income_group

In [35]:
ords_prods_custs_merged_2['income_group'].value_counts(dropna = False)

Middle class 53K-107K           15403902
Upper middle class 107K-374K    12772553
Lower-middle class 32K-53K       3651538
Lower class <32K                  517878
Upper class >374K                  58988
Name: income_group, dtype: int64

### Parent Status Based on Dependants

In [36]:
# Checking number of dependants

ords_prods_custs_merged_2['number_of_dependants'].value_counts(dropna = False)

3    8135109
0    8097503
2    8091143
1    8081104
Name: number_of_dependants, dtype: int64

In [37]:
# Classifying customers as parent/non-parent based on 'number_of_dependants' using the .loc indexer

ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['number_of_dependants'] == 0, 'parent_status'] = 'Non-parent'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['number_of_dependants'] >= 1, 'parent_status'] = 'Parent'

In [38]:
# Checking values for 'parent_status' column

ords_prods_custs_merged_2['parent_status'].value_counts(dropna = False)

Parent        24307356
Non-parent     8097503
Name: parent_status, dtype: int64

In [39]:
# Checking columns

ords_prods_custs_merged_2.head(1)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_the_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,avg_order_cost_by_cust,spender_flag,order_frequency_by_cust,frequency_flag,gender,state,age,date_joined,number_of_dependants,fam_status,income,region,activity_flag,generation,income_group,parent_status
0,2539329,1,1,2,8,,True,196,1,0,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy days,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423,South,Normal activity,Millennials 26-41,Lower-middle class 32K-53K,Parent


### Listing Out Name of Weekday

In [40]:
# Creating weekday column based on 'orders_day_of_the_week'

ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 0, 'weekday'] = 'Saturday'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 1, 'weekday'] = 'Sunday'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 2, 'weekday'] = 'Monday'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 3, 'weekday'] = 'Tuesday'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 4, 'weekday'] = 'Wednesday'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 5, 'weekday'] = 'Thursday'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['orders_day_of_the_week'] == 6, 'weekday'] = 'Friday'
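
A compact variant of the seven assignments is to map the codes through a dictionary and store the result as an ordered categorical, so that later sorts and charts follow the week rather than the alphabet. The coding (0 = Saturday through 6 = Friday) is taken from the cell above. `day_names` and `demo` are illustrative names.

```python
import pandas as pd

# Same coding as the cell above: 0 = Saturday ... 6 = Friday
day_names = {0: 'Saturday', 1: 'Sunday', 2: 'Monday', 3: 'Tuesday',
             4: 'Wednesday', 5: 'Thursday', 6: 'Friday'}

# Hypothetical sample of day codes
demo = pd.DataFrame({'orders_day_of_the_week': [0, 2, 6, 2]})

# Ordered categorical keeps the Saturday-to-Friday order for sorting
demo['weekday'] = pd.Categorical(
    demo['orders_day_of_the_week'].map(day_names),
    categories=list(day_names.values()),
    ordered=True,
)
```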

In [41]:
# Checking values for 'weekday'

ords_prods_custs_merged_2['weekday'].value_counts(dropna = False)

Saturday     6204182
Sunday       5660230
Friday       4496490
Monday       4213830
Thursday     4205791
Tuesday      3840534
Wednesday    3783802
Name: weekday, dtype: int64

### Listing Out Department Names

In [42]:
# Creating department_name column based on department_id

ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 1, 'department_name'] = 'Frozen'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 2, 'department_name'] = 'Other'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 3, 'department_name'] = 'Bakery'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 4, 'department_name'] = 'Produce'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 5, 'department_name'] = 'Alcohol'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 6, 'department_name'] = 'International'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 7, 'department_name'] = 'Beverages'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 8, 'department_name'] = 'Pets'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 9, 'department_name'] = 'Dry goods'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 10, 'department_name'] = 'Bulk'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 11, 'department_name'] = 'Personal care'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 12, 'department_name'] = 'Meat seafood'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 13, 'department_name'] = 'Pantry'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 14, 'department_name'] = 'Breakfast'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 15, 'department_name'] = 'Canned goods'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 16, 'department_name'] = 'Dairy Eggs'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 17, 'department_name'] = 'Household'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 18, 'department_name'] = 'Babies'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 19, 'department_name'] = 'Snacks'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 20, 'department_name'] = 'Deli'
ords_prods_custs_merged_2.loc[ords_prods_custs_merged_2['department_id'] == 21, 'department_name'] = 'Missing'
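
With 21 departments, a lookup table joined on 'department_id' may be easier to maintain than one assignment per id. The sketch below assumes a small `departments` frame built by hand; only three of the ids from the cell above are shown, and the `product_id` values other than 196 are placeholders.

```python
import pandas as pd

# Lookup table with ids and names as in the cell above (three shown for brevity)
departments = pd.DataFrame({
    'department_id': [4, 7, 16],
    'department_name': ['Produce', 'Beverages', 'Dairy Eggs'],
})

# Hypothetical order rows; product ids 1001 and 1002 are placeholders
demo = pd.DataFrame({'product_id': [196, 1001, 1002], 'department_id': [7, 4, 16]})

# validate='m:1' raises an error if an id appears twice in the lookup table
demo = demo.merge(departments, on='department_id', how='left', validate='m:1')
```

A left join keeps every order row, and any id absent from the lookup table ends up with a missing 'department_name' that a `value_counts(dropna = False)` check would reveal.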

In [43]:
# Checking values for 'department_name'

ords_prods_custs_merged_2['department_name'].value_counts(dropna = False)

Produce          9479291
Dairy Eggs       5398747
Snacks           2887550
Beverages        2688123
Frozen           2234743
Pantry           1875369
Bakery           1172428
Canned goods     1068058
Deli             1051249
Dry goods         866627
Household         738666
Meat seafood      708927
Breakfast         703033
Personal care     447572
Babies            423802
International     269253
Alcohol           153696
Pets               97716
Missing            69145
Other              36291
Bulk               34573
Name: department_name, dtype: int64

In [44]:
# Checking headers

ords_prods_custs_merged_2.head(1)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_the_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,prices,_merge,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,avg_order_cost_by_cust,spender_flag,order_frequency_by_cust,frequency_flag,gender,state,age,date_joined,number_of_dependants,fam_status,income,region,activity_flag,generation,income_group,parent_status,weekday,department_name
0,2539329,1,1,2,8,,True,196,1,0,Soda,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy days,Average orders,10,New customer,6.367797,Low spender,20.5,Non-frequent customer,Female,Alabama,31,2/17/2019,3,married,40423,South,Normal activity,Millennials 26-41,Lower-middle class 32K-53K,Parent,Monday,Beverages


# 07. Exporting Data

### The notebook is slowing down, so the script continues in Part 2

In [52]:
# Exporting current df as a CSV file (the computer had issues exporting it as a pickle file)

ords_prods_custs_merged_2.to_csv(os.path.join(path, '02 Data','Prepared Data', 'ords_prods_custs_merged_2.csv'))
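
The 'Unnamed: 0' column visible in the headers above is the kind of column that appears when a CSV is written with its row index and then read back. As a hedged sketch, passing `index=False` to `to_csv` avoids adding another such column when Part 2 re-imports the file. The temporary directory and the `demo` frame are stand-ins for the project folder and the real dataframe.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame standing in for the merged data
demo = pd.DataFrame({'user_id': [1, 2], 'region': ['South', 'West']})

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'demo.csv')
    # index=False: do not write the row index as an extra column
    demo.to_csv(out_path, index=False)
    restored = pd.read_csv(out_path)
```

Keep in mind that CSV does not preserve dtypes, so columns may need to be re-cast after the import.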