# Table of Contents

- Creation of two new columns, "price_range_loc" (called "price_label" in the exercise) and "busiest_day"

- Creation of "busiest_day_new" column to reflect new conditions

- Creation of "busiest_period_of_day" column based on "Most orders," "Average orders," and "Fewest orders" labels

## Step 1

1. If you haven’t done so already, complete the instructions in the Exercise for creating the “price_label” and “busiest_day” columns.

In [1]:
# importing libraries
import pandas as pd
import numpy as np
import os

# setting master path
path = r'/Users/Norberto/Desktop/2023-10 Instacart Basket Analysis'

In [2]:
# read the merged orders/products pickle file
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'orders_products_merged.pkl'))

#### Creating "price_range_loc" column

In [3]:
# search price column and set price_range_loc column value if high-range product
ords_prods_merge.loc[ords_prods_merge['prices'] > 15, 'price_range_loc'] = 'High-range product'

In [4]:
# search price column and set price_range_loc column value if mid-range product
ords_prods_merge.loc[(ords_prods_merge['prices'] <= 15) & (ords_prods_merge['prices'] > 5), 'price_range_loc'] = 'Mid-range product' 

In [5]:
# search price column and set price_range_loc column value if low-range product
ords_prods_merge.loc[ords_prods_merge['prices'] <= 5, 'price_range_loc'] = 'Low-range product'

In [6]:
# inspect the price_range_loc column values
ords_prods_merge['price_range_loc'].value_counts(dropna=False)

price_range_loc
Mid-range product     21860860
Low-range product     10126321
High-range product      417678
NaN                      30200
Name: count, dtype: int64
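The three `.loc` assignments above can also be written as a single `pd.cut` call, which bins values into right-closed intervals. The sketch below is an alternative, not the method used in this notebook, and it runs on a small hypothetical `toy` DataFrame rather than the merged data. It also explains the `NaN` row in the counts: a missing price fails every comparison, so it never receives a label, while any non-missing price matches exactly one of the three conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, including one missing value
toy = pd.DataFrame({'prices': [2.0, 5.0, 10.0, 15.0, 20.0, np.nan]})

# Right-closed bins: (-inf, 5] -> low, (5, 15] -> mid, (15, inf] -> high,
# matching the <= 5, 5 < p <= 15, and > 15 conditions used with .loc
toy['price_range_cut'] = pd.cut(
    toy['prices'],
    bins=[-np.inf, 5, 15, np.inf],
    labels=['Low-range product', 'Mid-range product', 'High-range product'],
).astype(object)

# The missing price stays NaN, like the unmatched rows in the .loc version
print(toy['price_range_cut'].value_counts(dropna=False))
```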

#### Creating "busiest_day" column

In [7]:
# check the number of orders for each day of the week
ords_prods_merge['orders_day_of_week'].value_counts(dropna=False)

orders_day_of_week
0    6209808
1    5665951
6    4500391
2    4217868
5    4209603
3    3844175
4    3787263
Name: count, dtype: int64
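Instead of reading the busiest and slowest days off this table by eye, they can be picked out with `Series.idxmax()` and `Series.idxmin()`. In this sketch the counts are copied from the output above into a standalone Series (`dow_counts`, `busiest_dow`, and `slowest_dow` are names introduced here); in the notebook, the result of `value_counts()` could be used directly.

```python
import pandas as pd

# Order counts per day of week, copied from the value_counts() output above
dow_counts = pd.Series({0: 6209808, 1: 5665951, 6: 4500391, 2: 4217868,
                        5: 4209603, 3: 3844175, 4: 3787263})

busiest_dow = dow_counts.idxmax()   # index label with the highest count
slowest_dow = dow_counts.idxmin()   # index label with the lowest count
print(busiest_dow, slowest_dow)     # 0 4
```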

In [8]:
# create empty list to store labels from the for-loop
orders_dow_list = []

# iterate over the orders_day_of_week column and append a label for each row
for value in ords_prods_merge['orders_day_of_week']:
    # if dow is Saturday/0, label busiest day
    if value == 0:
        orders_dow_list.append('Busiest day')
    # if dow is Wednesday/4, label least busy
    elif value == 4:
        orders_dow_list.append('Least busy')
    # all other values labeled regularly busy
    else:
        orders_dow_list.append('Regularly busy')

In [9]:
# assign result list to new column and inspect values
ords_prods_merge['busiest_day'] = orders_dow_list
ords_prods_merge['busiest_day'].value_counts(dropna=False)

busiest_day
Regularly busy    22437988
Busiest day        6209808
Least busy         3787263
Name: count, dtype: int64
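The for-loop builds a Python list one row at a time, which is slow on roughly 32.4 million rows. A vectorized alternative, sketched here on a small hypothetical DataFrame (`df`, `conditions`, and `choices` are illustrative names), uses `np.select` with the same conditions and labels. It is not what this notebook ran.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of day-of-week codes
df = pd.DataFrame({'orders_day_of_week': [0, 1, 2, 3, 4, 5, 6, 0, 4]})

# Conditions are checked in order; rows matching none get the default label
conditions = [
    df['orders_day_of_week'] == 0,   # busiest day
    df['orders_day_of_week'] == 4,   # least busy day
]
choices = ['Busiest day', 'Least busy']
df['busiest_day'] = np.select(conditions, choices, default='Regularly busy')

print(df['busiest_day'].value_counts())
```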

## Steps 2 & 3

2. Suppose your clients have changed their minds about the labels you created in your “busiest_day” column. Now, they want “Busiest day” to become “Busiest days” (plural). This label should correspond with the two busiest days of the week as opposed to the single busiest day. At the same time, they’d also like to know the two slowest days. Create a new column for this using a suitable method.

3. Check the values of this new column for accuracy. Note any observations in markdown format.
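Step 2 asks for a "suitable method." One option is to let pandas choose the two busiest and two slowest days with `Series.nlargest(2)` and `Series.nsmallest(2)` instead of hard-coding them, then translate days into labels with a dict and `Series.map`. This is a sketch: the counts are copied from the Step 1 output, and `sample` and `label_map` are hypothetical names. The notebook itself uses the for-loop below.

```python
import pandas as pd

# Order counts per day of week, copied from the Step 1 value_counts() output
dow_counts = pd.Series({0: 6209808, 1: 5665951, 6: 4500391, 2: 4217868,
                        5: 4209603, 3: 3844175, 4: 3787263})

busiest_two = dow_counts.nlargest(2).index.tolist()    # [0, 1]
slowest_two = dow_counts.nsmallest(2).index.tolist()   # [4, 3]

# Map each selected day to its label
label_map = {day: 'Busiest days' for day in busiest_two}
label_map.update({day: 'Least busy' for day in slowest_two})

# Hypothetical stand-in for ords_prods_merge
sample = pd.DataFrame({'orders_day_of_week': [0, 1, 2, 3, 4, 5, 6]})
# Days missing from label_map become NaN, which fillna turns into 'Regularly busy'
sample['busiest_day_new'] = (
    sample['orders_day_of_week'].map(label_map).fillna('Regularly busy')
)
print(sample)
```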

In [10]:
# create empty list to store labels from the for-loop
orders_dow_list_new = []

# iterate over the orders_day_of_week column and append a label for each row
for value in ords_prods_merge['orders_day_of_week']:
    # if dow is Saturday/0 or Sunday/1, label busiest days
    if value in (0, 1):
        orders_dow_list_new.append('Busiest days')
    # if dow is Tuesday/3 or Wednesday/4, label least busy
    elif value in (3, 4):
        orders_dow_list_new.append('Least busy')
    # all other values labeled regularly busy
    else:
        orders_dow_list_new.append('Regularly busy')

In [11]:
# assign result list to new column and inspect values
ords_prods_merge['busiest_day_new'] = orders_dow_list_new
ords_prods_merge['busiest_day_new'].value_counts(dropna=False)

busiest_day_new
Regularly busy    12927862
Busiest days      11875759
Least busy         7631438
Name: count, dtype: int64

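The "Regularly busy" count fell from 22,437,988 to 12,927,862 because days 1 and 3, which Step 1 labeled regularly busy, now belong to "Busiest days" and "Least busy." The three labels still add up to 32,435,059 rows, the same total as in Step 1, so no rows were lost or double-counted.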
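To check each new label for accuracy, the per-day totals from Step 1 can be summed for the days in each group and compared with the counts above. The sketch copies those counts by hand; `groups` and `expected` are names introduced here.

```python
import pandas as pd

# Order counts per day of week, copied from the Step 1 value_counts() output
dow_counts = pd.Series({0: 6209808, 1: 5665951, 6: 4500391, 2: 4217868,
                        5: 4209603, 3: 3844175, 4: 3787263})

# Days assigned to each label in the busiest_day_new loop
groups = {
    'Busiest days': [0, 1],
    'Least busy': [3, 4],
    'Regularly busy': [2, 5, 6],
}
expected = {label: int(dow_counts.loc[days].sum()) for label, days in groups.items()}
print(expected)
# {'Busiest days': 11875759, 'Least busy': 7631438, 'Regularly busy': 12927862}
```

Each sum matches the corresponding `busiest_day_new` count.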

## Steps 4 & 5

4. When too many users make Instacart orders at the same time, the app freezes. The senior technical officer at Instacart wants you to identify the busiest hours of the day. Rather than by hour, they want periods of time labeled “Most orders,” “Average orders,” and “Fewest orders.” Create a new column containing these labels called “busiest_period_of_day.”

5. Print the frequency for this new column.


In [12]:
# list value counts for different hours of the day
ords_prods_merge['order_hour_of_day'].value_counts(dropna=False)

order_hour_of_day
10    2764476
11    2738647
14    2691598
15    2664583
13    2663346
12    2620898
16    2537506
9     2456751
17    2089510
8     1719991
18    1637956
19    1259416
20     977049
7      891951
21     796379
22     634743
23     402621
6      290796
0      218951
1      115787
5       88064
2       69435
4       53284
3       51321
Name: count, dtype: int64
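The hour groups in the next cell were chosen by hand from this table. As an assumption-labeled alternative, `pd.qcut` can split the 24 hourly totals into three equal-sized groups (terciles) by order volume. The counts are copied from the output above, and `hod_counts` and `hour_labels` are names introduced for this sketch.

```python
import pandas as pd

# Orders per hour of day, copied from the value_counts() output above
hod_counts = pd.Series({
    10: 2764476, 11: 2738647, 14: 2691598, 15: 2664583, 13: 2663346,
    12: 2620898, 16: 2537506, 9: 2456751, 17: 2089510, 8: 1719991,
    18: 1637956, 19: 1259416, 20: 977049, 7: 891951, 21: 796379,
    22: 634743, 23: 402621, 6: 290796, 0: 218951, 1: 115787,
    5: 88064, 2: 69435, 4: 53284, 3: 51321,
})

# Split the 24 hours into three equal-sized groups by order volume
hour_labels = pd.qcut(hod_counts, q=3,
                      labels=['Fewest orders', 'Average orders', 'Most orders'])

for label in ['Most orders', 'Average orders', 'Fewest orders']:
    hours = sorted(hour_labels[hour_labels == label].index)
    print(label, hours)
```

On these counts the terciles contain eight hours each and match the hand-picked groups; on other data the split may differ, so it is worth re-checking.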

In [13]:
# create empty list to store labels from the for-loop
orders_hod_list = []

# iterate over the order_hour_of_day column and append a label for each row
for value in ords_prods_merge['order_hour_of_day']:
    # if hour of day is in the top eight by order count, label most orders
    if value in (10, 11, 14, 15, 13, 12, 16, 9):
        orders_hod_list.append('Most orders')
    # if hour of day is in the middle eight, label average orders
    elif value in (17, 8, 18, 19, 20, 7, 21, 22):
        orders_hod_list.append('Average orders')
    # if hour of day is in the bottom eight, label fewest orders
    elif value in (23, 6, 0, 1, 5, 2, 4, 3):
        orders_hod_list.append('Fewest orders')
    # if hour of day is not in any group, label time unavailable
    else:
        orders_hod_list.append('Time Unavailable')

In [14]:
# assign result list to new column and inspect values
ords_prods_merge['busiest_period_of_day'] = orders_hod_list
ords_prods_merge['busiest_period_of_day'].value_counts(dropna=False)

busiest_period_of_day
Most orders       21137805
Average orders    10006995
Fewest orders      1290259
Name: count, dtype: int64
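The output contains no "Time Unavailable" rows. This follows from two facts: `value_counts(dropna=False)` on `order_hour_of_day` shows no `NaN`, and the three tuples in the loop together cover every hour from 0 to 23 exactly once. The second fact can be checked with plain Python sets; `most`, `average`, and `fewest` are names introduced for this sketch.

```python
# Hour groups copied from the busiest_period_of_day loop above
most = {10, 11, 14, 15, 13, 12, 16, 9}
average = {17, 8, 18, 19, 20, 7, 21, 22}
fewest = {23, 6, 0, 1, 5, 2, 4, 3}

# The groups must not overlap...
assert most.isdisjoint(average) and most.isdisjoint(fewest) and average.isdisjoint(fewest)

# ...and together must cover hours 0-23, so the else branch is never reached
covers_all_hours = (most | average | fewest) == set(range(24))
print(covers_all_hours)  # True
```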

In [15]:
# Exporting left merge to pkl since it is the most comprehensive data set
ords_prods_merge.to_pickle(os.path.join(path, '02 Data','Prepared Data', 'orders_products_merged_4.7.pkl'))
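Before later steps rely on the exported file, it can be worth reloading it and comparing it to the in-memory frame. This minimal sketch uses a temporary directory and a small hypothetical DataFrame. In the notebook, the path would be the one built with `os.path.join(path, ...)` above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical small frame standing in for ords_prods_merge
df = pd.DataFrame({
    'orders_day_of_week': [0, 4, 2],
    'busiest_day_new': ['Busiest days', 'Least busy', 'Regularly busy'],
    'busiest_period_of_day': ['Most orders', 'Fewest orders', 'Average orders'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'orders_products_merged_check.pkl')
    df.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

# A pickle round-trip should preserve values, column names, and dtypes
print(reloaded.equals(df))  # True
```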