# MM&A Supermarket Case

## 1. Exploring the Data (EDA)

In [2]:
# importing necessary libraries
import numpy as np
import pandas as pd

In [3]:
# reading the CSV file into a DataFrame
file = 'mma_mart.csv'
df = pd.read_csv(file)

In [4]:
# observing the data structure
df.head()

|   | order_id | product_id | product_name | aisle_id | aisle | department_id | department |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 49302 | Bulgarian Yogurt | 120 | yogurt | 16 | dairy eggs |
| 1 | 1 | 11109 | Organic 4% Milk Fat Whole Milk Cottage Cheese | 108 | other creams cheeses | 16 | dairy eggs |
| 2 | 1 | 10246 | Organic Celery Hearts | 83 | fresh vegetables | 4 | produce |
| 3 | 1 | 49683 | Cucumber Kirby | 83 | fresh vegetables | 4 | produce |
| 4 | 1 | 43633 | Lightly Smoked Sardines in Olive Oil | 95 | canned meat seafood | 15 | canned goods |


In [6]:
# finding the shape of the dataset (rows, columns)
df.shape

(987259, 7)

The dataset has 987,259 rows and 7 columns.

In [7]:
# counting the number of unique values in each column
df.nunique()

order_id         97833
product_id       35070
product_name     35070
aisle_id           134
aisle              134
department_id       21
department          21
dtype: int64

There are about 98,000 orders and about 35,000 unique products, spread across 134 aisles that belong to 21 departments. Selecting 1,000 to 1,200 products for the instabasket aisle therefore means selecting only about 1,000/35,070 ≈ 2.9% to 1,200/35,070 ≈ 3.4% (roughly 3%) of the total product selection offered at MM&A supermarket.
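The percentage above is simple arithmetic; a minimal sketch that reproduces it is shown here. The catalogue size of 35,070 is hard-coded from the `df.nunique()` output (in the notebook it could instead be read from `df['product_id'].nunique()`), and the helper name `share_of_catalogue` is hypothetical.

```python
def share_of_catalogue(n_selected: int, n_catalogue: int) -> float:
    """Return the fraction of the catalogue covered by n_selected products."""
    if n_catalogue <= 0:
        raise ValueError("n_catalogue must be positive")
    return n_selected / n_catalogue


# 35,070 unique product_id values, taken from the df.nunique() output above
N_PRODUCTS = 35070

for n_selected in (1000, 1200):
    share = share_of_catalogue(n_selected, N_PRODUCTS)
    print(f"{n_selected} products -> {share:.1%} of the catalogue")

# prints:
# 1000 products -> 2.9% of the catalogue
# 1200 products -> 3.4% of the catalogue
```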

Next, the aisles and departments are explored to gain some insight into which products should be treated as refrigerated and frozen products.

In [16]:
df['aisle'].unique()

array(['yogurt', 'other creams cheeses', 'fresh vegetables',
       'canned meat seafood', 'fresh fruits', 'packaged cheese', 'eggs',
       'spices seasonings', 'oils vinegars', 'baking ingredients',
       'doughs gelatins bake mixes', 'spreads',
       'packaged vegetables fruits', 'soy lactosefree', 'poultry counter',
       'bread', 'breakfast bakery', 'cold flu allergy',
       'energy granola bars', 'breakfast bars pastries', 'chips pretzels',
       'trail mix snack mix', 'crackers', 'refrigerated',
       'energy sports drinks', 'salad dressing toppings',
       'prepared soups salads', 'milk', 'paper goods',
       'water seltzer sparkling water', 'kosher foods',
       'packaged poultry', 'instant foods', 'packaged produce',
       'cookies cakes', 'candy chocolate', 'body lotions soap',
       'dry pasta', 'laundry', 'air fresheners candles', 'frozen produce',
       'buns rolls', 'canned fruit applesauce', 'juice nectars',
       'granola', 'fresh herbs', 'baby food formul

As the aisle names show, some aisles contain the word "frozen", such as `frozen produce` or `frozen meat seafood`, and every product listed under these aisles would be a candidate for the 100 frozen products for instabasket. The same shortcut does not work for refrigerated products: there are too many aisles to parse through by name, so it is more practical to look at the departments to determine which ones contain refrigerated products.
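One way to make the aisle-versus-department comparison concrete is to list which aisles sit under each department. The sketch below assumes only the `aisle` and `department` columns shown in `df.head()`; the toy rows stand in for `df`, and the helper name `aisles_by_department` is hypothetical. It also shows why names alone can mislead: `ice cream ice` (seen later in the `frozen` preview) belongs to `frozen` without containing the word "frozen".

```python
import pandas as pd

# Toy stand-in for the MM&A data; the real df has the same 'aisle' and 'department' columns
toy = pd.DataFrame({
    "product_name": ["Pineapple Chunks", "Organic Ice Cream Vanilla Bean",
                     "Bulgarian Yogurt", "Gala Apples"],
    "aisle": ["frozen produce", "ice cream ice", "yogurt", "fresh fruits"],
    "department": ["frozen", "frozen", "dairy eggs", "produce"],
})


def aisles_by_department(data: pd.DataFrame) -> dict:
    """Map each department to the sorted list of distinct aisles it contains."""
    pairs = data[["aisle", "department"]].drop_duplicates()
    return {dept: sorted(group["aisle"]) for dept, group in pairs.groupby("department")}


mapping = aisles_by_department(toy)
for dept, aisles in mapping.items():
    print(f"{dept}: {aisles}")
```

On the real data, `aisles_by_department(df)` would give one line per department (21 in total), which is much quicker to scan than the 134 aisle names.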

In [22]:
df['department'].unique()

array(['dairy eggs', 'produce', 'canned goods', 'pantry', 'meat seafood',
       'bakery', 'personal care', 'snacks', 'breakfast', 'beverages',
       'deli', 'household', 'international', 'dry goods pasta', 'frozen',
       'babies', 'pets', 'alcohol', 'bulk', 'missing', 'other'],
      dtype=object)

Based on the department names, `dairy eggs`, `produce`, `meat seafood`, `deli`, `frozen`, `missing`, and `other` could all contain products that need fridge or freezer space, so it is worth taking a closer look at the products in those departments.
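Before previewing each department, the number of unique products per candidate department can also be obtained in a single `isin` + `groupby` call. This is a minimal sketch on toy rows (the helper name `products_per_department` is hypothetical); applied to `df`, it counts unique `product_name` values per department, the same quantity the loop below prints.

```python
import pandas as pd

# Departments flagged above as possibly needing fridge or freezer space
COLD_CANDIDATES = ["dairy eggs", "produce", "meat seafood", "deli", "frozen", "missing", "other"]


def products_per_department(data: pd.DataFrame, departments: list[str]) -> pd.Series:
    """Count unique product names in each of the given departments, largest first."""
    subset = data[data["department"].isin(departments)]
    return (
        subset.groupby("department")["product_name"]
        .nunique()
        .sort_values(ascending=False)
    )


# Toy rows: a product ordered twice is counted once, and 'household' is filtered out
toy = pd.DataFrame({
    "product_name": ["Bulgarian Yogurt", "Bulgarian Yogurt", "Gala Apples", "Guacamole", "Paper Towels"],
    "department": ["dairy eggs", "dairy eggs", "produce", "deli", "household"],
})
counts = products_per_department(toy, COLD_CANDIDATES)
print(counts)
```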

In [38]:
# departments that may contain refrigerated or frozen products
dept_list = ['dairy eggs', 'produce', 'meat seafood', 'deli', 'frozen', 'missing', 'other']

In [80]:
# For each candidate department, count its unique products and preview
# the product names and aisles for insight
from IPython.display import display  # available by default in Jupyter

for department in dept_list:
    dept_df = df[df['department'] == department]  # filter once per department
    print(f"Department: {department}")
    print(f"Number of products: {dept_df['product_name'].nunique()}\n")
    display(dept_df[['product_name', 'aisle']])

Department: dairy eggs
Number of products: 2886

|   | product_name | aisle |
|---|---|---|
| 0 | Bulgarian Yogurt | yogurt |
| 1 | Organic 4% Milk Fat Whole Milk Cottage Cheese | other creams cheeses |
| 7 | Organic Whole String Cheese | packaged cheese |
| 8 | Organic Egg Whites | eggs |
| 17 | Total 2% with Strawberry Lowfat Greek Strained... | yogurt |
| ... | ... | ... |
| 987240 | Large Grade AA Organic Eggs | eggs |
| 987241 | Reduced Fat Mozarella String Cheese | packaged cheese |
| 987242 | Vanilla Light & Fit Greek Yogurt | yogurt |
| 987243 | Non-Fat Vanilla Blended Greek Yogurt | yogurt |


Department: produce
Number of products: 1437

|   | product_name | aisle |
|---|---|---|
| 2 | Organic Celery Hearts | fresh vegetables |
| 3 | Cucumber Kirby | fresh vegetables |
| 5 | Bag of Organic Bananas | fresh fruits |
| 6 | Organic Hass Avocado | fresh fruits |
| 9 | Michigan Organic Kale | fresh vegetables |
| ... | ... | ... |
| 987245 | Gala Apples | fresh fruits |
| 987246 | Organic Yellow Onion | fresh vegetables |
| 987247 | Organic Baby Carrots | packaged vegetables fruits |
| 987249 | Organic Baby Spinach | packaged vegetables fruits |


Department: meat seafood
Number of products: 692

|   | product_name | aisle |
|---|---|---|
| 23 | Air Chilled Organic Boneless Skinless Chicken ... | poultry counter |
| 46 | Boneless Skinless Chicken Breast Fillets | packaged poultry |
| 97 | Boneless Beef Sirloin Steak | meat counter |
| 176 | Low Sodium Bacon | hot dogs bacon sausage |
| 204 | Boneless And Skinless Chicken Breast | poultry counter |
| ... | ... | ... |
| 986977 | Organic Air Chilled Whole Chicken | poultry counter |
| 987067 | Natural Hickory Smoked Canadian Bacon Center C... | hot dogs bacon sausage |
| 987068 | All Natural Boneless & Skinless Chicken Breast... | packaged poultry |
| 987175 | Organic Air Chilled Whole Chicken | poultry counter |


Department: deli
Number of products: 1069

|   | product_name | aisle |
|---|---|---|
| 40 | Fresh Fruit Salad | prepared soups salads |
| 101 | Mango Pineapple Salsa | fresh dips tapenades |
| 179 | Basil, Asiago & Pine Nut Pesto Ravioli | prepared meals |
| 191 | Yuba Tofu Skin | tofu meat alternatives |
| 192 | Organic Firm Tofu | tofu meat alternatives |
| ... | ... | ... |
| 987153 | Spicy Avocado Hummus | fresh dips tapenades |
| 987158 | Guacamole | fresh dips tapenades |
| 987199 | Guacamole Dip | fresh dips tapenades |
| 987204 | Original Hummus | fresh dips tapenades |


Department: frozen
Number of products: 3127

|   | product_name | aisle |
|---|---|---|
| 68 | Pineapple Chunks | frozen produce |
| 100 | Teriyaki & Pineapple Chicken Meatballs | frozen meals |
| 114 | All Natural Boneless Skinless Chicken Breasts | frozen meat seafood |
| 118 | Combination Pizza Rolls | frozen appetizers sides |
| 135 | Organic Mini Homestyle Waffles | frozen breakfast |
| ... | ... | ... |
| 987176 | Organic Ice Cream Vanilla Bean | ice cream ice |
| 987180 | Dairy Free Coconut Milk Frozen Dessert Minis | ice cream ice |
| 987184 | Organic Mango Chunks | frozen produce |
| 987210 | Bag of Large Lemons | frozen meat seafood |


Department: missing
Number of products: 518

|   | product_name | aisle |
|---|---|---|
| 654 | Tomato Basil Bisque Soup | missing |
| 1511 | Cold Pressed Watermelon & Lemon Juice Blend | missing |
| 1512 | Paleo Blueberry Muffin | missing |
| 2126 | Magic Tape Refillable Dispenser 3/4" x 850" | missing |
| 4121 | Organic Poblano Pepper | missing |
| ... | ... | ... |
| 986067 | Oatneal Cookie Ice Cream | missing |
| 986086 | Dairy Free Unsweetened Almond Milk Beverage | missing |
| 986103 | Organic Asian Chopped Salad Kit | missing |
| 986275 | Lemon Bag | missing |


Department: other
Number of products: 303

|   | product_name | aisle |
|---|---|---|
| 691 | Coffee Mate French Vanilla Creamer Packets | other |
| 1077 | SleepGels Nighttime Sleep Aid | other |
| 1926 | Roasted Unsalted Almonds | other |
| 1985 | Camilia, Single Liquid Doses | other |
| 2127 | Maximum Strength Original Paste Diaper Rash Oi... | other |
| ... | ... | ... |
| 983992 | Roasted Unsalted Almonds | other |
| 984112 | Boneless Pork Tenderloin | other |
| 984515 | Roasted Almond Butter | other |
| 984534 | Light CocoWhip! Coconut Whipped Topping | other |


Looking at the results, the departments that require refrigeration are `dairy eggs`, `meat seafood`, and `deli`, along with some products from `missing` and `other`. Products in those last two departments would have to be checked individually: for example, "Oatneal Cookie Ice Cream" in `missing` is a frozen item, and "Boneless Pork Tenderloin" in `other` is really a `meat seafood` item. Produce does not necessarily need refrigeration, since it can be stored at room temperature and is refrigerated only to prolong its shelf life. If a produce item is in high demand, it will not stay on the shelf for long anyway; if demand is low, it may be better not to offer it at all. Either way, it is probably a better idea to give the refrigerator space to products from the other departments.
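Checking `missing` and `other` product by product could start from a keyword filter that shortlists likely cold items for manual review. This is a rough sketch, not a classifier: the `COLD_KEYWORDS` list and the helper name `flag_possible_cold_items` are hypothetical, the toy rows reuse product names from the previews above, and any real shortlist would still need a human check (a keyword like "milk" will also catch shelf-stable products).

```python
import re

import pandas as pd

# Hypothetical keyword list; NOT exhaustive, results need manual review
COLD_KEYWORDS = ["ice cream", "frozen", "milk", "yogurt", "cheese",
                 "chicken", "pork", "beef", "juice", "salad"]


def flag_possible_cold_items(data: pd.DataFrame,
                             departments=("missing", "other")) -> pd.DataFrame:
    """Return distinct products in `departments` whose name contains a cold-item keyword."""
    pattern = "|".join(re.escape(k) for k in COLD_KEYWORDS)
    in_scope = data[data["department"].isin(departments)]
    mask = in_scope["product_name"].str.contains(pattern, case=False, regex=True, na=False)
    return in_scope.loc[mask, ["product_name", "department"]].drop_duplicates()


toy = pd.DataFrame({
    "product_name": ["Oatneal Cookie Ice Cream", "Magic Tape Refillable Dispenser",
                     "Dairy Free Unsweetened Almond Milk Beverage", "Boneless Pork Tenderloin",
                     "Roasted Unsalted Almonds", "Organic Ice Cream Vanilla Bean"],
    "department": ["missing", "missing", "missing", "other", "other", "frozen"],
})
flagged = flag_possible_cold_items(toy)
print(flagged)
```

The `frozen`-department row is ignored because only `missing` and `other` are in scope by default.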

In [58]:
# Sanity check to see if aisles such as frozen produce are part of the frozen department or produce department
frozen_aisles = df[df['aisle'].str.contains('frozen', case=False)]['aisle'].unique()
print(frozen_aisles)

['frozen produce' 'frozen meals' 'frozen meat seafood'
 'frozen appetizers sides' 'frozen breakfast' 'frozen breads doughs'
 'frozen vegan vegetarian' 'frozen pizza' 'frozen dessert' 'frozen juice']


In [59]:
# departments that the "frozen" aisles belong to
frozen_department = df[df['aisle'].str.contains('frozen', case=False)]['department'].unique()
print(frozen_department)

['frozen']
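The check above runs in one direction only: every aisle named "frozen" belongs to the `frozen` department. The reverse direction matters too, because the `frozen` preview earlier contains the `ice cream ice` aisle, which has no "frozen" in its name. A minimal sketch of that reverse check follows; the helper name `aisles_missing_keyword` is hypothetical and the toy rows stand in for `df`.

```python
import pandas as pd


def aisles_missing_keyword(data: pd.DataFrame, department: str = "frozen",
                           keyword: str = "frozen") -> list[str]:
    """List aisles in `department` whose name does not contain `keyword`."""
    dept_aisles = data.loc[data["department"] == department, "aisle"].unique()
    return sorted(a for a in dept_aisles if keyword not in a.lower())


toy = pd.DataFrame({
    "aisle": ["frozen produce", "frozen meals", "ice cream ice", "yogurt"],
    "department": ["frozen", "frozen", "frozen", "dairy eggs"],
})
print(aisles_missing_keyword(toy))
```

On the real data, `aisles_missing_keyword(df)` would list every `frozen` aisle that a name-based filter on "frozen" would miss.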


It is therefore reasonable to assume that the 100 frozen products for instabasket will come from the `frozen` department, possibly supplemented by a few items from the `missing` and `other` departments.
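The section does not yet say how the 100 frozen products would be chosen. One possible criterion, stated here purely as an assumption, is popularity measured as the number of distinct orders containing each product. The sketch below ranks products within a department that way; the helper name `top_products` is hypothetical and the toy rows stand in for `df`.

```python
import pandas as pd


def top_products(data: pd.DataFrame, department: str, n: int = 100) -> pd.DataFrame:
    """Rank products in one department by the number of distinct orders containing them."""
    dept = data[data["department"] == department]
    counts = (
        dept.groupby(["product_id", "product_name"])["order_id"]
        .nunique()
        .rename("n_orders")
        .reset_index()
        # ties broken by product_id so the ranking is deterministic
        .sort_values(["n_orders", "product_id"], ascending=[False, True])
    )
    return counts.head(n).reset_index(drop=True)


toy = pd.DataFrame({
    "order_id": [1, 2, 3, 1, 2, 3],
    "product_id": [10, 10, 10, 20, 20, 30],
    "product_name": ["Pineapple Chunks"] * 3 + ["Organic Mango Chunks"] * 2 + ["Combination Pizza Rolls"],
    "department": ["frozen"] * 6,
})
result = top_products(toy, "frozen", n=2)
print(result)
```

On the real data this would be called as `top_products(df, "frozen", n=100)`; any candidates from `missing` and `other` would still have to be added after the manual check described above.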