# Finding Meals that Conform to Your Taste 

## Introduction 

According to a recent [survey](https://www.iol.co.za/lifestyle/food-drink/many-cookbooks-yet-same-old-supper-1824825), the average Briton has a rotation of nine different recipes. Part of this lack of diversity is likely due to the fear of preparing a meal that doesn't suit your tastes and having to throw it out.

### Project Objective

For this project, we will create a recipe recommendation engine that is based on your tastes and customizes your weekly menu within seconds.

Our aim is to diversify your menu by helping you discover new meals that incorporate flavors that you already love, using items you already buy. 

### Datasets

1. **Instacart Market Basket Analysis Datasets**
    - orders
    - aisles: 132 unique store aisles
    - departments: 24 unique departments
    - products: 49.7k unique products

2. **Mariano's Grocery Prices**


3. **Simply Recipes recipe database**

### Summary of Results

*Results to be added.*

## Data Collection - Web Crawling

In order to start our project we will need to collect three different types of data: 
* First, we need a dataset of users' food preferences. Because food rating data is difficult to come by, we can instead use point-of-purchase grocery store data and rely on implicit feedback (i.e., assume that customers who bought an item liked it).

* Second, we need a repository of diverse recipes. Although a number of recipe datasets are available online, none that I found has all the required attributes, so this data will need to be collected from a website via web scraping.

* Third, we need a database of grocery prices for pricing our recipes. Because datasets with prices are few and far between, and quickly become outdated, we will need to manually collect grocery pricing data from a store website.
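
The implicit feedback idea in the first point can be illustrated with a minimal sketch. The purchase log below is made up for illustration, and the column names only assume a layout similar to the Instacart data; this is not the project's actual feature pipeline.

```python
import pandas as pd

# Toy purchase log; users and products are invented for illustration.
purchases = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "product_name": ["Banana", "Banana", "Spinach", "Banana", "Tofu", "Tofu"],
})

# Count how many times each user bought each product (missing pairs become 0).
purchase_counts = pd.crosstab(purchases["user_id"], purchases["product_name"])

# Implicit feedback: treat any purchase as a positive signal (1), otherwise 0.
implicit_feedback = (purchase_counts > 0).astype(int)

print(implicit_feedback)
```

A binary matrix like this discards how often an item was bought; keeping the raw counts in `purchase_counts` is an alternative if purchase frequency should act as a confidence weight.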

In [35]:
%load_ext autoreload
%autoreload 2

import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sys
import warnings
warnings.filterwarnings('ignore')

from selenium import webdriver

src_dir = os.path.join(os.getcwd(), '..', '..', 'src')
sys.path.append(src_dir)

from d00_utils import utils
from d01_data import clean_data
from d01_data.web_scraping import sr_scraping, marianos_insta_scraping

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Recipe Web Scraping - simplyrecipes.com

After researching a number of sites, I chose to scrape simplyrecipes.com for two reasons:

* The website contains diverse recipes that could appeal to a number of different palates
* Each recipe comes pre-tagged with meal and dietary preferences

If you would like to scrape the website yourself, please run ```sr_scraping()``` in a cell within this notebook. The full script takes around 1.5 hours to run, and all files are saved to the ```data/01_raw/simply_recipes``` folder. For convenience, I read in the already scraped and concatenated dataset.
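
The implementation of `utils.read_multiple_csv_and_concat` is not shown in this notebook. As a rough sketch of what a helper of this kind might do, the hypothetical function `read_csvs_matching` below reads every file matching a glob pattern and stacks the results; the file names in the demonstration are throwaway examples.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csvs_matching(pattern):
    """Hypothetical helper: read every CSV matching a glob pattern and stack them."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Small demonstration with two temporary files.
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame({"title": ["A", "B"]}).to_csv(
        os.path.join(tmp_dir, "simply_recipes_1.csv"), index=False
    )
    pd.DataFrame({"title": ["C"]}).to_csv(
        os.path.join(tmp_dir, "simply_recipes_2.csv"), index=False
    )
    combined = read_csvs_matching(os.path.join(tmp_dir, "simply_recipes*"))

print(combined)
```

Writing the files with `index=False` avoids the extra `Unnamed: 0` column that pandas creates when a saved index is read back as data.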

In [10]:
recipes_sr_orig = utils.read_multiple_csv_and_concat('../../data/01_raw/simply_recipes/simply_recipes*')
recipes_sr_orig.drop(columns='Unnamed: 0', inplace=True)

In [18]:
recipes_sr_inter = clean_data.intermediate_clean_recipes_sr(recipes_sr_orig)

At the moment we have 1,752 recipe entries. At a quick glance, some of the entries marked as recipes are actually how-to guides. These will need to be removed because they don't provide ingredients on which to base our model.
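
One possible way to flag the how-to guides is by title. The sketch below assumes a simple heuristic (titles that start with "How to"); it is not the project's cleaning logic, and it would also drop guides that do list ingredients, so the flagged rows are worth inspecting before removal.

```python
import pandas as pd

# Toy stand-in for the recipe table, using titles that appear in this notebook.
recipes = pd.DataFrame({
    "title": [
        "Grilled Cheese BLT",
        "How to Make Bacon in the Oven",
        "Pulled Pork Sandwich",
    ],
})

# Assumed heuristic: a title beginning with "how to" marks a guide, not a recipe.
is_how_to = recipes["title"].str.lower().str.startswith("how to")

# Keep only the rows that are not flagged and renumber the index.
recipes_only = recipes[~is_how_to].reset_index(drop=True)

print(recipes_only)
```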

In [44]:
recipes_sr_inter.head(2)

Unnamed: 0,title,prep_time,cook_time,tags,ingredients,recipe_yield,byline,link_food
0,Grilled Cheese BLT,10 minutes,10 minutes,"['Dinner', 'Lunch', 'Sandwich', 'Favorite Summ...","[8 slices sourdough bread, 4 tablespoon unsalt...",4 sandwiches,Aaron Hutcherson,https://www.simplyrecipes.com/recipes/grilled_...
1,Pulled Pork Sandwich,10 minutes,"2 hours, 45 minutes","['Dinner', 'Sandwich', 'Budget', 'Comfort Food...","[For the sauce:, 1 large onion, chopped, 6 gar...",Serves 6 to 8,Elise Bauer,https://www.simplyrecipes.com/recipes/pulled_p...


### Grocery Price Web Scraping - Chicago's Mariano's

If you would like to scrape the website yourself, please run ```marianos_insta_scraping()``` in a cell within this notebook. The full script takes around 5 hours to run, and all files are saved to the ```data/01_raw``` folder as ```prod_aile_*```. Once you run the script, Selenium will open a browser and you'll need to sign in to Instacart with your username and password. After 80 seconds, the script will begin scraping the site on its own.

Let's go ahead and read in the concatenated files from Mariano's.

In [38]:
grocery_prices_orig = utils.read_multiple_csv_and_concat('../../data/01_raw/grocery_prices_marianos/prod_aile*')

In [45]:
grocery_prices_orig.head(2)

Unnamed: 0,product,main_price,prod_aile,price_per_lb,measure_words_main_price,item_weight_count_vol,date_collected,store,location
0,"Halls Defense Dietary Supplement Drops, Assort...",$1.79,"Cold, Flu & Allergy",,,30 count,2019-08-28,Marianos,60615
1,Halls Suppressant/Oral Anesthetic Halls Relief...,$1.79,"Cold, Flu & Allergy",,,30 count,2019-08-28,Marianos,60615


In [40]:
grocery_prices_inter = clean_data.intermediate_clean_marianos_prices(grocery_prices_orig)

In [46]:
grocery_prices_inter.head(2)

Unnamed: 0,product,main_price,prod_aile,price_per_lb,measure_words_main_price,item_weight_count_vol,date_collected,store,location
0,"Halls Defense Dietary Supplement Drops, Assort...",$1.79,"Cold, Flu & Allergy",,,30 count,2019-08-28,Marianos,60615
1,Halls Suppressant/Oral Anesthetic Halls Relief...,$1.79,"Cold, Flu & Allergy",,,30 count,2019-08-28,Marianos,60615
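
Both `main_price` and `item_weight_count_vol` are stored as text in the rows above. A minimal sketch of converting such values to numbers follows; the sample rows beyond `$1.79` and `30 count` are invented, and the new column names (`main_price_usd`, `amount`, `unit`) are assumptions for illustration rather than part of `clean_data`.

```python
import pandas as pd

# Toy rows in the same text format as the scraped prices.
prices = pd.DataFrame({
    "main_price": ["$1.79", "$12.50", None],
    "item_weight_count_vol": ["30 count", "16 oz", "1 lb"],
})

# Strip the dollar sign and any thousands separators, then convert to float.
# Values that cannot be parsed (including missing ones) become NaN.
prices["main_price_usd"] = pd.to_numeric(
    prices["main_price"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

# Split text such as "30 count" into a numeric amount and a unit string.
size_parts = prices["item_weight_count_vol"].str.extract(
    r"^\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>.+?)\s*$"
)
prices["amount"] = pd.to_numeric(size_parts["amount"])
prices["unit"] = size_parts["unit"]

print(prices)
```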


## Exploratory Data Analysis

### Instacart Market Basket Analysis Datasets

Let's read in our datasets and merge them so that we can see what each user purchased. 
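
The body of `clean_data.combine_instacart_kaggle_datasets` is not shown here. As a sketch of how such a combination could work, the toy example below chains left merges on the shared key columns; the rows are invented and the merge order is an assumption.

```python
import pandas as pd

# Toy tables that mimic the key columns of the Instacart files.
orders = pd.DataFrame({"order_id": [10, 11], "user_id": [1, 2]})
order_products = pd.DataFrame({
    "order_id": [10, 10, 11],
    "product_id": [100, 101, 100],
})
products = pd.DataFrame({
    "product_id": [100, 101],
    "product_name": ["Banana", "Spinach"],
    "aisle_id": [24, 83],
    "department_id": [4, 4],
})
aisles = pd.DataFrame({"aisle_id": [24, 83], "aisle": ["fresh fruits", "fresh vegetables"]})
departments = pd.DataFrame({"department_id": [4], "department": ["produce"]})

# Start from one row per purchased item and attach order, product,
# aisle, and department details with left joins.
baskets = (
    order_products
    .merge(orders, on="order_id", how="left")
    .merge(products, on="product_id", how="left")
    .merge(aisles, on="aisle_id", how="left")
    .merge(departments, on="department_id", how="left")
)

print(baskets)
```

Left joins keep every purchased item even if a lookup table is missing a key, which makes gaps visible as `NaN` instead of silently dropping rows.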

In [2]:
aisles = pd.read_csv('../../data/01_raw/instacart_2017_05_01/aisles.csv')
departments = pd.read_csv('../../data/01_raw/instacart_2017_05_01/departments.csv')
order = pd.read_csv('../../data/01_raw/instacart_2017_05_01/orders.csv')
order_products__prior = pd.read_csv('../../data/01_raw/instacart_2017_05_01/order_products__prior.csv')
products = pd.read_csv('../../data/01_raw/instacart_2017_05_01/products.csv')

In [6]:
instacart_baskets = clean_data.combine_instacart_kaggle_datasets(aisles, departments, order, 
                                                                 order_products__prior, products)
instacart_baskets.head()

In [8]:
instacart_baskets.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32434489 entries, 0 to 32434488
Data columns (total 15 columns):
order_id                  int64
product_id                int64
add_to_cart_order         int64
reordered                 int64
user_id                   int64
eval_set                  object
order_number              int64
order_dow                 int64
order_hour_of_day         int64
days_since_prior_order    float64
product_name              object
aisle_id                  int64
department_id             int64
aisle                     object
department                object
dtypes: float64(1), int64(10), object(4)
memory usage: 3.9+ GB


The highest number of baskets for a single customer is 99: each of the five customers below placed 99 different orders.

In [9]:
pd.DataFrame(instacart_baskets.groupby('user_id')['order_id']\
             .nunique()).sort_values('order_id', ascending=False)\
             .head(5)

user_id,order_id
152340,99
185641,99
185524,99
81678,99
70922,99


### Simply Recipes' Recipe Dataset

In [47]:
recipes_sr_inter

Unnamed: 0,title,prep_time,cook_time,tags,ingredients,recipe_yield,byline,link_food
0,Grilled Cheese BLT,10 minutes,10 minutes,"['Dinner', 'Lunch', 'Sandwich', 'Favorite Summ...","[8 slices sourdough bread, 4 tablespoon unsalt...",4 sandwiches,Aaron Hutcherson,https://www.simplyrecipes.com/recipes/grilled_...
1,Pulled Pork Sandwich,10 minutes,"2 hours, 45 minutes","['Dinner', 'Sandwich', 'Budget', 'Comfort Food...","[For the sauce:, 1 large onion, chopped, 6 gar...",Serves 6 to 8,Elise Bauer,https://www.simplyrecipes.com/recipes/pulled_p...
2,How to Make Bacon in the Oven,5 minutes,20 minutes,"['Tips', 'Breakfast and Brunch', 'Baking', 'Ho...","[12 strips bacon, 1/2 teaspoon ground black pe...",12 strips,Nick Evans,https://www.simplyrecipes.com/recipes/how_to_m...
3,Sausage Stuffed Zucchini,15 minutes,1 hour,"['Dinner', 'Favorite Summer', 'Make-ahead', 'I...","[2 tablespoons extra virgin olive oil, 1/2 pou...",Serves 4,Elise Bauer,https://www.simplyrecipes.com/recipes/italian_...
4,The Best Dry Rub for Ribs,5 minutes,,"['Favorite Fall', 'Favorite Summer', 'Game Day...",[3/4 cup packed dark brown sugar (or 1/2 cup i...,,Irvin Lin,https://www.simplyrecipes.com/recipes/the_best...
5,Ginger Pork Rice Bowls,15 minutes,20 minutes,"['Eat Your Food', 'Family Dinner Ideas', 'Dinn...","[For the bowls:, 1 tablespoon olive oil, 1/2 c...",4-6 servings,Nick Evans,https://www.simplyrecipes.com/recipes/ginger_p...
6,Hawaiian SPAM Tacos with Pineapple,10 minutes,30 minutes,"['Eat Your Food', 'Dinner', 'Kid-friendly', 'H...","[DAD ADD: Haba, ñero Guacamole, 1 habañero pep...","12 small tacos, serving 4 adults",Nick Evans,https://www.simplyrecipes.com/recipes/hawaiian...
7,Air Fryer Chinese Egg Rolls,20 minutes,25 minutes,"['Appetizer', 'Snack', 'Air Fryer', 'Quick and...","[For the egg rolls:, 1 tablespoon olive oil, 1...",12 egg rolls,Nick Evans,https://www.simplyrecipes.com/recipes/air_frye...
8,Grilled Bacon-Wrapped Stuffed Hot Dogs,10 minutes,11 minutes,"['Dinner', 'Favorite Summer', 'Grill', 'Bacon'...","[1 teaspoon ketchup, 1 teaspoon Dijon mustard,...",Makes 4 stuffed hot dogs,Elise Bauer,https://www.simplyrecipes.com/recipes/grilled_...
9,Italian Grilled Cheese Sandwiches,10 minutes,20 minutes,"['Eat Your Food', 'Dinner', 'Sandwich', 'Kid-f...","[For the DAD ADD: , Olive and Cauliflower Giar...",4 sandwiches,Nick Evans,https://www.simplyrecipes.com/recipes/italian_...


### Mariano's Grocery Prices Dataset

## Feature Construction 

## Model Exploration & Selection 

## Application Testing 

## Conclusion & Next Steps