# Finding Meals that Conform to Your Taste Within Budget

## Introduction 

According to a recent [survey](https://www.iol.co.za/lifestyle/food-drink/many-cookbooks-yet-same-old-supper-1824825), although Americans have on average 6 cookbooks in the home, they make the same 9 meals on rotation. Part of this lack of variety likely stems from the fear that a new meal will not suit your taste profile and will go to waste.

### Project Objective

In this project, my aim is to help users discover new meals that conform to their specific tastes and fit within their budget.

### Datasets

1. **Instacart Market Basket Analysis Datasets**
    - orders
    - order_products__prior: the products in each prior order
    - aisles: 132 unique store aisles
    - departments: 24 unique departments
    - products: 49.7k unique products
    
    
2. **Mariano's Grocery Prices**


3. **Simply Recipes recipe database**

### Summary of Results


## Data Collection and Exploratory Analysis

To start the project, we need several diverse datasets. First, we need a dataset of different users' food preferences. Because food rating data is difficult to come by, we can instead use point-of-purchase grocery store data and rely on implicit feedback (i.e., assume that customers who bought an item liked it).
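To make the implicit-feedback assumption concrete, here is a minimal sketch assuming a basket table with `user_id` and `product_id` columns (as in the Instacart `orders` and `order_products__prior` files). Every user-product pair that appears in at least one purchase is marked 1 and every other pair 0. The helper name `implicit_feedback_matrix` is hypothetical, not part of the project's code.

```python
import pandas as pd


def implicit_feedback_matrix(baskets: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build a binary user x product matrix.

    A 1 means the user bought the product at least once (assumed "liked");
    a 0 means no purchase was observed, which is not the same as "disliked".
    """
    # Count purchases for every (user, product) pair present in the data
    counts = pd.crosstab(baskets['user_id'], baskets['product_id'])
    # Collapse the counts to 0/1 implicit feedback
    return (counts > 0).astype(int)


# Toy baskets: user 1 bought products 10 and 20, user 2 bought product 20 twice
toy = pd.DataFrame({
    'user_id':    [1, 1, 2, 2],
    'product_id': [10, 20, 20, 20],
})
feedback = implicit_feedback_matrix(toy)
print(feedback)
```

Keeping the raw counts instead of thresholding them is a common alternative, since repeat purchases arguably signal a stronger preference.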

Second, we need a repository of diverse recipes. Although a number of recipe datasets are available online, none that I found has all the required attributes, so this data will need to be collected from a recipe website via web scraping.

Third, we need a database of grocery prices for pricing our recipes. Because datasets with prices are few and far between, and quickly become outdated, we will need to manually collect grocery pricing data from a store website.
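Once recipes and prices are both available, pricing a recipe reduces to multiplying each ingredient's quantity by its unit price and summing. The sketch below assumes ingredients have already been matched to store items and converted to the unit the store prices by; the item names, quantities, prices, and the `recipe_cost` helper are illustrative placeholders, not Mariano's data or the project's code.

```python
from typing import Dict, List, Tuple

# Hypothetical price table: store item -> price per unit (e.g. per pound)
UNIT_PRICES: Dict[str, float] = {
    'chicken breast': 3.99,
    'onion': 1.29,
    'olive oil': 0.40,
}


def recipe_cost(ingredients: List[Tuple[str, float]],
                unit_prices: Dict[str, float]) -> float:
    """Sum quantity * unit price over a recipe's (name, quantity) pairs.

    Raises KeyError for any ingredient missing from the price table, so that
    unpriced items are surfaced instead of silently counted as free.
    """
    return round(sum(qty * unit_prices[name] for name, qty in ingredients), 2)


recipe = [('chicken breast', 1.5), ('onion', 0.5), ('olive oil', 2)]
print(recipe_cost(recipe, UNIT_PRICES))  # 1.5*3.99 + 0.5*1.29 + 2*0.40
```

In practice, unit conversion (cups to pounds, for example) and matching free-text ingredient names to store products are the harder parts, and neither is handled here.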

### Instacart Market Basket Analysis Datasets

Since we already have the Instacart Market Basket Analysis Datasets from Kaggle, let's take a look at them to better understand what other data we need to collect to complete this project.

```python
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sys

import warnings
warnings.filterwarnings('ignore')

# Make the project's src/ package importable from this notebook
src_dir = os.path.join(os.getcwd(), '..', '..', 'src')
sys.path.append(src_dir)

from d01_data.clean_data import combine_instacart_kaggle_datasets
from d01_data.web_scraping import sr_scraping
```

```python
# All raw Instacart files live in the same folder
raw_dir = '../../data/01_raw/instacart_2017_05_01'

aisles = pd.read_csv(os.path.join(raw_dir, 'aisles.csv'))
departments = pd.read_csv(os.path.join(raw_dir, 'departments.csv'))
order = pd.read_csv(os.path.join(raw_dir, 'orders.csv'))
order_products__prior = pd.read_csv(os.path.join(raw_dir, 'order_products__prior.csv'))
products = pd.read_csv(os.path.join(raw_dir, 'products.csv'))
```

Let's combine the Instacart Kaggle datasets so that we can see what is in each user's basket.

```python
instacart_baskets = combine_instacart_kaggle_datasets(aisles, departments, order, order_products__prior, products)
```
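The internals of `combine_instacart_kaggle_datasets` are not shown in this notebook. As a hedged sketch of one plausible approach (not necessarily what the project's function does), the five files can be joined on their shared keys (`order_id`, `product_id`, `aisle_id`, `department_id`) to give one row per product in each order. The function name `combine_baskets_sketch` and the `toy_` DataFrames are hypothetical.

```python
import pandas as pd


def combine_baskets_sketch(aisles, departments, order, order_products__prior, products):
    """Hypothetical sketch: one row per (order, product) with user and labels."""
    return (
        order_products__prior
        # attach user_id and other order metadata to each purchased product
        .merge(order, on='order_id', how='inner')
        # attach the product name and its aisle/department ids
        .merge(products, on='product_id', how='left')
        # translate the ids into readable aisle and department names
        .merge(aisles, on='aisle_id', how='left')
        .merge(departments, on='department_id', how='left')
    )


# Tiny stand-ins (toy_ prefix so the real DataFrames above are not overwritten)
toy_aisles = pd.DataFrame({'aisle_id': [1], 'aisle': ['fresh fruits']})
toy_departments = pd.DataFrame({'department_id': [4], 'department': ['produce']})
toy_order = pd.DataFrame({'order_id': [100], 'user_id': [7], 'order_number': [1]})
toy_prior = pd.DataFrame({'order_id': [100, 100], 'product_id': [5, 6],
                          'add_to_cart_order': [1, 2], 'reordered': [0, 0]})
toy_products = pd.DataFrame({'product_id': [5, 6], 'product_name': ['Banana', 'Apple'],
                             'aisle_id': [1, 1], 'department_id': [4, 4]})

toy_baskets = combine_baskets_sketch(toy_aisles, toy_departments, toy_order,
                                     toy_prior, toy_products)
print(toy_baskets[['user_id', 'order_id', 'product_name', 'aisle', 'department']])
```

The inner join on `order_id` drops product rows with no matching order, while the left joins keep products even if an aisle or department lookup is missing; the project's function may make different choices.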

```python
instacart_baskets.head()
```

```python
instacart_baskets.info()
```

The highest number of baskets per customer is 99: each of the five customers shown below placed 99 distinct orders.

```python
# Count distinct orders per user and show the five most frequent shoppers
(instacart_baskets.groupby('user_id')['order_id']
    .nunique()
    .sort_values(ascending=False)
    .head(5)
    .to_frame())
```

Now that we have our baskets, let's move on to collecting our recipe dataset.

### Recipe Web Scraping - Simply Recipes

```python
# sr_scraping() scrapes the Simply Recipes website and saves the files to the
# data/01_raw folder. It takes around 2 hours to complete, so it is commented out.

# sr_scraping()
```
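The code of `sr_scraping()` is not shown here. As a rough illustration of the parsing step any such scraper needs, the sketch below extracts ingredient lines from a recipe page's HTML using only the standard library's `html.parser`. The markup it looks for (`<li class="ingredient">`) is an assumption for illustration; Simply Recipes' actual markup may differ and would need to be checked against the live site. Fetching the pages is left out.

```python
from html.parser import HTMLParser
from typing import List


class IngredientParser(HTMLParser):
    """Collect the text of <li> elements whose class list includes 'ingredient'.

    The class name is a hypothetical placeholder, not Simply Recipes' markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.ingredients: List[str] = []
        self._in_item = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'li' and 'ingredient' in classes:
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside an ingredient <li>, including nested tags
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'li' and self._in_item:
            # Normalise whitespace and store the finished ingredient line
            self.ingredients.append(' '.join(''.join(self._buffer).split()))
            self._in_item = False


def parse_ingredients(html: str) -> List[str]:
    parser = IngredientParser()
    parser.feed(html)
    parser.close()
    return parser.ingredients


sample = """
<ul>
  <li class="ingredient">2 cups <b>flour</b></li>
  <li class="ingredient">1 tsp salt</li>
  <li class="note">Serves 4</li>
</ul>
"""
print(parse_ingredients(sample))
```

Separating parsing from fetching lets the extraction logic be tested on saved HTML, which is convenient when a full scrape takes hours.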

let's 