# Building a 2-in-1 intelligent tool to recognise ingredients and recommend recipes
`Pantry-to-Plate buddy.ai`

Problem statement: 
Given the current food waste situation in Singapore (detailed in the repo's slide deck) and Singaporeans' time-starved lifestyles, how might we help households in Singapore deal with excess groceries at home and reduce food waste?

Solution aim: 
This project aims to supplement other food waste reduction efforts by providing an easy-to-use tool that helps people turn the ingredients they already have at home into tasty meals. To this end, the project has two parts: 1) an ingredient detection model, and 2) a recipe recommender system.

In this project series, there are 4 notebooks: 
1. Data collection - recipe dataset
2. EDA and recipe recommender 
3. Collecting images of ingredients 
4. Custom training of object detection model (note: this is run on Google Colab) 

# 1) Data collection - recipe dataset

Although there are numerous public recipe datasets, they are often better suited to a global audience. Since this project caters to Singaporeans and their tastebuds, I have chosen to obtain localised recipe data from themeatmen.sg, an up-and-coming "foodie" team in Singapore. The site has a substantial number of recipes (around 700), which can provide a good variety of recommendations.


In [1]:
from urllib.request import urlopen
import json
import re
from bs4 import BeautifulSoup
import pandas as pd
from urllib.parse import urlparse
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

In [None]:
recipes = []

with urlopen("https://themeatmen.sg/wp-json/easymeals/v1/get-posts?options%5Bplugin%5D=easymeals_core&options%5Bmodule%5D=plugins%2Frecipe%2Fpost-types%2Frecipe%2Fshortcodes&options%5Bshortcode%5D=recipe-list-with-filter&options%5Bpost_type%5D=recipe&options%5Bnext_page%5D=1&options%5Bmax_pages_num%5D=79&options%5Bbehavior%5D=columns&options%5Bimages_proportion%5D=full&options%5Bcolumns%5D=2&options%5Bcolumns_responsive%5D=predefined&options%5Bspace%5D=normal&options%5Bposts_per_page%5D=800&options%5Borderby%5D=date&options%5Border%5D=DESC&options%5Blayout%5D=info-below&options%5Btitle_tag%5D=h5&options%5Btext_transform%5D=none&options%5Bpagination_type%5D=infinite-scroll&options%5Bobject_class_name%5D=EasyMealsCoreRecipeListWithFilterShortcode&options%5Btaxonomy_filter%5D=recipe-category&options%5Bunique%5D=1&options%5Bspace_value%5D=15") as response:
    body = response.read()
    mainpage = json.loads(body)["data"]["html"]
    results = re.findall('<a itemprop="url" href="(https.*)">\\n\\t\\t\\t<img.*src="(https.*)" class.*/>',mainpage)

for item in results:
    recipe = {}
    recipe["url"] = item[0]
    recipe["image"] = item[1]
    with urlopen(item[0]) as response:
        body = response.read()
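        # Name the parser explicitly; otherwise bs4 warns and uses whichever parser is installed
        soup = BeautifulSoup(body, "html.parser")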

        # Preparation time: stored as a one-element list
        prep = soup.find("p", class_="qodef-recipe-prep-time")
        if prep is not None:
            prep = re.sub(r'\s+', ' ', prep.text).strip()
            recipe["prep_time"] = [prep]

        # Difficulty: stored as a plain string
        diff = soup.find("p", class_="qodef-recipe-difficulty")
        if diff is not None:
            diff = re.sub(r'\s+', ' ', diff.text).strip()
            recipe["difficulty"] = diff
        
        # Ingredients: one table row per ingredient
        ingredients = soup.find_all("tr", class_="qodef-ingredients-items")
        ingrs = []
        for ingredient in ingredients:
            ingrs.append(re.sub(r'\s+', ' ', ingredient.text).strip())
        recipe["ingredients"] = ingrs

        # Directions: remove the site's "Mark as complete" label from each step
        directions = soup.find_all("div", class_="qodef-directions-items")
        dirs = []
        for direction in directions:
            # `step` instead of `dir`, which would shadow the Python builtin
            step = re.sub(r'\s+', ' ', direction.text)
            step = step.replace('Mark as complete', '')
            dirs.append(step.strip())
        recipe["directions"] = dirs

    recipes.append(recipe)
    
meatmen_df = pd.DataFrame(recipes)
meatmen_df.head()
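
The same whitespace clean-up is repeated for every field in the cell above. As a minimal sketch, it could be factored into one helper. The name `clean_text` and its `drop` parameter are hypothetical and not part of the original notebook.

```python
import re

def clean_text(raw, drop=()):
    """Remove any phrases in `drop`, then collapse whitespace runs to single spaces.

    Hypothetical helper: it mirrors the strip / re.sub steps used in the scraping loop.
    """
    text = raw
    for phrase in drop:
        text = text.replace(phrase, "")
    return re.sub(r"\s+", " ", text).strip()

# Example inputs are made up to resemble scraped table rows and direction steps
print(clean_text("  2 tbsp\n\t\tlight soy   sauce "))
print(clean_text("Heat the oil.\n Mark as complete", drop=("Mark as complete",)))
```

With such a helper, each field in the loop would reduce to a single call, for example `clean_text(direction.text, drop=("Mark as complete",))`.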

In [None]:
recipe_names = []

for url in meatmen_df['url']:
    parsed_url = urlparse(url)
    path_segments = parsed_url.path.split('/')
    recipe_name = path_segments[1].replace('-', ' ')
    recipe_names.append(recipe_name)

meatmen_df['recipe_name'] = recipe_names

meatmen_df.to_csv('meatmen_scraped_raw.csv', index=False)
meatmen_df.sample(10)
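
The slug-to-name step in the cell above can also be written as a small function, which makes it easy to check against individual URLs. This is only a sketch: `recipe_name_from_url` is a hypothetical name, and like the loop above it assumes the recipe slug is the first path segment.

```python
from urllib.parse import urlparse

def recipe_name_from_url(url):
    """Turn the first path segment of a recipe URL into a space-separated name."""
    # strip("/") drops the leading and trailing slashes before splitting
    slug = urlparse(url).path.strip("/").split("/")[0]
    return slug.replace("-", " ")

print(recipe_name_from_url("https://themeatmen.sg/jb-ah-meng-chao-da-bee-hoon/"))
print(recipe_name_from_url("https://themeatmen.sg/mee-soto-2/"))
```

Note that numeric suffixes in a slug are kept, so `mee-soto-2` becomes `mee soto 2`; whether to strip them is a judgment call for the later cleaning step.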

#### Round 2 of scraping: I realised that I had left out other features such as prep time and difficulty.

In [8]:
recipes = []

with urlopen("https://themeatmen.sg/wp-json/easymeals/v1/get-posts?options%5Bplugin%5D=easymeals_core&options%5Bmodule%5D=plugins%2Frecipe%2Fpost-types%2Frecipe%2Fshortcodes&options%5Bshortcode%5D=recipe-list-with-filter&options%5Bpost_type%5D=recipe&options%5Bnext_page%5D=1&options%5Bmax_pages_num%5D=79&options%5Bbehavior%5D=columns&options%5Bimages_proportion%5D=full&options%5Bcolumns%5D=2&options%5Bcolumns_responsive%5D=predefined&options%5Bspace%5D=normal&options%5Bposts_per_page%5D=800&options%5Borderby%5D=date&options%5Border%5D=DESC&options%5Blayout%5D=info-below&options%5Btitle_tag%5D=h5&options%5Btext_transform%5D=none&options%5Bpagination_type%5D=infinite-scroll&options%5Bobject_class_name%5D=EasyMealsCoreRecipeListWithFilterShortcode&options%5Btaxonomy_filter%5D=recipe-category&options%5Bunique%5D=1&options%5Bspace_value%5D=15") as response:
    body = response.read()
    mainpage = json.loads(body)["data"]["html"]
    results = re.findall('<a itemprop="url" href="(https.*)">\\n\\t\\t\\t<img.*src="(https.*)" class.*/>',mainpage)

for item in results:
    recipe = {}
    recipe["url"] = item[0]
    recipe["image"] = item[1]
    with urlopen(item[0]) as response:
        body = response.read()
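        # Name the parser explicitly; otherwise bs4 warns and uses whichever parser is installed
        soup = BeautifulSoup(body, "html.parser")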
        
        # Preparation time: stored as a one-element list
        prep = soup.find("p", class_="qodef-recipe-prep-time")
        if prep is not None:
            prep = re.sub(r'\s+', ' ', prep.text).strip()
            recipe['prep_time'] = [prep]

        # Difficulty: stored as a plain string
        diff = soup.find("p", class_="qodef-recipe-difficulty")
        if diff is not None:
            diff = re.sub(r'\s+', ' ', diff.text).strip()
            recipe['difficulty'] = diff
    
        recipes.append(recipe) 

prep_diff_df = pd.DataFrame(recipes)
prep_diff_df.head()


| | url | image | prep_time | difficulty |
|---|---|---|---|---|
| 0 | https://themeatmen.sg/jb-ah-meng-chao-da-bee-hoon/ | https://themeatmen.sg/wp-content/uploads/2023/07/JB-Ah-Meng-Chao-Da-Bee-Hoon-scaled.jpg | [30 mins] | easy |
| 1 | https://themeatmen.sg/easy-thai-bbq-pork-collar/ | https://themeatmen.sg/wp-content/uploads/2023/06/DSC09626-scaled.jpg | [20 minutes] | easy |
| 2 | https://themeatmen.sg/filipino-sinigang-na-baboy-hack/ | https://themeatmen.sg/wp-content/uploads/2023/06/DSC09665-scaled.jpg | [1 hr 15 min] | easy |
| 3 | https://themeatmen.sg/mothers-day-special-manuka-honey-yogurt-parfait/ | https://themeatmen.sg/wp-content/uploads/2023/05/DSC00752-1-scaled.jpg | [5 min] | easy |
| 4 | https://themeatmen.sg/mee-soto-2/ | https://themeatmen.sg/wp-content/uploads/2023/04/DSC09768-scaled.jpg | [2 hour] | easy |
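
The `prep_time` values in this output come in several formats ("30 mins", "20 minutes", "1 hr 15 min", "2 hour"), which makes them hard to sort or filter. One possible way to normalise them to minutes is sketched below. The function name `prep_time_to_minutes` is hypothetical, and the sketch assumes whole-number values with units that start with "h" or "m"; other formats (such as decimals) would need extra handling.

```python
import re

# The first letter of the unit decides the scale: hours or minutes
_UNIT_MINUTES = {"h": 60, "m": 1}

def prep_time_to_minutes(text):
    """Convert strings like '1 hr 15 min' to total minutes; return None if nothing parses."""
    parts = re.findall(r"(\d+)\s*(h|m)[a-z]*", text.lower())
    if not parts:
        return None
    return sum(int(value) * _UNIT_MINUTES[unit] for value, unit in parts)

# The five values visible in the output above
for raw in ["30 mins", "20 minutes", "1 hr 15 min", "5 min", "2 hour"]:
    print(raw, "->", prep_time_to_minutes(raw))
```

Because the notebook stores `prep_time` as a one-element list, the function would be applied to the first element of each list, and rows without a prep time would need to be skipped or filled first.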
