# Using NLP to Improve Your Cooking

Michael Humkey, Conor Marsten

## Introduction

### Why?
Surely all cooking can be improved by a little constructive criticism. But how do we determine what is useful?
This is something easy for a person to do.

![](http://eatslikeaduck.com/wp-content/uploads/2015/09/Homer%E2%80%99s-Breakfast-for-Mr.-Burns-Screenshot-3.png)

### How?
Given a recipe and list of reviews for the recipe, we will look for criticisms by performing POS  tagging and pulling out nouns to provide suggestions for improving the recipe

## Background

Restaurants use NLP to make improvements based on customer reviews.

IBM chef watson will make a recipe when given up to four different food items.

## Building the Dataset
![](http://resources3.atgtickets.com/static/699_full.jpg)

First we are using the food2fork api to search for a recipe. For simplicity’s sake we are just using the first element returned in the list of query results. This recipe consists of a source URL, a list of ingredients, a title, and a publisher name among a handful of other fields. Then we are using BeautifulSoup to load in the source URL and scrape each review from the web page. 

![](https://singbookswithemily.files.wordpress.com/2015/03/beautiful-soup-can-w-mock-turtle-coin.jpg)

In [5]:
import urllib
import urllib.request

In [6]:
url = "http://food2fork.com/api/search?"
args = {'key' : 'd8b2df92a9cb994f2009b2be8410c1a3', 'q' : 'chicken', 'sort' : 'r'}
data = urllib.parse.urlencode(args)
req = urllib.request.Request(url+data, headers={'User-Agent': 'Mozilla/5.0'})
response = urllib.request.urlopen(req).read()

In [7]:
import json

In [8]:
response = json.loads(response)
recipes = response['recipes']

In [9]:
from bs4 import BeautifulSoup
import requests

In [10]:
def getReviews(recipes, i=0):

    recipe = recipes[i]
    url = "http://food2fork.com/api/get?"
    data = urllib.parse.urlencode({'key' : 'd8b2df92a9cb994f2009b2be8410c1a3', 'rId' : recipe['recipe_id']})
    req = urllib.request.Request(url+data, headers={'User-Agent': 'Mozilla/5.0'})
    recipe = json.loads(urllib.request.urlopen(req).read())['recipe']
    bs =  BeautifulSoup(requests.get(recipe['source_url']).content, 'html.parser')
    reviews = []

    if (recipe['publisher'] == '101 Cookbooks'): #1
        reviews = bs.find_all(attrs={'class': 'card-body'})

    elif (recipe['publisher'] == 'BBC Good Food'):
        reviews = bs.find_all(attrs={'class': 'field-item even'})

    elif (recipe['publisher'] == 'Closet Cooking' 
           or recipe['publisher'] == 'Eats Well With Others' 
           or recipe['publisher'] == "Lisa's Kitchen"
           or recipe['publisher'] == 'A Spicy Perspective'
           or recipe['publisher'] == 'Naturally Ella'
           or recipe['publisher'] == 'Pastry Affair'):
        reviews = bs.find_all(attrs={'class': 'comment-body'})

    elif (recipe['publisher'] == 'Food Republic'):
        reviews = bs.find_all(attrs={'class': '_5mdd'}) #uses facebook comments plugin

    elif (recipe['publisher'] == 'PBS Food'):
        reviews = bs.find_all(attrs={'class': 'post-message'})

    elif (recipe['publisher'] == 'Simply Recipes'
          or recipe['publisher'] == 'Homesick Texan'
          or recipe['publisher'] == 'Tasty Kitchen'):
        reviews = bs.find_all(attrs={'class': 'comment-text'})

    elif (recipe['publisher'] == 'Two Peas and Their Pod'
          or recipe['publisher'] == 'Cookie and Kate'
          or recipe['publisher'] == "Elana's Pantry"
          or recipe['publisher'] == 'My Baking Addiction'
          or recipe['publisher'] == 'Smitten Kitchen'
          or recipe['publisher'] == 'Vintage Mixer'
          or recipe['publisher'] == 'Cookin Canuck'
          or recipe['publisher'] == 'Healthy Delicious'
          or recipe['publisher'] == 'Steamy Kitchen'
          or recipe['publisher'] == "What's Gaby Cooking"
          or recipe['publisher'] == 'Bunky Cooks'
          or recipe['publisher'] == 'Serious Eats'):
        reviews = bs.find_all(attrs={'class': 'comment-content'})

    elif (recipe['publisher'] == 'All Recipes'):
        reviews = bs.find_all(attrs={'class' : 'review-detail__link'})
        revList = []
        for review in reviews:
            revList.append(str(review).split(' ')[2].replace('href=\"', '').replace('\"', ''))
        reviews = []
        for url in revList:
            bs =  BeautifulSoup(requests.get(url).content, 'html.parser')
            reviews.append(bs.find(itemprop="reviewBody"))

    elif (recipe['publisher'] == 'Big Girls Small Kitchen'
          or recipe['publisher'] =='Jamie Oliver'
          or recipe['publisher'] == 'The Pioneer Woman'):
        reviews = bs.find_all(attrs={'data-role': 'message'})

    elif (recipe['publisher'] == 'Framed Cooks'
          or recipe['publisher'] == 'Picky Palate'):
        reviews = bs.find_all(attrs={'class': 'comment even thread-even depth-1'})

    elif (recipe['publisher'] == 'Bon Appetit'):
        reviews = bs.find_all(attrs={'class': 'review-body'})

    elif (recipe['publisher'] == 'Epicurious'):
        reviews = bs.find_all(attrs={'class': 'review-text'})

    elif (recipe['publisher'] == 'Cookstr'):
        reviews = bs.find_all(attrs={'class': 'commentText'})

    elif (recipe['publisher'] == 'Panini Happy'):
        #careful, sloppy html
        reviews = bs.find_all(attrs={'class': 'format_text'})

    elif (recipe['publisher'] == 'Real Simple'):
        reviews = bs.find_all(attrs={'class': 'comment_txt'})
        
    elif (recipe['publisher'] == 'Chow'):
        reviews = bs.find_all(itemprop="comment")

    elif (recipe['publisher'] == 'Delishhh'):
        reviews = bs.find_all(attrs={'class': 'commentmeta'})

    elif (recipe['publisher'] == 'Food Network'):
        reviews = bs.find_all(attrs={'class': 'gig-comment-body'})    
    else:
        print(recipe['publisher'])
        return getReviews(recipes, i+1)
        
    return reviews, recipe

To keep bad data from being processed, we do a little bit of cleaning.

![](https://i2.wp.com/ghostpoint.com/wp-content/uploads/2016/07/BadData.jpg?fit=467%2C358)

In [11]:
reviews, recipe = getReviews(recipes)    
# if type(reviews[0]) is not str:
revs = [rv.get_text().replace('\n', ' ').replace('\r', '') for rv in reviews]
# else:
#     revs = [rv.replace('\n', ' ').replace('\r', '') for rv in reviews]

In [12]:
print(reviews[0])
print(revs[0])

<dd class="comment-body">
<p>I HAVE to make this for my husband sometime soon.  We're trying to use up the last jalapenos from our garden this year - this would be perfect!!! Yum</p>
</dd>
 I HAVE to make this for my husband sometime soon.  We're trying to use up the last jalapenos from our garden this year - this would be perfect!!! Yum 


Now that we have all of our reviews it's time to process them.

## Checking for foods

![](http://citynews.com.au/wp-content/uploads/2017/09/034A9230-e1506583883937-500x282.png)

In [13]:
recipe['publisher']

'Closet Cooking'

In [14]:
recipe

{'f2f_url': 'http://food2fork.com/view/35120',
 'image_url': 'http://static.food2fork.com/Bacon2BWrapped2BJalapeno2BPopper2BStuffed2BChicken2B5002B5909939b0e65.jpg',
 'ingredients': ['4 small chicken breasts, pounded thin',
  'salt and pepper to taste',
  '4 jalapenos, diced',
  '4 ounces cream cheese, room temperature',
  '1 cup cheddar cheese, shredded',
  '8 slices bacon\n'],
 'publisher': 'Closet Cooking',
 'publisher_url': 'http://closetcooking.com',
 'recipe_id': '35120',
 'social_rank': 100.0,
 'source_url': 'http://www.closetcooking.com/2012/11/bacon-wrapped-jalapeno-popper-stuffed.html',
 'title': 'Bacon Wrapped Jalapeno Popper Stuffed Chicken'}

In [15]:
import nltk
from nltk import word_tokenize, pos_tag
text = word_tokenize(reviews[0].get_text())
tagged_text = nltk.pos_tag(text)
print(tagged_text)

[('I', 'PRP'), ('HAVE', 'VBP'), ('to', 'TO'), ('make', 'VB'), ('this', 'DT'), ('for', 'IN'), ('my', 'PRP$'), ('husband', 'NN'), ('sometime', 'RB'), ('soon', 'RB'), ('.', '.'), ('We', 'PRP'), ("'re", 'VBP'), ('trying', 'VBG'), ('to', 'TO'), ('use', 'VB'), ('up', 'RP'), ('the', 'DT'), ('last', 'JJ'), ('jalapenos', 'NN'), ('from', 'IN'), ('our', 'PRP$'), ('garden', 'NN'), ('this', 'DT'), ('year', 'NN'), ('-', ':'), ('this', 'DT'), ('would', 'MD'), ('be', 'VB'), ('perfect', 'JJ'), ('!', '.'), ('!', '.'), ('!', '.'), ('Yum', 'NN')]


We will be tagging each review and searching through it for foods by looking for the noun tag.

In [16]:
import sqlite3
conn = sqlite3.connect('usda.sql3')
excluded_words = ['recipe', 'powder', 'ground', 'sea']
out =[]

for rv in revs:
    text = word_tokenize(reviews[0].get_text())
    tagged_text = nltk.pos_tag(text)
    for tt in tagged_text:
        if tt[1] == 'NN' and tt[0] not in excluded_words:
            search = '%' + tt[0] + '%'
            c = conn.cursor()
            c.execute('SELECT id, short_desc FROM food WHERE short_desc LIKE ?', (search,))
            values = c.fetchall()
            if len(values) != 0:
                for sentence in rv.split('.'):
                    if tt[0] in sentence and sentence not in out:
                        out.append(sentence)
#             for row in values:
#                 print(tt, row[0], row[1])
for x in out:
    print(x)

  We're trying to use up the last jalapenos from our garden this year - this would be perfect!!! Yum 
 Yum! Loved the taste
 My family is having a party next week, I am going to need to make these! Yum
  If it is that you cannot find fresh jalapenos, you could used pickled jalapenos
 Like I stated, I used a ladle to form: 4 slices bacon, then one x-thinly sliced chicken breast/diced al dente veggies (broccoli, carrot, celery, onion,) slice provolone, the jalapenos, sea salt n' pepper
 My brother makes stuffed jalapenos wrapped in bacon
 using fresh jalapenos is great, but they taste much better if you roast them first!  Roasting really brings out the depth of flavor that you need with chicken and cheese
 I loved the taste of fresh jalapenos and have now officially added this to my product list
 I'm going to add this to Yummly and try it soon


Now that the nouns have been searched for in the database to verify they are a food item, the sentences containing them are added to a list.

## Performing sentiment analysis

![](http://sherimcnally.com/wp-content/uploads/2016/02/Depositphotos_39533227_m-2015-630x315.jpg)

In [17]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer as sia

In [18]:
analyzer = sia()
scored = []
for sent in out:
    res = analyzer.polarity_scores(sent)
    tup = (sent, res)
    scored.append(tup)

In [19]:
def getKey(tup):
    return tup[1]['compound']

scored = list(reversed(sorted(scored, key=getKey)))

We now have performed sentiment analysis on every sentence that we stored.

In [20]:
for sent in scored:
    print(sent[1])
    print(sent[0])
    print()

{'neg': 0.0, 'neu': 0.764, 'pos': 0.236, 'compound': 0.8096}
 using fresh jalapenos is great, but they taste much better if you roast them first!  Roasting really brings out the depth of flavor that you need with chicken and cheese

{'neg': 0.0, 'neu': 0.693, 'pos': 0.307, 'compound': 0.7351}
 I loved the taste of fresh jalapenos and have now officially added this to my product list

{'neg': 0.0, 'neu': 0.788, 'pos': 0.212, 'compound': 0.6784}
  We're trying to use up the last jalapenos from our garden this year - this would be perfect!!! Yum 

{'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.636}
 Yum! Loved the taste

{'neg': 0.0, 'neu': 0.824, 'pos': 0.176, 'compound': 0.4574}
 My family is having a party next week, I am going to need to make these! Yum

{'neg': 0.0, 'neu': 0.918, 'pos': 0.082, 'compound': 0.3612}
 Like I stated, I used a ladle to form: 4 slices bacon, then one x-thinly sliced chicken breast/diced al dente veggies (broccoli, carrot, celery, onion,) slice provo

In [21]:
from IPython.display import Image
from IPython.core.display import HTML 
out = "<div><h1 style=\"text-align:center\">{title}</h1><br/><img style=\"margin:auto\" src=\"{img_url}\"><ul><h3>Ingredients</h3>{ingredients}</ul><ul><h3>Users enjoyed this recipe more with the following modifications:</h3>{suggestions}</ul><a href=\"{src}\">View full recipe here</a></div>"
ingredients = "".join("<li>"+item+"</li>" for item in recipe['ingredients'])
suggestions = "".join("<li>"+item[0]+"</li>" for item in scored)
HTML(out.format(title=recipe['title'], img_url=recipe['image_url'], ingredients=ingredients, suggestions=suggestions, src=recipe['source_url']))

## Next Steps 

![](https://i.cbc.ca/1.3237852.1442878202!/fileImage/httpImage/image.jpg_gen/derivatives/16x9_620/pizza-rat.jpg)

Some possible changes going forward is that we can modify the tagger so that all the food included in our database is viewed as a noun making it easier to find those modifications. We'd also like to be able to alter the recipe's steps.

Using the default POS tagger from nltk limits how much we can do with the data we retrieve. A modification would need to be done so that it would recognize all food as nouns and then look for modifcations to those for more accurate suggestions. 

Multi word ingredients are also not checked for currently such as chili powder vs. chili.