# Scraping for recipes

This notebook tests packages for scraping recipes from the web. Thanks to [hhursev](https://github.com/hhursev) for the [recipe-scrapers package](https://github.com/hhursev/recipe-scrapers).

In [None]:
!pip install git+https://github.com/hhursev/recipe-scrapers.git

## Packages

In [2]:
# recipe scraper package
from recipe_scrapers import scrape_me

In [3]:
# spacy and nlp module
import spacy
nlp = spacy.load('en_core_web_md')

In [158]:
def remove_all(bye, words):
    """Remove every occurrence of `bye` from the list `words`, in place."""
    while bye in words:
        words.remove(bye)

## Get source recipes

In [4]:
# one recipe URL per line; the context manager closes the file afterwards
with open('./recipe_sources.txt') as f:
    drinks_sources = [line.strip() for line in f]

In [5]:
drinks_scrape = [scrape_me(item) for item in drinks_sources]

Next, I need to pull out the different kinds of content from each scraped recipe. For now, I'll use only the ingredients and the instructions.

### Ingredients

In [6]:
ingredients = []
for drink in drinks_scrape:
    # each scraper returns a list of ingredient strings
    ingredients.extend(drink.ingredients())

Now I need to separate each ingredient into its quantity, its unit of measure, and the remaining noun phrase.
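
One way that separation could be sketched, assuming ingredient lines shaped like `'1 1/2 fluid ounces white rum'` (quantity, then unit, then name), is a small regex-based splitter. The `UNITS` tuple and the `split_ingredient` name are illustrative assumptions, not part of recipe-scrapers, and lines that deviate from that shape simply come back with `None` for the missing parts.

```python
import re
from fractions import Fraction

# Hypothetical unit list for illustration; extend it to match your sources.
# Longer spellings come first so that "cups" is tried before "cup".
UNITS = ("fluid ounces", "fluid ounce", "tablespoons", "tablespoon",
         "teaspoons", "teaspoon", "cups", "cup")

# A quantity at the start of the line: "2", "1/2" or "1 1/2".
QUANTITY_RE = re.compile(r"^(\d+(?:\s+\d+/\d+)?|\d+/\d+)\s+")


def split_ingredient(line):
    """Split an ingredient line into (quantity, unit, name).

    quantity is a Fraction or None; unit is a string or None.
    """
    quantity, unit, rest = None, None, line.strip()
    match = QUANTITY_RE.match(rest)
    if match:
        # "1 1/2" becomes Fraction(1) + Fraction(1, 2)
        quantity = sum(Fraction(part) for part in match.group(1).split())
        rest = rest[match.end():]
    for candidate in UNITS:
        if rest.startswith(candidate + " "):
            unit, rest = candidate, rest[len(candidate) + 1:]
            break
    return quantity, unit, rest
```

For example, `split_ingredient('1/2 cup ice')` returns `(Fraction(1, 2), 'cup', 'ice')`, while `split_ingredient('ice cubes')` returns `(None, None, 'ice cubes')`.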

In [7]:
ingredients

['2 fluid ounces dark rum',
 '4 fluid ounces ginger beer',
 '1/2 cup ice',
 '10 fresh mint leaves',
 '1/2 lime, cut into 4 wedges',
 '2 tablespoons white sugar, or to taste',
 '1 cup ice cubes',
 '1 1/2 fluid ounces white rum',
 '1/2 cup club soda',
 '5 fluid ounces tequila',
 '3 fluid ounces fresh lime juice',
 '1 fluid ounce sweetened lime juice',
 '3 fluid ounces triple sec (orange-flavored liqueur)',
 'ice cubes',
 '1 lime, cut into wedges',
 'rimming salt',
 '6 mint leaves',
 '4 teaspoons white sugar',
 '1 lime, cut into 6 wedges',
 '2 (1.5 fluid ounce) jiggers lemon-flavored rum',
 '1 cup ice cubes, or as needed',
 '1/2 cup carbonated water, or as needed',
 '3 cups bottled Bloody Mary mix',
 '1 tablespoon prepared horseradish',
 '1 teaspoon chopped fresh dill',
 "1 teaspoon hot pepper sauce (such as Frank's RedHot®)",
 '2 tablespoons dill pickle juice',
 '1/2 cup kosher salt',
 '1 teaspoon ground black pepper',
 '1 teaspoon celery seed',
 '1 lime, juiced',
 '6 (1.5 fluid ounce) j

For the ingredients, I will just create the lists by hand. That works better, because otherwise I can't control how certain measures are handled.

In [161]:
ingr_extra = ["fresh mint leaves",
             "1 lime, cut into wedges",
             "ice cubes",
             "rimming salt",
             "1 orange, sliced",
             "twist lime zest",
             "maraschino cherries",
             "pineapple wedges"]

In [162]:
ingr_units = ["fluid oz.",
             "tablespoons",
             "teaspoons",
             "oz."]

### Instructions

From the instructions, I'm mainly interested in the verbs.

In [8]:
# instructions() returns the whole method of a recipe as a single string
instructions = [drink.instructions() for drink in drinks_scrape]

It would be helpful to be able to analyse the last sentence of every set of instructions separately, as they are often different and wrap everything up nicely.
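
As a rough sketch of that split, assuming each recipe has already been turned into a list of sentences, the last sentence can be peeled off from the rest. The `split_last_sentence` helper is a hypothetical name, and plain strings stand in for spaCy sentence spans so the example runs without a model; the second recipe is made up to show the single-sentence case.

```python
def split_last_sentence(sentence_sets):
    """Return (bodies, finishes).

    bodies holds all but the last sentence of each set,
    finishes holds the last sentence of each set.
    """
    bodies, finishes = [], []
    for sentences in sentence_sets:
        if not sentences:  # skip recipes with no instructions at all
            continue
        bodies.append(list(sentences[:-1]))
        finishes.append(sentences[-1])
    return bodies, finishes


# Plain strings stand in for the spaCy sentence spans.
example = [
    ["Combine rum and ginger beer in an old-fashioned glass.", "Add ice and stir."],
    ["Stir."],  # illustrative single-sentence recipe
]
bodies, finishes = split_last_sentence(example)
```

Keeping the empty-list guard means a recipe whose scraper returned no instructions does not raise an `IndexError`.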

In [9]:
nlp_instructions = [list(nlp(inst).sents) for inst in instructions]

In [10]:
nlp_instructions

[[Combine rum and ginger beer in an old-fashioned glass., Add ice and stir.],
 [Place mint leaves and 1 lime wedge into a sturdy glass.,
  Use a muddler to crush the mint and lime to release the mint oils and lime juice.,
  Add 2 more lime wedges and the sugar, and muddle again to release the lime juice.,
  Do not strain the mixture.,
  Fill the glass almost to the top with ice.,
  Pour the rum over the ice, and fill the glass with carbonated water.,
  Stir, taste, and add more sugar if desired.,
  Garnish with the remaining lime wedge.],
 [Measure the tequila, lime juice, sweetened lime juice and triple sec into a cocktail shaker and add a generous scoop of ice.,
  Cover and shake until the shaker is frosty, about 30 seconds.,
  Rub a lime wedge around the rim of a margarita glass and dip in salt.,
  Fill each glass with ice.,
  Strain equal amounts of the cocktail into the glasses to serve.,
  Garnish with a lime wedge.],
 [Put 3 mint leaves and 2 teaspoons sugar into each of 2 glass

In [62]:
for instr_set in nlp_instructions:
    for sent in instr_set:
        print("==========")
        print(sent.text)
        for word in sent:
            if word.tag_ in ["VB", "VBN"]:
                print(".text: ", word.text)
                print(".tag_: ", word.tag_)
                print(".head: ", word.head.text)
                print(".children: ", list(word.children))
                for ch in list(word.children):
                    print("  -ch.text: ", ch.text)
                    print("  -ch.tag_: ", ch.tag_)
                    print("  -ch.dep_: ", ch.dep_)
                    if(ch.dep_ in ["prep", "dobj"]):
                        print("    -ch.subtree", list(ch.subtree))
                print()

Combine rum and ginger beer in an old-fashioned glass.
.text:  Combine
.tag_:  VB
.head:  rum
.children:  []

Add ice and stir.

.text:  Add
.tag_:  VB
.head:  Add
.children:  [ice, and, stir, .]
  -ch.text:  ice
  -ch.tag_:  NN
  -ch.dep_:  dobj
    -ch.subtree [ice]
  -ch.text:  and
  -ch.tag_:  CC
  -ch.dep_:  cc
  -ch.text:  stir
  -ch.tag_:  VB
  -ch.dep_:  conj
  -ch.text:  .
  -ch.tag_:  .
  -ch.dep_:  punct

.text:  stir
.tag_:  VB
.head:  Add
.children:  []

Place mint leaves and 1 lime wedge into a sturdy glass.
.text:  Place
.tag_:  VB
.head:  leaves
.children:  []

Use a muddler to crush the mint and lime to release the mint oils and lime juice.
.text:  Use
.tag_:  VB
.head:  Use
.children:  [muddler, .]
  -ch.text:  muddler
  -ch.tag_:  NN
  -ch.dep_:  dobj
    -ch.subtree [a, muddler, to, crush, the, mint, and, lime, to, release, the, mint, oils, and, lime, juice]
  -ch.text:  .
  -ch.tag_:  .
  -ch.dep_:  punct

.text:  crush
.tag_:  VB
.head:  muddler
.children:  [to, m

.head:  Blend
.children:  [for, .]
  -ch.text:  for
  -ch.tag_:  IN
  -ch.dep_:  prep
    -ch.subtree [for, 30, seconds, or, until, smooth]
  -ch.text:  .
  -ch.tag_:  .
  -ch.dep_:  punct

Serve in margarita glasses with the rims dipped in powdered sugar.

.text:  Serve
.tag_:  VB
.head:  Serve
.children:  [in, with, .]
  -ch.text:  in
  -ch.tag_:  IN
  -ch.dep_:  prep
    -ch.subtree [in, margarita, glasses]
  -ch.text:  with
  -ch.tag_:  IN
  -ch.dep_:  prep
    -ch.subtree [with, the, rims, dipped, in, powdered, sugar]
  -ch.text:  .
  -ch.tag_:  .
  -ch.dep_:  punct

.text:  dipped
.tag_:  VBN
.head:  rims
.children:  [in]
  -ch.text:  in
  -ch.tag_:  IN
  -ch.dep_:  prep
    -ch.subtree [in, powdered, sugar]

Stir together the Pimm's liqueur and lemonade together in a serving pitcher.
.text:  Stir
.tag_:  VB
.head:  Stir
.children:  [together, liqueur, in, .]
  -ch.text:  together
  -ch.tag_:  RB
  -ch.dep_:  prt
  -ch.text:  liqueur
  -ch.tag_:  NN
  -ch.dep_:  dobj
    -ch.subt

In [115]:
def verb_tracer_simple(s):
    # keep the last token tagged as a base-form verb (VB) or past participle (VBN)
    verb = None
    for word in s:
        if word.tag_ in ["VB", "VBN"]:
            verb = word
    if verb is None:
        return ""
    elif len(list(verb.children)) > 0:
        prep_children = [ch for ch in verb.children if ch.dep_ in ["prep", "dobj"]]
        return verb.text + " " + " ".join([ch.text + " #np#" for ch in prep_children])
    else:
        return verb.text + " #np#"

In [85]:
def subtree_flatten(children_list):
    sub = []
    for ch in children_list:
        sub.append(" ".join([word.text for word in ch.subtree]))
    return " ".join(sub)

In [72]:
def verb_tracer_subtree(s):
    # keep the last token tagged as a base-form verb (VB) or past participle (VBN)
    verb = None
    for word in s:
        if word.tag_ in ["VB", "VBN"]:
            verb = word
    if verb is None:
        return ""
    elif len(list(verb.children)) > 0:
        prep_children = [ch for ch in verb.children if ch.dep_ in ["prep", "dobj"]]
        return verb.text + " " + subtree_flatten(prep_children)
#         return verb.text + " " + " ".join([ch.text+" #np#" for ch in prep_children])
    else:
        return verb.text + " #np#"

In [74]:
# compare both functions
for instr_set in nlp_instructions:
    for sent in instr_set:
        print("")
        print("=========")
        print(sent.text)
        print(verb_tracer_simple(sent))
        print(verb_tracer_subtree(sent))


Combine rum and ginger beer in an old-fashioned glass.
Combine #np#
Combine #np#

Add ice and stir.

stir #np#
stir #np#

Place mint leaves and 1 lime wedge into a sturdy glass.
Place #np#
Place #np#

Use a muddler to crush the mint and lime to release the mint oils and lime juice.
release oils #np#
release the mint oils and lime juice

Add 2 more lime wedges and the sugar, and muddle again to release the lime juice.
release juice #np#
release the lime juice

Do not strain the mixture.
strain mixture #np#
strain the mixture

Fill the glass almost to the top with ice.
Fill glass #np# to #np# with #np#
Fill the glass almost to the top with ice

Pour the rum over the ice, and fill the glass with carbonated water.
fill glass #np# with #np#
fill the glass with carbonated water

Stir, taste, and add more sugar if desired.
desired 
desired 

Garnish with the remaining lime wedge.

Garnish with #np#
Garnish with the remaining lime wedge

Measure the tequila, lime juice, sweetened lime juice a

In [91]:
def verb_tracer_smart(s):
    # keep the last token tagged as a base-form verb (VB)
    verb = None
    for word in s:
        if word.tag_ == "VB":
            verb = word
    if verb is None:
        return ""
    elif len(list(verb.children)) > 0:
        prep_children = [ch for ch in verb.children if ch.dep_ == "prep"]
        dobj_children = [ch for ch in verb.children if ch.dep_ == "dobj"]
        return verb.text + " " + subtree_flatten(dobj_children) + " " + " ".join([ch.text + " #np#" for ch in prep_children])
#         return verb.text + " " + subtree_flatten(prep_children)
#         return verb.text + " " + " ".join([ch.text+" #np#" for ch in prep_children])
    else:
        return verb.text + " #np#"

In [92]:
# almost-finished function
for instr_set in nlp_instructions:
    for sent in instr_set:
        print("")
        print("=========")
        print(sent.text)
        print(verb_tracer_smart(sent))


Combine rum and ginger beer in an old-fashioned glass.
Combine #np#

Add ice and stir.

stir #np#

Place mint leaves and 1 lime wedge into a sturdy glass.
Place #np#

Use a muddler to crush the mint and lime to release the mint oils and lime juice.
release the mint oils and lime juice 

Add 2 more lime wedges and the sugar, and muddle again to release the lime juice.
release the lime juice 

Do not strain the mixture.
strain the mixture 

Fill the glass almost to the top with ice.
Fill the glass to #np# with #np#

Pour the rum over the ice, and fill the glass with carbonated water.
fill the glass with #np#

Stir, taste, and add more sugar if desired.
add more sugar 

Garnish with the remaining lime wedge.



Measure the tequila, lime juice, sweetened lime juice and triple sec into a cocktail shaker and add a generous scoop of ice.
add a generous scoop of ice 

Cover and shake until the shaker is frosty, about 30 seconds.

shake  

Rub a lime wedge around the rim of a margarita glass a

In [154]:
def verb_tracer(s):
    # look for the verb in the sentence; the last token tagged VB wins
    verb = None
    for word in s:
        if word.tag_ == "VB":
            verb = word
    # if there's no verb, return ""
    if verb is None:
        return ""
    # if the verb has children, go through them looking for "prep" and "dobj"
    elif len(list(verb.children)) > 0:
        # get the prep
        prep_children = [ch for ch in verb.children if ch.dep_ == "prep"]
        # join them for tracery
        prep_text = " ".join([ch.text + " #np#" for ch in prep_children])
        # get the dobj
        dobj_children = [ch for ch in verb.children if ch.dep_ == "dobj"]
        # join them as a string
        dobj_text = ""
        for ch in dobj_children:
            dobj_text += " ".join([word.text for word in ch.subtree])
        # get the noun_chunks from the sentence and replace them with
        # a tracery placeholder in the dobj_text
        chunks = s.noun_chunks
        for ch in chunks:
            dobj_text = dobj_text.replace(ch.text, "#np#")
        # return the beautiful phrase
        return verb.text + " " + dobj_text + " " + prep_text
    # else, just return the verb + a tracery placeholder
    else:
        return verb.text + " #np#"
    

In [151]:
verb_tracer(nlp_instructions[1][1])

'release #np# oils and #np# juice '

In [139]:
aux_chunks = nlp_instructions[1][1].noun_chunks
for chunk in aux_chunks:
    print(chunk.text)

a muddler
the mint
lime
the mint oils
lime juice
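
The chunk list explains the `'release #np# oils and #np# juice '` result: `str.replace` swaps in the shorter chunk "the mint" before the longer "the mint oils" gets a chance, and it also matches inside longer words. A possible refinement, sketched here with only the standard library, is to replace the longest chunks first and only on word boundaries. `replace_chunks` is a hypothetical helper, not something spaCy provides, and it takes plain strings so it can be tried without a model.

```python
import re


def replace_chunks(text, chunks, placeholder="#np#"):
    """Replace whole-word occurrences of each chunk, longest chunk first."""
    # Drop empty strings and duplicates, then sort so that
    # "the mint oils" is tried before "the mint".
    ordered = sorted({c for c in chunks if c}, key=len, reverse=True)
    if not ordered:
        return text
    # A single alternation means already-replaced text is never rescanned;
    # the lookarounds stop "apple" from matching inside "pineapple".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(c) for c in ordered) + r")(?!\w)"
    )
    return pattern.sub(placeholder, text)


chunks = ["a muddler", "the mint", "lime", "the mint oils", "lime juice"]
print(replace_chunks("the mint oils and lime juice", chunks))  # → #np# and #np#
```

Inside `verb_tracer`, the call would look like `replace_chunks(dobj_text, [ch.text for ch in s.noun_chunks])`. Because `dobj_text` is built by joining tokens with spaces, a chunk whose text contains a contraction or attached punctuation may still fail to match.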


In [141]:
# final function
for instr_set in nlp_instructions:
    for sent in instr_set:
        print("")
        print("=========")
        print(sent.text)
        print(verb_tracer(sent))


Combine rum and ginger beer in an old-fashioned glass.
Combine #np#

Add ice and stir.

stir #np#

Place mint leaves and 1 lime wedge into a sturdy glass.
Place #np#

Use a muddler to crush the mint and lime to release the mint oils and lime juice.
release #np# oils and #np# juice 

Add 2 more lime wedges and the sugar, and muddle again to release the lime juice.
release #np# 

Do not strain the mixture.
strain #np# 

Fill the glass almost to the top with ice.
Fill #np# to #np# with #np#

Pour the rum over the ice, and fill the glass with carbonated water.
fill #np# with #np#

Stir, taste, and add more sugar if desired.
add #np# 

Garnish with the remaining lime wedge.



Measure the tequila, lime juice, sweetened lime juice and triple sec into a cocktail shaker and add a generous scoop of ice.
add #np# of #np# 

Cover and shake until the shaker is frosty, about 30 seconds.

shake  

Rub a lime wedge around the rim of a margarita glass and dip in salt.
Rub #np# around #np#

Fill each 

Finally, it works as I want it! :D



Now, on to the actual instruction scraping...

In [145]:
instr_finish = [instr[-1].text.strip() for instr in nlp_instructions]

In [159]:
instr_nlp_body = []
for instr in nlp_instructions:
    # everything except the closing sentence of each recipe
    instr_nlp_body.extend(instr[:-1])
instr_body = [verb_tracer(instr).strip() for instr in instr_nlp_body]
remove_all('', instr_body)

In [160]:
instr_body

['Combine #np#',
 'Place #np#',
 'release #np# oils and #np# juice',
 'release #np#',
 'strain #np#',
 'Fill #np# to #np# with #np#',
 'fill #np# with #np#',
 'add #np#',
 'add #np# of #np#',
 'shake',
 'Rub #np# around #np#',
 'Fill #np# with #np#',
 'serve',
 'release #np#',
 'release #np#',
 'stir #np# , #np# , #np# , #np# , and #np# pickle juice In #np#',
 'adjust #np#',
 'stir #np# , #np# and #np# #np# In #np#',
 'Pour #np# onto #np#',
 'coat #np#',
 'Fill #np# with #np#',
 'Pour #np# of #np# into #np#',
 'Fill  with #np#',
 'Fill #np# with #np#',
 'Pour  in #np#',
 'Add #np# and #np#',
 'Blend  for #np#',
 "Stir the Pimm 's liqueur and #np# in #np#",
 'Add #np# , #np# , orange , #np# , #np# , pine#np# , #np# , and #np#',
 'Stir  until #np#',
 'Add #np# of #np#',
 'combine #np# , #np# , #np# , #np# and #np# In #np#',
 'Pour #np# into #np# with #np#',
 'strain  into #np#',
 'Mix #np# of your favorite sparkling white to #np#',
 'Pour #np# , #np# and #np# into #np#',
 'strain  into #

In [106]:
instr_finish

['Add ice and stir.',
 'Garnish with the remaining lime wedge.',
 'Garnish with a lime wedge.',
 'Fill glasses with ice cubes and top with carbonated water; stir.',
 'Garnish each glass with a wedge of lime and a dill pickle spear.',
 'Serve in margarita glasses with the rims dipped in powdered sugar.',
 'Refrigerate until cold, or serve over ice.',
 'Adjust with additional water, if needed.',
 'Pour into glasses, and serve immediately.',
 'Garnish with a lime twist.',
 'Enjoy!',
 'Garnish with a piece of pineapple and a cherry on a skewer.',
 'Stir.']
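
The `#np#` markers in `instr_body` are written as Tracery-style `#symbol#` placeholders. As a minimal sketch of how such a template could later be filled in, the snippet below does the substitution by hand with `re.sub`. It is a stand-in for illustration, not the Tracery library itself, and the noun phrases in `rules` are hypothetical.

```python
import random
import re


def expand(template, rules, rng=random):
    """Replace every #symbol# in template with a random choice from rules[symbol]."""
    def pick(match):
        return rng.choice(rules[match.group(1)])
    return re.sub(r"#(\w+)#", pick, template)


# Hypothetical noun phrases; in practice they would come from the ingredient lists.
rules = {"np": ["the rum", "ice cubes", "a lime wedge", "the glass"]}
rng = random.Random(0)  # seeded so that reruns produce the same sentence
print(expand("Fill #np# with #np#", rules, rng))
```

Each placeholder is drawn independently, so the same noun phrase can appear twice in one sentence. A symbol that is missing from `rules` raises a `KeyError`, which makes typos in templates easy to spot.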