# Find a list of key ingredients 

The idea was to find a "short" list of key ingredients consisting of "short" words.

The final list is at the end; this document mainly describes the process used to find it.

Some facts about the final list: its length is 559, and every ingredient in the training data that appears at least 13 times overall or at least 10 times in one cuisine can be represented by one of the words in the list.

If we choose to work with this key list of ingredients, then we will need to go through and replace every ingredient by a key ingredient/word and delete every ingredient that cannot be replaced.
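A minimal sketch of that replace-or-delete step, assuming a tiny made-up `key` list and made-up recipes rather than the real data. The helper name `to_key_words` is hypothetical, and letting the first matching key word win is a simplification chosen for this sketch.

```python
# Hypothetical miniature key list and recipes, for illustration only.
key = ['soy sauce', 'garlic', 'chicken', 'oil']

recipes = [
    ['low sodium soy sauce', 'minced garlic', 'unicorn tears'],
    ['chicken thighs', 'olive oil'],
]

def to_key_words(ingredients, key):
    """Map each ingredient to the first key word it contains; drop the rest."""
    replaced = []
    for ingredient in ingredients:
        for word in key:
            if word in ingredient:
                replaced.append(word)
                break  # first match wins in this sketch
        # ingredients with no matching key word are simply not kept
    return replaced

cleaned = [to_key_words(r, key) for r in recipes]
print(cleaned)  # [['soy sauce', 'garlic'], ['chicken', 'oil']]
```

Here 'unicorn tears' contains no key word, so it is deleted from its recipe.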

Warnings: some words on the final list are purposely misspelled/shortened to accommodate different spellings or misspellings (for example, 'food colo' for food color/colour, and 'berr' for berry/berries); some words on the final list do not appear to be food (for example, 'min') but were included anyway because they appeared enough times. Thought: maybe use regular expressions and keep whole words.
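A rough sketch of what the regular-expression idea could look like. Plain substring tests let short key words fire inside unrelated words, while a `\b` word boundary restricts matches to whole words. The helper names and the optional plural ending are assumptions made for this example, not something used elsewhere in this notebook.

```python
import re

# Substring matching: short key words also match inside longer, unrelated words.
print('min' in 'cumin')    # True
print('gin' in 'ginger')   # True

def whole_word_pattern(word):
    """Regex matching `word` as a whole word, with an optional plural ending."""
    return re.compile(r'\b' + re.escape(word) + r'(?:s|es)?\b')

def matches(word, ingredient):
    """True if `word` occurs in `ingredient` as a whole word."""
    return whole_word_pattern(word).search(ingredient) is not None

print(matches('gin', 'ginger'))            # False
print(matches('gin', 'dry gin'))           # True
print(matches('tomato', 'plum tomatoes'))  # True
```

Shortened stems such as 'berr' would need their own pattern under this approach, for example `r'\bberr(?:y|ies)\b'`.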

In [17]:
import pandas as pd

In [18]:
## WhatsCooking is the folder my data is in

food = pd.read_json('../WhatsCooking/TrainingData/train.json')

In [19]:
## Getting a list of all ingredients in all recipes
total_ingredients = []
for i in range(len(food['ingredients'])):
    total_ingredients.extend(food['ingredients'][i])
    

## Changing the list of all ingredients to a series
tot_ingred_series = pd.Series(total_ingredients)

In [20]:
# Is there even a point? The copy is not truly deep: editing one of the strings inside one of the lists in the ingredients column also edits the original dataframe.

copy = food.copy()
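The behaviour noted in the comment above is expected: `DataFrame.copy()` copies the column data, but Python objects stored in a column (here, the lists) are not copied recursively, so both frames point at the same list objects. A small sketch on made-up rows, assuming it is acceptable to deep-copy each inner list explicitly; `food_demo` and the other names are illustrative only.

```python
import copy as copy_module  # aliased because the notebook uses `copy` as a variable name

import pandas as pd

# Tiny stand-in for the real dataframe (made-up rows).
food_demo = pd.DataFrame({
    'cuisine': ['greek', 'indian'],
    'ingredients': [['Feta Cheese', 'Olive Oil', 'Oregano'], ['Garam Masala', 'Ghee']],
})

# A regular copy still shares the inner list objects with the original.
shallow = food_demo.copy()
print(shallow['ingredients'][0] is food_demo['ingredients'][0])

# Deep-copy each inner list so edits no longer reach the original.
independent = food_demo.copy()
independent['ingredients'] = food_demo['ingredients'].apply(copy_module.deepcopy)
independent['ingredients'][0][0] = 'feta cheese'

print(food_demo['ingredients'][0][0])    # original value is unchanged
print(independent['ingredients'][0][0])  # edited value
```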

In [21]:
## Lowercasing every ingredient and stripping surrounding whitespace

for i in range(len(copy['ingredients'])):
    for j in range(len(copy['ingredients'][i])):
        copy['ingredients'][i][j] = copy['ingredients'][i][j].casefold().strip()

In [22]:
#total ingredients for the copy
total_ingred_copy = []
for i in range(len(copy['ingredients'])):
    total_ingred_copy.extend(copy['ingredients'][i])

## Getting the distinct ingredients, ordered from most to least frequent
ingred = pd.Series(total_ingred_copy).value_counts().index
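For reference, the normalize, flatten and count steps above can also be sketched with the standard library alone. This is an alternative on made-up recipes, not the code used for the results in this notebook.

```python
from collections import Counter
from itertools import chain

# Made-up recipes standing in for the 'ingredients' column.
recipes = [['Salt', ' garlic'], ['salt', 'Olive Oil'], ['garlic ', 'salt']]

# Normalize case and whitespace, then flatten into one list.
flat = [item.casefold().strip() for item in chain.from_iterable(recipes)]

# Distinct ingredients with their counts, most frequent first.
counts = Counter(flat)
print(counts.most_common())  # [('salt', 3), ('garlic', 2), ('olive oil', 1)]
```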

# Getting the Key Words

Started by scraping a website for basic food words.

In [23]:
import requests
from bs4 import BeautifulSoup

In [24]:
response = requests.get(
    url = "https://www.speaklanguages.com/english/vocab/foods")

soup = BeautifulSoup(response.content, 'html.parser')

In [25]:
response.status_code

200

In [26]:
temp_key = []

for word in soup.find_all('a',{'lang':"en"}):
    temp_key.append(word.text.replace("\n"," ").strip())

Then manually edited temp_key, reducing most entries to a single key word. This was the starting point.

In [16]:
key = ['bacon',
 'beef',
 'chicken',
 'cooked meat',
 'duck',
 'ham',
 'kidneys',
 'lamb',
 'liver',
 'mince',
 'minced beef',
 'paté',
 'salami',
 'sausages',
 'pork',
 'sausage',
 'turkey',
 'veal',
 'apple',
 'apricot',
 'banana',
 'blackberry',
 'blackcurrant',
 'blueberry',
 'cherry',
 'coconut',
 'fig',
 'gooseberry',
 'grape',
 'grapefruit',
 'kiwi',
 'lemon',
 'lime',
 'mango',
 'melon',
 'orange',
 'peach',
 'pear',
 'pineapple',
 'plum',
 'pomegranate',
 'raspberry',
 'redcurrant',
 'rhubarb',
 'strawberry',
 'anchovy',
 'cod',
 'haddock',
 'herring',
 'kipper',
 'mackerel',
 'pilchard',
 'plaice',
 'salmon',
 'sardine',
 'sole',
 'trout',
 'tuna',
 'artichoke',
 'asparagus',
 'aubergine',
 'avocado',
 'beans',
 'beansprouts',
 'beetroot',
 'broad beans',
 'broccoli',
 'brussels sprouts',
 'cabbage',
 'carrot',
 'cauliflower',
 'celery',
 'chilli',
 'courgette',
 'cucumber',
 'garlic',
 'ginger',
 'leek',
 'lettuce',
 'mushroom',
 'onion',
 'peas',
 'pepper',
 'potato',
 'potatoes',
 'pumpkin',
 'radish',
 'rocket',
 'swede',
 'tomato',
 'turnip',
 'spinach',
 'squash',
 'beef',
 'soup',
 'chips',
 'oil',
 'stock',
 'butter',
 'cream',
 'cheese',
 'crème',
 'egg',
 'margarine',
 'milk',
 'cream',
 'yoghurt',
 'baguette',
 'roll',
 'loaf',
 'cake',
 'pastry',
 'quiche',
 'cake',
 'baking powder',
 'flour',
 'cornflour',
 'sugar',
 'yeast',
 'apricots',
 'prunes',
 'dates',
 'raisins',
 'sultanas',
 'cereal',
 'cornflakes',
 'honey',
 'jam',
 'marmalade',
 'muesli',
 'porridge',
 'toast',
 'noodles',
 'pasta',
 'pizza',
 'rice',
 'spaghetti',
 'ketchup',
 'mayonnaise',
 'mustard',
 'pepper',
 'dressing',
 'salt',
 'vinaigrette',
 'vinegar',
 'biscuits',
 'chocolate',
 'crisps',
 'hummus',
 'nuts',
 'olives',
 'peanuts',
 'sweets',
 'walnuts',
 'basil',
 'chives',
 'coriander',
 'dill',
 'parsley',
 'rosemary',
 'sage',
 'thyme',
 'chilli',
 'cinnamon',
 'cumin',
 'curry',
 'nutmeg',
 'paprika',
 'saffron']

Next,

0. Reorganized list based on personal preferences.
1. Checked which ingredients do not contain one of these key words. 
2. Added a word (maybe phrase) for ingredients that appeared many times. 
   Or edited existing word to allow for different spellings.
3. Kept rerunning this code and adding/editing different words until every ingredient that showed up at least 13 times was covered by a key word

Note: This step used the four inputs below.
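The loop described in these steps can be sketched in a few lines. The data below is made up, the helper name `uncovered` is hypothetical, and the threshold of 13 is taken from step 3.

```python
from collections import Counter

# Hypothetical miniature data; the real lists come from train.json.
all_ingredients = ['soy sauce'] * 14 + ['light soy sauce'] * 3 + ['riesling'] * 12 + ['poi']
key = ['garlic', 'onion']

def uncovered(ingredients, key, threshold):
    """Count ingredients that contain no key word and appear at least `threshold` times."""
    counts = Counter(i for i in ingredients if not any(w in i for w in key))
    return {name: c for name, c in counts.items() if c >= threshold}

print(uncovered(all_ingredients, key, 13))  # {'soy sauce': 14}

key.append('soy')  # step 2: add a word for a frequent uncovered ingredient
print(uncovered(all_ingredients, key, 13))  # {}
```

Once the second call returns nothing, every ingredient at or above the threshold is covered and the loop can stop.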

In [78]:
## Replacing words that contain a key word with just the key word

for i in range(len(copy['ingredients'])):
    for word in key:
        for j in range(len(copy['ingredients'][i])):
            if word in copy['ingredients'][i][j]:
                copy['ingredients'][i][j] = word

In [79]:
## checking what ingredients are still not a key ingredient

checks = []

for i in range(len(copy['ingredients'])):
    for j in range(len(copy['ingredients'][i])):
        if copy['ingredients'][i][j] not in key:
            checks.append(copy['ingredients'][i][j])

In [80]:
## basic information about what ingredients could not be replaced by the current key words

pd.Series(checks).value_counts()

soy            12
riesling       12
foie gras      12
jalape         12
preserves      12
               ..
poi             1
petits pois     1
morsels         1
morcilla        1
lop chong       1
Length: 894, dtype: int64

In [32]:
## Getting a list of all ingredients that are still not a key ingredient
## Manually adjust the threshold in the if-line to make the list shorter for sanity

## compute the counts once instead of on every iteration
check_counts = pd.Series(checks).value_counts()
for ingredient, count in check_counts.items():
    if 12 < count: ## threshold adjusted to make list shorter
        print(ingredient)

This is the updated key words list.

In [27]:
key = [# condiments and sauces
'adobo sauce',
'alfredo',
'barbecue sauce',
'chutney',
'enchilada sauce',
'kecap manis',
'ketchup',
'hot sauce',
'maple syrup',
'mayo',
'mustard',
'pepper',
'picante sauce',
'pico de gallo',
'dressing',
'salsa',
'sriracha',
'taco sauce',
'tahini',
'teriyaki',
'tzatziki',
'vinaigrette',
'worcestershire',
# cooking supplies/pantry staples
'angel hair',
'baking powder',
'baking soda',
'bean',
'bonito flake',
'bouillon',
'bucatini',
'bulgur',
'cocoa',
'spray', # as in cooking or nonstick
'corn syrup',
'cornmeal',
'couscous',
'fettuc',
'fish sauce',
'flour',
'fusilli',
'gelatin',
'gemelli',
'ghee',
'hoisin sauce',
'honey',
'lard',
'lentil',
'linguine',
'macaroni',
'masa',
'mirin',
'miso',
'molasses',
'oat',
'oil',
'orecchiette',
'oyster sauce',
'orzo',
'panko',
'penne',
'piloncillo',
'polenta',
'quinoa',
'rice',
'rigatoni',
'rotelle',
'rotini',
'salt',
'semolina',
'shortening',
'shoyu',
'soba',
'soy sauce',
'spaghetti',
'splenda',
'starch',
'stevia',
'stock',
'suet',
'sugar',
'tagliatelle',
'tortellini',
'tortilla',
'turbinado',
'udon',
'urad dal',
'vanilla',
'vinegar',
'water',
'wonton wrapper',
'yeast',
# dairy
'asadero',
'asiago',
'bocconcini',
'butter',
'cheese',
'chevre',
'cotija',
'cream',
'crema',
'crème',
'curd',
'egg',
'fontina',
'half', # for half and half (various spellings)
'gorgonzola',
'gouda',
'margarine',
'mascarpone',
'milk',
'monterey',
'mozzarella',
'paneer',
'parmesan',
'queso',
'ricotta',
'velveeta',
'yog', # for yoghurt and yogurt
'yolk',
# fruit
'açai',
'agave',
'apple',
'apricot',
'banana',
'blackberr',
'blackcurrant',
'blueberr',
'calamansi',
'cantaloupe',
'cherr',
'coconut',
'cranberr',
'currant',
'date',
'fig',
'gooseberr',
'grape',
'grapefruit',
'kiwi',
'kumquat',
'lemon',
'lime',
'mango',
'melon',
'nectarine',
'orange',
'papaya',
'peach',
'pear',
'pineapple',
'plantain',
'plum',
'pomegranate',
'ponzu',
'prune',
'raisin',
'raspberr',
'redcurrant',
'rhubarb',
'strawberr',
'sultana',
'tamarind',
'tangerine',
# meat
'bacon',
'beef',
'chicken',
'chorizo',
'chuck',
'duck',
'guanciale',
'ham',
'hot dog',
'kidney',
'kielbasa',
'lamb',
'liver',
'meat',
'mince',
'mutton',
'oxtail',
'pancetta',
'paté',
'pig',
'prosciutto',
'rabbit',
'rib',
'roast',
'round',
'salami',
'sausag',
'sirloin',
'soppressata',
'steak',
'pork',
'tasso',
'turkey',
'veal',
# premade
'baguette',
'bawang goreng',
'biscuit',
'bread',
'broth',
'bun',
'cake',
'candy',
'cereal',
'chip',
'chocolate',
'ciabatta',
'cookie',
'cracker',
'crepe',
'crouton',
'crust',
'dough',
'dumpling',
'farro',
'gochujang',
'gravy',
'gnocchi',
'guacamole',
'gyoza',
'harissa',
'hummus',
'jam',
'juice',
'kimchi',
'ladyfinger',
'loaf',
'marinara sauce',
'marmalade',
'marshmallow',
'muesli',
'naan',
'noodle',
'passata',
'pasta',
'pastry',
'pesto',
'phyllo',
'pickle',
'pimento',
'pita',
'pizza dough',
'pizza sauce',
'poha',
'porridge',
'quiche',
'ravioli',
'relish',
'roll',
'sambal ulek',
'sauerkraut',
'simple syrup',
'slaw',
'soup',
'taco shell',
'tapenade',
'toast',
'toor dal',
'tostada',
'ziti',
# other (largely nuts, alcohol/drinks)
'ale',
'almond',
'amaretto',
'ancho',
'armagnac',
'arrowroot',
'asafoetida',
'beer',
'bitters',
'brandy',
'cachaca',
'calvados',
'caper',
'capsicum',
'cashew',
'club soda',
'coffee',
'cognac',
'dashi',
'edamame',
'espresso',
'fillet',
'food colo', # for different spelling of color
'frosting',
'gari',
'gin',
'marnier', # for grand marnier
'grit',
'hominy',
'ice',
'jelly',
'kahlúa',
'kalamata',
'kirsch',
'lager',
'liqueur',
'madeira',
'nori',
'nut',
'olive',
'pan dripping',
'peanut',
'pecan',
'pernod',
'piecrust',
'pistachio',
'port',
'rum',
'sake',
'serrano',
'sesame seed',
'sherry',
'shiitake',
'soda',
'sprinkle',
'sprite',
'stout',
'tea',
'tempeh',
'tequila',
'tofu',
'triple sec',
'vermouth',
'vodka',
'walnut',
'whipped topping',
'whiskey',
'wine',
'xanthan gum',
# seafood
'anchov',
'bass',
'calamari',
'catfish',
'caviar',
'clam',
'cod',
'crab',
'crawfish',
'grouper',
'halibut',
'haddock',
'herring',
'kipper',
'lobster',
'mackerel',
'mussel',
'oyster',
'pilchard',
'plaice',
'prawn',
'salmon',
'sardine',
'scallop',
'shrimp',
'snapper',
'sole',
'sprout',
'squid',
'tentacle',
'tilapia',
'trout',
'tuna',
# spices and herbs
'ajwain',
'allspice',
'amchur',
'anise',
'asafetida',
'basil',
'bay le', # matches both bay leaves and bay leaf
'canela',
'cardamo',
'cayenne',
'chervil',
'chili',
'chilli',
'chive',
'clove',
'cinnamon',
'cilantro',
'coriander',
'cumin',
'curry',
'dill',
'epazote',
'file',
'five-spice',
'fleur de sel', 
'gochugaru',
'guajillo',
'herbes de provence',
'jaggery',
'jeera',
'kasuri methi',
'masala',
'mace',
'marjoram',
'methi',
'mint',
'msg',
'nutmeg',
'oregano',
'paprika',
'parsley',
'ras el hanout',
'radicchio',
'rosemary',
'saffron',
'sage',
'seasoning',
'seed',
'sesame',
'shichimi togarashi',
'shiso',
'thyme',
'togarashi',
'meric', # for tumeric and turmeric
# veggies and greens
'artichok',
'arugula',
'asparagus',
'aubergine',
'avocado',
'bamboo',
'beansprout',
'bean',
'beet',
'bok choy',
'broccoli',
'cabbage', 
'carrot',
'cauliflower',
'celery',
'chanterelle',
'chayote',
'chile',
'collard green',
'corn',
'courgette',
'cucumber',
'daikon',
'endive',
'escarole',
'eggplant',
'enokitake',
'fennel',
'fenugreek',
'frisee',
'endive',
'gai lan',
'galangal',
'garlic',
'ginger',
'habanero',
'haricots vert',
'hearts of palm',
'jalapeno',
'jicama',
'kale',
'kelp',
'konbu',
'leek',
'lettuce',
'morel',
'mushroom',
'nopale',
'okra',
'onion',
'parsnip',
'peas',
'pepper',
'poblano',
'potato',
'pumpkin',
'radish',
'rocket',
'romaine',
'rutabaga',
'swede',
'tarragon',
'tomatillo',
'tomato',
'turnip',
'scallion',
'seaweed',
'shallot',
'sorrel',
'spinach',
'squash',
'swiss chard',
'vegetable',
'wakame',
'wasabi',
'yam',
'zucchini',
# want at end
'adobo',
'baking mix',
'base',
'berr',
'bicarbonate of soda',
'brine',
'browning',
'canola',
'cutlet',
'dal',
'dipping sauce',
'drippings',
'essence',
'extract',
'fat',
'fish',
'flavoring',
'floret',
'frond',
'fruit',
'garni',
'glaze',
'gluten',
'green',
'hand',
'herb',
'jack',
'leaf',
'leaves',
'liquid',
'maggi',
'marinade',
'mini bells',
'min',
'rub',
'savory',
'salad',
'sauce',
'seafood',
'shell',
'spice',
'spread',
'stuffing',
'sweetener',
'syrup',
'wheat',
'wrapper']

Lastly, checked the key word list against each cuisine. 

Specifically, we want to make sure that any common ingredient in a specific cuisine is represented.

Anything that showed up ten or more times was added to the end of the key list.
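A compact sketch of this per-cuisine check, using only the standard library and made-up rows. The variable names are illustrative, and the threshold of 10 comes from the sentence above.

```python
from collections import Counter, defaultdict

# Hypothetical (cuisine, ingredients) rows, already reduced to key words where possible.
rows = [('greek', ['ouzo', 'lemon'])] * 10 + [('italian', ['prosecco', 'lemon'])] * 4
key = ['lemon']

# Count, per cuisine, the ingredients that are not key words.
per_cuisine = defaultdict(Counter)
for cuisine, ingredients in rows:
    per_cuisine[cuisine].update(i for i in ingredients if i not in key)

# Words to consider adding: uncovered and seen at least 10 times within one cuisine.
candidates = {
    cuisine: sorted(name for name, c in counts.items() if c >= 10)
    for cuisine, counts in per_cuisine.items()
}
print(candidates)  # {'greek': ['ouzo'], 'italian': []}
```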

In [34]:
## separates data based on cuisine type

italian = copy.loc[copy['cuisine'] == 'italian']
mexican = copy.loc[copy['cuisine'] == 'mexican']
southern_us = copy.loc[copy['cuisine'] == 'southern_us']
indian = copy.loc[copy['cuisine'] == 'indian']
chinese = copy.loc[copy['cuisine'] == 'chinese']
french = copy.loc[copy['cuisine'] == 'french']
cajun_creole = copy.loc[copy['cuisine'] == 'cajun_creole']
thai = copy.loc[copy['cuisine'] == 'thai']
japanese = copy.loc[copy['cuisine'] == 'japanese']
greek = copy.loc[copy['cuisine'] == 'greek']
spanish = copy.loc[copy['cuisine'] == 'spanish']
korean = copy.loc[copy['cuisine'] == 'korean']
vietnamese = copy.loc[copy['cuisine'] == 'vietnamese']
moroccan = copy.loc[copy['cuisine'] == 'moroccan']
british = copy.loc[copy['cuisine'] == 'british']
filipino = copy.loc[copy['cuisine'] == 'filipino']
irish = copy.loc[copy['cuisine'] == 'irish']
jamaican = copy.loc[copy['cuisine'] == 'jamaican']
russian = copy.loc[copy['cuisine'] == 'russian']
brazilian = copy.loc[copy['cuisine'] == 'brazilian']

In [70]:
## This function takes one of the above subdataframes and prints every ingredient not replaced by a key word that appears exactly n times
## first entry must be one of our special subdataframes
## second entry is a number - looking for ingredients that show up that specific number of times

def not_in_key(my_list, n):
    check = []
    for ingredients in my_list['ingredients']:
        for ingredient in ingredients:
            if ingredient not in key:
                check.append(ingredient)
    ## count once, then keep only ingredients appearing exactly n times
    counts = pd.Series(check).value_counts()
    for ingredient, count in counts.items():
        if count == n:
            print(ingredient)
    #print(pd.Series(check).value_counts().head(n))

In [74]:
n = 10

print("Here are word(s) appearing exactly n times not replaced by a key word based on cuisine type.")
print()
print("n equals", n)
print()

print("italian")
not_in_key(italian, n)
print()

print("mexican")
not_in_key(mexican, n)
print()

print("southern_us")
not_in_key(southern_us, n)
print()

print("indian")
not_in_key(indian, n)
print()

print("chinese")
not_in_key(chinese, n)
print()

print("french")
not_in_key(french, n)
print()

print("cajun_creole")
not_in_key(cajun_creole, n)
print()

print("thai")
not_in_key(thai, n)
print()

print("japanese")
not_in_key(japanese, n)
print()

print("greek")
not_in_key(greek, n)
print()

print("spanish")
not_in_key(spanish, n)
print()

print("korean")
not_in_key(korean, n)
print()

print("vietnamese")
not_in_key(vietnamese, n)
print()

print("moroccan")
not_in_key(moroccan, n)
print()

print("british")
not_in_key(british, n)
print()

print("filipino")
not_in_key(filipino, n)
print()

print("irish")
not_in_key(irish, n)
print()

print("jamaican")
not_in_key(jamaican, n)
print()

print("russian")
not_in_key(russian, n)
print()

print("brazilian")
not_in_key(brazilian, n)
print()

Here are word(s) appearing exactly n times not replaced by a key word based on cuisine type.

n equals 10

italian
focaccia
prosecco
chianti

mexican
carnitas

southern_us
nilla wafers

indian
ravva

chinese

french

cajun_creole

thai

japanese

greek
ouzo

spanish

korean

vietnamese

moroccan

british

filipino
lumpia skins

irish

jamaican

russian

brazilian
granola



# Final Key Word List

In [76]:
key = [# condiments and sauces
'adobo sauce',
'alfredo',
'barbecue sauce',
'chutney',
'enchilada sauce',
'kecap manis',
'ketchup',
'hot sauce',
'maple syrup',
'mayo',
'mustard',
'pepper',
'picante sauce',
'pico de gallo',
'dressing',
'salsa',
'sriracha',
'taco sauce',
'tahini',
'teriyaki',
'tzatziki',
'vinaigrette',
'worcestershire',
# cooking supplies/pantry staples
'angel hair',
'baking powder',
'baking soda',
'bean',
'bonito flake',
'bouillon',
'bucatini',
'bulgur',
'cocoa',
'spray', # as in cooking or nonstick
'corn syrup',
'cornmeal',
'couscous',
'fettuc',
'fish sauce',
'flour',
'fusilli',
'gelatin',
'gemelli',
'ghee',
'hoisin sauce',
'honey',
'lard',
'lentil',
'linguine',
'macaroni',
'masa',
'mirin',
'miso',
'molasses',
'oat',
'oil',
'orecchiette',
'oyster sauce',
'orzo',
'panko',
'penne',
'piloncillo',
'polenta',
'quinoa',
'rice',
'rigatoni',
'rotelle',
'rotini',
'salt',
'semolina',
'shortening',
'shoyu',
'soba',
'soy sauce',
'spaghetti',
'splenda',
'starch',
'stevia',
'stock',
'suet',
'sugar',
'tagliatelle',
'tortellini',
'tortilla',
'turbinado',
'udon',
'urad dal',
'vanilla',
'vinegar',
'water',
'wonton wrapper',
'yeast',
# dairy
'asadero',
'asiago',
'bocconcini',
'butter',
'cheese',
'chevre',
'cotija',
'cream',
'crema',
'crème',
'curd',
'egg',
'fontina',
'half', # for half and half (various spellings)
'gorgonzola',
'gouda',
'margarine',
'mascarpone',
'milk',
'monterey',
'mozzarella',
'paneer',
'parmesan',
'queso',
'ricotta',
'velveeta',
'yog', # for yoghurt and yogurt
'yolk',
# fruit
'açai',
'agave',
'apple',
'apricot',
'banana',
'blackberr',
'blackcurrant',
'blueberr',
'calamansi',
'cantaloupe',
'cherr',
'coconut',
'cranberr',
'currant',
'date',
'fig',
'gooseberr',
'grape',
'grapefruit',
'kiwi',
'kumquat',
'lemon',
'lime',
'mango',
'melon',
'nectarine',
'orange',
'papaya',
'peach',
'pear',
'pineapple',
'plantain',
'plum',
'pomegranate',
'ponzu',
'prune',
'raisin',
'raspberr',
'redcurrant',
'rhubarb',
'strawberr',
'sultana',
'tamarind',
'tangerine',
# meat
'bacon',
'beef',
'chicken',
'chorizo',
'chuck',
'duck',
'guanciale',
'ham',
'hot dog',
'kidney',
'kielbasa',
'lamb',
'liver',
'meat',
'mince',
'mutton',
'oxtail',
'pancetta',
'paté',
'pig',
'prosciutto',
'rabbit',
'rib',
'roast',
'round',
'salami',
'sausag',
'sirloin',
'soppressata',
'steak',
'pork',
'tasso',
'turkey',
'veal',
# premade
'baguette',
'bawang goreng',
'biscuit',
'bread',
'broth',
'bun',
'cake',
'candy',
'cereal',
'chip',
'chocolate',
'ciabatta',
'cookie',
'cracker',
'crepe',
'crouton',
'crust',
'dough',
'dumpling',
'farro',
'gochujang',
'gravy',
'gnocchi',
'guacamole',
'gyoza',
'harissa',
'hummus',
'jam',
'juice',
'kimchi',
'ladyfinger',
'loaf',
'marinara sauce',
'marmalade',
'marshmallow',
'muesli',
'naan',
'noodle',
'passata',
'pasta',
'pastry',
'pesto',
'phyllo',
'pickle',
'pimento',
'pita',
'pizza dough',
'pizza sauce',
'poha',
'porridge',
'quiche',
'ravioli',
'relish',
'roll',
'sambal ulek',
'sauerkraut',
'simple syrup',
'slaw',
'soup',
'taco shell',
'tapenade',
'toast',
'toor dal',
'tostada',
'ziti',
# other (largely nuts, alcohol/drinks)
'ale',
'almond',
'amaretto',
'ancho',
'armagnac',
'arrowroot',
'asafoetida',
'beer',
'bitters',
'brandy',
'cachaca',
'calvados',
'caper',
'capsicum',
'cashew',
'club soda',
'coffee',
'cognac',
'dashi',
'edamame',
'espresso',
'fillet',
'food colo', # for different spelling of color
'frosting',
'gari',
'gin',
'marnier', # for grand marnier
'grit',
'hominy',
'ice',
'jelly',
'kahlúa',
'kalamata',
'kirsch',
'lager',
'liqueur',
'madeira',
'nori',
'nut',
'olive',
'pan dripping',
'peanut',
'pecan',
'pernod',
'piecrust',
'pistachio',
'port',
'rum',
'sake',
'serrano',
'sesame seed',
'sherry',
'shiitake',
'soda',
'sprinkle',
'sprite',
'stout',
'tea',
'tempeh',
'tequila',
'tofu',
'triple sec',
'vermouth',
'vodka',
'walnut',
'whipped topping',
'whiskey',
'wine',
'xanthan gum',
# seafood
'anchov',
'bass',
'calamari',
'catfish',
'caviar',
'clam',
'cod',
'crab',
'crawfish',
'grouper',
'halibut',
'haddock',
'herring',
'kipper',
'lobster',
'mackerel',
'mussel',
'oyster',
'pilchard',
'plaice',
'prawn',
'salmon',
'sardine',
'scallop',
'shrimp',
'snapper',
'sole',
'sprout',
'squid',
'tentacle',
'tilapia',
'trout',
'tuna',
# spices and herbs
'ajwain',
'allspice',
'amchur',
'anise',
'asafetida',
'basil',
'bay le', # matches both bay leaves and bay leaf
'canela',
'cardamo',
'cayenne',
'chervil',
'chili',
'chilli',
'chive',
'clove',
'cinnamon',
'cilantro',
'coriander',
'cumin',
'curry',
'dill',
'epazote',
'file',
'five-spice',
'fleur de sel', 
'gochugaru',
'guajillo',
'herbes de provence',
'jaggery',
'jeera',
'kasuri methi',
'masala',
'mace',
'marjoram',
'methi',
'mint',
'msg',
'nutmeg',
'oregano',
'paprika',
'parsley',
'ras el hanout',
'radicchio',
'rosemary',
'saffron',
'sage',
'seasoning',
'seed',
'sesame',
'shichimi togarashi',
'shiso',
'thyme',
'togarashi',
'meric', # for tumeric and turmeric
# veggies and greens
'artichok',
'arugula',
'asparagus',
'aubergine',
'avocado',
'bamboo',
'beansprout',
'bean',
'beet',
'bok choy',
'broccoli',
'cabbage', 
'carrot',
'cauliflower',
'celery',
'chanterelle',
'chayote',
'chile',
'collard green',
'corn',
'courgette',
'cucumber',
'daikon',
'endive',
'escarole',
'eggplant',
'enokitake',
'fennel',
'fenugreek',
'frisee',
'endive',
'gai lan',
'galangal',
'garlic',
'ginger',
'habanero',
'haricots vert',
'hearts of palm',
'jalapeno',
'jicama',
'kale',
'kelp',
'konbu',
'leek',
'lettuce',
'morel',
'mushroom',
'nopale',
'okra',
'onion',
'parsnip',
'pea',
'pepper',
'poblano',
'potato',
'pumpkin',
'radish',
'rocket',
'romaine',
'rutabaga',
'swede',
'tarragon',
'tomatillo',
'tomato',
'turnip',
'scallion',
'seaweed',
'shallot',
'sorrel',
'spinach',
'squash',
'swiss chard',
'vegetable',
'wakame',
'wasabi',
'yam',
'zucchini',
# want at end
'adobo',
'baking mix',
'base',
'berr',
'bicarbonate of soda',
'brine',
'browning',
'canola',
'cutlet',
'dal',
'dipping sauce',
'drippings',
'essence',
'extract',
'fat',
'fish',
'flavoring',
'floret',
'frond',
'fruit',
'garni',
'glaze',
'gluten',
'green',
'hand',
'herb',
'jack',
'leaf',
'leaves',
'liquid',
'maggi',
'marinade',
'mini bells',
'min',
'rub',
'savory',
'salad',
'sauce',
'seafood',
'shell',
'spice',
'spread',
'stuffing',
'sweetener',
'syrup',
'wheat',
'wrapper',
# Specialty words by cuisine, added at end (using the not_in_key function)
# n = 12
'bengal gram',
'dorito',
# n = 11
'achiote',
'dhal',
'ditalini',
'lasagne',
'pappardelle',
'risotto',
# n = 10
'focaccia',
'prosecco',
'chianti',
'carnita',
'wafer',
'ravva',
'ouzo',
'lumpia',
'granola']

Information about ingredients that were not replaced:

1. they appear in the overall ingredient list at most 12 times
2. they appear in each cuisine's ingredient list at most 9 times

In [77]:
## Number of key words

len(key)

559
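Note that `len(key)` counts repeated entries, and a few words (for example 'bean', 'pepper' and 'endive') appear twice in the list above. A minimal sketch for finding such repeats and dropping them while keeping the original order, shown on a short made-up excerpt rather than the full list:

```python
from collections import Counter

# Short excerpt standing in for the full key list.
key_excerpt = ['pepper', 'bean', 'oil', 'bean', 'endive', 'endive', 'pepper']

# Words that occur more than once.
duplicates = [word for word, c in Counter(key_excerpt).items() if c > 1]
print(duplicates)  # ['pepper', 'bean', 'endive']

# dict.fromkeys keeps the first occurrence of each word, in order.
unique_key = list(dict.fromkeys(key_excerpt))
print(unique_key)  # ['pepper', 'bean', 'oil', 'endive']
print(len(key_excerpt), len(unique_key))  # 7 4
```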