# Distance Model Integration
BS"D

In this notebook, I will explore how to integrate the filtering model and the similar ingredients model to create the distance model.

The API will accept a list of ingredients (a recipe) and a set of dietary restrictions. It will return a new recipe that meets the dietary restrictions and is similar to the original recipe.

In [50]:
import pandas as pd
import torch.nn as nn
from collections import OrderedDict

from filtering_model.filtering_model import FilteringModel
from similar_ingredients.get_similar_ingredients import load_model as load_similar_ingredients_model

## Load the Models
### Load the Filtering Model
This will be based on how the filtering model is loaded in the "Data Filtering" notebook.

It seems like the following steps are needed to initialize the filtering model:
1. Load the training data
2. Define the column name with the ingredient names
3. Specify the embeddings to use
4. Build the internal model

After this, the model must be trained before it can be used to filter the data.

In [51]:
# Load the training data
file_path = "../../data_preparation/classification_dataset/common_ingredients.csv"
training_data = pd.read_csv(file_path)

training_data

| | ingredient | vegetarian | vegan | dairy_free | gluten_free |
|---|---|---|---|---|---|
| 0 | salt | yes | yes | yes | yes |
| 1 | olive oil | yes | yes | yes | yes |
| 2 | onions | yes | yes | yes | yes |
| 3 | water | yes | yes | yes | yes |
| 4 | garlic | yes | yes | yes | yes |
| ... | ... | ... | ... | ... | ... |
| 494 | boneless chicken breast | no | no | yes | yes |
| 495 | crème fraîche | yes | no | no | yes |
| 496 | cooked white rice | yes | yes | yes | yes |
| 497 | pecans | yes | yes | yes | yes |


In [52]:
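# Build the internal model: a small multi-label classifier on top of the embeddings
internal_model = nn.Sequential(OrderedDict([
    # The input size (768) must match the dimension of the embedding vectors
    ('fc1', nn.Linear(768, 256)),
    ('relu1', nn.LeakyReLU()),
    ('bn1', nn.BatchNorm1d(256)),
    ('fc2', nn.Linear(256, 64)),
    ('dr1', nn.Dropout(0.3)),
    ('relu2', nn.LeakyReLU()),
    ('bn2', nn.BatchNorm1d(64)),
    # One output per dietary label in the training data (4 labels)
    ('fc3', nn.Linear(64, 4)),
    # Sigmoid (not softmax): each label gets its own independent score in [0, 1]
    ('sg1', nn.Sigmoid())
]))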

In [53]:
ingredient_column = 'ingredient'
embedding_model = "facebook/drama-base"

filtering_model = FilteringModel(training_data, ingredient_column, embedding_model, internal_model)

In [54]:
# Train the model
filtering_model.train_model(epochs=20, batch_size=33, val_split=0.2)

Epoch: 0 | Train Loss: 8.483355224132538 | Val Loss: 10.219291388988495 | Val Acc: tensor(0.7475)
Epoch: 1 | Train Loss: 7.165204107761383 | Val Loss: 9.542191326618195 | Val Acc: tensor(0.8075)
Epoch: 2 | Train Loss: 6.677533596754074 | Val Loss: 8.989309251308441 | Val Acc: tensor(0.9250)
Epoch: 3 | Train Loss: 5.982009142637253 | Val Loss: 8.295326918363571 | Val Acc: tensor(0.9450)
Epoch: 4 | Train Loss: 5.8170821368694305 | Val Loss: 7.254605531692505 | Val Acc: tensor(0.9475)
Epoch: 5 | Train Loss: 5.1021237671375275 | Val Loss: 7.0185666382312775 | Val Acc: tensor(0.9325)
Epoch: 6 | Train Loss: 4.955486953258514 | Val Loss: 6.294499754905701 | Val Acc: tensor(0.9175)
Epoch: 7 | Train Loss: 4.780937641859055 | Val Loss: 5.422311872243881 | Val Acc: tensor(0.9375)
Epoch: 8 | Train Loss: 4.096723735332489 | Val Loss: 6.057175487279892 | Val Acc: tensor(0.9225)
Epoch: 9 | Train Loss: 3.6709144115448 | Val Loss: 5.460329428315163 | Val Acc: tensor(0.9550)
Epoch: 10 | Train Loss: 3.34

In [55]:
# Try it out on a hamburger recipe
recipe = ["beef", "onion", "garlic", "salt", "pepper", "cheese", "lettuce", "tomato", "bun"]

filtering_model.filter(recipe, threshold=0.5)

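| | ingredient | vegetarian | vegan | dairy_free | gluten_free |
|---|---|---|---|---|---|
| 0 | beef | no | no | yes | yes |
| 1 | onion | yes | yes | yes | yes |
| 2 | garlic | yes | yes | yes | yes |
| 3 | salt | yes | yes | yes | yes |
| 4 | pepper | yes | yes | yes | yes |
| 5 | cheese | yes | no | no | yes |
| 6 | lettuce | yes | yes | yes | yes |
| 7 | tomato | yes | yes | yes | yes |
| 8 | bun | yes | no | yes | yes |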


### Load the Similar Ingredients Model

In [56]:
file_path = "similar_ingredients/all_ingredients.json"

similar_ingredients_model = load_similar_ingredients_model(file_path)

In [57]:
# Get similar ingredients for each ingredient in the recipe
similar_ingredients = {}

for ingredient in recipe:
    # Get similar ingredients
    similar = similar_ingredients_model(ingredient)
    
    # Store the similar ingredients in the dictionary
    similar_ingredients[ingredient] = similar

# Add the results to a dataframe
similar_ingredients_df = pd.DataFrame.from_dict(similar_ingredients, orient='index').reset_index()

similar_ingredients_df

| | index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | beef | (meat, 0.7286377549171448) | (Beef, 0.7230441570281982) | (pork, 0.6601365208625793) | (veal, 0.6273409724235535) | (lamb, 0.5826153755187988) | (chicken, 0.577904999256134) | (Meat, 0.5627167820930481) | (chuck roast, 0.5397898554801941) | (Pork, 0.5307552814483643) | (grassfed beef, 0.5286645293235779) |
| 1 | onion | (onions, 0.8558207750320435) | (cauliflower, 0.702358603477478) | (bell pepper, 0.6711967587471008) | (potatoes, 0.6605713963508606) | (garlic, 0.6516684293746948) | (asparagus, 0.6511366963386536) | (shallots, 0.6489439606666565) | (tomatoes, 0.6406427621841431) | (eggplant, 0.6400142908096313) | (cabbage, 0.6383322477340698) |
| 2 | garlic | (fennel, 0.7304399609565735) | (oregano, 0.7030937075614929) | (parsley, 0.7022287249565125) | (basil, 0.6971802711486816) | (cilantro, 0.6914703249931335) | (onions, 0.6913739442825317) | (shallots, 0.6845468878746033) | (cabbage, 0.6810933947563171) | (rosemary, 0.680901288986206) | (roasted garlic, 0.6715270280838013) |
| 3 | salt | (Salt, 0.589695155620575) | (brine, 0.5670109391212463) | (garlic salt, 0.5598366856575012) | (kosher salt, 0.5527639985084534) | (Kosher salt, 0.5492516756057739) | (coarse kosher salt, 0.544251561164856) | (coarse salt, 0.5065730810165405) | (beet juice, 0.4978453814983368) | (wasabi powder, 0.48599207401275635) | (unsalted butter, 0.4835364818572998) |
| 4 | pepper | (cumin, 0.6287866830825806) | (capsicum, 0.6136705875396729) | (garlic, 0.607893705368042) | (peppercorns, 0.604446291923523) | (coriander, 0.6012625694274902) | (sesame seeds, 0.5909296870231628) | (paprika, 0.5800351500511169) | (mustard seeds, 0.5795686841011047) | (cinnamon, 0.576282262802124) | (pepper flakes, 0.575015127658844) |
| 5 | cheese | (goat cheese, 0.7297402620315552) | (Cheese, 0.7286962270736694) | (cheddar cheese, 0.725513756275177) | (Cheddar cheese, 0.6943709254264832) | (Camembert, 0.6623162627220154) | (mozzarella cheese, 0.6550924777984619) | (havarti, 0.6534529328346252) | (camembert, 0.6466249227523804) | (Mozzarella cheese, 0.6437987685203552) | (ricotta, 0.6399509310722351) |
| 6 | lettuce | (spinach, 0.7631042003631592) | (tomatoes, 0.7614434957504272) | (iceberg lettuce, 0.7238020896911621) | (broccoli, 0.715250551700592) | (leaf lettuce, 0.7119097709655762) | (romaine lettuce, 0.7091070413589478) | (cantaloupe, 0.688593864440918) | (asparagus, 0.6758716106414795) | (celery, 0.6734380722045898) | (cauliflower, 0.6686928272247314) |
| 7 | tomato | (tomatoes, 0.8442263007164001) | (lettuce, 0.7069937586784363) | (asparagus, 0.7050933837890625) | (peaches, 0.6938520073890686) | (cherry tomatoes, 0.6897530555725098) | (strawberries, 0.6832595467567444) | (cantaloupe, 0.6780219078063965) | (celery, 0.675195574760437) | (spinach, 0.6682621240615845) | (cauliflower, 0.668158769607544) |
| 8 | bun | (buns, 0.7008675932884216) | (ciabatta roll, 0.5964027643203735) | (toasted buns, 0.5916598439216614) | (ciabatta, 0.5904300808906555) | (shredded cheddar cheese, 0.5894902944564819) | (mashed potatoes, 0.5863385200500488) | (french bread, 0.585828423500061) | (breadstick, 0.5819747447967529) | (cutlet, 0.5814052224159241) | (baguette, 0.5781702995300293) |


## Integrate the Models
The final model will have three steps:
1. Flag the ingredients that violate the dietary restrictions (using the filtering model)
2. Find similar ingredients for the flagged ingredients (using the similar ingredients model)
3. Pick a new ingredient from the similar ingredients list that is not flagged (using the filtering model)
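
The three steps can be illustrated end to end without loading the real models. This is a minimal, self-contained sketch under stated assumptions: the lookup tables `LABELS` and `SIMILAR` and the helpers `violates` and `substitute` are hypothetical stand-ins for the filtering model and the similar ingredients model, and their labels and scores are made up purely for the example.

```python
# Hypothetical toy data standing in for the two real models
LABELS = {
    "beef": {"vegan": "no", "gluten_free": "yes"},
    "onion": {"vegan": "yes", "gluten_free": "yes"},
    "pork": {"vegan": "no", "gluten_free": "yes"},
    "tofu": {"vegan": "yes", "gluten_free": "yes"},
    "mushrooms": {"vegan": "yes", "gluten_free": "yes"},
}
SIMILAR = {
    "beef": [("pork", 0.66), ("tofu", 0.52), ("mushrooms", 0.50)],
}


def violates(ingredient, restrictions):
    # Violation if any requested restriction is labelled "no" (unknown ingredients pass)
    labels = LABELS.get(ingredient, {})
    return any(labels.get(r) == "no" for r in restrictions)


def substitute(recipe, restrictions, num_suggestions=3):
    # Step 1: split the recipe into compliant ingredients and violations
    non_violations = [i for i in recipe if not violates(i, restrictions)]
    violations = [i for i in recipe if violates(i, restrictions)]

    substitutes = {}
    for ingredient in violations:
        # Step 2: look up similar ingredients as (name, score) pairs, best first
        candidates = sorted(SIMILAR.get(ingredient, []), key=lambda p: p[1], reverse=True)
        # Step 3: keep only candidates that pass the same check
        allowed = [name for name, _ in candidates if not violates(name, restrictions)]
        substitutes[ingredient] = allowed[:num_suggestions]
    return non_violations, substitutes


print(substitute(["beef", "onion"], ["vegan"]))
# (['onion'], {'beef': ['tofu', 'mushrooms']})
```

The return shape mirrors the `(non_violations, substitutes)` tuple produced by the combined function later in this notebook.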

I'd like to define a few hyperparameters that control the model: the filtering-model threshold used in step 1 (`threshold_1`), the threshold used in step 3 (`threshold_2`), and the number of similar ingredients to retrieve in step 2 (`num_similar_ingredients`).

In [66]:
threshold_1 = 0.5
threshold_2 = 0.5
num_similar_ingredients = 20

In [71]:
# Step 1: Flag the ingredients that violate the dietary restrictions
def flag_violations(recipe, dietary_restrictions, threshold=0.5):
    '''
    Flag ingredients in the recipe that violate dietary restrictions.

    Parameters
    -----------
    recipe : list
        List of ingredients in the recipe.

    dietary_restrictions : list
        List of dietary restrictions to check against. (Can be "vegan", "vegetarian", "gluten_free", or "dairy_free")

    threshold : float
        Decision threshold passed to `filtering_model.filter`.

    Returns
    --------
    non_violations : list
        List of ingredients that do not violate the dietary restrictions.

    violations : list
        List of ingredients that violate the dietary restrictions.
    '''

    # First run the recipe through the (global) filtering model
    filtered_recipe = filtering_model.filter(recipe, threshold=threshold)

    # Then convert it to a list of dictionaries, one per ingredient
    recipe_dict = filtered_recipe.to_dict(orient='records')

    # Split the recipe into non-violations and violations
    non_violations = []
    violations = []

    for ingredient in recipe_dict:
        # An ingredient violates the restrictions if any of them is labelled "no"
        violation = any(ingredient[restriction] == "no" for restriction in dietary_restrictions)

        if violation:
            violations.append(ingredient['ingredient'])
        else:
            non_violations.append(ingredient['ingredient'])

    return non_violations, violations

In [72]:
restrictions = ["vegan", "gluten_free"]

non_violations, violations = flag_violations(recipe, restrictions, threshold=threshold_1)

print("Non-violations:", ", ".join(non_violations))
print("Violations:", ", ".join(violations))

Non-violations: onion, garlic, salt, pepper, lettuce, tomato
Violations: beef, cheese, bun
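
The loop inside `flag_violations` can also be written as one vectorized pandas expression over the DataFrame that `filtering_model.filter` returns. A minimal sketch, assuming the output has the same yes/no columns shown in the hamburger example; the small frame below is built by hand from that output rather than produced by the model.

```python
import pandas as pd

# Hand-built frame in the same shape as the filtering model's output above
filtered = pd.DataFrame({
    "ingredient": ["beef", "onion", "cheese", "bun"],
    "vegetarian": ["no", "yes", "yes", "yes"],
    "vegan": ["no", "yes", "no", "no"],
    "dairy_free": ["yes", "yes", "no", "yes"],
    "gluten_free": ["yes", "yes", "yes", "yes"],
})

restrictions = ["vegan", "gluten_free"]

# True for each row where at least one requested restriction is labelled "no"
is_violation = (filtered[restrictions] == "no").any(axis=1)

violations = filtered.loc[is_violation, "ingredient"].tolist()
non_violations = filtered.loc[~is_violation, "ingredient"].tolist()
print(violations)      # ['beef', 'cheese', 'bun']
print(non_violations)  # ['onion']
```

The boolean mask keeps the original row order, so the two lists come out in recipe order just like the loop-based version.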


In [73]:
# Step 2: Get similar ingredients for the violations
def get_similar_ingredients(violations, top_n=None):
    '''
    Get similar ingredients for the violations.

    Parameters
    -----------
    violations : list
        List of ingredients that violate the dietary restrictions.

    top_n : int, optional
        Number of similar ingredients to return. Defaults to the current value
        of the global `num_similar_ingredients`.

    Returns
    --------
    similar_ingredients : dict
        Dictionary of ingredients and their similar ingredients.
    '''

    # Resolve the default at call time: a default of `num_similar_ingredients`
    # in the signature would be frozen at the value it had when this cell ran
    if top_n is None:
        top_n = num_similar_ingredients

    # Get similar ingredients for each violation
    similar_ingredients = {}

    for ingredient in violations:
        # Store the (name, score) pairs for this ingredient
        similar_ingredients[ingredient] = similar_ingredients_model(ingredient, top_n=top_n)

    return similar_ingredients

In [74]:
similar_ingredients = get_similar_ingredients(violations)

similar_ingredients

{'beef': [('meat', 0.7286377549171448),
  ('Beef', 0.7230441570281982),
  ('pork', 0.6601365208625793),
  ('veal', 0.6273409724235535),
  ('lamb', 0.5826153755187988),
  ('chicken', 0.577904999256134),
  ('Meat', 0.5627167820930481),
  ('chuck roast', 0.5397898554801941),
  ('Pork', 0.5307552814483643),
  ('grassfed beef', 0.5286645293235779),
  ('pork sausages', 0.5261837840080261),
  ('venison', 0.5257208347320557),
  ('sirloin', 0.5230457186698914),
  ('spinach', 0.5227606296539307),
  ('beef sirloin', 0.5160597562789917),
  ('poultry', 0.5132147073745728),
  ('steak', 0.5128119587898254),
  ('hamburger', 0.5067266225814819),
  ('cheese', 0.5016943216323853),
  ('cheddar cheese', 0.5011695027351379)],
 'cheese': [('goat cheese', 0.7297402620315552),
  ('Cheese', 0.7286962270736694),
  ('cheddar cheese', 0.725513756275177),
  ('Cheddar cheese', 0.6943709254264832),
  ('Camembert', 0.6623162627220154),
  ('mozzarella cheese', 0.6550924777984619),
  ('havarti', 0.6534529328346252),
  (

In [76]:
# Step 3: Pick ingredients from the similar ingredients list that do not violate the dietary restrictions
def pick_substitutes(similar_ingredients, dietary_restrictions, num_suggestions=3):
    '''
    Pick ingredients from the similar ingredients list that do not violate the dietary restrictions.

    Parameters
    -----------
    similar_ingredients : dict
        Dictionary of ingredients and their similar ingredients.

    dietary_restrictions : list
        List of dietary restrictions to check against. (Can be "vegan", "vegetarian", "gluten_free", or "dairy_free")

    num_suggestions : int
        Number of suggestions to return for each violation.

    Returns
    --------
    substitutes : dict
        Dictionary of violations and their substitutes.
    '''

    # Get substitutes for each violation
    substitutes = {}
    
    for ingredient, similar in similar_ingredients.items():
        # Convert the similar ingredients to a dictionary from the ingredient to the similarity score
        similar_dict = {i[0]: i[1] for i in similar}

        # Get a list of the similar ingredients
        similar_list = list(similar_dict.keys())

        # Check which of them violate the dietary restrictions
        non_violations, _ = flag_violations(similar_list, dietary_restrictions, threshold=threshold_2)

        # Sort the non-violations by their similarity score
        non_violations = sorted(non_violations, key=lambda x: similar_dict[x], reverse=True)

        # Pick the top n suggestions
        substitutes[ingredient] = non_violations[:num_suggestions]
    return substitutes

In [77]:
picked_substitutes = pick_substitutes(similar_ingredients, restrictions, num_suggestions=5)
picked_substitutes

{'beef': ['veal', 'spinach'],
 'cheese': ['havarti', 'Reblochon', 'fontina', 'Fontina'],
 'bun': ['ciabatta roll',
  'toasted buns',
  'ciabatta',
  'mashed potatoes',
  'cutlet']}
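
The suggestion lists contain spellings that differ only in case, such as 'fontina' and 'Fontina', which take up slots in the top-n results (the neighbours of 'beef' show the same with 'meat'/'Meat' and 'pork'/'Pork'). One possible refinement, which is not part of the pipeline above, is to collapse case variants before filtering. A minimal sketch; `dedupe_case_insensitive` is a hypothetical helper that works on the `(name, score)` pairs the similar ingredients model returns.

```python
def dedupe_case_insensitive(similar):
    """Keep the highest-scoring spelling of each ingredient, ignoring case.

    `similar` is a list of (name, score) pairs; the result is sorted by
    score, highest first.
    """
    best = {}
    for name, score in similar:
        key = name.strip().lower()
        # Keep this spelling only if it beats the one already stored
        if key not in best or score > best[key][1]:
            best[key] = (name, score)
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)


# Pairs copied from the 'beef' neighbours printed earlier
beef_neighbours = [
    ("meat", 0.7286377549171448),
    ("Beef", 0.7230441570281982),
    ("pork", 0.6601365208625793),
    ("Meat", 0.5627167820930481),
    ("Pork", 0.5307552814483643),
]
print([name for name, _ in dedupe_case_insensitive(beef_neighbours)])
# ['meat', 'Beef', 'pork']
```

It could be applied to each list in `similar_ingredients` before `pick_substitutes` runs the filtering model on the candidates.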

### Combine the Steps

In [93]:
def generate_substitutes(recipe, dietary_restrictions, num_suggestions=3):
    '''
    Generate substitutes for the recipe based on the dietary restrictions.

    Parameters
    -----------
    recipe : list
        List of ingredients in the recipe.

    dietary_restrictions : list
        List of dietary restrictions to check against. (Can be "vegan", "vegetarian", "gluten_free", or "dairy_free")

    num_suggestions : int
        Number of suggestions to return for each violation.

    Returns
    --------
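    non_violations : list
        List of ingredients in the recipe that do not violate the dietary restrictions.

    substitutes : dict
        Dictionary of violations and their substitutes.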
    '''

    # Flag the violations
    non_violations, violations = flag_violations(recipe, dietary_restrictions, threshold=threshold_1)

    # Get similar ingredients for the violations
    similar_ingredients = get_similar_ingredients(violations, top_n=num_similar_ingredients)

    # Pick substitutes from the similar ingredients list that do not violate the dietary restrictions
    substitutes = pick_substitutes(similar_ingredients, dietary_restrictions, num_suggestions=num_suggestions)

    return non_violations, substitutes

In [100]:
threshold_1 = 0.5
threshold_2 = 0.9
num_similar_ingredients = 100

generate_substitutes(recipe, restrictions, num_suggestions=5)

(['onion', 'garlic', 'salt', 'pepper', 'lettuce', 'tomato'],
 {'beef': ['spinach',
   'potatoes',
   'fresh spinach',
   'Russet potatoes',
   'asparagus'],
  'cheese': ['Taleggio',
   'grana padano',
   'spicy salami',
   'robiola',
   'pepperoni slices'],
  'bun': ['mashed potatoes',
   'vermicelli noodles',
   'Mashed potatoes',
   'lavash',
   'corn muffin']})
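
The combined function returns the compliant ingredients and a dictionary of suggestions, while the demonstration at the end of this notebook shows `DistanceModel.generate_substitutes` returning a single recipe list. The code of `integrated_model.py` is not shown here, so the following is only a sketch of one way to assemble such a list: `build_recipe` is a hypothetical helper, and keeping the original ingredient when there is no suggestion is an assumption. The inputs reuse the output above, keeping only the first suggestion for each violation.

```python
def build_recipe(recipe, non_violations, substitutes):
    """Hypothetical helper: turn (non_violations, substitutes) into a recipe.

    Each flagged ingredient is replaced by its first suggestion; if no
    suggestion exists, the original ingredient is kept (an assumption).
    """
    allowed = set(non_violations)
    new_recipe = []
    for ingredient in recipe:
        if ingredient in allowed:
            new_recipe.append(ingredient)
        else:
            suggestions = substitutes.get(ingredient, [])
            new_recipe.append(suggestions[0] if suggestions else ingredient)
    return new_recipe


recipe = ["beef", "onion", "garlic", "salt", "pepper", "cheese", "lettuce", "tomato", "bun"]
non_violations = ["onion", "garlic", "salt", "pepper", "lettuce", "tomato"]
substitutes = {"beef": ["spinach"], "cheese": ["Taleggio"], "bun": ["mashed potatoes"]}
print(build_recipe(recipe, non_violations, substitutes))
# ['spinach', 'onion', 'garlic', 'salt', 'pepper', 'Taleggio', 'lettuce', 'tomato', 'mashed potatoes']
```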

## Encapsulate the Model
The final model will be encapsulated in a class with the following methods:
1. `__init__`: Initialize the model and get it ready for use.
2. `generate_substitutes`: Take a list of ingredients and a list of dietary restrictions, and return a new recipe that meets the dietary restrictions and is similar to the original recipe.
3. `set_hyperparameters`: Set the hyperparameters for the model.
4. `get_hyperparameters`: Return the hyperparameters for the model.
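
A minimal outline of such a class, for illustration only: `DistanceModelSketch` is hypothetical and is not the actual `DistanceModel` in `integrated_model.py`, whose code is not shown. To keep it self-contained, the two models are passed in as callables, `violates(ingredient, restrictions, threshold)` and `find_similar(ingredient, top_n)`, instead of being loaded and trained in `__init__`; the default hyperparameters are the values from the last tuning cell, and the toy data in the usage part is made up.

```python
from typing import Callable, List, Tuple


class DistanceModelSketch:
    """Hypothetical outline of the encapsulated model (not integrated_model.py).

    The models are injected as callables so the sketch runs on its own:
      violates(ingredient, restrictions, threshold) -> bool
      find_similar(ingredient, top_n) -> list of (name, score) pairs
    """

    def __init__(self,
                 violates: Callable[[str, List[str], float], bool],
                 find_similar: Callable[[str, int], List[Tuple[str, float]]]):
        self._violates = violates
        self._find_similar = find_similar
        # Defaults taken from the last tuning cell in this notebook
        self._hyperparameters = {
            "threshold_1": 0.5,
            "threshold_2": 0.9,
            "num_similar_ingredients": 100,
        }

    def set_hyperparameters(self, **kwargs):
        # Reject names that are not known hyperparameters
        unknown = set(kwargs) - set(self._hyperparameters)
        if unknown:
            raise KeyError(f"Unknown hyperparameters: {sorted(unknown)}")
        self._hyperparameters.update(kwargs)

    def get_hyperparameters(self):
        # Return a copy so callers cannot change the settings by accident
        return dict(self._hyperparameters)

    def generate_substitutes(self, recipe, restrictions):
        hp = self._hyperparameters
        new_recipe = []
        for ingredient in recipe:
            # Step 1: keep ingredients that already satisfy the restrictions
            if not self._violates(ingredient, restrictions, hp["threshold_1"]):
                new_recipe.append(ingredient)
                continue
            # Step 2: retrieve similar ingredients, most similar first
            candidates = self._find_similar(ingredient, hp["num_similar_ingredients"])
            candidates = sorted(candidates, key=lambda pair: pair[1], reverse=True)
            # Step 3: take the most similar candidate that passes the check
            allowed = [name for name, _ in candidates
                       if not self._violates(name, restrictions, hp["threshold_2"])]
            # Keep the original ingredient if nothing suitable is found (assumption)
            new_recipe.append(allowed[0] if allowed else ingredient)
        return new_recipe


# Toy stand-ins for the real models (hypothetical data)
toy_labels = {"beef": {"vegan": "no"}, "pork": {"vegan": "no"}, "tofu": {"vegan": "yes"}}


def toy_violates(ingredient, restrictions, threshold):
    # This lookup ignores the threshold; a real classifier would use it
    return any(toy_labels.get(ingredient, {}).get(r) == "no" for r in restrictions)


def toy_similar(ingredient, top_n):
    return [("pork", 0.66), ("tofu", 0.52)][:top_n]


sketch = DistanceModelSketch(toy_violates, toy_similar)
sketch.set_hyperparameters(num_similar_ingredients=20)
print(sketch.generate_substitutes(["beef", "onion"], ["vegan"]))  # ['tofu', 'onion']
```

Injecting the models keeps the substitution logic testable in isolation; the real class can instead build and train the filtering model inside `__init__`, as the training log in the demonstration suggests.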

### Demonstration

In [1]:
# Import the integrated model
from integrated_model import DistanceModel

In [2]:
model = DistanceModel()

Epoch: 0 | Train Loss: 8.657139897346497 | Val Loss: 10.499389290809631 | Val Acc: tensor(0.6725)
Epoch: 1 | Train Loss: 7.112708389759064 | Val Loss: 10.069274127483368 | Val Acc: tensor(0.6975)
Epoch: 2 | Train Loss: 6.71885359287262 | Val Loss: 9.647980272769928 | Val Acc: tensor(0.7350)
Epoch: 3 | Train Loss: 6.155917048454285 | Val Loss: 8.878890573978424 | Val Acc: tensor(0.8725)
Epoch: 4 | Train Loss: 5.699215829372406 | Val Loss: 7.940732270479202 | Val Acc: tensor(0.9275)
Epoch: 5 | Train Loss: 5.762453943490982 | Val Loss: 6.358504146337509 | Val Acc: tensor(0.9250)
Epoch: 6 | Train Loss: 4.791776031255722 | Val Loss: 6.685418605804443 | Val Acc: tensor(0.9000)
Epoch: 7 | Train Loss: 4.54699245095253 | Val Loss: 8.164930701255798 | Val Acc: tensor(0.8975)
Epoch: 8 | Train Loss: 4.318123668432236 | Val Loss: 7.095163434743881 | Val Acc: tensor(0.9025)
Epoch: 9 | Train Loss: 4.061548143625259 | Val Loss: 7.679188013076782 | Val Acc: tensor(0.8625)
Epoch: 10 | Train Loss: 3.3184

In [7]:
recipe = ["beef", "onion", "garlic", "salt", "pepper", "cheese", "lettuce", "tomato", "bun"]
restrictions = ["vegan", "gluten_free"]

In [8]:
model.generate_substitutes(recipe, restrictions)

['veal',
 'onion',
 'garlic',
 'salt',
 'pepper',
 'havarti',
 'lettuce',
 'tomato',
 'bun']