# MealAdaptor

Provides suggested substitutions for each staple food in a recipe.
<br><br>
Steps:
1. The recipe is passed through the LDA model and assigned to its most probable topic.
2. For each staple ingredient, suggested substitutions are determined using the Word2Vec model trained on that topic's recipes.
3. Alternative substitutions are also determined using the Word2Vec model trained on the entire recipe corpus.
4. A single list of suggested substitutions is chosen based on relatedness to the original ingredient and to one another.
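Steps 2 and 3 depend on nearest-neighbour queries against a Word2Vec model: gensim's `most_similar` ranks vocabulary words by cosine similarity to the query word's vector (returning the top 10 by default). The sketch below reproduces that ranking in plain Python on made-up 3-dimensional vectors. The vectors and the `cosine` / `most_similar` helpers are illustrative assumptions, not gensim's implementation, which works on normalized vectors with matrix operations.

```python
import math

# Toy 3-dimensional "embeddings" with made-up values; real Word2Vec vectors
# have many more dimensions and are learned from the recipe corpus.
vectors = {
    "broccoli":    [0.9, 0.1, 0.0],
    "cauliflower": [0.8, 0.2, 0.1],
    "kale":        [0.7, 0.3, 0.0],
    "sugar":       [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


print([w for w, _ in most_similar("broccoli", vectors)])
# ['cauliflower', 'kale', 'sugar']
```

Words that appear in similar cooking contexts end up with similar vectors, which is why nearby words can serve as candidate substitutes.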

Details regarding preprocessing recipes and training each model are available in the 'clean_combine_recipes' and 'train_models' notebooks, respectively.

Note: Rank ordering of substitutions based on 'avoids' and 'superfoods' is not implemented in this notebook. Furthermore, substitutions are only provided for staple foods in the original recipe. 

In [1]:
import pickle
from gensim import models, corpora
import pandas as pd

### Load LDA and Word2Vec models

In [2]:
## LDA model
lda_model = models.LdaModel.load('lda_model_10p4t_full_training.gensim')
dictionary = corpora.Dictionary.load('lda_10p4t_full_training_dictionary')

In [3]:
## Word2Vec models
model_list = []
for i in range(4):
    file = 'w2v_topic'+ str(i)+ '_fulltrain.model'
    model = models.Word2Vec.load(file)
    model_list.append(model)

full_model = models.Word2Vec.load('full_model_0206.model')

### Load staple foods

In [5]:
df_staples = pd.read_csv('staples_tagged_singular.csv')
all_staples = list(df_staples['AbbrvName'])

### Load dataframe of recipes to modify

See the 'clean_combine_recipes' notebook for the expected format.

In [4]:
with open('sample_recipes.pickle', 'rb') as f:
    df_recipes = pickle.load(f)

### Functions to retrieve ingredient substitutions

Pass a recipe through the LDA model to assign a topic:


In [6]:
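def getTopic(instructions, lda_model, lda_dict):
    ## instructions must already be a list of tokens (doc2bow does not accept a raw string) ##
    bow = lda_dict.doc2bow(instructions)
    weights = lda_model[bow]  # list of (topic_id, probability) pairs
    topic = max(weights, key=lambda x: x[1])[0]
    return topic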
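Topic assignment keeps the single highest-probability topic. For gensim's `LdaModel`, `lda_model[bow]` yields `(topic_id, probability)` pairs (topics below the model's `minimum_probability` threshold may be omitted). The weights below are hypothetical; the sketch shows that taking `max` by probability gives the same topic as sorting and keeping the last element.

```python
# Hypothetical output of lda_model[bow] for one recipe:
# a list of (topic_id, probability) pairs.
weights = [(0, 0.05), (1, 0.62), (2, 0.08), (3, 0.25)]


def dominant_topic(weights):
    """Return the topic_id whose probability is highest."""
    topic_id, _prob = max(weights, key=lambda pair: pair[1])
    return topic_id


# Same answer as the sorted(...)[-1:][0][0] expression (absent ties).
assert dominant_topic(weights) == sorted(weights, key=lambda x: x[1])[-1:][0][0]
print(dominant_topic(weights))  # 1
```

`max` is a single linear pass, whereas sorting does more work only to discard everything but the last element.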

<br>
Determine which food group a staple belongs to: <br>

(group codes: p = protein, v = vegetable, s = spice, f = fruit, g = grain, o = oil, l = liquid)

In [7]:
def getGroup(staple):
    group = df_staples.loc[df_staples['AbbrvName'] == staple, 'Group'].iloc[0]
    return group
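`getGroup` filters the whole staples table on every call and raises an `IndexError` if the staple is not in the table. A precomputed dictionary avoids both. In the notebook it could be built with `dict(zip(df_staples['AbbrvName'], df_staples['Group']))`, assuming each `AbbrvName` appears only once (with duplicates, a dict keeps the last entry whereas `getGroup` returns the first). The table excerpt and the `get_group` helper below are hypothetical.

```python
# Hypothetical excerpt of staples_tagged_singular.csv as (AbbrvName, Group) pairs.
staple_rows = [
    ("broccoli", "v"),
    ("cauliflower", "v"),
    ("oregano", "s"),
    ("olive oil", "o"),
]

# Build the lookup once; each query is then a single dict access
# rather than a boolean filter over the whole table.
group_of = dict(staple_rows)


def get_group(staple, default=None):
    """Return the food-group code for `staple`, or `default` if it is unknown."""
    return group_of.get(staple, default)


print(get_group("oregano"))  # s
print(get_group("tofu"))     # None
```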

<br>
Determine the coherence (relatedness) of a list of suggested substitutions, measured as the number of suggestions that fall into the same food group as the original ingredient:

In [8]:
def evalSubs(subs_list, item_group):
    ## subs_list holds (word, similarity) pairs; item_group is the original ingredient's food group ##
    if subs_list:
        groups = [getGroup(x[0]) for x in subs_list]
        coherence = len([x for x in groups if x == item_group])
        return groups, coherence
    else:
        groups, coherence = ['none'], 0
        return groups, coherence
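A worked example of the coherence score, using a hypothetical group lookup in place of `getGroup`. Three suggestions for a vegetable, two of which are also vegetables, give a coherence of 2; an empty list returns the `'none'` placeholder with a score of 0.

```python
# Hypothetical group codes (v = vegetable, s = spice, p = protein).
group_of = {"cauliflower": "v", "kale": "v", "oregano": "s", "chicken": "p"}


def eval_subs(subs_list, item_group):
    """Mirror of evalSubs: per-suggestion groups plus a same-group count."""
    if not subs_list:
        return ["none"], 0
    groups = [group_of[word] for word, _score in subs_list]
    coherence = sum(1 for g in groups if g == item_group)
    return groups, coherence


# Suggestions are (word, similarity) pairs, as returned by most_similar.
subs = [("cauliflower", 0.81), ("oregano", 0.77), ("kale", 0.75)]
print(eval_subs(subs, "v"))  # (['v', 's', 'v'], 2)
```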

<br>
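Determine which set of substitutions to suggest:

(The topic-specific list is preferred if its top suggestion is in the same food group as the original ingredient; failing that, the full-corpus list is used if its top suggestion is. Otherwise the list with the higher coherence wins, with ties going to the topic-specific list.)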

In [9]:
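def chooseModel(item, topical_list, full_list):
    item_group = getGroup(item)
    groups_topical, coherence_topical = evalSubs(topical_list, item_group)
    groups_full, coherence_full = evalSubs(full_list, item_group)
    ## 1. prefer the topic-specific list if its top suggestion shares the item's food group ##
    if groups_topical[0] == item_group:
        return topical_list
    ## 2. otherwise use the full-corpus list if its top suggestion does ##
    elif groups_full[0] == item_group:
        return full_list
    ## 3. otherwise the more coherent list; ties go to the topic-specific list ##
    elif coherence_topical >= coherence_full:
        return topical_list
    else:
        return full_list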
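The selection rule can be traced on toy data. In this hypothetical case the topic model's top suggestion for broccoli is a spice, while the full-corpus model's top suggestion is a vegetable, so rule 2 selects the full-corpus list. The group codes, suggestion lists, and the `eval_subs` / `choose` helpers are illustrative stand-ins for `evalSubs` and `chooseModel`.

```python
# Hypothetical group codes (v = vegetable, s = spice, p = protein).
group_of = {"broccoli": "v", "cauliflower": "v", "kale": "v",
            "oregano": "s", "garlic": "s", "chicken": "p"}


def eval_subs(subs_list, item_group):
    """Per-suggestion groups plus the count that match item_group."""
    if not subs_list:
        return ["none"], 0
    groups = [group_of[word] for word, _score in subs_list]
    return groups, sum(1 for g in groups if g == item_group)


def choose(item, topical_list, full_list):
    """Same decision order as chooseModel."""
    item_group = group_of[item]
    groups_t, coh_t = eval_subs(topical_list, item_group)
    groups_f, coh_f = eval_subs(full_list, item_group)
    if groups_t[0] == item_group:      # 1. topical top hit in the same group
        return topical_list
    if groups_f[0] == item_group:      # 2. full-corpus top hit in the same group
        return full_list
    # 3. otherwise the more coherent list; ties go to the topical list
    return topical_list if coh_t >= coh_f else full_list


topical = [("oregano", 0.80), ("garlic", 0.78), ("kale", 0.70)]
full = [("cauliflower", 0.85), ("chicken", 0.60)]
print(choose("broccoli", topical, full)[0][0])  # cauliflower
```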

<br>
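Primary function to get substitutions:
<br>
Chooses between the two candidate lists (topic-specific and full-corpus) and returns the results as a dataframe. If the staple is in the full-corpus vocabulary but not in the topic model's, only the full-corpus model is used; if it is absent from the full-corpus vocabulary, a single 'no suggestions' row is returned.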

In [10]:
def getSubs(staple, sub_model, full_model):

    ## `word in model.wv` works in gensim 3.x and 4.x (`wv.vocab` was removed in 4.0) ##
    if staple in full_model.wv and staple in sub_model.wv:

        ## get top 3 suggested substitutions from each model ##
        similar_foods = sub_model.wv.most_similar(staple)
        similar_staples = [x for x in similar_foods if x[0] in all_staples][:3]
        full_similar = full_model.wv.most_similar(staple)
        full_similar_staples = [x for x in full_similar if x[0] in all_staples][:3]

        ## choose which list of suggestions to present ##
        suggestions = chooseModel(staple, similar_staples, full_similar_staples)

    ## fallback if staple only in full vocabulary ##
    elif staple in full_model.wv:
        full_similar = full_model.wv.most_similar(staple)
        suggestions = [x for x in full_similar if x[0] in all_staples][:3]

    ## if staple is not present in the full vocabulary ##
    else:
        suggestions = []

    ## one row per suggestion, or a single placeholder row if there are none ##
    ## (rows are collected in a list because DataFrame.append was removed in pandas 2.0) ##
    rows = [{'ingredient': staple, 'substitute': word, 'rank': rank}
            for rank, (word, _score) in enumerate(suggestions, start=1)]
    if not rows:
        rows.append({'ingredient': staple, 'substitute': 'no suggestions', 'rank': ''})

    return pd.DataFrame(rows, columns=['ingredient', 'substitute', 'rank'])
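Inside `getSubs`, each model's `most_similar` output is filtered down to known staples and then truncated to three. Because the filter runs after `most_similar`'s default top-10 cut, a staple can end up with fewer than three suggestions, or none. The hypothetical sketch below isolates that step with made-up similarity scores; it also stores the staples in a set, which makes each membership test constant-time, whereas the notebook checks against the `all_staples` list. The `top_staples` helper is illustrative only.

```python
# Hypothetical most_similar output: (word, cosine similarity), best first.
similar = [("cauliflower", 0.81), ("roast", 0.79), ("kale", 0.75),
           ("steam", 0.74), ("cashew", 0.70), ("carrot", 0.69)]

# Hypothetical staple vocabulary; a set gives constant-time membership tests.
staples = {"cauliflower", "kale", "cashew", "carrot"}


def top_staples(similar, staples, k=3):
    """Keep only staple foods, preserving similarity order, then take the first k."""
    return [pair for pair in similar if pair[0] in staples][:k]


print([w for w, _ in top_staples(similar, staples)])
# ['cauliflower', 'kale', 'cashew']
```

In the notebook, the set could be built once with `set(all_staples)`.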

### Example Case

In [11]:
recipe_name = 'Vegetarian Lasagna'

row = df_recipes.loc[df_recipes['names'] == recipe_name]
topic = getTopic(row['instructions'].iloc[0], lda_model, dictionary)
sub_model = model_list[topic]

ingr_list = row['ingredients'].iloc[0]
staples_in = [i for i in ingr_list if i in all_staples]

df_subs = pd.DataFrame(columns = ['ingredient', 'substitute', 'rank'])
for staple in staples_in:
    df_subs = pd.concat([df_subs, getSubs(staple, sub_model, full_model)])

In [12]:
df_subs

| | ingredient | substitute | rank |
|---|---|---|---|
| 0 | broccoli | cauliflower | 1 |
| 1 | broccoli | kale | 2 |
| 2 | broccoli | cashew | 3 |
| 0 | tomato | bell pepper | 1 |
| 1 | tomato | oregano | 2 |
| 2 | tomato | olive | 3 |
| 0 | carrot | cabbage | 1 |
| 1 | carrot | mushroom | 2 |
| 2 | carrot | parsley | 3 |
| 0 | zucchini | squash | 1 |
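To read the result per ingredient, the long-format rows can be grouped by ingredient in rank order. The sketch below uses the rows transcribed from the output above; with pandas, `df_subs.groupby('ingredient', sort=False)['substitute'].agg(list)` would give a similar grouping. Placeholder rows whose rank is `''` would need to be filtered out before sorting by rank, since Python cannot compare `''` with an integer.

```python
# Rows transcribed from the df_subs output above: (ingredient, substitute, rank).
rows = [
    ("broccoli", "cauliflower", 1), ("broccoli", "kale", 2), ("broccoli", "cashew", 3),
    ("tomato", "bell pepper", 1), ("tomato", "oregano", 2), ("tomato", "olive", 3),
    ("carrot", "cabbage", 1), ("carrot", "mushroom", 2), ("carrot", "parsley", 3),
    ("zucchini", "squash", 1),
]

# Collect substitutes per ingredient, ordered by rank. sorted() is stable and
# dicts preserve insertion order, so ingredients keep their recipe order.
by_ingredient = {}
for ingredient, substitute, rank in sorted(rows, key=lambda r: r[2]):
    by_ingredient.setdefault(ingredient, []).append(substitute)

for ingredient, subs in by_ingredient.items():
    print(f"{ingredient}: {', '.join(subs)}")
# broccoli: cauliflower, kale, cashew
# tomato: bell pepper, oregano, olive
# carrot: cabbage, mushroom, parsley
# zucchini: squash
```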
