# Notes for deployment

### A. Below Input files are read by this notebook, they need to be deployed to the AWS first:

1. "core-data_recipe.csv": recipe detail data (recipe_id, direction, ingredient list, nutrition fact, etc.)
Used in Recipe similarity calculation and recommendation. 
Link: https://github.com/wliu24/Capstone_Fall2020_Nutrition/blob/master/final_data_file/core-data_recipe.csv.zip

2. "recipesall.csv": recipe detail with more granular ingredient data
Used in ingredient similarity calculation and subsitition recommendation. 
Link: https://github.com/wliu24/Capstone_Fall2020_Nutrition/blob/master/final_data_file/recipesall.csv

3. "a_userinput.json": user input via the App (App output) regarding pregnancy stage, ingredient and nutrients preference, etc.
Link: https://github.com/wliu24/Capstone_Fall2020_Nutrition/blob/master/sample_JSON_file_new/a_userinput.json

4. "Nutrition_intake_reference.csv" : nutrient intake recommendation benchmark data, used to generate heatmap and flag
https://github.com/wliu24/Capstone_Fall2020_Nutrition/blob/master/final_data_file/Nutrition_intake_reference.csv

5. "Nutrient_level_nutrition_fact.csv":  


### B. Notebook Output is a Datafame, including:

1. Five Recommended recipe's recipe ID 
   And the details for the Five recommended recipes: direction, ingredient list, time, etc, for App display purpose
2. App Heatmap data:  Amount, %of the recommended intake amount, deficiency flag, for all nutrients to construct the Heatmap on the App
3. Ingredient subsitution list

### C. As Jennifer Suggested, a JSON file will be built up within Lambda function based on Notebook's output along with the above data files, for the App to read.
This target JSON could be found in:
https://github.com/wliu24/Capstone_Fall2020_Nutrition/blob/master/sample_JSON_file_new/b_recipelist_heatmap_recommendation_horizontal.json

1. Note1: The recipe_id should be join with file #1 above to get recipe detail for App display. The csv file is very straightforward. Or you could just solely use the Notebook output Dataframe to contrusct this JSON without join other flat files.

2. Note2: Any suggestion on the JSON file structure change could be discussed with Suzy directly to make front end code change; also make sure the upload the new JSON file structure to the same gitbut folder. 

In [2]:
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import numpy as np
import collections
import json
import gensim 
import ast
from gensim.models import word2vec, phrases
from gensim.parsing.preprocessing import remove_stopwords, strip_punctuation, strip_numeric,\
                    strip_non_alphanum, strip_multiple_whitespaces, strip_short
from textblob import TextBlob, Word

import re
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
%matplotlib inline


from strsimpy.cosine import Cosine

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)




In [3]:
recipe = pd.DataFrame()
for chunk in pd.read_csv('raw-data_recipe.csv',  chunksize=1000):
    recipe = pd.concat([recipe, chunk], ignore_index=True)
    
recipe = recipe.drop(columns=['image_url', 'reviews'])
recipe['nutritions'] = recipe['nutritions'].replace("\'", "\"").apply(ast.literal_eval)
recipe.head()

Unnamed: 0,recipe_id,recipe_name,aver_rate,review_nums,ingredients,cooking_directions,nutritions
0,222388,Homemade Bacon,5.0,3,pork belly^smoked paprika^kosher salt^ground b...,{'directions': u'Prep\n5 m\nCook\n2 h 45 m\nRe...,"{'niacin': {'hasCompleteData': False, 'name': ..."
1,240488,"Pork Loin, Apples, and Sauerkraut",4.764706,29,sauerkraut drained^Granny Smith apples sliced^...,{'directions': u'Prep\n15 m\nCook\n2 h 30 m\nR...,"{'niacin': {'hasCompleteData': False, 'name': ..."
2,218939,Foolproof Rosemary Chicken Wings,4.571429,12,chicken wings^sprigs rosemary^head garlic^oliv...,"{'directions': u""Prep\n20 m\nCook\n40 m\nReady...","{'niacin': {'hasCompleteData': True, 'name': '..."
3,87211,Chicken Pesto Paninis,4.625,163,focaccia bread quartered^prepared basil pesto^...,{'directions': u'Prep\n15 m\nCook\n5 m\nReady ...,"{'niacin': {'hasCompleteData': True, 'name': '..."
4,245714,Potato Bacon Pizza,4.5,2,red potatoes^strips bacon^Sauce:^heavy whippin...,{'directions': u'Prep\n20 m\nCook\n45 m\nReady...,"{'niacin': {'hasCompleteData': True, 'name': '..."


In [4]:
sampledf1 = pd.DataFrame(recipe['nutritions'].values.tolist()).applymap(lambda x: x.get('amount', np.nan) \
                        if isinstance(x, dict) else np.nan)

In [71]:
# 'iron' mg, 'calcium' mg, 'folate' mcg, 'protein' g, 'vitaminA' iu, 'vitaminB6' mg, 'vitaminC' mg

new_recipe_file = pd.concat([recipe, sampledf1], axis=1)
df = new_recipe_file[['recipe_id', 'recipe_name', 'ingredients', 'cooking_directions', 'nutritions','iron', 'calcium',
                       'folate', 'protein', 'vitaminA', 'vitaminB6', 'vitaminC']]
original_df = df
df.head()

Unnamed: 0,recipe_id,recipe_name,ingredients,cooking_directions,nutritions,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC
0,222388,Homemade Bacon,pork belly^smoked paprika^kosher salt^ground b...,{'directions': u'Prep\n5 m\nCook\n2 h 45 m\nRe...,"{'niacin': {'hasCompleteData': False, 'name': ...",1.240848,11.18365,2.109131,21.00254,474.2073,0.23298,0.776127
1,240488,"Pork Loin, Apples, and Sauerkraut",sauerkraut drained^Granny Smith apples sliced^...,{'directions': u'Prep\n15 m\nCook\n2 h 30 m\nR...,"{'niacin': {'hasCompleteData': False, 'name': ...",6.622245,135.4538,83.73925,36.39878,73.17785,1.328631,52.76848
2,218939,Foolproof Rosemary Chicken Wings,chicken wings^sprigs rosemary^head garlic^oliv...,"{'directions': u""Prep\n20 m\nCook\n40 m\nReady...","{'niacin': {'hasCompleteData': True, 'name': '...",1.704567,60.08832,6.907802,23.91265,359.364,0.5538,5.307448
3,87211,Chicken Pesto Paninis,focaccia bread quartered^prepared basil pesto^...,{'directions': u'Prep\n15 m\nCook\n5 m\nReady ...,"{'niacin': {'hasCompleteData': True, 'name': '...",5.01162,528.4617,234.2137,32.37537,604.7537,0.273496,18.01502
4,245714,Potato Bacon Pizza,red potatoes^strips bacon^Sauce:^heavy whippin...,{'directions': u'Prep\n20 m\nCook\n45 m\nReady...,"{'niacin': {'hasCompleteData': True, 'name': '...",1.024803,132.2265,49.48131,7.059566,168.3245,0.055718,0.905797


In [72]:
df.replace("[^a-zA-Z ]",", ",regex=True, inplace=True)

In [73]:
df = df.drop(columns = ['cooking_directions', 'nutritions'])

In [74]:
df['details'] = df['recipe_name'].str.lower() + ', '+ df['ingredients'].str.lower()
df.head(1)

Unnamed: 0,recipe_id,recipe_name,ingredients,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC,details
0,222388,Homemade Bacon,"pork belly, smoked paprika, kosher salt, groun...",1.240848,11.18365,2.109131,21.00254,474.2073,0.23298,0.776127,"homemade bacon, pork belly, smoked paprika, ko..."


In [75]:
meat = ['beef', 'veal', 'pork', 'chicken', 'turkey', 'salmon', 'tuna', 'shrimp']
df['Meat'] = df['details'].str.contains('|'.join(meat))

In [76]:
spicy = ['chili', 'sriracha', 'spicy', 'jalapeno', 'scechuan']
df['Spicy'] = df['details'].str.contains('|'.join(spicy))

In [77]:
# Food Allergen
#https://www.fda.gov/food/buy-store-serve-safe-food/what-you-need-know-about-food-allergies
#Milk
#Eggs
#Fish (e.g., bass, flounder, cod)
#Crustacean shellfish (e.g., crab, lobster, shrimp)
#Tree nuts (e.g., almonds, walnuts, pecans)
#Peanuts
#Wheat
#Soybean

In [78]:
df['Soybean'] = df['details'].str.contains('soy')
df['Peanuts'] = df['details'].str.contains('peanut')

milk = ['milk', 'butter', 'yogurt', 'cream', 'cheese', 'gelato', 'half-and-half']
df['Milk'] = df['details'].str.contains('|'.join(milk))

egg = ['marshmallow', 'mayonnaise', 'meringue', 'frostings', 'custard', 'gelato', 'pretzel']
df['Eggs'] = df['details'].str.contains('|'.join(milk))

fish = ['bass', 'flounder', 'cod']
df['Fish'] = df['details'].str.contains('|'.join(fish))

shellfish = ['crab', 'lobster', 'shrimp']
df['Shell_fish'] = df['details'].str.contains('|'.join(shellfish))

treenuts = ['almond', 'walnut', 'pecan']
df['Tree_nuts'] = df['details'].str.contains('|'.join(treenuts))

wheat = ['bread', 'cake', 'pasta', 'farina', 'starch', 'soy sauce']
df['Wheat'] = df['details'].str.contains('|'.join(wheat))



In [79]:
df.head(1)

Unnamed: 0,recipe_id,recipe_name,ingredients,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC,...,Meat,Spicy,Soybean,Peanuts,Milk,Eggs,Fish,Shell_fish,Tree_nuts,Wheat
0,222388,Homemade Bacon,"pork belly, smoked paprika, kosher salt, groun...",1.240848,11.18365,2.109131,21.00254,474.2073,0.23298,0.776127,...,True,False,False,False,False,False,False,False,False,False


In [11]:
#df.shape

In [12]:
#df_veggie = df.loc[df['Meat'] == False]

In [13]:
#df_veggie.head()

In [14]:
#details=[]
#for row in range(0,len(df.index)):
#    details.append(df.iloc[row,3])

In [15]:
#details

In [16]:
# input json  
# 'a_userinput.json' = {
#    "uuid": 1234,
#    "user_id": 1234,
#    "request_id": 1,
#    "request_time": "9/28/2020",
#    "user input": {
#        "motherhood_stage": "pregnancy first trimester",
#        "ingredient_list": ["chicken", "tomato"],
#        "nutrition_focus": ["full-eval","iron"],
#        "if_vegetierian": true,
#        "ingredient_exclusion": ["onion","cilantro","cheese"]
#    }
# }

# Read a_userinput.json

In [23]:
import pandas as pd
df_json_a = pd.read_json (r'a_userinput_2.json')
df_json_a

Unnamed: 0,uuid,user_id,request_id,request_time,user input
eggs,1234,1234,1,2020-09-28,False
excludeIngredients,1234,1234,1,2020-09-28,[beef]
fish,1234,1234,1,2020-09-28,False
includeIngredients,1234,1234,1,2020-09-28,"[chicken, tomato, potato]"
milk,1234,1234,1,2020-09-28,False
nonspicy,1234,1234,1,2020-09-28,False
nutritionPriority,1234,1234,1,2020-09-28,Overall
peanuts,1234,1234,1,2020-09-28,False
shellfish,1234,1234,1,2020-09-28,True
soybean,1234,1234,1,2020-09-28,False


In [25]:
json_input = df_json_a.iloc[3,4]
#json_input

['chicken', 'tomato', 'potato']

In [26]:
ingredients_input = ', '.join(df_json_a.iloc[3,4])
#ingredients_input
#type(ingredients_input)

'chicken, tomato, potato'

In [27]:
ingredients_excl = ', '.join(df_json_a.iloc[1,4])
ingredients_excl

'beef'

In [30]:
stage = str(df_json_a.iloc[10,4])
stage

'Pregnant - First Trimester'

In [16]:
#json_input
#type(json_input)

In [31]:
ingredient_1 = json_input[0]
ingredient_2 = json_input[1]
ingredient_3 = json_input[2]

In [32]:
ingredient_1

'chicken'

In [80]:
df['input'] = ingredients_input
df['ingredient_1'] = json_input[0]
df['ingredient_2'] = json_input[1]
df['ingredient_3'] = json_input[2]
df['stage'] = stage

In [69]:
df.head(1)

Unnamed: 0,recipe_id,recipe_name,ingredients,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC,...,Tree_nuts,Wheat,input,ingredient_1,ingredient_2,ingredient_3,p0,p1,cosine_sim,stage
0,222388,Homemade Bacon,"pork belly, smoked paprika, kosher salt, groun...",1.240848,11.18365,2.109131,21.00254,474.2073,0.23298,0.776127,...,False,False,"chicken, tomato, potato",chicken,tomato,potato,"{'ch': 1, 'hi': 1, 'ic': 1, 'ck': 1, 'ke': 1, ...","{'ho': 1, 'om': 1, 'me': 1, 'em': 1, 'ma': 1, ...",0.285241,Pregnant - First Trimester


In [81]:
cosine = Cosine(2)
df["p0"] = df["input"].apply(lambda s: cosine.get_profile(s)) 
df["p1"] = df["details"].apply(lambda s: cosine.get_profile(s)) 
df["cosine_sim"] = [cosine.similarity_profiles(p0,p1) for p0,p1 in zip(df["p0"],df["p1"])]

df_2 = df.drop(["p0", "p1"], axis=1)

In [82]:
df_3 = df_2[df_2['details'].str.contains('|'.join(ingredients_input))]
df_4 = df_3[df_3['details'].str.contains(ingredient_1)]
df_5 = df_4[df_4['details'].str.contains(ingredients_excl) == False]
#df_5 = df_4[df_4['details'].str.contains('|'.join(ingredients_excl)) == False]

In [83]:
#df_4.head()

In [84]:
df_3.shape

(49698, 27)

In [85]:
df_4.shape

(7915, 27)

In [86]:
df_5.shape

(7658, 27)

In [90]:
output = df_4.nlargest(10, 'cosine_sim')
output

Unnamed: 0,recipe_id,recipe_name,ingredients,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC,...,Fish,Shell_fish,Tree_nuts,Wheat,input,ingredient_1,ingredient_2,ingredient_3,stage,cosine_sim
4126,8905,Golden Vegetable Chicken,"to , pound, whole chicken, onion, potatoes, ...",4.898875,96.44892,77.79769,59.06287,11425.62,1.949879,57.68585,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.697486
2792,241808,"Chicken Stew,","water, chicken tenders, carrots, celery, potat...",2.718017,70.61034,72.65667,21.83167,13212.33,0.766883,25.30492,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.668043
35330,79383,Chunky Tomato Potato Soup,"butter, onions, peeled cubed potatoes, chopped...",1.675372,132.1445,39.62579,4.969843,4292.148,0.41517,25.63989,...,False,False,False,True,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.647649
35359,216719,"Cabbage, Potato, and Tomato Soup","butter, onion, potatoes, celery, garlic, water...",1.771694,77.80801,52.64242,3.463315,470.148,0.417481,46.05133,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.636102
47357,17900,Tomato Florentine Soup II,"chicken stock, tomato sauce, tomato juice, tom...",2.16078,42.6075,46.135,4.13514,3309.487,0.239076,23.6831,...,False,False,False,True,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.63556
537,16398,Chinese Chicken and Potato Soup,"potatoes, carrot, turnip chopped, onion, garli...",2.329285,65.74283,53.49817,11.55067,4036.223,0.848375,56.19033,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.628655
27892,16965,Lower Fat Potato Soup,"onion, celery, fat, free chicken broth, potato...",1.698442,128.4593,43.45811,10.86989,246.524,0.62235,39.92476,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.622903
28763,31587,Stewed Potatoes,"vegetable oil, garlic, large onion, plum tomat...",1.982785,49.92833,45.26167,5.030766,102.87,0.704325,52.79133,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.621556
10423,15737,Quick and Easy Enchiladas,"corn oil, onion, tomato, chicken chunks, salt ...",2.772729,375.6119,25.14292,34.81299,1239.359,0.547886,12.16075,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.61526
29349,15636,Roasted Potato Medley,"russet potato, red potato, sweet potato, olive...",0.566217,16.7425,8.228333,1.952238,3087.364,0.169972,9.150517,...,False,False,False,False,"chicken, tomato, potato",chicken,tomato,potato,Pregnant - First Trimester,0.614085


In [91]:
df_intake = pd.read_csv('nutrition_ref.csv')
df_intake

Unnamed: 0,stage,iron_ref,calcium_ref,folate_ref,protein_ref,vitaminA_ref,vitaminC_ref,vitaminB6_ref
0,Pregnant - First Trimester,27,1000,600,70,770,85,1.9
1,Pregnant - Second Trimester,27,1000,600,80,770,85,1.9
2,Pregnant - Third Trimester,27,1000,600,90,770,85,1.9
3,Breastfeeding,9,1000,500,70,1300,120,2.0


In [93]:
output2 = pd.merge(output,df_intake,how="left",on=None,left_on='stage',right_on='stage',left_index=True,
    right_index=False, sort=True, suffixes=("_1", "_2"),copy=True,indicator=False,validate=None)
output2

Unnamed: 0,recipe_id,recipe_name,ingredients,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC,...,ingredient_3,stage,cosine_sim,iron_ref,calcium_ref,folate_ref,protein_ref,vitaminA_ref,vitaminC_ref,vitaminB6_ref
0,8905,Golden Vegetable Chicken,"to , pound, whole chicken, onion, potatoes, ...",4.898875,96.44892,77.79769,59.06287,11425.62,1.949879,57.68585,...,potato,Pregnant - First Trimester,0.697486,27,1000,600,70,770,85,1.9
0,241808,"Chicken Stew,","water, chicken tenders, carrots, celery, potat...",2.718017,70.61034,72.65667,21.83167,13212.33,0.766883,25.30492,...,potato,Pregnant - First Trimester,0.668043,27,1000,600,70,770,85,1.9
0,79383,Chunky Tomato Potato Soup,"butter, onions, peeled cubed potatoes, chopped...",1.675372,132.1445,39.62579,4.969843,4292.148,0.41517,25.63989,...,potato,Pregnant - First Trimester,0.647649,27,1000,600,70,770,85,1.9
0,216719,"Cabbage, Potato, and Tomato Soup","butter, onion, potatoes, celery, garlic, water...",1.771694,77.80801,52.64242,3.463315,470.148,0.417481,46.05133,...,potato,Pregnant - First Trimester,0.636102,27,1000,600,70,770,85,1.9
0,17900,Tomato Florentine Soup II,"chicken stock, tomato sauce, tomato juice, tom...",2.16078,42.6075,46.135,4.13514,3309.487,0.239076,23.6831,...,potato,Pregnant - First Trimester,0.63556,27,1000,600,70,770,85,1.9
0,16398,Chinese Chicken and Potato Soup,"potatoes, carrot, turnip chopped, onion, garli...",2.329285,65.74283,53.49817,11.55067,4036.223,0.848375,56.19033,...,potato,Pregnant - First Trimester,0.628655,27,1000,600,70,770,85,1.9
0,16965,Lower Fat Potato Soup,"onion, celery, fat, free chicken broth, potato...",1.698442,128.4593,43.45811,10.86989,246.524,0.62235,39.92476,...,potato,Pregnant - First Trimester,0.622903,27,1000,600,70,770,85,1.9
0,31587,Stewed Potatoes,"vegetable oil, garlic, large onion, plum tomat...",1.982785,49.92833,45.26167,5.030766,102.87,0.704325,52.79133,...,potato,Pregnant - First Trimester,0.621556,27,1000,600,70,770,85,1.9
0,15737,Quick and Easy Enchiladas,"corn oil, onion, tomato, chicken chunks, salt ...",2.772729,375.6119,25.14292,34.81299,1239.359,0.547886,12.16075,...,potato,Pregnant - First Trimester,0.61526,27,1000,600,70,770,85,1.9
0,15636,Roasted Potato Medley,"russet potato, red potato, sweet potato, olive...",0.566217,16.7425,8.228333,1.952238,3087.364,0.169972,9.150517,...,potato,Pregnant - First Trimester,0.614085,27,1000,600,70,770,85,1.9


In [97]:
output2['iron_percentage'] = output2['iron']/output2['iron_ref']
output2['calcium_percentage'] = output2['calcium']/output2['calcium_ref']
output2['folate_percentage'] = output2['folate']/output2['folate_ref']
output2['protein_percentage'] = output2['protein']/output2['protein_ref']
output2['vitaminA_percentage'] = (output2['vitaminA']*0.3)/(output2['vitaminA_ref'])
output2['vitaminB6_percentage'] = output2['vitaminB6']/output2['vitaminB6_ref']
output2['vitaminC_percentage'] = output2['vitaminC']/output2['vitaminC_ref']

In [98]:
output2.head()

Unnamed: 0,recipe_id,recipe_name,ingredients,iron,calcium,folate,protein,vitaminA,vitaminB6,vitaminC,...,vitaminA_ref,vitaminC_ref,vitaminB6_ref,iron_percentage,calcium_percentage,folate_percentage,protein_percentage,vitaminA_percentage,vitaminB6_percentage,vitaminC_percentage
0,8905,Golden Vegetable Chicken,"to , pound, whole chicken, onion, potatoes, ...",4.898875,96.44892,77.79769,59.06287,11425.62,1.949879,57.68585,...,770,85,1.9,0.18144,0.096449,0.129663,0.843755,4.45154,1.026252,0.678657
0,241808,"Chicken Stew,","water, chicken tenders, carrots, celery, potat...",2.718017,70.61034,72.65667,21.83167,13212.33,0.766883,25.30492,...,770,85,1.9,0.100667,0.07061,0.121094,0.311881,5.147661,0.403623,0.297705
0,79383,Chunky Tomato Potato Soup,"butter, onions, peeled cubed potatoes, chopped...",1.675372,132.1445,39.62579,4.969843,4292.148,0.41517,25.63989,...,770,85,1.9,0.062051,0.132144,0.066043,0.070998,1.672265,0.218511,0.301646
0,216719,"Cabbage, Potato, and Tomato Soup","butter, onion, potatoes, celery, garlic, water...",1.771694,77.80801,52.64242,3.463315,470.148,0.417481,46.05133,...,770,85,1.9,0.065618,0.077808,0.087737,0.049476,0.183175,0.219727,0.54178
0,17900,Tomato Florentine Soup II,"chicken stock, tomato sauce, tomato juice, tom...",2.16078,42.6075,46.135,4.13514,3309.487,0.239076,23.6831,...,770,85,1.9,0.080029,0.042607,0.076892,0.059073,1.289411,0.125829,0.278625


In [64]:
#result = output[['recipe_id']]

In [100]:
result2 = output2[['recipe_id','recipe_name', 'ingredients', 'protein', 'protein_percentage', 'calcium', 'calcium_percentage', 'iron', 'iron_percentage', 'folate', 'folate_percentage',
                 'vitaminA', 'vitaminA_percentage', 'vitaminC', 'vitaminC_percentage',  'vitaminB6', 'vitaminB6_percentage', 'ingredient_1', 'ingredient_2', 'ingredient_3']]
result2.head(1)

Unnamed: 0,recipe_id,recipe_name,ingredients,protein,protein_percentage,calcium,calcium_percentage,iron,iron_percentage,folate,folate_percentage,vitaminA,vitaminA_percentage,vitaminC,vitaminC_percentage,vitaminB6,vitaminB6_percentage,ingredient_1,ingredient_2,ingredient_3
0,8905,Golden Vegetable Chicken,"to , pound, whole chicken, onion, potatoes, ...",59.06287,0.843755,96.44892,0.096449,4.898875,0.18144,77.79769,0.129663,11425.62,4.45154,57.68585,0.678657,1.949879,1.026252,chicken,tomato,potato


In [100]:
#result2.rename(columns={'protein':'protein_g', 'calcium':'calcium_mg', 'iron':'iron_mg', 'folate':'folate_mcg', 'vitaminA':'vitaminA_i', 'vitaminB6':'vitaminB6_mg', 'vitaminC':'vitaminC_mg'})


In [101]:
result2['p_c_i_pct'] = result2[['protein_percentage','iron_percentage','calcium_percentage']].mean(axis=1)

In [102]:
result2['protein_deficiency'] = np.where(result2['protein_percentage']>=0.33, 0, 1)
result2['iron_deficiency'] = np.where(result2['iron_percentage']>=0.33, 0, 1)
result2['calcium_deficiency'] = np.where(result2['calcium_percentage']>=0.33, 0, 1)
result2['folate_deficiency'] = np.where(result2['folate_percentage']>=0.33, 0, 1)
result2['vitaminA_deficiency'] = np.where(result2['vitaminA_percentage']>=0.33, 0, 1)
result2['vitaminB6_deficiency'] = np.where(result2['vitaminB6_percentage']>=0.33, 0, 1)
result2['vitaminC_deficiency'] = np.where(result2['vitaminC_percentage']>=0.33, 0, 1)

In [103]:
recipe_list = result2.nlargest(3, 'p_c_i_pct')
recipe_list

Unnamed: 0,recipe_id,recipe_name,ingredients,protein,protein_percentage,calcium,calcium_percentage,iron,iron_percentage,folate,...,ingredient_2,ingredient_3,p_c_i_pct,protein_deficiency,iron_deficiency,calcium_deficiency,folate_deficiency,vitaminA_deficiency,vitaminB6_deficiency,vitaminC_deficiency
0,8905,Golden Vegetable Chicken,"to , pound, whole chicken, onion, potatoes, ...",59.06287,0.843755,96.44892,0.096449,4.898875,0.18144,77.79769,...,tomato,potato,0.373881,0,1,1,1,0,0,0
0,15737,Quick and Easy Enchiladas,"corn oil, onion, tomato, chicken chunks, salt ...",34.81299,0.497328,375.6119,0.375612,2.772729,0.102694,25.14292,...,tomato,potato,0.325211,0,1,0,1,0,1,1
0,241808,"Chicken Stew,","water, chicken tenders, carrots, celery, potat...",21.83167,0.311881,70.61034,0.07061,2.718017,0.100667,72.65667,...,tomato,potato,0.161053,1,1,1,1,0,0,1


# Suzy output 
so all I did in the codes above was to keep the recipe_id column in your output table.
Afterwards a join on the original recipes dataframe and your output based on the recipe_id.
I couldn't think of a smart way to do the loop of nutrients so I did a very crass brute force method of writing everything down lol. I only did it for protein and calcium for this example.

In [134]:
final_result = result5.drop(columns = ['food_1', 'food_2', 'food']).rename(columns= ({'substitution': 'substitution3'})
final_result.head(5)

Unnamed: 0,recipe_id,recipe_name,protein,protein_percentage,calcium,calcium_percentage,iron,iron_percentage,folate,folate_percentage,...,iron_deficiency,calcium_deficiency,folate_deficiency,vitaminA_deficiency,vitaminB6_deficiency,vitaminC_deficiency,p_c_i_pct,substitutuion_1,substitutuion_2,substitutuion
2,241808,Chicken Stew,21.8317,0.256843,70.6103,0.0706103,2.71802,0.100667,72.6567,0.121094,...,1,1,1,0,0,1,0.142707,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,...","[egg noodle, chuck, hungarian paprika, beef br..."
2,79383,Chunky Tomato Potato Soup,4.96984,0.0584687,132.144,0.132144,1.67537,0.0620508,39.6258,0.066043,...,1,1,1,0,1,1,0.084221,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,...","[egg noodle, chuck, hungarian paprika, beef br..."
2,8905,Golden Vegetable Chicken,59.0629,0.694857,96.4489,0.0964489,4.89888,0.18144,77.7977,0.129663,...,1,1,1,0,0,0,0.324249,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,...","[egg noodle, chuck, hungarian paprika, beef br..."
2,17900,Tomato Florentine Soup II,4.13514,0.0486487,42.6075,0.0426075,2.16078,0.0800289,46.135,0.0768917,...,1,1,1,0,1,1,0.057095,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,...","[egg noodle, chuck, hungarian paprika, beef br..."
2,16398,Chinese Chicken and Potato Soup,11.5507,0.13589,65.7428,0.0657428,2.32929,0.0862698,53.4982,0.0891636,...,1,1,1,0,0,0,0.095968,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,...","[egg noodle, chuck, hungarian paprika, beef br..."


In [60]:
final_result.columns

Index(['recipe_id', 'recipe_name', 'protein', 'protein_percentage', 'calcium',
       'calcium_percentage', 'iron', 'iron_percentage', 'folate',
       'folate_percentage', 'vitaminA', 'vitaminA_percentage', 'vitaminC',
       'vitaminC_percentage', 'vitaminB6', 'vitaminB6_percentage',
       'ingredient_1', 'ingredient_2', 'ingredient_3', 'protein_deficiency',
       'iron_deficiency', 'calcium_deficiency', 'folate_deficiency',
       'vitaminA_deficiency', 'vitaminB6_deficiency', 'vitaminC_deficiency',
       'substitutuion_1', 'substitutuion_2', 'substitutuion'],
      dtype='object')

In [137]:
output_recipe_details = recipe[['recipe_id', 'recipe_name','ingredients','cooking_directions','nutritions', 'aver_rate', 'review_nums']].rename(columns={'cooking_directions':'recipe_directions', 'ingredients': 'recipe_ingredients'})
output_recipe_details

Unnamed: 0,recipe_id,recipe_name,recipe_ingredients,recipe_directions,nutritions,aver_rate,review_nums
0,222388,Homemade Bacon,pork belly^smoked paprika^kosher salt^ground b...,{'directions': u'Prep\n5 m\nCook\n2 h 45 m\nRe...,"{'niacin': {'hasCompleteData': False, 'name': ...",5.000000,3
1,240488,"Pork Loin, Apples, and Sauerkraut",sauerkraut drained^Granny Smith apples sliced^...,{'directions': u'Prep\n15 m\nCook\n2 h 30 m\nR...,"{'niacin': {'hasCompleteData': False, 'name': ...",4.764706,29
2,218939,Foolproof Rosemary Chicken Wings,chicken wings^sprigs rosemary^head garlic^oliv...,"{'directions': u""Prep\n20 m\nCook\n40 m\nReady...","{'niacin': {'hasCompleteData': True, 'name': '...",4.571429,12
3,87211,Chicken Pesto Paninis,focaccia bread quartered^prepared basil pesto^...,{'directions': u'Prep\n15 m\nCook\n5 m\nReady ...,"{'niacin': {'hasCompleteData': True, 'name': '...",4.625000,163
4,245714,Potato Bacon Pizza,red potatoes^strips bacon^Sauce:^heavy whippin...,{'directions': u'Prep\n20 m\nCook\n45 m\nReady...,"{'niacin': {'hasCompleteData': True, 'name': '...",4.500000,2
5,218545,Latin-Inspired Spicy Cream Chicken Stew,skinless boneless chicken breast halves^diced ...,{'directions': u'Prep\n10 m\nCook\n8 h 15 m\nR...,"{'niacin': {'hasCompleteData': False, 'name': ...",4.605769,85
6,20453,Reuben Sandwich I,rye bread^butter^thinly sliced corned beef^sau...,{'directions': u'Cook\n5 m\nReady In\n5 m\nHea...,"{'niacin': {'hasCompleteData': True, 'name': '...",4.250000,15
7,244856,Turkey Black Bean Burgers,extra lean ground turkey^cooked black beans co...,{'directions': u'Prep\n10 m\nCook\n8 m\nReady ...,"{'niacin': {'hasCompleteData': True, 'name': '...",4.857143,7
8,22402,Cranberry Pork Chops II,pork chops^fresh^white sugar^salt^ground black...,{'directions': u'Prep\n25 m\nCook\n45 m\nReady...,"{'niacin': {'hasCompleteData': True, 'name': '...",4.427481,100
9,258163,Schnitzel Sandwich,skinless boneless chicken breasts^salt and gro...,{'directions': u'Prep\n20 m\nCook\n20 m\nReady...,"{'niacin': {'hasCompleteData': True, 'name': '...",5.000000,1


In [138]:
output_nutrition_details = final_result[['recipe_id', 'protein', 'protein_percentage', 'calcium',
       'calcium_percentage', 'protein_deficiency',
       'calcium_deficiency', 'substitutuion_1','substitutuion_2', ]].rename(columns={'substitutuion_1': 'protein_sub','substitutuion_2': 'calcium_sub'})
#output_nutrition_details['aver_rate'] = 3
#output_nutrition_details['review_nums'] = 50 
output_nutrition_details.head()

Unnamed: 0,recipe_id,protein,protein_percentage,calcium,calcium_percentage,protein_deficiency,calcium_deficiency,protein_sub,calcium_sub
2,241808,21.8317,0.256843,70.6103,0.0706103,1,1,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,..."
2,79383,4.96984,0.0584687,132.144,0.132144,1,1,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,..."
2,8905,59.0629,0.694857,96.4489,0.0964489,0,1,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,..."
2,17900,4.13514,0.0486487,42.6075,0.0426075,1,1,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,..."
2,16398,11.5507,0.13589,65.7428,0.0657428,1,1,"[chicken thigh, chicken breast, chily, cilantr...","[plum tomato, ium tomato, tomato juice, salsa,..."


In [139]:
output_df = output_nutrition_details.join(output_recipe_details, on = 'recipe_id',lsuffix = "_")
example_row = output_df.iloc[0:2]
example_row.iloc[0]

recipe_id_                                                       241808
protein                                                         21.8317
protein_percentage                                             0.256843
calcium                                                         70.6103
calcium_percentage                                            0.0706103
protein_deficiency                                                    1
calcium_deficiency                                                    1
protein_sub           [chicken thigh, chicken breast, chily, cilantr...
calcium_sub           [plum tomato, ium tomato, tomato juice, salsa,...
recipe_id                                                           NaN
recipe_name                                                         NaN
recipe_ingredients                                                  NaN
recipe_directions                                                   NaN
nutritions                                                      

In [140]:
def nutrient_dict(example_row):
    total_nutrients = []
    protein_dict = example_row.nutritions.get('protein')
    protein_dict.pop('hasCompleteData', None)
    protein_dict['benchmark_percentage'] = example_row.get('protein_percentage') 
    protein_dict['benchmark_flag'] = example_row.get('protein_deficiency') 
    protein_dict['cooccurrence_top_list'] = example_row.get('protein_sub')
    protein_dict['raw_nutrition_top_list'] =  ["milk1","kale1","tofu1"]
    calcium_dict = example_row.nutritions.get('calcium')
    calcium_dict.pop('hasCompleteData', None)
    calcium_dict['benchmark_percentage'] = example_row.get('calcium_percentage') 
    calcium_dict['benchmark_flag'] = example_row.get('calcium_deficiency') 
    calcium_dict['cooccurrence_top_list'] = example_row.get('calcium_sub')
    calcium_dict['raw_nutrition_top_list'] =  ["milk2","kale2","tofu2"]
    return  [protein_dict, calcium_dict]

In [141]:
nlist = []
for index, row in example_row.iterrows():
    rowlist = []
    rowlist.extend(nutrient_dict(row))
    nlist.append(rowlist)
nlist
#     nlist.extend(nutrient_dict(row))
# example_row['recipe_nutrition_result']= nlist
# example_row

AttributeError: 'float' object has no attribute 'get'

In [66]:
example_row['recipe_nutrition_result'] = nlist

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [142]:
final_output = example_row[['recipe_id', 'aver_rate', 'review_nums', 'recipe_name', 'recipe_directions', 'recipe_ingredients', 'recipe_nutrition_result']]

KeyError: "['recipe_nutrition_result'] not in index"

In [143]:
final_output

NameError: name 'final_output' is not defined