# Prototyping Different Kinds of Problem Formulations including optimization, predictive, and search-based

An optimization problem might be fastest since each constraint can be added nicely. But what to optimize remains important. With a lot of ratings and with personal ratings, could optimize for 'taste' or 'enjoyment' based on a blend of a star rating for the person and the population

The predictive model also would benefit a lot from the crowd advantage. But it differs from the optimization in that it would only try to optimize positive clicks. So if taste isn't the biggest factor, it wouldn't recommend based on taste.

For Reinforcement Learning - it could be modeled with different simultaneous 'agents' controlling to recommendations from each category.

Finally, it can be planned to evolve from manual to AI-based recommendations. But for that to happen, data must be collected correctly and time to viability for each algorithm has to be estimated

### The Baseline Model

Before getting into advanced techniques, will look at the efficacy of the current search-based model on the 450k item dataset. 

Can simulate partitions and indexes by splitting and sorting dataframes

The baseline simulation will be a single-day meal with the plan to use a restaurant food for one meal only.

In [1]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("C:/Users/jvanc494/Documents/nutrition_sm_ss_formatted.csv",encoding='ISO-8859-1')

In [3]:
df = df.drop(['Unnamed: 0'], axis=1)

In [4]:
len(df)

469828

In [5]:
useful_df = df[pd.notnull(df.calories)]

Formatting to make this useable

In [6]:
useful_df.protein_g[useful_df['protein_g'].isnull()] = 0
useful_df.fat_g[useful_df['fat_g'].isnull()] = 0
useful_df.carb_g[useful_df['carb_g'].isnull()] = 0
#useful_df.isnull().sum()
useful_df.protein_g = useful_df.protein_g.astype(int)
useful_df.fat_g = useful_df.fat_g.astype(int)
useful_df.carb_g = useful_df.carb_g.astype(int)
useful_df['prot_prop'] = useful_df.protein_g/(useful_df.protein_g+useful_df.fat_g+useful_df.carb_g)
useful_df['fat_prop'] = useful_df.fat_g/(useful_df.protein_g+useful_df.fat_g+useful_df.carb_g)
useful_df['carb_prop'] = useful_df.carb_g/(useful_df.protein_g+useful_df.fat_g+useful_df.carb_g)

Setting user requirements

In [28]:
prot_req = 209
fat_req = 64
carb_req = 183

Setting group requirements

In [8]:
food_type_grps = useful_df['food_type_grp'].unique()

In [13]:
food_type_grps = sorted(food_type_grps)

In [14]:
food_type_grps

['grocery', 'raw ingredient', 'recipe', 'restaurant']

In [15]:
food_type_grp_vec = {food_type_grps[0]:np.nan,food_type_grps[1]:0,food_type_grps[2]:1,food_type_grps[3]:1}

In [16]:
food_type_grp_vec

{'grocery': nan, 'raw ingredient': 0, 'recipe': 1, 'restaurant': 1}

Function 'add_sorted_item' takes a point in time of the meal and some mealset (usually a subset) and finds the meals within the mealset that most match the proportions of macros needed (this only works for macros). It returns the updated meal plan, the new delta and the new proportions

In [17]:
def add_sorted_item(useful_df, plan_df, delta, proportions):
    delta = np.array([prot_req,fat_req,carb_req])-np.array([sum(plan_df['protein_g']),sum(plan_df['fat_g']),sum(plan_df['carb_g'])])
    proportions = np.array([delta[0]/sum(delta),delta[1]/sum(delta),delta[2]/sum(delta)])
    useful_df['prop_distance'] = np.sqrt((useful_df.prot_prop-proportions[0])**2+(useful_df.fat_prop - proportions[1])**2+(useful_df.carb_prop - proportions[2])**2)# a
    useful_df = useful_df.sort_values(by=['prop_distance'])
# Append closest distance
    plan_df = plan_df.append([useful_df.iloc[0,:]],ignore_index=True)
# Drop used
    useful_df = useful_df.drop(useful_df.index[[0]])

    return plan_df, delta, proportions

Below takes the raw macro requirements and calculates the macros as an array of proportions.
Then it takes the distance between the requirement proportions and the macros for each food item to get initial proportions.
Then sorts the entire food database by that distance. 
Then it assigns the first item of the sorted database to the plan. Then it drops that item from the food database
Then it re-calculates the delta

In [37]:
proportions = np.array([prot_req/sum([prot_req,fat_req,carb_req]),fat_req/sum([prot_req,fat_req,carb_req]),carb_req/sum([prot_req,fat_req,carb_req])])

useful_df['prop_distance'] = np.sqrt((useful_df.prot_prop-proportions[0])**2+(useful_df.fat_prop - proportions[1])**2+(useful_df.carb_prop - proportions[2])**2)

useful_df = useful_df.sort_values(by=['prop_distance'])
plan_df = pd.DataFrame([useful_df.iloc[0,:]])
useful_df = useful_df.drop(useful_df.index[[0]])
delta = np.array([prot_req,fat_req,carb_req])-np.array([sum(plan_df['protein_g']),sum(plan_df['fat_g']),sum(plan_df['carb_g'])])

My food_type_grp requirement is to have a recipe for one meal, a restaurant for another meal, and am ambiguous for all others. 
To accomplish this, am creating seperate breakfast, lunch, dinner and snacks 'meal_df' s

A pretty standard requirement would be to not have all the calories take place in one meal. So will start out with some requirements on *calories* per meal. 

It's important that macros aren't necessarily constrained by meal (especially since further requirements will make this very hard, and it will also remove all possibility of having a little piece of chocolate as a snack

In [19]:
columns = ['food_key','meal_time']
plan_key_df = pd.DataFrame(columns=columns)

This dataframe represents one full day, but is split into meal times. 
This same framework will be useful when abstracting to weekly requirements (veges 3x a week)

In [20]:
plan_key_df

Unnamed: 0,food_key,meal_time


We don't know ahead of time how many food items will occur per meal. So will start out with a vector of meal times

In [21]:
meal_times = ['breakfast', 'lunch', 'dinner', 'snack']

We want each meal to encompass an appropriate amount of calories, so we'll first calculate the calories reqs from macros and then apply the proportion as a constraint

In [29]:
cal_req = prot_req*4 + fat_req*9 + carb_req*4

In [30]:
cal_req

2144

In [31]:
meal_cals = {meal_times[0]:int(cal_req*.2), meal_times[1]:int(cal_req*.3), meal_times[2]:int(cal_req*.4), meal_times[3]:int(cal_req*.1)}

This has to flexible, but a guideline. 

In [32]:
sum(meal_cals.values())

2142

The algorithm has to create a daily meal plan that fits the macro requirements. 
It does not have to achieve this at the meal time level. In fact, the less it only spits out one type of food, the better. 

The algorithm below has only one action: select the item from the given list that has the best fit for the requirements. To start with it, it can be given another option of selecting the correct food type for the particular meal time. 

In order to 'choose' correctly, a Markovian process should occur where the program is aware of the distance to the final requirements and tries to close that distance. But when an item is significantly better in the 'yummy' category, it works to incorporate that one. 

In [None]:
while sum((abs(delta)<np.array([0.05*prot_req,0.05*fat_req,0.05*carb_req])))!=3:
    if sum(delta<0)==3:
        plan_df = plan_df.drop(plan_df.index[[randint(0,len(plan_df)-1)]])
    if j <= 4:
        plan_df, delta, proportions = add_sorted_item(useful_df[useful_df['rand_group']==j], plan_df, delta, proportions)
    else:
        plan_df, delta, proportions = add_sorted_item(useful_df, plan_df, delta, proportions)
    j+=1

In [35]:
meal_cals 

{'breakfast': 428, 'dinner': 857, 'lunch': 643, 'snack': 214}

In [36]:
delta

array([119,  53, 109], dtype=int64)

In [39]:
proportions

array([ 0.45833333,  0.14035088,  0.40131579])

In [41]:
meal_times_food_types = {'breakfast':'grocery_item', 'lunch':'restaurant', 'dinner':'recipe', 'snack':'grocery_item'}

Until there is any data on 'likes', the problem is missing that dimension. In this case we assume there is no other good way to select a food other than 'appropriate group' and proportions / macro

In [1]:
# Defaults to prop fit given good enough macro reqs
def add_prop_fit(useful_df, plan_df, proportions, cal_delta):
    #
    eval_df = useful_df[['food_key','calories', 'prot_prop','carb_prop','fat_prop']]
    #
    eval_df['cal_distance'] =  np.sqrt((eval_df.calories - cal_delta)**2)
    # Doesn't have to be quantile - can just be some percentage close to zero
    eval_df['macro_distance'] = np.sqrt((eval_df.prot_prop-proportions[0])**2+(eval_df.fat_prop - proportions[1])**2+(eval_df.carb_prop - proportions[2])**2) 
    #
    # Could bin the distances in groups of ntile(20), and then pull the best proportions from those groups
    # Would be nice if it had some sort of random process at the start that got narrower and narrower towards reqs.
    eval_df['cal_ntile'] = pd.qcut(eval_df.cal_distance, q=30)
    eval_df = eval_df.sort_values(by=['macro_distance'])
    # Could we select the top n % by a random distribution favoring the top, 
# Append closest distance
    plan_df = plan_df.append([eval_df.iloc[0,:]],ignore_index=True)
    cal_delta = sum(plan_df.calories) - cal_delta
    prot = sum(plan_df.protein_g)
    fat = sum(plan_df.fat_g)
    carb = sum(plan_df.carb_g)
    
    proportions = np.array([prot/(prot+fat+carb),fat/(prot+fat+carb),carb/(prot+fat+carb)])

    return plan_df, cal_delta, proportions

In [45]:
useful_df.columns.values

array(['food_key', 'food_description', 'brand', 'food_type_grp', 'source',
       'ingredients_list', 'serving_size_raw', 'serving_size_val',
       'serving_size_unit', 'calories', 'protein_g', 'fat_g',
       'saturated_fat_g', 'carb_g', 'fiber_g', 'sugar_g', 'sodium_mg',
       'cholesterol_mg', 'calcium_mg', 'iron_mg', 'vit_a_mcg', 'vit_c_mg',
       'prot_prop', 'fat_prop', 'carb_prop', 'prop_distance'], dtype=object)

In [66]:
plan_df = pd.DataFrame()
for meal in meal_times:
    calories = meal_cals[meal]
    cal_delta = calories
    food_type_grp = meal_times_food_types[meal]
    ###### Add Other Constraints on Meal Group Here - including preferences and product mix 
    # This level of granularity doesn't have to be for meal types. Can keep going smaller to courses, sides, etc
    meal_df = pd.DataFrame()
    meal_df, cal_delta, proportions = add_prop_fit(useful_df[useful_df.food_type_grp==food_type_grp], meal_df,proportions, cal_delta)
    
    while abs(cal_delta) >= 0.1*calories:
        meal_df, cal_delta, proportions = add_prop_fit(useful_df[useful_df.food_type_grp==food_type_grp], meal_df,proportions, cal_delta)
    
    plan_df = plan_df.append([meal_df],ignore_index=True)
    

ValueError: Bin edges must be unique: array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]).
You can drop duplicate edges by setting the 'duplicates' kwarg

In [68]:
cal_delta

428

In [67]:
useful_df.calories[0:10]

185754     521.0
351227     170.0
383894     170.0
138695     170.0
116103     160.0
1474       250.0
188998     260.0
170505     540.0
122147     460.0
170438    1266.0
Name: calories, dtype: float64

In [63]:
meal

'snack'

The edit and replace process will also be a very important function