# McDonalds and Starbucks: what would be better as meals for teens?
<b>2. McDonalds (McD) breakfasts & dinners menu-maker: a combinatorial approach</b>

The main idea is to build "meaningful" combinations from each restaurant's menu: no repeated items, and no meal made up of only food or only drinks. The second step is to select the combinations that meet certain criteria.

<b>Advantages:</b> 
- Makes it possible not only to answer the question "which restaurant is better" but also to offer the most diverse menu, regardless of the restaurant selected.
- Criteria (both statistical and individual indicators) can be set and changed flexibly, and results can be quickly saved and compared.
- This approach can be considered a sketch, a proof of concept for further development of the solution.

DONE:
0. Insert the cells with the official norm values and their processing from the <b>data_processing</b> notebook.
1. Normalize the values of all menu items by dividing them by the values in the <b>meal_norm</b> dict.
2. Restrict the whole dataset to <b>McD items only</b>.
3. Select the item groups for breakfast / dinner (format: [1st food item, 2nd food item, 3rd drink item]).
4. Build a row-applied <b>meal-maker</b> function (args: item groups) that returns a dataset of combinations of items from the item groups passed as arguments.
5. Calculate the <b>total values</b> (calories, fat, etc.): the sum of each value over the items in a meal combination, for all generated meal combinations.
6. Calculate <b>MEAN</b> and <b>STDEV</b> across the totals (calories, fat, etc.) as metrics of caloric and nutritional balance for each meal combination. If <b>MEAN</b> ≈ 1 and <b>STDEV</b> ≈ 0, the meal combination fits the norms well.
7. Append this portion of the results to the final <b>'results_all'</b> df, then select other item groups in step 3 and repeat the processing.
8. Write <b>'results_all'</b> to .csv.
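
Step 6 can be checked by hand on a single meal. The sketch below takes the six normalized totals of the first "light dinner" combination from the results table in this notebook and computes both metrics with the standard library; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Normalized totals (share of the per-meal norm) for one meal combination,
# copied from row 0 of the results table in this notebook.
totals = {
    'Energy': 0.712644,
    'Fat': 0.756014,
    'Carbohydrates': 0.680918,
    'Fiber': 0.666667,
    'Protein': 0.881226,
    'Sodium': 2.16,
}

values = list(totals.values())
meal_mean = mean(values)
meal_stdev = stdev(values)   # sample standard deviation, the pandas default for .std()

print(round(meal_mean, 6), round(meal_stdev, 6))
```

Here MEAN is close to 1, yet STDEV is far from 0 because sodium alone is more than twice the norm. This is why the two metrics are read together.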

In [1]:
# required imports
import pandas as pd
from itertools import product, combinations
import matplotlib
%matplotlib inline

### Official norms

In [2]:
# Dict with actual norms (from official document-1)
norms = {}

# the context manager closes the file once the norms are read
with open('actual_norms.txt') as f:
    for line in f:
        line = line.strip().split(',')
        norms[line[0]] = line[1]
    
del norms['Item']
norms

{'Energy(kcal)': '2900',
 'Fat(g)': '97',
 'Carbohydrates(g)': '421',
 'Fiber(g)': '20',
 'Protein(g)': '87',
 'Sodium(g)': '1.3'}
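
The values above are stored as strings, which is why later cells wrap them in `float()`. A minimal sketch of converting while reading, assuming the file is plain comma-separated text with one header row; the header's second field (`Norm`) and the name `norms_f` are assumptions, not taken from the original file.

```python
import csv
import io

# Stand-in for 'actual_norms.txt'; the header's second field is an assumption.
sample = io.StringIO(
    "Item,Norm\n"
    "Energy(kcal),2900\n"
    "Fat(g),97\n"
    "Carbohydrates(g),421\n"
    "Fiber(g),20\n"
    "Protein(g),87\n"
    "Sodium(g),1.3\n"
)

reader = csv.reader(sample)
next(reader)   # skip the header row instead of deleting the 'Item' key afterwards
norms_f = {name: float(value) for name, value in reader}

print(norms_f)
```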

In [3]:
# (From official document-2):
# - breakfast and dinner provide 20-25% and 30-35% of the daily energy value respectively;
#   the upper bounds (25% and 35%) are taken because of the sports competitions
# - breakfast + dinner together provide 55-60% of the total daily nutrient value
#   (25% + 35% = 60% at the upper bounds)
# (only breakfast and dinner are mentioned in the task)
# Let's assume that the breakfast and dinner shares are equal -->
# meal_norm (30% each)

# calculating weighted norms:

meal_norm = {}

for key, value in norms.items():
    meal_norm[key] = float(value)*0.3
    
# sum_norm dict for data filtering (to delete items whose values exceed it):
sum_norm = {key:float(value)*0.6 for key, value in norms.items()}
    
print (meal_norm)
print (sum_norm)

{'Energy(kcal)': 870.0, 'Fat(g)': 29.099999999999998, 'Carbohydrates(g)': 126.3, 'Fiber(g)': 6.0, 'Protein(g)': 26.099999999999998, 'Sodium(g)': 0.39}
{'Energy(kcal)': 1740.0, 'Fat(g)': 58.199999999999996, 'Carbohydrates(g)': 252.6, 'Fiber(g)': 12.0, 'Protein(g)': 52.199999999999996, 'Sodium(g)': 0.78}


In [4]:
# Earlier, in the 'data_processing' notebook, we increased the sodium sum
# (breakfast + dinner) norm from 0.78 to 1.0, i.e. by +0.22.
# In turn, this increases the sodium meal_norm by 0.11 g. Both dicts should be updated:
meal_norm['Sodium(g)'] += 0.11
sum_norm['Sodium(g)'] += 0.22
print (meal_norm)
print (sum_norm)

{'Energy(kcal)': 870.0, 'Fat(g)': 29.099999999999998, 'Carbohydrates(g)': 126.3, 'Fiber(g)': 6.0, 'Protein(g)': 26.099999999999998, 'Sodium(g)': 0.5}
{'Energy(kcal)': 1740.0, 'Fat(g)': 58.199999999999996, 'Carbohydrates(g)': 252.6, 'Fiber(g)': 12.0, 'Protein(g)': 52.199999999999996, 'Sodium(g)': 1.0}


In [5]:
# df for final results joining
results_all = pd.DataFrame()

### McD & SB menus

In [6]:
# SB and McD combined and processed menu: 
menu_df = pd.read_csv ('combined_processed.csv', sep=',', encoding = 'koi8-r')
menu_df.head()

Unnamed: 0,McD/SB,Category,Kind,Item,Energy(kcal),Fat(g),Carbohydrates(g),Fiber(g),Protein(g),Sodium(g)
0,SB,food,Bakery,Chonga Bagel,300,5.0,50,3.0,12,0.53
1,SB,food,Bakery,8-Grain Roll,380,6.0,70,7.0,10,0.43
2,SB,food,Bakery,Almond Croissant,410,22.0,45,3.0,10,0.39
3,SB,food,Bakery,Banana Nut Bread,420,22.0,52,2.0,6,0.32
4,SB,food,Bakery,Birthday Cake Pop,170,9.0,23,0.0,1,0.11


In [7]:
# normalizing the whole dataset by meal_norm --> new 'n_' features:
for col, norm in meal_norm.items():
    menu_df['n_' + col] = menu_df[col] / norm

menu_df.head()
# menu_df.to_csv('menu_normalized.csv')

Unnamed: 0,McD/SB,Category,Kind,Item,Energy(kcal),Fat(g),Carbohydrates(g),Fiber(g),Protein(g),Sodium(g),n_Energy(kcal),n_Fat(g),n_Carbohydrates(g),n_Fiber(g),n_Protein(g),n_Sodium(g)
0,SB,food,Bakery,Chonga Bagel,300,5.0,50,3.0,12,0.53,0.344828,0.171821,0.395883,0.5,0.45977,1.06
1,SB,food,Bakery,8-Grain Roll,380,6.0,70,7.0,10,0.43,0.436782,0.206186,0.554236,1.166667,0.383142,0.86
2,SB,food,Bakery,Almond Croissant,410,22.0,45,3.0,10,0.39,0.471264,0.756014,0.356295,0.5,0.383142,0.78
3,SB,food,Bakery,Banana Nut Bread,420,22.0,52,2.0,6,0.32,0.482759,0.756014,0.411718,0.333333,0.229885,0.64
4,SB,food,Bakery,Birthday Cake Pop,170,9.0,23,0.0,1,0.11,0.195402,0.309278,0.182106,0.0,0.038314,0.22
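
The same normalization can also be written as one vectorized division. This is a sketch on two rows copied from the table above, with the `meal_norm` values rounded; it relies on pandas aligning a Series index with the column names.

```python
import pandas as pd

meal_norm = {'Energy(kcal)': 870.0, 'Fat(g)': 29.1, 'Carbohydrates(g)': 126.3,
             'Fiber(g)': 6.0, 'Protein(g)': 26.1, 'Sodium(g)': 0.5}

# Two rows copied from the menu table above.
menu = pd.DataFrame({
    'Item': ['Chonga Bagel', '8-Grain Roll'],
    'Energy(kcal)': [300, 380],
    'Fat(g)': [5.0, 6.0],
    'Carbohydrates(g)': [50, 70],
    'Fiber(g)': [3.0, 7.0],
    'Protein(g)': [12, 10],
    'Sodium(g)': [0.53, 0.43],
})

cols = list(meal_norm)
# Divide every nutrient column by its norm in one call, then prefix the new columns.
normalized = menu[cols].div(pd.Series(meal_norm), axis='columns').add_prefix('n_')
menu = pd.concat([menu, normalized], axis=1)

print(menu[['Item', 'n_Energy(kcal)', 'n_Sodium(g)']])
```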


### Preserve McD menu

In [8]:
# selecting the McD part of the whole df by index; .copy() gives an independent
# frame, so the in-place operations below do not act on a slice of menu_df
# (which would raise SettingWithCopyWarning):
McD_menu = menu_df.iloc[217:].copy()

#  drop cols with 'old' (original, non-normalized) values:
McD_menu.drop(['McD/SB', 'Energy(kcal)', 
               'Fat(g)', 'Carbohydrates(g)', 
               'Fiber(g)', 'Protein(g)', 'Sodium(g)'], axis=1, inplace=True)

McD_menu.reset_index(drop=True, inplace=True)
McD_menu.head ()


Unnamed: 0,Category,Kind,Item,n_Energy(kcal),n_Fat(g),n_Carbohydrates(g),n_Fiber(g),n_Protein(g),n_Sodium(g)
0,food,Hot Breakfast,Egg McMuffin,0.344828,0.446735,0.245447,0.666667,0.651341,1.5
1,food,Hot Breakfast,Egg White Delight,0.287356,0.274914,0.23753,0.666667,0.689655,1.54
2,food,Hot Breakfast,Sausage McMuffin,0.425287,0.790378,0.229612,0.666667,0.536398,1.56
3,food,Hot Breakfast,Sausage McMuffin with Egg,0.517241,0.962199,0.23753,0.666667,0.804598,1.72
4,food,Hot Breakfast,Sausage McMuffin with Egg Whites,0.45977,0.790378,0.23753,0.666667,0.804598,1.76
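
The hard-coded offset `217` breaks as soon as the combined file changes. A possible alternative is to filter on the `'McD/SB'` column. The sketch below uses a toy frame, and the label `'McD'` is an assumption, since only `'SB'` rows are visible in the outputs of this notebook.

```python
import pandas as pd

# Toy frame; the 'McD' label is assumed, the items and values come from this notebook.
menu = pd.DataFrame({
    'McD/SB': ['SB', 'SB', 'McD', 'McD'],
    'Kind': ['Bakery', 'Bakery', 'Hot Breakfast', 'Hot Breakfast'],
    'Item': ['Chonga Bagel', '8-Grain Roll', 'Egg McMuffin', 'Egg White Delight'],
    'n_Energy(kcal)': [0.344828, 0.436782, 0.344828, 0.287356],
})

# Boolean mask instead of a row offset; .copy() makes the result independent of `menu`.
mcd = (menu.loc[menu['McD/SB'] == 'McD']
       .copy()
       .drop(columns=['McD/SB'])
       .reset_index(drop=True))

print(mcd)
```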


In [9]:
# all McD item groups
McD_menu['Kind'].unique()

array(['Hot Breakfast', 'Hot Big Breakfast', 'Sandwiches', 'Burgers',
       'Salads', 'Chiken Snacks', 'Sweets/Desserts', 'Sweets/Snacks',
       'Cold Drinks', 'Hot Drinks'], dtype=object)

In [10]:
# for the McD menu, let's assume meals built from the following item groups. They can be
# combined into a 2-item or 3-item meal. Lists of groups:
McD_first = ['Salads', 'Chiken Snacks']
McD_second = ['Hot Big Breakfast', 'Hot Breakfast', 'Sandwiches', 'Burgers']
McD_desserts = ['Sweets/Desserts', 'Sweets/Snacks']
McD_snacks = ['Chiken Snacks', 'Snacks']
McD_drinks = ['Cold Drinks', 'Hot Drinks']

# 2-item 'solid breakfast':
McD_b = [McD_second, McD_drinks] 

# 2-item 'light breakfast':
McD_light_b = [McD_first, McD_drinks]

# 3-item 'solid dinner':
McD_d = [McD_snacks, McD_desserts, McD_drinks]

# 3-item 'light dinner':
McD_light_d = [McD_first, McD_snacks, McD_drinks]

# 2-item 'snack breakfast/dinner':
McD_snack = [McD_snacks, McD_drinks]

# all element-wise combinations (one group from each list) via the itertools 'product' function
McD_current = list(product(*McD_light_d))

In [11]:
McD_current

[('Salads', 'Chiken Snacks', 'Cold Drinks'),
 ('Salads', 'Chiken Snacks', 'Hot Drinks'),
 ('Salads', 'Snacks', 'Cold Drinks'),
 ('Salads', 'Snacks', 'Hot Drinks'),
 ('Chiken Snacks', 'Chiken Snacks', 'Cold Drinks'),
 ('Chiken Snacks', 'Chiken Snacks', 'Hot Drinks'),
 ('Chiken Snacks', 'Snacks', 'Cold Drinks'),
 ('Chiken Snacks', 'Snacks', 'Hot Drinks')]

In [12]:
# see which (and how many) group combinations a given meal has:
McD_current
len (McD_current)

8
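
Note that `product` by itself does not drop tuples in which the same group appears twice, such as `('Chiken Snacks', 'Chiken Snacks', 'Cold Drinks')` above. Whether such a meal is acceptable is a modelling decision, because two different chicken snacks may still be a valid combination. If it is not, one possible filter (a sketch, not part of the original pipeline) looks like this:

```python
from itertools import product

McD_first = ['Salads', 'Chiken Snacks']
McD_snacks = ['Chiken Snacks', 'Snacks']
McD_drinks = ['Cold Drinks', 'Hot Drinks']

all_combos = list(product(McD_first, McD_snacks, McD_drinks))

# Keep only the tuples whose groups are all different.
unique_combos = [combo for combo in all_combos if len(set(combo)) == len(combo)]

print(len(all_combos), len(unique_combos))
```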

The name of the current list (<b>'McD_light_d'</b> in this example) should be passed to <b>'McD_current'</b> for unpacking and processing:

In [13]:
# Full set of combinations for the 2-item breakfast / 3-item dinner formats from the categories above.
# Not fully automated yet: the group combination is switched manually.

# 3-item or 2-item meals (manual switch):

k1, k2, k3 = [], [], []
# k1, k2 = [], []

def meal_maker (row, args=McD_current[0]): # iterated manually from 0 to len(McD_current)-1
    
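    '''Collect the items that belong to the respective item groups of the current
    'McD_current' combination (3-item or 2-item format) --> 3 (or 2) lists of items.
    Note: `in` below is a substring test on 'Kind', so 'Snacks' also matches
    'Chiken Snacks' and 'Sweets/Snacks'.'''
    
    # Columns are listed one by one because slicing the row caused kernel death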

    if args[0] in row[1]:
        k1.append (list ([row[2], row[3], row[4], row[5], row[6], row[7], row[8]])) 
    if args[1] in row[1]:
        k2.append (list ([row[2], row[3], row[4], row[5], row[6], row[7], row[8]])) 
    if args[2] in row[1]:
        k3.append (list ([row[2], row[3], row[4], row[5], row[6], row[7], row[8]])) 
    
    return k1,k2,k3

McD_menu.apply (meal_maker, axis=1)

# make all element-wise combinations of items (3-item meal by default)
# and create a df with the results
arrs = [k1,k2,k3]
combinator = list(product(*arrs))
df_combi = pd.DataFrame (combinator, columns = ['first', 'second', 'third'])
df_combi.head()

Unnamed: 0,first,second,third
0,"[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Small), 0.1609195402298850..."
1,"[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Medium), 0.229885057471264..."
2,"[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Large), 0.3218390804597701..."
3,"[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Child), 0.1149425287356321..."
4,"[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Diet Coke (Small), 0.0, 0.0, 0.0, 0.0, 0.0, 0..."
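
The manual switch between 2-item and 3-item meals and the global lists could be avoided with a function that takes the menu and one tuple of groups. The sketch below is one possible shape, not the notebook's implementation: `build_combinations`, `VALUE_COLS` and `SLOT_NAMES` are hypothetical names, only one value column is carried to keep it short, and groups are matched exactly rather than by substring.

```python
from itertools import product

import pandas as pd

VALUE_COLS = ['n_Energy(kcal)']          # shortened; the notebook carries six value columns
SLOT_NAMES = ['first', 'second', 'third']


def build_combinations(menu, groups):
    """Return one row per meal, taking one item from each group in `groups`."""
    per_group = [
        menu.loc[menu['Kind'] == group, ['Item'] + VALUE_COLS].values.tolist()
        for group in groups                # exact match on the group name
    ]
    return pd.DataFrame(list(product(*per_group)),
                        columns=SLOT_NAMES[:len(groups)])


# Items and rounded values copied from tables in this notebook.
menu = pd.DataFrame({
    'Kind': ['Hot Breakfast', 'Hot Breakfast',
             'Cold Drinks', 'Cold Drinks', 'Cold Drinks'],
    'Item': ['Egg McMuffin', 'Egg White Delight', 'Coca-Cola Classic (Small)',
             'Coca-Cola Classic (Medium)', 'Diet Coke (Small)'],
    'n_Energy(kcal)': [0.344828, 0.287356, 0.160920, 0.229885, 0.0],
})

combos = build_combinations(menu, ('Hot Breakfast', 'Cold Drinks'))
print(len(combos))
```

The number of columns follows the number of groups, so the same call works for 2-item and 3-item meals.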


In [24]:
df_combi.iloc[0,:]

first     [Premium Bacon Ranch Salad (without Chicken), ...
second    [Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...
third     [Coca-Cola Classic (Small), 0.1609195402298850...
Name: 0, dtype: object

In [26]:
# Add the total caloric and nutrition values to all these combinations
# (yes, this could be written in more Pythonic and elegant code...)

# Row-applied functions: each one sums a single value (calories, fat, etc.) over the
# items of a meal combination, for all generated meal combinations.

# ENERGY total of certain meal combination:
def e_sum (row):
    e_sum = row[0][1] + row[1][1] + row[2][1]
    return e_sum

# FAT total of certain meal combination:
def f_sum (row):   
    f_sum = row[0][2] + row[1][2] + row[2][2]
    return f_sum

# CARBOHYDRATES total of certain meal combination:
def c_sum (row):       
    c_sum = row[0][3] + row[1][3] + row[2][3]        
    return c_sum

# FIBER total of certain meal combination:
def fi_sum (row):
    fi_sum = row[0][4] + row[1][4] + row[2][4]
    return fi_sum

# PROTEIN total of certain meal combination:
def p_sum (row): 
    p_sum = row[0][5] + row[1][5] + row[2][5]
    return p_sum

# SODIUM total of certain meal combination:
def s_sum (row):
    s_sum = row[0][6] + row[1][6] + row[2][6]
    return s_sum

df_combi['Energy'] = df_combi.apply (e_sum, axis=1)
df_combi['Fat'] = df_combi.apply (f_sum, axis=1)
df_combi['Carbohydrates'] = df_combi.apply (c_sum, axis=1)
df_combi['Fiber'] = df_combi.apply (fi_sum, axis=1)
df_combi['Protein'] = df_combi.apply (p_sum, axis=1)
df_combi['Sodium'] = df_combi.apply (s_sum, axis=1)

# full item combinations in terms of total normalized caloric and nutrition values.
# Add cols for whole group name and group combination names:
df_combi.insert(0, 'McD/SB', 'McD_light_dinner')
df_combi.insert(1, 'Kind', str(McD_current[0]))
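
The six near-identical functions differ only in the position they read from each item list, so they can be folded into one loop. This is a sketch on a single hand-made row: the numbers are placeholders chosen for easy checking, not menu data, and `NUTRIENTS` and `item_cols` are illustrative names.

```python
import pandas as pd

NUTRIENTS = ['Energy', 'Fat', 'Carbohydrates', 'Fiber', 'Protein', 'Sodium']

# One combination in the notebook's cell layout: [item name, six normalized values].
# Placeholder numbers, not menu data.
combi = pd.DataFrame({
    'first':  [['item A', 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]],
    'second': [['item B', 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]],
    'third':  [['item C', 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
})

item_cols = ['first', 'second', 'third']   # use ['first', 'second'] for 2-item meals

for pos, name in enumerate(NUTRIENTS, start=1):   # position 0 holds the item name
    combi[name] = [
        sum(cell[pos] for cell in cells)
        for cells in zip(*(combi[col] for col in item_cols))
    ]

print(combi[NUTRIENTS])
```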

In [27]:
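# Finally, mean and std over all totals as metrics of whether a given item combination
# is well-balanced: MEAN ≈ 1 --> the average of the 6 values ≈ norm,
# STDEV ≈ 0 (together with MEAN ≈ 1) --> each of the 6 values ≈ norm
# The six totals are selected explicitly because the df also holds text and list columns.
total_cols = ['Energy', 'Fat', 'Carbohydrates', 'Fiber', 'Protein', 'Sodium']
df_combi['MEAN'] = df_combi[total_cols].mean(axis=1)
df_combi['STDEV'] = df_combi[total_cols].std(axis=1)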
df_combi

Unnamed: 0,McD/SB,Kind,first,second,third,Energy,Fat,Carbohydrates,Fiber,Protein,Sodium,MEAN,STDEV
0,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Small), 0.1609195402298850...",0.712644,0.756014,0.680918,0.666667,0.881226,2.16,0.976245,0.585038
1,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Medium), 0.229885057471264...",0.781609,0.756014,0.807601,0.666667,0.881226,2.17,1.010519,0.572310
2,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Large), 0.3218390804597701...",0.873563,0.756014,0.973872,0.666667,0.881226,2.17,1.053557,0.557334
3,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Coca-Cola Classic (Child), 0.1149425287356321...",0.666667,0.756014,0.593824,0.666667,0.881226,2.16,0.954066,0.598962
4,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Diet Coke (Small), 0.0, 0.0, 0.0, 0.0, 0.0, 0...",0.551724,0.756014,0.372130,0.666667,0.881226,2.18,0.901293,0.650209
5,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Diet Coke (Medium), 0.0, 0.0, 0.0, 0.0, 0.0, ...",0.551724,0.756014,0.372130,0.666667,0.881226,2.20,0.904627,0.658080
6,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Diet Coke (Large), 0.0, 0.0, 0.0, 0.0, 0.0, 0...",0.551724,0.756014,0.372130,0.666667,0.881226,2.23,0.909627,0.669898
7,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Diet Coke (Child), 0.0, 0.0, 0.0, 0.0, 0.0, 0...",0.551724,0.756014,0.372130,0.666667,0.881226,2.19,0.902960,0.654144
8,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Dr Pepper (Small), 0.16091954022988506, 0.0, ...",0.712644,0.756014,0.665083,0.666667,0.881226,2.25,0.988606,0.623091
9,McD_light_dinner,"('Salads', 'Chiken Snacks', 'Cold Drinks')","[Premium Bacon Ranch Salad (without Chicken), ...","[Chipotle BBQ Snack Wrap (Crispy Chicken), 0.3...","[Dr Pepper (Medium), 0.21839080459770116, 0.0,...",0.770115,0.756014,0.791766,0.666667,0.881226,2.29,1.025965,0.623059
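
The selection step (keeping the combinations that meet the criteria) can then be a plain filter on the two metrics. In the sketch below the MEAN and STDEV values are copied from rows 0-4 of the table above, while the thresholds `MEAN_TOL` and `STDEV_MAX` are hypothetical and would need to be chosen for the real task.

```python
import pandas as pd

# MEAN / STDEV values copied from rows 0-4 of the table above.
results = pd.DataFrame({
    'third': ['Coca-Cola Classic (Small)', 'Coca-Cola Classic (Medium)',
              'Coca-Cola Classic (Large)', 'Coca-Cola Classic (Child)',
              'Diet Coke (Small)'],
    'MEAN':  [0.976245, 1.010519, 1.053557, 0.954066, 0.901293],
    'STDEV': [0.585038, 0.572310, 0.557334, 0.598962, 0.650209],
})

MEAN_TOL = 0.05    # hypothetical: accept MEAN within 1 ± 0.05
STDEV_MAX = 0.60   # hypothetical upper bound on STDEV

mask = (results['MEAN'] - 1).abs().le(MEAN_TOL) & results['STDEV'].le(STDEV_MAX)
selected = results[mask].sort_values('STDEV')   # best-balanced meals first

print(selected)
```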


In [None]:
# results df building (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
results_all = pd.concat([results_all, df_combi])
results_all

In [None]:
# # writing results df (when all combinations processed):
# results_all.reset_index(inplace=True, drop=True)
# results_all.to_csv('McD_light_d_results.csv', index=False)